
LLM: The Swiss Army Knife for Language Models on the Command Line

Hook

While most developers juggle multiple API clients and authentication schemes across OpenAI, Anthropic, and Google, one CLI tool treats them all as interchangeable backends—and automatically logs every interaction to a queryable SQLite database.

Context

The explosion of large language models created a new problem: API fragmentation. OpenAI has one interface, Anthropic another, Google yet another. Each requires different authentication, different request formats, and different mental models. For developers who want to experiment with multiple models or incorporate LLM capabilities into shell scripts and data pipelines, this fragmentation is a constant source of friction.

Simon Willison’s LLM tool solves this by providing a unified command-line interface across dozens of models—both cloud-based APIs and locally-running models via Ollama or MLX. More importantly, it treats prompts and responses as first-class data by automatically storing everything in SQLite databases. This isn’t just a convenience; it fundamentally changes how you can work with LLMs, enabling conversation history, prompt analytics, and audit trails without writing any logging code yourself.

Technical Insight

[System architecture (auto-generated diagram): the llm command's CLI interface parses arguments and hands off to a core library, which loads plugins and provides the model abstraction. The built-in OpenAI plugin, llm-anthropic, the Ollama plugin (local models), and other provider plugins authenticate using keys stored under ~/.config and make the API calls. A response handler streams output to the user while every interaction is logged to the SQLite database, logs.db.]

LLM’s architecture separates concerns cleanly: a core library handles model abstraction, a plugin system extends provider support, and SQLite provides persistent storage for all interactions. The design philosophy is evident from basic usage—every prompt automatically becomes a database record.

After installing with pip install llm and setting your OpenAI key with llm keys set openai, the simplest invocation looks like this:

llm "Ten fun names for a pet pelican"

This single command handles API authentication, makes the request, streams the response, and stores both prompt and completion in a local SQLite database (run llm logs path to see exactly where it lives). The SQLite logging happens transparently, which means you can immediately start querying your prompt history with standard SQL tools or Datasette.
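To sketch what that querying looks like, the snippet below builds a throwaway database with an assumed responses table (the real logs.db schema may differ across llm versions) and runs the kind of SQL you would point at the actual log file:

```shell
# Build a throwaway database shaped like llm's log (assumption:
# the real logs.db stores prompts in a `responses` table).
sqlite3 demo-logs.db "
  CREATE TABLE IF NOT EXISTS responses (
    datetime_utc TEXT, model TEXT, prompt TEXT, response TEXT
  );
  INSERT INTO responses VALUES (
    '2025-01-01T12:00:00', 'gpt-4o-mini',
    'Ten fun names for a pet pelican', 'Beaky, Scoop, ...'
  );
"

# The same query works against the real database (see: llm logs path)
sqlite3 demo-logs.db "
  SELECT model, substr(prompt, 1, 40)
  FROM responses
  ORDER BY datetime_utc DESC
  LIMIT 5;
"
```

The point is less this particular query than the fact that your entire prompt history is ordinary SQL away.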

The real architectural elegance appears in the plugin system. LLM ships with OpenAI support built-in, but other providers are first-class citizens through plugins. Want to use Anthropic’s Claude? Install the plugin and the interface remains identical:

llm install llm-anthropic
llm keys set anthropic
llm -m claude-3-opus 'Impress me with wild facts about turnips'

The -m flag switches models, but the command structure stays constant. This works because LLM defines abstract interfaces for prompts, responses, and streaming that plugins implement. Whether you’re hitting a cloud API or running a local model via Ollama, the interface doesn’t change:

llm install llm-ollama
ollama pull llama3.2:latest
llm -m llama3.2:latest 'What is the capital of France?'
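Plugins register their models with the core library, so model discovery stays uniform as well. A quick illustration using llm's own subcommands (the model ID shown is illustrative and depends on which plugins you have installed):

```shell
# List every model the tool can currently route to,
# including those contributed by installed plugins
llm models

# Optionally make a plugin-provided model the default
llm models default claude-3-opus
```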

For multi-modal use cases, LLM uses the -a flag to attach files. The tool handles reading images, audio, or video and formatting them appropriately for the model:

llm "extract text" -a scanned-document.jpg

The system prompt feature via -s enables powerful shell pipeline compositions. You can process code, logs, or any text through LLM filters:

cat myfile.py | llm -s "Explain this code"
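Because llm reads stdin like any other filter, it slots into existing pipelines. Two illustrative compositions, assuming a configured default model:

```shell
# Summarize a diff into a draft commit message
git diff | llm -s "Write a one-line commit message for this diff"

# Triage recent errors from a log file
tail -n 100 app.log | llm -s "Group these errors and suggest likely causes"
```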

Beyond simple prompts, LLM supports structured data extraction using schemas, tool execution for function calling, and embeddings generation for semantic search. The fragment system addresses long-context scenarios by letting you compose prompts from reusable pieces. The interactive chat mode with llm chat -m gpt-4o maintains conversation state and offers special commands like !multi for multi-line input and !edit to open your editor for complex prompt crafting.
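A few of those features in command form (each assumes a configured model and API key, and the concise "field type" schema syntax is llm's own shorthand):

```shell
# Structured extraction: constrain the response to a schema
llm 'invent a cool dog' --schema 'name, age int, breed'

# Continue the most recent logged conversation instead of starting fresh
llm -c 'Now do the same for cats'

# Generate an embedding vector for a string, printed as JSON
llm embed -c 'a happy pelican'
```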

The SQLite storage isn’t just logging—it’s the foundation for features like conversation history and the ability to programmatically analyze your prompting patterns. Because everything is in a local database, you can build custom tools on top of LLM’s data without depending on vendor-specific APIs.

Gotcha

The plugin ecosystem is both a strength and a weakness. While dozens of models are supported through the plugin system, plugin quality varies. Some plugins may lag behind in implementing newer features like tool calling or vision support, potentially creating feature parity gaps across providers. You might find that a capability works perfectly with OpenAI but behaves differently with a lesser-maintained plugin.

The SQLite logging, while powerful, adds overhead to every request. For high-throughput scenarios or when you’re just testing quick throwaway prompts, writing to disk on every invocation may feel heavy-handed. The logging can be disabled, but then you lose one of LLM’s key differentiators. The Homebrew installation also has documented PyTorch compatibility issues, suggesting potential dependency management friction on some systems. If you’re working in environments with strict dependency control or need guaranteed consistency across a team, the Python packaging situation may require extra attention.

Verdict

Use LLM if you’re a command-line power user who works with multiple LLM providers and wants to eliminate API fragmentation. It’s especially valuable if you’re building shell scripts that incorporate AI, need automatic conversation logging for audit trails or analysis, or want to experiment across models without rewriting integration code. The SQLite storage makes it uniquely suited for developers who treat prompts as data worth querying. Skip it if you only use one LLM provider and are satisfied with their native tools, prefer graphical interfaces over terminal workflows, or need zero-dependency deployments where even SQLite feels too heavy. Also skip if you require cutting-edge features across all providers—you’ll be at the mercy of plugin maintainers. LLM shines brightest for developers who live in the terminal and want AI as composable as grep or jq.
