LLM: The Unix Philosophy Meets Large Language Models

Hook

Every prompt you've ever sent to ChatGPT, Claude, or GPT-4 is gone unless you manually saved it. What if your terminal automatically logged every LLM interaction in a queryable SQLite database?

Context

The explosion of commercial LLM APIs created a new form of vendor lock-in. Developers who build workflows around OpenAI's CLI tools face friction when switching to Anthropic's Claude or Google's Gemini. Each provider ships its own SDK with different conventions, authentication patterns, and feature sets. Meanwhile, the rise of local models through projects like Llama and Ollama introduced another integration challenge: how do you maintain consistent tooling when some models run in the cloud and others on your laptop?

Simon Willison's LLM emerged from this fragmentation as a deliberate answer to a simple question: what would LLM access look like if it followed Unix principles? The tool provides a single command-line interface that abstracts away provider differences, stores all interactions in SQLite for later analysis, and uses a plugin architecture to support new models without modifying core code. It's not trying to be a hosted service or a GUI application—it's infrastructure for developers who live in the terminal and want LLM capabilities to integrate seamlessly with pipes, scripts, and existing workflows.

Technical Insight

At its core, LLM uses a plugin-based abstraction layer that separates the interface from implementation. The main llm command delegates to provider-specific plugins that implement a common Model interface. This design lets you switch between GPT-4, Claude, or a local Llama model by changing a single flag.

Here's the simplest invocation:

llm "Explain Python decorators"

By default, this uses OpenAI's GPT-3.5, but you can specify any installed model:

llm -m claude-3-opus "Explain Python decorators"
llm -m llama2 "Explain Python decorators"

Every interaction gets logged to ~/.config/io.datasette.llm/logs.db, a SQLite database containing prompts, responses, timestamps, token counts, and model identifiers. This creates a searchable archive of your LLM usage. You can query it directly with SQL or use LLM's built-in log viewer:

llm logs --truncate 100
llm logs -q "python decorators"

The conversation system demonstrates how LLM builds on this foundation. Conversations are stored as linked messages in SQLite, letting you maintain context across multiple prompts:

# Start a conversation
llm chat -m gpt-4
> What are Python metaclasses?
[GPT-4 explains metaclasses]
> Show me an example of when I'd use one
[GPT-4 provides example with context from previous message]

The plugin architecture extends beyond model providers. The llm-embed plugin adds embedding generation and similarity search. Install it, and LLM gains new commands:

llm install llm-embed-jina
cat documents.txt | llm embed jina-small -m notes
llm similar notes -c "machine learning concepts"

For programmatic use, LLM exposes a Python API that mirrors the CLI's abstractions:

import llm

model = llm.get_model("gpt-4")
response = model.prompt("Explain async/await")
print(response.text())

# Streaming responses
for chunk in model.prompt("Write a haiku").text_stream():
    print(chunk, end="")

The template system handles prompt reuse and parameterization. Templates are stored in ~/.config/io.datasette.llm/templates and support Jinja2-style variables:

# Save a template
llm --save-template review "Review this {{language}} code: {{code}}"

# Use it
llm -t review -p language python -p code "def foo(): pass"

One of LLM's most clever features is the fragments system for managing long-context scenarios. Instead of repeatedly pasting the same documentation or code into prompts, you define named fragments that get automatically included:

# Save a fragment of API documentation
llm fragments set api-spec - < openapi.json

# Reference it in prompts
llm "Generate a client for {api-spec}"

The tool-calling implementation shows how LLM bridges provider differences. Different models expose function calling with varying APIs, but LLM normalizes this:

import llm

def get_weather(location: str, unit: str = "celsius") -> str:
    """Get weather for a location."""
    return f"Sunny, 22{unit[0]} in {location}"

model = llm.get_model("gpt-4")
response = model.prompt(
    "What's the weather in Paris?",
    tools=[get_weather]
)
print(response.text())  # Model calls function and uses result

Multi-modal support (images, audio, video) works through attachments, with automatic handling of file paths and URLs:

llm "What's in this image?" -a photo.jpg
llm "Transcribe this" -a meeting.mp3 -m whisper-1

Gotcha

The abstraction layer that makes LLM powerful also introduces constraints. Not all features work with all models—tool calling requires specific provider support, vision capabilities exist only in certain models, and structured output extraction varies in reliability. You'll discover these limitations at runtime rather than through type checking, which can frustrate workflows that assume feature parity.

The Homebrew installation path has a documented PyTorch compatibility issue that breaks embedding functionality. The recommended workaround is using pipx instead, but this isn't obvious until you hit the error. Additionally, local model performance through Ollama depends entirely on your hardware and model size choices—running Llama 70B on a laptop with 16GB RAM will teach you patience or bankruptcy through cloud API usage when you switch back in frustration. The SQLite logging is fantastic for history but provides no built-in privacy controls; every API key, prompt, and response lives in plaintext in logs.db, which matters if you're handling sensitive data or sharing your home directory across backup systems.

Verdict

Use LLM if you work with multiple LLM providers and want a unified interface that doesn't lock you into a single vendor's ecosystem, need searchable history of all your prompts and responses for analysis or retrieval, build shell scripts or automation that integrates LLM capabilities with existing Unix tools, or experiment with both cloud APIs and local models without maintaining separate toolchains. Skip it if you exclusively use one provider's official SDK and don't value abstraction overhead, need a GUI application rather than command-line tooling, require enterprise features like team collaboration or access controls that aren't part of LLM's single-user design, or want guaranteed feature parity across providers without checking model-specific limitations.

LLM: The Unix Philosophy Meets Large Language Models

LLM: The Unix Philosophy Meets Large Language Models

Hook

Context

Technical Insight

Gotcha

Verdict

// KNOWLEDGE GRAPH

// CODEBASE INTELLIGENCE

Best for

Skip when

LLM: The Unix Philosophy Meets Large Language Models

Hook

Context

Technical Insight

Gotcha

Verdict

// KNOWLEDGE GRAPH

// RELATED

ds4: The SSD-Streaming Inference Engine That Treats Your Mac's NVMe Like RAM

Nanocoder: The Terminal Coding Agent That Lets You Switch Models Mid-Conversation

Harness-1: Training Search Agents with State Externalization

makemore: Understanding Language Models by Implementing Them Seven Different Ways

ds4: The SSD-Streaming Inference Engine That Treats Your Mac's NVMe Like RAM

Nanocoder: The Terminal Coding Agent That Lets You Switch Models Mid-Conversation

Harness-1: Training Search Agents with State Externalization

// CODEBASE INTELLIGENCE

Best for

Skip when