Nanocoder: The Terminal Coding Agent That Lets You Switch Models Mid-Conversation

Hook

Most AI coding assistants make it impossible to switch models without losing your entire conversation context. Nanocoder lets you swap from Claude to a local Llama instance mid-session—and that architectural choice reveals everything about who actually controls your development workflow.

Context

The explosion of AI coding assistants created a new form of vendor lock-in that developers barely noticed. Tools like Cursor and GitHub Copilot Workspace deliver impressive results, but they've architected dependency into every layer: your code flows through their servers, your context lives in their databases, and switching providers means starting over. For enterprises with data residency requirements, security researchers working air-gapped, or engineers who simply refuse to send proprietary code to third-party APIs, these tools are non-starters.

Nanocoder emerged from the Nano Collective, a community group rather than a VC-backed company, with a different set of incentives. Built in TypeScript as a terminal-based REPL, it treats model providers as swappable infrastructure rather than the product itself. You can run it entirely locally with Ollama, point it at OpenAI's API, or switch between providers without losing conversation state. The daemon architecture keeps context alive across terminal sessions, and a checkpoint system lets you roll back multi-file changes atomically. It's deliberately unsexy: no slick IDE integration, no proprietary context retrieval, just a well-architected agent loop built on the principle that your code stays on your machine unless you explicitly decide otherwise.

Technical Insight


The core architectural insight in Nanocoder is a provider adapter pattern that genuinely abstracts LLM differences. Most tools claim multi-model support but leak provider-specific details everywhere. Nanocoder implements a unified interface that handles streaming, function-calling format translation, and context window management across Ollama and OpenAI-compatible endpoints, along with tools served over the emerging Model Context Protocol (MCP) standard.

Here's how the provider abstraction works in practice. When you initialize a session, you specify a provider and model in your config file (~/.nanocoder/config.json):

{
  "provider": "ollama",
  "model": "codellama:13b",
  "fallback": {
    "provider": "openai",
    "model": "gpt-4"
  }
}
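The `fallback` key implies a secondary provider that takes over when the primary one fails. The article doesn't show how Nanocoder consumes it, so the following is a minimal sketch under that assumption: `ProviderConfig`, `ChatFn` and `chatWithFallback` are hypothetical names, and "fails" is taken to mean that the primary call throws.

```typescript
// Hypothetical shape of the config file shown above
interface ProviderConfig {
  provider: string;
  model: string;
  fallback?: { provider: string; model: string };
}

// Hypothetical: sends one prompt to a given provider/model and resolves with the reply
type ChatFn = (provider: string, model: string, prompt: string) => Promise<string>;

// Try the primary provider first; if it throws and a fallback is configured,
// retry the same prompt against the fallback provider.
async function chatWithFallback(
  config: ProviderConfig,
  prompt: string,
  chat: ChatFn
): Promise<{ provider: string; reply: string }> {
  try {
    const reply = await chat(config.provider, config.model, prompt);
    return { provider: config.provider, reply };
  } catch (err) {
    // No fallback configured: surface the original error unchanged
    if (!config.fallback) throw err;
    const { provider, model } = config.fallback;
    const reply = await chat(provider, model, prompt);
    return { provider, reply };
  }
}
```

Whether a real fallback should also trigger on timeouts or malformed tool calls, not just thrown errors, is a design choice this sketch leaves open.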

The provider interface normalizes requests into a common format before sending them to the LLM. When the agent needs to edit a file, it generates a tool call, but providers differ in how they accept tool definitions and return tool-call arguments, and many local models struggle with structured output entirely. Nanocoder's adapter layer handles the translation:

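// Message, Tool, StreamingResponse, OllamaClient, convertToOllamaSchema and
// convertMessageFormat are defined elsewhere in the codebase.
interface ProviderAdapter {
  // Interface members can't be marked async; return a Promise instead
  chat(messages: Message[], tools?: Tool[]): Promise<StreamingResponse>;
  supportsStructuredOutput(): boolean;
  maxContextTokens(): number;
}

class OllamaAdapter implements ProviderAdapter {
  constructor(
    private client: OllamaClient,
    private model: string,
    private contextTokens: number
  ) {}

  async chat(messages: Message[], tools?: Tool[]): Promise<StreamingResponse> {
    // Translate generic tool definitions to Ollama's function format
    const ollamaTools = tools?.map(t => ({
      type: 'function',
      function: {
        name: t.name,
        description: t.description,
        parameters: convertToOllamaSchema(t.parameters)
      }
    }));

    return this.client.chat({
      model: this.model,
      messages: messages.map(convertMessageFormat),
      tools: ollamaTools,
      stream: true
    });
  }

  supportsStructuredOutput() {
    // Most Ollama models need prompt engineering for JSON
    return false;
  }

  maxContextTokens() {
    // Required by the interface; the usable window depends on the model
    return this.contextTokens;
  }
}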

This abstraction isn't just academic. During a coding session, you can switch providers mid-conversation if your local model can't handle a complex refactoring. The conversation history is serialized in a provider-agnostic format, so context transfers cleanly. Commercial tools have little incentive to make this easy, because frictionless model switching undermines a business model that depends on keeping you on their API tier.
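To make "provider-agnostic history" concrete, here is a minimal sketch, assuming the session stores messages in a neutral shape and each adapter converts them only at the boundary. `NeutralMessage`, `toOpenAI` and `toOllama` are hypothetical names, and the two output shapes are simplified versions of the OpenAI Chat Completions and Ollama `/api/chat` message formats; a real adapter handles more fields. One genuine difference shows up even at this size: OpenAI-style tool-call arguments are a JSON string, while Ollama's are a plain object.

```typescript
// Hypothetical provider-neutral message format stored by the session
interface NeutralToolCall {
  id: string;
  name: string;
  args: Record<string, unknown>;
}

interface NeutralMessage {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
  toolCalls?: NeutralToolCall[];
  toolCallId?: string; // set on role "tool" messages
}

// OpenAI-style messages: tool-call arguments are serialized to a JSON string
function toOpenAI(msgs: NeutralMessage[]) {
  return msgs.map(m => ({
    role: m.role,
    content: m.content,
    ...(m.toolCalls
      ? {
          tool_calls: m.toolCalls.map(c => ({
            id: c.id,
            type: "function" as const,
            function: { name: c.name, arguments: JSON.stringify(c.args) },
          })),
        }
      : {}),
    ...(m.toolCallId !== undefined ? { tool_call_id: m.toolCallId } : {}),
  }));
}

// Ollama-style messages: tool-call arguments stay a plain object
function toOllama(msgs: NeutralMessage[]) {
  return msgs.map(m => ({
    role: m.role,
    content: m.content,
    ...(m.toolCalls
      ? { tool_calls: m.toolCalls.map(c => ({ function: { name: c.name, arguments: c.args } })) }
      : {}),
  }));
}
```

Switching providers mid-conversation then amounts to persisting the neutral history (for example with `JSON.stringify`) and handing it to a different converter; nothing provider-specific is ever stored.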

The daemon architecture is the second clever piece. Instead of running as a one-shot CLI command, Nanocoder spawns a background process per project directory. This daemon maintains conversation state, file change history, and checkpoint snapshots across terminal sessions:

# Start daemon for current project
$ nanocoder daemon start

# In another terminal, attach to running session
$ nanocoder attach
> /edit src/api/handler.ts "add rate limiting"
[Agent applies changes...]
> exit

# Later, resume without losing context
$ nanocoder attach
> /status
Last checkpoint: 3 files modified, 45 minutes ago
Conversation: 12 messages, 8k tokens
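One plausible way to get "one daemon per project directory" is to derive a stable identifier from the project's absolute path and use it to name the daemon's socket and state files, so `nanocoder attach` in the same directory always finds the same process. This is a hypothetical sketch: the article doesn't say how Nanocoder locates its daemons, and the `~/.nanocoder/daemons` layout and function names below are assumptions.

```typescript
import { createHash } from "node:crypto";
import { homedir } from "node:os";
import { join, resolve } from "node:path";

// Derive a stable, filesystem-safe ID for a project directory.
// resolve() normalizes relative paths, ".." segments and trailing slashes,
// so different spellings of the same directory map to the same daemon.
function projectDaemonId(projectDir: string): string {
  return createHash("sha256").update(resolve(projectDir)).digest("hex").slice(0, 16);
}

// Hypothetical layout: one socket and one state store per project
function daemonPaths(projectDir: string) {
  const id = projectDaemonId(projectDir);
  const base = join(homedir(), ".nanocoder", "daemons");
  return {
    socket: join(base, `${id}.sock`),
    state: join(base, `${id}.db`),
  };
}
```

`resolve()` does not follow symlinks, so a symlinked checkout would get a different daemon unless the path is first passed through `fs.realpathSync`.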

The daemon stores state in a local SQLite database, not in cloud storage. When you request a file edit, the agent doesn't just apply it immediately; it creates a checkpoint with full file snapshots from before and after the change. If the change breaks tests, you can roll back atomically:

> /edit src/auth/middleware.ts "refactor to use async/await"
[Changes applied to 3 files]
> /run npm test
Tests failed: 4 errors

> /rollback
[Reverted 3 files to checkpoint]
> /run npm test
All tests passing

This checkpoint system treats code changes as transactions, which sounds obvious, but most coding assistants don't implement it. They'll undo individual file edits, but if the agent modified five files to implement a feature and you realize the approach is wrong, you're reverting each file by hand.
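The transactional behaviour can be sketched with an in-memory stand-in for the working tree. This is a minimal illustration, not Nanocoder's implementation: `Workspace`, `checkpoint` and `rollback` are hypothetical names, only "before" snapshots are kept (enough to undo), and a real version would persist them (the article says to SQLite) and write to disk. The key property is that a rollback restores every file touched since the checkpoint in one step, including deleting files that didn't exist before.

```typescript
// Hypothetical in-memory working tree: path -> file contents
class Workspace {
  private files = new Map<string, string>();
  // Each checkpoint records the pre-change contents of the files it covers;
  // `undefined` means the file did not exist when the checkpoint was taken.
  private checkpoints: Map<string, string | undefined>[] = [];

  read(path: string): string | undefined {
    return this.files.get(path);
  }

  // Open a new checkpoint before the agent applies a batch of edits
  checkpoint(): void {
    this.checkpoints.push(new Map());
  }

  write(path: string, contents: string): void {
    const current = this.checkpoints[this.checkpoints.length - 1];
    // Snapshot the original contents only the first time a file is touched
    if (current && !current.has(path)) current.set(path, this.files.get(path));
    this.files.set(path, contents);
  }

  // Revert every file touched since the latest checkpoint, as one unit.
  // Returns the number of files reverted.
  rollback(): number {
    const snap = this.checkpoints.pop();
    if (!snap) throw new Error("no checkpoint to roll back to");
    for (const [path, before] of snap) {
      if (before === undefined) this.files.delete(path);
      else this.files.set(path, before);
    }
    return snap.size;
  }
}
```

Usage mirrors the transcript above: call `checkpoint()` before the agent's multi-file edit, and a single `rollback()` undoes all of it.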

The skill system exposes agent capabilities through slash commands and allows extension without forking the codebase. Built-in skills like /edit, /run, and /search are defined in configuration files, not hardcoded:

{
  "skills": [
    {
      "name": "edit",
      "trigger": "/edit",
      "description": "Modify files using LLM-generated diffs",
      "requiresApproval": true,
      "tool": {
        "name": "apply_diff",
        "parameters": {
          "filepath": {"type": "string"},
          "changes": {"type": "string"}
        }
      }
    },
    {
      "name": "custom_deploy",
      "trigger": "/deploy",
      "description": "Deploy to staging with our custom workflow",
      "requiresApproval": false,
      "command": "./scripts/deploy.sh ${args}"
    }
  ]
}
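A dispatcher for skills defined this way can be small. The sketch below assumes the config shape shown above: it matches the slash-command trigger, then either returns a tool call for the LLM to complete or expands `${args}` into a shell command line. `Skill`, `SkillAction`, `shellQuote` and `resolveSkill` are hypothetical names, and the sketch deliberately stops short of executing anything. User text is single-quoted before substitution so that input like `x; rm -rf /` can't break out of the argument.

```typescript
interface Skill {
  name: string;
  trigger: string;
  description: string;
  requiresApproval: boolean;
  tool?: { name: string; parameters: Record<string, { type: string }> };
  command?: string;
}

type SkillAction =
  | { kind: "tool"; skill: string; tool: string; instruction: string; requiresApproval: boolean }
  | { kind: "shell"; skill: string; commandLine: string; requiresApproval: boolean };

// POSIX single-quote escaping: ' becomes '\'' inside a single-quoted string
function shellQuote(s: string): string {
  return "'" + s.replace(/'/g, "'\\''") + "'";
}

function resolveSkill(input: string, skills: Skill[]): SkillAction | undefined {
  const [trigger, ...rest] = input.trim().split(/\s+/);
  const skill = skills.find(s => s.trigger === trigger);
  if (!skill) return undefined;
  const args = rest.join(" ");
  if (skill.command) {
    // Substitute every ${args} placeholder with the args as ONE quoted argument.
    // split/join avoids String.replace's special "$" patterns in the replacement.
    const commandLine = skill.command.split("${args}").join(shellQuote(args));
    return { kind: "shell", skill: skill.name, commandLine, requiresApproval: skill.requiresApproval };
  }
  if (skill.tool) {
    return { kind: "tool", skill: skill.name, tool: skill.tool.name, instruction: args, requiresApproval: skill.requiresApproval };
  }
  return undefined;
}
```

Passing all arguments as a single quoted word is the conservative choice; a script that expects several positional arguments would need a different substitution rule.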

Power users can add domain-specific workflows—custom deployment commands, specialized refactoring tools, integration with internal APIs—without contributing back to the main repo. The community governance model means no one's trying to upsell you on a Team tier where these features get unlocked.

Development modes control the approval workflow explicitly. Most coding assistants hide their safety rails, making it unclear when the AI will execute commands versus asking permission. Nanocoder exposes this as configuration:

normal: Requires approval for file edits and shell commands
auto-accept: Auto-applies file changes but confirms shell execution
yolo: Executes everything without confirmation
plan: Only generates plans, never applies changes
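The four modes amount to a small policy table. In the sketch below the mode names come from the list above, while `Action`, `Decision`, `POLICY` and `decide` are hypothetical. The article doesn't say how plan mode treats shell commands, so this version assumes it refuses anything that could change state.

```typescript
type Mode = "normal" | "auto-accept" | "yolo" | "plan";
type Action = "file_edit" | "shell";
type Decision = "run" | "ask" | "refuse";

// One row per mode, mirroring the descriptions above
const POLICY: Record<Mode, Record<Action, Decision>> = {
  normal: { file_edit: "ask", shell: "ask" },
  "auto-accept": { file_edit: "run", shell: "ask" },
  yolo: { file_edit: "run", shell: "run" },
  plan: { file_edit: "refuse", shell: "refuse" }, // assumption for shell
};

function decide(mode: Mode, action: Action): Decision {
  return POLICY[mode][action];
}
```

Keeping the whole policy in one typed table makes the trade-off auditable at a glance, and the `Record` types force every mode to state a decision for every action.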

Yolo mode is dangerous: the agent will run rm -rf / if the LLM hallucinates it. But making the trade-off explicit is more honest than tools that claim to be safe while still executing arbitrary code behind approval dialogs that developers click through reflexively.

Gotcha

The biggest limitation is that Nanocoder has no built-in code understanding beyond what fits in the LLM's context window. It has no AST parsing, semantic search, or static analysis. When you ask it to refactor a function used across 50 files, it can't automatically find all call sites; it only knows what you've explicitly shown it or what it discovers by reading files one at a time. This burns tokens inefficiently and means the agent can't navigate large codebases intelligently. Tools like Aider address this with tree-sitter integration and RepoMap context compression, which generates compact summaries of code structure that fit in context. Nanocoder assumes you'll manually specify which files matter, which works for small projects but breaks down at scale.

The checkpoint system stores full file snapshots rather than git-style deltas, so disk usage grows quickly during long sessions with many checkpoints. A day-long coding session that creates 50 checkpoints across a Next.js project with node_modules can easily consume gigabytes. There's no automatic garbage collection, so you either clear old checkpoints manually or watch your disk fill up.
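The missing garbage collection could start as a simple retention policy. A hypothetical sketch: always keep the newest `keepLatest` checkpoints, keep anything younger than `maxAgeMs`, and report the rest for deletion. `CheckpointMeta` and `selectForDeletion` are illustrative names, not part of Nanocoder.

```typescript
interface CheckpointMeta {
  id: string;
  createdAt: number; // epoch milliseconds
  bytes: number;     // total size of the stored snapshots
}

// A checkpoint is deleted only if it is NOT among the `keepLatest` newest
// AND it is older than `maxAgeMs` relative to `now`.
function selectForDeletion(
  checkpoints: CheckpointMeta[],
  keepLatest: number,
  maxAgeMs: number,
  now: number
): CheckpointMeta[] {
  // Sort a copy newest-first so index i is the checkpoint's recency rank
  const newestFirst = [...checkpoints].sort((a, b) => b.createdAt - a.createdAt);
  return newestFirst.filter((cp, i) => i >= keepLatest && now - cp.createdAt > maxAgeMs);
}
```

Pruning only limits how far back you can roll back; it doesn't shrink each checkpoint. Storing deltas instead of full snapshots, as the paragraph above suggests, is what attacks per-checkpoint size.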

Shell command execution has no sandboxing. Even in normal mode with approval required, safety depends entirely on human review catching malicious commands. LLMs occasionally hallucinate destructive operations, such as running npm install with the wrong flags or executing database migrations against production. Nanocoder shows you the command and waits for confirmation, but if you're in a flow state and not reading carefully, you'll approve something dangerous. Docker-based sandboxing or shell command allowlists would mitigate this, but neither is implemented.
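A shell command allowlist, one of the mitigations named above, could look like the following hypothetical sketch. It is deliberately conservative: it rejects anything containing shell metacharacters (so `npm test && rm -rf ~` can't ride along with an approved command), then requires the program, and where listed its subcommand, to appear on a per-project list. The names and list contents are assumptions; a `Map` is used so that inputs like `constructor` can't hit inherited object properties.

```typescript
// Hypothetical per-project allowlist: program -> allowed subcommands
// (an empty array means any arguments are allowed for that program)
const ALLOWLIST = new Map<string, string[]>([
  ["npm", ["test", "run"]],
  ["git", ["status", "diff", "log"]],
  ["ls", []],
]);

// Characters that let one command chain, substitute, or redirect into another
const SHELL_METACHARS = /[;&|`$<>(){}\n\\]/;

function isAllowed(command: string): boolean {
  if (SHELL_METACHARS.test(command)) return false;
  const parts = command.trim().split(/\s+/);
  const program = parts[0] ?? "";
  const sub = parts[1];
  const allowedSubs = ALLOWLIST.get(program);
  if (allowedSubs === undefined) return false;
  return allowedSubs.length === 0 || (sub !== undefined && allowedSubs.includes(sub));
}
```

Metacharacter rejection is crude (it also blocks harmless pipes), and an allowlist is not a sandbox: an approved `npm test` still runs whatever the project's test script contains, which is why container isolation is the stronger of the two mitigations.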

Verdict

Use if: You're running Ollama locally and want coding assistance without sending code to third-party APIs, work under data residency requirements that prohibit cloud-based AI tools, need genuine model flexibility without vendor lock-in, or you're a security researcher doing air-gapped work where nothing touches the internet. The daemon mode and checkpoint system are legitimately useful for multi-session coding tasks, and the community governance means privacy-hostile features won't creep in over time.

Skip if: You need production-grade code intelligence for large codebases; without AST parsing and semantic search, it is inefficient compared to Aider with tree-sitter integration. Also skip it if you want polished IDE integration: Continue's VS Code extension delivers better UX for developers who don't live in the terminal. Finally, skip it if you're non-technical or need support guarantees; the collective governance model means no customer service, just Discord and pull requests. This is a tool built by engineers who trust you to understand the trade-offs, not one that holds your hand.
