
Letta Code: Building a Coding Agent That Actually Remembers Your Last Conversation


Hook

Most coding assistants treat every conversation like meeting a stranger with amnesia. Letta Code flips this: the agent remembers everything, but you get fresh conversation threads whenever you need them.

Context

The current generation of AI coding assistants—from GitHub Copilot to Claude—operates on a fundamental limitation: they’re stateless. Each session starts from scratch. You can feed them context through your IDE or conversation history, but once you close the session, that accumulated understanding evaporates. This works fine for one-off tasks, but it creates a peculiar inefficiency for long-term projects. You find yourself re-explaining your testing conventions, re-describing your architecture decisions, and re-teaching the same patterns every time you start a new conversation.

Letta Code takes a different architectural bet: what if the agent persisted while the conversations didn’t? Instead of treating sessions as the atomic unit, it separates a persistent agent (which accumulates memory and learned skills) from ephemeral threads (individual conversations). The agent becomes something more like a team member who learns your codebase over time, while threads give you the clean slate of a new conversation when you need it. Built in TypeScript as a CLI wrapper around the Letta API, it’s designed for developers who work on the same projects for months, not consultants bouncing between codebases daily.

Technical Insight

[System architecture (auto-generated diagram): the CLI Entry Point drives an Agent Manager and a Thread Manager, both of which call the Letta API Client. The client loads and saves state in a Memory Store (.letta/) and registers reusable capabilities in a Skills Store (.skills/), then forwards messages to an LLM Backend (OpenAI/Anthropic/Gemini). The persistent agent ID carries context across threads; threads handle individual conversations, while learned patterns flow back as memory updates.]

The core architectural insight is the agent-thread separation. When you run letta-code, you’re not starting a new AI session—you’re resuming a relationship with a persistent agent. That agent has its own memory store, independent of any particular conversation. Threads are just interaction contexts that reference the agent.
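The split described above can be sketched as a data model. This is a minimal illustration, not Letta Code's actual internal types: the interface and function names here are assumptions chosen to show that threads reference the agent rather than owning its state.

```typescript
// Illustrative sketch of the agent-thread separation.
// Type and field names are hypothetical, not Letta Code internals.

interface AgentState {
  id: string;
  model: string;       // current LLM backend, e.g. "gpt-4"
  memories: string[];  // long-term facts, persisted under .letta/
  skills: string[];    // names of registered skill modules
}

interface Thread {
  id: string;
  agentId: string;     // threads reference the agent; they never own its state
  messages: { role: "user" | "assistant"; content: string }[];
}

let nextThreadId = 0;

// Starting a new thread yields a fresh conversation context
// without touching the agent's accumulated memory or skills.
function newThread(agent: AgentState): Thread {
  return { id: `thread-${++nextThreadId}`, agentId: agent.id, messages: [] };
}
```

The key property: `newThread` reads only `agent.id`, so you can discard a thread at any point and the agent's memory survives.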

Here’s what initializing an agent looks like:

# Create a persistent agent with a specific model
letta-code --init --model gpt-4

# The agent now exists independently of any conversation
# Start a new thread (conversation) with that agent
letta-code --new-thread

# Later, start another thread with the same agent
# The agent remembers patterns from previous threads
letta-code --new-thread

Under the hood, Letta Code maintains a .letta directory in your project root. This stores agent configuration, learned skills, and memory state. The skills system is particularly clever—as you work with the agent, it can extract patterns into reusable modules stored in .skills/. For example, if you repeatedly ask it to write tests in a particular style, it can codify that pattern:

// .skills/write-vitest-test.ts
export async function writeVitestTest(componentPath: string) {
  // Agent-learned pattern for your team's test structure.
  // Derive the component name from the path so the scaffold imports it.
  const componentName =
    componentPath.split('/').pop()?.replace(/\.\w+$/, '') ?? 'Component'
  return `
import { describe, it, expect } from 'vitest'
import { render } from '@testing-library/react'
import { ${componentName} } from '${componentPath}'

// Always include accessibility checks
// Always test error states
// Always use data-testid for selectors
`
}

The agent learns to call this skill automatically when you ask for tests, rather than regenerating the pattern from scratch. This is fundamentally different from prompt engineering or few-shot examples—the skill is stored as executable code, versioned with your project.
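How that dispatch might work can be sketched as a small registry: skills loaded from `.skills/` are registered by name, and the agent invokes the stored module instead of regenerating the pattern. The registry API below is an illustration of the mechanism, not Letta Code's actual loader.

```typescript
// Hypothetical sketch of a skill registry of the kind .skills/ implies.
// registerSkill/runSkill are illustrative names, not Letta Code's API.

type Skill = (...args: string[]) => Promise<string>;

const registry = new Map<string, Skill>();

// Called once per module when the CLI scans .skills/ at startup.
function registerSkill(name: string, skill: Skill): void {
  registry.set(name, skill);
}

// The agent prefers a registered skill over regenerating the pattern.
async function runSkill(name: string, ...args: string[]): Promise<string> {
  const skill = registry.get(name);
  if (!skill) throw new Error(`unknown skill: ${name}`);
  return skill(...args);
}
```

Because skills are plain modules checked into the repo, they version alongside your code: a change to the test convention is a reviewable diff, not a prompt tweak.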

Memory management is explicit and user-controlled. The /remember command adds facts to the agent’s long-term memory:

> /remember We use Zod for all runtime validation
> /remember API responses should always be wrapped in Result<T, E> types
> /remember Never use `any` type - use `unknown` and narrow

These memories persist across all threads. When you start a new conversation three weeks later, the agent still knows these conventions. The /memory command shows what the agent remembers, and /forget removes outdated information.
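A minimal sketch of the store behind those commands, assuming memories are a deduplicated list of facts persisted under `.letta/` (the on-disk format here is an assumption):

```typescript
// Illustrative in-memory model of /remember, /memory, and /forget.
// The class and method names are hypothetical.

class MemoryStore {
  private facts: string[] = [];

  // /remember — add a fact; repeated facts are stored once
  remember(fact: string): void {
    if (!this.facts.includes(fact)) this.facts.push(fact);
  }

  // /memory — show everything the agent currently knows
  list(): string[] {
    return [...this.facts];
  }

  // /forget — prune outdated or incorrect information
  forget(fact: string): void {
    this.facts = this.facts.filter((f) => f !== fact);
  }
}
```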

Letta Code supports multiple backends—OpenAI, Anthropic, Gemini—but the critical piece is that switching models doesn’t reset the agent. Your GPT-4 agent can become a Claude agent while retaining its learned skills and memories. This is possible because Letta maintains the agent state separately from the LLM provider.
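The reason this works can be shown in a few lines: if agent state is stored outside the provider, switching models only rewrites the backend field. This is a sketch of that invariant under assumed type names, not Letta's actual state schema.

```typescript
// Provider-independent agent state: swapping the model changes only the
// backend field. Names are illustrative.

interface PortableAgent {
  model: string;
  memories: string[];
  skills: string[];
}

function switchModel(agent: PortableAgent, model: string): PortableAgent {
  // Only the LLM backend changes; learned state carries over untouched.
  return { ...agent, model };
}
```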

For deployment, you can use the hosted Letta API or run a Docker container locally:

# Self-hosted option
docker run -d -p 8000:8000 letta/letta-server

# Point letta-code to your instance
export LETTA_API_URL=http://localhost:8000
letta-code

The CLI handles authentication, thread management, and skill synchronization. Behind the scenes, it’s making structured calls to the Letta API, which manages the agent’s memory embeddings, retrieval, and state persistence. The TypeScript implementation keeps the tool fast and the output parsing reliable—no wrestling with Python virtual environments or dependency conflicts.

What makes this architecture powerful for long-term projects is the compounding knowledge effect. After a month of working with the same agent, it knows your import patterns, your error handling conventions, your preferred libraries, and your team’s code review feedback patterns. New threads start fresh conversationally, but they benefit from all that accumulated context automatically.

Gotcha

The memory-first approach introduces two significant challenges. First, agents can learn incorrect information and propagate it indefinitely. If you work with an agent during an experimental refactor that you later abandon, it may have ‘learned’ patterns you don’t actually want. Unlike stateless assistants where closing the session erases mistakes, Letta agents retain them until you explicitly use /forget or /init to reset. You need discipline around memory hygiene—periodically reviewing /memory output and pruning outdated or incorrect information.

Second, the dependency on Letta API infrastructure adds operational complexity. You can’t just npm install and start coding—you need either a Letta cloud account or a self-hosted Docker instance. This introduces latency (network calls to the API), potential downtime (if the API is unavailable), and coordination challenges (if multiple team members want to share agent state, you need shared Letta infrastructure). For quick prototypes or exploratory projects, this setup overhead isn’t justified. You’ll spend more time managing the agent than benefiting from its memory.

There’s also an unpredictability trade-off. With stateless assistants, you get consistent behavior—same prompt, similar output. With Letta agents, behavior evolves based on accumulated memory. This is valuable for learning your preferences, but it means you can’t easily reproduce previous sessions or predict exactly how the agent will respond. If you’re debugging why the agent suggested a particular approach, you’re tracing through its memory state, not just the current conversation.

Verdict

Use if: You’re working on a multi-month project with consistent patterns, you want an assistant that improves as it learns your codebase conventions, you’re comfortable managing agent memory as a persistent resource, or you need an assistant that maintains context across multiple conversation sessions without re-explaining architecture. Also consider it if you switch between LLM providers frequently but want consistent assistant behavior.

Skip if: You work on many short-term projects, you prefer predictable and reproducible assistant behavior, you don’t want to manage API infrastructure or Docker containers, or you need zero-setup tooling that works immediately. Also skip it if your team isn’t aligned on memory management practices—shared agents with inconsistent memory hygiene become more confusing than helpful.
