Citadel: Building an Operating System for Claude Code Agents
Hook
Every Claude Code session starts from zero: you re-explain the architecture, rediscover fragile modules, and manually split work between agents. What if your AI assistant could remember last week’s refactoring decisions and coordinate three parallel agents without merge conflicts?
Context
Claude Code provides coding capabilities but no infrastructure for multi-session work or parallel coordination. You explain your auth layer’s quirks on Monday, then explain them again on Thursday. You identify that the API module needs careful handling in one session, only to watch the next session make the same risky changes. When a refactoring task spans multiple components, you’re stuck manually splitting the work and losing context between pieces.
Citadel treats this as an infrastructure problem, not a prompting problem. It’s an orchestration harness that augments Claude Code with campaign persistence (durable state across sessions), intelligent routing (automatic selection of the cheapest execution path), and parallel coordination (multiple agents in isolated git worktrees). The value proposition is compounding knowledge—skills, guardrails, and architectural decisions persist across every future session instead of evaporating when you close your laptop.
Technical Insight
The routing system is the architectural centerpiece. Citadel implements a four-tier cascade that resolves most commands without burning tokens on LLM classification. Tier 1 uses regex pattern matching for trivial commands. Tier 2 checks session state to resume active campaigns. Tier 3 performs keyword lookup against installed skills. Only when all three fail does Tier 4 invoke an LLM for structured complexity analysis (~500 tokens). Because tiers 1 through 3 need no model call, most commands are routed at zero token cost.
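To make the cascade concrete, here is a minimal Python sketch of a four-tier router. It is not Citadel’s implementation: the patterns, the skill keywords, the `Route` type, and the `stub_classifier` standing in for the LLM call are all hypothetical names chosen for illustration. Only the tier order and the roughly 500-token cost of the last tier come from the description above.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Route:
    tier: int              # which tier resolved the command
    target: str            # where the command is sent
    tokens_spent: int = 0  # tokens used to make the routing decision

# Hypothetical tables; a real harness would load these from its own config.
TRIVIAL_PATTERNS = [(re.compile(r"(?i)(help|status)"), "builtin")]
RESUME_PATTERN = re.compile(r"(?i)\b(continue|resume)\b")
SKILL_KEYWORDS = {"review": "review", "test": "test-writer"}

def stub_classifier(text: str) -> str:
    """Placeholder for the Tier 4 LLM call (assumption, not a real API)."""
    return "archon" if "overhaul" in text.lower() else "marshal"

def route(command: str,
          active_campaign: Optional[str] = None,
          classify: Callable[[str], str] = stub_classifier) -> Route:
    text = command.strip()
    # Tier 1: regex match on trivial commands.
    for pattern, target in TRIVIAL_PATTERNS:
        if pattern.fullmatch(text):
            return Route(1, target)
    # Tier 2: resume an active campaign if session state has one.
    if active_campaign and RESUME_PATTERN.search(text):
        return Route(2, f"campaign:{active_campaign}")
    # Tier 3: keyword lookup against installed skills.
    words = set(re.findall(r"[a-z]+", text.lower()))
    for keyword, skill in SKILL_KEYWORDS.items():
        if keyword in words:
            return Route(3, f"skill:{skill}")
    # Tier 4: fall back to the (stubbed) LLM classifier, the only paid step.
    return Route(4, classify(text), tokens_spent=500)

if __name__ == "__main__":
    for cmd in ("help", "review the auth module", "overhaul the API layer"):
        print(cmd, "->", route(cmd))
```

The ordering is the point of the design: each tier is cheaper than the one after it, so the expensive classifier only sees commands that nothing cheaper could place.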
Once routed, a command executes on one of four rungs of an orchestration ladder, which is separate from the four routing tiers above. Skills are single-purpose specialists: the /do review command triggers a 5-pass structured code review that remembers previous failure patterns. Marshals coordinate session-scoped work with multi-step orchestration. Archons manage multi-session campaigns with persistent state that survives context window compression. Fleets spawn parallel agents in isolated git worktrees, complete with a discovery relay system for real-time knowledge sharing.
The campaign persistence architecture solves what the README calls ‘agent amnesia.’ Multi-session projects maintain structured state files that capture decisions, progress, and learned constraints. Here’s how you’d resume a multi-day refactoring:
# Day 1: Start an architecture overhaul
/do overhaul the API layer
# Agent analyzes, makes progress, saves campaign state
# Day 3: Resume exactly where you left off
/do continue
# Citadel loads campaign state, decisions, and context
# Agent picks up mid-task with full historical knowledge
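The resume flow above depends on state that outlives the session. The following Python sketch shows one way such a state file could be written and reloaded. The `CampaignState` fields, the JSON layout, and the file naming are assumptions made for illustration; the article does not describe Citadel’s actual state format.

```python
import json
from dataclasses import dataclass, field, asdict
from pathlib import Path

@dataclass
class CampaignState:
    # Hypothetical schema: decisions, progress, and learned constraints.
    name: str
    goal: str
    decisions: list[str] = field(default_factory=list)
    completed_steps: list[str] = field(default_factory=list)
    constraints: list[str] = field(default_factory=list)

def save_campaign(state: CampaignState, directory: Path) -> Path:
    """Write the state as JSON, using a temp file so a crash cannot leave a half-written file."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{state.name}.json"
    tmp = directory / f"{state.name}.json.tmp"
    tmp.write_text(json.dumps(asdict(state), indent=2), encoding="utf-8")
    tmp.replace(path)  # rename over the old file in one step
    return path

def load_campaign(name: str, directory: Path) -> CampaignState:
    """Rebuild the state object from disk at the start of a new session."""
    data = json.loads((directory / f"{name}.json").read_text(encoding="utf-8"))
    return CampaignState(**data)

if __name__ == "__main__":
    import tempfile
    with tempfile.TemporaryDirectory() as tmpdir:
        state = CampaignState("api-overhaul", "overhaul the API layer")
        state.decisions.append("keep v1 routes until clients migrate")
        save_campaign(state, Path(tmpdir))
        print(load_campaign("api-overhaul", Path(tmpdir)))
```

Keeping the state as a plain structured file means a later session can reload decisions verbatim instead of relying on whatever survives in the model’s context window.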
Fleet mode provides true parallel coordination using git worktrees for filesystem isolation. When you run /do overhaul all three services, Citadel spawns separate agents in isolated worktrees, implements a discovery relay to share learnings between agents in real-time, and handles the merge coordination. This prevents the merge conflicts and context loss that would occur if you manually split the work across multiple sessions.
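The filesystem isolation described here rests on the standard `git worktree add` command, which checks out a branch into its own directory. The Python sketch below only builds the commands a fleet launcher might run, one worktree and branch per task. The `fleet/` branch prefix, the sibling-directory layout, and the `worktree_plan` helper are hypothetical choices, not Citadel’s conventions.

```python
from pathlib import PurePath

def worktree_plan(repo_root: PurePath, tasks: list[str], base: str = "main") -> list[list[str]]:
    """Return one `git worktree add` command per task, without executing anything."""
    commands = []
    for task in tasks:
        branch = f"fleet/{task}"                               # hypothetical branch naming
        path = repo_root.parent / f"{repo_root.name}-{task}"   # sibling directory per agent
        commands.append([
            "git", "-C", str(repo_root),
            "worktree", "add", "-b", branch, str(path), base,
        ])
    return commands

if __name__ == "__main__":
    from pathlib import Path
    for cmd in worktree_plan(Path("/work/shop"), ["auth", "billing", "search"]):
        print(" ".join(cmd))
```

Each command could then be passed to `subprocess.run(cmd, check=True)`. Because every agent edits files in its own directory on its own branch, conflicts are deferred to a single merge step rather than surfacing as agents overwrite each other’s working copy.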
Cost transparency comes from parsing Claude Code’s native session files for exact token counts rather than estimates. The /cost command computes real spend from API pricing, and a real-time tracker raises alerts at configurable spend thresholds. The result is an exact figure for what every session, campaign, and parallel agent costs, not an approximation.
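The arithmetic behind this kind of accounting is simple once token counts are available. The Python sketch below sums cost over a JSON Lines log. The record layout (one object per line with `model` and `usage` keys), the model name, and the prices are all placeholders assumed for illustration; they are not Claude Code’s session format or Anthropic’s actual rates.

```python
import json

# Placeholder prices in USD per million tokens (assumption; substitute real rates).
PRICES = {"example-model": {"input_tokens": 3.00, "output_tokens": 15.00}}

def session_cost(jsonl_text: str, prices: dict = PRICES) -> float:
    """Sum the cost of every record that carries a usage block for a priced model."""
    total = 0.0
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        usage = record.get("usage")
        rates = prices.get(record.get("model"))
        if not usage or not rates:
            continue  # skip records without token counts or with unknown models
        for kind, rate in rates.items():
            total += usage.get(kind, 0) * rate / 1_000_000
    return total

def over_threshold(total: float, limit: float) -> bool:
    """Simple spend alert: true once the running total reaches the limit."""
    return total >= limit

if __name__ == "__main__":
    log = "\n".join([
        json.dumps({"model": "example-model",
                    "usage": {"input_tokens": 1_000_000, "output_tokens": 0}}),
        json.dumps({"model": "example-model",
                    "usage": {"input_tokens": 0, "output_tokens": 100_000}}),
    ])
    cost = session_cost(log)
    print(f"${cost:.2f}", "ALERT" if over_threshold(cost, 4.00) else "ok")
```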
Safety infrastructure runs through 15 lifecycle hooks. A consent system gates external actions (pushes, pull requests, comments) with granular control—always-ask, session-allow, or auto-allow per action type. Protected branch enforcement prevents deletions. Path traversal blocking and secrets exfiltration guards are automatic. A circuit breaker detects failure spirals and stops them before they burn tokens. All of this is project-configurable through harness.json, giving you fine-grained control over agent autonomy versus safety.
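Two of these mechanisms, the consent modes and the circuit breaker, are easy to illustrate. The Python sketch below is one possible shape for them, not Citadel’s hook code: the `ConsentGate` and `CircuitBreaker` classes, the action names, and the three-failure limit are hypothetical. Only the three consent modes are taken from the text.

```python
from enum import Enum
from typing import Callable

class Consent(Enum):
    ALWAYS_ASK = "always-ask"
    SESSION_ALLOW = "session-allow"
    AUTO_ALLOW = "auto-allow"

class ConsentGate:
    """Decide whether an external action may run, prompting the user when required."""

    def __init__(self, policy: dict[str, Consent], ask: Callable[[str], bool]):
        self.policy = policy              # action type -> consent mode
        self.ask = ask                    # callback that asks the user and returns True/False
        self.session_approved: set[str] = set()

    def allowed(self, action: str) -> bool:
        # Unknown actions fall back to the strictest mode.
        mode = self.policy.get(action, Consent.ALWAYS_ASK)
        if mode is Consent.AUTO_ALLOW:
            return True
        if mode is Consent.SESSION_ALLOW and action in self.session_approved:
            return True
        approved = self.ask(action)
        if approved and mode is Consent.SESSION_ALLOW:
            self.session_approved.add(action)  # remember for the rest of the session
        return approved

class CircuitBreaker:
    """Trip after a run of consecutive failures; any success resets the count."""

    def __init__(self, max_consecutive_failures: int = 3):
        self.limit = max_consecutive_failures
        self.failures = 0

    def record(self, success: bool) -> None:
        self.failures = 0 if success else self.failures + 1

    @property
    def tripped(self) -> bool:
        return self.failures >= self.limit

if __name__ == "__main__":
    gate = ConsentGate({"push": Consent.SESSION_ALLOW}, ask=lambda a: True)
    print(gate.allowed("push"), gate.allowed("push"))  # second call does not prompt
```

Defaulting unlisted actions to always-ask keeps the gate fail-safe: a new action type is never silently allowed just because nobody configured it.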
Gotcha
The tight coupling to Claude Code is both Citadel’s strength and its main limitation. Citadel is explicitly a plugin for Claude Code’s architecture: it reads Claude Code’s session file format, uses Claude Code’s plugin interface, and depends on Claude Code’s specific agent implementation. This means zero portability to Cursor, Aider, Continue.dev, or any other AI coding assistant. Anthropic’s proprietary product is a hard prerequisite, so you are exposed if Claude Code’s pricing changes, features shift, or availability becomes restricted.
The README FAQ acknowledges this isn’t for beginners: ‘If you’re just starting out with Claude Code, get a few sessions in first and come back when the friction shows up.’ That’s honest advice. The orchestration ladder (Skills, Marshals, Archons, Fleets) and campaign persistence system introduce significant cognitive overhead. You need to understand when to use /do review versus /do overhaul versus letting the router decide. The learning curve is justified for teams hitting real coordination pain, but it’s overkill for solo developers on greenfield projects where session continuity doesn’t matter yet.
Verdict
Use Citadel if you’re already deep in Claude Code and experiencing agent amnesia (re-explaining the same architectural constraints every session), coordination friction (manually splitting work that should run in parallel), or cost anxiety (no visibility into what sessions actually cost). Campaign persistence and Fleet mode are the distinctive pieces: the system maintains durable state across sessions while coordinating parallel agents in isolated worktrees. The four-tier routing keeps overhead low by resolving most commands without spending tokens.
Skip it if you’re new to AI coding assistants (start with vanilla Claude Code and add orchestration when you feel the pain), using a different agent framework (this is Claude Code-exclusive with zero portability), or working on small projects where you’re not hitting session boundaries or parallel coordination needs. The ROI scales with codebase complexity and session frequency: greenfield solo work won’t justify the infrastructure overhead, but teams managing large codebases with ongoing AI-assisted refactoring may see value in agents that remember previous migration decisions.