Overstory: When Git Worktrees Become Agent Isolation Boundaries
Hook
What if the solution to preventing AI agents from clobbering each other’s work wasn’t sophisticated distributed locking, but just giving each one its own git worktree?
Context
The dream of autonomous coding agents has a dirty secret: they don’t play well together. When you spin up multiple AI agents to work on the same codebase, you immediately hit the same problems that plague human developers—merge conflicts, race conditions, and the chaos of concurrent file modifications. Most multi-agent frameworks handle this by adding coordination layers: message queues, task locks, or elaborate state machines that try to keep agents from stepping on each other’s toes.
Overstory takes a radically different approach by embracing git’s native isolation mechanism: worktrees. Instead of building coordination from scratch, it leverages the fact that git worktrees are essentially parallel checkouts of the same repository, each with its own working directory but sharing the underlying git history. Each agent gets its own worktree, its own tmux session, and its own isolated workspace. When work is complete, changes flow through a formal merge queue. It’s infrastructure-as-isolation: using the filesystem and version control system to enforce boundaries that software coordination can’t reliably maintain.
Technical Insight
The architecture is deliberately hierarchical. At the root sits a persistent Coordinator agent that never touches code—it only decomposes incoming tasks and dispatches them to Supervisors. Supervisors spawn specialized workers: Scouts for exploration, Builders for implementation, Reviewers for validation, and Mergers for integration. Each agent type has mechanical restrictions enforced through PreToolUse hooks, which intercept tool calls before execution. A Reviewer, for instance, is mechanically prevented from writing files—it can only read and comment.
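The role restrictions can be pictured as a gate that runs before every tool call. The sketch below is illustrative, not Overstory's actual hook code: the role names follow the article, but the capability table and function names are assumptions.

```typescript
// Hypothetical sketch of a PreToolUse-style gate that blocks write-capable
// tools for read-only roles. The capability table is illustrative.
type AgentRole = 'coordinator' | 'supervisor' | 'scout' | 'builder' | 'reviewer' | 'merger';

const WRITE_TOOLS = new Set(['Write', 'Edit', 'Bash']);

// Which roles may invoke write-capable tools
const CAN_WRITE: Record<AgentRole, boolean> = {
  coordinator: false, // decomposes and dispatches, never touches code
  supervisor: false,
  scout: false,       // exploration is read-only
  builder: true,
  reviewer: false,    // mechanically prevented from writing
  merger: true,
};

function preToolUseGate(role: AgentRole, toolName: string): { allow: boolean; reason?: string } {
  if (WRITE_TOOLS.has(toolName) && !CAN_WRITE[role]) {
    return { allow: false, reason: `${role} may not invoke ${toolName}` };
  }
  return { allow: true };
}
```

The point is that the constraint is mechanical: a Reviewer's prompt could ask for anything, but the gate rejects the write call regardless.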
Communication happens through a custom SQLite-based mail system that implements eight typed message primitives: TASK_DISPATCH, STATUS_UPDATE, MERGE_REQUEST, CONFLICT_ALERT, REVIEW_RESULT, TRIAGE_ESCALATION, HEARTBEAT, and SHUTDOWN. This isn’t agents dumping JSON into a shared file—it’s a proper protocol with sender/receiver addressing, message acknowledgment, and type safety:
```typescript
interface TaskDispatchMessage {
  type: 'TASK_DISPATCH';
  from: AgentId;
  to: AgentId;
  payload: {
    taskId: string;
    description: string;
    context: string[];
    worktreePath: string;
    parentBranch: string;
  };
  timestamp: number;
}

// Agents poll the mailbox and process typed messages
const message = await mailbox.receive<TaskDispatchMessage>('TASK_DISPATCH');
if (message) {
  await spawnWorkerInWorktree(message.payload.worktreePath);
  await mailbox.send({
    type: 'STATUS_UPDATE',
    from: this.agentId,
    to: message.from,
    payload: { status: 'STARTED', taskId: message.payload.taskId }
  });
}
```
The git worktree isolation is enforced at spawn time. When a Supervisor dispatches a Builder to implement a feature, it creates a new worktree from the current branch, launches a tmux session with the working directory set to that worktree, and starts Claude Code inside it. The Builder can only see and modify files in its worktree—it literally cannot access other agents’ workspaces because they exist in different filesystem paths. There’s no clever locking algorithm, just Unix directory isolation.
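The spawn sequence can be sketched as a plan of shell commands. This is an assumption-laden illustration, not Overstory's code: the branch naming, session naming, `.worktrees` path, and the bare `claude` invocation are all made up, but the `git worktree add` and `tmux` steps mirror the sequence described above.

```typescript
// Hypothetical sketch: build the argv-style commands that create an isolated
// worktree and launch an agent inside a detached tmux session. Naming
// conventions and the agent invocation are assumptions.
interface SpawnPlan {
  worktreePath: string;
  branch: string;
  commands: string[][]; // run in order by the supervisor process
}

function planWorkerSpawn(repoRoot: string, taskId: string, parentBranch: string): SpawnPlan {
  const branch = `agent/${taskId}`;
  const worktreePath = `${repoRoot}/.worktrees/${taskId}`;
  const session = `agent-${taskId}`;
  return {
    worktreePath,
    branch,
    commands: [
      // Parallel checkout: own working directory, shared object store and history
      ['git', '-C', repoRoot, 'worktree', 'add', '-b', branch, worktreePath, parentBranch],
      // Detached tmux session rooted in the new worktree
      ['tmux', 'new-session', '-d', '-s', session, '-c', worktreePath],
      // Start the agent inside the session (invocation is illustrative)
      ['tmux', 'send-keys', '-t', session, 'claude', 'Enter'],
    ],
  };
}
```

Because the working directory is the worktree itself, the Builder's relative paths can never escape into a sibling agent's checkout.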
When work completes, changes enter a FIFO merge queue with four-tier conflict resolution: (1) automatic merge if no conflicts, (2) Merger agent attempts resolution, (3) escalation to original Builder for rework, (4) human intervention flag. The queue is a SQLite table with status tracking, and agents don’t merge directly—they submit merge requests that get processed sequentially to prevent race conditions.
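The four-tier escalation reduces to a simple cascade. The sketch below models only the routing logic; the three attempt functions stand in for the real git merge, Merger-agent, and Builder-rework steps, and the outcome names are invented.

```typescript
// Hypothetical sketch of the four-tier escalation for one merge-queue entry.
// The attempt callbacks are stand-ins for real git/agent operations.
type MergeOutcome = 'MERGED' | 'REQUEUED' | 'HUMAN_INTERVENTION';

interface MergeAttempts {
  autoMerge: () => boolean;     // tier 1: merge cleanly with no conflicts
  mergerAgent: () => boolean;   // tier 2: Merger agent resolves conflicts
  builderRework: () => boolean; // tier 3: original Builder reworks the branch
}

function resolveMergeRequest(attempts: MergeAttempts): MergeOutcome {
  if (attempts.autoMerge()) return 'MERGED';
  if (attempts.mergerAgent()) return 'MERGED';
  // Successful rework resubmits the request to the back of the FIFO queue
  if (attempts.builderRework()) return 'REQUEUED';
  return 'HUMAN_INTERVENTION'; // tier 4: flag for a human
}
```

Processing requests one at a time out of the SQLite queue is what makes this cascade safe: no two merges race on the same parent branch.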
The watchdog system has three tiers. Tier 0 is a mechanical daemon that checks tmux session liveness and process IDs every 30 seconds—pure shell scripts, no AI. Tier 1 is an AI triage agent that analyzes stuck or failed agents by reading their logs and conversation history, attempting recovery through targeted messages. Tier 2 is a dedicated monitor agent that tracks systemic patterns like repeated failures or resource exhaustion. This tiered approach means simple crashes get caught immediately without burning tokens, while complex failure modes get AI attention.
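The routing between tiers can be sketched as a pure function over liveness facts. The field names, the heartbeat threshold, and the failure-count cutoff below are assumptions; only the three-tier structure comes from the project's description (and Tier 0 itself is described as shell, not TypeScript).

```typescript
// Hypothetical sketch of tiered watchdog routing. Thresholds (120s heartbeat,
// 3 repeated failures) are illustrative, not the project's actual values.
interface AgentHealth {
  tmuxSessionAlive: boolean;
  processAlive: boolean;
  secondsSinceHeartbeat: number;
  recentFailureCount: number; // failures across restarts of this agent
}

type WatchdogAction =
  | { tier: 0; action: 'RESTART_SESSION' }  // plain crash: cheap mechanical restart
  | { tier: 1; action: 'AI_TRIAGE' }        // stuck: read logs, attempt recovery message
  | { tier: 2; action: 'SYSTEMIC_REVIEW' }  // pattern: monitor agent investigates
  | { tier: -1; action: 'HEALTHY' };

function routeWatchdog(h: AgentHealth): WatchdogAction {
  if (h.recentFailureCount >= 3) return { tier: 2, action: 'SYSTEMIC_REVIEW' };
  if (!h.tmuxSessionAlive || !h.processAlive) return { tier: 0, action: 'RESTART_SESSION' };
  if (h.secondsSinceHeartbeat > 120) return { tier: 1, action: 'AI_TRIAGE' };
  return { tier: -1, action: 'HEALTHY' };
}
```

The cost argument falls out of the ordering: the cheap mechanical checks fire first, and tokens are only spent when liveness alone can't explain the failure.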
The agent definition system separates the HOW from the WHAT through layering. Base agent definitions specify capabilities and constraints (a Reviewer can read but not write), while task overlays inject specific goals and context. This means you can have a reusable “Reviewer” template that gets instantiated with “review the authentication module for security issues” or “check test coverage in the API layer”—same behavioral constraints, different focal points.
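The layering amounts to a merge where the overlay can supply the goal but can never widen the base's constraints. A minimal sketch, with invented field names, of what instantiating the Reviewer example above might look like:

```typescript
// Hypothetical sketch of base-definition + task-overlay layering.
// Field names are illustrative; the invariant is that behavioral
// constraints come only from the base definition.
interface AgentBase {
  role: string;
  allowedTools: string[]; // the HOW: constraints live here
}

interface TaskOverlay {
  goal: string;           // the WHAT: injected per task
  context: string[];
}

type AgentInstance = AgentBase & TaskOverlay;

function instantiate(base: AgentBase, overlay: TaskOverlay): AgentInstance {
  // The overlay type has no allowedTools field, so the spread
  // cannot widen the tool set, only add goal and context.
  return { ...base, ...overlay };
}

const reviewer: AgentBase = { role: 'reviewer', allowedTools: ['Read', 'Grep'] };
const securityReview = instantiate(reviewer, {
  goal: 'review the authentication module for security issues',
  context: ['src/auth/'],
});
```

Swapping in "check test coverage in the API layer" changes only the overlay; the Reviewer's read-only constraints ride along untouched.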
Gotcha
The repository’s README includes an unusually honest section titled “Swarm Characteristics & Warnings” that should give you pause. It explicitly states that agent swarms “amplify costs and error rates” and that you should “assume debugging complexity and merge conflicts are normal.” This isn’t defensive boilerplate: it’s the maintainer conceding that the system is built on the assumption that multi-agent coordination is inherently messy.
The dependency chain is brittle: you need Bun (not Node), tmux (with specific session management), and Claude Code with API access. If any link breaks—Claude API goes down, tmux sessions get orphaned, SQLite locks—you’re debugging a distributed system built on tools that weren’t designed for this use case. The worktree isolation is elegant but means your filesystem becomes the state store for agent coordination, and cleaning up failed runs requires manual worktree pruning. With only 15 GitHub stars and no documented production usage, you’re adopting an experimental system where every failure mode is a learning opportunity you didn’t budget for. The sophisticated architecture papers over the fundamental question: do the tasks you’re attempting actually benefit from parallel agent execution, or are you just distributing the complexity of sequential work across multiple failure points?
Verdict
Use if: You have genuinely parallelizable software tasks (like “implement six independent API endpoints” or “refactor five isolated modules”), you’re comfortable treating this as infrastructure that requires maintenance, and you have the budget for both the token costs of running multiple agents and the human time to debug swarm failures. The git worktree isolation is genuinely clever for teams already using sophisticated git workflows.
Skip if: You’re expecting a production-ready system, your tasks are primarily sequential, you don’t have expertise in distributed-systems debugging, or you’re trying to save time rather than explore multi-agent coordination patterns. For most projects, using Claude Code sequentially with manual decomposition will be faster, cheaper, and dramatically easier to debug than orchestrating an agent swarm.