Agent Orchestrator: Git Worktrees Are the Secret to Parallel AI Coding
Hook
Most multi-agent coding frameworks fail at the filesystem level—try running three AI agents on different features simultaneously and watch them clobber each other's branches. Agent Orchestrator solves this with a git feature most developers don't even know exists.
Context
The promise of AI coding assistants hit a scaling problem fast. Tools like Aider, Claude Code, and Cursor are brilliant for single-task workflows: give them an issue, watch them code, review the PR. But the moment you want to parallelize across ten backlog items, you run into fundamental constraints: your IDE shows one branch at a time, git checkout replaces the files in your working directory with those of the target branch, and running multiple agents in the same checkout lets them overwrite each other's files and contend for the same index.
Teams have tried workarounds. Some clone the repo ten times into separate directories, which wastes gigabytes of disk space and leaves each clone's refs and objects to drift out of sync until someone fetches. Others serialize the work, running agents one after another and giving up the parallelism. A few brave souls build coordination layers with Docker containers or separate VMs, adding operational complexity that eats the productivity gains. Agent Orchestrator takes a different approach. It uses git worktrees, a native git feature that creates multiple working directories sharing the same repository database, and wraps them in an event-driven orchestration layer that treats CI failures, code reviews, and merge conflicts as first-class reactive events.
Technical Insight
The architecture hinges on git worktrees, which let you check out multiple branches simultaneously without separate clones. When the orchestrator spawns an agent for a new issue, it creates a dedicated worktree with git worktree add ../worktree-issue-123 -b feature/issue-123. Each agent gets its own filesystem view, but they all share the same .git database, hooks, and remote configuration. This is vastly lighter than full clones and eliminates branch conflicts at the filesystem level.
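To make the worktree step concrete, here is a minimal sketch of how a per-issue worktree could be created from Node. The function names (planWorktree, createWorktree) and the WorktreePlan shape are hypothetical and not taken from Agent Orchestrator; only the git worktree add invocation and the naming scheme come from the command above.

```typescript
import { execFileSync } from "node:child_process";
import * as path from "node:path";

// Hypothetical shape for illustration; not part of Agent Orchestrator's API.
export interface WorktreePlan {
  path: string;   // directory that will hold the new working tree
  branch: string; // branch created for the agent
  args: string[]; // arguments passed to the git executable
}

// Pure helper: decide where the worktree lives and which branch it gets.
// The naming scheme mirrors the example command in the text.
export function planWorktree(repoRoot: string, issueId: number): WorktreePlan {
  const worktreePath = path.resolve(repoRoot, "..", `worktree-issue-${issueId}`);
  const branch = `feature/issue-${issueId}`;
  return {
    path: worktreePath,
    branch,
    args: ["worktree", "add", "-b", branch, worktreePath],
  };
}

// Side-effecting helper: actually run git. Throws if git exits non-zero,
// for example when the branch or directory already exists.
export function createWorktree(repoRoot: string, issueId: number): WorktreePlan {
  const plan = planWorktree(repoRoot, issueId);
  // execFileSync passes arguments directly, so nothing is shell-interpolated.
  execFileSync("git", plan.args, { cwd: repoRoot, stdio: "inherit" });
  return plan;
}
```

Splitting the planning from the execution keeps the naming logic testable without a repository on disk. Cleaning up afterwards would be a matter of running git worktree remove on the same path.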
The plugin system uses TypeScript interfaces to define seven extension points. Here's how you'd implement a custom agent runtime that wraps an existing CLI tool:
import { AgentPlugin, Task, WorkspaceContext } from '@agent-orchestrator/types';
import { spawn } from 'child_process';

export class CustomAgentPlugin implements AgentPlugin {
  name = 'my-custom-agent';

  async executeTask(task: Task, context: WorkspaceContext): Promise<void> {
    // Run the wrapped CLI inside the agent's dedicated worktree.
    const proc = spawn('my-agent-cli', [
      '--task', task.description,
      '--worktree', context.worktreePath,
      '--branch', context.branchName
    ], { cwd: context.worktreePath });

    return new Promise((resolve, reject) => {
      // 'error' fires when the process cannot be spawned (e.g. CLI not on PATH).
      proc.on('error', reject);
      // code is null when the process was terminated by a signal.
      proc.on('exit', (code) => {
        if (code === 0) resolve();
        else reject(new Error(`Agent failed: ${code}`));
      });
    });
  }
}
The orchestrator reads a YAML config that defines how to route events:
orchestrator:
  runtime: tmux
  agent: claude-code
  maxConcurrent: 10

reactions:
  - trigger: ci-failed
    action: send-to-agent
    prompt: "CI failed with errors: {{errors}}. Fix the issues."
    maxRetries: 3
  - trigger: changes-requested
    action: send-to-agent
    prompt: "Address review comments: {{comments}}"
    escalate: true
This reaction system is where the real power lives. When a PR's CI fails, GitHub webhooks notify the orchestrator, which routes the failure back to the original agent's tmux session with context. The agent sees the error logs, modifies code, and force-pushes—all without human intervention. It's a feedback loop that models the entire PR lifecycle as a state machine.
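The state-machine framing is easier to see in code. The sketch below is an illustration, not the orchestrator's implementation: the state names, the PrSession and Step types, and the step and render functions are all hypothetical. It also assumes that running out of retries hands the PR to a human, which the config above does not spell out. Only the prompt template and the retry limit of three are taken from the YAML.

```typescript
// Hypothetical states and events for one PR; names are illustrative only.
type PrState = "awaiting-ci" | "agent-fixing" | "ready-for-review" | "needs-human";

type PrEvent =
  | { kind: "ci-failed"; errors: string }
  | { kind: "ci-passed" }
  | { kind: "agent-pushed" };

interface PrSession {
  state: PrState;
  retries: number; // CI failures already routed back to the agent
}

interface Step {
  next: PrSession;
  prompt?: string; // text to send to the agent's session, if any
}

const MAX_RETRIES = 3; // mirrors maxRetries in the config
const CI_PROMPT = "CI failed with errors: {{errors}}. Fix the issues.";

// Replace {{name}} placeholders; unknown names are left untouched.
export function render(template: string, vars: Record<string, string>): string {
  return template.replace(
    /\{\{(\w+)\}\}/g,
    (match: string, name: string) => vars[name] ?? match,
  );
}

// Pure transition function: current session + event -> next session (+ prompt).
export function step(session: PrSession, event: PrEvent): Step {
  // Assumption: once a human owns the PR, automated events are ignored.
  if (session.state === "needs-human") return { next: session };

  switch (event.kind) {
    case "ci-passed":
      return { next: { ...session, state: "ready-for-review" } };
    case "agent-pushed":
      return { next: { ...session, state: "awaiting-ci" } };
    case "ci-failed":
      if (session.retries >= MAX_RETRIES) {
        // Assumption: exhausted retries escalate to a person.
        return { next: { ...session, state: "needs-human" } };
      }
      return {
        next: { state: "agent-fixing", retries: session.retries + 1 },
        prompt: render(CI_PROMPT, { errors: event.errors }),
      };
  }
}
```

Keeping the transition function pure means the webhook handler only has to translate an incoming payload into a PrEvent, call step, and forward any prompt to the agent.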
Platform-specific runtime selection shows production polish. On Unix systems, agents run in tmux sessions you can attach to for debugging. On Windows, it uses node-pty with ConPTY (the Windows pseudoterminal interface) to spawn native processes. The abstraction is clean:
const runtime = process.platform === 'win32'
  ? new ConPTYRuntime()
  : new TmuxRuntime();
The orchestrator itself is LLM-powered. You can give it a natural language goal like 'fix all TypeScript strict mode errors' and it will query your issue tracker, generate task descriptions, and spawn agents accordingly. This recursive design—an AI agent managing other AI agents—means the coordination logic evolves with the underlying models rather than being frozen in hardcoded rules.
One subtle but critical detail: on macOS, the orchestrator holds a caffeinate assertion to prevent sleep during long-running operations. Most frameworks ignore power management, which leads to mysterious failures when a laptop goes to sleep mid-run, suspending every background process and dropping its network connections. Agent Orchestrator explicitly prevents idle sleep, keeping the web dashboard accessible and agents running overnight.
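For readers who want the same protection in their own tooling, one way to hold such an assertion is to spawn the macOS caffeinate command tied to the current process. This is a sketch under assumptions: the article does not show how Agent Orchestrator does it, and the helper names caffeinateArgs and keepAwake are hypothetical. The -i and -w flags are standard caffeinate options.

```typescript
import { spawn, ChildProcess } from "node:child_process";

// Build the caffeinate argument list, or null on platforms without the tool.
export function caffeinateArgs(platform: NodeJS.Platform, pid: number): string[] | null {
  if (platform !== "darwin") return null;
  // -i        prevent idle sleep
  // -w <pid>  hold the assertion until that process exits
  return ["-i", "-w", String(pid)];
}

// Start caffeinate for the lifetime of this Node process (macOS only).
export function keepAwake(): ChildProcess | null {
  const args = caffeinateArgs(process.platform, process.pid);
  if (args === null) return null;

  const child = spawn("caffeinate", args, { stdio: "ignore" });
  // If caffeinate cannot be started, carry on without sleep prevention.
  child.on("error", () => {});
  // Do not let the helper keep the Node event loop alive on its own.
  child.unref();
  return child;
}
```

Because of -w, the assertion is released when the orchestrating process exits, so there is nothing to clean up after a crash. Note that -i only covers idle sleep; closing the lid is a separate matter.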
Gotcha
Everything runs on localhost, which creates a hard scaling ceiling. Spawning 50 agents means 50 CLI processes hitting the same API endpoints, and you'll hit rate limits or context window exhaustion long before system resources fail. There's no distributed execution model—no way to spread agents across multiple machines or offload to cloud workers. For teams, this means one person's laptop becomes the bottleneck.
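On a single machine, the main lever against rate limits is a concurrency cap like the maxConcurrent setting shown earlier. As a rough illustration of what such a cap does, here is a small promise pool. The runWithLimit function is a hypothetical helper written for this article, not the orchestrator's scheduler.

```typescript
// Run async tasks with at most `maxConcurrent` in flight at once.
// Results are returned in the same order as the input tasks.
export async function runWithLimit<T>(
  tasks: Array<() => Promise<T>>,
  maxConcurrent: number,
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0; // index of the next task nobody has claimed yet

  // Each worker claims the next unclaimed task until none remain.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      const task = tasks[index];
      if (task) results[index] = await task();
    }
  }

  // Never start more workers than there are tasks, and always at least one.
  const workerCount = Math.max(1, Math.min(maxConcurrent, tasks.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

If any task rejects, the returned promise rejects too, while tasks already in flight keep running. A cap like this smooths out bursts, but it does nothing about the ceiling itself: every agent still shares one machine and one set of API credentials.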
Git worktrees share the repository's config, hooks, refs, and object database, so isolation isn't absolute. Each worktree has its own HEAD, index, and working files, but a hook installed once runs for every agent's commits, and a hook that writes to shared state (config, refs, or generated files under the common .git directory) can affect the other worktrees. Similarly, hooks that modify the working directory based on branch names can create unexpected side effects. Full repo clones give stronger isolation, but at the cost of disk space and sync overhead.

The reactions retry logic is also naive: just a counter, with no exponential backoff. A flaky test suite will burn through all three retries in rapid succession instead of waiting for transient infrastructure issues to resolve.
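For comparison, this is roughly what exponential backoff with jitter looks like. It is a generic sketch of the technique, not something Agent Orchestrator ships; the BackoffOptions type, the function names, and the delay values in the usage note are all illustrative assumptions.

```typescript
// Hypothetical options for illustration.
export interface BackoffOptions {
  baseMs: number; // delay before the first retry
  factor: number; // multiplier applied for each further retry
  maxMs: number;  // upper bound on any single delay
}

// Delay before retry number `attempt` (1-based), capped at maxMs, no jitter.
export function backoffDelay(attempt: number, opts: BackoffOptions): number {
  const raw = opts.baseMs * Math.pow(opts.factor, attempt - 1);
  return Math.min(raw, opts.maxMs);
}

// "Full jitter": pick a delay uniformly in [0, delayMs) so that agents which
// failed at the same moment do not all retry at the same moment.
// `random` is injectable to keep the function deterministic in tests.
export function withJitter(delayMs: number, random: () => number = Math.random): number {
  return Math.floor(random() * delayMs);
}
```

With a 30-second base, a factor of 2, and a 5-minute cap, the three retries would wait up to 30, 60, and 120 seconds respectively, which gives a transient CI outage a few minutes to clear before the retry budget runs out.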
Verdict
Use if: You have a backlog of 20+ low-risk issues (dependency updates, lint fixes, test additions) where code review is the bottleneck, not the writing. You're comfortable supervising AI-generated PRs and your CI/test coverage is strong enough to catch mistakes. You work on a standard polyglot codebase where agents can operate semi-independently without deep architectural knowledge.

Skip if: You need deterministic behavior for regulated industries, work in a massive monorepo where 10 simultaneous PRs create merge conflict hell, or your issues require deep system understanding that current LLMs can't handle. Also skip if you're looking for a team-wide SaaS solution. This is a power-user tool for individual developers or small teams willing to run infrastructure locally.