GSD-2: Building Autonomous Coding Agents That Don’t Forget What They’re Doing
Hook
Most AI coding agents start strong but gradually drift off-task as their context window fills with noise. GSD-2 solves this by treating amnesia as a feature, not a bug—it wipes the agent’s memory between tasks and surgically re-injects only what matters.
Context
The promise of autonomous coding agents has been tantalizing since GPT-4 launched: write a specification, walk away, come back to working code. The reality has been messier. Tools like Aider and Cursor excel at pair programming—short, focused edits with human oversight—but fall apart on multi-hour builds. The agent starts implementing a new API endpoint and three hours later you discover it’s refactoring unrelated database schemas because something in the conversation history triggered a tangent.
The core problem is context degradation. LLMs have fixed context windows, and as they fill with code, file trees, error logs, and their own previous outputs, the signal-to-noise ratio collapses. The original goal—buried somewhere in message 47 of a 200-message thread—becomes a vague memory. Early autonomous agent frameworks tried to solve this with better prompting: “Stay focused on the task!” or “Review the original requirements before each step!” GSD-2’s creators took a different approach: they built a system that programmatically controls the agent’s context at the SDK level, treating the LLM as a component in a larger state machine rather than hoping prompt discipline would save the day.
Technical Insight
GSD-2’s architecture revolves around three core ideas: git worktree isolation, programmatic context injection, and living documentation sync. Let’s unpack how these work together.
The worktree isolation strategy is surprisingly elegant. When you kick off a GSD-2 project, it breaks your specification into discrete milestones and stores them in .gsd/ directories. Each milestone executes in its own git worktree—a separate working directory pointing to a dedicated branch. This means Milestone A (“Set up authentication”) and Milestone B (“Build REST API”) can literally run in parallel on different branches without interfering. When a milestone completes, its branch squash-merges back to main, keeping history clean. If an agent crashes mid-milestone, the worktree is still there with all partial progress and lock files for recovery.
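The lifecycle described above maps onto a handful of standard git commands. The sketch below only builds the argument lists for them; the helper name, the `gsd/<id>` branch scheme, and the `.gsd/worktrees/` path are assumptions made for illustration, not GSD-2's actual internals.

```typescript
// Hypothetical helper: names, branch scheme, and paths are illustrative only.
interface WorktreePlan {
  branch: string;
  path: string;
  create: string[];   // git args that create the worktree on a new branch
  merge: string[][];  // git args that squash-merge the branch back
  cleanup: string[];  // git args that remove the worktree afterwards
}

function planWorktree(milestoneId: string, baseBranch = 'main'): WorktreePlan {
  const branch = `gsd/${milestoneId}`;
  const path = `.gsd/worktrees/${milestoneId}`;
  return {
    branch,
    path,
    // git worktree add -b <branch> <path> <start-point>
    create: ['worktree', 'add', '-b', branch, path, baseBranch],
    // A squash merge only stages the combined changes; a commit records them.
    merge: [
      ['checkout', baseBranch],
      ['merge', '--squash', branch],
      ['commit', '-m', `milestone: ${milestoneId}`],
    ],
    cleanup: ['worktree', 'remove', path],
  };
}

const plan = planWorktree('auth-setup');
console.log(['git', ...plan.create].join(' '));
// git worktree add -b gsd/auth-setup .gsd/worktrees/auth-setup main
```

Each argument list can be handed to `execFileSync('git', args)` from `node:child_process`. Keeping the plan as data makes it easy to log or dry-run before anything touches the repository.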
Here’s what a typical dispatch looks like in the TypeScript SDK:
import { GSDAgent, MilestoneConfig } from '@gsd-build/sdk';
const milestone: MilestoneConfig = {
  id: 'auth-setup',
  description: 'Implement JWT authentication with refresh tokens',
  context: {
    files: ['src/types/user.ts', 'docs/DECISIONS.md'],
    knowledge: ['We use jose library, not jsonwebtoken'],
    maxTokens: 50000
  },
  supervision: {
    timeout: 7200, // 2 hours
    stuckThreshold: 600, // Flag if no progress for 10min
    recoveryPrompt: 'You seem stuck. Review REQUIREMENTS.md and pick the simplest next step.'
  }
};
const agent = new GSDAgent({
  model: 'claude-3-5-sonnet-20241022',
  fallbackModel: 'gpt-4o' // Complexity-based routing
});
await agent.dispatch(milestone);
Notice what’s happening: you’re not writing a better prompt, you’re configuring a supervised execution environment. The context.files array is pre-injected: GSD-2 reads those files and stuffs them into the agent’s starting context before it writes a single line of code. The agent never has to cat files or build its own mental model of the codebase from scratch. The supervision block is even more interesting: if the agent makes no progress for 10 minutes (stuckThreshold: 600), GSD-2 doesn’t just let it spin. It injects the recoveryPrompt to steer it back on track.
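To make the two mechanisms concrete, here is a minimal sketch of budgeted context assembly and stuck detection. None of these function names come from the GSD-2 SDK, and the four-characters-per-token estimate is a rough assumption rather than a real tokenizer.

```typescript
// Hypothetical sketch: these names are not part of the GSD-2 SDK.
interface ContextSpec {
  files: string[];
  knowledge: string[];
  maxTokens: number;
}

// Rough heuristic (an assumption): about four characters per token.
const estimateTokens = (text: string): number => Math.ceil(text.length / 4);

// Builds the starting prompt from knowledge facts and file contents,
// skipping any file that would push the total over the token budget.
function buildStartingContext(
  spec: ContextSpec,
  readFile: (path: string) => string,
): { prompt: string; skipped: string[] } {
  const parts: string[] = spec.knowledge.map((fact) => `FACT: ${fact}`);
  let used = parts.reduce((sum, part) => sum + estimateTokens(part), 0);
  const skipped: string[] = [];

  for (const path of spec.files) {
    const section = `--- ${path} ---\n${readFile(path)}`;
    const cost = estimateTokens(section);
    if (used + cost > spec.maxTokens) {
      skipped.push(path); // over budget: leave it out rather than truncate
      continue;
    }
    parts.push(section);
    used += cost;
  }
  return { prompt: parts.join('\n\n'), skipped };
}

// Stuck detection: has the time since the last sign of progress
// reached the threshold (given in seconds, as in the config above)?
function isStuck(lastProgressMs: number, nowMs: number, stuckThresholdSec: number): boolean {
  return nowMs - lastProgressMs >= stuckThresholdSec * 1000;
}
```

A supervisor loop would call `isStuck` on a timer and, when it returns true, append the recovery prompt to the conversation. What counts as "progress" (a commit, a file write, a tool call) is a design decision this sketch leaves open.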
The living documentation system is where GSD-2 gets clever about knowledge transfer. You maintain four canonical files: DECISIONS.md (architectural choices), REQUIREMENTS.md (what to build), PROJECT.md (current state), and KNOWLEDGE.md (domain facts). These sync across all worktrees automatically. When Milestone B starts, it gets the updated DECISIONS.md from Milestone A’s work, even though they’re on different branches. This is how the system maintains “big picture” awareness without bloating every agent’s context with the entire project history.
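The article doesn't say how the sync is implemented, so treat the following as one plausible shape: a one-way copy of the four canonical files from a source directory into every worktree. The `FileStore` interface and `syncDocs` function are hypothetical, and an in-memory store stands in for the filesystem.

```typescript
// Hypothetical sketch; FileStore and syncDocs are illustrations, not GSD-2 APIs.
const CANONICAL_DOCS = ['DECISIONS.md', 'REQUIREMENTS.md', 'PROJECT.md', 'KNOWLEDGE.md'] as const;

interface FileStore {
  read(path: string): string | undefined;
  write(path: string, content: string): void;
}

// Copies each canonical doc from sourceDir into every worktree directory.
// Returns the target paths that were actually rewritten.
function syncDocs(store: FileStore, sourceDir: string, worktreeDirs: string[]): string[] {
  const updated: string[] = [];
  for (const doc of CANONICAL_DOCS) {
    const latest = store.read(`${sourceDir}/${doc}`);
    if (latest === undefined) continue; // doc has not been created yet
    for (const dir of worktreeDirs) {
      const target = `${dir}/${doc}`;
      if (store.read(target) !== latest) {
        store.write(target, latest);
        updated.push(target);
      }
    }
  }
  return updated;
}

// In-memory store for demonstration; a real one would wrap the filesystem.
const docFiles = new Map<string, string>([['main/DECISIONS.md', 'Use jose for JWTs']]);
const memoryStore: FileStore = {
  read: (path) => docFiles.get(path),
  write: (path, content) => { docFiles.set(path, content); },
};

console.log(syncDocs(memoryStore, 'main', ['wt/api-endpoints']));
// [ 'wt/api-endpoints/DECISIONS.md' ]
```

A one-way copy sidesteps merge conflicts but assumes a single source of truth. If milestones edit the docs concurrently, something has to reconcile those edits first.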
The complexity-based model routing deserves attention too. GSD-2 can analyze a task and decide whether to use Claude 3.5 Sonnet for heavy lifting or GPT-4o for simpler edits. This isn’t just cost optimization—it’s about matching model capabilities to task difficulty. A typo fix doesn’t need the most expensive model; a database migration does.
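How GSD-2 actually scores complexity isn't covered here, so the following is purely an illustrative heuristic: route to the primary model when a task mentions risky keywords or touches many files, otherwise use the cheaper fallback. The keyword list and the file-count cutoff are made up.

```typescript
// Hypothetical heuristic; the keywords and the cutoff of 3 files are assumptions.
interface TaskProfile {
  description: string;
  filesTouched: number;
}

const HEAVY_KEYWORDS = ['migration', 'refactor', 'schema', 'architecture'];

function pickModel(task: TaskProfile, primary: string, fallback: string): string {
  const text = task.description.toLowerCase();
  const looksHeavy = HEAVY_KEYWORDS.some((word) => text.includes(word));
  // Use the stronger model when the task is broad or touches risky areas.
  return looksHeavy || task.filesTouched > 3 ? primary : fallback;
}

const typoFix: TaskProfile = { description: 'Fix typo in README', filesTouched: 1 };
const dbChange: TaskProfile = { description: 'Write database migration for users table', filesTouched: 2 };

console.log(pickModel(typoFix, 'claude-3-5-sonnet-20241022', 'gpt-4o'));  // gpt-4o
console.log(pickModel(dbChange, 'claude-3-5-sonnet-20241022', 'gpt-4o')); // claude-3-5-sonnet-20241022
```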
Here’s a peek at the data behind the HTML report after a multi-milestone run:
// Generated report includes:
{
  milestones: [
    {
      id: 'auth-setup',
      status: 'completed',
      duration: 4200, // seconds
      tokensUsed: 47332,
      cost: 0.94,
      commits: 8,
      filesChanged: 12,
      dependsOn: [],
      blockedBy: []
    },
    {
      id: 'api-endpoints',
      status: 'in-progress',
      tokensUsed: 23441,
      dependsOn: ['auth-setup']
    }
  ],
  totalCost: 2.37,
  dependencyGraph: '...' // Mermaid diagram
}
This isn’t post-hoc logging—GSD-2 is building an audit trail as it runs. You can see which milestones are bottlenecks, where token budgets exploded, and whether your task breakdown makes sense (if everything depends on one milestone, your parallelization strategy is broken).
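If the report data is available in the shape shown above (an assumption; only the fields used here are typed), two quick checks cover most of that analysis: how many milestones depend on each one, and which milestones blew past a token budget. The function names are illustrative.

```typescript
// Assumes the report shape from the example above; only the fields used are typed.
interface MilestoneReport {
  id: string;
  status: string;
  tokensUsed: number;
  dependsOn: string[];
}

// Counts how many milestones directly depend on each milestone.
// A single milestone with a very high count is a parallelization bottleneck.
function countDependents(milestones: MilestoneReport[]): Map<string, number> {
  const counts = new Map<string, number>();
  for (const m of milestones) counts.set(m.id, 0);
  for (const m of milestones) {
    for (const dep of m.dependsOn) {
      counts.set(dep, (counts.get(dep) ?? 0) + 1);
    }
  }
  return counts;
}

// Returns the ids of milestones whose token usage exceeded the budget.
function overBudget(milestones: MilestoneReport[], budget: number): string[] {
  return milestones.filter((m) => m.tokensUsed > budget).map((m) => m.id);
}

const reportMilestones: MilestoneReport[] = [
  { id: 'auth-setup', status: 'completed', tokensUsed: 47332, dependsOn: [] },
  { id: 'api-endpoints', status: 'in-progress', tokensUsed: 23441, dependsOn: ['auth-setup'] },
];

console.log(countDependents(reportMilestones).get('auth-setup')); // 1
console.log(overBudget(reportMilestones, 40000)); // [ 'auth-setup' ]
```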
The skill discovery system is the wild card. During an initial research phase, GSD-2 can detect what kind of project you’re building (Next.js app? Rust CLI? Python data pipeline?) and automatically install relevant “skills”—predefined patterns for testing, deployment, or framework-specific boilerplate. This means the system theoretically improves its capabilities over time without manual prompt tuning.
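Detection of this kind usually comes down to looking for marker files. The sketch below is a guess at the general shape, not GSD-2's implementation: the marker files are common ecosystem conventions, while the skill names and the `detectProject` and `skillsFor` functions are invented for illustration.

```typescript
// Hypothetical detection rules; the skill names below are made up.
type ProjectKind = 'nextjs' | 'rust' | 'python' | 'unknown';

interface PackageJson {
  dependencies?: Record<string, string>;
}

// Guesses the project kind from top-level file names and, if present,
// the parsed package.json.
function detectProject(fileNames: string[], pkg?: PackageJson): ProjectKind {
  const has = (name: string): boolean => fileNames.includes(name);
  if (has('package.json') && pkg?.dependencies?.['next'] !== undefined) return 'nextjs';
  if (has('Cargo.toml')) return 'rust';
  if (has('pyproject.toml') || has('requirements.txt')) return 'python';
  return 'unknown';
}

const SKILLS_BY_KIND: Record<ProjectKind, string[]> = {
  nextjs: ['react-testing', 'next-deploy'],
  rust: ['cargo-test', 'cli-scaffold'],
  python: ['pytest', 'data-pipeline'],
  unknown: [],
};

function skillsFor(fileNames: string[], pkg?: PackageJson): string[] {
  return SKILLS_BY_KIND[detectProject(fileNames, pkg)];
}

console.log(skillsFor(['package.json', 'tsconfig.json'], { dependencies: { next: '15.0.0' } }));
// [ 'react-testing', 'next-deploy' ]
```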
Gotcha
The Node.js 24 LTS requirement is stricter than it sounds. If you’re on a Mac and installed Node via Homebrew, the version on your machine may not be the one GSD-2 expects, because new releases can take a while to propagate through package managers. The recommended approach is to use nvm, but that’s an extra setup step that can be annoying if your team standardizes on Homebrew for everything.
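A small preflight check can surface a version mismatch before anything else fails. This is a generic sketch, not something GSD-2 ships; the only input taken from the article is the required major version, 24.

```typescript
// Generic preflight sketch; only the required major (24) comes from the article.
function nodeMajor(version: string): number {
  // Accepts "v24.1.0" or "24.1.0"; returns NaN if no major version is found.
  return Number.parseInt(version.replace(/^v/, '').split('.')[0] ?? '', 10);
}

function meetsRequirement(version: string, requiredMajor = 24): boolean {
  const major = nodeMajor(version);
  return Number.isInteger(major) && major >= requiredMajor;
}

console.log(meetsRequirement('v24.1.0')); // true
console.log(meetsRequirement('v22.11.0')); // false
```

In a real script you would pass `process.version`, which Node.js exposes as a string such as `v24.1.0`.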
Migration from GSD v1 is advertised as smooth, but it’s really only smooth if your v1 project followed the ROADMAP.md convention religiously. If you were using the older phases/ directory structure without clear milestone boundaries, GSD-2’s migration script makes its best guess, and those guesses can be wrong. You’ll end up manually reorganizing milestones in .gsd/ before you can actually run anything.
The system is also genuinely new despite the 2K+ stars. The v2 rewrite changes the underlying approach, so edge cases are still being discovered. Expect to read GitHub issues and potentially patch things yourself if you’re doing something non-standard.
The observability features (HTML reports, dependency graphs) are fantastic when they work, but if a milestone crashes hard enough, you might get incomplete reporting that makes debugging harder, not easier.
Verdict
Use GSD-2 if you’re a solo developer or small team executing multi-day builds from detailed specifications where the cost of human interruption is high: think migrating a legacy codebase, scaffolding a new microservice architecture, or building out a feature with 20+ interrelated files. The crash recovery and cost tracking make it viable for overnight runs without babysitting.
Skip it if you’re working in a regulated environment that requires commit-level human approval (the squash-merge strategy hides intermediate steps), if your team hasn’t bought into autonomous agents yet (this requires trust in the automation), or if you need something battle-tested for production-critical work; GSD-2’s paradigm is powerful but still maturing. Also skip it if your development environment is locked down in ways that make the Node.js 24/TypeScript SDK requirements a bureaucratic nightmare.