GSD Core: Why This Tool Spawns a Fresh AI Context for Every Coding Task

Hook

Your AI coding assistant doesn't forget things because it's stupid—it forgets because you're asking it to hold 50,000 tokens of context while writing code. GSD Core throws away the context window after every task, on purpose.

Context

If you've used Claude, Cursor, or Copilot for more than twenty minutes on a real feature, you've watched the quality curve bend downward. The AI starts sharp—refactoring functions, catching edge cases, remembering your architecture decisions. Then it suggests the same fix twice. Forgets the database schema you discussed ten minutes ago. Proposes changes that contradict code it just wrote. This isn't a model quality problem. It's a context window management problem.

Most AI coding tools treat the conversation as sacred. They try to keep one long-running session alive, pruning old messages, summarizing context, hoping the model remembers what matters. GSD Core does the opposite: it kills the context after every meaningful operation and starts fresh. Instead of fighting context decay, it architects around disposable 200k-token sessions orchestrated by Git-tracked markdown files. The AI doesn't remember your project across sessions—the filesystem does.

Technical Insight


GSD Core is a file-based state machine where markdown documents are both memory and instruction set. When you run /gsd-plan, you're not calling a JavaScript function—you're spawning a fresh AI session, passing it a markdown file that says "read STATE.md, generate a plan, write it to PLAN.md", then capturing the result back into your Git repository. Every command follows this pattern: read state from files, launch clean AI context, do one job, write result, exit.
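The read-run-write cycle described above fits in a few lines of Python. This is a minimal illustration, not GSD Core's implementation: `run_command` and the injected `session` callable are hypothetical names, and a stub stands in for the AI so the example runs offline.

```python
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical: a "session" takes one prompt and returns text. In practice it
# would launch a brand-new AI process; injecting it keeps this sketch runnable.
Session = Callable[[str], str]


def run_command(root: Path, instruction: str, inputs: list[str],
                output: str, session: Session) -> Path:
    """Read state files, run one fresh session, write its result, return."""
    # 1. Read state: file contents are the only memory the session receives.
    sections = [instruction]
    for name in inputs:
        sections.append(f"--- {name} ---\n{(root / name).read_text()}")
    prompt = "\n\n".join(sections)

    # 2. One clean context, one job. Nothing is kept after it returns.
    result = session(prompt)

    # 3. Persist the result; committing it to Git is left to the caller.
    out_path = root / output
    out_path.write_text(result)
    return out_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "STATE.md").write_text("Current Stage: plan\n")
        fake_session = lambda prompt: "## Wave 1 (no dependencies)\n- [ ] Task\n"
        plan = run_command(root, "Generate a plan.", ["STATE.md"], "PLAN.md", fake_session)
        print(plan.read_text())
```

Because the session is just a function of its prompt, two calls share nothing unless one of them wrote it to disk first.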

The core artifact is STATE.md, which lives in each phase folder (.gsd/phases/001-feature-name/STATE.md). It's a structured document tracking progress through the discuss→plan→execute→verify→ship lifecycle:

# Phase 001: Add User Authentication

## Status
Current Stage: execute
Wave: 2/3
Tasks Completed: 4/9

## Context
Adding JWT-based auth to existing Express API.
Decision: Use bcrypt for password hashing, not argon2 (discussed 2024-01-15)

## Active Tasks
- [DONE] Create User model schema
- [DONE] Implement bcrypt password hashing
- [IN PROGRESS] Add login endpoint
- [BLOCKED] Add token refresh endpoint (waiting on login)
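Because the state lives in plain markdown, any tool can read it back. The sketch below assumes the exact `Key: value` lines shown in the Status section above; `parse_status`, `progress`, and the `LIFECYCLE` list are illustrative names, not part of GSD Core.

```python
STATE_MD = """# Phase 001: Add User Authentication

## Status
Current Stage: execute
Wave: 2/3
Tasks Completed: 4/9

## Context
Adding JWT-based auth to existing Express API.
Decision: Use bcrypt for password hashing, not argon2 (discussed 2024-01-15)
"""

# Lifecycle order as described in the article.
LIFECYCLE = ["discuss", "plan", "execute", "verify", "ship"]


def parse_status(text: str) -> dict[str, str]:
    """Collect `Key: value` lines that appear under the '## Status' heading."""
    status: dict[str, str] = {}
    in_status = False
    for line in text.splitlines():
        if line.startswith("## "):
            # Any level-2 heading switches the Status section on or off.
            in_status = line.strip() == "## Status"
            continue
        if in_status and ":" in line:
            key, value = line.split(":", 1)
            status[key.strip()] = value.strip()
    return status


def progress(status: dict[str, str]) -> tuple[str, float]:
    """Return the current stage and the fraction of tasks completed."""
    stage = status["Current Stage"]
    if stage not in LIFECYCLE:
        raise ValueError(f"unknown stage: {stage!r}")
    done, total = (int(n) for n in status["Tasks Completed"].split("/"))
    return stage, done / total


if __name__ == "__main__":
    print(progress(parse_status(STATE_MD)))
```

The `Decision:` line under Context is ignored on purpose: only the Status block drives the state machine, while Context is prose for the next session to read.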

When you run /gsd-execute, the system doesn't ask the AI "what should we build next?" It reads this STATE.md, identifies the next incomplete task, spawns a fresh context with just that task's scope, generates code, applies it, updates STATE.md, then exits. The next task gets a completely new context window. This is the "spawn fresh context" pattern—every substantial operation gets a pristine 200k tokens to work with.

The installer generates runtime-specific command wrappers based on your AI environment. For Claude Code, /gsd-plan becomes a bash script that runs claude /gsd/commands/gsd-plan.md. That markdown file is the actual instruction set:

# GSD Plan Command

You are in PLAN mode. Your job:

1. Read .gsd/phases/[current]/STATE.md
2. Read CONTEXT.md for project architecture
3. Break down the phase goal into executable tasks
4. Write tasks to .gsd/phases/[current]/PLAN.md

Tasks must be:
- Small enough to complete in one AI session
- Dependency-ordered (mark blockers)
- Specific enough to execute without discussion

Output format:
## Wave 1 (no dependencies)
- [ ] Task description with clear acceptance criteria

## Wave 2 (depends on Wave 1)
- [ ] Next task...
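A plan in that output format is easy to turn back into data. This sketch assumes wave headers of the form `## Wave N` and `- [ ]` checkboxes exactly as shown; `parse_waves` is a hypothetical helper, not part of GSD Core.

```python
import re

WAVE_HEADER = re.compile(r"^## Wave (\d+)\b")
TASK_LINE = re.compile(r"^- \[([ xX])\] (.+)$")


def parse_waves(plan_md: str) -> dict[int, list[str]]:
    """Group still-open tasks by wave number, keeping file order."""
    waves: dict[int, list[str]] = {}
    current = None
    for line in plan_md.splitlines():
        header = WAVE_HEADER.match(line)
        if header:
            current = int(header.group(1))
            waves.setdefault(current, [])
            continue
        task = TASK_LINE.match(line.strip())
        # Only unchecked boxes count as work still to do.
        if task and current is not None and task.group(1) == " ":
            waves[current].append(task.group(2))
    return waves
```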

The "waves" pattern is how GSD Core parallelizes AI work without a job queue or worker pool. Tasks within a wave have no interdependencies, so you can run them in separate AI sessions simultaneously. Wave 2 tasks see the Git state after Wave 1 commits—causal consistency through filesystem ordering, not distributed systems complexity.
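The ordering rule can be illustrated with a local thread pool standing in for several AI sessions opened side by side. This is an assumed sketch, not GSD Core code: `run_task` represents one fresh session, and the point is the barrier between waves rather than any scheduling infrastructure.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_waves(waves: dict[int, list[str]], run_task: Callable[[str], str],
              max_parallel: int = 4) -> list[str]:
    """Run waves in ascending order; tasks within one wave run concurrently.

    Each run_task call stands in for one fresh AI session. Wave N+1 starts
    only after every task in wave N has returned, which is what lets it see
    wave N's committed results.
    """
    results: list[str] = []
    for wave in sorted(waves):
        with ThreadPoolExecutor(max_workers=max_parallel) as pool:
            # map() yields results in submission order; extend() consumes
            # them all, so this line blocks until the whole wave is done.
            results.extend(pool.map(run_task, waves[wave]))
    return results
```

Nothing inside a wave is ordered, which is exactly why the plan must guarantee that tasks sharing a wave are independent.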

The verify phase closes the loop that most AI tools leave open. After execution, /gsd-verify spawns a fresh context that:

- Reviews what was built against the plan
- Identifies gaps, bugs, or architectural drift
- Generates fix tasks
- Auto-executes fixes in a new wave
- Repeats until verification passes
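The verify loop reduces to a few lines once review and fixing are treated as black boxes. In this sketch `review` and `fix` are hypothetical stand-ins for fresh AI sessions, and the `max_rounds` cap is an assumption added so a reviewer that keeps reporting gaps cannot loop forever; the description above does not mention a cap.

```python
from typing import Callable

# Hypothetical hooks: review() returns gap descriptions (empty means pass);
# fix() executes one fix task. In GSD terms each call is a fresh session.
Review = Callable[[], list[str]]
Fix = Callable[[str], None]


def verify_loop(review: Review, fix: Fix, max_rounds: int = 3) -> bool:
    """Review, turn gaps into fix tasks, run them as a new wave, repeat."""
    for _ in range(max_rounds):
        gaps = review()
        if not gaps:
            return True
        for gap in gaps:
            fix(gap)
    # Assumed safety cap reached: one last review decides the outcome.
    return not review()
```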

This isn't CI integration or automated testing—it's AI-reviewing-AI-output, which inherits the same hallucination risks. But because verification happens in a fresh context with the actual codebase state, it catches different failure modes than the original execution context would.

The phase loop itself is a finite state machine encoded in folder structure and markdown conventions. Phases are numbered directories (.gsd/phases/001-auth, .gsd/phases/002-api-versioning), each with its own STATE.md tracking progress. When you "ship" a phase, you're just merging its branch back to main—Git is the state store, not a database. This makes the entire system auditable through git log and debuggable with git diff.
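Numbered phase folders turn "what comes next" into a directory listing. A minimal sketch, assuming three-digit prefixes like `001-auth` and lowercase hyphenated slugs; `next_phase_dir` and `slugify` are hypothetical helpers, not GSD Core's naming code.

```python
import re
from pathlib import Path

PHASE_DIR = re.compile(r"^(\d{3})-[a-z0-9-]+$")


def slugify(title: str) -> str:
    """Lowercase the title and collapse non-alphanumeric runs into hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def next_phase_dir(phases_root: Path, title: str) -> Path:
    """Pick the number after the highest existing phase folder."""
    numbers = []
    for entry in phases_root.iterdir():
        m = PHASE_DIR.match(entry.name)
        if entry.is_dir() and m:
            numbers.append(int(m.group(1)))
    return phases_root / f"{max(numbers, default=0) + 1:03d}-{slugify(title)}"
```

Zero-padding keeps `ls` and `git log -- .gsd/phases` output in phase order without any extra index file.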

What makes this architecture non-obvious is the inversion of control. You're not managing the AI—the artifacts are managing both you and the AI. The AI reads STATE.md to know its job, writes results back, exits. You review the Git diff between sessions and approve or reject. If you reject, you edit STATE.md to provide clarification and re-run the command. The human-in-the-loop happens at Git commit boundaries, not during AI execution.

Gotcha

The verify step is AI checking AI's work, which means you can get hallucination stacking: the verifier confidently approves broken code, or worse, "fixes" working code into a broken state. There's no actual test execution, no CI runner, no compiler feedback in the loop. You're trusting fresh-context review to catch what in-context execution missed, but both are using the same model with the same blind spots. If you're working in a statically typed language with good tooling, your compiler will still catch type errors as soon as you build, even though that check sits outside the GSD loop. If you're in Python or JavaScript without comprehensive tests, verification theater gives false confidence.
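One way to put real execution feedback back into the loop is to require an actual test command to pass alongside the AI review. GSD Core, as described here, has no such gate; `run_checks` and `verified` are hypothetical helpers, and the command is whatever your project already uses, such as `pytest -q` or `npm test`.

```python
import subprocess
import sys


def run_checks(cmd: list[str], timeout: float = 600) -> tuple[bool, str]:
    """Run a real test or build command; it passes only on exit code 0."""
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    return proc.returncode == 0, proc.stdout + proc.stderr


def verified(ai_review_passed: bool, cmd: list[str]) -> bool:
    """Accept a phase only if both the AI review and the test run pass."""
    tests_passed, _output = run_checks(cmd)
    return ai_review_passed and tests_passed


if __name__ == "__main__":
    # Demo command that always succeeds; swap in your real test suite.
    print(verified(True, [sys.executable, "-c", "print('tests ok')"]))
```

The captured output is worth feeding into the next fix session's prompt, since a failing assertion is far more specific than a reviewer's hunch.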

The parallel execution model assumes perfect dependency specification in your plan. If Task A and Task B both land in Wave 2 but A actually depends on B, and you missed that relationship, both tasks spawn fresh contexts, work in parallel, then produce conflicting changes that only surface as merge conflicts after execution. There's no static analysis and no dependency graph validation, just trust that the planning phase got it right. For complex features with subtle ordering requirements, you'll spend more time debugging parallel execution failures than you saved from parallelization.
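The missing validation is not hard to approximate if the plan records explicit dependencies, which the PLAN.md format above does not do beyond marking blockers. This sketch assumes a hypothetical mapping from each task to its wave and from each task to the tasks it needs, and flags any dependency that is not in a strictly earlier wave.

```python
def check_wave_order(wave_of: dict[str, int],
                     deps: dict[str, list[str]]) -> list[str]:
    """List every dependency that the wave layout fails to respect.

    A task may only depend on tasks in strictly earlier waves. A dependency
    in the same wave may run concurrently; one in a later wave runs after it.
    """
    problems = []
    for task, required in deps.items():
        if task not in wave_of:
            problems.append(f"{task}: not assigned to any wave")
            continue
        for dep in required:
            if dep not in wave_of:
                problems.append(f"{task}: unknown dependency {dep!r}")
            elif wave_of[dep] >= wave_of[task]:
                problems.append(f"{task} (wave {wave_of[task]}) depends on "
                                f"{dep} (wave {wave_of[dep]})")
    return problems
```

Run against a plan where `login` and `user-model` share Wave 1 but `login` needs the model, it reports that pair before any session starts, which is cheaper than discovering it as a merge conflict.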

Verdict

Use if: You're building features in regulated environments where audit trails matter, refactoring legacy codebases where decisions need documentation, or working on projects where AI context degradation has burned you repeatedly. This shines when discipline and reproducibility matter more than velocity, and when you're comfortable managing an AI team member through written instructions rather than conversational refinement.

Skip if: You're prototyping rapidly and need tight feedback loops, working outside Git, or expect the AI to infer context from your codebase automatically. Also skip if you need real automated testing in the loop; verification here is review, not execution. If you're building anything where correctness matters more than documentation, pair this with actual CI or use aider for test-running capability.
