gstack: Garry Tan's Role-Based AI Development Framework
Hook
Garry Tan claims 810× productivity gains using AI coding tools. The secret isn't prompt engineering—it's treating Claude like a company org chart with 23 specialized employees.
Context
AI coding assistants have a blank-slate problem. You open Claude Code, Cursor, or Copilot, stare at an empty prompt, and type something vague like 'build a login system.' The AI responds with code, but who's checking for security issues? Who's ensuring the UI matches modern design patterns? Who's validating the release notes are accurate?
Garry Tan's gstack emerged from this chaos. As YC President and a prolific founder who ships code himself, Tan needed structure around AI-assisted development: not just code generation, but the entire workflow from feature spec to deployment. The insight: instead of prompting an AI to be everything at once, give it explicit roles. Make it a CEO for strategic decisions (/office-hours), a designer for UI review (/design-review), an engineering manager for code review (/review), a QA engineer for testing (/qa), and a release manager for documentation (/ship). The result is 23 specialized slash commands that transform Claude Code from a general-purpose chatbot into a structured development team.
Technical Insight
At its core, gstack is a curated prompt library with workflow orchestration, but the architecture reveals sophisticated opinions about how AI should assist development. The system builds on Claude Code and its support for MCP (Model Context Protocol), and implements skills as structured command templates. Each slash command is a JavaScript module that exports a persona definition, system prompts, and workflow logic.
Here's how a typical gstack skill is structured:
// formatReviewComments is a helper assumed to be defined elsewhere.
export const reviewSkill = {
  name: 'review',
  description: 'Engineering Manager code review',
  persona: 'Senior Engineering Manager',
  systemPrompt: `You are a senior engineering manager conducting code review.
Focus on:
- Architecture decisions and maintainability
- Security vulnerabilities and edge cases
- Performance implications
- Code clarity and documentation
- Testing coverage
Provide specific, actionable feedback with file/line references.`,
  hooks: ['pre-commit'],
  async execute(context) {
    // Review only what changed, not the whole repository
    const diff = await context.git.getDiff();
    const analysis = await context.claude.analyze(diff);
    return formatReviewComments(analysis);
  }
};
The skill-based architecture ensures consistency—every time you type /review, you get the same engineering manager persona with the same evaluation criteria. No prompt drift, no forgetting to ask about security, no inconsistent review quality.
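To make the "no prompt drift" point concrete, here is a minimal sketch of how a slash command could resolve to a fixed skill definition. The registry, `registerSkill`, and `resolveCommand` are hypothetical names invented for illustration, not gstack's actual code; the only assumption carried over is the skill shape from the review example above.

```javascript
// Hypothetical registry mapping a slash-command name to a skill definition.
// None of these names come from gstack's source; this is only a sketch.
const skills = new Map();

function registerSkill(skill) {
  if (!skill.name || !skill.systemPrompt) {
    throw new Error('A skill needs at least a name and a systemPrompt');
  }
  // Freeze a copy so the persona and criteria cannot be changed at runtime.
  skills.set(skill.name, Object.freeze({ ...skill }));
}

function resolveCommand(input) {
  // Accepts input such as "/review src/auth.js" and splits off the arguments.
  const match = /^\/([a-z][a-z0-9-]*)(?:\s+(.*))?$/.exec(input.trim());
  if (!match) return null;
  const skill = skills.get(match[1]);
  if (!skill) return null;
  return { skill, args: match[2] ?? '' };
}

registerSkill({
  name: 'review',
  persona: 'Senior Engineering Manager',
  systemPrompt: 'You are a senior engineering manager conducting code review.',
});

// Two separate invocations resolve to the same frozen definition.
const first = resolveCommand('/review src/auth.js');
const second = resolveCommand('/review');
console.log(first.skill.systemPrompt === second.skill.systemPrompt); // true
```

Because the definition is looked up rather than retyped, every invocation starts from the same persona and checklist; only the arguments differ.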
What sets gstack apart from simple prompt templates is its integration depth. The system uses Bun as its runtime (not Node.js), which provides faster startup times crucial for interactive workflows. It hooks into Git via pre-commit and post-merge hooks, enabling 'team mode' where the latest skill definitions automatically sync across collaborators. When someone updates the /qa skill with better test coverage checks, everyone on the team gets it on their next git pull.
The /qa skill showcases the system's most sophisticated capability: browser automation. Instead of just analyzing code statically, it uses Playwright to actually run your application:
import { chromium } from 'playwright';

export const qaSkill = {
  name: 'qa',
  description: 'QA Engineer with browser automation',
  persona: 'QA Engineer',
  async execute(context) {
    const browser = await chromium.launch();
    try {
      const page = await browser.newPage();
      // Load the app and run user flows
      await page.goto('http://localhost:3000');
      // Give the model both the rendered HTML and a screenshot of the page
      const testResults = await context.claude.generateTests({
        currentPage: await page.content(),
        screenshot: await page.screenshot()
      });
      // Execute AI-generated test scenarios
      for (const test of testResults.scenarios) {
        await test.execute(page);
      }
      return compileQAReport(testResults);
    } finally {
      // Always release the browser, even if a scenario throws
      await browser.close();
    }
  }
};
This moves beyond static analysis into runtime validation—Claude sees your actual UI, generates test scenarios, and executes them in a real browser. The QA persona knows to check for broken links, accessibility issues, mobile responsiveness, and functional correctness.
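The QA example ends by compiling a report, but the shape of that report is not shown. The sketch below is one way such a summary could look. It assumes each executed scenario yields a record of the form `{ name, category, passed, error }`; that record shape and the function name `summarizeQAResults` are hypothetical, chosen only for illustration.

```javascript
// Hypothetical summary step: groups scenario outcomes by category
// (for example "links" or "accessibility") and collects the failures.
function summarizeQAResults(results) {
  const byCategory = {};
  const failures = [];
  for (const r of results) {
    const bucket = (byCategory[r.category] ??= { passed: 0, failed: 0 });
    if (r.passed) {
      bucket.passed += 1;
    } else {
      bucket.failed += 1;
      failures.push({
        name: r.name,
        category: r.category,
        error: r.error ?? 'unknown',
      });
    }
  }
  return {
    total: results.length,
    passed: results.length - failures.length,
    failed: failures.length,
    byCategory,
    failures,
  };
}

// Made-up outcomes, not real test output.
const report = summarizeQAResults([
  { name: 'nav links resolve', category: 'links', passed: true },
  { name: 'footer link resolves', category: 'links', passed: false, error: '404' },
  { name: 'images have alt text', category: 'accessibility', passed: false },
]);
console.log(report.failed, report.byCategory.links);
```

Grouping by category keeps the report aligned with the checks the QA persona is described as running, so a reader can see at a glance which area regressed.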
The productivity metric controversy deserves technical scrutiny. Tan's 810× claim is calculated using 'logical lines of code' (statements that perform work) rather than raw LOC. The rationale: AI-generated code tends toward verbosity, with more explicit error handling, more comments, and more type annotations, so measuring raw lines would inflate productivity metrics artificially. The system tracks commits, features shipped, and bug rates over time, normalizing for code density. While the figures are self-reported and context-specific, the transparency around measurement methodology is unusual among AI coding tool claims.
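The difference between raw and logical lines is easy to see with a rough counter. The sketch below is a line-based heuristic written for illustration, not the method gstack uses: it drops blank lines, comments, and lines holding only brackets or punctuation. It is not a parser, so comment markers inside string literals would be miscounted.

```javascript
// Rough heuristic, not a parser: counts lines that carry code, ignoring
// blanks, comments, and lines made up only of brackets or punctuation.
function countLines(source) {
  const raw = source.split('\n').length;
  const withoutBlockComments = source.replace(/\/\*[\s\S]*?\*\//g, '');
  const logical = withoutBlockComments
    .split('\n')
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .filter((line) => !line.startsWith('//'))
    .filter((line) => !/^[{}()\[\];,]+$/.test(line));
  return { raw, logical: logical.length };
}

const sample = [
  '// Validate the user before saving',
  'function save(user) {',
  '  /* guard clause */',
  '  if (!user) {',
  "    throw new Error('missing user');",
  '  }',
  '',
  '  return db.insert(user);',
  '}',
].join('\n');

console.log(countLines(sample)); // { raw: 9, logical: 4 }
```

In this sample, more than half of the raw lines are comments, blanks, or closing braces, which is the kind of padding the logical-line measure is meant to discount.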
Integration with OpenClaw (an autonomous agent framework) happens via ACP (Agent Control Protocol) spawning. When gstack detects tasks requiring sustained autonomous work, such as refactoring an entire module or generating comprehensive test suites, it can spawn OpenClaw agents that operate independently while gstack maintains the human-in-the-loop review workflow. This hybrid approach balances automation with oversight.
Gotcha
gstack's tight coupling to the Claude Code ecosystem is its primary limitation. The framework requires Bun (not just Node.js), assumes Claude Code's MCP support, and its Git hooks expect specific directory structures (.claude/ configuration). If you're using Cursor, GitHub Copilot, or other AI coding tools, porting these skills would require significant adapter code. The slash commands are Claude Code-native; there's no universal interface.
The role-based structure, while powerful for consistency, can feel constraining if you're used to freeform AI assistance. Want Claude to simultaneously act as designer and engineer while considering business constraints? You'll need to invoke multiple skills sequentially (/office-hours, then /design-review, then /review) rather than having a single fluid conversation. Some developers find this workflow compartmentalization helpful; others find it interrupts their flow state. The framework optimizes for reproducible quality over conversational flexibility.
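The sequential workflow described above can be pictured as a simple loop that runs one role at a time and hands each role the notes produced by the earlier ones. This is a sketch under stated assumptions: `runSequence` and the stand-in `fakeRunSkill` are hypothetical helpers written for this article, and nothing here is a documented gstack feature.

```javascript
// Runs commands in order; each step receives a copy of the earlier notes.
// `runSkill` is injected so the loop stays independent of any model API.
async function runSequence(commands, runSkill, task) {
  const notes = [];
  for (const command of commands) {
    const output = await runSkill(command, { task, notes: [...notes] });
    notes.push({ command, output });
  }
  return notes;
}

// Stand-in runner for demonstration; a real one would invoke the skill.
const fakeRunSkill = async (command, ctx) =>
  `${command} saw ${ctx.notes.length} earlier note(s)`;

runSequence(
  ['/office-hours', '/design-review', '/review'],
  fakeRunSkill,
  'build a login system'
).then((notes) => console.log(notes.map((n) => n.output)));
```

The trade-off is visible in the loop itself: each role sees the earlier output, but no step can revise a decision made by a previous one without starting another pass.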
The productivity metrics, while transparently documented, come from Tan's own repositories and workflow patterns. A founder shipping MVPs quickly will see different results than a team maintaining a legacy enterprise application. The 23 skills are opinionated toward rapid iteration and startup workflows—there's no '/compliance-check' for regulated industries, no '/legacy-migration' for brownfield refactoring, no '/performance-profiling' for optimization-heavy work. You can add custom skills, but the out-of-box personas reflect a specific development philosophy.
Verdict
Use gstack if you're already invested in Claude Code and want structured, repeatable workflows for AI-assisted development. It is especially valuable for technical founders who ship code solo but need systematic review processes. The role-based approach shines when you're context-switching between strategic decisions, implementation, and quality assurance. Team mode's auto-update mechanism via Git hooks solves real version drift problems for small teams sharing AI tooling. The browser automation in /qa provides unique runtime validation capabilities missing from most AI coding assistants.
Skip it if you're not using Claude Code (porting is nontrivial), prefer freeform AI conversations over structured commands, or work in heavily regulated domains where the out-of-box personas don't match your compliance requirements. Also skip it if you're evaluating AI coding tools across multiple providers: gstack's value proposition is Claude-specific, and you'll want ecosystem flexibility while the AI coding landscape remains volatile. The learning curve is minimal (slash commands are discoverable), but the payoff requires embracing the full methodology: treating AI assistance as a team of specialists rather than a single general-purpose assistant.