# Superpowers: Teaching AI Agents to Stop Cowboy Coding
## Hook
What if the biggest problem with AI coding agents isn’t that they can’t write code, but that they code exactly like a caffeinated junior developer who’s never heard of tests? Superpowers doesn’t make agents smarter—it makes them more disciplined.
## Context
AI coding agents promised autonomous software development, but early adopters hit a wall: agents would enthusiastically generate code for hours, only to deliver an unmaintainable mess with zero tests, no architectural coherence, and implementation details that drifted miles from the original requirements. The core issue wasn’t capability—it was methodology. Left to their own devices, agents skip the boring parts: writing tests first, planning before coding, reviewing their own work. They optimize for token efficiency, not code quality.
Superpowers emerged from this frustration as a prompt-engineering framework that enforces software development best practices through structured workflows. Created by obra, it treats AI agents not as magical code generators but as “enthusiastic junior engineers with poor taste, no judgment, no project context, and an aversion to testing.” The framework consists of composable ‘skills’—markdown documents containing detailed instructions that trigger automatically based on context—turning chaotic agent behavior into a disciplined, TDD-driven development process. With over 52,000 GitHub stars, it represents a shift in thinking: instead of improving the agents themselves, we can improve the processes they follow.
## Technical Insight
Superpowers’ architecture is deceptively simple: markdown files containing structured prompts that compose into workflows. But the sophistication lies in how these skills enforce methodology. Each skill is a self-contained instruction set that agents can invoke automatically based on project context, creating a seamless overlay of best practices without manual intervention.
The flagship innovation is subagent-driven development. Rather than letting a single agent’s context window accumulate hours of decisions and drift, Superpowers spawns a fresh agent instance for each discrete task. Here’s how it works in practice:
```markdown
# Skill: subagent-task-execution

When assigned a task from the implementation plan:

1. Spawn fresh agent instance with:
   - Original requirements document
   - Specific task description
   - Current codebase snapshot
   - Zero conversational history
2. Agent executes task in isolation
3. Two-stage review before merge:
   - Stage 1: Spec compliance check
     - Does implementation match requirements?
     - Are acceptance criteria met?
   - Stage 2: Code quality review
     - Is code maintainable?
     - Are tests comprehensive?
     - Does it follow project conventions?
4. Only after both reviews pass: merge to main branch
```
This pattern solves context drift—the phenomenon where agents gradually lose sight of original requirements as conversations extend. Each subagent starts with perfect clarity about its narrow task, executes it, and terminates. The multi-stage review ensures quality without relying on the agent’s own judgment about “good enough.”
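The two-stage gate can be sketched as a small shell wrapper. This is a minimal illustration of the pattern, not Superpowers’ actual implementation: `run_spec_review` and `run_quality_review` are hypothetical placeholders for whatever review mechanism your agent platform exposes (for instance, prompting a reviewer subagent).

```shell
#!/bin/bash
# Sketch of the two-stage review gate. run_spec_review and run_quality_review
# are hypothetical placeholders, not Superpowers APIs.

stage1_spec_compliance() {
  # Stage 1: does the implementation match the requirements document?
  run_spec_review "$1"
}

stage2_code_quality() {
  # Stage 2: is the code maintainable, tested, and idiomatic?
  run_quality_review "$1"
}

review_and_merge() {
  branch="$1"
  stage1_spec_compliance "$branch" || { echo "spec review failed"; return 1; }
  stage2_code_quality "$branch" || { echo "quality review failed"; return 1; }
  # Only after both reviews pass: merge to the main branch
  git merge --no-ff "$branch"
}
```

The point of the wrapper is ordering: a quality review never runs on code that fails spec compliance, and nothing merges until both stages return success.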
The TDD enforcement mechanism is equally clever. Agents naturally resist writing tests first because it’s less token-efficient to generate tests for nonexistent code. Superpowers makes RED-GREEN-REFACTOR mandatory through active enforcement:
```bash
#!/bin/bash
# Skill: enforce-tdd-workflow
# Before any implementation:

function red_phase() {
  echo "Write failing test first"
  # Agent must create test file
  # Agent must run tests and confirm failure
  # Agent must commit: 'RED: test for feature X'
}

function green_phase() {
  echo "Implement minimum code to pass"
  # Agent writes implementation
  # Agent runs tests until passing
  # Agent must commit: 'GREEN: implement feature X'
}

function refactor_phase() {
  echo "Improve code quality"
  # Agent refactors with tests as safety net
  # Agent must commit: 'REFACTOR: clean up feature X'
}

# If the last commit touched implementation code without a prior test: delete it
if git diff --name-only HEAD~1 HEAD | grep -v "_test\." | grep -qv "\.test\."; then
  if ! git log -1 --pretty=%s | grep -q "^RED:"; then
    echo "Code written before tests detected. Reverting."
    git reset --hard HEAD~1
    exit 1
  fi
fi
```
This isn’t a polite suggestion—it’s enforcement through automation. Write implementation before tests? The skill deletes it. The agent learns quickly.
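The same commit convention can be checked by git itself. Here’s a hedged sketch of a `commit-msg` hook that rejects commits missing a phase prefix; the hook is our illustration of the convention shown above, not a file Superpowers ships.

```shell
#!/bin/bash
# Sketch: enforce the RED:/GREEN:/REFACTOR: commit prefixes via a commit-msg
# hook. Illustrative only; install by saving the check as .git/hooks/commit-msg.

check_commit_msg() {
  # $1 is the path to the file holding the proposed commit message;
  # succeed only if the first line starts with a TDD phase prefix.
  head -n 1 "$1" | grep -Eq '^(RED|GREEN|REFACTOR): '
}

# A real hook would end with:
#   check_commit_msg "$1" || { echo "Missing TDD phase prefix" >&2; exit 1; }
```

Because git invokes `commit-msg` before the commit is created, this rejects non-conforming messages up front instead of reverting after the fact.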
The git worktree integration deserves special attention. Traditional branching leaves artifacts in the working directory that confuse agents about baseline state. Superpowers uses worktrees to give each development thread its own isolated filesystem:
```bash
# Each feature gets its own directory via worktree
git worktree add ../feature-authentication feature/authentication
cd ../feature-authentication

# Agent works here with zero contamination from other branches
# Tests run against clean baseline
# No "works on my branch" issues
```
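The full lifecycle includes cleanup, since stale worktrees accumulate otherwise. A minimal sketch of per-task helpers—the function names and the sibling-directory layout are our own illustrative conventions, not Superpowers APIs:

```shell
#!/bin/bash
# Sketch: one worktree per task, removed when the task finishes.
# Names and layout are illustrative assumptions.

new_task_worktree() {
  task="$1"
  # New branch plus an isolated checkout in a sibling directory
  git worktree add "../$task" -b "feature/$task" >/dev/null 2>&1
  echo "../$task"
}

finish_task_worktree() {
  task="$1"
  # Remove the checkout, then delete the branch if it has been merged
  git worktree remove "../$task"
  git branch -d "feature/$task"
}
```

`git worktree remove` refuses to delete a worktree with uncommitted changes, which is a useful safety net when an agent abandons a task mid-flight.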
The skills themselves are composable. A high-level “implement feature” skill might invoke “create-implementation-plan,” which invokes “brainstorm-approaches,” which invokes “analyze-existing-codebase.” Each skill is a markdown document with clear inputs, outputs, and success criteria. The agent doesn’t need to remember the methodology—the skills encode it.
What makes this genuinely powerful is the mental model shift. Traditional agent prompting treats the AI as a senior developer who needs high-level guidance. Superpowers treats it as a junior who needs detailed process, explicit acceptance criteria, and automatic guardrails. The implementation plans it generates are almost comically detailed—and that’s precisely why they work for extended autonomous sessions.
## Gotcha
Platform integration is fragmented and constraining. Superpowers ships as a plugin for Claude Code’s marketplace, direct instructions for Codex, and separate configuration for OpenCode. There’s no universal integration layer, which means you’re either on a supported platform or you’re manually copying markdown skills and hoping your agent follows them consistently. If you’re using Cursor, GitHub Copilot, or a custom agent setup, you’ll be adapting the methodology by hand—and at that point, you’re maintaining your own fork of the philosophy.
The framework’s effectiveness depends entirely on the underlying agent’s instruction-following capability. Superpowers can’t make a poor agent good; it can only structure a capable agent’s workflow. If your agent struggles with multi-step instructions or doesn’t reliably parse markdown-formatted skills, the entire methodology collapses. You’re also locked into the git-worktree-based branching model, which works beautifully for standard git workflows but breaks down for monorepos with complex build dependencies, projects using Perforce or other VCS, or codebases where filesystem isolation isn’t sufficient (think databases, microservices with tight coupling, or embedded development with hardware dependencies). The shell-script-heavy implementation also shows its age for language-specific workflows—there’s no deep integration with language servers, debuggers, or IDE-native refactoring tools.
## Verdict
**Use if:** You need AI agents to work autonomously for hours on feature development where code quality and testing are non-negotiable, you’re on a supported platform (Claude Code, Codex, OpenCode), your project uses git with standard branching workflows, and you’re willing to treat agents as junior developers who need process enforcement rather than creative partners. This shines for greenfield projects, refactoring efforts with comprehensive test suites, and teams that want TDD compliance without constant human oversight.

**Skip if:** You’re doing exploratory prototyping where speed matters more than structure, working with custom agent setups that don’t support the plugin model, maintaining legacy codebases where TDD retrofitting isn’t feasible, or your toolchain involves non-git VCS, complex monorepo builds, or hardware dependencies that break the isolation model. Also skip if you’re an experienced developer who prefers tighter control—Superpowers optimizes for autonomous agent work, not interactive pair programming.