Coordinating AI Agent Swarms: How ralph-parallel Orchestrates Concurrent Test Generation

[ View on GitHub ]

Hook

A single developer used 4 AI agents working in parallel to generate 339 tests from a 52-task specification—and the code actually compiled. Here’s how the orchestration layer prevented the chaos you’d expect from four AIs editing the same codebase simultaneously.

Context

The AI coding assistant landscape has evolved rapidly from simple autocomplete to tools that can write entire features. Claude Code’s Agent Teams feature lets you spin up multiple AI agents that can work alongside you, each with access to the full IDE environment. But there’s a problem: when you have a large specification—say, 50 interconnected tasks to implement—running them serially through a single agent is painfully slow. The obvious solution is parallelization, but coordinating multiple AI agents editing the same codebase simultaneously is a nightmare of merge conflicts, race conditions, and cascading failures.

ralph-parallel emerged from this coordination challenge. Built as a shell-based orchestration layer, it sits between ralph-specum (a tool that generates structured development specifications in markdown) and Claude Code’s Agent Teams API. Its job is to parse those specifications, figure out which tasks can run in parallel without stepping on each other, dispatch them to separate AI agents, and ensure the resulting code actually works. The core innovation isn’t in the AI itself—it’s in the coordination primitives that make multi-agent development practical rather than theoretical.

Technical Insight

[System architecture diagram (auto-generated): a specification markdown file feeds a dependency parser and topological sort into the phase orchestrator, which picks an isolation strategy per execution phase — file-ownership mode (PreToolUse hooks) for tasks with clear boundaries, or worktree mode (git worktree creation, branch merging) for complex dependencies — before agent execution, quality gates, and task completion.]

The architecture revolves around three key systems: dependency analysis, isolation strategies, and quality gates. When you invoke ralph-parallel with a specification markdown file, it first parses the task definitions and constructs a dependency graph. Tasks are organized into phases where each phase contains work that can execute concurrently because there are no interdependencies within the phase. This is classic topological sorting, but applied to AI agent coordination rather than build systems.
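As a sketch of how that phase grouping might work (illustrative only — ralph-parallel's actual parser and task format are not shown here), a Kahn-style level sort splits the dependency graph into batches of mutually independent tasks:

```python
def group_into_phases(tasks, deps):
    """Split tasks into phases of mutually independent work.

    tasks: iterable of task ids
    deps:  dict mapping a task id to the set of task ids it depends on
    """
    remaining = set(tasks)
    done = set()
    phases = []
    while remaining:
        # A task is ready when all of its dependencies have completed
        ready = {t for t in remaining if deps.get(t, set()) <= done}
        if not ready:
            raise ValueError('dependency cycle detected')
        phases.append(sorted(ready))
        done |= ready
        remaining -= ready
    return phases

# Example: task-3 depends on 1 and 2; task-4 depends on 3
print(group_into_phases(
    ['task-1', 'task-2', 'task-3', 'task-4'],
    {'task-3': {'task-1', 'task-2'}, 'task-4': {'task-3'}},
))
# -> [['task-1', 'task-2'], ['task-3'], ['task-4']]
```

Each inner list is one phase: everything in it can be dispatched to agents concurrently, and the next phase only starts once the current one has fully passed its gates.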

The real engineering challenge is isolation. Two strategies handle different scenarios. The file-ownership strategy assigns specific files to each agent and uses PreToolUse hooks to intercept filesystem operations. If Agent A tries to write to a file owned by Agent B, the hook blocks the operation. This is low-overhead and works beautifully for tasks with clear file boundaries—imagine one agent generating authentication tests while another handles payment tests, each in their own test files. Here’s a simplified view of the hook logic:

# PreToolUse hook intercepts file operations
check_file_ownership() {
  local agent_id="$1"
  local file_path="$2"
  local assigned_files
  assigned_files=$(get_assigned_files "$agent_id")

  # Exact match against the agent's assigned file list (one path per line);
  # -F and -x prevent substring and regex false positives
  if ! printf '%s\n' "$assigned_files" | grep -Fqx "$file_path"; then
    echo "ERROR: Agent $agent_id attempted to modify $file_path" >&2
    echo "Assigned files: $assigned_files" >&2
    return 1
  fi
  return 0
}

The worktree strategy takes a different approach for tasks with complex interdependencies. It creates git worktrees—separate working directories pointing to different branches—and assigns each agent its own worktree. Agents can modify anything in their sandbox, but they’re physically isolated. The tradeoff is that you need to manually merge branches when tasks complete, which is why this strategy is reserved for cases where file-level isolation is too restrictive.
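A minimal sketch of what per-agent worktree setup could look like — the branch naming and directory layout here are assumptions for illustration, not ralph-parallel's actual conventions:

```python
def worktree_commands(repo_root, agent_id, task_id, base_branch='main'):
    """Return the git commands that would sandbox one agent in its own worktree.

    Branch and directory names are illustrative conventions.
    """
    branch = f'agent/{agent_id}/{task_id}'
    path = f'{repo_root}/.worktrees/{agent_id}'
    return [
        # New branch off the known-good base, checked out in an isolated directory
        ['git', '-C', repo_root, 'worktree', 'add', '-b', branch, path, base_branch],
        # After the task completes and gates pass, merge back and clean up
        ['git', '-C', repo_root, 'merge', '--no-ff', branch],
        ['git', '-C', repo_root, 'worktree', 'remove', path],
    ]

for cmd in worktree_commands('/repo', 'agent-2', 'task-009'):
    print(' '.join(cmd))
```

The agent only ever sees its own working directory under `.worktrees/`, so nothing it writes can race with another agent; the cost is the explicit merge step at the end.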

Quality gates are where ralph-parallel separates itself from naive parallelization. Each task goes through a 6-stage validation pipeline: typecheck, build, test, lint, integration test, and final review. These gates run at configurable “cadence intervals”—you might run typechecking after every task completion but only run the full test suite every 5 tasks. The coordinator uses PostToolUse hooks to trigger validations:

# Quality gate validation (simplified)
import subprocess

class QualityGateFailure(Exception):
    pass

def run_quality_gates(task_id, cadence_count):
    gates = [
        ('typecheck', 1),      # Every task
        ('build', 1),          # Every task
        ('test', 2),           # Every 2 tasks
        ('lint', 5),           # Every 5 tasks
        ('integration', 10),   # Every 10 tasks
    ]

    for gate_name, interval in gates:
        if cadence_count % interval == 0:
            result = subprocess.run([f'./gates/{gate_name}.sh'])
            if result.returncode != 0:
                raise QualityGateFailure(f'{gate_name} failed for task {task_id}')

This staged validation catches failures early. If Agent 2 generates code that breaks typechecking, the system halts that agent’s work immediately rather than letting it cascade into 10 more broken tasks. The coordinator tracks which gates have passed and maintains a “known good” state that agents can roll back to if validation fails.

Lifecycle management handles the messy realities of AI agent coordination. Agents crash. Network requests time out. You might accidentally close a browser tab mid-execution. ralph-parallel detects stale dispatches by monitoring heartbeat signals—if an agent hasn’t reported progress in N minutes, its tasks are marked as failed and can be reclaimed by other agents. The session reclamation system lets you restart a partially-completed spec run without re-executing successful tasks. It stores completion state in a simple JSON file:

{
  "spec_id": "feature-auth-tests",
  "completed_tasks": ["task-001", "task-003", "task-007"],
  "failed_tasks": ["task-005"],
  "in_progress": {"task-009": "agent-2"},
  "phase": 3
}

When you restart, it reads this state and only dispatches incomplete work. This is critical for large specs that might take hours—you don’t want a single agent failure to invalidate all previous work.
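Resuming from that state could look like the following sketch — the field names match the JSON above, but the function itself is a hypothetical stand-in, not ralph-parallel's actual resume code:

```python
import json

def tasks_to_dispatch(state, all_tasks):
    """Given the parsed session state, return the tasks that still need work.

    Completed tasks are skipped; failed and previously in-progress tasks are
    re-dispatched, since their agents may have died mid-run.
    """
    done = set(state['completed_tasks'])
    return [t for t in all_tasks if t not in done]

state = json.loads('''{
  "spec_id": "feature-auth-tests",
  "completed_tasks": ["task-001", "task-003", "task-007"],
  "failed_tasks": ["task-005"],
  "in_progress": {"task-009": "agent-2"},
  "phase": 3
}''')
all_tasks = [f'task-{i:03d}' for i in range(1, 10)]
print(tasks_to_dispatch(state, all_tasks))
# -> ['task-002', 'task-004', 'task-005', 'task-006', 'task-008', 'task-009']
```

Note that `task-009`, which was in progress when the run died, is treated as incomplete and handed out again — the orphaned agent's partial work is discarded rather than trusted.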

Gotcha

The most significant limitation is the hard coupling to Claude Code’s Agent Teams API. This isn’t a general-purpose agent coordination framework—it’s specifically built for Claude Code’s hook system, team management API, and execution model. If you’re using Cursor, Aider, or any other AI coding tool, ralph-parallel is essentially unusable without rewriting the entire integration layer. Even within Claude Code, you’re limited to 8 parallel agents maximum, which appears to be an infrastructure constraint rather than a design choice. For truly massive specifications (100+ tasks), you’ll still hit serialization bottlenecks.

The shell scripting implementation creates maintenance challenges. Error handling in bash is notoriously fragile, and the mix of shell scripts calling Python analysis scripts creates debugging headaches. When something goes wrong—say, a quality gate fails in an unexpected way—tracing the error through multiple script layers is painful. There’s no structured logging, no debugger support, and stack traces are essentially useless. A proper implementation in Python or Go would be more maintainable, but that would require significantly more development effort. The worktree isolation strategy also has sharp edges: merge conflicts require manual resolution, and if agents make conflicting architectural decisions in their separate branches, reconciling them can be harder than just running tasks serially in the first place.

Verdict

Use if: You’re already using Claude Code and ralph-specum to generate structured development specifications, you regularly work with specs containing 10+ interdependent tasks where parallelization could save hours, you need production-quality output with automated validation rather than draft code that needs extensive manual review, and you can work within the file-ownership or worktree isolation models. The quality gates and lifecycle management show this was built by someone who hit real coordination problems rather than building a demo.

Skip if: You’re not locked into the Claude Code ecosystem, you prefer solo AI workflows where you’re directly prompting and reviewing each change, your typical features are small enough (under 5 tasks) that coordination overhead exceeds any parallelization benefit, or you need custom isolation strategies—maybe you want agents coordinating at the function level rather than file level, which ralph-parallel doesn’t support. Also skip if you’re uncomfortable with shell scripting as critical infrastructure or if you need portability across different AI coding tools.
