Comanda: Orchestrating Multi-LLM Agent Workflows with Declarative YAML

Hook

Every developer has a directory full of hacky Python scripts calling GPT-4, each slightly different, none reproducible, and all abandoned after the first run. Comanda makes LLM workflows as maintainable as your CI/CD pipelines.

Context

The explosion of LLM APIs created a new problem: workflow chaos. Teams interact with Claude for code review, GPT-4 for documentation, and Gemini for analysis, but each interaction is typically ad-hoc—either through web interfaces, one-off scripts, or increasingly tangled Python notebooks. There's no standardization, no version control, and certainly no way to compare outputs systematically across providers.

This matters because LLM-assisted development is moving from experimentation to production workflows. Code reviews, documentation generation, and refactoring suggestions are becoming routine engineering tasks. But unlike other automation in the development lifecycle, LLM interactions remain stubbornly manual and irreproducible. Comanda addresses this by treating agent workflows as declarative infrastructure: write YAML, execute workflows, version everything alongside your code.

Technical Insight

[Figure: System architecture (auto-generated). Inputs: File/URL/DB Input, STDIN Input, Codebase Index. Pipeline: YAML Workflow Definition, Workflow Parser, Variable Resolver & Dependency Graph, Parallel Execution Orchestrator. LLM providers: Claude API, OpenAI API, Gemini API. Outputs: STDOUT, File Output, Variable Store. Edge labels: dependency graph, parallel execution, $VAR substitution.]

At its core, Comanda is a workflow orchestration engine written in Go that parses YAML pipeline definitions and executes them against multiple LLM providers. The architecture separates workflow definition from execution, allowing you to describe what you want to happen rather than scripting the procedural steps.

A basic workflow looks like this:

workflow:
  name: code-review
  steps:
    - name: analyze
      provider: claude
      model: claude-3-5-sonnet
      input: file://src/auth.go
      prompt: "Review this code for security issues"
      output: $SECURITY_REVIEW
    
    - name: compare
      provider: openai
      model: gpt-4
      input: file://src/auth.go
      prompt: "Review this code for security issues"
      output: $GPT_REVIEW
    
    - name: synthesize
      provider: gemini
      model: gemini-pro
      input: |
        Claude's review: $SECURITY_REVIEW
        GPT-4's review: $GPT_REVIEW
      prompt: "Compare these reviews and identify consensus concerns"
      output: stdout

The variable substitution system ($VAR syntax) creates a data flow graph between steps. Comanda tracks these dependencies and can execute independent steps in parallel: the analyze and compare steps above run concurrently because neither reads the other's output, while synthesize waits for both $SECURITY_REVIEW and $GPT_REVIEW. This parallel multi-provider execution is Comanda's killer feature: you get comparative analysis across models without manually coordinating API calls.
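To make the dependency idea concrete, here is a minimal Python sketch of how $VAR references could be turned into a graph and then into groups of steps that are safe to run together. It is an illustration only, not Comanda's actual Go implementation: the STEPS structure, the function names, and the regular expression for variable names are all assumptions made for this example.

```python
import re

# Hypothetical in-memory form of the three-step workflow shown above.
STEPS = [
    {"name": "analyze", "input": "file://src/auth.go", "output": "$SECURITY_REVIEW"},
    {"name": "compare", "input": "file://src/auth.go", "output": "$GPT_REVIEW"},
    {
        "name": "synthesize",
        "input": "Claude's review: $SECURITY_REVIEW\nGPT-4's review: $GPT_REVIEW",
        "output": "stdout",
    },
]

# Assumed variable syntax: a dollar sign followed by an upper-case identifier.
VAR_PATTERN = re.compile(r"\$[A-Z_][A-Z0-9_]*")


def build_dependencies(steps):
    """Map each step name to the set of step names whose output it reads."""
    producers = {s["output"]: s["name"] for s in steps if s["output"].startswith("$")}
    deps = {}
    for step in steps:
        used = VAR_PATTERN.findall(step["input"])
        deps[step["name"]] = {producers[v] for v in used if v in producers}
    return deps


def execution_waves(deps):
    """Group steps into waves; all steps within one wave can run concurrently."""
    remaining = {name: set(d) for name, d in deps.items()}
    waves = []
    while remaining:
        ready = sorted(n for n, d in remaining.items() if not d)
        if not ready:
            raise ValueError("cyclic dependency between steps")
        waves.append(ready)
        for name in ready:
            del remaining[name]
        for d in remaining.values():
            d.difference_update(ready)
    return waves


deps = build_dependencies(STEPS)
waves = execution_waves(deps)
print(waves)  # [['analyze', 'compare'], ['synthesize']]
```

The first wave holds the two independent reviews and the second holds the synthesis step, which matches the concurrency described above. A real orchestrator would hand each wave to a worker pool rather than just printing it.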

The tool supports multiple input sources beyond files: stdin://, url://, and even database queries through connection strings. This means you can pipe git diffs directly into workflows, fetch remote documentation, or analyze production data:

steps:
  - name: analyze-pr
    provider: claude
    input: stdin://
    prompt: "Summarize the changes in this diff"
    output: $SUMMARY

Run it with: git diff main | comanda run review.yaml
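One way to picture the input handling is a small dispatcher keyed on the scheme prefix. The Python sketch below is an assumption-laden illustration, not Comanda's code: resolve_input is a hypothetical helper, it covers only the stdin:// and file:// forms plus literal text, and it deliberately leaves out URL and database inputs.

```python
import io
import sys
from pathlib import Path


def resolve_input(spec, stdin=None):
    """Return the text a step would receive for a given input spec (hypothetical helper)."""
    stdin = stdin if stdin is not None else sys.stdin
    if spec == "stdin://":
        # Whatever was piped in, e.g. the output of `git diff main`.
        return stdin.read()
    if spec.startswith("file://"):
        # Everything after the scheme is treated as a filesystem path.
        return Path(spec[len("file://"):]).read_text(encoding="utf-8")
    # No recognised scheme: treat the value as literal text.
    return spec


# Simulate a piped diff with an in-memory stream instead of the real stdin.
diff_text = resolve_input("stdin://", stdin=io.StringIO("diff --git a/auth.go b/auth.go\n"))
literal = resolve_input("We need to migrate from REST to gRPC")
print(diff_text, end="")
print(literal)
```

Passing the stream in as a parameter keeps the helper easy to exercise without a shell pipeline.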

Comanda's codebase indexing feature addresses the context-window problem. Rather than packing your entire repository into each prompt, which is expensive and quickly runs into context limits, you can build a persistent index:

comanda index create my-project ./src

Then reference it in workflows:

steps:
  - name: refactor
    provider: claude
    context: index://my-project
    input: "We need to migrate from REST to gRPC"
    prompt: "Suggest a migration plan based on our current codebase structure"

The index gives the LLM contextual awareness without burning tokens on code it doesn't need to see. Under the hood, Comanda uses embeddings to retrieve the code sections most relevant to the prompt, in effect giving you a retrieval-augmented generation (RAG) pipeline without writing one yourself.
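The retrieval step can be sketched in a few lines of Python. This is a toy under stated assumptions, not how Comanda builds or queries its index: the CHUNKS contents and file paths are invented, and embed is a bag-of-words stand-in for a real embedding model. Only the ranking logic (score every chunk against the prompt, keep the top k) mirrors the general RAG pattern.

```python
import math
import re
from collections import Counter

# Hypothetical indexed chunks; a real index would hold far more, keyed by file or symbol.
CHUNKS = {
    "src/api/rest_handlers.go": "func ListUsers handles the REST GET /users endpoint and writes JSON",
    "src/api/router.go": "router registers REST routes and HTTP middleware",
    "src/billing/invoice.go": "func TotalInvoice sums line items and applies tax",
}


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunk keys most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda path: cosine(q, embed(chunks[path])), reverse=True)
    return ranked[:k]


top = retrieve("migrate REST routes to gRPC", CHUNKS)
print(top)  # ['src/api/router.go', 'src/api/rest_handlers.go']
```

Only the two REST-related chunks would be added to the prompt; the billing code never reaches the model, which is where the token savings come from.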

Perhaps most interesting is the git worktree integration for parallel experimentation. When you enable it, Comanda creates isolated git worktrees for each workflow execution, allowing parallel runs to modify code without conflicts:

workflow:
  name: parallel-refactor
  worktree: true
  steps:
    - name: refactor-style-a
      provider: claude
      prompt: "Refactor using functional approach"
      tool: write_file
    
    - name: refactor-style-b
      provider: openai
      prompt: "Refactor using object-oriented approach"
      tool: write_file

Each step executes in its own worktree branch. You can then compare the results, cherry-pick changes, or discard experiments—all through standard git workflows. This transforms LLM-assisted refactoring from a risky manual process into a safe, parallelizable operation.
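The isolation itself comes from plain git. The Python sketch below shows the kind of git worktree add invocation that would give each step its own checkout and branch. The branch prefix, the directory layout, and both function names are assumptions for illustration; how Comanda names or places its worktrees is not documented here.

```python
import subprocess


def worktree_command(step_name, base="HEAD"):
    """Build the `git worktree add` argv for one step's isolated checkout."""
    branch = f"experiment/{step_name}"    # hypothetical branch naming scheme
    path = f"../worktrees/{step_name}"    # hypothetical location, outside the main checkout
    return ["git", "worktree", "add", "-b", branch, path, base]


def create_worktrees(repo_root, step_names, run=subprocess.run):
    """Create one worktree per step; `run` is injectable so this can be dry-run."""
    for name in step_names:
        run(worktree_command(name), cwd=repo_root, check=True)


# Dry run: print the commands instead of executing them.
for step in ["refactor-style-a", "refactor-style-b"]:
    print(" ".join(worktree_command(step)))
```

After the steps finish, the usual tools apply: git diff between the two experiment branches, git cherry-pick for the parts worth keeping, and git worktree remove to discard the rest.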

Gotcha

Comanda's provider limitation is real. It supports Claude, Gemini, and OpenAI—period. No Llama, no Mistral, no local models through Ollama. For teams running self-hosted LLMs or using alternative providers, this is a complete blocker. The codebase shows provider-specific logic hardcoded rather than a plugin architecture, so adding custom providers means forking and maintaining your own version.

The YAML configuration model, while version-controllable, becomes unwieldy for complex workflows with conditional logic. There's no native if/else branching or loop constructs. You can achieve iteration through multi-step workflows that reference previous outputs, but sophisticated decision trees require external orchestration. If your workflow needs dynamic branching based on LLM outputs ("if the code review finds security issues, run additional analysis"), you'll end up writing scripts that invoke Comanda workflows conditionally, partially defeating the declarative purpose. For truly complex agentic systems with dynamic planning, frameworks like LangChain or AutoGen provide more flexibility at the cost of significantly more complexity.
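A wrapper of the kind described above might look like the following Python sketch. Several things here are assumptions: the command line reuses the comanda run form shown earlier in this article (check it against your installed CLI), deep-security-analysis.yaml is a hypothetical second workflow, and the SECURITY_ISSUES: yes marker presumes you have prompted the review step to emit such a line.

```python
import subprocess

MARKER = "SECURITY_ISSUES: yes"  # assumed convention requested in the review prompt


def run_workflow(path, stdin_text=""):
    """Invoke the CLI as shown in this article (`comanda run <file>`) and return its stdout."""
    result = subprocess.run(
        ["comanda", "run", path],
        input=stdin_text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def review_with_escalation(diff, run=run_workflow):
    """Run the standard review, then a deeper workflow only if the review flags security."""
    review = run("review.yaml", diff)
    if MARKER in review:
        # This is the branch the YAML cannot express on its own.
        return review, run("deep-security-analysis.yaml", diff)
    return review, None


def fake_run(path, stdin_text=""):
    """Offline stand-in for the CLI so the control flow can be exercised without API calls."""
    if path == "review.yaml":
        return "SECURITY_ISSUES: yes\nToken comparison is not constant-time."
    return "deep dive"


review, followup = review_with_escalation("diff text", run=fake_run)
print(followup)  # deep dive
```

The cost is visible in the sketch: the decision logic now lives in Python rather than in the versioned YAML, so the workflow file alone no longer describes what will run.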

Verdict

Use Comanda if you're building reproducible LLM workflows for code analysis, review, or generation where comparing multiple model outputs adds value, and you want those workflows version-controlled alongside your code. It's particularly strong for teams standardizing AI-assisted development processes, turning "I asked ChatGPT to review this" into "I ran our standard review workflow." The parallel execution and codebase indexing make it ideal for systematic code evaluation tasks.

Skip it if you need local or self-hosted LLM support, require complex conditional workflow logic beyond sequential steps, prefer interactive chat interfaces over batch processing, or work primarily with providers outside the Claude/GPT/Gemini ecosystem. Also skip it if you're building customer-facing AI features rather than internal development tools: Comanda is designed for engineering workflows, not product integration.
