Leda: The Meta-Agent That Writes Multi-Agent Systems For You

Hook

What if building a multi-agent system required writing zero orchestration code? Leda doesn't just plan your agent architecture—it generates the entire executable pipeline from a single English sentence.

Context

Multi-agent systems have exploded in popularity since ChatGPT demonstrated that LLMs could be more than chat interfaces. Developers quickly realized that complex tasks benefit from specialized agents working in concert—one agent for research, another for synthesis, a third for formatting. But orchestrating these agents is tedious: you're writing coordination logic, managing state handoffs, debugging prompt chains, and dealing with API retries.

The current landscape offers two paths: frameworks like LangGraph and AutoGen that give you powerful primitives but require significant setup, or opinionated tools like CrewAI that structure everything for you but limit flexibility. Leda takes a third approach—meta-programming. Instead of being a runtime framework, it's a code generator that analyzes your goal and produces both the agent architecture and the Python script to execute it. You describe what you want in natural language, and Leda scaffolds the entire system using Google's Gemini as both the planner and the agent runtime.

Technical Insight

Leda operates in two distinct phases that separate planning from execution. In the first phase, you provide a high-level prompt like "analyze customer feedback and generate a product roadmap." Leda uses Gemini to design a multi-agent architecture, determining how many agents you need, what role each plays, and in what sequence they should execute. It's doing prompt engineering on your behalf, but at the architectural level.

The second phase is where it gets interesting: Leda generates a complete Python script that orchestrates the agents it just designed. Each agent receives a specialized system prompt, and the script chains them sequentially—agent A's output becomes agent B's input, which becomes agent C's input, forming a pipeline. Here's what a simplified generated script might look like:

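import os

import google.generativeai as genai

# Read the API key from the environment; raises KeyError early if it is unset.
genai.configure(api_key=os.environ['GEMINI_API_KEY'])

# Raw feedback to analyze. Placeholder text for illustration only.
user_input = "The export button is hard to find. Please add dark mode."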

# Agent 1: Feedback Analyzer
analyzer_prompt = """You are a customer feedback analyst. 
Extract key themes, pain points, and feature requests.
Provide structured output with categories and frequency."""

model = genai.GenerativeModel('gemini-pro')
analyzer_response = model.generate_content(
    f"{analyzer_prompt}\n\nFeedback Data: {user_input}"
)

# Agent 2: Priority Scorer
scorer_prompt = """You are a product strategist.
Score each feature request by impact and effort.
Use the RICE framework (Reach, Impact, Confidence, Effort)."""

scorer_response = model.generate_content(
    f"{scorer_prompt}\n\nAnalysis: {analyzer_response.text}"
)

# Agent 3: Roadmap Generator
roadmap_prompt = """You are a product manager.
Create a quarterly roadmap from scored features.
Format as a markdown table with timelines."""

final_response = model.generate_content(
    f"{roadmap_prompt}\n\nPriorities: {scorer_response.text}"
)

print(final_response.text)

The architecture is deliberately simple: linear chain prompting where each agent is a Gemini API call with a specialized system prompt. There's no complex state management, no branching logic, no parallel execution. This simplicity is both the strength and limitation of the approach—it reduces cognitive overhead for the 80% of use cases that fit sequential workflows, but offers no escape hatch for the 20% that don't.
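The linear chain described above can be reduced to a single loop. The following is a minimal sketch, not Leda's actual output: `run_chain`, `echo_model`, and the `(system_prompt, input_label)` pair format are hypothetical names chosen for illustration, and the model is a stand-in function so the example runs without an API key.

```python
from typing import Callable, Sequence, Tuple

# A "model" is any function mapping a prompt string to a reply string.
# In the generated script, model.generate_content(...).text plays this role.
ModelFn = Callable[[str], str]


def run_chain(agents: Sequence[Tuple[str, str]], user_input: str, call_model: ModelFn) -> str:
    """Run agents in order, feeding each agent's output to the next one.

    `agents` is a sequence of (system_prompt, input_label) pairs (hypothetical format).
    """
    payload = user_input
    for system_prompt, input_label in agents:
        payload = call_model(f"{system_prompt}\n\n{input_label}: {payload}")
    return payload


def echo_model(prompt: str) -> str:
    """Stand-in for a real LLM call: returns the prompt's last line, upper-cased."""
    return prompt.splitlines()[-1].upper()


agents = [
    ("You are a customer feedback analyst.", "Feedback Data"),
    ("You are a product strategist.", "Analysis"),
    ("You are a product manager.", "Priorities"),
]
result = run_chain(agents, "dark mode please", echo_model)
print(result)
```

Because the whole pipeline is one loop over a list, adding or reordering agents means editing the list, which is roughly the kind of iteration the generated scripts invite. It also makes the limitation visible: there is nowhere in that loop to branch or fan out.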

What makes Leda genuinely useful is that it handles the meta-problem of agent design. When you're staring at a blank file wondering "how many agents do I need for this task?" or "what should each agent's role be?", Leda provides an opinionated starting point. The generated code is readable, uses standard libraries, and follows predictable patterns. You can immediately run it, see results, then iterate on the prompts or agent sequence.

The trade-off is complete Gemini lock-in. There's no abstraction layer for swapping LLM providers, no configuration for using local models, no fallback mechanisms. Every agent is a generate_content call on a genai.GenerativeModel instance. If you need Claude for reasoning tasks or GPT-4 for creative tasks, you're refactoring by hand. The tool assumes Gemini is the right hammer for every nail, which works until you hit a nail that needs a different tool.
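If you do end up refactoring by hand, one common approach is to put a thin interface between the agents and the SDK. The sketch below is an assumption about how that refactor might look, not something Leda generates: `TextModel`, `GeminiModel`, `FakeModel`, and `run_agent` are hypothetical names. Only `GeminiModel` touches `google.generativeai`, and it assumes `genai.configure(...)` has already been called elsewhere.

```python
from typing import List, Protocol


class TextModel(Protocol):
    """Minimal interface an agent needs: prompt in, text out."""

    def generate(self, prompt: str) -> str: ...


class GeminiModel:
    """Adapter around the google.generativeai calls used in the generated script."""

    def __init__(self, model_name: str = "gemini-pro") -> None:
        # Imported lazily so other adapters work without the Gemini SDK installed.
        import google.generativeai as genai
        self._model = genai.GenerativeModel(model_name)

    def generate(self, prompt: str) -> str:
        return self._model.generate_content(prompt).text


class FakeModel:
    """Offline stand-in for tests; an adapter for another provider would have the same shape."""

    def __init__(self, reply: str) -> None:
        self.reply = reply
        self.prompts: List[str] = []

    def generate(self, prompt: str) -> str:
        self.prompts.append(prompt)
        return self.reply


def run_agent(model: TextModel, system_prompt: str, label: str, payload: str) -> str:
    """One pipeline step, written against the interface rather than a vendor SDK."""
    return model.generate(f"{system_prompt}\n\n{label}: {payload}")


fake = FakeModel("theme: onboarding friction")
analysis = run_agent(fake, "You are a customer feedback analyst.", "Feedback Data", "signup is confusing")
print(analysis)
```

With this shape, choosing a different model per agent becomes a matter of passing a different adapter to `run_agent`, and the pipeline can be exercised offline with `FakeModel`.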

Leda also reveals an interesting tension in the multi-agent ecosystem: is the hard part designing the architecture or maintaining the runtime? Leda bets on the former—that developers struggle more with "what agents should I build?" than "how do I make agents talk to each other?" For rapid prototyping and exploration, that bet pays off. You get from idea to working code faster than writing orchestration logic manually. But for production systems, the lack of error handling, retry logic, observability, and testing infrastructure means the generated code is a starting point, not a finish line.

Gotcha

The sequential-only execution model is the biggest limitation. If your workflow requires agents to work in parallel (e.g., simultaneously researching multiple topics), conditional branching (agent C runs only if agent B finds something), or cyclic patterns (agents iterating until convergence), Leda doesn't support it out of the box. The generated code is a straight pipeline, period. You'd need to manually refactor the output, at which point you're doing the orchestration work Leda promised to eliminate.
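To make the refactoring cost concrete, here is a rough sketch of two of the patterns named above, parallel fan-out and conditional branching, written with the standard library. Everything in it is hypothetical: `research_in_parallel`, `maybe_escalate`, and `stub_model` are illustrative names, and the stub replaces a real Gemini call so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

ModelFn = Callable[[str], str]


def research_in_parallel(topics: List[str], call_model: ModelFn) -> List[str]:
    """Fan out: one research call per topic, run concurrently, results in topic order."""
    with ThreadPoolExecutor(max_workers=max(1, len(topics))) as pool:
        return list(pool.map(lambda t: call_model(f"Research this topic: {t}"), topics))


def maybe_escalate(findings: List[str], call_model: ModelFn) -> str:
    """Branch: run the follow-up agent only if some finding mentions a risk."""
    flagged = [f for f in findings if "risk" in f.lower()]
    if not flagged:
        return "no escalation needed"
    return call_model("Summarize these risks: " + "; ".join(flagged))


def stub_model(prompt: str) -> str:
    """Deterministic stand-in for an LLM call."""
    body = prompt.split(": ", 1)[1]
    if prompt.startswith("Research"):
        return f"{body}: risk found" if body == "pricing" else f"{body}: ok"
    return f"SUMMARY({body})"


findings = research_in_parallel(["pricing", "onboarding"], stub_model)
outcome = maybe_escalate(findings, stub_model)
print(findings, outcome)
```

Threads are a reasonable fit here because LLM calls are I/O-bound. Cyclic patterns would add a loop with a stopping condition around calls like these, which is where hand-rolled orchestration starts to resemble a small framework.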

Generated code quality varies with prompt clarity, and there's no validation or testing layer. Leda produces Python that runs, but whether it handles edge cases, retries failed API calls, or validates agent outputs is up to what Gemini decides to generate. I've seen generated scripts that assume API calls always succeed and outputs are always well-formed—assumptions that break immediately in production. There's also no iterative refinement: if the first generation isn't quite right, you're either re-running with a modified prompt (hoping for different results) or manually editing the output. The tool doesn't learn from failures or offer a feedback loop to improve generated code quality over multiple attempts.
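The missing safeguards are not hard to add by hand. Below is a minimal sketch of a retry wrapper with exponential backoff and a basic empty-output check. It assumes nothing about Leda's output: `call_with_retry` and `FlakyModel` are hypothetical names, and the broad `except Exception` is a placeholder for whichever error types your SDK actually raises.

```python
import time
from typing import Callable, List, Optional

ModelFn = Callable[[str], str]


def call_with_retry(call_model: ModelFn, prompt: str, attempts: int = 3,
                    base_delay: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Call the model, retrying on exceptions or empty output with exponential backoff."""
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            text = call_model(prompt)
            if text and text.strip():
                return text
            last_error = ValueError("model returned empty output")
        except Exception as exc:  # placeholder: narrow this to the SDK's error types
            last_error = exc
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    raise RuntimeError(f"model call failed after {attempts} attempts") from last_error


class FlakyModel:
    """Stand-in model that fails twice with a transient error, then succeeds."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, prompt: str) -> str:
        self.calls += 1
        if self.calls < 3:
            raise ConnectionError("transient failure")
        return "ok"


flaky = FlakyModel()
delays: List[float] = []
# Record the requested delays instead of sleeping so the demo runs instantly.
result = call_with_retry(flaky, "hello", sleep=delays.append)
print(result, delays)
```

Wrapping each agent call this way covers the "API calls always succeed" assumption. Checking that outputs are well-formed beyond non-empty (valid JSON, expected fields) would need a per-agent validator on top.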

Verdict

Use Leda if you're prototyping multi-agent workflows on Gemini, need a fast way to explore agent architectures without writing boilerplate, or want to learn multi-agent patterns by examining generated code. It's excellent for proof-of-concepts, hackathons, or early-stage projects where speed to first result matters more than production readiness. The code generation approach genuinely eases the "blank page problem" of agent design.

Skip Leda if you need production-grade systems with error handling and observability, require complex agent topologies beyond sequential chains, want multi-provider flexibility or local model support, or need fine-grained control over orchestration logic.

Treat the generated code as an educational scaffold or MVP foundation, not a deployment-ready solution. For anything beyond linear workflows or outside the Gemini ecosystem, you'll quickly outgrow what Leda provides and need a more robust framework like LangGraph or AutoGen.