Plano: How Envoy Contributors Built an AI Proxy That Routes Agents with a 4B-Parameter Model
Hook
While most AI frameworks force you to write intent classifiers and routing logic, Plano uses a 4B-parameter model to route requests between agents based purely on natural language descriptions you write in YAML.
Context
The gap between agentic demos and production systems is wider than most teams expect. You prototype a travel assistant with weather and flight agents in an afternoon, but then spend weeks building the middleware: routing logic to determine which agent handles each request, guardrails for safety, observability hooks to understand what’s actually happening, and abstraction layers to swap LLM providers without rewriting code. These concerns get scattered across application code and framework abstractions, creating brittle coupling that makes iteration painful.
Plano addresses this by treating agent orchestration as an infrastructure problem rather than an application concern. Built on Envoy by its core contributors, Plano acts as a data plane that sits between client requests and your agents, handling routing, model management, and telemetry through declarative configuration. Your agents become simple HTTP servers exposing OpenAI-compatible endpoints, while Plano manages the complexity of multi-agent coordination, provider abstraction, and distributed tracing.
Technical Insight
The architecture centers on an out-of-process design where Plano runs as a standalone proxy, completely decoupled from your application framework. You define agents in a YAML configuration file with natural language descriptions of what each agent does, and Plano’s routing model (referenced as plano_orchestrator_v1 in configuration) analyzes incoming requests to determine which agent should handle them. Here’s how you configure a multi-agent system:
version: v0.3.0

agents:
  - id: weather_agent
    url: http://localhost:10510
  - id: flight_agent
    url: http://localhost:10520

model_providers:
  - model: openai/gpt-4o
    access_key: $OPENAI_API_KEY
    default: true
  - model: anthropic/claude-3-5-sonnet
    access_key: $ANTHROPIC_API_KEY

listeners:
  - type: agent
    name: travel_assistant
    port: 8001
    router: plano_orchestrator_v1
    agents:
      - id: weather_agent
        description: |
          Gets real-time weather and forecasts for any city worldwide.
          Handles: "What's the weather in Paris?", "Will it rain in Tokyo?"
      - id: flight_agent
        description: |
          Searches flights between airports with live status and schedules.
          Handles: "Flights from NYC to LA", "Show me flights to Seattle"

tracing:
  random_sampling: 100
This declarative approach eliminates the need to write intent classifiers or routing logic yourself. The routing model reads the natural language descriptions and matches user requests to the appropriate agent. Your agent implementations become remarkably simple—just HTTP servers that implement the /v1/chat/completions endpoint:
from fastapi import FastAPI, Request
from openai import AsyncOpenAI

app = FastAPI()
# Point the client at Plano's LLM listener so provider selection
# stays in configuration, not code.
llm = AsyncOpenAI(base_url="http://localhost:12001/v1", api_key="EMPTY")

@app.post("/v1/chat/completions")
async def chat(request: Request):
    body = await request.json()
    # Agent logic here - Plano handles routing, tracing, and provider abstraction
    result = await llm.chat.completions.create(
        model=body.get("model", "openai/gpt-4o"),  # resolved by Plano's provider config
        messages=body["messages"],
    )
    return result.model_dump()
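To make the routing contract concrete, here is a deliberately toy stand-in for the description-matching step. Plano's actual router is a fine-tuned 4B-parameter model reading the YAML descriptions; this keyword-overlap version only illustrates the interface: free-form text in, agent id out.

```python
# Illustrative only: NOT Plano's routing model, just the shape of
# the decision it makes. Descriptions are condensed from the config.
AGENT_DESCRIPTIONS = {
    "weather_agent": "real-time weather and forecasts rain temperature",
    "flight_agent": "flights airports schedules status",
}

def route(user_message: str) -> str:
    """Pick the agent whose description overlaps most with the request."""
    words = set(user_message.lower().split())
    scores = {
        agent: len(words & set(desc.lower().split()))
        for agent, desc in AGENT_DESCRIPTIONS.items()
    }
    return max(scores, key=scores.get)

print(route("Will it rain in Tokyo?"))      # weather_agent
print(route("Show me flights to Seattle"))  # flight_agent
```

The real model generalizes far beyond keyword overlap, which is exactly why the YAML descriptions can stay short and conversational.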
Plano’s smart LLM routing layer provides another key architectural benefit: model aliasing and automatic fallbacks. Requests can be routed by concrete model name or by an alias (a semantic name you define), so swapping the underlying model is a configuration change. That makes it practical to A/B test different models, switch providers to optimize costs, or fail over automatically when a provider experiences downtime, all without touching application code.
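As a rough sketch of what alias resolution with fallback could look like: the provider list mirrors the config above, but the resolution policy and field names here are assumptions for illustration, not Plano's implementation.

```python
# Conceptual sketch of provider resolution with health-based fallback.
# "healthy" flags and the lookup order are assumed, not Plano's logic.
PROVIDERS = [
    {"model": "openai/gpt-4o", "healthy": True, "default": True},
    {"model": "anthropic/claude-3-5-sonnet", "healthy": True, "default": False},
]

def resolve(requested=None):
    """Return the requested model if healthy; otherwise fall back to
    the healthy default, then to any healthy provider."""
    candidates = [p for p in PROVIDERS if p["healthy"]]
    if requested:
        for p in candidates:
            if p["model"] == requested:
                return p["model"]
    for p in candidates:
        if p["default"]:
            return p["model"]
    if candidates:
        return candidates[0]["model"]
    raise RuntimeError("no healthy provider")

print(resolve())  # openai/gpt-4o
```

Because resolution happens in the proxy, an outage at one provider becomes a data-plane failover rather than an application-code change.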
The system leverages Envoy’s Filter Chain architecture for extensibility. You can inject guardrails for jailbreak protection, add moderation policies, or implement memory hooks consistently across all agents by configuring filters in the chain. This is where Plano’s Envoy foundation shines: you get battle-tested proxy infrastructure adapted for AI workloads.
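A minimal sketch of the filter-chain idea, assuming a simple pass-or-reject contract per filter. The filter names and the jailbreak heuristic are illustrative, not Plano's actual guardrails, which are configured declaratively rather than written in Python.

```python
# Toy filter chain in the spirit of Envoy's model: each filter sees
# the request and either passes it along or short-circuits.
from typing import Callable, Optional

Filter = Callable[[dict], Optional[str]]  # returns a rejection reason or None

def jailbreak_guard(request: dict) -> Optional[str]:
    text = request.get("content", "").lower()
    if "ignore previous instructions" in text:
        return "blocked: jailbreak pattern"
    return None

def size_limit(request: dict) -> Optional[str]:
    if len(request.get("content", "")) > 4096:
        return "blocked: request too large"
    return None

def run_chain(filters, request: dict) -> str:
    for f in filters:
        reason = f(request)
        if reason:
            return reason
    return "forwarded to agent"

chain = [jailbreak_guard, size_limit]
print(run_chain(chain, {"content": "What's the weather in Paris?"}))  # forwarded to agent
```

The payoff of doing this in the proxy is uniformity: every agent behind the listener gets the same policy without each one reimplementing it.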
Observability comes through “Agentic Signals”—Plano’s term for the structured data it captures automatically. Beyond standard OpenTelemetry metrics and distributed traces, Plano appears to record additional agentic context without requiring instrumentation code in your agents. The random_sampling: 100 configuration captures complete traces for evaluation and continuous learning, addressing a key gap in multi-agent observability.
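Percentage-based samplers like random_sampling are conventionally implemented as a uniform draw per request; here is a sketch under that assumption (Plano's exact sampling semantics may differ):

```python
import random

def should_sample(rate_percent: float, rng: random.Random) -> bool:
    """Trace a request iff a uniform draw in [0, 100) falls under the rate.
    At 100 every request is traced; at 0 none are."""
    return rng.uniform(0, 100) < rate_percent

rng = random.Random(0)
assert all(should_sample(100, rng) for _ in range(1000))
assert not any(should_sample(0, rng) for _ in range(1000))
```

Setting the rate to 100, as in the config above, trades trace volume for a complete record of every multi-agent interaction, which is what makes the captured data usable for evaluation rather than just debugging.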
Gotcha
The biggest operational consideration is that Plano’s core routing models (including Plano-Orchestrator-4B and Arch-Router models) are hosted externally in US-central by default. While this provides an excellent first-run developer experience—you can spin up the travel agent demo immediately—production deployments will need either API keys for scaled hosted access or self-hosting infrastructure for these routing models. This introduces a dependency on external services or additional operational overhead that teams need to plan for, especially if you have data sovereignty requirements.
The Envoy foundation, while providing robustness, adds complexity. If your team lacks experience with service mesh concepts, there’s a learning curve around understanding Envoy’s configuration model, filter chains, and operational characteristics. Debugging routing issues requires familiarity with Envoy’s admin interface and metrics. For teams building simple single-agent applications or early prototypes, this infrastructure overhead may outweigh the benefits. The project has significant community interest (nearly 6,000 stars) but teams should evaluate maturity for their specific production requirements and test edge cases in complex multi-agent orchestration scenarios.
Verdict
Use Plano if you’re building production multi-agent applications where orchestration complexity, framework independence, and enterprise-grade observability justify the infrastructure investment. It’s ideal when you have multiple specialized agents, need to avoid coupling to specific AI frameworks, or want declarative configuration over writing routing logic. The smart LLM routing is particularly valuable for teams that need provider flexibility or plan to A/B test different models. Skip it if you’re building simple single-agent demos, have deep investment in framework-specific orchestration like LangGraph, or need complete data isolation without any external dependencies for routing models. Also consider alternatives if your team lacks Rust/Envoy expertise and requires extensive customization beyond filter chain capabilities—you’ll be fighting the abstraction rather than benefiting from it.