Nerve: Building AI Agents as Infrastructure Code
Hook
What if you could define, version, and install an AI agent the same way you'd install a Kubernetes chart or Terraform module? Nerve makes agents look less like application code and more like infrastructure manifests.
Context
The current landscape of agent development is dominated by programmatic frameworks that treat agents as runtime objects constructed through Python or JavaScript APIs. LangChain, AutoGen, and CrewAI all require developers to write imperative code to define agent behavior, wire up tools, and orchestrate interactions. This approach works well for complex applications but creates friction for simpler use cases: automation scripts that need LLM reasoning, reproducible benchmarks for model evaluation, or shareable agent templates that non-developers can customize.
Nerve takes a radically different approach inspired by infrastructure-as-code principles. Instead of writing Python classes and callback handlers, you define agents in YAML files that specify their system prompts, available tools, and behavioral parameters. These agent definitions become portable artifacts that can be committed to Git, shared via GitHub repositories, and installed with a single CLI command. The framework's creator, Simone Margaritelli (known for security tools like Bettercap), designed Nerve to make agent development feel more like configuring a service than building an application—a paradigm shift that significantly lowers the barrier to entry while maintaining extensibility for power users.
Technical Insight
At its core, Nerve is a YAML interpreter that bridges declarative agent definitions with LiteLLM's model-agnostic inference layer. An agent definition looks deceptively simple:
name: code-reviewer
system_prompt: |
You are a senior code reviewer focused on security and performance.
Analyze code for common vulnerabilities and optimization opportunities.
model: gpt-4o-mini
temperature: 0.3
tools:
- name: read_file
type: shell
command: cat {filepath}
parameters:
filepath:
type: string
description: Path to the file to review
- name: run_linter
type: python
code: |
import subprocess
def run(filepath):
result = subprocess.run(['pylint', filepath], capture_output=True)
return result.stdout.decode()
parameters:
filepath:
type: string
This YAML compiles into a fully functional agent with type-checked tool invocation. The tools system supports three execution modes: shell commands for quick scripting, embedded Python functions for more complex logic, and remote tools via HTTP endpoints. The parameter schema with type annotations enables the LLM to understand tool contracts without brittle prompt engineering—the framework automatically generates function-calling schemas in the format expected by the underlying model provider.
What sets Nerve apart architecturally is its implementation of the Model Context Protocol (MCP) in both client and server modes. While most frameworks treat MCP as a client-side concern (consuming tools from MCP servers), Nerve allows agents themselves to expose MCP endpoints. This bidirectional capability enables true agent composition:
name: orchestrator
model: claude-3-5-sonnet-20241022
mcp_servers:
- name: researcher
type: stdio
command: nerve
args: ["mcp", "--agent", "./research-agent.yaml"]
- name: writer
type: stdio
command: nerve
args: ["mcp", "--agent", "./writing-agent.yaml"]
workflow:
- agent: researcher
task: "Find recent developments in {topic}"
- agent: writer
task: "Write a summary based on the research findings"
This orchestrator agent treats other Nerve agents as MCP tools, invoking them through the standardized protocol. The workflow directive enables linear agent pipelines where context flows from one agent to the next—the researcher's output becomes available to the writer without manual plumbing.
The evaluation system demonstrates Nerve's "agents as artifacts" philosophy. Test cases are stored as YAML files or Parquet datasets alongside agent definitions:
# tests/code-review-tests.yaml
- input:
filepath: "vulnerable_app.py"
expected_patterns:
- "SQL injection"
- "parameterized queries"
- input:
filepath: "slow_loop.py"
expected_patterns:
- "list comprehension"
- "performance"
Running nerve eval --agent code-reviewer.yaml --tests tests/ produces structured JSON logs that can be versioned and tracked over time, enabling regression detection when you modify prompts or switch models. This makes systematic agent improvement tractable—you can A/B test different system prompts or compare GPT-4 versus Claude outputs using the same reproducible test suite.
The CLI's package management semantics are particularly clever. Running nerve install evilsocket/researcher-agent clones a GitHub repository, validates the agent definition, and makes it available system-wide. Agent definitions can include dependencies on other agents, creating a dependency graph similar to npm or pip packages. This transforms agent sharing from "copy this code snippet" into proper distribution with version pinning and namespace management.
Gotcha
The linear workflow model is Nerve's most significant architectural constraint. While you can chain agents sequentially, there's no native support for conditional branching ("if the research fails, try a different source"), loops ("keep refining until quality threshold met"), or dynamic agent selection based on runtime state. Complex orchestration patterns require dropping down to Python code that wraps the Nerve API, which defeats the declarative purity that makes the framework appealing.
The YAML-first approach also shows strain with sophisticated logic. Embedded Python functions in tool definitions work for small snippets, but larger functions become awkward to maintain in YAML strings without syntax highlighting or linting. There's an implicit complexity ceiling—simple automation and linear pipelines work beautifully, but once you need stateful agents, error recovery strategies, or multi-turn negotiations between agents, you'll find yourself fighting the declarative model. The GPL3 license is another practical consideration: if you're building a commercial SaaS product, the copyleft terms may require open-sourcing your entire application or obtaining a commercial license, which doesn't appear to be offered. For internal tools or open-source projects this is fine, but enterprise adoption may be limited.
Verdict
Use if: You're building automation workflows where LLM reasoning adds value but you don't want to manage a heavy framework—think CI/CD integrations, data processing pipelines, or internal tools that need natural language interfaces. Nerve excels when agent definitions should be versioned artifacts that non-developers can understand and modify, or when you're establishing systematic evaluation practices for prompt engineering and model selection. The MCP server capability makes it ideal for building agent teams where specialized agents compose through standardized protocols. Skip if: Your use case demands complex orchestration with conditional logic, persistent state management, or dynamic workflow graphs—LangGraph or AutoGen provide the necessary primitives. Also skip if you need battle-tested production features like automatic retry policies, distributed tracing, or comprehensive error handling, as Nerve is younger and less proven at scale. Finally, avoid if GPL3 licensing conflicts with your commercial requirements or if you're uncomfortable with a framework that has a single primary maintainer rather than organizational backing.