Leash: Runtime Guardrails for AI Coding Agents Using eBPF and Cedar Policies
Hook
Your AI coding assistant just tried to exfiltrate your entire repository to an unknown domain. Did your security tools catch it? Leash assumes your AI agent is already compromised—and builds guardrails accordingly.
Context
AI coding agents like Claude Code, GitHub Copilot Workspace, and Cursor have crossed the Rubicon from autocomplete helpers to autonomous executors. They don't just suggest code anymore—they run shell commands, modify files across your project, install packages, and make network requests. The problem isn't theoretical: agents hallucinate file paths, misinterpret context, and sometimes generate code that phones home to unexpected domains.
Traditional security models break down here. You can't rely on code review when the agent writes and executes between your coffee refills. Static analysis doesn't help when the threat vector is runtime behavior. Manual oversight defeats the productivity gains that justified adopting these tools in the first place. StrongDM built Leash to solve this trust paradox: how do you gain the velocity benefits of autonomous AI agents without handing them unrestricted access to your codebase, credentials, and network? Their answer combines container isolation, kernel-level monitoring via eBPF, and declarative policy enforcement using Amazon's Cedar language.
Technical Insight
Leash's architecture revolves around separation of concerns between execution and observation. When you launch an AI agent through Leash, it spins up two containers: one for the agent itself (running Claude, Codex, or your tool of choice), and a second container running the Leash monitoring stack. The agent container gets your project directory bind-mounted and inherits API keys from your environment, creating a seemingly normal execution context. The magic happens at the kernel boundary.
The monitoring container uses eBPF (Extended Berkeley Packet Filter) probes and seccomp (secure computing mode) filters to intercept every system call the agent makes. When the AI agent tries to open a file, make a network connection, or spawn a subprocess, Leash captures that syscall before it completes. This interception is non-cooperative—the agent doesn't need special instrumentation or awareness. You get complete observability whether the agent is benign, buggy, or actively malicious.
Here's where Cedar policies provide the control layer. Cedar is Amazon's open-source policy language, designed for fine-grained authorization decisions. In Leash, you write policies that evaluate against real-time telemetry. A basic policy might look like this:
// Entity, action, and attribute names here are illustrative, not Leash's actual schema.

// Block network access to non-approved domains
forbid (
    principal,
    action == Network::Action::"Connect",
    resource
) when {
    !(["api.anthropic.com", "api.openai.com", "github.com"].contains(resource.domain))
};

// Prevent modification of CI/CD configuration
forbid (
    principal,
    action == File::Action::"Write",
    resource
) when {
    resource.path like ".github/workflows/*" ||
    resource.path == ".gitlab-ci.yml"
};

// Allow read-only access to secrets directory
permit (
    principal,
    action == File::Action::"Read",
    resource
) when {
    resource.path like "config/secrets/*"
};

forbid (
    principal,
    action == File::Action::"Write",
    resource
) when {
    resource.path like "config/secrets/*"
};
These policies are evaluated in real time. When the eBPF probe intercepts a connect() syscall to malicious-domain.com, the Leash evaluator checks the Cedar policies, finds that the forbid rule for non-approved domains matches, and blocks the connection before any data leaves the container. In Cedar, a matching forbid always overrides any permit, and a request that no permit covers is denied by default. The agent receives an error as if the network were unreachable, so no special exception handling is required on the agent side.
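The decision logic can be sketched in a few lines of Python. This is a minimal model of Cedar-style evaluation (forbid overrides permit, and anything without a matching permit is denied), not Leash's evaluator. The `Event` and `Rule` types, the action strings, and the rule set are hypothetical names invented for illustration, and `fnmatchcase` only approximates glob-style path matching.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Callable, List


@dataclass(frozen=True)
class Event:
    """One intercepted operation (hypothetical shape)."""
    action: str   # e.g. "connect", "file_read", "file_write"
    target: str   # a domain name or a file path


@dataclass(frozen=True)
class Rule:
    effect: str                          # "permit" or "forbid"
    action: str
    condition: Callable[[Event], bool]


APPROVED = {"api.anthropic.com", "api.openai.com", "github.com"}

# Illustrative rule set, loosely mirroring the policies in this article.
RULES: List[Rule] = [
    Rule("permit", "connect", lambda e: e.target in APPROVED),
    Rule("permit", "file_read", lambda e: fnmatchcase(e.target, "config/secrets/*")),
    Rule("permit", "file_write", lambda e: True),
    Rule("forbid", "file_write", lambda e: fnmatchcase(e.target, "config/secrets/*")),
]


def decide(event: Event, rules: List[Rule] = RULES) -> str:
    """Return "allow" or "deny" for an event."""
    matching = [r for r in rules if r.action == event.action and r.condition(event)]
    if any(r.effect == "forbid" for r in matching):
        return "deny"    # an explicit forbid always wins
    if any(r.effect == "permit" for r in matching):
        return "allow"
    return "deny"        # default deny: no permit matched
```

With these rules, `decide(Event("connect", "malicious-domain.com"))` is denied because no permit covers it, while a write under config/secrets/ is denied even though a broad write permit exists, because the forbid takes precedence.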
The Model Context Protocol (MCP) integration adds a second telemetry layer. MCP is an emerging standard for AI agents to interact with external tools and data sources. Leash monitors MCP tool calls alongside system calls, correlating high-level agent intentions with low-level OS operations. If an agent invokes an MCP "file_search" tool and then attempts unusual filesystem traversal, that pattern mismatch triggers policy evaluation and potentially alerts in the Control UI.
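One way to picture this correlation is a time-window join between tool calls and file accesses. The sketch below is an assumption about how such a check could work, not a description of Leash's implementation. `ToolCall`, `FileAccess`, `unexpected_accesses`, and the two-second window are all hypothetical.

```python
import posixpath
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ToolCall:
    ts: float    # timestamp in seconds
    tool: str    # MCP tool name, e.g. "file_search"
    root: str    # directory the tool was asked to operate on


@dataclass(frozen=True)
class FileAccess:
    ts: float
    path: str


def unexpected_accesses(call: ToolCall, accesses: List[FileAccess],
                        window: float = 2.0) -> List[FileAccess]:
    """File accesses shortly after `call` that fall outside `call.root`."""
    root = posixpath.normpath(call.root)
    flagged = []
    for access in accesses:
        if not (call.ts <= access.ts <= call.ts + window):
            continue  # outside the correlation window
        path = posixpath.normpath(access.path)  # collapses "../" segments
        if path != root and not path.startswith(root + "/"):
            flagged.append(access)
    return flagged
```

A flagged access would then be handed to policy evaluation or surfaced as an alert; the window size trades missed correlations against false positives.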
The Control UI runs at localhost:18080 and surfaces three critical views: real-time telemetry showing every syscall and policy decision, audit logs for compliance reporting, and a configuration interface for managing Cedar policies and project-specific permissions. When an agent attempts a blocked operation, the UI can prompt for manual approval, creating an interactive break-glass mechanism.
Leash's configuration system uses TOML files stored per project. On first run, Leash asks which directories to mount and which environment variables to forward. Those decisions persist:
[project]
path = "/Users/dev/my-app"
[mounts]
config_dirs = [".claude", ".cursor"]
[environment]
forward = ["ANTHROPIC_API_KEY", "OPENAI_API_KEY"]
[policies]
default = "policies/standard.cedar"
overrides = ["policies/project-specific.cedar"]
This approach balances convenience and security. You're not re-authorizing the same safe operations every session, but you're also not granting blanket trust. The policy layer ensures that even remembered configurations undergo runtime enforcement.
Gotcha
The container requirement is non-negotiable and platform-specific. Leash needs Docker, Podman, or OrbStack, and only supports macOS and Linux. Windows users must run WSL2, which adds complexity and potential performance penalties. If your team runs heterogeneous development environments or doesn't allow Docker Desktop for licensing reasons, deployment becomes fragmented.
Performance overhead is the second hard truth. Every syscall interception, Cedar policy evaluation, and telemetry write adds latency. For agents that make thousands of file operations during a refactoring task, the cumulative delay becomes noticeable. Leash's documentation doesn't publish overhead benchmarks; as a rough estimate, eBPF monitoring typically adds 5-15% CPU overhead and microsecond-scale latency per syscall, so treat those figures as a ballpark rather than a measurement of Leash. For latency-sensitive agents or low-powered development machines, this matters.
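Without published benchmarks, the practical option is to measure on your own machine. The sketch below is a crude, hypothetical micro-benchmark: run it once on the host and once inside the monitored container, then compare. `time_file_ops` and `overhead_pct` are names invented here, and the workload (small create, read, delete cycles) is only a stand-in for what an agent actually does.

```python
import os
import tempfile
import time


def time_file_ops(n: int = 2000) -> float:
    """Mean seconds per create-write-read-delete cycle in a temp directory."""
    with tempfile.TemporaryDirectory() as tmp:
        start = time.perf_counter()
        for i in range(n):
            path = os.path.join(tmp, f"f{i}.txt")
            with open(path, "w") as fh:
                fh.write("x")
            with open(path) as fh:
                fh.read()
            os.remove(path)
        elapsed = time.perf_counter() - start
    return elapsed / n


def overhead_pct(baseline: float, monitored: float) -> float:
    """Relative slowdown of the monitored run, as a percentage."""
    return (monitored - baseline) / baseline * 100.0


if __name__ == "__main__":
    print(f"{time_file_ops() * 1e6:.1f} microseconds per cycle")
```

Repeat each run several times and compare medians; a single run on a busy laptop is too noisy to support a conclusion.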
Cedar policy complexity can also become a footgun. The declarative syntax is elegant for simple rules, but modeling nuanced security requirements—like "allow writes to generated code directories but only if the agent previously read the corresponding template file"—requires sophisticated policy design. Overly permissive policies negate Leash's value; overly restrictive policies create friction that developers route around. There's no policy linter or test framework in the current release, so you're validating correctness through trial and error in production.
Finally, Leash can't defend against kernel-level exploits or container escape vulnerabilities. If an agent (or malicious code it generates) finds a privilege escalation bug in the kernel or container runtime, it bypasses eBPF monitoring entirely. Leash provides defense-in-depth, not a silver bullet. You still need OS patching, least-privilege principles, and network segmentation.
Verdict
Use if: You're running AI agents in shared infrastructure, working with regulated codebases (HIPAA, SOC 2, PCI), managing sensitive intellectual property, or need audit trails for compliance. Leash is most valuable when the blast radius of agent mistakes or compromise extends beyond your local machine: CI/CD environments, containerized dev setups, or anywhere trust boundaries matter. It's also valuable for teams establishing governance patterns before AI agents become ubiquitous in their workflow.
Skip if: You're doing casual local development where agent errors can't escape your laptop, your organization forbids container runtimes, or the monitoring overhead (roughly 5-15% by the estimate above, unbenchmarked for Leash itself) breaks your workflow. Also skip if you lack the security engineering capacity to design and maintain Cedar policies, because misconfigured policies create a false sense of security. For solo developers on greenfield projects with no sensitive data, the operational complexity outweighs the risk reduction. The tool shines when consequences matter and you need enforceable guardrails, not just logging.