Leash: eBPF-Powered Sandboxing for AI Coding Agents
Hook
Your AI coding assistant just tried to exfiltrate your AWS credentials to a foreign IP address. Would you know? Most teams running Claude Code or Codex CLI have zero runtime visibility into what their AI agents actually do with filesystem and network access.
Context
AI coding agents have evolved from autocomplete toys to autonomous developers that clone repositories, install dependencies, run tests, and deploy code. Tools like Claude Code, GitHub Copilot CLI, and Cursor give LLMs direct shell access to accomplish complex tasks. The problem? These agents operate with your full user permissions, accessing every file you can read and every network endpoint you can reach.
Traditional security models assume humans make deliberate choices about which files to open and which servers to contact. AI agents make thousands of micro-decisions per minute, often hallucinating file paths or misinterpreting instructions in ways that lead to unintended data access. Application-layer logging helps, but agents can bypass userspace monitoring. Container isolation provides some boundaries, but offers no visibility into what’s happening inside. StrongDM built Leash to solve this gap: real-time syscall monitoring with policy enforcement that works without modifying agent code.
Technical Insight
Leash’s architecture splits responsibilities across two containers connected by eBPF instrumentation. When you run leash -- claude code, it launches your target container with the agent CLI and bind-mounts your working directory. A separate Leash manager container attaches eBPF probes scoped to the target container’s namespaces, intercepting syscalls before they complete. Every open(), connect(), execve(), and sendto() passes through Cedar policy evaluation.
The Cedar integration is where Leash gets interesting. Instead of hardcoded security rules, you write declarative policies that look like this:
// Block any network connections outside approved domains
forbid (
    principal,
    action == Action::"Network::Connect",
    resource
) when {
    !(resource.domain like "*.openai.com") &&
    !(resource.domain like "*.anthropic.com") &&
    !(resource.domain like "*.github.com")
};
// Prevent reading SSH keys unless explicitly permitted
forbid (
    principal,
    action == Action::"File::Read",
    resource
) when {
    resource.path like "*/.ssh/*" &&
    !context.approved_ssh_access
};
These policies execute in the hot path of syscall interception. When an agent tries open("/home/user/.ssh/id_rsa", O_RDONLY), the eBPF probe captures the syscall arguments, constructs a Cedar evaluation context, and either allows it through or returns -EPERM. This happens in microseconds, before the agent reads a single byte.
The eBPF probe attachment uses CO-RE (Compile Once, Run Everywhere) to work across kernel versions without recompilation. Leash hooks into the tracepoint subsystem rather than kprobes, avoiding the stability issues of raw kernel function hooking. Here’s the simplified flow:
// Leash's eBPF program attachment (conceptual)
func attachSyscallMonitor(targetPID int) error {
	spec, err := loadEBPF()
	if err != nil {
		return err
	}
	objs := bpfObjects{}
	if err := spec.LoadAndAssign(&objs, nil); err != nil {
		return err
	}
	defer objs.Close()
	// Attach to the syscall-entry tracepoint for openat
	tp, err := link.Tracepoint("syscalls", "sys_enter_openat", objs.TraceSysEnterOpenat, nil)
	if err != nil {
		return err
	}
	defer tp.Close()
	// Read events from the perf buffer and evaluate policy
	rd, err := perf.NewReader(objs.Events, 4096)
	if err != nil {
		return err
	}
	defer rd.Close()
	for {
		record, err := rd.Read()
		if err != nil {
			return err
		}
		event := parseSyscallEvent(record.RawSample)
		if evaluateCedarPolicy(event) == "DENY" {
			sendBlockSignal(targetPID, event.SyscallID)
		}
	}
}
The Model Context Protocol (MCP) integration adds a second enforcement layer. MCP is the protocol AI agents use to expose “tools” like file reading, shell execution, or API calls. Leash implements an MCP server that wraps the agent’s actual tool calls, correlating high-level intentions with low-level syscalls. When Claude requests read_file("/etc/passwd") via MCP, Leash sees both the tool call and the subsequent openat() syscall, allowing policies that consider semantic context alongside system activity.
The Control UI at localhost:18080 visualizes this telemetry in real time. You see a timeline of tool calls, syscalls, and policy decisions. When an agent attempts a blocked operation, you get the full stack: which prompt triggered the behavior, what MCP tool was invoked, which syscalls were attempted, and which Cedar rule blocked it. This audit trail is critical for tuning policies: you’ll discover agents trying to read ~/.bash_history to understand command patterns, or scanning /proc to detect system capabilities.
Leash handles credential forwarding intelligently. Common environment variables like ANTHROPIC_API_KEY and OPENAI_API_KEY automatically pass through to the target container. For others, Leash prompts you: “Agent requires GCP credentials. Mount /home/user/.config/gcloud? [y/N]”. This prevents accidental credential exposure while keeping friction low for legitimate use cases.
The experimental macOS native mode skips containers entirely, using Endpoint Security Framework for syscall interception. This reduces overhead but requires granting Leash full disk access and system extension permissions—a trade-off between performance and macOS security model compatibility.
Gotcha
The eBPF interception adds latency to every monitored syscall. In benchmarks, file-heavy operations like npm install run 15-20% slower under Leash compared to bare execution. Network calls see 5-10% overhead. For agents that perform thousands of filesystem operations per task, this compounds. The container architecture also means your agent runs in a separate network namespace—services on localhost inside the container won’t reach your host’s database or dev server without explicit port mapping.
Cedar policy authoring has a steeper learning curve than expected. The documentation covers syntax well, but understanding what syscall-level events look like requires systems programming knowledge. A policy that seems correct—like blocking writes outside the project directory—might break when the agent needs to update its cache in /tmp or write logs to /var. You’ll spend time in the Control UI watching denied operations and adjusting policies. There’s no policy validation that tells you “this rule will break npm”; you discover it at runtime. The MCP correlation helps, but bridging the gap between “agent wants to install dependencies” and “syscalls required for npm install” demands security engineering expertise, not just Cedar syntax knowledge.
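One way to express that kind of carve-out, sketched in standard Cedar syntax. The /workspace project path and the Action::"File::Write" naming are assumptions about the policy schema, and because Cedar forbids always override permits, the exceptions must live inside the forbid rule itself rather than in a separate permit:

```
// Hypothetical refinement: deny writes outside the project directory,
// but carve out the scratch and log paths the tooling needs
forbid (
    principal,
    action == Action::"File::Write",
    resource
) when {
    !(resource.path like "/workspace/*") &&
    !(resource.path like "/tmp/*") &&
    !(resource.path like "/var/log/*")
};
```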
Verdict
Use if: You’re running AI agents in production environments, on shared development machines, or against codebases containing secrets. The audit trail alone justifies the overhead when compliance or security incidents require proving what an agent accessed. Teams building agent frameworks or offering AI coding tools as a service need Leash’s sandboxing—your users won’t trust agents without demonstrable boundaries. It’s also valuable for researching agent behavior and attack patterns, since the telemetry captures exactly how agents explore filesystems and networks.
Skip if: You’re doing solo hobby projects where the worst-case scenario is an agent deleting local files you have backed up. The container overhead and Cedar policy maintenance aren’t worth it for low-stakes experimentation. Also skip if you need Windows-native support or can’t tolerate the 15-20% performance hit on I/O-heavy workloads. For those cases, rely on manual code review of agent outputs before execution, or use simpler chroot/firejail sandboxing that trades visibility for lower overhead.