Cagent: A Pragmatic Sandbox for Keeping AI Agents on a Leash
Hook
Your AI coding agent just tried to exfiltrate your AWS credentials. Again. While the industry debates theoretical alignment problems, developers shipping agent-based tools face a more immediate crisis: how do you let an LLM execute arbitrary commands without watching it fork-bomb your CI pipeline or POST your .env file to a random API?
Context
The explosion of autonomous AI agents has created an awkward security problem that traditional tooling wasn’t built to solve. These agents need real filesystem access to be useful—they refactor code, run tests, analyze repositories—but they’re fundamentally unpredictable. Unlike human developers who understand organizational boundaries and have reputational skin in the game, agents follow token probabilities that occasionally lead to spectacularly bad decisions. You can’t just run them in your production environment, but heavyweight security solutions designed for multi-tenant clouds feel like overkill for what’s essentially ‘I want Claude to refactor my Python project without accidentally committing secrets or spinning up cryptocurrency miners.’
Cagent emerged from this pragmatic middle ground: a shell-script-orchestrated Docker sandbox that implements just enough isolation to sleep at night while keeping complexity low enough that you can audit the entire security boundary in an afternoon. It’s not trying to be a general-purpose container orchestration platform or a zero-trust architecture. Instead, it solves the specific problem of ‘I have an AI agent that needs to explore a codebase, and I need filesystem + network controls that work out of the box.’ Think of it as the security equivalent of a dog run: a defined perimeter where your agent can move freely without wandering into traffic.
Technical Insight
Cagent’s architecture is deceptively simple, which is precisely its strength. At its core, the run.sh script orchestrates three security layers: Docker containerization for process isolation, selective filesystem mounting with read-only enforcement, and iptables-based network filtering. The brilliance is in how these primitives compose without requiring complex configuration management or learning a new DSL.
The filesystem strategy uses Docker volume mounts with a twist borrowed from version control systems. Instead of mounting your entire workspace as read-write (dangerous) or fully read-only (useless), Cagent implements an .agentignore pattern matching system. Your sensitive files stay protected while the agent gets a writable scratch space. Here’s how you’d configure it:
# .agentignore - uses gitignore syntax
.env*
*.key
*.pem
secrets/
# Large dirs you don't want the agent thrashing
node_modules/
# The mount logic in run.sh does something like:
docker run \
-v "$WORKSPACE:/workspace:ro" \
-v "$WORKSPACE/.agent-scratch:/workspace/.agent-scratch:rw" \
--mount type=tmpfs,destination=/tmp \
cagent:latest
This means your agent can read most of your codebase and write to a designated scratch area, but it can't modify protected files or directories. The read-only mount serves as a forcing function: if the agent needs to modify code, it must explicitly write to the scratch space, making potentially destructive operations visible.
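The repository doesn't spell out exactly how run.sh turns .agentignore patterns into mounts, but one plausible sketch is a loop that shadows each matched path: empty tmpfs over directories, /dev/null over files. The function name, the masking approach, and the flag format here are all illustrative assumptions, not Cagent's actual implementation:

```shell
# Hypothetical sketch of .agentignore -> Docker mount-flag translation.
# Matched paths get shadowed, so the agent sees empty slots instead of
# the real contents.
build_mask_mounts() {
  local workspace=$1
  local ignore_file="$workspace/.agentignore"
  [ -f "$ignore_file" ] || return 0
  while IFS= read -r pattern; do
    # Skip blank lines and comments, as gitignore syntax does
    case $pattern in ''|'#'*) continue ;; esac
    pattern=${pattern%/}                     # normalize trailing slash on dirs
    for path in "$workspace"/$pattern; do    # glob expansion on purpose
      [ -e "$path" ] || continue
      local rel=${path#"$workspace"/}
      if [ -d "$path" ]; then
        # Directories get masked by an empty tmpfs
        printf -- '--mount type=tmpfs,destination=/workspace/%s\n' "$rel"
      else
        # tmpfs can only mount on directories, so files get /dev/null
        # bind-mounted over them instead
        printf -- '-v /dev/null:/workspace/%s:ro\n' "$rel"
      fi
    done
  done < "$ignore_file"
}
```

The output would then be spliced into the docker invocation, e.g. `docker run $(build_mask_mounts "$WORKSPACE") ...`, relying on word splitting to separate the flags.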
The network isolation layer is where things get interesting. Rather than a blanket network lockdown (which breaks package installers, API calls to LLM providers, and other legitimate use cases), Cagent uses iptables to implement allowlist-based egress filtering:
# Simplified version of the iptables rules
iptables -A OUTPUT -o lo -j ACCEPT                 # Allow localhost
iptables -A OUTPUT -p udp --dport 53 -j ACCEPT     # DNS, or nothing resolves
iptables -A OUTPUT -d api.anthropic.com -j ACCEPT  # Hostname resolved at insert time
iptables -A OUTPUT -d registry.npmjs.org -j ACCEPT
iptables -A OUTPUT -d pypi.org -j ACCEPT
iptables -P OUTPUT DROP                            # Default deny everything else, set last
This is crude but effective: the agent can talk to its LLM backend and common package repositories, but can’t exfiltrate data to arbitrary endpoints. The limitation is clear—you need to know your allowlist domains upfront—but for most development workflows, this is a reasonable tradeoff. An agent that needs to call your internal APIs would require adding those domains, which is actually a feature: it forces you to consciously expand the trust boundary.
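One detail worth internalizing: iptables resolves a hostname in `-d` once, when the rule is inserted, not per packet. A sketch that makes this explicit resolves each allowlisted domain to its current addresses up front (the `ALLOWED_DOMAINS` variable is illustrative, not Cagent's actual configuration):

```shell
# Illustrative: pin each allowlisted domain to its current A records.
# If a CDN rotates IPs after setup, these rules silently go stale.
ALLOWED_DOMAINS="api.anthropic.com registry.npmjs.org pypi.org"
for domain in $ALLOWED_DOMAINS; do
  # getent ahostsv4 prints one resolved IPv4 address per line (field 1)
  for ip in $(getent ahostsv4 "$domain" | awk '{print $1}' | sort -u); do
    iptables -A OUTPUT -d "$ip" -j ACCEPT
  done
done
```

This is the same behavior the hostname form gives you, just visible: the gap between "domain allowlist" and "snapshot of IPs at container start" is where the brittleness discussed below comes from.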
The Docker layer provides the foundational isolation: separate PID namespace (the agent can’t see or kill host processes), separate network namespace (the iptables rules can’t affect your host), and resource limits via cgroups. Cagent doesn’t reinvent these wheels; it just wires them together with sensible defaults:
# NET_ADMIN is added back solely so the entrypoint can install its
# own iptables rules; every other capability is dropped
docker run \
  --cpus="2.0" \
  --memory="4g" \
  --pids-limit=100 \
  --security-opt=no-new-privileges \
  --cap-drop=ALL \
  --cap-add=NET_ADMIN \
  --read-only \
  cagent:latest
The --read-only flag makes the container filesystem immutable except for explicitly mounted volumes, while capability dropping removes dangerous privileges. The agent gets NET_ADMIN briefly during setup to configure its own iptables rules, then that capability could theoretically be dropped post-setup with a more sophisticated entrypoint script.
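Such an entrypoint might look like the following sketch. The script paths and agent binary name are assumptions, and `capsh` (from the libcap tools) would need to be present in the image:

```shell
#!/usr/bin/env bash
# Hypothetical entrypoint: use NET_ADMIN while we still have it, then
# shrink the capability bounding set before the agent ever runs.
set -euo pipefail

# Install the egress allowlist rules (needs CAP_NET_ADMIN)
/usr/local/bin/setup-firewall.sh

# Re-exec with NET_ADMIN removed from the bounding set, so even a
# compromised agent process cannot rewrite the firewall afterwards.
exec capsh --drop=cap_net_admin -- -c 'exec /usr/local/bin/agent'
```

The design point is the one-way door: capabilities removed from the bounding set can't be reacquired by any child process, so the firewall setup window closes permanently once the agent starts.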
What makes this architecture audit-friendly is its lack of magic. There’s no custom kernel module, no proprietary runtime, no complex policy language. It’s Docker commands and shell scripts that a competent sysadmin can review. The security boundary is exactly as strong as Docker’s isolation plus your iptables rules—no better, no worse. For many use cases, that’s perfectly adequate.
Gotcha
The biggest gotcha is that Cagent is explicitly experimental, and the repository acknowledges significant hardening gaps. Docker container escapes are a real threat class—CVEs like the runc vulnerability (CVE-2019-5736) showed that even mature container runtimes can have privilege escalation bugs. If your threat model includes ‘the agent is actively trying to escape and compromise the host system,’ Docker alone isn’t sufficient. You’d need something like gVisor or Kata Containers that use virtualization-based isolation, though that comes with significant performance overhead.
The network filtering is also brittle in practice. Domain allowlists break when services move infrastructure, use dynamic CDNs, or require third-party integrations you didn’t anticipate. An agent trying to install a Python package might fail because PyPI uses a different S3 bucket this week. And if you’re too permissive with the allowlist (‘just allow all of AWS’), you’ve effectively neutered the protection. The current implementation also lacks egress logging, so you won’t know if the agent is attempting blocked connections unless you instrument iptables yourself. For any serious deployment, you’d want to fork Cagent and add connection attempt logging to at least see what the agent is trying to access. Finally, the .agentignore pattern matching happens at mount time, not runtime, so there’s no protection against the agent reading a sensitive file that wasn’t matched by your patterns—the old ‘forgot to ignore .env.production’ problem that has bitten everyone at least once.
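If you do fork it, the logging itself is only a few lines of iptables: route anything that missed the allowlist through a rate-limited LOG chain before dropping it (the chain name and log prefix here are illustrative):

```shell
# Sketch: make blocked egress attempts visible instead of silent.
iptables -N LOGDROP
# Rate-limit so a retry loop in the agent can't flood the kernel log
iptables -A LOGDROP -m limit --limit 10/min \
  -j LOG --log-prefix "cagent-blocked: " --log-level 4
iptables -A LOGDROP -j DROP
# Anything not matched by an earlier ACCEPT rule lands here
iptables -A OUTPUT -j LOGDROP
```

Blocked attempts then show up in the kernel log (e.g. `dmesg | grep cagent-blocked`), which turns "the agent silently failed" into "the agent tried to reach an endpoint you never allowlisted."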
Verdict
Use if: You’re prototyping AI agent workflows on local machines or in CI environments where the worst-case scenario is ‘contained nuisance’ rather than ‘production data breach.’ Cagent shines when you need something working today, understand its limitations, and can treat it as a starting template rather than a finished product. It’s particularly valuable for small teams building agent tools who need practical sandboxing without dedicating sprints to security infrastructure.

Skip if: You’re deploying agents in production environments with access to customer data, need audit trails and compliance documentation, or have threat models that include motivated attackers. Also skip if you need Windows support (this is Linux-specific Docker orchestration) or require sub-second agent startup times (the container bootstrap overhead is noticeable). For those scenarios, invest in proper solutions like gVisor-backed runtimes or commercial agent platforms with security SLAs.

Cagent is a great 80/20 solution—it gives you most of the safety with minimal complexity, but that last 20% matters immensely in high-stakes environments.