IronCurtain: Compiling English Security Policies Into Deterministic Agent Guardrails
Hook
What if instead of hoping your AI agent won’t rm -rf /, you wrote security policies in plain English that get compiled into deterministic enforcement rules—no LLM in the runtime loop?
Context
AI agents are graduating from chat interfaces to autonomous executors with real system access. GitHub Copilot Workspace pushes commits. Devin writes production code. Claude Code runs terminal commands. The paradigm shift is exhilarating and terrifying in equal measure.
The security model breaks down immediately. Traditional sandboxing operates at syscall boundaries—you can block unlink() but can’t distinguish between deleting a test fixture and wiping production configs. The agent sees “file_delete” as a semantic operation; the OS sees raw syscalls. This impedance mismatch means you’re either running agents with ambient authority (hope and pray) or crippling them with coarse-grained restrictions that break legitimate workflows. IronCurtain attacks this problem by implementing semantic interposition: intercepting high-level tool calls where context still exists, then enforcing policies derived from natural language constitutions that humans can actually reason about.
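The contrast is easiest to see in code. Below is a minimal sketch of semantic interposition, with illustrative names rather than IronCurtain's actual API: a syscall filter would only ever see `unlink()`, while a tool-call interceptor can branch on the path the agent actually asked about.

```typescript
// Illustrative sketch, not IronCurtain's real types. A syscall filter
// sees only "unlink"; a semantic interceptor sees the full tool call.
type ToolCall = { tool: string; args: Record<string, string> };

function interceptDelete(call: ToolCall): "allow" | "escalate" | "deny" {
  if (call.tool !== "file_delete") return "allow";
  const path = call.args.path ?? "";
  // Context a syscall-level sandbox cannot recover:
  if (path.startsWith("/workspace/test/fixtures/")) return "allow";
  if (path.startsWith("/workspace/")) return "escalate"; // human review
  return "deny"; // anything outside the workspace
}
```

The same `unlink()` syscall resolves to three different decisions depending on semantic context that exists only at the tool-call layer.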
Technical Insight
IronCurtain’s architecture hinges on two controversial decisions: treating the LLM as fundamentally untrustworthy, and using another LLM to compile security policies. The first means all enforcement happens at architectural boundaries—V8 isolates for code execution, TLS-MITM proxies for Docker containers, MCP interception for tool invocations. The second leverages LLMs where they excel (understanding human intent) while keeping them out of the hot path.
The constitution-to-policy pipeline is the intellectual centerpiece. You write security rules in plain English—“Allow git operations only on repositories under /workspace” or “Block API calls to external services except for approved domains.” The system uses an LLM to generate test scenarios covering edge cases, then compiles your constitution into deterministic policies that must pass all tests. Here’s what a simple constitution looks like:
const constitution = `
# File System Policy
- Allow read operations anywhere
- Allow write operations only under /workspace
- Require human approval for any delete operations
- Block all access to /etc, /sys, /proc
# Network Policy
- Allow HTTPS requests to api.github.com
- Block all other outbound connections
- Require approval for requests with API keys in headers
`;
const policy = await compileConstitution(constitution, {
  validateScenarios: true,
  strictMode: true
});
At runtime, every tool invocation flows through MCP (Model Context Protocol) servers. When the agent calls git_clone or file_write, IronCurtain intercepts at the MCP layer—after the agent has expressed intent semantically but before execution. The policy engine evaluates against compiled rules without touching an LLM. An approval decision takes microseconds; an escalation to a human happens over a separate PTY channel.
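To make the "no LLM in the hot path" claim concrete, here is a sketch of what a compiled policy and its evaluator could look like. The rule shapes are assumptions for illustration, not IronCurtain's real output format; the point is that evaluation is a pure first-match lookup over data.

```typescript
// Assumed shape of compiled rules (illustrative, not the real format).
type Decision = "allow" | "deny" | "escalate";
type Rule = { tool: RegExp; pathPrefix?: string; decision: Decision };

const compiledRules: Rule[] = [
  { tool: /^file_(read|stat)$/, decision: "allow" },
  { tool: /^file_write$/, pathPrefix: "/workspace", decision: "allow" },
  { tool: /^file_delete$/, decision: "escalate" },
];

// Pure function: no LLM call, no I/O — microseconds per decision.
function evaluate(tool: string, path: string): Decision {
  for (const rule of compiledRules) {
    if (!rule.tool.test(tool)) continue;
    if (rule.pathPrefix && !path.startsWith(rule.pathPrefix)) continue;
    return rule.decision;
  }
  return "deny"; // default-deny for anything the constitution omits
}
```

Default-deny is the natural fallback here: a tool call the constitution never anticipated should fail closed rather than open.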
The Docker Agent Mode solves a thornier problem: how do you secure agents you don’t control? Claude Code expects to run in its native environment with full terminal access. IronCurtain wraps it in a container, routes all API calls through a TLS-MITM proxy, and presents a PTY interface that feels completely native to the agent while enforcing policies transparently:
await ironCurtain.runDockerAgent({
  image: 'claude-code:latest',
  constitution: policyRules,
  ptyMode: true,
  escalationListener: 'unix:///tmp/escalations.sock',
  networkPolicy: {
    allowedDomains: ['api.anthropic.com'],
    interceptAll: true
  }
});
The escalation listener deserves special attention. When the policy engine can’t make a deterministic decision, it doesn’t block or log—it sends a structured request to a separate process where a human reviews the context and approves/denies with rationale. That rationale feeds back into constitution refinement. It’s a closed-loop system: policies get smarter as humans correct edge cases.
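A rough sketch of that round-trip, with field names that are assumptions rather than IronCurtain's wire format: the human's rationale is preserved precisely so it can be mined for the next constitution revision.

```typescript
// Assumed message shapes for the escalation channel (illustrative).
interface EscalationRequest {
  id: string;
  tool: string;
  args: Record<string, unknown>;
  matchedRule: string | null; // null = no deterministic rule applied
}

interface EscalationResponse {
  id: string;
  verdict: "approve" | "deny";
  rationale: string; // becomes a candidate constitution amendment
}

// Turn reviewed escalations into draft constitution lines for recompile.
function draftAmendments(log: EscalationResponse[]): string[] {
  return log
    .filter((r) => r.rationale.trim().length > 0)
    .map((r) => `- ${r.verdict === "approve" ? "Allow" : "Block"}: ${r.rationale}`);
}
```

Feeding `draftAmendments` output back through the compiler is what closes the loop the paragraph above describes.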
Code Mode takes a different approach for agents that generate TypeScript. Instead of containers, it uses V8 isolates—lightweight execution contexts with zero system access. The agent’s generated code runs in an isolate where all I/O happens through MCP calls:
const isolate = await v8.Isolate.createSnapshot({
  prelude: mcpClientShim,
  maxMemory: 128 * 1024 * 1024
});

const result = await isolate.run(agentGeneratedCode, {
  timeout: 30000,
  inspector: debugMode
});
The MCP client shim provides APIs like fs.readFile and git.commit, but they’re RPC stubs. Every call crosses the isolate boundary, hits the policy engine, and either executes in the parent process (approved), escalates to a human, or rejects immediately. The agent can’t distinguish this from native APIs—it’s architectural security through capability-based design.
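A minimal sketch of what such a shim could look like, where `rpc` stands in for the isolate-boundary transport (an assumption for illustration, not the real shim): the agent sees ordinary-looking APIs, but every call is marshalled out of the isolate.

```typescript
// `rpc` is a stand-in for the isolate-boundary transport (assumed).
type Rpc = (method: string, params: unknown) => Promise<unknown>;

function makeShim(rpc: Rpc) {
  return {
    fs: {
      // Looks native to the agent, but every call crosses the isolate
      // boundary and hits the policy engine before any I/O happens.
      readFile: (path: string) => rpc("fs.read_file", { path }),
      writeFile: (path: string, data: string) =>
        rpc("fs.write_file", { path, data }),
    },
    git: {
      commit: (message: string) => rpc("git.commit", { message }),
    },
  };
}
```

Because the isolate has no other I/O, the shim's RPC surface is the complete capability set: anything not exposed here simply does not exist for the agent.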
What makes this semantic interposition powerful is context preservation. When an agent calls file_write('/workspace/config.yml', newContent), the policy engine sees the full path, the content, and can reason about intent. A syscall-level sandbox sees open(), write(), close()—bytes without meaning. IronCurtain policies can express rules like “allow YAML writes if they don’t change the credentials key” because they operate where semantics still exist.
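A content-aware rule like that could be sketched as follows. To keep the example dependency-free it parses only flat `key: value` lines; a real policy engine would use a proper YAML parser, and the function name is mine, not IronCurtain's.

```typescript
// Simplified flat-YAML parse for illustration (top-level keys only).
function parseFlatYaml(text: string): Map<string, string> {
  const out = new Map<string, string>();
  for (const line of text.split("\n")) {
    const m = /^(\w+):\s*(.*)$/.exec(line);
    if (m) out.set(m[1], m[2]);
  }
  return out;
}

// "Allow YAML writes if they don't change the credentials key."
function allowYamlWrite(oldText: string, newText: string): boolean {
  const before = parseFlatYaml(oldText).get("credentials");
  const after = parseFlatYaml(newText).get("credentials");
  return before === after; // any change to credentials is disallowed
}
```

No syscall-level mechanism can express this rule, because by the time bytes reach `write()` the distinction between the `port` key and the `credentials` key is gone.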
Gotcha
The research prototype warning isn’t pro forma—this is legitimately unstable software. APIs change between commits, and the Node.js version constraint (22-25 only) stems from isolated-vm’s V8 compatibility requirements. You will hit dependency conflicts in real projects.
More fundamentally, security rests on two trust assumptions that may not hold. First, MCP server implementations must be correct. If your filesystem MCP server has a path traversal bug, IronCurtain’s policies are worthless—garbage in, garbage out. Second, the constitution compilation pipeline must produce faithful policies. An LLM misinterpreting “allow git operations in /workspace” as “allow all git operations” creates a silent bypass. The test scenario validation helps, but it’s not formal verification. You’re replacing “trust the agent LLM” with “trust the policy compilation LLM and trust your MCP servers”—better, but not bulletproof. Performance overhead is real too: V8 isolate context switches and TLS-MITM proxying add latency that compounds with chatty agents.
Verdict
Use IronCurtain if you’re building autonomous agent workflows where giving the agent real capabilities (filesystem writes, git commits, API calls) is non-negotiable, but ambient authority makes you nauseated. It’s ideal for research teams exploring safe agent architectures, or for developer tooling where humans are in the loop and can handle escalations. The constitution-to-policy abstraction is genuinely novel—if you can articulate security rules in English, you can enforce them deterministically. Skip it if you need production stability today, can’t stomach the Node.js version lock-in, or if your security model requires formal verification rather than LLM-derived policies. Also skip if your agents need bare-metal performance or you’re allergic to running separate escalation listeners. This is a telescope pointed at the future of agent security, not a shipping product.