IronCurtain: Treating AI Agents Like Untrusted Binaries
Hook
Most AI agent frameworks ask the model nicely to follow rules. IronCurtain assumes your agent is hostile malware and enforces policy at the kernel boundary—except the 'syscalls' are git commits and API requests.
Context
The rise of autonomous AI agents—systems that can read files, execute code, commit to repositories, and call APIs without human intervention—has created a glaring security gap. Traditional sandboxing operates at the syscall level (think seccomp, Docker capabilities), which works for untrusted binaries but becomes absurdly restrictive for agents that need legitimate filesystem access. You can't just block all file writes when the agent's job is to refactor your codebase.
Meanwhile, existing agent frameworks like LangChain or OpenAI Assistants rely on prompt-based guardrails or manual approval flows. The former is trivially bypassed through prompt injection. The latter doesn't scale; nobody wants to click 'approve' fifty times while an agent debugs a test suite. IronCurtain emerged from this tension: how do you let an agent perform real operations while enforcing, outside the model's control, that it won't exfiltrate your AWS keys or rm -rf your home directory? The answer is semantic interposition: enforcing policy on high-level operations ("commit to git," "read secrets") rather than low-level syscalls ("open a file descriptor").
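To make the distinction concrete, here is a minimal sketch of what each layer can see. The types, the trace, and the secret-path pattern are illustrative assumptions, not IronCurtain's data model. The point is that git stores committed content as hash-named object files under `.git/objects`, so a syscall filter watching a commit sees paths that say nothing about which source files changed, while a semantic operation carries that information directly.

```typescript
// Illustrative types only; not IronCurtain's actual data model.
type SyscallEvent = { syscall: "openat" | "write" | "rename"; path: string };

type SemanticOp =
  | { kind: "git_commit"; branch: string; files: string[] }
  | { kind: "read_file"; path: string };

// A syscall-level rule only sees paths. Every commit writes hash-named objects
// under .git/objects whatever files it touches, so this rule necessarily
// allows all commits (or, flipped, denies all of them).
function syscallRule(ev: SyscallEvent): "allow" | "deny" {
  return ev.path.startsWith(".git/objects/") ? "allow" : "deny";
}

// A semantic rule can ask the question we actually care about.
function semanticRule(op: SemanticOp): "allow" | "deny" {
  if (op.kind === "git_commit") {
    // Only commits that touch nothing outside docs/ are allowed.
    return op.files.every((f) => f.startsWith("docs/")) ? "allow" : "deny";
  }
  // Hypothetical secret locations, for illustration only.
  return /(^|\/)\.(aws|ssh)\//.test(op.path) ? "deny" : "allow";
}

// Hypothetical trace of a single `git commit`, as a syscall filter would see it.
const commitTrace: SyscallEvent[] = [
  { syscall: "openat", path: ".git/objects/3f/9a1c07e2" },
  { syscall: "write", path: ".git/objects/3f/9a1c07e2" },
];
```

The syscall rule passes the same trace whether the commit touched `docs/guide.md` or `src/auth.ts`; only the semantic rule can tell them apart.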
Technical Insight
IronCurtain's architecture treats the AI agent as fundamentally untrusted, with security enforced through three layers: execution isolation, protocol interposition, and policy compilation. Unlike frameworks that trust the model to follow instructions, IronCurtain assumes adversarial behavior and enforces boundaries mechanically.
For its builtin agent mode, IronCurtain runs LLM-generated TypeScript inside a V8 isolate, the same engine-level sandbox Cloudflare Workers uses to keep untrusted tenants apart. The agent code has zero access to Node.js APIs, the filesystem, or network sockets. Every operation must go through Model Context Protocol (MCP) tool calls. Here's what agent code looks like:
// This runs in a V8 isolate with no host access
const result = await mcpClient.callTool({
  server: "filesystem",
  tool: "read_file",
  arguments: { path: "./config.json" }
});
// Trying to bypass via Node.js fails: Node.js globals don't exist here
// require('fs').readFileSync(...) -> ReferenceError
// process.env.API_KEY -> ReferenceError
The genius is forcing everything through MCP, which wasn't designed as a security protocol but happens to create natural interposition points. Every tool call (reading files, running git commands, sending messages) is routed through IronCurtain's policy engine before it reaches the actual MCP server implementation.
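One way to picture the interposition point is a single gated `callTool` function that the sandbox receives in place of a real MCP client. This is a minimal sketch under assumptions: `ToolCall` mirrors the `server`/`tool`/`arguments` shape from the agent snippet above, and `makeGatedCallTool`, `Policy`, `Upstream`, and `Approver` are hypothetical names, not IronCurtain's API.

```typescript
// Mirrors the server/tool/arguments shape used in the agent snippet.
interface ToolCall {
  server: string;
  tool: string;
  arguments: Record<string, unknown>;
}

type Decision =
  | { action: "allow" }
  | { action: "deny"; reason: string }
  | { action: "escalate"; reason: string };

type Policy = (call: ToolCall) => Decision;
type Upstream = (call: ToolCall) => Promise<unknown>;
type Approver = (call: ToolCall, reason: string) => Promise<boolean>;

// Hypothetical: build the only tool-calling function the sandbox is given.
// The sandbox never receives `upstream` itself, so every call hits the policy first.
function makeGatedCallTool(policy: Policy, upstream: Upstream, approve: Approver) {
  return async (call: ToolCall): Promise<unknown> => {
    const decision = policy(call);
    if (decision.action === "deny") {
      throw new Error(`Denied ${call.server}/${call.tool}: ${decision.reason}`);
    }
    if (decision.action === "escalate") {
      // Escalations block until a human answers; a "no" rejects the call.
      const approved = await approve(call, decision.reason);
      if (!approved) throw new Error(`Rejected by reviewer: ${call.server}/${call.tool}`);
    }
    return upstream(call);
  };
}
```

The design choice that matters is capability-style: because the sandbox only ever holds the gated function, there is no ungated path to the real server that someone could forget to cover.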
The policy engine itself uses LLMs to compile plain-English "constitutions" into deterministic rules. You write policies like "Allow git commits to the docs/ directory but require approval for main branch merges," and IronCurtain's compilation pipeline turns this into executable TypeScript plus auto-generated test cases, which it runs to check that the compiled policy matches your intent before deploying it:
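// Example compiled policy output
type PolicyDecision = {
  action: 'allow' | 'deny' | 'escalate';
  reason?: string;
};

export function evaluateGitCommit(params: {
  branch: string;
  path: string;
  message: string;
}): PolicyDecision {
  if (params.path.startsWith('docs/')) {
    return { action: 'allow', reason: 'Documentation changes permitted' };
  }
  if (params.branch === 'main') {
    return { action: 'escalate', reason: 'Main branch requires review' };
  }
  // Fail closed: anything the constitution doesn't explicitly permit is denied
  return { action: 'deny', reason: 'Not permitted by constitution' };
}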
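The validation step can be pictured as running generated cases against the compiled function and refusing to deploy on any mismatch. A minimal sketch follows; `validatePolicy`, the test-case shape, and the expected decisions (including reading the constitution as "a src/ commit on a feature branch should be denied") are illustrative assumptions, not IronCurtain's generated output. The policy under test is a hypothetical compiled version with a default-allow fallthrough.

```typescript
type PolicyDecision = { action: "allow" | "deny" | "escalate"; reason?: string };
type CommitParams = { branch: string; path: string; message: string };

// Hypothetical shape of one generated test case.
interface PolicyTestCase {
  description: string;
  input: CommitParams;
  expected: PolicyDecision["action"];
}

// Returns one message per failing case; an empty array means every case passed.
function validatePolicy(
  policy: (p: CommitParams) => PolicyDecision,
  cases: PolicyTestCase[],
): string[] {
  const failures: string[] = [];
  for (const c of cases) {
    const actual = policy(c.input).action;
    if (actual !== c.expected) {
      failures.push(`${c.description}: expected ${c.expected}, got ${actual}`);
    }
  }
  return failures;
}

// Hypothetical compiled policy with a default-allow fallthrough.
function fallthroughPolicy(p: CommitParams): PolicyDecision {
  if (p.path.startsWith("docs/")) return { action: "allow" };
  if (p.branch === "main") return { action: "escalate" };
  return { action: "allow" };
}

// Cases a generator might derive from the constitution (expected values are assumptions).
const generatedCases: PolicyTestCase[] = [
  { description: "docs commit on feature branch", input: { branch: "feat", path: "docs/a.md", message: "" }, expected: "allow" },
  { description: "src commit on main", input: { branch: "main", path: "src/x.ts", message: "" }, expected: "escalate" },
  { description: "src commit on feature branch", input: { branch: "feat", path: "src/x.ts", message: "" }, expected: "deny" },
];

// Deployment gate: a non-empty result means the policy must not ship.
const failures = validatePolicy(fallthroughPolicy, generatedCases);
```

This only catches bugs the generated cases happen to cover. If the same LLM misreads the constitution when writing both the policy and its tests, the two will agree and the check will pass.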
For external agents like Claude Code or Goose that can't run in isolates, IronCurtain switches to Docker mode. It launches the agent in a network-isolated container, then runs a TLS man-in-the-middle proxy that intercepts all HTTPS traffic. When the agent tries to call the Anthropic or OpenAI API, the proxy swaps placeholder API keys for real ones and enforces host allowlists—the agent literally cannot make network requests to unauthorized domains. Simultaneously, MCP tool calls get intercepted through a Unix socket mount, routing them through the same policy engine as builtin agents.
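The proxy's per-request decision can be sketched as a pure function over an already-decrypted request. The placeholder value, the per-host key map, and `rewriteRequest` are assumptions for illustration; the TLS side (issuing certificates from the proxy's CA and re-encrypting upstream) is omitted.

```typescript
// Hypothetical placeholder the container is configured with instead of a real key.
const PLACEHOLDER_KEY = "sk-placeholder-0000";

interface InterceptedRequest {
  host: string;
  headers: Record<string, string>;
}

type ProxyResult =
  | { ok: true; request: InterceptedRequest }
  | { ok: false; reason: string };

// Decide what to do with one decrypted request before forwarding it upstream.
function rewriteRequest(
  req: InterceptedRequest,
  allowedHosts: Set<string>,
  realKeys: Map<string, string>, // host -> real API key, held only by the proxy
): ProxyResult {
  const host = req.host.toLowerCase();
  if (!allowedHosts.has(host)) {
    return { ok: false, reason: `host not allowlisted: ${host}` };
  }
  const headers = { ...req.headers };
  // Object.entries returns a snapshot, so rewriting headers in the loop is safe.
  for (const [name, value] of Object.entries(headers)) {
    if (value.includes(PLACEHOLDER_KEY)) {
      const real = realKeys.get(host);
      if (real === undefined) return { ok: false, reason: `no credential for ${host}` };
      headers[name] = value.replace(PLACEHOLDER_KEY, real);
    }
  }
  return { ok: true, request: { host, headers } };
}
```

Keying real credentials by host means a placeholder sent to one allowlisted provider can never be swapped for a different provider's key, and the real key never exists inside the container.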
The cleverest piece is trusted user input in multiplexed mode. IronCurtain captures your terminal input before the containerized agent sees it, marking specific keystrokes as "verified human intent." Policies can then auto-approve operations that match user commands: if you type "fix the bug in auth.ts" and the agent wants to edit auth.ts, that's permitted without an approval prompt. But if the agent spontaneously tries to edit ~/.ssh/config, policy blocks it. This addresses approval fatigue without giving up security: the agent can't social-engineer approvals because policy verification happens outside its control.
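How IronCurtain matches trusted input to operations isn't spelled out here, so the following is a deliberately simple heuristic rather than its actual algorithm: extract file-like tokens from what the human typed, and auto-approve an edit only when the target was named and stays inside the workspace. `TrustedInput`, `mentionedFiles`, and `decideEdit` are hypothetical names.

```typescript
// Hypothetical: text the multiplexer captured from the human's terminal before
// the containerized agent could see it. Its trust comes from where it was captured.
interface TrustedInput {
  text: string;
}

type EditDecision = "auto-approve" | "ask-policy";

// Pull file-like tokens ("auth.ts", "src/auth.ts") out of the human's message.
function mentionedFiles(input: TrustedInput): Set<string> {
  const tokens: string[] = input.text.match(/[\w./-]+\.[A-Za-z0-9]+/g) ?? [];
  return new Set(tokens);
}

// Auto-approve an edit only if the human named the file and the path stays in
// the workspace; everything else falls through to ordinary policy evaluation.
function decideEdit(path: string, trusted: TrustedInput[]): EditDecision {
  const segments = path.split("/");
  if (path.startsWith("/") || path.startsWith("~") || segments.includes("..")) {
    return "ask-policy";
  }
  const base = segments[segments.length - 1];
  for (const input of trusted) {
    const files = mentionedFiles(input);
    if (files.has(path) || files.has(base)) return "auto-approve";
  }
  return "ask-policy";
}
```

A non-match is never a denial by itself; it just routes the edit back to the normal policy, which is what blocks the ~/.ssh/config case. Basename matching is coarse (naming auth.ts approves any auth.ts in the workspace), which a real implementation would likely tighten.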
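The defense-in-depth approach means the layers are meant to fail independently. If the agent bypasses the proxy through a protocol flaw, the policy engine still evaluates every MCP call; if the policy engine itself has a bug, the isolate or container still bounds what the agent can reach. The layering has a limit, though: an agent that truly escapes the V8 isolate (via a VM bug) runs as host code and no longer has to go through MCP at all, so isolation is the one layer the others can't fully compensate for. Security doesn't depend on any single mechanism being perfect, but it leans hardest on that one.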
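The layering argument can be modeled as independent checks that must all pass. In reality isolation is structural (the capability simply doesn't exist inside the sandbox) rather than a function call, so treat this as a simplified model; `Layer`, `evaluate`, and the `http_post` tool name are hypothetical.

```typescript
// Simplified model: real isolation is structural (the capability is absent),
// not a function call, but the composition logic is the same.
interface Operation {
  tool: string;
  host?: string;
}

interface Layer {
  name: string;
  // Returns null to pass the operation, or a reason string to block it.
  check: (op: Operation) => string | null;
}

type Verdict = { allowed: true } | { allowed: false; layer: string; reason: string };

// An operation proceeds only if every layer passes it; the first block wins.
function evaluate(op: Operation, layers: Layer[]): Verdict {
  for (const layer of layers) {
    const reason = layer.check(op);
    if (reason !== null) return { allowed: false, layer: layer.name, reason };
  }
  return { allowed: true };
}

// Hypothetical layers: a policy engine with a bug that allows everything,
// backed by a network allowlist that knows nothing about policy.
const layers: Layer[] = [
  { name: "policy", check: () => null },
  {
    name: "network",
    check: (op) =>
      op.host !== undefined && op.host !== "api.anthropic.com" ? `host not allowlisted: ${op.host}` : null,
  },
];
```

Even with the policy layer reduced to "allow everything," an exfiltration attempt to an unlisted host is still stopped, and the verdict records which layer caught it.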
Gotcha
IronCurtain labels itself a research prototype, and that warning deserves attention. The entire policy compilation pipeline—where LLMs turn English into executable security rules—is unproven territory. You're trusting an AI to correctly translate "don't delete important files" into TypeScript that actually prevents deletion. The system includes test generation and validation, but there's no formal verification. A subtle policy bug could greenlight destructive operations.
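Here is the kind of subtle bug the warning is about. A compiled rule that checks `startsWith('docs/')` on the raw path string, as the example policy above does, accepts `docs/../.ssh/config`. The sketch below, with hypothetical function names, normalizes `.` and `..` segments before checking the prefix. It does not resolve symlinks, which a real filesystem policy would also need to handle.

```typescript
// What a plausible-looking compiled rule might do: check the raw string.
function naiveAllowsDocs(path: string): boolean {
  return path.startsWith("docs/");
}

// Resolve "." and ".." segments; return null if the path climbs above the root.
function normalizeRelative(path: string): string | null {
  const out: string[] = [];
  for (const seg of path.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") {
      if (out.length === 0) return null;
      out.pop();
    } else {
      out.push(seg);
    }
  }
  return out.join("/");
}

// Check the prefix only after normalization; absolute paths are rejected outright.
// Symlinks are NOT resolved here.
function safeAllowsDocs(path: string): boolean {
  if (path.startsWith("/")) return false;
  const normalized = normalizeRelative(path);
  return normalized !== null && normalized.startsWith("docs/");
}
```

Both functions agree on ordinary inputs, which is exactly why test cases written from the English rule can miss the difference.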
The Node.js version constraint (22-25 only) is a deployment headache. It stems from isolated-vm's native dependency requirements, which break on older or newer runtimes. If your infrastructure standardizes on Node 20 LTS or you need to upgrade to Node 26, you're stuck. The MCP server ecosystem is also immature—many tool implementations haven't been audited for security boundaries, so a buggy filesystem MCP server could leak data even if IronCurtain's policy is perfect. You're only as secure as the weakest MCP wrapper.
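If you do deploy it, a preflight check can fail fast instead of surfacing an obscure native-module error later. A minimal sketch; the 22 to 25 range is taken from the constraint described above, and `isSupportedNode` is a hypothetical helper.

```typescript
// Supported majors as stated in the constraint above.
const MIN_NODE_MAJOR = 22;
const MAX_NODE_MAJOR = 25;

// Accepts "22.11.0" (process.versions.node) or "v22.11.0" (process.version).
function isSupportedNode(version: string): boolean {
  const match = /^v?(\d+)\./.exec(version);
  if (match === null) return false;
  const major = Number(match[1]);
  return major >= MIN_NODE_MAJOR && major <= MAX_NODE_MAJOR;
}
```

In a startup script you would pass `process.versions.node` or `process.version` and exit with a clear message when it returns false.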
Finally, the TLS MITM approach for Docker mode requires trusting IronCurtain with your API keys and certificate authority. If the proxy has a vulnerability or logs sensitive data, you've created a new attack surface. The architecture is sound, but the implementation maturity isn't there yet for production secrets.
Verdict
Use IronCurtain if you're building experimental AI agents that need real-world tool access (git, filesystem, APIs) and you want programmatic safety rails beyond crossing your fingers and hoping the model behaves. It's perfect for research teams exploring agentic workflows, security engineers prototyping policy frameworks for AI systems, or developers who understand early-stage risks but need more than manual approval theater. The plain-English policy language and semantic interposition through MCP are genuinely novel approaches to a hard problem.
Skip if you need production-grade stability, can't tolerate Node version constraints, or require formally verified security guarantees. The 'LLM-compiled constitution' concept is unproven at scale, the MCP ecosystem is immature, and the research prototype label means breaking changes are expected. Also skip if your threat model includes state-level adversaries: this is for preventing accidental harm and basic agent misbehavior, not APT defense.