HTTP Garden: Finding Request Smuggling Bugs by Testing 36 Servers at Once
Hook
Nginx accepts bare LF characters in chunked encoding terminators, violating the HTTP spec. So do most other popular servers. HTTP Garden found this and dozens of other parsing inconsistencies that attackers exploit for request smuggling.
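The discrepancy is easy to reproduce in miniature. The sketch below is not HTTP Garden code, and `read_chunk_size` is a hypothetical helper with a `strict` flag. The strict variant insists on CRLF after the chunk size, as the chunked-coding grammar requires. The lenient variant also accepts a bare LF, which is the behavior described above for Nginx.

```python
import re

# Hypothetical helper for illustration; not part of HTTP Garden.
def read_chunk_size(buf: bytes, pos: int, strict: bool) -> tuple[int, int]:
    """Parse the chunk-size line that starts at buf[pos].

    Returns (chunk size, offset just past the line terminator).
    strict=True demands CRLF; strict=False also accepts a bare LF.
    """
    lf = buf.find(b"\n", pos)
    if lf == -1:
        raise ValueError("incomplete chunk-size line")
    line = buf[pos:lf]
    if line.endswith(b"\r"):
        line = line[:-1]
    elif strict:
        raise ValueError("bare LF after chunk size")
    size = line.split(b";", 1)[0]  # ignore any chunk extension
    if not re.fullmatch(rb"[0-9A-Fa-f]+", size):
        raise ValueError("invalid chunk size")
    return int(size, 16), lf + 1


body = b"5\nhello\r\n0\r\n\r\n"
print(read_chunk_size(body, 0, strict=False))  # lenient: (5, 2)
try:
    read_chunk_size(body, 0, strict=True)
except ValueError as err:
    print("strict parser rejects:", err)
```

A proxy applying the lenient rule and a backend applying the strict one will disagree about this body, and gaps like that are what HTTP Garden is built to surface.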
Context
HTTP request smuggling has become one of the most dangerous web vulnerabilities; James Kettle's 2019 research on HTTP desync attacks, whose targets included PayPal, earned him over $70,000 in bug bounties. The attack works because of subtle parsing differences: when a proxy and a backend server disagree about where one HTTP request ends and the next begins, an attacker can smuggle bytes that the backend prepends to another user's request, bypassing security controls and exposing private data.
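A toy model of the classic CL.TE variant shows the mechanism. Neither framing function below comes from HTTP Garden: `frame_by_content_length` stands in for a front end that ignores Transfer-Encoding, `frame_by_chunked` stands in for a backend that honors it, and the header parsing is deliberately naive.

```python
# Toy illustration of a CL.TE desync; hypothetical helpers, not HTTP Garden code.
def parse_head(stream: bytes) -> tuple[dict, int]:
    """Naively split headers; returns (lowercased header dict, body offset)."""
    head_end = stream.index(b"\r\n\r\n")
    headers = {}
    for line in stream[:head_end].split(b"\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(b":")
        headers[name.strip().lower()] = value.strip()
    return headers, head_end + 4


def frame_by_content_length(stream: bytes) -> tuple[bytes, bytes]:
    """Front end: trust Content-Length. Returns (request, leftover bytes)."""
    headers, body_start = parse_head(stream)
    end = body_start + int(headers.get(b"content-length", b"0"))
    return stream[:end], stream[end:]


def frame_by_chunked(stream: bytes) -> tuple[bytes, bytes]:
    """Backend: trust chunked encoding (no trailers). Returns (request, leftover)."""
    _, pos = parse_head(stream)
    while True:
        line_end = stream.index(b"\r\n", pos)
        size = int(stream[pos:line_end], 16)
        pos = line_end + 2 + size + 2  # skip size line, chunk data, and its CRLF
        if size == 0:
            return stream[:pos], stream[pos:]


attack = (
    b"POST / HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"Content-Length: 6\r\n"
    b"Transfer-Encoding: chunked\r\n"
    b"\r\n"
    b"0\r\n"
    b"\r\n"
    b"G"
)
print(frame_by_content_length(attack)[1])  # b'' : front end forwards one request
print(frame_by_chunked(attack)[1])  # b'G' : backend leaves "G" on the connection
```

The backend treats the leftover `G` as the start of whatever request arrives next on the reused connection, so the next user's `POST` becomes `GPOST`.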
The fundamental problem is that HTTP parsing is deceptively complex. The spec allows several ways to indicate message length (Content-Length, chunked Transfer-Encoding, and, for responses, closing the connection), leaves whitespace handling ambiguous in places, and has evolved across HTTP/0.9, 1.0, 1.1, and HTTP/2. Every implementation makes slightly different choices about what to accept. Testing whether your specific proxy-server combination is vulnerable requires sending malformed payloads through both systems and checking whether they disagree. Before HTTP Garden, this meant manually spinning up servers, writing custom scripts, and testing pairs one at a time. NARF Industries, funded by DARPA's SafeDocs program, built HTTP Garden to systematize this process: test every server against every proxy simultaneously, with reproducible builds and semantic-aware diffing.
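The precedence rules for requests can be written down compactly. The function below is a simplified reading of RFC 9112 § 6.3 (the successor to RFC 7230 § 3.3.3). `request_body_framing` is a hypothetical name, and the sketch ignores details such as HTTP/1.0 and trailer handling. Every server under test encodes some version of this decision, and every place an implementation deviates from it is a potential disagreement.

```python
from typing import List, Tuple, Union


# Hypothetical, simplified sketch of RFC 9112 section 6.3 for requests only.
def request_body_framing(headers: List[Tuple[str, str]]) -> Union[str, int]:
    """Decide how a request body is delimited.

    Returns "chunked" or the Content-Length value; raises ValueError where
    the RFC says the server should reject the request with 400.
    """
    te = [v for n, v in headers if n.lower() == "transfer-encoding"]
    cl = [v for n, v in headers if n.lower() == "content-length"]
    if te:
        codings = [c.strip().lower() for v in te for c in v.split(",")]
        if codings[-1] != "chunked":
            raise ValueError("chunked is not the final transfer coding")
        # Transfer-Encoding overrides Content-Length; a cautious server may
        # instead reject requests that carry both.
        return "chunked"
    if cl:
        values = {v.strip() for field in cl for v in field.split(",")}
        if len(values) != 1:
            raise ValueError("conflicting Content-Length values")
        value = values.pop()
        # isascii() guards against non-ASCII digits that str.isdigit() accepts
        if not (value.isascii() and value.isdigit()):
            raise ValueError("invalid Content-Length")
        return int(value)
    return 0  # a request with neither header has no body
```

The `isascii()` check matters: in Python, `"²".isdigit()` is `True`, which is exactly the kind of leniency that differential testing tends to expose.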
Technical Insight
HTTP Garden's architecture centers on three components: a Docker-based target manager, a REPL for crafting payloads, and a semantic normalization engine. Each HTTP server and proxy runs in an isolated container built from a pinned source commit, ensuring reproducible results. The framework exposes every target through a standardized Python interface that accepts raw bytes and returns structured parsing results.
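To make "raw bytes in, structured parse out" concrete, here is a minimal sketch of such an interface. `ParseResult`, `Target`, and `ToyTarget` are hypothetical names chosen for illustration, not HTTP Garden's actual classes. `parse` returns a list because one byte stream can contain several pipelined requests.

```python
from dataclasses import dataclass
from typing import List, Optional, Protocol, Tuple


@dataclass(frozen=True)
class ParseResult:
    """One request as a target understood it (hypothetical schema)."""
    accepted: bool
    method: Optional[bytes] = None
    path: Optional[bytes] = None
    headers: Tuple[Tuple[bytes, bytes], ...] = ()
    body: Optional[bytes] = None


class Target(Protocol):
    """Anything that turns raw request bytes into parse results (hypothetical)."""
    name: str

    def parse(self, raw: bytes) -> List[ParseResult]: ...


class ToyTarget:
    """Stand-in target that only checks the request line."""
    name = "toy"

    def parse(self, raw: bytes) -> List[ParseResult]:
        line, sep, _ = raw.partition(b"\r\n")
        parts = line.split(b" ")
        if not sep or len(parts) != 3 or not parts[2].startswith(b"HTTP/"):
            return [ParseResult(accepted=False)]
        return [ParseResult(accepted=True, method=parts[0], path=parts[1])]


targets: List[Target] = [ToyTarget()]
for target in targets:
    print(target.name, target.parse(b"GET / HTTP/1.1\r\nHost: a\r\n\r\n"))
```

Because `Target` is a structural `Protocol`, any adapter with a `name` and a matching `parse` method fits without inheriting from a common base class.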
The power comes from the composable pipeline. You start by crafting a raw HTTP payload in the REPL, then use the transducer command to route it through one or more proxies, and finally use fanout to send the (potentially modified) request to multiple backend servers. Here's a simplified example of testing chunked encoding with a bare LF terminator:
```python
# Simplified example; `garden` is assumed to be a handle to the running targets.
# Craft a payload with a bare LF instead of CRLF after the chunk size
payload = (
    b"POST /upload HTTP/1.1\r\n"
    b"Host: example.com\r\n"
    b"Transfer-Encoding: chunked\r\n"
    b"\r\n"
    b"5\n"  # bare LF instead of \r\n
    b"hello"
    b"\r\n"
    b"0\r\n"
    b"\r\n"
)

# Send through HAProxy as a transducer, then fan out to all servers
results = garden.pipeline(
    payload,
    transducers=['haproxy'],
    targets=garden.all_servers(),
)

# Check for discrepancies
garden.grid_diff(results)
```
The grid_diff function is where HTTP Garden shines. Rather than doing byte-for-byte comparison (which would flag every minor header reordering), it normalizes responses semantically. It extracts key parsing decisions: Did the server accept the request? What method/path did it parse? What body length? What headers were understood? Two servers that both accepted the request but parsed different body lengths represent a smuggling vulnerability.
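The normalization idea can be sketched in a few lines. This is not HTTP Garden's implementation: `normalize` and `grid` are hypothetical helpers, and each target's result is assumed to be a dict with `accepted`, `method`, `path`, `headers`, and `body` keys. Header order and name case are discarded, while body length is kept because that is where smuggling shows up.

```python
from itertools import combinations
from typing import Dict, Tuple


# Hypothetical sketch of semantic diffing; not HTTP Garden's grid_diff.
def normalize(result: dict) -> tuple:
    """Reduce a parse result to the decisions that matter for smuggling."""
    if not result.get("accepted"):
        return ("rejected",)
    headers = tuple(sorted((k.lower(), v.strip()) for k, v in result.get("headers", [])))
    return ("accepted", result["method"], result["path"], headers,
            len(result.get("body", b"")))


def grid(results: Dict[str, dict]) -> Dict[Tuple[str, str], bool]:
    """Map each pair of targets to True when their normalized views agree."""
    norm = {name: normalize(r) for name, r in results.items()}
    return {(a, b): norm[a] == norm[b] for a, b in combinations(sorted(norm), 2)}


def make_result(headers, body):
    return {"accepted": True, "method": b"POST", "path": b"/upload",
            "headers": headers, "body": body}


results = {
    "a": make_result([(b"Host", b"example.com"), (b"Transfer-Encoding", b"chunked")], b"hello"),
    "b": make_result([(b"transfer-encoding", b"chunked"), (b"host", b"example.com")], b"hello"),
    "c": make_result([(b"Host", b"example.com"), (b"Transfer-Encoding", b"chunked")], b""),
}
for pair, agree in grid(results).items():
    print(pair, "agree" if agree else "DISCREPANCY")
```

Targets `a` and `b` differ only in header order and case, so they agree; `c` accepted the same request with a different body length, which is the pattern that signals a smuggling risk.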
Under the hood, each target runs a thin adapter layer that interfaces with the server's parsing internals. For C-based servers like Nginx, this means compiling with custom instrumentation that exposes internal parser state. For Python servers like Gunicorn, it imports the parsing code directly. This approach reveals not just the final response, but the intermediate parsing decisions that matter for security.
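For a Python server, importing the parsing code directly can be as simple as feeding raw bytes to the same parser the server uses. The adapter below is a hypothetical example built on CPython's `http.client.parse_headers`, the helper that `http.server` uses to read request headers. It is not one of HTTP Garden's adapters, and its request-line handling is simplified.

```python
import http.client
import io


# Hypothetical adapter for illustration; not HTTP Garden code.
def stdlib_adapter(raw: bytes) -> dict:
    """Parse raw bytes with CPython's header parser and report the decisions."""
    fp = io.BytesIO(raw)
    # rstrip() removes any mix of trailing CR/LF, so a bare LF is tolerated here
    parts = fp.readline().rstrip(b"\r\n").split(b" ")
    if len(parts) != 3:
        return {"accepted": False}
    try:
        msg = http.client.parse_headers(fp)
        # int() also tolerates whitespace, a leading '+', and '_' separators
        length = int(msg.get("Content-Length", "0"))
    except (http.client.HTTPException, ValueError):
        return {"accepted": False}
    if length < 0:
        return {"accepted": False}
    return {
        "accepted": True,
        "method": parts[0],
        "path": parts[1],
        "headers": msg.items(),
        "body": fp.read(length),
    }


print(stdlib_adapter(b"POST /x HTTP/1.1\r\nHost: a\r\nContent-Length: 5\r\n\r\nhelloEXTRA"))
print(stdlib_adapter(b"GET / HTTP/1.1\nHost: a\n\n")["accepted"])  # bare LFs accepted
```

Reporting the parsed body (here `b"hello"`, leaving `EXTRA` unread) rather than just a status code is what lets a differential harness see where each parser thinks the request ends.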
The framework includes over 1,000 pre-built payloads targeting different parts of the HTTP spec: chunk extension handling, Content-Length/Transfer-Encoding conflicts, method character restrictions, header name validation, whitespace normalization, and version string parsing. Each payload is tagged with the spec section it tests (RFC 7230 § 3.3.3 for message length, for instance), making it easy to identify which spec violations matter.
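A tagged payload catalog is a simple data structure. The sketch below uses a hypothetical `TaggedPayload` record and a `by_section` filter to show how spec-section tags make it easy to pull every payload for one part of the RFC. It is not the repository's actual payload format, and the three entries are illustrative.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical catalog format; not HTTP Garden's payload files.
@dataclass(frozen=True)
class TaggedPayload:
    name: str
    raw: bytes
    spec_ref: str  # section of RFC 7230 the payload exercises


CATALOG = [
    TaggedPayload(
        "cl-te-conflict",
        b"POST / HTTP/1.1\r\nHost: a\r\nContent-Length: 6\r\n"
        b"Transfer-Encoding: chunked\r\n\r\n0\r\n\r\nG",
        "RFC 7230 § 3.3.3",  # message body length
    ),
    TaggedPayload(
        "bare-lf-chunk-size",
        b"POST / HTTP/1.1\r\nHost: a\r\nTransfer-Encoding: chunked\r\n\r\n"
        b"5\nhello\r\n0\r\n\r\n",
        "RFC 7230 § 4.1",  # chunked transfer coding
    ),
    TaggedPayload(
        "nul-in-method",
        b"GE\x00T / HTTP/1.1\r\nHost: a\r\n\r\n",
        "RFC 7230 § 3.1.1",  # request line
    ),
]


def by_section(prefix: str) -> List[TaggedPayload]:
    """Return every payload whose spec reference starts with prefix."""
    return [p for p in CATALOG if p.spec_ref.startswith(prefix)]


for payload in by_section("RFC 7230 § 3"):
    print(payload.name, payload.spec_ref)
```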
HTTP Garden also supports multi-stage pipelines for testing proxy chains. Real-world architectures often involve multiple proxies (CDN → load balancer → reverse proxy → application server). You can compose transducers to model this:
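```python
# Test a three-layer architecture; transducers are assumed to apply in list
# order (nginx, then haproxy, then envoy) before fanning out to each target.
# smuggling_payload is any raw request bytes, e.g. the payload above.
results = garden.pipeline(
    smuggling_payload,
    transducers=['nginx', 'haproxy', 'envoy'],
    targets=['gunicorn', 'uvicorn', 'hypercorn'],
)
```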
This reveals bugs that only appear in specific combinations—like the Envoy-to-Nginx desync that affected major CDN providers.
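Composing transducers amounts to function composition over byte strings. In the sketch below (hypothetical names, toy behavior, not HTTP Garden code) each proxy is a `bytes -> bytes` function applied in order, and each backend reports whether it accepts what finally arrives. A normalizing proxy in front can hide a discrepancy that a pass-through proxy exposes, which is why the whole chain has to be modeled rather than each hop in isolation.

```python
from functools import reduce
from typing import Callable, Dict, List

# Hypothetical types for illustration; not HTTP Garden's API.
Transducer = Callable[[bytes], bytes]  # a proxy: bytes in, forwarded bytes out
Backend = Callable[[bytes], str]       # a server: bytes in, verdict out


def run_pipeline(payload: bytes, transducers: List[Transducer],
                 backends: Dict[str, Backend]) -> Dict[str, str]:
    """Pass payload through each proxy in order, then fan out to every backend."""
    forwarded = reduce(lambda data, proxy: proxy(data), transducers, payload)
    return {name: backend(forwarded) for name, backend in backends.items()}


def passthrough(data: bytes) -> bytes:
    return data


def crlf_normalizer(data: bytes) -> bytes:
    # Rewrite every bare LF as CRLF (a toy; a real proxy would re-serialize)
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def strict_backend(data: bytes) -> str:
    # Reject if any LF remains once all CRLF pairs are removed
    return "reject" if b"\n" in data.replace(b"\r\n", b"") else "accept"


def lenient_backend(data: bytes) -> str:
    return "accept"


payload = (b"POST / HTTP/1.1\r\nHost: a\r\nTransfer-Encoding: chunked\r\n\r\n"
           b"5\nhello\r\n0\r\n\r\n")
backends = {"strict": strict_backend, "lenient": lenient_backend}
print(run_pipeline(payload, [passthrough], backends))
print(run_pipeline(payload, [crlf_normalizer, passthrough], backends))
```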
The repository includes detailed analysis tools for discovered discrepancies. When grid_diff finds servers that disagree, it generates a comparison matrix showing exactly which parsing decisions differed, along with references to the relevant spec sections. For researchers publishing vulnerabilities, this produces camera-ready documentation of the bug.
Gotcha
HTTP Garden requires a serious Linux setup. You need x86_64 or AArch64 architecture with Docker, and enough resources to run dozens of containers simultaneously—expect to allocate at least 8GB RAM and significant CPU. The initial build process compiles 36+ servers from source, which can take 30+ minutes even on powerful machines. If you're on macOS or Windows, you're out of luck; even WSL2 has compatibility issues with some containers.
The learning curve is steep because HTTP Garden is deliberately not automated. There's no fuzzer that generates payloads for you, no mutation engine, no genetic algorithm. You manually craft each HTTP message as raw bytes, which means you need deep knowledge of the HTTP spec to know what to test. The REPL is powerful but assumes you already understand Transfer-Encoding quirks, Content-Length precedence rules, and chunk extension parsing. This is a feature for security researchers who want precise control, but it means casual users will struggle to get value from the tool. If you want automated scanning for HTTP vulnerabilities, look elsewhere—this framework is for people who already know what they're hunting for and need systematic validation across implementations.
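Because every message is written by hand, small byte-level helpers pay off. The two functions below are hypothetical conveniences, not part of HTTP Garden. They make line terminators an explicit parameter, so a malformed variant is a one-argument change rather than a retyped payload.

```python
from typing import List, Tuple


# Hypothetical payload-building helpers; not part of HTTP Garden.
def build_request(method: bytes, target: bytes, headers: List[Tuple[bytes, bytes]],
                  body: bytes = b"", eol: bytes = b"\r\n") -> bytes:
    """Assemble an HTTP/1.1 request; eol controls header-section line endings."""
    head = method + b" " + target + b" HTTP/1.1" + eol
    head += b"".join(name + b": " + value + eol for name, value in headers)
    return head + eol + body


def chunked(chunks: List[bytes], size_eol: bytes = b"\r\n") -> bytes:
    """Encode chunks; size_eol terminates every chunk-size line, including the last."""
    body = b"".join(b"%x" % len(c) + size_eol + c + b"\r\n" for c in chunks)
    return body + b"0" + size_eol + b"\r\n"


malformed = build_request(
    b"POST", b"/upload",
    [(b"Host", b"example.com"), (b"Transfer-Encoding", b"chunked")],
    chunked([b"hello"], size_eol=b"\n"),
)
print(malformed)
```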
Verdict
Use HTTP Garden if you're conducting security research on HTTP implementations, especially request smuggling or desync vulnerabilities in production proxy-server combinations. It's invaluable for specification compliance testing, discovering novel attack vectors, or validating whether your infrastructure is vulnerable to known parsing bugs. Also use it if you're implementing a new HTTP server/proxy and want to ensure your parser matches (or intentionally differs from) existing behavior. Skip it if you need automated vulnerability scanning, can't run Linux with Docker, or don't already have deep HTTP protocol knowledge. Also skip if you're testing APIs at the application layer rather than protocol layer—tools like Burp Suite or OWASP ZAP are better for that. This is a power tool for experts who need comprehensive, reproducible differential testing of HTTP parsing semantics.