Reverse-Engineering REST APIs by Watching Traffic: Inside mitmproxy2swagger
Hook
Someone built an API you need to integrate with, but there's no documentation. Your options used to be: reverse-engineer it manually, decompile their client, or give up. Now you can just watch the traffic.
Context
The undocumented API is a universal developer frustration. Mobile apps, legacy internal services, third-party integrations: countless systems expose REST APIs with no OpenAPI spec, with outdated documentation, or with no documentation at all. Traditional approaches involve tedious manual cURL experimentation, static analysis of client code (if you have it), or, worse, guessing based on variable names.
mitmproxy2swagger takes a fundamentally different approach: passive observation. By positioning a proxy between client and server, it watches real API conversations happen, then infers the API structure from actual usage patterns. Created by Adam Łukowski (alufers), the tool transforms mitmproxy flow captures or browser HAR exports into OpenAPI 3.0 specifications. With nearly 10,000 GitHub stars, it's become the de facto solution for API archaeology—digging up the structure of APIs that were never properly documented in the first place.
Technical Insight
The tool's architecture revolves around a two-pass analysis system that balances automation with human oversight. Unlike tools that attempt full automation and produce noisy specs, mitmproxy2swagger requires you to explicitly choose which endpoints matter.
The first pass performs URL pattern extraction. It analyzes all captured requests, identifies variable path segments (like IDs or slugs), and generates path templates. Here's what that looks like:
# Capture traffic with mitmproxy first
mitmproxy -w flows.mitm
# First pass: extract patterns
mitmproxy2swagger -i flows.mitm -o api_spec.yml -p https://api.example.com -f flow
This produces an OpenAPI file whose x-path-templates section lists every discovered path, each prefixed with ignore:. For example, if your traffic included requests to /users/123, /users/456, and /posts/789, you'd see something like:
x-path-templates:
  - ignore:/users/{id}
  - ignore:/posts/{id}
The ignore: prefix is the key design choice here. Without it, a capture from a typical SaaS product would yield hundreds of endpoints, many of them internal analytics pings or ad requests you don't care about. The tool makes you edit the YAML by hand, removing ignore: from the paths you want documented and, where it helps, renaming a generic placeholder such as {id} to a semantic name like {userId}. This human-in-the-loop design keeps noise out of the final spec.
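The first-pass idea can be sketched in a few lines of Python. This is an illustration only, not the tool's actual code: the function names are hypothetical, only all-digit path segments are collapsed, and the `{id}` placeholder name is an assumption.

```python
from typing import List, Optional


def to_template(url: str, prefix: str) -> Optional[str]:
    """Turn one observed URL into an ignore-prefixed path template.

    Returns None when the URL does not start with the API prefix.
    """
    if not url.startswith(prefix):
        return None
    # Drop the prefix and any query string, keeping only the path.
    path = url[len(prefix):].split("?", 1)[0]
    # Assumption: only all-digit segments are treated as variable.
    segments = ["{id}" if seg.isdigit() else seg for seg in path.split("/")]
    return "ignore:" + "/".join(segments)


def first_pass(urls: List[str], prefix: str) -> List[str]:
    """Collect unique templates in the order they were first seen."""
    templates: List[str] = []
    for url in urls:
        template = to_template(url, prefix)
        if template is not None and template not in templates:
            templates.append(template)
    return templates


observed = [
    "https://api.example.com/users/123",
    "https://api.example.com/users/456",
    "https://api.example.com/posts/789?expand=author",
    "https://ads.example.net/ping",
]
templates = first_pass(observed, "https://api.example.com")
```

Both user URLs collapse into a single template, and the request to the unrelated host is dropped because it falls outside the prefix.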
The second pass performs schema inference on your curated endpoint list:
# After manually editing api_spec.yml to remove ignore: prefixes
mitmproxy2swagger -i flows.mitm -o api_spec.yml -p https://api.example.com -f flow --examples
Now the tool analyzes request and response bodies for the non-ignored paths, building JSON schemas through pattern recognition. If it sees {"user_id": 123, "name": "Alice"} across multiple requests, it infers integer and string types. The --examples flag preserves actual request/response bodies as OpenAPI examples, while --headers includes header documentation.
The schema inference algorithm operates conservatively. When it encounters conflicting types for the same field across different requests—say age is sometimes a string "25" and sometimes a number 25—it tends toward more permissive schemas or marks fields as not required. This reflects reality: many APIs have inconsistent implementations.
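To make the idea of "more permissive" concrete, here is a minimal sketch of sample-based schema inference. It is not the tool's algorithm; `infer_schema` and `merge_schemas` are hypothetical helpers, and the policy shown (union conflicting types with `anyOf`, require only fields present in every sample) is one assumed way to be conservative.

```python
def infer_schema(value):
    """Infer a small JSON-schema fragment from one decoded JSON value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "integer"}
    if isinstance(value, float):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if isinstance(value, list):
        return {"type": "array", "items": infer_schema(value[0]) if value else {}}
    if isinstance(value, dict):
        return {
            "type": "object",
            "properties": {k: infer_schema(v) for k, v in value.items()},
            "required": sorted(value),
        }
    return {"nullable": True}  # JSON null


def merge_schemas(a, b):
    """Combine two inferred schemas without losing either observation."""
    if a == b:
        return a
    if a.get("type") == "object" and b.get("type") == "object":
        props = {}
        for key in list(a["properties"]) + list(b["properties"]):
            in_a, in_b = key in a["properties"], key in b["properties"]
            if in_a and in_b:
                props[key] = merge_schemas(a["properties"][key], b["properties"][key])
            else:
                props[key] = a["properties"][key] if in_a else b["properties"][key]
        # A field stays required only if every sample contained it.
        required = [k for k in a["required"] if k in b["required"]]
        return {"type": "object", "properties": props, "required": required}
    # Conflicting types: keep both as alternatives.
    options = []
    for option in a.get("anyOf", [a]) + b.get("anyOf", [b]):
        if option not in options:
            options.append(option)
    return {"anyOf": options}


first = infer_schema({"name": "Alice", "age": "25"})
second = infer_schema({"name": "Bob", "age": 25, "email": "bob@example.com"})
merged = merge_schemas(first, second)
```

Here `age` ends up accepting either a string or an integer, and `email` is documented but not required because it appeared in only one sample.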
One elegant detail is the incremental merge capability. You can run multiple capture sessions and merge them:
# First session: user login flows
mitmproxy2swagger -i login_flows.mitm -o api_spec.yml -p https://api.example.com -f flow
# Second session: checkout flows (merges with existing spec)
mitmproxy2swagger -i checkout_flows.mitm -o api_spec.yml -p https://api.example.com -f flow
The tool merges newly captured data into the existing spec file rather than replacing it. Per the project README, endpoint descriptions that already exist are not overwritten; to regenerate one from fresh traffic, delete it from the file before rerunning. This matches how reverse engineering actually works: you rarely capture everything in one session. You might capture login flows on Monday, payment flows on Wednesday, and admin endpoints on Friday. Each session adds to your understanding without destroying previous discoveries.
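One conservative merge policy, sketched below under stated assumptions, is to add paths and methods that are new while leaving anything already present untouched. `merge_paths` is a hypothetical helper operating on plain dictionaries shaped like an OpenAPI `paths` object; it is not taken from the tool.

```python
import copy


def merge_paths(existing, discovered):
    """Add newly discovered operations without overwriting existing ones."""
    merged = copy.deepcopy(existing)
    for path, operations in discovered.items():
        target = merged.setdefault(path, {})
        for method, operation in operations.items():
            # setdefault keeps an existing (possibly hand-edited) operation.
            target.setdefault(method, copy.deepcopy(operation))
    return merged


monday = {"/login": {"post": {"summary": "Log in (edited by hand)"}}}
wednesday = {
    "/login": {"post": {"summary": "auto-generated"}, "delete": {"summary": "Log out"}},
    "/checkout": {"post": {"summary": "Create order"}},
}
spec_paths = merge_paths(monday, wednesday)
```

The hand-edited `POST /login` description survives, while `DELETE /login` and `POST /checkout` are added from the second session.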
For browser-based APIs, the tool accepts HAR (HTTP Archive) format, which every modern browser can export from DevTools. This makes it accessible even if you can't set up mitmproxy's certificate:
# Export HAR from Chrome DevTools Network tab, then:
mitmproxy2swagger -i traffic.har -o api_spec.yml -p https://api.example.com -f har
The pattern matching logic handles common REST conventions intelligently. UUIDs, integer IDs, and slug-style identifiers are recognized and parameterized. Versioned APIs with /v1/ or /api/v2/ prefixes are properly templated. Query parameters are extracted into OpenAPI parameter definitions with inferred types based on observed values.
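Query parameter extraction is easy to picture against the HAR format, where each entry carries a `request.queryString` list of name/value pairs. The sketch below is an assumption-laden illustration, not the tool's implementation: the helper names are hypothetical, and for brevity it pools parameters across all entries instead of grouping them per endpoint.

```python
def infer_param_type(values):
    """Guess an OpenAPI type from the string values seen for one parameter."""
    def kind(text):
        try:
            int(text)
            return "integer"
        except ValueError:
            pass
        try:
            float(text)
            return "number"
        except ValueError:
            pass
        return "boolean" if text in ("true", "false") else "string"

    kinds = {kind(v) for v in values}
    if len(kinds) == 1:
        return kinds.pop()
    if kinds == {"integer", "number"}:
        return "number"
    return "string"  # mixed observations: fall back to the widest type


def query_parameters(har):
    """Build OpenAPI-style query parameter definitions from a HAR dict."""
    observed = {}
    for entry in har["log"]["entries"]:
        for item in entry["request"].get("queryString", []):
            observed.setdefault(item["name"], []).append(item["value"])
    return [
        {"name": name, "in": "query", "schema": {"type": infer_param_type(values)}}
        for name, values in observed.items()
    ]


har = {"log": {"entries": [
    {"request": {"method": "GET", "url": "https://api.example.com/users?page=1&q=alice",
                 "queryString": [{"name": "page", "value": "1"},
                                 {"name": "q", "value": "alice"}]}},
    {"request": {"method": "GET", "url": "https://api.example.com/users?page=2",
                 "queryString": [{"name": "page", "value": "2"}]}},
]}}
params = query_parameters(har)
```

A real HAR export would be loaded with `json.load` first; the inline dictionary here just keeps the example self-contained.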
Gotcha
The quality of your generated spec is fundamentally limited by your traffic coverage. If you never captured a request that includes an optional metadata field, that field won't appear in your schema. If you only tested the happy path, you'll miss error response schemas. This isn't a tool limitation—it's an inherent constraint of passive observation. You get documentation of what happened, not what could happen.
The --examples and --headers flags are dangerous in the wrong hands. They preserve actual request/response data, which means authentication tokens, API keys, session cookies, email addresses, and personal information will end up in your OpenAPI spec. If you're generating documentation to share with a team or publish publicly, you must sanitize the output. There's no automatic PII detection or redaction. I've seen developers accidentally commit specs with production JWT tokens in the examples section, then publish them to public repositories. The tool gives you rope; whether you hang yourself is up to you.
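A post-processing step can catch the most obvious leaks before a spec is shared. The following is a minimal sketch, not a complete redaction solution: the key list and the JWT-shaped pattern are assumptions you would tune for your own API, it only looks inside `example` and `examples` values of an already-parsed spec, and it will miss secrets stored under unexpected names.

```python
import re

# Assumed deny-list of field names; extend it for the API at hand.
SENSITIVE_KEYS = {"authorization", "cookie", "set-cookie", "token",
                  "access_token", "password", "api_key", "email"}
# Rough shape of a JWT: three base64url chunks separated by dots.
JWT_PATTERN = re.compile(r"eyJ[\w-]+\.[\w-]+\.[\w-]+")


def redact(node):
    """Replace sensitive values inside one example payload."""
    if isinstance(node, dict):
        return {k: "REDACTED" if str(k).lower() in SENSITIVE_KEYS else redact(v)
                for k, v in node.items()}
    if isinstance(node, list):
        return [redact(item) for item in node]
    if isinstance(node, str) and JWT_PATTERN.fullmatch(node):
        return "REDACTED"
    return node


def sanitize_spec(node):
    """Walk a parsed spec and redact only example data, leaving schemas alone."""
    if isinstance(node, dict):
        return {k: redact(v) if k in ("example", "examples") else sanitize_spec(v)
                for k, v in node.items()}
    if isinstance(node, list):
        return [sanitize_spec(item) for item in node]
    return node


spec = {"paths": {"/login": {"post": {"responses": {"200": {"content": {
    "application/json": {
        "schema": {"type": "object", "properties": {"email": {"type": "string"}}},
        "example": {"email": "alice@example.com", "name": "Alice",
                    "session": "eyJhbGciOiJIUzI1NiJ9.eyJzdWIiOiIxIn0.c2lnbmF0dXJl"},
    }}}}}}}}
clean = sanitize_spec(spec)
media = clean["paths"]["/login"]["post"]["responses"]["200"]["content"]["application/json"]
```

Restricting redaction to example values matters: a schema property that happens to be named `email` must keep its type definition. Treat a script like this as a first filter, and still review the output by hand.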
The two-pass workflow, while thoughtfully designed, creates friction. You can't pipe this into a fully automated pipeline. Between pass one and pass two, a human must edit YAML, understand which paths matter, and rename generic parameter names to semantic ones. For one-off reverse engineering, this is fine. For continuous API discovery in a testing environment, it's tedious. You'll want to script the ignore prefix removal for known path patterns, but the tool doesn't provide hooks for this—you're editing YAML by hand or writing your own automation layer.
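If you do write your own automation layer, the core of it can be small. The sketch below assumes you have already read the template strings out of the spec file (for instance with a YAML library) and shows only the decision step; `enable_templates` and the allow-list patterns are hypothetical.

```python
import re
from typing import List


def enable_templates(templates: List[str], allow_patterns: List[str]) -> List[str]:
    """Strip the ignore: marker from templates whose path matches an allow-list."""
    compiled = [re.compile(pattern) for pattern in allow_patterns]
    result = []
    for template in templates:
        path = template.replace("ignore:", "", 1)
        if any(pattern.search(path) for pattern in compiled):
            result.append(path)      # known-good path: enable it
        else:
            result.append(template)  # leave everything else as it was
    return result


pending = ["ignore:/users/{id}", "ignore:/analytics/ping", "/posts/{id}"]
enabled = enable_templates(pending, [r"^/users(/|$)"])
```

Paths that were already enabled pass through unchanged, so the function is safe to rerun after each new capture. Renaming placeholders to semantic names still needs a human or a separate mapping.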
Verdict
Use if: you're reverse-engineering a third-party API without documentation, especially mobile app backends or internal enterprise services where you can't access source code. It's perfect when you can run representative traffic through the proxy: QA testing sessions, automated E2E tests, or manual feature exploration. The incremental merge capability makes it ideal for long-term API discovery projects where you gradually expand coverage.
Skip if: you have source code access (use code annotation-based generators instead for better accuracy), you need perfect schema coverage on the first attempt (this requires iteration), or you're working with highly dynamic APIs where paths contain truly variable components rather than identifiers. Also skip if you can't carefully review output for sensitive data leakage. Running this blindly in CI/CD and publishing the results is a security incident waiting to happen.