Reverse-Engineering REST APIs from Traffic Captures with mitmproxy2swagger

Hook

What if you could document an entire REST API simply by using an application for a few minutes? No source code access, no manual Postman collections: just passive traffic capture turned into production-ready OpenAPI specs.

Context

Reverse-engineering REST APIs is a common developer task. You might be integrating with a third-party service that has poor documentation, working with a legacy internal API that predates the team’s institutional knowledge, or auditing what data a mobile app is actually sending to its servers. Traditional approaches involve painstakingly recording requests in tools like Postman, manually writing curl commands, or reading through minified JavaScript to guess at endpoint structures.

The mitmproxy2swagger tool, with over 9,000 GitHub stars, automates this tedious process by converting captured HTTP traffic into OpenAPI 3.0 specifications. It sits at the intersection of two mature technologies: mitmproxy (a powerful HTTPS proxy for intercepting traffic) and OpenAPI/Swagger (the de facto standard for REST API documentation). By bridging these ecosystems, it transforms passive observation into active documentation, turning hours of manual API exploration into a semi-automated workflow that produces shareable, machine-readable specifications.

Technical Insight

[Figure: system architecture (auto-generated). A mitmproxy flow file or HAR export feeds the first automated pass, endpoint discovery, which emits x-path-templates entries carrying ignore: prefixes. A human curation step removes the ignore: prefixes and parameterizes the paths. The curated templates drive the second automated pass, which performs pattern matching, data aggregation, and schema extraction to produce the OpenAPI 3.0 specification.]

The architecture of mitmproxy2swagger is deceptively simple but brilliantly pragmatic: a two-pass processing model that acknowledges the fundamental challenge of API reverse-engineering—machines can capture data, but humans understand intent.

In the first pass, the tool analyzes your mitmproxy flow file (or HAR export from browser DevTools) and performs endpoint discovery. It examines every HTTP request matching your specified API prefix and extracts URL patterns. But here’s the critical design decision: instead of automatically deciding which URL segments are path parameters versus literal path components, it generates a conservative template with an ignore: prefix:

x-path-templates:
  # Remove the ignore: prefix to generate an endpoint with its URL
  # Lines that are closer to the top take precedence, the matching is greedy
  - ignore:/users/123/profile
  - ignore:/users/456/profile
  - ignore:/users/{id}/settings
  - ignore:/products/search

This forces human curation. You edit the YAML file, removing ignore: from paths you want documented and parameterizing the URLs appropriately. This might seem like a limitation, but it’s actually the tool’s greatest strength. Without this step, the tool would have to guess whether /users/123 should be /users/{id} or literally /users/123—a decision that requires semantic understanding of your API’s design patterns.
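After curation, the same template list (using the hypothetical paths from the discovery example above) might look like this, with the two numeric user IDs collapsed into a single parameterized endpoint:

```yaml
x-path-templates:
  # ignore: prefixes removed; concrete IDs replaced with {id}
  - /users/{id}/profile
  - /users/{id}/settings
  - /products/search
```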

The second pass is where the magic happens. Running the same command again with your curated path templates, mitmproxy2swagger now knows how to classify requests. It implements greedy top-down matching: templates listed first take precedence. This means you can specify /users/{id}/profile/settings before /users/{id}/profile, and the tool will correctly route requests to the more specific endpoint first.
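The matching behavior is easy to picture with a small sketch. The following is not mitmproxy2swagger's actual implementation, just a minimal illustration of greedy top-down template matching, where each `{param}` segment matches one path segment and the first listed template wins:

```python
import re

def match_template(path, templates):
    """Return the first template whose pattern matches the path.

    Templates listed first win (greedy top-down matching), so more
    specific templates should precede more general ones.
    """
    for template in templates:
        # Turn "/users/{id}/profile" into a regex like "/users/[^/]+/profile"
        pattern = "^" + re.sub(r"\{[^}]+\}", r"[^/]+", template) + "$"
        if re.match(pattern, path):
            return template
    return None

templates = [
    "/users/{id}/profile/settings",  # more specific, listed first
    "/users/{id}/profile",
]
print(match_template("/users/42/profile/settings", templates))
# → /users/{id}/profile/settings
```

If the order were reversed, `/users/{id}/profile` would never be reached for settings URLs in a prefix-based matcher, which is why the ordering rule matters.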

# First pass - discovery
$ mitmproxy2swagger -i flows.mitm -o api-spec.yaml -p https://api.example.com/v1

# Edit api-spec.yaml to curate paths

# Second pass - schema generation
$ mitmproxy2swagger -i flows.mitm -o api-spec.yaml -p https://api.example.com/v1 --examples

During this second pass, the tool aggregates request and response data for each endpoint. It examines JSON payloads to infer schemas, collecting all observed fields and their types. Multiple requests to the same endpoint are merged, so if one request includes {"name": "Alice"} and another includes {"name": "Bob", "age": 30}, the final schema recognizes both name and age as valid fields.
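Conceptually, that aggregation is a union over observed payloads. Here is a deliberately simplified sketch (not the tool's schema inference, which maps to OpenAPI types) showing how fields and their observed types accumulate across requests:

```python
def merge_schema(schema, payload):
    """Merge one observed JSON object into an accumulated schema dict.

    Each field maps to the set of Python type names observed for it;
    fields present in only some requests are still recorded.
    """
    for key, value in payload.items():
        schema.setdefault(key, set()).add(type(value).__name__)
    return schema

schema = {}
merge_schema(schema, {"name": "Alice"})
merge_schema(schema, {"name": "Bob", "age": 30})
print(schema)
# → {'name': {'str'}, 'age': {'int'}}
```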

The --examples and --headers flags control whether actual captured data appears in the generated spec. This is a crucial security consideration—the README explicitly warns that enabling these flags may leak tokens, passwords, or personal information into your documentation. The tool prioritizes safe defaults while giving you the option to include this data when you’re certain it’s appropriate (like when documenting a development environment).

Support for HAR files is particularly clever. Browser DevTools are ubiquitous and don’t require any proxy configuration—developers can just open the Network tab, use an application, and export. By accepting both mitmproxy flows and HAR exports, the tool lowers the barrier to entry significantly. For simple use cases, you never need to configure a proxy at all.
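A HAR export is just JSON with a `log.entries` array of request/response pairs, which is why it is so easy to consume. The sketch below (hypothetical capture data, not the tool's parser) shows the kind of prefix filtering the `-p` flag performs on those entries:

```python
# Minimal HAR structure: log.entries holds one object per captured request.
har = {
    "log": {
        "entries": [
            {"request": {"method": "GET",
                         "url": "https://api.example.com/v1/users/123/profile"}},
            {"request": {"method": "POST",
                         "url": "https://api.example.com/v1/products/search"}},
            {"request": {"method": "GET",
                         "url": "https://cdn.example.com/logo.png"}},
        ]
    }
}

API_PREFIX = "https://api.example.com/v1"

# Keep only entries under the API prefix; everything else (CDN assets,
# analytics beacons) is noise for documentation purposes.
endpoints = sorted({
    (e["request"]["method"], e["request"]["url"][len(API_PREFIX):])
    for e in har["log"]["entries"]
    if e["request"]["url"].startswith(API_PREFIX)
})
print(endpoints)
# → [('GET', '/users/123/profile'), ('POST', '/products/search')]
```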

The safe merging capability enables incremental discovery. You can capture traffic from different user workflows, different authentication states, or different application features, running mitmproxy2swagger multiple times against the same output file. Each run extends the schema without destroying existing endpoint documentation. This mirrors how real-world API exploration actually works—you don’t learn everything about an API in one session.
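The key property of such merging is that it is additive: later runs extend earlier results without clobbering them. As a rough illustration (again, not the tool's internals), merging per-endpoint field maps from two capture sessions might look like:

```python
def merge_specs(existing, new):
    """Additively merge per-endpoint field schemas from two capture runs.

    Endpoints seen only in the earlier run survive; overlapping
    endpoints gain any newly observed fields. Inputs are not mutated.
    """
    merged = {ep: dict(fields) for ep, fields in existing.items()}
    for ep, fields in new.items():
        merged.setdefault(ep, {}).update(fields)
    return merged

run1 = {"/users/{id}/profile": {"name": "string"}}
run2 = {"/users/{id}/profile": {"age": "integer"},
        "/products/search": {"q": "string"}}
print(merge_specs(run1, run2))
# → {'/users/{id}/profile': {'name': 'string', 'age': 'integer'},
#    '/products/search': {'q': 'string'}}
```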

Gotcha

The two-pass architecture is both mitmproxy2swagger’s greatest strength and its most obvious limitation. You cannot fully automate endpoint discovery. If you’re hoping to point this at a week’s worth of production traffic logs and wake up to a complete API specification, you’ll be disappointed. The manual curation step is mandatory—you must edit the YAML file between passes to indicate which endpoints should be documented and how path parameters should be structured.

More fundamentally, the accuracy of your generated OpenAPI spec is bounded by your traffic coverage. If your captured traffic never includes error responses, those won’t be documented. If optional query parameters are never used in your sample requests, they won’t appear in the spec. If certain endpoints are only called under specific conditions you didn’t trigger, they’ll be invisible. This is inherent to black-box reverse-engineering, but it means the tool works best when you can systematically exercise an application’s features rather than passively capturing opportunistic traffic. You’re not documenting the API as it exists—you’re documenting the API as you observed it. Edge cases, admin-only endpoints, and deprecated-but-functional features may remain hidden unless you specifically trigger them during capture.

Verdict

Use mitmproxy2swagger if you’re reverse-engineering third-party APIs without documentation, auditing what a mobile app or legacy system is actually doing, or documenting internal APIs where the source code is inaccessible or too complex to annotate. The two-pass workflow is perfect when you need production-ready OpenAPI specs and are willing to invest 10-15 minutes of manual curation for accurate results. It’s especially valuable for security researchers, integration developers, and teams inheriting undocumented systems. Skip it if you control the source code (use annotation-based tools like tsoa or FastAPI’s built-in OpenAPI generation instead), need fully automated documentation without any manual intervention, or are working with non-REST protocols like GraphQL or gRPC. Also skip it if your traffic is highly dynamic or user-specific to the point where captured examples would be meaningless or dangerous to include in documentation.
