
httpx: The Swiss Army Knife of HTTP Reconnaissance That Actually Scales



Hook

While most developers run curl in a bash loop and call it automation, security researchers are probing thousands of hosts per minute with httpx—extracting TLS certificates, computing favicon hashes, and detecting WAFs in a single pass.

Context

Traditional HTTP reconnaissance has always been a duct-tape affair. You'd chain together curl, grep, awk, and maybe some Python scripts to probe web services at scale. Need to check if 10,000 subdomains are alive? Write a bash loop with curl and pray your script handles timeouts gracefully. Want TLS certificate details? Add openssl s_client to the mix. Interested in response hashes for change detection? More scripting. The toolchain was fragmented, slow, and brittle.

The security research community—bug bounty hunters, penetration testers, red teamers—felt this pain acutely. Their workflows demanded bulk HTTP probing with rich metadata extraction: not just status codes, but technology fingerprints, CDN detection, header analysis, and content hashing. They needed pipeline-ready output (JSON, not human-readable text) and robust error handling for the hostile environments where web application firewalls actively try to break your scanners. httpx emerged from ProjectDiscovery to solve this specific problem: a purpose-built HTTP toolkit that treats reconnaissance as a first-class engineering challenge rather than an afterthought.

Technical Insight

[Figure: System architecture (auto-generated). Input sources (hosts/URLs/CIDR) → input parser & target expander → work queue → rate-limited concurrent workers → retryablehttp → probe modules (status/headers/TLS/tech, title/hash/content) → structured output (JSON/CSV/stdout). Failed requests pass through WAF detection & retry logic; an optional headless browser handles screenshots/JS.]

httpx's architecture is deceptively straightforward: it's a Go binary that reads inputs (hosts, URLs, CIDR ranges), performs concurrent HTTP probes using the retryablehttp library, and writes structured output. But the devil is in the details—specifically, how it handles concurrency, failure modes, and probe modularity.
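The pipeline in the diagram (expand inputs, queue them, let a bounded set of goroutines probe them at a controlled rate) is a common Go pattern. The sketch below is a minimal illustration under that assumption, not httpx's actual code: `expandCIDR` and `probeAll` are hypothetical names, and the probe function is a stub that sleeps instead of making HTTP requests.

```go
package main

import (
	"fmt"
	"net/netip"
	"sync"
	"time"
)

// expandCIDR lists every address in a CIDR range (hypothetical helper;
// very large IPv6 prefixes would need streaming instead of a slice).
func expandCIDR(cidr string) ([]string, error) {
	prefix, err := netip.ParsePrefix(cidr)
	if err != nil {
		return nil, err
	}
	prefix = prefix.Masked()
	var hosts []string
	for a := prefix.Addr(); prefix.Contains(a); a = a.Next() {
		hosts = append(hosts, a.String())
	}
	return hosts, nil
}

// probeAll runs probe on every target with at most `workers` goroutines
// (workers > 0) and starts at most `perSecond` probes per second (perSecond > 0).
func probeAll(targets []string, workers, perSecond int, probe func(string) string) []string {
	jobs := make(chan string)
	results := make(chan string)
	ticker := time.NewTicker(time.Second / time.Duration(perSecond))
	defer ticker.Stop()

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for t := range jobs {
				results <- probe(t)
			}
		}()
	}

	// Producer: release one target per tick, which enforces the rate limit.
	go func() {
		defer close(jobs)
		for _, t := range targets {
			<-ticker.C
			jobs <- t
		}
	}()

	// Close results once every worker has drained the queue.
	go func() {
		wg.Wait()
		close(results)
	}()

	var out []string
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	hosts, err := expandCIDR("10.0.0.0/30") // 10.0.0.0 through 10.0.0.3
	if err != nil {
		panic(err)
	}
	var mu sync.Mutex
	inFlight, peak := 0, 0
	probe := func(h string) string {
		mu.Lock()
		inFlight++
		if inFlight > peak {
			peak = inFlight
		}
		mu.Unlock()
		time.Sleep(50 * time.Millisecond) // stand-in for a real HTTP request
		mu.Lock()
		inFlight--
		mu.Unlock()
		return "probed " + h
	}
	results := probeAll(hosts, 2, 100, probe)
	fmt.Println(len(results), "results, peak concurrency", peak)
}
```

Here the rate limit lives in the producer, so workers never coordinate with each other; a token bucket such as `golang.org/x/time/rate` would additionally allow short bursts.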

The core design uses Go's goroutines for concurrency, bounded by explicit limits. Unlike naive implementations that spawn unlimited workers and overwhelm the target infrastructure (or your own network), httpx exposes both a concurrency cap (-threads) and a requests-per-second cap (-rate-limit). Here's a typical invocation that demonstrates its pipeline integration:

# Read subdomains from stdin, probe with 50 concurrent threads,
# extract title, status code, content length, technologies, and TLS data
cat subdomains.txt | httpx -threads 50 -title -status-code -content-length -tech-detect -tls-grab -json -o results.json

This command showcases httpx's stdin-to-stdout philosophy—it's designed to sit in the middle of Unix pipelines. The -json flag outputs structured data that downstream tools can consume without regex parsing:

{
  "timestamp": "2024-01-15T10:30:45Z",
  "url": "https://api.example.com",
  "input": "api.example.com",
  "title": "Example API Gateway",
  "status_code": 200,
  "content_length": 1847,
  "tech": ["Nginx:1.21.0", "Express"],
  "tls": {
    "subject_cn": "*.example.com",
    "issuer_org": ["Let's Encrypt"],
    "not_after": "2024-04-15T00:00:00Z"
  }
}

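The modular probe system is where httpx truly shines. Each flag (-title, -tech-detect, -favicon, -jarm) activates a specific probe module. Probes that work on the page itself (title, headers, technology detection, body hashes) all run against the same HTTP response, so enabling more of them adds parsing work rather than round-trips. That matters when probing thousands of hosts, where every extra round-trip per target adds up. A few probes inherently need their own traffic: -favicon fetches the favicon, and -jarm performs its own series of TLS handshakes.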

The retryablehttp library provides the reliability foundation. It implements configurable retries with backoff between attempts, plus connection pooling. When a request fails due to a transient network issue, httpx can retry it (the -retries flag sets how many times) instead of immediately writing the host off. Against WAFs and load balancers that throttle aggressive scanners, retries with backoff, combined with a conservative -rate-limit, help a long scan finish with fewer silently dropped hosts.

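One particularly clever feature is JARM fingerprinting integration. JARM is an active TLS fingerprinting technique: it sends 10 specially crafted TLS ClientHello messages (varying TLS versions, cipher suites, and extensions) and hashes the server's responses into a single fingerprint. Because that fingerprint reflects how the server's TLS stack is implemented and configured, it lets you cluster web services by their underlying technology even when they don't expose identifying headers: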

# Identify all servers with the same JARM fingerprint (likely same tech stack)
httpx -l hosts.txt -jarm -json | jq -r '.jarm' | sort | uniq -c | sort -rn

The headless browser integration (via Chrome DevTools Protocol) bridges traditional HTTP probing and modern JavaScript-heavy applications. Many SPAs return empty HTML shells to standard HTTP clients, rendering traditional probing useless. httpx can spawn a headless Chrome instance, execute JavaScript, and capture the rendered DOM:

# Screenshot all responsive hosts with JavaScript rendering
cat hosts.txt | httpx -screenshot -system-chrome -json

This saves a screenshot image for each host alongside the JSON results. The tradeoff is resource consumption (spinning up browser instances is expensive), but for reconnaissance where you need to see what the application actually renders, it's invaluable.
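Given that cost, it can pay to send only likely SPA shells to the headless pass. The heuristic below is a hypothetical triage step, not an httpx feature: it strips `<script>`/`<style>` blocks and tags with regular expressions and flags pages that load scripts but carry almost no server-rendered text. The 40-character threshold is an arbitrary assumption to tune on real data.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	scriptOrStyle = regexp.MustCompile(`(?is)<(script|style)\b[^>]*>.*?</(script|style)>`)
	anyTag        = regexp.MustCompile(`(?s)<[^>]*>`)
	hasScript     = regexp.MustCompile(`(?i)<script\b`)
)

// visibleText removes scripts, styles, and tags, then collapses whitespace.
func visibleText(html string) string {
	s := scriptOrStyle.ReplaceAllString(html, " ")
	s = anyTag.ReplaceAllString(s, " ")
	return strings.Join(strings.Fields(s), " ")
}

// looksLikeSPAShell flags pages that ship scripts but little rendered text.
func looksLikeSPAShell(html string, minChars int) bool {
	return hasScript.MatchString(html) && len(visibleText(html)) < minChars
}

func main() {
	shell := `<html><head><title>App</title></head><body><div id="root"></div><script src="/main.js"></script></body></html>`
	page := `<html><body><h1>Pricing</h1><p>Plans start at ten dollars per month for small teams.</p></body></html>`
	fmt.Println(looksLikeSPAShell(shell, 40), looksLikeSPAShell(page, 40)) // true false
}
```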

The input flexibility deserves mention. Beyond simple host lists, httpx accepts CIDR ranges (10.0.0.0/24), ASNs, and even raw HTTP request files such as those saved from Burp Suite. This last feature is particularly useful for pentesting: capture a request in Burp's proxy, save it, and httpx can replay it against a list of target hosts:

# Replay a Burp request against multiple hosts
httpx -request burp-request.txt -l target-hosts.txt

This transforms httpx from a simple prober into a flexible HTTP client that respects complex request structures (custom headers, authentication, POST bodies) while maintaining its bulk-processing capabilities.

Gotcha

The documentation prominently warns that httpx should not be run as a service, and this isn't paranoia—it's a legitimate security concern. The tool accepts arbitrary URLs and makes network requests to them. Exposing this as a web service without extensive input validation and sandboxing is an open SSRF (Server-Side Request Forgery) vulnerability. You could inadvertently create a tool that lets attackers probe your internal network or pivot through your infrastructure. httpx is designed as a CLI tool for interactive or scripted use by trusted operators, not as a daemon accepting untrusted input.

The breaking changes between versions can be jarring. ProjectDiscovery actively develops httpx, and new releases sometimes change flag names, output formats, or default behaviors. If you're building automation around httpx, pin your version and review changelogs carefully before upgrading. A script that works with v1.2.5 might break subtly with v1.3.0 because a JSON field was renamed or a probe module was refactored. This is common in fast-moving security tools but can be frustrating in production environments.
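Pinning is the first defence (for example, installing a specific tag with `go install github.com/projectdiscovery/httpx/cmd/httpx@<version>` rather than `@latest`). A second is making your consumer fail loudly when a field it relies on disappears, because decoding JSON into a Go struct silently leaves a renamed key's field at its zero value. A hypothetical sketch: `requiredFields` stands for whatever your pipeline depends on and should list only keys your pinned version always emits.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// requiredFields lists the JSON keys this hypothetical pipeline depends on.
var requiredFields = []string{"url", "input", "status_code"}

// missingFields returns the required keys absent from one JSON output line.
func missingFields(line []byte) ([]string, error) {
	var rec map[string]json.RawMessage
	if err := json.Unmarshal(line, &rec); err != nil {
		return nil, err
	}
	var missing []string
	for _, f := range requiredFields {
		if _, ok := rec[f]; !ok {
			missing = append(missing, f)
		}
	}
	return missing, nil
}

func main() {
	ok := []byte(`{"url":"https://api.example.com","input":"api.example.com","status_code":200}`)
	renamed := []byte(`{"url":"https://api.example.com","input":"api.example.com","status-code":200}`)
	for _, line := range [][]byte{ok, renamed} {
		missing, err := missingFields(line)
		fmt.Println(missing, err)
	}
}
```

Running a check like this against a sample of output before upgrading turns a subtle field rename into an explicit failure.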

Resource consumption with headless features enabled is non-trivial. Enabling screenshots or JavaScript rendering means spawning Chrome instances, which each consume several hundred megabytes of RAM. Probing 10,000 hosts with screenshots enabled isn't a matter of running httpx -screenshot and walking away—you need to carefully tune concurrency, potentially batch your inputs, and monitor system resources. The tool gives you power but expects you to wield it responsibly.
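A back-of-the-envelope sizing step helps here. The sketch below assumes about 6 GB of RAM set aside for browsers and roughly 400 MB per Chrome instance (illustrative figures, not measurements), divides one by the other to cap concurrency, and splits a 10,000-host list into batches of 1,000 that can each be written to a file and probed separately. `browserConcurrency` and `batches` are hypothetical helpers.

```go
package main

import "fmt"

// browserConcurrency returns how many browsers fit in budgetMB at
// perBrowserMB each, never less than 1.
func browserConcurrency(budgetMB, perBrowserMB int) int {
	if perBrowserMB <= 0 {
		return 1
	}
	n := budgetMB / perBrowserMB
	if n < 1 {
		return 1
	}
	return n
}

// batches splits hosts into consecutive chunks of at most size entries.
func batches(hosts []string, size int) [][]string {
	if size < 1 {
		return [][]string{hosts}
	}
	var out [][]string
	for len(hosts) > size {
		out = append(out, hosts[:size:size])
		hosts = hosts[size:]
	}
	if len(hosts) > 0 {
		out = append(out, hosts)
	}
	return out
}

func main() {
	workers := browserConcurrency(6*1024, 400) // assumed budget and per-instance cost
	hosts := make([]string, 10000)
	for i := range hosts {
		hosts[i] = fmt.Sprintf("host%d.example.com", i)
	}
	chunks := batches(hosts, 1000)
	fmt.Println(workers, len(chunks)) // 15 10
}
```

Each batch could then be run with something like `httpx -l batch-01.txt -screenshot -threads 15`, assuming each concurrent worker may hold a browser page; measure actual memory use on a small batch before scaling up.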

Verdict

Use if: You're doing security reconnaissance, asset discovery, or monitoring at scale and need rich metadata beyond basic HTTP responses. httpx excels in bug bounty workflows, penetration testing enumeration phases, and security monitoring pipelines where you need to track changes across large web estates. It's ideal when you're already working in the command line and value pipeline integration over GUI convenience. If you need to extract TLS details, detect technologies, compute content hashes, or fingerprint servers across hundreds or thousands of targets, httpx is purpose-built for this.

Skip if: You need a stable API for programmatic integration in production services; the security warnings about running as a service are real. Also skip if you're looking for a GUI tool or need enterprise support contracts. For simple one-off HTTP requests, curl is still simpler. If you're primarily doing vulnerability scanning rather than reconnaissance, jump straight to Nuclei (which can consume httpx output). Finally, if you can't accept breaking changes between versions or need LTS-style stability guarantees, httpx's rapid development cycle may not align with your requirements.
