Bridging HTTPX and BBRF: A Unix Pipeline Approach to Reconnaissance Data Flow

Hook

Discovered assets are easy to lose between reconnaissance tools. The gap between finding a subdomain and tracking it in your asset database is where leads disappear, often permanently.

Context

Bug bounty reconnaissance involves multiple specialized tools working in sequence. HTTPX, ProjectDiscovery's HTTP probing tool, excels at validating live hosts and extracting HTTP metadata from large domain lists. BBRF (Bug Bounty Reconnaissance Framework) by honoki provides a centralized database for tracking discovered assets across programs and over time. Both tools are widely used, but they speak different languages.

The problem surfaces during workflow automation. HTTPX outputs rich JSON data about probed hosts—status codes, titles, technologies, response bodies. BBRF expects assets in its own format via command-line client calls. The manual bridge involves parsing JSON with jq, extracting relevant fields, and invoking BBRF commands for each discovered asset. This works for one-off scans but breaks down in continuous reconnaissance pipelines where hundreds of domains flow through hourly. You need something that handles the stream reliably, doesn't drop data, and fits naturally into Unix pipes. httpx2bbrf emerged from this operational pain point—a focused utility that does format translation and nothing else.

Technical Insight

httpx2bbrf follows the classic Unix filter pattern: read from stdin, process each line as JSON, invoke external commands with parsed data, repeat until EOF. The architecture is deliberately minimal—no configuration files, no state management, no complex error recovery. It's designed to slot into existing pipelines without friction.

The core flow reads HTTPX's JSON output line by line. HTTPX's -json flag emits one JSON object per probed host, which makes it well suited to streaming. Each object contains fields like url, host, status-code (status_code in recent HTTPX releases), title, and technology fingerprints. httpx2bbrf extracts the url field, or the host field when url is empty, and passes it to BBRF's command-line client:

scanner := bufio.NewScanner(os.Stdin)
for scanner.Scan() {
    var result HTTPXResult
    if err := json.Unmarshal(scanner.Bytes(), &result); err != nil {
        continue // Skip malformed JSON
    }
    
    // Extract URL or fallback to host
    target := result.URL
    if target == "" {
        target = result.Host
    }
    
    // Invoke BBRF client
    cmd := exec.Command("bbrf", "url", "add", target)
    cmd.Run() // return value discarded: a failed bbrf call goes unnoticed
}

This snippet illustrates the design philosophy. The tool unmarshals JSON using Go's standard library, performs minimal validation (checking for empty strings), and shells out to BBRF via exec.Command. There's no retry logic, no rate limiting, no batching—just straightforward transformation and forwarding.
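
The snippet leaves HTTPXResult undefined. Below is a minimal sketch of what the type and the fallback logic could look like, assuming only the url and host fields matter. The struct layout and the targetFromLine helper are illustrative names, not taken from the repository. Unlike the loop above, this version also skips records where both fields are empty instead of forwarding an empty string.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// HTTPXResult is a hypothetical struct holding only the two fields the
// loop reads. Any other fields in the httpx output are ignored by
// encoding/json.
type HTTPXResult struct {
	URL  string `json:"url"`
	Host string `json:"host"`
}

// targetFromLine parses one line of httpx JSON output and returns the URL,
// falling back to the host. ok is false for malformed or empty records.
func targetFromLine(line []byte) (target string, ok bool) {
	var r HTTPXResult
	if err := json.Unmarshal(line, &r); err != nil {
		return "", false
	}
	if r.URL != "" {
		return r.URL, true
	}
	if r.Host != "" {
		return r.Host, true
	}
	return "", false
}

func main() {
	// Sample lines standing in for httpx output; the values are made up.
	samples := []string{
		`{"url":"https://app.example.com","host":"app.example.com","title":"Login"}`,
		`{"host":"old.example.com"}`,
		`not json`,
	}
	for _, s := range samples {
		target, ok := targetFromLine([]byte(s))
		fmt.Printf("target=%q ok=%v\n", target, ok)
	}
}
```

Keeping the struct to two fields means the parser does not depend on how the remaining HTTPX fields are named.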

The decision to use exec.Command rather than BBRF's underlying API is practical. BBRF's architecture involves a Python client that communicates with a CouchDB backend through documented commands. By invoking the CLI, httpx2bbrf avoids dependency management (no Python imports), respects BBRF's configuration system (credentials, server endpoints), and remains stable across BBRF updates. The tradeoff is execution overhead—spawning a process per asset adds latency—but for reconnaissance workflows processing dozens to hundreds of hosts per scan, this overhead is negligible compared to network probe time.

Go's concurrency primitives aren't leveraged here because the bottleneck is external: BBRF's database operations and network round-trips dwarf JSON parsing time. The sequential approach also preserves ordering, which matters when debugging reconnaissance pipelines. If you see an asset in BBRF, you know it appeared in HTTPX output at that position in the stream.

Integration into workflows looks like this:

cat subdomains.txt | \
  httpx -json -silent | \
  httpx2bbrf

Or for continuous monitoring:

while true; do
  subfinder -d target.com -silent | \
    httpx -json -silent -timeout 5 | \
    httpx2bbrf
  sleep 3600
done

The tool's simplicity makes it composable. You can filter HTTPX output with jq before piping to httpx2bbrf, aggregate results with tee, or fan-out to multiple BBRF programs simultaneously. The lack of features becomes a strength in complex pipelines where predictability matters more than sophistication.

One subtle detail: the tool handles HTTPX's different output modes gracefully. When HTTPX probes fail, it may output JSON with limited fields (just host and error details). httpx2bbrf's fallback logic—trying url first, then host—ensures failed probes still register in BBRF as discovered domains, even if they're not currently live. This supports tracking asset lifecycles over time, where a domain offline today might resolve tomorrow.

Gotcha

The author's disclaimer about code quality isn't false modesty. Examining the repository reveals no tests, minimal documentation, and error handling that silently continues on failures. When BBRF commands fail—due to network issues, authentication problems, or rate limits—httpx2bbrf swallows the error and moves to the next asset. For production reconnaissance pipelines, this silent failure mode is dangerous. You might scan 10,000 hosts, believe you've stored them all in BBRF, and discover days later that half the data never arrived because your BBRF token expired mid-run.
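
One way to make such failures visible is sketched below, under stated assumptions. The names addWith and tally and the exit-code convention are hypothetical choices, not part of httpx2bbrf; only the bbrf url add invocation comes from the snippet shown earlier. To keep the example short, input is read as one already-extracted target per line.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// addWith returns a function that runs `<bin> url add <target>` and folds
// the command's output into the returned error.
func addWith(bin string) func(target string) error {
	return func(target string) error {
		out, err := exec.Command(bin, "url", "add", target).CombinedOutput()
		if err != nil {
			return fmt.Errorf("%s url add %q: %w: %s",
				bin, target, err, strings.TrimSpace(string(out)))
		}
		return nil
	}
}

// tally counts successful and failed forwards.
type tally struct{ stored, failed int }

// record updates the counters and logs any error to stderr, so stdout
// stays free for pipeline data.
func (t *tally) record(err error) {
	if err != nil {
		t.failed++
		fmt.Fprintln(os.Stderr, "forward failed:", err)
		return
	}
	t.stored++
}

func main() {
	add := addWith("bbrf")
	var counts tally
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		if target := strings.TrimSpace(sc.Text()); target != "" {
			counts.record(add(target))
		}
	}
	fmt.Fprintf(os.Stderr, "stored=%d failed=%d\n", counts.stored, counts.failed)
	if counts.failed > 0 {
		os.Exit(1) // non-zero exit lets a wrapper script or cron job notice
	}
}
```

With a summary line and a non-zero exit status, an expired token would show up as a failed run rather than as missing data days later.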

The tool also lacks any filtering or deduplication logic. If your pipeline rescans the same domains hourly, BBRF receives duplicate url add commands continuously. While BBRF handles duplicates internally (URLs are unique in the database), this generates unnecessary API traffic and log noise. More sophisticated integration would track which assets are new versus already-known, adding only deltas. The absence of state management means httpx2bbrf can't provide this intelligence—it's a stateless filter by design, for better and worse.
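
A minimal sketch of in-run deduplication follows, written as a separate filter that could sit between HTTPX and httpx2bbrf. The names record, key, and seen are hypothetical. It only remembers targets for the lifetime of one process, so hourly rescans started as separate runs would still need persisted state to send true deltas.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
)

// record holds the two fields used to identify a target.
type record struct {
	URL  string `json:"url"`
	Host string `json:"host"`
}

// key returns the url, or the host when url is empty, for one JSON line.
// It returns "" when the line is malformed or has neither field.
func key(line []byte) string {
	var r record
	if json.Unmarshal(line, &r) != nil {
		return ""
	}
	if r.URL != "" {
		return r.URL
	}
	return r.Host
}

// seen tracks keys already passed through during this run.
type seen map[string]struct{}

// firstTime records k and reports whether it was new.
func (s seen) firstTime(k string) bool {
	if _, dup := s[k]; dup {
		return false
	}
	s[k] = struct{}{}
	return true
}

func main() {
	s := seen{}
	sc := bufio.NewScanner(os.Stdin)
	// Raise the line limit above the 64 KiB default in case lines are long.
	sc.Buffer(make([]byte, 0, 64*1024), 16*1024*1024)
	for sc.Scan() {
		if k := key(sc.Bytes()); k != "" && s.firstTime(k) {
			fmt.Printf("%s\n", sc.Bytes()) // pass the original line through
		}
	}
}
```

Because the filter emits the original JSON lines unchanged, the downstream tool needs no modification.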

Performance becomes an issue at scale. Spawning a BBRF process for every URL works for hundreds of assets but struggles with tens of thousands, and modern reconnaissance can discover 50,000+ subdomains for large targets. As a rough illustration, if each bbrf invocation took half a second (an assumed figure covering interpreter startup and one database round-trip), 50,000 URLs would take about seven hours to store. At that volume, process creation overhead compounds and BBRF's API receives individual inserts instead of batch operations. The tool needs batch processing, for example accumulating 100 URLs and invoking bbrf url add once with all of them as arguments, but the current implementation lacks this optimization.
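
The batching idea can be sketched as follows. The batcher type, the bbrfAddMany helper, and the batch size of 100 are assumptions for illustration; none of this exists in the current tool. As in a plain line filter, input is read as one target per line.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// batcher collects targets and hands them to flush in groups of size.
type batcher struct {
	size  int
	buf   []string
	flush func(batch []string) error
}

// add queues one target and flushes when the batch is full.
func (b *batcher) add(target string) error {
	b.buf = append(b.buf, target)
	if len(b.buf) < b.size {
		return nil
	}
	return b.drain()
}

// drain flushes whatever is queued; call it once more at EOF.
func (b *batcher) drain() error {
	if len(b.buf) == 0 {
		return nil
	}
	batch := b.buf
	b.buf = nil // start a fresh slice so flush may keep the old one
	return b.flush(batch)
}

// bbrfAddMany runs a single `bbrf url add u1 u2 ...` for the whole batch.
func bbrfAddMany(batch []string) error {
	args := append([]string{"url", "add"}, batch...)
	out, err := exec.Command("bbrf", args...).CombinedOutput()
	if err != nil {
		return fmt.Errorf("bbrf url add (%d urls): %w: %s",
			len(batch), err, strings.TrimSpace(string(out)))
	}
	return nil
}

func main() {
	queue := &batcher{size: 100, flush: bbrfAddMany}
	failed := false
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		target := strings.TrimSpace(sc.Text())
		if target == "" {
			continue
		}
		if err := queue.add(target); err != nil {
			fmt.Fprintln(os.Stderr, err)
			failed = true
		}
	}
	if err := queue.drain(); err != nil { // flush the final partial batch
		fmt.Fprintln(os.Stderr, err)
		failed = true
	}
	if failed {
		os.Exit(1)
	}
}
```

The trade-off is granularity: one failed invocation now affects up to a full batch of URLs, and nothing is stored until a batch fills or the input ends.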

Verdict

Use if: You're running small-to-medium bug bounty reconnaissance (under 5,000 assets per scan), already have both HTTPX and BBRF installed and working, and need a quick integration without writing custom scripts. The tool delivers immediate value for standard pipelines and its simplicity makes debugging straightforward. It's perfect for individual hunters or small teams establishing reconnaissance workflows.

Skip if: You're operating at scale (tens of thousands of assets), need robust error handling and monitoring for production environments, or aren't already committed to the BBRF ecosystem. The silent failure mode and lack of batch processing make it unsuitable for large-scale continuous reconnaissance. In those scenarios, invest in custom tooling using BBRF's API directly with proper error handling, retry logic, and batch operations, or adopt comprehensive automation frameworks like Axiom that handle tool orchestration natively with observability built in.
