Mining Bug Bounty Scopes at Scale: Inside bounty-targets-data's Automated Intelligence Pipeline

Hook

Every 30 minutes, a silent commit lands in a GitHub repository that 3,700+ security researchers monitor for one thing: discovering which Fortune 500 companies just opened their doors to ethical hackers.

Context

Bug bounty hunters face a tedious operational problem: tracking which assets are in scope across multiple platforms. A company might run programs on HackerOne, Bugcrowd, and Intigriti simultaneously, each with different scopes. Before bounty-targets-data, researchers needed custom scrapers for each platform, dealing with rate limits, authentication challenges, and format inconsistencies. Worse, scope changes happened invisibly—a company might add new domains or retire old ones, and you'd only discover this by manually checking program pages.

Arkadiyt's bounty-targets-data solves this through radical simplicity: it's a data repository, not a software project. Every 30 minutes, automated scrapers (maintained in a separate bounty-targets repository) pull fresh scope data from six major platforms—HackerOne, Bugcrowd, Intigriti, YesWeHack, Federacy, and HackenProof—then commit the results directly to this repo. The approach transforms GitHub from version control into a free hosting platform, CDN, and time-series database simultaneously. For reconnaissance automation, this is gold: you can clone once, pull frequently, and have a constantly-updated target list without writing a single line of scraping code.

Technical Insight

The repository's architecture is deceptively simple but strategically brilliant. Rather than building an API or database, it leverages Git's inherent capabilities. The file structure divides into two categories: aggregated convenience files and platform-specific raw dumps.

The aggregated files—domains.txt and wildcards.txt—combine scope data across all platforms into simple newline-delimited lists. This makes integration trivial for reconnaissance tools:

# Feed directly into subdomain enumeration (subfinder reads domains from stdin)
curl -s https://raw.githubusercontent.com/arkadiyt/bounty-targets-data/main/data/domains.txt | \
  subfinder -silent | \
  httpx -silent -threads 100

# Or monitor for new targets
git clone --depth 1 https://github.com/arkadiyt/bounty-targets-data
cd bounty-targets-data
while true; do
  old_head=$(git rev-parse HEAD)  # Remember the commit we had before pulling
  git pull --quiet origin main
  # Diff against the pre-pull commit so pulls spanning several commits are
  # covered and an unchanged HEAD yields nothing; drop the '+++' header line.
  git diff "$old_head" HEAD -- data/domains.txt | grep '^+' | grep -v '^+++' | sed 's/^+//' > new_targets.txt
  if [ -s new_targets.txt ]; then
    # Alert on new domains
    notify-send "New bounty targets found"
  fi
  sleep 1800  # Check every 30 minutes
done
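
The same new-target check can also be done without Git by comparing two saved snapshots of domains.txt. The following is a minimal sketch under that assumption; the function names parse_targets and new_targets are hypothetical helpers for illustration, not part of the repository or its scrapers.

```python
from __future__ import annotations


def parse_targets(text: str) -> set[str]:
    """Parse a newline-delimited target list into a set of lowercase entries."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def new_targets(previous: str, current: str) -> list[str]:
    """Return entries present in `current` but absent from `previous`, sorted."""
    return sorted(parse_targets(current) - parse_targets(previous))


if __name__ == "__main__":
    # Illustrative snapshots only; real input would be two copies of domains.txt.
    old_snapshot = "example.com\napi.example.com\n"
    new_snapshot = "example.com\napi.example.com\nshop.example.org\n"
    print(new_targets(old_snapshot, new_snapshot))  # ['shop.example.org']
```

Working on sets rather than on diff output means reordered lines do not show up as false additions.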

The platform-specific JSON files contain richer metadata. For example, the HackerOne dump includes program handles, eligibility criteria, and structured scope objects. Here's how you might parse it for high-value targets:

import requests

# Fetch the latest HackerOne data
url = 'https://raw.githubusercontent.com/arkadiyt/bounty-targets-data/main/data/hackerone_data.json'
response = requests.get(url, timeout=30)
response.raise_for_status()  # Fail loudly on HTTP errors instead of parsing an error page
programs = response.json()

# Find programs offering bounties (not just VDPs) with web targets
high_value = []
for program in programs:
    if not program.get('offers_bounties', False):
        continue

    for target in program.get('targets', {}).get('in_scope', []):
        if target.get('asset_type') == 'URL':
            high_value.append({
                'program': program['handle'],
                'target': target['asset_identifier'],
                'max_severity': target.get('max_severity', 'unknown')
            })

# Result: filterable list of paying programs with web scope
print(f"Found {len(high_value)} high-value targets")

The Git history provides an underutilized feature: time-series analysis. Since every change is committed, you can track when companies expand or contract their programs:

# See when example.com entered any bug bounty program
# (needs full history; in a --depth 1 clone, run `git fetch --unshallow` first)
git log -p --all -S 'example.com' -- data/domains.txt

# List the 20 most recent commits that touched the HackerOne dump
git log --oneline -- data/hackerone_data.json | head -20

This turns Git into a free monitoring system. Security researchers use this to identify companies newly entering the bug bounty space—often the best time to find vulnerabilities before programs mature and competition intensifies.
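
To illustrate the time-series idea, here is a small sketch that works out when each target first appeared. It assumes you have already exported historical copies of domains.txt as (date, contents) pairs in chronological order, for example with git show <commit>:data/domains.txt. The function name first_seen and the sample dates are hypothetical.

```python
from __future__ import annotations

from typing import Iterable


def first_seen(snapshots: Iterable[tuple[str, str]]) -> dict[str, str]:
    """Map each target to the date of the earliest snapshot that contains it.

    `snapshots` must yield (date, file_contents) pairs in chronological order.
    """
    seen: dict[str, str] = {}
    for date, contents in snapshots:
        for line in contents.splitlines():
            target = line.strip()
            # Record only the first date on which a target shows up.
            if target and target not in seen:
                seen[target] = date
    return seen


if __name__ == "__main__":
    # Made-up history for illustration; not real repository data.
    history = [
        ("2024-01-01", "example.com\n"),
        ("2024-01-02", "example.com\nshop.example.org\n"),
    ]
    for target, date in first_seen(history).items():
        print(date, target)
```

Sorting the result by date gives a rough feed of recently added targets, which is the signal described above.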

The update frequency of 30 minutes strikes a pragmatic balance. Platforms don't change scope that rapidly, but when they do, half-hour latency is acceptable for most reconnaissance workflows. The implementation commits only when changes are detected, preventing empty commits that would bloat the repository unnecessarily.

One clever detail: the repository includes both individual platform dumps and merged aggregates. This supports two distinct use cases. The aggregates (domains.txt) serve quick automation and broad reconnaissance. The platform-specific JSON files enable sophisticated filtering—like targeting only programs with specific reward ranges or excluding VDPs (Vulnerability Disclosure Programs) that don't pay bounties.

Gotcha

The most dangerous pitfall is treating wildcard scopes as gospel. The wildcards.txt file contains entries like *.example.com, but bug bounty programs almost always have exclusions. A program might include *.company.com while explicitly excluding admin.company.com or internal.company.com. Testing these excluded assets can result in scope violations, damaged reputation, or even legal trouble. The repository can't capture these exclusions consistently because platforms represent them differently—some in free-text fields, others in separate exclusion lists. Always verify the actual program page before testing wildcard domains.
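
One way to make this check explicit in tooling is to filter candidate hosts against an exclusion list that you compile by hand from the program page. The sketch below assumes exactly that: the exclusions are supplied by you, not read from the repository. The function names are hypothetical, only patterns with a leading "*." are handled, and the bare apex domain is treated as not covered by the wildcard, which is an assumption that individual programs may define differently.

```python
from __future__ import annotations


def matches_wildcard(host: str, pattern: str) -> bool:
    """Return True if `host` equals `pattern` or falls under a '*.domain' pattern."""
    host = host.lower().rstrip(".")
    pattern = pattern.lower()
    if pattern.startswith("*."):
        suffix = pattern[1:]  # e.g. '.company.com'
        # Require a dot boundary so 'evilcompany.com' does not match.
        return host.endswith(suffix) and len(host) > len(suffix)
    return host == pattern


def is_testable(host: str, wildcards: list[str], exclusions: list[str]) -> bool:
    """A host is testable only if a wildcard matches and no exclusion matches."""
    if any(matches_wildcard(host, ex) for ex in exclusions):
        return False
    return any(matches_wildcard(host, wc) for wc in wildcards)


if __name__ == "__main__":
    wildcards = ["*.company.com"]
    exclusions = ["admin.company.com", "internal.company.com"]
    print(is_testable("shop.company.com", wildcards, exclusions))   # True
    print(is_testable("admin.company.com", wildcards, exclusions))  # False
```

Checking exclusions first means an out-of-scope host is rejected even when a broad wildcard would otherwise match it. A filter like this only reduces mistakes; it does not replace reading the program page.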

Repository bloat is a growing concern. Because commits land as often as every 30 minutes, the Git history grows continuously. With more than three years of history accumulated, shallow clones (git clone --depth 1) are essential for reasonable download sizes. This matters for CI/CD pipelines that clone frequently: a full clone can reach hundreds of megabytes, while a shallow clone stays under 50MB. If you're building automation, consider fetching raw files via curl rather than cloning the entire repository.
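
For automation written in Python, the same raw-file approach might look like the sketch below. It reuses the raw.githubusercontent.com URL pattern from the curl example earlier; the helper names raw_url and fetch_lines are hypothetical, and the 30-second timeout is an arbitrary choice.

```python
from __future__ import annotations

from urllib.request import urlopen

# Base URL taken from the curl example in this article.
RAW_BASE = "https://raw.githubusercontent.com/arkadiyt/bounty-targets-data/main/data/"


def raw_url(filename: str) -> str:
    """Build the raw-content URL for a file in the repository's data/ directory."""
    return RAW_BASE + filename.lstrip("/")


def fetch_lines(filename: str, timeout: float = 30.0) -> list[str]:
    """Download one data file and return its non-empty lines."""
    with urlopen(raw_url(filename), timeout=timeout) as response:
        text = response.read().decode("utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]
```

Calling fetch_lines("domains.txt") downloads only that one file, so the size of the Git history never enters the picture.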

Data completeness varies by platform. Some platforms expose comprehensive public APIs (HackerOne), while others require scraping HTML or have restricted access. Private programs—those requiring invitations—won't appear in this dataset at all, which excludes a significant portion of the bug bounty landscape. Additionally, platforms like Synack and Cobalt operate primarily on private programs, so they're not represented here. The dataset captures the public bug bounty surface, but that's a subset of the total opportunity space.

Verdict

Use if: You're building reconnaissance automation, running continuous asset discovery, or need a free, no-auth-required source of bug bounty scope data. This is essential infrastructure for any serious bug bounty hunter's toolkit: integrating it into your workflow takes minutes and immediately expands your target visibility across six platforms. It's also perfect for security researchers tracking the bug bounty ecosystem's growth or analyzing which companies invest in public security programs.

Skip if: You need legal certainty about scope boundaries (always verify on program pages), require data from private programs or platforms not covered here, or can't tolerate 30-minute data latency for time-sensitive operations. Also skip if you're building a commercial service. The repository's license and growing size make direct API access to platforms more appropriate for production use cases with high reliability requirements.
