Curate: Mining VirusTotal's URL Archive for Bug Bounty Reconnaissance
Hook
VirusTotal scans billions of URLs annually—but most security researchers never tap into this goldmine for historical endpoint discovery. A 50-line shell script changes that.
Context
Bug bounty hunters and penetration testers face a fundamental challenge: discovering the full attack surface of a target application. Modern web apps expose hundreds or thousands of endpoints, but many remain undocumented—old admin panels, forgotten APIs, development endpoints accidentally pushed to production. Traditional crawling only reveals what's currently linked, missing the historical context of what once existed.
This is where archived URL datasets become invaluable. While tools like the Wayback Machine became the de facto standard for historical reconnaissance, VirusTotal's URL archive remained an underutilized resource. VirusTotal continuously crawls and analyzes URLs as part of its malware detection operations, building a massive dataset of observed web endpoints. Curate emerged as a simple interface to this dataset, giving security researchers a lightweight way to query VirusTotal's archives without writing custom API integration code. Created by Ed Foudil (EdOverflow), a well-known bug bounty hunter and security researcher, curate embodies the Unix philosophy: do one thing simply and compose with other tools.
Technical Insight
Curate's architecture is refreshingly straightforward: it is essentially a 50-line wrapper around curl and jq that turns VirusTotal API responses into a usable reconnaissance dataset. The core logic requests VirusTotal's /domains/{domain}/urls endpoint, follows the pagination cursor until no pages remain, and extracts the URLs from each JSON response.
Here's a simplified view of how curate operates:
#!/bin/bash
# Usage: ./curate.sh <domain> <api-key>
DOMAIN=$1
API_KEY=$2
BASE_URL="https://www.virustotal.com/api/v3/domains/${DOMAIN}/urls?limit=40"
: > curate.txt  # Start from an empty output file
CURSOR=""
while :; do
  URL="$BASE_URL"
  # Append the pagination cursor on every page after the first
  [ -n "$CURSOR" ] && URL="${BASE_URL}&cursor=${CURSOR}"
  # Query VirusTotal API for one page of domain URLs; stop if curl itself fails
  PAGE=$(curl -s --request GET --url "$URL" --header "x-apikey: ${API_KEY}") || break
  # Extract the URLs from this page
  printf '%s' "$PAGE" | jq -r '.data[]?.attributes.url // empty' >> curate.txt
  # Update cursor for next iteration (URL-encoded); empty means no more pages
  CURSOR=$(printf '%s' "$PAGE" | jq -r '.meta.cursor // empty | @uri')
  [ -z "$CURSOR" ] && break
done
The tool's strength lies in its composability with standard Unix utilities. Security researchers typically pipe curate's output through additional filtering layers. The repository includes a -r flag that filters results for common bug bounty patterns: parameters that might indicate vulnerability vectors like redirects, file operations, or authentication flows. Comparable filters are easy to write by hand:
# Find URLs with interesting parameters
cat curate.txt | grep -E '(redirect|url=|file=|path=|callback=)'
# Extract unique parameter names for fuzzing
cat curate.txt | unfurl keys | sort -u
# Find potential API endpoints
cat curate.txt | grep -E '/api/|/v[0-9]/'
# Discover forgotten admin panels
cat curate.txt | grep -E '(admin|dashboard|panel|manage)'
The design decision to output to a persistent file (curate.txt) rather than stdout enables iterative reconnaissance workflows. Researchers can run curate once, then experiment with different grep patterns, regex filters, and post-processing scripts without hitting API rate limits repeatedly. This file-based approach also integrates naturally with other reconnaissance tools:
# Combine curate with httpx to find live endpoints
curate example.com $VT_API_KEY
cat curate.txt | httpx -silent -status-code -title
# Replay parameterized URLs through ffuf, keeping those that answer 200, 301, or 302
cat curate.txt | grep '=' | ffuf -w - -u FUZZ -mc 200,301,302
# Extract and test all JavaScript file URLs
cat curate.txt | grep '\.js$' | wget -i - && \
grep -r 'api_key\|password\|secret' *.js
The VirusTotal API returns rich metadata beyond just URLs—timestamps, HTTP status codes, submission context—but curate deliberately discards this, focusing solely on URL extraction. This decision reduces complexity but limits advanced use cases like temporal analysis ("show me endpoints that existed in 2020 but disappeared in 2021") or filtering by HTTP response patterns.
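Keeping that metadata takes little more than a different jq filter. The sketch below is not part of curate: the function name `extract_with_metadata` is hypothetical, and the attribute names `first_submission_date`, `last_submission_date`, and `last_http_response_code` are assumptions about the URL object schema that should be checked against the current VirusTotal API reference.

```shell
#!/bin/bash
# Hypothetical helper, not part of curate: reads one VirusTotal API page on
# stdin and prints tab-separated rows: first_seen, last_seen, status, url.
# The attribute names are assumptions about the URL object schema.
extract_with_metadata() {
  jq -r '
    .data[]?.attributes
    | [ (.first_submission_date // 0 | todate),
        (.last_submission_date  // 0 | todate),
        (.last_http_response_code // "-"),
        .url ]
    | @tsv'
}
```

Because ISO 8601 timestamps sort lexicographically, a plain string comparison such as `awk -F'\t' '$2 < "2021-01-01"'` over the saved rows is then enough to list endpoints that were last submitted before 2021.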
One subtle architectural detail: curate doesn't implement exponential backoff or rate limit handling. The VirusTotal API enforces request quotas (typically 4 requests per minute for free tier accounts, more for premium), and curate will simply fail if you exceed these limits. For small-scale manual reconnaissance, this isn't problematic—querying a single domain rarely triggers rate limits. But attempting to batch-process multiple domains requires external rate limiting:
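# Manual rate limiting for batch processing
while IFS= read -r domain; do
./curate.sh "$domain" "$API_KEY"
# curate.txt is overwritten on every run, so keep a per-domain copy
mv curate.txt "curate-${domain}.txt"
sleep 15 # Crude spacing between domains; a paginated run can still exceed 4 requests/minute
done < domains.txt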
This lack of error handling is characteristic of quick shell tooling: it is fast to write and easy to read, but retries and structured error reporting have to be bolted on by hand instead of coming from a language's error types and libraries. The repository description mentions plans to "rewrite in Go", which suggests the original author recognized these limitations, though that rewrite never materialized.
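The missing retry logic can be approximated outside curate. The following is a minimal sketch, not anything the repository ships: `retry_with_backoff` is a hypothetical helper that assumes the wrapped command exits non-zero when it fails.

```shell
#!/bin/bash
# Hypothetical wrapper, not part of curate: retry a command with exponential
# backoff. Usage: retry_with_backoff <max_attempts> <initial_delay_s> <cmd...>
retry_with_backoff() {
  local max_attempts=$1 delay=$2 attempt=1
  shift 2
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after ${attempt} attempts: $*" >&2
      return 1
    fi
    echo "attempt ${attempt} failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))       # Double the wait after every failure
    attempt=$((attempt + 1))
  done
}
```

Since curate itself does not signal quota errors through its exit status, the wrapper is most useful around the underlying request, for example `retry_with_backoff 5 15 curl -sf ...`, where curl's `-f` flag turns HTTP error responses into a non-zero exit code.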
Gotcha
Curate's simplicity cuts both ways. The most immediate limitation is the VirusTotal API key requirement—free tier accounts face restrictive rate limits (4 requests/minute, 500 requests/day), making large-scale reconnaissance impractical without a paid subscription. More problematically, the script includes no error handling whatsoever. If your API key is invalid, the quota is exceeded, or the network connection drops mid-query, curate fails silently or produces corrupted output. There's no validation of the API response structure, so schema changes from VirusTotal would break the tool invisibly.
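A minimal sketch of the missing validation, assuming successful responses carry a top-level `data` array and failures carry an `error` object with a `message` field (both are assumptions to verify against the API reference). The helper name `extract_urls_checked` is hypothetical and not part of curate.

```shell
#!/bin/bash
# Hypothetical helper, not part of curate: reads an API response on stdin,
# prints the URLs it contains, and returns non-zero when the body does not
# have the expected {"data": [...]} shape (invalid JSON, error object, or a
# schema change).
extract_urls_checked() {
  local body
  body=$(cat)
  if ! printf '%s' "$body" | jq -e '.data | type == "array"' >/dev/null 2>&1; then
    # Report the API's own error message when one is present
    printf '%s' "$body" | jq -r '.error.message? // "unexpected response"' >&2 2>/dev/null \
      || echo "response is not valid JSON" >&2
    return 1
  fi
  printf '%s' "$body" | jq -r '.data[].attributes.url // empty'
}
```

Piping each page through a check like this makes an invalid key or an exhausted quota stop the run with a message instead of leaving a truncated curate.txt behind.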
The shell script implementation also limits portability and maintainability. Dependencies on specific jq syntax and bash-isms mean it won't run reliably across different shells or operating systems without modification. The hardcoded output filename (curate.txt) causes problems in concurrent workflows: when curate runs against multiple domains simultaneously, each run overwrites or interleaves with the others' results. The repository appears effectively abandoned (41 stars, no recent commits, promised Go rewrite never happened), so don't expect bug fixes or feature additions. For production reconnaissance pipelines, this abandonment is a deal-breaker. You're better served by actively maintained alternatives like gau or waybackurls, which query multiple archive sources, handle errors gracefully, and support structured output formats like JSON. That said, for quick manual reconnaissance where you need archived URLs from VirusTotal specifically, curate's 50-line implementation is easier to audit and modify than spinning up a more complex tool.
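As an illustration of how small such a modification can be, the filename clash can be sidestepped without touching the script at all. This is a sketch under stated assumptions: `curate_isolated` and the `CURATE` variable are hypothetical, the default script path is a placeholder, and the wrapper relies on curate writing curate.txt into the current working directory.

```shell
#!/bin/bash
# Hypothetical wrapper, not part of curate: run the script inside a private
# temporary directory so parallel runs never share curate.txt, then move the
# result to <outdir>/<domain>.txt.
# CURATE must be an absolute path to the script (placeholder default below).
CURATE=${CURATE:-/usr/local/bin/curate.sh}
curate_isolated() {
  local domain=$1 api_key=$2 outdir=${3:-.} workdir
  outdir=$(cd "$outdir" && pwd) || return 1   # Resolve to an absolute path
  workdir=$(mktemp -d) || return 1
  ( cd "$workdir" && "$CURATE" "$domain" "$api_key" ) \
    && [ -f "$workdir/curate.txt" ] \
    && mv "$workdir/curate.txt" "$outdir/$domain.txt"
  local status=$?
  rm -rf "$workdir"                           # Clean up even on failure
  return "$status"
}
```

With this in place, `curate_isolated a.example "$VT_API_KEY" results &` and `curate_isolated b.example "$VT_API_KEY" results &` can run side by side, although the API rate limits discussed earlier still apply to the combined request volume.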
Verdict
Use if: You need a quick, disposable way to pull VirusTotal's archived URLs during manual bug bounty reconnaissance, you already have a VirusTotal API key, and you're comfortable with shell scripting's rough edges. Curate shines in one-off investigations where you want to grep through historical endpoints without building API integration from scratch. It's also useful as a learning reference: reading its source teaches you the VirusTotal API structure in 5 minutes.
Skip if: You need production-ready tooling with error handling and rate limiting, you're processing multiple domains at scale, or you want the richness of querying multiple archive sources (Wayback Machine, Common Crawl, URLScan) in a single command. In those cases, use gau or waybackurls instead; they're actively maintained, more feature-complete, and handle the complexity that curate ignores. Also skip if you don't already have a VirusTotal API key; the signup friction and rate limits make curate impractical for casual exploration compared to tools that query open archives.