Popular-Site-Subdomains: The Bug Bounty Hunter's Secret Wordlist


Hook

A single forgotten subdomain at Facebook once led to a vulnerability disclosure. The challenge? Finding it among thousands of potential endpoints before infrastructure changes make it disappear forever.

Context

Subdomain enumeration is the unglamorous first step of every security assessment. Before you can find vulnerabilities, you need to map the attack surface, and for companies like Google or Amazon that means discovering hundreds or thousands of subdomains hosting everything from legacy applications to staging environments to forgotten APIs. Traditional approaches involve DNS bruteforcing with generic wordlists ("admin", "dev", "staging") or querying Certificate Transparency (CT) logs, but both have blind spots. Generic wordlists miss company-specific naming conventions, while CT logs only reveal hostnames that appear in publicly logged TLS certificates, so hosts covered only by a wildcard certificate, or with no certificate at all, never show up by name.

Jamie Farrelly's Popular-Site-Subdomains takes a different approach: crowdsourcing real subdomain discoveries into curated lists. Each text file represents actual reconnaissance work against a specific domain—facebook.com.txt, google.com.txt, amazon.com.txt—containing subdomains that someone verified existed at some point. For bug bounty hunters and penetration testers, this repository serves as a baseline wordlist that reflects how these companies actually name their infrastructure, not how a wordlist generator thinks they might. It's the reconnaissance equivalent of standing on the shoulders of giants, rather than rediscovering the same subdomains everyone else has already found.

Technical Insight

[Figure: System architecture (auto-generated). The text files repository (facebook.com.txt, google.com.txt, other *.txt files) feeds a user script or tool such as massdns or a custom parser, which reads the subdomain list and queries each subdomain against a DNS resolver. The resolver returns an IP or NXDOMAIN; active subdomains go to the filtered results, and failed lookups are set aside as dead subdomains.]

The repository's power lies in its simplicity. Each file is a newline-separated list of subdomains, sorted alphabetically. No JSON schemas, no complex parsing, no API authentication. This design choice makes it trivially easy to integrate into existing workflows. Here's how you'd use it with a basic Python script to check which subdomains are currently active:

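import dns.exception
import dns.resolver

def check_subdomain(subdomain):
    """Return (is_active, ips) for a hostname based on its A records."""
    try:
        answers = dns.resolver.resolve(subdomain, 'A')
        return True, [str(rdata) for rdata in answers]
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer,
            dns.resolver.NoNameservers, dns.exception.Timeout):
        # No such name, no A record, no usable nameserver, or a timeout
        return False, []

# Load the list, skipping blank lines
with open('facebook.com.txt', 'r') as f:
    subdomains = [line.strip() for line in f if line.strip()]

# Resolve each entry and report whether it is still alive
for subdomain in subdomains:
    is_active, ips = check_subdomain(subdomain)
    if is_active:
        print(f"[+] {subdomain} -> {', '.join(ips)}")
    else:
        print(f"[-] {subdomain} (dead)")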

This script represents the fundamental workflow: load the list, verify what's still alive, and investigate active endpoints. The flat file format means you can pipe it through command-line tools just as easily. Want to feed it into massdns for high-speed resolution? cat google.com.txt | massdns -r resolvers.txt -t A -o S. Need to combine multiple lists? cat *.txt | sort -u > ../combined.txt (writing the result outside the directory keeps it from matching *.txt on the next run). The lack of structure is the structure.
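The same merge step can be sketched in Python for cases where the lists need light cleanup before deduplication. This is a minimal sketch, not part of the repository: the helper names `normalize` and `merge_wordlists` are hypothetical, and it assumes each file holds one hostname per line.

```python
from pathlib import Path


def normalize(line):
    """Lowercase an entry and drop surrounding whitespace and a trailing dot."""
    return line.strip().lower().rstrip(".")


def merge_wordlists(paths):
    """Return the sorted, de-duplicated union of entries from several list files.

    Assumption: every file contains one hostname per line; blank lines are skipped.
    """
    merged = set()
    for path in paths:
        for line in Path(path).read_text(encoding="utf-8").splitlines():
            name = normalize(line)
            if name:
                merged.add(name)
    return sorted(merged)
```

Calling `merge_wordlists(["facebook.com.txt", "google.com.txt"])` would return a single sorted list that can be written back out or passed straight to a resolver loop.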

The real value emerges when you analyze patterns in the naming conventions. Looking through facebook.com.txt, you'll notice patterns like geographic identifiers ("dallas", "singapore"), product names ("messenger", "workplace"), and infrastructure hints ("edge", "api", "internal"). These patterns teach you how Facebook thinks about subdomain organization, which you can then apply to your own bruteforcing attempts. If you see "messenger-api" exists, you might try "whatsapp-api" or "instagram-api" even if they're not in the list.
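One way to turn those observations into new guesses is to count the tokens that recur in known labels and then swap product names into existing patterns. The sketch below is illustrative only: `token_counts` and `swap_candidates` are hypothetical helpers, the product list is supplied by you, and the labels are assumed to be the left-hand part of each hostname with the parent domain already stripped.

```python
from collections import Counter


def token_counts(labels):
    """Count hyphen-separated tokens across subdomain labels."""
    counts = Counter()
    for label in labels:
        counts.update(tok for tok in label.lower().split("-") if tok)
    return counts


def swap_candidates(labels, products):
    """Generate new labels by swapping one product token for another.

    Assumption: labels exclude the parent domain (e.g. "messenger-api",
    not "messenger-api.facebook.com"). Labels already known are not returned.
    """
    known = {label.lower() for label in labels}
    products = [p.lower() for p in products]
    candidates = set()
    for label in known:
        tokens = label.split("-")
        for i, tok in enumerate(tokens):
            if tok not in products:
                continue
            for other in products:
                if other != tok:
                    guess = "-".join(tokens[:i] + [other] + tokens[i + 1:])
                    if guess not in known:
                        candidates.add(guess)
    return sorted(candidates)
```

With the known label "messenger-api" and the products "messenger", "whatsapp", and "instagram", `swap_candidates` yields "instagram-api" and "whatsapp-api". These are only guesses and still have to be resolved before they mean anything.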

For more sophisticated reconnaissance, you can diff the repository over time to identify infrastructure changes. Clone the repo, wait a month, pull updates, and run git diff on specific files. New subdomains indicate expanding infrastructure or new products. Removed subdomains might indicate decommissioned services—which, if still resolvable, could be forgotten attack surface:

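git clone https://github.com/JamieFarrelly/Popular-Site-Subdomains.git
cd Popular-Site-Subdomains
# One month later: remember the current commit, then fetch updates
old=$(git rev-parse HEAD)
git pull
# Keep only the lines added to google.com.txt since the previous state
git diff "$old" HEAD -- google.com.txt | grep "^+" | grep -v "^+++" | cut -c2- > new_google_subdomains.txt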
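If you would rather compare two saved copies of a list directly, the same idea reduces to a set difference. This is a rough sketch with hypothetical helper names (`load_names`, `diff_snapshots`); it assumes both snapshots are plain text with one hostname per line.

```python
def load_names(text):
    """Parse a snapshot into a set of lowercase, non-empty entries."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def diff_snapshots(old_text, new_text):
    """Return (added, removed) entries between two snapshots of one list."""
    old, new = load_names(old_text), load_names(new_text)
    return sorted(new - old), sorted(old - new)
```

The old snapshot could come from `git show <commit>:google.com.txt`. Entries in the removed list that still resolve are the ones worth a closer look.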

The architectural choice to separate domains into individual files rather than a single monolithic list or database appears deliberate. It enables partial updates via pull requests (contributors can add to facebook.com.txt without touching google.com.txt, reducing merge conflicts), makes it easy to grab just what you need (download one file instead of cloning everything), and keeps the repository maintainable without complex tooling. It's data architecture optimized for GitHub's collaboration model rather than runtime performance, a choice that prioritizes community contribution over query efficiency.

Gotcha

The repository's Achilles' heel is staleness. Subdomains are ephemeral—companies constantly spin up new infrastructure for product launches, testing, or geographic expansion, and they decommission old services. A subdomain that existed three months ago might return NXDOMAIN today, while critical new subdomains won't appear until someone manually discovers and contributes them. There's no automated validation pipeline checking whether listed subdomains still resolve, no timestamps indicating when each entry was verified, and no status indicators separating active infrastructure from dead entries. You're getting a historical snapshot, not a live feed.
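Since the lists carry no status or timestamp, you can add both locally. The sketch below is one possible shape for that step, not something the repository provides: `annotate` is a hypothetical helper, and `resolve` is any callable you supply that returns a list of IP strings for a hostname (for example, a thin wrapper around dnspython).

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timezone


def annotate(subdomains, resolve, workers=20):
    """Tag each entry with a status, its IPs, and the time it was checked.

    Assumption: resolve(name) returns an iterable of IP strings and may
    raise on failure; any exception is treated here as a dead entry.
    """
    checked_at = datetime.now(timezone.utc).isoformat(timespec="seconds")

    def check(name):
        try:
            ips = list(resolve(name))
        except Exception:
            ips = []
        status = "active" if ips else "dead"
        return {"name": name, "status": status, "ips": ips, "checked_at": checked_at}

    # pool.map preserves input order, so the output lines up with the list
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(check, subdomains))
```

Treating every resolver error as "dead" is a simplification; a timeout is not proof that a host is gone, so entries that fail this way may deserve a retry before being discarded.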

This limitation has practical consequences. If you're preparing for a bug bounty engagement and rely solely on these lists, you'll waste time investigating dead subdomains while potentially missing newly created endpoints that house the most interesting vulnerabilities (new code often means new bugs). The repository also lacks metadata that would make it more useful—no IP addresses, no HTTP status codes, no technology fingerprints. You're getting just the subdomain string, which means you still need to build the full reconnaissance pipeline yourself. For less popular domains or companies not covered in the repository, it offers zero value. The focus on "popular sites" means long-tail targets—mid-sized SaaS companies, regional platforms, niche services—get no coverage.

Verdict

Use if: You're conducting reconnaissance against major tech companies (Facebook, Google, Amazon, Microsoft, etc.) for bug bounty programs or penetration testing, and you want a curated starting wordlist that reflects real naming conventions rather than generic guesses. It's particularly valuable when combined with active enumeration tools: use these lists as a baseline, then layer on Certificate Transparency monitoring, DNS bruteforcing with custom wordlists derived from patterns you observe, and subdomain permutation techniques. Also use it if you're researching how large organizations structure their DNS namespaces or building training datasets for subdomain prediction models.

Skip if: You need real-time subdomain discovery with freshness guarantees, you're targeting companies not included in the repository, or you want rich metadata like resolution status, IP addresses, or technology fingerprints. Also skip if you're expecting comprehensive coverage, since these lists are community-contributed and inherently incomplete. For production security monitoring or time-sensitive assessments, invest in commercial subdomain intelligence platforms or build automated enumeration pipelines using tools like Amass, Subfinder, and Certificate Transparency log monitoring instead.
