How Git Became a DNS Time Machine: Inside TLDR's Zone Transfer Archive
Hook
Every two hours, a Python script attempts what security teams explicitly forbid in enterprise networks: zone transfers against every top-level domain on the internet. The results reveal which TLDs operate with radical transparency—and which ones learned from decades of DNS security mistakes.
Context
DNS zone transfers (AXFR) have a dual nature in infrastructure security. In corporate environments, allowing zone transfers to arbitrary clients is a well-known misconfiguration: a single transfer hands an attacker every hostname in the zone, a ready-made map of the network. Security scanners flag it, compliance reviews call it out, and sysadmins lock it down immediately. At the TLD level, though, the calculus changes entirely.
Top-level domains like .com, .org, and the country-code TLDs run public infrastructure. Their nameserver records aren't corporate secrets; they are the delegation data that makes name resolution work. Some TLDs embrace this transparency by allowing zone transfers, which makes their entire zone file publicly accessible. Others restrict access, not necessarily for security but to control how their data is consumed.
TLDR was created to document this landscape: which TLDs allow zone transfers, what those transfers contain, and how the data changes over time. By attempting zone transfers roughly every two hours and committing the results to Git, it turns version control into a time-series database for DNS infrastructure research.
Technical Insight
The architectural elegance of TLDR lies in its simplicity: marry DNS zone transfers with Git's versioning capabilities, and you get a queryable historical database with zero custom storage logic. The core loop is straightforward Python that iterates through root nameservers and TLD authoritative servers, attempts AXFR requests, and writes successful responses to files.
Here's the conceptual flow of a zone transfer attempt:
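```python
import dns.query
import dns.rdataclass
import dns.rdatatype
import dns.zone
from dns.exception import DNSException


def attempt_zone_transfer(nameserver, zone_name):
    """Attempt AXFR against a nameserver for a given zone.

    `nameserver` must be an IP address: dns.query.xfr does not resolve hostnames.
    Returns the zone as sorted text lines, or None if the transfer fails.
    """
    try:
        zone = dns.zone.from_xfr(
            dns.query.xfr(nameserver, zone_name, timeout=30)
        )
    except (DNSException, OSError, EOFError):
        # Refused transfer, timeout, or a server that simply closes the TCP connection
        return None

    # Flatten the zone into one "name TTL class type rdata" line per record
    zone_text = []
    for name, node in zone.nodes.items():
        for rdataset in node.rdatasets:
            for rdata in rdataset:
                zone_text.append(
                    f"{name} {rdataset.ttl} "
                    f"{dns.rdataclass.to_text(rdataset.rdclass)} "
                    f"{dns.rdatatype.to_text(rdataset.rdtype)} {rdata}"
                )
    # Sorting makes the output deterministic, so Git diffs show only real changes
    return '\n'.join(sorted(zone_text))
```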
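The core loop described at the start of this section, iterating over every TLD and each of its nameservers, might look like the following sketch. It assumes a `targets` mapping from TLD name to nameserver IP addresses (for example, built from the root zone) and takes the transfer function as a parameter, so `attempt_zone_transfer` above can be plugged in, or swapped for a stub in tests. The `sweep` name and signature are hypothetical, not taken from TLDR.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical transfer signature: (nameserver_ip, zone_name) -> zone text or None,
# matching attempt_zone_transfer above.
Transfer = Callable[[str, str], Optional[str]]


def sweep(targets: Dict[str, List[str]], transfer: Transfer) -> Dict[Tuple[str, str], str]:
    """Try AXFR for every (TLD, nameserver) pair and keep only the successes."""
    results = {}
    for tld in sorted(targets):
        for ns in sorted(targets[tld]):
            text = transfer(ns, tld)
            if text is not None:  # None means the transfer was refused or failed
                results[(tld, ns)] = text
    return results
```

Sorting the TLDs and nameservers keeps the run order stable, which makes logs from consecutive runs easier to compare.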
The interesting part comes next. Each successful zone transfer is written to a file named after the TLD and nameserver, and the script commits those files to the repository. Git's diff engine then tracks what changed between runs: new nameserver entries appear as additions, removed servers as deletions, and a modified record shows up as a deletion of the old line paired with an addition of the new one.
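The write-and-commit step can be sketched with the standard library and the git CLI. The `<tld>/<nameserver>.zone` layout is an assumption chosen to match the per-TLD `git log -- ai/` query; the real repository's layout may differ, and `write_snapshot` and `commit_snapshots` are hypothetical helpers.

```python
import subprocess
from pathlib import Path


def write_snapshot(repo_dir: Path, tld: str, nameserver: str, zone_text: str) -> Path:
    """Write one transfer result to <repo>/<tld>/<nameserver>.zone (assumed layout)."""
    path = Path(repo_dir) / tld / f"{nameserver}.zone"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(zone_text + "\n")
    return path


def commit_snapshots(repo_dir: Path, message: str) -> bool:
    """Stage all snapshots and commit; return False when nothing changed."""
    subprocess.run(["git", "add", "-A"], cwd=repo_dir, check=True)
    # `git diff --cached --quiet` exits 0 when the index has no staged changes
    staged = subprocess.run(["git", "diff", "--cached", "--quiet"], cwd=repo_dir)
    if staged.returncode == 0:
        return False
    subprocess.run(["git", "commit", "-m", message], cwd=repo_dir, check=True)
    return True
```

Skipping empty commits means a run that sees no changes leaves no trace in history, so every commit corresponds to an actual DNS change.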
This approach uses Git as an unintended time-series database. Want to know when the .ai TLD added a new nameserver? Run git log -p -- ai/ and look for added NS lines. Curious how many times .io's zone file changed in the past month? Run git log --since="1 month ago" --oneline -- io/ and count the commits. Need to correlate DNS changes with an outage? Commit timestamps place each change between two consecutive runs, a window of roughly two hours.
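The "look for added NS lines" step takes only a few lines of Python. This sketch assumes each zone file stores one record per line as `name TTL class type rdata`, as `attempt_zone_transfer` above produces, and that snapshots live under a per-TLD directory; `added_records` and `ns_additions_for` are hypothetical helpers.

```python
import subprocess
from typing import List


def added_records(diff_text: str, rdtype: str = "NS") -> List[str]:
    """Return records of `rdtype` that a unified diff adds ('+' lines)."""
    added = []
    for line in diff_text.splitlines():
        # Skip "+++ b/..." file headers; keep genuine added lines
        if line.startswith("+") and not line.startswith("+++"):
            fields = line[1:].split()
            # Assumed record layout: name TTL class type rdata...
            if len(fields) >= 5 and fields[3] == rdtype:
                added.append(line[1:].strip())
    return added


def ns_additions_for(repo_dir: str, tld: str) -> List[str]:
    """Run `git log -p -- <tld>/` and extract every NS record ever added."""
    out = subprocess.run(
        ["git", "log", "-p", "--", f"{tld}/"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    return added_records(out, "NS")
```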
The system maintains its list of TLD nameservers by querying the root zone itself. This creates a self-updating loop: as TLDs are added to or removed from the root zone (hundreds were added under ICANN's new gTLD program), TLDR automatically starts or stops monitoring them. The target list therefore tracks the DNS ecosystem without manual upkeep.
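Deriving the target list from the root zone amounts to collecting the NS records owned by single-label names. This sketch assumes root-zone text in master-file form with one fully qualified record per line, the layout of the published root zone file; `tld_nameservers` is a hypothetical helper, and resolving the nameserver hostnames to IP addresses before a transfer is left out.

```python
from collections import defaultdict
from typing import Dict, Set


def tld_nameservers(root_zone_text: str) -> Dict[str, Set[str]]:
    """Map each TLD to its delegated nameserver hostnames.

    Assumes lines like "ai.  172800  IN  NS  ns1.example.net."
    """
    delegations = defaultdict(set)
    for line in root_zone_text.splitlines():
        fields = line.split()
        if line.startswith(";") or len(fields) < 5:
            continue  # skip comments and short lines
        owner, _ttl, rdclass, rdtype, target = fields[:5]
        # A TLD delegation is an NS record owned by a single-label name such as "ai."
        # (the root's own NS records, owned by ".", are excluded)
        if rdclass == "IN" and rdtype == "NS" and owner != "." and owner.count(".") == 1:
            delegations[owner.rstrip(".").lower()].add(target.lower())
    return dict(delegations)
```

Because the root zone also carries glue, DS, and signature records, filtering on NS records with single-label owners is what isolates the delegations.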
What makes this particularly valuable for researchers is the ability to track policy changes through DNS records. When a TLD switches registry operators, you'll see wholesale changes to NS records and potentially SOA parameters. When a ccTLD modernizes its infrastructure, the evidence appears in updated nameserver addresses or the addition of IPv6 glue records. The Git history becomes a forensic timeline of internet governance decisions.
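To turn raw diffs into the signals described above, one option is to compare two snapshots record type by record type. This sketch assumes the `name TTL class type rdata` line format from `attempt_zone_transfer` and watches NS, SOA, and AAAA records; `record_changes` and the watched set are illustrative choices, not part of TLDR. A wholesale swap of NS records hints at an operator change, and newly added AAAA records hint at new IPv6 glue.

```python
from typing import Dict, Iterable, Set, Tuple

WATCHED = ("NS", "SOA", "AAAA")


def record_changes(
    old_text: str, new_text: str, watched: Iterable[str] = WATCHED
) -> Dict[str, Tuple[Set[str], Set[str]]]:
    """Return {rdtype: (added, removed)} for each watched type that changed."""
    watched = tuple(watched)

    def by_type(text: str) -> Dict[str, Set[str]]:
        out = {t: set() for t in watched}
        for line in text.splitlines():
            fields = line.split()
            if len(fields) >= 5 and fields[3] in out:
                out[fields[3]].add(" ".join(fields))  # normalize whitespace
        return out

    old, new = by_type(old_text), by_type(new_text)
    return {t: (new[t] - old[t], old[t] - new[t]) for t in watched if new[t] != old[t]}
```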
The two-hour cadence balances freshness against repository bloat. A TLD's infrastructure records (its NS set, SOA, and glue) don't change minute to minute the way application data does; a nameserver addition is a significant infrastructure event that involves planning and coordination. Two hours provides enough resolution to catch these changes while keeping the repository from growing unmanageably fast (though, as we'll discuss, size eventually becomes a problem anyway).
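A back-of-envelope check of what this cadence implies, assuming exactly two hours between runs and at most one commit per run:

```latex
\frac{24\ \text{h/day}}{2\ \text{h/run}} = 12\ \text{runs/day},
\qquad
\text{commits/year} \le 12 \times 365 = 4380
```

Runs that find nothing new produce no commit, so 4,380 is an upper bound. The same interval limits time resolution: a change is only known to have happened somewhere between two consecutive runs.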
Gotcha
The original TLDR project is explicitly deprecated, and that is the first and most critical limitation to understand. Its creator archived it in favor of TLDR-2, a fork maintained by flotwig that continues the monitoring effort with updated infrastructure. For current data, look at TLDR-2 instead of this repository. The data here is frozen in time, which makes it useful only as a historical snapshot, not as an ongoing research tool.
Beyond the maintenance status, there's a fundamental sampling bias: TLDR only captures TLDs that allow zone transfers. Many TLD operators, including the one behind .com, refuse AXFR requests, which makes their zone data invisible to this approach. You're not getting a complete picture of the DNS landscape; you're seeing the subset of operators who chose transparency over control. That leaves blind spots in any research that assumes comprehensive coverage.
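One way to keep this bias visible is to record failed transfers alongside successful ones and report coverage explicitly. The `axfr_coverage` helper is a hypothetical sketch; it assumes results map each TLD to per-nameserver success flags, which is not necessarily how TLDR stores anything.

```python
from typing import Dict


def axfr_coverage(results: Dict[str, Dict[str, bool]]) -> float:
    """Fraction of TLDs where at least one nameserver allowed AXFR.

    `results` maps TLD -> {nameserver: transfer_succeeded}. Recording the
    failures, not just the successes, is what makes the blind spot measurable.
    """
    if not results:
        return 0.0
    open_tlds = sum(1 for servers in results.values() if any(servers.values()))
    return open_tlds / len(results)
```

A TLD with no recorded nameservers counts as closed here, which errs toward reporting less coverage rather than more.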
Repository size becomes unwieldy over time. Every commit that touches a zone file adds a new version of that file to history, and Git is a poor fit for large data files that change every few hours: delta compression in packfiles slows the growth but doesn't stop it. Cloning a repository with thousands of commits of zone snapshots takes considerable time and disk space. Shallow clones help but limit historical analysis. This is a fundamental tension between Git's design (optimized for source code) and TLDR's use case (storing large, frequently changing datasets). Some researchers have resorted to exporting specific date ranges rather than working with the full repository.
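If you only need part of the history, Git itself offers two ways to avoid a full clone: `--shallow-since` limits history to commits after a given date, and `--filter=blob:none` makes a partial (blobless) clone that keeps every commit but downloads file contents on demand, which requires server support. The hypothetical `clone_command` helper just assembles those flags; the URL you pass is up to you.

```python
from typing import List, Optional


def clone_command(url: str, since: Optional[str] = None, blobless: bool = False) -> List[str]:
    """Build a `git clone` argv that avoids downloading the full history.

    since:    a date such as "2019-01-01"; maps to --shallow-since
    blobless: maps to --filter=blob:none (partial clone, contents fetched lazily)
    """
    cmd = ["git", "clone"]
    if since:
        cmd.append(f"--shallow-since={since}")
    if blobless:
        cmd.append("--filter=blob:none")
    cmd.append(url)
    return cmd
```

Run it with `subprocess.run(clone_command(url, since="2019-01-01"), check=True)`. The trade-off: a shallow clone drops older history entirely, while a blobless clone keeps every commit timestamp but fetches file contents lazily, so running `git log -p` across the whole history can trigger many downloads.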
Verdict
Use if: You're researching DNS infrastructure evolution, studying TLD operational practices, need historical DNS data for security research, or want to understand which TLDs operate transparently. It's particularly valuable for academic work on internet governance or as a dataset for training DNS anomaly detection systems. For any active monitoring, though, move to the TLDR-2 fork (flotwig/TLDR-2) immediately; this original repository is deprecated and no longer updated.
Skip if: You need real-time DNS intelligence (a two-hour polling lag is too slow), require coverage of all TLDs (many refuse zone transfers), or can't tolerate the repository size (multi-gigabyte clones). Also skip it if you want subdomain enumeration data: a TLD zone transfer yields delegation records (NS and glue) for the domains registered under that TLD, not the hostnames inside those domains. For enterprise DNS monitoring, the approach doesn't translate; corporate zone transfers should stay locked down.