dig8: Building a DNS Crawler from Scratch in Go

Hook

Most Go developers reach for net.LookupHost() and call it a day. But what if you need to crawl millions of domains, track the entire resolution chain, and cache authoritative nameservers? You'll need to go lower.

Context

DNS libraries typically fall into two camps. The standard library's resolver works well for simple lookups but is a black box that gives you no visibility into the resolution process. Full-featured protocol libraries like miekg/dns expose nearly every DNS feature but leave you to build your own crawling infrastructure on top. If you're building a DNS reconnaissance tool, conducting internet measurement research, or collecting DNS data at scale, neither approach is ideal. You don't just want answers; you want to understand the path to those answers, cache nameserver information across queries, and process thousands of domains with progress tracking and failure handling.

This is the problem space dig8 tackles. Rather than wrapping existing DNS libraries or relying on the operating system's resolver, it implements the DNS wire protocol from scratch and builds a recursive resolution engine specifically designed for crawling scenarios. The library handles everything from label compression in DNS packets to managing query trees for batch domain processing. It's an educational look at what happens when you need complete control over DNS resolution, even if that means reinventing some wheels.

Technical Insight

[Figure: system architecture diagram, auto-generated]

dig8's architecture starts at the packet level with manual encoding and decoding of DNS wire format. The library implements label packing with compression pointer support, allowing it to construct and parse DNS packets without relying on higher-level abstractions. This might seem like overkill until you consider that crawling scenarios often need to inspect specific header flags, track additional section records, or handle malformed responses from authoritative servers that standard libraries might reject.
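To make the wire format concrete, here is a minimal sketch of label packing and pointer-aware unpacking as described in RFC 1035. It is not dig8's code: the names packName and unpackName are hypothetical, and the encoder deliberately skips compression on output to stay short.

```go
package main

import (
	"errors"
	"strings"
)

// packName encodes a domain name as length-prefixed labels (RFC 1035, section 3.1).
// Hypothetical helper for illustration; it does not emit compression pointers.
func packName(name string) ([]byte, error) {
	name = strings.TrimSuffix(name, ".")
	var out []byte
	if name != "" {
		for _, label := range strings.Split(name, ".") {
			if len(label) == 0 || len(label) > 63 {
				return nil, errors.New("label must be 1 to 63 bytes")
			}
			out = append(out, byte(len(label)))
			out = append(out, label...)
		}
	}
	out = append(out, 0) // the zero-length root label ends the name
	if len(out) > 255 {
		return nil, errors.New("encoded name exceeds 255 bytes")
	}
	return out, nil
}

// unpackName decodes the name that starts at off inside msg, following
// compression pointers (two bytes whose top two bits are both set).
// It returns the name and the offset just past the name's first occurrence.
func unpackName(msg []byte, off int) (string, int, error) {
	var labels []string
	next := -1 // set once we know where parsing resumes after this name
	hops := 0
	for {
		if off >= len(msg) {
			return "", 0, errors.New("name runs past end of message")
		}
		b := int(msg[off])
		switch {
		case b == 0: // root label: end of name
			if next < 0 {
				next = off + 1
			}
			return strings.Join(labels, ".") + ".", next, nil
		case b&0xC0 == 0xC0: // compression pointer to an earlier offset
			if off+1 >= len(msg) {
				return "", 0, errors.New("truncated pointer")
			}
			if next < 0 {
				next = off + 2
			}
			hops++
			if hops > 16 { // guard against pointer loops in hostile packets
				return "", 0, errors.New("too many compression pointers")
			}
			off = (b&0x3F)<<8 | int(msg[off+1])
		case b&0xC0 != 0:
			return "", 0, errors.New("unsupported label type")
		default: // ordinary label of b bytes
			if off+1+b > len(msg) {
				return "", 0, errors.New("truncated label")
			}
			labels = append(labels, string(msg[off+1:off+1+b]))
			off += 1 + b
		}
	}
}
```

The hop limit matters for a crawler in particular: a misbehaving authoritative server can send a pointer that refers to itself, and a decoder without a guard would spin forever on it.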

The core DNS client layer manages connection pooling and packet ID allocation. DNS packet IDs are 16-bit values used to match responses to queries, so at most 65,536 queries can be outstanding on one connection, and concurrent callers need goroutine-safe ID management. dig8 implements an ID pool that hands out unique identifiers across goroutines, preventing ID collisions when you're firing off hundreds of simultaneous queries:

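// Conceptual example of how dig8 manages concurrent queries.
// DNSMessage, Question, and Pack are assumed to be defined elsewhere.
import (
    "fmt"
    "net"
    "sync"
    "time"
)

type DNSClient struct {
    conn      net.Conn
    idPool    chan uint16                 // Pre-filled with free packet IDs
    mu        sync.Mutex                  // Guards responses
    responses map[uint16]chan *DNSMessage // Filled by a reader goroutine (not shown)
}

func (c *DNSClient) Query(domain string, qtype uint16) (*DNSMessage, error) {
    // Get unique ID from pool; blocks while every ID is in flight
    id := <-c.idPool
    defer func() { c.idPool <- id }() // Return to pool when done

    // Build query packet
    query := &DNSMessage{
        ID:       id,
        Question: Question{Name: domain, Type: qtype},
    }

    // Register the response channel before sending so the reply cannot be missed
    respChan := make(chan *DNSMessage, 1)
    c.mu.Lock()
    c.responses[id] = respChan
    c.mu.Unlock()
    defer func() {
        c.mu.Lock()
        delete(c.responses, id) // Drop the entry so the map does not grow forever
        c.mu.Unlock()
    }()
    if _, err := c.conn.Write(query.Pack()); err != nil {
        return nil, err
    }

    // Wait for the response, or give up (the 5-second timeout is an assumption)
    select {
    case resp := <-respChan:
        return resp, nil
    case <-time.After(5 * time.Second):
        return nil, fmt.Errorf("query %d for %s timed out", id, domain)
    }
}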
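The idPool channel in the example above has to be filled before the first query is sent. One way to do that, sketched here under the assumption that a buffered channel serves as the free list, is to load it with shuffled 16-bit values. The name newIDPool is hypothetical and not taken from dig8.

```go
package main

import "math/rand"

// newIDPool returns a buffered channel pre-filled with n distinct 16-bit
// packet IDs in shuffled order. Receiving takes an ID; sending returns it.
// Hypothetical helper; n is clamped to the 65,536 values a uint16 can hold.
func newIDPool(n int) chan uint16 {
	if n < 1 {
		n = 1
	}
	if n > 1<<16 {
		n = 1 << 16
	}
	pool := make(chan uint16, n)
	for _, v := range rand.Perm(1 << 16)[:n] {
		pool <- uint16(v)
	}
	return pool
}
```

The pool size doubles as a concurrency limit: once all n IDs are checked out, the next caller blocks on the receive until a query finishes. Note that math/rand is not cryptographically secure, so a tool that worries about spoofed responses would want a stronger source of randomness.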

Above the client sits the recursive resolver, which is where dig8 diverges significantly from simple DNS libraries. Instead of just querying configured nameservers, it performs the full resolution itself (the iterative walk that a recursive resolver carries out on a client's behalf): starting from root servers, following referrals through TLD servers, and finally reaching authoritative nameservers. Crucially, it caches nameserver information per zone, so when crawling multiple domains under the same TLD, it doesn't repeat the root lookups.
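The referral walk can be sketched as a short loop. This is an illustration under simplifying assumptions, not dig8's resolver: the reply type and queryFunc signature are hypothetical, the network is injected as a function so the logic can be exercised without sending packets, and real-world details such as glue records, CNAME chains, and caching are left out.

```go
package main

import (
	"errors"
	"fmt"
)

// reply is a simplified view of a DNS response (hypothetical type).
type reply struct {
	Answers  []string // records that answer the question, if any
	Referral []string // addresses of nameservers for a zone closer to the name
}

// queryFunc sends one non-recursive query to one server.
type queryFunc func(server, name string) (reply, error)

// resolve follows referrals from the given root servers until a server
// returns an answer, or until maxHops referrals have been followed.
func resolve(name string, roots []string, query queryFunc, maxHops int) ([]string, error) {
	servers := roots
	for hop := 0; hop < maxHops; hop++ {
		var r reply
		err := errors.New("no servers to query")
		for _, s := range servers { // try each server until one responds
			if r, err = query(s, name); err == nil {
				break
			}
		}
		if err != nil {
			return nil, fmt.Errorf("hop %d: %w", hop, err)
		}
		switch {
		case len(r.Answers) > 0:
			return r.Answers, nil
		case len(r.Referral) > 0:
			servers = r.Referral // descend one level: root, then TLD, then authoritative
		default:
			return nil, fmt.Errorf("no answer or referral for %s", name)
		}
	}
	return nil, fmt.Errorf("gave up on %s after %d referrals", name, maxHops)
}
```

Bounding the number of hops keeps a crawler from chasing referral loops, which do occur in misconfigured zones.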

The resolver maintains a nameserver cache that maps zones to their authoritative servers. When resolving "example.com", it first checks whether it knows the nameservers for "com". If not, it queries the roots. Once it has the TLD servers, it caches them and asks them for the "example.com" nameservers. This zone-aware caching is essential for crawler efficiency: processing 100,000 .com domains might require only one trip to the root servers rather than 100,000, leaving one referral query to the .com servers per domain.
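A zone-keyed cache of this kind can be sketched as a map plus a "closest enclosing zone" lookup that strips labels from the left until it finds a hit. The type and method names below (nsCache, put, closest) are hypothetical, and the sketch ignores TTL expiry, which a long-running crawl would need to handle.

```go
package main

import (
	"strings"
	"sync"
)

// nsCache maps a zone name (lowercase, with trailing dot) to the addresses
// of its nameservers. Hypothetical type for illustration.
type nsCache struct {
	mu    sync.RWMutex
	zones map[string][]string
}

// newNSCache seeds the cache with the root servers under the zone ".".
func newNSCache(rootServers []string) *nsCache {
	return &nsCache{zones: map[string][]string{".": rootServers}}
}

// put records the nameservers learned for a zone.
func (c *nsCache) put(zone string, servers []string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.zones[canonical(zone)] = servers
}

// closest returns the deepest cached zone that encloses name, together
// with its servers, falling back to the root zone.
func (c *nsCache) closest(name string) (string, []string) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	zone := canonical(name)
	for {
		if servers, ok := c.zones[zone]; ok {
			return zone, servers
		}
		if zone == "." {
			return ".", nil // no root servers configured
		}
		// Strip the leftmost label: "www.example.com." becomes "example.com."
		zone = zone[strings.IndexByte(zone, '.')+1:]
		if zone == "" {
			zone = "."
		}
	}
}

// canonical lowercases a name and ensures exactly one trailing dot.
func canonical(name string) string {
	return strings.ToLower(strings.TrimSuffix(name, ".")) + "."
}
```

With this shape, the resolver starts each lookup at whatever closest returns instead of at the roots, so the first .com domain populates the "com." entry and every later .com domain begins one level down.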

The top layer is the dcrl (DNS crawler) framework, which manages batch processing of domain lists. It implements a query tree structure where each domain can spawn multiple query types (A, AAAA, MX, etc.), track completion status, and handle failures. The tree uses cursor-based iteration to support pause/resume functionality—critical when crawling millions of domains over hours or days:

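// Simplified example of dig8's query tree structure
import "io"

// Resolver and ResultStore are assumed interfaces for this example;
// a ResultStore would typically wrap a *sql.DB.
type Resolver interface {
    Query(domain string, qtype uint16) (*DNSMessage, error)
}

type ResultStore interface {
    Insert(result *DNSMessage) error
    LogFailure(domain string, err error) error
}

type QueryTree struct {
    Root     *QueryNode
    Cursor   *QueryNode  // Current position for resumable iteration
    Resolver Resolver    // Performs the DNS resolution
    Database ResultStore // For persisting results
}

type QueryNode struct {
    Domain   string
    QType    uint16
    Status   string  // "pending", "complete", "failed"
    Result   *DNSMessage
    Branches []*QueryNode  // Child queries
}

func (t *QueryTree) ProcessNext() error {
    if t.Cursor == nil {
        return io.EOF // No pending nodes left
    }

    // Resolve current node
    node := t.Cursor
    var storeErr error
    result, err := t.Resolver.Query(node.Domain, node.QType)
    if err != nil {
        node.Status = "failed"
        storeErr = t.Database.LogFailure(node.Domain, err)
    } else {
        node.Status = "complete"
        node.Result = result
        storeErr = t.Database.Insert(result)
    }

    // Advance cursor to next pending node (findNextPending is not shown)
    t.Cursor = t.findNextPending()
    return storeErr // Resolution failures are recorded, not returned
}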
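The example above calls findNextPending without defining it. One plausible reading, offered as an assumption rather than as dig8's implementation, is a depth-first search for the first node still marked pending. The sketch uses a pared-down queryNode type and a free function nextPending, both hypothetical.

```go
package main

// queryNode is a pared-down stand-in for the QueryNode type above.
type queryNode struct {
	Domain   string
	QType    uint16
	Status   string // "pending", "complete", "failed"
	Branches []*queryNode
}

// nextPending returns the first node in depth-first, pre-order traversal
// whose status is "pending", or nil when the whole subtree is done.
func nextPending(n *queryNode) *queryNode {
	if n == nil {
		return nil
	}
	if n.Status == "pending" {
		return n
	}
	for _, child := range n.Branches {
		if found := nextPending(child); found != nil {
			return found
		}
	}
	return nil
}
```

Because the status lives on each node, the cursor can be recomputed from the root after a restart, which is what makes pause and resume cheap to support. The trade-off is that every call rescans from the top; a tree with millions of nodes would want parent pointers or an explicit stack instead.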

This architecture reveals dig8's primary use case: it's built for scenarios where you need both low-level protocol control and high-level batch processing infrastructure. The library bridges the gap between "send a DNS query" and "crawl the internet's DNS data", providing the plumbing for research projects, security reconnaissance, or DNS data collection efforts that standard libraries don't address.

Gotcha

The elephant in the room is adoption and maintenance. With only 2 GitHub stars and no visible community, dig8 hasn't achieved the battle-testing that production code demands. DNS is deceptively complex—edge cases around label encoding, response validation, timeout handling, and protocol compliance are everywhere. Libraries like miekg/dns have thousands of users finding bugs and corner cases. With dig8, you're the QA team.

The library also lacks modern DNS features entirely. There's no DNSSEC validation, so you can't verify response authenticity—a critical limitation if you're building security tools. DNS-over-HTTPS and DNS-over-TLS are absent, meaning you're stuck with plain UDP/TCP queries that are trivially observable and tamperable on the network. The documentation is essentially non-existent beyond source code comments, so you'll be reading implementation details to understand usage. For a library that implements a complex protocol from scratch, this is a significant barrier to adoption. You'll need to invest serious time understanding the codebase before you can use it effectively, and when you hit issues, you're largely on your own.

Verdict

Use if: you're building DNS measurement research tools, conducting large-scale DNS surveys, or need granular control over the entire resolution chain with built-in crawling infrastructure. dig8 provides unique value if you need to track nameserver hierarchies, cache zone authorities across thousands of queries, or integrate DNS resolution directly with database persistence for bulk data collection. It's also educational if you want to understand DNS protocol implementation details in Go.

Skip if: you're building production services, need DNSSEC or encrypted DNS support, require community-vetted code for reliability, or can accomplish your goals with miekg/dns or the standard library. The low adoption rate and missing modern features make it risky for anything beyond research or personal projects where you can afford to debug protocol-level issues yourself.
