LZR: The Two-Packet Protocol Detective for Internet-Wide Security Research
Hook
While service-detection scanners may send many probes to identify a single service, LZR can detect up to 18 different protocols simultaneously using only two additional packets, which makes fingerprinting services across all 65,535 ports of the IPv4 internet far more practical.
Context
Internet-wide security research faces a fundamental problem: scale versus precision. Tools like ZMap revolutionized network scanning by sweeping a single port across the full IPv4 space in under an hour, but they only tell you whether a port accepts connections, not what's actually listening. Traditional application-layer tools like ZGrab can identify services, but sending full protocol handshakes to billions of IP-port combinations consumes prohibitive bandwidth and takes an unreasonable amount of time.
This gap became critical as researchers discovered that misconfigured services increasingly appear on non-standard ports. A database might run on port 8080 instead of 3306, or an SSH server on port 443. The GPS (Global Port Scanning) project needed to scan all 65,535 ports across the internet, not just the common ones, but existing tools forced an impossible tradeoff: either scan quickly without knowing what you found, or fingerprint thoroughly while taking months to complete. LZR, developed by Stanford's Empirical Security Research Group and presented at USENIX Security 2021, solves this by sitting between rapid connection establishment and full protocol analysis, using minimal probes to make intelligent routing decisions at scale.
Technical Insight
LZR's architecture operates as an interception layer that takes over connections initiated by ZMap before the kernel can interfere. ZMap sends its SYN packets from outside the kernel's TCP stack, so when a target answers with a SYN-ACK, the Linux kernel finds no matching socket and would normally reply with a RST that tears the connection down. LZR prevents this with an iptables rule that drops outgoing RSTs, then captures incoming packets using libpcap and responds via raw sockets, completing the handshake itself. This kernel bypass is essential: without it, remote servers would see a connection reset before LZR could probe them.
The tool's efficiency comes from its probe strategy. Rather than sending protocol-specific handshakes sequentially, LZR dispatches a carefully designed set of universal probes that trigger recognizable responses from multiple protocols simultaneously. The first probe is typically a TLS Client Hello, which identifies HTTPS, SMTPS, IMAPS, and other TLS-wrapped services. The second is often an HTTP GET request or SSH probe. By analyzing response patterns—packet timing, size, content signatures—LZR categorizes services into one of over 35 protocol fingerprints.
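To make the idea of content-signature matching concrete, here is a minimal Go sketch that classifies the first bytes of a server response. The function name `classifyResponse` and the choice of three signatures are assumptions made for illustration, not LZR's actual fingerprint set. The signatures themselves are standard protocol facts: SSH identification strings start with `SSH-`, HTTP responses start with `HTTP/`, and TLS handshake or alert records start with byte 0x16 or 0x15 followed by major version 0x03.

```go
package main

import (
	"bytes"
	"fmt"
)

// classifyResponse guesses a protocol from the first bytes a server sent back.
// Hypothetical helper for illustration; a real fingerprint set is far larger.
func classifyResponse(payload []byte) string {
	switch {
	case len(payload) == 0:
		return "no-data"
	case bytes.HasPrefix(payload, []byte("SSH-")):
		return "ssh"
	case bytes.HasPrefix(payload, []byte("HTTP/")):
		return "http"
	case len(payload) >= 2 && (payload[0] == 0x16 || payload[0] == 0x15) && payload[1] == 0x03:
		// TLS record header: handshake (0x16) or alert (0x15), major version 3.
		return "tls"
	default:
		return "unknown"
	}
}

func main() {
	samples := [][]byte{
		[]byte("SSH-2.0-OpenSSH_8.9\r\n"),
		[]byte("HTTP/1.1 400 Bad Request\r\n"),
		{0x16, 0x03, 0x03, 0x00, 0x5a},
		nil,
	}
	for _, s := range samples {
		fmt.Println(classifyResponse(s))
	}
}
```

Because one probe can provoke a recognizable answer from several protocols (an HTTP server typically rejects a TLS Client Hello with an `HTTP/` error line, for instance), a single classifier like this can be applied to every response regardless of which probe was sent.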
Here's a simplified example of how you might configure and run LZR in pipeline mode with ZMap:
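    # Drop outgoing RSTs from the scan source IP so the kernel cannot reset LZR's connections
    sudo iptables -A OUTPUT -p tcp --tcp-flags RST RST -s 192.168.1.100 -j DROP

    # Run ZMap to send SYNs on port 443 across a target range and stream each
    # SYN-ACK, including its sequence and acknowledgment numbers, to LZR as JSON.
    # The lzr flag names below are illustrative; confirm them against the LZR README.
    sudo zmap -p 443 10.0.0.0/8 --source-ip=192.168.1.100 \
        -O json -f "saddr,daddr,sport,dport,seqnum,acknum,window" \
        --output-filter="success = 1 && repeat = 0" -o - | \
    sudo lzr \
        --source-ip=192.168.1.100 \
        --gateway-mac=00:11:22:33:44:55 \
        --interface=eth0 \
        --output-file=results.json \
        --timeout=10s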
The --source-ip parameter is critical because LZR needs to know which IP address to use when crafting response packets—it can't rely on the kernel's routing table since it's operating below that layer. Similarly, --gateway-mac tells LZR where to send packets at the link layer, bypassing normal ARP resolution.
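Since both addresses are supplied by hand, it can help to validate them before any packet is crafted. The following Go sketch shows one way to do that with the standard library; `ScanConfig` and `parseScanConfig` are hypothetical names introduced here, not part of LZR. Note that this only checks syntax: it cannot tell whether the MAC actually belongs to your gateway.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// ScanConfig holds the addresses a raw-socket scanner must be told explicitly.
// Hypothetical struct for illustration; field names are not taken from LZR.
type ScanConfig struct {
	SourceIP   net.IP
	GatewayMAC net.HardwareAddr
}

// parseScanConfig validates both addresses up front so that a typo fails
// loudly instead of producing frames that silently go nowhere.
func parseScanConfig(sourceIP, gatewayMAC string) (ScanConfig, error) {
	ip := net.ParseIP(sourceIP)
	if ip == nil {
		return ScanConfig{}, fmt.Errorf("invalid source IP %q", sourceIP)
	}
	mac, err := net.ParseMAC(gatewayMAC)
	if err != nil {
		return ScanConfig{}, fmt.Errorf("invalid gateway MAC %q: %w", gatewayMAC, err)
	}
	// net.ParseMAC also accepts 8- and 20-byte forms; Ethernet needs exactly 6 bytes.
	if len(mac) != 6 {
		return ScanConfig{}, errors.New("gateway MAC must be a 6-byte Ethernet address")
	}
	return ScanConfig{SourceIP: ip, GatewayMAC: mac}, nil
}

func main() {
	cfg, err := parseScanConfig("192.168.1.100", "00:11:22:33:44:55")
	fmt.Println(cfg.SourceIP, cfg.GatewayMAC, err)

	_, err = parseScanConfig("192.168.1.100", "00:11:22:33:44")
	fmt.Println(err)
}
```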
What makes LZR particularly clever is its 'HyperACKtive' filtering mechanism. Some stateful firewalls and middleboxes complete the handshake and ACK data on any port while maintaining fake connection state, creating false positives that suggest services exist when they don't. LZR detects these by observing whether the remote host goes on to respond meaningfully to application-layer probes or simply ACKs everything generically. This filtering is crucial for research accuracy: without it, large-scale scans would be polluted with millions of firewall artifacts masquerading as real services.
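The distinction between "responds meaningfully" and "ACKs everything" can be sketched as a simple heuristic. The Go code below is an assumption-laden illustration rather than LZR's algorithm: the `ProbeObservation` type, the `looksHyperACKtive` function, and the `minPorts` threshold are all hypothetical. It flags a host when enough probed ports acknowledged the probe data yet never sent any payload back.

```go
package main

import "fmt"

// ProbeObservation records what one host did on one probed port.
// Hypothetical type for illustration only.
type ProbeObservation struct {
	Port      uint16
	AckedData bool // host acknowledged our application-layer payload
	SentData  bool // host sent application-layer bytes of its own
}

// looksHyperACKtive reports whether at least minPorts ports acknowledged
// data while staying silent, the pattern the text attributes to firewalls.
func looksHyperACKtive(obs []ProbeObservation, minPorts int) bool {
	silentAcks := 0
	for _, o := range obs {
		if o.AckedData && !o.SentData {
			silentAcks++
		}
	}
	return silentAcks >= minPorts
}

func main() {
	firewall := []ProbeObservation{
		{Port: 40001, AckedData: true},
		{Port: 51234, AckedData: true},
		{Port: 60999, AckedData: true},
	}
	server := []ProbeObservation{
		{Port: 443, AckedData: true, SentData: true},
		{Port: 51234},
	}
	fmt.Println(looksHyperACKtive(firewall, 3), looksHyperACKtive(server, 3))
}
```

Counting silent ACKs across several ports, rather than judging a single connection, is a deliberate choice in this sketch: one quiet port may simply be a slow service, while uniform silence across unrelated ports is much harder to explain that way.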
The concurrent architecture uses a worker pool pattern in which connection state is tracked in in-memory maps. Each worker goroutine processes captured packets, updates the state machines of ongoing probes, and emits a result when fingerprinting completes or a timeout expires:
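    // Simplified conceptual structure (not actual LZR code).
    import (
        "fmt"
        "net"
        "time"
    )

    // Packet is one captured response segment.
    type Packet struct {
        RemoteIP   net.IP
        RemotePort uint16
        Payload    []byte
    }
    type Connection struct {
        RemoteIP   net.IP
        RemotePort uint16
        State      string
        Probes     [][]byte // response payloads seen so far
        Timeout    time.Time
    }
    type Result struct {
        IP       net.IP
        Protocol string
    }

    // analyzeResponse (not shown) matches the payload against protocol signatures.
    func worker(packets <-chan Packet, results chan<- Result) {
        connections := make(map[string]*Connection)
        for pkt := range packets {
            key := fmt.Sprintf("%s:%d", pkt.RemoteIP, pkt.RemotePort)
            conn, ok := connections[key]
            if !ok { // first packet for this connection: create state rather than dereference nil
                conn = &Connection{RemoteIP: pkt.RemoteIP, RemotePort: pkt.RemotePort, Timeout: time.Now().Add(10 * time.Second)}
                connections[key] = conn
            }
            // Update state based on the response.
            protocol, confident := analyzeResponse(conn, pkt)
            if confident || time.Now().After(conn.Timeout) {
                results <- Result{IP: conn.RemoteIP, Protocol: protocol}
                delete(connections, key)
            }
        }
    }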
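One detail the conceptual worker leaves open is how packets reach the right goroutine. A common approach, shown here as a sketch and not as LZR's actual design, is to hash the connection key so that every packet of a connection lands on the same worker, which lets each worker keep a private map without locking. The names `shardFor` and `countPerWorker` and the pared-down `Packet` type are hypothetical.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sync"
)

// Packet is a minimal stand-in for a captured response (hypothetical type).
type Packet struct {
	Key     string // e.g. "ip:port", identifying the connection
	Payload []byte
}

// shardFor maps a connection key to a worker index, so all packets of one
// connection are handled by the same worker.
func shardFor(key string, workers int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(workers))
}

// countPerWorker fans packets out to the given number of goroutines and
// returns how many distinct connections each worker ended up tracking.
func countPerWorker(pkts []Packet, workers int) []int {
	chans := make([]chan Packet, workers)
	counts := make([]int, workers)
	var wg sync.WaitGroup
	for i := range chans {
		chans[i] = make(chan Packet, 16)
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			seen := make(map[string]struct{}) // per-worker state, no lock needed
			for p := range chans[id] {
				seen[p.Key] = struct{}{}
			}
			counts[id] = len(seen) // each worker writes only its own slot
		}(i)
	}
	for _, p := range pkts {
		chans[shardFor(p.Key, workers)] <- p
	}
	for _, c := range chans {
		close(c)
	}
	wg.Wait()
	return counts
}

func main() {
	pkts := []Packet{
		{Key: "10.0.0.1:443"}, {Key: "10.0.0.1:443"},
		{Key: "10.0.0.2:22"}, {Key: "10.0.0.3:80"},
	}
	fmt.Println(countPerWorker(pkts, 4))
}
```

Because a key always hashes to the same shard, the per-worker counts add up to the number of distinct connections no matter how the keys happen to be distributed.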
LZR also supports standalone mode where it reads IP:port pairs and initiates its own SYN packets, making it independent of ZMap for certain use cases. This mode includes built-in rate limiting and IPv6 support, though it sacrifices some of ZMap's extreme speed optimizations. The output is JSON-structured, containing not just protocol identification but also response snippets, timing data, and confidence scores that downstream analysis can leverage.
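For standalone-style input, each line has to be turned into an address and a port before a SYN can be sent. The sketch below assumes a plain `ip:port` line format, with IPv6 addresses written in the bracketed `[addr]:port` form that Go's `net.SplitHostPort` expects; whether LZR accepts exactly this syntax is an assumption, and `Target` and `parseTarget` are hypothetical names.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// Target is one IP:port pair to probe (hypothetical type for illustration).
type Target struct {
	IP   net.IP
	Port uint16
}

// parseTarget parses "ip:port". IPv6 addresses use the "[addr]:port" form.
// Hostnames are rejected because a scanner input list should hold literal IPs.
func parseTarget(s string) (Target, error) {
	host, portStr, err := net.SplitHostPort(s)
	if err != nil {
		return Target{}, err
	}
	ip := net.ParseIP(host)
	if ip == nil {
		return Target{}, fmt.Errorf("not an IP address: %q", host)
	}
	port, err := strconv.ParseUint(portStr, 10, 16)
	if err != nil || port == 0 {
		return Target{}, fmt.Errorf("invalid port %q", portStr)
	}
	return Target{IP: ip, Port: uint16(port)}, nil
}

func main() {
	for _, line := range []string{"192.0.2.10:8080", "[2001:db8::1]:443", "192.0.2.10:70000"} {
		t, err := parseTarget(line)
		fmt.Println(t.IP, t.Port, err)
	}
}
```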
Gotcha
LZR's setup requirements are genuinely complex and can frustrate even experienced users. The iptables kernel bypass isn't a simple configuration option; it's a fundamental architectural requirement, which means LZR must run with root privileges and can interfere with other network operations on the same machine. If you're running other services on your scanning host, you risk breaking them when you DROP all outgoing RSTs globally, because their connections can no longer be reset cleanly; restricting the rule to a dedicated scan source address with iptables' -s match limits the blast radius. The documentation recommends dedicated scanning infrastructure with careful source IP and gateway configuration, which makes casual experimentation difficult. Getting the gateway MAC address wrong results in silent failures: packets are crafted and transmitted, but they are addressed to the wrong link-layer destination and never reach the target.
The ZMap dependency, while powerful, creates integration friction. You can't simply point LZR at a target like you would with nmap—you need to understand ZMap's output format, coordinate timing between the tools, and handle the stateful coordination between connection establishment and probing. The race condition caveat in HyperACKtive filtering is particularly tricky: if a legitimate service responds slowly while a firewall responds quickly, LZR might misclassify the connection. The timeout values are critical tuning parameters, and there's no universal setting that works across all network conditions. For researchers scanning from academic networks with good connectivity, the defaults work; for scans from cloud providers with variable latency, you'll need experimentation and possibly custom timeout profiles.
Verdict
Use if: You're conducting internet-wide security research where bandwidth efficiency and scan speed are critical, you need to identify services on non-standard ports across millions of hosts, you have the infrastructure to dedicate scanning machines with proper kernel configuration, and you're already comfortable with the ZMap/ZGrab ecosystem. LZR excels in academic research contexts, large-scale security audits across entire autonomous systems, or when building network census datasets where precise protocol identification at scale justifies the setup complexity.
Skip if: You're doing penetration testing on individual networks or small target sets where nmap's comprehensive features and ease of use outweigh speed concerns, you need a tool that works out-of-the-box without kernel modifications and root access, or you want standalone operation without coordinating multiple specialized tools. For most security practitioners doing routine assessments, the operational overhead isn't justified: LZR is a research instrument, not a daily-use scanner.