LZR: Internet-Scale Service Fingerprinting with Two Extra Packets

Hook

Most port scanners spend many packets and connections fingerprinting a single service. LZR detects up to 18 different protocols with just two extra packets beyond the TCP handshake, and it does so by sitting between your kernel and the network.

Context

When you scan the internet for security research, speed isn’t a luxury—it’s a requirement. Projects like Censys and Shodan need to fingerprint billions of hosts across all 65,535 ports to map the internet’s attack surface. Traditional tools like nmap excel at thorough scanning but take minutes per host when checking multiple protocols. Even fast scanners like masscan struggle with the combinatorial explosion: if you want to check whether port 9002 is running HTTP, TLS, SSH, or any of a dozen other protocols, you typically need separate connection attempts for each one.

The ZMap ecosystem solved the connection speed problem (it can scan the entire IPv4 internet on a single port in under an hour) but left application-layer fingerprinting as a bottleneck. ZGrab2 performs protocol handshakes downstream, but it still requires one full connection per protocol. LZR emerged from Stanford’s Empirical Security Research Group (ESRG) to solve a specific problem: how do you efficiently detect unexpected services (like HTTPS running on port 8080 or SSH on port 443) without multiplying your packet budget by the number of protocols you’re testing? The tool can detect up to 18 unique protocols simultaneously and fingerprint over 35 different protocols in total. The answer involves handling TCP state at a level most developers never touch.

Technical Insight

[Figure: System architecture (auto-generated). Components: ZMap Scanner, LZR TCP Interceptor, Raw Socket Handler, Connection State Table, Protocol Probe Engine, Target Hosts, Results Output, iptables RST Drop. Edge labels: Kernel Bypass; JSON: IP, ports, seq/ack nums; TCP ACK + Probes; Service Responses; HTTP/TLS/SSH fingerprints; blocks.]

LZR’s architecture exploits a narrow window in the TCP handshake that most tools ignore. ZMap is stateless: it sends the SYN and reports the SYN-ACK it receives, but it never sends the final ACK. LZR takes over at that point. It sends the ACK itself and attaches a protocol-specific greeting to it, such as an HTTP GET request, a TLS ClientHello, or an SSH version string, instead of waiting to see what the server sends first.
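To make the idea concrete, here is a minimal Python sketch of probing and response matching. The payloads and the `fingerprint` function are illustrative assumptions, not LZR's actual probe set or matching rules. They rely only on well-known protocol openings: SSH identification strings start with `SSH-`, HTTP responses start with `HTTP/`, and TLS records begin with a content-type byte followed by a `0x03` major version.

```python
# Illustrative greetings a scanner could attach to the final ACK.
# These payloads are assumptions for this sketch, not LZR's real probes.
PROBES = {
    "http": b"GET / HTTP/1.1\r\nHost: example.invalid\r\n\r\n",
    "ssh": b"SSH-2.0-sketch\r\n",
}


def fingerprint(response: bytes) -> str:
    """Guess the protocol from the first bytes a server sends back."""
    if response.startswith(b"SSH-"):
        return "ssh"
    if response.startswith(b"HTTP/"):
        return "http"
    # TLS record header: content type (0x15 alert or 0x16 handshake),
    # followed by protocol major version 0x03.
    if len(response) >= 2 and response[0] in (0x15, 0x16) and response[1] == 0x03:
        return "tls"
    return "unknown"


if __name__ == "__main__":
    # A server can reveal its protocol even when the probe was for another one.
    print(fingerprint(b"SSH-2.0-OpenSSH_9.6\r\n"))
    print(fingerprint(b"HTTP/1.1 400 Bad Request\r\n\r\n"))
    print(fingerprint(bytes([0x15, 0x03, 0x03, 0x00, 0x02, 0x02, 0x46])))
```

Matching on the response rather than on the probe that was sent is what lets one greeting cover several protocols in this sketch.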

The trick is that the kernel’s TCP stack never owns these connections. During build, you specify a dedicated source IP, and LZR configures iptables to drop all outgoing RST packets from that address:

make all source-ip=<your-source-ip>

This prevents the kernel from killing connections it doesn’t know about. LZR then uses raw sockets to craft packets with exact sequence numbers matching ZMap’s handshakes. Here’s how you’d scan port 9002 for unexpected HTTP and TLS services:

sudo zmap --target-port=9002 --output-filter="success = 1 && repeat = 0" \
  -f "saddr,daddr,sport,dport,seqnum,acknum,window" -O json --source-ip="$source_ip" | \
sudo ./lzr --handshakes http,tls

ZMap outputs JSON with TCP state information (sequence numbers, ACK numbers, window sizes), and LZR consumes this stream in real time. For each responsive host, LZR maintains connection state and injects probes. The --handshakes flag accepts a comma-separated list of protocols to probe with. The packet budget stays small because the reply to a single probe is often enough to fingerprint a service, even when that service speaks a different protocol from the one that was sent.
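The hand-off can be sketched in a few lines of Python. The field names match the `-f` list in the ZMap command above. The `next_segment` helper and the sample record are hypothetical, and the arithmetic is standard TCP (a SYN consumes one sequence number) rather than a transcription of LZR's code.

```python
import json

MOD = 2**32  # TCP sequence numbers wrap at 32 bits


def next_segment(zmap_line: str) -> dict:
    """Derive header fields for the ACK that follows a received SYN-ACK.

    zmap_line is one JSON record describing the SYN-ACK ZMap received:
    saddr/sport belong to the remote host, daddr/dport to the scanner.
    """
    rec = json.loads(zmap_line)
    return {
        # Our packet travels in the opposite direction to the SYN-ACK.
        "src": rec["daddr"], "sport": rec["dport"],
        "dst": rec["saddr"], "dport": rec["sport"],
        # The server's acknum is the next byte it expects from us.
        "seq": rec["acknum"] % MOD,
        # The server's SYN consumed one sequence number.
        "ack": (rec["seqnum"] + 1) % MOD,
    }


if __name__ == "__main__":
    # Hypothetical record using documentation-range addresses.
    line = ('{"saddr": "192.0.2.10", "daddr": "198.51.100.5", "sport": 9002, '
            '"dport": 41234, "seqnum": 4294967295, "acknum": 1001, "window": 65535}')
    print(next_segment(line))
```

The sample record uses the largest 32-bit sequence number on purpose, so the acknowledgement number wraps around to 0.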

LZR includes HyperACKtive filtering to address a problem unique to internet-scale scanning: stateful firewalls that ACK any connection attempt regardless of whether a real service exists behind them. LZR sends random probes to ephemeral ports (controlled by the -haf flag) and watches for responses. If a host ACKs both legitimate and random ports, LZR marks it as a false positive.
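The decision rule described here can be sketched as follows. This is a deliberate simplification: `pick_random_ports`, `is_acking_firewall`, the 49152-65535 port range, and the "any decoy answered" threshold are assumptions made for illustration and are not taken from LZR's source.

```python
import random


def pick_random_ports(count: int, rng: random.Random) -> list[int]:
    """Choose distinct ports from the dynamic range to use as decoys."""
    return rng.sample(range(49152, 65536), count)


def is_acking_firewall(decoy_ports: list[int], acked_ports: set[int]) -> bool:
    """Flag a host if any decoy port acknowledged our probe.

    A real service is unlikely to listen on randomly chosen ports, so an
    ACK from one suggests a middlebox answering on the host's behalf.
    """
    return any(port in acked_ports for port in decoy_ports)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    decoys = pick_random_ports(3, rng)
    # Host A acknowledges only the scanned port; host B acknowledges everything.
    print(is_acking_firewall(decoys, {9002}))
    print(is_acking_firewall(decoys, {9002, *decoys}))
```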

For deeper analysis, LZR can feed results directly to ZGrab2 for complete protocol handshakes. Replace the port number in etc/all.ini and pipe through:

sudo zmap --target-port=9002 --output-filter="success = 1 && repeat = 0" \
  -f "saddr,daddr,sport,dport,seqnum,acknum,window" -O json --source-ip="$source_ip" | \
sudo ./lzr --handshakes wait,http,tls -feedZGrab | \
zgrab multiple -c etc/all.ini

LZR also supports standalone operation without ZMap by using the -sendSYNs flag, though you’ll need your gateway’s MAC address and a rate limit:

sudo ./lzr --handshakes http -sendSYNs -sourceIP "$source_ip" \
  -gatewayMac "$gateway" -rate 10000 < services_list

For IPv6 deployments, LZR automatically detects the address family by checking for a ‘.’ character in the source IP string; if none is present, it uses ip6tables instead of iptables. The --ipv6 flag enables IPv6-specific packet handling. This dual-stack support matters for internet surveys that need to map both address spaces.
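A small Python sketch of that address-family check follows. The `rst_drop_command` helper is hypothetical, and the exact rule LZR installs is an assumption here. The flags themselves are standard iptables syntax for matching outbound TCP segments that have the RST bit set.

```python
def rst_drop_command(source_ip: str) -> list[str]:
    """Build an argv that drops outbound RSTs sent from source_ip."""
    # Same heuristic as described above: dotted notation means IPv4.
    binary = "iptables" if "." in source_ip else "ip6tables"
    return [
        binary, "-A", "OUTPUT",
        "-p", "tcp", "--tcp-flags", "RST", "RST",
        "-s", source_ip,
        "-j", "DROP",
    ]


if __name__ == "__main__":
    # Documentation-range addresses; the commands are printed, not executed.
    print(" ".join(rst_drop_command("192.0.2.1")))
    print(" ".join(rst_drop_command("2001:db8::1")))
```

One limitation of a plain substring check is that an IPv4-mapped IPv6 address such as `::ffff:192.0.2.1` also contains a dot, so it would be routed to iptables in this sketch.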

Gotcha

LZR’s biggest limitation is operational complexity. You need root access, dedicated source IPs, and the discipline to maintain iptables rules that suppress kernel RST packets. If you accidentally scan from the wrong source IP, your kernel will race LZR to send RSTs, poisoning your results. The setup isn’t portable—you can’t just go install it and start scanning from a laptop at a coffee shop.

The HyperACKtive firewall filtering has a race condition documented in the README: whichever response arrives first (legitimate service or random port probe) determines whether a host gets flagged as having an ACKing firewall. On asymmetric networks or under packet loss, this can cause inconsistent classifications. The documentation acknowledges this caveat but doesn’t provide mitigation strategies beyond retransmission counts.

LZR is also tightly coupled to the ZMap ecosystem’s assumptions about internet-scale scanning. It expects a firehose of JSON containing TCP state from ZMap, and its threading model (-w flag for workers) is optimized for processing millions of hosts, not the hundreds you’d see in typical pentesting. For small networks or one-off scans, the overhead of setting up LZR outweighs any performance benefit. The tool explicitly targets researchers conducting longitudinal studies of internet-wide service deployment, not practitioners doing quarterly vulnerability assessments.

Verdict

Use LZR if you’re conducting academic research or threat intelligence work that requires scanning millions of hosts to detect protocol anomalies, shadow IT, or service misconfigurations across the entire IPv4 or IPv6 internet. It’s the right tool when you need to answer questions like ‘how many hosts run HTTPS on non-standard ports?’ or ‘what percentage of SSH servers appear on ports other than 22?’ The two-packet efficiency becomes significant at ZMap scale, and the HyperACKtive filtering is valuable when false positives from stateful firewalls would skew your datasets.

Skip LZR for penetration testing, bug bounty hunting, or any scenario where you’re scanning fewer than 100,000 hosts. The operational complexity (dedicated IPs, iptables rules, raw socket privileges) isn’t justified unless you’re operating at internet scale. For typical security work, nmap’s service detection or masscan’s banner grabbing provides better usability with sufficient performance. Only reach for LZR when you need to process millions of hosts and can dedicate infrastructure to support its architectural requirements.
