Parsing Nmap Scans With Nothing But Bash: gnmap-parser's Zero-Dependency Approach
Hook
While modern security tools race to add ML capabilities and cloud integrations, one of the most starred Nmap parsers on GitHub runs entirely in 200 lines of Bash with zero dependencies.
Context
Network reconnaissance generates mountains of data. A single Nmap scan of a /24 network with comprehensive port scanning can produce thousands of lines of output detailing every open port, service version, and host state. Multiply this across dozens of subnets, multiple scan types (TCP, UDP, version detection), and weekly recurring scans, and you're drowning in text files that are difficult to query, compare, or feed into subsequent analysis tools.
Nmap's three main output formats are normal (human-readable), XML (machine-parseable but verbose), and greppable (.gnmap, written with -oG). The greppable format was designed specifically for Unix text processing: one record per line, with consistent field separators. Yet most security teams either grep through these files by hand or reach for heavier dependencies such as Python with specialized XML parsing libraries. Jason Frank's gnmap-parser takes a different approach: embrace the greppable format's Unix-native design and build a parser from the standard text tools found on nearly every Unix-like system. This architectural choice trades some flexibility for portability, making it possible to analyze scan results on air-gapped networks, embedded systems, or any environment where installing dependencies is bureaucratically impossible.
Technical Insight
The genius of gnmap-parser lies in understanding Nmap's greppable format structure and exploiting it with surgical text manipulation. Each host record in a .gnmap file follows a strict pattern: Host: <IP> (<hostname>) Status: <state> for host status, and Host: <IP> (<hostname>) Ports: <port data> for discovered ports, with a tab separating the Host section from the section that follows it. Each port entry uses the format <port>/<state>/<protocol>/<owner>/<service>/<rpc info>/<version>/, and multiple entries are separated by a comma and a space. This consistency means you can extract everything you need with grep, awk, and sed.
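To make the record layout concrete, here is a minimal sketch that writes one hypothetical host record and splits it into fields. The IP address, hostname, and service strings are invented for illustration, and the `parse_ports` helper is not part of gnmap-parser; it only assumes the tab, comma, and slash separators described above.

```shell
#!/usr/bin/env bash
# Illustrative only: the host, hostname, and services below are made up.
sample=$(mktemp)
printf 'Host: 192.0.2.10 (web01.example.test)\tStatus: Up\n' > "$sample"
printf 'Host: 192.0.2.10 (web01.example.test)\tPorts: 22/open/tcp//ssh//OpenSSH 8.9/, 80/open/tcp//http///\n' >> "$sample"

# Sections of a record are tab-separated; the Ports section holds
# comma-separated entries, and each entry is slash-separated.
parse_ports() {
    awk -F'\t' '/Ports: / {
        split($1, host, " ")                 # host[2] is the IP address
        sub(/^Ports: /, "", $2)
        n = split($2, entries, ", ")
        for (i = 1; i <= n; i++) {
            split(entries[i], f, "/")        # f[1]=port f[2]=state f[3]=proto f[5]=service
            print host[2], f[1], f[2], f[3], f[5]
        }
    }' "$1"
}

parse_ports "$sample"
```

Each output line carries the IP, port, state, protocol, and service name, which is the information the pipelines in this section pick apart one piece at a time.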
Here's how the script extracts all alive hosts and their IP addresses:
# From the gnmap-parser source
grep "Status: Up" *.gnmap |
awk '{print $2}' |
sort -V -u > Alive-Hosts-IP-Only.txt
grep "Status: Up" *.gnmap |
awk '{print $2,$3}' |
sed 's/[()]//g' |
sort -V -u > Alive-Hosts-IP-Hostname.txt
This pipeline filters for lines containing "Status: Up", extracts the second whitespace-separated field (the IP address) with awk, sorts with -V, removes duplicates with -u, and redirects to a file. The -V flag requests version sort, a GNU extension that compares numeric segments as numbers, so 10.0.0.9 sorts before 10.0.0.10. The hostname variant adds field 3 and uses sed to strip the parentheses that Nmap wraps around hostnames. No XML parsing, no dependencies, just text streams.
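Because `sort -V` is not specified by POSIX, a system with a minimal sort may not have it. The sketch below is one possible fallback, not the script's own code: it lets awk read the files directly and orders addresses with four numeric sort keys. The `alive_ips` name and the sample hosts are assumptions made for the example.

```shell
#!/usr/bin/env bash
# Illustrative data: two hypothetical scan files with an overlapping host.
workdir=$(mktemp -d)
printf 'Host: 192.0.2.10 ()\tStatus: Up\nHost: 192.0.2.9 (db.example.test)\tStatus: Up\n' > "$workdir/a.gnmap"
printf 'Host: 192.0.2.10 ()\tStatus: Up\nHost: 192.0.2.200 ()\tStatus: Down\n' > "$workdir/b.gnmap"

# awk reads the files itself, so no "file:" prefix ever lands in field 1.
# The four numeric sort keys order addresses octet by octet without -V.
alive_ips() {
    awk '/Status: Up/ {print $2}' "$1"/*.gnmap |
        sort -t. -k1,1n -k2,2n -k3,3n -k4,4n -u
}

alive_ips "$workdir"
```

With the sample data this lists 192.0.2.9 before 192.0.2.10, drops the duplicate, and skips the host that was down.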
The port extraction is where things get interesting. To generate a unique list of all discovered ports across all scans, the script parses the comma-separated port records:
grep "Ports:" *.gnmap |
cut -d" " -f4- |
tr ',' '\n' |
sed -e 's/^[ \t]*//' |
awk -F"/" '{print $1}' |
grep "^[0-9]" |
sort -n -u > Unique-Ports.txt
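This pipeline keeps everything from the fourth space-delimited field onward (the data that follows "Ports:"), translates commas into newlines (one port entry per line), trims leading whitespace, uses awk with / as the field separator to extract just the port number, keeps only lines that start with a digit (dropping any fragment that is not a port entry), and sorts numerically. Because nothing here checks the state field, the list includes every port that appears in a Ports section, whether open, closed, or filtered. The result is a clean list of every unique port reported across potentially hundreds of scan files.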
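If only open ports are wanted, the same idea can be narrowed by testing the state field before printing. The following is a sketch under that assumption; `open_ports` is a hypothetical helper and the hosts are invented.

```shell
#!/usr/bin/env bash
# Illustrative data: hypothetical hosts with mixed port states.
scan=$(mktemp)
printf 'Host: 192.0.2.10 ()\tPorts: 22/open/tcp//ssh///, 25/filtered/tcp//smtp///, 443/open/tcp//https///\tIgnored State: closed (997)\n' > "$scan"
printf 'Host: 192.0.2.11 ()\tPorts: 443/open/tcp//https///, 8080/closed/tcp//http-proxy///\n' >> "$scan"

# Take the tab-delimited Ports section, put one entry per line,
# and keep the port number only when the state field is exactly "open".
open_ports() {
    awk -F'\t' '{ for (i = 1; i <= NF; i++) if ($i ~ /^Ports: /) print substr($i, 8) }' "$@" |
        tr ',' '\n' |
        sed 's/^ *//' |
        awk -F/ '$2 == "open" {print $1}' |
        sort -n -u
}

open_ports "$scan"
```

Splitting on tabs first also keeps the trailing "Ignored State" section out of the port list without needing a digit filter.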
The most sophisticated output is the CSV matrix showing which hosts have which ports open. The script generates both IP-only and IP-with-hostname versions:
for IP in $(cat Alive-Hosts-IP-Only.txt); do
    echo -n "$IP,"
    for PORT in $(cat Unique-Ports.txt); do
        grep "$IP.*Ports:" *.gnmap | grep -q " $PORT/open/"
        if [ $? -eq 0 ]; then
            echo -n "$PORT,"
        else
            echo -n ","
        fi
    done
    echo ""
done > IP-Only-Matrix.csv
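This nested loop structure, which iterates through each alive host and then checks each unique port, is computationally inefficient: it runs n × m grep pipelines for n hosts and m ports, and every pipeline rescans all of the scan files. The upside is that the code is transparent. For each host-port combination, it greps the scan files for that IP's port data, then greps again for that port in an open state. The exit status determines whether to output the port number or just a comma (an empty cell). One caveat: the first pattern is an unanchored regular expression, so 10.0.0.1 also matches the record for 10.0.0.10, and a port open on one host can appear in the row of another whose address is a prefix of it.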
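For comparison, the nested loop above can be replaced by a single awk pass that records each open (host, port) pair and prints the rows at the end. This is a sketch of an alternative technique, not how gnmap-parser does it. The `build_matrix` helper is hypothetical, it only lists hosts with at least one open port, and its columns appear in first-seen order rather than sorted.

```shell
#!/usr/bin/env bash
# Illustrative data: hypothetical hosts chosen so that one IP is a prefix of another.
scan=$(mktemp)
printf 'Host: 10.0.0.1 ()\tPorts: 22/open/tcp//ssh///\n' > "$scan"
printf 'Host: 10.0.0.10 ()\tPorts: 80/open/tcp//http///, 443/closed/tcp//https///\n' >> "$scan"

# One pass over the input: remember every (ip, port) pair seen in the open
# state, then print one CSV row per host with one column per port.
build_matrix() {
    awk -F'\t' '
    {
        split($1, h, " "); ip = h[2]
        for (i = 2; i <= NF; i++) {
            if ($i !~ /^Ports: /) continue
            n = split(substr($i, 8), entry, ", ")
            for (j = 1; j <= n; j++) {
                split(entry[j], f, "/")
                if (f[2] != "open") continue
                seen[ip, f[1]] = 1
                if (!(ip in hosts)) { hosts[ip] = 1; host_order[++nh] = ip }
                if (!(f[1] in ports)) { ports[f[1]] = 1; port_order[++np] = f[1] }
            }
        }
    }
    END {
        for (a = 1; a <= nh; a++) {
            row = host_order[a]
            for (b = 1; b <= np; b++) {
                is_open = ((host_order[a], port_order[b]) in seen)
                row = row "," (is_open ? port_order[b] : "")
            }
            print row
        }
    }' "$@"
}

build_matrix "$scan"
```

Because the IP is compared as an exact array key instead of a regular expression, 10.0.0.1 and 10.0.0.10 stay in separate rows, and each file is read only once.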
The gathering mode demonstrates another practical insight—security assessments often produce scan files scattered across directory hierarchies (organized by date, subnet, scan type, etc.). Rather than requiring users to manually consolidate files, the script provides a one-command solution:
find "$TARGET" -name "*.gnmap" -exec cp {} . \;
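This finds all .gnmap files in the target directory tree and copies them to the current working directory, creating a flat structure for batch processing. While this duplicates data, it solves a real workflow problem: centralizing distributed scan results without manual file management. The flat layout has one side effect worth knowing: files that share a name in different subdirectories overwrite one another, so only the last copy survives.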
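When scan folders reuse file names, one way to flatten the tree without losing files is to name each copy after its relative path. The sketch below assumes that approach. `gather` is a hypothetical helper rather than the script's gathering mode, and the directory names are invented. Two paths that differ only by `/` versus `_` would still collide.

```shell
#!/usr/bin/env bash
# Illustrative layout: two hypothetical subnet folders that reuse one file name.
src=$(mktemp -d)
dest=$(mktemp -d)
mkdir -p "$src/subnet-a" "$src/subnet-b"
printf 'Host: 192.0.2.1 ()\tStatus: Up\n' > "$src/subnet-a/scan.gnmap"
printf 'Host: 192.0.2.2 ()\tStatus: Up\n' > "$src/subnet-b/scan.gnmap"

# gather TREE DEST: copy every .gnmap under TREE into DEST (an absolute path),
# naming each copy after its relative path so equal base names do not collide.
gather() {
    (
        cd "$1" || exit 1
        find . -type f -name '*.gnmap' -exec sh -c '
            dest=$1; shift
            for path in "$@"; do
                # ./subnet-a/scan.gnmap becomes subnet-a_scan.gnmap
                flat=$(printf "%s" "$path" | sed "s|^\./||; s|/|_|g")
                cp "$path" "$dest/$flat"
            done
        ' sh "$2" {} +
    )
}

gather "$src" "$dest"
ls "$dest"
```

Both copies keep the .gnmap extension, so the existing `*.gnmap` globs still pick them up.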
The entire parsing architecture relies on Nmap's greppable format being consistent and well-structured. This is a bet on stability—the .gnmap format has remained largely unchanged for over a decade because breaking it would destroy countless scripts in the wild. By building on this stable foundation with standard Unix tools, gnmap-parser achieves both portability and longevity.
Gotcha
The file-copying requirement is the biggest practical limitation. If you have 50GB of historical scan data, gathering mode will duplicate all of it into your working directory. The script doesn't parse in-place because its design assumes you want to analyze a consolidated set of scans, but this architectural choice becomes painful with large datasets or limited disk space. There's no streaming mode, no option to process files from their original locations, and no cleanup functionality to remove the copies after parsing.
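A workaround that avoids the copies is to stream the files from where they already live and pipe that stream into the same grep, awk, and sort stages. The sketch below illustrates the idea; `stream_scans` is a hypothetical helper, not a feature of gnmap-parser, and the archive layout is made up.

```shell
#!/usr/bin/env bash
# Illustrative layout: hypothetical scans left in their original folders.
archive=$(mktemp -d)
mkdir -p "$archive/2024-01/dmz" "$archive/2024-02/lan"
printf 'Host: 192.0.2.10 ()\tStatus: Up\n' > "$archive/2024-01/dmz/tcp.gnmap"
printf 'Host: 192.0.2.20 ()\tStatus: Up\nHost: 192.0.2.10 ()\tStatus: Up\n' > "$archive/2024-02/lan/tcp.gnmap"

# Concatenate every .gnmap file under a tree to stdout without copying anything.
stream_scans() {
    find "$1" -type f -name '*.gnmap' -exec cat {} +
}

# Any of the pipelines can then read the stream instead of ./*.gnmap.
stream_scans "$archive" | awk '/Status: Up/ {print $2}' | sort -u
```

Nothing is written to disk, so disk usage stays flat regardless of archive size. The trade-off is that every report re-reads the whole tree.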
The .gnmap extension requirement is inflexible. If you've archived scans with naming conventions like scan-2024-01-15.nmap.greppable or compressed them as .gnmap.gz, you'll need to rename or uncompress everything before the script will recognize them. This is a consequence of using simple glob patterns (*.gnmap) rather than more sophisticated file detection.
The script also provides no customization options: you can't request just the unique ports list without also generating the IP lists and CSV matrices. Every run produces all output formats, which means unnecessary processing time if you only need one specific view of the data.
The matrix generation, in particular, can take significant time on large datasets because its nested loop greps through the files repeatedly. For a scan set with 1,000 hosts and 100 unique ports, that's 100,000 loop iterations, each launching two grep processes and rescanning every file.
Verdict
Use if: You're working in restricted environments where installing Python or other interpreters isn't an option (air-gapped networks, locked-down compliance systems, embedded devices), you need to quickly analyze Nmap scans without setting up parsing infrastructure, you're consolidating scan results from multiple sources and want standard output formats (IP lists, port lists, matrices) for spreadsheet import or further processing, or you value transparency and want to understand exactly how your scan data is being parsed without reading XML parsing library documentation.
Skip if: You're processing large scan archives where duplicating files is impractical, you need custom output formats or selective data extraction rather than all-or-nothing parsing, you want to parse scans in-place without file reorganization, you're building automation that needs programmatic access to parsed data structures rather than text files, or you're already invested in XML-based Nmap workflows and need the additional detail that XML captures but greppable format omits (like script output, timing data, or service fingerprints).