Recog: How Rapid7 Fingerprints the Internet with Regex and XML
Hook
When a security tool identifies the exact OpenSSH version on a server from a single SSH banner, there is a good chance Recog is doing the work. Recog is a framework that matches network banners against a community-maintained database of regex patterns stored in XML files.
Context
Network reconnaissance has always faced a fundamental challenge: how do you reliably identify what’s running on a remote system when all you have is a banner string like “SSH-2.0-OpenSSH_7.4p1 Debian-10+deb9u7”? Traditional approaches required maintaining hardcoded regex patterns scattered across codebases, making them brittle, untestable, and impossible to share across tools.
Recog emerged from Rapid7’s need to standardize fingerprinting across their security product line. Rather than embedding identification logic in application code, they created a framework that separates the fingerprint database (XML files with regex patterns and test cases) from the matching implementation (libraries in Ruby, Java, and Go). This architectural split means security researchers can contribute fingerprints without touching application code, while developers can integrate identification patterns simply by parsing XML files. The result is a community-maintained fingerprint database designed for identifying products, services, operating systems, and hardware.
Technical Insight
Recog’s architecture treats fingerprints as data, not code. Each XML file targets a specific protocol or service field—SSH banners, HTTP Server headers, SNMP system descriptions, TLS JARM hashes, FTP banners, HTTP cookies, HTTP WWW-Authenticate headers, and favicons. The XML structure defines both the pattern and the structured data to extract:
<fingerprints matches="ssh.banner">
  <fingerprint pattern="^RomSShell_([\d\.]+)$">
    <description>Allegro RomSShell SSH</description>
    <example service.version="4.62">RomSShell_4.62</example>
    <param pos="0" name="service.vendor" value="Allegro"/>
    <param pos="0" name="service.product" value="RomSShell"/>
    <param pos="1" name="service.version"/>
  </fingerprint>
</fingerprints>
The param elements extract structured metadata from regex capture groups. pos="0" indicates a static value, while pos="1" extracts the first capture group (the version number in this example). This transforms an opaque banner string into queryable fields: vendor, product, version, and other details including CPE identifiers for vulnerability correlation.
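To make the pos semantics concrete, here is a minimal Python sketch that applies the RomSShell fingerprint to a banner. It is not one of the official implementations: extract_params is a hypothetical helper, and Python’s re module stands in for the regex engine, which may not behave identically to Ruby’s, Java’s, or Go’s for every pattern.

```python
import re
import xml.etree.ElementTree as ET

# The RomSShell fingerprint from the article, embedded as a string for illustration.
FINGERPRINT_XML = r"""
<fingerprints matches="ssh.banner">
  <fingerprint pattern="^RomSShell_([\d\.]+)$">
    <description>Allegro RomSShell SSH</description>
    <example service.version="4.62">RomSShell_4.62</example>
    <param pos="0" name="service.vendor" value="Allegro"/>
    <param pos="0" name="service.product" value="RomSShell"/>
    <param pos="1" name="service.version"/>
  </fingerprint>
</fingerprints>
"""


def extract_params(fingerprint, banner):
    """Hypothetical helper: return the extracted fields, or None if no match."""
    match = re.search(fingerprint.get("pattern"), banner)
    if match is None:
        return None
    fields = {}
    for param in fingerprint.findall("param"):
        pos = int(param.get("pos"))
        if pos == 0:
            # pos="0": static value taken from the value attribute
            fields[param.get("name")] = param.get("value")
        else:
            # pos="N": value taken from the Nth regex capture group
            fields[param.get("name")] = match.group(pos)
    return fields


root = ET.fromstring(FINGERPRINT_XML)
fingerprint = root.find("fingerprint")
result = extract_params(fingerprint, "RomSShell_4.62")
print(result)
```

A banner that does not match the pattern, such as an OpenSSH banner, simply yields None here.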
The framework enforces test-driven fingerprint development. Every fingerprint must include at least one <example> tag showing the exact string it should match. When you run the verification tools, Recog loads each XML file, extracts the examples, runs them through the regex patterns, and validates that the extracted parameters match expectations. This creates a continuous integration loop—contributions that break existing fingerprints fail automated tests.
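The verification loop described above can be approximated in a few lines of Python. This is a sketch of the idea only, not Recog’s actual verification tool: verify_examples is a hypothetical function, the second fingerprint (ExampleSSHD) is invented and deliberately broken, and skipping attributes that start with an underscore is an assumption about how example metadata is marked.

```python
import re
import xml.etree.ElementTree as ET

FINGERPRINTS_XML = r"""
<fingerprints matches="ssh.banner">
  <fingerprint pattern="^RomSShell_([\d\.]+)$">
    <description>Allegro RomSShell SSH</description>
    <example service.version="4.62">RomSShell_4.62</example>
    <param pos="0" name="service.vendor" value="Allegro"/>
    <param pos="0" name="service.product" value="RomSShell"/>
    <param pos="1" name="service.version"/>
  </fingerprint>
  <fingerprint pattern="^ExampleSSHD_([\d\.]+)$">
    <description>Hypothetical ExampleSSHD (broken on purpose)</description>
    <example service.version="9.9">ExampleSSHD_1.0</example>
    <param pos="0" name="service.product" value="ExampleSSHD"/>
    <param pos="1" name="service.version"/>
  </fingerprint>
</fingerprints>
"""


def verify_examples(xml_text):
    """Return a list of failure messages; an empty list means every example passed."""
    failures = []
    for fp in ET.fromstring(xml_text).findall("fingerprint"):
        desc = fp.findtext("description", default="(no description)")
        pattern = re.compile(fp.get("pattern"))
        examples = fp.findall("example")
        if not examples:
            failures.append(f"{desc}: no <example> provided")
        for ex in examples:
            text = ex.text or ""
            match = pattern.search(text)
            if match is None:
                failures.append(f"{desc}: example {text!r} does not match")
                continue
            # Rebuild the extracted fields exactly as a matcher would.
            extracted = {}
            for param in fp.findall("param"):
                pos = int(param.get("pos"))
                extracted[param.get("name")] = (
                    param.get("value") if pos == 0 else match.group(pos)
                )
            # Each attribute on <example> is an expected field value.
            for name, expected in ex.attrib.items():
                if name.startswith("_"):
                    continue  # assumption: underscore-prefixed attributes are metadata
                if extracted.get(name) != expected:
                    failures.append(
                        f"{desc}: {name} expected {expected!r}, got {extracted.get(name)!r}"
                    )
    return failures


failures = verify_examples(FINGERPRINTS_XML)
for message in failures:
    print("FAIL:", message)
```

In a CI job, a non-empty failure list would translate into a non-zero exit code, which is what stops a broken contribution from merging.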
The repository split in March 2022 separated these concerns. The main rapid7/recog repository contains only the XML fingerprints and verification utilities. Language-specific implementations (recog-ruby, recog-java, recog-go) include this content repository as a git submodule, so all implementations work from the same fingerprint database. When security teams discover a new service variant, they commit one XML update, and Java tools, Ruby scripts, and Go services each pick it up the next time their submodule reference is updated.
For matching operations, implementations load the appropriate XML file based on the data type (HTTP header, SSH banner, etc.), then iterate through fingerprints until finding a match. The extracted parameters become a structured dataset suitable for populating database fields, generating reports, or feeding into vulnerability scanners.
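A first-match-wins lookup of this kind might look like the following Python sketch. The names load_databases and first_match are hypothetical, the data-type key http_header.server and both the catch-all and ExampleHTTPd fingerprints are assumptions made for illustration, and the official libraries may organize loading and matching differently.

```python
import re
import xml.etree.ElementTree as ET

SSH_XML = r"""
<fingerprints matches="ssh.banner">
  <fingerprint pattern="^RomSShell_([\d\.]+)$">
    <description>Allegro RomSShell SSH</description>
    <example service.version="4.62">RomSShell_4.62</example>
    <param pos="0" name="service.vendor" value="Allegro"/>
    <param pos="0" name="service.product" value="RomSShell"/>
    <param pos="1" name="service.version"/>
  </fingerprint>
  <fingerprint pattern="^RomSShell">
    <description>Hypothetical RomSShell catch-all</description>
    <example>RomSShell_beta</example>
    <param pos="0" name="service.product" value="RomSShell"/>
  </fingerprint>
</fingerprints>
"""

HTTP_XML = r"""
<fingerprints matches="http_header.server">
  <fingerprint pattern="^ExampleHTTPd/([\d\.]+)$">
    <description>Hypothetical ExampleHTTPd</description>
    <example service.version="2.1">ExampleHTTPd/2.1</example>
    <param pos="0" name="service.product" value="ExampleHTTPd"/>
    <param pos="1" name="service.version"/>
  </fingerprint>
</fingerprints>
"""


def load_databases(xml_documents):
    """Index each document's fingerprints by its 'matches' data type."""
    databases = {}
    for text in xml_documents:
        root = ET.fromstring(text)
        databases[root.get("matches")] = root.findall("fingerprint")
    return databases


def first_match(databases, data_type, value):
    """Return (description, fields) for the first matching fingerprint, else None."""
    for fp in databases.get(data_type, []):
        match = re.search(fp.get("pattern"), value)
        if match is None:
            continue
        fields = {}
        for param in fp.findall("param"):
            pos = int(param.get("pos"))
            fields[param.get("name")] = (
                param.get("value") if pos == 0 else match.group(pos)
            )
        return fp.findtext("description"), fields  # stop at the first hit
    return None


databases = load_databases([SSH_XML, HTTP_XML])
versioned = first_match(databases, "ssh.banner", "RomSShell_4.62")
fallback = first_match(databases, "ssh.banner", "RomSShell_beta")
unknown = first_match(databases, "ssh.banner", "SSH-2.0-OpenSSH_7.4p1")
print(versioned, fallback, unknown, sep="\n")
```

Because the loop returns on the first hit, ordering matters in this sketch: the specific, version-capturing pattern has to come before the generic catch-all, or the version would never be extracted.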
The fingerprint files support filesystem-based external examples for sensitive or large test cases that shouldn’t live in version control, and base64-encoded examples for handling unprintable characters. An optional flags attribute on fingerprint patterns controls regex interpretation behavior.
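The sketch below shows one way a consumer could handle base64-encoded examples and the flags attribute in Python. Several details are assumptions rather than facts from this article: the attribute name _encoding, the flag names REG_ICASE and REG_DOT_NEWLINE, their mapping onto Python’s re flags, and the ExampleD fingerprint itself, which is invented. Check the Recog README before relying on any of them.

```python
import base64
import re
import xml.etree.ElementTree as ET

# Assumed flag names and an assumed mapping to Python's re flags.
ASSUMED_FLAGS = {
    "REG_ICASE": re.IGNORECASE,
    "REG_DOT_NEWLINE": re.DOTALL,
}

# A banner with a NUL byte and CRLF, which cannot be written directly in XML text.
raw_banner = b"\x00ExampleD 1.2\r\n"
encoded = base64.b64encode(raw_banner).decode("ascii")

# Hypothetical fingerprint; the example text is the base64 form of raw_banner.
FINGERPRINT_XML = rf"""
<fingerprints matches="example.banner">
  <fingerprint pattern="^\x00exampled ([\d\.]+)\r\n$" flags="REG_ICASE">
    <description>Hypothetical ExampleD service</description>
    <example _encoding="base64" service.version="1.2">{encoded}</example>
    <param pos="1" name="service.version"/>
  </fingerprint>
</fingerprints>
"""


def compile_pattern(fingerprint):
    """Compile the pattern, applying any comma-separated flags (unknown names raise KeyError)."""
    flags = 0
    for name in (fingerprint.get("flags") or "").split(","):
        name = name.strip()
        if name:
            flags |= ASSUMED_FLAGS[name]
    return re.compile(fingerprint.get("pattern"), flags)


def example_text(example):
    """Return the example string, decoding base64 when the element is marked as such."""
    text = (example.text or "").strip()
    if example.get("_encoding") == "base64":
        # latin-1 maps every byte to exactly one code point, so nothing is lost
        return base64.b64decode(text).decode("latin-1")
    return text


fingerprint = ET.fromstring(FINGERPRINT_XML).find("fingerprint")
pattern = compile_pattern(fingerprint)
decoded = example_text(fingerprint.find("example"))
match = pattern.search(decoded)
print(repr(decoded), match.group(1))
```

Without the case-insensitive flag, the lowercase pattern would not match the mixed-case banner, which is the kind of behavior the flags attribute is there to control.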
Gotcha
The README includes a critical disclaimer that security teams often miss: “the Ruby codebase is still fairly new and subject to change quickly. Please contact us before leveraging the Recog code within any production projects.” This warning applies specifically to the Ruby library implementation code in this repository, not the XML fingerprints themselves. For production systems, you want the stable recog-ruby gem (version 3.0.0+), which includes the fingerprints as a submodule and provides only the recog_match tool (other fingerprint management tools were removed from the gem in the repository split).
Regex-based fingerprinting has fundamental limitations that no framework can eliminate. Software vendors change banner formats without warning, which can silently invalidate existing fingerprints. Attackers can also alter banners deliberately to evade detection, and doing so is trivial. Recog gives you community-maintained patterns, but it can’t end the cat-and-mouse game of fingerprint evasion. False positives can occur, especially with overly generic patterns, and coverage gaps remain for proprietary or obscure services that the community hasn’t encountered yet.
The repository split also changed the layout of the recog-ruby gem. From 3.0.0 onward, the XML fingerprints are nested under a recog directory (a git submodule), so code that located fingerprint files by path in earlier versions may need to be updated.
Verdict
Use Recog if you’re building security tools (scanners, asset inventory, threat detection) and need tested fingerprints for common services: the database is community-maintained, and every fingerprint ships with examples that are checked automatically. It’s especially valuable if you need multi-language support or want to contribute fingerprints back to the security community. Because fingerprints are kept separate from code, you can integrate the XML files from any language, not just the three official implementations (Ruby, Java, Go).
Skip it if you need guaranteed accuracy for compliance or billing purposes; regex fingerprinting is heuristic by nature and will produce false positives and false negatives. Also skip it if you’re doing simple, single-service identification where writing a few custom regexes is faster than managing git submodules and parsing XML.
For production Ruby projects, use the recog-ruby gem (version 3.0.0+) directly rather than code from this repository, which the README indicates is subject to change. Recog is best suited to security practitioners who value community-maintained intelligence over rolling their own fingerprint database from scratch.