Building Your Own Shodan: A Deep Dive into IVRE’s Network Reconnaissance Architecture
Hook
What if you could build your own Shodan, with full control over the data and zero monthly fees? IVRE makes this possible by aggregating scan results from Nmap, Masscan, Zeek, and a dozen other tools into a single queryable database—completely self-hosted.
Context
Commercial network reconnaissance platforms like Shodan and Censys have become indispensable for security teams, but they come with significant tradeoffs. You’re limited to their scanning schedules, you can’t control what gets scanned, your sensitive queries are visible to the vendor, and subscription costs scale painfully with team size. For organizations with compliance requirements, sensitive targets, or specific reconnaissance needs, these platforms are often a poor fit.
IVRE emerged from this gap. Its official name “Instrument de veille sur les réseaux extérieurs” translates to “Instrument for monitoring external networks,” and it provides the infrastructure to build custom reconnaissance platforms. Instead of replacing individual tools like Nmap or Zeek, IVRE acts as the aggregation and analysis layer—normalizing heterogeneous scan data into unified schemas, providing query interfaces similar to Shodan’s filters, and offering both passive network monitoring and active scanning capabilities. The result is a framework that security teams can deploy on their own infrastructure, scanning what they need, when they need it, with complete data sovereignty.
Technical Insight
IVRE’s architecture revolves around one abstraction: it treats all reconnaissance tools as data sources that feed into pluggable storage backends. This separation of concerns means you can swap MongoDB for PostgreSQL or Elasticsearch without changing your scanning workflows, and you can correlate results from very different tools through a unified query interface. Feature coverage is not identical across backends, so check which data categories your chosen engine supports before committing to it.
The framework divides reconnaissance into different data categories, each with its own database schema and CLI tooling. Active scan results from Nmap, Masscan, ZGrab2, Nuclei, and other ProjectDiscovery tools flow into one category. Passive network traffic metadata from Zeek (formerly Bro), p0f, and Argus captures goes into another. Flow records from Nfdump and specialized collections like DNS resolution data from ZDNS or HTTP metadata from httpx are handled separately. Each category is reached through its own database object in the Python API (for example, ivre.db.db.nmap for active scan results), which abstracts the underlying storage engine.
Here’s how IVRE normalizes Nmap XML output into its schema. When you run a scan and import it, IVRE doesn’t just dump the XML into a database—it parses service fingerprints, extracts CPE identifiers, normalizes port states, and creates indexed structures for fast querying:
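# Import Nmap scan results.
# Note: class and method names are reproduced as given; confirm them against
# your installed IVRE version (the `ivre scan2db` CLI is the usual import path).
import ivre.db
import ivre.xmlnmap

# Handle to the active-scan ("nmap") database of the configured backend
db = ivre.db.db.nmap

# Parse and store an Nmap XML file
with open('scan_results.xml', 'rb') as fdesc:
    parser = ivre.xmlnmap.Nmap2DB(db)
    parser.parse(fdesc)

# Query for hosts exposing a specific service
for host in db.get(db.searchservice('http')):
    print(f"{host['addr']}: {host.get('ports', [])}")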
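To make the normalization step concrete without needing an IVRE install, here is a minimal standard-library sketch of the same idea: read Nmap-style XML and flatten each host into a document with port states, service names, and CPE identifiers. The sample XML is hand-written, and the output field names (`service_name`, `cpes`, and so on) are assumptions chosen for illustration, not IVRE's actual schema.

```python
import xml.etree.ElementTree as ET

# Tiny hand-written sample in Nmap's XML output layout (illustrative data only).
SAMPLE_XML = """<nmaprun scanner="nmap">
  <host>
    <status state="up"/>
    <address addr="192.0.2.10" addrtype="ipv4"/>
    <ports>
      <port protocol="tcp" portid="80">
        <state state="open"/>
        <service name="http" product="nginx" version="1.18.0">
          <cpe>cpe:/a:igor_sysoev:nginx:1.18.0</cpe>
        </service>
      </port>
      <port protocol="tcp" portid="22">
        <state state="closed"/>
      </port>
    </ports>
  </host>
</nmaprun>
"""


def normalize_host(host_elem):
    """Flatten one <host> element into a plain dict (hypothetical schema)."""
    ports = []
    for port in host_elem.iterfind("ports/port"):
        doc = {
            "port": int(port.get("portid")),
            "protocol": port.get("protocol"),
            "state": port.find("state").get("state"),
        }
        service = port.find("service")
        if service is not None:
            doc["service_name"] = service.get("name")
            doc["service_product"] = service.get("product")
            doc["cpes"] = [cpe.text for cpe in service.iterfind("cpe")]
        ports.append(doc)
    return {"addr": host_elem.find("address").get("addr"), "ports": ports}


def parse_scan(xml_text):
    """Return one normalized document per <host> in the scan."""
    root = ET.fromstring(xml_text)
    return [normalize_host(host) for host in root.iterfind("host")]


hosts = parse_scan(SAMPLE_XML)
for host in hosts:
    open_ports = [p["port"] for p in host["ports"] if p["state"] == "open"]
    print(host["addr"], open_ports)
```

Once every tool's output is reduced to documents of one shape, indexing and querying no longer depend on which scanner produced the data.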
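The real power emerges when you combine multiple data sources. Filters like the searchservice() call above are values you can combine, so a question such as “which hosts run HTTP on a non-standard port” becomes a single query instead of custom glue code around each tool’s output. The CLI tools are backed by the same Python API, so the same kinds of queries are available from the shell.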
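The idea of composing filters can be sketched in a few lines. The helpers below are hypothetical stand-ins written for this example; they run against an in-memory list instead of a database and are not IVRE's API, but they show why small combinable filters beat one-off parsing scripts.

```python
# Hypothetical filter helpers: each returns a predicate over a host document.
def search_service(name):
    """Match hosts with at least one port running the named service."""
    def flt(host):
        return any(p.get("service") == name for p in host["ports"])
    return flt


def search_port(port, state="open"):
    """Match hosts where the given port is in the given state."""
    def flt(host):
        return any(p["port"] == port and p["state"] == state for p in host["ports"])
    return flt


def search_country(code):
    """Match hosts geolocated to the given country code."""
    return lambda host: host.get("country") == code


def flt_and(*filters):
    """Match hosts that satisfy every filter."""
    return lambda host: all(f(host) for f in filters)


def flt_not(flt):
    """Invert a filter."""
    return lambda host: not flt(host)


# Illustrative data only.
HOSTS = [
    {"addr": "192.0.2.10", "country": "FR",
     "ports": [{"port": 80, "state": "open", "service": "http"}]},
    {"addr": "192.0.2.20", "country": "FR",
     "ports": [{"port": 8080, "state": "open", "service": "http"}]},
    {"addr": "192.0.2.30", "country": "DE",
     "ports": [{"port": 8080, "state": "open", "service": "http"}]},
]

# French hosts running HTTP, but not on port 80.
query = flt_and(search_service("http"), search_country("FR"), flt_not(search_port(80)))
matches = [host["addr"] for host in HOSTS if query(host)]
print(matches)
```

A real backend would translate such a filter into a database query instead of evaluating it host by host, but the composition pattern is the same.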
For passive reconnaissance, IVRE includes specialized tooling to process Zeek logs continuously. You deploy Zeek sensors on network taps or monitoring points and configure them to send logs to IVRE’s import tools. Over time this builds a historical record of the connections, DNS queries, and HTTP requests those sensors observe, which enables use cases like building a private passive DNS service or tracking how attackers are fingerprinting your infrastructure.
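As a rough illustration of the passive DNS use case, the sketch below aggregates Zeek-style tab-separated DNS log lines into records keyed by (query, type, answer) with first-seen, last-seen, and count fields. The sample log uses a reduced set of columns, and the record layout is an assumption made for this example; it does not reproduce IVRE's own import tooling or schema.

```python
def make_line(*fields):
    """Join fields with tabs, as in Zeek's TSV log format."""
    return "\t".join(fields)


# Reduced, hand-written sample of a Zeek dns.log (illustrative data only).
SAMPLE_LOG = "\n".join([
    make_line("#fields", "ts", "id.orig_h", "query", "qtype_name", "answers"),
    make_line("1700000000.0", "10.0.0.5", "example.com", "A", "192.0.2.1"),
    make_line("1700000600.0", "10.0.0.7", "example.com", "A", "192.0.2.1,192.0.2.2"),
    make_line("1700000900.0", "10.0.0.5", "missing.example", "A", "-"),
])


def build_passive_dns(log_text):
    """Aggregate DNS answers into {(query, type, answer): stats} records."""
    fields = []
    records = {}
    for line in log_text.splitlines():
        if line.startswith("#fields"):
            # The header names the columns for every following data row.
            fields = line.split("\t")[1:]
            continue
        if not line or line.startswith("#"):
            continue
        row = dict(zip(fields, line.split("\t")))
        if row["answers"] == "-":
            continue  # "-" marks an unset field, i.e. no answer recorded
        ts = float(row["ts"])
        for answer in row["answers"].split(","):
            key = (row["query"], row["qtype_name"], answer)
            rec = records.setdefault(
                key, {"first_seen": ts, "last_seen": ts, "count": 0}
            )
            rec["first_seen"] = min(rec["first_seen"], ts)
            rec["last_seen"] = max(rec["last_seen"], ts)
            rec["count"] += 1
    return records


records = build_passive_dns(SAMPLE_LOG)
for key, rec in sorted(records.items()):
    print(key, rec)
```

Keeping only first-seen, last-seen, and a counter per unique answer is what lets a passive DNS store grow with the number of distinct observations instead of the volume of traffic.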
The web interface provides query building capabilities that rival Shodan’s UX, offering faceted search, geographic visualization, and the ability to pivot through related hosts. The interface generates queries against the same Python API that CLI tools use, which means you can prototype queries in the web UI and then automate them via scripts.
IVRE’s multi-backend support is implemented through a database abstraction layer that maps its internal schemas to each engine’s native capabilities. MongoDB gets document-based storage with its aggregation pipeline. PostgreSQL uses JSONB columns for flexible schemas while leveraging relational features for metadata. Elasticsearch provides full-text search across service banners and HTTP bodies. The abstraction gives you deployment flexibility across these different backends.
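One way to picture such an abstraction layer: calling code asks for filters through a common interface, and each backend renders them in its own query language. The classes, field names, and SQL below are hypothetical and simplified for illustration; they are not taken from IVRE's source.

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    """Common filter-building interface (hypothetical)."""

    @abstractmethod
    def search_port(self, port):
        """Filter for hosts with the given port open."""

    @abstractmethod
    def search_country(self, code):
        """Filter for hosts located in the given country."""

    @abstractmethod
    def flt_and(self, *filters):
        """Combine filters so that all must match."""


class MongoLikeBackend(Backend):
    """Renders filters as MongoDB-style query documents."""

    def search_port(self, port):
        return {"ports": {"$elemMatch": {"port": port, "state": "open"}}}

    def search_country(self, code):
        return {"country": code}

    def flt_and(self, *filters):
        return {"$and": list(filters)}


class SQLLikeBackend(Backend):
    """Renders filters as (WHERE clause, parameters) pairs."""

    def search_port(self, port):
        sql = ("EXISTS (SELECT 1 FROM port WHERE port.host_id = host.id "
               "AND port.port = %s AND port.state = %s)")
        return (sql, [port, "open"])

    def search_country(self, code):
        return ("host.country = %s", [code])

    def flt_and(self, *filters):
        clause = " AND ".join(f"({sql})" for sql, _ in filters)
        params = [param for _, params in filters for param in params]
        return (clause, params)


def french_https(backend):
    """Backend-agnostic query: French hosts with port 443 open."""
    return backend.flt_and(backend.search_port(443), backend.search_country("FR"))


mongo_query = french_https(MongoLikeBackend())
sql_query = french_https(SQLLikeBackend())
print(mongo_query)
print(sql_query)
```

The caller in `french_https` never sees which engine is in use, which is the property that lets scanning and reporting scripts survive a backend change.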
Gotcha
IVRE’s biggest limitation is that it’s fundamentally a framework, not a turnkey solution. The documentation assumes you already understand how to run Nmap scans, configure Zeek sensors, and tune database performance. Setting up a production deployment means making architectural decisions about which backend to use, how to scale scanning infrastructure, and how to handle data retention—decisions that require expertise in both security tooling and systems administration.
The learning curve is steep because you’re really learning three things simultaneously: IVRE’s abstractions, the underlying tools it integrates, and the database backend you’ve chosen. Debugging requires understanding both IVRE’s query translation layer and your database’s execution plans. The documentation provides examples and help through CLI --help options and Python help() functions, but understanding how to optimize for production requires deeper expertise.
Scalability is another concern. While IVRE can theoretically handle internet-wide scanning like Shodan does, actually achieving that scale requires substantial infrastructure investment. You’ll need distributed scanning architecture (IVRE doesn’t provide this—you build it yourself with container orchestration or other tools), significant database hardware, and careful index tuning. For organizations scanning millions of hosts, expect to dedicate engineering resources to optimization. The framework is powerful, but it won’t magically scale without thoughtful architecture.
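Since the distribution layer is yours to build, a first building block is usually a way to cut a target range into work units and hand them to scanning workers. The sketch below does this with Python's `ipaddress` module and a simple round-robin assignment. The worker names and chunk size are placeholders, and a real deployment would add a job queue, retries, and rate limiting.

```python
import ipaddress


def split_targets(cidr, new_prefix):
    """Split a network into equally sized subnets, returned as strings."""
    network = ipaddress.ip_network(cidr)
    return [str(subnet) for subnet in network.subnets(new_prefix=new_prefix)]


def assign_round_robin(chunks, workers):
    """Distribute chunks across workers in round-robin order."""
    plan = {worker: [] for worker in workers}
    for index, chunk in enumerate(chunks):
        plan[workers[index % len(workers)]].append(chunk)
    return plan


# Documentation range used as an illustrative target; worker names are placeholders.
chunks = split_targets("198.51.100.0/24", 26)
plan = assign_round_robin(chunks, ["worker-a", "worker-b"])
for worker, targets in plan.items():
    print(worker, targets)
```

Each worker would then run its scanner against its assigned subnets and ship the result files back for import, which keeps the scanning tier stateless and easy to scale horizontally.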
Verdict
Use IVRE if you need self-hosted reconnaissance infrastructure with complete data control, particularly organizations with compliance restrictions on SaaS usage, security teams monitoring internal networks where Shodan can’t reach, or researchers requiring custom scanning methodologies. It’s ideal when you’re already comfortable with command-line security tools and need to aggregate their results for historical analysis, EASM programs, or continuous asset monitoring. The investment in setup and maintenance pays off when you have ongoing reconnaissance needs and the technical expertise to operate it.
Skip IVRE if you want a simple, turnkey solution for occasional reconnaissance tasks. Commercial platforms like Shodan or Censys are more appropriate when you’re scanning public internet assets sporadically, when you lack infrastructure for hosting databases and processing large scan volumes, or when your team doesn’t have Linux systems administration experience. Also skip it if you need immediate results: IVRE’s value compounds over time as you build historical data, making it a poor fit for one-off projects or quick assessments.