Building Your Own Shodan: How IVRE Turns Network Recon Tools Into Intelligence Infrastructure

Hook

Every Shodan query you run discloses your reconnaissance targets to a third party. IVRE lets you build comparable capabilities in-house, aggregating data from more than a dozen tools into a single intelligence platform that you control.

Context

Network reconnaissance has bifurcated into two camps: those who use commercial platforms like Shodan and Censys for instant gratification, and those who run individual tools like Nmap or Masscan for specific scans. The former trades privacy and control for convenience—your search queries reveal your targets, and you're limited to what the platform has scanned. The latter generates fragmented data silos where correlating last month's Nmap results with today's Zeek logs requires custom scripts.

IVRE (Instrument de veille sur les réseaux extérieurs, or "External Networks Monitoring Instrument") emerged from this gap. It suits organizations that need Shodan-style correlation but cannot outsource their reconnaissance data, such as defense contractors, financial institutions, or red teams. It provides a Python-based aggregation layer that normalizes output from 14+ tools into MongoDB, PostgreSQL, or Elasticsearch. The result is a self-hosted reconnaissance platform where passive DNS monitoring, active vulnerability scans, and service fingerprints coexist in queryable formats, with a record of where each observation came from and zero third-party exposure.

Technical Insight

IVRE's architecture revolves around three core components: parsers, database abstraction, and query interfaces. The parsers are where the framework's value proposition crystallizes. Rather than forcing you to write custom XML parsers for Nmap or JSON handlers for Masscan, IVRE provides battle-tested ingestion pipelines that extract structured data while preserving tool-specific nuances.

Consider how IVRE handles Nmap scan ingestion. When you run a scan and feed the XML output to IVRE, the parser doesn't just extract open ports—it captures script outputs, service versions, OS fingerprints, and traceroute data, then normalizes them into a unified schema. Here's a practical example:

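from ivre.db import db

# Import an Nmap XML file into the scan database. store_scan() is the method
# the `ivre scan2db` command calls; the category and source labels below are
# placeholders, and keyword names should be checked against your IVRE release.
db.nmap.store_scan(
    'scan_results.xml',
    categories=['example'],
    source='article-demo',
)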

# Query for hosts with specific vulnerabilities
for host in db.nmap.get(db.nmap.searchcve('CVE-2023-12345')):
    print(f"{host['addr']}: {host['ports']}")
    
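# Find hosts running OpenSSH older than 8.0. The filter narrows results to
# OpenSSH; the version test runs client-side, on the assumption that
# searchproduct() matches exact versions rather than ranges.
for host in db.nmap.get(db.nmap.searchproduct(product='OpenSSH')):
    for port in host.get('ports', []):
        if port.get('service_product') != 'OpenSSH':
            continue
        version = port.get('service_version') or ''
        major = version.split('.')[0]
        # "Older than 8.0" is equivalent to a major version below 8
        if major.isdigit() and int(major) < 8:
            print(f"{host['addr']}: OpenSSH {version}")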
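
To give a feel for what the normalization step involves, here is a minimal standalone sketch that flattens a small Nmap XML document into plain dictionaries using only the standard library. It is not IVRE's parser: the sample XML is hand-written, and the output field names (`state_state`, `service_product`, and so on) are chosen to resemble the fields used in this article rather than taken from IVRE's schema definition.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the shape of Nmap's XML output
SAMPLE_XML = """<nmaprun>
  <host>
    <address addr="192.0.2.15" addrtype="ipv4"/>
    <ports>
      <port protocol="tcp" portid="22">
        <state state="open"/>
        <service name="ssh" product="OpenSSH" version="7.4"/>
      </port>
      <port protocol="tcp" portid="80">
        <state state="closed"/>
      </port>
    </ports>
  </host>
</nmaprun>"""


def normalize_nmap(xml_text):
    """Flatten Nmap XML hosts into plain dicts (a toy schema, not IVRE's own)."""
    hosts = []
    for host in ET.fromstring(xml_text).iter("host"):
        record = {"addr": host.find("address").get("addr"), "ports": []}
        for port in host.iter("port"):
            state = port.find("state")
            service = port.find("service")
            entry = {
                "port": int(port.get("portid")),
                "protocol": port.get("protocol"),
                "state_state": state.get("state") if state is not None else None,
            }
            if service is not None:
                # Prefix service attributes so they cannot collide with port fields
                entry.update({f"service_{k}": v for k, v in service.attrib.items()})
            record["ports"].append(entry)
        hosts.append(record)
    return hosts


hosts = normalize_nmap(SAMPLE_XML)
print(hosts[0]["ports"][0])
```

A real parser also has to cope with script output, OS fingerprints, traceroute hops, and malformed input, which is most of the work a maintained ingestion pipeline saves you.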

The database abstraction layer is IVRE's second architectural strength. The db module provides a unified API regardless of whether you're running MongoDB, PostgreSQL, or Elasticsearch. This goes beyond connection handling: IVRE translates high-level query constructs into backend-specific operations. When you combine filters such as searchcve() and searchport() (for example with flt_and()), each backend builds the equivalent native query, such as a MongoDB query document or a SQL expression, and the database itself decides how to use its indexes to execute it.
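
The idea can be illustrated with a toy model. The sketch below is not IVRE's code: `MongoStyleBackend`, `SqlStyleBackend`, and the table and column names are hypothetical, and only the method names are borrowed from the style of IVRE's filter API. It shows how the same caller code can yield a query document for one backend and a parameterized SQL fragment for another.

```python
class MongoStyleBackend:
    """Toy backend that expresses filters as MongoDB-style query documents."""

    @staticmethod
    def searchport(port, protocol="tcp"):
        return {"ports": {"$elemMatch": {"port": port, "protocol": protocol}}}

    @staticmethod
    def searchcategory(name):
        return {"categories": name}

    @staticmethod
    def flt_and(*filters):
        return {"$and": list(filters)}


class SqlStyleBackend:
    """Toy backend that expresses the same filters as (SQL fragment, params) pairs."""

    @staticmethod
    def searchport(port, protocol="tcp"):
        return ("port.port = %s AND port.protocol = %s", [port, protocol])

    @staticmethod
    def searchcategory(name):
        return ("%s = ANY(scan.categories)", [name])

    @staticmethod
    def flt_and(*filters):
        # Parenthesize each clause and concatenate parameters in the same order
        clause = " AND ".join(f"({sql})" for sql, _ in filters)
        params = [value for _, values in filters for value in values]
        return (clause, params)


def exposed_https(backend):
    """Caller code is identical whichever backend is plugged in."""
    return backend.flt_and(
        backend.searchport(443),
        backend.searchcategory("production"),
    )


print(exposed_https(MongoStyleBackend))
print(exposed_https(SqlStyleBackend))
```

The design choice worth noting is that filters are opaque values: callers only ever pass them back to the backend that produced them, which is what makes swapping backends possible without touching query-building code.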

The passive data pipeline demonstrates IVRE's design sophistication. When monitoring network traffic with Zeek (formerly Bro), IVRE ingests connection logs, DNS queries, HTTP headers, and SSL certificates into the same database as active scans. This creates temporal correlation opportunities that standalone tools can't provide:

# Correlate passive DNS with active scan data
from ivre.db import db

# Find domains resolved to a target IP in passive DNS
passive_records = db.passive.get(
    db.passive.searchhost('192.0.2.15')
)

# Cross-reference with active scan results for the same IP
active_data = db.nmap.get(
    db.nmap.searchhost('192.0.2.15')
)

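# Print both views of the host; spotting changes means comparing timestamps
for record in passive_records:
    # searchhost() returns every passive record for the address, so keep
    # DNS answers only before labelling the line as DNS
    if record.get('recontype') == 'DNS_ANSWER':
        print(f"DNS: {record['value']} -> {record['addr']} at {record['firstseen']}")

for scan in active_data:
    # Count only ports reported open, not every port listed in the record
    open_ports = [p for p in scan.get('ports', []) if p.get('state_state') == 'open']
    print(f"Scan: {scan['addr']} had {len(open_ports)} ports open at {scan['starttime']}")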
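
One way to act on those two result sets is to merge them into a single time-ordered list. The following is a minimal sketch using hand-written sample records in place of live query results; the field names follow the snippet above, and `build_timeline` is a hypothetical helper, not part of IVRE.

```python
from datetime import datetime


def build_timeline(passive_records, scans):
    """Merge passive DNS sightings and active scans into one time-ordered list."""
    events = []
    for rec in passive_records:
        events.append((rec["firstseen"], "passive", f"{rec['value']} -> {rec['addr']}"))
    for scan in scans:
        open_ports = sorted(
            p["port"] for p in scan.get("ports", []) if p.get("state_state") == "open"
        )
        events.append((scan["starttime"], "active", f"{scan['addr']} open ports {open_ports}"))
    # Sort on the timestamp only, so events with equal times keep insertion order
    return sorted(events, key=lambda event: event[0])


# Hand-written sample data standing in for db.passive.get() / db.nmap.get() results
sample_passive = [
    {"value": "www.example.com", "addr": "192.0.2.15", "firstseen": datetime(2024, 1, 5)},
    {"value": "vpn.example.com", "addr": "192.0.2.15", "firstseen": datetime(2024, 3, 1)},
]
sample_scans = [
    {
        "addr": "192.0.2.15",
        "starttime": datetime(2024, 2, 1),
        "ports": [
            {"port": 443, "state_state": "open"},
            {"port": 22, "state_state": "closed"},
        ],
    },
]

timeline = build_timeline(sample_passive, sample_scans)
for when, kind, summary in timeline:
    print(f"{when:%Y-%m-%d} [{kind}] {summary}")
```

Read top to bottom, the output shows a hostname first seen before the scan and another appearing after it, which is the kind of ordering question that is awkward to answer when the two data sets live in separate tools.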

The recently added MCP (Model Context Protocol) server integration represents a forward-thinking architectural decision. IVRE now exposes reconnaissance data as tools that LLM agents can query, enabling natural language reconnaissance workflows. An AI agent can ask "What services are running on hosts in 10.0.0.0/8 with CVE-2023-12345?" and receive structured data without you writing custom API integrations. This bridges the gap between raw reconnaissance data and decision-making, though it requires careful prompt engineering to avoid hallucinated security findings.
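
To make "structured data" concrete, here is a rough sketch of the kind of function an agent-facing tool might wrap. Everything in it is hypothetical: `hosts_with_cve`, the record layout, and the sample data are invented for illustration and say nothing about how IVRE's MCP server is actually implemented. The point is only that the agent receives JSON derived from stored records rather than free text.

```python
import ipaddress
import json

# Hypothetical flattened records; a real deployment would read these from the database
RECORDS = [
    {"addr": "10.1.2.3", "services": ["ssh", "http"], "cves": ["CVE-2023-12345"]},
    {"addr": "10.9.9.9", "services": ["smtp"], "cves": []},
    {"addr": "192.0.2.15", "services": ["http"], "cves": ["CVE-2023-12345"]},
]


def hosts_with_cve(records, network, cve_id):
    """Return, as JSON, the hosts inside `network` whose records list `cve_id`."""
    net = ipaddress.ip_network(network)
    hosts = [
        {"addr": r["addr"], "services": r["services"]}
        for r in records
        if ipaddress.ip_address(r["addr"]) in net and cve_id in r["cves"]
    ]
    # Echo the query parameters back so the caller can check what was asked
    return json.dumps({"network": network, "cve": cve_id, "hosts": hosts}, sort_keys=True)


result = hosts_with_cve(RECORDS, "10.0.0.0/8", "CVE-2023-12345")
print(result)
```

Keeping the tool's answer limited to what the records contain, and echoing the parameters it was called with, gives a reviewer something to check an agent's summary against.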

IVRE's CLI tools (ivre scan2db, ivre db2view, ivre ipinfo) follow the Unix philosophy: each does one thing well and can be chained in scripts. The scan2db tool handles bulk imports of active scan output such as Nmap XML and Masscan JSON; Zeek data is ingested through separate commands (ivre zeek2db for flows, ivre passiverecon2db for passive records). The db2view command merges scan and passive records into the view that the web UI queries, so the interface reads one consolidated record per host instead of correlating sources on every request. For automation, ivre runscans drives scanning campaigns against target lists and writes results that scan2db can then import, which effectively turns IVRE into a self-service scanning platform.
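
Chaining these commands from a script might look like the sketch below. The flags (`-c` for categories, `-s` for source, `-r` for recursive import) and the `db2view nmap` argument are as recalled from IVRE's documentation and should be confirmed with `ivre scan2db --help` on your installation; the directory, category, and source names are placeholders. The script defaults to a dry run that only prints the commands.

```python
import shlex
import shutil
import subprocess


def import_commands(scan_dir, categories, source):
    """Build the argv lists for a scan import followed by a view refresh."""
    return [
        ["ivre", "scan2db", "-c", ",".join(categories), "-s", source, "-r", scan_dir],
        ["ivre", "db2view", "nmap"],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False and ivre is installed."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run and shutil.which(argv[0]):
            # check=True stops the pipeline at the first failing step
            subprocess.run(argv, check=True)


commands = import_commands("scans/", ["production", "external"], "weekly-nmap")
run_pipeline(commands)  # dry run: prints the commands without executing them
```

Passing argument lists rather than a shell string avoids quoting problems when category or source names contain spaces.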

One underappreciated feature is the tag and category system. You can annotate scan results with custom metadata—"production", "external", "aws-us-east-1"—then filter across those dimensions. This transforms IVRE from a passive data store into an active EASM (External Attack Surface Management) tool where you track your organization's exposure over time, not just individual scan snapshots.
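
Tracking exposure over time then comes down to comparing snapshots along those dimensions. As a minimal sketch, assuming host records that carry a `categories` list and per-port `state_state` values (the helper names and sample data here are hypothetical), a per-category diff of open ports can be computed like this:

```python
def exposure_by_category(snapshot):
    """Map each category to the set of (addr, port) pairs open in a snapshot."""
    exposure = {}
    for host in snapshot:
        open_ports = {
            (host["addr"], p["port"])
            for p in host.get("ports", [])
            if p.get("state_state") == "open"
        }
        for category in host.get("categories", []):
            exposure.setdefault(category, set()).update(open_ports)
    return exposure


def exposure_diff(before, after):
    """Report newly opened and newly closed (addr, port) pairs per category."""
    old, new = exposure_by_category(before), exposure_by_category(after)
    return {
        cat: {
            "opened": sorted(new.get(cat, set()) - old.get(cat, set())),
            "closed": sorted(old.get(cat, set()) - new.get(cat, set())),
        }
        for cat in sorted(set(old) | set(new))
    }


# Hand-written snapshots of the same host taken a month apart
january = [
    {"addr": "192.0.2.15", "categories": ["production", "external"],
     "ports": [{"port": 443, "state_state": "open"}]},
]
february = [
    {"addr": "192.0.2.15", "categories": ["production", "external"],
     "ports": [{"port": 443, "state_state": "open"},
               {"port": 8080, "state_state": "open"}]},
]

diff = exposure_diff(january, february)
print(diff["external"])
```

In a real deployment the two snapshots would come from category-filtered queries over different time windows rather than literals.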

Gotcha

IVRE's primary limitation is operational complexity. You're not installing a single binary—you're deploying a Python application, a database cluster, background workers for data ingestion, and optionally a web frontend. The documentation provides Docker Compose files, but at scale, you'll need to tune database indices, manage disk space growth (a comprehensive internet scan can generate terabytes), and handle parser failures when tools output malformed data. If your organization doesn't already run database infrastructure, the operational overhead may exceed the tool's value.

Database performance degrades non-linearly with dataset size. MongoDB handles millions of hosts reasonably well, but complex queries that join passive DNS with active scan results across time ranges can take minutes without careful index design. The PostgreSQL backend theoretically offers better query optimization, but the IVRE team's primary focus has been MongoDB, so edge cases exist. Elasticsearch improves search speed but adds another component to maintain. There's no autoscaling magic—you'll need database administration skills to keep things responsive.

The web interface, while functional, feels like a second-class citizen compared to the CLI. Advanced filtering requires dropping into the command line or writing Python scripts. If your security team expects Shodan's polish (autocomplete, instant visual feedback, export to every format), they'll be disappointed. IVRE is power-user software that assumes comfort with terminals and scripting.

Verdict

Use IVRE if you're building reconnaissance infrastructure for an organization that can't use commercial platforms due to privacy, compliance, or operational security requirements. It's ideal for red teams maintaining persistent intelligence on client networks, enterprises running continuous attack surface monitoring, or researchers correlating passive and active data at scale. You should already have database operations expertise and Python developers who can extend parsers for custom tools.

Skip it if you're an individual doing occasional scans, need immediate results without infrastructure investment, or lack the operational capacity to maintain databases and troubleshoot parsing pipelines. For ad-hoc reconnaissance, Shodan's API or standalone Nmap/Masscan will be faster. If you want centralized recon data but need enterprise support and managed services, commercial EASM platforms like Censys or Shodan Enterprise are more pragmatic despite their cost and privacy trade-offs.
