Building a Self-Hosted Attack Surface Scanner with Natlas: Architecture Deep-Dive
Hook
Natlas started calling itself an 'Attack Surface Management' platform in 2015, three years before Gartner coined the term as a market category. Sometimes the open-source community just builds what it needs without waiting for analysts to name it.
Context
Security teams traditionally ran nmap scans manually or with cron jobs, dumping XML output to files that nobody systematically reviewed. When you manage hundreds or thousands of IP addresses, this approach breaks down fast. You lose historical context—when did that port open? You can't search across scan results. Multiple team members duplicate work. And coordinating scans across different network segments becomes a nightmare of shell scripts and shared spreadsheets.
Natlas emerged to solve the orchestration problem: how do you continuously scan your entire attack surface, store results in a queryable format, and distribute the work across multiple scanning agents without manual coordination? The answer is a client-server architecture where agents poll for work, the server acts as a coordination layer, and Elasticsearch provides the search and aggregation engine. It's not trying to be a vulnerability scanner—tools like OpenVAS already do that. Instead, Natlas focuses on exposure monitoring: what services are running, what certificates are expiring, what configurations have changed.
Technical Insight
The architecture hinges on three components working in concert: the server (Flask application), agents (Python workers), and Elasticsearch (data layer). The server doesn't push work to agents—agents pull it. This design decision eliminates the need for the server to track agent availability or handle connection failures to remote workers.
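A minimal sketch of the pull model follows. It assumes the HTTP calls are wrapped in caller-supplied functions. `agent_loop`, `fetch_work`, `run_scan`, and `submit` are hypothetical names, not Natlas's agent API, and the wrappers are assumed to raise `ConnectionError` when the server is unreachable. The agent owns all retry logic. An unreachable server only costs the agent a sleep, and the server never learns whether an agent is alive.

```python
import time


def agent_loop(fetch_work, run_scan, submit, max_iterations=None,
               idle_seconds=5.0, sleep=time.sleep):
    """Hypothetical pull-based worker loop: ask for work, scan, submit, repeat.

    fetch_work() returns a target string or None when there is nothing to do;
    run_scan(target) returns a result; submit(result) sends it to the server.
    """
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        iterations += 1
        try:
            target = fetch_work()
        except ConnectionError:
            sleep(idle_seconds)  # server unreachable: back off, then poll again
            continue
        if target is None:
            sleep(idle_seconds)  # no work right now
            continue
        result = run_scan(target)
        try:
            submit(result)
        except ConnectionError:
            # Drop the result; per the design, the target comes up again in a later cycle
            sleep(idle_seconds)
```

Passing `sleep` in as a parameter keeps the loop easy to exercise without real waiting.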
The work distribution algorithm is particularly clever. Instead of maintaining a centralized queue of pending jobs, Natlas uses a cyclic pseudo-random number generator seeded with the current cycle ID. When an agent requests work, the server draws the next IP from the configured scan ranges using the PRNG. A cyclic PRNG walks a full permutation of the scope, so every address comes up once per cycle. All agents draw from the same server-side sequence, so together they cover the whole scope without coordinating with each other. If an agent crashes mid-scan, the lost target is simply scanned again when it comes up in a future cycle. Here's the core logic, simplified:
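# Simplified from natlas-server's work distribution
import random

def get_work_target(scope_ranges, cycle_id, completed_targets):
    # scope_ranges: list of ipaddress.ip_network objects
    # A private RNG seeded with the cycle ID replays the same candidate
    # sequence throughout the cycle without disturbing the global random state
    rng = random.Random(cycle_id)
    # Generate candidate targets (random sampling stands in for the cyclic PRNG)
    max_attempts = 100
    for _ in range(max_attempts):
        network = rng.choice(scope_ranges)
        target = str(network[rng.randrange(network.num_addresses)])
        # Check if already scanned this cycle
        if target not in completed_targets:
            return target
    # Fallback: no unscanned target found; treat the cycle as complete so the caller increments cycle_id
    return None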
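The simplified function above still needs a completed-set check. A true cyclic PRNG avoids that bookkeeping entirely. The sketch below uses one standard construction: walking the multiplicative group of integers modulo a prime. ZMap uses the same idea. This is an illustration of the technique under that assumption, not Natlas's source. `CyclicScope` is a hypothetical name, and the seed merely plays the role of the cycle ID.

```python
import ipaddress
import random


def _is_prime(n):
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def _prime_factors(n):
    factors, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors


class CyclicScope:
    """Hypothetical helper: yields every address in the scope exactly once per
    cycle, in a pseudo-random order, by walking the multiplicative group mod p."""

    def __init__(self, networks, seed=None):
        self.networks = [ipaddress.ip_network(n) for n in networks]
        self.size = sum(net.num_addresses for net in self.networks)
        # Smallest prime p > size: the group {1, ..., p-1} then covers indices 1..size
        p = self.size + 1
        while not _is_prime(p):
            p += 1
        self.p = p
        rng = random.Random(seed)  # the seed plays the role of the cycle ID
        # g generates the group iff g^((p-1)/q) != 1 mod p for every prime q dividing p-1
        factors = _prime_factors(p - 1)
        while True:
            g = rng.randrange(1, p)
            if all(pow(g, (p - 1) // q, p) != 1 for q in factors):
                break
        self.g = g
        self.start = rng.randrange(1, p)

    def _address(self, index):
        # Map a 0-based index across all networks to a concrete address
        for net in self.networks:
            if index < net.num_addresses:
                return net[index]
            index -= net.num_addresses
        raise IndexError(index)

    def cycle(self):
        # Repeated multiplication by a generator visits every element 1..p-1 exactly once
        x = self.start
        while True:
            if x <= self.size:  # values above size fall outside the scope: skip them
                yield str(self._address(x - 1))
            x = (x * self.g) % self.p
            if x == self.start:
                break
```

The generator needs only a few integers of state no matter how large the scope is. That is why this construction suits very large ranges. A new seed per cycle gives a different order each time while still guaranteeing full coverage.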
When an agent receives a target, it constructs an nmap command with server-defined NSE scripts and scan flags. This is where Natlas's flexibility shines—you're not limited to predefined checks. Want to run ssl-cert, http-title, and custom NSE scripts you wrote for your environment? Configure it server-side, and all agents inherit the scanning profile:
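# Agent executing scan with configurable nmap options
import shlex
import subprocess
import time
import xml.etree.ElementTree as ET

AGENT_ID = 'agent-01'  # placeholder; the real value comes from agent configuration

def parse_nmap_xml(xml_output):
    # Minimal stand-in parser: keep open ports and their detected service names
    ports = []
    for port in ET.fromstring(xml_output).iter('port'):
        state, service = port.find('state'), port.find('service')
        if state is not None and state.get('state') == 'open':
            ports.append({
                'port': int(port.get('portid')),
                'protocol': port.get('protocol'),
                'service': service.get('name') if service is not None else None,
            })
    return {'ports': ports}

def execute_scan(target, nmap_flags, nse_scripts):
    cmd = [
        'nmap',
        '-oX', '-',  # XML output to stdout
        '-sV',  # Service version detection
        '--script', ','.join(nse_scripts),
        *shlex.split(nmap_flags),  # e.g. "-T4 -Pn" must become separate arguments
        target,
    ]
    # check=True raises if nmap exits non-zero instead of parsing empty output
    result = subprocess.run(cmd, capture_output=True, timeout=600, check=True)
    # Parse XML, extract structured data
    parsed = parse_nmap_xml(result.stdout)
    # Add metadata before submitting to server
    parsed['agent_id'] = AGENT_ID
    parsed['scan_time'] = time.time()
    return parsed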
The agent submits results back to the server as JSON, which then indexes them in Elasticsearch. The Elasticsearch schema is designed for time-series queries—each scan result is a document with a timestamp, enabling queries like "show me all hosts that exposed port 22 in the last week but don't expose it now" or "which certificates expire in the next 30 days."
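The first of those queries can be made concrete with the same logic over plain Python dicts rather than Elasticsearch. The document shape (`ip`, `scan_time`, `ports`) is an assumption for illustration, not the real Natlas index schema. In Elasticsearch you would express this with queries and aggregations, but the reasoning is identical: compare what a host exposed inside the time window with what its latest scan shows.

```python
import ipaddress
from datetime import datetime, timedelta, timezone


def hosts_that_closed_port(docs, port, since):
    """Return IPs that exposed `port` in some scan at or after `since`
    but do not expose it in their most recent scan."""
    latest = {}
    seen_in_window = set()
    for doc in docs:
        ip = doc["ip"]
        # Keep only the newest scan document per host
        if ip not in latest or doc["scan_time"] > latest[ip]["scan_time"]:
            latest[ip] = doc
        if doc["scan_time"] >= since and port in doc["ports"]:
            seen_in_window.add(ip)
    closed = [ip for ip in seen_in_window if port not in latest[ip]["ports"]]
    return sorted(closed, key=ipaddress.ip_address)


now = datetime(2024, 1, 8, tzinfo=timezone.utc)
docs = [
    {"ip": "10.0.0.1", "scan_time": now - timedelta(days=3), "ports": [22, 80]},
    {"ip": "10.0.0.1", "scan_time": now, "ports": [80]},
    {"ip": "10.0.0.2", "scan_time": now - timedelta(days=3), "ports": [22]},
    {"ip": "10.0.0.2", "scan_time": now, "ports": [22]},
    {"ip": "10.0.0.3", "scan_time": now - timedelta(days=30), "ports": [22]},
    {"ip": "10.0.0.3", "scan_time": now, "ports": []},
]
print(hosts_that_closed_port(docs, 22, since=now - timedelta(days=7)))  # → ['10.0.0.1']
```

Host 10.0.0.3 is excluded because it last exposed port 22 outside the one-week window.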
The web interface provides a search DSL that maps to Elasticsearch queries. You can search by port, service, SSL certificate subject, HTTP headers, or any field nmap returns. The power comes from aggregating data over time. Instead of a single point-in-time scan, you see trends: this host started running Redis six days ago, that host's SSH version changed yesterday.
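To show what "maps to Elasticsearch queries" can mean, here is a sketch that turns a simple `key:value` search string into an Elasticsearch `bool` filter query, plus a date-math `range` query for certificate expiry. The search keywords and index field names (`ports.port`, `ports.ssl.not_after`, and so on) are hypothetical, not Natlas's actual syntax or mapping. The sketch assumes keyword-typed fields so that `term` queries match exact values.

```python
# Hypothetical search keywords mapped to hypothetical index field names
FIELD_MAP = {
    "port": "ports.port",
    "service": "ports.service",
    "cert_subject": "ports.ssl.subject",
}


def build_es_query(search):
    """Translate a 'key:value key:value' search string into an
    Elasticsearch bool query whose clauses must all match."""
    filters = []
    for token in search.split():
        key, sep, value = token.partition(":")
        if not sep or key not in FIELD_MAP:
            raise ValueError(f"unsupported search term: {token!r}")
        # Digits become integers so the term query matches numeric fields
        filters.append({"term": {FIELD_MAP[key]: int(value) if value.isdigit() else value}})
    return {"query": {"bool": {"filter": filters}}}


def expiring_certs_query(days, field="ports.ssl.not_after"):
    """Range query for certificates expiring within `days` days,
    using Elasticsearch date math relative to the current time."""
    return {"query": {"range": {field: {"gte": "now", "lte": f"now+{days}d/d"}}}}
```

With a plain object mapping, `port:443 service:ssh` could match a host where the two values come from different port entries. Mapping `ports` as `nested` and wrapping the clauses in a `nested` query avoids that.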
Docker deployment wraps these components into orchestrated containers. The docker-compose setup includes the Flask server, Elasticsearch, and optionally agent containers. For production, you'd run agents on separate hosts or in different network zones:
# Simplified docker-compose structure
services:
  server:
    build: ./natlas-server
    environment:
      - ELASTICSEARCH_URL=http://elasticsearch:9200
    ports:
      - "5000:5000"
    depends_on:
      - elasticsearch
  elasticsearch:
    # "7.x" is a placeholder: pin an exact 7.x release tag, since the registry has no floating "7.x" tag
    image: docker.elastic.co/elasticsearch/elasticsearch:7.x
    environment:
      - discovery.type=single-node
    volumes:
      - esdata:/usr/share/elasticsearch/data
  agent:
    build: ./natlas-agent
    environment:
      - NATLAS_SERVER_URL=http://server:5000
    depends_on:
      - server
    # Agents can be scaled: docker-compose up --scale agent=5
volumes:
  esdata:
One architectural choice that stands out: the server is stateless except for Elasticsearch. If the server crashes and restarts, agents carry on once it is reachable again, because all they need is HTTP connectivity to fetch work and submit results. There's no session state to lose. This design makes horizontal scaling straightforward: run multiple server instances behind a load balancer, which spreads agent requests across them. The one requirement is that every instance works from the same scan cycle, so they don't hand out overlapping targets.
Gotcha
The Docker-only deployment since 2020 is both a strength and a limitation. It simplifies setup dramatically—no more Python dependency hell or Elasticsearch configuration mistakes. But if your organization has policies against containers, or you need to run agents on legacy systems that can't run Docker, you're stuck. The pre-2020 codebase supported native Python installation, but maintaining that alongside Docker proved too much overhead for the maintainers.
Elasticsearch resource requirements can surprise you. A small deployment scanning 1,000 hosts might seem manageable, but Elasticsearch wants 2-4GB RAM minimum, and that's before you account for index growth over time. If you're scanning daily and retaining history, storage grows linearly. There's no built-in data retention policy—you have to configure Elasticsearch index lifecycle management yourself or risk filling your disks. For teams expecting a lightweight tool, the infrastructure overhead is real.
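Two things help here: a back-of-envelope size estimate and a retention job. The ILM delete phase removes whole indices, so it gives rolling retention only when scan data is split into time-based indices through rollover. An alternative is Elasticsearch's `_delete_by_query` API, which trims documents by timestamp within an index. The sketch below assumes one document per host per scan, a hypothetical `scan_time` field, and a hypothetical index name. The estimate ignores replicas and index overhead, and `avg_doc_bytes` is whatever you measure on your own index.

```python
def estimated_storage_bytes(hosts, scans_per_day, retention_days, avg_doc_bytes):
    """Back-of-envelope primary index size: one document per host per scan,
    kept for retention_days. Growth is linear in every factor."""
    return hosts * scans_per_day * retention_days * avg_doc_bytes


def retention_request(index, days, time_field="scan_time"):
    """Build (method, path, body) for an Elasticsearch _delete_by_query call
    that removes scan documents older than `days` days."""
    if days <= 0:
        raise ValueError("days must be positive")
    body = {"query": {"range": {time_field: {"lt": f"now-{days}d/d"}}}}
    return "POST", f"/{index}/_delete_by_query", body


# 20_000 bytes per document is only an example input, not a measured Natlas figure
print(estimated_storage_bytes(1_000, 1, 90, 20_000))  # → 1800000000
print(retention_request("natlas-scans", 90))
```

Without a retention job, `retention_days` is effectively unbounded and the estimate never stops growing. Capping it turns linear growth into a fixed ceiling.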
Natlas also generates aggressive network traffic. Running default nmap scripts against production services can cause application slowdowns or trigger rate limiting. Some NSE scripts are intrusive—they send malformed packets or enumerate users. You need to carefully curate which scripts run in your scanning profile, and even then, expect IDS alerts. One team reported their SOC flagged Natlas agents as potential attackers until they whitelisted the scanning infrastructure. Continuous scanning means continuous noise.
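One way to curate a scanning profile is to filter NSE scripts by their Nmap categories, keeping `safe` scripts and rejecting anything tagged `intrusive`, `dos`, `exploit`, or `brute`. The sketch assumes the one-line `Entry { filename = ..., categories = { ... } }` format of Nmap's `scripts/script.db` index. `curated_scripts` is a hypothetical helper, and the category policy is an example, not a recommendation from the Natlas project. Nmap's `--script` option also accepts category expressions such as `"default and safe"` directly.

```python
import re

# One script.db entry looks like:
# Entry { filename = "http-title.nse", categories = { "default", "discovery", "safe", } }
ENTRY_RE = re.compile(r'filename = "([^"]+)", categories = \{([^}]*)\}')


def curated_scripts(script_db_text,
                    allowed=frozenset({"safe"}),
                    blocked=frozenset({"intrusive", "dos", "exploit", "brute"})):
    """Return script names (without .nse) that carry at least one allowed
    category and none of the blocked ones."""
    selected = []
    for filename, cats in ENTRY_RE.findall(script_db_text):
        categories = set(re.findall(r'"([^"]+)"', cats))
        if categories & allowed and not categories & blocked:
            selected.append(filename.removesuffix(".nse"))
    return sorted(selected)
```

The resulting list can feed the `nse_scripts` argument of the agent snippet earlier, so every agent inherits the same vetted profile.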
Verdict
Use if: You manage a large IP space (500+ hosts) and need continuous visibility into what's exposed, you have Elasticsearch expertise in-house or are willing to learn it, and you want the flexibility to customize scans with arbitrary nmap NSE scripts. It's particularly valuable if you need historical tracking: knowing when services appeared or disappeared is often more important than knowing the current state. The distributed agent model also shines when you need to scan across multiple network segments or cloud regions.
Skip if: You're looking for one-off scans (just use nmap directly), you need a turnkey vulnerability scanner with CVE detection (look at OpenVAS), you can't run Docker in your environment, or you lack the resources to maintain an Elasticsearch cluster. Also skip if you need cloud-hosted solutions with vendor support: Natlas is self-hosted open source, which means you're on your own for troubleshooting and scaling.