Distributing Masscan Across Cloud Infrastructure: Inside pscan's Parallel Scanning Architecture

Hook

A single masscan instance can theoretically scan the entire IPv4 Internet for one port in under 6 minutes at 10 million packets per second. But what if you need to go faster, or your network can't sustain that rate? The answer is horizontal distribution.

Context

Port scanning large IP ranges has always been a bottleneck in security reconnaissance. masscan revolutionized speed with asynchronous transmission and its own user-space TCP/IP stack, but even it runs into limits: egress throttling by cloud providers, bandwidth caps on a single host, and rate limiting or defensive responses from firewalls along the path. When bug bounty hunters or red teams need to enumerate massive IP spaces across multiple autonomous systems, a single scanning host becomes the constraint, no matter how optimized the scanner.

This is where pscan enters the picture. Rather than improving the scanner itself, it solves the distribution problem: how do you coordinate multiple masscan instances across different cloud servers, split the workload intelligently, aggregate results, and tear down infrastructure when complete? By leveraging Axiom, a framework for dynamically provisioning fleets of cloud servers, pscan turns masscan from a single-host tool into a distributed scanning platform. The result is roughly linear scaling: need to scan faster? Spin up more instances.

Technical Insight

At its core, pscan is a shell-script orchestration layer that manages three distinct phases: infrastructure provisioning, workload distribution, and result aggregation. The tool doesn't reimplement scanning logic; instead, it wraps masscan and delegates infrastructure concerns to Axiom, which handles the actual cloud provider API interactions for spinning up servers across AWS, DigitalOcean, GCP, or other providers.

The workload distribution mechanism is elegant in its simplicity. When you invoke pscan with a target IP range and specify the number of instances, the script divides the IP space into equal chunks. For example, 10.0.0.0/8 contains 16,777,216 addresses, so scanning it across 10 instances means each server receives approximately 1.68 million IPs. The split has to guarantee no overlap and complete coverage; the simplified example here does that by expanding the range into individual addresses and dividing the list by line:

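# Simplified example of how pscan distributes work.
# Axiom command syntax varies between versions; treat the axiom-* lines as illustrative.
TARGET="10.0.0.0/8"
INSTANCES=10

# Provision the Axiom fleet (instances are assumed to be named myfleet01, myfleet02, ...)
axiom-fleet myfleet -i "$INSTANCES"

# Expand the range, then split it into line-balanced chunks: chunks_01 ... chunks_10
# (split names its output chunks_aa, chunks_ab, ... unless numeric suffixes are requested)
prips "$TARGET" > targets.txt
split -n "l/$INSTANCES" --numeric-suffixes=1 -a 2 targets.txt chunks_

# Upload each chunk to its instance and start the scan in the background
for i in $(seq 1 "$INSTANCES"); do
  id=$(printf '%02d' "$i")
  (
    axiom-scp "chunks_$id" "myfleet$id:~/chunks_$id" &&
    axiom-exec "masscan -iL chunks_$id -p 80,443,8080 --rate 50000 -oL scan_$id.txt" \
      "myfleet$id"
  ) &
done

# Block until every background job has finished
wait

# Aggregate results (the glob is quoted so the local shell does not expand it)
axiom-exec "cat scan_*.txt" 'myfleet*' > final_results.txt

# Clean up infrastructure
axiom-fleet myfleet -d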

This approach rests on two key architectural decisions. First, pscan uses Axiom's axiom-exec for remote command execution rather than implementing SSH multiplexing itself, which means it inherits Axiom's built-in parallelism and connection pooling. Second, by splitting IP ranges at the shell level using utilities like prips or mapcidr, pscan avoids language-specific dependencies: any system with those command-line tools installed can generate the splits.
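The splitting step can be tried locally without any cloud infrastructure. The sketch below is a minimal illustration, not pscan's actual code: it assumes GNU split, uses seq and sed as a stand-in for prips, and all file names are made up for the example. It splits a small target list into line-balanced chunks and then checks the two properties that matter, namely that no address is lost and none is duplicated.

```shell
#!/usr/bin/env bash
# Sketch: split a target list into N line-balanced chunks and verify the split.
# File names and the 10.0.0.x sample range are illustrative assumptions.
set -euo pipefail

workdir=$(mktemp -d)
instances=4

# Stand-in for `prips 10.0.0.0/24`: generate 256 addresses with seq and sed.
seq 0 255 | sed 's/^/10.0.0./' > "$workdir/targets.txt"

# GNU split: l/N never breaks a line; suffixes start at 01 (chunks_01, chunks_02, ...).
split -n "l/$instances" --numeric-suffixes=1 -a 2 \
  "$workdir/targets.txt" "$workdir/chunks_"

# Coverage check: same number of lines overall, and no line appears twice.
chunk_count=$(find "$workdir" -name 'chunks_*' | wc -l)
total_lines=$(cat "$workdir"/chunks_* | wc -l)
unique_lines=$(cat "$workdir"/chunks_* | sort -u | wc -l)

echo "chunks=$chunk_count total=$total_lines unique=$unique_lines"
```

Because l/N balances chunks by byte count rather than by line count, the chunks are close to equal in size but not necessarily identical.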

The rate limiting strategy is particularly noteworthy. Rather than having each instance scan at maximum speed (which could trigger upstream defensive responses), pscan allows per-instance rate configuration. This distributes not just the workload but also the packet rate: traffic arrives from several source IPs at a moderate rate each, rather than as a concentrated flood from a single address. It also keeps each instance further from provider abuse thresholds: a single instance pushing 10M packets/second is likely to trigger abuse alerts, whereas ten instances each sending 1M packets/second often fly under the radar.
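The arithmetic behind a per-instance rate is simple enough to sketch in shell. The example below is an illustration under assumed numbers (a /8 target, three ports, ten instances, an aggregate budget of 500,000 packets per second); none of the variable names come from pscan. It treats one probe per address and port as the whole workload and ignores retransmissions.

```shell
#!/usr/bin/env bash
# Sketch: derive a per-instance --rate from an aggregate packet budget and
# estimate scan duration. All numbers are illustrative assumptions.

prefix=8            # scanning a /8
ports=3             # e.g. 80, 443, 8080
instances=10
total_rate=500000   # aggregate packets per second across the whole fleet

addresses=$(( 1 << (32 - prefix) ))
probes=$(( addresses * ports ))
per_instance_rate=$(( total_rate / instances ))

# Ceiling division so that rounding never under-counts work or time.
ips_per_instance=$(( (addresses + instances - 1) / instances ))
est_seconds=$(( (probes + total_rate - 1) / total_rate ))

echo "addresses=$addresses ips_per_instance=$ips_per_instance"
echo "per_instance_rate=$per_instance_rate est_seconds=$est_seconds"
```

With these inputs each instance gets a rate of 50,000 packets per second, and the estimate makes it easy to see how adding instances trades cost against completion time.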

Result aggregation happens in two stages. Each Axiom instance writes masscan output to a local file in the standard format (typically list format with -oL for easier parsing). Once all scanning jobs complete, pscan uses axiom-exec to concatenate remote files and transfer them back to the local orchestrator. For large result sets, this can be optimized with compression:

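# Compressed result retrieval (globs are quoted so they expand on the remote side)
axiom-exec "gzip scan_*.txt" 'myfleet*'
mkdir -p ./results
axiom-scp 'myfleet*:~/scan_*.txt.gz' ./results/
gunzip ./results/*.gz
# Drop masscan's '#' header and footer lines, then de-duplicate
grep -hv '^#' ./results/*.txt | sort -u > deduplicated_results.txt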
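One detail of whole-line de-duplication is worth noting: each masscan list-format record ends with a timestamp, so the same open port reported at two different times counts as two distinct lines. The sketch below shows one way to merge on protocol, port, and IP instead. The sample files are hand-written for the example, and merge_masscan_lists is a hypothetical helper name, not part of pscan or masscan.

```shell
#!/usr/bin/env bash
# Sketch: merge masscan list-format (-oL) files, skipping comment lines and
# de-duplicating on protocol/port/IP rather than on the whole line.
set -euo pipefail

workdir=$(mktemp -d)

# Two hand-written sample files in -oL layout: status proto port ip timestamp
cat > "$workdir/scan_01.txt" <<'EOF'
#masscan
open tcp 80 10.0.0.5 1700000000
open tcp 443 10.0.0.5 1700000001
# end
EOF
cat > "$workdir/scan_02.txt" <<'EOF'
#masscan
open tcp 80 10.0.0.5 1700000042
open tcp 8080 10.0.1.9 1700000043
# end
EOF

merge_masscan_lists() {
  # Prints "ip port proto" once per unique open port across all input files.
  awk '!/^#/ && $1 == "open" && !seen[$2, $3, $4]++ { print $4, $3, $2 }' "$@" |
    LC_ALL=C sort
}

merge_masscan_lists "$workdir"/scan_*.txt > "$workdir/merged.txt"
cat "$workdir/merged.txt"
```

Here port 80 on 10.0.0.5 appears in both files with different timestamps but is emitted only once.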

The shell-based architecture makes pscan highly transparent: you can read the entire execution flow in a few hundred lines of bash. That transparency comes with trade-offs in error handling. If one Axiom instance fails mid-scan (network partition, provider issues, quota exceeded), recovery depends on bash's error propagation, which is notoriously fragile. A bare wait returns success no matter how the background jobs exited, so unless the script checks each job's exit status, a failed scan can pass unnoticed and leave gaps in coverage that only become apparent when analyzing results.
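A common way to close this gap in bash is to record each background job's PID and wait on them one at a time, since wait with a PID argument returns that job's exit status. The sketch below illustrates the pattern; run_chunk is a hypothetical stand-in for the remote scan command, and the failure on chunk 03 is simulated.

```shell
#!/usr/bin/env bash
# Sketch: launch one background job per chunk, then wait on each PID
# individually so a failed job is reported instead of silently ignored.
# run_chunk is a hypothetical stand-in for the remote masscan invocation.

run_chunk() {
  local id=$1
  # Simulate a failure on chunk 03; a real job would call axiom-exec here.
  [ "$id" != "03" ]
}

ids=(01 02 03 04)
pids=()
failed=()

# Start every job in the background and remember its PID.
for id in "${ids[@]}"; do
  run_chunk "$id" &
  pids+=("$!")
done

# `wait PID` returns that job's exit status, unlike a bare `wait`.
for idx in "${!ids[@]}"; do
  if ! wait "${pids[$idx]}"; then
    failed+=("${ids[$idx]}")
  fi
done

if [ "${#failed[@]}" -gt 0 ]; then
  echo "chunks needing a rescan: ${failed[*]}"
else
  echo "all chunks completed"
fi
```

The list of failed chunk IDs gives the orchestrator something concrete to retry, rather than discovering the hole later in the results.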

Gotcha

The most significant limitation is the hard dependency on Axiom and its configuration complexity. Before pscan can run a single scan, you need Axiom properly configured with cloud provider credentials, SSH keys, instance images (Packer builds), and network settings. This isn't a five-minute setup: it requires understanding cloud IAM roles, API quotas, and network topology. If your Axiom configuration has issues (wrong region, insufficient permissions, exceeded quotas), pscan inherits those failures with minimal diagnostic feedback.
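A small preflight check can at least turn the simplest class of these failures into a readable message. The sketch below only verifies that the expected commands are on PATH; it does not validate credentials, quotas, or images. The tool list and the missing_tools helper are assumptions made for this example, not something pscan ships.

```shell
#!/usr/bin/env bash
# Sketch: fail fast with a readable message when a required tool is missing.
# The tool list is an assumption; adjust it to what your setup actually calls.

missing_tools() {
  # Prints each command name from the arguments that is not found on PATH.
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

required=(axiom-fleet axiom-exec axiom-scp prips)
missing=$(missing_tools "${required[@]}")

if [ -n "$missing" ]; then
  echo "preflight failed, missing: $(echo "$missing" | tr '\n' ' ')" >&2
else
  echo "preflight ok"
fi
```

Running a check like this before provisioning means a missing dependency is caught before any billable instance exists.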

Cost management is another landmine. Ten c5.xlarge AWS instances look cheap at roughly $0.17 per instance per hour (on-demand Linux pricing in us-east-1 at the time of writing; check current rates), which puts a four-hour scan at about $7 in compute. The real risk is what happens when something goes wrong: if your cleanup step fails (server teardown doesn't execute), those instances keep running and accumulating charges, and ten forgotten instances come to roughly $40 a day. Unlike managed services with built-in cost controls, pscan gives you the raw power of cloud infrastructure with all the financial responsibility. There's no automatic circuit breaker if your scan runs longer than expected or if you accidentally provision 50 instances instead of 5. The shell script won't stop you from expensive mistakes; it assumes you know what you're doing.
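Since nothing in the tool enforces a limit, a guard has to live in a wrapper script. The sketch below shows one possible shape for it: estimate the cost up front and refuse to proceed when either the instance count or the estimate exceeds a cap. The function name check_budget, the hourly price, and both caps are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Sketch: estimate fleet cost and refuse to launch above a budget or size cap.
# The hourly price, caps, and function name are illustrative assumptions.

check_budget() {
  local instances=$1 hours=$2 hourly_usd=$3 max_instances=$4 max_usd=$5
  if [ "$instances" -gt "$max_instances" ]; then
    echo "refusing: $instances instances exceeds cap of $max_instances" >&2
    return 1
  fi
  # awk handles the floating-point arithmetic that bash lacks.
  awk -v n="$instances" -v h="$hours" -v p="$hourly_usd" -v m="$max_usd" 'BEGIN {
    cost = n * h * p
    printf "estimated cost: $%.2f\n", cost
    exit (cost > m) ? 1 : 0
  }'
}

# Ten instances for four hours passes; fifty instances trips the size cap.
check_budget 10 4 0.17 20 25 && echo "within budget"
check_budget 50 4 0.17 20 25 || echo "launch blocked"
```

A guard like this covers mistakes at launch time only; runaway instances after a failed teardown still need a provider-side billing alert or a separate cleanup job.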

Verdict

Use pscan if you're conducting large-scale reconnaissance across extensive IP ranges (/8 or /16 networks, or multiple organizations), already have Axiom infrastructure configured and tested, understand cloud cost implications and can justify the expense, and need scan completion speed that justifies distributed architecture overhead. It's purpose-built for bug bounty hunters and red teams who regularly scan massive attack surfaces where time-to-discovery matters more than cost efficiency.

Skip pscan if you're scanning smaller networks (a single /24 or smaller), haven't invested time in Axiom setup and cloud provider configuration, lack budget for spinning up multiple scanning instances, need mature tooling with extensive error handling and recovery mechanisms, or want a fire-and-forget solution with minimal operational complexity. For most penetration testing scenarios, a properly configured single masscan instance provides sufficient speed without the coordination overhead. The complexity tax of distributed scanning only pays dividends at scale.
