CeleryStalk: Building a Distributed Security Scanner on Top of Python's Task Queue

Hook

Most security scanners treat your network like a checklist. CeleryStalk treats it like a distributed systems problem—and that changes everything about how enumeration scales.

Context

Penetration testing workflows are embarrassingly parallel problems disguised as sequential tasks. You discover a network with 200 hosts, identify 1,500 services across them, and need to run nikto on every web service, gobuster on every HTTP endpoint, enum4linux on every SMB share, and screenshot everything. The naive approach—bash scripts with sequential execution—means your eight-core machine sits mostly idle while nmap crawls through hosts one by one.

The typical solution is GNU Parallel or xargs -P, but these only run processes side by side and lack orchestration intelligence. They don't understand service context (running sqlmap against an SSH port wastes time), can't pause and resume individual tasks mid-scan, and offer no visibility into which specific tool-host combinations succeeded or failed. CeleryStalk emerged from this gap: what if we treated security enumeration as a distributed task queue problem, where Celery workers pull jobs from a queue and execute tools based on the services actually detected?

Technical Insight


CeleryStalk's architecture is essentially a domain-specific orchestration layer built atop Celery's task queue system. At initialization, it ingests Nmap XML or Nessus files, parses service detection results, and populates a SQLite database with a normalized schema of hosts, ports, and services. The clever part is the service mapping layer: a configuration file that defines which tools should run against which service signatures.
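To make the ingestion step concrete, here is a minimal sketch of reading Nmap XML and storing open services in SQLite using only the standard library. The table layout, the `load_services` helper, and the inline sample scan are assumptions made for illustration; they are not CeleryStalk's actual schema or code.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Tiny inline sample in Nmap's XML output format, used only for illustration.
SAMPLE_NMAP_XML = """<nmaprun>
  <host>
    <address addr="10.0.0.5" addrtype="ipv4"/>
    <ports>
      <port protocol="tcp" portid="80">
        <state state="open"/>
        <service name="http"/>
      </port>
      <port protocol="tcp" portid="445">
        <state state="open"/>
        <service name="microsoft-ds"/>
      </port>
      <port protocol="tcp" portid="22">
        <state state="closed"/>
        <service name="ssh"/>
      </port>
    </ports>
  </host>
</nmaprun>"""


def load_services(xml_text, conn):
    """Insert every open port from an Nmap XML document into a services table."""
    # Hypothetical single-table schema; a real tool would likely normalize further.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS services "
        "(ip TEXT, port INTEGER, protocol TEXT, service TEXT)"
    )
    root = ET.fromstring(xml_text)
    for host in root.iter("host"):
        address = host.find("address")  # first address element of the host
        if address is None:
            continue
        ip = address.get("addr")
        for port in host.iter("port"):
            state = port.find("state")
            if state is None or state.get("state") != "open":
                continue  # skip closed and filtered ports
            service = port.find("service")
            name = service.get("name") if service is not None else "unknown"
            conn.execute(
                "INSERT INTO services VALUES (?, ?, ?, ?)",
                (ip, int(port.get("portid")), port.get("protocol"), name),
            )
    conn.commit()


conn = sqlite3.connect(":memory:")
load_services(SAMPLE_NMAP_XML, conn)
rows = conn.execute("SELECT ip, port, service FROM services ORDER BY port").fetchall()
print(rows)
```

Only the two open ports end up in the table; the closed SSH port is filtered out before any tool could be scheduled against it.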

Here's a simplified example of how the tool mapping works:

# From config file - maps service patterns to tool commands
[
  {
    "service": "http",
    "tool": "nikto",
    "command": "nikto -h {target} -p {port} -o {output}"
  },
  {
    "service": "http",
    "tool": "gobuster",
    "command": "gobuster dir -u {scheme}://{target}:{port} -w /usr/share/wordlists/dirb/common.txt"
  },
  {
    "service": "smb",
    "tool": "enum4linux",
    "command": "enum4linux -a {target}"
  }
]

When a service is identified, CeleryStalk queries this mapping, performs variable substitution, and dispatches each matching command as a Celery task. The beauty is that Celery handles all the distributed execution complexity—task routing, retry logic, result storage, and worker health monitoring—while CeleryStalk focuses on the security-specific orchestration.
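The lookup-and-substitute step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not CeleryStalk's implementation: `TOOL_MAPPING` mirrors the simplified mapping above, while `build_commands`, the output path layout, and the default output directory are hypothetical.

```python
import shlex

# In-memory copy of part of the simplified mapping shown above.
TOOL_MAPPING = [
    {"service": "http", "tool": "nikto",
     "command": "nikto -h {target} -p {port} -o {output}"},
    {"service": "smb", "tool": "enum4linux",
     "command": "enum4linux -a {target}"},
]


def build_commands(service, target, port, output_dir="/tmp/out"):
    """Return (tool, argv) pairs for every mapping entry that matches the service."""
    jobs = []
    for entry in TOOL_MAPPING:
        if entry["service"] != service:
            continue
        # Hypothetical naming scheme for per-tool output files.
        output = f"{output_dir}/{target}_{port}_{entry['tool']}.txt"
        # str.format ignores placeholders a template does not use.
        command = entry["command"].format(target=target, port=port, output=output)
        jobs.append((entry["tool"], shlex.split(command)))
    return jobs


jobs = build_commands("http", "10.0.0.5", 80)
for tool, argv in jobs:
    print(tool, argv)
```

In a Celery deployment, each resulting argument list would presumably be handed to a task (for example through the task's `delay()` method) instead of being printed, so that whichever worker is free picks it up.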

The dual-mode scoping model demonstrates thoughtful design for different engagement types. In VAPT (Vulnerability Assessment and Penetration Testing) mode, scope is IP-centric: you define target ranges explicitly, and anything outside those ranges is automatically filtered. In Bug Bounty mode, scope is domain-centric and expansive: subdomain enumeration tools like Sublist3r discover new targets, and those discoveries are automatically considered in-scope if they match the root domain. This mirrors real-world engagement constraints—corporate pentests have strict network boundaries, while bug bounties reward creative scope expansion within domain boundaries.
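The two scoping rules reduce to two different membership tests. The sketch below shows one plausible way to express them; the function names and the sample ranges and domains are hypothetical and not taken from CeleryStalk.

```python
import ipaddress


def in_scope_vapt(ip, allowed_ranges):
    """IP-centric check: the address must fall inside an explicitly listed range."""
    address = ipaddress.ip_address(ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in allowed_ranges)


def in_scope_bug_bounty(hostname, root_domains):
    """Domain-centric check: the host is a root domain or one of its subdomains."""
    host = hostname.lower().rstrip(".")
    # Require a leading dot so "notexample.com" does not match "example.com".
    return any(host == root or host.endswith("." + root) for root in root_domains)


print(in_scope_vapt("10.0.0.5", ["10.0.0.0/24"]))
print(in_scope_vapt("192.168.1.1", ["10.0.0.0/24"]))
print(in_scope_bug_bounty("api.example.com", ["example.com"]))
print(in_scope_bug_bounty("notexample.com", ["example.com"]))
```

The suffix check with a leading dot matters in bug bounty mode: a naive `endswith("example.com")` would pull lookalike domains into scope.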

The workspace concept provides scan reproducibility. Each workspace maintains its own SQLite database, configuration state, and output directory. You can pause all tasks in a workspace (via Celery's revoke mechanism), restart the worker days later, and resume exactly where you left off:

# Submit targets and generate tasks
celerystalk scan -f nmap_results.xml -w customer_engagement

# Cancel a specific task by ID
celerystalk cancel -w customer_engagement -t 42

# Query status across all tasks
celerystalk query -w customer_engagement

# Resume after configuration changes
celerystalk rescan -w customer_engagement

Under the hood, task state transitions (Pending → Running, then either Completed or Failed) are tracked in the database, and the HTML report generator queries this state to build a unified view across all tool outputs. The integration with Aquatone for screenshot correlation is particularly useful: seeing visual evidence alongside nikto findings helps prioritize which HTTP services merit deeper investigation.
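Database-backed state tracking of this kind can be sketched as a small state machine over a SQLite table. The table layout, the status names, and the helper functions below are assumptions for illustration; the real tool's schema and states may differ.

```python
import sqlite3

# Assumed lifecycle: a task ends as either COMPLETED or FAILED, never both.
ALLOWED_TRANSITIONS = {
    "PENDING": {"RUNNING"},
    "RUNNING": {"COMPLETED", "FAILED"},
    "COMPLETED": set(),
    "FAILED": set(),
}


def open_db():
    """Create an in-memory database with a hypothetical tasks table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tasks (id INTEGER PRIMARY KEY, command TEXT, status TEXT)"
    )
    return conn


def create_task(conn, command):
    """Record a new task in the PENDING state and return its id."""
    cursor = conn.execute(
        "INSERT INTO tasks (command, status) VALUES (?, 'PENDING')", (command,)
    )
    conn.commit()
    return cursor.lastrowid


def get_status(conn, task_id):
    row = conn.execute("SELECT status FROM tasks WHERE id = ?", (task_id,)).fetchone()
    if row is None:
        raise KeyError(f"no task with id {task_id}")
    return row[0]


def transition(conn, task_id, new_status):
    """Move a task to new_status, rejecting transitions the lifecycle forbids."""
    current = get_status(conn, task_id)
    if new_status not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current} -> {new_status}")
    conn.execute("UPDATE tasks SET status = ? WHERE id = ?", (new_status, task_id))
    conn.commit()


conn = open_db()
task_id = create_task(conn, "nikto -h 10.0.0.5 -p 80")
transition(conn, task_id, "RUNNING")
transition(conn, task_id, "COMPLETED")
print(get_status(conn, task_id))
```

A report generator can then build its view with a plain `SELECT` over the same table, grouping by status or by host.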

The tool also demonstrates proper separation between coordination logic and execution. CeleryStalk never implements scanning functionality itself; it shells out to existing tools. This Unix philosophy approach means you get battle-tested tools (nmap, gobuster, nikto) with their full feature sets, while CeleryStalk provides the orchestration glue. Configuration files let you swap tool versions or add custom tools without touching Python code:

[gobuster]
command = gobuster dir -u {scheme}://{target}:{port}/ -w {wordlist} -o {output} -t 20
service = http
path = /usr/bin/gobuster
required = True

This configurability means teams can adapt CeleryStalk to their specific tool chains—swap feroxbuster for gobuster, add custom scripts, or adjust threading parameters—all through configuration rather than code modification.
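Reading a section like the gobuster entry above takes only the standard library's `configparser`. The sketch below assumes the section text shown earlier; the `tools_for_service` helper and the substituted values (target, port, wordlist, and output paths) are hypothetical.

```python
import configparser

CONFIG_TEXT = """
[gobuster]
command = gobuster dir -u {scheme}://{target}:{port}/ -w {wordlist} -o {output} -t 20
service = http
path = /usr/bin/gobuster
required = True
"""

# Disable configparser's own %-interpolation; placeholders are filled by str.format.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(CONFIG_TEXT)


def tools_for_service(parser, service):
    """Return the names of all configured tools that target the given service."""
    return [
        name for name in parser.sections()
        if parser.get(name, "service", fallback=None) == service
    ]


section = parser["gobuster"]
command = section["command"].format(
    scheme="http",
    target="10.0.0.5",
    port=8080,
    wordlist="/usr/share/wordlists/dirb/common.txt",
    output="/tmp/gobuster.txt",
)
required = section.getboolean("required")

print(tools_for_service(parser, "http"))
print(command)
```

Swapping in a different tool then means adding another section with its own `command` and `service` keys, with no change to the Python that reads it.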

Gotcha

The Python 2 dependency is the elephant in the room. Python 2 reached end-of-life in January 2020, and running legacy Python on security infrastructure is increasingly untenable. Modern Linux distributions no longer ship Python 2 by default, many security tools have migrated to Python 3, and dependency management for Python 2 projects is becoming archaeological work. Installing CeleryStalk in 2024 means maintaining a legacy Python environment with unmaintained libraries—exactly what you don't want in security tooling.

The root requirement compounds security concerns. CeleryStalk must run as root because many of the tools it orchestrates (nmap SYN scans, network interface manipulation) require elevated privileges. However, this means your entire Celery worker infrastructure runs with root privileges, expanding the attack surface significantly. If any tool CeleryStalk executes has a vulnerability, or if your configuration introduces command injection (template substitution isn't automatically escaped), you're compromising a root process. The architectural decision to require root rather than granting privileges narrowly (for example, running workers as unprivileged users and using sudo or Linux capabilities only for the commands that need them) limits deployment in security-conscious environments. You're essentially forced to run this on disposable Kali VMs rather than persistent security infrastructure.
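The injection risk from unescaped template substitution is easy to demonstrate without running anything. The sketch below renders a command template twice, once naively and once with `shlex.quote` applied to every value; the template, the helper names, and the hostile target string are illustrative assumptions, and quoting of this kind is one possible mitigation, not something the source says CeleryStalk does.

```python
import shlex

TEMPLATE = "nikto -h {target} -p {port}"


def render_unsafe(template, **values):
    """Plain substitution: shell metacharacters in values pass straight through."""
    return template.format(**values)


def render_quoted(template, **values):
    """Quote each value so a shell treats it as a single literal argument."""
    quoted = {key: shlex.quote(str(value)) for key, value in values.items()}
    return template.format(**quoted)


# A "hostname" an attacker might plant in DNS or scan data (hypothetical).
hostile = "example.com; rm -rf /tmp/x"

unsafe = render_unsafe(TEMPLATE, target=hostile, port=80)
safe = render_quoted(TEMPLATE, target=hostile, port=80)

print(unsafe)
print(safe)
print(shlex.split(safe))
```

Passed to a shell, the unsafe string would run a second command after the semicolon, whereas the quoted version keeps the whole hostile value as one argument to nikto. Avoiding the shell entirely by executing an argument list is a further option.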

Verdict

Use if: You're conducting large-scale penetration tests with hundreds of hosts and need asynchronous tool orchestration with granular job control; you're already working in Kali Linux throwaway environments where Python 2 and root execution aren't blockers; you value service-aware automation that intelligently maps tools to detected services; or you need workspace-based organization for managing multiple simultaneous engagements. The dual-mode scoping is particularly valuable if you switch between corporate VAPT engagements and bug bounty work.

Skip if: You require Python 3 compatibility or need to integrate with modern CI/CD pipelines; you're building persistent security infrastructure where running as root violates organizational policy; you need active maintenance and community support for a production security workflow; or you're working outside Kali Linux where tool dependency management becomes painful.

Consider AutoRecon for a Python 3 alternative or build custom orchestration with Nuclei and modern workflow tools if you need long-term maintainability.
