Nebula: The AI-Powered Pentesting Assistant That Learns Your Workflow
Hook
The average penetration test generates 40+ pages of documentation, yet pentesters spend only 15% of their time actually exploiting systems—the rest is reconnaissance, note-taking, and report writing. What if an AI could handle the busywork while you focus on breaking things?
Context
Penetration testing has always been a documentation-heavy discipline. Security professionals juggle multiple terminal windows running nmap, Burp Suite, and SQLMap while simultaneously maintaining detailed notes about every command executed, every vulnerability discovered, and every dead end explored. The constant context switching between exploitation and documentation creates friction that slows engagements and increases the risk of missing critical findings.
The rise of large language models promised to revolutionize this workflow, but most solutions required copying terminal output to ChatGPT in a browser, losing command history context, and manually correlating AI suggestions with your actual testing environment. Nebula takes a different approach: it embeds the AI directly into your pentesting workflow as a persistent assistant that observes your commands, automatically documents findings, and provides contextual guidance without requiring you to leave the terminal. Built by the BerylliumSec team, it's designed for security professionals who want AI augmentation without sacrificing the command-line efficiency that makes terminal-based workflows so powerful.
Technical Insight
Nebula's architecture revolves around a dual-mode command interpreter that toggles between system command execution and AI interaction. Commands prefixed with '!' are routed to the configured LLM (either a local Ollama model or OpenAI's API), while unprefixed commands execute normally in the shell. The result is a conversational testing environment: you can run "nmap -sV target.com", immediately ask "!what vulnerabilities are commonly associated with the services found?", and receive a response informed by your actual scan results.
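The prefix routing described above can be sketched in a few lines of Python. This is a minimal illustration, not Nebula's actual code: the names route, dispatch, and ask_llm are hypothetical, and ask_llm stands in for whatever client talks to the configured model.

```python
import subprocess
from typing import Callable

AI_PREFIX = "!"  # the prefix the article describes; everything else is assumed


def route(line: str) -> tuple[str, str]:
    """Classify one input line as an AI prompt or a shell command."""
    stripped = line.strip()
    if stripped.startswith(AI_PREFIX):
        return "ai", stripped[len(AI_PREFIX):].strip()
    return "shell", stripped


def dispatch(line: str, ask_llm: Callable[[str], str]) -> str:
    """Send AI prompts to ask_llm and run everything else in the shell."""
    kind, payload = route(line)
    if kind == "ai":
        return ask_llm(payload)
    # shell=True keeps pipes and redirects working, as in a normal terminal.
    result = subprocess.run(payload, shell=True, capture_output=True, text=True)
    return result.stdout + result.stderr
```

Keeping route separate from dispatch means the classification step can be tested without running any command or calling any model.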
The real power lies in Nebula's context management system. Every command you execute and its output gets logged and fed into the AI's context window, allowing the model to maintain awareness of your engagement state. Here's how you might interact with it during a typical reconnaissance phase:
# Example workflow showing context-aware AI assistance
nebula> nmap -sV -p- 192.168.1.100
# Nebula logs: Discovered SSH (22), HTTP (80), MySQL (3306)
nebula> !analyze the services found and suggest next steps
# AI Response: "MySQL on 3306 combined with HTTP suggests a web application
# with database backend. Recommend: 1) Directory enumeration on port 80
# 2) Check for default MySQL credentials 3) Test for SQL injection if
# web forms are discovered. The SSH service should be noted for potential
# brute-force as a fallback initial-access vector."
nebula> gobuster dir -u http://192.168.1.100 -w /usr/share/wordlists/dirb/common.txt
# Discovers /admin, /login, /api endpoints
nebula> !given these endpoints and the MySQL backend, what injection points should I prioritize?
# AI Response: "Focus on /api endpoints first—APIs often have less input
# validation than user-facing forms. Test for SQL injection in any
# parameter-accepting endpoints. The /admin path may have authentication
# bypass vulnerabilities. I'll search for recent CVEs related to these patterns."
Under the hood, Nebula implements this through a session manager that maintains a rolling context buffer. When you invoke AI assistance, it constructs prompts that include your recent command history, tool outputs, and any previous AI interactions. This approach mirrors how an experienced mentor would guide a junior pentester—by actually observing what they're doing rather than answering questions in a vacuum.
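A rolling context buffer of this kind might look like the following sketch. The Session and Entry classes, the 50-entry default, and the prompt layout are all assumptions made for illustration; the article does not describe Nebula's real data structures.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Entry:
    kind: str  # "command", "output", or "ai" (hypothetical labels)
    text: str


class Session:
    def __init__(self, max_entries: int = 50):
        # A deque with maxlen drops the oldest entry once it is full.
        self.buffer: deque = deque(maxlen=max_entries)

    def record(self, kind: str, text: str) -> None:
        self.buffer.append(Entry(kind, text))

    def build_prompt(self, question: str) -> str:
        """Combine the retained history with the user's question."""
        history = "\n".join(f"[{e.kind}] {e.text}" for e in self.buffer)
        return f"Engagement history:\n{history}\n\nQuestion: {question}"
```

With max_entries=2, recording three entries leaves only the last two in the prompt, which is the "rolling" behavior in its simplest form.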
The automated note-taking system operates as a background logger that timestamps every action and categorizes it by engagement phase (reconnaissance, exploitation, post-exploitation). It generates markdown-formatted reports that meet industry standards for penetration testing documentation. Screenshots can be captured with a simple !screenshot command, automatically embedded in the notes with contextual descriptions generated by the AI.
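To make the logging idea concrete, here is a small sketch of a phase-aware note store that renders markdown. The Notes class, its method names, and the report layout are hypothetical; only the three phase names come from the description above.

```python
from datetime import datetime, timezone
from typing import Optional

PHASES = ("reconnaissance", "exploitation", "post-exploitation")


class Notes:
    def __init__(self):
        self.entries = []  # list of (phase, timestamp string, text)

    def log(self, phase: str, text: str, when: Optional[datetime] = None) -> None:
        """Store one timestamped note under a known engagement phase."""
        if phase not in PHASES:
            raise ValueError(f"unknown phase: {phase}")
        when = when or datetime.now(timezone.utc)
        self.entries.append((phase, when.strftime("%Y-%m-%d %H:%M:%S"), text))

    def to_markdown(self) -> str:
        """Render notes grouped by phase, in engagement order."""
        lines = ["# Engagement Notes"]
        for phase in PHASES:
            items = [(ts, text) for p, ts, text in self.entries if p == phase]
            if not items:
                continue  # skip phases with no notes
            lines.append(f"\n## {phase.title()}")
            lines.extend(f"- `{ts}` {text}" for ts, text in items)
        return "\n".join(lines)
```

Grouping at render time rather than at log time keeps the raw log in chronological order, so the same entries could also feed a timeline view.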
For malware analysis workflows, Nebula includes the Deep Application Profiler (DAP), which uses neural network-based classification to identify potentially malicious executables. Unlike traditional signature-based detection, DAP analyzes behavioral patterns and structural characteristics:
# Simplified conceptual example of DAP analysis
nebula> dap analyze suspicious_binary.exe
# DAP Output:
# Behavioral Indicators:
# - Network socket creation detected
# - Registry modification patterns match known persistence mechanisms
# - Entropy analysis suggests packed/obfuscated code sections
# - API call patterns: 78% similarity to ransomware families
# Risk Score: 8.7/10 (High Confidence Malware)
nebula> !explain the entropy findings and suggest unpacking approaches
# AI explains entropy analysis and recommends specific unpacking tools
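The entropy indicator in the output above rests on a standard measure, Shannon entropy over byte frequencies. The sketch below shows that general technique only; it says nothing about how DAP computes its score, and the 7.2 bits-per-byte threshold is an assumed rule of thumb rather than a documented DAP setting.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Return entropy in bits per byte, from 0.0 (uniform) to 8.0 (random)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def looks_packed(data: bytes, threshold: float = 7.2) -> bool:
    """Heuristic: flag data whose entropy exceeds the assumed threshold."""
    return shannon_entropy(data) > threshold
```

Compressed or encrypted sections tend to score close to 8, while plain code and text usually score noticeably lower, which is why high entropy hints at packing without proving it.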
Nebula's agent system supplies internet-sourced context: it queries live CVE databases, exploit repositories, and security blogs to augment the base LLM's knowledge. When you ask about a specific vulnerability, Nebula can fetch recent proof-of-concept exploits or mitigation strategies that postdate the model's training data. This design addresses one of the biggest limitations of LLMs, the knowledge cutoff date, by pulling current information when it is needed.
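One way to picture this augmentation step is a function that gathers snippets from several sources and places them ahead of the question. Everything here is hypothetical: augment_prompt, the Fetcher type, and the source names are illustrative, and the fetchers are plain callables so that no real API is assumed.

```python
from typing import Callable, Iterable

# A fetcher takes the question and returns text snippets (assumed interface).
Fetcher = Callable[[str], Iterable[str]]


def augment_prompt(question: str, fetchers: dict[str, Fetcher], limit: int = 3) -> str:
    """Prepend up to `limit` snippets per source to the question."""
    sections = []
    for name, fetch in fetchers.items():
        try:
            snippets = list(fetch(question))[:limit]
        except Exception as exc:  # a failed source should not block the query
            snippets = [f"(source unavailable: {exc})"]
        if snippets:
            body = "\n".join(f"- {s}" for s in snippets)
            sections.append(f"[{name}]\n{body}")
    context = "\n\n".join(sections)
    return f"Current references:\n{context}\n\nQuestion: {question}"
```

Recording a failed source in the prompt, instead of silently dropping it, lets the model and the tester see that the answer may be missing current data.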
LLM integration supports both local Ollama instances (running models like Llama 2, Mistral, or CodeLlama) and cloud APIs. The local inference path prioritizes privacy for sensitive engagements where you can't send client data to third-party APIs. Configuration is straightforward through environment variables or a config file that specifies model endpoints, API keys, and context window preferences. The system automatically handles token limits by intelligently truncating older context while preserving critical findings and recent command history.
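The truncation behavior described above could work roughly as follows. This is a sketch under stated assumptions: Item, fit_to_budget, and the pinned flag are invented for illustration, and the word-count token estimate is a crude stand-in for a real tokenizer.

```python
from dataclasses import dataclass


@dataclass
class Item:
    text: str
    pinned: bool = False  # True for findings that must survive truncation


def estimate_tokens(text: str) -> int:
    # Assumption: one token per whitespace-separated word.
    return len(text.split())


def fit_to_budget(items: list, budget: int) -> list:
    """Keep all pinned items, then the newest unpinned items that still fit."""
    remaining = budget - sum(estimate_tokens(i.text) for i in items if i.pinned)
    keep = [i.pinned for i in items]
    # Walk from newest to oldest so recent history wins over old history.
    for idx in range(len(items) - 1, -1, -1):
        if items[idx].pinned:
            continue
        cost = estimate_tokens(items[idx].text)
        if cost > remaining:
            break  # stop at the first item that does not fit
        keep[idx] = True
        remaining -= cost
    return [item for item, k in zip(items, keep) if k]
```

Pinned items are always kept, even if they alone exceed the budget, so a caller would still need to cap how many findings get pinned.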
Gotcha
The 16GB minimum RAM requirement isn't just a suggestion. Local LLM inference genuinely demands substantial memory, and running even a mid-sized local model alongside active pentesting tools can bring a well-provisioned system to its knees. A model the size of Llama 2 70B will not fit in 16GB at all, even when quantized to 4 bits. In field engagements where you're working on a laptop, you'll likely need to fall back to cloud APIs or to smaller, less capable models that may give more generic advice. The resource contention becomes particularly problematic during intensive scanning operations or when analyzing large packet captures.
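The memory pressure can be estimated with simple arithmetic: weight storage is roughly parameter count times bits per weight. The sketch below covers weights only and ignores the KV cache, runtime overhead, and the operating system, so real usage will be higher. The function name weights_gb is illustrative.

```python
def weights_gb(params_billion: float, bits_per_weight: int) -> float:
    """Approximate size of model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9


# Compare common model sizes at 16-bit and 4-bit precision.
for name, params in [("7B", 7), ("13B", 13), ("70B", 70)]:
    fp16 = weights_gb(params, 16)
    q4 = weights_gb(params, 4)
    print(f"{name}: fp16 ~{fp16:.1f} GB, 4-bit ~{q4:.1f} GB")
```

By this estimate a 7B model at 4 bits needs about 3.5 GB for weights, while a 70B model needs about 35 GB, which is why the larger models are out of reach on a 16GB laptop.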
More concerning is the AI hallucination risk. LLMs can confidently suggest exploits that don't exist, cite CVE numbers incorrectly, or recommend attack vectors that are technically nonsensical. An experienced pentester can usually recognize when the AI is confabulating, but the tool's biggest danger is the false confidence it can create in its suggestions. You cannot trust AI-generated exploit code without thorough validation. I've seen it recommend SQL injection payloads with incorrect syntax and suggest privilege escalation techniques for the wrong operating system version. Nebula is an assistant, not a replacement for security expertise: treat every AI suggestion as a hypothesis that requires verification, not gospel truth.
The Docker implementation's X server requirement also adds friction, particularly because forwarding a display server into a container that runs privileged tooling has security implications of its own.
Verdict
Use Nebula if you're an experienced penetration tester who understands security fundamentals well enough to validate AI suggestions, conducts frequent engagements where documentation overhead is slowing you down, has the hardware resources to run local models or can safely use cloud APIs for your engagement type, and wants to stay in terminal-based workflows without constant context switching. It excels at accelerating reconnaissance interpretation, automating engagement documentation, and providing quick reference for common attack patterns.
Skip it if you're a beginner who might trust incorrect AI guidance without validation, work in air-gapped or highly restricted environments where neither cloud APIs nor sufficient local compute is available, need guaranteed accuracy in security recommendations for compliance-driven assessments, or prefer traditional manual methodologies where you maintain complete control over every decision.
This tool amplifies expert knowledge but cannot substitute for it.