Nebula: Embedding AI Models Directly Into Penetration Testing Workflows
Hook
What if your penetration testing terminal could understand the output of nmap, suggest the next exploit, and write your engagement report simultaneously—all without leaving the command line?
Context
Penetration testing has always been bottlenecked by documentation overhead and cognitive load. Security professionals spend hours running reconnaissance tools like nmap, nikto, and sqlmap, then manually correlating outputs, documenting findings in separate note-taking apps, and searching CVE databases for context. The workflow is fragmented: terminal for execution, browser for research, text editor for notes. This context-switching kills momentum during time-sensitive engagements.
Nebula emerged to collapse this workflow into a single AI-augmented CLI. Built by BerylliumSec, it wraps standard penetration testing tools with large language models that interpret outputs in real time, suggest exploitation paths, and generate structured documentation automatically. Instead of replacing existing tools, Nebula acts as an intelligent layer that makes terminal output actionable: think of it as giving your shell the ability to reason about security findings while preserving the manual control professionals require.
Technical Insight
Nebula’s architecture centers on dual-mode AI integration: local inference via Ollama and cloud-based processing through OpenAI’s API. This design prioritizes flexibility—teams requiring air-gapped environments can run Llama 3.1 or Mistral locally, while those prioritizing speed can leverage OpenAI models with API keys injected via environment variables.
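The README documents both backends but not the selection logic, so the snippet below is only a guess at how such a choice could work: a hypothetical `pick_backend` helper (not part of Nebula) that prefers OpenAI when an API key is exported and falls back to local Ollama inference otherwise.

```python
import os

def pick_backend() -> str:
    """Illustrative backend selection, NOT Nebula's actual code.

    Mirrors the dual-mode idea: use OpenAI when an API key is present,
    otherwise fall back to a local Ollama model.
    """
    if os.environ.get("OPENAI_API_KEY"):
        return "openai"
    return "ollama"  # local inference, e.g. llama3.1 or mistral
```

In Nebula itself, the equivalent decision is driven by the environment variables shown in the installation steps later in this section.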
The interaction model is prefix-based. Prepend commands with ! to trigger AI mode, or toggle between terminal and AI contexts using the built-in mode switcher. For example, after running a port scan, you might ask:
# Standard terminal execution
nmap -sV -p- 192.168.1.10
# AI interpretation without leaving the CLI
! analyze the nmap output above and suggest next steps for enumeration
Under the hood, Nebula appears designed to capture stdout from executed commands, append it to the LLM context window, and stream responses back to the terminal. The tool positions itself as an assistant rather than autonomous agent—it suggests, and you decide.
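That capture-and-append pattern can be sketched in a few lines of Python. Everything here (the function name, the chat-message shape) is an illustrative assumption rather than Nebula's actual internals:

```python
import subprocess

def run_and_capture(command: list[str], context: list[dict]) -> str:
    """Run a command, capture stdout, and append it to a chat-style context.

    A minimal sketch of the capture-and-append pattern; the message
    format is an assumption, not Nebula's API.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    context.append({
        "role": "user",
        "content": f"Output of {' '.join(command)}:\n{result.stdout}",
    })
    return result.stdout

# Each execution grows the context; a subsequent "!" query would ship
# the accumulated messages to the configured model.
context: list[dict] = []
run_and_capture(["echo", "22/tcp open ssh"], context)
```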
The automated note-taking system is where Nebula differentiates itself from simple chatbot wrappers. Every command execution gets logged with timestamps, and AI-parsed outputs are categorized into structured findings. When you run nikto -h target.com, Nebula’s AI appears to extract discovered vulnerabilities and auto-populate your engagement notes. The notes are stored in ~/.local/share/nebula/logs as artifacts.
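A structured finding log of that kind might look like the following sketch. The JSON field names and file layout are assumptions, since the README only specifies the ~/.local/share/nebula/logs location:

```python
import json
import time
from pathlib import Path

def log_finding(tool: str, raw_output: str, severity: str, log_dir: Path) -> Path:
    """Append a timestamped, structured finding to a per-tool JSONL file.

    A sketch of the kind of artifact Nebula appears to write; the
    schema here is illustrative, not Nebula's on-disk format.
    """
    entry = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "tool": tool,
        "severity": severity,
        "raw": raw_output,
    }
    log_dir.mkdir(parents=True, exist_ok=True)
    path = log_dir / f"{tool}-findings.jsonl"
    with path.open("a") as fh:
        fh.write(json.dumps(entry) + "\n")
    return path
```

An append-only JSONL file keeps each finding independently parseable, which suits the per-command logging Nebula performs.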
Agent-based internet search extends this further. When you query security-related questions, Nebula can dispatch agents to search live sources and synthesize findings into actionable context. This bridges the gap between static LLM knowledge cutoffs and the rapidly evolving threat landscape.
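Stripped to its essence, the pattern is retrieval-augmented prompting: fetch live snippets, then constrain the model to answer from them. The sketch below assumes a pluggable `search_fn` and an invented prompt shape; Nebula's real agent plumbing is not documented:

```python
from typing import Callable

def synthesize_prompt(query: str, search_fn: Callable[[str], list[str]]) -> str:
    """Build a grounded prompt from live search results.

    `search_fn` stands in for whatever search backend the agent
    dispatches; the prompt wording is an illustrative assumption.
    """
    snippets = search_fn(query)
    sources = "\n".join(f"- {s}" for s in snippets)
    return (
        f"Answer the question using only these sources:\n{sources}\n\n"
        f"Question: {query}"
    )
```

The resulting prompt would then go to whichever model backend is configured, so the answer reflects current advisories rather than the model's training cutoff.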
Installation reflects the tool’s Python-native design:
# Install via pip
python -m pip install nebula-ai --upgrade
# For local inference, pull Ollama models
ollama pull mistral
# For OpenAI integration, set API key
export OPENAI_API_KEY="sk-proj-your-key-here"
# Launch
nebula
Docker deployment includes X11 forwarding to preserve GUI capabilities for the screenshot annotation feature:
# Allow X server connections
xhost +local:docker
# Run with volume mounts for persistent logs and engagements
docker run --rm -it \
-e DISPLAY=$DISPLAY \
-v ~/.local/share/nebula/logs:/root/.local/share/nebula/logs \
-v ~/engagements:/engagements \
-v /tmp/.X11-unix:/tmp/.X11-unix \
berylliumsec/nebula:latest
The screenshot system integrates with the note-taking pipeline—capture a finding visually, annotate it with markup, and link it to relevant log entries. This addresses the common problem of disconnected screenshots in penetration testing reports.
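One simple way to implement such a linkage is a JSON sidecar next to each image; the format below is purely illustrative, not Nebula's on-disk schema:

```python
import json
from pathlib import Path

def link_screenshot(image: Path, log_entry_id: str, note: str) -> Path:
    """Write a sidecar file tying a screenshot to a log entry.

    A sketch of the screenshot/notes linkage described above; the
    sidecar layout and field names are assumptions.
    """
    sidecar = image.with_suffix(".meta.json")
    sidecar.write_text(json.dumps({
        "image": image.name,
        "log_entry": log_entry_id,
        "annotation": note,
    }))
    return sidecar
```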
Nebula’s status feed refreshes every five minutes to display recent activity—a passive monitoring layer that helps teams track parallel testing streams without manually tailing log files.
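A passive status check like that reduces to scanning the log directory for recent modifications. The sketch below assumes plain *.log files and reuses the five-minute cadence; it is not Nebula's implementation:

```python
import time
from pathlib import Path

REFRESH_SECONDS = 300  # the five-minute cadence mentioned above

def recent_activity(log_dir: Path, window: int = REFRESH_SECONDS) -> list[str]:
    """Return log files modified within the last refresh window.

    Illustrates the kind of passive check a status feed might perform;
    the *.log naming convention is an assumption.
    """
    now = time.time()
    return sorted(
        p.name for p in log_dir.glob("*.log")
        if now - p.stat().st_mtime < window
    )
```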
Gotcha
Nebula’s local inference mode demands serious hardware. The 16GB RAM minimum isn’t a suggestion: running Llama 3.1 or DeepSeek models on CPU without adequate memory leads to heavy swapping and painfully slow token generation. GPU acceleration via Ollama helps where available. Cloud API mode sidesteps the hardware requirements but introduces network latency and ongoing per-request costs.
The more insidious limitation is AI reliability in security contexts. LLMs can hallucinate, and in penetration testing a hallucinated exploit path could waste hours or, worse, trigger unintended damage against a client’s systems. The README positions Nebula as an assistant rather than an autonomous agent, but users need existing expertise to validate AI suggestions against established CVE data and tool documentation. The tool’s accessibility could tempt junior practitioners to trust outputs without proper verification.
Docker X11 forwarding is sensitive to host configuration. The README’s xhost +local:docker command works on standard X11 setups but may fail on Wayland sessions, rootless Docker, or remote hosts without additional configuration.
Finally, Nebula’s documentation references a commercial ‘Nebula Pro’ with additional features such as autonomous mode and code analysis, capabilities absent from the open-source version. The documentation is unclear about upgrade paths and feature parity between the tiers, which creates uncertainty for teams evaluating long-term adoption.
Verdict
Use Nebula if you’re an intermediate-to-senior penetration tester who already knows nmap from Metasploit and wants AI to handle the grunt work of output analysis, note correlation, and documentation generation. It shines during time-compressed engagements where manual note-taking becomes a bottleneck, and its real-time internet search can accelerate threat modeling. Teams with adequate hardware (16GB+ RAM) or budget for OpenAI API credits will extract the most value. Skip it if you’re resource-constrained (sub-16GB RAM systems will struggle with local models), operate in strict air-gapped environments where internet-based agent searches aren’t viable, or need deterministic security tooling where AI unpredictability is unacceptable. Absolute beginners should approach cautiously: without built-in output validation, you need existing expertise to catch the AI’s incorrect suggestions. If you’re choosing between Nebula and traditional scripted automation, ask yourself: do I spend more time running tools or interpreting their outputs? If it’s the latter, Nebula could save significant hours.