How Git Became a Time-Series Database for Bug Bounty Recon

Hook

What if tracking changes in your reconnaissance data didn't require a database, API, or orchestration platform—just git commands you already know?

Context

Bug bounty hunters and penetration testers face a deceptively simple problem: knowing what changed. You run nmap against a target today, then again next week. New ports? Closed services? Version upgrades? Without systematic tracking, you're either keeping sprawling text files, manually diffing outputs, or building custom tooling. Most reconnaissance frameworks solve this with a database, such as SQLite for Recon-ng or embedded storage for Amass. But databases introduce complexity: schema migrations, query languages, backup procedures, and deployment overhead.

Jobert Abma's recon.sh takes a radically different approach: it treats git itself as the database. Every reconnaissance command you run gets wrapped, its output captured to a deterministically-named file, and automatically committed. Need to know what changed? Run git diff. Want historical context? Check git log. Collaborating with teammates? Push to a shared private repository. The entire system is 200 lines of bash that transforms version control into a reconnaissance time-series database, with zero dependencies beyond git and standard Unix tools.

Technical Insight

The architecture is deceptively elegant. Recon.sh works by injecting itself into your workflow through bash functions that intercept command execution. When you run recon example.com nmap -p- example.com, the tool captures the command string, executes it, and stores the output using a deterministic filename based on a hash of the command itself.

Here's the critical insight: by hashing the command string, identical commands always overwrite the same file. Run nmap -p80,443 target.com twice, and you get one file with two commits showing exactly what changed between scans. This is fundamentally different from timestamped output files that proliferate endlessly. The hash-based naming creates stable identities for each unique command variant:

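# Inside recon.sh's core logic (simplified)
recon() {
    local asset="$1"
    shift
    local cmd="$*"

    # Generate a deterministic filename from a hash of the command string
    local hash
    hash=$(printf '%s' "$cmd" | shasum -a 256 | cut -d' ' -f1)
    local output_file="output/$asset/$hash.txt"

    # Make sure the asset directory exists before writing into it
    mkdir -p "output/$asset"

    # Execute the command with its original arguments; capture stdout and stderr
    "$@" > "$output_file" 2>&1

    # Store the human-readable command string next to its output
    printf '%s\n' "$cmd" > "output/$asset/.$hash.command"

    # Commit automatically, skipping the commit when nothing changed
    git -C output add "$asset/"
    git -C output diff --cached --quiet || git -C output commit -m "$asset: $cmd"
}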

The directory structure mirrors your mental model: output/example.com/ contains all reconnaissance for that asset, with each hash file representing a unique command variant. The hidden .command files preserve human-readable command strings for search and reference.
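
To turn the hashes back into something readable, the hidden .command files can be paired with their output files. The following is a minimal sketch that assumes the output/<asset>/<hash>.txt and .<hash>.command layout described above; list_commands is a hypothetical helper, not part of recon.sh:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<hash>  <command>" for every stored command of an asset.
# Assumes the layout described above: <root>/<asset>/<hash>.txt and .<hash>.command
list_commands() {
    local root="$1" asset="$2" meta hash
    for meta in "$root/$asset"/.*.command; do
        [ -f "$meta" ] || continue          # the glob matched nothing
        hash=$(basename "$meta")            # .<hash>.command
        hash=${hash#.}                      # strip the leading dot
        hash=${hash%.command}               # strip the suffix
        printf '%s  %s\n' "$hash" "$(cat "$meta")"
    done
}
```

Calling list_commands output example.com would print one line per command variant, which makes it easier to pick the right hash before reaching for git log.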

Git becomes the query engine through familiar commands. Want to see when a new subdomain appeared in your DNS enumeration? git log -p output/target.com/<hash>.txt shows the entire timeline with inline diffs. Need to search across all reconnaissance outputs? git grep 'admin.panel' searches the current working tree, and passing revisions (for example git grep 'admin.panel' $(git rev-list --all)) extends the search to history. The recon diff command is just a thin wrapper around git diff, showing what changed since your last scan.
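
The two queries above can be wrapped in small functions. This is a sketch under assumptions: file_timeline and search_history are hypothetical names, and the repo argument is whatever directory holds the recon git repository. Only standard git subcommands (log, rev-list, grep) are used:

```shell
#!/usr/bin/env bash
# Hypothetical helpers; "repo" is the git repository holding the recon output.

# Show every recorded version of one output file, with inline diffs.
file_timeline() {
    local repo="$1" path="$2"
    git -C "$repo" log -p --follow -- "$path"
}

# Search all commits, not just the working tree, for a pattern.
# Prints the distinct file paths (relative to the repo) that ever matched.
search_history() {
    local repo="$1" pattern="$2"
    git -C "$repo" rev-list --all |
        xargs git -C "$repo" grep -l -e "$pattern" 2>/dev/null |
        cut -d: -f2- | sort -u
}
```

With revisions given, git grep prefixes each match with the commit, which is why the sketch strips everything up to the first colon. On a repository with a very long history this gets slow, since every commit is searched in turn.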

For artifact management, recon.sh provides an artifacts command that copies binary files or large datasets into the repository structure without executing them as commands. This handles screenshots, packet captures, or downloaded files that need tracking but aren't command outputs:

# Store a screenshot with context
recon artifacts example.com screenshot.png
# Results in: output/example.com/artifacts/screenshot.png
# Committed automatically with metadata

The multi-node collaboration model leverages git's distributed nature. Set up a private repository, configure it as a remote, and multiple operators can contribute reconnaissance data to it. Because filenames derive from the command hash, two operators touch the same file only when they run the identical command against the same asset, so merge conflicts are rare. When the remote has moved ahead, git asks you to pull before it accepts your push. Everyone gets a complete historical view. No API servers, no database replication, no custom sync protocols: just git push and git pull.
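
The pull-then-push cycle can be sketched as a single function. recon_sync is a hypothetical name, and the sketch assumes the local branch already tracks a branch on the shared remote:

```shell
#!/usr/bin/env bash
# Hypothetical helper: publish local recon commits to the shared remote.
# Rebasing first replays local commits on top of what teammates already pushed,
# so the push that follows is a fast-forward.
recon_sync() {
    local repo="$1"
    git -C "$repo" pull --rebase --quiet &&
        git -C "$repo" push --quiet
}
```

If two operators did run the same command against the same asset, the rebase stops on a conflict and the push is skipped, leaving the decision about which output to keep to a human.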

The search implementation deserves attention. The recon search command greps through both command metadata files and output content, providing a primitive but effective knowledge base query system. Since everything is plain text in git, you can use standard Unix tools: find, ag, ripgrep, even git grep against specific revisions to search historical states.
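
A search of this kind needs little more than a recursive grep. The sketch below is not recon.sh's actual implementation; recon_search is a hypothetical stand-in that assumes the output directory layout described earlier:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a search over command metadata and captured output.
# Prints the matching files under <root>, one per line, sorted.
recon_search() {
    local root="$1" pattern="$2"
    # -r: recurse, -l: list file names only, -I: skip binary files,
    # -e: treat the next argument as the pattern even if it starts with "-"
    grep -rlI --exclude-dir=.git -e "$pattern" "$root" | sort
}
```

Because grep -r also descends into dotfiles, a single call covers both the .command metadata and the captured output.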

One subtle architectural decision: recon.sh doesn't parse or interpret command outputs. It's format-agnostic, treating everything as opaque text. This simplicity means it works equally well with nmap XML, subfinder JSON, or custom script output. The trade-off is no structured querying—you can't ask "show me all hosts with port 8080 open" without writing your own grep patterns or parsers. But this constraint is also a strength: the tool never breaks because a reconnaissance tool changed its output format.
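
To make the "write your own parser" trade-off concrete, here is one way the port 8080 question could be answered. It is a sketch under a stated assumption: the scans were captured in nmap's grepable format (nmap -oG -), which writes one Host: line per target. hosts_with_open_port is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list hosts that have a given TCP port open, reading
# nmap grepable output (as produced by "nmap -oG -") from the files passed in.
hosts_with_open_port() {
    local port="$1"
    shift
    awk -v port="$port" '
        /^Host: / && /Ports: / {
            # Line shape: Host: <ip> (<name>)<TAB>Ports: 80/open/tcp//http///, ...
            host = $2
            ports = $0
            sub(/.*Ports: /, "", ports)     # keep only the port list
            sub(/\t.*/, "", ports)          # drop trailing fields such as "Ignored State"
            n = split(ports, entries, ", ")
            for (i = 1; i <= n; i++) {
                split(entries[i], f, "/")   # port/state/protocol/...
                if (f[1] == port && f[2] == "open" && f[3] == "tcp") { print host; break }
            }
        }
    ' "$@" | sort -u
}
```

A call such as hosts_with_open_port 8080 output/*/*.txt would scan every stored output. The parser is tied to one output format, which is exactly the maintenance burden recon.sh avoids by staying format-agnostic.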

Gotcha

The git-as-database model hits hard limits faster than you'd expect. A comprehensive nmap scan of a /16 network generates megabytes of XML output. Run that daily for a month and the repository grows quickly, potentially into gigabytes once many such scans are tracked. Git isn't designed for large binary blobs or high-volume commits of large files: clone times increase, diff performance degrades, and the .git directory becomes unwieldy. Git does zlib-compress objects and delta-encode them in packfiles, but that only goes so far when scan outputs change substantially between runs.
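
Growth is easy to measure before it becomes a problem. The sketch below sums the figures that git count-objects -v reports; repo_object_kib is a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report how much disk space git's object store uses, in KiB.
# "git count-objects -v" prints "size" (loose objects) and "size-pack" (packfiles),
# both in KiB.
repo_object_kib() {
    local repo="$1"
    git -C "$repo" count-objects -v |
        awk -F': ' '$1 == "size" || $1 == "size-pack" { total += $2 } END { print total + 0 }'
}
```

Running git gc first repacks loose objects into delta-compressed packfiles, so the number afterwards is a fairer picture of what a teammate would have to clone.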

Bash portability issues lurk throughout. The script assumes GNU coreutils and bash 4.x behaviors that macOS doesn't provide by default (it ships bash 3.2 and BSD userland tools), though Homebrew fixes most issues. Complex commands with nested quotes, pipes, or redirections can confuse the wrapper's argument parsing.

Binary output breaks the text-based storage model: git will store a raw packet capture, but it cannot produce a meaningful diff of it, and every changed copy adds to the repository's size. The artifacts command helps, but it's a workaround rather than a solution. If your reconnaissance workflow relies heavily on tools that output binary formats (screenshots, compiled data), you'll spend time writing wrapper scripts to convert everything to text or fighting git's large-file limitations.

For high-frequency automated scanning (think continuous subdomain monitoring that checks every 5 minutes), the commit overhead becomes a bottleneck, and you'll quickly accumulate tens of thousands of commits.
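
On the portability point, the hashing step is a typical place where GNU and macOS environments differ. The sketch below picks whichever SHA-256 tool is installed; command_hash is a hypothetical helper, not a function from recon.sh:

```shell
#!/usr/bin/env bash
# Hypothetical helper: hash a command string with whichever SHA-256 tool exists.
# GNU coreutils provides sha256sum; macOS provides shasum instead.
command_hash() {
    local cmd="$1"
    if command -v sha256sum >/dev/null 2>&1; then
        printf '%s' "$cmd" | sha256sum | cut -d' ' -f1
    elif command -v shasum >/dev/null 2>&1; then
        printf '%s' "$cmd" | shasum -a 256 | cut -d' ' -f1
    else
        echo "no SHA-256 tool found" >&2
        return 1
    fi
}
```

Using printf '%s' rather than echo -n also avoids differences in how shells treat echo's flags, so the same command string yields the same filename on every machine.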

Verdict

Use if: You're a solo bug bounty hunter or small team running periodic (daily/weekly) reconnaissance against a manageable target list (dozens, not hundreds of assets), you value audit trails and historical context over real-time dashboards, and you're comfortable with command-line git workflows. Recon.sh excels when simplicity and transparency matter more than features: no installation beyond cloning a repo, no database to maintain, and full visibility into what's being tracked. It suits engagements where you need to show clients exactly what changed and when, since the commit history provides a timestamped, reviewable record.

Skip if: You're running high-frequency automated scanning, managing hundreds of targets with verbose output tools, need structured queries beyond grep (like "find all hosts where MySQL port changed from closed to open"), or require integration with security orchestration platforms and ticketing systems. The git-based storage doesn't scale to enterprise reconnaissance operations, and the lack of a proper database means no complex analytics without building your own ETL pipeline. If you're already invested in frameworks like Amass or Axiom, recon.sh's minimalism becomes a limitation rather than a feature.
