GitGot: Why Manual Review Beats Automation for GitHub Secret Hunting
Hook
Most secret scanners automate everything and drown you in false positives. GitGot does the opposite: it forces you to review each result manually, and for targeted hunts that turns out to be faster.
Context
GitHub hosts over 100 million repositories, and developers accidentally commit API keys, credentials, and tokens every single day. Automated secret scanners like TruffleHog and Gitleaks excel at sweeping through git history in CI/CD pipelines, but they fall short during reconnaissance: you're not scanning repositories you control—you're hunting through the vast public GitHub landscape for exposed secrets related to a specific target organization. The signal-to-noise ratio is abysmal. Search for "api_key" and you'll find millions of test fixtures, documentation examples, and vendored libraries before you find anything actionable.
Traditional automated approaches fail here because they lack context. They can't distinguish between a dummy API key in a tutorial and a production credential for your bug bounty target. They can't learn that every result from "test-fixtures" directories is worthless for your current investigation. GitGot emerged from Bishop Fox's penetration testing practice specifically to solve this problem: rapid, targeted reconnaissance against specific organizations where a human can provide the contextual judgment that automation lacks, but with enough tooling support to avoid drowning in manual grep.
Technical Insight
GitGot's architecture centers on an interactive REPL that wraps the GitHub Search API with stateful session management. When you launch a search query, results stream in one at a time. For each result, you see the repository, file path, and matching content snippet. You then make a judgment call: is this interesting, or noise? If it's noise, you don't just skip it. You tell GitGot why it's noise, and the tool applies that rule to every result that follows.
The genius is in the blacklisting system. You can blacklist by filename ("package-lock.json"), by repository ("ExampleOrg/test-repo"), by user ("automated-bot"), or, most powerfully, by content similarity using fuzzy hashing. This last option uses ssdeep to compute context-triggered piecewise hashes of file contents, which stay comparable when two files differ only slightly. When you encounter a vendored third-party library or a test fixture template that's been copied across dozens of repositories, you blacklist by fuzzy hash once, and GitGot automatically filters out all similar files for the remainder of your session.
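The four blacklist dimensions can be sketched in a few lines of Python. This is an illustrative sketch, not GitGot's actual code: the `Result` and `Blacklist` names are hypothetical, the standard library's `difflib.SequenceMatcher` stands in for ssdeep so the example runs without a compiled dependency, and the 0.8 similarity threshold is an assumption.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Result:
    """One search hit (hypothetical structure, not GitGot's own)."""
    repo: str      # "owner/name"
    path: str      # file path inside the repository
    content: str   # file content or snippet


@dataclass
class Blacklist:
    filenames: set = field(default_factory=set)
    repos: set = field(default_factory=set)
    users: set = field(default_factory=set)
    contents: list = field(default_factory=list)  # bodies already rejected
    threshold: float = 0.8                        # assumed similarity cutoff

    def is_noise(self, r: Result) -> bool:
        owner = r.repo.split("/", 1)[0]
        filename = r.path.rsplit("/", 1)[-1]
        if filename in self.filenames or r.repo in self.repos or owner in self.users:
            return True
        # Stand-in for a fuzzy-hash comparison: character-level similarity ratio.
        return any(
            SequenceMatcher(None, r.content, seen).ratio() >= self.threshold
            for seen in self.contents
        )


bl = Blacklist()
bl.filenames.add("package-lock.json")
bl.contents.append('access_key = "AKIAIOSFODNN7EXAMPLE"\nregion = "us-east-1"\n')

# A near-copy of the rejected Terraform example, and an unrelated file.
near_copy = Result("someuser/tf-demo", "main.tf",
                   'access_key = "AKIAIOSFODNN7EXAMPLE"\nregion = "us-west-2"\n')
fresh = Result("targetcorp/mobile-app", "ios/AppConfig.swift",
               'let apiHost = "internal.targetcorp.example"\n')

print(bl.is_noise(near_copy), bl.is_noise(fresh))
```

The exact-match checks run first because they are cheap; the similarity comparison only runs for results that survive them.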
Here's what a typical GitGot workflow looks like in practice:
# Search for AWS credentials in repos mentioning target company
$ python3 gitgot.py -q "targetcorp AWS_ACCESS_KEY_ID"
# GitGot streams results:
[Result 1/847]
Repo: targetcorp-demos/old-lambda-example
Path: config/credentials.js
Content: const AWS_ACCESS_KEY_ID = "AKIAIOSFODNN7EXAMPLE"
Action? [i]nteresting, [s]kip, [b]lacklist: b
Blacklist by: [f]ilename, [r]epo, [u]ser, [c]ontent: r
# Now all results from targetcorp-demos/* are filtered
[Result 2/847]
Repo: someuser/terraform-examples
Path: modules/aws/main.tf
Content: access_key = "AKIAIOSFODNN7EXAMPLE"
Action? [i]nteresting, [s]kip, [b]lacklist: b
Blacklist by: [f]ilename, [r]epo, [u]ser, [c]ontent: c
# Fuzzy hash generated - all similar Terraform examples now filtered
The session state—your blacklists, interesting findings, and search progress—persists to disk. You can pause mid-search, and when you resume, GitGot picks up exactly where you left off. More importantly, you can export your blacklist and reuse it across related searches. If you've spent an hour training GitGot to ignore common Terraform example patterns while hunting for "targetcorp" secrets, you can apply that same blacklist when searching for "target-corp" or "targetcorporation" variations.
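A minimal sketch of how such session state could be saved, resumed, and reused, assuming a JSON file on disk. The file layout, the field names, and the `save_session`, `load_session`, and `reuse_blacklist` helpers are hypothetical and do not reflect GitGot's actual state format.

```python
import json
import os
import tempfile


def save_session(path, state):
    """Write to a temp file first, then swap it in, so a crash mid-write
    cannot leave a half-written session file behind."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(state, fh, indent=2, sort_keys=True)
    os.replace(tmp, path)


def load_session(path):
    """Return the saved state, or a fresh empty session if no file exists."""
    if not os.path.exists(path):
        return {"query": None, "index": 0, "blacklist": {}, "interesting": []}
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def reuse_blacklist(old_state, new_query):
    """Start a new search that inherits an earlier session's blacklist."""
    return {
        "query": new_query,
        "index": 0,
        "blacklist": old_state["blacklist"],
        "interesting": [],
    }


workdir = tempfile.mkdtemp()
session_file = os.path.join(workdir, "targetcorp.json")

# First run: review two results, blacklist one repository, then stop.
state = load_session(session_file)
state.update(query="targetcorp AWS_ACCESS_KEY_ID", index=2)
state["blacklist"] = {"repos": ["targetcorp-demos/old-lambda-example"]}
save_session(session_file, state)

# Later: resume where we stopped, then carry the blacklist to a name variant.
resumed = load_session(session_file)
variant = reuse_blacklist(resumed, "target-corp AWS_ACCESS_KEY_ID")
```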
Under the hood, GitGot constructs GitHub Search API queries with your specified keywords and paginates through results. The tool respects rate limits and includes exponential backoff. For each result, it fetches the raw file content and applies your accumulated blacklist rules before presenting it for review. When you mark something as interesting, GitGot can optionally apply custom regex patterns you've defined to extract specific credential formats:
# In your .gitgot/regex.txt configuration
AWS_KEY="AKIA[0-9A-Z]{16}"
SLACK_TOKEN="xox[baprs]-[0-9a-zA-Z-]+"
PRIVATE_KEY="-----BEGIN (RSA|OPENSSH|ENCRYPTED) PRIVATE KEY-----"
# When GitGot finds a match, it logs structured data:
{
  "query": "targetcorp AWS_ACCESS",
  "repo": "targetcorp/mobile-app",
  "path": "ios/AppConfig.swift",
  "match_type": "AWS_KEY",
  "matched_string": "AKIAJ4XQXQXQXQXQXQXQ",
  "url": "https://github.com/targetcorp/mobile-app/blob/main/ios/AppConfig.swift",
  "timestamp": "2024-01-15T14:23:01Z"
}
This structured output feeds directly into your reconnaissance pipeline or bug bounty reporting workflow. The combination of human-filtered results and automated regex extraction means you get high-confidence findings without manual copy-paste.
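The extraction step can be sketched as follows, using the three patterns listed above. The `extract_findings` helper, its `branch` parameter, and the record layout mirror the example output but are assumptions made for illustration, not GitGot's implementation.

```python
import json
import re
from datetime import datetime, timezone

# Patterns copied from the configuration example above.
PATTERNS = {
    "AWS_KEY": re.compile(r"AKIA[0-9A-Z]{16}"),
    "SLACK_TOKEN": re.compile(r"xox[baprs]-[0-9a-zA-Z-]+"),
    "PRIVATE_KEY": re.compile(r"-----BEGIN (RSA|OPENSSH|ENCRYPTED) PRIVATE KEY-----"),
}


def extract_findings(query, repo, path, content, branch="main"):
    """Return one structured record per regex match in the file content."""
    findings = []
    for match_type, pattern in PATTERNS.items():
        for m in pattern.finditer(content):
            findings.append({
                "query": query,
                "repo": repo,
                "path": path,
                "match_type": match_type,
                "matched_string": m.group(0),
                "url": f"https://github.com/{repo}/blob/{branch}/{path}",
                "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
            })
    return findings


sample = 'let awsKey = "AKIAJ4XQXQXQXQXQXQXQ"\nlet note = "no secrets here"\n'
records = extract_findings(
    "targetcorp AWS_ACCESS_KEY_ID",
    "targetcorp/mobile-app",
    "ios/AppConfig.swift",
    sample,
)
print(json.dumps(records, indent=2))
```

Running the patterns only on results a human has already marked interesting keeps the regexes from reintroducing the false positives the review step removed.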
GitGot also supports GitHub Gist search, which is frequently overlooked in secret reconnaissance. Developers often paste configuration snippets or debug output into gists without realizing they contain credentials. The gist search follows the same interactive workflow but runs against GitHub's gist search rather than repository code search. Since gists are less structured than repository code, the fuzzy hashing blacklist becomes even more valuable for filtering out recurring junk.
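The paginate-and-back-off loop described earlier in this section can be sketched generically. The `fetch_page` callable, the `RateLimited` exception, and the delay values are hypothetical placeholders. No real GitHub client is used, so the retry logic can be read and run in isolation.

```python
import time


class RateLimited(Exception):
    """Raised by the (hypothetical) fetcher when the API says to slow down."""


def fetch_all(fetch_page, max_pages=10, base_delay=1.0, max_retries=5,
              sleep=time.sleep):
    """Collect items page by page, doubling the wait after each rate-limit hit."""
    items = []
    for page in range(1, max_pages + 1):
        for attempt in range(max_retries + 1):
            try:
                batch = fetch_page(page)
                break
            except RateLimited:
                if attempt == max_retries:
                    raise  # give up after the final retry
                sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
        if not batch:  # an empty page means there are no more results
            break
        items.extend(batch)
    return items


# Fake fetcher for demonstration: page 2 is rate limited twice, then succeeds.
failures = {2: 2}
delays = []


def fake_fetch(page):
    if failures.get(page, 0) > 0:
        failures[page] -= 1
        raise RateLimited()
    return [f"result-{page}-{i}" for i in range(3)] if page <= 3 else []


# Record the requested delays instead of actually sleeping.
results = fetch_all(fake_fetch, sleep=delays.append)
print(len(results), delays)
```

Injecting `sleep` as a parameter is a design choice for testability: the same function waits for real in production and returns instantly in a test.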
Gotcha
The human-in-the-loop design is both GitGot's strength and its Achilles' heel. You cannot walk away from an active session, because the tool requires continuous interaction to progress through results. If your queries add up to 10,000 results, even with aggressive blacklisting, you're committing to hours of active review. There's no "auto-skip" mode or confidence threshold to let it run unattended. This makes GitGot poorly suited for broad reconnaissance across many unrelated targets or continuous monitoring scenarios.
The dependency on ssdeep for fuzzy hashing adds installation friction. ssdeep requires Python bindings (python-ssdeep) which wrap a C library that must be compiled on installation. On macOS with modern Xcode or Linux with build-essential, this usually works fine, but Windows users often hit compilation errors. Docker deployment mitigates this, but it's still an obstacle compared to pure-Python alternatives.
Additionally, GitHub's Search API has well-documented limitations: it only indexes the default branch of repositories, it may lag behind recent commits by hours or days, and it returns a maximum of 1,000 results per query regardless of actual matches. If your target organization has thousands of repositories, you'll need to craft multiple narrower queries to get comprehensive coverage, which multiplies the interactive review time.
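One way to work within the 1,000-result cap is to split a broad query into narrower ones and merge the hits. The sketch below appends `extension:` qualifiers from GitHub's legacy code search syntax. The extension list and the `split_query` and `merge_hits` helpers are assumptions chosen for illustration and are not part of GitGot.

```python
def split_query(base_query, extensions):
    """Return one narrower query per file extension."""
    return [f"{base_query} extension:{ext}" for ext in extensions]


def merge_hits(batches):
    """Combine hits from several queries, dropping repeated (repo, path) pairs."""
    seen = set()
    merged = []
    for batch in batches:
        for repo, path in batch:
            if (repo, path) not in seen:
                seen.add((repo, path))
                merged.append((repo, path))
    return merged


queries = split_query("targetcorp AWS_ACCESS_KEY_ID", ["js", "tf", "swift"])
for q in queries:
    print(q)

# Hypothetical hits from two of the narrower queries, with one overlap.
hits = merge_hits([
    [("targetcorp/mobile-app", "ios/AppConfig.swift")],
    [("targetcorp/mobile-app", "ios/AppConfig.swift"),
     ("targetcorp/infra", "modules/aws/main.tf")],
])
```

Each narrower query still has to be reviewed interactively, so splitting buys coverage at the cost of review time.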
Verdict
Use if: You're conducting targeted bug bounty or penetration testing reconnaissance against a specific organization where precision matters more than coverage, you can dedicate focused time to interactive review, and you're willing to invest in training the tool's blacklists for high-quality results. GitGot excels when you need to hunt through GitHub's public data for a specific target and you have the domain knowledge to quickly classify results as interesting or noise. The fuzzy hashing and session persistence pay dividends when investigating related entities (subsidiaries, acquired companies, domain variations) where blacklist reuse dramatically accelerates subsequent searches.
Skip if: You need fully automated secret detection for CI/CD pipelines, you're scanning repositories you control (use Gitleaks or TruffleHog instead), you want continuous monitoring of new commits, or you're doing broad unfocused reconnaissance across hundreds of unrelated targets. The manual review requirement makes GitGot impractical for high-volume scenarios, and the GitHub API limitations mean you'll miss secrets in non-default branches or very recent commits. For those use cases, accept the higher false positive rate of automated tools and filter results downstream.