git-wild-hunt: Using GitHub's Search API as a Credential Scanner

Hook

In 2021, security researchers found over 2 million secrets exposed in public GitHub repositories—and that was just in one year. The irony? You can use GitHub's own search API to find most of them.

Context

Credential leaks on GitHub are a perennial problem. Developers accidentally commit AWS keys, API tokens, and private certificates to public repositories constantly. By the time they realize it, automated scrapers have already harvested them—credential compromise can happen within minutes of a commit going public.

Traditional credential scanning tools like truffleHog clone entire repositories and walk through git history, which is thorough but painfully slow when you're investigating whether your organization's credentials have leaked somewhere. git-wild-hunt takes a different approach: instead of scanning repositories exhaustively, it weaponizes GitHub's code search API to find specific credential patterns across millions of public repos in minutes. It's threat intelligence reconnaissance disguised as a Python script.

Technical Insight

The architecture is refreshingly simple. git-wild-hunt is essentially a 400-line wrapper that chains together three operations: crafting GitHub API search queries, retrieving matching code snippets, and running regex patterns against the raw content. The elegance lies in leveraging GitHub's infrastructure to do the heavy lifting instead of building a complex crawler.

The tool accepts GitHub advanced search syntax, which most security engineers underutilize. You can search for specific filenames, paths, or extensions combined with credential-adjacent keywords. For example, searching for filename:.aws/credentials returns actual AWS credential files, while extension:pem private surfaces private keys. Here's how you'd search for exposed Kubernetes configs:

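# Example search query construction
import os
import requests

query = 'filename:.kube/config kind: Config'
github_token = os.environ['GITHUB_TOKEN']  # assumes the token is exported in the environment

# The tool wraps this in API calls; passing the query via params URL-encodes it
api_url = 'https://api.github.com/search/code'
headers = {'Authorization': f'token {github_token}'}
response = requests.get(api_url, headers=headers, params={'q': query})
response.raise_for_status()  # stop early on auth or rate-limit errors

# Each result links to the file page; derive the raw content URL from html_url
for item in response.json()['items']:
    raw_url = item['html_url'].replace('github.com', 'raw.githubusercontent.com', 1)
    raw_url = raw_url.replace('/blob/', '/', 1)
    content = requests.get(raw_url).text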
    # Now scan content with regex patterns

What makes this practical is the credential detection layer borrowed from truffleHog. git-wild-hunt ships with regex patterns for 30+ credential types—AWS keys (both AKIA and ASIA formats), Google service account JSON blobs, Slack tokens, GitHub PATs, RSA private keys, and more. Instead of reinventing detection logic, it reuses battle-tested patterns that the security community has refined over years.
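As a rough illustration of the detection step, the sketch below runs a small dictionary of named patterns over a file's text. The two patterns are simplified stand-ins in the style of the truffleHog regex list, and scan_content is a hypothetical helper name, not a function taken from git-wild-hunt.

```python
import re

# Simplified stand-ins for entries in a truffleHog-style regex list.
# Assumption: the real pattern set is larger and loaded from JSON, not hard-coded.
PATTERNS = {
    "AWS API Key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "RSA private key": re.compile(r"-----BEGIN RSA PRIVATE KEY-----"),
}


def scan_content(content):
    """Return a list of (pattern_name, matched_string) pairs found in content."""
    findings = []
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(content):
            findings.append((name, match.group(0)))
    return findings


sample = "aws_access_key_id = AKIAIOSFODNN7EXAMPLE\n"
print(scan_content(sample))
```

Keeping the patterns in a name-to-regex mapping means each finding can carry the credential type alongside the matched string, which is what makes later triage workable.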

The configuration is file-based, using a simple JSON structure that defines output paths, logging verbosity, and which regex pattern set to load. This makes it trivial to customize for your threat model:

{
  "output_dir": "./results",
  "log_level": "INFO",
  "patterns_url": "https://raw.githubusercontent.com/dxa4481/truffleHogRegexes/master/truffleHogRegexes/regexes.json",
  "queries": [
    "filename:.npmrc _auth",
    "extension:pem BEGIN RSA PRIVATE",
    "filename:credentials aws_access_key_id"
  ]
}

Results stream to JSON files, one per search query, containing the repository name, file path, matched pattern type, and the actual secret string. This format integrates cleanly into SIEM platforms or ticketing systems for incident response workflows.
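To make the output shape concrete, here is a minimal sketch of writing one JSON file per query. The field names (repository, path, pattern, secret), the slug-based file naming, and the write_results helper are all assumptions for illustration; the tool's actual schema may differ.

```python
import json
import re
import tempfile
from pathlib import Path


def write_results(output_dir, query, findings):
    """Write the findings for one search query to its own JSON file."""
    # Turn the query into a filesystem-safe name (hypothetical naming scheme).
    slug = re.sub(r"[^A-Za-z0-9]+", "_", query).strip("_")
    out_path = Path(output_dir) / f"{slug}.json"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps({"query": query, "findings": findings}, indent=2))
    return out_path


findings = [{
    "repository": "example-org/example-repo",  # placeholder values
    "path": "deploy/.npmrc",
    "pattern": "Generic Secret",
    "secret": "<redacted>",
}]
out_dir = tempfile.mkdtemp()
result_path = write_results(out_dir, "filename:.npmrc _auth", findings)
print(result_path.name)
```

Since these files contain live secrets by design, the output directory deserves the same access controls as any other credential store.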

The real power emerges when you understand GitHub's search operators. You can scope code searches by organization (org:yourcompany), by a single repository (repo:owner/name), or by file size (size:<10000) to skip huge generated files. Date qualifiers such as pushed: belong to repository search and are not documented for code search queries. For threat intelligence, you might search for your company's domain in environment files: filename:.env "yourcompany.com". For security audits, search for cloud provider patterns across your organization's public repos before attackers do.
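Composing qualifiers by hand gets error-prone once several are combined. A small hypothetical helper (build_query is not part of git-wild-hunt) might assemble them like this, leaving URL encoding to the HTTP client:

```python
def build_query(terms, **qualifiers):
    """Join qualifier:value pairs and free-text terms into one search string."""
    parts = []
    for key, value in qualifiers.items():
        parts.append(f"{key}:{value}")
    for term in terms:
        # Quote multi-word terms so they are treated as a phrase.
        parts.append(f'"{term}"' if " " in term else term)
    return " ".join(parts)


query = build_query(["yourcompany.com"], org="yourcompany", filename=".env")
print(query)
```

The resulting string can be passed as the q parameter of a search request, so the same helper serves both one-off investigations and a scheduled list of queries.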

Gotcha

Rate limiting will hit you faster than you expect. GitHub's authenticated API allows 5,000 requests per hour, but the search endpoints have their own, much stricter limits: 30 requests per minute for most searches, and GitHub's documentation currently lists only 10 per minute for code search. If you're running multiple queries or paginating through hundreds of results, you'll burn through this quota quickly. The tool doesn't include sophisticated rate limit handling; it'll just fail when you hit the ceiling. You need to implement exponential backoff yourself or space out your queries manually.
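One way to add the missing backoff is a thin retry wrapper around each request. The sketch below is an assumption about how you might structure it, not code from git-wild-hunt: get_with_backoff is a hypothetical name, and it treats any 403 or 429 as a rate-limit signal, which is a simplification because a 403 can also mean a permissions problem.

```python
import time


def get_with_backoff(fetch, max_retries=5, base_delay=2.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the retry budget is exhausted.

    fetch is any zero-argument callable returning an object with
    .status_code and .headers, e.g. lambda: requests.get(url, headers=h).
    """
    for attempt in range(max_retries + 1):
        response = fetch()
        # GitHub signals rate limiting with 403 or 429 responses.
        if response.status_code not in (403, 429):
            return response
        if attempt == max_retries:
            break
        # Prefer the server's Retry-After hint; otherwise double the wait each time.
        retry_after = response.headers.get("Retry-After")
        delay = float(retry_after) if retry_after else base_delay * (2 ** attempt)
        sleep(delay)
    raise RuntimeError("rate limit still in effect after retries")
```

Passing the request as a callable keeps the wrapper independent of any particular HTTP library, and the injectable sleep argument makes the retry schedule easy to verify without actually waiting.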

False positives are unavoidable with regex-based detection. You'll get flooded with matches from example code, documentation, and test fixtures that contain fake credentials. A search for AWS keys will return thousands of dummy AKIAIOSFODNN7EXAMPLE tokens from SDK documentation. Triaging results requires human judgment or additional filtering logic that git-wild-hunt doesn't provide.

The tool also only surfaces the first code search page by default: if your query returns 1,000 matches, you'll miss most of them unless you manually implement pagination. For production security monitoring, you'll need to wrap this in orchestration that handles both limitations.
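Both gaps can be narrowed with a few lines of glue. The sketch below is one possible shape, with hypothetical names throughout: iter_search_items walks pages using the search API's page and per_page parameters (GitHub caps per_page at 100 and a search at 1,000 results), and looks_like_placeholder is a deliberately crude filter for documentation-style dummy values.

```python
def iter_search_items(fetch_page, per_page=100, max_results=1000):
    """Yield items across result pages until they run out or the cap is hit.

    fetch_page(page, per_page) is a caller-supplied function returning the
    parsed JSON body of one search response (a dict with an "items" list).
    """
    yielded = 0
    page = 1
    while yielded < max_results:
        items = fetch_page(page, per_page).get("items", [])
        if not items:
            return
        for item in items:
            if yielded >= max_results:
                return
            yield item
            yielded += 1
        if len(items) < per_page:
            return  # a short page means this was the last one
        page += 1


def looks_like_placeholder(secret):
    """Heuristic: flag obvious documentation values such as AKIAIOSFODNN7EXAMPLE."""
    return "EXAMPLE" in secret.upper()
```

In practice fetch_page would issue the search request with params such as {'q': query, 'page': page, 'per_page': per_page} and return response.json(). The placeholder check only catches the most obvious dummies, so it reduces triage volume rather than replacing human review.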

Verdict

Use if: You're conducting incident response and need to quickly determine if specific credentials were exposed publicly, you're doing threat intelligence reconnaissance to see if your organization's secrets have leaked, or you need a lightweight tool you can easily modify for custom search patterns and integrate into existing security workflows. The simplicity makes it perfect for one-off investigations where setup overhead matters more than feature completeness.

Skip if: You need comprehensive git history scanning (use truffleHog directly instead), you want real-time monitoring of new commits (shhgit is purpose-built for this), you require enterprise-grade accuracy with low false positives (GitHub Advanced Security is worth the cost), or you're scanning at scale across hundreds of organizations (you'll hit rate limits immediately and need commercial tooling).
