
shhgit: Real-Time Secret Detection Across Public Git Repositories

Hook

Every 60 seconds, developers push API keys, database passwords, and private keys to public GitHub repositories. Attackers run automated scanners around the clock to find them first.

Context

The modern development workflow has created a perfect storm for credential leaks. Developers work across multiple environments, configuration files live alongside code, and CI/CD pipelines require secrets to function. A single git push can expose an AWS key that grants access to production databases, or leak a Stripe API token that processes real payments.

Traditional security approaches fall short here. Manual code reviews can’t keep pace with deployment velocity. Static analysis tools only catch secrets after they’re committed to internal repositories. By the time security teams discover a leaked credential in a public repository, it’s often too late: the commit history is permanent, forks exist, and automated scrapers have already indexed the secret. shhgit was built to solve this timing problem by monitoring public code repositories in real time, catching secrets within seconds of exposure rather than days or weeks later.

Technical Insight

[Figure: System architecture (auto-generated). Components: Streaming APIs (GitHub/GitLab/Gist); Repository Cloner (Temp Storage); Detection Engine, containing the Signature Scanner (150+ Patterns) and the Entropy Analyzer (Shannon Calculation); Findings Aggregator; Web Interface (Live Monitor); CSV/Webhook Export. Edge labels: New repo events, Files, High-randomness strings, Pattern matches, Entropy scores, Real-time updates, Reports.]

shhgit’s architecture centers on a dual-detection engine that combines signature matching with entropy analysis. The codebase is written in Go (despite the repository metadata listing JavaScript), as the Go build instructions and Go workflow badge in the README show. It uses the public APIs of GitHub, Gist, GitLab, and Bitbucket to learn about newly created or updated repositories. When a repository appears, shhgit clones it to a temporary directory (with configurable timeouts and size limits to prevent resource exhaustion), then scans every file against its detection ruleset.

The signature system uses a YAML configuration format that supports four matching strategies. Here’s how you’d define a signature to catch AWS credentials:

signatures:
  - part: 'contents'
    match: 'AWS_ACCESS_KEY_ID'
    name: 'AWS Access Key ID'
  - part: 'contents'
    regex: 'AKIA[0-9A-Z]{16}'
    name: 'AWS Access Key ID Value'
  # A filename never contains a directory separator, so match on the path
  - part: 'path'
    regex: '\.aws/credentials$'
    name: 'AWS CLI credentials file'
  - part: 'extension'
    match: '.pem'
    name: 'Private key file'

The part field determines where to look: filename, extension, path, or contents. Simple string matching uses match, while complex patterns use regex. This layered approach catches both obvious cases (files literally named credentials) and subtle ones (base64-encoded tokens buried in configuration).
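To make the part/match/regex split concrete, here is a minimal Python sketch of how such rules could be applied to a single file. It is not shhgit’s Go implementation: the `Signature` class, `select_part`, `matches`, and `scan_file` are hypothetical names, and the literal-match semantics (exact equality for filename and extension, substring for path and contents) are an assumption made for illustration.

```python
import re
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import List, Optional


@dataclass
class Signature:
    """One rule with the same fields as the YAML above (hypothetical class)."""
    part: str                      # 'filename', 'extension', 'path', or 'contents'
    name: str
    match: Optional[str] = None    # literal string to look for
    regex: Optional[str] = None    # regular expression to search for


def select_part(sig: Signature, path: str, contents: str) -> str:
    """Return the piece of the file that this signature inspects."""
    p = PurePosixPath(path)
    if sig.part == "filename":
        return p.name
    if sig.part == "extension":
        return p.suffix
    if sig.part == "path":
        return path
    return contents


def matches(sig: Signature, path: str, contents: str) -> bool:
    target = select_part(sig, path, contents)
    if sig.regex is not None:
        return re.search(sig.regex, target) is not None
    if sig.match is not None:
        # Assumption: exact match for filename/extension, substring otherwise.
        if sig.part in ("filename", "extension"):
            return target == sig.match
        return sig.match in target
    return False


# Two of the rules from the YAML example, rebuilt as objects.
SIGNATURES = [
    Signature(part="contents", name="AWS Access Key ID Value",
              regex=r"AKIA[0-9A-Z]{16}"),
    Signature(part="extension", name="Private key file", match=".pem"),
]


def scan_file(path: str, contents: str,
              signatures: List[Signature] = SIGNATURES) -> List[str]:
    """Return the names of all signatures that fire for this file."""
    return [s.name for s in signatures if matches(s, path, contents)]
```

Calling `scan_file("deploy/server.pem", "")` returns the private-key rule name, while a source file containing a string shaped like `AKIA` plus 16 uppercase letters or digits triggers the contents rule.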

What makes shhgit particularly effective is its entropy detection layer. Shannon entropy measures how unpredictable the characters in a string are: real secrets such as API keys tend to score high, while placeholder text and ordinary words score low. The default threshold is 5.0, configurable via the --entropy-threshold flag:

# More aggressive scanning (more false positives)
shhgit --entropy-threshold 4.0

# Conservative scanning (fewer false positives)
shhgit --entropy-threshold 6.0

# Disable entropy checks entirely
shhgit --entropy-threshold 0
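For intuition about what those threshold values mean, the following Python sketch computes Shannon entropy over character frequencies, in bits per character. It assumes this is the kind of score being compared against the threshold; the function names are hypothetical, and shhgit’s own Go code may differ in details such as which strings it scores.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Shannon entropy in bits per character: H = sum(-p * log2(p))."""
    if not s:
        return 0.0
    n = len(s)
    # p is the relative frequency of each distinct character in s.
    return sum(-(c / n) * math.log2(c / n) for c in Counter(s).values())


def is_high_entropy(s: str, threshold: float = 5.0) -> bool:
    """Flag s when its score reaches the threshold; 0 disables the check."""
    if threshold <= 0:
        return False
    return shannon_entropy(s) >= threshold
```

Under this definition the score can never exceed log2 of the number of distinct characters, so a string needs at least 32 distinct characters to reach 5.0 and at least 64 to reach 6.0. That is why raising the threshold cuts false positives sharply, and also why short secrets may slip under it.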

For CI/CD integration, shhgit supports local directory scanning instead of monitoring public APIs. This lets you scan your codebase before pushing:

# Scan local repository and export results to CSV
shhgit --local /path/to/repo --csv-path findings.csv --silent
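One way to turn that CSV into a pass/fail signal for a pipeline is a small gate script. The Python sketch below is only an illustration: it assumes the findings file starts with a single header row and contains one row per finding, which is not confirmed here, and `count_findings` and `main` are hypothetical helpers.

```python
import csv
from pathlib import Path
from typing import List


def count_findings(csv_path: str) -> int:
    """Count data rows in the findings file.

    Assumption: the first non-empty row is a header and every later
    non-empty row is one finding. A missing file counts as zero findings.
    """
    path = Path(csv_path)
    if not path.exists():
        return 0
    with path.open(newline="", encoding="utf-8") as fh:
        rows = [row for row in csv.reader(fh) if row]
    return max(len(rows) - 1, 0)


def main(argv: List[str]) -> int:
    """Return a process exit code: 1 if anything was found, else 0."""
    csv_path = argv[1] if len(argv) > 1 else "findings.csv"
    findings = count_findings(csv_path)
    if findings:
        print(f"{findings} potential secret(s) found; see {csv_path}")
        return 1
    print("no findings")
    return 0
```

In a pipeline step you would run the scan first, then call `sys.exit(main(sys.argv))` from a wrapper script so that a non-empty findings file fails the build and forces a manual triage.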

The webhook integration enables real-time alerting. When shhgit finds a secret, it POSTs to your configured endpoint with a customizable payload, so findings can feed Slack, PagerDuty, or custom incident response workflows. The tool also includes a web interface (available when deployed via Docker Compose) that displays findings in real time, which is useful for security teams running continuous monitoring operations.

Performance tuning happens through several flags. The --threads option controls concurrency (defaults to your CPU count), --maximum-repository-size prevents downloading massive monorepos, and --clone-repository-timeout kills hung operations. For focused searching, --search-query overrides signatures entirely:

# Find all Stripe API keys across public repos
shhgit --search-query 'sk_live_[0-9a-zA-Z]{24}'

The blacklist system in config.yaml reduces noise. You can exclude file extensions (like .md or .txt), specific paths (node_modules/, vendor/), or string patterns that generate false positives in your environment. The blacklisted_entropy_extensions array is especially useful: it disables entropy checking for file types that legitimately contain high-randomness data, such as minified JavaScript or binary assets.
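The filtering described above can be sketched in a few lines of Python. Only the key name blacklisted_entropy_extensions comes from this article; the other key names, the example values, and the suffix/substring matching rules are assumptions chosen for illustration, not a description of shhgit’s actual behavior.

```python
from typing import Dict, List

# Hypothetical configuration mirroring the kinds of entries described above.
CONFIG: Dict[str, List[str]] = {
    "blacklisted_extensions": [".md", ".txt"],
    "blacklisted_paths": ["node_modules/", "vendor/"],
    "blacklisted_strings": ["AKIAIOSFODNN7EXAMPLE"],
    "blacklisted_entropy_extensions": [".min.js", ".png"],
}


def should_scan(path: str, config: Dict[str, List[str]] = CONFIG) -> bool:
    """Skip files whose extension or path is blacklisted."""
    if any(path.endswith(ext) for ext in config["blacklisted_extensions"]):
        return False
    if any(fragment in path for fragment in config["blacklisted_paths"]):
        return False
    return True


def should_check_entropy(path: str,
                         config: Dict[str, List[str]] = CONFIG) -> bool:
    """Signatures still apply, but entropy is skipped for noisy file types."""
    return not any(path.endswith(ext)
                   for ext in config["blacklisted_entropy_extensions"])


def is_blacklisted_string(candidate: str,
                          config: Dict[str, List[str]] = CONFIG) -> bool:
    """Drop matches that contain a known placeholder or example value."""
    return any(s in candidate for s in config["blacklisted_strings"])
```

Keeping the entropy exclusion separate from the general exclusion means a minified bundle is still checked against signatures, just not scored for randomness.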

Gotcha

The elephant in the room: shhgit is officially abandoned. The README clearly states it’s no longer maintained, with the author offering paid consultation instead of community support. This means no security patches, no updates for new secret types, and no fixes for bugs you encounter. You’re inheriting technical debt the moment you deploy it.

False positives are the operational reality. Signature-based detection trades precision for recall: it flags anything that looks like a secret, including test fixtures, example code, documentation, and placeholder values. Entropy checking helps but introduces its own false positives (random-looking commit hashes, base64-encoded images, minified code). Expect to spend significant time tuning your config.yaml and building blacklists.

GitHub API rate limiting is also a problem: even with multiple tokens, aggressive scanning can trigger abuse detection. The tool clones entire repositories to scan them, which consumes bandwidth and storage, so large monorepos or repositories with extensive binary assets will hit your configured size limits or timeouts. There’s no incremental scanning; every discovery means a fresh clone.

Verdict

Use shhgit if you need a quick, free tool for ad-hoc OSINT reconnaissance of public repositories, or if you’re adding basic secrets detection to a legacy CI pipeline where you can afford manual triage and are comfortable forking and maintaining abandoned code. It deploys quickly and ships with a signature database covering 150+ secret types.

Skip it for any production security program, or if you need ongoing vendor support, compliance features, low false-positive rates, or confidence that vulnerabilities will be patched. The abandonment is disqualifying for regulated industries and for teams without the Go expertise to maintain a fork. Invest instead in actively maintained alternatives: TruffleHog for comprehensive Git history scanning with verification, Gitleaks for CI/CD integration and performance, or native platform solutions like GitHub Advanced Security if you’re already invested in that ecosystem. shhgit solved an important problem in its time, but the security tooling landscape has matured significantly, and unmaintained security tools become liabilities.
