Finding Leaked Secrets at Scale: A Deep Dive into GitHub Dorks

Hook

In 2022, researchers found over 6 million secrets leaked across 1.8 million GitHub repositories. Most remained exposed for weeks before detection, if they were caught at all.

Context

Every developer has seen the warnings about not committing secrets, yet credential leaks remain one of the most common security vulnerabilities in modern software development. A single accidentally committed AWS key or database password can lead to a catastrophic breach, and attackers often discover exposed credentials within hours of their being pushed to a public repository.

Traditionally, finding these leaked secrets required either manual GitHub searches using carefully crafted queries (known as “dorks” in the security community) or building custom automation. Security teams would maintain lists of search patterns and manually execute them across repositories, a process that was both time-consuming and error-prone. The github-dorks project emerged to solve this exact pain point: automating the systematic application of proven search patterns across entire organizations, user accounts, or specific repositories, while handling the complexities of API rate limiting and authentication.

Technical Insight

[Figure: system architecture, auto-generated. CLI Input (target, dorks file, auth) -> Main Controller; Main Controller reads patterns from the Dork Patterns File and authenticates the github3.py Client; the client executes queries against the GitHub Search API; on a rate limit hit, the Wait Handler retries; results pass to the Result Processor, which formats and writes the CSV Output.]

At its core, github-dorks is a focused Python tool that orchestrates GitHub Search API queries using a curated collection of dork patterns. The architecture is deliberately simple: it reads search patterns from text files, executes them through the github3.py client library, and outputs results in a structured format.

The real value lies in the dork patterns themselves. The project maintains over 40 search queries targeting common leak patterns across languages and frameworks. For example, the github-dorks.txt file includes patterns like filename:.npmrc _auth (targeting npm credentials), extension:pem private (searching for private keys), and filename:.env DB_PASSWORD (finding database credentials in environment files). These patterns leverage GitHub’s code search syntax, combining filename matching, content searching, and file extension filtering to maximize detection accuracy.
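To make the query syntax concrete, here is a minimal sketch of how a dork file could be turned into scoped search queries. The helper names `load_dorks` and `scope_query` are hypothetical and are not taken from the github-dorks source, and the comment-skipping behavior is an assumption; the `user:` and `repo:` qualifiers are standard GitHub search syntax.

```python
from typing import Iterable, List, Optional


def load_dorks(lines: Iterable[str]) -> List[str]:
    """Return non-empty dork patterns, skipping blank lines and '#' comments.

    Assumption: one pattern per line; comment handling is illustrative only.
    """
    dorks = []
    for raw in lines:
        line = raw.strip()
        if line and not line.startswith("#"):
            dorks.append(line)
    return dorks


def scope_query(dork: str, user: Optional[str] = None, repo: Optional[str] = None) -> str:
    """Prefix a dork with a GitHub search qualifier that narrows the target."""
    if repo:
        return f"repo:{repo} {dork}"
    if user:
        return f"user:{user} {dork}"
    return dork


# Sample lines standing in for the contents of a dork file.
sample = ["filename:.npmrc _auth", "", "# private keys", "extension:pem private"]
queries = [scope_query(d, user="target_username") for d in load_dorks(sample)]
print(queries)
```

Each resulting string is an ordinary search query, so the same pattern list can be reused against a user, an organization, or a single repository by changing only the qualifier.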

Running a scan is straightforward. After installing via pip and setting up a GitHub personal access token, you execute searches against a target:

# Basic usage scanning a specific user
python github-dorks.py -u target_username -d github-dorks.txt

# Scanning an entire organization with CSV output
python github-dorks.py -o company_org -d github-dorks.txt --output-file results.csv

# Using GitHub Enterprise
python github-dorks.py -u username -d dorks.txt --ghe https://github.company.com

Under the hood, the tool handles one of the more tedious aspects of GitHub API automation: rate limiting. GitHub’s Search API is particularly restrictive, allowing authenticated clients only 30 requests per minute on most search endpoints, and GitHub documents a lower limit for the code search endpoint. The implementation monitors rate limit headers and automatically sleeps when limits are approached, resuming execution once the window resets. A scan of a large organization with hundreds of repositories can therefore take a long time, but it runs without manual intervention.
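The sleep-and-resume behavior can be sketched as a small pure function. This is an illustration, not the tool's actual implementation: the function name `seconds_to_wait` and the one-second `buffer` are assumptions, while `X-RateLimit-Remaining` and `X-RateLimit-Reset` are the response headers GitHub documents for rate limiting.

```python
import time
from typing import Mapping, Optional


def seconds_to_wait(
    headers: Mapping[str, str],
    now: Optional[float] = None,
    buffer: float = 1.0,
) -> float:
    """Return how long to sleep before sending the next request.

    Assumes `headers` uses the exact header casing shown here (or is a
    case-insensitive mapping, as HTTP client libraries usually provide).
    Returns 0.0 while requests remain in the current window.
    """
    now = time.time() if now is None else now
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    # X-RateLimit-Reset is the window reset time in UTC epoch seconds.
    reset_at = float(headers.get("X-RateLimit-Reset", now))
    # Clamp at zero in case the reset time has already passed, then add a
    # small buffer (hypothetical value) to avoid racing the reset.
    return max(0.0, reset_at - now) + buffer


exhausted = {"X-RateLimit-Remaining": "0", "X-RateLimit-Reset": "1060"}
print(seconds_to_wait(exhausted, now=1000.0))
```

A caller would pass the headers of the most recent response and call `time.sleep()` with the returned value before issuing the next query.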

The Docker containerization option is particularly valuable for security teams running periodic audits. The provided Dockerfile creates a minimal Python environment with all dependencies bundled:

# Build the container
docker build -t github-dorks .

# Run a scan from the container
docker run -it github-dorks -u target_user -d github-dorks.txt

For customization, security teams typically create their own dork files targeting organization-specific patterns. For instance, if your company uses internal domain names or specific configuration patterns, you’d add targeted searches:

filename:config.yml "internal.company.com" password
extension:java "jdbc:postgresql" password
filename:terraform.tfvars secret_key
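If the list of internal domains is long, lines like these could be generated rather than written by hand. The sketch below is one hypothetical way to do that; `build_domain_dorks` and its default keyword list are assumptions for illustration and are not part of github-dorks.

```python
from typing import Iterable, List


def build_domain_dorks(
    domains: Iterable[str],
    keywords: Iterable[str] = ("password", "secret"),
) -> List[str]:
    """Pair each quoted domain with each keyword to form one dork per line."""
    kws = list(keywords)
    return [f'"{domain}" {kw}' for domain in domains for kw in kws]


# The domain below is the placeholder used in the example patterns above.
custom = build_domain_dorks(["internal.company.com"])
print("\n".join(custom))
```

The output can be appended to a custom dork file and passed to the tool with `-d` like any other pattern list.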

The search results return code snippets, file paths, and repository URLs, allowing security teams to quickly identify and triage potential leaks. However, the output format is basic (essentially raw API responses formatted as CSV or console output), which means most teams need to build additional tooling around it for ticket creation, deduplication, and remediation tracking.
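As an example of the kind of wrapper tooling teams end up writing, here is a minimal deduplication sketch. The record fields (`repo`, `path`, `dork`, `url`) are assumed for illustration and do not necessarily match the columns github-dorks writes; adapt them to the actual output.

```python
import csv
import io
from typing import Dict, Iterable, List


def dedupe_findings(findings: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first finding per (repo, path, dork) triple, preserving order."""
    seen = set()
    unique = []
    for finding in findings:
        key = (finding["repo"], finding["path"], finding["dork"])
        if key not in seen:
            seen.add(key)
            unique.append(finding)
    return unique


def to_csv(findings: Iterable[Dict[str, str]]) -> str:
    """Serialize findings to CSV text with a fixed, hypothetical column order."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=["repo", "path", "dork", "url"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(findings)
    return buf.getvalue()


# Made-up findings; the second is a repeat of the first from a later run.
raw = [
    {"repo": "acme/api", "path": ".env", "dork": "filename:.env DB_PASSWORD",
     "url": "https://example.com/acme/api/.env"},
    {"repo": "acme/api", "path": ".env", "dork": "filename:.env DB_PASSWORD",
     "url": "https://example.com/acme/api/.env"},
    {"repo": "acme/web", "path": "id_rsa.pem", "dork": "extension:pem private",
     "url": "https://example.com/acme/web/id_rsa.pem"},
]
unique = dedupe_findings(raw)
print(to_csv(unique))
```

Keying on repository, path, and pattern keeps a file that matches two different dorks as two findings, which is usually what a triage queue wants.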

Gotcha

The primary limitation is the GitHub Search API itself. Even with authentication, the cap of 30 requests per minute means that scanning an organization with dozens of repositories against 40+ dork patterns can take hours, and a large enterprise organization with hundreds of repositories takes far longer. There’s no parallelization or distributed execution support, so you’re strictly bound by these rate limits.
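A rough back-of-the-envelope estimate shows why. The sketch below assumes one search request per repository per dork and ignores result pagination and retries, so it gives a lower bound; the function name is hypothetical, and the actual request pattern of the tool may differ.

```python
import math


def estimated_scan_minutes(repos: int, dorks: int, requests_per_minute: int = 30) -> int:
    """Lower-bound scan time, assuming one request per (repository, dork) pair."""
    total_requests = repos * dorks
    return math.ceil(total_requests / requests_per_minute)


# 300 repositories x 40 dorks = 12,000 requests at 30 per minute.
print(estimated_scan_minutes(300, 40))  # 400 minutes, about 6.7 hours
# 50 repositories x 40 dorks = 2,000 requests.
print(estimated_scan_minutes(50, 40))   # 67 minutes
```

If the effective limit for your endpoint is lower than 30 requests per minute, pass it as `requests_per_minute` and the estimate scales accordingly.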

False positives are another significant challenge. Configuration examples, test fixtures, and dummy credentials scattered throughout codebases will trigger matches. A repository with documentation showing example .env files will light up the same as one with actual leaked credentials. The tool provides no intelligence for distinguishing real secrets from placeholder values, so every finding needs manual review. For large organizations, this can mean hundreds of false positives drowning out the few real issues.

Additionally, the tool only searches current repository content visible to GitHub’s search index. It won’t dig through git history the way tools such as truffleHog do, so secrets that were committed and then removed in subsequent commits may go undetected.
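A first-pass filter can take some of the load off manual review. The heuristic below is a coarse sketch and not a feature of github-dorks: the function name and the marker list are assumptions, and it will misclassify some values in both directions, so it is better suited to ordering a triage queue than to discarding findings.

```python
import re

# Hypothetical markers that commonly appear in placeholder credentials.
PLACEHOLDER_RE = re.compile(
    r"(example|sample|dummy|changeme|placeholder|your[_-]?\w*|x{4,}|<[^>]+>)",
    re.IGNORECASE,
)


def looks_like_placeholder(value: str) -> bool:
    """Return True if a matched value is probably a placeholder, not a secret."""
    value = value.strip().strip("\"'")
    if not value:
        return True
    if PLACEHOLDER_RE.search(value):
        return True
    # Values built from one or two distinct characters (e.g. "aaaa", "0101")
    # are unlikely to be real credentials.
    return len(set(value)) <= 2


for candidate in ["your_password_here", "<DB_PASSWORD>", "s3cr3t-Pr0d-9fK2"]:
    print(candidate, looks_like_placeholder(candidate))
```

Purpose-built scanners use stronger signals such as entropy checks and provider-specific key formats; a filter like this only removes the most obvious noise.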

Verdict

Use if: You’re conducting periodic security audits of your own organization’s repositories, you need a simple automated way to apply proven dork patterns without building custom tooling, you’re comfortable with manual result review and triage, or you’re running security assessments where you have days or weeks for comprehensive scanning. This tool excels at retrospective audits and providing evidence of security posture for compliance purposes.

Skip if: You need real-time secret detection (implement pre-commit hooks with tools like detect-secrets instead), you’re working with very large organizations where API rate limits make scanning impractical, you require sophisticated false-positive filtering and can’t afford manual review overhead, or you need git history scanning to catch removed-but-previously-exposed secrets. For modern CI/CD pipelines, integrate purpose-built secret scanners like gitleaks that run on every commit rather than relying on periodic detective controls.
