Building a Universal Secrets Database: How secrets-patterns-db Centralizes Detection Across Security Tools
Hook
Most secret scanning tools ship with 700 or fewer API key patterns, yet there are over 1,600 documented token formats across cloud providers and SaaS platforms. That coverage gap is a security blind spot.
Context
Secret scanning tools like Trufflehog and Gitleaks have become essential in AppSec pipelines, preventing developers from accidentally committing API keys, database credentials, and authentication tokens to version control. But each tool maintains its own pattern database, creating a fragmented ecosystem where detection coverage varies wildly. Trufflehog v3 ships with roughly 700 detectors, Gitleaks includes about 60, and smaller tools have even fewer.
The problem compounds when organizations run multiple scanning tools or want to migrate between them. Pattern knowledge doesn’t transfer: switching from Gitleaks to Trufflehog means learning a new configuration format and losing the custom patterns you’ve developed. Security teams end up maintaining separate ruleset forks for each tool, duplicating effort and risking inconsistent coverage across their toolchain. The secrets-patterns-db project emerged to solve this fragmentation by creating a single, comprehensive database of regex patterns that can be converted into any supported tool’s native format.
Technical Insight
At its core, secrets-patterns-db is a YAML-based pattern repository that serves as a universal source of truth. The main database file, db/rules-stable.yml, contains over 1,600 regex patterns with structured metadata. Each pattern entry includes the regex itself, a descriptive name, confidence level, and additional context. Here’s a representative pattern structure:
- Name: 'AWS Access Key ID'
  Regex: 'AKIA[0-9A-Z]{16}'
  Confidence: 'high'
  Description: 'Amazon Web Services Access Key ID'
  Tags:
    - aws
    - cloud
    - authentication
This universal format is deliberately tool-agnostic. The project includes Python conversion scripts in the converters/ directory that transform this canonical YAML into tool-specific configurations. For Trufflehog v2, the converter generates a .regexes.json file with the patterns wrapped in the JSON structure that tool expects. For Gitleaks, it outputs a .gitleaks.toml configuration file matching that tool’s format. With this architecture, a pattern added once reaches every supported tool the next time the converters run.
The conversion process itself is simple. Here’s a minimal sketch of how you’d generate a Gitleaks configuration from the database, using the third-party PyYAML and toml packages:
import re

import toml  # third-party: pip install toml
import yaml  # third-party: pip install pyyaml

# Load the universal database
with open('db/rules-stable.yml', 'r') as f:
    patterns = yaml.safe_load(f)

# Transform to Gitleaks format
gitleaks_config = {'rules': []}
for pattern in patterns:
    gitleaks_config['rules'].append({
        # Slugify the name so the rule id contains only [a-z0-9-]
        'id': re.sub(r'[^a-z0-9]+', '-', pattern['Name'].lower()).strip('-'),
        'description': pattern.get('Description', pattern['Name']),
        'regex': pattern['Regex'],
        'tags': pattern.get('Tags', []),
        # No per-rule allowlist is emitted; add one only where you have entries for it
    })

# Export to TOML
with open('.gitleaks.toml', 'w') as f:
    toml.dump(gitleaks_config, f)
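The same loop adapts to other targets. As a minimal sketch, assuming truffleHog v2 reads a JSON object of name-to-regex pairs through its --rules option and that entries use the Name and Regex keys shown earlier, a converter could look like the following. The function name to_trufflehog_v2 and the duplicate-name suffix are illustrative choices, not the project's own converter.

```python
import json


def to_trufflehog_v2(patterns):
    """Map each entry's Name to its Regex as a flat JSON-serializable dict."""
    rules = {}
    for entry in patterns:
        name = entry['Name']
        # Disambiguate repeated names so later entries do not overwrite earlier ones.
        key, n = name, 2
        while key in rules:
            key = f"{name} ({n})"
            n += 1
        rules[key] = entry['Regex']
    return rules


# Hypothetical sample entries; in practice these come from the loaded database.
sample = [
    {'Name': 'AWS Access Key ID', 'Regex': 'AKIA[0-9A-Z]{16}'},
    {'Name': 'AWS Access Key ID', 'Regex': 'ASIA[0-9A-Z]{16}'},
]
print(json.dumps(to_trufflehog_v2(sample), indent=2))
```

Writing the result to a file with json.dump would give you something to pass to the scanner; check the flag name against the version you run.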
The confidence levels embedded in patterns are particularly valuable for tuning false positive rates. Patterns marked high confidence typically have low false positive rates—formats like AKIA[0-9A-Z]{16} for AWS keys are highly specific. Medium and low confidence patterns cast wider nets but may flag legitimate code strings. In production environments, you might filter to only high-confidence patterns initially, then gradually expand coverage:
# Filter for production use
high_confidence_patterns = [
    p for p in patterns
    if p.get('Confidence') == 'high'
]
print(f"Reduced from {len(patterns)} to {len(high_confidence_patterns)} patterns")
A critical architectural decision is the ReDoS validation built into the CI pipeline. Regular Expression Denial of Service (ReDoS) attacks exploit inefficient regex patterns whose catastrophic backtracking can freeze a scanning tool. The project runs every pattern through redos-detector or similar tools to catch patterns prone to that behavior before they ship. This validation keeps malicious or poorly crafted patterns from turning your security tool into a denial-of-service vector, a real concern when processing untrusted code repositories.
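To make the idea concrete, here is a deliberately crude local pre-check, not the project's CI logic and not what redos-detector does. It only confirms that a pattern compiles under Python's re module and flags one classic risk shape, a group containing an unbounded quantifier that is itself repeated without bound, such as (a+)+. The names NESTED_QUANTIFIER and audit_pattern are hypothetical.

```python
import re

# Heuristic only: matches a parenthesized group that contains + or * and is
# immediately followed by + or *. It misses many risky patterns and can
# misfire on escaped or bracketed quantifier characters.
NESTED_QUANTIFIER = re.compile(r'\([^()]*[+*][^()]*\)[+*]')


def audit_pattern(regex):
    """Return a list of problems found; an empty list means nothing was flagged."""
    problems = []
    try:
        # Patterns written for other regex engines may fail here for
        # syntax reasons unrelated to ReDoS.
        re.compile(regex)
    except re.error as exc:
        problems.append(f'does not compile: {exc}')
    if NESTED_QUANTIFIER.search(regex):
        problems.append('nested unbounded quantifier')
    return problems


for candidate in ['AKIA[0-9A-Z]{16}', '(a+)+$']:
    print(candidate, audit_pattern(candidate))
```

A dedicated analyzer remains the right tool for real validation; a check like this is only useful as a fast first filter on custom patterns before they reach CI.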
The database organization favors breadth over depth. You’ll find patterns for mainstream services (GitHub tokens, Stripe keys, Slack webhooks) alongside obscure SaaS providers and regional cloud platforms. This maximalist approach differs from curated tools like Trufflehog v3, which focuses on patterns it can actively verify. secrets-patterns-db includes any documented secret format, reasoning that detection coverage matters more than perfect precision. The tradeoff is higher false positive potential, but the confidence levels help manage this risk.
Gotcha
The project’s beta status isn’t just disclaimer boilerplate—there are real rough edges. Pattern organization is flat, lacking the hierarchical categorization you’d expect in a production database. There’s no severity scoring system, so a leaked root database password gets the same treatment as a low-privilege API key. The contribution guidelines note these gaps explicitly, suggesting future work on tagging systems and pattern categorization. If you’re building enterprise security workflows that need to route alerts based on impact severity, you’ll need to add that classification layer yourself.
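One way to bolt on that classification layer is a small lookup applied after loading the database. The sketch below assumes entries carry a Tags list like the example entry shown earlier; the tag-to-severity table, the tier names, and the classify function are all hypothetical and would need to reflect your own risk model.

```python
# Hypothetical severity tiers keyed on tags; the database itself defines none.
SEVERITY_BY_TAG = {
    'database': 'critical',
    'cloud': 'high',
    'payment': 'high',
    'chat': 'medium',
}
RANK = {'low': 0, 'medium': 1, 'high': 2, 'critical': 3}


def classify(entry, default='low'):
    """Return the highest severity implied by any of the entry's tags."""
    levels = [SEVERITY_BY_TAG.get(tag, default) for tag in entry.get('Tags', [])]
    # Entries without tags fall back to the default tier.
    return max(levels, key=RANK.__getitem__, default=default)


entry = {'Name': 'AWS Access Key ID', 'Tags': ['aws', 'cloud', 'authentication']}
print(entry['Name'], '->', classify(entry))
```

If your copy of the database has no tags, the same lookup could key on substrings of the pattern name instead.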
Regex-only detection has inherent limitations that no pattern database can fully solve. Many patterns will match legitimate constants, test fixtures, or placeholder values in code. A pattern for Slack webhooks (https://hooks.slack.com/services/[A-Z0-9]{9}/[A-Z0-9]{11}/[A-Za-z0-9]{24}) will flag both real webhooks and example URLs in documentation. Context-aware validation—actually testing if a detected token works—requires active verification like Trufflehog v3’s approach. secrets-patterns-db gives you the patterns but not the verification layer, so expect to implement your own filtering logic or accept higher false positive rates than tools with built-in credential checking.
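As one example of such filtering logic, a post-match filter can discard hits that look like placeholders before they reach an alert queue. This is a sketch under stated assumptions: the hint words, the 3.0-bit entropy threshold, and the function names are illustrative and not part of secrets-patterns-db, and the threshold in particular should be tuned against your own codebase.

```python
import math
from collections import Counter

# Illustrative substrings that often appear in documentation or test fixtures.
PLACEHOLDER_HINTS = ('example', 'xxxx', 'dummy', 'changeme', 'placeholder')


def shannon_entropy(text):
    """Shannon entropy of the string in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def is_probable_placeholder(match, min_entropy=3.0):
    """Flag matches that contain a hint word or have low character entropy."""
    lowered = match.lower()
    if any(hint in lowered for hint in PLACEHOLDER_HINTS):
        return True
    return shannon_entropy(match) < min_entropy


for hit in ['AKIAIOSFODNN7EXAMPLE', 'XXXXXXXXXXXXXXXX', 'T9fQ2mZ8vL1xR4kB7cN3']:
    print(hit, is_probable_placeholder(hit))
```

A filter like this reduces noise but cannot confirm that a token is live; only active verification does that.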
Verdict
Use if: You’re building or maintaining a security scanning platform and need comprehensive pattern coverage without vendor lock-in, you’re running multiple secret scanning tools and want consistent detection rules across them, you’re in a regulated industry where documentation of attempted detection (even if imperfect) matters for compliance, or you’re willing to invest engineering time in tuning and filtering patterns for your specific codebase characteristics.
Skip if: You need out-of-the-box accuracy with minimal false positives for immediate production deployment, you’re satisfied with your current tool’s built-in patterns and don’t need broader coverage, you lack the engineering resources to evaluate and customize patterns for your environment, or you need active credential verification rather than just pattern matching.
This is a power tool for security teams who want control and comprehensiveness, not a plug-and-play solution for teams wanting simplicity.