Building a Real-Time Phishing Detector with Certificate Transparency Logs

Hook

Every minute, thousands of TLS certificates are issued globally—and buried in that firehose of data are phishing domains being set up right now, before they’ve sent a single malicious email.

Context

Phishing attacks have evolved beyond crude email scams. Modern attackers invest in convincing infrastructure: domains that look legitimate, valid TLS certificates that earn the browser’s padlock icon, and carefully crafted landing pages. By the time security teams discover these sites through traditional means (user reports, web crawlers, or blocklists), the damage is often done. The attackers’ Achilles’ heel is Certificate Transparency (CT), a system of public logs in which certificate authorities record the TLS certificates they issue, because major browsers require logging before they will trust a certificate. This creates an unusual opportunity: you can watch phishing infrastructure being built in real time.

Phishing Catcher, created by security researcher x0rz, taps into this data stream through CertStream, an API that broadcasts certificate issuances as they happen. The tool doesn’t wait for phishing sites to go live or for victims to report them. Instead, it flags suspicious domains at certificate issuance time by checking certificate common names and Subject Alternative Names (SANs) against configurable heuristics. It’s a working proof-of-concept that demonstrates how CT logs can serve as an early-warning system, giving defenders a chance to identify threats before they’re weaponized.

Technical Insight

[Figure: System architecture (auto-generated diagram). CertStream API → Certificate Listener (TLS cert feed) → Domain Scorer (domain name). The Scoring Engine reads its Configuration from suspicious.yaml (keyword scores) and external.yaml (custom rules), and sends stdout Alerts at score >= 65, score >= 80, and score >= 90.]

Phishing Catcher’s architecture is straightforward: it connects to the CertStream API, receives a real-time feed of certificate data, scores each domain name using weighted keyword matching, and prints an alert when a score reaches one of its thresholds. The scoring engine is the heart of the tool, and it is purely heuristic.
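The listening step can be sketched as follows. This is a minimal illustration, not the code in catch_phishing.py: it assumes the message layout used by the public CertStream feed (a message_type field, with domain names under data.leaf_cert.all_domains) and the certstream Python client. The names extract_domains, on_message, and main are hypothetical.

```python
from typing import Any, Dict, List


def extract_domains(message: Dict[str, Any]) -> List[str]:
    """Return the domain names carried by a CertStream-style message."""
    if message.get("message_type") != "certificate_update":
        return []  # heartbeats and other message types carry no certificate
    leaf = message.get("data", {}).get("leaf_cert", {})
    return list(leaf.get("all_domains") or [])


def on_message(message: Dict[str, Any], context: Any = None) -> None:
    """Callback in the (message, context) shape the certstream client uses."""
    for domain in extract_domains(message):
        print(domain)  # a real listener would score the domain here


def main() -> None:
    # Third-party dependency, imported lazily: pip install certstream
    import certstream

    # Blocks and invokes on_message for every message received from the feed.
    certstream.listen_for_events(on_message, url="wss://certstream.calidog.io/")
```

Calling main() starts the listener. Keeping extract_domains free of network code makes the parsing easy to test against a hand-built message.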

The configuration lives in two YAML files. The default suspicious.yaml contains baseline patterns, while external.yaml allows users to add custom rules without modifying the core configuration. Here’s how scoring works:

keywords:
  'login': 25

Both configuration files contain two YAML dictionaries: keywords and tlds. The keys are the strings to match, and the values are the scores to assign when that string is found in a certificate’s domain name. For example, a score of 25 is added when the keyword ‘login’ appears in a TLS certificate domain name.
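To make the keywords and tlds dictionaries concrete, here is a rough sketch of additive scoring over them. It is not the tool’s score_domain() function, whose full heuristics aren’t documented here. The function name keyword_score is hypothetical, and apart from 'login': 25 the weights are made up for illustration.

```python
from typing import Dict

# Hypothetical weights for illustration; only 'login': 25 comes from the article.
CONFIG: Dict[str, Dict[str, int]] = {
    "keywords": {"login": 25, "secure": 10},
    "tlds": {".example": 20},
}


def keyword_score(domain: str, config: Dict[str, Dict[str, int]]) -> int:
    """Sum the weights of every configured string found in the domain name."""
    domain = domain.lower()
    score = 0
    # Keywords match anywhere in the name.
    for keyword, weight in config.get("keywords", {}).items():
        if keyword in domain:
            score += weight
    # TLD entries match only at the end of the name.
    for tld, weight in config.get("tlds", {}).items():
        if domain.endswith(tld):
            score += weight
    return score
```

Under these assumed weights, keyword_score("login.example", CONFIG) adds 25 for the keyword and 20 for the TLD, giving 45.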

The tool outputs three severity levels based on score thresholds: “Potential” (65+ points), “Likely” (80+ points), and “Suspicious” (90+ points). The README notes that the score_domain() function contains the scoring algorithm details, though the specific heuristics beyond keyword matching aren’t fully documented.
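The threshold logic amounts to a small lookup. The sketch below uses the three cut-offs and labels listed above; the names LEVELS and severity are hypothetical and not taken from the repository.

```python
from typing import Optional

# Thresholds and labels as described above, checked from highest to lowest.
LEVELS = [(90, "Suspicious"), (80, "Likely"), (65, "Potential")]


def severity(score: int) -> Optional[str]:
    """Return the alert label for a score, or None if it is below every threshold."""
    for threshold, label in LEVELS:
        if score >= threshold:
            return label
    return None
```

Checking the highest threshold first means a score of 95 is reported once, as “Suspicious”, rather than matching all three levels.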

Running Phishing Catcher requires minimal setup. After installing dependencies with pip install -r requirements.txt, you simply execute the script:

$ ./catch_phishing.py

The tool connects to CertStream and begins printing alerts to stdout. There’s no database, no web interface, no configuration beyond editing YAML files. This Unix-philosophy simplicity makes it trivial to pipe output into other tools, redirect to log files, or integrate with SIEM systems.

The repository also includes Docker support for environments where Python dependency management is challenging. Building the container is straightforward:

docker build . -t phishing_catcher

This approach packages all dependencies in the image and behaves consistently across platforms, which is particularly valuable when deploying on macOS or on systems where Python environment issues are common.

The real power of Phishing Catcher lies in its customizability. Security teams can tune the YAML configuration to their specific threat landscape. Monitoring a financial institution? Boost scores for banking keywords. Protecting a SaaS platform? Add your brand name with high point values. The external.yaml override system means you can maintain custom rules separately from the upstream configuration, making updates easier. In effect, you encode your organization’s risk profile as a set of weighted keywords.
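One way to picture the override is a per-section merge in which custom entries win over defaults. The sketch below works on dictionaries that have already been loaded from YAML. How Phishing Catcher itself combines the two files is an assumption here, and merge_configs and the sample entries are hypothetical.

```python
from typing import Dict

Config = Dict[str, Dict[str, int]]


def merge_configs(base: Config, override: Config) -> Config:
    """Return a new config in which override entries replace or extend base entries."""
    # Copy each section so the base configuration is left untouched.
    merged: Config = {section: dict(values or {}) for section, values in base.items()}
    for section, values in override.items():
        merged.setdefault(section, {}).update(values or {})
    return merged


# Hypothetical contents standing in for suspicious.yaml and external.yaml.
defaults: Config = {"keywords": {"login": 25}, "tlds": {".example": 20}}
custom: Config = {"keywords": {"login": 40, "examplebank": 60}}
combined = merge_configs(defaults, custom)
```

Here combined["keywords"] holds the raised weight for 'login' plus the new brand keyword, while the tlds section passes through unchanged.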

Gotcha

Phishing Catcher’s simplicity comes with significant limitations. The author explicitly labels it a “working PoC,” and that caveat matters. The keyword-based heuristic approach will likely generate false positives. Legitimate domains containing high-scoring keywords may trigger alerts even when they’re perfectly benign. You’ll spend considerable time tuning thresholds and keyword scores to find a balance between catching threats and drowning in noise.

The tool only sees domains that obtain TLS certificates, which is admittedly the majority of modern phishing sites (the browser padlock increases victim trust). However, it misses HTTP-only phishing sites entirely and can’t detect attacks using compromised legitimate domains. The detection window is also limited: once a certificate is issued, attackers can use it for months. Phishing Catcher alerts you at issuance time, but if you miss that alert or can’t act quickly, the domain remains a threat.

There’s no persistence layer, no alert deduplication, no integration with takedown services or threat intelligence platforms. Output goes to stdout and nowhere else unless you build that infrastructure yourself. The tool doesn’t track which domains you’ve investigated, doesn’t suppress alerts for known-good domains, and doesn’t provide any analysis beyond the initial score. It’s a detection primitive, not a complete solution. You’ll need to build workflows around it—storing alerts in a database, enriching them with additional intelligence, connecting to ticketing systems, and implementing response procedures. That engineering work may exceed the effort of using a commercial threat intelligence platform.
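As a small example of the glue you would need to write, the following sketch adds two of the missing pieces: in-memory deduplication and suppression of known-good domains. It is not part of Phishing Catcher. The function filter_alerts and its allowlist argument are hypothetical, and a real pipeline would want persistent storage rather than a set held in memory.

```python
from typing import Iterable, Iterator, Set


def filter_alerts(domains: Iterable[str], known_good: Set[str]) -> Iterator[str]:
    """Yield each alerted domain once, skipping allowlisted domains and their subdomains."""
    seen: Set[str] = set()
    for domain in domains:
        name = domain.lower()
        if name.startswith("*."):
            name = name[2:]  # treat a wildcard entry like its parent domain
        if any(name == good or name.endswith("." + good) for good in known_good):
            continue  # suppress known-good domains
        if name in seen:
            continue  # suppress repeats of an alert already emitted
        seen.add(name)
        yield name
```

Because it accepts any iterable of domain names, the same filter could wrap domains parsed from the tool’s stdout or a list replayed from a log file.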

Verdict

Use Phishing Catcher if you’re building custom threat intelligence pipelines and want a lightweight, hackable foundation for CT log monitoring. It’s ideal for security researchers studying phishing trends, red teams testing detection systems, or organizations with the engineering resources to build workflows around raw alerts. The YAML-based configuration makes it easy to encode domain-specific knowledge, and the simple stdout-based output integrates cleanly with Unix toolchains and log aggregation systems.

Skip it if you need production-ready phishing detection with low false positives, automated response workflows, or comprehensive threat coverage beyond Certificate Transparency. Commercial alternatives like DomainTools or RiskIQ (now part of Microsoft) provide broader data sources, lower noise, and mature integrations at the cost of flexibility and transparency. Treat Phishing Catcher as a starting point that requires significant tuning and infrastructure investment, not a turnkey solution.
