Hunting Phishing Kits at Scale: How Kitphishr Weaponizes Directory Traversal for Threat Intelligence
Hook
Phishing kit developers often make one critical mistake: they leave their source code in open directories on compromised servers, sometimes complete with their real identities, victim logs, and infrastructure details. All of it is free intelligence for anyone who knows where to look.
Context
When threat researchers analyze phishing campaigns, the real gold isn’t just identifying malicious URLs; it’s recovering the phishing kit source code itself. These kits, typically distributed as zip files, contain everything an attacker uses: login panels, credential harvesting scripts, email templates, and, critically, evidence of operational security failures. Attackers often hardcode personal email addresses, leave debugging comments with real names, or include log files showing previous victims. The problem is scale: manually checking thousands of phishing URLs a day for exposed directories is impractical.
Traditional phishing feeds like PhishTank and OpenPhish identify malicious URLs, but they don’t automate the tedious reconnaissance work of directory traversal. Researchers needed to manually walk up URL paths (going from /bank/login.php to /bank/ to /), checking each level for open directories containing the telltale .zip files. Kitphishr automates this intelligence collection workflow, transforming hours of manual work into a concurrent, high-throughput operation that can process hundreds of URLs simultaneously while you focus on analysis.
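To make the "open directory with a telltale .zip" check concrete, here is a minimal Go sketch. It assumes an Apache- or nginx-style auto-index page whose body contains the text "Index of /" and whose links use double-quoted href attributes. The function names isOpenDirectory and zipLinks are hypothetical and are not taken from Kitphishr’s source.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// hrefPattern matches double-quoted href attributes, which is how common
// server-generated directory listings render their links.
var hrefPattern = regexp.MustCompile(`(?i)href="([^"]+)"`)

// isOpenDirectory is a heuristic (an assumption for this sketch, not a
// confirmed Kitphishr check): auto-index pages usually contain "Index of /".
func isOpenDirectory(body string) bool {
	return strings.Contains(body, "Index of /")
}

// zipLinks returns every href in the page whose target ends in ".zip",
// compared case-insensitively.
func zipLinks(body string) []string {
	var links []string
	for _, m := range hrefPattern.FindAllStringSubmatch(body, -1) {
		if strings.HasSuffix(strings.ToLower(m[1]), ".zip") {
			links = append(links, m[1])
		}
	}
	return links
}

func main() {
	// A hand-written listing standing in for a real HTTP response body.
	listing := `<html><head><title>Index of /bank</title></head><body>
<h1>Index of /bank</h1>
<a href="../">Parent Directory</a>
<a href="login.php">login.php</a>
<a href="kit.zip">kit.zip</a>
</body></html>`

	if isOpenDirectory(listing) {
		fmt.Println(zipLinks(listing))
	}
}
```

A regular expression is enough for the uniform markup of auto-generated listings; a custom index page would need a real HTML parser.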
Technical Insight
Kitphishr’s architecture is a textbook example of Go’s concurrency model applied to I/O-bound web reconnaissance. The tool implements a producer-consumer pattern in which URL sources (stdin or aggregated feeds) feed a worker pool that performs concurrent HTTP requests. Each worker doesn’t just check the submitted URL. It also walks up the directory tree, testing parent paths for open directories that might expose the kit’s zip file.
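The walk up the directory tree can be sketched in a few lines of Go. This is an illustration of the idea only, not Kitphishr’s actual code: parentURLs is a hypothetical helper that strips the query string and fragment and returns each ancestor directory of the submitted URL, deepest first.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// parentURLs returns the ancestor directories of the URL's path, deepest
// first, ending at the site root. The submitted URL itself is not included.
func parentURLs(raw string) ([]string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return nil, err
	}
	// Directory probes do not need the query string or fragment.
	u.RawQuery = ""
	u.Fragment = ""
	u.RawPath = ""

	segments := strings.Split(strings.Trim(u.Path, "/"), "/")
	var out []string
	// Drop one trailing segment per iteration until only the root is left.
	for i := len(segments) - 1; i >= 0; i-- {
		u.Path = "/" + strings.Join(segments[:i], "/")
		if i > 0 {
			u.Path += "/"
		}
		out = append(out, u.String())
	}
	return out, nil
}

func main() {
	parents, err := parentURLs("http://example.com/bank/secure/login.php?id=1")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, p := range parents {
		fmt.Println(p)
	}
}
```

For the example URL this yields http://example.com/bank/secure/, then http://example.com/bank/, then http://example.com/, which is the order a researcher would check by hand.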
The concurrent design is critical because phishing URLs are ephemeral. Compromised servers get cleaned, domains get taken down, and kits get removed within hours or days. Processing URLs sequentially would be too slow: by the time a long list finishes, many targets might already be offline. With configurable concurrency (50 workers by default, scalable to 250 or more), Kitphishr can process large workloads quickly.
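For readers unfamiliar with the pattern, the following is a generic Go worker pool, written as a sketch rather than as a copy of Kitphishr’s implementation. The runPool function and its check callback are hypothetical; in a real tool, check would perform the HTTP probing, while here it is a pure function so the example runs without network access.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// runPool fans targets out to a fixed number of goroutines and returns the
// targets for which check reported a hit, sorted for stable output.
func runPool(targets []string, workers int, check func(string) bool) []string {
	jobs := make(chan string)
	hits := make(chan string)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for t := range jobs {
				if check(t) {
					hits <- t
				}
			}
		}()
	}

	// Producer: feed every target, then close jobs so the workers exit.
	go func() {
		for _, t := range targets {
			jobs <- t
		}
		close(jobs)
	}()

	// Close hits once all workers are done so the collector loop ends.
	go func() {
		wg.Wait()
		close(hits)
	}()

	var found []string
	for h := range hits {
		found = append(found, h)
	}
	sort.Strings(found)
	return found
}

func main() {
	targets := []string{
		"http://example.com/a/kit.zip",
		"http://example.com/a/login.php",
		"http://example.com/b/backup.zip",
	}
	// Stand-in check: treat anything ending in .zip as a hit.
	isZip := func(u string) bool { return strings.HasSuffix(u, ".zip") }
	fmt.Println(runPool(targets, 2, isZip))
}
```

The worker count plays the same role as a concurrency flag: it caps how many probes are in flight at once, regardless of how many URLs are queued.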
Here’s the recommended invocation that balances speed with reliability:
# Process URLs from a custom list with high concurrency
cat suspicious_urls.txt | kitphishr -c 250 -v -d -o phishing_kits
# Or let kitphishr auto-fetch from all feeds
kitphishr -c 250 -v -d -o phishing_kits
The -c 250 flag spins up 250 concurrent workers, -v provides verbose logging to track traversal attempts, -d enables automatic download of discovered kits, and -o specifies the output directory. The verbosity flag is particularly valuable during initial runs to understand hit rates and adjust concurrency based on your network capacity.
What makes Kitphishr particularly clever is its feed aggregation strategy. Rather than requiring manual curation, it automatically pulls from four authoritative sources: PhishTank (community-verified phishing), PhishStats (automated detection), OpenPhish (commercial feed), and Phishing.Database (crowdsourced daily updates). This multi-feed approach captures different attacker behaviors, since some campaigns appear in one feed hours before they show up in others. For PhishTank integration, you can optionally configure an API key as an environment variable:
export PT_API_KEY=your_phishtank_api_key
Note that PhishTank registration is currently disabled, so only existing account holders can use authenticated access. Without a key, Kitphishr still functions using the public endpoints of the other three feeds.
The timeout configuration (-t flag) deserves special attention. Phishing kits can vary in size, and the default timeout works for most scenarios. If you’re hunting kits on slow servers or need to ensure downloads of larger files complete, you can increase the timeout:
kitphishr -c 100 -t 120 -d -o kits # 120-second timeout for large files
From a defensive intelligence perspective, the downloaded kits are treasure troves. A typical phishing kit might reveal the attacker’s operational email (hardcoded in scripts), victim log locations in open directories, or authentication panels the attacker uses to check collected credentials. This information enables victim notification, infrastructure takedowns, and attribution research that wouldn’t be possible from just analyzing the phishing URL itself.
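As one example of mining a recovered kit, the Go sketch below scans every file in a zip archive for email addresses. It is a post-processing step you would run yourself, not a Kitphishr feature. The helper names emailsInZip and buildSampleKit are hypothetical, the email pattern is deliberately simple, and the sample archive is built in memory so the example is self-contained.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"regexp"
	"sort"
)

// emailPattern is a simple approximation, good enough for triage.
var emailPattern = regexp.MustCompile(`[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`)

// emailsInZip returns the unique, sorted email addresses found in the archive.
func emailsInZip(data []byte) ([]string, error) {
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return nil, err
	}
	seen := map[string]bool{}
	for _, f := range zr.File {
		rc, err := f.Open()
		if err != nil {
			return nil, err
		}
		// Read at most 1 MiB per file so a hostile archive cannot exhaust memory.
		content, err := io.ReadAll(io.LimitReader(rc, 1<<20))
		rc.Close()
		if err != nil {
			return nil, err
		}
		for _, m := range emailPattern.FindAllString(string(content), -1) {
			seen[m] = true
		}
	}
	out := make([]string, 0, len(seen))
	for e := range seen {
		out = append(out, e)
	}
	sort.Strings(out)
	return out, nil
}

// buildSampleKit creates a tiny in-memory archive standing in for a real kit.
func buildSampleKit() ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	w, err := zw.Create("post.php")
	if err != nil {
		return nil, err
	}
	script := `<?php $to = "dropbox@example.com"; mail($to, "log", $msg); ?>`
	if _, err := w.Write([]byte(script)); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	kit, err := buildSampleKit()
	if err != nil {
		fmt.Println("build error:", err)
		return
	}
	emails, err := emailsInZip(kit)
	if err != nil {
		fmt.Println("scan error:", err)
		return
	}
	fmt.Println(emails)
}
```

The archive is only read, never extracted to disk or executed, which is the safer default when handling attacker-authored code.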
Gotcha
Kitphishr’s biggest limitation is that full feed coverage depends on a PhishTank API key. With PhishTank registration currently disabled, new users can’t access the authenticated feed and are limited to the three public sources. This isn’t fatal, since OpenPhish, PhishStats, and Phishing.Database still supply a substantial number of URLs each day, but it reduces coverage compared to running with full access. If you’re setting up a new threat intelligence pipeline in 2024 and lack an existing PhishTank account, you will miss whatever URLs appear only in that feed.
Network footprint is another consideration. Running at -c 250 can generate up to 250 simultaneous HTTP requests, which may trigger monitoring or rate limiting in some network environments. If you’re operating under strict egress monitoring, you may need to run at a lower concurrency level. The tool is designed for rapid intelligence collection, so it prioritizes speed and throughput.
Verdict
Use Kitphishr if you’re a threat intelligence analyst, security researcher, or incident responder who needs to systematically collect phishing kit artifacts at scale for attribution work, victim notification campaigns, or defensive research into attacker TTPs. It’s particularly valuable for SOC teams that need to quickly assess whether a reported phishing campaign is targeting their organization’s brand and what data collection infrastructure is involved. The automated feed aggregation and concurrent processing turn what used to be a multi-hour manual process into a much faster automated workflow.
Skip it if you’re looking for a phishing detection or blocking tool (Kitphishr collects artifacts after detection, not before), or if you lack the legal and ethical framework to handle the potentially malicious code and stolen credential data that phishing kits contain. Also weigh your network environment’s tolerance for high volumes of concurrent requests to potentially suspicious domains when choosing a concurrency setting.