RecurseBuster: Why HEAD-First Requests Make Recursive Content Discovery Faster
Hook
Most content discovery tools send GET requests for every path they test. RecurseBuster sends HEAD requests first, which can cut bandwidth by roughly 90% when most probes miss, and its hybrid spider approach finds content that a wordlist alone would not.
Context
Web application penetration testing relies heavily on content discovery—finding hidden directories, endpoints, and files that aren't linked from the main application. Traditional tools like DirBuster and Gobuster use wordlists to brute-force paths, sending thousands of requests to check if resources exist. This approach has two fundamental problems: it's bandwidth-intensive (downloading full responses for every 404), and it only finds what's in your wordlist.
RecurseBuster emerged from the recognition that modern web applications have deep directory structures that require recursive scanning, and that discovered pages themselves contain valuable intelligence about other paths. Written in Go by C-Sto, it combines the speed of concurrent programming with a smarter request strategy: validate existence with lightweight HEAD requests, then fetch interesting content with GET requests. The tool also extracts links from responses, turning every discovered page into a new source of targets—a spider-assisted approach that catches resources no wordlist would contain.
Technical Insight
The architectural cleverness of RecurseBuster lies in its HEAD-first optimization combined with a work queue that manages recursive discovery. When you launch a scan, the tool initializes a thread pool (configurable via -t flag) and begins walking through your wordlist against the target URL. For each potential path, it sends a HEAD request first—a minimal HTTP method that returns only headers, no body content. Only when it receives a positive response (2xx or 3xx status) does it follow up with a GET request to retrieve the actual content.
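The HEAD-then-GET decision described above can be sketched in a few lines of Go. This is a minimal illustration, not RecurseBuster's actual source: the names fetchFunc, httpFetch, and headFirst are hypothetical, and the demo runs against a local httptest server rather than a real target.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetchFunc performs one request and returns the status code and body.
// It is a hypothetical seam so the logic can be exercised without a network.
type fetchFunc func(method, url string) (status int, body string, err error)

// httpFetch is a fetchFunc backed by net/http. Redirects are not followed,
// so a 3xx status is reported as-is.
func httpFetch(method, url string) (int, string, error) {
	client := &http.Client{
		CheckRedirect: func(*http.Request, []*http.Request) error {
			return http.ErrUseLastResponse
		},
	}
	req, err := http.NewRequest(method, url, nil)
	if err != nil {
		return 0, "", err
	}
	resp, err := client.Do(req)
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return 0, "", err
	}
	return resp.StatusCode, string(body), nil
}

// headFirst sends HEAD and only follows up with GET on a 2xx or 3xx status.
func headFirst(fetch fetchFunc, url string) (body string, found bool, err error) {
	status, _, err := fetch(http.MethodHead, url)
	if err != nil {
		return "", false, err
	}
	if status < 200 || status >= 400 {
		return "", false, nil // miss: no body was ever transferred
	}
	_, body, err = fetch(http.MethodGet, url)
	if err != nil {
		return "", false, err
	}
	return body, true, nil
}

func main() {
	// Local stand-in target: only /admin/ exists.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/admin/" {
			http.NotFound(w, r)
			return
		}
		fmt.Fprint(w, "admin panel")
	}))
	defer srv.Close()

	for _, path := range []string{"/admin/", "/missing"} {
		_, found, err := headFirst(httpFetch, srv.URL+path)
		fmt.Println(path, found, err)
	}
}
```

Passing the fetch function in as a parameter keeps the decision logic separate from the HTTP client, which makes it easy to check that a miss really costs one HEAD and nothing more.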
This seems counterintuitive at first: every path that exists now costs two requests instead of one. But the math works in your favor. A typical 404 response might include 5-10KB of HTML (custom error pages, styling, scripts). A HEAD request for that same resource consumes perhaps 500 bytes. When 95% of your brute-force attempts hit non-existent paths, only the remaining 5% need the follow-up GET, while every miss shrinks from kilobytes to a few hundred bytes, so bandwidth consumption drops by roughly an order of magnitude. The server also skips rendering all those error pages, which may make your scan less likely to trigger rate limiting or WAF rules.
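To make that arithmetic concrete, here is a small back-of-the-envelope calculation in Go. Every number in it is an illustrative assumption rather than a measurement: 10,000 probes, a 95% miss rate, 10KB for both error pages and real pages, and 500 bytes per HEAD response. The estimate function is a hypothetical helper.

```go
package main

import "fmt"

// estimate returns total response bytes for a GET-only scan and a HEAD-first scan.
// All sizes are illustrative assumptions, not measurements.
func estimate(requests, hits, missBodyBytes, hitBodyBytes, headBytes int) (getOnly, headFirst int) {
	misses := requests - hits
	// GET only: every miss downloads an error page, every hit downloads the page.
	getOnly = misses*missBodyBytes + hits*hitBodyBytes
	// HEAD first: every probe costs one HEAD, and only hits are downloaded.
	headFirst = requests*headBytes + hits*hitBodyBytes
	return getOnly, headFirst
}

func main() {
	const requests, hits = 10000, 500 // 95% of probes miss
	getOnly, headFirst := estimate(requests, hits, 10000, 10000, 500)
	fmt.Printf("GET only:   %d bytes\n", getOnly)
	fmt.Printf("HEAD first: %d bytes\n", headFirst)
	fmt.Printf("extra requests: %d\n", hits)
}
```

Under these assumptions the GET-only scan transfers 100,000,000 bytes and the HEAD-first scan 10,000,000, a tenfold reduction bought with 500 extra requests. Smaller error pages or a higher hit rate shrink that advantage.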
Here's how you might run a basic recursive scan with soft 404 detection:
# Basic recursive scan with 10 threads
recursebuster -u https://target.com -w /usr/share/wordlists/common.txt -t 10 -o results.txt
# With canary-based soft 404 detection and Burp integration
recursebuster -u https://target.com -w wordlist.txt \
    -c "uniquestring123" \
    -proxy http://127.0.0.1:8080 \
    -sitemap sitemap.xml \
    -t 5
The soft 404 detection is particularly elegant. Many applications return 200 OK responses for non-existent paths, rendering status code checks useless. RecurseBuster sends a canary request—a path that definitely shouldn't exist, like /uniquestring123randompath—and captures that response as a baseline. When subsequent requests return 200, it compares the response body against the canary using similarity ratios. If the responses are nearly identical, it's likely a soft 404 and gets filtered out. This approach adapts to each application's error handling without manual configuration.
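A rough sketch of the canary comparison follows. The source does not say which similarity algorithm RecurseBuster uses, so this example swaps in a simple Jaccard index over whitespace-separated tokens. The function names and the 0.8 threshold are assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// tokenSet splits s on whitespace and returns the distinct tokens.
func tokenSet(s string) map[string]bool {
	set := make(map[string]bool)
	for _, tok := range strings.Fields(s) {
		set[tok] = true
	}
	return set
}

// similarity returns the Jaccard index of the two token sets: the size of the
// intersection divided by the size of the union. Two empty bodies count as identical.
func similarity(a, b string) float64 {
	setA, setB := tokenSet(a), tokenSet(b)
	if len(setA) == 0 && len(setB) == 0 {
		return 1
	}
	shared := 0
	for tok := range setA {
		if setB[tok] {
			shared++
		}
	}
	union := len(setA) + len(setB) - shared
	return float64(shared) / float64(union)
}

// isSoft404 reports whether body is close enough to the canary baseline
// to be treated as an error page despite its 200 status.
func isSoft404(body, canary string, threshold float64) bool {
	return similarity(body, canary) >= threshold
}

func main() {
	canary := "Sorry, the page /uniquestring123 could not be found. Try the search box or return home."
	miss := "Sorry, the page /backup could not be found. Try the search box or return home."
	genuine := "Backup archive index: db-2019.tar.gz db-2020.tar.gz"
	fmt.Println(isSoft404(miss, canary, 0.8), isSoft404(genuine, canary, 0.8))
}
```

The threshold is the knob that matters: error pages that echo the requested path or embed timestamps never match the canary exactly, so the cutoff has to tolerate small differences without swallowing real pages.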
The recursive component operates through a discovered directory queue. When the tool finds a directory (identified by trailing slashes or specific response patterns), it doesn't just note it and move on—it adds that directory to a work queue. The thread pool then runs the entire wordlist against that new directory, discovering nested resources. Each new directory spawns another round of enumeration, creating a breadth-first exploration of the application's structure. You can control this behavior with blacklists (-bl flag) to exclude paths like /logout or whitelists (-wl flag) to constrain scanning to specific subtrees.
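The queue-driven recursion can be sketched as below. This is a single-threaded simplification under stated assumptions: existsFunc stands in for the real HTTP probe, a directory is recognized purely by a trailing slash, and the blacklist is a list of path prefixes. None of these names come from RecurseBuster itself.

```go
package main

import (
	"fmt"
	"strings"
)

// existsFunc reports whether a path exists on the target (hypothetical probe hook).
type existsFunc func(path string) bool

// blocked reports whether path starts with any blacklisted prefix.
func blocked(path string, blacklist []string) bool {
	for _, prefix := range blacklist {
		if strings.HasPrefix(path, prefix) {
			return true
		}
	}
	return false
}

// crawl runs the wordlist against root, then against every directory it finds,
// in breadth-first order, and returns everything discovered.
func crawl(root string, words, blacklist []string, exists existsFunc) []string {
	var found []string
	queue := []string{root}
	for len(queue) > 0 {
		dir := queue[0]
		queue = queue[1:]
		for _, w := range words {
			path := dir + w
			if blocked(path, blacklist) {
				continue
			}
			switch {
			case exists(path + "/"):
				found = append(found, path+"/")
				queue = append(queue, path+"/") // scan this directory in a later round
			case exists(path):
				found = append(found, path)
			}
		}
	}
	return found
}

func main() {
	// In-memory stand-in for the target's directory tree.
	tree := map[string]bool{
		"/admin/": true, "/admin/backup/": true, "/admin/backup/db": true, "/logout/": true,
	}
	exists := func(p string) bool { return tree[p] }
	words := []string{"admin", "backup", "db", "logout"}
	for _, p := range crawl("/", words, []string{"/logout"}, exists) {
		fmt.Println(p)
	}
}
```

Because the queue is first-in, first-out, all of a directory's siblings are scanned before any of its children. A target that answers positively for every path would keep this loop running forever, so a real scanner would pair it with soft 404 filtering or a depth limit.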
The spider functionality extracts href attributes from HTML responses and queues them as additional targets. This catches paths that would never appear in a wordlist: timestamped directories, UUID-based paths, or application-specific naming conventions. Combined with the recursive directory discovery, you get coverage that pure brute-forcing cannot achieve:
// Simplified spider logic (conceptual)
if isHTML(response) {
    links := extractLinks(response.Body)
    for _, link := range links {
        if inScope(link, whitelist, blacklist) {
            queue.Add(link)
        }
    }
}
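For a runnable take on the link extraction step, the sketch below pulls href values out of an HTML body, resolves them against the page URL, and keeps only same-host links. It uses a regular expression as a shortcut where a real HTML parser would be more robust, and the extractLinks signature here is an assumption for illustration, not the tool's own.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// hrefPattern matches single- or double-quoted href attribute values.
// A regular expression is a shortcut for this sketch; it will miss unquoted attributes.
var hrefPattern = regexp.MustCompile(`(?i)href\s*=\s*["']([^"']+)["']`)

// extractLinks resolves every href in body against pageURL and keeps the
// http(s) results that stay on the same host.
func extractLinks(pageURL, body string) ([]string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return nil, err
	}
	var links []string
	for _, m := range hrefPattern.FindAllStringSubmatch(body, -1) {
		ref, err := url.Parse(m[1])
		if err != nil {
			continue // skip malformed links
		}
		abs := base.ResolveReference(ref)
		abs.Fragment = "" // #anchors point at the same resource
		if abs.Host == base.Host && (abs.Scheme == "http" || abs.Scheme == "https") {
			links = append(links, abs.String())
		}
	}
	return links, nil
}

func main() {
	body := `<a href="/exports/2024-01-15/">exports</a>
<a href='report.pdf'>report</a>
<a href="https://other.example/z">elsewhere</a>`
	links, err := extractLinks("https://target.example/app/index.html", body)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, l := range links {
		fmt.Println(l)
	}
}
```

Resolving against the page URL matters because relative links such as report.pdf only make sense in the context of the directory they were found in.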
The tool maintains state across the entire scan, tracking visited URLs to avoid duplicate work. This deduplication is crucial in recursive scanning where the same path might be discovered through multiple routes—once from the wordlist against a parent directory, once from a link extraction, and perhaps again from a different parent directory. Without careful state management, you'd waste resources re-scanning the same endpoints repeatedly.
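One common way to implement this kind of deduplication across concurrent workers is a mutex-guarded set, sketched here. The visitedSet type and its markNew method are hypothetical names, and the source does not describe how RecurseBuster stores its state internally.

```go
package main

import (
	"fmt"
	"sync"
)

// visitedSet records URLs that have already been queued.
// It is safe for use by concurrent workers.
type visitedSet struct {
	mu   sync.Mutex
	seen map[string]struct{}
}

func newVisitedSet() *visitedSet {
	return &visitedSet{seen: make(map[string]struct{})}
}

// markNew records url and reports true only the first time it is seen.
// Checking and inserting under one lock prevents two workers from both winning.
func (v *visitedSet) markNew(url string) bool {
	v.mu.Lock()
	defer v.mu.Unlock()
	if _, ok := v.seen[url]; ok {
		return false
	}
	v.seen[url] = struct{}{}
	return true
}

func main() {
	visited := newVisitedSet()
	var wg sync.WaitGroup
	var mu sync.Mutex
	accepted := 0
	// Three discovery routes report the same URL at once; only one should queue it.
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if visited.markNew("https://target.example/admin/") {
				mu.Lock()
				accepted++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	fmt.Println("accepted:", accepted) // prints "accepted: 1"
}
```

Combining the check and the insert in a single call is the important design choice: a separate "have I seen this?" followed by "mark as seen" would let two workers slip through between the two steps.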
Interactive control is another thoughtful feature. During a long recursive scan, you might realize certain paths are uninteresting or that you need to adjust scope. Pressing Ctrl+X drops you into an interactive mode where you can add blacklist entries or whitelist additional domains without killing the entire scan. This responsiveness is valuable in time-constrained assessments where restarting a scan means losing hours of progress.
Gotcha
RecurseBuster defaults to single-threaded execution (-t 1), making it surprisingly slow out of the box. This conservative default presumably exists to avoid triggering WAF rules or rate limiting, but many testers expect content discovery tools to be fast by default. You'll need to tune the thread count manually based on your target's tolerance, and there's no automatic throttling or adaptive rate limiting to help find that sweet spot. Getting the right balance between speed and stealth requires experience and experimentation.
Authentication support is limited to basic HTTP auth, which severely constrains where you can use the tool effectively. Modern web applications use session cookies, JWT tokens, OAuth flows, or API keys—none of which RecurseBuster handles natively. You can't pass custom headers for API tokens or maintain session state across requests. For applications behind anything more sophisticated than basic auth, you'll need to proxy through Burp Suite and handle authentication separately, adding complexity to your workflow. The spider functionality also only follows same-domain links by default, so multi-domain applications or microservice architectures require manual whitelist configuration to achieve complete coverage.
Verdict
Use if: You're conducting penetration tests or security assessments against web applications with deep directory structures, especially when static wordlists prove insufficient and you need the hybrid spider-assisted discovery to catch dynamically named paths. The HEAD-first optimization makes it particularly valuable when scanning over high-latency connections or when you need to minimize bandwidth and server load to stay under WAF thresholds. It's also ideal when you need interactive control to adjust scope mid-scan during time-constrained assessments.
Skip if: Your targets use modern authentication mechanisms beyond basic auth, as you'll fight the tool constantly to maintain session state. Also skip if you need active maintenance and community support: the project appears dormant and the author has suggested merging functionality into other tools. For simple, non-recursive directory enumeration, gobuster or ffuf will serve you better with less configuration overhead. If you want similar recursive capabilities with more active development, feroxbuster provides comparable features with a more modern codebase.