GoGrabber: Building a Three-Phase Web Reconnaissance Pipeline in Go
Hook
Most web reconnaissance workflows require at least three different tools piped together with duct tape and bash scripts. GoGrabber asks: what if the entire pipeline ran in a single binary with shared state?
Context
Web reconnaissance has always been a fragmented affair. You start with nmap or masscan to find open ports, pipe results to gobuster or ffuf for directory enumeration, then feed everything to aquatone or eyewitness for screenshots. Each tool has different input formats, output structures, and configuration quirks. The glue code becomes as complex as the reconnaissance itself.
This fragmentation creates real problems beyond inconvenience. State gets lost between tools—you can’t easily correlate which directories came from which ports without building custom parsers. False positives multiply when tools can’t share context about how targets behave. Memory and CPU spike as each tool spawns its own worker pools without coordination. GoGrabber emerged from Bonzi Technology’s internal need for a unified reconnaissance pipeline that could run on bug bounty programs and penetration tests without bash spaghetti holding it together.
Technical Insight
GoGrabber’s architecture is built around three sequential phases that share a common target state. The pipeline begins with TCP port scanning across configurable ranges (typically 80, 443, 8000-8443), then feeds discovered HTTP services into a directory bruteforcer, and finally screenshots any interesting findings. Each phase uses Go’s concurrency primitives to maximize throughput without overwhelming targets or local resources.
The most interesting technical decision is the soft-404 detection algorithm. Traditional directory bruteforcers treat any 200 OK response as a success, leading to thousands of false positives on targets that return custom error pages with success codes. GoGrabber solves this by generating random canary URLs at the start of each scan:
```go
// Generate canary URLs with random strings to probe wildcard behavior
canaries := []string{
	fmt.Sprintf("/randomstring_%s", generateRandomString(12)),
	fmt.Sprintf("/test_%s.php", generateRandomString(8)),
	fmt.Sprintf("/admin_%s/", generateRandomString(10)),
}

// Fetch canary responses to establish a "not found" baseline
var canaryBodies []string
for _, canary := range canaries {
	resp := fetchURL(baseURL + canary)
	canaryBodies = append(canaryBodies, resp.Body)
}

// During bruteforcing, compare each 200 response against the baseline
for _, path := range wordlist {
	resp := fetchURL(baseURL + path)
	if resp.StatusCode == 200 {
		similarity := calculateSimilarity(resp.Body, canaryBodies)
		if similarity < 0.85 { // Substantively different: not a wildcard response
			recordValidPath(path, resp)
		}
	}
}
```
The similarity calculation uses a ratio-based approach (likely Levenshtein distance or similar) to determine if the response body is substantively different from the canary responses. This dramatically reduces false positives on modern web applications that return branded 404 pages, CMS installations with catch-all routing, or cloud hosting platforms with default error templates. The 0.85 threshold is configurable, allowing operators to tune sensitivity based on target behavior.
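Since the exact algorithm isn't documented, here is a minimal sketch of what a normalized-Levenshtein similarity ratio could look like; `similarity` and its scoring are assumptions, not GoGrabber's actual implementation:

```go
package main

import "fmt"

// similarity returns a 0.0-1.0 ratio derived from Levenshtein edit
// distance: 1.0 means identical bodies, lower means more different.
// Illustrative stand-in, not GoGrabber's actual code.
func similarity(a, b string) float64 {
	if len(a) == 0 && len(b) == 0 {
		return 1.0
	}
	// Two-row dynamic programming table for edit distance
	prev := make([]int, len(b)+1)
	curr := make([]int, len(b)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(a); i++ {
		curr[0] = i
		for j := 1; j <= len(b); j++ {
			cost := 1
			if a[i-1] == b[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	dist := prev[len(b)]
	maxLen := len(a)
	if len(b) > maxLen {
		maxLen = len(b)
	}
	return 1.0 - float64(dist)/float64(maxLen)
}

func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

func main() {
	// A branded error page compared against itself scores 1.0 (wildcard);
	// a genuinely different page falls well below the 0.85 threshold.
	fmt.Println(similarity("404 Not Found", "404 Not Found"))
	fmt.Println(similarity("404 Not Found", "Admin Panel"))
}
```

Whatever the actual metric, the key property is that near-identical error bodies with trivial differences (timestamps, request IDs) still score above the threshold and get discarded.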
The screenshot phase leverages go-rod, a Chrome DevTools Protocol library that provides headless browser automation. Unlike older tools that spawn a full Chrome process for each target, go-rod reuses browser contexts from a shared pool:
```go
type ScreenshotWorker struct {
	browser *rod.Browser
	pool    rod.BrowserPool // bounded pool of reusable browser instances
}

func (w *ScreenshotWorker) Capture(url string, timeout time.Duration) error {
	page, err := w.browser.Page(proto.TargetCreateTarget{})
	if err != nil {
		return fmt.Errorf("page creation failed: %w", err)
	}
	defer page.Close()

	// Bound navigation and load waits by the caller's timeout
	page = page.Timeout(timeout)
	if err := page.Navigate(url); err != nil {
		return fmt.Errorf("navigation failed: %w", err)
	}
	if err := page.WaitLoad(); err != nil {
		return fmt.Errorf("load wait failed: %w", err)
	}

	// Wait for dynamic content to render
	time.Sleep(2 * time.Second)

	screenshot, err := page.Screenshot(true, nil)
	if err != nil {
		return fmt.Errorf("screenshot failed: %w", err)
	}
	return saveScreenshot(url, screenshot)
}
```
The tool automatically downloads a compatible Chromium binary on first run, eliminating the “works on my machine” problem that plagues browser automation tools. This is a key user experience win—no manual Chrome installation, no PATH configuration, no version mismatches.
HTTP fingerprint evasion is built into the directory enumeration phase with several anti-detection features. Custom headers can be injected via command-line flags, including the ability to load host headers from files for CDN and WAF bypass techniques. Request jitter introduces random delays between requests to avoid rate limiting and behavioral detection. User-agent rotation is supported, though the implementation appears to use a static list rather than more sophisticated browser fingerprint randomization.
The output structure reflects thoughtful design for both human analysis and tool integration. Each scan creates a timestamped directory containing subdirectories for raw HTTP responses (responses/), screenshots (screenshots/), and reports in multiple formats. The Markdown report includes inline base64-encoded screenshots, making it a self-contained artifact perfect for documentation or client deliverables. JSON and CSV outputs enable downstream processing with jq, awk, or custom scripts. XML support is included for enterprise toolchain integration, though this format sees less use in modern workflows.
Worker pool sizing is configurable for each phase independently—5 port scanning workers, 10 directory bruteforcing workers, and 5 screenshot workers by default. This lets operators tune performance based on target sensitivity and local resources. The Go runtime handles goroutine scheduling efficiently, but operators still need to be mindful of file descriptor limits and network bandwidth when scaling to thousands of targets.
Gotcha
The port scanning implementation is where GoGrabber shows its focus on breadth over depth. It’s TCP-only with no service version detection, UDP support, or advanced fingerprinting capabilities. You get a list of open TCP ports, nothing more. For initial reconnaissance this is often sufficient, but you’ll still need nmap for detailed service enumeration. The scanner also lacks masscan’s raw socket performance, making it slower on truly massive IP ranges (think /16 subnets).
Authentication handling is another significant limitation. While you can inject cookies and custom headers, there’s no built-in support for multi-step login flows, session management, or authenticated crawling. If your target requires OAuth, SAML, or even basic form-based authentication that spawns new tokens, you’re back to manual cookie extraction and header crafting. Tools like Burp Suite or authenticated Nuclei templates handle this more elegantly.
The screenshot engine, while convenient, can become a resource bottleneck quickly. Each worker spawns a Chromium process that consumes 100-200MB of RAM. With default settings (5 workers), you’re looking at 500MB-1GB just for browsers, before counting the Go runtime and response storage. On memory-constrained VPS instances or when scanning hundreds of services, this becomes problematic. There’s no built-in backpressure mechanism or memory limit enforcement, so scans can spiral into OOM scenarios if you’re not monitoring resource usage.
Verdict
Use GoGrabber if you’re conducting broad web reconnaissance across multiple targets—think bug bounty program scoping, red team initial access research, or penetration test discovery phases where you need visual documentation of findings without tool-chaining overhead. It excels at eliminating false positives from wildcard-heavy targets and generating client-ready reports with inline screenshots. The unified pipeline makes it perfect for rapid triage scenarios where you need comprehensive coverage in minutes, not hours of tool juggling.
Skip it if you need surgical precision on single applications (Burp Suite’s crawler provides better depth), advanced port scanning with service detection (nmap remains unmatched), or authenticated crawling of complex web applications (ZAP or authenticated Nuclei are better choices). Also skip it if you’re working on resource-constrained systems or scanning thousands of services simultaneously—the screenshot engine’s memory footprint will bite you.
GoGrabber is ‘easy mode’ reconnaissance, and that’s exactly when you should reach for it.