Evine: The Interactive Web Crawler That Treats Reconnaissance Like a REPL

Hook

Most web crawlers force you to write configuration files, run the script, wait, then realize you filtered the wrong URLs. Evine flips this: it's a web crawler you drive like a debugger, adjusting extraction patterns mid-session while watching results stream in real time.

Context

Traditional web crawlers follow a rigid workflow: configure, execute, parse output, repeat. Tools like Scrapy excel at production scraping with pipelines and middleware, but their power comes at a cost—iteration cycles measured in minutes, not seconds. During security reconnaissance or OSINT investigations, you rarely know exactly what you're looking for until you see the site's structure. You might start hunting for API endpoints, then pivot to extracting email patterns, then decide to map the entire subdomain structure.

Evine emerged from this frustration in the security research community around 2019-2020. Built by Saeed Dehqan in Go, it reimagines the crawler as an interactive tool rather than a batch processor. Instead of editing Python files and rerunning scripts, you get a text-based UI where views switch between URL configuration, header manipulation, regex filtering, and post-crawl querying. It's the difference between writing a compiler and using a REPL—same underlying power, radically different workflow for exploration.

Technical Insight

Evine's architecture centers on a view-based TUI system that separates crawling concerns into distinct, keyboard-navigable screens. The main views include URL input, options configuration, headers management, query interface, regex filtering, and response inspection. Each view manages its own state while sharing a central crawl context that persists between operations.
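To make the "separate views, one shared context" idea concrete, here is a minimal sketch. It is not Evine's source: `CrawlContext`, `View`, `urlView`, and `headersView` are hypothetical names invented for illustration, and a real TUI would add rendering and key handling on top.

```go
package main

import (
	"fmt"
	"strings"
)

// CrawlContext is a hypothetical shared state. Field names are illustrative
// and not taken from Evine's source.
type CrawlContext struct {
	TargetURL string
	Headers   map[string]string
}

// View is a hypothetical interface: each screen parses its own input and
// records the result in the shared context.
type View interface {
	Name() string
	Submit(ctx *CrawlContext, input string)
}

type urlView struct{}

func (urlView) Name() string { return "url" }

func (urlView) Submit(ctx *CrawlContext, input string) {
	ctx.TargetURL = strings.TrimSpace(input)
}

type headersView struct{}

func (headersView) Name() string { return "headers" }

// Submit expects a single "Key: Value" line and ignores anything else.
func (headersView) Submit(ctx *CrawlContext, input string) {
	key, value, ok := strings.Cut(input, ":")
	if !ok {
		return
	}
	ctx.Headers[strings.TrimSpace(key)] = strings.TrimSpace(value)
}

func main() {
	ctx := &CrawlContext{Headers: map[string]string{}}
	views := []View{urlView{}, headersView{}}
	inputs := []string{"https://example.com", "User-Agent: Custom-Bot/1.0"}
	for i, v := range views {
		v.Submit(ctx, inputs[i])
		fmt.Printf("after %s view: %+v\n", v.Name(), *ctx)
	}
}
```

Because every view writes into the same `*CrawlContext`, whatever you set on one screen is still there when you switch to another, which is the property the paragraph above describes.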

The concurrent crawling engine uses Go's goroutines and a configurable traversal depth. Here's how you'd interact with it after launching:

// Evine session flow (user interaction pattern)
// 1. Start with target URL
evine
// Navigate to URL view, enter target
https://example.com

// 2. Configure options view
// Set crawl depth: 3
// Set delay: 500ms
// Set max URLs: 1000
// Enable robots.txt respect: yes

// 3. Add custom headers
// User-Agent: Custom-Bot/1.0
// Authorization: Bearer token

// 4. Set regex filters to capture specific URL patterns
// Include: /api/.*, /admin/.*
// Exclude: .*(logout|signout).*

// 5. Start crawl and watch real-time results
// Press 'c' to begin crawling

The extraction system is where Evine differentiates itself. It supports three extraction methods that you can apply post-crawl without re-running the entire operation:

Predefined Keys: Built-in extractors for common patterns like urls, emails, query_urls, subdomains, dns, cdns. Type .keys emails in the query view, and Evine scans all crawled content for email regex patterns instantly.

File Extensions: Extract by file type with .exts js,json,xml to pull all JavaScript files, API responses, or structured data discovered during the crawl.

jQuery-like Selectors: The most powerful method uses goquery (Go's jQuery port) for DOM extraction. After crawling, query with .selector a[href*="api"] to extract all API-related links, or .selector meta[name="description"] for metadata harvesting.
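The first two extraction methods are easy to approximate with the standard library. The sketch below is an assumption-laden illustration, not Evine's code: `ExtractEmails`, `FilterByExt`, and the deliberately simple `emailPattern` are hypothetical, and the selector method is left out because it depends on the third-party goquery package.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"regexp"
	"sort"
	"strings"
)

// emailPattern is a simple illustrative pattern, not the expression Evine uses.
var emailPattern = regexp.MustCompile(`[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`)

// ExtractEmails returns the unique email-like strings found in bodies, sorted.
func ExtractEmails(bodies []string) []string {
	seen := map[string]bool{}
	var out []string
	for _, body := range bodies {
		for _, m := range emailPattern.FindAllString(body, -1) {
			if !seen[m] {
				seen[m] = true
				out = append(out, m)
			}
		}
	}
	sort.Strings(out)
	return out
}

// FilterByExt keeps URLs whose path ends in one of exts (given without dots).
// Query strings are ignored because only the parsed path is inspected.
func FilterByExt(urls []string, exts ...string) []string {
	want := map[string]bool{}
	for _, e := range exts {
		want[strings.ToLower(e)] = true
	}
	var out []string
	for _, raw := range urls {
		u, err := url.Parse(raw)
		if err != nil {
			continue // skip anything that is not a parseable URL
		}
		ext := strings.TrimPrefix(strings.ToLower(path.Ext(u.Path)), ".")
		if want[ext] {
			out = append(out, raw)
		}
	}
	return out
}

func main() {
	bodies := []string{"contact admin@example.com", `<a href="mailto:sales@example.org">`}
	urls := []string{"https://example.com/static/app.js?v=2", "https://example.com/index.html"}
	fmt.Println(ExtractEmails(bodies))
	fmt.Println(FilterByExt(urls, "js", "json"))
}
```

Both functions take already-crawled data as input, which is what lets this kind of query run after the crawl has finished.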

The filtering system uses Go's regexp package for URL inclusion/exclusion during crawling. This happens at traversal time, not extraction time—critical for managing crawl scope on large sites. The concurrent goroutine pool respects these filters before fetching, preventing wasted bandwidth:

// Conceptual filter application (from Evine's internals)
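for url := range urlQueue {
    // Check scope before fetching so out-of-scope URLs never cost a request.
    if !includeRegex.MatchString(url) || excludeRegex.MatchString(url) {
        continue
    }
    go func(u string) {
        resp := fetch(u)
        extractLinks(resp)
    }(url)
    // Pause in the dispatch loop; sleeping inside the goroutine would not space out requests.
    applyDelay()
}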
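If you want to try the include/exclude logic on its own, a runnable version might look like the following. `ScopeFilter`, `NewScopeFilter`, and `Allows` are hypothetical names; the patterns are the ones from the session walkthrough, joined with `|`.

```go
package main

import (
	"fmt"
	"regexp"
)

// ScopeFilter is a hypothetical helper that pairs an include pattern with an
// exclude pattern, both compiled with Go's regexp package.
type ScopeFilter struct {
	include *regexp.Regexp
	exclude *regexp.Regexp
}

// NewScopeFilter compiles both patterns and reports the first compile error.
func NewScopeFilter(include, exclude string) (*ScopeFilter, error) {
	inc, err := regexp.Compile(include)
	if err != nil {
		return nil, fmt.Errorf("include pattern: %w", err)
	}
	exc, err := regexp.Compile(exclude)
	if err != nil {
		return nil, fmt.Errorf("exclude pattern: %w", err)
	}
	return &ScopeFilter{include: inc, exclude: exc}, nil
}

// Allows reports whether a URL matches the include pattern and not the
// exclude pattern. MatchString is unanchored, so a pattern can match
// anywhere in the URL unless it uses ^ or $.
func (f *ScopeFilter) Allows(url string) bool {
	return f.include.MatchString(url) && !f.exclude.MatchString(url)
}

func main() {
	f, err := NewScopeFilter(`/api/.*|/admin/.*`, `.*(logout|signout).*`)
	if err != nil {
		panic(err)
	}
	for _, u := range []string{
		"https://example.com/api/users",
		"https://example.com/admin/logout",
		"https://example.com/blog/post",
	} {
		fmt.Println(u, f.Allows(u))
	}
}
```

Compiling the patterns once up front keeps the per-URL check cheap, and a bad pattern fails before any request is sent.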

The delay mechanism between requests is configurable per crawl, making Evine suitable for both aggressive crawling (0 ms delay) and polite scraping (delays of 1000 ms or more). Combined with depth limiting and max URL caps, you can tune the crawler's behavior from surgical reconnaissance to exhaustive site mapping.
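Here is a rough sketch of how a delay, a depth limit, and a URL cap can interact. It makes several assumptions: `Limits` and `Crawl` are hypothetical, an in-memory link map stands in for real fetching, and the traversal is a simple sequential breadth-first queue rather than whatever order Evine uses internally.

```go
package main

import (
	"fmt"
	"time"
)

// Limits holds hypothetical tuning knobs mirroring the options described above.
type Limits struct {
	MaxDepth int           // links further than this many hops from start are not followed
	MaxURLs  int           // hard cap on visited URLs
	Delay    time.Duration // pause between consecutive visits
}

// Crawl walks the link graph from start and returns URLs in visit order.
// The links map is a stand-in for fetching a page and extracting its links.
func Crawl(start string, links map[string][]string, lim Limits) []string {
	type item struct {
		url   string
		depth int
	}
	queue := []item{{start, 0}}
	seen := map[string]bool{start: true}
	var visited []string
	for len(queue) > 0 && len(visited) < lim.MaxURLs {
		cur := queue[0]
		queue = queue[1:]
		if len(visited) > 0 {
			time.Sleep(lim.Delay) // no delay before the very first request
		}
		visited = append(visited, cur.url)
		if cur.depth == lim.MaxDepth {
			continue // at the depth limit: visit the page but do not follow its links
		}
		for _, next := range links[cur.url] {
			if !seen[next] {
				seen[next] = true
				queue = append(queue, item{next, cur.depth + 1})
			}
		}
	}
	return visited
}

func main() {
	links := map[string][]string{
		"/":  {"/a", "/b"},
		"/a": {"/a/1"},
		"/b": {"/b/1"},
	}
	fmt.Println(Crawl("/", links, Limits{MaxDepth: 1, MaxURLs: 10, Delay: 10 * time.Millisecond}))
}
```

A small `MaxDepth` with a low `MaxURLs` gives the "surgical" end of the range; raising both and shrinking `Delay` moves toward exhaustive mapping.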

One clever architectural decision is separating crawl execution from data extraction. The crawler fetches and stores responses in memory, then extraction queries run against this cached data. This means you can experiment with different selectors, keys, and file extension filters without re-crawling—essential when working with rate-limited targets or conducting post-mortem analysis.
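The crawl-once, query-many pattern can be sketched in a few lines. Everything here is hypothetical (`Store`, `Crawl`, `Query`, and the `Fetches` counter exist only to illustrate the idea), and a fake fetch function replaces real HTTP requests.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Store is a hypothetical in-memory cache of crawled responses.
type Store struct {
	pages   map[string]string // URL -> response body
	Fetches int               // number of fetches performed so far
}

func NewStore() *Store {
	return &Store{pages: map[string]string{}}
}

// Crawl fetches each URL once and caches the body.
func (s *Store) Crawl(urls []string, fetch func(string) string) {
	for _, u := range urls {
		s.pages[u] = fetch(u)
		s.Fetches++
	}
}

// Query runs a predicate over the cached pages and returns matching URLs,
// sorted. It never touches the network.
func (s *Store) Query(match func(url, body string) bool) []string {
	var hits []string
	for u, body := range s.pages {
		if match(u, body) {
			hits = append(hits, u)
		}
	}
	sort.Strings(hits)
	return hits
}

func main() {
	site := map[string]string{
		"https://example.com/":      `<a href="/admin/">admin</a>`,
		"https://example.com/about": `mail us: team@example.com`,
	}
	store := NewStore()
	store.Crawl([]string{"https://example.com/", "https://example.com/about"},
		func(u string) string { return site[u] })

	admin := store.Query(func(_, body string) bool { return strings.Contains(body, "/admin") })
	mail := store.Query(func(_, body string) bool { return strings.Contains(body, "@") })
	fmt.Println(admin, mail, "fetches:", store.Fetches)
}
```

However many queries you run, `Fetches` stays at the number of pages crawled, which is the whole point when the target is rate-limited.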

For OSINT workflows, Evine integrates external sources like robots.txt parsing (automatically respects disallow rules if configured) and Wayback Machine lookups to discover historical URLs no longer linked from the current site. This bridges multiple reconnaissance techniques in a single interface, eliminating the need to chain separate CLI tools with pipes and temporary files.
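For a feel of what respecting robots.txt involves, here is a deliberately simplified sketch. It only handles `Disallow` rules for the `*` user agent with plain prefix matching, and ignores `Allow` lines and wildcards; `ParseDisallow` and `Allowed` are hypothetical names, not Evine functions. Wayback Machine lookups are not covered here.

```go
package main

import (
	"fmt"
	"strings"
)

// ParseDisallow returns the Disallow paths that apply to the "*" user agent.
// Simplified on purpose: Allow lines and wildcard patterns are ignored.
func ParseDisallow(robots string) []string {
	var rules []string
	applies := false  // does the current group apply to "*"?
	inAgents := false // true while reading consecutive User-agent lines
	for _, line := range strings.Split(robots, "\n") {
		if i := strings.Index(line, "#"); i >= 0 {
			line = line[:i] // strip trailing comments
		}
		key, value, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		key = strings.ToLower(strings.TrimSpace(key))
		value = strings.TrimSpace(value)
		switch key {
		case "user-agent":
			if !inAgents {
				applies = false // a new group starts here
			}
			inAgents = true
			if value == "*" {
				applies = true
			}
		case "disallow":
			inAgents = false
			if applies && value != "" {
				rules = append(rules, value)
			}
		default:
			inAgents = false
		}
	}
	return rules
}

// Allowed reports whether path is outside every disallowed prefix.
func Allowed(path string, rules []string) bool {
	for _, r := range rules {
		if strings.HasPrefix(path, r) {
			return false
		}
	}
	return true
}

func main() {
	robots := "User-agent: *\nDisallow: /admin/\nDisallow: /tmp/ # scratch\n\nUser-agent: BadBot\nDisallow: /\n"
	rules := ParseDisallow(robots)
	fmt.Println(rules, Allowed("/admin/users", rules), Allowed("/blog", rules))
}
```

For reconnaissance, the disallowed paths are often interesting in their own right, so a tool may want to list them even when it chooses not to fetch them.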

Gotcha

Evine's biggest limitation is its abandonment: the repository hasn't seen meaningful updates since 2020, and the README explicitly requires Go 1.13.x, a release line from 2019 that is long out of support. While it may compile with newer Go versions, expect rough edges and unpatched bugs. This matters more than usual because web standards evolve rapidly; lack of maintenance likely means broken parsing on modern sites.

The jQuery selector implementation has documented quirks: it doesn't handle single quotes in attribute selectors and chokes on extra whitespace. More critically, there's no JavaScript execution support. Evine fetches raw HTML responses and parses the initial DOM, making it blind to SPAs, dynamically loaded content, and sites that render primarily through React/Vue/Angular. For security testing of modern web applications, this removes entire attack surfaces from analysis. The tool also lacks distributed crawling, per-domain rate limiting (delays are global only), and a plugin architecture for custom extraction logic. If your target requires authentication beyond simple headers, such as session management or cookie handling across complex flows, you'll spend more time working around Evine's limitations than benefiting from its interactivity.

Verdict

Use if: You're conducting OSINT reconnaissance or security testing on static or server-rendered sites where you need rapid iteration on extraction patterns without writing code. Evine excels at exploratory crawling during penetration tests: starting broad, then narrowing filters as you understand the target's structure. The interactive query system after crawling is genuinely useful for hypothesis-driven investigation: "Are there any admin URLs?", "What emails appear?", "Which JavaScript files load?" It's also excellent for learning web scraping concepts, as the TUI makes data flow and filtering logic visible in ways that framework code obscures.

Skip if: Your targets are JavaScript-heavy SPAs, you need production-grade reliability, or you need more than a throwaway analysis tool. The abandoned codebase is a non-starter for anything requiring ongoing maintenance or modern web compatibility. For serious scraping, use Colly (Go) or Scrapy (Python) with their robust ecosystems. For security reconnaissance specifically, switch to actively maintained alternatives like GoSpider or Hakrawler, which handle modern sites better and integrate cleanly into automated pipelines. Evine is a clever idea frozen in 2020: valuable for understanding interactive crawler design, but superseded by tools that kept evolving.
