WhatWeb: How 1800+ Ruby Plugins Turn HTTP Responses Into Security Intelligence

Hook

While most security scanners make a dozen requests and call it reconnaissance, WhatWeb can fingerprint WordPress installations even when they’ve stripped out every obvious identifier—using favicon hashes, default file paths, and link structure analysis across 15+ detection vectors.

Context

Web fingerprinting emerged as a critical reconnaissance phase in security assessments because understanding a target’s technology stack reveals attack surfaces before you probe for vulnerabilities. In the early 2000s, identifying a CMS meant manually checking meta tags or guessing default paths. This approach failed the moment administrators removed generator tags or changed default configurations.

WhatWeb, developed by Andrew Horton and Brendan Coles, took a different approach: treat fingerprinting as a probabilistic problem requiring multiple evidence sources. Released under GPLv2 and written in Ruby, it addresses the core tension in reconnaissance—stealth versus thoroughness. Public security researchers need passive identification that doesn’t trigger alarms, while penetration testers on authorized engagements need aggressive enumeration. Most tools force you to choose one philosophy; WhatWeb bakes both into its architecture through an aggression-level system that fundamentally changes how the scanner behaves.

Technical Insight

[Figure: system architecture (auto-generated). Labeled components: Target URLs; WhatWeb Core Engine (Aggression Level 1-4); HTTP Request Handler (Level 1: single GET; Level 2-4: multiple probes); HTTP Response Data; Plugin Engine (1800+ modules) covering regex pattern tests, header analysis, file/path checks, and cookie detection; Certainty Aggregator; Output Formatter (JSON/XML/SQL); Fingerprint Results.]

WhatWeb’s architecture centers on a plugin system where each of the 1800+ modules contains multiple independent tests that vote on technology presence with varying certainty levels. When you scan reddit.com, the output isn’t just pattern matching—it’s aggregated evidence:

$ ./whatweb reddit.com
http://reddit.com [301 Moved Permanently] Country[UNITED STATES][US], 
  HTTPServer[snooserv], IP[151.101.65.140], 
  RedirectLocation[https://www.reddit.com/], 
  UncommonHeaders[retry-after,x-served-by,x-cache-hits,x-timer], 
  Via-Proxy[1.1 varnish]
https://www.reddit.com/ [200 OK] 
  Cookies[edgebucket,eu_cookie_v2,loid,rabt,rseor3,session_tracker,token], 
  Country[UNITED STATES][US], 
  Email[banner@2x.png,snoo-home@2x.png], Frame, HTML5, 
  HTTPServer[snooserv], HttpOnly[token], IP[151.101.37.140], 
  Open-Graph-Protocol[website], Script[text/javascript], 
  Strict-Transport-Security[max-age=15552000; includeSubDomains; preload], 
  Title[reddit: the front page of the internet]

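This scan at aggression level 1 makes one request per target and follows the redirect from http://reddit.com to https://www.reddit.com/, which is why two result lines appear. From those responses it extracts cookies, headers, security policies, geographic data, and technology markers. Each detection comes from a separate plugin (HTTPServer, Cookies, Open-Graph-Protocol, and so on) running pattern matches against the response. Not every match is meaningful: the Email entries here are image filenames such as banner@2x.png that happen to look like addresses, so treat the output as evidence to review rather than as confirmed facts.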
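To make the structure of that output concrete, here is a minimal Python sketch that splits one brief-format result line into a status and a plugin-to-values mapping. It is not part of WhatWeb; the function name `parse_brief` and the regular expressions are assumptions for illustration. It also assumes each result is on a single line without terminal colour codes, and it will mis-parse values that themselves contain a `]`.

```python
import re

# "URL [status reason]" prefix of a brief-format line (assumed layout).
HEADER = re.compile(r"^(\S+) \[(\d{3}) ([^\]]*)\]\s*")
# One entry: a plugin name followed by zero or more [bracketed] values.
ENTRY = re.compile(r"([\w-]+)((?:\[[^\]]*\])*)(?:,\s*|$)")

def parse_brief(line):
    """Split one brief result line into url, status, and plugin values."""
    header = HEADER.match(line)
    if header is None:
        raise ValueError("not a brief-format result line")
    url, status, reason = header.groups()
    plugins = {}
    for name, brackets in ENTRY.findall(line[header.end():]):
        # A plugin with no brackets (e.g. "Frame") maps to an empty list.
        plugins[name] = re.findall(r"\[([^\]]*)\]", brackets)
    return {"url": url, "status": int(status), "reason": reason, "plugins": plugins}

# First result from the reddit.com scan above, joined onto one line.
SAMPLE = ("http://reddit.com [301 Moved Permanently] Country[UNITED STATES][US], "
          "HTTPServer[snooserv], IP[151.101.65.140], "
          "RedirectLocation[https://www.reddit.com/], "
          "UncommonHeaders[retry-after,x-served-by,x-cache-hits,x-timer], "
          "Via-Proxy[1.1 varnish]")

if __name__ == "__main__":
    result = parse_brief(SAMPLE)
    for plugin, values in result["plugins"].items():
        print(plugin, values)
```

For anything beyond a quick look, the structured log formats are a sturdier input than scraping the brief output this way.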

The aggression-level system changes how much traffic the scanner sends. Level 1 (‘Stealthy’) makes one HTTP request per target and follows redirects, which suits OSINT on public websites where you want to blend in with ordinary traffic. Level 3 (‘Aggressive’) triggers additional requests if a level 1 plugin is matched, while level 4 (‘Heavy’) attempts the aggressive tests from all plugins for all URLs. A WordPress plugin at higher aggression levels might request /wp-login.php, /wp-content/plugins/, and favicon files, then compare the favicon’s hash against known WordPress versions. This conditional branching means aggressive scans generate substantially more traffic but can pin down versions that a single response leaves ambiguous.
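The branching described above can be sketched in a few lines of Python. This is an illustration of the idea, not WhatWeb’s implementation: the `PluginSketch` class, the `planned_requests` function, and the Joomla path are hypothetical, and redirects are ignored for simplicity.

```python
from dataclasses import dataclass, field

@dataclass
class PluginSketch:
    """Hypothetical stand-in for a plugin: a passive result plus extra paths."""
    name: str
    passive_hit: bool                      # did a level 1 test match the first response?
    aggressive_paths: list = field(default_factory=list)

def planned_requests(base_url, plugins, level):
    """Return the URLs a scan would fetch at the given aggression level."""
    urls = [base_url]                      # every level starts with the base request
    root = base_url.rstrip("/")
    for plugin in plugins:
        # Level 3: only plugins that already matched passively probe further.
        # Level 4: every plugin's aggressive paths are attempted.
        if level >= 4 or (level == 3 and plugin.passive_hit):
            urls.extend(root + path for path in plugin.aggressive_paths)
    return urls

PLUGINS = [
    PluginSketch("WordPress", True, ["/wp-login.php", "/favicon.ico"]),
    PluginSketch("Joomla", False, ["/administrator/"]),  # illustrative path
]

if __name__ == "__main__":
    for level in (1, 3, 4):
        print(level, planned_requests("https://target.example/", PLUGINS, level))
```

The request count grows from one URL at level 1 to one per aggressive path at level 4, which is the traffic trade-off the levels encode.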

The plugin architecture itself uses Ruby’s flexibility for pattern matching. Each plugin can define regex patterns, file existence checks, MD5 hashes, HTTP header analysis, and custom Ruby code. WhatWeb supports fuzzy matching and certainty awareness—plugins don’t just return boolean matches, they indicate confidence levels. This matters when multiple plugins detect conflicting CMS platforms; certainty scores determine which identification appears in output.
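As a rough sketch of certainty-aware aggregation, the Python below keeps the strongest evidence seen for each technology and ranks the results. The 0-100 scale, the rule of taking the maximum certainty, and the names `aggregate` and `rank` are assumptions for illustration; the source does not describe how WhatWeb combines scores internally.

```python
def aggregate(matches):
    """Combine individual test results into one record per technology.

    Each match is a dict with "plugin", an optional "certainty" (0-100,
    assumed to default to 100), and an optional "version".
    """
    combined = {}
    for match in matches:
        record = combined.setdefault(match["plugin"], {"certainty": 0, "versions": []})
        # Assumption: the strongest single piece of evidence wins.
        record["certainty"] = max(record["certainty"], match.get("certainty", 100))
        version = match.get("version")
        if version and version not in record["versions"]:
            record["versions"].append(version)
    return combined

def rank(combined):
    """Order technology names from most to least certain."""
    return sorted(combined, key=lambda name: combined[name]["certainty"], reverse=True)

# Hypothetical evidence from three separate tests against one response.
MATCHES = [
    {"plugin": "WordPress", "certainty": 25},                    # weak path hint
    {"plugin": "WordPress", "certainty": 100, "version": "5.8"}, # exact hash match
    {"plugin": "Drupal", "certainty": 25},                       # conflicting weak hint
]

if __name__ == "__main__":
    combined = aggregate(MATCHES)
    for name in rank(combined):
        print(name, combined[name])
```

With this input the weak Drupal hint is still reported, but it ranks below the WordPress identification, which is the behavior you want when evidence conflicts.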

Output flexibility reveals WhatWeb’s positioning as infrastructure tooling, not just a CLI utility. Beyond human-readable formats, it supports direct database integration with MongoDB and ElasticSearch, SQL INSERT statements for relational databases, and MagicTree XML format for penetration testing report workflows. This isn’t an accident: it’s designed for scanning multiple hosts and feeding results into analysis pipelines:

./whatweb --input-file targets.txt --log-json=results.json \
  --aggression 3
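Once a JSON log exists, a short script can summarize it. The Python sketch below counts how many targets each plugin fired on. The log layout it expects (a JSON array of objects with `target`, `http_status`, and a `plugins` mapping) is an assumption, and the sample data is invented, so check it against a real results.json before relying on it.

```python
import json
from collections import Counter

# Invented sample in the assumed log layout; real logs carry more fields.
SAMPLE_LOG = """[
  {"target": "https://a.example/", "http_status": 200,
   "plugins": {"WordPress": {"version": ["5.8"]}, "HTTPServer": {"string": ["nginx"]}}},
  {"target": "https://b.example/", "http_status": 200,
   "plugins": {"HTTPServer": {"string": ["nginx"]}}}
]"""

def technology_counts(entries):
    """Count how many targets each plugin name appears on."""
    counts = Counter()
    for entry in entries:
        # Tolerate entries without a "plugins" key.
        counts.update((entry.get("plugins") or {}).keys())
    return counts

if __name__ == "__main__":
    entries = json.loads(SAMPLE_LOG)
    for name, count in technology_counts(entries).most_common():
        print(name, count)
```

To run it against a real scan, replace `json.loads(SAMPLE_LOG)` with `json.load(open("results.json"))`.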

The documentation notes that performance tuning lets you control how many websites are scanned concurrently. However, the README warns that with high thread counts, cookie handling may hurt performance; the documented workaround is --no-cookies, which trades session tracking for parallelism.
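One way to keep that trade-off explicit in a pipeline is to build the command line in code. The Python sketch below assembles an argument list and adds --no-cookies once the thread count passes a cutoff. The helper name `build_scan_command` and the cutoff of 50 are arbitrary assumptions, and the sketch uses WhatWeb’s --max-threads option for the thread count.

```python
def build_scan_command(targets_file, json_log, aggression=1, threads=25,
                       cookie_cutoff=50):
    """Return an argv list for a batch scan (hypothetical helper).

    cookie_cutoff is an arbitrary illustrative threshold: at or above it,
    cookie handling is disabled in favor of parallelism.
    """
    argv = [
        "./whatweb",
        f"--input-file={targets_file}",
        f"--log-json={json_log}",
        f"--aggression={aggression}",
        f"--max-threads={threads}",
    ]
    if threads >= cookie_cutoff:
        argv.append("--no-cookies")
    return argv

if __name__ == "__main__":
    print(" ".join(build_scan_command("targets.txt", "results.json",
                                      aggression=3, threads=100)))
```

The returned list can be passed directly to `subprocess.run` without shell quoting concerns.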

Protocol handling shows maturity from real-world deployment. WhatWeb supports dual-protocol scanning where simple hostnames automatically test both HTTP and HTTPS. It chains through proxies, including Tor, for anonymized scanning; handles HTTP redirects with configurable follow behavior; supports custom headers for bypassing basic protections; and includes IDN (Internationalized Domain Name) support for non-ASCII domains. These aren’t headline features. They’re the friction points you discover after scanning diverse production environments.
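The dual-protocol and IDN behavior can be approximated in a few lines. This Python sketch expands a bare hostname into both schemes and converts non-ASCII hostnames to their ASCII (Punycode) form using the standard library’s `idna` codec. The function name `expand_target` is hypothetical, and WhatWeb’s own normalization may differ in detail.

```python
def expand_target(target):
    """Turn one user-supplied target into the list of URLs to scan.

    Targets that already carry a scheme are returned unchanged. Bare
    hostnames are IDN-encoded and expanded to both http and https.
    """
    if "://" in target:
        return [target]
    host, slash, rest = target.partition("/")
    # The stdlib "idna" codec maps non-ASCII labels to their xn-- form
    # and leaves plain ASCII hostnames as they are.
    ascii_host = host.encode("idna").decode("ascii")
    return [f"{scheme}://{ascii_host}{slash}{rest}" for scheme in ("http", "https")]

if __name__ == "__main__":
    for target in ("reddit.com", "https://www.reddit.com/", "bücher.example"):
        print(target, "->", expand_target(target))
```

Keeping this expansion in one place makes it easy to see why a list of N hostnames can turn into 2N scanned URLs.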

Gotcha

Ruby as an implementation language creates deployment overhead compared to single-binary Go tools. You need a Ruby runtime, gem dependencies, and the full plugin directory structure—fine for Kali Linux where it ships pre-installed, awkward for containerized scanning infrastructure or air-gapped environments where you’re copying tools onto jump boxes.

The aggression-level system’s power becomes a liability on modern infrastructure. Aggressive scans (levels 3-4) request many well-known paths in quick succession, a pattern that commercial WAFs are likely to flag, though the README does not document specific WAF behavior. The README also does not describe adaptive throttling or retry logic for rate limiting, so you would need to manage delays yourself or accept that aggressive scans against protected targets may be blocked. Scanning large IP ranges at high aggression can also generate enough traffic to trigger IDS/IPS alerts, which defeats the purpose in red team scenarios where you need to remain undetected until exploitation. The documentation acknowledges this by positioning aggressive modes for authorized penetration tests.
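Managing delays yourself could look like the following Python sketch, which wraps a per-target scan in a simple exponential backoff. Everything here is an assumption for illustration: `fetch` stands for whatever runs a scan of one target and returns an HTTP status, and treating 429 and 503 as throttling signals is a common convention, not documented WhatWeb behavior.

```python
import time

THROTTLE_STATUSES = (429, 503)  # assumed signals that the target is rate limiting

def next_delay(current, status, base=1.0, factor=2.0, cap=60.0):
    """Grow the delay after a throttled response, reset it otherwise."""
    if status in THROTTLE_STATUSES:
        return min(cap, max(base, current) * factor)
    return base

def scan_with_backoff(targets, fetch, sleep=time.sleep, base=1.0, cap=60.0):
    """Scan targets one at a time, pausing between them.

    fetch(target) is a caller-supplied callable returning an HTTP status.
    Throttled targets are recorded, not retried, to keep the sketch short.
    """
    delay = base
    results = {}
    for target in targets:
        status = fetch(target)
        results[target] = status
        delay = next_delay(delay, status, base=base, cap=cap)
        sleep(delay)
    return results
```

Passing `sleep` as a parameter keeps the pacing logic testable without waiting in real time.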

Verdict

Use WhatWeb if you need comprehensive web technology fingerprinting with tactical control over detection methods. It excels at authorized penetration testing reconnaissance, where aggressive modes can pin down versions, and at OSINT collection on public infrastructure, where stealthy mode keeps the footprint to a single request per target plus redirects. The 1800+ plugin library and multiple output formats make it valuable when feeding reconnaissance into automated analysis pipelines or compliance reporting workflows.

Skip it if you need lightweight tooling for casual reconnaissance (Wappalyzer or httpx are faster for basic checks), prefer compiled tools for large-scale internet scanning, or primarily scan infrastructure behind aggressive WAF protection, where the aggressive modes may trigger blocking before returning useful results. The Ruby dependency and HTTP-focused design also make it less suitable for environments where you need a single-binary scanner or require protocol coverage beyond web technologies.
