VHostScan: Fuzzy Logic and Virtual Host Discovery for Penetration Testing

Hook

Most virtual host scanners give up when they hit a catch-all configuration that returns valid responses for every subdomain. VHostScan uses fuzzy logic to find the needles in that haystack.

Context

Virtual host enumeration is a fundamental reconnaissance technique in web application security testing. The problem is simple: multiple websites can run on a single IP address, differentiated only by the Host header in HTTP requests. To discover these hidden sites, you send requests with different Host values and analyze the responses. But here’s where traditional tools break down.

Modern web applications increasingly use catch-all configurations that return valid HTTP 200 responses for any subdomain, often with dynamically generated content that changes on each request. This happens with cloud platforms, CDNs, and frameworks that implement custom routing logic. When every request returns a 200 status with slightly different content, simple string comparison fails. You need intelligent differentiation between legitimately unique virtual hosts and catch-all noise. VHostScan emerged from real-world penetration testing needs, specifically targeting CTF environments like HackTheBox and bug bounty programs where virtual host discovery often reveals admin panels, staging environments, and development servers that weren’t meant to be public.

Technical Insight

[Figure: system architecture (auto-generated). Components shown: CLI input, argument parser, request scanner, reverse DNS lookup, HTTP/HTTPS requester, response analyzer, fuzzy matcher (SequenceMatcher similarity comparison against a threshold), filter decision (valid vhost detected or result discarded), and output formatter (JSON, grepable, or normal results).]

VHostScan’s architecture centers on response similarity detection using fuzzy string matching. When you enable fuzzy logic mode with the --fuzzy-logic flag, the tool sends a baseline request to fingerprint the catch-all response, then compares each subsequent response against that fingerprint using the SequenceMatcher class from difflib in Python’s standard library. The result is a similarity ratio between the baseline response and each scanned response.
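To make the similarity ratio concrete, here is a minimal sketch of the comparison using difflib directly. It is not VHostScan's own code: the function name similarity_percent and the three response bodies are made up for illustration.

```python
from difflib import SequenceMatcher


def similarity_percent(baseline, candidate):
    """Return how similar two response bodies are, from 0.0 to 100.0."""
    return SequenceMatcher(None, baseline, candidate).ratio() * 100


# Hypothetical response bodies: two catch-all pages that differ only in an
# embedded timestamp, and one genuinely different login page.
catch_all_a = "<html><body><h1>Welcome</h1><p>Served at 12:00:01</p></body></html>"
catch_all_b = "<html><body><h1>Welcome</h1><p>Served at 12:00:07</p></body></html>"
admin_page = "<html><body><form action='/login'><input name='user'></form></body></html>"

print(f"catch-all vs catch-all: {similarity_percent(catch_all_a, catch_all_b):.1f}%")
print(f"catch-all vs admin:     {similarity_percent(catch_all_a, admin_page):.1f}%")
```

The two catch-all pages score close to 100% despite the changed timestamp, while the login page scores noticeably lower. That gap is what a threshold can separate.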

Here’s a practical example of how you’d scan a target with catch-all detection:

# Basic scan with fuzzy logic enabled
python VHostScan.py -t example.com -w wordlists/virtual-host-scanning.txt --fuzzy-logic

# With custom similarity threshold (default is 50%)
python VHostScan.py -t example.com -w wordlist.txt --fuzzy-logic --fuzzy-logic-tolerance 70

# Reverse DNS lookups run by default and append discovered hosts to the wordlist; pass --no-lookups to disable them
python VHostScan.py -t 192.168.1.100 -w wordlist.txt --fuzzy-logic

The fuzzy logic tolerance setting is critical. A value of 70 means a response must differ from the baseline by at least 30% to be considered unique; anything more similar is treated as catch-all noise. Set it too high, and minor variations in dynamic content start to count as unique hosts, producing false positives. Set it too low, and you discard valid virtual hosts whose layouts resemble the catch-all page. The default 50% works well for most scenarios, but cloud platforms with heavy template reuse often need 70-80%.
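The effect of the tolerance can be sketched with a few made-up similarity scores. This assumes the reading where a tolerance of 70 means a response counts as unique once it differs from the baseline by at least 30%. The function is_unique, the boundary handling, and the scores are illustrative assumptions, not the tool's internals.

```python
def is_unique(similarity, tolerance):
    """Treat a response as a distinct virtual host when its similarity to the
    catch-all baseline is at or below the tolerance (both in percent).
    The inclusive boundary is an assumption made for this sketch."""
    return similarity <= tolerance


# Hypothetical similarity scores against the baseline, in percent.
scores = {
    "catch-all with a new timestamp": 97.0,
    "staging site sharing the template": 65.0,
    "unrelated admin panel": 20.0,
}

for tolerance in (50, 70, 99):
    unique = [name for name, score in scores.items() if is_unique(score, tolerance)]
    print(f"tolerance {tolerance}: {unique}")
```

At 50 the staging site is discarded along with the noise, at 70 it is reported, and at 99 even the timestamp-only variation is reported as a new host.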

The tool’s pivoting capability solves a specific problem in penetration testing: when you reach an internal network through a compromised host or VPN, the port you connect to locally is often not the port the web server really listens on, yet the Host header must still match what the server expects for virtual host routing to work. VHostScan handles this by separating the port it connects to (-p) from the real port it puts in the Host header (-r), with -b naming the base host:

# Scanning through an SSH tunnel on local port 8080 that forwards to remote port 80
python VHostScan.py -t localhost -b internal.example.com -p 8080 -r 80 -w wordlist.txt

This sends requests to localhost:8080 while building Host headers for internal.example.com as if the traffic were going straight to port 80, ensuring proper routing on the target server.
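The split between where a request is sent and what the Host header claims can be sketched as a small helper. The function build_request and its parameter names are hypothetical, and always appending the port to the Host header is a simplification for the example.

```python
def build_request(connect_host, connect_port, vhost, real_port, use_ssl=False):
    """Return the URL to connect to and the headers to send.

    connect_host/connect_port describe the local end of the tunnel, while
    vhost/real_port describe what the target server expects to see.
    """
    scheme = "https" if use_ssl else "http"
    url = f"{scheme}://{connect_host}:{connect_port}/"
    headers = {"Host": f"{vhost}:{real_port}"}
    return url, headers


url, headers = build_request("localhost", 8080, "dev.internal.example.com", 80)
print(url)      # http://localhost:8080/
print(headers)  # {'Host': 'dev.internal.example.com:80'}
```

An HTTP client would then request that URL with those headers, so the TCP connection goes through the tunnel while the server routes on the Host value.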

The reverse lookup feature adds another dimension. When enabled, VHostScan performs reverse DNS queries on the target IP, extracts discovered hostnames, and automatically adds related permutations to the wordlist. If reverse DNS reveals mail.company.com, the tool generates variations like mail-staging, mail-dev, mail-admin by combining the discovered prefix with common suffixes. This creates a feedback loop where DNS information expands the attack surface during the scan.
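A rough sketch of that expansion step follows, using only the standard library. The helper names, the suffix list, and the prefix-suffix naming scheme are assumptions for illustration rather than the tool's actual implementation.

```python
import socket


def reverse_lookup(ip):
    """Return hostnames from a reverse DNS lookup, or an empty list on failure."""
    try:
        primary, aliases, _addresses = socket.gethostbyaddr(ip)
    except OSError:  # covers socket.herror and socket.gaierror
        return []
    return [primary, *aliases]


def expand_hostname(hostname, suffixes=("staging", "dev", "admin")):
    """Combine the first label of a hostname with common suffixes."""
    prefix = hostname.split(".", 1)[0]
    return [f"{prefix}-{suffix}" for suffix in suffixes]


# Expanding a hostname that a lookup might have returned.
print(expand_hostname("mail.company.com"))  # ['mail-staging', 'mail-dev', 'mail-admin']
```

In a scan, each name returned by reverse_lookup for the target IP would be passed through expand_hostname and the results appended to the wordlist.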

Wordlist handling demonstrates practical penetration testing experience. The repository includes curated wordlists like hackthebox-basic.txt and virtual-host-scanning.txt that prioritize high-probability targets. The wordlists support variable substitution with %s placeholders:

admin-%s
%s-admin
staging-%s
dev-%s
test-%s

When scanning with -t example.com, these expand to admin-example, example-admin, staging-example, etc. This pattern-based approach reduces wordlist size while maintaining coverage.
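The substitution itself amounts to a string replace over each pattern. In this sketch, expand_patterns is a hypothetical helper, and the value filled into %s is whatever label the caller supplies. How the tool derives that value from the target is not shown here.

```python
def expand_patterns(patterns, label):
    """Replace every %s placeholder in each wordlist pattern with the label."""
    return [pattern.replace("%s", label) for pattern in patterns]


patterns = ["admin-%s", "%s-admin", "staging-%s", "dev-%s", "test-%s"]
for candidate in expand_patterns(patterns, "example"):
    print(candidate)
```

Entries without a placeholder pass through unchanged, so static names and patterns can share one wordlist.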

Output formatting caters to different workflows. Standard output is human-readable, and the nmap-style -oN, -oG, and -oJ options write normal, grepable, and JSON results to a file. Grepable output suits parsing with awk or grep, while JSON enables integration with other tools in automated pipelines. The tool also reads wordlist entries from STDIN, allowing chained operations like:

cat candidates.txt | python VHostScan.py -t example.com --fuzzy-logic -oJ results.json

VHostScan also lets you change how its requests look. The --user-agent flag sets a custom User-Agent string, and --random-agent picks one from a predefined list. For WAF bypass scenarios, the --waf flag adds a set of simple bypass headers to every request.

Gotcha

The single-threaded architecture is VHostScan’s most significant limitation. The --rate-limit flag keeps it from overwhelming targets, but there’s no concurrent scanning option. For wordlists with thousands of entries, scan times run to minutes or hours rather than seconds. Alternatives like ffuf, which issue requests concurrently from many goroutines, typically finish the same wordlist many times faster. If you have 10,000+ candidate hostnames to test across large infrastructure, VHostScan’s thoroughness comes at a steep time cost.

Fuzzy logic accuracy depends entirely on response consistency. If the target application changes large parts of the page on each request (rotating product listings, embedded session tokens, timestamps), the similarity comparison breaks down. You’ll need to adjust tolerance values through trial and error, and even then you might miss valid hosts or get false positives. The tool provides no automatic calibration or machine learning approach to optimize the threshold. Single-page applications that render their content client-side also confuse the comparison, since VHostScan only analyzes the initial server response, not JavaScript-rendered content.

Additionally, the tool focuses exclusively on HTTP Host header manipulation and doesn’t enumerate TLS SNI values, so you’ll miss virtual hosts that only respond to HTTPS requests carrying the matching SNI. For subdomain takeover scenarios or CNAME-based virtual hosting, you’ll need different tooling.

Verdict

Use if: You’re conducting targeted penetration tests or CTF challenges where you’ve hit a catch-all virtual host configuration and need intelligent differentiation between legitimate hosts and noise. The fuzzy logic and reverse DNS features make this ideal for HackTheBox machines, OSCP exam environments, or bug bounty programs with complex routing. It’s also valuable when working through pivots and tunnels where you need precise port and header control. The curated wordlists and pattern substitution save time on focused engagements.

Skip if: You need high-speed subdomain enumeration across large attack surfaces, where tools like ffuf or gobuster will finish in a fraction of the time. Also skip if you’re primarily doing DNS-based subdomain discovery rather than virtual host enumeration, or if your targets use client-side rendering that VHostScan’s HTML comparison can’t properly analyze. For production environments with strict rate limiting or IDS/IPS monitoring, the single-threaded scanning is actually an advantage, but for internal network sweeps or time-boxed assessments, the speed penalty makes it impractical.
