VHostScan: Finding Hidden Web Apps Behind Shared IP Addresses
Hook
A single IP address can host hundreds of web applications, each accessible only if you know the exact domain name. VHostScan finds them by exploiting how HTTP Host headers work—and it's smarter about catch-all responses than you might expect.
Context
Virtual hosting changed the internet by allowing multiple domains to share a single IP address. When your browser connects to a website, it includes a Host header telling the server which domain you're requesting. The server routes your request to the appropriate application based on this header. For penetration testers and security researchers, this architecture creates a problem: you might discover an IP address running a web server, but have no idea what other applications are hiding behind it.
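To make the routing concrete, here is a minimal sketch of the request a client sends. The function name `build_request` and the example hostnames are illustrative assumptions; the point is only that the TCP connection goes to the IP address while the Host line names the application.

```python
def build_request(hostname: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request for the given virtual host."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {hostname}",      # the only line the server uses to pick a vhost
        "Connection: close",
        "",                       # blank line terminates the header block
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


# Two requests sent to the same IP differ only in the Host line
shop = build_request("shop.example.com")
admin = build_request("admin.example.com")
```

Sending either byte string to the same IP and port is enough to reach a different application, which is exactly what a virtual host scanner varies.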
Traditional subdomain enumeration tools rely on DNS records—they ask nameservers what domains exist. But virtual hosts don't always have DNS records, especially in internal networks, development environments, or misconfigured servers. A developer might configure nginx to respond to 'admin.internal.company.com' without creating a DNS entry, assuming obscurity provides security. Bug bounty hunters and CTF players encounter this constantly: you find an IP, you know there's more to discover, but DNS enumeration comes up empty. VHostScan solves this by brute-forcing Host headers directly against the IP address, bypassing DNS entirely. More importantly, it handles the hardest part: distinguishing real virtual hosts from catch-all configurations that return the same default page for any domain you throw at them.
Technical Insight
VHostScan's architecture centers on content comparison rather than simple response code analysis. Most naive virtual host scanners just check HTTP status codes—if you get a 200 OK, they flag it as a valid host. This fails immediately on servers with catch-all configurations that return 200 for every domain, serving a default page regardless of the Host header. VHostScan takes a different approach: it establishes a baseline by requesting nonsense hostnames, then compares subsequent responses against this baseline using content similarity algorithms.
The tool's detection logic works in layers. First, it makes a baseline request with a random Host header to capture what the default response looks like. Then it iterates through your wordlist, making requests for each potential hostname. For each response, it calculates a similarity ratio comparing the content to the baseline. If the content differs significantly, you've found a real virtual host. Here's the basic request flow:
# Simplified conceptual example (not actual VHostScan code)
import requests
from difflib import SequenceMatcher


def scan_vhost(ip, port, hostname):
    """Request the site root from ip:port using a custom Host header."""
    headers = {'Host': hostname}
    url = f'http://{ip}:{port}/'
    try:
        response = requests.get(url, headers=headers, timeout=10)
    except requests.RequestException:
        # Connection errors and timeouts are treated as "no result"
        return None
    return {
        'hostname': hostname,
        'status': response.status_code,
        'content': response.text,
        'length': len(response.content),
    }


def compare_responses(baseline, candidate):
    """Return True if the candidate looks different from the baseline."""
    # A different status code is enough on its own
    if baseline['status'] != candidate['status']:
        return True
    # Use fuzzy matching to tolerate dynamic content
    similarity = SequenceMatcher(
        None,
        baseline['content'],
        candidate['content'],
    ).ratio()
    return similarity < 0.95  # Below the threshold: likely a real vhost
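The baseline-then-wordlist flow described above can be sketched end to end without touching the network. This is an illustrative sketch, not VHostScan's code: `find_vhosts`, `differs` and `fake_fetch` are hypothetical names, and the fetch function is injected so a real HTTP client could be swapped in.

```python
import uuid
from difflib import SequenceMatcher
from typing import Callable, Iterable, List, Optional

Response = dict  # expected keys: 'status' (int) and 'content' (str)


def differs(baseline: Response, candidate: Response, threshold: float = 0.95) -> bool:
    """True when the candidate does not look like the baseline page."""
    if baseline["status"] != candidate["status"]:
        return True
    ratio = SequenceMatcher(
        None, baseline["content"], candidate["content"], autojunk=False
    ).ratio()
    return ratio < threshold


def find_vhosts(
    fetch: Callable[[str], Optional[Response]],
    words: Iterable[str],
    domain: str,
) -> List[str]:
    """Return hostnames whose response differs from a nonsense-host baseline."""
    # A random label is very unlikely to be a configured virtual host
    baseline = fetch(f"{uuid.uuid4().hex}.{domain}")
    if baseline is None:
        raise RuntimeError("could not fetch a baseline response")
    found = []
    for word in words:
        hostname = f"{word}.{domain}"
        candidate = fetch(hostname)
        if candidate is not None and differs(baseline, candidate):
            found.append(hostname)
    return found


def fake_fetch(hostname: str) -> Response:
    """Stand-in server: one real vhost, a catch-all page for everything else."""
    if hostname == "admin.example.com":
        return {"status": 200, "content": "<h1>Admin console</h1>"}
    return {"status": 200, "content": "<h1>Default site</h1>"}


hits = find_vhosts(fake_fetch, ["www", "admin", "dev"], "example.com")
```

Every hostname returns 200 here, so a status-code check alone would report all three; only the content comparison singles out the admin host.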
The fuzzy matching component is what separates VHostScan from simpler tools. Modern web applications inject timestamps, CSRF tokens, or random identifiers into their pages. A naive byte-for-byte comparison would flag these as different hosts even when they're serving the same default page. Fuzzy matching, switched on with the --fuzzy-logic flag (off by default), compares pages by similarity instead, in the same spirit as the difflib SequenceMatcher ratio in the conceptual example above. A page that is more than 95% similar to the default and differs only in a footer timestamp is then treated as the same catch-all page.
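To see why a similarity ratio tolerates dynamic content, the sketch below compares two renderings of the same default page that differ only in a timestamp, then compares the default page with a genuinely different one. The page bodies and the `similarity` helper are made up for illustration; `autojunk=False` is passed so difflib's heuristic for long sequences does not skew the ratio.

```python
from difflib import SequenceMatcher

DEFAULT_PAGE = (
    "<html><body><h1>Welcome</h1>"
    "<p>This is the default site for this server.</p>"
    "<footer>Generated at {stamp}</footer></body></html>"
)
ADMIN_PAGE = "<html><body><h1>Admin login</h1><form>...</form></body></html>"


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1]; 1.0 means the strings are identical."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


first = DEFAULT_PAGE.format(stamp="2024-01-01 10:00:00")
second = DEFAULT_PAGE.format(stamp="2024-01-01 10:00:07")

same_page = similarity(first, second)        # only the timestamp differs
other_page = similarity(first, ADMIN_PAGE)   # genuinely different content
```

With a 0.95 threshold, `same_page` lands above the cutoff and `other_page` well below it, so the timestamp alone does not produce a false positive.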
The reverse lookup feature adds another dimension. Unless you disable it with --no-lookups, VHostScan performs reverse (PTR) lookups on the target IP and appends any hostnames it finds to the wordlist (the -oN flag, by contrast, only selects normal-format output). This creates a feedback loop: you start with an IP, reverse DNS yields a few hostnames, and those are scanned alongside your wordlist entries, which can in turn suggest naming patterns for the next wordlist. It can be particularly useful in cloud and internal environments where reverse DNS reveals naming conventions you can build on.
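A reverse lookup step of this kind can be sketched with the standard library. This is an assumption-laden illustration rather than VHostScan's implementation: `reverse_lookup` and `extend_wordlist` are hypothetical names, and the resolver is injectable so the example can run without DNS access.

```python
import socket
from typing import Callable, List


def reverse_lookup(ip: str, resolver: Callable = socket.gethostbyaddr) -> List[str]:
    """Return hostnames from a reverse lookup, or an empty list on failure."""
    try:
        primary, aliases, _addresses = resolver(ip)
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError
        return []
    return [primary, *aliases]


def extend_wordlist(
    wordlist: List[str], ip: str, resolver: Callable = socket.gethostbyaddr
) -> List[str]:
    """Append newly discovered hostnames, preserving order and skipping duplicates."""
    merged = list(wordlist)
    for name in reverse_lookup(ip, resolver):
        if name not in merged:
            merged.append(name)
    return merged


def fake_resolver(ip: str):
    """Stand-in for DNS: mimics the (name, aliases, addresses) tuple shape."""
    return ("web01.corp.example", ["intranet.corp.example"], [ip])


candidates = extend_wordlist(["www", "web01.corp.example"], "10.0.0.5", fake_resolver)
```

A failed lookup simply leaves the wordlist unchanged, so the step is safe to run against targets that have no PTR record.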
Pivoting support shows that VHostScan was built with real-world penetration tests in mind. The -r flag (the real port) exists because of the Host header rules in RFC 2616, section 14.23: the header can include a port number, and that port does not have to match the port you are actually connecting to. This matters when you've pivoted through an SSH tunnel:
# You've tunneled the target's port 80 to your local 8080
ssh -L 8080:internal-server:80 pivot-host
# Scan through the tunnel, but tell VHostScan the real port
python VHostScan.py -t localhost -p 8080 -r 80 -w wordlist.txt
VHostScan connects to localhost:8080 but builds the Host header for port 80, which is what the internal server expects, instead of advertising the tunnel's port 8080. Without this feature, you'd need to modify the application itself or use complex proxy configurations.
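HTTP treats a Host header without a port as implying the default port for the scheme, so one plausible way to build the header is to append the port only when it is non-default. Whether VHostScan formats the value exactly this way is an assumption; `host_header` is a hypothetical helper.

```python
def host_header(hostname: str, real_port: int, ssl: bool = False) -> str:
    """Build a Host header value for the port the server believes it is on."""
    default_port = 443 if ssl else 80
    if real_port == default_port:
        # A bare hostname already implies the default port
        return hostname
    return f"{hostname}:{real_port}"


# Connecting through a tunnel on 8080 while the server really listens on 80
tunneled = host_header("intranet.example.com", real_port=80)

# Without the override, the tunnel's local port would leak into the header
leaky = host_header("intranet.example.com", real_port=8080)
```

The second value is the failure mode the real-port option avoids: a server configured for port 80 may not match a Host header that names port 8080.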
The output format flexibility shows the tool was built by someone who actually chains pentesting tools together. The -oG flag produces grepable output that is easy to pipe to other tools, and the -oJ flag writes JSON with response metadata for custom analysis pipelines. You can also feed wordlists via STDIN, which means you can chain VHostScan after other discovery tools in a single pipeline without writing intermediate files.
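The pipeline idea can be sketched as two small helpers: one that reads candidate names from any text stream (standing in for STDIN) and one that emits findings as one JSON object per line. Both function names and the finding fields are illustrative assumptions and do not reflect VHostScan's actual output schema.

```python
import io
import json
from typing import Iterable, List, TextIO


def read_wordlist(stream: TextIO) -> List[str]:
    """Read one candidate per line, dropping blanks and duplicates."""
    seen = set()
    words = []
    for line in stream:
        word = line.strip()
        if word and word not in seen:
            seen.add(word)
            words.append(word)
    return words


def to_json_lines(findings: Iterable[dict]) -> str:
    """Serialize findings as one JSON object per line for downstream tools."""
    return "\n".join(json.dumps(finding, sort_keys=True) for finding in findings)


# In a real pipeline the stream would be sys.stdin
words = read_wordlist(io.StringIO("www\nadmin\n\nwww\n"))
report = to_json_lines([{"host": "admin.example.com", "status": 200}])
```

Line-oriented output like this is what makes tools composable: each line can be filtered with grep or parsed independently.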
Gotcha
VHostScan's biggest limitation is speed. The codebase doesn't implement concurrent requests—it's a sequential scanner that makes one HTTP request, waits for the response, compares it, then moves to the next entry. On a wordlist with 10,000 entries, even with fast response times, you're looking at significant scan duration. The --rate-limit flag exists to slow things down further when you need stealth, but there's no --threads flag to speed things up. For large-scale scanning, tools like ffuf or gobuster will finish in a fraction of the time, though you'll sacrifice VHostScan's sophisticated catch-all detection.
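A back-of-the-envelope estimate shows how sequential scanning adds up. The helper below is a hypothetical sketch that assumes each request costs the average response time plus a fixed per-request delay; the numbers are examples, not measurements.

```python
def estimated_scan_seconds(
    entries: int, avg_response_s: float, delay_s: float = 0.0
) -> float:
    """Estimate the duration of a sequential scan with one request in flight."""
    return entries * (avg_response_s + delay_s)


# 10,000 entries at 100 ms each: roughly 1,000 seconds, about 17 minutes
fast_target = estimated_scan_seconds(10_000, 0.1)

# Adding a 1-second delay per request for stealth: roughly 3 hours
stealthy = estimated_scan_seconds(10_000, 0.1, delay_s=1.0)
```

Because the cost is linear in wordlist size, trimming the wordlist to plausible names is the main lever available when requests cannot be parallelized.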
The fuzzy matching, while powerful, still leaves manual work that can be frustrating. It handles straightforward cases, but highly dynamic applications (single-page apps that fetch different content via JavaScript, or load balancers that inject varied headers) can produce both false positives and false negatives. You'll find yourself re-running scans with different options and manually reviewing results. The --unique-depth parameter sets how many times a page's content may recur across responses and still be reported as a likely match (default 1), but finding the right value requires understanding your target's behavior. There's no automatic calibration, so you need to experiment.
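A related way to separate real hosts from a catch-all is to count how often identical content recurs across responses and keep only the rare pages. The sketch below illustrates that idea; the `likely_matches` name, the SHA-256 hashing and the `unique_depth` parameter are illustrative assumptions, not VHostScan's code.

```python
import hashlib
from collections import Counter
from typing import Dict, List


def likely_matches(responses: Dict[str, str], unique_depth: int = 1) -> List[str]:
    """Return hosts whose page content appears at most unique_depth times."""
    digests = {
        host: hashlib.sha256(body.encode("utf-8")).hexdigest()
        for host, body in responses.items()
    }
    counts = Counter(digests.values())
    return [host for host, digest in digests.items() if counts[digest] <= unique_depth]


default = "<h1>Default site</h1>"
responses = {
    "www.example.com": default,
    "dev.example.com": default,
    "mail.example.com": default,
    "admin.example.com": "<h1>Admin console</h1>",
}

rare = likely_matches(responses)                     # content seen only once
loose = likely_matches(responses, unique_depth=3)    # tolerate up to 3 repeats
```

Exact hashing is strict, so a single changing byte makes every page unique; that is the gap fuzzy comparison is meant to close.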
WAFs and rate limiting can shut you down quickly. Because VHostScan makes sequential requests from the same IP to the same target with varying Host headers, it creates an obvious signature. Cloud WAFs like Cloudflare or AWS WAF can detect this pattern and may rate-limit or block you well before a large wordlist completes. The tool's evasion options are basic: the --waf flag sends simple WAF-bypass headers and --random-agent varies the User-Agent, but there is no request jitter beyond the fixed rate limit. For bug bounty programs with robust infrastructure protection, you might get a few results before being blocked, which makes the tool better suited to CTF environments or internal network assessments where defensive measures are minimal.
Verdict
Use if: You're performing penetration tests or CTF challenges where you need to discover virtual hosts on servers with catch-all configurations or dynamic default pages. VHostScan excels in OSCP lab environments, HackTheBox machines, and internal corporate networks where defensive measures are light and reverse DNS lookups can seed further discovery. It's ideal when you're pivoting through compromised hosts and need the real-port functionality, or when you're chaining tools together and need flexible input/output formats. Choose this when accuracy matters more than speed and you're willing to experiment with options such as --fuzzy-logic and --unique-depth.
Skip if: You need high-speed scanning of large wordlists, where concurrent tools such as gobuster or ffuf will typically finish far sooner. Avoid it for bug bounty programs on infrastructure with WAFs, where the sequential scanning pattern is likely to get you blocked. Don't use this for basic subdomain enumeration where passive DNS tools like subfinder or certificate transparency searches would work better. If your target doesn't have catch-all configurations and you just need to test a few obvious hostnames, the sophisticated detection logic is overkill; curl in a bash loop would suffice.