Testing Your Security Crawler Against Google’s Maze: A Deep Dive into security-crawl-maze
Hook
Most security crawlers discover less than 70% of the resources in a typical web application. Google built a maze to prove it—and to help you fix it.
Context
Web security scanners face a fundamental problem: you can’t test vulnerabilities in code paths you never discover. Traditional web crawlers built for search engines prioritize visible content and user-navigable links, but security scanners need something different entirely. They must find every possible resource reference—hidden forms, dynamically loaded scripts, edge-case HTML attributes, framework-specific routing patterns—because attackers will.
Before security-crawl-maze, teams building security tools had no standardized way to measure crawler completeness. Inspired by Cure53’s HTTPLeaks research, which demonstrated how browsers leak information through obscure resource loading mechanisms, Google formalized the problem into a comprehensive testbed. The maze contains hundreds of HTML documents, each demonstrating a different method of linking resources, from the obvious (a plain anchor href) to the arcane (HTTP Link headers, framework-specific templating). Every discoverable resource ends with a ‘.found’ suffix, providing ground truth for validation. This isn’t about finding vulnerabilities—it’s about ensuring your crawler sees the entire attack surface before vulnerability detection even begins.
Technical Insight
Security Crawl Maze uses a Flask-based architecture where the directory structure itself documents what’s being tested. The genius lies in its self-describing organization: a test case at html/body/script/src.html tests the src attribute of the <script> tag within the <body> element. This hierarchical naming convention makes the codebase immediately navigable and eliminates ambiguity about what each test validates.
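The convention can be read mechanically. A quick illustration of the idea (the helper function is my own, not part of the project):

```python
# Sketch of the maze's self-describing path convention:
# html/body/script/src.html tests the src attribute of a <script>
# tag nested under <body>. (describe_test_case is a hypothetical
# helper, not part of security-crawl-maze.)
def describe_test_case(path: str) -> str:
    parts = path.removesuffix(".html").split("/")
    *context, tag, attribute = parts
    return f"<{tag}> {attribute} attribute, nested under {'/'.join(context)}"

print(describe_test_case("html/body/script/src.html"))
# <script> src attribute, nested under html/body
```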
The Flask application serves static HTML files alongside an API endpoint that returns expected results in JSON format. When you run your crawler against the maze, you can programmatically validate completeness by comparing discovered resources against the ground truth:
import requests

# Run your crawler against the maze (your_crawler stands in for
# whatever crawler API you are testing)
crawler_results = your_crawler.crawl('http://localhost:8080')
discovered_urls = set(crawler_results.get_urls())

# Fetch ground truth from the maze API
response = requests.get('http://localhost:8080/fetch-expected-results')
expected_urls = set(response.json()['expected_resources'])

# Calculate coverage; intersect with the expected set so that extra
# URLs your crawler found can't inflate the percentage past 100%
covered = discovered_urls & expected_urls
missed_resources = expected_urls - discovered_urls
coverage_percent = (len(covered) / len(expected_urls)) * 100

print(f"Coverage: {coverage_percent:.2f}%")
print(f"Missed {len(missed_resources)} resources:")
for url in sorted(missed_resources):
    print(f"  - {url}")
The test cases cover scenarios that regularly trip up even sophisticated crawlers, particularly the gap between statically declared and dynamically generated framework content. An Angular test case might look like:
<!-- html/frameworks/angular/ng-href.html -->
<html ng-app="myApp">
  <head>
    <script src="https://ajax.googleapis.com/ajax/libs/angularjs/1.6.9/angular.min.js"></script>
  </head>
  <body>
    <div ng-controller="LinkController">
      <a ng-href="{{dynamicUrl}}">Dynamic Link</a>
    </div>
    <script>
      angular.module('myApp', [])
        .controller('LinkController', function($scope) {
          $scope.dynamicUrl = '/angular-resource.found';
        });
    </script>
  </body>
</html>
A naive crawler that doesn’t execute JavaScript will never discover /angular-resource.found. Even JavaScript-aware crawlers might fail if they don’t properly wait for Angular’s digest cycle or understand framework-specific templating syntax.
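To make the failure mode concrete, here is a sketch of what a purely static parse sees on a page like the one above, using only Python's standard-library html.parser (the markup is inlined and abbreviated for illustration):

```python
from html.parser import HTMLParser

# Collect every URL-bearing attribute a static (non-JS-executing)
# crawler would see on the page
class StaticLinkExtractor(HTMLParser):
    URL_ATTRS = {"href", "src", "ng-href"}

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in self.URL_ATTRS and value:
                self.urls.append(value)

page = """
<html ng-app="myApp"><head>
<script src="https://ajax.googleapis.com/ajax/libs/angularjs/1.6.9/angular.min.js"></script>
</head><body>
<a ng-href="{{dynamicUrl}}">Dynamic Link</a>
<script>
  angular.module('myApp', []).controller('LinkController', function($scope) {
    $scope.dynamicUrl = '/angular-resource.found';
  });
</script>
</body></html>
"""

extractor = StaticLinkExtractor()
extractor.feed(page)
print(extractor.urls)
# Static parsing yields only the angular.min.js URL and the
# unevaluated '{{dynamicUrl}}' template -- never '/angular-resource.found',
# which exists only as a string inside unexecuted JavaScript
```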
The Docker containerization process is particularly interesting. During build, the Dockerfile runs scripts that generate framework-specific test cases dynamically, compiling templates and transpiling modern JavaScript. This approach keeps the repository clean (storing templates rather than generated code) while ensuring the served content represents real-world framework behavior:
# Simplified excerpt from the build process
RUN python scripts/build_angular_tests.py && \
python scripts/build_polymer_tests.py && \
python scripts/compile_templates.py
The maze also tests HTTP header-based resource hints, which many crawlers ignore entirely. For example, one test case returns a response with Link: </header-resource.found>; rel=preload, testing whether the crawler recognizes resource hints outside HTML content. This reflects real-world scenarios where applications use HTTP/2 server push or preload hints for performance optimization—attack surface that exists outside the DOM.
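A crawler that wants to pick these up has to parse the header itself. A minimal sketch of extracting targets from a Link header value (this handles only the common comma-separated form; a production crawler should follow RFC 8288 properly):

```python
import re

def urls_from_link_header(header_value: str) -> list[str]:
    """Pull the <...> targets out of an HTTP Link header value.

    Simplified: assumes well-formed angle-bracketed targets and
    ignores the rel/as parameters. Use a full RFC 8288 parser in
    real code.
    """
    return re.findall(r"<([^>]+)>", header_value)

print(urls_from_link_header("</header-resource.found>; rel=preload"))
# ['/header-resource.found']
```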
Gotcha
The biggest limitation is that JavaScript framework test cases only work in the Docker containerized version. If you run the Flask app directly (which is simpler for development and debugging), you’ll only test static HTML scenarios. This creates a workflow friction: quick iteration requires the standalone Flask app, but comprehensive testing demands the full Docker build, which takes several minutes. For teams wanting to extend the maze with custom test cases, this split creates a decision point about whether new tests should work in both modes or accept the Docker dependency.
The maze also has a philosophical constraint worth understanding: it’s designed exclusively for code coverage, not content understanding or vulnerability detection. It won’t help you test whether your scanner can distinguish between a login form and a search form, or whether it properly handles authentication flows. You’ll discover that a resource exists, but testing what your scanner does with that resource requires different tools.

Additionally, the relatively low star count (165) suggests this is a specialized tool with a small community. Don’t expect extensive third-party documentation, plugin ecosystems, or active community contributions. It’s a Google-maintained testing tool, not a community-driven project, which means updates follow Google’s internal priorities rather than community requests.
Verdict
Use if: You’re building or maintaining a web security scanner, DAST tool, or penetration testing framework where comprehensive resource discovery is critical. Use it if you need a standardized, reproducible benchmark to compare crawler implementations or prove coverage improvements. It’s essential if you’re dealing with modern JavaScript frameworks and need to validate that your crawler handles dynamic content correctly.

Skip if: You’re building a traditional content-focused web crawler for search or archival purposes—the maze tests resource discovery, not content extraction or relevance. Skip it if you need a training environment for learning web vulnerabilities (use DVWA or WebGoat instead) or if you’re looking for runtime vulnerability detection rather than crawler path coverage. Also skip if you’re not prepared to invest in either a Docker-based testing workflow or acceptance that you’ll only test static HTML scenarios.