Dufflebag: How Bishop Fox Built a Distributed Secret Scanner for Public AWS Snapshots

Hook

Every day, AWS users accidentally make their EBS snapshots public—and with them, database credentials, API keys, and SSH private keys become searchable by anyone who knows where to look.

Context

Cloud misconfigurations are the gift that keeps on giving for security researchers and threat actors alike. One particularly insidious misconfiguration is the public EBS snapshot: a point-in-time copy of an EC2 instance's storage that users sometimes expose publicly, either through accident or misunderstanding of AWS's sharing model. These snapshots can contain complete filesystem images with application code, configuration files, database backups, and—most critically—secrets that developers never intended to expose.

The challenge isn't finding public snapshots (AWS makes enumeration trivial), but systematically processing them at scale. Each snapshot must be cloned to your account, converted to a volume, attached to a running EC2 instance, mounted with the correct filesystem type, recursively scanned for interesting files, and then cleaned up to avoid runaway costs. Doing this manually for even a dozen snapshots is tedious; doing it for thousands requires automation. Bishop Fox's Dufflebag solves this operational nightmare by treating EBS snapshot scanning as a distributed work queue problem, leveraging AWS Elastic Beanstalk's worker tier to parallelize the entire pipeline.

Technical Insight

Dufflebag's architecture is elegant in its use of AWS-native services to handle the heavy lifting. At its core, it's an Elastic Beanstalk worker application written in Go that processes messages from an SQS queue. Each message represents a single EBS snapshot ID to scan. The worker tier automatically scales EC2 instances based on queue depth, meaning you can throw 10,000 snapshot IDs into the queue and Elastic Beanstalk will spin up workers to process them in parallel, subject to your AWS service limits.
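To make the worker model concrete: in an Elastic Beanstalk worker environment, a daemon on each instance reads messages from the SQS queue and delivers each one to the application as a local HTTP POST. A 200 response tells the daemon to delete the message, and any other status leaves it in the queue for another attempt. The following is a minimal sketch of such a handler, not Dufflebag's actual code. It assumes the message body is just the snapshot ID as plain text, and the names handleMessage, parseSnapshotID, and workerHandler are hypothetical.

```go
package main

import (
	"errors"
	"io"
	"net/http"
	"strings"
)

// errNotSnapshot marks a message body that does not look like a snapshot ID.
var errNotSnapshot = errors.New("message is not a snapshot ID")

// parseSnapshotID trims the message body and checks for the "snap-" prefix.
// Assumption: the body holds nothing but the snapshot ID.
func parseSnapshotID(body string) (string, error) {
	id := strings.TrimSpace(body)
	if !strings.HasPrefix(id, "snap-") || len(id) == len("snap-") {
		return "", errNotSnapshot
	}
	return id, nil
}

// handleMessage processes one queue message and returns the HTTP status
// to send back. Only 200 causes the message to be deleted from the queue.
func handleMessage(body string, scan func(snapshotID string) error) int {
	id, err := parseSnapshotID(body)
	if err != nil {
		return http.StatusBadRequest
	}
	if err := scan(id); err != nil {
		return http.StatusInternalServerError
	}
	return http.StatusOK
}

// workerHandler adapts handleMessage to net/http. scan is a hypothetical
// hook that runs the full scan for one snapshot.
func workerHandler(scan func(snapshotID string) error) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		body, err := io.ReadAll(io.LimitReader(r.Body, 4096))
		if err != nil {
			w.WriteHeader(http.StatusBadRequest)
			return
		}
		w.WriteHeader(handleMessage(string(body), scan))
	})
}
```

Keeping the decision logic in handleMessage, separate from the HTTP plumbing, makes the success and failure paths easy to exercise without a running server.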

The worker's job is more involved than it sounds. When it receives a snapshot ID, it uses the AWS SDK to create a volume from that snapshot in the same availability zone as the worker instance (an EBS volume can only attach to an instance in its own zone), waits for the volume to become available, attaches it to the instance via the EC2 API, and then mounts it to the local filesystem. Dufflebag does not scan every file. Instead, it filters out the large mass of uninteresting system files found on a typical OS filesystem.
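The "wait for the volume to become available" step is a polling loop at heart. The sketch below shows one way to write it using only the standard library. The poll callback is a hypothetical stand-in for an EC2 volume lookup, and waitForState is an illustrative name rather than anything taken from Dufflebag. The AWS SDK also ships its own waiters, which would likely be the better choice in real code.

```go
package main

import (
	"fmt"
	"time"
)

// waitForState calls poll until it reports the wanted state, the resource
// enters the "error" state, or maxAttempts polls have been made.
// poll is a hypothetical stand-in for a call that looks up a volume's state
// (EBS volumes report states such as "creating", "available", and "error").
func waitForState(poll func() (string, error), want string, interval time.Duration, maxAttempts int) error {
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		state, err := poll()
		if err != nil {
			return fmt.Errorf("polling state: %w", err)
		}
		if state == want {
			return nil
		}
		if state == "error" {
			return fmt.Errorf("resource entered error state while waiting for %q", want)
		}
		// Sleep between polls, but not after the final attempt.
		if attempt < maxAttempts {
			time.Sleep(interval)
		}
	}
	return fmt.Errorf("state %q not reached after %d attempts", want, maxAttempts)
}
```

Bounding the number of attempts matters here: a volume that never becomes available should fail the scan and trigger cleanup instead of holding a worker forever.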

The filtering logic uses three blacklist types defined in the code:

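import (
    "path/filepath"
    "strings"
)

// FileBlacklist holds three kinds of filename rule used to skip files.
type FileBlacklist struct {
    Exact    map[string]struct{} // filenames skipped on an exact match
    Contains []string            // substrings that cause a filename to be skipped
    Prefix   []string            // prefixes that cause a filename to be skipped
}

// IsBlacklisted reports whether the file at path should be skipped.
// Rules are compared against the base filename, not the full path.
func (fb *FileBlacklist) IsBlacklisted(path string) bool {
    filename := filepath.Base(path)

    // Check exact match
    if _, exists := fb.Exact[filename]; exists {
        return true
    }

    // Check contains
    for _, pattern := range fb.Contains {
        if strings.Contains(filename, pattern) {
            return true
        }
    }

    // Check prefix
    for _, pattern := range fb.Prefix {
        if strings.HasPrefix(filename, pattern) {
            return true
        }
    }

    return false
}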

This blacklist approach is critical for performance. Without it, you'd waste time scanning system binaries, libraries, and other files that will never contain secrets. The blacklist covers obvious culprits such as .so files, executables under /usr/bin, and common system directories. Note that the function shown above compares its rules against the base filename only, so a directory-level exclusion like /usr/bin would have to be matched against the full path rather than through this check. What passes through the filter gets scanned with regex patterns for common secret formats: AWS credentials, private keys, database connection strings, and API tokens.
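As an illustration of the regex stage, here is a minimal sketch with two patterns. These are not Dufflebag's actual patterns. They are assumptions chosen because the formats are well known: AWS access key IDs begin with AKIA (or ASIA for temporary credentials) followed by 16 uppercase alphanumeric characters, and PEM private keys begin with a BEGIN ... PRIVATE KEY header. The names secretPatterns and findSecrets are hypothetical.

```go
package main

import (
	"regexp"
	"sort"
)

// secretPatterns maps a label to a pattern. Illustrative only: a real
// scanner would carry many more patterns than these two.
var secretPatterns = map[string]*regexp.Regexp{
	"aws-access-key-id": regexp.MustCompile(`\b(?:AKIA|ASIA)[0-9A-Z]{16}\b`),
	"pem-private-key":   regexp.MustCompile(`-----BEGIN (?:[A-Z]+ )?PRIVATE KEY-----`),
}

// findSecrets returns the sorted labels of every pattern that matches
// somewhere in content. It returns an empty result when nothing matches.
func findSecrets(content []byte) []string {
	var hits []string
	for name, re := range secretPatterns {
		if re.Match(content) {
			hits = append(hits, name)
		}
	}
	// Map iteration order is random, so sort for stable output.
	sort.Strings(hits)
	return hits
}
```

Compiling the patterns once at package level, instead of per file, keeps the per-file cost down to the matching itself.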

When interesting content is found, Dufflebag uploads it to S3 under a key of the form <snapshot-id>/<blake3-hash-of-content>/<original-path>. The BLAKE3 hash serves two purposes: it is extremely fast to compute (important when processing gigabytes of files), and it gives you a handle for deduplication. Identical content always produces the same hash segment, so copies of the same secret can be grouped by hash during analysis, which makes results more manageable.
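The key layout is easy to sketch. One substitution to note: BLAKE3 is not in Go's standard library, so the example below uses SHA-256 as a stand-in to stay self-contained. The layout is the one described above, while the function name resultKey and the choice to strip the leading slash from the path are assumptions.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"strings"
)

// resultKey builds an object key of the form
// <snapshot-id>/<hash-of-content>/<original-path>.
// SHA-256 stands in for BLAKE3 here; a real implementation would swap in
// a BLAKE3 library for speed.
func resultKey(snapshotID string, content []byte, originalPath string) string {
	sum := sha256.Sum256(content)
	// Strip the leading slash so the key does not contain an empty segment.
	path := strings.TrimPrefix(originalPath, "/")
	return snapshotID + "/" + hex.EncodeToString(sum[:]) + "/" + path
}
```

Because the hash depends only on the content, two files with identical bytes share the same middle segment regardless of where they lived on disk.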

The worker also cleans up after itself. After scanning completes (or fails), the volume is unmounted, detached, and deleted to avoid ongoing EBS storage costs. The SQS message is deleted only after successful completion, so the message for a failed scan becomes visible in the queue again and is retried. In an Elastic Beanstalk worker environment, messages that keep failing can be routed to a dead-letter queue instead of being retried forever.
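The "clean up whether the scan succeeds or fails" behavior can be sketched as a list of steps, each paired with an undo action that runs in reverse order once the pipeline stops. This is an illustrative pattern, not Dufflebag's implementation. The step type, runWithCleanup, and dryRunSteps are hypothetical names, and dryRunSteps only records events in place of real EC2 and mount calls.

```go
package main

import (
	"errors"
	"fmt"
)

// step pairs an action with the action that reverses it.
type step struct {
	name string
	do   func() error
	undo func() error // nil when there is nothing to reverse
}

// errScanFailed is the error the dry-run scan returns when asked to fail.
var errScanFailed = errors.New("scan failed")

// runWithCleanup runs steps in order and stops at the first failure.
// Whether it stops early or finishes, it then undoes every completed step
// in reverse order. The first error encountered is returned.
func runWithCleanup(steps []step) (err error) {
	var completed []step
	defer func() {
		for i := len(completed) - 1; i >= 0; i-- {
			if uerr := completed[i].undo(); uerr != nil && err == nil {
				err = fmt.Errorf("cleanup after %s: %w", completed[i].name, uerr)
			}
		}
	}()
	for _, s := range steps {
		if derr := s.do(); derr != nil {
			return fmt.Errorf("%s: %w", s.name, derr)
		}
		if s.undo != nil {
			completed = append(completed, s)
		}
	}
	return nil
}

// dryRunSteps builds the pipeline with actions that only record what they
// would do, appending each event name to events.
func dryRunSteps(events *[]string, failScan bool) []step {
	record := func(event string) func() error {
		return func() error {
			*events = append(*events, event)
			return nil
		}
	}
	scan := func() error {
		*events = append(*events, "scan")
		if failScan {
			return errScanFailed
		}
		return nil
	}
	return []step{
		{name: "create volume", do: record("create"), undo: record("delete")},
		{name: "attach volume", do: record("attach"), undo: record("detach")},
		{name: "mount volume", do: record("mount"), undo: record("unmount")},
		{name: "scan files", do: scan},
	}
}
```

With a failing scan, the recorded order is create, attach, mount, scan, then unmount, detach, delete, which matches the cleanup sequence described above. A step that fails partway is not undone, so a real implementation would need each action to clean up its own partial work.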

One subtle but important design choice: Dufflebag defaults to a 20-snapshot limit on the first run. This is implemented as a simple counter in the queue population logic, not as a runtime safety valve. It's a recognition that spinning up dozens of EC2 instances to process thousands of EBS volumes can rack up AWS bills quickly, and first-time users should test with a small batch before going all-in on large-scale reconnaissance.
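A cap applied at queue-population time can be as small as the sketch below. The constant value of 20 comes from the description above. The function name capSnapshots and the convention that a non-positive limit means "no cap" are assumptions made for illustration.

```go
package main

// defaultSnapshotLimit mirrors the 20-snapshot first-run default.
const defaultSnapshotLimit = 20

// capSnapshots returns at most limit snapshot IDs from ids, keeping their
// order. A limit of zero or less disables the cap (an assumption of this
// sketch, not documented behavior).
func capSnapshots(ids []string, limit int) []string {
	if limit <= 0 || len(ids) <= limit {
		return ids
	}
	return ids[:limit]
}
```

Because the cap only trims the list before messages are enqueued, it does nothing once messages are already in the queue, which is why it is not a runtime safety valve.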

Gotcha

Dufflebag's biggest limitation is regional constraint. EBS snapshots are region-specific resources, and Dufflebag only operates in the AWS region where you deploy it. If you want comprehensive coverage of public snapshots across all AWS regions, you'll need to deploy separate Elastic Beanstalk environments in each region, manage multiple SQS queues, and consolidate S3 results yourself. There's no built-in orchestration for multi-region deployments, which is a significant operational gap for thorough security research.

Cost management is another real concern. While the 20-snapshot default limit protects you initially, there's no runtime cost ceiling or automatic teardown. If you populate the queue with thousands of snapshots and walk away, Elastic Beanstalk will happily scale workers and churn through them until your queue is empty or you hit AWS service limits. You need to monitor CloudWatch metrics and manually scale down or delete the environment when you're done. The repository includes no Terraform modules or CloudFormation templates for lifecycle management, so you're on your own for cost controls beyond manual AWS console vigilance. Additionally, the secret detection patterns are hardcoded in Go source, so customizing what you're searching for requires modifying the code and redeploying—there's no runtime configuration file for search patterns.

Verdict

Use if: You're conducting offensive security research or red team assessments on AWS infrastructure and need to systematically scan public EBS snapshots for exposed credentials at scale. Dufflebag handles the complex orchestration of volume mounting, parallel processing, and cleanup that would otherwise require weeks of custom development. It's purpose-built for this reconnaissance task and does it well.

Skip if: You need multi-region scanning without managing separate deployments, want configurable search patterns without code changes, or aren't prepared to actively monitor AWS costs during operation. This is a specialized offensive tool for security researchers who understand AWS billing and are comfortable with the operational overhead, not a turnkey solution for routine security auditing of your own infrastructure.
