
AWSBucketDump: How a 400-Line Python Script Became a Bug Bounty Hunter's Secret Weapon

Hook

Over 1,400 developers have starred a tool that does one thing extraordinarily well: find the AWS S3 buckets companies forget to secure, then grab everything interesting inside them.

Context

AWS S3 buckets are the internet’s filing cabinets, storing everything from static website assets to database backups. The problem? S3’s security model is notoriously confusing. A single misconfigured permission can expose gigabytes of sensitive data to anyone who knows the bucket’s name. For years, security researchers manually tested potential bucket names by constructing URLs and checking HTTP responses. This worked, but it was tedious and didn’t scale. Bug bounty hunters needed automation that could test thousands of potential names, identify accessible buckets, filter for valuable files, and optionally download evidence—all without requiring AWS credentials or complex SDK integration.

AWSBucketDump emerged as a purpose-built solution for this exact workflow. Created by Jordan Potti and refined by multiple contributors, it strips away everything unnecessary and focuses on speed and simplicity. Unlike comprehensive cloud security frameworks that audit dozens of AWS services, this tool does three things: enumerate buckets via brute-force wordlist attacks, grep bucket contents for interesting keywords, and optionally download matching files. Its 1,458 stars reflect its staying power in the security community, particularly among penetration testers who need results during time-boxed engagements and bug bounty hunters racing against other researchers.

Technical Insight

System architecture (auto-generated diagram): a wordlist feeds multi-threaded enumeration, which issues HTTP requests to S3 endpoints; XML responses pass through the xmltodict parser, and the resulting bucket listings are matched against search terms from a keyword file. Matched files go through a size check: files under the threshold enter a multi-threaded download queue that writes to organized local storage, while files over the threshold are skipped.

AWSBucketDump’s architecture is refreshingly straightforward: it makes HTTP requests to S3 endpoints, parses XML responses using libraries like xmltodict, and uses multithreading for parallelization. No AWS SDK required. The tool constructs URLs in the format http://[bucketname].s3.amazonaws.com, which triggers S3 to return bucket listings as XML if publicly readable or specific error codes if private or non-existent.
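The probe logic can be sketched in a few lines of standard-library Python. This is an illustrative reconstruction, not the tool's actual code: it uses ElementTree instead of xmltodict, and `probe_bucket`, `classify`, and the example bucket name are all hypothetical.

```python
# Anonymous S3 bucket probe: a plain GET on http://<name>.s3.amazonaws.com returns
# a ListBucketResult XML listing if the bucket is public, 403 if it exists but is
# private, and 404 if it does not exist. No AWS credentials involved.
import urllib.request
import urllib.error
import xml.etree.ElementTree as ET

def classify(status: int) -> str:
    """Map an HTTP status code to a bucket state."""
    return {200: "public", 403: "private", 404: "nonexistent"}.get(status, "unknown")

def probe_bucket(name: str):
    """Return (state, object_keys); keys are only filled for public buckets."""
    url = f"http://{name}.s3.amazonaws.com"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            body = resp.read()
            status = resp.status
    except urllib.error.HTTPError as err:
        return classify(err.code), []
    # Public bucket: pull object keys out of the ListBucketResult XML.
    ns = "{http://s3.amazonaws.com/doc/2006-03-01/}"
    root = ET.fromstring(body)
    keys = [el.text for el in root.iter(f"{ns}Key")]
    return classify(status), keys

if __name__ == "__main__":
    state, keys = probe_bucket("acme-prod-backups")  # hypothetical bucket name
    print(state, keys[:5])
```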

Here’s the core workflow from the README:

python AWSBucketDump.py -l BucketNames.txt -g interesting_Keywords.txt -D -m 500000 -d 1

This command takes a wordlist (BucketNames.txt), checks each potential bucket name, greps bucket contents for terms in interesting_Keywords.txt (like “password”, “credential”, “backup”), downloads matching files under 500KB (-m 500000), and creates separate directories for each discovered bucket (-d 1). The -D flag enables downloading, which the README candidly warns “might fill your hard drive up.”

The tool processes S3’s XML bucket listings and applies keyword filters and file size constraints before queuing downloads. Per the README, the defaults are two threads for checking buckets and two for downloading, which prevents slow file transfers from blocking bucket discovery.
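The two-pool threading model the README describes can be sketched with standard-library queues. Worker counts and the function names here are illustrative, not the tool's actual identifiers:

```python
# Checker threads enumerate buckets and feed a download queue; separate downloader
# threads drain it, so a slow transfer never stalls discovery.
import queue
import threading

def run_pipeline(names, check, download, n_checkers=2, n_downloaders=2):
    to_download = queue.Queue()
    results = []

    def checker(work):
        while True:
            name = work.get()
            if name is None:          # poison pill: no more buckets to check
                break
            for hit in check(name):   # check() yields files worth downloading
                to_download.put(hit)

    def downloader():
        while True:
            item = to_download.get()
            if item is None:          # poison pill: enumeration is finished
                break
            results.append(download(item))

    work = queue.Ueue() if False else queue.Queue()
    for n in names:
        work.put(n)
    checkers = [threading.Thread(target=checker, args=(work,)) for _ in range(n_checkers)]
    downloaders = [threading.Thread(target=downloader) for _ in range(n_downloaders)]
    for t in checkers + downloaders:
        t.start()
    for _ in checkers:
        work.put(None)                # stop the checkers first...
    for t in checkers:
        t.join()
    for _ in downloaders:
        to_download.put(None)         # ...then the downloaders
    for t in downloaders:
        t.join()
    return results

# Example with stand-in check/download functions:
found = run_pipeline(["a", "b"],
                     check=lambda n: [f"{n}/x", f"{n}/y"],
                     download=lambda path: path.upper())
```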

The wordlist approach deserves attention. The README explicitly points to Daniel Miessler’s SecLists for generic wordlists and jhaddix’s enumall (with recon-ng and Alt-DNS) for targeted subdomain/bucket enumeration. This matters because effective bucket names rarely follow predictable patterns. A company called “Acme Corp” might use buckets named acme-prod-backups, acmecorp-assets, acme.logs, or backup.acme.io. The tool’s value directly correlates with wordlist quality—garbage in, garbage out.
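Before feeding a list to the tool, researchers typically expand a target's base name with common affixes. A minimal sketch of that expansion step (the affix list and `candidates` helper are illustrative; real engagements draw on SecLists and recon output):

```python
# Expand a base name into plausible bucket-name candidates using common
# separator and affix patterns (prod, dev, backups, assets, ...).
def candidates(base: str) -> list[str]:
    affixes = ["prod", "dev", "backups", "assets", "logs", "static"]
    out = {base}
    for a in affixes:
        out.update({f"{base}-{a}", f"{base}.{a}", f"{a}-{base}", f"{base}{a}"})
    return sorted(out)
```

Writing the output of `candidates("acme")` to a file gives a small, target-specific BucketNames.txt to pass via `-l`.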

Keyword filtering is where AWSBucketDump separates itself from simple bucket enumeration scripts. The -g flag accepts a wordlist of terms to grep against filenames and paths. If your keyword list includes “sql”, “dump”, “credentials”, and “pem”, the tool automatically flags files matching these patterns. This transforms bucket enumeration from reconnaissance into targeted data discovery. During a penetration test with limited time, you can’t manually review thousands of files—keyword filtering surfaces high-value targets automatically.
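The filtering step amounts to a case-insensitive substring match of each object key against the keyword list. A sketch of that logic (the `match_keywords` helper is hypothetical, not the tool's internal name):

```python
# Flag object keys whose path contains any keyword from the -g wordlist.
def match_keywords(keys, keywords):
    """Return keys containing any keyword, case-insensitively, in input order."""
    kws = [k.lower() for k in keywords]
    return [key for key in keys if any(kw in key.lower() for kw in kws)]
```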

The maximum file size constraint (-m) prevents accidentally downloading multi-gigabyte files. Setting this to 500000 bytes (roughly 500KB) captures configuration files, credentials, source code, and documentation while skipping bulk data. This is crucial when you’re operating from a laptop during an engagement or paying for bandwidth on a VPS.
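The size gate can be implemented by checking Content-Length before committing bandwidth. This sketch assumes a HEAD-request pattern; `content_length` and `should_download` are hypothetical helpers, not tool internals:

```python
# Ask for an object's size without downloading the body, then apply the -m cap.
import urllib.request

def content_length(url: str):
    """HEAD request: return Content-Length in bytes, or None if absent."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=10) as resp:
        value = resp.headers.get("Content-Length")
    return int(value) if value is not None else None

def should_download(size, max_bytes: int) -> bool:
    """Mirror -m: skip files whose size is unknown, zero, or over the limit."""
    return size is not None and 0 < size <= max_bytes
```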

One design choice worth highlighting: the tool requires no AWS credentials. It operates entirely via anonymous HTTP requests to public endpoints. This means you can use it without an AWS account, API keys, or IAM permissions. The downside? You only discover truly public buckets, not those with misconfigured bucket policies that might grant unintended access to authenticated AWS users. For bug bounty work focused on external attack surface, this is perfect. For internal red team assessments, you’d want AWS CLI with compromised credentials instead.

Gotcha

AWSBucketDump was created with Python 3.6 and hasn’t seen significant updates since, which means it predates newer AWS security features that organizations now commonly deploy to prevent bucket enumeration. Modern AWS environments increasingly use CloudFront distributions or other CDN configurations instead of directly exposing S3 buckets, which can reduce the effectiveness of brute-force bucket enumeration.

Rate limiting is a known uncertainty. The README candidly admits “I honestly don’t know if Amazon rate limits this, I am guessing they do to some point but I haven’t gotten around to figuring out what that limit is.” The default configuration uses two threads for checking buckets and two for downloading. For production security testing, you may need to adjust threading and add delays to avoid detection or throttling.

The tool’s simplicity also means limited error handling. Downloads happen without built-in validation, and there’s no apparent resume capability if operations are interrupted. The README warns that using the download feature “might fill your hard drive up,” so set the maximum file size parameter (-m) carefully to avoid storage issues.

The dependency on wordlist quality is critical. As the README notes, the included example wordlists haven’t had “much time” put into refining them. For effective results, you’ll need high-quality, target-specific wordlists, which may require additional reconnaissance work using tools like enumall and Alt-DNS mentioned in the README.

Verdict

Use AWSBucketDump if you’re conducting authorized security assessments with tight deadlines and need quick wins. It excels during bug bounty reconnaissance when you want to test thousands of potential bucket names against a target organization, or during penetration tests where identifying publicly exposed data is a checklist item. The tool’s simplicity is its strength for one-off engagements and learning S3 security basics. Skip it if you’re performing defensive security work—AWS-native tools provide better visibility for your own infrastructure. The tool’s age (Python 3.6 era) and acknowledged gaps around rate limiting mean you should understand its limitations before deploying it in sensitive contexts. For authorized testing where speed matters more than stealth, AWSBucketDump remains a focused, practical option that does exactly what it claims: enumerate S3 buckets and find interesting files.
