S3Scanner: How a 3,000-Star Go Tool Hunts Misconfigured Cloud Storage at Scale
Hook
A single misconfigured S3 bucket exposed 540 million Facebook user records in 2019. S3Scanner exists because this keeps happening—and because checking bucket permissions across multiple cloud providers is harder than it should be.
Context
Cloud object storage has become ubiquitous infrastructure, but its default-deny security model creates a dangerous pattern: developers who don’t understand AWS IAM policies often flip buckets to public just to make their application work, then forget to lock them down. The problem multiplied when every major cloud provider launched S3-compatible APIs—suddenly security teams needed to audit buckets across AWS, Google Cloud Storage, DigitalOcean Spaces, Linode Object Storage, and a dozen other services, each with slightly different API behaviors and permission models.
Existing solutions fell short. Native tools like aws-cli and gsutil only work with their respective providers. Comprehensive cloud security platforms like CloudMapper scan entire infrastructure but are overkill when you just need to validate bucket permissions. Bug bounty hunters and penetration testers needed something laser-focused: give me a list of bucket names, tell me which ones leak data, and do it fast across every S3-compatible API. S3Scanner fills that gap as a purpose-built security tool that does one thing exceptionally well—validate bucket accessibility and enumerate permissions across the fragmented landscape of S3-compatible storage.
Technical Insight
S3Scanner’s architecture revolves around a clean separation between input sources, scanning logic, and output destinations. Written in Go for its excellent concurrency primitives, the tool spawns multiple goroutines to scan buckets in parallel while respecting configurable thread limits. The core scanning engine tests four permission scenarios for each bucket: unauthenticated read access, authenticated read access, unauthenticated write access, and authenticated write access.
The input flexibility demonstrates thoughtful design for real-world workflows. You can feed bucket names via command-line arguments for quick spot checks, from a text file for batch processing recon outputs, or—most impressively—from a RabbitMQ queue for enterprise-scale distributed scanning:
```
# Simple file-based scan across multiple providers
./s3scanner -bucket-file candidates.txt -provider aws,gcp,digitalocean -threads 20

# Enterprise mode: consume from RabbitMQ, persist to PostgreSQL
./s3scanner -mq rabbitmq://scanner:password@queue.internal:5672/buckets \
    -db postgres://user:pass@db.internal/s3findings \
    -provider all -enumerate
```
The provider abstraction is particularly elegant. Rather than hardcoding AWS endpoints, S3Scanner defines a provider interface that maps provider names to their S3-compatible API endpoints. AWS buckets are tested against s3.amazonaws.com, GCP buckets against storage.googleapis.com, and so on. This means adding support for a new S3-compatible service requires just adding its endpoint mapping—no architectural changes needed.
Permission enumeration reveals the tool’s security research DNA. When S3Scanner finds an accessible bucket, it doesn’t just report “public read”—it attempts to list objects, download a test file, and upload a benign test object (then immediately delete it). This produces actionable findings: “Bucket allows anonymous users to list 47,293 objects including database backups” is far more useful than “bucket is public.”
The JSON output format makes S3Scanner composable with other security tools:
```json
{
  "bucket": "company-backups",
  "provider": "aws",
  "region": "us-east-1",
  "exists": true,
  "public": true,
  "unauthenticated_read": true,
  "unauthenticated_write": false,
  "authenticated_read": true,
  "authenticated_write": false,
  "enumerated_objects": 47293,
  "sample_objects": ["backup-2024-01-15.sql.gz", "users.csv"]
}
```
This structured output pipes cleanly into Elasticsearch for dashboarding, feeds vulnerability management platforms, or triggers automated remediation workflows. The PostgreSQL persistence option takes this further—scan results accumulate in a database where you can track bucket security posture over time, query for trends, and generate compliance reports.
The RabbitMQ integration reveals S3Scanner’s enterprise-ready architecture. Rather than running one massive scan job, security teams can deploy multiple scanner instances consuming bucket names from a shared queue. One process generates candidate bucket names (perhaps from DNS recon, GitHub scraping, or certificate transparency logs) and publishes them to RabbitMQ, and a fleet of S3Scanner workers consumes and processes them in parallel. This horizontal scaling pattern handles the reality that organizations discover thousands of potential bucket names daily and need continuous validation rather than periodic batch scans.
Gotcha
The object enumeration feature sounds great until you point S3Scanner at a bucket containing 10 million objects. The tool will dutifully attempt to list every single one, a process that can take hours or even days depending on API rate limits. There’s no built-in pagination limit or “enumerate first N objects” option, which means aggressive enumeration can make scans impractically slow. You’ll want to use the -enumerate flag selectively, perhaps only after initial scans identify interesting buckets worth deeper inspection.
More fundamentally, S3Scanner only validates bucket names you already know—it has zero bucket discovery capabilities. You need separate tooling to generate candidate bucket names through techniques like DNS enumeration, subdomain brute-forcing, GitHub repository scanning, or certificate transparency log analysis. Tools like bucket-stream excel at discovery by monitoring CT logs for newly created buckets, while S3Scanner excels at validation. This means real-world workflows require chaining multiple tools together, and S3Scanner’s documentation could better acknowledge this reality and suggest integration patterns with discovery tools.
Verdict
Use S3Scanner if you’re conducting security assessments where you have candidate bucket names to validate—bug bounty programs, penetration tests, or continuous security monitoring in multi-cloud environments. It’s particularly valuable for security teams managing infrastructure across multiple cloud providers, where native tools fall short and you need consistent bucket security validation everywhere. The RabbitMQ and PostgreSQL integrations make it production-ready for enterprise security operations that need distributed scanning at scale. Skip it if you need bucket name discovery (combine it with bucket-stream or DNS recon tools instead), if you’re doing one-off checks where AWS CLI suffices, or if you need comprehensive cloud security auditing beyond just storage buckets—in those cases, look at CloudMapper or ScoutSuite. Also skip if your target buckets contain millions of objects and you need enumeration results quickly; the lack of pagination controls will frustrate you.