
Hunting .DS_Store Files at Scale: How internetwache's Scanner Exploits macOS Metadata Leaks


Hook

Every time a macOS developer deploys to production, there's a chance they're accidentally publishing a map of their server's file structure, sitting at a predictable URL.

Context

When a folder is opened in the macOS Finder, Finder typically writes a hidden .DS_Store file into it to store metadata like icon positions and view settings. Critically, each metadata record is keyed by filename, so the file also lists the names of the entries in that directory. These hidden files are benign on a local machine, but they become an information disclosure goldmine when accidentally deployed to web servers. A single .DS_Store file can reveal the names of backup files, configuration files, admin panels, and subdirectories that developers assumed were hidden.

Traditional security scanners focus on known vulnerabilities or common misconfigurations, but .DS_Store exposure falls into a different category: it's not a software bug but a deployment hygiene issue, and it persists because many developers don't realize macOS is creating these files. Manual discovery is tedious, since you'd need to check every domain and subdirectory individually. This is where ds_storescanner comes in: a purpose-built Go tool designed to scan thousands of domains concurrently, automatically parse the binary .DS_Store format, and recursively follow discovered directories to map out entire hidden file structures.

Technical Insight

Figure: System architecture (auto-generated). Components: domain list file, domain queue, worker pool of goroutines, HTTP/HTTPS prober issuing GET /.DS_Store to web servers (200 OK or 404/error), .DS_Store binary parser that extracts filenames, subdirectory scanner used when recursive mode is enabled and depth < max, and results output as a found-files report.

The scanner's architecture centers on three core components: a concurrent domain processor, an HTTP client with smart probing logic, and a .DS_Store binary parser. Let's break down how each works.

The concurrency model uses Go's goroutines with a configurable worker pool. When you feed it a list of domains, it doesn't just loop through sequentially—it spawns multiple workers that process domains in parallel. Here's how the core scanning logic operates:

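```go
// Simplified representation of the scanning pattern
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
	"time"
)

const maxDepth = 7 // recursion limit (the article cites a default of 7)

var client = &http.Client{Timeout: 10 * time.Second}

// parseDSStore is a placeholder for a real Bud1 B-tree parser that returns filenames.
func parseDSStore(r io.Reader) []string { return nil }

func scanDomain(domain string, recursive bool) {
	for _, scheme := range []string{"http", "https"} {
		scanDir(scheme+"://"+domain, recursive, 0)
	}
}

// scanDir fetches base/.DS_Store and recurses on directory URLs (not on the
// .DS_Store URL itself, which would yield paths like /.DS_Store/api/.DS_Store).
func scanDir(base string, recursive bool, depth int) {
	dsURL := base + "/.DS_Store"
	resp, err := client.Get(dsURL)
	if err != nil {
		return
	}
	if resp.StatusCode != http.StatusOK {
		resp.Body.Close()
		return
	}
	files := parseDSStore(resp.Body)
	resp.Body.Close() // release the connection before recursing
	fmt.Printf("[+] Found .DS_Store at %s\n", dsURL)
	for _, file := range files {
		fmt.Printf("  - %s\n", file)
		if recursive && depth < maxDepth {
			scanDir(base+"/"+url.PathEscape(file), recursive, depth+1)
		}
	}
}

func main() {
	for _, domain := range os.Args[1:] {
		scanDomain(domain, true)
	}
}
```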
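
The snippet above handles one domain at a time; the worker pool that feeds it isn't shown. Here is a minimal sketch of that pattern, assuming a hypothetical `scanAll` helper and a `probe` callback that stands in for the real fetch-and-parse step: a fixed number of goroutines drain a shared channel of domains until it is closed.

```go
package main

import (
	"sort"
	"sync"
)

// scanAll fans domains out to `workers` goroutines and collects the domains
// for which probe reports a hit. probe is a hypothetical stand-in for the
// real HTTP request plus .DS_Store parsing.
func scanAll(domains []string, workers int, probe func(string) bool) []string {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string)
	var (
		mu   sync.Mutex
		hits []string
		wg   sync.WaitGroup
	)

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for d := range jobs { // each worker drains the shared queue
				if probe(d) {
					mu.Lock()
					hits = append(hits, d)
					mu.Unlock()
				}
			}
		}()
	}

	for _, d := range domains {
		jobs <- d
	}
	close(jobs) // lets workers exit once the queue is empty
	wg.Wait()

	sort.Strings(hits) // completion order is nondeterministic
	return hits
}
```

Calling `scanAll(domains, 50, probe)` would keep 50 probes in flight; because the channel is unbuffered, the producer never runs far ahead of the workers.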

The tool's intelligence lies in how it handles the .DS_Store binary format. These aren't text files, and despite a common assumption they aren't binary property lists either: they use Apple's undocumented "Bud1" buddy-allocator layout, in which a B-tree of records is keyed by filename. The scanner implements a parser for this allocator format, walking the records and extracting the filenames from the binary blob. This means it can take a 12KB binary file and extract a list like ['config.php.bak', 'admin_login.php', '.env.production'] without manual intervention.

What makes this particularly effective for large-scale scanning is the HEAD request optimization. Before downloading and parsing the entire .DS_Store file, the scanner can send a HEAD request to check whether the file exists. This cuts bandwidth substantially: a miss costs only a set of response headers rather than a full error page, and the body is downloaded only for URLs that answer 200. A 200 alone isn't proof, though, since some servers return 200 for every path, so the downloaded body should still be checked to confirm it is a real .DS_Store file. The HTTP client is configured with a timeout (typically 10 seconds) so unresponsive servers can't hang a worker, which is critical when scanning tens of thousands of domains.

The recursive scanning capability is where things get interesting from an attacker's perspective. When the scanner finds a .DS_Store file at example.com/.DS_Store and discovers a directory named api in the file list, it automatically queues example.com/api/.DS_Store for scanning. This continues up to a configurable depth (default 7 levels), effectively mapping out the entire exposed directory tree. In practice, this can reveal staging environments, version control directories accidentally left in production, or database backup folders that weren't in robots.txt.
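
The recursive walk can be sketched as a breadth-first queue with a depth cap and a visited set. Here `crawl` and its `fetch` callback (which would download and parse `<dir>/.DS_Store`) are hypothetical, and every listed name is treated as a possible subdirectory because a name alone doesn't say whether it is a file or a folder.

```go
package main

import (
	"net/url"
	"strings"
)

// dirJob is one directory URL waiting to be probed.
type dirJob struct {
	base  string // e.g. "https://example.com/api"
	depth int
}

// crawl walks .DS_Store listings breadth-first from root and returns every
// directory URL whose listing was fetched successfully.
func crawl(root string, maxDepth int, fetch func(dir string) ([]string, bool)) []string {
	root = strings.TrimRight(root, "/")
	queue := []dirJob{{root, 0}}
	seen := map[string]bool{root: true}
	var found []string

	for len(queue) > 0 {
		job := queue[0]
		queue = queue[1:]

		names, ok := fetch(job.base) // would GET job.base + "/.DS_Store" and parse it
		if !ok {
			continue
		}
		found = append(found, job.base)
		if job.depth >= maxDepth {
			continue // depth cap reached: record the hit but don't descend
		}
		for _, name := range names {
			child := job.base + "/" + url.PathEscape(name)
			if !seen[child] {
				seen[child] = true
				queue = append(queue, dirJob{child, job.depth + 1})
			}
		}
	}
	return found
}
```

The `seen` set matters because a listing can contain the same name several times, once per metadata record; without it, each duplicate would trigger another request.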

The tool's command-line interface keeps things simple but flexible. You can control thread count (important for avoiding detection or managing resource usage), toggle recursive mode, enable verbose output for debugging, and pipe in domain lists from stdin. This Unix-philosophy approach makes it trivial to chain with other tools: take a top-sites list (the Alexa top million was the classic choice before Alexa was retired in 2022), filter for specific TLDs, and pipe it directly into the scanner.

Gotcha

The scanner's simplicity is both its strength and its Achilles' heel. There's no built-in rate limiting, so running it against a large target list at full concurrency can easily trip intrusion detection systems. If you're scanning production infrastructure at scale, you're likely to trigger alerts or get your IP banned before completing a comprehensive scan. The tool assumes you're either operating in a controlled environment or that you don't care about stealth. That's fine for bug bounty programs with broad scopes, but problematic for more sensitive engagements.

Error handling is minimal. When a connection times out or a server returns a non-standard response, the scanner simply moves on. There's no retry logic, no exponential backoff, and no differentiation between 'server temporarily unavailable' and 'file doesn't exist.' This means your results will have gaps—potentially significant ones if you're scanning during peak traffic hours or across unreliable network conditions. The verbose mode helps with debugging, but it's still a fire-and-forget tool rather than a production-grade security scanner. Additionally, the .DS_Store parser, while functional, doesn't handle every edge case in the format specification. Corrupted files or non-standard implementations might cause parsing failures that get silently swallowed.

Verdict

Use if: You're conducting large-scale reconnaissance for bug bounty programs, need a fast way to audit multiple domains for .DS_Store exposure, or want a lightweight single-binary tool that integrates easily into existing security pipelines. It's particularly valuable when combined with subdomain enumeration tools: discover all subdomains of a target, then pipe them into ds_storescanner for comprehensive coverage.

Skip if: You need production-grade reliability with retry logic and detailed logging, require output in structured formats like JSON for integration with vulnerability management platforms, need rate limiting for stealthy scanning, or want a more comprehensive web fuzzer that can check for .DS_Store files alongside other sensitive exposures (tools like ffuf or nuclei would be better choices). Also skip if you're expecting active maintenance: the repository hasn't seen significant updates, so compatibility with modern Go versions or new .DS_Store format variations isn't guaranteed.
