Building a Pastebin Breach Monitor with 50 Lines of Shell Script

Hook

While security vendors charge thousands for breach monitoring, a 50-line Bash script can tell you if your company's credentials are bleeding onto Pastebin—often before your official security tools catch it.

Context

Pastebin has become the de facto dumping ground for leaked credentials, database dumps, and stolen data. When attackers breach a system, they frequently paste their haul publicly to brag, share with collaborators, or sell to interested buyers. The problem? With millions of pastes created daily, finding your organization's leaked data is like searching for a needle in an ever-growing haystack.

Traditional approaches involve either paying for commercial breach monitoring services or building complex scrapers that constantly poll Pastebin's API, parse content, and analyze text for patterns. Both approaches have problems: commercial services are expensive and slow to alert, while building your own scraper means dealing with rate limits, IP bans, parsing challenges, and infrastructure costs. The pastebin-scraper tool takes a third path: leverage psbdmp.ws, a third-party service that's already doing the heavy lifting of monitoring Pastebin 24/7, and wrap it in a dead-simple shell script that security teams can audit, modify, and trust in under five minutes.

Technical Insight


At its core, pastebin-scraper is an exercise in minimalism. The entire tool is a Bash script that makes HTTP requests to the psbdmp.ws API, parses JSON responses with jq, and organizes results into a directory structure. This architectural decision—building a thin wrapper rather than a full application—is what gives it both power and portability.

The entry point is straightforward. When you run ./pastebin-scraper.sh -d example.com, the script constructs a request to psbdmp.ws's domain search endpoint. Here's the core logic:

if [ -n "$DOMAIN" ]; then
    echo "[+] Searching for domain: $DOMAIN"
    OUTPUT_DIR="./output/domain_${DOMAIN}"
    mkdir -p "$OUTPUT_DIR"

    # Save the raw JSON with tee, then extract one Pastebin URL per result.
    curl -s "https://psbdmp.ws/api/v3/search/$DOMAIN" | \
        tee "${OUTPUT_DIR}/raw_response.json" | \
        jq -r '.data[]? | "https://pastebin.com/\(.id)"' > "${OUTPUT_DIR}/paste_urls.txt"

    # Quote the path so a target with spaces or glob characters cannot break the redirect.
    echo "[+] Found $(wc -l < "${OUTPUT_DIR}/paste_urls.txt") pastes"
fi

This snippet reveals several architectural choices worth unpacking. First, the use of curl -s keeps things silent—no progress bars or verbose output—which makes the tool pipeline-friendly. The tee command is clever: it saves the raw JSON response for later inspection while simultaneously piping it to jq for parsing. This dual-output strategy means you can re-process results without hitting the API again, respecting rate limits and preserving evidence.
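The re-processing step can be sketched as follows. This is a minimal illustration, assuming the saved response has the same `.data[].id` shape that the script's jq filter expects; the sample JSON and the temporary directory are made up so the snippet runs without touching the API.

```shell
#!/usr/bin/env bash
# Sketch: rebuild paste_urls.txt from a saved response without calling the API.
# The sample JSON below is invented; only the .data[].id shape is assumed.

OUTPUT_DIR="$(mktemp -d)/domain_example.com"
mkdir -p "$OUTPUT_DIR"

# Stand-in for a raw_response.json written earlier by tee.
cat > "${OUTPUT_DIR}/raw_response.json" <<'EOF'
{"data":[{"id":"abc123"},{"id":"def456"}]}
EOF

# Same filter as the script, but reading the saved file instead of a live response.
jq -r '.data[]? | "https://pastebin.com/\(.id)"' \
    "${OUTPUT_DIR}/raw_response.json" > "${OUTPUT_DIR}/paste_urls.txt"

cat "${OUTPUT_DIR}/paste_urls.txt"
```

Because the filter takes a file argument here, you can change the jq expression and re-run it as often as you like against the same evidence.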

The jq expression .data[]? | "https://pastebin.com/\(.id)" demonstrates why jq is the perfect companion for shell-based API clients. The ? operator suppresses the error jq would otherwise raise when the data field is missing or null, so an empty result produces an empty file rather than a crash. It does not cover every failure: if the API returns something that is not valid JSON, such as an HTML error page, jq still exits with a parse error. The string interpolation constructs full Pastebin URLs, which is immediately actionable: you can pipe the output directly to wget, curl, or a browser automation tool.
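To see what the optional iterator does and does not protect against, here is a small offline sketch that feeds the same filter three hand-written inputs. The inputs are invented for illustration and are not real psbdmp.ws responses.

```shell
# Sketch: how the jq filter behaves on three kinds of input.
FILTER='.data[]? | "https://pastebin.com/\(.id)"'

# 1. Normal response: one URL per entry.
echo '{"data":[{"id":"abc123"}]}' | jq -r "$FILTER"

# 2. No "data" key: the ? suppresses the iteration error, so jq prints
#    nothing and still exits successfully.
echo '{"error":"not found"}' | jq -r "$FILTER"
echo "exit status without data: $?"

# 3. Not JSON at all (for example an HTML error page): jq cannot parse the
#    input and exits non-zero; the ? operator does not help here.
echo '<html>502 Bad Gateway</html>' | jq -r "$FILTER" 2>/dev/null
echo "exit status on non-JSON: $?"
```

The third case is the one worth remembering: an empty paste_urls.txt can mean "no results" or "the response was not JSON", and only jq's exit status tells them apart.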

The directory structure decision is equally pragmatic. By creating separate directories for each search type and target (domain_example.com, email_admin@example.com), the tool naturally organizes results for repeated searches. If you're monitoring multiple domains in a bug bounty program or red team engagement, you can quickly see which targets have the most exposure:

./output/
├── domain_acme-corp.com/
│   ├── raw_response.json
│   └── paste_urls.txt (143 URLs)
├── domain_competitor.io/
│   ├── raw_response.json
│   └── paste_urls.txt (12 URLs)
└── email_admin@target.com/
    ├── raw_response.json
    └── paste_urls.txt (7 URLs)
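Given that layout, ranking targets by exposure is a short loop. The sketch below uses a hypothetical helper, `rank_targets`, and builds a throwaway directory tree with invented targets and counts so that it runs anywhere; point it at `./output` for real data.

```shell
# Sketch: rank targets by number of paste URLs found.
# rank_targets is a hypothetical helper, not part of the original script.
rank_targets() {
    local root="$1" dir
    for dir in "$root"/*/; do
        [ -f "${dir}paste_urls.txt" ] || continue
        # Print "<count><TAB><target directory>" for each target.
        printf '%s\t%s\n' \
            "$(wc -l < "${dir}paste_urls.txt" | tr -d ' ')" \
            "$(basename "$dir")"
    done | sort -rn
}

# Throwaway demo tree with made-up targets and counts.
demo_root="$(mktemp -d)"
mkdir -p "$demo_root/domain_acme-corp.com" "$demo_root/domain_competitor.io"
printf 'u1\nu2\nu3\n' > "$demo_root/domain_acme-corp.com/paste_urls.txt"
printf 'u1\n'         > "$demo_root/domain_competitor.io/paste_urls.txt"

rank_targets "$demo_root"
```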

What makes this approach powerful isn't sophistication—it's composability. Because the tool outputs newline-delimited URLs in plain text, you can immediately pipe results into other Unix tools. Want to fetch all paste contents? cat output/domain_*/paste_urls.txt | xargs -n1 curl -s > all_pastes.txt. Need to check if specific keywords appear? grep -i 'password\|api_key' all_pastes.txt. The tool doesn't try to be everything; it does one thing well and lets you build your own workflow around it.
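One caveat with the fetch one-liner: the generated URLs point at Pastebin's HTML pages, so grepping the result means grepping markup. Pastebin serves the plain text of a paste under `/raw/<id>`, so a slightly longer sketch can rewrite the URLs first and pause between requests. The helper names `to_raw_url` and `fetch_pastes` are hypothetical, and the 2-second delay is a guess rather than a documented limit.

```shell
# Sketch: turn paste page URLs into raw-text URLs and fetch them politely.
# to_raw_url and fetch_pastes are hypothetical helpers.
to_raw_url() {
    # https://pastebin.com/<id>  ->  https://pastebin.com/raw/<id>
    sed 's#^https://pastebin\.com/#https://pastebin.com/raw/#'
}

fetch_pastes() {
    local url
    while IFS= read -r url; do
        printf '=== %s ===\n' "$url"
        # --fail makes curl return non-zero on HTTP errors instead of saving the error page.
        curl -s --fail "$url" || printf '[!] fetch failed: %s\n' "$url" >&2
        sleep 2   # illustrative delay between requests
    done
}

# Offline demonstration of the URL rewrite only:
printf 'https://pastebin.com/abc123\n' | to_raw_url

# Real use (makes network requests):
#   cat output/domain_*/paste_urls.txt | to_raw_url | fetch_pastes > all_pastes.txt
```

The separator line written before each paste makes it possible to trace a grep hit in all_pastes.txt back to the paste it came from.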

The dependency on psbdmp.ws is both the tool's greatest strength and its Achilles heel. By offloading the hard work—continuous Pastebin monitoring, text extraction, indexing, and search—to a third-party service, the script remains trivial. You're not parsing HTML, handling CAPTCHAs, or managing API tokens. But you're also completely at the mercy of psbdmp.ws's availability and API stability. The trade-off is explicit: maximum simplicity in exchange for external dependency risk.

Gotcha

The tool's minimalism comes with real operational limitations that will bite you in production scenarios. There's zero error handling for API failures—if psbdmp.ws returns a 500 error, rate limits you, or changes its JSON structure, the script silently fails or produces corrupt output. You won't get alerts, retries, or fallback mechanisms. I tested this by intentionally mangling the API endpoint, and the script simply created empty output files without any warning.
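A few lines of validation would turn that silent failure into a loud one. The following is a sketch only: `check_response` is a hypothetical helper, not something the tool ships, and it assumes a healthy response is a JSON object whose `data` field is an array.

```shell
# Sketch: fail loudly instead of writing empty output.
# check_response is a hypothetical helper, not part of the original script.
check_response() {
    local file="$1"
    if [ ! -s "$file" ]; then
        echo "[!] empty response: $file" >&2
        return 1
    fi
    # jq -e exits non-zero when the last result is false or null, and jq
    # also fails on input that is not valid JSON.
    if ! jq -e '.data | type == "array"' "$file" >/dev/null 2>&1; then
        echo "[!] unexpected response shape: $file" >&2
        return 1
    fi
}

# Offline demonstration with hand-written inputs:
tmp="$(mktemp)"
echo '{"data":[]}' > "$tmp"
check_response "$tmp" && echo "valid response accepted"
echo '<html>500 Internal Server Error</html>' > "$tmp"
check_response "$tmp" || echo "invalid response rejected"

# Real use (makes a network request); --fail turns HTTP errors into a
# non-zero curl exit status:
#   curl -s --fail "https://psbdmp.ws/api/v3/search/$DOMAIN" -o raw_response.json \
#       && check_response raw_response.json \
#       || echo "[!] search failed for $DOMAIN" >&2
```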

Rate limiting is another blind spot. The script has no delay between requests, so if you're searching multiple domains in a loop, you'll likely hit psbdmp.ws's rate limits and get blocked. There's no exponential backoff, no request queuing, and no way to resume from where you left off. For ad-hoc searches of 1-3 targets, this is fine. For continuous monitoring or scanning dozens of domains, you'll need to wrap the script in your own rate-limiting logic or risk getting your IP temporarily banned.

The tool also provides no content analysis: it gives you URLs, but you still need to manually review each paste or build additional tooling to extract credentials, classify sensitivity, or deduplicate results. If you're hunting for specific credential patterns across hundreds of pastes, expect significant manual effort or additional scripting.
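For the rate-limiting gap, a wrapper along these lines is one option. It is a sketch under stated assumptions: `retry_with_backoff` is a hypothetical helper, `domains.txt` is a hypothetical file with one domain per line, and the attempt count and delays are illustrative rather than tuned to psbdmp.ws's actual limits.

```shell
# Sketch: retry a command with exponential backoff.
# retry_with_backoff is a hypothetical helper; usage:
#   retry_with_backoff <max_attempts> <initial_delay_seconds> <command...>
retry_with_backoff() {
    local max_attempts="$1" delay="$2" attempt=1
    shift 2
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "[!] giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "[-] attempt $attempt failed, sleeping ${delay}s" >&2
        sleep "$delay"
        delay=$((delay * 2))       # double the wait after each failure
        attempt=$((attempt + 1))
    done
}

# Real use (makes network requests), pausing between targets:
#   while IFS= read -r domain; do
#       retry_with_backoff 4 5 ./pastebin-scraper.sh -d "$domain"
#       sleep 5
#   done < domains.txt
```

The wrapper only helps if the wrapped command exits non-zero on failure. Since the script as described does not report API errors, it would need to be paired with a response check before the retries ever trigger.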

Verdict

Use if: You need quick reconnaissance for specific domains or email addresses during bug bounty hunting, red team engagements, or incident response. The tool excels at one-off searches where you value speed, transparency, and minimal setup over comprehensive features. It's also perfect if you're building your own breach monitoring pipeline and want a simple component you can audit, modify, and integrate with other Unix tools. The shell script approach means you can read the entire codebase in under two minutes and trust exactly what it's doing—crucial for security work.

Skip if: You need production-grade reliability, automated monitoring, or large-scale searches. Without error handling, rate limiting, or content analysis, this tool will frustrate you in any scenario requiring robustness. If you're a security team needing continuous monitoring with alerts, deduplication, and automated credential extraction, invest in proper tooling like h8mail, commercial breach databases, or build a more sophisticated solution with retry logic and state management. Also skip if psbdmp.ws's coverage doesn't meet your needs—the tool is useless if the upstream service doesn't index the data sources you care about.
