subjs: How a 300-Line Go Tool Became Essential in the Bug Bounty Toolkit

Hook

While most developers worry about JavaScript bundle size, security researchers have the opposite problem: they need to find every single JavaScript file across thousands of web pages as quickly as possible.

Context

In the bug bounty and penetration testing world, JavaScript files are gold mines. They contain API endpoints that aren't linked anywhere, hardcoded secrets that developers forgot to remove, internal URLs that reveal infrastructure, and logic flaws that expose vulnerabilities. But modern web applications can reference dozens of JavaScript files across thousands of pages, and manually cataloging them is impossible at scale.

Before tools like subjs, security researchers either built custom scrapers for each engagement or used general-purpose web crawlers that were too slow, too comprehensive, or too resource-intensive for this specific task. They needed something that sat between simple grep commands and full-featured web spiders—a tool that could take a list of 10,000 URLs, extract every JavaScript reference in minutes, and feed that output to analysis tools. That's the precise gap subjs fills: ultra-fast JavaScript file extraction as a pipeline component, not a standalone solution.

Technical Insight

[Figure: System architecture (auto-generated). URL Input (stdin/file) → URL Queue (buffered channel) → Worker Pool (20 goroutines) → HTTP Client (concurrent requests) → HTML response → HTML Tokenizer (golang.org/x/net/html) → JS Extractor (script src tags, inline scripts) → URL Resolver (resolve relative paths) → stdout (JS file URLs). Configuration inputs: timeout, user-agent, workers.]

The beauty of subjs lies in what it doesn't do. Instead of trying to be a complete reconnaissance framework, it implements a classic concurrent worker pool pattern in Go to solve one problem exceptionally well. When you pipe URLs into subjs, it spawns a configurable number of goroutines (default: 20) that pull from a shared channel, fetch pages concurrently, parse HTML for script tags, and write results to stdout.
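The worker pool pattern described above can be sketched in a few lines of Go. This is a minimal illustration under stated assumptions, not subjs's actual source: runPool and process are hypothetical names, and process fakes the fetch-and-parse step so the example runs without network access.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// process stands in for "fetch the page and extract script URLs".
// Hypothetical placeholder: it invents one script URL per page.
func process(pageURL string) []string {
	return []string{strings.TrimSuffix(pageURL, "/") + "/app.js"}
}

// runPool fans the input URLs out to `workers` goroutines over a shared
// buffered channel and collects every result they produce.
func runPool(urls []string, workers int) []string {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan string, workers)
	results := make(chan string)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range jobs {
				for _, js := range process(u) {
					results <- js
				}
			}
		}()
	}

	// Feed the queue, then close it so the workers' range loops end.
	go func() {
		for _, u := range urls {
			jobs <- u
		}
		close(jobs)
	}()

	// Close results once every worker has finished.
	go func() {
		wg.Wait()
		close(results)
	}()

	var out []string
	for r := range results {
		out = append(out, r)
	}
	return out
}

func main() {
	found := runPool([]string{"https://example.com/", "https://example.com/login"}, 20)
	sort.Strings(found) // completion order is nondeterministic
	for _, f := range found {
		fmt.Println(f)
	}
}
```

Results arrive in completion order rather than input order, which is one reason output from a tool built this way is usually sorted downstream.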

Here's how you'd typically chain it in a reconnaissance workflow:

# Discover known URLs for a target domain
echo "example.com" | gau |
  # Extract JavaScript file URLs from those pages (10-second HTTP timeout)
  subjs -t 10 |
  # Remove duplicates
  sort -u |
  # Fetch each JS file and grep it for endpoint-like paths (-P needs GNU grep)
  while read -r url; do
    curl -s "$url" | grep -oP '([a-zA-Z0-9_-]+\.)?[a-zA-Z0-9_-]+\.[a-z]+/[a-zA-Z0-9/_-]+'
  done

The worker pool architecture is straightforward but effective. Each worker goroutine operates independently, making HTTP requests with a shared client that has timeout controls and custom user-agent strings. When a page is fetched, subjs uses Go's golang.org/x/net/html tokenizer to parse HTML without loading the entire DOM into memory—a critical optimization when processing thousands of pages.
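As a rough illustration of the shared client mentioned above, the sketch below builds one http.Client with a timeout and sends a custom User-Agent on each request. The names fetch and echoUserAgent, the 10-second value, and the "recon-sketch/0.1" string are assumptions made for the example; a local httptest server stands in for a real target, and none of this is taken from subjs's source.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"time"
)

// fetch GETs url with the given client, sending the supplied User-Agent,
// and returns the response body as a string.
func fetch(client *http.Client, url, userAgent string) (string, error) {
	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		return "", err
	}
	req.Header.Set("User-Agent", userAgent)
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

// echoUserAgent starts a local test server that replies with the
// User-Agent it received, fetches it once, and returns the reply.
func echoUserAgent(userAgent string) (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, r.UserAgent())
	}))
	defer srv.Close()

	// One client can be shared by every worker: http.Client is safe for
	// concurrent use, and Timeout bounds the whole request.
	client := &http.Client{Timeout: 10 * time.Second}
	return fetch(client, srv.URL, userAgent)
}

func main() {
	got, err := echoUserAgent("recon-sketch/0.1")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println("server saw User-Agent:", got)
}
```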

The tool identifies JavaScript files in more than one way. It captures explicit script tag sources (<script src="/app.js">) and inline script content that might contain dynamic imports, and it resolves relative URLs against the URL of the page they came from. This reduces the chance of missing JavaScript that the initial HTML references by less obvious means.
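The resolution step is easy to show with Go's standard net/url package. In the sketch below, a regular expression is swapped in for a real HTML tokenizer purely to keep the example dependency-free; it only handles quoted src attributes and ignores inline script content. extractScripts and scriptSrc are hypothetical names, not part of subjs.

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// scriptSrc matches quoted src attributes on script tags. It is an
// illustrative shortcut, not a substitute for a real HTML parser.
var scriptSrc = regexp.MustCompile(`(?i)<script\b[^>]*?\ssrc\s*=\s*["']([^"']+)["']`)

// extractScripts returns an absolute URL for every script src found in
// body, resolved against the URL of the page the HTML came from.
func extractScripts(pageURL, body string) ([]string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return nil, err
	}
	var out []string
	for _, m := range scriptSrc.FindAllStringSubmatch(body, -1) {
		ref, err := url.Parse(m[1])
		if err != nil {
			continue // skip malformed references
		}
		out = append(out, base.ResolveReference(ref).String())
	}
	return out, nil
}

func main() {
	page := `<script src="/app.js"></script>
<script type="module" src="static/chunk.js"></script>
<script src="//cdn.example.net/lib.js"></script>
<script>console.log("inline")</script>`

	scripts, err := extractScripts("https://example.com/shop/index.html", page)
	if err != nil {
		fmt.Println("bad page URL:", err)
		return
	}
	for _, s := range scripts {
		fmt.Println(s)
	}
}
```

Root-relative, path-relative, and scheme-relative references all go through the same ResolveReference call, so the extractor itself never needs to special-case them.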

What makes subjs particularly suited for security work is its deliberate simplicity. It doesn't attempt to execute JavaScript, follow redirects indefinitely, or build a complex state machine. It makes one request per URL, extracts script references, and moves on. This design choice means it won't get stuck in infinite redirect loops, adds no crawl traffic beyond the URLs you supply (although a long list for a single host can still trip rate limits), and won't consume excessive memory by maintaining large state objects.

The concurrency model also demonstrates Go's strengths in I/O-bound operations. Since most of the time is spent waiting for network responses, running 20 or more goroutines creates little CPU pressure; it just ensures that while one request is waiting for a response, others are being processed. This is why subjs can be much faster than a Python or Bash script that fetches the same URL list one request at a time.

One subtle but important feature is the configurable timeout flag. In security reconnaissance, you're often dealing with targets that might be slow, misconfigured, or intentionally throttling responses. A 10-second timeout means you won't waste time on hanging requests, but you also won't miss JavaScript from legitimate but slow-loading pages. This balance is crucial when processing mixed-quality URL lists from automated discovery tools.

Gotcha

The biggest limitation of subjs is that it does zero deduplication. If the same JavaScript file is referenced across 500 pages in your URL list, you'll get 500 identical output lines. For large-scale operations, this means your pipeline must include sort -u or similar deduplication, and you need to be mindful of disk space when redirecting output to files. While this can look like an oversight, it is consistent with the Unix philosophy: downstream tools handle deduplication, so users decide what to do with duplicates (maybe they want frequency counts, or they care about which page referenced which script).
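If frequency counts are what you want, a small downstream filter is enough. The following is a minimal sketch, assuming one URL per line on stdin; countURLs and sortedByCount are hypothetical helper names, and the tab-separated output format is an arbitrary choice.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"sort"
	"strings"
)

// countURLs returns how many times each non-empty line occurs.
func countURLs(lines []string) map[string]int {
	counts := make(map[string]int)
	for _, l := range lines {
		l = strings.TrimSpace(l)
		if l != "" {
			counts[l]++
		}
	}
	return counts
}

// sortedByCount returns the unique URLs, most frequently referenced
// first, with ties broken alphabetically so the order is stable.
func sortedByCount(counts map[string]int) []string {
	urls := make([]string, 0, len(counts))
	for u := range counts {
		urls = append(urls, u)
	}
	sort.Slice(urls, func(i, j int) bool {
		if counts[urls[i]] != counts[urls[j]] {
			return counts[urls[i]] > counts[urls[j]]
		}
		return urls[i] < urls[j]
	})
	return urls
}

func main() {
	var lines []string
	sc := bufio.NewScanner(os.Stdin)
	for sc.Scan() {
		lines = append(lines, sc.Text())
	}
	if err := sc.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "read error:", err)
		os.Exit(1)
	}
	counts := countURLs(lines)
	for _, u := range sortedByCount(counts) {
		fmt.Printf("%d\t%s\n", counts[u], u)
	}
}
```

A script that shows up on almost every page is probably a shared bundle, while one that appears once may be page-specific and worth a closer look.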

Another significant constraint is the lack of JavaScript execution. Modern single-page applications often load additional scripts dynamically through JavaScript itself: webpack chunks loaded on demand, dynamic imports, or scripts injected by framework code. Since subjs only parses the initial HTML response, it misses these dynamically loaded scripts entirely. If you're targeting a React or Vue application with code-splitting, you'll need a crawler that drives a headless browser (a chromedp-based tool, for example) to catch everything. This makes subjs less suitable for comprehensive application mapping and more suited for rapid initial reconnaissance where speed matters more than completeness.

Error handling is also minimal by design. Failed requests are silently skipped with no retry logic, no logging of which URLs failed, and no differentiation between network errors, timeouts, and HTTP error codes. In a 10,000-URL input list, you won't know which 200 URLs failed without implementing your own logging wrapper. Because nothing is reported, redirecting stderr to a log file will not recover that information; for production security workflows, you'd want to probe or log the input URLs yourself and re-run the failures separately.
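A logging wrapper of the kind mentioned above mostly comes down to telling failure modes apart. Here is a hedged sketch of that classification, run against local httptest servers; classify, demo, and the label strings are made up for the example, and the 100 ms timeout is only there to keep the demonstration quick.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"net/http"
	"net/http/httptest"
	"time"
)

// classify fetches url and returns a short label for the outcome, keeping
// timeouts, other network errors, and HTTP error statuses distinct.
func classify(client *http.Client, url string) string {
	resp, err := client.Get(url)
	if err != nil {
		var ne net.Error
		if errors.As(err, &ne) && ne.Timeout() {
			return "timeout"
		}
		return "network-error"
	}
	defer resp.Body.Close()
	if resp.StatusCode >= 400 {
		return fmt.Sprintf("http-%d", resp.StatusCode)
	}
	return "ok"
}

// demo exercises classify against four local conditions and returns the
// labels in order: healthy, 404, slow, and unreachable.
func demo() []string {
	okSrv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "<html></html>")
	}))
	defer okSrv.Close()

	missing := httptest.NewServer(http.NotFoundHandler())
	defer missing.Close()

	slow := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		time.Sleep(300 * time.Millisecond) // longer than the client timeout
	}))
	defer slow.Close()

	// A server that is closed before use gives a connection error.
	dead := httptest.NewServer(http.NotFoundHandler())
	deadURL := dead.URL
	dead.Close()

	client := &http.Client{Timeout: 100 * time.Millisecond}
	var labels []string
	for _, u := range []string{okSrv.URL, missing.URL, slow.URL, deadURL} {
		labels = append(labels, classify(client, u))
	}
	return labels
}

func main() {
	for _, label := range demo() {
		fmt.Println(label)
	}
}
```

Writing the label next to each input URL gives you a list to retry: timeouts are often worth a second attempt with a longer limit, while 4xx responses usually are not.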

Verdict

Use if: You're performing web application reconnaissance at scale, need to extract JavaScript files from hundreds or thousands of URLs quickly, and already have a pipeline workflow with separate tools for URL discovery and JavaScript analysis. It's perfect for bug bounty hunters who need speed over completeness and are comfortable with Unix pipeline composition. Also use it if you're building security automation where a lightweight, single-purpose tool fits better than a heavyweight framework.

Skip if: You need comprehensive JavaScript discovery including dynamically loaded scripts (use a headless-browser crawler instead), you're working with small target lists where manual inspection is feasible, you want built-in deduplication and analysis features in one tool, or you need detailed error reporting and retry logic for production environments. Also skip it if you're not already familiar with security reconnaissance workflows, because subjs assumes you understand where it fits in a larger toolchain.
