How jsluice Uses AST Parsing to Find URLs Hidden in JavaScript String Concatenation

Hook

When security researchers analyzed a Fortune 500 company's JavaScript bundles with regex-based tools, they found 47 API endpoints. When they ran jsluice on the same code, they found 312, because modern JavaScript assembles URLs through string concatenation that a regex cannot follow.

Context

JavaScript bundles are goldmines for security reconnaissance. Client-side code routinely contains API endpoints, authentication tokens, AWS keys, internal domain names, and debug flags that developers assumed would be safe once minified. For years, security researchers relied on regex-based tools like LinkFinder to extract these artifacts. The approach was simple: scan for patterns that look like URLs or secrets, report what you find.

But modern JavaScript broke this model. Frontend frameworks like React and Vue encourage dynamic URL construction through template literals and string concatenation. An API endpoint might be built from baseURL + '/api/' + version + '/users', where only the string literals are visible to static analysis. Regex sees fragments; it cannot understand syntax. It finds '/api/' but misses that the fragment is part of a larger endpoint whose query parameters are constructed three lines later. The industry needed a tool that understood JavaScript as code, not as text: one that could reason about expressions, traverse syntax trees, and handle sensibly the runtime values that static analysis cannot know. Bishop Fox's jsluice was built to solve exactly this problem.
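To make the limitation concrete, here is a minimal sketch of what a regex-style extractor returns for the concatenation above. The pattern and the function name regexFragments are illustrative assumptions, not taken from LinkFinder or any other real tool.

```go
package main

import (
	"fmt"
	"regexp"
)

// pathLike is a deliberately simple pattern in the spirit of regex-based
// extractors: any quoted string that starts with a slash.
var pathLike = regexp.MustCompile(`['"](/[^'"]*)['"]`)

// regexFragments returns every quoted, slash-prefixed string found in src.
func regexFragments(src string) []string {
	var out []string
	for _, m := range pathLike.FindAllStringSubmatch(src, -1) {
		out = append(out, m[1])
	}
	return out
}

func main() {
	src := `const url = baseURL + '/api/' + version + '/users';`
	// The regex reports two unrelated fragments. Nothing in its output says
	// they belong to a single endpoint, or where the unknown parts sit.
	fmt.Println(regexFragments(src))
}
```

The two fragments come back as separate findings, so the relationship between them is lost before any later processing can use it.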

Technical Insight

The architectural leap jsluice makes is abandoning regex for abstract syntax tree (AST) traversal using go-tree-sitter, a Go binding for the Tree-sitter parsing library. When you feed JavaScript to jsluice, it doesn't scan text—it builds a complete syntax tree representing the code's structure. The analyzer then walks this tree, looking for semantically meaningful locations where URLs and secrets appear: assignments to window.location, arguments to fetch() or XMLHttpRequest.open(), object properties that match secret patterns.

Here's where it gets clever. When jsluice encounters a string concatenation expression like 'https://api.example.com/' + endpoint + '?key=' + apiKey, it can't know the runtime values of endpoint or apiKey. Instead of giving up, it substitutes the placeholder EXPR for each unknown variable. The result: https://api.example.com/EXPR?key=EXPR. This is a parseable, structurally valid URL that reveals the base domain, the path structure, and critically, the query parameter name key. Traditional regex would only extract the string literal fragments, missing entirely that this code constructs a URL with a key parameter.
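The placeholder idea can be sketched with a toy expression tree. The types Str, Ident, and Concat and the function collapse are hypothetical stand-ins invented for this illustration; they are not jsluice's or Tree-sitter's actual types, and the real tool handles far more node kinds.

```go
package main

import "fmt"

// Node is a toy stand-in for an AST node (hypothetical, for illustration).
type Node interface{}

// Str is a string literal whose value is known statically.
type Str struct{ Value string }

// Ident is a variable reference whose value is unknown until runtime.
type Ident struct{ Name string }

// Concat is a binary "+" expression.
type Concat struct{ Left, Right Node }

// collapse flattens a concatenation tree into one string, keeping literal
// text and substituting the placeholder EXPR for anything it cannot resolve.
func collapse(n Node) string {
	switch v := n.(type) {
	case Str:
		return v.Value
	case Concat:
		return collapse(v.Left) + collapse(v.Right)
	default:
		return "EXPR"
	}
}

func main() {
	// 'https://api.example.com/' + endpoint + '?key=' + apiKey
	expr := Concat{
		Concat{
			Concat{Str{"https://api.example.com/"}, Ident{"endpoint"}},
			Str{"?key="},
		},
		Ident{"apiKey"},
	}
	fmt.Println(collapse(expr)) // https://api.example.com/EXPR?key=EXPR
}
```

Because the walk follows the tree rather than the text, the literal pieces stay in their original order and the unknowns keep their positions.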

Using jsluice as a library demonstrates this power:

package main

import (
    "fmt"

    "github.com/BishopFox/jsluice"
)

func analyzeBundle(jsCode []byte) {
    analyzer := jsluice.NewAnalyzer(jsCode)

    // Extract URLs along with the context they were found in
    for _, u := range analyzer.GetURLs() {
        fmt.Printf("URL: %s\n", u.URL)
        fmt.Printf("  Type: %s\n", u.Type)       // where it was found, e.g., "fetch" or "locationAssignment"
        fmt.Printf("  Method: %s\n", u.Method)   // HTTP method if detectable
        fmt.Printf("  Query params: %v\n", u.QueryParams)
        fmt.Printf("  Body params: %v\n", u.BodyParams)
    }

    // Find secrets with severity levels
    for _, s := range analyzer.GetSecrets() {
        fmt.Printf("Secret: %v\n", s.Data)         // Data is not always a plain string, so print with %v
        fmt.Printf("  Kind: %s\n", s.Kind)         // matcher name, e.g., "AWSAccessKey"
        fmt.Printf("  Severity: %s\n", s.Severity) // "info", "low", "medium", or "high"
    }
}

func main() {
    // Small inline sample; in practice, pass the contents of a downloaded bundle
    analyzeBundle([]byte(`fetch("/api/users?id=" + userId)`))
}

The real sophistication lies in the matcher system. Built-in matchers handle common patterns, but you can extend jsluice with custom Tree-sitter queries. These queries operate on the AST itself, letting you write patterns like "find all string literals assigned to variables named 'apiKey' within object literals." This is fundamentally more powerful than regex because you're querying structure and syntax, not character sequences.

For URL extraction, jsluice tracks context through parent AST nodes. When it finds a URL string as the first argument to fetch(), it knows this is likely a GET request (unless a second argument specifies otherwise). When the same URL appears in XMLHttpRequest.open('POST', url), it captures the POST method. This contextual awareness means you're not just getting a list of URLs—you're getting actionable intelligence about how they're used.
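The context rule described above can be sketched as a small classifier. Everything here is a simplified assumption for illustration: the Call and Finding types, the classify function, and the way the fetch options object is modeled are hypothetical and do not mirror jsluice's internals.

```go
package main

import (
	"fmt"
	"strings"
)

// Call is a toy model of a call expression: the callee name, its positional
// arguments (already collapsed to strings), and the keys of a trailing
// options object such as the second argument to fetch. Hypothetical type.
type Call struct {
	Callee  string
	Args    []string
	Options map[string]string
}

// Finding is what the classifier reports for a recognized call.
type Finding struct {
	URL, Method, Type string
}

// classify infers the URL and HTTP method from where the URL string sits.
func classify(c Call) (Finding, bool) {
	switch c.Callee {
	case "fetch":
		if len(c.Args) < 1 {
			return Finding{}, false
		}
		method := "GET" // default unless the options object says otherwise
		if m, ok := c.Options["method"]; ok {
			method = strings.ToUpper(m)
		}
		return Finding{URL: c.Args[0], Method: method, Type: "fetch"}, true
	case "XMLHttpRequest.open":
		if len(c.Args) < 2 {
			return Finding{}, false
		}
		// open(method, url): the method comes first, the URL second.
		return Finding{URL: c.Args[1], Method: strings.ToUpper(c.Args[0]), Type: "XMLHttpRequest.open"}, true
	}
	return Finding{}, false
}

func main() {
	f, _ := classify(Call{Callee: "fetch", Args: []string{"/api/users"}})
	fmt.Printf("%+v\n", f)
	x, _ := classify(Call{Callee: "XMLHttpRequest.open", Args: []string{"POST", "/api/EXPR"}})
	fmt.Printf("%+v\n", x)
}
```

The point of the sketch is that the same string yields different findings depending on its parent call, which is information a text scan never has.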

The parameter extraction demonstrates another key design decision. When jsluice encounters new URLSearchParams({userId: EXPR, token: EXPR}), it doesn't just report a URL with unknown parameters. It extracts the parameter names userId and token, telling you exactly what data structure the API expects. This transforms security testing from guesswork into informed exploration.
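Once unknown values have been replaced by EXPR, the result is an ordinary URL string, so parameter names can be recovered with a standard parser. The sketch below uses Go's net/url for that; the helper name queryParamNames and the sample URL are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"net/url"
	"sort"
)

// queryParamNames parses a URL that may contain EXPR placeholders and
// returns its query parameter names in sorted order.
func queryParamNames(raw string) ([]string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return nil, err
	}
	names := make([]string, 0)
	for name := range u.Query() {
		names = append(names, name)
	}
	sort.Strings(names) // map iteration order is random, so sort for stable output
	return names, nil
}

func main() {
	names, err := queryParamNames("https://api.example.com/EXPR?userId=EXPR&token=EXPR")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(names) // [token userId]
}
```

The values are still unknown, but the names alone are enough to seed a fuzzer or a manual request.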

The tool also handles edge cases thoughtfully. It recognizes common JavaScript idioms like window.location.href, document.location.pathname, and even framework-specific routing helpers from React Router or Vue Router. It understands that import() statements may contain paths worth extracting. It knows that base64-encoded strings in specific contexts might be JWTs worth decoding and analyzing.

Gotcha

The EXPR placeholder approach is brilliant but creates a fundamental tension: you get broader coverage at the cost of some noise. When you see https://api.example.com/EXPR/EXPR?filter=EXPR, you know there's an endpoint with a two-segment path and a filter parameter, but you don't know valid values. In complex applications where path segments are computed from router configurations or loaded from external JSON, you might get https://cdn.example.com/EXPR/EXPR/EXPR/EXPR/file.js—technically correct but not actionable. You'll need to deduplicate and filter results, possibly correlating them with runtime observations or API documentation.
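One possible way to handle the deduplication and filtering step is sketched below: drop exact duplicates, then discard URLs whose path contains more placeholder segments than a chosen limit. The function names and the threshold of two are arbitrary assumptions; a real pipeline would tune the rule to the target.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// countExpr counts how many path segments are exactly the EXPR placeholder.
func countExpr(path string) int {
	n := 0
	for _, seg := range strings.Split(path, "/") {
		if seg == "EXPR" {
			n++
		}
	}
	return n
}

// triage removes exact duplicates and drops URLs that fail to parse or whose
// path has more than maxExpr placeholder segments. Input order is preserved.
func triage(urls []string, maxExpr int) []string {
	seen := make(map[string]bool)
	var kept []string
	for _, raw := range urls {
		if seen[raw] {
			continue
		}
		seen[raw] = true
		u, err := url.Parse(raw)
		if err != nil || countExpr(u.Path) > maxExpr {
			continue
		}
		kept = append(kept, raw)
	}
	return kept
}

func main() {
	found := []string{
		"https://api.example.com/EXPR/EXPR?filter=EXPR",
		"https://cdn.example.com/EXPR/EXPR/EXPR/EXPR/file.js",
		"https://api.example.com/EXPR/EXPR?filter=EXPR",
	}
	for _, u := range triage(found, 2) {
		fmt.Println(u)
	}
}
```

With a limit of two, the API endpoint survives once and the four-placeholder CDN path is dropped.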

The reliance on parseable JavaScript is a real constraint. If you're analyzing heavily obfuscated code, the kind that uses control flow flattening, string array rotation, or dead code injection, you'll need to deobfuscate first. Tools like webcrack or synchrony can help, but they're not perfect. Malformed input is a related problem. Tree-sitter is designed to recover from syntax errors, so a single stray error does not necessarily abort the parse, but badly broken files (for example, incomplete downloads) leave error nodes in the tree, and anything inside those regions may be missed or misread. A regex tool will happily scan garbage and sometimes find patterns anyway; an AST-based tool is only as good as the tree it gets.

Finally, remember that static analysis is blind to runtime behavior. If an application fetches a configuration JSON that contains API endpoints, or uses eval() to dynamically construct URLs from encrypted strings, jsluice won't see those. It finds what's written in the code, not what the code might fetch or compute at runtime.

Verdict

Use jsluice if you're doing security reconnaissance on modern JavaScript applications during penetration tests, bug bounty programs, or supply chain audits, especially when you need to map API surfaces from client-side bundles. It excels in scenarios where developers use string concatenation, template literals, or framework helpers to build URLs dynamically. It's invaluable as a library component in automated security pipelines that need to extract and test endpoints from continuously deployed frontends. Use it when you need context-aware extraction: knowing whether a URL is fetched via GET or POST, what parameter names an API expects, or whether a secret appears in a high-risk context.

Skip it if you're analyzing non-JavaScript files (use TruffleHog or detect-secrets for polyglot scanning), working with heavily obfuscated code that hasn't been deobfuscated first, or need runtime analysis of dynamically loaded resources (pair it with a proxy or browser instrumentation instead). Skip it for one-off quick scans where LinkFinder's regex simplicity is good enough and you don't care about missing concatenated strings. The learning curve for extending matchers with Tree-sitter queries is steep, so if you need highly custom patterns and lack Tree-sitter experience, Semgrep might be more approachable despite being less JavaScript-specialized.