LinkFinder: Mining JavaScript for Hidden API Endpoints with Regex Precision

Hook

Modern web applications expose much of their API surface in client-side JavaScript. LinkFinder pulls candidate endpoints out of those files in seconds, instead of leaving you to click through browser DevTools by hand.

Context

Web application security testing has a reconnaissance problem. Developers ship increasingly complex single-page applications where API endpoints are defined in JavaScript route handlers, imported from modules, or constructed dynamically. Traditional web crawlers can't see these endpoints because they're never rendered as clickable links. Manual inspection using browser DevTools works, but becomes impractical when you're analyzing dozens of JavaScript files across a target domain.

This gap is particularly painful for penetration testers and bug bounty hunters. Missing an obscure admin endpoint or undocumented API route could mean missing a critical vulnerability. Before LinkFinder, security researchers would grep through JavaScript files manually, write one-off regex scripts, or simply accept incomplete coverage. LinkFinder emerged from this frustration, offering a purpose-built tool that automates endpoint discovery with a pragmatic regex-based approach that prioritizes speed and ease of use over theoretical completeness.

Technical Insight


LinkFinder's architecture is deceptively simple: normalize JavaScript, apply regex patterns, filter results, output findings. But this simplicity is a feature, not a limitation. The tool uses jsbeautifier to reformat minified JavaScript into a consistent layout before analysis, which helps regex matching and keeps each hit readable in context without requiring heavyweight AST parsing.

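The core engine applies a single regular expression with five alternative branches, each designed to catch a different URL shape inside a quoted string. The first branch matches full URLs with a scheme or a protocol-relative // prefix (https://api.example.com/users), the second catches absolute or dotted paths (/api/users or ../config.json), the third identifies relative paths that end in a file extension (static/app.js), the fourth matches extensionless relative paths of the kind REST APIs use (users/profile), and the fifth picks up bare filenames with common web extensions (config.json). Query strings and fragments are accepted as suffixes rather than handled by a separate pattern. Here's a simplified view of how LinkFinder processes a typical JavaScript file: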

import re

import jsbeautifier  # third-party: pip install jsbeautifier

# Combined pattern (simplified): a quoted string matching one of five branches
regex_str = (
    r"(?:\"|')("
    r"((?:[a-zA-Z]{1,10}://|//)[^\"'/]{1,}\.[a-zA-Z]{2,}[^\"']{0,})"  # full or protocol-relative URL
    r"|((?:/|\.\./)[^\"'><,;| *()(%%$^/\\\[\]][^\"'><,;|()]{1,})"  # /path or ../path
    r"|([a-zA-Z0-9_\-/]{1,}/[a-zA-Z0-9_\-/]{1,}\.(?:[a-zA-Z]{1,4}|action)(?:[\?|#][^\"|']{0,}|))"  # dir/file.ext
    r"|([a-zA-Z0-9_\-/]{1,}/[a-zA-Z0-9_\-/]{3,}(?:[\?|#][^\"|']{0,}|))"  # dir/resource
    r"|([a-zA-Z0-9_\-]{1,}\.(?:php|asp|aspx|jsp|json|action|html|js|txt|xml)(?:[\?|#][^\"|']{0,}|))"  # file.ext
    r")(?:\"|')"
)

def find_endpoints(js_content):
    # Beautify to normalize code structure
    beautified = jsbeautifier.beautify(js_content)

    endpoints = []
    for match in re.finditer(regex_str, beautified):
        endpoint = match.group(1)  # group 1 is the string between the quotes
        if endpoint and not is_false_positive(endpoint):
            endpoints.append(endpoint)

    return endpoints

def is_false_positive(endpoint):
    # Illustrative substring filter for common third-party noise
    blacklist = ['jquery', 'google-analytics', 'gstatic']
    return any(b in endpoint.lower() for b in blacklist)
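
To make it easier to see which alternative is responsible for which kind of hit, the sketch below rebuilds the same pattern from named pieces and reports the branch that matched. It is not LinkFinder's own code: the branch labels and the classify helper are hypothetical names chosen for illustration, and the sample strings are made up.

```python
import re

# Hypothetical labels for the five alternatives, in pattern order.
BRANCHES = [
    ("full_url",
     r"(?:[a-zA-Z]{1,10}://|//)[^\"'/]{1,}\.[a-zA-Z]{2,}[^\"']{0,}"),
    ("absolute_or_dotted_path",
     r"(?:/|\.\./)[^\"'><,;| *()(%%$^/\\\[\]][^\"'><,;|()]{1,}"),
    ("relative_path_with_extension",
     r"[a-zA-Z0-9_\-/]{1,}/[a-zA-Z0-9_\-/]{1,}\.(?:[a-zA-Z]{1,4}|action)"
     r"(?:[\?|#][^\"|']{0,}|)"),
    ("relative_path_no_extension",
     r"[a-zA-Z0-9_\-/]{1,}/[a-zA-Z0-9_\-/]{3,}(?:[\?|#][^\"|']{0,}|)"),
    ("known_file_extension",
     r"[a-zA-Z0-9_\-]{1,}\.(?:php|asp|aspx|jsp|json|action|html|js|txt|xml)"
     r"(?:[\?|#][^\"|']{0,}|)"),
]

# One named group per branch, wrapped in opening and closing quotes.
PATTERN = re.compile(
    r"(?:\"|')(?:"
    + "|".join(f"(?P<{name}>{body})" for name, body in BRANCHES)
    + r")(?:\"|')"
)

def classify(js_source):
    """Return (branch_name, matched_string) pairs for every quoted match."""
    return [(m.lastgroup, m.group(m.lastgroup))
            for m in PATTERN.finditer(js_source)]

sample = '''
fetch("https://api.example.com/users");
load("/api/users");
get("static/app.js");
post("users/profile");
open("config.json");
'''

for branch, endpoint in classify(sample):
    print(f"{branch:32} {endpoint}")
```

Because the alternatives are tried left to right, a string that could satisfy several branches is reported under the first one that fits, which is why branch order matters in the combined expression.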

The tool's input flexibility is where it really shines for real-world workflows. You can feed LinkFinder a single URL, a local JavaScript file, an entire directory, or even Burp Suite saved items. When analyzing a live URL, LinkFinder fetches the page, extracts all script tags and external JavaScript references, then recursively analyzes each file:

# Analyze a single JS file from URL
python linkfinder.py -i https://example.com/app.js -o results.html

# Process entire domain (crawl and analyze all JS)
python linkfinder.py -i https://example.com -d -o results.html

# Burp Suite integration
python linkfinder.py -i burp_saved_items.xml -b -o results.html

# Pipeline-friendly CLI output
python linkfinder.py -i app.js -o cli

The HTML output mode generates an interactive report that groups endpoints by their source file and highlights different URL types with color coding, making manual review efficient. The CLI output mode prints raw endpoints to stdout, perfect for piping into other tools or scripting automated workflows.
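
As one example of what piping the CLI output into a script might look like, the sketch below turns raw endpoint strings into unique absolute URLs. It assumes the CLI mode prints one endpoint per line, and resolve_endpoints is a hypothetical helper name, not part of LinkFinder.

```python
from urllib.parse import urljoin

def resolve_endpoints(lines, base_url):
    """Turn raw endpoint strings (one per line) into unique absolute URLs."""
    seen = set()
    resolved = []
    for line in lines:
        endpoint = line.strip()
        if not endpoint:
            continue  # skip blank lines
        absolute = urljoin(base_url, endpoint)  # handles /abs, rel, and //host forms
        if absolute not in seen:
            seen.add(absolute)
            resolved.append(absolute)
    return resolved

raw_output = [
    "/api/users\n",
    "users/profile\n",
    "\n",
    "//cdn.example.net/lib.js\n",
    "/api/users\n",
]
urls = resolve_endpoints(raw_output, "https://example.com/app/")
```

In a real pipeline the lines argument could simply be sys.stdin, with the base URL taken from the page the script was loaded from.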

What keeps LinkFinder's regex approach usable is the separation between extraction and filtering. The pattern itself stays broadly permissive, and precision is recovered afterwards: duplicate matches are dropped, and the -r option keeps only endpoints that match a regex you supply (for example ^/api/). A blocklist of CDN, analytics, and library names like the one in the sketch above illustrates the same idea rather than a comprehensive built-in list, so expect third-party URLs to show up in raw results.
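
The extract-broadly-then-filter idea can be sketched in a few lines. This is a minimal illustration under assumptions, not LinkFinder's implementation: filter_endpoints, its parameters, and the blocklist terms are all hypothetical.

```python
import re

def filter_endpoints(endpoints, include=None, blocklist=()):
    """Drop duplicates and blocklisted strings; optionally keep only regex hits."""
    include_re = re.compile(include) if include else None
    kept = []
    for endpoint in dict.fromkeys(endpoints):  # dedupe while preserving order
        lowered = endpoint.lower()
        if any(term in lowered for term in blocklist):
            continue  # third-party noise
        if include_re and not include_re.search(endpoint):
            continue  # does not match the caller's pattern
        kept.append(endpoint)
    return kept

raw = [
    "/api/users",
    "https://www.google-analytics.com/analytics.js",
    "/api/users",
    "/static/jquery.min.js",
    "/api/admin/export",
    "text/html",
]
noise = ("jquery", "google-analytics")
cleaned = filter_endpoints(raw, blocklist=noise)
api_only = filter_endpoints(raw, include=r"^/api/", blocklist=noise)
```

Keeping the filter separate from the pattern means the blocklist or include regex can be tuned per target without touching the extraction step.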

The decision to use regex instead of Abstract Syntax Tree parsing is LinkFinder's most interesting architectural choice. A real parser such as Esprima understands JavaScript's structure and can tell a live string literal from a comment, but building and walking a full AST generally costs considerably more than a single regex pass over the text. For security reconnaissance where you might scan hundreds of JavaScript files across dozens of domains, that overhead adds up quickly. LinkFinder accepts that regex will generate some false positives in exchange for being fast enough to actually use in practice.

Gotcha

LinkFinder's regex-based approach means false positives are unavoidable. Any quoted string shaped like a path is reported, so MIME types such as "application/json", date formats such as "MM/DD/YYYY", documentation examples, and strings in dead code all appear in your results. Expect to spend time manually reviewing findings, especially when analyzing large applications. The tool will match "/api/users" whether it's a live endpoint or a quoted string inside a commented-out line.
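
A short experiment shows how easily non-endpoints slip through. The sketch uses a reduced pattern containing only two of the branches (absolute paths and extensionless relative paths), and the JavaScript snippet is invented for the example.

```python
import re

# Reduced pattern: absolute/dotted paths and extensionless relative paths only.
REDUCED = re.compile(
    r"(?:\"|')("
    r"(?:/|\.\./)[^\"'><,;| *()(%%$^/\\\[\]][^\"'><,;|()]{1,}"
    r"|[a-zA-Z0-9_\-/]{1,}/[a-zA-Z0-9_\-/]{3,}(?:[\?|#][^\"|']{0,}|)"
    r")(?:\"|')"
)

snippet = '''
// legacy: fetch("/api/v1/users")
xhr.setRequestHeader("Content-Type", "application/json");
var fmt = "MM/DD/YYYY";
fetch("/api/v2/orders");
'''

hits = [m.group(1) for m in REDUCED.finditer(snippet)]
for hit in hits:
    print(hit)
```

Of the four strings reported, only the last is a live endpoint: the first sits in a comment, the second is a MIME type, and the third is a date format. The pattern has no way to tell them apart because it sees text, not program structure.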

More critically, LinkFinder struggles with modern JavaScript build processes. Webpack-bundled applications that use dynamic imports, environment-based configuration, or heavy string concatenation will hide endpoints from regex analysis. If your target constructs URLs like baseURL + '/' + resource + '/' + action, LinkFinder will catch at most the literal fragments, never the complete endpoint. Obfuscated code beyond simple minification (like string splitting or encoded string arrays) will similarly evade detection. The jsbeautifier normalization helps with standard minification but can't reconstruct URLs from encrypted strings or base64-encoded configuration.
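
The effect of concatenation and encoding is easy to reproduce. The sketch below again uses a reduced two-branch pattern rather than the full expression, and both JavaScript lines are hypothetical examples.

```python
import base64
import re

# Reduced pattern: absolute/dotted paths and extensionless relative paths only.
REDUCED = re.compile(
    r"(?:\"|')("
    r"(?:/|\.\./)[^\"'><,;| *()(%%$^/\\\[\]][^\"'><,;|()]{1,}"
    r"|[a-zA-Z0-9_\-/]{1,}/[a-zA-Z0-9_\-/]{3,}(?:[\?|#][^\"|']{0,}|)"
    r")(?:\"|')"
)

# An endpoint hidden behind base64, as an obfuscated config might store it.
encoded = base64.b64encode(b"/api/admin").decode()

snippet = (
    'var url = "/api/" + resource + "/details";\n'
    f'var hidden = atob("{encoded}");\n'
)

hits = [m.group(1) for m in REDUCED.finditer(snippet)]
print(hits)
```

The concatenated URL surfaces only as two disconnected fragments, and the encoded path does not surface at all, since the pattern never sees the decoded string.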

The tool also has no concept of authentication flows or session handling. If JavaScript files are protected behind login walls or require specific cookies to access, you'll need to download them manually first or pass a session cookie with the -c option, which means logging in yourself and copying the cookie string. There's no JavaScript execution environment, so any endpoints that only exist at runtime, such as URLs assembled inside AJAX callbacks or WebSocket handlers, won't be discovered.
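
For the download-first route, a small helper using only the standard library is usually enough. This is a sketch under assumptions: the function names, the default User-Agent string, and the example URL and cookie value are all hypothetical, and the cookie string is whatever your authenticated browser session sends.

```python
import urllib.request

def build_request(url, cookies, user_agent="Mozilla/5.0"):
    """Build a GET request that carries a session cookie string."""
    headers = {"Cookie": cookies, "User-Agent": user_agent}
    return urllib.request.Request(url, headers=headers)

def save_script(url, cookies, dest_path, timeout=10):
    """Download one protected JavaScript file to disk for offline analysis."""
    request = build_request(url, cookies)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        data = response.read()
    with open(dest_path, "wb") as handle:
        handle.write(data)
    return len(data)  # number of bytes written

# Building the request does not touch the network; save_script does.
req = build_request("https://example.com/app.js", "session=abc123")
```

Once the file is on disk, it can be passed to LinkFinder as a local input with -i like any other JavaScript file.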

Verdict

Use if: You're conducting security assessments on web applications and need to quickly enumerate potential attack surface from JavaScript files. LinkFinder excels in bug bounty reconnaissance, the reconnaissance phase of a penetration test, and security audits where speed matters and you're comfortable reviewing results manually. It's particularly valuable when integrated into automated scanning pipelines or used alongside Burp Suite for broader coverage. The HTML output is convenient for collecting evidence for security reports.

Skip if: Your target uses heavily obfuscated JavaScript that goes beyond standard minification, you need 100% precision without false positives (no regex-based tool can provide this), or the application builds most of its URLs at runtime through dynamic imports and string concatenation. Also skip if you're looking for a point-and-click solution with zero configuration: LinkFinder requires command-line comfort and an understanding of what you're looking for. In those cases, invest time in AST-based tools or manual analysis with browser DevTools.
