Building a Serverless Technology Detector with Uptain: Wappalyzer Meets AWS Lambda
Hook
Technology detectors that only fetch raw HTML miss much of the story because they never execute JavaScript. Fetch the HTML of a client-rendered React app and you get a nearly empty root div and few clues about what actually runs in the browser.
Context
Competitive intelligence, security audits, lead generation, and portfolio analysis all share a common need: identifying what technologies power a website. Is that competitor using React or Vue? What analytics tools track their users? Which CDN serves their assets? Wappalyzer pioneered this space with a crowdsourced database of detection patterns—regex signatures, DOM selectors, HTTP headers, and JavaScript globals that fingerprint thousands of technologies.
But using Wappalyzer at scale presents challenges. Running it locally requires Node.js infrastructure and maintenance. Browser extensions work for manual inspection but don't scale to automated workflows. The official Wappalyzer API exists but comes with usage limits and costs. Uptain emerged as a middle path: wrap Wappalyzer in a serverless function that anyone can deploy to AWS Lambda, providing on-demand technology detection without managing servers or paying for idle time.
Technical Insight
Uptain's architecture solves a deceptively complex problem: rendering modern JavaScript applications in a serverless environment, then analyzing the resulting DOM and network activity. The core challenge is that Lambda functions are ephemeral and stateless—they don't have persistent browser instances or GPU acceleration for rendering.
The solution uses Puppeteer with chrome-aws-lambda, a distribution of Chromium compiled specifically for Lambda's execution environment. Here's how a typical request flows through the system:
    const chromium = require('chrome-aws-lambda');
    const puppeteer = require('puppeteer-core');
    const Wappalyzer = require('wappalyzer');

    exports.handler = async (event) => {
      const { url } = JSON.parse(event.body);

      // Launch headless Chrome in Lambda
      const browser = await puppeteer.launch({
        args: chromium.args,
        executablePath: await chromium.executablePath,
        headless: chromium.headless,
      });

      let html, scripts, cookies;
      try {
        const page = await browser.newPage();

        // Navigate and wait for network idle
        await page.goto(url, {
          waitUntil: 'networkidle2',
          timeout: 30000,
        });

        // Extract HTML, external script URLs (inline scripts have an empty src), and cookies
        html = await page.content();
        scripts = await page.evaluate(() =>
          Array.from(document.scripts).map((s) => s.src).filter(Boolean)
        );
        cookies = await page.cookies();
      } finally {
        // Always close the browser so a failed navigation does not leak Chromium
        await browser.close();
      }

      // Run Wappalyzer detection
      // (the analyze() input shape has varied across wappalyzer releases; verify it against the installed version)
      const wappalyzer = new Wappalyzer();
      const results = await wappalyzer.analyze({ url, html, scripts, cookies });

      return {
        statusCode: 200,
        headers: { 'Access-Control-Allow-Origin': '*' },
        body: JSON.stringify(results),
      };
    };
The critical architectural decision here is using networkidle2 as the wait condition. This tells Puppeteer to wait until there are no more than two network connections for at least 500ms. For single-page applications that lazy-load resources, this ensures JavaScript frameworks have initialized and injected their telltale signatures into the DOM before Wappalyzer analyzes the page.
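To make the wait condition concrete, here is a small simulation of the idle rule described above. It is a toy model, not Puppeteer's implementation: the function name findIdleTime and the event format are hypothetical, chosen only to illustrate "no more than two in-flight requests for 500ms".

```javascript
// Illustrative only: a toy model of a "network idle" rule, not Puppeteer's code.
// events: [{ t: ms since navigation start, delta: +1 (request starts) or -1 (request ends) }]
function findIdleTime(events, maxInflight = 2, quietMs = 500) {
  const sorted = [...events].sort((a, b) => a.t - b.t);
  let inflight = 0;
  let quietSince = 0; // start of the current quiet window, or null while too busy

  for (const { t, delta } of sorted) {
    // If the quiet window fully elapsed before this event, idle was already reached.
    if (quietSince !== null && t - quietSince >= quietMs) {
      return quietSince + quietMs;
    }
    inflight += delta;
    if (inflight > maxInflight) {
      quietSince = null; // over the threshold: cancel the window
    } else if (quietSince === null) {
      quietSince = t; // back at or under the threshold: start a new window
    }
  }
  // No more events: the last quiet window (if any) runs to completion.
  return quietSince === null ? null : quietSince + quietMs;
}

// Three requests start, one finishes at 300ms, a lazy-loaded request starts
// at 600ms (cancelling the window), and one finishes at 900ms.
const timeline = [
  { t: 0, delta: +1 }, { t: 10, delta: +1 }, { t: 20, delta: +1 },
  { t: 300, delta: -1 }, { t: 600, delta: +1 }, { t: 900, delta: -1 },
];
console.log(findIdleTime(timeline));
```

The lazy-loaded request at 600ms resets the window, which is the behavior that gives a single-page application time to finish pulling in its bundles before the DOM is read.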
Wappalyzer itself operates on pattern matching across multiple dimensions. It checks HTTP headers for server signatures ("X-Powered-By: Express"), searches HTML for meta tags and comments, looks for specific JavaScript globals ("window.jQuery"), and matches script source URLs against known CDN patterns. Because Wappalyzer runs after the full page render, Uptain catches client-side technologies that static crawlers miss entirely. Note that the handler shown above passes only the URL, HTML, script URLs, and cookies, so header and JavaScript-global signatures can match only if that data is captured and passed in as well.
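The idea of matching across several dimensions can be sketched in a few lines. The fingerprint table and the detect function below are hypothetical and far simpler than Wappalyzer's real schema; they only show how header, HTML, and script-URL patterns combine into a list of detected technologies.

```javascript
// Hypothetical fingerprints in the spirit of Wappalyzer's database.
// The real schema and patterns are different and much richer.
const FINGERPRINTS = {
  Express: { headers: { 'x-powered-by': /express/i } },
  jQuery: { scriptSrc: /jquery[^/]*\.js/i },
  WordPress: { html: /<meta[^>]+name=["']generator["'][^>]+content=["']WordPress/i },
};

// page: { headers: { lowercasedName: value }, html: string, scripts: string[] }
function detect(page, fingerprints = FINGERPRINTS) {
  const found = [];
  for (const [name, fp] of Object.entries(fingerprints)) {
    // A technology matches if any one of its patterns matches.
    const headerHit = Object.entries(fp.headers || {}).some(
      ([header, re]) => re.test(page.headers[header] || '')
    );
    const htmlHit = fp.html ? fp.html.test(page.html) : false;
    const scriptHit = fp.scriptSrc
      ? page.scripts.some((src) => fp.scriptSrc.test(src))
      : false;
    if (headerHit || htmlHit || scriptHit) found.push(name);
  }
  return found;
}

const samplePage = {
  headers: { 'x-powered-by': 'Express' },
  html: '<div id="root"></div>',
  scripts: ['https://code.jquery.com/jquery-3.6.0.min.js'],
};
console.log(detect(samplePage));
```

Note that the sample page's HTML is an empty root div: the header and script URL carry the signal, which is why collecting more than the raw markup matters.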
The CORS headers (Access-Control-Allow-Origin: *) are particularly noteworthy. This design choice allows browser-based applications to call the API directly without a backend proxy. A Chrome extension could analyze any tab by sending a POST request to your deployed Lambda URL. A React dashboard could batch-analyze competitor websites entirely client-side. This openness trades security for convenience—there's no authentication layer in the base implementation.
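One practical detail for browser callers: a cross-origin POST with a JSON body usually triggers a preflight OPTIONS request, which must be answered with matching CORS headers before the real request is sent. The sketch below shows one way to handle that along with basic input checks. It assumes an API Gateway REST proxy event (event.httpMethod); handleRequest, corsHeaders, and the analyze callback are hypothetical names, and analyze is a synchronous stand-in so the sketch runs without a browser.

```javascript
// Sketch only: assumes an API Gateway REST proxy event shape (event.httpMethod).
// corsHeaders and handleRequest are illustrative names, not part of Uptain.
const corsHeaders = {
  'Access-Control-Allow-Origin': '*',
  'Access-Control-Allow-Methods': 'POST,OPTIONS',
  'Access-Control-Allow-Headers': 'Content-Type',
};

function reply(statusCode, payload) {
  return { statusCode, headers: corsHeaders, body: JSON.stringify(payload) };
}

function handleRequest(event, analyze) {
  // Answer the browser's preflight without launching Chromium.
  if (event.httpMethod === 'OPTIONS') {
    return { statusCode: 204, headers: corsHeaders, body: '' };
  }

  let url;
  try {
    // Throws on malformed JSON and on a missing or invalid URL.
    url = new URL(JSON.parse(event.body || '{}').url);
  } catch (err) {
    return reply(400, { error: 'Invalid or missing url' });
  }

  // Only allow web URLs; an open endpoint should not fetch arbitrary schemes.
  if (!['http:', 'https:'].includes(url.protocol)) {
    return reply(400, { error: 'Only http and https URLs are supported' });
  }

  // In a real handler this step would be asynchronous (render, then detect).
  return reply(200, analyze(url.href));
}
```

An authentication check (for example an API key header) would slot in right after the preflight branch if the endpoint should not be open to everyone.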
Deployment uses the Serverless Framework, which abstracts AWS infrastructure setup. The serverless.yml configuration specifies memory allocation (typically 1536MB to leave headroom for Chromium), a function timeout (30-60 seconds to prevent runaway executions), and API Gateway integration for HTTP access. Keep in mind that API Gateway's default integration timeout is 29 seconds, so a synchronous HTTP caller will see a gateway timeout before a longer function timeout is ever reached. Cold starts, the time Lambda takes to spin up a new container and load Chromium, typically add 3-5 seconds to the first request; subsequent requests in the same container can execute in under 2 seconds.
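A configuration along these lines would express the settings described above. This is a minimal sketch, not Uptain's actual file: the service name, function name, handler path, route, and runtime version are all assumptions and should be matched to the real project and to what the installed chrome-aws-lambda release supports.

```yaml
# Hypothetical serverless.yml sketch; names and runtime are assumptions.
service: technology-detector

provider:
  name: aws
  runtime: nodejs14.x   # assumed; pick a runtime your Chromium build supports
  memorySize: 1536      # MB, headroom for Chromium
  timeout: 30           # seconds, function-level limit

functions:
  detect:
    handler: handler.handler   # assumes exports.handler lives in handler.js
    events:
      - http:
          path: detect
          method: post
          cors: true           # lets API Gateway answer preflight requests
```

Running serverless deploy with a file like this creates the function and an HTTP endpoint in front of it.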
One elegant aspect of this architecture is resource efficiency. Unlike a fleet of browser instances running 24/7, Lambda charges only for actual execution time. If you analyze 100 websites per day at roughly 2 seconds each, you pay for about 200 seconds of compute. The tradeoff is latency: each cold start unpacks and initializes a 50MB+ Chromium binary, which makes Uptain unsuitable for latency-sensitive applications but a good fit for batch processing or infrequent queries.
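The arithmetic behind that claim is easy to check. The sketch below estimates a monthly bill from request volume, duration, and memory size. The two price constants are assumptions based on published on-demand x86 pricing in us-east-1 and ignore the free tier; estimateMonthlyCost is a hypothetical helper, so verify the rates against current AWS pricing before relying on the output.

```javascript
// Assumed on-demand prices (x86, us-east-1); check current AWS pricing.
const PRICE_PER_GB_SECOND = 0.0000166667;
const PRICE_PER_REQUEST = 0.2 / 1e6; // $0.20 per million requests

// Rough monthly estimate, ignoring the free tier and API Gateway charges.
function estimateMonthlyCost({ requestsPerDay, avgSeconds, memoryMb, days = 30 }) {
  const requests = requestsPerDay * days;
  // Lambda bills compute as duration multiplied by configured memory in GB.
  const gbSeconds = requests * avgSeconds * (memoryMb / 1024);
  const usd = gbSeconds * PRICE_PER_GB_SECOND + requests * PRICE_PER_REQUEST;
  return { requests, gbSeconds, usd };
}

const estimate = estimateMonthlyCost({ requestsPerDay: 100, avgSeconds: 2, memoryMb: 1536 });
console.log(estimate.gbSeconds, estimate.usd.toFixed(2));
```

At 100 requests a day, 2 seconds each, and 1536MB, this works out to 9,000 GB-seconds and roughly $0.15 per month under the assumed rates. The compute term dominates; the per-request fee is negligible at this volume.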
Gotcha
The most immediate limitation is time. Lambda's hard ceilings are generous (a 15-minute execution timeout and 10GB of memory), but the limits that matter here are the ones configured for the function: the 30-60 second function timeout and the 30-second navigation timeout in the handler. Complex websites with heavy JavaScript, infinite scroll, or aggressive lazy loading can exceed them. A page that takes 45 seconds to fully render locally will likely time out in Lambda, returning errors or incomplete results. There's no queue mechanism or retry logic in Uptain's base implementation: requests either succeed within the timeout or fail.
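One way to fail cleanly instead of being killed mid-render is to derive the navigation timeout from the time the invocation has left. The sketch below relies on the Lambda context object's getRemainingTimeInMillis() method; the helper name navigationBudgetMs and the 5-second reserve are hypothetical choices, not part of Uptain.

```javascript
// Sketch: derive the page.goto timeout from the time this invocation has left.
// reserveMs leaves room to extract data, run detection, and send the response.
function navigationBudgetMs(context, { reserveMs = 5000, maxMs = 30000, minMs = 1000 } = {}) {
  const budget = Math.min(maxMs, context.getRemainingTimeInMillis() - reserveMs);
  // Return null rather than 0: Puppeteer treats a timeout of 0 as "no timeout".
  return budget >= minMs ? budget : null;
}

// Stand-in for the Lambda context, so the sketch runs outside AWS.
const fakeContext = (ms) => ({ getRemainingTimeInMillis: () => ms });
console.log(navigationBudgetMs(fakeContext(20000)));
```

Inside the handler the result would be passed as the timeout option to page.goto, and a null budget would be turned into an immediate error response rather than a navigation attempt.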
Puppeteer resource consumption is unpredictable. A simple blog might use 200MB of memory and finish in 3 seconds. A React admin dashboard with multiple third-party analytics scripts could consume 1.2GB and take 25 seconds. This variability makes capacity planning difficult. You'll need to overprovision memory (and thus pay more) to handle worst-case scenarios, or accept that some percentage of requests will fail.
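The overprovisioning decision can be made from data rather than guesswork. Lambda's REPORT log line includes a "Max Memory Used" figure for each invocation; the sketch below takes a set of such readings and suggests a memory size that covers a chosen percentile plus headroom. The sample values are made up, and recommendMemoryMb, the 20% headroom, and the 64MB rounding step are hypothetical choices.

```javascript
// Nearest-rank percentile over a list of numbers.
function percentile(values, p) {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Suggest a memory size that covers the p-th percentile plus headroom,
// rounded up to a 64MB step and clamped to Lambda's 128-10240MB range.
function recommendMemoryMb(samplesMb, { p = 95, headroom = 1.2 } = {}) {
  const target = percentile(samplesMb, p) * headroom;
  const rounded = Math.ceil(target / 64) * 64;
  return Math.min(10240, Math.max(128, rounded));
}

// Made-up "Max Memory Used" readings in MB, for illustration only.
const observedMb = [200, 250, 300, 400, 600, 800, 900, 1000, 1100, 1200];
console.log(recommendMemoryMb(observedMb));
```

Lowering p trades cost for a higher failure rate, which is exactly the choice described above: pay for the worst case, or accept that some requests will run out of memory.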
Maintenance is another concern. The repository has minimal stars and no recent activity, suggesting it's a proof-of-concept rather than production-maintained software. Wappalyzer updates its detection signatures regularly as new technologies emerge, but Uptain's dependency versions are frozen at installation time. You're responsible for bumping Wappalyzer versions, testing compatibility with new chrome-aws-lambda releases, and handling breaking changes in the Puppeteer API. For teams wanting "deploy and forget" infrastructure, this maintenance burden may outweigh the benefits of serverless architecture.
Verdict
Use Uptain if you need occasional technology detection (under 1,000 requests/day), want no idle infrastructure to run between uses, and can tolerate 3-7 second response times including cold starts. It's excellent for proof-of-concepts, internal tools that analyze competitor stacks monthly, or browser extensions where users accept loading delays. The CORS-enabled design makes it a good fit for client-side applications that can't run Wappalyzer directly in the browser.
Skip it if you need sub-second response times, process thousands of URLs daily (at that volume the per-second memory charges for a 1536MB function add up well beyond the $0.20 per million request fee), or require guaranteed uptime and support. For high-volume production use, self-host Wappalyzer against a dedicated browser pool such as Browserless.io, or pay for a commercial API like BuiltWith that handles scaling and maintenance for you.