Building a Technology Detection Microservice with Wappalyzer API

Hook

Every website you visit broadcasts dozens of technology choices through HTTP headers, JavaScript patterns, and DOM structures, and Wappalyzer API lets you capture most of them with a single HTTP request.

Context

Technology detection has traditionally been a manual, time-consuming process. Security teams pore over HTTP headers, developers inspect source code to understand competitor stacks, and sales teams struggle to identify prospects using specific technologies. Wappalyzer solved part of this problem with a browser extension that automatically identifies technologies, but integrating that detection capability into automated workflows remained cumbersome.

The browser extension approach works for one-off analysis, but falls apart when you need to scan hundreds of domains, integrate detection into CI/CD pipelines, or build competitive intelligence dashboards. The Wappalyzer NPM package exists, but requires Node.js expertise, dependency management, and custom wrapper code. Hunter.io recognized this deployment gap and created wappalyzer-api: a containerized REST API that transforms technology detection into a simple HTTP endpoint. It's the difference between manually checking each domain in a browser versus programmatically scanning your entire market segment in minutes.

Technical Insight

[Figure: System architecture (auto-generated). Components: HTTP Client, Express API Server and Wappalyzer Engine (inside the Docker container), Technology Signatures DB, Target Website. Flow: GET /extract?url=, fetch page content (HTML, headers, scripts), analyze content by pattern matching, return detected technologies as a JSON response.]

The architecture is deliberately minimal—a Node.js Express server wrapping the Wappalyzer core library inside a Docker container. The entire API surface consists of a single GET endpoint that accepts a URL and returns detected technologies as JSON. This simplicity is the point: it eliminates configuration complexity and makes integration trivial.

Getting started requires just two commands. First, pull and run the Docker container:

docker pull ghcr.io/hunter-io/wappalyzer-api:latest
docker run -p 3000:3000 ghcr.io/hunter-io/wappalyzer-api:latest

Then query any website:

curl "http://localhost:3000/extract?url=https://stripe.com"

The response reveals the technology stack in structured JSON:

{
  "url": "https://stripe.com",
  "technologies": [
    {
      "name": "React",
      "categories": ["JavaScript frameworks"],
      "icon": "React.svg",
      "website": "https://reactjs.org",
      "confidence": 100
    },
    {
      "name": "Webpack",
      "categories": ["Module bundlers"],
      "confidence": 100
    },
    {
      "name": "Node.js",
      "categories": ["Web servers"],
      "confidence": 100
    }
  ]
}

Under the hood, Wappalyzer uses pattern matching against a massive signature database. It analyzes HTTP response headers looking for server signatures, parses HTML for meta tags and script patterns, executes regex against inline JavaScript, and even checks CSS patterns and cookie names. Each technology in Wappalyzer's database includes detection rules like "if X-Powered-By header contains 'Express', confidence is 100%" or "if window.React exists in JavaScript, it's React."
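The rule matching described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not Wappalyzer's actual engine or rule schema: the `SIGNATURES` table, its keys, and the `detect` function are hypothetical names invented for this example, and the three rules are hand-written.

```python
import re

# Hypothetical, hand-written rules for illustration only. The real signature
# database is far larger and uses its own schema.
SIGNATURES = {
    "Express": {"headers": {"x-powered-by": r"express"}},
    "WordPress": {"html": [r"<meta[^>]+name=[\"']generator[\"'][^>]+WordPress"]},
    "jQuery": {"script_src": [r"jquery(?:[.-][\d.]+)?(?:\.min)?\.js"]},
}

def detect(headers, html):
    """Return the sorted names of technologies whose rules match."""
    # Header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    # Collect the src attribute of every <script> tag in the page.
    script_srcs = re.findall(r"<script[^>]+src=[\"']([^\"']+)[\"']", html, re.I)
    found = set()
    for name, rule in SIGNATURES.items():
        for header, pattern in rule.get("headers", {}).items():
            if re.search(pattern, lowered.get(header, ""), re.I):
                found.add(name)
        for pattern in rule.get("html", []):
            if re.search(pattern, html, re.I):
                found.add(name)
        for pattern in rule.get("script_src", []):
            if any(re.search(pattern, src, re.I) for src in script_srcs):
                found.add(name)
    return sorted(found)

page = '<script src="/js/jquery-3.6.0.min.js"></script>'
print(detect({"X-Powered-By": "Express"}, page))
```

A real engine would also attach a confidence value and version capture to each rule; this sketch only reports which names matched.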

The containerized approach solves several deployment headaches. Dependencies are frozen inside the image, eliminating "works on my machine" problems. You get consistent behavior across development, staging, and production. Scaling horizontally is trivial—just spin up more containers behind a load balancer. And because it's a standard HTTP API, any language can consume it:

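import requests

API_URL = 'http://localhost:3000/extract'

def analyze_competitor_stack(domain):
    """Query the local wappalyzer-api container and summarize one domain's stack."""
    response = requests.get(
        API_URL,
        params={'url': f'https://{domain}'},
        timeout=30,  # the service fetches the whole page, so allow time but never hang
    )
    response.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error body
    data = response.json()
    technologies = data.get('technologies', [])

    # Some entries may lack a categories list, so default to an empty one
    frameworks = [tech['name'] for tech in technologies
                  if 'JavaScript frameworks' in tech.get('categories', [])]

    return {
        'domain': domain,
        'frameworks': frameworks,
        'total_technologies': len(technologies)
    }

# Scan a list of competitors, skipping any domain whose request fails
competitors = ['shopify.com', 'bigcommerce.com', 'woocommerce.com']
for competitor in competitors:
    try:
        stack = analyze_competitor_stack(competitor)
    except requests.RequestException as exc:
        print(f"{competitor}: request failed ({exc})")
        continue
    print(f"{stack['domain']}: {stack['frameworks']}")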

This makes technology detection a composable building block. You can chain it with other services, cache results in Redis, feed detections into analytics pipelines, or trigger alerts when competitors adopt new technologies. The stateless design means each request is independent—perfect for parallel processing or serverless architectures.
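Because each request is independent, fanning out over a list of domains takes little code. The sketch below assumes the container from the earlier steps is listening on localhost:3000. `fetch_stack` and `scan_many` are hypothetical helper names, and the standard library is used so the example has no dependencies.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlencode
from urllib.request import urlopen

API_URL = "http://localhost:3000/extract"  # assumption: local container

def fetch_stack(domain, timeout=30):
    """Call the /extract endpoint for one domain and return the parsed JSON."""
    query = urlencode({"url": f"https://{domain}"})
    with urlopen(f"{API_URL}?{query}", timeout=timeout) as response:
        return json.load(response)

def scan_many(domains, fetch=fetch_stack, max_workers=4):
    """Scan domains concurrently; map each domain to its result or its error."""
    def safe_fetch(domain):
        try:
            return domain, fetch(domain)
        except Exception as exc:  # one failed domain should not abort the batch
            return domain, {"error": str(exc)}

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with domains
        return dict(pool.map(safe_fetch, domains))
```

Threads fit here because the work is I/O-bound. A small `max_workers` keeps the load on both the container and the target sites modest.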

The confidence scoring system deserves attention. Wappalyzer doesn't just return binary yes/no answers; each detection includes a confidence percentage. A confidence of 100% typically means explicit evidence like a version number in a header. Lower confidence might indicate indirect patterns like coding style or common library conventions. This granularity lets you filter results based on certainty thresholds, which matters when building security tooling or compliance reports where false positives have consequences.
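Filtering on a certainty threshold could look like the following sketch. It assumes each detection carries a numeric `confidence` field, as in the sample response. The helper name `filter_by_confidence`, the threshold of 75, and the sample data are illustrative choices rather than anything the API prescribes.

```python
def filter_by_confidence(technologies, min_confidence=75):
    """Keep detections at or above the threshold; a missing score counts as 0."""
    return [tech for tech in technologies
            if tech.get("confidence", 0) >= min_confidence]

# Illustrative data shaped like the "technologies" array in the response
detections = [
    {"name": "React", "confidence": 100},
    {"name": "Varnish", "confidence": 50},
    {"name": "Unknown CMS"},  # no confidence reported
]

confirmed = filter_by_confidence(detections)
print([tech["name"] for tech in confirmed])
```

For a compliance report you might raise the threshold to 100 and route everything below it to manual review.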

Gotcha

The most glaring limitation is the complete absence of authentication or rate limiting. Deploy this publicly without additional protection and you're running an open proxy that anyone can abuse. Every request triggers a full page fetch from the target URL, so an attacker can use your instance to run reconnaissance from your IP address, flood third-party sites with fetches, or probe hosts on your internal network. You'll need to add your own API gateway with authentication, restrict access with IP allowlisting, or place it behind a VPN. This isn't a production-ready service out of the box. It's a component that requires security hardening.
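The two missing controls are small enough to sketch. The code below is a minimal illustration of what a gateway in front of the container might check, not part of wappalyzer-api itself: `TokenBucket` and `is_authorized` are hypothetical names, and a real deployment would more likely rely on an existing API gateway or reverse proxy.

```python
import hmac
import time

class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock  # injectable so the limiter can be tested without waiting
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        """Refill tokens for the elapsed time, then spend one if available."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

def is_authorized(presented_key, valid_keys):
    """Compare the presented key against each configured key in constant time."""
    presented = presented_key.encode()
    return any(hmac.compare_digest(presented, key.encode()) for key in valid_keys)
```

A gateway would call `is_authorized` first, then `allow()` on a bucket kept per API key, and only forward the request to `/extract` when both pass.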

Accuracy is entirely dependent on Wappalyzer's signature database. If a technology isn't in the detection rules, it won't be found: custom internal frameworks and heavily obfuscated code go undetected, and coverage of newer technologies lags behind. False positives happen when sites intentionally spoof signatures or when patterns coincidentally match. False negatives happen too. For example, a site might use webpack but serve pre-rendered static HTML that hides all the typical webpack fingerprints, and the API will miss it. Similarly, server-side rendering frameworks can mask client-side framework detection. There's no feedback mechanism to improve detections or add custom rules without forking the entire project and maintaining your own signature database.

The single-URL-per-request model also means no built-in batching or result caching, so scanning thousands of domains requires custom orchestration and respectful rate limiting to avoid hammering target servers.
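The missing batching and caching can be layered on from the client side. The following is a minimal sketch under stated assumptions: `scan_batch` is a hypothetical helper, `fetch` is whatever function you use to call the `/extract` endpoint, and a plain dict stands in for a real cache such as Redis.

```python
import time

def scan_batch(urls, fetch, cache=None, delay_seconds=1.0, sleep=time.sleep):
    """Scan URLs one at a time, reusing cached results and pausing between fetches."""
    cache = {} if cache is None else cache
    results = {}
    fetched = 0
    for url in urls:
        if url not in cache:
            if fetched:
                sleep(delay_seconds)  # pause only between real fetches
            cache[url] = fetch(url)
            fetched += 1
        results[url] = cache[url]
    return results
```

Passing the same `cache` dict to later calls avoids re-scanning URLs you have already seen, and the fixed delay is the simplest form of politeness toward target servers. A per-host delay would be a natural refinement.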

Verdict

Use if: You need programmatic technology detection integrated into security scanning workflows, competitive intelligence dashboards, or lead generation tools. You're comfortable adding your own authentication layer and deploying behind proper infrastructure. You want a language-agnostic API that any service can call without Node.js dependencies. You value deployment simplicity and don't need deep customization of detection logic.

Skip if: You require enterprise-grade security features like built-in authentication, rate limiting, and audit logging without additional work. You need to customize detection signatures for proprietary technologies or want control over the scanning engine itself. You're scanning at massive scale where the per-request overhead matters (in which case, integrate Wappalyzer's library directly). You need historical technology data or change tracking rather than point-in-time snapshots.