Tarnish: Building a Static Analysis Pipeline for Chrome Extension Security

Hook

Chrome extensions with broad host permissions can read every password field, every API token, and every cookie in your browser, yet most are built by individual developers with zero security review. Tarnish is one developer's attempt to change that with automated static analysis.

Context

The Chrome Web Store hosts over 150,000 extensions with billions of installations, but unlike mobile app stores, there's no meaningful security review process. Extensions run with powerful APIs that can inject scripts into banking sites, intercept network requests, and access local storage across every domain you visit. A single XSS vulnerability in a content script injected on every page becomes a universal XSS affecting every site the user browses.

Manual security reviews don't scale. Reading through obfuscated JavaScript, tracking down dangerous API usage, validating Content Security Policies, and checking for vulnerable dependencies takes hours per extension. Security researchers needed a way to triage extensions quickly, identifying the most dangerous patterns automatically so they could focus their expertise on complex logic flaws. Tarnish emerged from this frustration as a tool specifically designed for the unique threat model of browser extensions.

Technical Insight


Tarnish's architecture is built around distributed task processing using Celery, the Python distributed task queue. When you submit an extension ID from the Chrome Web Store, the frontend posts to an API that creates a Celery task. Workers running on spot instances managed by AWS Elastic Beanstalk pick up these tasks, download the CRX file, unpack it, and run a battery of analyzers. Redis serves as both the message broker for Celery and the results cache.
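The unpacking step is less trivial than it sounds, because a CRX file is a ZIP archive sitting behind a binary header. The sketch below is a hypothetical helper, not Tarnish's own code (its Celery workers run Python); it shows one way to locate the ZIP payload for both the older CRX2 layout and the current CRX3 layout. It skips signature verification entirely, which is acceptable for analysis but not for installation.

```javascript
// Hypothetical helper: find where the ZIP payload starts inside a CRX file.
// Supports CRX2 and CRX3 headers; does NOT verify signatures.
function crxZipOffset(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);

  // Every CRX file starts with the ASCII magic "Cr24"
  const magic = String.fromCharCode(bytes[0], bytes[1], bytes[2], bytes[3]);
  if (magic !== 'Cr24') {
    throw new Error('not a CRX file');
  }

  const version = view.getUint32(4, true); // little-endian uint32
  if (version === 2) {
    // CRX2: magic, version, public key length, signature length, key, signature, ZIP
    const keyLength = view.getUint32(8, true);
    const sigLength = view.getUint32(12, true);
    return 16 + keyLength + sigLength;
  }
  if (version === 3) {
    // CRX3: magic, version, header length, protobuf header, ZIP
    const headerLength = view.getUint32(8, true);
    return 12 + headerLength;
  }
  throw new Error(`unsupported CRX version ${version}`);
}

// Return the ZIP bytes, checking for the "PK\x03\x04" local file header
function extractZip(bytes) {
  const zip = bytes.subarray(crxZipOffset(bytes));
  if (zip[0] !== 0x50 || zip[1] !== 0x4b || zip[2] !== 0x03 || zip[3] !== 0x04) {
    throw new Error('payload does not look like a ZIP archive');
  }
  return zip;
}
```

The returned bytes can then be handed to any ZIP library to recover manifest.json and the extension's scripts.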

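The choice of spot instances is clever cost optimization. Extension analysis is computationally intensive but not time-critical: users can wait 30 seconds for results. Spot instances can be 70-90% cheaper than on-demand instances, and Elastic Beanstalk's auto-scaling handles the orchestration. If AWS reclaims a spot instance mid-analysis, Celery can hand the task to another worker, but only when tasks use late acknowledgment (acks_late); with a Redis broker, the message is redelivered once its visibility timeout expires. By default Celery acknowledges a message just before running the task, so a worker killed mid-analysis would drop that task rather than requeue it.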

The real intelligence lives in the analyzer modules. Rather than generic JavaScript linting, Tarnish looks for extension-specific vulnerabilities. The dangerous functions analyzer doesn't just grep for eval: it uses the manifest to work out which files are content scripts and which are background pages, rates dangerous functions in content scripts as higher risk than those in background pages, and checks whether the extension's CSP would actually prevent exploitation:

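// Simplified sketch of Tarnish-style dangerous function detection
// (illustrative only, not Tarnish's source)
function analyzeDangerousFunctions(extensionFiles, manifest) {
  // The 'g' flag is required: String.prototype.matchAll throws a
  // TypeError when given a non-global RegExp.
  const dangerousPatterns = [
    { name: 'eval', pattern: /\beval\s*\(/g, risk: 'high' },
    { name: 'Function constructor', pattern: /new\s+Function\s*\(/g, risk: 'high' },
    { name: 'innerHTML', pattern: /\.innerHTML\s*=/g, risk: 'medium' },
    { name: 'document.write', pattern: /document\.write\s*\(/g, risk: 'medium' }
  ];

  const findings = [];

  for (const [filepath, content] of Object.entries(extensionFiles)) {
    const scriptType = classifyScript(filepath, manifest);

    for (const pattern of dangerousPatterns) {
      for (const match of content.matchAll(pattern.pattern)) {
        findings.push({
          function: pattern.name,
          file: filepath,
          line: getLineNumber(content, match.index),
          context: scriptType, // 'content_script', 'background', or 'other'
          // Any dangerous sink inside a content script is escalated to critical
          risk: scriptType === 'content_script' ? 'critical' : pattern.risk,
          snippet: extractCodeSnippet(content, match.index)
        });
      }
    }
  }

  return findings;
}

function classifyScript(filepath, manifest) {
  if (manifest.content_scripts?.some(cs => cs.js?.includes(filepath))) {
    return 'content_script';
  }
  // Manifest V2 lists background.scripts; Manifest V3 uses background.service_worker
  if (manifest.background?.scripts?.includes(filepath) ||
      manifest.background?.service_worker === filepath) {
    return 'background';
  }
  return 'other';
}

// 1-based line number of a character offset
function getLineNumber(content, index) {
  return content.slice(0, index).split('\n').length;
}

// The full source line containing the match, trimmed
function extractCodeSnippet(content, index) {
  const start = content.lastIndexOf('\n', index - 1) + 1;
  const end = content.indexOf('\n', index);
  return content.slice(start, end === -1 ? undefined : end).trim();
}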

This context-aware analysis is crucial. An eval() in a background page that only processes data from the extension's own servers is concerning but not critical. The same eval() in a content script that touches untrusted DOM content is a universal XSS waiting to happen.

Tarnish also includes a CSP bypass detector that understands extension security models. Many extensions set strict CSPs like script-src 'self' but then allowlist CDNs to load legitimate libraries. The analyzer checks whether those CDNs host known bypass gadgets. AngularJS served from Google's CDN is the classic example: an attacker who can inject HTML loads AngularJS from the allowlisted origin, then uses ng-app and ng-csp attributes to execute arbitrary template expressions. It also flags CSPs that include 'unsafe-eval', which re-enables eval() and new Function() and so removes much of the protection the policy was meant to provide.
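A minimal sketch of those two checks, assuming the CSP arrives as a plain string from manifest.json. KNOWN_BYPASS_HOSTS is a hypothetical two-entry sample (both CDNs serve AngularJS), and the helper names are illustrative; Tarnish's real bypass list and matching rules may differ.

```javascript
// Hypothetical sample of CDNs that serve CSP bypass gadgets such as AngularJS
const KNOWN_BYPASS_HOSTS = ['ajax.googleapis.com', 'cdnjs.cloudflare.com'];

// Split "directive value value; directive value" into { directive: [values] }
function parseCsp(csp) {
  const directives = {};
  for (const part of csp.split(';')) {
    const [name, ...values] = part.trim().split(/\s+/);
    if (name) {
      directives[name.toLowerCase()] = values;
    }
  }
  return directives;
}

function auditCsp(csp) {
  const directives = parseCsp(csp);
  // script-src falls back to default-src when it is not set
  const scriptSources = directives['script-src'] ?? directives['default-src'] ?? [];
  const issues = [];

  if (scriptSources.includes("'unsafe-eval'")) {
    issues.push("script-src allows 'unsafe-eval'");
  }

  for (const source of scriptSources) {
    // Reduce "https://ajax.googleapis.com/ajax/libs/" to "ajax.googleapis.com"
    const host = source.replace(/^[a-z][a-z0-9+.-]*:\/\//i, '').split('/')[0].toLowerCase();
    // Match exact hosts and wildcard sources such as "*.googleapis.com"
    const servesGadgets = KNOWN_BYPASS_HOSTS.some(known =>
      host === known || (host.startsWith('*.') && known.endsWith(host.slice(1))));
    if (servesGadgets) {
      issues.push(`script-src allowlists ${source}, which serves known CSP bypass gadgets`);
    }
  }
  return issues;
}
```

For the Manifest V2 default policy, script-src 'self'; object-src 'self', auditCsp returns an empty array; adding 'unsafe-eval' or https://ajax.googleapis.com to script-src produces one issue each.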

The web_accessible_resources analyzer generates actionable fingerprinting code. Extensions must explicitly declare which internal resources can be accessed from web pages. This creates a fingerprinting vector—websites can probe for these resources to detect which extensions users have installed. Tarnish automatically generates the JavaScript needed to detect the extension:

// Tarnish output for fingerprinting web_accessible_resources
const fingerprintExtension = async () => {
  const resources = [
    'icons/icon128.png',
    'content/styles.css',
    'lib/jquery.min.js'
  ];
  
  for (const resource of resources) {
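    // "extensionid" stands in for the target extension's 32-character ID
    const url = `chrome-extension://extensionid/${resource}`;
    try {
      const response = await fetch(url);
      if (response.ok) {
        return true; // Extension detected
      }
    } catch (e) {
      // fetch() rejects when the extension is absent or the resource is not web-accessible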
    }
  }
  return false;
};

This kind of output transforms Tarnish from a simple vulnerability scanner into a research tool. Security researchers can immediately test whether their target extension is detectable, which matters for threat modeling—some attacks only work if you can identify which extensions a user has installed.
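Generating that probe list means reading web_accessible_resources, whose shape changed between manifest versions. The hypothetical fingerprintUrls helper below (not Tarnish's code) accepts both the Manifest V2 form, an array of path strings, and the Manifest V3 form, an array of objects with resources and matches. It skips wildcard patterns, which name no concrete file, and Manifest V3 entries marked use_dynamic_url, which are served under a per-session ID instead of the stable extension ID.

```javascript
// Hypothetical helper: build chrome-extension:// probe URLs from a manifest
function fingerprintUrls(extensionId, manifest) {
  const paths = [];
  for (const entry of manifest.web_accessible_resources ?? []) {
    // Manifest V2 entries are strings; Manifest V3 entries are objects
    if (typeof entry !== 'string' && entry.use_dynamic_url) {
      continue; // not reachable via the stable extension ID
    }
    const resources = typeof entry === 'string' ? [entry] : (entry.resources ?? []);
    for (const resource of resources) {
      // Wildcards such as "images/*" do not name a concrete file to request
      if (!resource.includes('*')) {
        paths.push(resource.replace(/^\//, ''));
      }
    }
  }
  // Deduplicate while preserving manifest order
  return [...new Set(paths)].map(path => `chrome-extension://${extensionId}/${path}`);
}
```

In Manifest V3 a page may only load a resource if its origin matches the entry's matches patterns, so a failed probe can mean the probing origin is excluded rather than the extension being absent.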

The Retire.js integration deserves mention. Tarnish doesn't just report that an extension uses jQuery 1.8.3; it tells you which specific files include it, whether those files are content scripts or background pages, and which CVEs apply. This specificity helps prioritize remediation: a vulnerable library used only on the options page usually exposes far less attack surface than one loaded as a content script on every site.
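One way to get that prioritization is to join each library finding with the script context derived from the manifest and sort on it. In this sketch the finding shape ({ file, library, version, cves }) is a simplified stand-in, not Retire.js's actual JSON output, and the context weights and function names are hypothetical.

```javascript
// Hypothetical ranking: content scripts first, then background, then everything else
const CONTEXT_WEIGHT = { content_script: 3, background: 2, other: 1 };

// Classify a file using the manifest (MV2 background.scripts or MV3 service_worker)
function scriptContext(filepath, manifest) {
  if (manifest.content_scripts?.some(cs => cs.js?.includes(filepath))) {
    return 'content_script';
  }
  if (manifest.background?.scripts?.includes(filepath) ||
      manifest.background?.service_worker === filepath) {
    return 'background';
  }
  return 'other';
}

// Annotate each finding with its context, then sort by context weight
// and, within the same context, by number of known CVEs
function prioritizeLibraries(libraryFindings, manifest) {
  return libraryFindings
    .map(finding => ({ ...finding, context: scriptContext(finding.file, manifest) }))
    .sort((a, b) =>
      CONTEXT_WEIGHT[b.context] - CONTEXT_WEIGHT[a.context] ||
      b.cves.length - a.cves.length);
}
```

Sorting on context before CVE count encodes the point above: a library with fewer known CVEs in a content script still outranks a more vulnerable copy that only the options page loads.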

Gotcha

The biggest limitation is right in the README: this is 'unpolished' infrastructure code. Setting up Tarnish requires provisioning Elastic Beanstalk environments, configuring Redis, setting up S3 buckets with correct CORS policies, building Docker images, and managing AWS credentials. The documentation assumes you're comfortable with AWS and Celery. There's no docker-compose file, no one-click deploy, and no local development mode that works without external dependencies. For most developers, the hosted version at thehackerblog.com/tarnish is the only practical option.

Static analysis has inherent blind spots. Tarnish can't detect logic flaws: an extension that properly escapes user input but still has a privilege escalation bug in its message-passing logic will pass all checks. Minified code still matches simple patterns but yields snippets and line numbers that are hard to triage, and obfuscated code that assembles identifiers at runtime (window['ev' + 'al'], for example) slips past pattern matching entirely. It can't detect vulnerabilities that depend on runtime behavior, such as race conditions or time-of-check-to-time-of-use bugs. Dynamic analysis tools that actually execute extensions in instrumented browsers would catch different classes of vulnerabilities, but Tarnish doesn't attempt that. It's a first-pass triage tool, not comprehensive security coverage.
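The obfuscation blind spot is easy to demonstrate. The snippet below runs a single eval pattern, the same illustrative regex used in the detection sketch earlier and an assumption about how Tarnish matches, against four ways of reaching eval. Minified code still matches; calls that build the name at runtime or go through an alias do not.

```javascript
// Illustrative pattern (an assumption, not Tarnish's actual rule)
const evalPattern = /\beval\s*\(/;

// Four equivalent ways to reach eval
const samples = {
  plain: 'eval(payload);',
  minified: 'var a=eval(b);',
  concatenated: "window['ev' + 'al'](payload);",
  aliased: 'const run = globalThis["eval"]; run(payload);'
};

// Map each sample name to whether the pattern flagged it
const detected = Object.fromEntries(
  Object.entries(samples).map(([name, code]) => [name, evalPattern.test(code)])
);

console.log(detected);
// { plain: true, minified: true, concatenated: false, aliased: false }
```

Catching the last two cases requires tracking values through the program (data-flow analysis) or executing the code, which is exactly the territory of the dynamic tools described above.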

The cost model also deserves scrutiny. While spot instances are cheaper, t2.medium instances (the minimum recommended) running continuously still cost real money. The README suggests this can be run as a public service, but doesn't provide cost estimates. For occasional personal use, spinning up infrastructure seems like overkill compared to just using Retire.js and grep with some custom scripts.

Verdict

Use if: You're conducting regular security reviews of Chrome extensions as a bug bounty hunter, security researcher, or enterprise security team vetting extensions before deployment. The automated context (which files are content scripts, where dangerous functions appear, actionable fingerprinting code) saves hours per review. The architecture also works as a template if you're building similar distributed analysis systems for other artifact types.

Skip if: You're reviewing a single extension or don't want to manage AWS infrastructure; just use the hosted version or cobble together Retire.js with manual code review. Also skip if you need a comprehensive security assessment: static analysis is only one piece of a thorough extension audit, and Tarnish's findings should trigger deeper investigation, not replace it.
