GuardDog: How Datadog Built a Multi-Ecosystem Malware Scanner for Supply Chain Threats

Hook

In 2022, malicious PyPI packages targeting developers reached a record high of 3,000+ incidents. Traditional vulnerability scanners that only check CVE databases missed nearly all of them because zero-day malware doesn't have a CVE number yet.

Context

The software supply chain attack surface has exploded. Modern applications pull in hundreds of dependencies, and attackers have learned that compromising a single popular package is more efficient than targeting individual applications. The attack vectors have evolved beyond known vulnerabilities: typosquatting (registering packages named "reqeusts" instead of "requests"), dependency confusion, compromised maintainer accounts, and packages that exfiltrate environment variables during installation.

Traditional security tools weren't built for this threat model. Tools like pip-audit and npm audit excel at matching known CVEs against your dependency tree, but they're blind to novel malware. You need something that actually analyzes package behavior—examining source code for obfuscation techniques, checking if install scripts spawn reverse shells, detecting base64-encoded payloads, and validating package metadata for social engineering red flags. Datadog built GuardDog to fill this gap, drawing from their operational experience monitoring production supply chain attacks across customer environments.

Technical Insight

GuardDog's architecture splits malware detection into two parallel paths: source code analysis using Semgrep rules and metadata validation using custom Python logic. This hybrid approach catches both technical exploits hiding in code and social engineering tactics in package metadata.

The source code analysis leverages Semgrep—a static analysis tool that pattern-matches code using a YAML rule format. GuardDog ships with ecosystem-specific rule sets targeting behaviors like command execution from encoded strings, network requests to suspicious domains, and file system manipulation during package installation. Here's how you'd scan a suspicious PyPI package:

# Install GuardDog
pip install guarddog

# Scan a specific package from PyPI
guarddog pypi scan requests-plus

# Output in SARIF format for CI/CD integration
guarddog pypi scan requests-plus --output-format sarif > results.sarif

# Bulk scan all dependencies in your requirements file
guarddog pypi verify requirements.txt

Under the hood, GuardDog downloads the package source from the registry, unpacks it, and runs Semgrep rules against every Python file. The rules target patterns like this one, which detects base64 decoding followed by execution:

rules:
  - id: base64-exec
    # Semgrep requires a languages list on every rule
    languages: [python]
    pattern-either:
      - pattern: exec(base64.b64decode(...))
      - pattern: eval(base64.b64decode(...))
    message: "Base64 decoding followed by execution"
    severity: ERROR
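To make the matching concrete, here is a minimal sketch of the same check written with Python's standard ast module. It is an approximation for illustration only: the helper names (find_base64_exec, _is_b64decode_call) are hypothetical, and this is neither GuardDog's code nor how Semgrep evaluates patterns internally.

```python
import ast

# Hypothetical helpers: an AST-based approximation of the rule above.
DANGEROUS_CALLS = {"exec", "eval"}


def _is_b64decode_call(node: ast.AST) -> bool:
    """True if node is a call spelled as base64.b64decode(...)."""
    return (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Attribute)
        and node.func.attr == "b64decode"
        and isinstance(node.func.value, ast.Name)
        and node.func.value.id == "base64"
    )


def find_base64_exec(source: str) -> list[int]:
    """Return line numbers where exec/eval wraps base64.b64decode(...) directly."""
    hits = []
    # The source is only parsed, never executed.
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in DANGEROUS_CALLS
            and node.args
            and _is_b64decode_call(node.args[0])
        ):
            hits.append(node.lineno)
    return sorted(hits)
```

This sketch only sees the direct form. A payload that is decoded into a variable first and executed later would need data-flow tracking to catch.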

The metadata analysis runs in parallel, examining package characteristics without touching source code. It checks for typosquatting by computing string similarity against popular package names, flags packages with recent maintainer changes, and validates email domains for common red flags (disposable email providers, recently registered domains). This catches attacks where the malicious payload isn't in the code yet—the attacker registers a convincing package name, builds reputation with benign releases, then ships malware in version 2.0 after developers have added it to their dependencies.
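The string-similarity idea behind typosquatting detection can be sketched with a plain Levenshtein edit distance. This is an illustration under assumptions, not GuardDog's implementation: the package list, the max_distance threshold, and the function names are all hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution
                )
            )
        previous = current
    return previous[-1]


# Illustrative list only; a real check would use a much larger
# list of popular package names.
POPULAR_PACKAGES = ["requests", "numpy", "pandas", "urllib3"]


def possible_typosquat_targets(name, popular=POPULAR_PACKAGES, max_distance=2):
    """Return popular names that `name` is suspiciously close to."""
    name = name.lower()
    return [
        p for p in popular
        if p != name and levenshtein(name, p) <= max_distance
    ]
```

With this sketch, "reqeusts" is two edits away from "requests" and gets flagged, while the exact name "requests" does not match itself. A distance threshold this loose will also flag legitimate short names, which is one source of the false positives discussed later.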

For npm packages, GuardDog additionally examines package.json lifecycle scripts—the preinstall, postinstall, and preuninstall hooks that run arbitrary commands. Attackers frequently abuse these to exfiltrate credentials during npm install. The tool flags any lifecycle script containing suspicious patterns: curl, wget, environment variable access (process.env), or encoded payloads.
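A lifecycle-script check of this kind could look roughly like the sketch below. The hook list and pattern categories come from the paragraph above, but the regular expressions and the flag_lifecycle_scripts helper are assumptions made for illustration, not GuardDog's actual rules.

```python
import json
import re

# Hooks named in the text above.
LIFECYCLE_HOOKS = ("preinstall", "postinstall", "preuninstall")

# Hypothetical patterns; real rules would be broader and more careful.
SUSPICIOUS_PATTERNS = {
    "curl": re.compile(r"\bcurl\b"),
    "wget": re.compile(r"\bwget\b"),
    "process.env": re.compile(r"process\.env"),
    "base64": re.compile(r"base64", re.IGNORECASE),
}


def flag_lifecycle_scripts(package_json_text: str) -> dict[str, list[str]]:
    """Map each lifecycle hook to the suspicious pattern labels it contains."""
    manifest = json.loads(package_json_text)
    scripts = manifest.get("scripts", {})
    if not isinstance(scripts, dict):
        return {}
    findings = {}
    for hook in LIFECYCLE_HOOKS:
        command = scripts.get(hook)
        if not isinstance(command, str):
            continue
        matched = [
            label
            for label, pattern in SUSPICIOUS_PATTERNS.items()
            if pattern.search(command)
        ]
        if matched:
            findings[hook] = matched
    return findings
```

Only lifecycle hooks are inspected here, so a curl call in an ordinary "test" script is ignored: it does not run automatically during npm install.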

The parallel execution model is critical for performance. Downloading and unpacking packages is I/O-bound, while Semgrep analysis is CPU-bound. GuardDog uses Python's concurrent.futures to pipeline operations—while Semgrep analyzes package A, the downloader is already fetching package B. For bulk verification of a 200-dependency requirements.txt, this parallelism cuts scan time from 15 minutes to under 3 minutes on a typical CI runner.
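The overlap described above can be sketched with concurrent.futures from the standard library. The fetch and analyze callables are hypothetical stand-ins for the download and Semgrep steps, and the worker count is an arbitrary assumption; this shows the general technique rather than GuardDog's internal structure.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable


def scan_all(
    packages: Iterable[str],
    fetch: Callable[[str], object],      # hypothetical: download and unpack
    analyze: Callable[[object], object],  # hypothetical: run the rules
    max_workers: int = 8,
) -> dict[str, object]:
    """Scan packages concurrently so downloads overlap with analysis."""

    def scan_one(name: str) -> object:
        return analyze(fetch(name))

    results: dict[str, object] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(scan_one, name): name for name in packages}
        # Collect results as each package finishes, in completion order.
        for future in as_completed(futures):
            results[futures[future]] = future.result()
    return results
```

Threads suit this shape when the analysis step shells out to an external process, because a thread waiting on a subprocess or a socket releases the interpreter lock. If the analysis were CPU-bound pure Python, a ProcessPoolExecutor would be the usual substitute.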

Integration points make GuardDog production-ready. The SARIF output format means it works with GitHub Security tab, GitLab security dashboards, and any SIEM that ingests SARIF. The JSON output includes severity scores, affected file paths, and remediation guidance—everything you need to triage findings in Jira or PagerDuty. For Datadog customers, there's native integration with the Datadog Agent to stream findings into Cloud SIEM alongside application logs and runtime telemetry, correlating supply chain signals with actual application behavior.
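For teams that want to post-process results themselves, a SARIF file is plain JSON and can be flattened in a few lines. The sketch below relies only on the standard SARIF 2.1.0 layout (runs, results, ruleId, message.text, locations); the summarize_sarif helper and the output shape are hypothetical, and the fields GuardDog actually populates should be checked against a real report.

```python
import json


def summarize_sarif(sarif_text: str) -> list[dict]:
    """Flatten a SARIF 2.1.0 log into one dict per finding."""
    log = json.loads(sarif_text)
    findings = []
    for run in log.get("runs", []):
        for result in run.get("results", []):
            # Use the first reported location, if any.
            locations = result.get("locations") or [{}]
            uri = (
                locations[0]
                .get("physicalLocation", {})
                .get("artifactLocation", {})
                .get("uri")
            )
            findings.append(
                {
                    "rule": result.get("ruleId"),
                    # SARIF treats a missing level as "warning".
                    "level": result.get("level", "warning"),
                    "message": result.get("message", {}).get("text", ""),
                    "file": uri,
                }
            )
    return findings
```

A CI step could call this on results.sarif and fail the build when any finding has level "error".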

Gotcha

GuardDog's heuristic approach means you'll encounter false positives. Legitimate packages sometimes use base64 encoding for binary data or make network requests during setup for feature detection. A package that downloads ML model weights during installation will trigger the "network-request-in-setup" rule, even though the behavior is benign. You'll need to build an allowlist for your dependency tree and review findings manually—budget time for security triage, not just automated blocking.
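One way to keep that triage manageable is to split findings into "needs review" and "already accepted" before anyone looks at them. The sketch below assumes findings have been normalized into dicts with package and rule keys; that shape, the example package name, and the triage helper are all hypothetical.

```python
# Hypothetical allowlist: (package, rule) pairs a reviewer has accepted.
ALLOWLIST = {
    ("example-ml-package", "network-request-in-setup"),
}


def triage(findings, allowlist=ALLOWLIST):
    """Split findings into (needs_review, suppressed) using the allowlist."""
    needs_review, suppressed = [], []
    for finding in findings:
        key = (finding["package"], finding["rule"])
        if key in allowlist:
            suppressed.append(finding)
        else:
            needs_review.append(finding)
    return needs_review, suppressed
```

Keying the allowlist on the package and rule pair, rather than on the package alone, means a new kind of finding in an already-reviewed package still surfaces for review.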

Performance becomes a bottleneck at scale. Scanning a single package takes 2-5 seconds (download, unpack, analyze), which is fine for auditing suspicious packages interactively. But a dependency tree with 300 packages adds up to 10-25 minutes of scan work at that rate, and parallelism shortens the wall-clock time without removing the cost. If you're scanning on every pull request in a monorepo, this overhead adds up. The tool works best as a targeted security gate (scan new dependencies when they're added, or run nightly audits of your entire tree), not as a real-time check on every commit. There's no caching mechanism either, so rescanning the same package version repeats all the work.
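Since a published package version is normally immutable, the missing cache can be approximated outside the tool by keying results on name and version. This is a minimal sketch of that workaround: cached_scan and the scan callable are hypothetical wrappers, not GuardDog features.

```python
from typing import Callable, MutableMapping


def cached_scan(
    name: str,
    version: str,
    scan: Callable[[str, str], object],  # hypothetical: runs the real scan
    cache: MutableMapping[str, object],
) -> object:
    """Return a cached result for name==version, scanning only on a miss."""
    # A fuller key would also include the rule-set version, so that
    # updated rules invalidate old results.
    key = f"{name}=={version}"
    if key not in cache:
        cache[key] = scan(name, version)
    return cache[key]
```

A plain dict works for a single run. For nightly audits, any persistent mapping (for example one backed by the standard shelve module) lets unchanged dependencies skip the rescan.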

Windows support is Docker-only because GuardDog relies on system tools (file unpacking utilities, Semgrep's Linux binaries) that don't have native Windows equivalents. If your development team or CI environment is Windows-native without Docker, you'll need to spin up Linux VMs or use WSL2, adding infrastructure complexity.

Verdict

Use GuardDog if you're building supply chain security gates for open-source dependencies and need ecosystem-aware malware detection beyond CVE scanning—particularly if you consume packages from PyPI or npm where malicious packages are frequent. It's ideal for security teams triaging suspicious packages reported by developers, implementing CI/CD checks when dependencies change, or conducting forensic analysis after incidents. The SARIF output and Datadog integration make it especially compelling if you already have security tooling that consumes standardized formats. Skip it if you need real-time, low-latency scanning that runs synchronously with package installation (Socket.dev's registry integration is better for that workflow), or if you're on Windows without Docker infrastructure. Also skip if your threat model is primarily known vulnerabilities rather than zero-day malware—traditional pip-audit or npm audit will be faster and more accurate for CVE detection. GuardDog shines when you're hunting for novel threats that haven't made it into vulnerability databases yet.
