Confused: Detecting Dependency Confusion Before Attackers Exploit Your Private Packages
Hook
In 2021, a security researcher earned more than $130,000 in bug bounties by publishing packages to public registries under the same names as companies' internal dependencies, then watching as those companies' build systems installed them. Your build pipeline might be next.
Context
Dependency confusion (also called namespace shadowing) exploits a subtle quirk in how package managers resolve dependencies. When a build is configured with both a private registry and a public one such as npm or PyPI, many package managers consider candidates from both. If a package name exists in both locations, the manager typically chooses whichever copy has the higher version number. This behavior creates an attack vector: if you use an internal package called "payment-processor" that isn't registered on PyPI, an attacker can publish a malicious package with that exact name to the public registry. If the attacker bumps the version to something absurdly high like 999.0.0, every build that doesn't restrict that package to your private registry will pull down the attacker's code instead of yours.
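To make the resolution quirk concrete, here is a toy model of "highest version wins" across two indexes. It is a sketch, not how any particular package manager is implemented: the index contents, the resolve helper, and the simple dotted-integer version parsing are all assumptions made for illustration.

```python
# Toy model of "highest version wins" resolution across two indexes.
# The index contents below are invented for illustration.

def parse_version(text):
    """Turn '1.4.2' into (1, 4, 2) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))

def resolve(name, indexes):
    """Return (index_name, version) for the highest version in any index."""
    candidates = [
        (parse_version(version), index_name, version)
        for index_name, packages in indexes.items()
        for version in packages.get(name, [])
    ]
    if not candidates:
        raise LookupError(f"{name} not found in any index")
    _, index_name, version = max(candidates)
    return index_name, version

indexes = {
    "private": {"payment-processor": ["1.4.2", "1.5.0"]},
    "public": {"payment-processor": ["999.0.0"], "requests": ["2.31.0"]},
}

# The attacker's 999.0.0 outranks the genuine 1.5.0.
print(resolve("payment-processor", indexes))  # ('public', '999.0.0')
```

Removing the public entry, or restricting the lookup to the private index, makes the same call return the genuine 1.5.0, which is the effect that registry pinning aims for.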
Alex Birsan's 2021 disclosure revealed this wasn't theoretical—he successfully executed dependency confusion attacks against Apple, Microsoft, PayPal, and dozens of other companies by programmatically identifying internal package names (often leaked through JavaScript source maps or GitHub repositories) and uploading harmless proof-of-concept packages to public registries. The companies' CI/CD pipelines automatically downloaded and executed his code. The security industry needed a defensive tool that could identify these vulnerable namespace gaps before attackers did. That's where Confused enters: a simple scanner that inverts the attacker's methodology, helping development teams audit their own dependencies for exposure.
Technical Insight
Confused's architecture is deliberately minimal. Written in Go and distributed as a single static binary, it operates as a manifest parser coupled with registry API clients. The tool reads dependency files in their native formats—requirements.txt for Python, package.json for Node.js, composer.json for PHP, pom.xml for Maven, and Gemfile.lock for Ruby—extracts package identifiers, then queries each ecosystem's public registry to determine whether those packages exist publicly.
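The "manifest parser" half of that description can be sketched in a few lines. This is not Confused's actual Go code, just a simplified Python illustration of pulling package names out of two of the formats; the function names are hypothetical, and the requirements.txt handling deliberately ignores edge cases such as URLs and environment markers.

```python
import json
import re

def names_from_requirements(text):
    """Extract package names from requirements.txt content (simplified)."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop comments
        if not line or line.startswith("-"):    # skip blanks and options like -r or -e
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))        # name only, no extras or version
    return names

def names_from_package_json(text):
    """Collect dependency names from package.json content."""
    manifest = json.loads(text)
    names = []
    for section in ("dependencies", "devDependencies"):
        names.extend(manifest.get(section, {}))  # dict keys are the package names
    return names

requirements = "requests==2.31.0\ninternal-auth-lib>=1.0\ndjango[argon2]~=4.2  # web\n"
print(names_from_requirements(requirements))
```

Version specifiers are thrown away on purpose: for this kind of audit only the name matters, because the name is what an attacker would register.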
Here's how you'd scan a Python project in a CI pipeline:
# Install confused (a small Go binary; the asset name below is an example, so check the releases page for the exact file)
curl -L https://github.com/visma-prodsec/confused/releases/latest/download/confused-linux-amd64 -o confused
chmod +x confused
# Scan requirements.txt for dependency confusion vulnerabilities
# -l selects the ecosystem (pip, npm, composer, mvn, rubygems); the manifest path comes last
./confused -l pip requirements.txt
# Illustrative output (exact wording and format may differ by version):
# [!] Package 'internal-auth-lib' not found on PyPI
# [!] Package 'company-data-models' not found on PyPI
# [+] Package 'requests' found on PyPI
# [+] Package 'django' found on PyPI
The packages flagged with [!] represent potential attack vectors—these are dependencies your application expects to find, but which don't exist in the public registry. An attacker could register these names and potentially compromise your builds. For a more sophisticated scan that filters out known-safe namespaces, you'd use the -s flag with wildcard patterns:
# Skip packages matching our organization's namespace pattern
./confused -l npm -s "@mycompany/*" package.json
# Multiple namespace patterns go in one comma-separated list
./confused -l pip -s "mycompany-*,internal-*" requirements.txt
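The effect of skip patterns can be sketched as follows. This is an approximation rather than the tool's own matching code: it assumes the patterns behave like shell-style globs supplied as a comma-separated list, and the helper names are hypothetical.

```python
from fnmatch import fnmatchcase

def split_patterns(flag_value):
    """Split a comma-separated pattern list, ignoring empty entries."""
    return [p.strip() for p in flag_value.split(",") if p.strip()]

def is_skipped(name, patterns):
    """True if the package name matches any known-safe pattern."""
    return any(fnmatchcase(name, pattern) for pattern in patterns)

patterns = split_patterns("@mycompany/*,internal-*")
packages = ["@mycompany/auth", "internal-auth-lib", "express", "payment-processor"]

# Only names outside the skip list go on to the registry lookup.
to_check = [name for name in packages if not is_skipped(name, patterns)]
print(to_check)  # ['express', 'payment-processor']
```

Note that "payment-processor" still gets checked here: a skip list only helps for names that follow the convention it encodes.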
Internally, Confused maintains registry client implementations for each ecosystem. The PyPI client, for instance, hits the JSON API endpoint at https://pypi.org/pypi/{package_name}/json and checks for a 200 response. The npm client queries https://registry.npmjs.org/{package_name}. Maven lookups are slightly more complex: Maven coordinates include both a groupId and an artifactId, so both components are needed to uniquely identify a package on Maven Central.
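A minimal sketch of that lookup logic, in Python rather than the tool's Go, might look like this. The URL templates are the two endpoints named above; the classify and check helpers, the three-way result labels, and the decision to treat anything other than 200 or 404 as "unknown" are assumptions of this sketch.

```python
import urllib.error
import urllib.request
from urllib.parse import quote

def pypi_url(name):
    return f"https://pypi.org/pypi/{quote(name, safe='')}/json"

def npm_url(name):
    # Scoped names such as @scope/pkg are sent with the slash percent-encoded.
    return f"https://registry.npmjs.org/{quote(name, safe='@')}"

def classify(status_code):
    """Map an HTTP status to a lookup result."""
    if status_code == 200:
        return "public"
    if status_code == 404:
        return "missing"
    return "unknown"   # rate limits, outages: do not guess either way

def http_status(url):
    """Fetch a URL and return its HTTP status code."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code

def check(name, url_for, fetch_status=http_status):
    """Look up one package; fetch_status is injectable for offline testing."""
    return classify(fetch_status(url_for(name)))

# Offline demonstration with a stubbed fetcher instead of a real request.
print(check("internal-auth-lib", pypi_url, fetch_status=lambda url: 404))  # missing
```

Keeping "unknown" separate from "missing" matters in CI: a registry outage should not be reported as a newly exposed package name.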
The multi-ecosystem approach is crucial because dependency confusion isn't language-specific. Modern applications often use Python for backend services, JavaScript for frontend builds, and Java for data processing, each with its own dependency manifests. A comprehensive audit requires checking all of them. Confused scans one manifest per invocation and takes the ecosystem from the -l flag (npm if omitted), so covering an entire repository means looping over the manifests and passing the matching value for each:
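# Recursively scan supported manifest files, skipping vendored node_modules
find . -name node_modules -prune -o \( -name "requirements.txt" -o -name "package.json" -o -name "pom.xml" \) -print | while IFS= read -r file; do
  case "$(basename "$file")" in
    requirements.txt) lang=pip ;;
    package.json)     lang=npm ;;
    pom.xml)          lang=mvn ;;
  esac
  echo "Scanning $file"
  ./confused -l "$lang" "$file"
done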
The tool's simplicity is both its strength and a design constraint. It doesn't attempt to parse complex version constraints, understand dependency graphs, or resolve transitive dependencies. It operates on the first-order principle: if this package name could be registered publicly by an attacker, you have exposure. This makes it fast (most scans complete in seconds) and predictable, but it also means the tool requires human interpretation. A flagged package might be a legitimate risk, or it might be a typo that would fail your build anyway.
Gotcha
Confused's biggest limitation is its inability to distinguish between legitimately private packages and false positives. If you've made a typo in your requirements.txt—say, "requets" instead of "requests"—Confused will flag it as potentially vulnerable because "requets" doesn't exist on PyPI. But that's not a dependency confusion risk; it's just a broken build. Similarly, if you've deprecated an internal package and removed it from your codebase but forgot to update a manifest file, Confused will flag it even though there's no actual attack surface.
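One way to triage such findings before a human looks at them is to compare each flagged name against names you know to be public. The following is a rough sketch of that idea, not a feature of Confused: the WELL_KNOWN list, the likely_typo_of helper, and the 0.85 similarity cutoff are all arbitrary choices for illustration.

```python
from difflib import get_close_matches

# A tiny stand-in for a list of popular public packages (illustrative only).
WELL_KNOWN = ["requests", "django", "numpy", "flask", "urllib3"]

def likely_typo_of(name, known=WELL_KNOWN, cutoff=0.85):
    """Return the well-known package this name probably misspells, or None."""
    if name in known:
        return None
    matches = get_close_matches(name, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None

for flagged in ["requets", "internal-auth-lib"]:
    guess = likely_typo_of(flagged)
    if guess:
        print(f"{flagged}: probably a typo of {guess}")
    else:
        print(f"{flagged}: no close public match, review as a private name")
```

A probable typo still deserves a fix, since misspelled names are exactly what typosquatters register, but it belongs in a different queue than a genuinely private package name.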
The namespace filtering helps but introduces its own complications. npm's scoped packages (like @mycompany/auth) are protected by npm's organization system: once you own a scope on the public registry, nobody else can publish to it. That protection only holds if you have actually registered the scope; an unclaimed scope is as exposed as an unscoped name. Confused doesn't understand npm's permission model, so without -s "@mycompany/*" it will flag all your scoped packages as vulnerable even when they're effectively protected. This means you need to carefully configure skip patterns for each ecosystem's namespace mechanisms: @org/* for npm, com.mycompany.* for Maven, organizational prefixes for PyPI. Keep in mind that a name prefix on PyPI is only a convention, not an ownership boundary, so skipping one is safe only if those names are protected some other way. Get these patterns wrong, and you'll either drown in false positives or miss real vulnerabilities.
Another critical gap: Confused can't detect active exploitation. It only identifies namespace opportunities. If an attacker has already published a malicious package with your internal name to PyPI, Confused will see that the package exists publicly and will not flag it. You'd need a different approach, such as comparing package maintainer identities or analyzing the package contents, to detect whether a public package matching your internal name is legitimate or malicious.
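As a rough illustration of the maintainer-identity idea, the sketch below inspects the author_email and maintainer_email fields that PyPI's JSON API returns under info and compares their domains to a list you trust. The helper names, the sample metadata, and the mycompany.example domain are hypothetical, and because these fields are self-declared by the publisher this is a weak signal at best, not proof of legitimacy.

```python
import re

def contact_domains(info):
    """Collect e-mail domains from a package's author and maintainer fields."""
    text = " ".join(filter(None, (info.get("author_email"), info.get("maintainer_email"))))
    return {d.lower() for d in re.findall(r"[\w.+-]+@([\w-]+(?:\.[\w-]+)+)", text)}

def is_suspicious(metadata, trusted_domains):
    """Flag a public package whose contacts are missing or outside trusted domains."""
    domains = contact_domains(metadata.get("info", {}))
    return not domains or not domains.issubset(trusted_domains)

# Hypothetical metadata shaped like a decoded PyPI JSON API response.
claimed_by_us = {"info": {"author_email": "Platform Team <platform@mycompany.example>"}}
claimed_by_other = {"info": {"author_email": "someone@elsewhere.example"}}

print(is_suspicious(claimed_by_us, {"mycompany.example"}))     # False
print(is_suspicious(claimed_by_other, {"mycompany.example"}))  # True
```

Run against every internal name that unexpectedly exists in the public registry, a check like this turns "exists publicly" from a reassurance into a prompt for review.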
Verdict
Use Confused if you're managing applications with private package dependencies and need a lightweight, CI/CD-friendly tool for continuous monitoring of dependency confusion risks. It's especially valuable for organizations using internal package registries who want a quick audit mechanism without investing in commercial security platforms. The multi-ecosystem support makes it practical for polyglot codebases, and the zero-dependency binary integrates cleanly into existing pipelines. Skip it if you exclusively use public packages with no internal dependencies, if you need deep supply chain analysis beyond namespace checking, or if you've already implemented comprehensive namespace claiming (defensively registering your internal package names as placeholders on public registries). Also skip if you can't invest time in tuning namespace filters—the false positive rate without proper configuration makes the tool more noise than signal. This is a reconnaissance tool, not a complete solution; treat it as one component of a broader supply chain security strategy.