Starlog Index: Teaching AI Agents to Stop Installing Lodash in 2025
Hook
Forty-nine percent of AI-suggested dependencies carry known CVEs. Thirty-four percent are hallucinated packages that don't exist. And 17% of the time, agents write custom auth implementations when battle-tested libraries are sitting right there in npm.
Context
AI coding agents have a dependency problem. When Claude or GPT-4 generates code, it suggests libraries based on training data that overweights popularity and recency. The result is a cargo cult of outdated choices: agents recommend Moment.js years after its maintainers declared it legacy, suggest Lodash for projects already using modern JavaScript, and hallucinate package names that sound plausible but don't exist.
The tooling ecosystem has focused on solving the wrong part of this problem. Security scanners like Snyk catch vulnerabilities after installation. Documentation indexes like Context7 help agents use APIs correctly once a library is chosen. But nothing guides the initial decision—the moment when an agent decides 'I need authentication' and reaches for Passport instead of Lucia, or worse, scaffolds a custom JWT implementation. Starlog Index attacks this gap with a deliberately simple idea: a local-first search engine that indexes what libraries do and when to skip them, exposed as a Model Context Protocol server that agents can query during code generation.
Technical Insight
Starlog's architecture is a reaction against the complexity creep of modern agent tooling. No vector embeddings. No LLM calls during search. No external APIs. Just static JSON manifests, keyword matching, and a scoring algorithm that runs in milliseconds on your local filesystem.
Each manifest is a structured metadata file that captures decisional context, not API documentation. Here's the manifest for Supabase's auth library:
{
  "package": "@supabase/auth-helpers-nextjs",
  "category": "authentication",
  "solves": ["nextjs authentication", "user auth", "session management", "magic links", "oauth integration"],
  "best_for": "Next.js projects needing backend auth with minimal config",
  "stack_affinity": ["nextjs", "react", "typescript"],
  "skip_when": [
    "You need on-premise auth (Supabase is cloud-native)",
    "Your stack already uses Firebase/Auth0/Clerk",
    "You require custom user schema beyond Supabase's conventions"
  ],
  "vs_custom": "Supabase handles session cookies, PKCE flows, and provider integrations that take weeks to implement correctly from scratch. Skip only if you need auth logic that Supabase's opinionated design doesn't support."
}
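One way to pin down this shape is a TypeScript interface plus a runtime guard. The sketch below infers the field names from the example manifest; the isManifest helper is hypothetical and not part of any documented Starlog API.

```typescript
// Shape inferred from the example manifest; Starlog's real schema may differ.
interface Manifest {
  package: string;
  category: string;
  solves: string[];
  best_for: string;
  stack_affinity: string[];
  skip_when: string[];
  vs_custom: string;
}

const isStringArray = (v: unknown): v is string[] =>
  Array.isArray(v) && v.every(item => typeof item === "string");

// Hypothetical guard: true only if every field is present with the expected type.
function isManifest(value: unknown): value is Manifest {
  if (typeof value !== "object" || value === null) return false;
  const m = value as Record<string, unknown>;
  return (
    typeof m["package"] === "string" &&
    typeof m["category"] === "string" &&
    isStringArray(m["solves"]) &&
    typeof m["best_for"] === "string" &&
    isStringArray(m["stack_affinity"]) &&
    isStringArray(m["skip_when"]) &&
    typeof m["vs_custom"] === "string"
  );
}

const parsed: unknown = JSON.parse(`{
  "package": "@supabase/auth-helpers-nextjs",
  "category": "authentication",
  "solves": ["nextjs authentication"],
  "best_for": "Next.js projects needing backend auth with minimal config",
  "stack_affinity": ["nextjs"],
  "skip_when": ["You need on-premise auth (Supabase is cloud-native)"],
  "vs_custom": "Supabase handles session cookies and PKCE flows."
}`);

console.log(isManifest(parsed));                // true
console.log(isManifest({ package: "moment" })); // false: required fields missing
```

A guard like this is useful mainly for hand-written manifests, where a missing skip_when array would otherwise surface only at query time.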
The query engine treats this as keyword-searchable fields. When an agent calls starlog_search({query: "nextjs authentication"}), the server:
- Tokenizes the query and matches against solves and best_for fields
- Scores results by term frequency and field weighting (exact matches in best_for rank higher)
- Augments top results with skip_when conditions and vs_custom rationale
- Returns structured JSON that agents can reason over
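The last two steps can be sketched as a mapping from scored manifests to a response payload. The SearchHit shape and the toSearchHits name are assumptions for illustration; the source does not show Starlog's actual response format.

```typescript
// Minimal manifest subset for this sketch (field names taken from the example manifest).
interface Manifest {
  package: string;
  best_for: string;
  skip_when: string[];
  vs_custom: string;
}

interface ScoredResult {
  manifest: Manifest;
  score: number;
}

// Hypothetical per-result payload returned to the agent.
interface SearchHit {
  package: string;
  score: number;
  best_for: string;
  skip_when: string[];
  vs_custom: string;
}

// Attach the decision context (skip_when, vs_custom) to each ranked result.
function toSearchHits(results: ScoredResult[]): SearchHit[] {
  return results.map(({ manifest, score }) => ({
    package: manifest.package,
    score,
    best_for: manifest.best_for,
    skip_when: [...manifest.skip_when],
    vs_custom: manifest.vs_custom,
  }));
}

const hits = toSearchHits([
  {
    score: 2,
    manifest: {
      package: "@supabase/auth-helpers-nextjs",
      best_for: "Next.js projects needing backend auth with minimal config",
      skip_when: ["Your stack already uses Firebase/Auth0/Clerk"],
      vs_custom: "Supabase handles session cookies, PKCE flows, and provider integrations.",
    },
  },
]);

// Serialized, this is the kind of structured JSON an agent could reason over.
console.log(JSON.stringify(hits, null, 2));
```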
The MCP server implementation is refreshingly terse—under 200 lines of TypeScript. Here's the core search function:
function searchCapabilities(query: string, manifests: Manifest[]): ScoredResult[] {
  // Lowercase and split on whitespace; drop empty terms, which would otherwise match every field.
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  return manifests
    .map(manifest => {
      let score = 0;
      const solves = manifest.solves.join(' ').toLowerCase();
      const bestFor = manifest.best_for.toLowerCase();
      terms.forEach(term => {
        if (bestFor.includes(term)) score += 3; // best_for hits weigh more
        if (solves.includes(term)) score += 1;
      });
      return { manifest, score };
    })
    .filter(result => result.score > 0) // drop manifests with no matching term
    .sort((a, b) => b.score - a.score)  // highest score first
    .slice(0, 5);                       // keep the top five
}
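To see how the weighting plays out, the sketch below applies the same scoring logic to a trimmed copy of the Supabase manifest and a second, hypothetical date-fns manifest. The Manifest and ScoredResult types are minimal assumptions added so the example runs on its own.

```typescript
interface Manifest {
  package: string;
  solves: string[];
  best_for: string;
}

interface ScoredResult {
  manifest: Manifest;
  score: number;
}

// Same scoring as the function above: +3 per term found in best_for, +1 per term found in solves.
function searchCapabilities(query: string, manifests: Manifest[]): ScoredResult[] {
  const terms = query.toLowerCase().split(/\s+/).filter(Boolean);
  return manifests
    .map(manifest => {
      let score = 0;
      const solves = manifest.solves.join(" ").toLowerCase();
      const bestFor = manifest.best_for.toLowerCase();
      for (const term of terms) {
        if (bestFor.includes(term)) score += 3;
        if (solves.includes(term)) score += 1;
      }
      return { manifest, score };
    })
    .filter(result => result.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, 5);
}

const corpus: Manifest[] = [
  {
    package: "@supabase/auth-helpers-nextjs",
    solves: ["nextjs authentication", "user auth", "session management", "magic links", "oauth integration"],
    best_for: "Next.js projects needing backend auth with minimal config",
  },
  {
    // Hypothetical manifest, invented for this example.
    package: "date-fns",
    solves: ["date formatting", "date arithmetic"],
    best_for: "Projects needing modular date utilities",
  },
];

const summarize = (q: string) =>
  searchCapabilities(q, corpus).map(r => `${r.manifest.package}=${r.score}`);

// Both terms hit solves (+1 each); neither appears in best_for ("next.js" is not "nextjs").
console.log(summarize("nextjs authentication")); // [ '@supabase/auth-helpers-nextjs=2' ]

// "backend" hits best_for (+3); "auth" hits best_for (+3) and solves (+1).
console.log(summarize("backend auth")); // [ '@supabase/auth-helpers-nextjs=7' ]

// "login" appears in no field, so nothing is returned.
console.log(summarize("login")); // []
```

Because matching is plain substring inclusion, a query only scores when its exact terms appear somewhere in solves or best_for.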
No semantic understanding, no fuzzy matching—just string inclusion checks. The simplicity is the point. When you're injecting a tool into an agent loop that's already burning tokens on code generation, you need deterministic, zero-latency responses. Vector similarity search would add 50-200ms per query and introduce nondeterminism that makes debugging agent behavior nearly impossible.
The package-install hook is where Starlog gets invasive in a useful way. Running starlog install modifies your shell config files to wrap package managers:
# Appended to ~/.zshrc
function npm() {
  # Run the check only for install commands
  if [[ "$1" == "install" ]] || [[ "$1" == "i" ]]; then
    starlog check "$@"
    if [ $? -ne 0 ]; then return 1; fi
  fi
  # Fall through to the real npm binary
  command npm "$@"
}
Now when an agent (or human) runs npm install moment, the hook intercepts it, checks if a Moment.js manifest exists, and surfaces skip conditions before the package lands in package.json. If you're in a TypeScript project, it warns: "Moment.js is legacy—use date-fns or Temporal API for new projects." The agent sees this output and can course-correct. You've moved the intervention from post-commit code review to pre-commit decision time.
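The check step might look something like the following. This is a sketch under assumptions: the source does not show how starlog check is implemented, so the checkInstall function, the Map-based index of skip conditions, and the convention that any warning produces a non-zero exit code are all hypothetical.

```typescript
interface CheckResult {
  exitCode: number;   // non-zero makes the shell wrapper abort the install (assumed convention)
  warnings: string[];
}

// Strip a trailing version spec: "moment@2.29.4" -> "moment", "@scope/pkg@1.0.0" -> "@scope/pkg".
function packageName(spec: string): string {
  const at = spec.lastIndexOf("@");
  return at > 0 ? spec.slice(0, at) : spec;
}

// args are the arguments passed to npm, e.g. ["install", "moment", "--save-dev"].
// index maps a package name to the skip conditions from its manifest (hypothetical structure).
function checkInstall(args: string[], index: Map<string, string[]>): CheckResult {
  // Drop the subcommand and any flags; what remains is treated as package specs.
  const specs = args.slice(1).filter(arg => !arg.startsWith("-"));
  const warnings: string[] = [];
  for (const spec of specs) {
    const name = packageName(spec);
    for (const condition of index.get(name) ?? []) {
      warnings.push(`${name}: ${condition}`);
    }
  }
  return { exitCode: warnings.length > 0 ? 1 : 0, warnings };
}

const skipIndex = new Map<string, string[]>([
  ["moment", ["Moment.js is legacy; use date-fns or the Temporal API for new projects"]],
]);

console.log(checkInstall(["install", "moment"], skipIndex));
console.log(checkInstall(["install", "date-fns"], skipIndex)); // no manifest entry, exit code 0
```

Whether a warning should block the install or only print is a policy choice; the shell wrapper shown earlier aborts on any non-zero status.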
The auto-registry mechanism handles corpus gaps gracefully. When you search for a package with no manifest, Starlog queues it to .starlog/pending.json. You can submit these to the maintainers or write manifests yourself—they're just JSON files in /corpus. This creates a flywheel where teams extend coverage for their own stack without waiting for upstream.
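A minimal sketch of the queueing step, assuming .starlog/pending.json holds a flat list of package names with timestamps. That schema is a guess for illustration, as is the queuePending name; the source only says that unknown packages are queued there.

```typescript
// Assumed entry shape for .starlog/pending.json; the real format is not shown in the source.
interface PendingEntry {
  package: string;
  requested_at: string; // ISO 8601 timestamp
}

// Return a new queue with the package appended, unless it is already pending.
function queuePending(queue: PendingEntry[], pkg: string, now: Date = new Date()): PendingEntry[] {
  if (queue.some(entry => entry.package === pkg)) return queue;
  return [...queue, { package: pkg, requested_at: now.toISOString() }];
}

let pending: PendingEntry[] = [];
pending = queuePending(pending, "lucia", new Date("2025-01-01T00:00:00Z"));
pending = queuePending(pending, "lucia"); // already queued, so the list is unchanged

console.log(JSON.stringify(pending));
// [{"package":"lucia","requested_at":"2025-01-01T00:00:00.000Z"}]
```

Keeping the function pure leaves persistence to the caller: read the file, call queuePending, write the result back.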
Gotcha
The corpus is tiny. Twenty-five manifests across seven categories means you'll hit 'no results' constantly in the first month. The auto-registry queues unknown packages, but it doesn't generate manifests—you're either waiting for maintainers to add coverage or writing JSON files yourself. For teams outside the Next.js/React/FastAPI mainstream, you're looking at significant manifest authoring overhead before the tool pays for itself.
Keyword matching is primitive to the point of fragility. Searching 'user management' won't match a manifest that says 'authentication' unless someone hardcoded both terms into the solves array. There's no synonym handling, no query expansion, no semantic understanding. This makes manifest authoring a game of predicting search terms, and it means agents need to ask precise questions: 'nextjs auth' works, 'user login for next' might not.
The 30% reduction in recommendation diversity that Starlog's benchmarks report isn't a bug to be fixed; it's structural. A small corpus with deterministic ranking will always funnel agents toward the same choices, which is great for consistency but terrible for exploration. If you want agents to suggest varied approaches to the same problem, this architecture works against you.
Verdict
Use if: You're running Claude Code, Cursor, or custom agents on production codebases where consistency beats exploration, and you're willing to invest in manifest authoring to cover your stack. The 11.3-percentage-point reduction in hand-rolled implementations translates directly to less code review and smaller bug surfaces. The local-first architecture means zero API costs and deterministic behavior you can actually debug. Use this if you've watched agents suggest Passport for the hundredth time and you're ready to encode 'use Lucia for new projects' as queryable data instead of burying it in system prompts.
Skip if: You need comprehensive out-of-the-box coverage, semantic search quality, or you're working in languages beyond JavaScript and Python. The tiny corpus is a nonstarter for teams exploring unfamiliar domains, and the 30% diversity reduction is a dealbreaker for research projects where you want agents to surface varied options. Also skip if you're building a commercial agent platform: the Business Source License blocks competing hosted services until 2030. This is opinionated middleware for internal agent workflows, not a general-purpose search engine, and that narrow focus is exactly why it works for the teams it's built for.