Smart Connections: Building Semantic Search for Obsidian Without Cloud Dependencies
Hook
A semantic search plugin that runs entirely in your note-taking app without sending a single byte to the cloud sounds impossible. Yet Smart Connections does exactly this with under 50KB of dependencies.
Context
Obsidian users face a paradox: the more notes you accumulate, the less useful your vault becomes. Traditional solutions include manual tagging, folder hierarchies, and keyword search—all of which break down at scale. Tags require discipline you won’t maintain. Folders force artificial taxonomies. Keyword search fails when you can’t remember the exact phrase you used six months ago.
AI embeddings promised a solution by enabling semantic search—finding notes by meaning rather than exact text matches. But existing tools required API keys, complex CLI setups, or external vector databases. They also meant sending your private notes to third-party servers. Smart Connections emerged as a privacy-first alternative: a plugin that generates embeddings locally, stores them alongside your vault, and surfaces related content automatically as you write. It’s semantic search without the surveillance.
Technical Insight
Smart Connections follows a dual-model architecture: a bundled local embedding model for zero-config operation, plus optional API integrations for users who want state-of-the-art accuracy. The local model uses a quantized transformer that’s been compressed to run in a JavaScript environment without heavy dependencies like TensorFlow or PyTorch. This is non-trivial—most embedding models assume a Python runtime with CUDA support.
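The exact quantization scheme of the bundled model isn't documented here, but the core idea is shrinking float32 weights to int8 so the whole model fits in a JavaScript bundle. A minimal sketch of symmetric int8 quantization, with hypothetical helper names:

```javascript
// Symmetric int8 quantization: scale weights so the largest magnitude maps
// to 127, store as Int8Array, and keep the scale factor for dequantization.
// Illustrative only; not Smart Connections' actual code.
function quantize(weights) {
  const maxAbs = Math.max(...weights.map(Math.abs));
  const scale = maxAbs / 127 || 1; // avoid divide-by-zero for all-zero weights
  const q = Int8Array.from(weights, (w) => Math.round(w / scale));
  return { q, scale };
}

function dequantize({ q, scale }) {
  return Float32Array.from(q, (v) => v * scale);
}

const original = [0.52, -1.3, 0.07];
const { q, scale } = quantize(original);
const restored = dequantize({ q, scale });
// Each restored value is within one quantization step (scale) of the original.
```

This trades a small amount of precision for a 4x size reduction versus float32, which is why quantized models can ship inside a plugin while full-precision ones assume a Python runtime.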
The plugin maintains what it calls a ‘Smart Environment’—essentially an in-memory index of your vault with precomputed embeddings. When you open a note, Smart Connections tokenizes the content, generates a vector representation, and computes cosine similarity against all other notes. Results appear in a sidebar ranked by semantic relevance. The architecture avoids expensive re-computation by caching embeddings and only updating when note content changes.
Here’s how the plugin structures its core embedding flow:
class SmartEnvironment {
  async load_embeddings() {
    // Load cached embeddings from the .smart-connections folder
    const cache = await this.adapter.read(this.embeddings_file);
    this.embeddings = JSON.parse(cache);
  }

  async get_nearest(source_embedding, opts = {}) {
    // Brute-force scan: score every cached item against the source vector
    const results = [];
    for (const [key, item] of Object.entries(this.embeddings)) {
      if (opts.exclude && opts.exclude.includes(key)) continue;
      const similarity = this.cosine_similarity(
        source_embedding,
        item.vec
      );
      results.push({ key, similarity });
    }
    // Highest similarity first, capped at opts.limit (default 10)
    return results
      .sort((a, b) => b.similarity - a.similarity)
      .slice(0, opts.limit || 10);
  }

  cosine_similarity(vecA, vecB) {
    const dotProduct = vecA.reduce((sum, a, i) => sum + a * vecB[i], 0);
    const magnitudeA = Math.sqrt(vecA.reduce((sum, a) => sum + a * a, 0));
    const magnitudeB = Math.sqrt(vecB.reduce((sum, b) => sum + b * b, 0));
    if (magnitudeA === 0 || magnitudeB === 0) return 0; // guard zero vectors
    return dotProduct / (magnitudeA * magnitudeB);
  }
}
The codebase deliberately avoids heavyweight vector database abstractions. Embeddings are stored as JSON in a hidden .smart-connections folder within your vault. This makes the system auditable—you can inspect exactly what’s being stored—and portable, since embeddings travel with your vault if you sync via Git or Dropbox.
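Because the cache is plain JSON, auditing it takes only a few lines of Node. The `vec` field below matches the `item.vec` access in the listing above; the key format and the `mtime` field are illustrative guesses at what such a cache might carry, not the plugin's documented schema:

```javascript
// Inspect a hypothetical embeddings cache: a map from note key to its
// vector plus change-detection metadata. Field names other than "vec"
// are illustrative, not Smart Connections' real schema.
const cache = JSON.parse(`{
  "notes/transformers.md": { "vec": [0.12, -0.03, 0.88], "mtime": 1714060800 },
  "notes/gardening.md":    { "vec": [-0.40, 0.91, 0.02], "mtime": 1714147200 }
}`);

for (const [key, item] of Object.entries(cache)) {
  console.log(key, "dims:", item.vec.length);
}
```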
Performance optimization happens through chunking. Rather than embedding entire notes (which can exceed context windows), Smart Connections splits longer documents into semantic blocks—typically paragraphs or sections. Each chunk gets its own embedding, allowing the plugin to surface specific passages rather than just file-level matches. This granularity is crucial for research vaults where a single note might cover multiple unrelated topics.
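A minimal sketch of the chunking idea: split a note into heading-delimited blocks so each can be embedded independently. Smart Connections' real splitter is more sophisticated; this only illustrates the granularity trade-off:

```javascript
// Split a markdown note into heading-delimited chunks. Each chunk would
// get its own embedding, so search can surface a specific section rather
// than the whole file. Hypothetical helper, not the plugin's actual code.
function chunk_note(markdown) {
  const blocks = [];
  let current = [];
  for (const line of markdown.split("\n")) {
    if (line.startsWith("#") && current.length) {
      // A new heading closes the previous chunk
      blocks.push(current.join("\n").trim());
      current = [];
    }
    current.push(line);
  }
  if (current.length) blocks.push(current.join("\n").trim());
  return blocks.filter((b) => b.length > 0);
}

const note = "# Topic A\nSome text.\n# Topic B\nUnrelated text.";
console.log(chunk_note(note).length); // 2 chunks, embedded independently
```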
The plugin also implements incremental indexing. On first run, it processes your entire vault, which can take minutes for thousands of notes. But subsequent updates only reprocess changed files, detected via Obsidian’s file modification events. This event-driven architecture keeps the index fresh without constant background work that would drain battery on mobile devices.
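In the plugin, re-indexing is driven by Obsidian's file-modification events; to stay self-contained, this sketch stands in a plain modification-time comparison, with made-up data shapes:

```javascript
// Incremental re-indexing sketch: only re-embed files whose modification
// time is newer than what the cache recorded, or that the cache has never
// seen. Shapes are illustrative: files is [{ path, mtime }], cache maps
// path -> { mtime }.
function files_to_reindex(files, cache) {
  return files
    .filter((f) => !cache[f.path] || cache[f.path].mtime < f.mtime)
    .map((f) => f.path);
}

const vault = [
  { path: "a.md", mtime: 100 }, // unchanged since last index
  { path: "b.md", mtime: 250 }, // edited after last index
  { path: "c.md", mtime: 50 },  // brand new, not in cache
];
const indexed = { "a.md": { mtime: 100 }, "b.md": { mtime: 200 } };
console.log(files_to_reindex(vault, indexed)); // → [ 'b.md', 'c.md' ]
```

Only two of the three files get re-embedded, which is why subsequent updates cost seconds rather than the minutes of a full vault pass.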
For users who want higher-quality embeddings, Smart Connections exposes an adapter pattern for API providers. You can swap the local model for OpenAI’s text-embedding-3-small, Cohere’s embed models, or any other provider that returns vector arrays. The adapter interface is intentionally simple—just a function that accepts text and returns a float array—making it easy to integrate new providers without forking the plugin.
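The adapter contract described above can be sketched as: any function from text to a float array qualifies as a provider. The names below are illustrative, not the plugin's actual interface:

```javascript
// Toy "local model" adapter: hashes characters into a fixed 4-dimensional
// vector. Real local adapters run a transformer; the shape is what matters.
function local_adapter(text) {
  const vec = new Array(4).fill(0);
  for (let i = 0; i < text.length; i++) {
    vec[i % 4] += text.charCodeAt(i) / 1000;
  }
  return vec;
}

// Shape-compatible stand-in for an API provider: a real version would call
// the provider's embeddings endpoint and return the vector from its response.
function api_adapter_stub(text) {
  return local_adapter(text);
}

// Swapping providers is just swapping the function:
let embed = local_adapter;
console.log(embed("hello").length); // 4
embed = api_adapter_stub;
console.log(embed("hello").length); // 4
```

Keeping the contract this thin is what lets new providers plug in without forking the plugin: nothing downstream cares where the floats came from.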
Gotcha
Smart Connections’ local embedding model is fast and private, but it’s not magic. Accuracy lags behind cloud models like OpenAI’s text-embedding-3-large by a noticeable margin. In practice, this means you’ll occasionally see irrelevant notes surface in the Connections view, especially for abstract or nuanced queries. The model was trained on general web text, so domain-specific jargon in fields like medicine or law may not embed well.
The plugin also struggles with multilingual vaults. The bundled model primarily understands English, with degraded performance for other languages. If your notes mix languages—say, English with occasional German quotes—semantic search may miss connections that rely on those foreign phrases. API models like Cohere’s multilingual embeddings handle this better, but that reintroduces the cloud dependency.
Another limitation is the lack of metadata weighting. Smart Connections treats all text equally, but in reality, your note title is probably more semantically important than a random sentence in the middle. Some users work around this by manually duplicating important keywords, but the plugin doesn’t offer native support for boosting specific fields. Vector databases like Weaviate and Pinecone solve this with metadata filtering and hybrid search, but that level of sophistication would require architectural changes that conflict with Smart Connections’ minimalist philosophy.
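One way such weighting could work, as a sketch only: embed the title and body separately, then combine them with a weighted average so the title counts for more. Smart Connections does not do this natively, and the weight below is made up:

```javascript
// Hypothetical metadata-weighting workaround: blend a title vector and a
// body vector so title terms influence similarity more than body terms.
// Assumes both vectors have the same dimensionality.
function weighted_embedding(title_vec, body_vec, title_weight = 0.4) {
  return title_vec.map(
    (t, i) => title_weight * t + (1 - title_weight) * body_vec[i]
  );
}

const combined = weighted_embedding([1, 0], [0, 1]);
console.log(combined); // roughly [0.4, 0.6]
```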
Verdict
Use if: You have a large Obsidian vault (500+ notes) where rediscovery is painful, you value privacy enough to accept slightly lower embedding quality, or you want to experiment with semantic search without API costs and configuration overhead. It’s ideal for personal knowledge management, research archives, and anyone who writes extensively in Obsidian. The zero-setup experience makes it perfect for testing whether semantic search actually improves your workflow before committing to API subscriptions.
Skip if: You need production-grade embedding accuracy and don’t mind API costs, your vault is small enough that keyword search and manual linking still work fine, you primarily search for exact phrases rather than concepts, or you’re not already using Obsidian. Users who want programmatic access to embeddings outside the Obsidian UI should look at standalone solutions like txtai or LlamaIndex that offer more flexibility for custom integrations.