Smart Connections: How Obsidian Gets Semantic Search Without Breaking Mobile

Hook

Most AI-powered note tools force you to choose between privacy and capability. Smart Connections runs a transformer embedding model locally inside Obsidian, works offline on your phone, and indexes your entire knowledge base without sending a single byte to the cloud.

Context

The Obsidian ecosystem exploded with AI plugins after ChatGPT's release, but nearly all of them hit the same wall: they either required API keys (leaking your notes to third parties), demanded complex local LLM setups with Ollama or Python environments (desktop-only, configuration hell), or simply didn't work offline. For users managing personal knowledge bases with sensitive research, journal entries, or proprietary information, this was a non-starter.

The fundamental tension was between semantic search capability and operational constraints. Vector embeddings—the technology that powers semantic similarity—traditionally required either cloud APIs or heavyweight local infrastructure. Smart Connections emerged from this gap with a radical premise: what if you could ship the embedding model as part of the plugin itself, using browser-native WebAssembly to run transformer models with zero configuration? This approach prioritizes the 80% use case—users with hundreds to thousands of notes who want "good enough" semantic search without compromising privacy or mobile access—over the 20% who need production-grade embedding quality.

Technical Insight

Smart Connections' architecture centers on Transformers.js, Hugging Face's JavaScript library for running transformer models that have been exported to ONNX format. Inference goes through ONNX Runtime, which executes on WebAssembly. When you enable the plugin, it downloads a quantized version of a sentence-transformer model (typically all-MiniLM-L6-v2, compressed to ~25MB) and caches it locally. No Python, no conda environments, no model management: just JavaScript running in Obsidian's Electron context on desktop.

The embedding pipeline operates at block-level granularity, not file-level. When you open a note, Smart Connections chunks it into semantic blocks (paragraphs, list items, headings with their content). Each block gets vectorized independently and stored in a local .smart-env folder as JSON:

// Simplified embedding storage structure
{
  "path": "Projects/AI Research.md",
  "blocks": [
    {
      "text": "Transformer models use self-attention mechanisms...",
      "embedding": [0.023, -0.891, 0.445, ...], // 384-dimensional vector
      "offset": 156,
      "length": 89
    }
  ],
  "mtime": 1704067200000
}
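The `offset` and `length` fields suggest that blocks are tracked by character position in the source note. A minimal sketch of that idea, assuming blocks are simply runs of text separated by empty lines; `chunkNote` is a hypothetical helper for illustration, not the plugin's actual chunker, which per the description above also treats headings and list items specially:

```javascript
// Hypothetical helper: split a note into blocks separated by empty lines,
// recording each block's character offset and length in the source text.
function chunkNote(text) {
  const blocks = [];
  // Match runs of consecutive non-empty lines; an empty line ends a block.
  const blockPattern = /[^\n]+(?:\n[^\n]+)*/g;
  let match;
  while ((match = blockPattern.exec(text)) !== null) {
    const raw = match[0];
    if (raw.trim() === "") continue; // skip whitespace-only runs
    blocks.push({ text: raw, offset: match.index, length: raw.length });
  }
  return blocks;
}
```

Each returned block can be recovered with `text.slice(offset, offset + length)`, which is what makes it possible to jump from a search hit back to its exact position in the note.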

When you query or view a note, the plugin computes cosine similarity between the current block's embedding and all other blocks in your vault. The similarity search is brute-force—no HNSW, FAISS, or advanced indexing—because for vaults under ~5,000 notes, linear search through pre-computed vectors is fast enough (sub-100ms on modern hardware). The codebase deliberately avoids external vector database dependencies to keep the bundle small and auditable.
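The brute-force search described here can be sketched in a few lines. This is an illustration under assumptions, not the plugin's code: `cosineSimilarity` and `topKBlocks` are hypothetical names, and blocks are assumed to carry an `embedding` array as in the storage structure above.

```javascript
// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom; // treat zero vectors as unrelated
}

// Hypothetical helper: score every block against the query (linear scan)
// and keep the k highest-scoring results.
function topKBlocks(queryEmbedding, blocks, k) {
  return blocks
    .map(block => ({ block, score: cosineSimilarity(queryEmbedding, block.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

Every query touches every stored vector, so the cost grows in direct proportion to the number of blocks. That is the price of having no index to build or maintain.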

The chat interface leverages a two-stage retrieval pattern. When you ask a question, Smart Connections first embeds your query, retrieves the top-k most relevant blocks (default k=20), then constructs a context window by expanding those blocks to include surrounding paragraphs. This context gets injected into the system prompt:

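// Conceptual chat context construction.
// `plugin.semanticSearch` and `block.expandedText` are illustrative names.
async function buildSystemPrompt(plugin, userQuery) {
  // Stage 1: retrieve the top-k blocks most similar to the query.
  const relevantBlocks = await plugin.semanticSearch(userQuery, { limit: 20 });

  // Stage 2: join the expanded blocks, labelled with their source note.
  const contextWindow = relevantBlocks
    .map(block => `From "${block.path}":\n${block.expandedText}`)
    .join('\n\n---\n\n');

  return `You are an assistant with access to the user's notes. Answer based on this context:\n\n${contextWindow}`;
}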
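The expansion step, where a retrieved block is widened to include surrounding paragraphs, might look like the following. This is a sketch under assumptions: `expandBlock` and its `radius` parameter are hypothetical, and a note's blocks are assumed to be available as an ordered array of objects with a `text` field.

```javascript
// Hypothetical helper: given a note's ordered blocks and the index of a hit,
// return the hit plus up to `radius` neighbouring blocks on each side.
function expandBlock(noteBlocks, hitIndex, radius = 1) {
  const start = Math.max(0, hitIndex - radius);
  const end = Math.min(noteBlocks.length, hitIndex + radius + 1);
  return noteBlocks
    .slice(start, end)
    .map(b => b.text)
    .join("\n\n");
}
```

A larger radius gives the chat model more surrounding context per hit but consumes the context window faster, so with 20 retrieved blocks a small radius is the cautious choice.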

For API-based chat (Claude, GPT-4, Gemini), the plugin streams responses using standard SDK clients. The local chat option uses a separate lightweight model (like Phi-3 or Gemini Nano when available via Chrome's built-in AI APIs), though this feature is experimental and varies by platform.

The mobile optimization is crucial. Because the plugin uses Transformers.js instead of native bindings or Python subprocesses, the same codebase runs on iOS and Android. Obsidian Mobile is not Electron: it is built on Capacitor, which hosts the app in the platform's WebView, and that WebView can run the same WASM/ONNX runtime without modification. The plugin lazy-loads the model only when needed and runs embedding computation in a background worker to keep it off the main thread, preventing UI jank during vault indexing.
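The lazy-loading half of that design is a small, general pattern. A minimal sketch, assuming the model loader is an async function; `lazyOnce` is a hypothetical helper, and the worker offloading is not shown here:

```javascript
// Hypothetical helper: wrap an expensive async loader so it runs at most once,
// and only when the first caller actually asks for the result.
function lazyOnce(loader) {
  let pending = null;
  return function get() {
    if (pending === null) {
      pending = Promise.resolve()
        .then(() => loader())
        .catch(err => {
          pending = null; // allow a retry after a failed load
          throw err;
        });
    }
    return pending; // concurrent callers share the same in-flight promise
  };
}
```

Usage would be along the lines of `const getModel = lazyOnce(loadEmbeddingModel)`, where `loadEmbeddingModel` stands in for whatever function fetches and initializes the model. Nothing is downloaded or initialized until the first `getModel()` call.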

The subscription tier (Smart Connect Pro) unlocks features like inline suggestions (think Copilot, but for your notes), which continuously monitor your cursor position and suggest related blocks as you type. This requires more aggressive caching strategies and real-time embedding of partial sentences, pushing the performance envelope of browser-based transformers.

Gotcha

The elephant in the room is embedding quality. The quantized all-MiniLM-L6-v2 model that ships with Smart Connections produces 384-dimensional vectors, while much larger hosted models like OpenAI's text-embedding-3-large output 3,072 dimensions and capture significantly more semantic nuance. Dimension count is only a proxy here; the real gap comes from model size and training. In practice, this means Smart Connections will miss nuanced conceptual relationships: if you're researching quantum entanglement and mention "spooky action at a distance" in one note and "EPR paradox" in another, a cloud-based embedding model would likely connect them, but the local model might not. The trade-off is explicit: zero configuration and privacy versus cutting-edge retrieval precision.

Performance degradation on large vaults is real. Around 8,000-10,000 notes, users report noticeable lag during searches (500ms-1s) because the plugin must compute cosine similarity against hundreds of thousands of block embeddings. Without approximate nearest neighbor indexing (such as HNSW or product quantization), search time scales linearly with the number of blocks. For researchers with decade-long note archives approaching 20,000+ files, Smart Connections becomes sluggish enough to disrupt flow state. Re-indexing is also coarse: there is no block-level incremental update, so if you modify a note, the entire file gets re-chunked and re-embedded, which can feel wasteful for small edits.
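A back-of-the-envelope cost model makes the linear scaling concrete. The figure of 30 blocks per note is an assumption chosen to match the "hundreds of thousands of block embeddings" above, not a measured value, and `estimateSearchCost` is a hypothetical helper:

```javascript
// Rough cost model for brute-force search. All inputs are assumptions.
function estimateSearchCost({ notes, blocksPerNote, dimensions }) {
  const blocks = notes * blocksPerNote;
  return {
    blocks,
    // At least one multiply-add per dimension per block for the dot product.
    multiplyAddsPerQuery: blocks * dimensions,
    // Raw vector size if held as 32-bit floats; JSON text on disk is larger.
    float32Megabytes: (blocks * dimensions * 4) / 1e6,
  };
}

const cost = estimateSearchCost({ notes: 10000, blocksPerNote: 30, dimensions: 384 });
// cost.blocks               -> 300000
// cost.multiplyAddsPerQuery -> 115200000
// cost.float32Megabytes     -> 460.8
```

Doubling the vault doubles every one of these numbers, which is why the lag appears gradually rather than at a sharp threshold.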

The Pro subscription paywall for inline suggestions feels awkward in an open-source context. While the core embedding and search features are source-available and will always be free, the most compelling use case—proactive note surfacing while you write—requires $60/year. This creates a psychological friction where users invest time learning the tool, get hooked on the vision, then hit a paywall for the feature they actually want. It's a sustainable business model but muddies the "open-source" positioning.

Verdict

Use if: You have 500-5,000 notes in Obsidian and want semantic discovery without configuring Ollama, managing API costs, or sending your personal knowledge base to OpenAI. The mobile offline capability alone justifies it for researchers, writers, and PKM practitioners who think on the go. The zero-setup experience is unmatched: install, wait 30 seconds for model download, done. Also use if you're privacy-focused enough to accept "good enough" embeddings over state-of-the-art but cloud-dependent alternatives.

Skip if: You need production-grade semantic search for critical research where missing connections costs you; in that case, just use a cloud embedding API with Pinecone or build a proper RAG pipeline. Also skip if your vault is under 200 notes (manual linking is faster), over 10,000 notes (performance craters), or if the Pro subscription for inline features bothers you philosophically. Finally, skip if you're exclusively desktop-based with technical chops: running Ollama + a dedicated vector DB gives you more control and better quality.
