Open Notebook: Building a Self-Hosted NotebookLM Clone with Multi-Provider AI
Hook
Google's NotebookLM can generate podcasts from your research notes, but only with two speakers and only on Google's servers. Open Notebook lets you create unlimited-speaker AI podcasts while keeping everything local—and it's just one of several features Google's version lacks.
Context
When Google launched NotebookLM in 2023, it showed how Large Language Models could transform research workflows. Upload documents, get cited answers, generate podcast summaries—all powered by Gemini. But researchers and companies with sensitive data hit immediate walls: no self-hosting, no model choice, limited customization, and everything flowing through Google's infrastructure.
Open Notebook emerged as the open-source response to these constraints. Built by developers who needed NotebookLM's document intelligence without the privacy trade-offs, it reimagines the research assistant as a self-hosted, multi-provider system. Instead of being locked to Gemini, it abstracts 18+ AI providers—OpenAI, Anthropic, Ollama, Azure, Vertex AI—into a unified interface. Instead of two-speaker podcasts, you can orchestrate multi-person dialogues. Instead of trusting Google with your documents, everything runs locally with encrypted credentials. It's not just a clone; it's a platform for building custom research workflows on your own infrastructure.
Technical Insight
The architectural heart of Open Notebook is Esperanto, a custom abstraction layer that normalizes disparate AI provider APIs into consistent interfaces for LLM chat, embeddings, speech-to-text, and text-to-speech operations. This isn't syntactic sugar—it handles provider-specific authentication flows, streaming formats, error codes, and capability differences. Here's how you'd configure multiple providers:
    // Configuration supports 18+ providers with unified schema
    const providers = {
      llm: {
        openai: { apiKey: process.env.OPENAI_KEY, model: 'gpt-4' },
        anthropic: { apiKey: process.env.ANTHROPIC_KEY, model: 'claude-3-opus' },
        ollama: { baseUrl: 'http://localhost:11434', model: 'llama2' }
      },
      embedding: {
        openai: { model: 'text-embedding-3-large' },
        voyage: { apiKey: process.env.VOYAGE_KEY, model: 'voyage-2' }
      },
      tts: {
        elevenlabs: { apiKey: process.env.ELEVEN_KEY },
        openai: { voice: 'alloy' }
      }
    };
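To make the idea of a unified interface concrete, here is a minimal Python sketch of the adapter-plus-registry pattern that a layer like this typically relies on. It is not Esperanto's actual API: `ChatProvider`, `EchoProvider`, `UppercaseProvider`, `REGISTRY`, `get_provider`, and `summarize` are hypothetical names, and both providers are offline stand-ins rather than real SDK calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Protocol


class ChatProvider(Protocol):
    """Minimal interface every chat backend must satisfy (hypothetical)."""

    def chat(self, prompt: str) -> str: ...


@dataclass
class EchoProvider:
    """Offline stand-in; a real adapter would call a vendor SDK here."""

    model: str

    def chat(self, prompt: str) -> str:
        return f"[{self.model}] {prompt}"


@dataclass
class UppercaseProvider:
    """Second stand-in with different behavior behind the same interface."""

    model: str

    def chat(self, prompt: str) -> str:
        return f"[{self.model}] {prompt.upper()}"


# Registry mapping a provider name to a factory that builds its adapter.
REGISTRY: Dict[str, Callable[[str], ChatProvider]] = {
    "echo": EchoProvider,
    "upper": UppercaseProvider,
}


def get_provider(name: str, model: str) -> ChatProvider:
    """Look up an adapter by name and fail loudly on unknown providers."""
    try:
        factory = REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown provider: {name!r}") from None
    return factory(model)


def summarize(provider: ChatProvider, text: str) -> str:
    """Application code depends only on the ChatProvider interface."""
    return provider.chat(f"Summarize: {text}")
```

Swapping `"echo"` for `"upper"` in `get_provider` changes the backend without touching `summarize`, which is the property the abstraction is meant to provide.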
The RAG pipeline processes documents through a configurable chunking strategy. PDFs get parsed with structure awareness (preserving headings and paragraphs), text files use recursive character splitting with semantic boundary detection, and audio files are transcribed with timestamp metadata. Each chunk gets embedded via your chosen provider and stored in SurrealDB with source lineage:
    # Simplified chunking and embedding flow. pdf_parser, stt_provider,
    # embedding_provider, db, and the chunking helpers are defined elsewhere.
    async def process_source(source_id: str, content: bytes, source_type: str):
        # Parse based on type
        if source_type == 'pdf':
            chunks = pdf_parser.extract_with_structure(content)
        elif source_type == 'audio':
            transcript = await stt_provider.transcribe(content)
            chunks = chunk_with_timestamps(transcript)
        else:
            # Plain text arrives as bytes, so decode it before splitting
            text = content.decode('utf-8', errors='replace')
            chunks = recursive_character_split(text, chunk_size=1000, overlap=200)

        # Embed and store with metadata
        for chunk in chunks:
            embedding = await embedding_provider.embed(chunk.text)
            await db.create('chunk', {
                'source_id': source_id,
                'text': chunk.text,
                'embedding': embedding,
                'metadata': {
                    # Only PDF chunks have pages and only audio chunks have timestamps
                    'page': getattr(chunk, 'page_number', None),
                    'timestamp': getattr(chunk, 'timestamp', None),
                    'type': source_type
                }
            })
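The text branch above leans on a recursive character splitter. The sketch below shows one way such a splitter can work, assuming a fixed separator hierarchy (paragraph, line, sentence, word) and a piece-level overlap. It is an illustration rather than Open Notebook's implementation: the function names are hypothetical, it returns plain strings instead of chunk objects, and it drops the separators and re-joins pieces with single spaces, which a production splitter would avoid.

```python
from typing import List, Sequence


def _split_recursive(text: str, chunk_size: int, separators: Sequence[str]) -> List[str]:
    """Break text into pieces no longer than chunk_size, coarsest separator first."""
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # Nothing left to split on: fall back to a hard cut.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, rest = separators[0], separators[1:]
    pieces: List[str] = []
    for part in text.split(sep):
        pieces.extend(_split_recursive(part, chunk_size, rest))
    return pieces


def recursive_character_split(
    text: str,
    chunk_size: int = 1000,
    overlap: int = 200,
    separators: Sequence[str] = ("\n\n", "\n", ". ", " "),
) -> List[str]:
    """Pack pieces into chunks of at most chunk_size characters with shared tails."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    pieces = _split_recursive(text, chunk_size, separators)
    chunks: List[str] = []
    current: List[str] = []  # pieces in the chunk being built
    for piece in pieces:
        if current and len(" ".join(current + [piece])) > chunk_size:
            chunks.append(" ".join(current))
            # Keep trailing pieces as overlap, dropping from the front until
            # the carried tail fits the overlap budget and leaves room for piece.
            while current and (
                len(" ".join(current)) > overlap
                or len(" ".join(current + [piece])) > chunk_size
            ):
                current = current[1:]
        current.append(piece)
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Overlapping at piece boundaries rather than at raw character offsets keeps the repeated tail readable, at the cost of the overlap sometimes being shorter than requested.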
The podcast generation system is where Open Notebook differentiates itself. Instead of fixed two-speaker dialogues, it creates structured multi-turn conversations by prompting an LLM to generate scripts with speaker annotations, then synthesizes each turn with provider-specific voices. You define Episode Profiles that control speaker personalities, conversation dynamics, and narrative structure:
    import json

    # Episode profile defines podcast structure
    episode_profile = {
        'speakers': [
            {'name': 'Host', 'personality': 'curious, asks clarifying questions', 'voice_id': 'voice_1'},
            {'name': 'Expert', 'personality': 'technical, detailed explanations', 'voice_id': 'voice_2'},
            {'name': 'Skeptic', 'personality': 'challenges assumptions', 'voice_id': 'voice_3'}
        ],
        'structure': 'intro -> deep_dive -> debate -> conclusion',
        'duration_target': '15-20 minutes'
    }

    # LLM generates structured dialogue (the example uses double quotes so it is valid JSON)
    script_prompt = f"""
    Create a {episode_profile['duration_target']} podcast script with {len(episode_profile['speakers'])} speakers.
    Base it on this research content: {notebook_content}
    Follow this structure: {episode_profile['structure']}
    Speaker personalities: {json.dumps(episode_profile['speakers'])}
    Return JSON with format: [{{"speaker": "Host", "text": "..."}}]
    """

    # The model replies with text, so parse it into a list of turns before iterating
    raw_script = await llm_provider.generate(script_prompt)
    script = json.loads(raw_script)

    # Synthesize each turn with the voice assigned to its speaker
    audio_segments = []
    for turn in script:
        speaker = next(s for s in episode_profile['speakers'] if s['name'] == turn['speaker'])
        audio = await tts_provider.synthesize(turn['text'], voice_id=speaker['voice_id'])
        audio_segments.append(audio)

    final_podcast = mix_audio(audio_segments)
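In practice the step most likely to fail in a flow like this is parsing the model's reply: it may arrive wrapped in a Markdown fence, or name a speaker that has no voice assigned. The following sketch shows one defensive way to handle that before any text-to-speech call is made. `parse_script` and `FENCE` are hypothetical names introduced for illustration, not part of Open Notebook.

```python
import json
from typing import Dict, List

FENCE = "`" * 3  # Markdown code fence marker that models often wrap JSON in


def parse_script(raw: str, speakers: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Parse an LLM reply into dialogue turns and validate each speaker name."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (for example a fence tagged "json")
        # and everything from the closing fence onward.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    turns = json.loads(text)
    if not isinstance(turns, list):
        raise ValueError("script must be a JSON array of turns")
    known = {s["name"] for s in speakers}
    for i, turn in enumerate(turns):
        if not isinstance(turn, dict) or "speaker" not in turn or "text" not in turn:
            raise ValueError(f"turn {i} is missing 'speaker' or 'text'")
        if turn["speaker"] not in known:
            raise ValueError(f"turn {i} uses unknown speaker {turn['speaker']!r}")
    return turns
```

Failing here, before synthesis, avoids paying for audio on a script that cannot be assembled, and the error message tells you which turn to regenerate.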
SurrealDB serves as the schemaless datastore, chosen for its ability to handle heterogeneous document types while supporting graph relationships between notebooks, sources, notes, and chunks. The RocksDB backend provides local-first durability without PostgreSQL overhead, though this comes with trade-offs in ecosystem maturity. The REST API exposes the full data model, and the MCP (Model Context Protocol) integration lets Claude Desktop and VS Code query notebooks as external knowledge bases—essentially turning your research collection into a queryable context source for other AI applications.
The transformation system deserves mention: users create reusable prompt templates that process source material into derived notes. Each transformation maintains lineage tracking, so you can trace insights back through the transformation chain to original sources. This is crucial for research provenance—knowing not just where a fact came from, but how it was processed and interpreted.
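Lineage tracking of this kind can be modeled as parent links between records. The sketch below is a minimal in-memory illustration under that assumption: `Record` and `LineageStore` are hypothetical names, a plain dictionary stands in for the database, and ordinary Python callables stand in for the LLM-backed prompt templates.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Record:
    """A source or derived note. Derived notes remember their parent and transformation."""

    id: str
    text: str
    parent_id: Optional[str] = None
    transformation: Optional[str] = None


class LineageStore:
    """In-memory stand-in for the database (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._records: Dict[str, Record] = {}

    def add_source(self, record_id: str, text: str) -> Record:
        record = Record(id=record_id, text=text)
        self._records[record_id] = record
        return record

    def apply(self, parent_id: str, name: str, fn: Callable[[str], str], new_id: str) -> Record:
        """Run a transformation on a record and store the result with its lineage."""
        parent = self._records[parent_id]
        record = Record(id=new_id, text=fn(parent.text), parent_id=parent_id, transformation=name)
        self._records[new_id] = record
        return record

    def lineage(self, record_id: str) -> List[Tuple[str, Optional[str]]]:
        """Walk parent links from a note back to its original source."""
        chain: List[Tuple[str, Optional[str]]] = []
        current: Optional[str] = record_id
        while current is not None:
            record = self._records[current]
            chain.append((record.id, record.transformation))
            current = record.parent_id
        return chain
```

Calling `lineage` on a derived note returns every hop, with the transformation that produced it, ending at the source record whose transformation is `None`.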
Gotcha
Citations are functional but crude. While Open Notebook tracks chunk sources and surfaces page numbers in responses, there's no proper academic formatting, no citation graph analysis, and no detection when the same claim appears across sources. Google's NotebookLM still wins significantly on research integrity features—their source highlighting and direct quote extraction are far more sophisticated.
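To illustrate what tracking chunk sources and surfacing page numbers amounts to, here is a minimal sketch of similarity retrieval that carries source metadata through to a citation string. It assumes embeddings are plain lists of floats and uses brute-force cosine similarity. `Chunk`, `cosine`, `retrieve`, and `format_citation` are hypothetical names, and the bracketed citation format is an assumption rather than Open Notebook's actual output.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Chunk:
    source_id: str
    text: str
    embedding: Sequence[float]
    page: Optional[int] = None  # set only for paginated sources such as PDFs


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity, returning 0.0 when either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query: Sequence[float], chunks: List[Chunk], k: int = 3) -> List[Tuple[Chunk, float]]:
    """Return the k chunks most similar to the query embedding, best first."""
    scored = [(chunk, cosine(query, chunk.embedding)) for chunk in chunks]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def format_citation(chunk: Chunk) -> str:
    """Render a bare source reference, adding the page when one is known."""
    page = f", p. {chunk.page}" if chunk.page is not None else ""
    return f"[{chunk.source_id}{page}]"
```

A citation built this way identifies the chunk a passage came from and nothing more, which is why features such as quote-level highlighting or cross-source claim detection would need additional machinery on top.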
SurrealDB is an architectural risk you need to acknowledge. It's a young database (v2.0 was recent) with a small ecosystem. Backup strategies are unclear, migration paths to mature databases are nonexistent, and scaling beyond a single node is unproven. If SurrealDB ships a breaking change or you need multi-region deployment, you're stuck rewriting significant chunks of the system. For personal use or small teams this is manageable, but production deployments should seriously consider the maintenance burden.
There are also zero collaborative features: no real-time sync, no multi-user permissions, no conflict resolution. Every user needs their own instance, which makes team research workflows painful.
Verdict
Use Open Notebook if you're handling sensitive research that can't touch Google's servers, need the flexibility to swap between OpenAI, Anthropic, and local Ollama models, or want podcast generation with more than two speakers. The Esperanto abstraction makes multi-provider workflows trivial, and the MCP integration is genuinely clever for Claude Desktop power users. It's ideal for security teams building private knowledge bases, solo researchers who prioritize privacy, or developers who want a hackable RAG platform they can extend via REST APIs.
Skip it if you need production-grade collaborative features or academic-quality citations with proper source graphs, or if you aren't comfortable debugging Docker networking and database issues. If your workflow depends on real-time notebook sharing with colleagues or you want zero infrastructure management, just use Google's NotebookLM: it's free, polished, and works immediately. Its two-speaker podcast limitation is real, but its citation quality and UX are significantly better.