Building Question-Answering Systems Over Notion: A RAG Architecture Teardown
Hook
Before LangChain became the framework powering thousands of AI applications, its creator built this 200-star repository as a proof-of-concept—and accidentally created one of the clearest examples of RAG architecture you'll find on GitHub.
Context
In late 2022, teams faced a common problem: mountains of documentation locked in Notion databases, searchable only through keyword matching. You could manually read through pages of meeting notes and wiki entries, or resign yourself to Notion's built-in search missing the context you needed. Natural language interfaces existed, but building one required orchestrating multiple AI components: document chunking, embedding generation, vector storage, retrieval logic, and LLM integration.
Harrison Chase (hwchase17) created notion-qa as a practical demonstration of LangChain, the framework he was developing at the same time. Rather than offering theoretical examples, the project tackles a real-world use case: semantic search and question answering over exported Notion content. It uses the Blendle Employee Handbook as demonstration data, showing how an internal knowledge base could become conversationally accessible. The repository emerged during the early adoption phase of Retrieval-Augmented Generation (RAG), before the pattern became ubiquitous in production AI systems.
Technical Insight
The architecture follows a classic three-stage RAG pipeline: export, ingest, and query. Understanding each stage reveals fundamental patterns that appear in virtually every modern document Q&A system.
The ingestion phase (ingest.py) handles the heavy lifting of transforming raw documents into queryable knowledge. After exporting Notion pages as Markdown files, the script reads them, chunks the text, generates embeddings, and stores vectors in FAISS (Facebook AI Similarity Search). Here's the core ingestion logic:
from langchain.document_loaders import NotionDirectoryLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS
# Load documents from the Notion export directory
loader = NotionDirectoryLoader("Notion_DB")
documents = loader.load()
# Split into chunks so each retrieved piece fits comfortably in the LLM prompt
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
# Embed every chunk and build the FAISS vector store
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(texts, embeddings)
# Persist the index to disk so the query script can reload it
vectorstore.save_local("faiss_index")
The chunk_size parameter deserves attention. At 1000 characters, it balances two competing needs: chunks must be small enough that retrieved content stays focused, but large enough to preserve contextual meaning. Too small (say, 200 characters) and you retrieve sentence fragments that lack context. Too large (5000+ characters) and you dilute relevant information with noise, wasting tokens in your LLM context window.
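To make the trade-off concrete, here is a minimal sketch of fixed-size character chunking. It is not LangChain's CharacterTextSplitter, which splits on a separator before merging pieces up to chunk_size; the `chunk_text` helper and the placeholder page are hypothetical and only show how chunk_size and chunk_overlap change the number of chunks.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 0) -> list[str]:
    """Slide a fixed-size window over the text (hypothetical helper, not LangChain's splitter)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


# Placeholder text standing in for an exported Notion page (3,000 characters).
page = "".join(str(i % 10) for i in range(3000))

small = chunk_text(page, chunk_size=200)
medium = chunk_text(page, chunk_size=1000)
overlapping = chunk_text(page, chunk_size=1000, chunk_overlap=200)

print(len(small), len(medium), len(overlapping))  # prints: 15 3 4
```

With an overlap of 200, the last 200 characters of one chunk reappear at the start of the next, which is one common way to avoid cutting a sentence off from its context at a chunk boundary.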
The query phase implements semantic retrieval followed by answer synthesis. When a user asks "What is the vacation policy?", the system converts that question into an embedding vector, performs similarity search against the FAISS index, retrieves the top-k most relevant document chunks, then passes those chunks as context to OpenAI's API:
from langchain.chains import RetrievalQA
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.vectorstores import FAISS
# Load the persisted vector store (use the same embedding model as at ingestion)
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.load_local("faiss_index", embeddings)
# Create a retrieval chain
qa_chain = RetrievalQA.from_chain_type(
    llm=OpenAI(temperature=0),
    chain_type="stuff",
    retriever=vectorstore.as_retriever(),
)
# Query the system
response = qa_chain.run("What is the vacation policy?")
print(response)
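The retrieval step itself (embed the question, compare it against stored vectors, keep the top-k chunks) can be sketched without any external service. In this illustration, toy 3-dimensional vectors stand in for real embeddings and a brute-force cosine-similarity scan stands in for the FAISS index; the `cosine_similarity` and `top_k` helpers and the sample chunks are made up for the example.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], index: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the k chunk texts whose vectors are most similar to the query."""
    scored = sorted(
        index,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in scored[:k]]


# Made-up chunks with made-up 3-d vectors; real embeddings have far more dimensions.
index = [
    ("Placeholder chunk about vacation days.", [0.9, 0.1, 0.0]),
    ("Placeholder chunk about expense reports.", [0.0, 0.2, 0.9]),
    ("Placeholder chunk about requesting time off.", [0.7, 0.6, 0.1]),
]

query_vec = [1.0, 0.2, 0.0]  # pretend embedding of "What is the vacation policy?"
results = top_k(query_vec, index, k=2)
print(results)
```

The two vacation-related chunks rank first because their vectors point in nearly the same direction as the query vector, while the expense chunk is almost orthogonal to it.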
The chain_type="stuff" parameter reveals an important architectural decision. LangChain offers multiple chain types (stuff, map_reduce, refine), but "stuff" simply concatenates all retrieved documents into a single prompt. It's the simplest approach—and often the best for datasets like Notion exports where relevant information typically appears in one or two chunks. More sophisticated chain types like map_reduce handle cases where answers require synthesizing information from dozens of documents, but they introduce latency and complexity.
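Conceptually, "stuff" amounts to string concatenation. The sketch below shows the idea under stated assumptions: the `build_stuff_prompt` helper and the prompt wording are illustrative, not the template LangChain actually ships.

```python
def build_stuff_prompt(question: str, chunks: list[str]) -> str:
    """Concatenate every retrieved chunk into one prompt (illustrative template, not LangChain's)."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Placeholder chunks standing in for the retriever's output.
retrieved = [
    "Chunk A: placeholder text about vacation.",
    "Chunk B: placeholder text about requesting time off.",
]
prompt = build_stuff_prompt("What is the vacation policy?", retrieved)
print(prompt)
```

Because the prompt grows with every retrieved chunk, this approach only works while the number of chunks times chunk_size fits in the model's context window, which is the limit the multi-call chain types are designed to work around.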
The temperature=0 setting makes outputs as repeatable as the API allows. For factual Q&A over internal documentation, you want consistent, grounded responses rather than creative variations. Raise the temperature to 0.7 and the model is more likely to embellish or hallucinate details not present in your Notion pages.
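Why temperature has this effect can be seen in a small sketch of temperature-scaled softmax. The logits are made up, and treating temperature 0 as a greedy pick of the highest-scoring token is a common convention assumed here, not something the repository or the API documentation is being quoted on.

```python
import math


def token_probabilities(logits: list[float], temperature: float) -> list[float]:
    """Softmax over logits scaled by 1/temperature; temperature 0 is treated as greedy argmax."""
    if temperature == 0:
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens

greedy = token_probabilities(logits, temperature=0)
sampled = token_probabilities(logits, temperature=0.7)
print(greedy)  # prints: [1.0, 0.0, 0.0]
print([round(p, 3) for p in sampled])
```

At temperature 0 all probability mass sits on one token, so there is nothing left to sample; at 0.7 the lower-scoring tokens keep a nonzero chance of being chosen, which is where run-to-run variation comes from.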
The repository includes a Streamlit interface (main.py) that demonstrates production-ready UI patterns. It maintains conversation history, displays source documents alongside answers, and provides a text input for follow-up questions. This transparency—showing which Notion pages informed each answer—is crucial for enterprise adoption. Users need to verify AI-generated answers against original sources, especially for policy or compliance questions.
Gotcha
The manual export process creates a fundamental limitation: your Q&A system operates on a static snapshot of Notion data. When someone updates the vacation policy or adds new documentation, those changes remain invisible until you re-export, re-ingest, and redeploy. There's no webhook integration, no sync mechanism, no incremental updates. For rapidly changing documentation, this quickly becomes unmanageable.
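The repository contains nothing like this, but one low-tech way to at least detect a stale snapshot is to hash each exported Markdown file and compare against the hashes recorded at the last ingest. The sketch below is hypothetical throughout: `hash_export`, `find_changed_files`, and the file names are invented for illustration, and a temporary directory stands in for a Notion export.

```python
import hashlib
import tempfile
from pathlib import Path


def hash_export(export_dir: Path) -> dict[str, str]:
    """Map each Markdown file (relative path) to the SHA-256 of its contents."""
    return {
        str(path.relative_to(export_dir)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(export_dir.rglob("*.md"))
    }


def find_changed_files(previous: dict[str, str], current: dict[str, str]) -> list[str]:
    """Files that are new or whose content hash differs from the previous manifest."""
    return sorted(name for name, digest in current.items() if previous.get(name) != digest)


# Demo in a temporary directory standing in for a Notion export.
with tempfile.TemporaryDirectory() as tmp:
    export_dir = Path(tmp)
    (export_dir / "vacation.md").write_text("Old policy text")
    (export_dir / "expenses.md").write_text("Expense rules")
    before = hash_export(export_dir)

    # Simulate a fresh export with one edited page and one new page.
    (export_dir / "vacation.md").write_text("Updated policy text")
    (export_dir / "onboarding.md").write_text("New page")
    after = hash_export(export_dir)

    changed = find_changed_files(before, after)
    print(changed)  # prints: ['onboarding.md', 'vacation.md']
```

This only flags new and modified files; deleted pages and the re-embedding of changed chunks would still need to be handled separately, and the export itself remains a manual step.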
The OpenAI API dependency introduces cost and latency concerns that aren't immediately obvious. Every query triggers two API calls: one to embed the question (ada-002 model) and another to generate the answer (davinci or gpt-3.5-turbo). With ada-002 at $0.0001 per 1K tokens and GPT-3.5-turbo at $0.002 per 1K tokens, costs mount quickly at scale. A team running 1000 queries per day might spend $50-100 monthly just on API calls, nearly all of it on answer generation rather than embeddings. The codebase offers no local LLM fallback or caching strategy to mitigate these costs. Additionally, since the code dates to 2022-2023, it uses older LangChain APIs, such as the langchain.vectorstores and langchain.embeddings import paths, that may require updates to work with current versions of the framework.
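The monthly estimate can be sanity-checked with simple arithmetic. The prices below are the ones quoted in the paragraph above; the token counts per query and the 30-day month are assumptions made for illustration (roughly four 1000-character chunks plus the question and a short answer), so treat the result as an order-of-magnitude check.

```python
# Prices quoted in the text above (USD per 1K tokens).
EMBEDDING_PRICE_PER_1K = 0.0001   # ada-002
COMPLETION_PRICE_PER_1K = 0.002   # gpt-3.5-turbo

# Assumptions for illustration only; measure your own traffic before budgeting.
QUERIES_PER_DAY = 1000
DAYS_PER_MONTH = 30
QUESTION_TOKENS = 20              # tokens embedded per question
PROMPT_AND_ANSWER_TOKENS = 1300   # retrieved chunks + question + generated answer


def monthly_cost(queries_per_day, days, question_tokens, completion_tokens):
    """Return (embedding_cost, completion_cost) in USD for one month."""
    queries = queries_per_day * days
    embedding = queries * question_tokens / 1000 * EMBEDDING_PRICE_PER_1K
    completion = queries * completion_tokens / 1000 * COMPLETION_PRICE_PER_1K
    return embedding, completion


embedding_cost, completion_cost = monthly_cost(
    QUERIES_PER_DAY, DAYS_PER_MONTH, QUESTION_TOKENS, PROMPT_AND_ANSWER_TOKENS
)
print(f"embedding: ${embedding_cost:.2f}, completion: ${completion_cost:.2f}")
# prints: embedding: $0.06, completion: $78.00
```

Under these assumptions the total lands near $78 per month, and the embedding call is a rounding error next to answer generation, so trimming the number or size of retrieved chunks is where any savings would come from.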
Verdict
Use if: You're learning RAG architecture and want a minimal, comprehensible example without production complexity. The codebase is small enough to read in one sitting, making it perfect for understanding how document chunking, vector storage, and retrieval chains fit together. It's also useful if you have a one-time need to make a static Notion export queryable and don't mind the manual workflow.
Skip if: You need production-grade Notion integration with automatic syncing, cost optimization, or multi-tenant support. The manual export process and dated dependencies make this impractical for real products. Modern alternatives like Danswer or official LangChain templates provide better starting points with active maintenance, while direct Notion API integration (now well-documented) enables building custom solutions without the export bottleneck.
Treat this repository as a teaching tool that crystallizes RAG fundamentals, not a foundation for shipping code.