
PageIndex: Why This RAG System Ditched Vector Databases for LLM Reasoning



Hook

What if your RAG system’s biggest problem isn’t the embeddings—it’s the assumption that ‘similar’ and ‘relevant’ mean the same thing? PageIndex throws out vector databases entirely and lets LLMs reason their way through documents instead.

Context

Traditional RAG systems follow a predictable pattern: chunk documents into overlapping segments, generate embeddings, store them in a vector database, then retrieve the ‘most similar’ chunks at query time. This approach has powered countless AI applications, but it has a fundamental flaw: semantic similarity doesn’t equal relevance. When you ask ‘What was the revenue growth in Q3?’, a vector database might return chunks mentioning ‘Q3’ and ‘revenue’ without understanding that you need the specific calculation, not just paragraphs where those words co-occur. Chunking makes this worse—arbitrary splits break tables, separate context from data, and force you to tune chunk sizes and overlap parameters that feel more like dark magic than engineering.

PageIndex emerged from VectifyAI’s work on financial document analysis, where these limitations became painfully obvious. Financial analysts don’t scan documents for similar text—they navigate to specific sections, understand document structure, and reason about where information should be. PageIndex replicates this human approach: it parses documents into hierarchical tree structures that mirror natural organization (sections, subsections, tables), then uses LLM agents to navigate these trees through multi-step reasoning. Instead of asking ‘which embeddings are closest?’, it asks ‘where would an expert look for this information?’ The result is a system that achieves 98.7% accuracy on FinanceBench, a benchmark designed for complex financial question answering where traditional RAG systems struggle.

Technical Insight

[Architecture diagram, auto-generated: PDF/text/images feed the Document Parser, which extracts hierarchy and stores nodes with metadata as a semantic tree (root, section, and subsection nodes) in the Document Index. A user query goes to the Reasoning Retriever: an LLM agent receives the tree context, asks ‘which section?’, navigates the tree for up to reasoning_depth iterations, locates the relevant nodes, and returns the answer plus reasoning trace as the query result with metadata.]

PageIndex’s architecture centers on two components: the Document Parser that builds semantic trees, and the Reasoning Engine that navigates them. The parser processes documents (PDFs, text files, or even page images) and extracts hierarchical structure. Unlike traditional chunkers that slice text every N tokens, PageIndex identifies natural boundaries—headings, sections, tables—and preserves them as tree nodes. Each node contains metadata: title, page numbers, parent-child relationships, and content summaries. For vision-based parsing, it can process PDF page images directly without OCR, allowing the LLM to reason over visual layouts and tables that OCR often mangles.
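The node metadata described above can be pictured as a small recursive structure. The sketch below is illustrative only; `TreeNode` and its fields are hypothetical stand-ins, not PageIndex's actual classes:

```python
from dataclasses import dataclass, field

@dataclass
class TreeNode:
    # Hypothetical shape of one index node: title, page span,
    # an LLM-written summary, and child sections.
    title: str
    pages: tuple[int, int]
    summary: str = ""
    children: list["TreeNode"] = field(default_factory=list)

    def outline(self, depth: int = 0) -> list[str]:
        """Flatten the tree into the indented outline an LLM agent would see."""
        lines = [f"{'  ' * depth}{self.title} (pages {self.pages[0]}-{self.pages[1]})"]
        for child in self.children:
            lines.extend(child.outline(depth + 1))
        return lines

root = TreeNode("Annual Report", (1, 15), children=[
    TreeNode("Financial Highlights", (1, 3), children=[
        TreeNode("Revenue Summary", (2, 2)),
        TreeNode("Expense Breakdown", (3, 3)),
    ]),
    TreeNode("Risk Factors", (11, 15)),
])
print("\n".join(root.outline()))
```

The point of the outline form is that the whole hierarchy fits into a prompt compactly, so the LLM can reason over structure without seeing full section contents.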

Here’s what retrieval looks like in practice:

from pageindex import DocumentIndex, ReasoningRetriever

# Build the index (one-time operation)
index = DocumentIndex.from_pdf(
    "annual_report.pdf",
    mode="vision",  # OCR-free processing
    llm="gpt-4-turbo"
)

# The index is a tree structure:
# Root
# ├── Financial Highlights (pages 1-3)
# │   ├── Revenue Summary (page 2)
# │   └── Expense Breakdown (page 3)
# ├── Operations Review (pages 4-10)
# └── Risk Factors (pages 11-15)

# Query with reasoning-based retrieval
retriever = ReasoningRetriever(index, llm="gpt-4-turbo")
result = retriever.query(
    "What was the primary driver of increased operating expenses?",
    reasoning_depth=3  # Allow 3 levels of tree navigation
)

print(result.answer)
print(f"Found in: {result.sections}")  # e.g., ["Financial Highlights > Expense Breakdown"]
print(f"Pages: {result.page_numbers}")  # Full traceability
print(f"Reasoning trace: {result.reasoning_steps}")  # See the agent's decision process

The magic happens during retrieval. When you submit a query, the ReasoningRetriever doesn’t compute embeddings. Instead, it presents the LLM with the tree structure and asks: ‘Which section should we explore?’ The LLM might respond: ‘The query asks about operating expenses, so we should look in Financial Highlights > Expense Breakdown.’ The system retrieves that section’s content, and the LLM can request deeper navigation: ‘I see mentions of R&D spending increases. Let me check the Operations Review > R&D Initiatives section for details.’ This multi-step reasoning continues until the LLM determines it has sufficient context, then generates the final answer.
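The navigation loop described above reduces to plain control flow. In this sketch, `choose_section` stands in for an LLM call, and the toy tree and keyword-based chooser are illustrative stand-ins rather than PageIndex internals:

```python
def navigate(tree, query, choose_section, max_depth=3):
    """Walk the tree by repeatedly asking an LLM-like chooser which
    child section to descend into. Returns the visited path."""
    node, path = tree, [tree["title"]]
    for _ in range(max_depth):
        if not node.get("children"):
            break  # leaf reached: its content goes to answer generation
        choice = choose_section(query, [c["title"] for c in node["children"]])
        if choice is None:
            break  # chooser decides current context is sufficient
        node = next(c for c in node["children"] if c["title"] == choice)
        path.append(choice)
    return path

tree = {"title": "Root", "children": [
    {"title": "Financial Highlights", "children": [
        {"title": "Expense Breakdown", "children": []}]},
    {"title": "Operations Review", "children": []},
]}

# A stub chooser that keys on keywords instead of calling an LLM.
def chooser(query, titles):
    for t in titles:
        if "expense" in t.lower() or "financial" in t.lower():
            return t
    return None

path = navigate(tree, "What drove operating expenses?", chooser)
```

Swapping the stub for a real LLM call turns the visited path into exactly the kind of reasoning trace the article describes.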

The reasoning trace is fully transparent. You can inspect every navigation decision, which sections were considered and rejected, and why the final sections were chosen. This is transformative for high-stakes applications where ‘the embedding said so’ isn’t an acceptable explanation.

For documents without clear hierarchical structure, PageIndex includes a hybrid mode that creates semantic sections by having the LLM analyze content and propose logical divisions:

index = DocumentIndex.from_text(
    long_article_text,
    structure_mode="llm_inferred",  # Let LLM propose structure
    section_prompt="Organize this into logical sections for a technical reader"
)

The system also handles multi-document scenarios elegantly. Traditional RAG systems throw all chunks into one vector space, making cross-document reasoning difficult. PageIndex maintains separate trees per document and uses a meta-reasoning step to decide which documents to search:

multi_doc_index = DocumentIndex.from_directory(
    "./financial_docs/",
    index_strategy="per_document"
)

result = retriever.query(
    "Compare revenue growth between Q1 and Q3 reports",
    cross_document=True  # Enable cross-document reasoning
)
# The LLM first identifies relevant documents, then navigates each

Performance-wise, PageIndex trades latency for accuracy. Building an index requires LLM calls to generate section summaries and structure metadata—expect 30-60 seconds for a 50-page document with GPT-4. Retrieval involves 2-5 LLM calls depending on reasoning depth, adding 3-10 seconds compared to sub-second vector lookups. For applications where getting the right answer matters more than instant response, this is a worthwhile trade-off. You can also cache indexes and use cheaper models for simple queries while reserving frontier models for complex reasoning.
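One way to implement the cheap-model routing mentioned above is a small dispatcher. The model names come from the examples earlier in this article; the complexity heuristic is a deliberately naive placeholder you would replace with something smarter:

```python
def pick_model(query: str, cheap: str = "gpt-3.5-turbo",
               frontier: str = "gpt-4-turbo") -> str:
    """Route simple lookups to a cheap model and reserve the frontier
    model for queries likely to need multi-step reasoning.
    The heuristic (keywords + length) is a naive stand-in."""
    needs_reasoning = any(
        kw in query.lower() for kw in ("compare", "why", "driver", "trend", "across")
    )
    return frontier if needs_reasoning or len(query.split()) > 15 else cheap
```

A router like this keeps the expensive model on the queries where reasoning quality actually pays for itself.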

Gotcha

The LLM-first approach creates real cost and latency challenges. Every retrieval operation burns tokens—potentially thousands if your document trees are large or queries require deep navigation. At $0.01 per 1K tokens (GPT-4 pricing), a query that explores 5 sections with 2K tokens each costs $0.10. Scale that to thousands of daily queries and you’re looking at serious API bills. Traditional vector retrieval costs pennies per million queries by comparison. Latency compounds the problem: users expecting Google-speed responses will find 5-10 second retrieval times frustrating. You can optimize with smaller models (GPT-3.5-turbo, Claude Haiku), but then reasoning quality suffers—the whole value proposition depends on sophisticated LLM capabilities.
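The cost arithmetic above generalizes to a back-of-envelope helper. The $0.01 per 1K tokens figure is the article's example rate, not a current price list:

```python
def query_cost(sections: int, tokens_per_section: int,
               price_per_1k: float = 0.01) -> float:
    """Estimated retrieval cost for one query: every explored
    section's tokens pass through the LLM at the given rate."""
    return sections * tokens_per_section / 1000 * price_per_1k

# The article's example: 5 sections x 2K tokens -> $0.10 per query
per_query = query_cost(5, 2000)
# At thousands of daily queries the bill scales linearly:
daily = 5000 * per_query  # 5,000 queries/day -> $500/day
```

Running the same numbers at your own traffic and token rates is worth doing before committing to reasoning-based retrieval.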

PageIndex also inherits all the reliability issues of agentic systems. LLMs sometimes hallucinate during tree navigation, claiming sections exist that don’t, or getting stuck in reasoning loops where they revisit the same nodes. The system includes guardrails (maximum reasoning steps, visited node tracking), but these feel like band-aids on LLM non-determinism. Documents without clear structure pose problems too. If you’re indexing social media posts, chat logs, or stream-of-consciousness writing, there’s no hierarchical tree to extract. The LLM-inferred structure mode helps, but results are inconsistent. Finally, queries requiring information synthesis across many disconnected sections can overwhelm context windows. ‘Summarize all risk factors mentioned anywhere in this 200-page document’ might require retrieving dozens of sections, blowing past token limits.
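Both guardrails mentioned above, a step budget and visited-node tracking, reduce to a few lines of bookkeeping. This sketch shows the idea only, not PageIndex's implementation:

```python
def guarded_step(proposed: str, visited: set, steps_left: int):
    """Reject a navigation step that revisits a node or exceeds the
    step budget. Returns (accepted, remaining_steps)."""
    if steps_left <= 0 or proposed in visited:
        return False, steps_left
    visited.add(proposed)
    return True, steps_left - 1

visited: set = set()
ok1, left = guarded_step("Expense Breakdown", visited, 3)     # accepted
ok2, left = guarded_step("Expense Breakdown", visited, left)  # loop blocked
```

The guard stops infinite loops, but it cannot make the agent pick a better section; a rejected step usually just ends navigation early.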

Verdict

Use if: You’re building applications where accuracy and explainability trump speed and cost—financial analysis, legal document review, medical literature research, compliance checking. PageIndex excels with professional documents that have clear structure (reports, manuals, academic papers) and with queries that require domain expertise to identify the relevant sections. The transparent reasoning traces are essential when humans need to verify AI decisions, and the 98.7% FinanceBench accuracy suggests it’s genuinely better than vector RAG for complex analytical tasks. Also use it if you’re already paying for frontier LLM APIs and can absorb the extra token costs, or if your documents have complex visual layouts that OCR destroys.

Skip if: You need sub-second response times, are cost-constrained (the LLM calls add up fast), are working with unstructured text that lacks clear sections, or are building consumer-facing features where traditional RAG’s ‘good enough’ accuracy at a tenth of the cost is the smarter choice. Also skip if your queries are simple keyword lookups, or if you’re using smaller models that can’t reason reliably—PageIndex’s value disappears without strong LLM capabilities behind it.
