Frfr: Why Pre-Extracting Facts Beats Retrieval for High-Stakes Document Q&A
Hook
What if the answer to RAG hallucination isn't better retrieval algorithms, but skipping retrieval entirely?
Context
Ask a retrieval-augmented generation system about compliance requirements buried in a 300-page security audit, and you're gambling. Maybe it finds the right chunk. Maybe semantic search ranks a superficially similar paragraph higher. Maybe the LLM confidently synthesizes an answer from irrelevant context. For security auditors, legal teams, and compliance officers, "the model probably got it right" isn't acceptable risk management.
Traditional RAG architectures optimize for speed: chunk documents, generate embeddings, retrieve top-k matches at query time, stuff them into an LLM prompt. This works beautifully for customer support chatbots where occasional mistakes are forgiven. It fails catastrophically when someone makes a procurement decision based on a hallucinated compliance claim. Frfr inverts the entire workflow: spend compute upfront to extract every verifiable fact from a document once, then answer queries by filtering that pre-verified knowledge base. You trade query-time speed for a guarantee that every answer points to explicit source text that a human can audit.
Technical Insight
Frfr's architecture rests on three foundational bets: facts are cheaper to verify once than retrieve correctly a thousand times, filesystems are underrated databases for single-user workflows, and WebAssembly can eliminate dependency hell.
The extraction pipeline starts with semantic chunking. Unlike naive character-count splitting that decapitates sentences mid-thought, Frfr's PDF parser (running entirely in WebAssembly via go-pdfium) respects paragraph boundaries. Each chunk gets sent to Claude with a structured prompt demanding facts, evidence quotes, specificity scores, and source citations. The Go backend parallelizes this across 20 workers:
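// Simplified from fact extraction logic
type Fact struct {
	Statement        string   `json:"statement"`
	Evidence         []string `json:"evidence"`
	SourceChunks     []int    `json:"source_chunks"`
	SpecificityScore float64  `json:"specificity_score"`
	ControlFamily    string   `json:"control_family"`
	// ... 3 more metadata fields
}

// extractFacts fans chunks out to a pool of workers and gathers the filtered facts.
func extractFacts(chunks []Chunk, workers int) []Fact {
	// Both channels are buffered to len(chunks), so neither sends nor collection can deadlock.
	jobs := make(chan Chunk, len(chunks))
	results := make(chan []Fact, len(chunks))

	for w := 0; w < workers; w++ {
		go func() {
			for chunk := range jobs {
				facts := callClaudeExtract(chunk)
				// Drop vague extractions below the 0.7 specificity threshold.
				results <- filterBySpecificity(facts, 0.7)
			}
		}()
	}

	// Coordinate collection (one possible approach): enqueue every chunk,
	// close the queue so workers exit, then gather one result per chunk.
	for _, c := range chunks {
		jobs <- c
	}
	close(jobs)

	var all []Fact
	for range chunks {
		all = append(all, <-results...)
	}
	return all
}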
Each extracted fact is a JSON object with eight metadata fields. The evidence array contains verbatim quotes from source text. The source_chunks array maps back to specific page ranges. The specificity_score filters vague extractions—"The system should be secure" scores low and gets dropped; "TLS 1.3 must be enforced for all external API endpoints per section 4.2.1" scores high and persists. Claude validates these during extraction, not during query time when you're waiting for an answer.
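The snippet above calls filterBySpecificity without showing it. A minimal sketch of what such a filter could look like follows. The trimmed Fact type, the inclusive threshold comparison, and the extra rule that a fact must carry at least one non-blank evidence quote are all assumptions made for illustration, not Frfr's confirmed behavior.

```go
package main

import "strings"

// Fact mirrors only the fields this sketch needs from the article's struct.
type Fact struct {
	Statement        string   `json:"statement"`
	Evidence         []string `json:"evidence"`
	SpecificityScore float64  `json:"specificity_score"`
}

// filterBySpecificity keeps facts whose score is at or above threshold.
// Assumption: facts with no non-blank evidence quote are dropped as well,
// because they could not be audited against the source text.
func filterBySpecificity(facts []Fact, threshold float64) []Fact {
	kept := make([]Fact, 0, len(facts))
	for _, f := range facts {
		if f.SpecificityScore < threshold {
			continue // too vague, e.g. "The system should be secure"
		}
		if !hasEvidence(f.Evidence) {
			continue // nothing a human could check
		}
		kept = append(kept, f)
	}
	return kept
}

// hasEvidence reports whether at least one quote contains non-whitespace text.
func hasEvidence(quotes []string) bool {
	for _, q := range quotes {
		if strings.TrimSpace(q) != "" {
			return true
		}
	}
	return false
}
```

With a threshold of 0.7, a fact scored 0.2 is discarded, while a fact scored 0.9 with a verbatim quote attached is kept.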
The storage model is deliberately primitive: each session becomes a directory tree with chunks/, facts/, and summaries/ subdirectories containing JSON files. No PostgreSQL schemas, no MongoDB indexes, just files you can grep, git commit, or inspect in a text editor. When querying, the system batches 150 facts at a time into parallel Claude API calls:
import "sync"

func queryFacts(question string, facts []Fact) <-chan QueryResult {
	resultStream := make(chan QueryResult)
	batchSize := 150

	go func() {
		defer close(resultStream)

		// Wait for every batch goroutine before the stream is closed;
		// otherwise a late batch would send on a closed channel and panic.
		var wg sync.WaitGroup
		for i := 0; i < len(facts); i += batchSize {
			// min is a builtin as of Go 1.21.
			batch := facts[i:min(i+batchSize, len(facts))]
			wg.Add(1)
			// Parallel SSE streaming to frontend
			go func(b []Fact) {
				defer wg.Done()
				answer := callClaudeQuery(question, b)
				resultStream <- QueryResult{
					Answer:     answer.Text,
					Citations:  answer.FactIDs,
					Confidence: answer.Score,
				}
			}(batch)
		}
		wg.Wait()
	}()

	return resultStream
}
Server-sent events stream progress to the React frontend as each batch completes. Click a citation, and the UI reconstructs the original chunk text with evidence quotes highlighted—not because it embedded the text in the fact (storage explosion), but because it maintained the chunk-to-fact mapping.
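Because the evidence quotes are verbatim, highlighting can be reduced to a substring search over the chunk text. The sketch below shows one way that lookup might work. The Span type and the evidenceSpans function are hypothetical names, and reporting unmatched quotes separately is an assumption about how a UI might want to flag them.

```go
package main

import "strings"

// Span is a half-open byte range [Start, End) within a chunk's text.
type Span struct {
	Start, End int
}

// evidenceSpans locates each verbatim evidence quote inside chunkText.
// Quotes that do not appear verbatim are returned in missing, so a caller
// could flag them instead of silently highlighting nothing.
func evidenceSpans(chunkText string, evidence []string) (spans []Span, missing []string) {
	for _, quote := range evidence {
		if quote == "" {
			continue // an empty quote would match everywhere; skip it
		}
		idx := strings.Index(chunkText, quote)
		if idx < 0 {
			missing = append(missing, quote)
			continue
		}
		spans = append(spans, Span{Start: idx, End: idx + len(quote)})
	}
	return spans, missing
}
```

Only the first occurrence of each quote is reported here. A quote that the model paraphrased rather than copied ends up in missing, which is itself a useful audit signal.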
The WebAssembly PDF parsing deserves special attention. Most Go PDF libraries either wrap C dependencies (CGO cross-compilation nightmares) or implement half-broken pure-Go parsers. Frfr relies on go-pdfium's WebAssembly mode, in which a WebAssembly build of PDFium runs inside the Go process through an embedded runtime rather than through CGO, and extracts text without a single native dependency. The Electron build script compiles the Go backend to platform-specific binaries (darwin-arm64, linux-x64, etc.) and bundles them with the Vite-built frontend. Double-click the app, and it spawns the Go HTTP server as a child process, with no "install Go runtime" prerequisite for end users.
The architectural gamble is that Claude's context window and batch processing can handle realistic document sizes. A 300-page security audit might yield 2,000 facts. At 150 facts per batch, that's 14 parallel Claude calls per query. With SSE streaming, the first batch returns answers in ~3 seconds while others trickle in. You're optimizing for "thorough within 30 seconds" rather than "instant but unreliable."
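The call count in that estimate is a ceiling division. A small helper makes the arithmetic explicit; batchCount is a hypothetical name used only for this illustration.

```go
package main

// batchCount returns how many batches are needed to cover totalFacts
// when each batch holds at most batchSize facts (ceiling division).
// It returns 0 for non-positive inputs.
func batchCount(totalFacts, batchSize int) int {
	if totalFacts <= 0 || batchSize <= 0 {
		return 0
	}
	return (totalFacts + batchSize - 1) / batchSize
}
```

For 2,000 facts at 150 per batch this gives 14: thirteen full batches cover 1,950 facts and a final batch carries the remaining 50.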
Gotcha
The filesystem-as-database design collapses under collaboration. Two people can't query the same session simultaneously without race conditions. No locking, no transactions, no merge conflict resolution. This is fundamentally a single-user tool—fine for an auditor working solo, unacceptable for a team triaging findings together.
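To make the limitation concrete, here is a rough sketch of what reading and writing a fact file in such a layout might look like. The fact-0001.json naming scheme, the saveFact and loadFact helpers, and the trimmed Fact type are assumptions for illustration. Only the facts/ subdirectory and the JSON-per-file idea come from the description above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// Fact mirrors a subset of the article's struct.
type Fact struct {
	Statement        string   `json:"statement"`
	Evidence         []string `json:"evidence"`
	SourceChunks     []int    `json:"source_chunks"`
	SpecificityScore float64  `json:"specificity_score"`
}

// saveFact writes one fact as facts/fact-NNNN.json under sessionDir
// (hypothetical naming) and returns the path it wrote.
func saveFact(sessionDir string, id int, f Fact) (string, error) {
	dir := filepath.Join(sessionDir, "facts")
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return "", err
	}
	data, err := json.MarshalIndent(f, "", "  ")
	if err != nil {
		return "", err
	}
	path := filepath.Join(dir, fmt.Sprintf("fact-%04d.json", id))
	// os.WriteFile truncates and rewrites the file with no locking, so two
	// processes writing the same fact can clobber each other's output.
	return path, os.WriteFile(path, data, 0o644)
}

// loadFact reads a single fact file back into memory.
func loadFact(path string) (Fact, error) {
	var f Fact
	data, err := os.ReadFile(path)
	if err != nil {
		return f, err
	}
	err = json.Unmarshal(data, &f)
	return f, err
}
```

The upside is visible too: the output is plain indented JSON that grep, git diff, and a text editor all handle without extra tooling.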
The 150-fact batch size is a magic number tuned for Claude's rate limits and context window circa 2024. Process a 50-page document and you're batching 300 facts into two calls where one would suffice, burning API quota. Process a 2,000-page regulatory framework and you might hit rate limits or timeout before all batches complete. There's no adaptive batching logic that adjusts to document size or API performance.
Incremental updates are nonexistent: edit one paragraph in a PDF, and you re-extract every fact from scratch. The system has no diff logic to detect unchanged chunks. For living documents that evolve weekly, extraction costs spiral.
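For a sense of what the missing diff logic could involve, the sketch below hashes chunk text and reports which chunks in a new version were not seen in the previous extraction. This is not something Frfr does. The chunkHash and changedChunks names are hypothetical, and the approach assumes chunk boundaries stay stable for unedited paragraphs.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// chunkHash returns the hex-encoded SHA-256 digest of a chunk's text.
func chunkHash(text string) string {
	sum := sha256.Sum256([]byte(text))
	return hex.EncodeToString(sum[:])
}

// changedChunks returns the indices of chunks in current whose content hash
// is absent from previous, i.e. the only chunks that would need re-extraction.
// previous is keyed by content hash rather than position, so unchanged chunks
// are still recognized when an edit shifts their index.
func changedChunks(previous map[string]bool, current []string) []int {
	var changed []int
	for i, text := range current {
		if !previous[chunkHash(text)] {
			changed = append(changed, i)
		}
	}
	return changed
}
```

If an edit changes where the chunker splits the surrounding text, neighboring chunks would also hash differently and be re-extracted, so the savings depend on how stable the chunking is.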
Claude is a single point of failure with zero fallback. If Anthropic deprecates the model version, changes API schemas, or experiences an outage, Frfr stops working. There's no abstraction layer for swapping LLM providers, no local model fallback, no cached degradation mode. The hardcoded metadata schema (control family, specificity score, etc.) betrays its security audit origins—you can't extend it for legal citations or medical terminology without forking the codebase.
Verdict
Use if: You're querying high-stakes documents where wrong answers have consequences (security audits, compliance attestations, RFP responses, legal discovery), you need human-auditable citations that map to exact source text, you work solo or with a small team willing to share session directories via Git, and you can afford 5-10 minutes of upfront extraction per document in exchange for query results that always trace back to pre-verified facts. The filesystem design makes debugging transparent and version control trivial.
Skip if: You need real-time collaboration, sub-second query latency, incremental document updates, or multi-user access controls. Also skip if your documents exceed Claude's context window after chunking, you require vendor-agnostic LLM swappability, or you're building a customer-facing product where Electron's 200MB bundle size and lack of code signing automation are dealbreakers. This is a power tool for individual knowledge workers, not a platform for SaaS products.