WrenAI: Why Text-to-SQL Needs a Semantic Layer to Stop Hallucinating Business Logic
Hook
Your LLM generates perfect SQL—syntactically valid, runs without errors—but calculates revenue using the wrong join. WrenAI solves this by teaching the LLM your business logic before it writes a single query.
Context
The text-to-SQL landscape is crowded with tools that dump your database schema into an LLM prompt and hope for the best. This works for trivial queries, but breaks down the moment your business has nuanced definitions: ‘active users’ might exclude internal accounts, ‘revenue’ might require specific currency conversions, or ‘customer’ might need a three-table join that only the data team knows about. WrenAI, an open-source GenBI agent from Canner, tackles this with a semantic layer—a Modeling Definition Language (MDL) that encodes metrics, relationships, and access rules once, then constrains every LLM-generated query to respect those definitions.
With 14,834 GitHub stars and built primarily in TypeScript, WrenAI isn’t just another SQL translator. It’s a full-stack platform spanning a Next.js UI for semantic modeling, a Python/FastAPI AI service orchestrating RAG-based query generation, and a Rust query engine (wren-engine) built on Apache DataFusion that resolves semantic abstractions before hitting your actual database. The project supports 12+ data sources (PostgreSQL, BigQuery, Snowflake, ClickHouse, DuckDB, Trino, MySQL, Microsoft SQL Server, Oracle, Redshift, Databricks, and Athena) and works with any LLM provider—OpenAI, Anthropic, Google, AWS Bedrock, Azure, Databricks, DeepSeek, Groq, or local Ollama models. Unlike traditional BI tools that require analysts to hand-craft dashboards, WrenAI generates SQL, charts, summaries, and reports conversationally, positioning itself as a GenBI agent rather than a query builder.
Technical Insight
WrenAI’s architecture is a three-layer stack designed to keep LLM outputs grounded in business reality. The user-facing layer is wren-ui, a Next.js application with Apollo GraphQL managing both the chat interface and the semantic modeling canvas. This is where analysts define MDL models—schemas that declare not just table structure, but calculated metrics, canonical joins, and row-level security rules. For example, defining a ‘monthly_recurring_revenue’ metric once in MDL prevents the LLM from improvising different formulas across queries.
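MDL is a JSON-based modeling format. As a rough sketch of the idea, a metric like ‘monthly_recurring_revenue’ might be declared once along these lines; note the field names below are illustrative assumptions, not WrenAI’s exact MDL schema:

```python
import json

# Illustrative sketch of an MDL-style metric definition.
# Field names are assumptions for illustration, not WrenAI's exact MDL schema.
mrr_metric = {
    "name": "monthly_recurring_revenue",
    "baseObject": "subscriptions",          # canonical source model
    "dimension": [{"name": "month", "type": "TIMESTAMP"}],
    "measure": [{
        "name": "mrr",
        "type": "DOUBLE",
        "expression": "SUM(amount_usd)",    # one formula, defined once
    }],
    "filters": ["status = 'active'"],       # business logic travels with the metric
}

print(json.dumps(mrr_metric, indent=2))
```

Because the formula and filters live in the model rather than the prompt, every query that touches MRR inherits the same definition.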
The intelligence layer is wren-ai-service, a Python FastAPI application handling the conversational flow. When a user asks ‘What was Q4 revenue by region?’, the service breaks this into steps: semantic search via Qdrant (a vector database) retrieves relevant MDL definitions and historical query patterns using RAG; the LLM (swappable between providers) generates SQL constrained by those definitions; if the query fails, a self-correction loop re-prompts with error context; finally, the service generates chart specifications and natural-language summaries alongside the raw results. The README emphasizes frontier models (GPT-4o, Claude Sonnet, Gemini Pro) for production accuracy, though smaller models work with degraded precision.
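The self-correction loop can be sketched as a bounded retry in which the previous failure is fed back into the next prompt. This is a minimal sketch, assuming hypothetical `generate_sql` and `execute_sql` callables; it is not wren-ai-service’s actual code:

```python
# Sketch of a self-correction loop. generate_sql / execute_sql are
# hypothetical stand-ins, not wren-ai-service's actual interfaces.
def correct_sql(question, generate_sql, execute_sql, max_retries=3):
    error_context = None
    for _attempt in range(max_retries):
        sql = generate_sql(question, error_context)  # LLM call, optionally with prior error
        try:
            return sql, execute_sql(sql)             # success: return query and results
        except Exception as exc:
            error_context = str(exc)                 # re-prompt with the failure message
    raise RuntimeError(f"Gave up after {max_retries} attempts: {error_context}")

# Toy usage: a "model" that fixes its typo once it sees the database error.
def fake_llm(question, error):
    return "SELECT revenue FROM sales" if error else "SELECT revenu FROM sales"

def fake_db(sql):
    if "revenu " in sql:
        raise ValueError('column "revenu" does not exist')
    return [("Q4", 1_200_000)]

sql, rows = correct_sql("What was Q4 revenue?", fake_llm, fake_db)
```

The bounded retry count matters in production: each round trip is another LLM call, so the loop trades latency and cost for accuracy.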
The most architecturally interesting piece is wren-engine, a standalone Rust service built on Apache DataFusion. Unlike typical text-to-SQL tools that map directly to database dialects, wren-engine acts as an intermediary query planner. It ingests semantic queries (referencing MDL metrics and relationships), resolves them against the semantic layer’s business logic, then translates to native SQL for the supported data sources. This separation means the LLM never sees raw schemas—it reasons about abstract business concepts, and wren-engine handles the messy physical-to-logical mapping. Here’s a simplified flow from the README’s architecture diagram:
User Question → wren-ui (GraphQL)
↓
wren-ai-service (RAG retrieval + LLM prompting)
↓
wren-engine (semantic resolution + DataFusion query planning)
↓
Target Database (PostgreSQL, BigQuery, etc.)
A concrete example of MDL in action: suppose your business defines ‘active customers’ as users with a purchase in the last 90 days, excluding test accounts. In raw DDL-to-SQL, an LLM might guess this logic inconsistently. With WrenAI, you define this once in MDL as a metric with specific filters. Now every query referencing ‘active customers’ automatically inherits this logic. The LLM doesn’t hallucinate filters; it just references the metric, and wren-engine injects the correct SQL.
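Conceptually, this resolution step is macro expansion with guarantees: a reference to a named metric is swapped for its canonical SQL. The following is a pure-Python sketch of that idea only; wren-engine’s actual resolution operates on DataFusion logical plans, not string templates:

```python
# Minimal sketch of semantic resolution: expand a metric reference into
# its governed SQL. Not wren-engine's actual implementation.
METRICS = {
    "active_customers": {
        "base_table": "users u JOIN purchases p ON p.user_id = u.id",
        "expression": "COUNT(DISTINCT u.id)",
        "filters": [
            "p.purchased_at >= CURRENT_DATE - INTERVAL '90 days'",
            "u.is_test_account = FALSE",
        ],
    },
}

def resolve(metric_name):
    """Expand a metric name into SQL that carries its canonical filters."""
    m = METRICS[metric_name]
    where = " AND ".join(m["filters"])
    return f"SELECT {m['expression']} FROM {m['base_table']} WHERE {where}"

sql = resolve("active_customers")
```

The LLM only ever emits the metric name; the 90-day window and test-account exclusion are injected deterministically, which is exactly what keeps the filters from being hallucinated.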
The project also exposes an API for embedding—documented at wrenai.readme.io—allowing developers to integrate query generation into SaaS products or custom agents. The README links to a live Streamlit demo on HuggingFace demonstrating API-driven chart generation, showing WrenAI can function as a backend service rather than just a standalone BI tool. The API approach is particularly useful for teams building vertical SaaS: you encode domain-specific metrics in MDL once, then let end users query conversationally without exposing raw database access.
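An embedded integration reduces to an authenticated HTTP call carrying the user’s question. The sketch below only builds the request; the endpoint path, field names, and auth scheme are all assumptions for illustration, so consult the docs at wrenai.readme.io for the actual contract:

```python
import json
import urllib.request

# Hypothetical sketch of calling an embedded text-to-SQL API. The endpoint
# path, payload fields, and auth header are assumptions, not WrenAI's
# documented contract.
def build_ask_request(base_url, api_key, question):
    payload = json.dumps({"question": question}).encode()
    return urllib.request.Request(
        f"{base_url}/ask",                         # assumed endpoint path
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )

req = build_ask_request("https://example.invalid/api", "sk-demo", "Q4 revenue by region?")
# In a real integration the request would be sent with urllib.request.urlopen(req).
```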
One subtle design choice: WrenAI treats the LLM as a swappable reasoning module rather than a hard dependency. The wren-ai-service configuration accepts any OpenAI-compatible endpoint, meaning you can point it at Ollama for self-hosted deployments, Azure OpenAI for enterprise compliance, or Groq for speed. This provider-agnostic stance keeps the tool viable even as LLM economics shift—you’re not locked into a single vendor’s API.
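Because the service speaks the OpenAI-compatible wire format, swapping providers is mostly a matter of base URL and model name. A hedged sketch of that pattern follows; the URLs and model names are examples, and WrenAI’s actual configuration lives in a YAML file rather than a Python dict:

```python
# Sketch of provider-agnostic LLM configuration via OpenAI-compatible
# endpoints. URLs and model names are illustrative examples; WrenAI's
# real config is YAML-based.
PROVIDERS = {
    "openai": {"base_url": "https://api.openai.com/v1", "model": "gpt-4o"},
    "ollama": {"base_url": "http://localhost:11434/v1", "model": "llama3.1"},
    "groq":   {"base_url": "https://api.groq.com/openai/v1", "model": "llama-3.1-70b-versatile"},
}

def llm_config(provider, **overrides):
    """Return a provider config, letting callers override model, keys, etc."""
    cfg = dict(PROVIDERS[provider])  # copy so overrides don't mutate the table
    cfg.update(overrides)
    return cfg

cfg = llm_config("ollama", model="qwen2.5-coder")
```

The design consequence: migrating from a hosted frontier model to a self-hosted one is a config change, not a code change.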
The Rust/DataFusion query engine is worth highlighting as a case study beyond WrenAI itself. While DataFusion typically powers analytics engines like Ballista or InfluxDB, wren-engine uses it for semantic-to-physical query translation across heterogeneous sources. The engine maintains connector adapters for each supported database, translating DataFusion’s logical plans to vendor-specific SQL dialects. This makes wren-engine a reusable component—teams building custom data platforms could theoretically fork it as a standalone semantic query layer, decoupled from WrenAI’s UI and LLM orchestration.
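The adapter idea, reduced to a toy: one logical plan, several dialect emitters. This pure-Python analogy illustrates the concept only; DataFusion’s real logical plans and wren-engine’s connector adapters are far richer:

```python
# Toy analogy for logical-plan-to-dialect translation. DataFusion's actual
# LogicalPlan and wren-engine's adapters are far more elaborate.
plan = {"op": "limit", "n": 10,
        "input": {"op": "scan", "table": "orders", "columns": ["id", "total"]}}

def to_sql(node, dialect):
    """Walk a tiny logical plan and emit dialect-specific SQL."""
    if node["op"] == "scan":
        return f"SELECT {', '.join(node['columns'])} FROM {node['table']}"
    if node["op"] == "limit":
        inner = to_sql(node["input"], dialect)
        if dialect == "tsql":  # SQL Server spells row limiting as TOP
            return inner.replace("SELECT", f"SELECT TOP {node['n']}", 1)
        return f"{inner} LIMIT {node['n']}"  # PostgreSQL, DuckDB, BigQuery, ...
    raise ValueError(f"unsupported op: {node['op']}")

pg = to_sql(plan, "postgres")
mssql = to_sql(plan, "tsql")
```

The point of the pattern: the planner reasons over one logical representation, and dialect differences are isolated in the emitters, which is what makes adding a new connector tractable.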
Gotcha
WrenAI’s semantic layer is both its superpower and its upfront tax. This is not a zero-config solution—you can’t point it at a raw database and immediately start asking questions with high accuracy. Someone (likely a data analyst or analytics engineer) must define MDL models: mapping tables, declaring metrics, specifying joins, setting access controls. For teams without established data modeling practices, this is net-new work. The README is upfront about this trade-off, positioning the semantic layer as governance rather than convenience. If your use case is ad-hoc exploration of unfamiliar schemas, you’ll need to invest time in semantic modeling first.
Accuracy scales with LLM capability, and the README explicitly recommends frontier models for best results. While WrenAI works with Ollama or smaller cloud models, you’ll hit precision limits—complex multi-step reasoning, ambiguous questions, or edge-case metric definitions will degrade with weaker models. In production, this translates to cost and latency trade-offs: GPT-4o or Claude Sonnet API calls add up at scale. The self-correction loop (where failed SQL gets re-prompted) helps but doesn’t eliminate errors—you’re still probabilistically generating code.
Connector coverage, while broad, has gaps. The README lists 12+ data sources with community voting driving the roadmap. If your stack runs on supported platforms like PostgreSQL, BigQuery, Snowflake, or Databricks, you’re covered; if you’re on a less common data source, you may be waiting for community demand to prioritize it. The GitHub discussions show active connector requests, but there’s no guaranteed timeline. For teams with niche or proprietary data stores, this could be a blocker unless you’re willing to contribute a connector yourself (wren-engine’s architecture makes this feasible but non-trivial).
Finally, WrenAI is a self-hosted or cloud platform, not a library you npm install into an existing app. The Docker-based deployment is straightforward, but you’re running multiple services (UI, AI service, engine, Qdrant vector DB). For small teams, this operational overhead might outweigh the value compared to lighter-weight alternatives. The cloud offering (getwren.ai) sidesteps this, but then you’re using Canner’s hosted infrastructure—acceptable for many use cases, but worth evaluating for air-gapped or highly regulated environments where the self-hosted path may be preferable.
Verdict
Use WrenAI if you have established data models and need governed, repeatable text-to-SQL where business definitions matter: teams migrating from legacy BI tools who want to democratize data access without sacrificing accuracy, or SaaS builders embedding analytics where you control the semantic layer and let customers query conversationally. The wren-engine component is particularly interesting for platform teams building custom data infrastructure who need a semantic query planner decoupled from specific BI tools.
Skip WrenAI if you need zero-config ad-hoc querying against unfamiliar schemas; the semantic layer setup is mandatory, not optional. Also skip if your data source isn’t in the supported list and you can’t wait for community prioritization, or if you’re unwilling to invest in frontier LLM APIs for production-grade accuracy (Ollama works, but your mileage will vary). For lightweight prototyping, or for embedding basic text-to-SQL into an existing app without the overhead of running multiple services, a simpler solution may get you results faster, even if it lacks WrenAI’s governance and chart-generation depth.