Dash: A Self-Learning Data Agent That Remembers Its Mistakes

Hook

OpenAI’s internal data teams don’t just use GPT-4 to write SQL—they’ve built a system that learns from every failed query and accumulates institutional knowledge like a senior analyst who never forgets. Now there’s an open-source version.

Context

Text-to-SQL has a dirty secret: the demos look magical, but production deployments fall apart within weeks. An LLM can generate syntactically valid SQL from “show me Q3 revenue,” but it doesn’t know that the revenue table has a six-month lag in data ingestion, that “active users” excludes internal test accounts via a specific flag, or that joining orders to customers requires a specific date range filter to avoid a Cartesian explosion that killed production last quarter. These aren’t schema problems—they’re institutional knowledge problems. Traditional solutions involve writing exhaustive documentation, building semantic layers, or training analysts. But documentation goes stale, semantic layers become bottlenecks, and analysts are expensive.

Dash approaches this differently by implementing what its creators call “GPU-poor continuous learning”—a system that doesn’t fine-tune models but instead builds a progressively smarter retrieval layer. Inspired by OpenAI’s internal tooling, Dash treats every successful query as a pattern to remember and every failure as a lesson to encode. Over time, it accumulates six distinct layers of context: schema definitions, human annotations about business logic, proven query patterns that worked, institutional documentation pulled via Model Context Protocol, error corrections that fixed past mistakes, and runtime database introspection. The result is a data agent that gets better with use, grounding its SQL generation in the collective experience of your entire data team rather than hoping an LLM magically intuits your business rules.

Technical Insight

[Architecture diagram — auto-generated. A user question flows into the Context Retriever, which runs hybrid search (embeddings + keywords over a vector DB) across six sources: schema introspection, business annotations, query patterns, MCP servers (Notion/Confluence/wikis), error learnings, and data sampling from the PostgreSQL database. The assembled context feeds the LLM SQL generator, which produces the SQL query and returns results.]

Dash’s architecture centers on a hybrid retrieval system that queries multiple context stores before generating SQL. When you ask a question, the system doesn’t immediately prompt an LLM—it first fans out search queries across six context repositories. Table schemas come from runtime introspection of your PostgreSQL database. Human annotations are curated business rules like “revenue excludes returns” or “always filter deleted_at IS NULL.” Query patterns are previously successful SQL statements tagged by intent. Institutional docs are pulled through MCP (Model Context Protocol) servers that can connect to Notion, Confluence, or internal wikis. Error learnings capture the before/after of failed queries and their fixes. Finally, runtime introspection samples actual table data to understand distributions and edge cases.

The retrieval layer uses hybrid search—combining dense embeddings for semantic similarity with keyword matching for precise technical terms. This matters because “revenue” and “income” are semantically similar, but “user_id” and “customer_id” might be completely different columns that keyword search can distinguish. Here’s what the context assembly might look like in practice:

# Simplified context retrieval flow
from typing import Dict, List

class ContextRetriever:
    def gather_context(self, question: str) -> Dict[str, List]:
        contexts = {
            "schema": self.get_relevant_tables(question),
            "annotations": self.search_business_rules(question),
            "patterns": self.find_similar_queries(question),
            "docs": self.mcp_search(question),
            "learnings": self.check_error_history(question),
            "introspection": self.sample_data(question)
        }
        return contexts

    def find_similar_queries(self, question: str) -> List["QueryPattern"]:
        # Hybrid search: dense embeddings for semantic similarity,
        # keyword matching for exact technical terms like column names
        embedding = self.embed(question)
        semantic_matches = self.vector_db.search(embedding, top_k=5)
        keyword_matches = self.keyword_index.search(
            self.extract_entities(question), top_k=5
        )
        # Merge both result sets and rerank into a single ordering
        return self.rerank(semantic_matches + keyword_matches)

Once context is assembled, Dash constructs a grounded prompt that includes all six layers, then asks the LLM to generate SQL. The critical insight is that the LLM sees not just “here’s a schema,” but “here’s a schema, here’s why this column is weird, here’s three similar questions we answered successfully, here’s a doc explaining our fiscal calendar, and here’s an error someone hit last week when they forgot to filter test accounts.”
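The assembly step might look like the following sketch. The section names, ordering, and `build_grounded_prompt` function are illustrative assumptions, not Dash's actual prompt format:

```python
# Hypothetical sketch of grounded prompt assembly -- section titles and
# ordering are assumptions for illustration, not Dash's real format.
def build_grounded_prompt(question: str, contexts: dict) -> str:
    sections = {
        "Relevant schema": contexts["schema"],
        "Business rules": contexts["annotations"],
        "Similar past queries": contexts["patterns"],
        "Documentation excerpts": contexts["docs"],
        "Past errors and fixes": contexts["learnings"],
        "Sample data": contexts["introspection"],
    }
    parts = []
    for title, items in sections.items():
        if items:  # skip empty layers to save prompt tokens
            body = "\n".join(f"- {item}" for item in items)
            parts.append(f"## {title}\n{body}")
    parts.append(f"## Question\n{question}\n\nWrite a PostgreSQL query.")
    return "\n\n".join(parts)
```

Skipping empty layers keeps the prompt compact on questions where, say, no prior errors or documentation are relevant.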

When a query succeeds, Dash stores it as Knowledge—a validated pattern that future queries can reference. When a query fails, the Agno Learning Machine kicks in. It captures the error, prompts the LLM to diagnose and fix it, executes the corrected query, and if successful, stores the before/after pair as a Learning. This creates a self-improving feedback loop:

# Learning capture on error
class LearningMachine:
    def handle_error(self, question: str, failed_sql: str, error: str):
        # Ask the LLM to diagnose and fix based on the error message
        fix_prompt = f"""
        This query failed:
        {failed_sql}

        Error: {error}

        Diagnose the issue and provide corrected SQL.
        """
        corrected_sql = self.llm.generate(fix_prompt)

        # Try the fix
        result = self.execute(corrected_sql)
        if result.success:
            # Store the before/after pair as a Learning
            self.learnings.add({
                "question": question,
                "wrong_sql": failed_sql,
                "error": error,
                "correct_sql": corrected_sql,
                "lesson": self.extract_lesson(error, corrected_sql)
            })
            return result
        # Fix also failed: surface the original error to the caller
        return None
The separation between Knowledge (curated patterns) and Learnings (error corrections) is architecturally significant. Knowledge represents explicit expertise—queries a human validated and tagged. Learnings represent discovered expertise—patterns the system found by making and fixing mistakes. This dual memory prevents the system from blindly trusting its own corrections while still benefiting from automated improvement.
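One natural consequence of the dual memory is that retrieval can weight the two stores differently. Below is a minimal sketch of that idea, assuming a simple score-then-tiebreak policy; the data structures and scoring are illustrative, not Dash's internals:

```python
# Hypothetical sketch of dual-memory retrieval: curated Knowledge is
# trusted more than automated Learnings, so Knowledge wins score ties.
# The item structure and ranking policy are illustrative assumptions.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class MemoryItem:
    text: str
    score: float   # retrieval similarity score
    curated: bool  # True for Knowledge, False for Learnings

def rank_memories(
    knowledge: List[Tuple[str, float]],
    learnings: List[Tuple[str, float]],
    top_k: int = 5,
) -> List[MemoryItem]:
    items = [MemoryItem(t, s, True) for t, s in knowledge]
    items += [MemoryItem(t, s, False) for t, s in learnings]
    # Sort by score; break ties in favor of human-validated patterns
    items.sort(key=lambda m: (m.score, m.curated), reverse=True)
    return items[:top_k]
```

The tiebreak encodes the trust asymmetry the article describes: an automated correction can still outrank a curated pattern, but only when its retrieval score is genuinely higher.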

Dash goes one step further than raw SQL generation by interpreting results. After executing a query, it sends the results back to the LLM with business context and asks for a natural language summary. Instead of returning “Revenue: $1.2M,” it might say “Q3 revenue was $1.2M, down 15% from Q2, primarily due to the product recall documented in our October incident report.” This contextual interpretation is what transforms Dash from a SQL writer into a data analyst proxy.
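A minimal sketch of that interpretation step might look like this; the prompt wording, row cap, and `llm.generate` interface are assumptions for illustration, not Dash's actual API:

```python
# Hypothetical sketch of result interpretation: send query results plus
# business context back to the LLM for a plain-language summary.
# The prompt wording and llm interface are illustrative assumptions.
from typing import Iterable

def interpret_results(question: str, sql: str, rows: Iterable,
                      business_context: str, llm) -> str:
    # Cap the result preview so large result sets don't blow the prompt
    preview = "\n".join(str(r) for r in list(rows)[:20])
    prompt = (
        f"Question: {question}\n"
        f"SQL used: {sql}\n"
        f"Results (first rows):\n{preview}\n"
        f"Business context:\n{business_context}\n\n"
        "Summarize the answer in plain language, noting relevant caveats."
    )
    return llm.generate(prompt)
```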

The system is containerized with Docker and designed to integrate with os.agno.com, a web interface for managing knowledge, reviewing learnings, and monitoring agent performance. The tight integration means you’re not just deploying a library—you’re adopting an operational workflow where curating context becomes part of your data team’s routine.

Gotcha

Dash’s self-learning promise comes with a steep setup cost. The system won’t magically understand your data on day one—it needs you to populate annotations, curate initial query patterns, and connect institutional documentation. Cold-start performance will be mediocre until you’ve seeded enough context. This means Dash is a compound investment: early queries require manual intervention to validate and store as knowledge, but after weeks or months of use, the accumulated patterns make it genuinely smart. If you’re evaluating it with a quick two-day proof of concept, you’ll miss the entire value proposition.

The architecture is also tightly coupled to the Agno ecosystem. While the code is open source, it’s designed to work with os.agno.com for UI and management, and there’s no clear path for fully air-gapped or vendor-independent deployment. The learning machine and knowledge storage are PostgreSQL-based, which is flexible, but the MCP integrations and web interface assume you’re comfortable with Agno’s hosted or self-hosted infrastructure. If you’re looking for a drop-in library you can embed in an existing application without external dependencies, Dash isn’t that. It’s a full-stack data agent with opinions about how knowledge should be managed and queries should be reviewed. Additionally, the system appears PostgreSQL-specific for runtime introspection—if you’re working with MySQL, Snowflake, or non-SQL data sources like APIs or document stores, you’ll hit compatibility walls quickly.

Verdict

Use if: You have a data team that repeatedly answers similar analytical questions, you’re willing to invest in curating domain knowledge upfront, and you need a system that compounds in value as it accumulates institutional memory. Dash shines when your SQL complexity comes from business logic rather than schema complexity—when the hard part isn’t writing joins but remembering that “active subscriptions” has seven edge cases and three historical definition changes. It’s ideal for organizations with significant tribal knowledge where onboarding new analysts takes months because “you just have to know” how the data works.

Skip if: You need quick ad-hoc queries without context investment, you’re working with non-PostgreSQL databases or non-SQL data sources, you require fully vendor-neutral tooling without external dependencies, or you can’t commit to the operational overhead of managing a knowledge base. This isn’t a lightweight SQL autocomplete—it’s an infrastructure investment that pays dividends over quarters, not days.
