Onyx: Building a Self-Hosted AI Platform That Doesn’t Lock You Into Vendor Hell
Hook
Most open-source ChatGPT alternatives are just thin wrappers around OpenAI’s API. Onyx took the harder path: building a full-stack RAG platform that treats LLM providers as swappable commodities while solving the gnarly problem of enterprise document permissioning.
Context
The AI tooling landscape has a vendor lock-in problem. You build on OpenAI, you’re married to their rate limits and pricing. You build on Anthropic, you’re rewriting when Claude’s context window changes. Enterprise teams wanting ChatGPT-like functionality face an impossible choice: use cloud providers and lose data control, or cobble together LangChain scripts that break with every library update.
Onyx emerged from this frustration as a self-hostable alternative that doesn’t compromise on features. It’s not just another chat UI—it’s a complete AI platform designed for scenarios where you need serious retrieval-augmented generation, not toy demos. The system supports enterprise-scale deployments where teams search across Confluence, Jira, GitHub, Slack, and 40+ other sources simultaneously, with document permissions that mirror the source applications.
Technical Insight
At its core, Onyx implements a multi-container architecture written in Python that separates concerns cleanly. The backend handles document ingestion and indexing, the frontend serves the chat UI, and a vector database manages embeddings. But the interesting decisions happen in three areas: hybrid search, connector architecture, and LLM abstraction.
The hybrid search implementation combines dense vector embeddings with keyword matching, layered with a knowledge graph. Pure vector search often misses exact keyword matches; pure keyword search fails on semantic queries. Onyx therefore indexes documents with both strategies: a search triggers parallel retrieval operations whose results are merged into a single ranking. The knowledge graph layer adds entity relationships extracted during indexing, enabling queries to traverse organizational structures.
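Merging parallel result lists from two retrieval strategies is commonly done with reciprocal rank fusion. The sketch below illustrates that general idea with hypothetical document IDs; it is not Onyx’s actual merge code.

```python
# Illustrative sketch of hybrid retrieval merging via reciprocal rank
# fusion (RRF). Function and document names are hypothetical.

def rrf_merge(vector_hits, keyword_hits, k=60):
    """Merge two ranked lists of document IDs into one hybrid ranking."""
    scores = {}
    for hits in (vector_hits, keyword_hits):
        for rank, doc_id in enumerate(hits):
            # Each list contributes 1 / (k + rank); documents ranked
            # highly by either strategy float to the top.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Parallel retrieval would populate these lists; here they are stubbed.
vector_hits = ["doc_a", "doc_b", "doc_c"]   # semantic matches
keyword_hits = ["doc_c", "doc_a", "doc_d"]  # exact keyword matches
merged = rrf_merge(vector_hits, keyword_hits)
print(merged)  # doc_a and doc_c, found by both strategies, rank first
```

The constant `k` damps the influence of any single list, which is why RRF is robust even when the two strategies score on incomparable scales.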
The connector ecosystem is where Onyx’s enterprise focus shines. Each of the 40+ connectors doesn’t just pull documents—it extracts metadata and access control lists. The system preserves permissions from source applications, so when a user queries the system, Onyx filters retrieved documents against their permission set before feeding context to the LLM. This means private Slack channels stay private, even in AI responses—a requirement that eliminates most lightweight alternatives.
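The permission-filtering step can be pictured as a gate between retrieval and the LLM. A minimal sketch, assuming hypothetical field and function names rather than Onyx’s real internals:

```python
# Hedged sketch of permission-filtered retrieval; the Document shape
# and filter function are illustrative, not Onyx's actual code.
from dataclasses import dataclass, field

@dataclass
class Document:
    doc_id: str
    text: str
    # ACL captured by the connector at indexing time: users and groups
    # allowed to see this document in the source application.
    allowed_users: set = field(default_factory=set)
    allowed_groups: set = field(default_factory=set)

def filter_by_permissions(docs, user_email, user_groups):
    """Drop retrieved documents the querying user cannot see,
    before any of their text reaches the LLM context window."""
    return [
        d for d in docs
        if user_email in d.allowed_users or user_groups & d.allowed_groups
    ]

docs = [
    Document("1", "public runbook", allowed_groups={"everyone"}),
    Document("2", "private channel log", allowed_users={"alice@corp.com"}),
]
visible = filter_by_permissions(docs, "bob@corp.com", {"everyone"})
print([d.doc_id for d in visible])  # only "1"; the private log is dropped
```

The key design point is that filtering happens server-side on indexed ACL metadata, so the LLM never sees text the user couldn’t open in the source app.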
The LLM abstraction layer enables working with all major providers (OpenAI, Anthropic, Gemini) and self-hosted options (Ollama, vLLM). Switching providers is designed to be a configuration change rather than a code rewrite, which is what lets teams avoid vendor lock-in.
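What “configuration change, not code rewrite” looks like in practice can be sketched with a small provider registry. The class and method names below are illustrative assumptions, not Onyx’s actual interfaces:

```python
# Minimal sketch of an LLM abstraction layer; all names are hypothetical.
from abc import ABC, abstractmethod

class LLMProvider(ABC):
    @abstractmethod
    def chat(self, messages: list) -> str: ...

class OpenAIProvider(LLMProvider):
    def __init__(self, model): self.model = model
    def chat(self, messages):
        # A real implementation would call the OpenAI SDK here.
        return f"[openai:{self.model}]"

class OllamaProvider(LLMProvider):
    def __init__(self, model, api_base):
        self.model, self.api_base = model, api_base
    def chat(self, messages):
        # A real implementation would POST to the local Ollama server.
        return f"[ollama:{self.model}]"

REGISTRY = {"openai": OpenAIProvider, "ollama": OllamaProvider}

def from_config(cfg: dict) -> LLMProvider:
    # Swapping providers means editing the config dict, not app code.
    kind = cfg.pop("provider")
    return REGISTRY[kind](**cfg)

llm = from_config({"provider": "ollama", "model": "llama3",
                   "api_base": "http://localhost:11434"})
```

Everything above the `chat()` interface stays untouched when the config points at a different provider, which is the whole point of the abstraction.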
The agentic features build on this foundation. The Deep Research capability chains multiple LLM calls with intermediate search steps, exploring tangential queries through multi-step agentic search. The MCP (Model Context Protocol) integration lets agents invoke external tools and actions. The Code Interpreter feature enables executing code to analyze data, render graphs, and create files, while Image Generation provides prompt-based image creation.
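The Deep Research pattern of chaining LLM calls with intermediate search steps can be sketched as a simple loop. The `deep_research` function and its `llm`/`search` callables below are stand-in stubs to illustrate the control flow, not Onyx’s actual APIs:

```python
# Hedged sketch of a multi-step "deep research" loop; deep_research and
# the llm/search callables are illustrative stand-ins, not Onyx APIs.

def deep_research(question, llm, search, max_steps=3):
    """Chain LLM calls with intermediate search steps, letting the
    model propose tangential queries until it has enough context."""
    notes = []
    query = question
    for _ in range(max_steps):
        notes.extend(search(query))                # intermediate retrieval
        # Ask the model for a tangential follow-up query worth exploring.
        query = llm(f"Given {notes}, next query for: {question}")
        if not query:                              # empty reply: stop early
            break
    return llm(f"Answer {question!r} using: {notes}")

# Demo with stub callables standing in for a real LLM and search index.
responses = iter(["tangential query", "", "final answer"])
def fake_llm(prompt): return next(responses)
def fake_search(query): return [f"hit for {query}"]
print(deep_research("what changed last quarter?", fake_llm, fake_search))
# prints "final answer"
```

The `max_steps` bound matters: each iteration is an LLM call plus a retrieval round-trip, so cost and latency grow linearly with exploration depth.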
Deployment flexibility comes from the containerized design. For small teams, the quickstart installation script starts all services with sensible defaults. For enterprises, the system supports Kubernetes deployments for large teams, along with Terraform options and cloud-specific guides for AWS EKS and Azure. The architecture supports airgapped deployments where no data leaves your network—critical for defense contractors or healthcare providers under strict compliance regimes.
Gotcha
Onyx’s feature richness is a double-edged sword. The system requires running multiple services: a vector database, background indexing workers, a web backend, and a frontend. The documentation advertises support for tens of millions of documents, and at that scale initial indexing can be resource-intensive. If you just want to chat with PDFs on your laptop, this is engineering overkill.
The connector permissioning, while powerful, assumes your source applications have well-defined access controls. If your Google Drive is a permissions mess where everyone can see everything anyway, Onyx’s ACL mirroring just adds complexity without security benefit. And while the documentation covers deployment scenarios, the operational burden of managing the vector database, monitoring indexing pipelines, and debugging connector failures is real. This isn’t a “deploy and forget” system—it’s infrastructure that needs operational maturity to maintain.
Verdict
Use if: You’re building for a team or enterprise that needs ChatGPT-class AI with complete data sovereignty. Your knowledge lives across multiple SaaS apps (among the 40+ supported connectors, including Confluence, Notion, Slack, and Google Drive) and you need unified search that respects source permissions. You want to avoid vendor lock-in and need the flexibility to switch between cloud LLMs and self-hosted models as economics or compliance requirements shift. You have the operational maturity to run containerized systems in production and need features like SSO, RBAC, and user management.
Skip if: You want a simple personal AI assistant without enterprise baggage. You’re fine with managed solutions like Onyx Cloud and don’t need self-hosting. Your use case is just “chat with my LLM” without serious retrieval needs—simpler alternatives exist for basic multi-provider support without the RAG overhead. You’re on resource-constrained hardware where running vector databases isn’t practical.