Back to Articles

pgai: PostgreSQL Extensions That Turn Your Database Into a Vector Embedding Pipeline

[ View on GitHub ]

pgai: PostgreSQL Extensions That Turn Your Database Into a Vector Embedding Pipeline

Hook

What if keeping your vector embeddings synchronized with your data was as simple as creating a database index—no application code, no manual syncing, just a SQL statement?

Context

Building RAG applications typically means managing two painful problems: keeping embeddings synchronized with changing source data, and orchestrating calls to external embedding APIs without overwhelming rate limits or causing timeouts. The conventional approach puts this burden on application code—you write workers that monitor data changes, batch records, call OpenAI or other providers, handle failures, implement exponential backoff, and update embedding tables. This scattered logic lives outside your database, creating a synchronization gap between your source data and vectors.

PostgreSQL already has pgvector for storing and querying embeddings, but it's passive storage. You still need infrastructure to generate those embeddings and keep them fresh as data changes. Timescale's pgai bridges this gap by treating embedding generation as a database concern, similar to how materialized views or full-text search indexes work. Instead of writing Python workers that poll tables and call APIs, you declare what should be embedded using SQL, and pgai handles the entire pipeline—monitoring changes, batching requests, managing API quotas, retrying failures, and writing results back. It transforms embedding generation from application plumbing into database infrastructure.

Technical Insight

The architecture centers on a declarative abstraction called a "vectorizer." You define what data to embed using a SQL statement that looks similar to creating an index, and pgai spins up background workers that maintain embeddings automatically. Here's a concrete example of vectorizing blog posts:

-- Create a vectorizer for blog content
SELECT ai.create_vectorizer(
  'blog_posts'::regclass,
  destination => 'blog_embeddings',
  embedding => ai.embedding_openai('text-embedding-3-small', 768),
  chunking => ai.chunking_recursive_character_text_splitter('content', 512, 50),
  formatting => ai.formatting_python_template('Title: $title\n\n$content'),
  scheduling => ai.scheduling_default()
);

This single statement sets up an entire pipeline. The destination is where embeddings land. The embedding parameter specifies the provider and model—pgai supports OpenAI, Ollama, Voyage AI, and others. The chunking option handles splitting large documents into manageable pieces using strategies like recursive character splitting or token-based splitting. The formatting parameter lets you template how source columns combine before embedding, and scheduling controls how aggressively the system processes changes.

Under the hood, pgai uses PostgreSQL's LISTEN/NOTIFY mechanism and a work queue table to track pending embeddings. When you insert or update a row in blog_posts, triggers add entries to the queue. Separate vectorizer worker processes—Python daemons you run alongside your database—poll this queue, batch requests to respect API rate limits, generate embeddings, and write results to blog_embeddings. The workers are stateless; all coordination happens through the database, making them horizontally scalable.

The embedding table gets automatically created with columns for the source record ID, chunk content, embedding vector, and metadata. You can then query it using pgvector's similarity search:

-- Find semantically similar blog posts
SELECT 
  bp.title,
  1 - (be.embedding <=> ai.openai_embed('text-embedding-3-small', 'database performance tips')) AS similarity
FROM blog_embeddings be
JOIN blog_posts bp ON be.blog_post_id = bp.id
ORDER BY be.embedding <=> ai.openai_embed('text-embedding-3-small', 'database performance tips')
LIMIT 5;

What makes this approach powerful is automatic synchronization. Update a blog post's content, and pgai detects the change, re-embeds the modified text, and updates the vectors—no application code needed. The system handles partial failures gracefully: if an API call fails due to rate limiting or timeouts, the worker marks that item for retry with exponential backoff. This resilience is critical in production where embedding APIs can be flaky.

Beyond basic vectorization, pgai includes a Semantic Catalog feature for text-to-SQL use cases. You can add natural language descriptions of tables and columns:

-- Add semantic descriptions to schema
SELECT ai.enable_semantic_catalog();
SELECT ai.set_table_description('blog_posts', 'Contains all published blog articles with metadata');
SELECT ai.set_column_description('blog_posts', 'published_at', 'UTC timestamp when article went live');

AI agents can then query this catalog to understand your schema and generate accurate SQL, improving text-to-SQL translation accuracy. The catalog is just another pgai vectorizer under the hood, embedding schema metadata for semantic retrieval.

The Python library component provides helper functions for calling embedding and LLM APIs directly from application code, along with utilities for managing vectorizer workers. While you can run workers manually with pgai vectorizer worker, Timescale Cloud offers managed workers that eliminate operational overhead—the platform runs and scales workers automatically, monitoring the queue depth and adjusting capacity.

Gotcha

The primary operational complexity is managing vectorizer workers. Unless you're using Timescale Cloud's managed service, you need to deploy and monitor separate Python processes that run the embedding pipeline. These workers aren't part of PostgreSQL itself—they're external daemons that connect to your database. In Kubernetes, this means additional deployments. On EC2 or bare metal, it's systemd units or process supervisors. This decoupled architecture provides scalability and fault isolation (a worker crash doesn't affect your database), but it adds deployment surface area. You need to handle worker authentication, network access to both Postgres and external APIs, secret management for API keys, and monitoring for queue lag.

API costs and rate limits are inherited directly from your embedding provider. If you're using OpenAI's embeddings, you'll pay per token embedded, and those costs scale with data volume and update frequency. A table with millions of rows and frequent updates can generate significant monthly bills. The chunking configuration directly impacts costs—splitting documents into smaller chunks means more embedding calls. pgai batches requests and respects rate limits to avoid quota exhaustion, but it can't eliminate the underlying consumption. You need to carefully consider which data truly needs to be embedded and how frequently it changes. For append-only data like logs or historical records, this is manageable. For frequently updated transactional tables, costs can spiral quickly. Additionally, since pgai depends on external services, outages or API degradations from providers like OpenAI directly impact your embedding freshness. The system will queue and retry, but you're fundamentally dependent on third-party availability.

Verdict

Use pgai if you're building RAG or semantic search on PostgreSQL and want embedding generation to be infrastructure rather than application code. It's particularly compelling for applications where data changes frequently and keeping embeddings synchronized is otherwise painful—think content management systems, knowledge bases, or customer support databases. The declarative approach eliminates boilerplate worker code and provides production-grade error handling out of the box. It's ideal if you're already on Timescale Cloud (managed workers are a huge win), or if you're comfortable operating Python worker processes alongside your database. Skip it if you need absolute control over embedding logic and prefer handling synchronization in application code, or if the operational overhead of running separate workers doesn't fit your deployment model. Also consider alternatives if you're working at massive scale where dedicated vector databases like Pinecone or Weaviate provide better performance characteristics, or if you need vector database features beyond what pgvector offers. For teams that live in SQL and want embeddings to feel like native database functionality rather than external API calls scattered through application code, pgai is a remarkably clean abstraction.

// ADD TO YOUR README
[![Featured on Starlog](https://starlog.is/api/badge/data-knowledge/timescale-pgai.svg)](https://starlog.is/api/badge-click/data-knowledge/timescale-pgai)