Building Search-Augmented Chat with Exa and OpenAI o3-mini: A Function Calling Deep Dive
Hook
While most developers bolt search onto chatbots as an afterthought, OpenAI’s o3-mini was designed from the ground up to reason about when and how to use tools—making search integration feel less like a hack and more like natural conversation flow.
Context
The challenge with traditional chatbots has always been their knowledge cutoff date. Ask GPT-4 about yesterday’s news, and you’ll get an apologetic disclaimer. Early solutions involved manually detecting when users asked time-sensitive questions, then awkwardly switching to a search API. The result felt disjointed—like talking to someone who kept pausing to Google things on their phone.
OpenAI’s function calling changed this paradigm by letting models autonomously decide when they need external data. But implementation remained complex, requiring developers to wire up tool definitions, parse structured outputs, execute functions, and feed results back to the model. The exa-o3mini-chat repository demonstrates a cleaner approach: combining OpenAI’s o3-mini (a reasoning-optimized model) with Exa’s AI-native search API through the Vercel AI SDK’s abstraction layer. The result is a reference implementation that handles the orchestration complexity while remaining transparent enough to understand what’s happening under the hood.
Technical Insight
The architecture centers on the Vercel AI SDK’s streamText function, which manages the entire function calling lifecycle. Unlike basic chat implementations that simply pass messages back and forth, this approach defines tools that the model can invoke mid-conversation. Here’s the core pattern from the repository:
import { streamText, tool } from 'ai';
import { openai } from '@ai-sdk/openai';
import Exa from 'exa-js';
import { z } from 'zod';

const result = await streamText({
  model: openai('o3-mini'),
  messages,
  tools: {
    search: tool({
      description: 'Search the web for current information',
      parameters: z.object({
        query: z.string().describe('The search query'),
        numResults: z.number().optional().default(5),
      }),
      execute: async ({ query, numResults }) => {
        const exa = new Exa(process.env.EXA_API_KEY);
        const results = await exa.searchAndContents(query, {
          numResults,
          text: true,
        });
        return results.results;
      },
    }),
  },
  maxSteps: 5,
});
This code encapsulates several architectural decisions worth unpacking. First, the tool definition uses Zod for runtime schema validation—the model receives this schema and must structure its function call arguments accordingly. When o3-mini determines it needs current information, it generates a JSON object matching the schema rather than trying to answer from its training data.
Second, the maxSteps parameter limits how many tool calls can occur in a single interaction. With reasoning models like o3-mini, this prevents runaway loops where the model might repeatedly search and refine queries. The default behavior chains these steps: user question → model realizes it needs search → executes search → model reasons about results → provides answer.
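That step loop can be pictured as plain code. The sketch below is a hand-rolled illustration of what the SDK manages internally, not the SDK's actual implementation; `modelStep` and the message shapes are invented for the example:

```typescript
// Illustrative sketch of the step loop streamText manages internally.
// `modelStep` stands in for a real model call and is invented here.
type Step =
  | { type: 'tool-call'; tool: string; args: Record<string, unknown> }
  | { type: 'answer'; text: string };

async function runWithTools(
  modelStep: (history: string[]) => Promise<Step>,
  tools: Record<string, (args: Record<string, unknown>) => Promise<string>>,
  userMessage: string,
  maxSteps = 5,
): Promise<string> {
  const history = [`user: ${userMessage}`];
  for (let i = 0; i < maxSteps; i++) {
    const step = await modelStep(history);
    if (step.type === 'answer') return step.text; // model answered directly
    // Model asked for a tool: execute it and feed the result back.
    const result = await tools[step.tool](step.args);
    history.push(`tool(${step.tool}): ${result}`);
  }
  throw new Error('maxSteps exceeded without a final answer');
}
```

The cap on iterations is exactly what `maxSteps` buys you: a model that keeps asking for another search eventually hits the ceiling instead of looping forever.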
The Exa API integration is particularly clever. Unlike traditional search APIs that return titles, URLs, and snippets optimized for human readers, Exa’s searchAndContents method returns full-text content preprocessed for LLM consumption. This means o3-mini receives clean, relevant paragraphs instead of metadata it would need to interpret:
// Exa returns structured data like this:
{
  results: [
    {
      title: "Article Title",
      url: "https://...",
      text: "Full cleaned article text...",
      score: 0.95
    }
  ]
}
On the frontend, the repository uses a Next.js client component with the Vercel AI SDK's useChat hook to handle streaming (useChat manages browser state, so it runs client-side rather than in a React Server Component). This is where the implementation shines for user experience:
const { messages, input, handleInputChange, handleSubmit, isLoading } = useChat({
  api: '/api/chat',
  onError: (error) => {
    console.error(error);
  },
});
The streaming setup means users see tokens appearing in real-time, even during the reasoning phase. When o3-mini decides to search, the frontend can display intermediate states (“Searching for…”) before the final answer streams in. This transparency makes the tool usage feel intentional rather than mysterious.
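Recent AI SDK versions expose tool activity on assistant messages (for example via a toolInvocations array with a state field), though the exact shape varies by SDK version, so treat the types below as assumptions. A small helper can turn that state into the "Searching for…" label:

```typescript
// Assumed message shape; the real AI SDK types vary by version.
interface ToolInvocation {
  toolName: string;
  state: 'call' | 'result';
  args?: { query?: string };
}

// Map in-flight tool calls to a UI status label, or null when there is
// nothing to show. Sketch only; not from the repository.
function statusLabel(invocations: ToolInvocation[] | undefined): string | null {
  if (!invocations || invocations.length === 0) return null;
  const pending = invocations.find((t) => t.state === 'call');
  if (pending) {
    return pending.args?.query
      ? `Searching for "${pending.args.query}"…`
      : `Running ${pending.toolName}…`;
  }
  return null; // all tools finished; the answer streams next
}
```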
The API route itself is minimal because the Vercel AI SDK handles response streaming automatically. The framework detects when tools are called, executes them server-side, feeds results back to the model, and streams everything to the client with appropriate boundaries:
export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = await streamText({
    // ... configuration shown above
  });

  return result.toDataStreamResponse();
}
This toDataStreamResponse() method does the heavy lifting: it serializes the entire conversation flow (user message, tool calls, tool results, model responses) into a stream that useChat on the client can parse. The protocol includes markers for when tool calls start and end, allowing the frontend to render different UI states without manual parsing.
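The wire format itself is an SDK implementation detail, but conceptually each frame is a type-prefixed line of JSON. The toy parser below shows the idea; the specific prefixes here are illustrative assumptions, not the SDK's exact protocol:

```typescript
// Toy parser for a type-prefixed stream protocol. The real AI SDK data
// stream format differs in detail; these prefixes are illustrative only.
type StreamPart =
  | { kind: 'text'; value: string }
  | { kind: 'tool-call'; value: unknown }
  | { kind: 'tool-result'; value: unknown };

const PREFIXES: Record<string, StreamPart['kind']> = {
  '0': 'text',
  '9': 'tool-call',
  'a': 'tool-result',
};

function parseStreamLine(line: string): StreamPart | null {
  const sep = line.indexOf(':');
  if (sep === -1) return null;
  const kind = PREFIXES[line.slice(0, sep)];
  if (!kind) return null; // unknown frame type: skip rather than fail
  return { kind, value: JSON.parse(line.slice(sep + 1)) } as StreamPart;
}
```

Because every frame carries its type up front, the client can switch UI states (streaming text vs. a tool in flight) without buffering the whole response.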
One subtle but important detail: the repository doesn’t persist conversations. Each request includes the full message history in the POST body, making the API stateless. This simplifies deployment and scaling but means context windows grow with conversation length. For production applications, you’d want to implement conversation summarization or selective context pruning after a certain number of turns.
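One simple pruning strategy, sketched here rather than taken from the repository, keeps the system prompt (if any) plus the most recent turns:

```typescript
interface ChatMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
}

// Keep system messages plus the last `keep` conversational turns.
// A sketch of selective context pruning; not part of the repository.
function pruneHistory(messages: ChatMessage[], keep = 10): ChatMessage[] {
  const system = messages.filter((m) => m.role === 'system');
  const rest = messages.filter((m) => m.role !== 'system');
  return [...system, ...rest.slice(-keep)];
}
```

Summarizing the dropped turns into a single synthetic message is the natural next step, but even this blunt cutoff bounds the context window.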
Gotcha
The most significant limitation is that this implementation only supports a single tool with basic search functionality. In real-world applications, you’d likely want multiple tools (calculator, code execution, database queries), and orchestration complexity grows non-linearly as tools are added. The o3-mini model needs clear, distinct tool descriptions to choose correctly; poorly defined tools lead to the model making wrong calls or hallucinating function arguments.
There’s also no conversation memory beyond the current session. Refresh the page, and your chat history disappears. The repository serves as a demo, so this makes sense for simplicity, but production deployments would need database integration, user authentication, and session management.
Additionally, the streaming implementation doesn’t handle network interruptions gracefully—if the connection drops mid-stream, there’s no resume capability. You’d need to implement retry logic and partial response recovery for reliability.
Cost management is another unaddressed concern: o3-mini pricing combined with Exa API costs means an extended conversation with multiple searches could become expensive quickly, yet there’s no rate limiting or budget enforcement in the codebase.
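One way to add the missing cost control is a per-session budget that wraps the tool's execute function. This is a minimal in-memory sketch with made-up limits, not anything in the codebase; real deployments would persist counters outside the request process:

```typescript
// Minimal in-memory budget guard; one possible shape for the missing
// cost control. Limits are made-up numbers for illustration.
class SessionBudget {
  private searches = 0;
  constructor(private readonly maxSearches = 10) {}

  tryConsumeSearch(): boolean {
    if (this.searches >= this.maxSearches) return false;
    this.searches++;
    return true;
  }
}

// Wrap a tool's execute function so it refuses once the budget is spent;
// the model then sees the refusal and can answer without searching.
function withBudget<A, R>(
  budget: SessionBudget,
  execute: (args: A) => Promise<R>,
): (args: A) => Promise<R | { error: string }> {
  return async (args) => {
    if (!budget.tryConsumeSearch()) {
      return { error: 'Search budget exhausted for this session.' };
    }
    return execute(args);
  };
}
```

Returning an error object rather than throwing lets the model degrade gracefully: it receives the refusal as a tool result and can explain the limitation to the user.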
Verdict
Use if: You’re building a prototype or MVP that needs current web information in conversational AI responses, you want to understand how function calling works with streaming architectures, or you need a clean starting point for a search-augmented chatbot that you plan to extend with authentication and persistence. This repository excels as a learning resource and foundation—the code is minimal enough to understand completely in an afternoon but demonstrates production-quality patterns.
Skip if: You need a production-ready application with user management, conversation history, multi-model support, or complex tool orchestration. You’re better served by frameworks like LangChain or more complete chat UIs like Open WebUI. Also skip if you’re working with non-streaming use cases or want to avoid vendor lock-in to Vercel’s infrastructure, as the implementation assumes deployment to Vercel’s edge runtime and uses their SDK throughout.