Building Company Intelligence Pipelines with Exa.ai and Parallel Search Orchestration


Hook

What if you could gather a company's founders, funding history, social media presence, and product details in under 10 seconds—without writing a single web scraper?

Context

Traditional company research is death by a thousand tabs. You start with a LinkedIn search for founders, jump to Crunchbase for funding rounds, scan Twitter for product announcements, check Reddit for customer sentiment, and piece together a mental model of the business. Each platform has its own interface, search syntax, and rate limits. Building a custom scraper for each source means maintaining fragile CSS selectors that break with every redesign, handling authentication flows, and managing IP rotation to avoid blocks.

Company Researcher reimagines this workflow by treating Exa.ai as a unified search API across the entire web. Instead of scraping individual platforms, you construct targeted search queries that Exa routes to specific domains (linkedin.com, crunchbase.com, twitter.com) and returns structured results. The tool fires off 10+ parallel searches, uses AI to summarize findings, and presents a comprehensive company profile in seconds. It's less about building a new data source and more about orchestrating existing ones intelligently.

Technical Insight

The architecture is deceptively simple: a Next.js App Router application that makes the Exa API do the heavy lifting. The core innovation is in query construction and parallel orchestration. When you submit a company URL, the system extracts the domain and constructs specialized search queries for different data types. Here's how it fetches founder information:

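import Exa from "exa-js";

// Assumes EXA_API_KEY is set in the environment.
const exa = new Exa(process.env.EXA_API_KEY);

// companyName is derived from the company URL the user submitted.
const foundersResult = await exa.searchAndContents(
  `founders of ${companyName} site:linkedin.com OR site:crunchbase.com`,
  {
    type: "auto",
    numResults: 5,
    text: { maxCharacters: 1000 }, // cap the page text returned per result
    livecrawl: "always",           // fetch fresh content instead of cached pages
    category: "company"
  }
);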

The livecrawl: "always" parameter is critical—it forces Exa to fetch fresh content rather than serving cached results, ensuring you get current information about recent hires or departures. The domain filtering (site:linkedin.com OR site:crunchbase.com) leverages Google-style search operators to target authoritative sources without building LinkedIn or Crunchbase scrapers.
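
The domain-extraction step that precedes these queries is not shown in the article. A minimal sketch of how it could work, assuming the input is a URL or bare hostname typed by the user (the helper name extractDomain is hypothetical, not taken from the repository):

```typescript
// Hypothetical helper: reduce user input such as "https://www.exa.ai/about"
// to a bare, lowercase domain like "exa.ai".
function extractDomain(input: string): string {
  // Drop a leading scheme ("https://", "http://", ...) if one is present.
  const withoutScheme = input.trim().replace(/^[a-z][a-z0-9+.-]*:\/\//i, "");
  // Keep only the host[:port] part, discarding path, query string, and fragment.
  const hostAndPort = withoutScheme.split(/[/?#]/)[0] ?? "";
  // Discard an optional port and normalize case.
  const host = (hostAndPort.split(":")[0] ?? "").toLowerCase();
  // Treat "www.example.com" and "example.com" as the same company site.
  return host.startsWith("www.") ? host.slice(4) : host;
}
```

String handling is used here instead of the URL class so that inputs without a scheme, such as "exa.ai", are accepted as-is.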

The real power emerges when you fire all searches in parallel using Promise.all. The codebase orchestrates separate queries for company subpages, news articles, social media mentions, GitHub activity, YouTube presence, and financial data simultaneously:

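// Start every search at once; total latency is roughly that of the slowest call.
const [subpagesData, newsData, linkedInData, twitterData, redditData] =
  await Promise.all([
    searchCompanySubpages(companyUrl),
    searchCompanyNews(companyName),
    searchLinkedIn(companyName),
    searchTwitter(companyName),
    searchReddit(companyName)
    // GitHub, YouTube, and financial-data searches are omitted from this excerpt.
  ]);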
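
One property of Promise.all worth keeping in mind is that a single rejected search rejects the whole batch. The sketch below shows one way to keep partial results instead, by catching failures per task. It is an illustration under that assumption, not the repository's code, and the names runSearches, SearchTask, and SettledSearches are hypothetical.

```typescript
// A task is any zero-argument function that starts one search.
type SearchTask<T> = () => Promise<T>;

interface SettledSearches<T> {
  results: Record<string, T>;      // searches that succeeded, keyed by task name
  errors: Record<string, string>;  // error messages for searches that failed
}

async function runSearches<T>(
  tasks: Record<string, SearchTask<T>>
): Promise<SettledSearches<T>> {
  // Every task is started before anything is awaited, so they run concurrently.
  // Each wrapper catches its own failure, so Promise.all itself never rejects.
  const outcomes = await Promise.all(
    Object.entries(tasks).map(async ([name, task]) => {
      try {
        return { name, ok: true as const, value: await task() };
      } catch (err) {
        const message = err instanceof Error ? err.message : String(err);
        return { name, ok: false as const, message };
      }
    })
  );

  const settled: SettledSearches<T> = { results: {}, errors: {} };
  for (const outcome of outcomes) {
    if (outcome.ok) settled.results[outcome.name] = outcome.value;
    else settled.errors[outcome.name] = outcome.message;
  }
  return settled;
}
```

With this shape, a rate-limited Reddit search would appear under errors while the news and LinkedIn sections still render.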

Each search function follows the same pattern: construct a domain-specific query, set appropriate result limits, and request content summaries. The Twitter search, for example, uses site:twitter.com ${companyName} to find official accounts and mentions. Reddit searches employ site:reddit.com ${companyName} to surface customer discussions and sentiment.
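
Because the per-source functions differ mainly in the domain they target, the shared pattern can be sketched as a single table-driven builder. This is an illustrative refactoring, not how the repository is organized. The names Source, SOURCE_DOMAINS, SearchRequest, and buildSourceRequest are hypothetical, and the default of 5 results is an assumption borrowed from the founders example.

```typescript
type Source = "linkedin" | "twitter" | "reddit";

// Domain targeted by each source, matching the site: filters described above.
const SOURCE_DOMAINS: Record<Source, string> = {
  linkedin: "linkedin.com",
  twitter: "twitter.com",
  reddit: "reddit.com",
};

interface SearchRequest {
  query: string;
  numResults: number;
}

// Build the query string and result limit for one source.
function buildSourceRequest(
  source: Source,
  companyName: string,
  numResults = 5
): SearchRequest {
  const trimmedName = companyName.trim();
  if (trimmedName === "") {
    throw new Error("companyName must not be empty");
  }
  return { query: `site:${SOURCE_DOMAINS[source]} ${trimmedName}`, numResults };
}
```

Adding a new source then means adding one entry to the table rather than writing another search function.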

Once raw results arrive, the Vercel AI SDK pipes them to Claude for summarization. Instead of dumping 50 search snippets onto the user, the system generates digestible sections like "Company Overview," "Recent News," and "Social Presence." Here's the AI integration:

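import { anthropic } from '@ai-sdk/anthropic';
import { generateText } from 'ai';

// results holds the combined output of the parallel searches.
const summary = await generateText({
  model: anthropic('claude-3-5-sonnet-20241022'),
  prompt: `Summarize this company information: ${JSON.stringify(results)}`,
  maxTokens: 2048 // upper bound on the length of the generated summary
});
// The generated text is available as summary.text.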
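
Stringifying every raw result into the prompt can make it grow with the number and size of the snippets. A minimal sketch of one way to bound it before calling the model follows. The Snippet shape, the helper name buildSummaryPrompt, and both default limits are assumptions for illustration, not values from the repository.

```typescript
// Assumed minimal shape of one search result passed to the summarizer.
interface Snippet {
  title: string;
  url: string;
  text: string;
}

// Build a prompt from at most maxSnippets results, each cut to maxCharsPerSnippet.
function buildSummaryPrompt(
  snippets: Snippet[],
  maxCharsPerSnippet = 500,
  maxSnippets = 20
): string {
  const trimmed = snippets.slice(0, maxSnippets).map((s) => ({
    title: s.title,
    url: s.url,
    // Mark truncated text so the model can tell the snippet was cut short.
    text:
      s.text.length > maxCharsPerSnippet
        ? s.text.slice(0, maxCharsPerSnippet) + "..."
        : s.text,
  }));
  return `Summarize this company information: ${JSON.stringify(trimmed)}`;
}
```

The returned string can be passed as the prompt field of the generateText call shown above.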

The UI layer uses TailwindCSS with shadcn/ui components for a clean presentation. Each data point links back to Exa's playground, turning the tool into interactive documentation. You can click any result to see the exact API call that generated it, making this both a functional research tool and a learning resource for Exa's capabilities.

The serverless deployment on Vercel means zero infrastructure management. The Next.js App Router handles API routes for search endpoints, streaming responses back to the client as results arrive. There's no database—everything is ephemeral, fetched fresh on each request. This keeps the architecture stateless and eliminates data staleness, though it also means you're paying API costs on every lookup.

What makes this approach elegant is the inversion of control. Instead of building scrapers that navigate site structure, you're building search queries that describe what you want. Exa handles the crawling, parsing, and content extraction. When LinkedIn redesigns their profile page, your code doesn't break—Exa's infrastructure absorbs that complexity.

Gotcha

The elephant in the room is API dependency and cost. This tool is fundamentally a wrapper around Exa.ai, which charges per search and per content retrieval. A single company lookup might fire 15+ API calls, and if you're researching dozens of companies daily, costs escalate quickly. There's no free tier that supports the livecrawl feature, so you're committing to paid API access before the first query runs. You also need an Anthropic API key for summarization, compounding the cost structure.
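
To make the cost concern concrete, here is a back-of-the-envelope estimator. The field names are hypothetical, and any prices you plug in are your own placeholders. The sketch does not encode Exa's or Anthropic's actual pricing, which should be checked on their pricing pages.

```typescript
interface LookupCostInputs {
  searchesPerLookup: number; // e.g. 15, per the estimate above
  costPerSearchUsd: number;  // placeholder: search plus content retrieval, per call
  summaryCostUsd: number;    // placeholder: LLM summarization cost per lookup
}

// Estimated daily spend: (search calls * price per call + summary) * lookups per day.
function estimateDailyCostUsd(
  inputs: LookupCostInputs,
  lookupsPerDay: number
): number {
  const perLookup =
    inputs.searchesPerLookup * inputs.costPerSearchUsd + inputs.summaryCostUsd;
  return perLookup * lookupsPerDay;
}
```

Because there is no cache or database, the estimate scales linearly with lookups: researching the same company twice costs twice as much.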

Data quality is bounded by Exa's crawling capabilities and the public web. Stealth startups with minimal online presence return sparse results. Private companies that don't publicize funding rounds won't have financial data. The tool is only as good as what Exa can crawl and index. You'll get excellent coverage for well-known SaaS companies with active social media, but niche B2B firms or international businesses with non-English web presence may have gaps. There's also no data validation layer: if Exa returns outdated information or the AI misinterprets search results, you won't catch it without manual verification. For high-stakes decisions like investment due diligence, you'd still need to cross-reference findings with primary sources.

Verdict

Use Company Researcher if you're performing frequent company intelligence gathering (investor research, sales prospecting, competitive analysis) and value speed over exhaustive accuracy. It's ideal for teams already invested in the Exa.ai ecosystem or those evaluating it as a search infrastructure replacement. The parallel orchestration pattern demonstrated here is worth studying even if you don't use Exa—swap in Brave Search API or SerpAPI and the architecture remains valid. Skip it if you need specialized data not available through public search (private financials, detailed org charts, proprietary metrics), have strict data provenance requirements for compliance, want to avoid ongoing API costs, or research companies with limited web presence where search-based approaches fall short.
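
The claim that the orchestration pattern survives a provider swap can be sketched with a small interface. Everything here is hypothetical: SearchProvider, SearchHit, and researchCompany are illustrative names, and a real adapter for Exa, Brave Search, or SerpAPI would have to map that service's own request and response formats onto this shape.

```typescript
interface SearchHit {
  title: string;
  url: string;
  snippet: string;
}

// Anything that can answer a text query with a bounded list of hits.
interface SearchProvider {
  search(query: string, limit: number): Promise<SearchHit[]>;
}

// Run one site-restricted query per domain in parallel, whatever the provider.
async function researchCompany(
  provider: SearchProvider,
  companyName: string,
  domains: string[]
): Promise<Record<string, SearchHit[]>> {
  const pairs = await Promise.all(
    domains.map(async (domain) => {
      const hits = await provider.search(`site:${domain} ${companyName}`, 5);
      return [domain, hits] as const;
    })
  );

  const profile: Record<string, SearchHit[]> = {};
  for (const [domain, hits] of pairs) {
    profile[domain] = hits;
  }
  return profile;
}
```

The orchestration code depends only on the interface, so switching providers means writing one new adapter rather than touching the parallel fan-out.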
