MindSearch: Building a Multi-Agent Search Engine That Thinks Like a Human Researcher

Hook

While ChatGPT answers your question once, MindSearch asks itself 20 more questions in parallel—mimicking how a human researcher actually explores a topic by following multiple threads of inquiry simultaneously.

Context

The emergence of conversational AI created a new problem: single-shot question answering is fundamentally different from research. When you ask ChatGPT about quantum computing applications, it synthesizes its training data into one coherent response. When a human researcher explores the same topic, they formulate sub-questions, explore tangential concepts, compare contradicting sources, and iteratively refine their understanding. This gap between how AI answers and how humans research became glaringly obvious with tools like Perplexity AI and SearchGPT, which augment LLMs with real-time web search but still largely operate in sequential, single-path reasoning modes.

MindSearch, developed by the InternLM team and backed by academic research (arXiv:2407.20183), takes a fundamentally different approach. Rather than treating search as a single augmentation step before generating an answer, it implements a multi-agent system where one agent plans a research strategy by decomposing your query into sub-questions, while multiple search agents execute those sub-queries in parallel. The result is a graph of interconnected insights that mirrors human research behavior—following leads, exploring contradictions, and synthesizing information from multiple perspectives. This isn't just a theoretical exercise; the system runs in production on the Puyu platform, demonstrating that research-grade multi-agent architectures can operate at scale.

Technical Insight

At its core, MindSearch implements a coordinator-worker pattern with a graph-based reasoning layer. The WebPlanner agent acts as the coordinator, maintaining a directed acyclic graph (DAG) of sub-queries. When you submit a question like "How do neural scaling laws impact LLM training costs?", the planner doesn't just search for that exact phrase. Instead, it decomposes the query into nodes: "What are neural scaling laws?", "Current LLM training costs by model size", "Relationship between parameters and compute requirements", and "Cost optimization strategies". Each node becomes a task for WebSearcher agents, which execute in parallel thanks to the asynchronous architecture.
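
To make the coordinator-worker idea concrete, here is a minimal sketch of running a sub-query graph layer by layer. It is not MindSearch's planner: the dependency edges between the four sub-questions, the `fake_search` stand-in, and the `run_graph` helper are all assumptions made for illustration, built on the standard library's `graphlib` and `asyncio`.

```python
import asyncio
from graphlib import TopologicalSorter

# Hypothetical sub-query graph: each key maps to the nodes it depends on.
# The edges are invented for this sketch; the article only lists the nodes.
GRAPH = {
    "What are neural scaling laws?": set(),
    "Current LLM training costs by model size": set(),
    "Relationship between parameters and compute requirements": {
        "What are neural scaling laws?",
    },
    "Cost optimization strategies": {
        "Current LLM training costs by model size",
        "Relationship between parameters and compute requirements",
    },
}


async def fake_search(question: str) -> str:
    """Stand-in for a WebSearcher call; a real one would hit a search API."""
    await asyncio.sleep(0.01)
    return f"findings for: {question}"


async def run_graph(graph: dict[str, set[str]]) -> dict[str, str]:
    """Run every node whose dependencies are done, one parallel layer at a time."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises CycleError if the graph is not a DAG
    results: dict[str, str] = {}
    while sorter.is_active():
        ready = sorter.get_ready()  # nodes with all dependencies satisfied
        answers = await asyncio.gather(*(fake_search(q) for q in ready))
        for question, answer in zip(ready, answers):
            results[question] = answer
            sorter.done(question)  # unblocks nodes that depend on this one
    return results


if __name__ == "__main__":
    for question, answer in asyncio.run(run_graph(GRAPH)).items():
        print(question, "->", answer)
```

The two independent nodes run together in the first layer, and the dependent nodes only start once their inputs exist, which is the property a DAG gives the planner.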

The system is built on Lagent v0.5, a multi-agent framework that handles the orchestration complexity. Here's how you might configure a basic MindSearch instance with a custom LLM backend:

import asyncio

from lagent.llms import INTERNLM2_META, AsyncLMDeployServer
from mindsearch.agent import MindSearchAgent
from mindsearch.planner import InternLMXCPlanNode

# Configure your LLM backend
llm = AsyncLMDeployServer(
    model_name='internlm2.5-chat-7b',
    url='http://localhost:23333',
    meta_template=INTERNLM2_META,
    max_new_tokens=4096,
    temperature=0.7,
)

# Initialize the multi-agent system
agent = MindSearchAgent(
    llm=llm,
    planner_cls=InternLMXCPlanNode,
    searcher_cfg=dict(
        api_key='your_search_api_key',
        engine='DuckDuckGo',  # or Bing, Brave, Google Serper
    ),
    max_workers=8,  # parallel search agents
)


async def main():
    # Execute a research query; `await` is only valid inside a coroutine
    response = await agent.forward(
        query="Explain the tradeoffs between mixture-of-experts and dense transformers",
        search_depth=3,
    )
    print(response)


asyncio.run(main())

The architecture's power lies in its abstraction layers. The search engine layer supports multiple providers through a unified interface, so switching from DuckDuckGo to Bing requires only a configuration change. Similarly, the LLM abstraction supports InternLM, GPT-4, Claude, or any OpenAI-compatible API, making the system remarkably portable across different infrastructure setups.
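
A unified search interface of this kind can be sketched with a small protocol and a registry keyed by the `engine` value from the configuration. Everything below is hypothetical: `SearchHit`, `SearchEngine`, the two fake engines, and `build_engine` are illustrative names, not classes from MindSearch or Lagent.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class SearchHit:
    title: str
    url: str


class SearchEngine(Protocol):
    """Anything with a matching `search` method counts as an engine."""

    def search(self, query: str) -> list[SearchHit]: ...


class FakeDuckDuckGo:
    # Hypothetical provider; a real one would call the DuckDuckGo API.
    def search(self, query: str) -> list[SearchHit]:
        return [SearchHit(f"DDG: {query}", "https://example.com/ddg")]


class FakeBing:
    # Hypothetical provider; a real one would call the Bing API.
    def search(self, query: str) -> list[SearchHit]:
        return [SearchHit(f"Bing: {query}", "https://example.com/bing")]


# Registry: the config string selects the implementation.
ENGINES: dict[str, type] = {"DuckDuckGo": FakeDuckDuckGo, "Bing": FakeBing}


def build_engine(cfg: dict) -> SearchEngine:
    """Pick a provider from config; callers only ever see `search()`."""
    name = cfg.get("engine")
    if name not in ENGINES:
        raise ValueError(f"unknown search engine: {name!r}")
    return ENGINES[name]()


if __name__ == "__main__":
    for engine_name in ("DuckDuckGo", "Bing"):
        engine = build_engine({"engine": engine_name})
        print(engine.search("mixture-of-experts")[0].title)
```

Because the calling code depends only on the `search` method, swapping providers touches the config dictionary and nothing else.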

The FastAPI backend exposes this agent coordination through WebSocket and HTTP endpoints. The /solve endpoint accepts a query and streams back reasoning updates as agents discover new information. This streaming approach is critical—rather than waiting for all parallel searches to complete, users see incremental progress as each WebSearcher agent returns results. The frontend can render this as a living graph of expanding knowledge:

// React frontend connecting to MindSearch backend
const ws = new WebSocket('ws://localhost:8002/solve');

// Wait for the connection to open; calling send() earlier throws InvalidStateError
ws.onopen = () => {
  ws.send(JSON.stringify({
    query: "How does RLHF training affect model capability?",
    model: "internlm2.5-chat-7b"
  }));
};

ws.onmessage = (event) => {
  const update = JSON.parse(event.data);
  
  if (update.type === 'plan_update') {
    // WebPlanner added new nodes to the reasoning graph
    renderGraphNodes(update.nodes);
  } else if (update.type === 'search_result') {
    // A WebSearcher agent returned findings
    updateNodeContent(update.node_id, update.content);
  } else if (update.type === 'synthesis') {
    // Final answer synthesized from all searches
    displayFinalAnswer(update.answer);
  }
};
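
On the server side, the incremental streaming described above can be approximated with an async generator that emits a message as soon as each search finishes. This is a sketch under assumptions, not the MindSearch backend: the message types are borrowed from the frontend snippet, while `fake_search`, `stream_updates`, and `collect` are hypothetical helpers.

```python
import asyncio
import json
from typing import AsyncIterator


async def fake_search(node_id: str, delay: float) -> dict:
    """Stand-in for a WebSearcher; `delay` simulates network latency."""
    await asyncio.sleep(delay)
    return {
        "type": "search_result",
        "node_id": node_id,
        "content": f"findings for {node_id}",
    }


async def stream_updates(nodes: dict[str, float]) -> AsyncIterator[str]:
    """Yield JSON messages in completion order, not submission order."""
    yield json.dumps({"type": "plan_update", "nodes": list(nodes)})
    tasks = [asyncio.create_task(fake_search(n, d)) for n, d in nodes.items()]
    for finished in asyncio.as_completed(tasks):
        yield json.dumps(await finished)  # sent as soon as this search is done
    yield json.dumps(
        {"type": "synthesis", "answer": f"answer built from {len(tasks)} searches"}
    )


async def collect(nodes: dict[str, float]) -> list[dict]:
    """Drain the stream into a list, as a client would message by message."""
    return [json.loads(message) async for message in stream_updates(nodes)]


if __name__ == "__main__":
    for message in asyncio.run(collect({"slow": 0.05, "fast": 0.01})):
        print(message)
```

In a web framework, each yielded string would be written to the open connection, so the fast search shows up in the UI before the slow one completes.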

What makes this architecture particularly elegant is how it handles search dependencies. Some sub-questions can only be formulated after initial searches complete. The graph structure allows the WebPlanner to add nodes dynamically—if a search about "RLHF training methods" reveals a new technique like Constitutional AI, the planner can inject a new node to explore that concept specifically. This adaptive planning mirrors how human researchers discover new angles mid-research.
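
The adaptive planning loop can be sketched as a frontier that grows whenever a result mentions something not yet explored. In MindSearch the planner is an LLM; here a keyword match stands in for it, and the canned results, `KNOWN_CONCEPTS`, `plan_follow_ups`, and `research` are all hypothetical names chosen for this example.

```python
import asyncio

# Canned results so the sketch runs offline; a real searcher would query the web.
CANNED = {
    "RLHF training methods": "Overview of PPO-based RLHF; mentions Constitutional AI",
    "Constitutional AI": "Uses a written set of principles to guide AI feedback",
}

# Stand-in for the planner's judgment about what deserves its own node.
KNOWN_CONCEPTS = ["Constitutional AI"]


async def fake_search(question: str) -> str:
    await asyncio.sleep(0.01)
    return CANNED.get(question, "no results")


def plan_follow_ups(result: str, seen: set[str]) -> list[str]:
    """Propose new nodes for concepts found in a result but not yet in the graph."""
    return [c for c in KNOWN_CONCEPTS if c in result and c not in seen]


async def research(root: str) -> dict[str, list[str]]:
    """Return the graph as edges: node -> follow-up nodes it spawned."""
    edges: dict[str, list[str]] = {root: []}
    frontier = [root]
    while frontier:
        results = await asyncio.gather(*(fake_search(q) for q in frontier))
        next_frontier: list[str] = []
        for question, result in zip(frontier, results):
            for follow_up in plan_follow_ups(result, set(edges)):
                edges[question].append(follow_up)  # inject the new node
                edges[follow_up] = []
                next_frontier.append(follow_up)
        frontier = next_frontier
    return edges


if __name__ == "__main__":
    print(asyncio.run(research("RLHF training methods")))
```

Tracking already-seen nodes keeps the graph acyclic and guarantees the loop terminates once no search surfaces a new concept.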

The concurrent execution model deserves special attention. Traditional search-augmented generation systems process searches sequentially or in simple batches. MindSearch's async implementation with a configurable max_workers means that, with the setting shown earlier, up to eight WebSearcher agents can run searches at the same time and feed results back to the planner while other searches are still in flight. Wall-clock time for a set of independent sub-queries is then bounded by the slowest search rather than by the sum of all of them, so a complex query can gather significantly more comprehensive results than a single-path system without a proportional increase in latency.
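
One common way to cap concurrency like this is a semaphore around each search. The sketch below assumes `max_workers` means "at most this many searches in flight"; `bounded_search_all` and the sleep-based search are hypothetical, and MindSearch may enforce the limit differently.

```python
import asyncio


async def bounded_search_all(
    queries: list[str], max_workers: int = 8, delay: float = 0.01
) -> tuple[list[str], int]:
    """Run all queries with at most `max_workers` in flight.

    Returns the results in input order plus the peak concurrency observed.
    """
    semaphore = asyncio.Semaphore(max_workers)
    in_flight = 0
    peak = 0

    async def worker(query: str) -> str:
        nonlocal in_flight, peak
        async with semaphore:  # waits here once max_workers are busy
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(delay)  # stand-in for the actual search call
            in_flight -= 1
            return f"findings for: {query}"

    results = await asyncio.gather(*(worker(q) for q in queries))
    return list(results), peak


if __name__ == "__main__":
    answers, peak = asyncio.run(
        bounded_search_all([f"sub-query {i}" for i in range(20)], max_workers=8)
    )
    print(len(answers), "results; peak concurrency:", peak)
```

With 20 queries and a limit of 8, the work completes in roughly three waves instead of 20 sequential calls, while the search provider never sees more than eight simultaneous requests.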

Gotcha

The documentation assumes more familiarity with multi-agent systems than most developers possess. Setting up a production deployment requires manually editing .env files with LLM endpoints, search API keys, and model names—there's no interactive configuration wizard or comprehensive deployment guide. You'll spend time reverse-engineering the example configurations to understand parameters like search_depth, max_workers, and agent timeout settings. The project provides a docker-compose.yml for quick starts, but scaling beyond a single instance requires understanding how the async agent pool behaves under load, which isn't documented.

Language optimization presents another challenge. While the system supports English, the underlying InternLM models are primarily trained on Chinese corpora, and much of the prompt engineering appears optimized for Chinese query decomposition. English queries work, but you may notice the WebPlanner generates sub-questions that feel slightly unnatural or misses nuanced query reformulations that a GPT-4-based planner would catch. Switching to GPT-4 or Claude as your LLM backend resolves this but introduces API costs and latency that the local InternLM deployment avoids. There's a tradeoff between language quality and infrastructure control that isn't clearly articulated in the documentation.

The multi-agent approach also introduces debugging complexity. When a research query produces unexpected results, tracing the issue requires understanding which agent made which decision at which graph node. The system logs agent actions, but correlating log entries across parallel async operations to reconstruct the reasoning path takes effort. Error handling for search API failures or LLM timeouts exists, but graceful degradation isn't always predictable—sometimes a single failed search node can cascade into incomplete synthesis.
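
One way to keep a single failed node from derailing the whole run is to collect exceptions per node instead of letting the first one propagate, and to log each failure with its node ID so it can be traced later. The sketch below is a generic `asyncio` pattern, not MindSearch's error handling; `flaky_search` and `search_with_fallback` are hypothetical.

```python
import asyncio
import logging

logger = logging.getLogger("research")


async def flaky_search(node_id: str, fail: bool = False, delay: float = 0.01) -> str:
    """Stand-in searcher that can be told to fail or to respond slowly."""
    await asyncio.sleep(delay)
    if fail:
        raise RuntimeError(f"search API error for {node_id}")
    return f"findings for {node_id}"


async def search_with_fallback(
    nodes: dict[str, dict], timeout: float = 0.5
) -> tuple[dict[str, str], dict[str, str]]:
    """Run all nodes; return (findings, failures) keyed by node ID."""
    ids = list(nodes)
    outcomes = await asyncio.gather(
        *(asyncio.wait_for(flaky_search(n, **nodes[n]), timeout) for n in ids),
        return_exceptions=True,  # failures come back as values, not raises
    )
    findings: dict[str, str] = {}
    failed: dict[str, str] = {}
    for node_id, outcome in zip(ids, outcomes):
        if isinstance(outcome, Exception):
            # The node ID in every log line lets you correlate parallel work.
            logger.warning("node=%s failed: %r", node_id, outcome)
            failed[node_id] = repr(outcome)
        else:
            findings[node_id] = outcome
    return findings, failed


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    spec = {"a": {}, "b": {"fail": True}, "c": {"delay": 0.2}}
    print(asyncio.run(search_with_fallback(spec, timeout=0.05)))
```

The synthesis step can then decide explicitly what to do with the `failed` map, for example noting the gaps in the final answer rather than silently producing an incomplete one.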

Verdict

Use MindSearch if you're building research-oriented AI applications where depth matters more than speed, if you need self-hosted infrastructure with full control over LLM and search backends, or if you're already invested in the InternLM ecosystem and want a production-ready search layer. It's particularly valuable for internal knowledge tools, competitive intelligence platforms, or academic research assistants where the multi-perspective reasoning justifies the operational complexity. The ability to swap LLM and search providers also lets you adopt better models as they emerge without rewriting the pipeline.

Skip it if you need a plug-and-play solution with extensive documentation and hand-holding: the setup curve is steep and assumes backend development expertise. Also skip it if sub-second response times are critical; the parallel search architecture is faster than sequential approaches, but responses still take seconds, not milliseconds. For simple search augmentation or customer-facing chatbots where users expect instant responses, lighter RAG pipelines or commercial APIs such as Perplexity's offer better UX with less operational overhead.