Building a Self-Hosted OSINT Terminal: Inside Crucix's 27-Source Intelligence Aggregator

Hook

Most of the world’s real-time intelligence—satellite fires, radiation levels, flight paths, conflict zones—is completely public. You just need to know where to look, and have the patience to check 27 different APIs every 15 minutes.

Context

The open-source intelligence landscape is fragmented by design. NASA publishes fire detection data, radiation monitoring stations provide readings, flight tracking services stream live positions, and conflict databases catalog events. Each dataset lives in its own API with its own authentication scheme, rate limits, and data format. If you’re a journalist tracking environmental incidents, a trader correlating geopolitical events with commodity prices, or a researcher monitoring global health alerts, you’re stuck with dozens of browser tabs and manual cross-referencing.

Crucix solves this by running a local aggregation layer that polls all 27 sources in parallel, normalizes the data, and streams updates to a WebGL-powered dashboard via Server-Sent Events. The architecture is deliberately minimal—one dependency (Express), no database, no message queue—because it prioritizes self-hosting over enterprise scalability. You run node server.mjs, and within 60 seconds you have a live intelligence terminal showing fire detections, radiation readings, maritime chokepoint activity, and conflict event clusters, all geocoded on a rotating 3D globe.

Technical Insight

Crucix’s architecture centers on a parallel sweep pattern that runs every 15 minutes. According to the README, the server queries all 27 sources concurrently, with each sweep taking approximately 30-60 seconds on the initial run. Results are written to timestamped JSON files in ./runs/, which serve as both the persistence layer and response cache. The README emphasizes that this file-based approach requires no database setup, making deployment straightforward for single-user scenarios.
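The sweep loop can be sketched in a few lines of Node. The source fetchers, field names, and file-naming scheme below are illustrative assumptions, not Crucix’s actual implementation; only the pattern—a concurrent fan-out over all sources plus a timestamped JSON file in ./runs/—mirrors what the README describes:

```javascript
// Hypothetical sketch of the parallel-sweep pattern; names are illustrative.
import { writeFile, mkdir } from 'node:fs/promises';
import path from 'node:path';

// Each source is a name plus an async fetcher returning an array of events.
const sources = [
  { name: 'fires',     fetch: async () => [{ lat: -3.1, lon: 142.4, type: 'fire' }] },
  { name: 'radiation', fetch: async () => [{ lat: 51.4, lon: 30.1, type: 'radiation' }] },
];

export async function sweep(runsDir = './runs') {
  // Query every source concurrently; a failed source yields an empty array
  // instead of aborting the whole sweep.
  const settled = await Promise.allSettled(sources.map((s) => s.fetch()));
  const results = Object.fromEntries(
    sources.map((s, i) => [
      s.name,
      settled[i].status === 'fulfilled' ? settled[i].value : [],
    ]),
  );

  // Persist the sweep as a timestamped JSON file — the file doubles as the
  // response cache, so no database is involved.
  await mkdir(runsDir, { recursive: true });
  const file = path.join(runsDir, `sweep-${Date.now()}.json`);
  await writeFile(file, JSON.stringify(results, null, 2));
  return { file, results };
}
```

Promise.allSettled (rather than Promise.all) is the natural fit here: one slow or broken API should degrade to an empty dataset, not stall the other 26.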

The dashboard uses Server-Sent Events (SSE) rather than WebSockets for live updates. This design choice means the connection is HTTP-based and unidirectional (server to client), which the README notes works through corporate proxies that might block WebSocket upgrades. The trade-off is that command features like /brief and /sweep require separate POST requests instead of bidirectional messaging.
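A minimal SSE endpoint shows why this survives restrictive proxies: it is ordinary HTTP with a long-lived response and a line-based wire format. The sketch below uses Node’s built-in http module for self-containment (Crucix itself uses Express); the event name and payload shape are assumptions, and only the /events route comes from the README:

```javascript
// Sketch of an SSE endpoint — not Crucix's actual code.
import http from 'node:http';

// An SSE frame is just text: an "event:" line, a "data:" line, a blank line.
export function sseFrame(event, payload) {
  return `event: ${event}\ndata: ${JSON.stringify(payload)}\n\n`;
}

export function createServer() {
  const clients = new Set();
  const server = http.createServer((req, res) => {
    if (req.url === '/events') {
      // Plain HTTP response that never ends — no protocol upgrade needed.
      res.writeHead(200, {
        'Content-Type': 'text/event-stream',
        'Cache-Control': 'no-cache',
        Connection: 'keep-alive',
      });
      clients.add(res);
      req.on('close', () => clients.delete(res));
    } else {
      res.statusCode = 404;
      res.end();
    }
  });

  // After each sweep, push the new data to every connected client.
  server.broadcast = (event, payload) => {
    for (const res of clients) res.write(sseFrame(event, payload));
  };
  return server;
}
```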

The frontend is built without React, Vue, or Svelte—it’s vanilla JavaScript that opens an EventSource connection to /events on page load and updates the DOM when new sweep data arrives. The 3D globe uses Globe.gl for WebGL rendering, with a toggle for a flat map view. When users switch to ‘VISUALS LITE’ mode, the dashboard disables expensive visual effects such as backdrop filters and animations, and on mobile devices it forces flat map mode to improve performance.
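The client side needs nothing beyond the built-in EventSource API. In the sketch below, the ‘sweep’ event name, payload shape, and helper function are hypothetical; only the /events endpoint and the use of Globe.gl come from the README:

```javascript
// Hypothetical vanilla-JS client wiring — field names are assumptions.

// Flatten a sweep payload ({ source: [events…] }) into one array of
// points tagged with their source, ready for e.g. a Globe.gl pointsData() call.
export function toGlobePoints(sweep) {
  return Object.entries(sweep).flatMap(([source, events]) =>
    events.map((e) => ({ source, lat: e.lat, lng: e.lon })),
  );
}

// Browser-only wiring: open the stream on page load, re-render on updates.
if (typeof window !== 'undefined') {
  const stream = new EventSource('/events');
  stream.addEventListener('sweep', (msg) => {
    const points = toGlobePoints(JSON.parse(msg.data));
    // e.g. globe.pointsData(points) with Globe.gl
    document.querySelector('#status').textContent =
      `${points.length} events plotted`;
  });
}
```

EventSource also reconnects automatically on dropped connections, which is one less thing a framework-free client has to implement by hand.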

The optional LLM integration connects to the sweep pipeline—after each sweep completes, if LLM_API_KEY is configured in .env, the system can generate alerts and actionable insights. However, the README is clear that ‘the core dashboard functionality doesn’t depend on it’—Crucix works as a pure aggregator without any AI components.
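The conditional hook might look something like the sketch below. The function names and call shape are invented for illustration; the README only establishes that the LLM path activates when LLM_API_KEY is set and that the dashboard works without it:

```javascript
// Illustrative post-sweep hook — not Crucix's actual function names.
// `generate` stands in for whatever LLM client the deployment wires up.
export async function afterSweep(results, env = process.env, generate = null) {
  // Core dashboard path: no key configured, return the raw sweep untouched.
  if (!env.LLM_API_KEY || !generate) return { results, brief: null };

  // Optional path: only with a key do we ask the model for alerts/insights.
  const brief = await generate(results);
  return { results, brief };
}
```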

The decision to store sweeps as flat JSON files rather than using a traditional database appears intentional for the target use case. For real-time monitoring at 15-minute intervals, the filesystem approach keeps the architecture simple and makes the data easily accessible via standard command-line tools.
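Because each sweep is a flat JSON file, ordinary shell tools can query it directly. The file name and payload below are made-up samples following the layout the README describes:

```shell
# Illustrative only — sweep file names and payload shape are assumptions.
# Recreate a sample sweep file the way the server would:
mkdir -p runs
cat > runs/sweep-1700000000000.json <<'EOF'
{"fires":[{"lat":-3.1,"lon":142.4}],"radiation":[]}
EOF

# Pick the newest sweep and count fire detections with standard tools:
latest=$(ls -t runs/*.json | head -n 1)
jq '.fires | length' "$latest"
```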

Gotcha

The biggest friction point is API key acquisition. While the README lists 27 data sources, it provides an .env.example template but doesn’t detail the specific approval processes for each API. Some sources may require individual registrations with varying approval timelines. The README notes which keys are required versus optional, but sources without valid keys will return empty arrays during sweeps.

The initial load experience requires patience. The README explicitly warns that when you first start the server, the dashboard displays a ‘First sweep in progress…’ message. That first sweep takes 30-60 seconds because it’s querying 27 APIs in parallel, and the dashboard shows an empty interface until the sweep completes. Once the first sweep finishes, subsequent updates happen seamlessly via SSE every 15 minutes, but users should expect that initial wait.

The LLM features are positioned as optional add-ons rather than core functionality. The README mentions /brief and /sweep commands and integration with Telegram and Discord webhooks, but warns prominently that ‘Crucix has not launched any official token’ and flags potential scams. The AI capabilities require separate setup beyond just running the server, and users interested in customizing alert generation would need to work with the source code directly.

If npm run dev fails silently on your system (particularly on Windows PowerShell), the README provides a workaround: run node --trace-warnings server.mjs directly, or use the included node diag.mjs diagnostic tool to troubleshoot Node version, module imports, and port availability issues.

Verdict

Use Crucix if you’re an OSINT analyst, journalist, or trader who needs continuous monitoring of multiple global data streams and values self-hosting. The cross-domain correlation potential—seeing multiple types of events on one screen—provides genuine situational awareness benefits. The minimal dependency footprint (just Express) and file-based storage make it straightforward to deploy. With 8,223 GitHub stars, Crucix has attracted significant community interest. If you’re comfortable spending time collecting API keys and can wait through the initial sweep setup, this is a compelling open-source intelligence aggregator.

Skip Crucix if you need plug-and-play OSINT tools with zero configuration, prefer cloud-hosted dashboards, or only monitor single-domain data where specialized tools would be simpler. Also skip it if you expect the LLM features to work immediately out of the box—they’re optional power-user additions rather than core functionality. The primary value proposition is the aggregation and visualization of 27 diverse data sources into a single, self-hosted dashboard that refreshes automatically every 15 minutes.
