Crucix: Building a Self-Hosted OSINT Aggregator That Monitors 27 Intelligence Sources
Hook
While most developers debate which cloud provider to use, Crucix processes radiation monitors, wildfire satellites, and flight corridors in a single Node.js process with Express as its only dependency.
Context
Intelligence gathering used to require dedicated teams, expensive subscriptions, and disparate tools. An analyst tracking geopolitical events might monitor ACLED for conflicts, FlightRadar24 for unusual air traffic, FIRMS for satellite fire detection, and social media for emerging narratives—each in a separate browser tab, with no way to correlate events across domains. The cognitive overhead is crushing: Did that spike in military flights correlate with the protest detected two hours earlier? Why are fire alerts clustering near critical infrastructure? These questions remain unanswered when data lives in silos.
The explosion of open-source intelligence (OSINT) APIs should have solved this. Instead, it created a new problem: integration hell. Each source has its own authentication scheme, rate limits, data formats, and reliability profiles. Building a unified dashboard means writing 27 different API adapters, handling their quirks, and keeping the whole system synchronized. Crucix addresses this by providing a pre-integrated aggregation layer that consolidates heterogeneous intelligence sources into a single JSON structure, complete with real-time visualization and intelligent alerting. It’s designed for the solo researcher, independent journalist, or small security team who needs enterprise-grade situational awareness without enterprise budgets or cloud lock-in.
Technical Insight
Crucix’s architecture revolves around three core components: a polling orchestrator, a delta-based cache system, and a Server-Sent Events pipeline that pushes updates to connected clients. The orchestrator runs every 15 minutes, fanning out parallel HTTP requests to 27 distinct APIs using native Node.js fetch. Each source adapter returns normalized data with standardized fields—latitude, longitude, severity, source, timestamp—which the system merges into a unified JSON structure.
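As a rough illustration of the normalization step, the sketch below maps one raw record into the standardized fields named above. The raw field names (`latitude`, `longitude`, `frp`, `acq_time_ms`), the `id` scheme, and the severity thresholds are assumptions made for this example, not Crucix's actual adapter code.

```javascript
// Hypothetical adapter: maps a raw fire-detection record to the shared event shape.
// Raw field names, the id scheme, and severity thresholds are illustrative assumptions.
function normalizeFireRecord(raw) {
  const lat = Number(raw.latitude);
  const lng = Number(raw.longitude);
  if (!Number.isFinite(lat) || !Number.isFinite(lng)) {
    return null; // drop records without usable coordinates
  }
  const power = Number(raw.frp) || 0; // fire radiative power, assumed to be in MW
  const severity = power >= 100 ? 'high' : power >= 20 ? 'medium' : 'low';
  return {
    id: `firms:${lat.toFixed(3)}:${lng.toFixed(3)}:${raw.acq_time_ms}`,
    lat,
    lng,
    severity,
    source: 'FIRMS',
    timestamp: Number(raw.acq_time_ms), // epoch milliseconds
  };
}

const event = normalizeFireRecord({
  latitude: '34.05',
  longitude: '-118.25',
  frp: '42.5',
  acq_time_ms: 1700000000000,
});
console.log(event.severity); // "medium"
```

Returning `null` for unusable records lets the caller filter them out before merging, so one malformed row does not poison the unified structure.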
The elegance lies in how it handles state. Rather than storing everything in a database, Crucix maintains a simple file-based cache (data/last_sweep.json) that holds only the previous polling cycle. On each sweep, it computes deltas: events that were not present in the cached sweep. Because the cache is overwritten rather than appended to, its size is bounded by a single sweep no matter how long the system runs, and the alerting logic stays trivial: diff the current sweep against the cached state.
Here’s how the core sweep mechanism works:
// Simplified sweep orchestrator
async function runSweep() {
  const sources = [
    fetchFIRMS(), // NASA satellite fire detections
    fetchADSB(),  // Flight tracking
    fetchACLED(), // Armed Conflict Location & Event Data
    fetchGDELT(), // Global news events
    // ... 23 more sources
  ];

  // allSettled lets the sweep continue when individual sources fail
  const results = await Promise.allSettled(sources);
  const events = results
    .filter(r => r.status === 'fulfilled')
    .flatMap(r => r.value);

  // On the first run there is no cache yet, so every event counts as new
  const lastSweep = loadCache('last_sweep.json') ?? [];

  // An event is new unless it shares an id with a cached event, or sits
  // close to one in both space and time (a likely duplicate report)
  const newEvents = events.filter(e =>
    !lastSweep.some(old =>
      e.id === old.id ||
      (distance(e.lat, e.lng, old.lat, old.lng) < 0.1 &&
        Math.abs(e.timestamp - old.timestamp) < 3600000) // one hour, either direction
    )
  );

  saveCache('last_sweep.json', events);

  if (newEvents.length > 0) {
    evaluateAlerts(newEvents);
    broadcast(newEvents); // SSE push to clients
  }
}
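The sweep code relies on three helpers that the snippet does not define: `distance`, `loadCache`, and `saveCache`. A minimal sketch of what they might look like follows. It assumes distances are measured in kilometres using the haversine formula and that the cache lives in a `data/` directory under the working directory; the optional `dir` parameter and the write-then-rename step are additions for this example, not confirmed details of Crucix.

```javascript
const fs = require('node:fs');
const path = require('node:path');

const CACHE_DIR = path.join(process.cwd(), 'data'); // assumed cache location

// Great-circle distance in kilometres (haversine formula).
function distance(lat1, lng1, lat2, lng2) {
  const R = 6371; // mean Earth radius in km
  const toRad = deg => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLng = toRad(lng2 - lng1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLng / 2) ** 2;
  // Clamp to 1 so rounding error near antipodal points cannot produce NaN
  return 2 * R * Math.asin(Math.min(1, Math.sqrt(a)));
}

// Returns the previous sweep, or an empty array if the file is missing or unreadable.
function loadCache(name, dir = CACHE_DIR) {
  try {
    const parsed = JSON.parse(fs.readFileSync(path.join(dir, name), 'utf8'));
    return Array.isArray(parsed) ? parsed : [];
  } catch {
    return [];
  }
}

// Writes to a temp file and renames it, so a crash mid-write cannot leave a truncated cache.
function saveCache(name, events, dir = CACHE_DIR) {
  fs.mkdirSync(dir, { recursive: true });
  const target = path.join(dir, name);
  const tmp = `${target}.tmp`;
  fs.writeFileSync(tmp, JSON.stringify(events));
  fs.renameSync(tmp, target);
}
```

With kilometres as the unit, the `< 0.1` threshold in the sweep would treat events within roughly 100 metres of each other as duplicates. If the real helper returns degrees or another unit, the threshold means something quite different.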
The alerting system operates in two modes: LLM-enhanced and rule-based fallback. When an OpenAI API key is configured, Crucix sends new events to GPT-4 with a prompt asking it to evaluate significance, assign priority (FLASH/PRIORITY/ROUTINE), and generate natural language summaries. This semantic evaluation catches nuanced patterns that simple rules would miss—like correlating unusual military flight activity with protests in the same region.
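To make the LLM mode more concrete, here is a hedged sketch of how a request body could be assembled and how the model's reply could be validated. The prompt wording, the JSON reply format, and both function names are assumptions for illustration; only the `model` and `messages` fields follow the general shape of OpenAI's chat completions API.

```javascript
const PRIORITIES = ['FLASH', 'PRIORITY', 'ROUTINE'];

// Builds a chat-completion request body. The prompt text is an assumption.
function buildTriageRequest(events, model = 'gpt-4') {
  return {
    model,
    messages: [
      {
        role: 'system',
        content:
          'You triage OSINT events. Reply with JSON only: ' +
          '{"priority": "FLASH" | "PRIORITY" | "ROUTINE", "summary": string}.',
      },
      { role: 'user', content: JSON.stringify(events) },
    ],
  };
}

// Validates the model's reply. Returns null for anything malformed so the
// caller can fall back to rule-based evaluation.
function parseTriageReply(text) {
  try {
    const reply = JSON.parse(text);
    if (!PRIORITIES.includes(reply.priority)) return null;
    return { priority: reply.priority, summary: String(reply.summary ?? '') };
  } catch {
    return null;
  }
}
```

Treating the reply as untrusted input matters here: a model can return prose, an unknown priority label, or truncated JSON, and none of those should reach the alert channel unchecked.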
Without an LLM, the system falls back to heuristic rules based on event type and clustering. Multiple events of the same type within a 50km radius and a one-hour window trigger a PRIORITY alert. Single high-severity events (like radiation spikes or large wildfires) generate FLASH alerts. This dual-mode design means alerting keeps working when no LLM key is configured or when policy forbids sending event data to a third-party model.
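The fallback rules can be sketched in a few lines. This is an illustrative version only: the `type` field, the `'high'` severity label, and the function names are assumptions, and the real rules may weigh event types differently.

```javascript
const RADIUS_KM = 50;
const WINDOW_MS = 60 * 60 * 1000; // one hour

// Great-circle distance in kilometres between two events (haversine formula).
function haversineKm(a, b) {
  const toRad = deg => (deg * Math.PI) / 180;
  const dLat = toRad(b.lat - a.lat);
  const dLng = toRad(b.lng - a.lng);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.lat)) * Math.cos(toRad(b.lat)) * Math.sin(dLng / 2) ** 2;
  return 2 * 6371 * Math.asin(Math.min(1, Math.sqrt(h)));
}

// Classifies one event against all events in the sweep.
function classify(event, allEvents) {
  // Rule 1: a single high-severity event is always a FLASH alert
  if (event.severity === 'high') return 'FLASH';

  // Rule 2: another event of the same type nearby in space and time is a cluster
  const clustered = allEvents.some(other =>
    other !== event &&
    other.type === event.type &&
    Math.abs(other.timestamp - event.timestamp) <= WINDOW_MS &&
    haversineKm(event, other) <= RADIUS_KM
  );
  return clustered ? 'PRIORITY' : 'ROUTINE';
}
```

Checking every event against every other is quadratic in the sweep size, which is acceptable for a few thousand events per sweep but would need a spatial index at larger volumes.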
The visualization layer uses Globe.gl for the 3D rendering, but the data pipeline is framework-agnostic. Events are broadcast via Server-Sent Events, which means any client—web dashboard, mobile app, or custom script—can subscribe to the real-time feed:
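// Server-side SSE endpoint
const clients = new Map(); // clientId -> open response stream
let nextClientId = 0;      // a counter avoids the collisions Date.now() allows

app.get('/events/stream', (req, res) => {
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.setHeader('Connection', 'keep-alive');
  res.flushHeaders(); // send headers now so the client sees the stream open

  const clientId = nextClientId++;
  clients.set(clientId, res);
  req.on('close', () => clients.delete(clientId));
});

// Each SSE frame is a "data:" line terminated by a blank line
function broadcast(events) {
  const payload = `data: ${JSON.stringify(events)}\n\n`;
  clients.forEach(client => client.write(payload));
}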
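On the consuming side, a browser can use the built-in `EventSource`, while a custom script has to split the stream into frames itself. The sketch below shows one way a Node.js script might do that with the built-in `fetch`. The function names are hypothetical, and the parser handles only the `data:` frames that `broadcast` emits, not the full SSE grammar.

```javascript
// Incremental parser for `data: <json>\n\n` frames. Network chunks can split
// a frame anywhere, so unparsed text is buffered between calls.
function createSSEParser(onEvents) {
  let buffer = '';
  return function feed(chunk) {
    buffer += chunk;
    let end;
    while ((end = buffer.indexOf('\n\n')) !== -1) {
      const frame = buffer.slice(0, end);
      buffer = buffer.slice(end + 2);
      const data = frame
        .split('\n')
        .filter(line => line.startsWith('data:'))
        .map(line => line.slice(5).replace(/^ /, '')) // strip one optional space
        .join('\n');
      if (data) onEvents(JSON.parse(data));
    }
  };
}

// Subscribes to the stream using Node's built-in fetch (Node 18+).
async function subscribe(url, onEvents) {
  const res = await fetch(url, { headers: { Accept: 'text/event-stream' } });
  const feed = createSSEParser(onEvents);
  const decoder = new TextDecoder();
  for await (const chunk of res.body) {
    feed(decoder.decode(chunk, { stream: true }));
  }
}

// Example (assumes a local instance on port 3000):
// subscribe('http://localhost:3000/events/stream', events => console.log(events.length));
```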
The bot interfaces add bidirectional communication. Rather than just pushing alerts, Crucix responds to Telegram and Discord commands like /sweep (trigger immediate poll), /sources (list active APIs), or /find [query] (search cached events). This transforms a passive dashboard into an interactive assistant. The Telegram implementation uses long-polling to avoid webhook complexity, while Discord uses the official gateway with slash command registration. Both bots maintain their own client connections to the SSE stream, effectively turning chat apps into alternative frontends.
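A command layer like this usually reduces to parsing the message text and looking up a handler. The following sketch assumes plain-text commands in the form shown above; the handler bodies are placeholders, and the function names are not taken from the Crucix codebase.

```javascript
// Splits "/find wildfire california" into { name: 'find', args: 'wildfire california' }.
// Also accepts the "/sweep@BotName" form that Telegram uses in group chats.
function parseCommand(text) {
  const match = /^\/([a-z]+)(?:@\w+)?(?:\s+(.*))?$/is.exec(text.trim());
  if (!match) return null;
  return { name: match[1].toLowerCase(), args: (match[2] ?? '').trim() };
}

// Returns a dispatch function bound to a table of handlers.
function createDispatcher(handlers) {
  return function dispatch(text) {
    const cmd = parseCommand(text);
    if (!cmd) return null; // not a command, ignore the message
    if (!Object.hasOwn(handlers, cmd.name)) {
      return `Unknown command: /${cmd.name}`;
    }
    return handlers[cmd.name](cmd.args);
  };
}

// Placeholder handlers standing in for the real sweep, source list, and search.
const dispatch = createDispatcher({
  sweep: () => 'Sweep started',
  sources: () => 'Active sources: (list goes here)',
  find: query => `Searching cached events for "${query}"`,
});

console.log(dispatch('/find wildfire california'));
// Searching cached events for "wildfire california"
```

Because the dispatcher only deals in strings, the same table can serve both the Telegram and Discord frontends, with each transport responsible solely for delivering text in and out.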
What’s remarkable is the dependency footprint. The package.json lists only Express. Everything else (HTTP client, JSON parsing, file I/O, even the Discord and Telegram bot logic) uses built-in Node.js capabilities. This minimalism isn’t just aesthetic: it reduces supply chain risk, simplifies auditing, and shields the system from most of the breaking changes that ripple through the JavaScript ecosystem.
Gotcha
The 15-minute polling interval is both a feature and a limitation. For monitoring slow-moving geopolitical trends or economic shifts, it’s perfectly adequate. But if you’re tracking a rapidly developing crisis—a military escalation, a breaking pandemic event, or a flash crash—those 15 minutes feel eternal. You can trigger manual sweeps via bot commands or the web UI, but there’s no built-in logic for adaptive polling that increases frequency when anomalies are detected. Building that would require more sophisticated state management than the current file-based cache provides.
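For readers who want to experiment, the simplest form of adaptive polling is a backoff rule that shortens the interval while alerts keep firing and relaxes it afterwards. The sketch below is a hypothetical add-on, not part of Crucix; the one-minute floor and the halving and doubling factors are arbitrary choices.

```javascript
const BASE_MS = 15 * 60 * 1000; // the default 15-minute interval
const MIN_MS = 60 * 1000;       // assumed floor to stay clear of rate limits

// Halve the interval after a sweep that raised FLASH alerts;
// otherwise double it back toward the default.
function nextInterval(currentMs, flashCount) {
  if (flashCount > 0) {
    return Math.max(MIN_MS, Math.floor(currentMs / 2));
  }
  return Math.min(BASE_MS, currentMs * 2);
}

console.log(nextInterval(BASE_MS, 2)); // 450000
```

This ignores per-source rate limits and quotas. Tracking those per API is where the more sophisticated state management would come in.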
API key management is entirely manual. Crucix includes a configuration file (config.example.json) with placeholders for the credentials its sources need, but acquiring those keys is your responsibility. Some sources, like Yahoo Finance, work without authentication, but most require signing up for developer accounts, navigating documentation, and managing rate limits. The repository doesn’t include scrapers or unofficial access methods; it sticks to official API usage. This is ethical and legally sound, but it means significant setup time before you see meaningful data. There’s no guided onboarding flow: you either configure everything upfront or get incomplete results.
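One way to soften the setup experience is a startup check that reports which sources are usable. The sketch below assumes a config shape of `{ sources: { <name>: { apiKey, keyless } } }`, which is invented for this example; the actual layout of config.example.json may differ.

```javascript
// Hypothetical config shape: { sources: { <name>: { apiKey?: string, keyless?: boolean } } }
// Splits sources into those ready to poll and those still missing a key.
function auditConfig(config) {
  const ready = [];
  const missing = [];
  for (const [name, entry] of Object.entries(config.sources ?? {})) {
    const hasKey = typeof entry.apiKey === 'string' && entry.apiKey.trim() !== '';
    if (entry.keyless || hasKey) {
      ready.push(name);
    } else {
      missing.push(name);
    }
  }
  return { ready, missing };
}

const report = auditConfig({
  sources: {
    yahooFinance: { keyless: true },
    firms: { apiKey: 'example-key' },
    acled: { apiKey: '' },
  },
});
console.log(`ready: ${report.ready.join(', ')}; missing keys: ${report.missing.join(', ')}`);
// ready: yahooFinance, firms; missing keys: acled
```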
Verdict
Use Crucix if you’re building situational awareness infrastructure that must run on-premise, need to correlate intelligence across multiple domains (geopolitical, environmental, economic, social), or want a privacy-first alternative to commercial OSINT platforms with subscription fees and data-sharing clauses. It’s ideal for investigative journalists tracking conflict zones, independent researchers studying climate patterns, or security teams in regulated industries where cloud services aren’t permitted. The minimal dependencies and AGPL license make it particularly suited for academic environments or organizations that require reproducible, auditable intelligence workflows.
Skip it if you need sub-minute alerting latency for time-critical scenarios, lack the technical background to configure 27 different APIs and debug Node.js applications, or prefer turn-key SaaS solutions with customer support and guaranteed uptime. It’s also not the right choice if you need collaborative features like shared annotations, team-based access controls, or audit logging: Crucix is fundamentally a single-user tool, though you can expose the SSE stream to multiple clients.