Crucix: Building Your Own OSINT Intelligence Platform in Node.js

Hook

While everyone's building yet another ChatGPT wrapper, one developer created a tool that monitors wildfires from space, tracks military flights over conflict zones, and alerts you to radiation spikes—all running on a single Node.js server in your basement.

Context

Open-source intelligence (OSINT) has grown from a niche tradecraft practiced by journalists and researchers into something with everyday relevance. Wildfire smoke drifting into your city? That's FIRMS satellite data. Unusual flight patterns near a geopolitical hotspot? ADS-B tracking. Supply chain disruptions you could have anticipated? Economic indicators and shipping data. The problem isn't a lack of public data. It's that monitoring 27 different sources means 27 browser tabs, 27 different UIs, and zero ability to correlate events across domains.

Existing solutions fall into two camps: expensive commercial platforms like Maltego that cost thousands annually and lock you into their ecosystem, or heavyweight open-source stacks like OpenCTI that require Elasticsearch, Redis, RabbitMQ, and a weekend to configure. Crucix takes a radically different approach: a single Node.js process, minimal dependencies, and a philosophy that your intelligence infrastructure shouldn't require more complexity than the problems you're investigating. It's OSINT monitoring designed like a Unix tool—do one thing well, run anywhere, own your data.

Technical Insight

Crucix's architecture shows how much you can accomplish with modern Node.js before reaching for heavy dependencies. The core runs on Express, with native Node 22+ features doing the heavy lifting. Async/await, including top-level await in ES modules, keeps the orchestration of 27 concurrent API calls free of callback soup. Built-in fetch replaces axios or node-fetch. The entire data pipeline, from API polling to SSE streaming, fits in under 3,000 lines.

The polling system uses a simple but effective interval-based architecture. Every 15 minutes, a dispatcher triggers 27 independent data fetchers, each responsible for a single source. Here's the simplified pattern from the FIRMS fire data fetcher:

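// sources/firms.js
import fs from 'node:fs/promises';

// parseCSV (not shown) is assumed to return one object per CSV row,
// keyed by column header.

export async function fetchFIRMS() {
  const response = await fetch(
    `https://firms.modaps.eosdis.nasa.gov/api/area/csv/${process.env.FIRMS_KEY}/VIIRS_SNPP_NRT/world/1`
  );
  if (!response.ok) {
    throw new Error(`FIRMS request failed: HTTP ${response.status}`);
  }
  const csv = await response.text();
  const fires = parseCSV(csv).map(row => {
    // acq_time is HHMM in UTC and may lack leading zeros (e.g. "930")
    const hhmm = String(row.acq_time).padStart(4, '0');
    return {
      lat: parseFloat(row.latitude),
      lon: parseFloat(row.longitude),
      confidence: row.confidence,
      timestamp: new Date(`${row.acq_date}T${hhmm.slice(0, 2)}:${hhmm.slice(2)}:00Z`),
      source: 'FIRMS'
    };
  });

  // Persist to filesystem cache
  await fs.writeFile(
    './data/firms.json',
    JSON.stringify({ updated: Date.now(), data: fires })
  );

  return fires;
}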
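
The 15-minute dispatch cycle described before the FIRMS listing might look roughly like the sketch below. This is an illustration under stated assumptions, not Crucix's actual code: `runCycle`, `startPolling`, and the shape of the fetcher map are hypothetical names. `Promise.allSettled` is used so that one failing source does not abort the others.

```javascript
// Hypothetical dispatcher sketch: names and structure are assumptions,
// not taken from the Crucix source.
const POLL_INTERVAL_MS = 15 * 60 * 1000; // 15 minutes

// Run every fetcher concurrently; a rejected fetcher never blocks the rest.
async function runCycle(fetchers) {
  const names = Object.keys(fetchers);
  const settled = await Promise.allSettled(
    // Promise.resolve().then(...) also captures synchronous throws.
    names.map((name) => Promise.resolve().then(() => fetchers[name]()))
  );

  const results = {};
  const errors = {};
  settled.forEach((outcome, i) => {
    if (outcome.status === 'fulfilled') {
      results[names[i]] = outcome.value;
    } else {
      errors[names[i]] = String(outcome.reason?.message ?? outcome.reason);
    }
  });
  return { results, errors };
}

// Run once immediately, then on every interval tick. Returns a stop function.
function startPolling(fetchers, onCycle, intervalMs = POLL_INTERVAL_MS) {
  const tick = () => runCycle(fetchers).then(onCycle).catch(console.error);
  tick();
  const timer = setInterval(tick, intervalMs);
  return () => clearInterval(timer);
}
```

A caller would pass something like `startPolling({ firms: fetchFIRMS }, handleCycle)`, where `handleCycle` is whatever publishes the results to connected dashboards.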

Each fetcher returns flat, normalized records that convert readily to GeoJSON points. The normalization is crucial: whether it's radiation readings from Safecast, conflict events from ACLED, or ship positions from AIS feeds, everything gets transformed into { lat, lon, type, metadata, timestamp }. In that shape, source-specific fields such as the FIRMS confidence value would sit under metadata. This lets the dashboard rendering engine treat all sources uniformly while preserving source-specific metadata for drill-downs.
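
To make the shared record shape concrete, here is a small sketch of a normalizer for a FIRMS-style row plus a converter to a GeoJSON Feature. The function names are hypothetical, and placing `confidence` under `metadata` is an assumption about how the common shape would be filled in. Note that GeoJSON orders coordinates as [longitude, latitude].

```javascript
// Hypothetical helpers illustrating the shared record shape.
function normalizeFire(row) {
  // FIRMS reports acq_time as HHMM in UTC, sometimes without leading zeros.
  const hhmm = String(row.acq_time).padStart(4, '0');
  return {
    lat: Number.parseFloat(row.latitude),
    lon: Number.parseFloat(row.longitude),
    type: 'fire',
    metadata: { source: 'FIRMS', confidence: row.confidence },
    timestamp: new Date(`${row.acq_date}T${hhmm.slice(0, 2)}:${hhmm.slice(2)}:00Z`),
  };
}

// Convert any normalized record into a GeoJSON Point feature.
function toGeoJSONFeature(record) {
  return {
    type: 'Feature',
    geometry: { type: 'Point', coordinates: [record.lon, record.lat] },
    properties: {
      type: record.type,
      timestamp: record.timestamp.toISOString(),
      ...record.metadata,
    },
  };
}
```

Because every source ends up in the same shape, `toGeoJSONFeature` needs no per-source branches.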

The real-time push mechanism uses Server-Sent Events instead of WebSockets. This choice suits read-heavy, low-interaction use cases well. The server holds one open SSE connection per client and broadcasts updates whenever new data arrives:

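// server.js
// Assumes `app` is the Express app, `eventBus` is an EventEmitter-style object
// that emits 'update' when new data arrives, and getAllCachedData() returns
// the cached data for every source.
app.get('/events', (req, res) => {
  res.setHeader('Content-Type', 'text/event-stream');
  res.setHeader('Cache-Control', 'no-cache');
  res.flushHeaders(); // open the stream before the first event is written

  // Each SSE message is a "data:" line terminated by a blank line
  const sendUpdate = (data) => {
    res.write(`data: ${JSON.stringify(data)}\n\n`);
  };

  // Send initial state
  sendUpdate({ type: 'init', sources: getAllCachedData() });

  // Register for future updates
  eventBus.on('update', sendUpdate);

  // Remove the listener on disconnect so closed clients don't leak handlers
  req.on('close', () => {
    eventBus.off('update', sendUpdate);
  });
});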

No WebSocket handshake complexity, automatic reconnection in browsers, and simple HTTP/2 compatibility. For a dashboard where the server dictates updates and clients rarely send data back, SSE is perfect.
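
On the browser side, consuming the stream needs little more than the built-in `EventSource` API. The sketch below is illustrative: `applyMessage` and `connect` are hypothetical names, the `init` message follows the server listing, and the `update` message shape (`{ type, source, data }`) is an assumption, since the article does not show it.

```javascript
// Hypothetical client-side sketch. The 'init' shape follows the server example;
// the 'update' shape ({ type, source, data }) is an assumption.
function applyMessage(state, message) {
  if (message.type === 'init') return { ...message.sources };
  if (message.type === 'update') {
    return { ...state, [message.source]: message.data };
  }
  return state; // ignore unknown message types
}

// EventSourceImpl is injectable so the function can be exercised outside a browser.
function connect(url, onState, EventSourceImpl = globalThis.EventSource) {
  let state = {};
  const stream = new EventSourceImpl(url);
  stream.onmessage = (event) => {
    state = applyMessage(state, JSON.parse(event.data));
    onState(state);
  };
  // Browsers retry dropped connections on their own; this handler only logs.
  stream.onerror = () => console.warn('SSE connection interrupted');
  return stream;
}
```

In a page this would be called as `connect('/events', render)`, with `render` being whatever redraws the map from the merged state.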

The LLM integration is where Crucix transitions from passive dashboard to active intelligence assistant. When anomalies are detected—a spike in fire activity, unusual flight patterns, or correlated events across sources—the system can optionally send context to an LLM (OpenAI, Anthropic, or local models via Ollama) for analysis:

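// intelligence/analyzer.js
export async function analyzeAnomaly(events) {
  const context = `
Detected events in past hour:
- ${events.fires.length} new fires in region ${events.region}
- ${events.flights.length} military aircraft nearby
- ACLED reports ${events.conflicts.length} conflict events

Provide brief intelligence assessment and recommend actions.
  `.trim();

  // Local Ollama server on its default port
  const response = await fetch('http://localhost:11434/api/generate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'mistral',
      prompt: context,
      // Without stream: false, Ollama streams newline-delimited JSON
      // and response.json() below would fail to parse it
      stream: false
    })
  });
  if (!response.ok) {
    throw new Error(`Ollama request failed: HTTP ${response.status}`);
  }

  // The generated text is in the `response` field of the returned object
  return await response.json();
}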

The system can then route these briefs to Telegram or Discord webhooks, turning your intelligence platform into an active monitoring assistant. This is particularly powerful for traders monitoring economic indicators or researchers tracking environmental patterns—the LLM surfaces connections across domains you might miss staring at raw data.
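
Routing a brief to Discord could be as small as the sketch below. Discord webhooks accept a JSON body with a `content` field capped at 2,000 characters. The function names, the truncation strategy, and the injectable `fetchImpl` parameter are illustrative assumptions rather than Crucix's actual code.

```javascript
const DISCORD_MAX_CONTENT = 2000; // Discord's per-message content limit

// Build the webhook body, truncating overly long briefs with an ellipsis.
function buildDiscordPayload(brief) {
  const text = brief.length > DISCORD_MAX_CONTENT
    ? brief.slice(0, DISCORD_MAX_CONTENT - 1) + '…'
    : brief;
  return { content: text };
}

// POST the brief to a webhook URL. fetchImpl defaults to the built-in fetch
// and can be replaced with a stub in tests.
async function sendBrief(webhookUrl, brief, fetchImpl = fetch) {
  const response = await fetchImpl(webhookUrl, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildDiscordPayload(brief)),
  });
  if (!response.ok) {
    throw new Error(`Webhook returned HTTP ${response.status}`);
  }
}
```

A Telegram variant would follow the same pattern against the Bot API's `sendMessage` method, with `chat_id` and `text` in the body instead of `content`.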

Visualization uses WebGL via Three.js for the 3D globe, with a fallback Canvas2D renderer for the flat map. The dual-mode approach is pragmatic: the globe looks impressive and helps with spatial reasoning across continents, but the flat map performs better on older hardware and allows easier coordinate-based filtering. All rendering happens client-side from cached JSON, so the server holds no per-client view state and you can kill and restart the process without losing dashboard state.
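
Placing a record on the 3D globe means converting latitude and longitude into Cartesian coordinates on a sphere. The sketch below uses a convention common in Three.js scenes (Y axis up). The function name and the exact axis orientation are assumptions: the right orientation depends on how the globe texture is mapped.

```javascript
// Convert degrees of latitude/longitude to a point on a sphere (Y axis up).
function latLonToVector(lat, lon, radius = 1) {
  const phi = ((90 - lat) * Math.PI) / 180;    // polar angle from the north pole
  const theta = ((lon + 180) * Math.PI) / 180; // azimuth around the Y axis
  return {
    x: -radius * Math.sin(phi) * Math.cos(theta),
    y: radius * Math.cos(phi),
    z: radius * Math.sin(phi) * Math.sin(theta),
  };
}
```

The returned object can be copied straight into a mesh position, for example `marker.position.set(p.x, p.y, p.z)` for a hypothetical `marker` mesh.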

Gotcha

The 15-minute polling interval is both a feature and a limitation. For monitoring slow-moving phenomena such as deforestation, infrastructure changes, and economic trends, it's perfect. But if you're trying to track rapidly evolving situations like active military engagements or breaking news, your data can be up to 15 minutes stale. There's no streaming mode and no webhook ingestion for sources that support push notifications. The architecture is fundamentally pull-based.

API key management becomes tedious at scale. While some sources like GDELT are fully open, others require registration and rate-limited keys. ACLED requires a registered account, FIRMS needs a map key issued by NASA, ADS-B flight tracking needs paid access for complete coverage, and economic data sources often have freemium tiers that cap requests. You'll spend your first hour hunting down API keys before you see any data. The .env.example file lists 27 different environment variables, and there's no fallback for missing credentials: if a source fails due to auth issues, that entire feed goes dark until you fix it.
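
One way to soften this is to check at startup which credentials are present and skip sources that lack them, reporting what is missing instead of failing on every poll. The sketch below is a suggestion, not existing Crucix behavior. Only `FIRMS_KEY` appears in the article's example; the other source and variable names are placeholders.

```javascript
// Hypothetical registry mapping each source to the env variables it needs.
// Only FIRMS_KEY comes from the article; ADSB_KEY is a placeholder name.
const SOURCE_KEYS = {
  firms: ['FIRMS_KEY'],
  adsb: ['ADSB_KEY'],
  gdelt: [], // no credentials required
};

// Split sources into those that can run and those missing credentials.
function partitionSources(env, sourceKeys = SOURCE_KEYS) {
  const enabled = [];
  const disabled = {};
  for (const [name, keys] of Object.entries(sourceKeys)) {
    const missing = keys.filter((key) => !env[key]);
    if (missing.length === 0) enabled.push(name);
    else disabled[name] = missing;
  }
  return { enabled, disabled };
}
```

Calling `partitionSources(process.env)` at boot would let the dashboard show exactly which keys still need to be filled in.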

The single-server architecture means no high availability. If your Node process crashes (and with 27 external API dependencies, network issues are inevitable), monitoring stops. There's no built-in retry logic with exponential backoff, no circuit breakers for flaky sources, no leader election for running multiple instances. For personal use or small research teams, this is fine—you restart the process and move on. For anything mission-critical, you'd need to wrap it in systemd with auto-restart or build your own redundancy layer.
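
Retry with exponential backoff is straightforward to bolt on around individual fetchers. The sketch below is a minimal version under stated assumptions: `withRetry` and its option names are hypothetical, and a production version would likely add random jitter and a maximum delay.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Retry an async task, doubling the delay after each failed attempt.
// `wait` is injectable so tests can skip real timers.
async function withRetry(task, { retries = 3, baseDelayMs = 500, wait = sleep } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt += 1) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      if (attempt === retries) break; // out of attempts
      // Delays grow as baseDelayMs, 2x, 4x, ...
      await wait(baseDelayMs * 2 ** attempt);
    }
  }
  throw lastError;
}
```

A fetcher would then be wrapped as `withRetry(() => fetchFIRMS())`, so a transient network error costs a short delay rather than a missed polling cycle.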

Verdict

Use Crucix if you're an OSINT researcher, investigative journalist, geopolitical analyst, or quantitative trader who needs to monitor multiple global data sources from your own infrastructure without recurring SaaS costs or cloud vendor lock-in. It's exceptional for correlation analysis across domains: connecting wildfire data with wind patterns and flight diversions, or tracking conflict events alongside economic indicators and satellite imagery. The self-hosted model means you control the data, the refresh rates, and the LLM integration without external dependencies.

Skip it if you need sub-minute latency for time-critical alerting, lack the technical comfort to manage API keys and troubleshoot Node.js issues, or require enterprise-grade reliability with automatic failover. Also reconsider if you're monitoring fewer than 5-6 sources. At that scale, you're better off with individual dashboards or a lightweight Grafana setup rather than running a full intelligence aggregation platform.
