
Marinus: Mapping Your External Attack Surface with Open-Source Intelligence

Hook

Your organization probably has dozens of forgotten subdomains, expired certificates, and shadow IT assets exposed to the internet right now. The question is: will you discover them before an attacker does?

Context

Traditional asset management tools work from the inside out: they scan your internal networks, query your configuration management databases, and catalog what you already know you own. But this approach has a critical blind spot: it can't tell you what an external attacker sees when they point their reconnaissance tools at your domain. Enter the external attack surface problem.

Marinus takes the opposite approach. Instead of looking inward, it aggregates publicly available intelligence from the same sources attackers use: Certificate Transparency logs, DNS enumeration, port scanning services like Censys and Shodan, and threat intelligence feeds. Adobe built Marinus to answer a deceptively simple question: "What does our organization look like from the outside?" The tool systematically discovers subdomains, TLS certificates, IP addresses, and services that might have been spun up by a well-meaning developer for a customer demo, acquired through a merger, or simply forgotten after a project ended. For large enterprises with distributed engineering teams, this external perspective is often the only way to find shadow IT before it becomes a security incident.

Technical Insight

System architecture (summarized from the original diagram):

Data Collection Layer: Python cron scripts, seeded with your root domains, fetch DNS, certificate, and security data from third-party APIs (Censys, VirusTotal, CT logs) and recursively discover subdomains and IPs.

Storage Layer: MongoDB 4.x stores the normalized records.

Presentation Layer: a Node.js web server exposes REST APIs and a UI, optionally behind an nginx reverse proxy; users and analysts view footprint analysis over HTTPS.

Marinus operates as a three-tier architecture that separates data collection, storage, and presentation. The data collection layer consists of Python scripts designed to run as cron jobs, each targeting a specific third-party intelligence source. These scripts authenticate to services like Censys, VirusTotal, UltraDNS, and various Certificate Transparency log servers, pulling down records related to your registered domains and storing normalized results in MongoDB.
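Normalization is what makes the downstream queries possible: each collector maps its provider's response format into a flat, consistent document before writing. A minimal sketch of that step for a passive-DNS record, with illustrative field names rather than Marinus' actual schema:

```python
from datetime import datetime, timezone

def normalize_dns_record(raw, zone, source):
    """Map a raw passive-DNS API response into a flat, consistent document.

    `raw` is assumed to carry 'rrname', 'rrtype', and 'rdata' keys, as many
    passive-DNS APIs do; the mapping would be adjusted per provider.
    """
    return {
        'fqdn': raw['rrname'].rstrip('.').lower(),  # strip trailing dot, lowercase
        'type': raw['rrtype'],
        'value': raw['rdata'],
        'zone': zone,                # the root domain this record belongs to
        'source': source,            # which collector produced it
        'collected_at': datetime.now(timezone.utc),
    }
```

Stamping every document with its zone and source is what lets the web tier later filter heterogeneous collections with one uniform query shape.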

The collection strategy is recursive and deliberately comprehensive. You start by seeding Marinus with your organization’s root domains—say, example.com and example.net. The DNS enumeration scripts then query passive DNS databases and Certificate Transparency logs to discover subdomains like dev.example.com or forgotten-project.example.net. For each discovered subdomain, Marinus queries DNS records (A, AAAA, CNAME, MX, NS, TXT), IP geolocation, WHOIS data, and ASN information. The TLS certificate collectors pull certificates from CT logs and active scanning services, extracting Subject Alternative Names (SANs) that often reveal additional subdomains. Here’s a simplified example of how the certificate discovery process works:

# Simplified from Marinus' CT log collection
from datetime import datetime

import requests
from pymongo import MongoClient

def fetch_ct_certificates(domain, ct_log_url):
    """Query Certificate Transparency log for domain certificates"""
    params = {
        'identity': domain,
        'output': 'json',
        'expand': 'subject,san'
    }

    response = requests.get(f'{ct_log_url}/ct/v1/search', params=params, timeout=30)
    response.raise_for_status()
    certificates = response.json()

    discovered_domains = set()

    for cert in certificates:
        # Extract SANs (Subject Alternative Names), which often list
        # subdomains beyond the certificate's common name
        sans = cert.get('extensions', {}).get('san', [])
        for san in sans:
            # Match the apex or true subdomains only; a bare
            # endswith(domain) check would also match 'notexample.com'
            if san == domain or san.endswith('.' + domain):
                discovered_domains.add(san)

    return discovered_domains

def store_certificates(mongo_client, certificates):
    """Store discovered certificates in MongoDB with metadata"""
    db = mongo_client.marinus

    for cert_data in certificates:
        doc = {
            'domain': cert_data['domain'],
            'issuer': cert_data['issuer'],
            'not_before': cert_data['validity']['not_before'],
            'not_after': cert_data['validity']['not_after'],
            'san': cert_data.get('san', []),
            'fingerprint': cert_data['fingerprint'],
            'source': 'ct_log',
            'collected_at': datetime.utcnow()
        }

        # Upsert on fingerprint so repeated collection runs stay idempotent
        db.certificates.update_one(
            {'fingerprint': doc['fingerprint']},
            {'$set': doc},
            upsert=True
        )

This approach reveals infrastructure you might not know exists. A developer who created demo.example.com for a customer presentation six months ago probably got a Let’s Encrypt certificate, which was logged to CT servers. Marinus will find it, even if your internal CMDB has no record of it.

The Node.js web tier provides both a REST API and browser UI for querying this aggregated intelligence. The API layer uses Passport.js for authentication and includes Swagger documentation, making it straightforward to integrate Marinus data into your existing security workflows. The UI provides search, filtering, and visualization capabilities—you can search for all certificates expiring in the next 30 days, find all subdomains resolving to a specific IP range, or identify services running outdated TLS versions.
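A query like "all certificates expiring in the next 30 days" maps directly onto the stored documents. A sketch of the corresponding MongoDB filter, assuming a `certificates` collection shaped like the simplified collection code earlier (not necessarily Marinus' real schema):

```python
from datetime import datetime, timedelta, timezone

def expiring_cert_query(days=30, now=None):
    """Build a MongoDB filter for certificates expiring within `days` days.

    `now` is injectable for testing; production callers can omit it.
    """
    now = now or datetime.now(timezone.utc)
    return {'not_after': {'$gte': now, '$lte': now + timedelta(days=days)}}

# Usage with pymongo, assuming `db` is a connected Marinus database:
#   db.certificates.find(expiring_cert_query(30)).sort('not_after', 1)
```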

What makes Marinus architecturally interesting is its embrace of eventual consistency and batch processing. The system doesn’t try to provide real-time updates; instead, it acknowledges that external intelligence sources update on their own schedules (CT logs propagate within hours, Censys scans run weekly, passive DNS databases update continuously). Cron jobs run at configurable intervals—daily for most collectors, weekly for expensive API calls—and each execution is idempotent, updating existing records or inserting new ones. This design trades freshness for reliability and cost efficiency. You won’t catch an attacker in real-time, but you will spot configuration drift, policy violations, and forgotten assets that represent persistent risk.
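The idempotency contract is easy to express as an upsert keyed on a record's natural identity, the same pattern the certificate code above uses with fingerprints. A generic sketch, with illustrative field and collection names:

```python
from datetime import datetime, timezone

def idempotent_upsert_spec(record, key_fields):
    """Build a (filter, update) pair so re-running a collector refreshes
    existing documents instead of duplicating them.

    `key_fields` names the record's natural key, e.g. ('fqdn', 'type')
    for DNS records or ('fingerprint',) for certificates.
    """
    flt = {k: record[k] for k in key_fields}
    update = {'$set': {**record, 'collected_at': datetime.now(timezone.utc)}}
    return flt, update

# Usage inside a collector loop:
#   flt, update = idempotent_upsert_spec(rec, ('fqdn', 'type'))
#   db.dns_records.update_one(flt, update, upsert=True)
```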

The modular collector design is another strength. Each data source is an independent Python script with its own configuration. If you don’t have a Censys API key, simply don’t enable those collectors. If you only care about certificate management, run just the CT log scripts. This modularity extends to MongoDB schema design—each data source writes to its own collection with a consistent metadata wrapper (source, timestamp, zone/domain), allowing the web tier to federate queries across heterogeneous data types without complex joins.
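Because every collection shares that metadata wrapper, a "federated" lookup is just the same filter fanned out across collections. A sketch of what the web tier's query path might look like (collection names are illustrative):

```python
def search_zone(db, zone, collections=('dns_records', 'certificates', 'whois')):
    """Run one zone filter across heterogeneous collections without joins.

    Works because every collector stamps 'zone' (and 'source') onto its
    documents, regardless of what else each document contains.
    """
    results = {}
    for name in collections:
        results[name] = list(db[name].find({'zone': zone}))
    return results
```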

Gotcha

Marinus carries significant technical debt that will impact your deployment experience. The repository includes both Python 2.x and Python 3.x scripts, with some critical collectors still running on Python 2—a language that reached end-of-life in January 2020. Migrating these scripts to Python 3 will likely be your first task, which means auditing dependencies, testing API integrations, and potentially debugging undocumented behaviors. Adobe has clearly been through this migration internally, but the public repository reflects an awkward transition state.

The operational overhead extends beyond Python versions. Making Marinus useful requires obtaining and configuring API credentials for potentially dozens of services: Censys, Shodan, VirusTotal, PassiveTotal, SecurityTrails, CloudFlare, Infoblox, and more. Many of these services have usage limits on free tiers that make them impractical for large-scale monitoring. A single Censys search query returning 10,000 results could consume your monthly allocation in minutes. Budget for commercial API tiers if you’re monitoring more than a handful of domains. The repository includes configuration templates, but expect several days of credential hunting and quota negotiation before your first successful collection run. Additionally, the low GitHub star count (62) and limited commit activity suggest this isn’t a thriving open-source community. You’ll be largely on your own for troubleshooting and updates.
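Quota pressure like this means collectors need to paginate deliberately rather than pull everything at once. A generic throttled-pagination sketch (not Marinus' actual client; the fetch callable and delay are assumptions):

```python
import time

def throttled_pages(fetch_page, max_pages, delay=1.0, sleep=time.sleep):
    """Pull paginated API results with a fixed pause between requests.

    `fetch_page(page)` is any callable returning a list of results (empty
    when exhausted); `delay` spaces requests out to stay under rate limits.
    `sleep` is injectable so tests can run without waiting.
    """
    results = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:
            break  # provider has no more results; stop spending quota
        results.extend(batch)
        sleep(delay)
    return results
```

Capping `max_pages` is the simplest guard against a single broad query burning a month's allocation.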

Verdict

Use if: You're responsible for external attack surface management at a large enterprise with distributed teams, need to discover shadow IT and forgotten infrastructure, want an "attacker's eye view" of your public footprint, and have the resources to manage multiple API integrations plus Python 2.x deprecation issues. Marinus excels at aggregating disparate public intelligence sources into a queryable database, making it valuable for compliance reporting, certificate lifecycle management, and discovering acquisition-related infrastructure sprawl.

Skip if: You need real-time threat detection, lack budget for commercial API tiers across multiple services, want a low-maintenance solution with active community support, or already have comprehensive external asset visibility through commercial platforms like Censys ASM or Detectify.

The Python version split and integration complexity make this a tool for teams with strong DevOps capabilities and clear use cases, not organizations looking for turnkey attack surface monitoring.
