
Marinus: Mapping Your External Attack Surface with Open-Source Intelligence

Hook

Your organization probably has dozens of forgotten subdomains, expired certificates, and shadow IT assets exposed to the internet right now. The question is: will you discover them before an attacker does?

Context

Traditional asset management tools work from the inside out: they scan your internal networks, query your configuration management databases, and catalog what you already know you own. But this approach has a critical blind spot: it can't tell you what an external attacker sees when they point their reconnaissance tools at your domain. Enter the external attack surface problem.

Marinus takes the opposite approach. Instead of looking inward, it aggregates publicly available intelligence from the same sources attackers use: Certificate Transparency logs, DNS enumeration, port scanning services like Censys and Shodan, and threat intelligence feeds. Adobe built Marinus to answer a deceptively simple question: "What does our organization look like from the outside?" The tool systematically discovers subdomains, TLS certificates, IP addresses, and services that might have been spun up by a well-meaning developer for a customer demo, acquired through a merger, or simply forgotten after a project ended. For large enterprises with distributed engineering teams, this external perspective is often the only way to find shadow IT before it becomes a security incident.

Technical Insight

System architecture (summarized from the original diagram):

Data Collection Layer: Python cron scripts, seeded with your root domains, fetch DNS, certificate, and security data from third-party APIs (Censys, VirusTotal, CT logs) and recursively discover subdomains and IPs.

Storage Layer: MongoDB 4.x stores the normalized records.

Presentation Layer: a Node.js web server exposes REST APIs and a UI, optionally behind an nginx reverse proxy; users and analysts view footprint analysis over HTTPS.

Marinus operates as a three-tier architecture that separates data collection, storage, and presentation. The data collection layer consists of Python scripts designed to run as cron jobs, each targeting a specific third-party intelligence source. These scripts authenticate to services like Censys, VirusTotal, UltraDNS, and various Certificate Transparency log servers, pulling down records related to your registered domains and storing normalized results in MongoDB.
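Normalization is what makes the downstream queries possible: each collector maps its provider's response format into a flat, consistent document before writing. A minimal sketch of that step for a passive-DNS record, with illustrative field names rather than Marinus' actual schema:

```python
from datetime import datetime, timezone

def normalize_dns_record(raw, zone, source):
    """Map a raw passive-DNS API response into a flat, consistent document.

    `raw` is assumed to carry 'rrname', 'rrtype', and 'rdata' keys, as many
    passive-DNS APIs do; the mapping would be adjusted per provider.
    """
    return {
        'fqdn': raw['rrname'].rstrip('.').lower(),  # strip trailing dot, lowercase
        'type': raw['rrtype'],
        'value': raw['rdata'],
        'zone': zone,                # the root domain this record belongs to
        'source': source,            # which collector produced it
        'collected_at': datetime.now(timezone.utc),
    }
```

Stamping every document with its zone and source is what lets the web tier later filter heterogeneous collections with one uniform query shape.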

The collection strategy is recursive and deliberately comprehensive. You start by seeding Marinus with your organization’s root domains—say, example.com and example.net. The DNS enumeration scripts then query passive DNS databases and Certificate Transparency logs to discover subdomains like dev.example.com or forgotten-project.example.net. For each discovered subdomain, Marinus queries DNS records (A, AAAA, CNAME, MX, NS, TXT), IP geolocation, WHOIS data, and ASN information. The TLS certificate collectors pull certificates from CT logs and active scanning services, extracting Subject Alternative Names (SANs) that often reveal additional subdomains. Here’s a simplified example of how the certificate discovery process works:

# Simplified from Marinus' CT log collection
from datetime import datetime

import requests
from pymongo import MongoClient

def fetch_ct_certificates(domain, ct_log_url):
    """Query Certificate Transparency log for domain certificates"""
    params = {
        'identity': domain,
        'output': 'json',
        'expand': 'subject,san'
    }

    response = requests.get(f'{ct_log_url}/ct/v1/search', params=params, timeout=30)
    response.raise_for_status()
    certificates = response.json()

    discovered_domains = set()

    for cert in certificates:
        # Extract SANs (Subject Alternative Names), which often list
        # subdomains beyond the certificate's common name
        sans = cert.get('extensions', {}).get('san', [])
        for san in sans:
            # Match the apex or true subdomains only; a bare
            # endswith(domain) check would also match 'notexample.com'
            if san == domain or san.endswith('.' + domain):
                discovered_domains.add(san)

    return discovered_domains

def store_certificates(mongo_client, certificates):
    """Store discovered certificates in MongoDB with metadata"""
    db = mongo_client.marinus

    for cert_data in certificates:
        doc = {
            'domain': cert_data['domain'],
            'issuer': cert_data['issuer'],
            'not_before': cert_data['validity']['not_before'],
            'not_after': cert_data['validity']['not_after'],
            'san': cert_data.get('san', []),
            'fingerprint': cert_data['fingerprint'],
            'source': 'ct_log',
            'collected_at': datetime.utcnow()
        }

        # Upsert on fingerprint so repeated collection runs stay idempotent
        db.certificates.update_one(
            {'fingerprint': doc['fingerprint']},
            {'$set': doc},
            upsert=True
        )

This approach reveals infrastructure you might not know exists. A developer who created demo.example.com for a customer presentation six months ago probably got a Let’s Encrypt certificate, which was logged to CT servers. Marinus will find it, even if your internal CMDB has no record of it.

The Node.js web tier provides both a REST API and browser UI for querying this aggregated intelligence. The API layer uses Passport.js for authentication and includes Swagger documentation, making it straightforward to integrate Marinus data into your existing security workflows. The UI provides search, filtering, and visualization capabilities—you can search for all certificates expiring in the next 30 days, find all subdomains resolving to a specific IP range, or identify services running outdated TLS versions.
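A query like "all certificates expiring in the next 30 days" maps directly onto the stored documents. A sketch of the corresponding MongoDB filter, assuming a `certificates` collection shaped like the simplified collection code earlier (not necessarily Marinus' real schema):

```python
from datetime import datetime, timedelta, timezone

def expiring_cert_query(days=30, now=None):
    """Build a MongoDB filter for certificates expiring within `days` days.

    `now` is injectable for testing; production callers can omit it.
    """
    now = now or datetime.now(timezone.utc)
    return {'not_after': {'$gte': now, '$lte': now + timedelta(days=days)}}

# Usage with pymongo, assuming `db` is a connected Marinus database:
#   db.certificates.find(expiring_cert_query(30)).sort('not_after', 1)
```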

What makes Marinus architecturally interesting is its embrace of eventual consistency and batch processing. The system doesn’t try to provide real-time updates; instead, it acknowledges that external intelligence sources update on their own schedules (CT logs propagate within hours, Censys scans run weekly, passive DNS databases update continuously). Cron jobs run at configurable intervals—daily for most collectors, weekly for expensive API calls—and each execution is idempotent, updating existing records or inserting new ones. This design trades freshness for reliability and cost efficiency. You won’t catch an attacker in real-time, but you will spot configuration drift, policy violations, and forgotten assets that represent persistent risk.
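The idempotency contract is easy to express as an upsert keyed on a record's natural identity, the same pattern the certificate code above uses with fingerprints. A generic sketch, with illustrative field and collection names:

```python
from datetime import datetime, timezone

def idempotent_upsert_spec(record, key_fields):
    """Build a (filter, update) pair so re-running a collector refreshes
    existing documents instead of duplicating them.

    `key_fields` names the record's natural key, e.g. ('fqdn', 'type')
    for DNS records or ('fingerprint',) for certificates.
    """
    flt = {k: record[k] for k in key_fields}
    update = {'$set': {**record, 'collected_at': datetime.now(timezone.utc)}}
    return flt, update

# Usage inside a collector loop:
#   flt, update = idempotent_upsert_spec(rec, ('fqdn', 'type'))
#   db.dns_records.update_one(flt, update, upsert=True)
```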

The modular collector design is another strength. Each data source is an independent Python script with its own configuration. If you don’t have a Censys API key, simply don’t enable those collectors. If you only care about certificate management, run just the CT log scripts. This modularity extends to MongoDB schema design—each data source writes to its own collection with a consistent metadata wrapper (source, timestamp, zone/domain), allowing the web tier to federate queries across heterogeneous data types without complex joins.
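Because every collection shares that metadata wrapper, a "federated" lookup is just the same filter fanned out across collections. A sketch of what the web tier's query path might look like (collection names are illustrative):

```python
def search_zone(db, zone, collections=('dns_records', 'certificates', 'whois')):
    """Run one zone filter across heterogeneous collections without joins.

    Works because every collector stamps 'zone' (and 'source') onto its
    documents, regardless of what else each document contains.
    """
    results = {}
    for name in collections:
        results[name] = list(db[name].find({'zone': zone}))
    return results
```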

Gotcha

Marinus carries significant technical debt that will impact your deployment experience. The repository includes both Python 2.x and Python 3.x scripts, with some critical collectors still running on Python 2—a language that reached end-of-life in January 2020. Migrating these scripts to Python 3 will likely be your first task, which means auditing dependencies, testing API integrations, and potentially debugging undocumented behaviors. Adobe has clearly been through this migration internally, but the public repository reflects an awkward transition state.

The operational overhead extends beyond Python versions. Making Marinus useful requires obtaining and configuring API credentials for potentially dozens of services: Censys, Shodan, VirusTotal, PassiveTotal, SecurityTrails, CloudFlare, Infoblox, and more. Many of these services have usage limits on free tiers that make them impractical for large-scale monitoring. A single Censys search query returning 10,000 results could consume your monthly allocation in minutes. Budget for commercial API tiers if you’re monitoring more than a handful of domains. The repository includes configuration templates, but expect several days of credential hunting and quota negotiation before your first successful collection run. Additionally, the low GitHub star count (62) and limited commit activity suggest this isn’t a thriving open-source community. You’ll be largely on your own for troubleshooting and updates.
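Quota pressure like this means collectors need to paginate deliberately rather than pull everything at once. A generic throttled-pagination sketch (not Marinus' actual client; the fetch callable and delay are assumptions):

```python
import time

def throttled_pages(fetch_page, max_pages, delay=1.0, sleep=time.sleep):
    """Pull paginated API results with a fixed pause between requests.

    `fetch_page(page)` is any callable returning a list of results (empty
    when exhausted); `delay` spaces requests out to stay under rate limits.
    `sleep` is injectable so tests can run without waiting.
    """
    results = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:
            break  # provider has no more results; stop spending quota
        results.extend(batch)
        sleep(delay)
    return results
```

Capping `max_pages` is the simplest guard against a single broad query burning a month's allocation.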

Verdict

Use if: You're responsible for external attack surface management at a large enterprise with distributed teams, need to discover shadow IT and forgotten infrastructure, want an "attacker's eye view" of your public footprint, and have the resources to manage multiple API integrations plus Python 2.x deprecation issues. Marinus excels at aggregating disparate public intelligence sources into a queryable database, making it valuable for compliance reporting, certificate lifecycle management, and discovering acquisition-related infrastructure sprawl.

Skip if: You need real-time threat detection, lack budget for commercial API tiers across multiple services, want a low-maintenance solution with active community support, or already have comprehensive external asset visibility through commercial platforms like Censys ASM or Detectify.

The Python version split and integration complexity make this a tool for teams with strong DevOps capabilities and clear use cases, not organizations looking for turnkey attack surface monitoring.
