Cartography: Mapping Your Multi-Cloud Infrastructure as a Security Graph

Hook

Your infrastructure has secrets: that dormant Lambda function with cross-account IAM access, the S3 bucket shared between three microservices nobody remembers creating, the Okta user with admin rights to both production AWS and your Kubernetes clusters. Traditional cloud consoles show you resources, but they hide the relationships that matter for security.

Context

Modern infrastructure sprawls across multiple clouds, identity providers, SaaS platforms, and container orchestrators. An organization might have AWS accounts, Azure subscriptions, GCP projects, Kubernetes clusters, Okta tenants, GitHub organizations, and AI platforms like OpenAI or Anthropic—all with overlapping users, permissions, and dependencies. When a security team asks “which developers can access production databases?” or “what would an attacker reach if they compromised this service account?”, answering requires manually correlating data from dozens of separate systems, each with its own API, data model, and query language.

Cartography, now part of the CNCF ecosystem with 3,791 GitHub stars, addresses this fragmentation by pulling infrastructure metadata from approximately 30 platforms and consolidating it into a single Neo4j graph database. Instead of treating your EC2 instances, IAM roles, and RDS databases as isolated lists, Cartography models them as nodes connected by relationships: this instance assumes that role, which has permissions on that database. This graph-centric approach transforms infrastructure from a collection of API responses into a queryable knowledge base where security assumptions can be validated through graph traversal queries.
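To make the nodes-and-relationships idea concrete, here is a minimal in-memory sketch in Python, Cartography's own language. A plain adjacency list stands in for Neo4j. The labels (EC2Instance, IAMRole, RDSInstance) and relationship types (ASSUMES, HAS_PERMISSION_ON) are chosen for illustration and are not claimed to match Cartography's actual schema.

```python
from dataclasses import dataclass, field

# Illustrative labels and relationship types only; not Cartography's schema.

@dataclass(frozen=True)
class Node:
    label: str  # e.g. "EC2Instance"
    id: str     # stable identifier such as an instance ID or ARN

@dataclass
class Graph:
    # Adjacency list: node -> list of (relationship type, target node)
    edges: dict = field(default_factory=dict)

    def add_edge(self, src: Node, rel: str, dst: Node) -> None:
        self.edges.setdefault(src, []).append((rel, dst))

    def neighbors(self, src: Node, rel: str) -> list:
        # Follow only outgoing edges of the given relationship type.
        return [dst for r, dst in self.edges.get(src, []) if r == rel]

g = Graph()
instance = Node("EC2Instance", "i-0abc")
role = Node("IAMRole", "arn:aws:iam::111111111111:role/app")
db = Node("RDSInstance", "orders-db")

# "this instance assumes that role, which has permissions on that database"
g.add_edge(instance, "ASSUMES", role)
g.add_edge(role, "HAS_PERMISSION_ON", db)

# Two-hop question, roughly what a Cypher pattern like
# (i)-[:ASSUMES]->(r)-[:HAS_PERMISSION_ON]->(d) expresses:
reachable_dbs = [
    target
    for r in g.neighbors(instance, "ASSUMES")
    for target in g.neighbors(r, "HAS_PERMISSION_ON")
]
print([n.id for n in reachable_dbs])  # ['orders-db']
```

With isolated resource lists, answering the same question means joining three separate API outputs by hand. In a graph, it becomes a short traversal.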

Technical Insight

[Figure: System architecture. The Cartography CLI runs an extract module that pulls raw API data from its sources (AWS APIs via boto3, Azure APIs, Kubernetes APIs, GitHub APIs, Okta APIs). A transform-and-schema stage standardizes that data into graph nodes and edges, and a Neo4j loader writes the relationships into the Neo4j graph database, which then serves Cypher queries and analysis.]

Cartography operates as a batch synchronization tool with a modular architecture. Each supported platform (AWS, Azure, Kubernetes, Okta, GitHub, and others) has its own Python module that handles API authentication, data extraction, transformation into a standardized schema, and loading into Neo4j. Cartography does not schedule itself: you run it periodically (for example from cron, a Kubernetes CronJob, or a CI/CD pipeline), and each run snapshots your infrastructure state at that moment.
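The per-platform module layout can be sketched as a shared extract-transform-load driver. This is a hypothetical interface written for illustration, not Cartography's internal module API: `PlatformModule`, `load`, and `run_sync` are invented names, canned lists stand in for real API calls, and a dict stands in for the Neo4j loader.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interface: each platform module supplies its own extract and
# transform steps; one shared loader writes the standardized records.

@dataclass
class PlatformModule:
    name: str
    extract: Callable[[], list[dict]]              # call the platform's API
    transform: Callable[[list[dict]], list[dict]]  # map to a shared schema

def load(store: dict, records: list[dict]) -> None:
    # Stand-in for the Neo4j loader: upsert records keyed by their id.
    for rec in records:
        store[rec["id"]] = rec

def run_sync(modules: list[PlatformModule], store: dict) -> None:
    # One batch run: every module goes through the same pipeline in turn.
    for module in modules:
        raw = module.extract()
        load(store, module.transform(raw))

# Two fake modules with canned responses in place of real API calls.
okta = PlatformModule(
    name="okta",
    extract=lambda: [{"profile": {"login": "alice@example.com"}}],
    transform=lambda raw: [
        {"id": u["profile"]["login"], "label": "User", "source": "okta"} for u in raw
    ],
)
github = PlatformModule(
    name="github",
    extract=lambda: [{"login": "alice-gh"}],
    transform=lambda raw: [
        {"id": u["login"], "label": "User", "source": "github"} for u in raw
    ],
)

graph_store: dict = {}
run_sync([okta, github], graph_store)
print(sorted(graph_store))  # ['alice-gh', 'alice@example.com']
```

Keeping the loader shared is what lets records from very different APIs land in one consistent graph; only extraction and transformation are platform-specific.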

The core workflow follows an extract-transform-load pattern. For AWS, Cartography uses boto3 to enumerate resources across services like EC2, IAM, S3, RDS, Lambda, ECS, EKS, and many others listed in the documentation. It doesn’t just collect the resources themselves—it maps the relationships between them. An EC2 instance becomes a node with edges to its security groups, IAM instance profile, VPC, and any attached EBS volumes. An IAM role connects to the principals that can assume it, the policies attached to it, and the resources those policies grant access to.
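The transform step for a single EC2 instance can be sketched as follows. The input follows the response shape of the EC2 `DescribeInstances` API as returned by boto3 (`Reservations` → `Instances`, with `VpcId`, `SecurityGroups`, `IamInstanceProfile`, and `BlockDeviceMappings`). The output labels and relationship types are hypothetical, and a canned dict replaces the live boto3 call so the sketch runs offline.

```python
# Sketch: turn an EC2 describe_instances response into node and relationship
# records. Output labels and relationship names are hypothetical.

def transform_instances(response: dict) -> tuple[list[dict], list[tuple[str, str, str]]]:
    nodes: list[dict] = []
    rels: list[tuple[str, str, str]] = []  # (source id, relationship, target id)
    for reservation in response.get("Reservations", []):
        for inst in reservation.get("Instances", []):
            iid = inst["InstanceId"]
            nodes.append({
                "id": iid,
                "label": "EC2Instance",
                "state": inst.get("State", {}).get("Name"),
            })
            if "VpcId" in inst:
                rels.append((iid, "MEMBER_OF_VPC", inst["VpcId"]))
            for sg in inst.get("SecurityGroups", []):
                rels.append((iid, "MEMBER_OF_SECURITY_GROUP", sg["GroupId"]))
            profile = inst.get("IamInstanceProfile")
            if profile:
                rels.append((iid, "HAS_INSTANCE_PROFILE", profile["Arn"]))
            for mapping in inst.get("BlockDeviceMappings", []):
                if "Ebs" in mapping:
                    rels.append((iid, "HAS_EBS_VOLUME", mapping["Ebs"]["VolumeId"]))
    return nodes, rels

# Canned response; in practice you would page through
# boto3.client("ec2").get_paginator("describe_instances").
sample = {
    "Reservations": [{
        "Instances": [{
            "InstanceId": "i-0abc",
            "State": {"Name": "running"},
            "VpcId": "vpc-123",
            "SecurityGroups": [{"GroupId": "sg-1", "GroupName": "web"}],
            "IamInstanceProfile": {"Arn": "arn:aws:iam::111111111111:instance-profile/app"},
            "BlockDeviceMappings": [{"DeviceName": "/dev/xvda", "Ebs": {"VolumeId": "vol-9"}}],
        }]
    }]
}
nodes, rels = transform_instances(sample)
print(rels)
```

A single API record yields one node and four edges. Those edges are the part a console listing does not show you.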

You run Cartography through its command-line interface. A typical deployment involves installing the tool, setting up Neo4j, configuring credentials for each platform you want to sync, and running the sync command:

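# Install Cartography
pip install cartography

# Sync only the AWS module into a local Neo4j instance. AWS credentials come from
# the standard boto3 credential chain (environment variables, ~/.aws files, or an
# instance role). Other platforms are selected with the same flag and take their
# own credential flags. If Neo4j requires auth, add --neo4j-user and
# --neo4j-password-env-var.
cartography --neo4j-uri bolt://localhost:7687 --selected-modules aws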

After the sync completes, you can query the Neo4j graph using Cypher to answer security questions. For example, you might query for relationships between IAM roles and resources, or trace paths from external identities to sensitive data stores. The power emerges when you query across platform boundaries—Cartography can map an Okta user to their AWS roles, then to EC2 instances, then to RDS databases those instances connect to.
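A cross-platform question like "which Okta users can reach a production database?" is a fixed-length path pattern. The sketch below shows what such a pattern match does, using an in-memory edge list. Node IDs and relationship types (CAN_ASSUME, CAN_ACCESS, CONNECTS_TO) are hypothetical; against a real Cartography graph you would write the equivalent Cypher using the labels from its schema documentation.

```python
from collections import defaultdict

# Hypothetical cross-platform edges; names are illustrative only.
edges = defaultdict(list)  # source -> [(relationship, target)]

def add(src: str, rel: str, dst: str) -> None:
    edges[src].append((rel, dst))

add("okta:alice", "CAN_ASSUME", "aws:role/prod-admin")
add("okta:bob", "CAN_ASSUME", "aws:role/readonly")
add("aws:role/prod-admin", "CAN_ACCESS", "ec2:i-0abc")
add("aws:role/readonly", "CAN_ACCESS", "ec2:i-0def")
add("ec2:i-0abc", "CONNECTS_TO", "rds:orders-db")

def match_paths(start: str, pattern: list[str]) -> list[list[str]]:
    """Return every path from `start` that follows the relationship types in
    `pattern` in order, similar in spirit to a Cypher pattern such as
    (u)-[:CAN_ASSUME]->(r)-[:CAN_ACCESS]->(i)-[:CONNECTS_TO]->(d)."""
    paths = [[start]]
    for rel in pattern:
        # Extend each partial path by one hop along the required relationship.
        paths = [
            path + [dst]
            for path in paths
            for r, dst in edges[path[-1]]
            if r == rel
        ]
    return paths

pattern = ["CAN_ASSUME", "CAN_ACCESS", "CONNECTS_TO"]
for user in ("okta:alice", "okta:bob"):
    print(user, match_paths(user, pattern))
```

Alice matches one full path to `rds:orders-db`; Bob matches none, because the instance his role reaches has no database edge.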

Cartography is extensible through custom analysis jobs. According to the README, you can write analysis jobs that run after a sync completes; the extended documentation covers how to implement them. The graph schema is also documented, and because the data lives in Neo4j, you can explore it interactively with Neo4j visualization tools such as Neo4j Browser or Bloom.
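The idea behind a post-sync analysis job is to derive facts that no single API response contains. The sketch below is a hypothetical job written in plain Python: it flags instances whose security groups allow ingress from any address. Cartography's own jobs run against the Neo4j graph, and their format is not shown here. The data shapes, the `exposed_internet` property, and the rule are all assumptions for illustration.

```python
# Hypothetical post-sync analysis job over already-loaded data.

security_groups = {
    "sg-web": {"ingress": [{"cidr": "0.0.0.0/0", "port": 443}]},
    "sg-db": {"ingress": [{"cidr": "10.0.0.0/16", "port": 5432}]},
}
instances = {
    "i-web": {"security_groups": ["sg-web"]},
    "i-db": {"security_groups": ["sg-db"]},
}

def open_to_internet(group: dict) -> bool:
    # Treat an IPv4 or IPv6 "any address" ingress rule as internet-facing.
    return any(rule["cidr"] in ("0.0.0.0/0", "::/0") for rule in group["ingress"])

def run_exposure_job(instances: dict, security_groups: dict) -> None:
    # Enrich the data in place with a derived property, the way a
    # post-sync job adds facts computed from several loaded resources.
    for inst in instances.values():
        inst["exposed_internet"] = any(
            open_to_internet(security_groups[sg]) for sg in inst["security_groups"]
        )

run_exposure_job(instances, security_groups)
print({iid: i["exposed_internet"] for iid, i in instances.items()})
# {'i-web': True, 'i-db': False}
```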

Gotcha

Cartography’s batch synchronization model means your graph is always somewhat stale. If you run it hourly, infrastructure changes made between runs won’t appear until the next execution. This isn’t real-time monitoring; it’s periodic snapshotting. If you need immediate alerts on misconfigurations, you’ll need complementary tools that react to CloudTrail events or API changes in real time.
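Snapshot-style sync tools typically reconcile deletions with an update tag: each run stamps every node it observes with the run's tag, then removes nodes still carrying an older tag. To our understanding Cartography follows this pattern (nodes carry a `lastupdated` value that cleanup jobs compare against the current run's tag), but the sketch below is a simplified, hypothetical model. It shows why a resource deleted right after one run lingers in the graph until the next.

```python
# Simplified update-tag reconciliation; integer tags stand in for timestamps.

def sync(graph: dict, observed_ids: list[str], update_tag: int) -> None:
    # Load phase: upsert everything seen in this run with the current tag.
    for node_id in observed_ids:
        graph[node_id] = {"lastupdated": update_tag}
    # Cleanup phase: anything not refreshed by this run is considered gone.
    stale = [nid for nid, props in graph.items() if props["lastupdated"] < update_tag]
    for nid in stale:
        del graph[nid]

graph: dict = {}
sync(graph, ["bucket-a", "bucket-b"], update_tag=1)  # hourly run #1
# bucket-b is deleted right after run #1, but the graph keeps listing it
# until run #2: that gap is the staleness window.
print(sorted(graph))  # ['bucket-a', 'bucket-b']
sync(graph, ["bucket-a"], update_tag=2)              # hourly run #2
print(sorted(graph))  # ['bucket-a']
```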

Operating Neo4j adds non-trivial infrastructure overhead. You need to provision compute and storage for the database, manage backups, handle upgrades, and potentially cluster it for high availability. For large AWS organizations with hundreds of accounts, the graph can grow to millions of nodes and edges, requiring careful capacity planning and index tuning. Neo4j isn’t a lightweight dependency—it’s a full graph database with its own operational complexity.

There’s also a learning curve: teams must learn the Cypher query language and understand Cartography’s schema for each platform. The schema documentation helps, but expect to spend time learning how Cartography models relationships across its roughly 30 supported platforms, such as AWS IAM policy evaluation, Kubernetes RBAC bindings, or Okta federation.
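Kubernetes RBAC shows why these models take effort to learn: a subject's permissions are never stored on the subject itself but emerge from bindings that point at roles. The sketch below resolves namespace-scoped Role and RoleBinding semantics in plain Python. It deliberately ignores ClusterRoles, ClusterRoleBindings, subject kinds, and resource names, and its data layout is hypothetical rather than how Cartography stores RBAC in its graph.

```python
# Simplified namespace-scoped RBAC: a RoleBinding grants a Role's rules
# to the listed subjects within one namespace.

roles = {
    ("payments", "pod-reader"): [{"verbs": {"get", "list"}, "resources": {"pods"}}],
    ("payments", "secret-admin"): [{"verbs": {"*"}, "resources": {"secrets"}}],
}
role_bindings = [
    {"namespace": "payments", "role": "pod-reader", "subjects": ["alice", "bob"]},
    {"namespace": "payments", "role": "secret-admin", "subjects": ["bob"]},
]

def can(subject: str, verb: str, resource: str, namespace: str) -> bool:
    # Walk subject -> binding -> role -> rule, the same hops a graph query follows.
    for binding in role_bindings:
        if binding["namespace"] != namespace or subject not in binding["subjects"]:
            continue
        for rule in roles[(namespace, binding["role"])]:
            verb_ok = "*" in rule["verbs"] or verb in rule["verbs"]
            if verb_ok and resource in rule["resources"]:
                return True
    return False

print(can("alice", "get", "secrets", "payments"))   # False
print(can("bob", "delete", "secrets", "payments"))  # True
```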

Verdict

Use Cartography if you operate multi-cloud or multi-platform infrastructure and need to answer complex security questions that span platform boundaries, such as “which external identities can reach our production data stores?” or “what’s the blast radius if this service account is compromised?” It suits organizations with mature security teams that can invest in learning graph queries and operating Neo4j, and where the value of cross-platform relationship analysis justifies that investment. The tool is used in production at many companies and benefits from CNCF backing.
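A blast-radius question is, underneath, a reachability search from the compromised identity. In Cypher you would typically express it with a variable-length pattern such as `-[*1..4]->`. The sketch below shows the breadth-first search that such a query amounts to, over a hypothetical access graph whose node names are invented for illustration. The optional `max_hops` parameter mirrors bounding the path length.

```python
from collections import deque
from typing import Optional

# Hypothetical access graph: node -> nodes it can reach directly
# (assume a role, read a bucket, run as a task role, ...).
access = {
    "sa:ci-deployer": ["role:deploy"],
    "role:deploy": ["ecs:api-service", "s3:artifacts"],
    "ecs:api-service": ["role:api-task"],
    "role:api-task": ["rds:orders-db"],
    "s3:artifacts": [],
    "rds:orders-db": [],
}

def blast_radius(start: str, max_hops: Optional[int] = None) -> dict[str, int]:
    """Breadth-first search returning each reachable node and its hop distance."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if max_hops is not None and dist[node] >= max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in access.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    del dist[start]  # the compromised node itself is not part of the radius
    return dist

print(blast_radius("sa:ci-deployer"))
# {'role:deploy': 1, 'ecs:api-service': 2, 's3:artifacts': 2, 'role:api-task': 3, 'rds:orders-db': 4}
```

Hop distance matters in practice: a database four hops away through a task role is still in the blast radius, even though no single console page connects the service account to it.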

Skip it if you’re operating primarily within a single cloud provider where native tools like AWS Security Hub or Microsoft Defender for Cloud (formerly Azure Security Center) provide sufficient visibility, if you need real-time alerting rather than periodic analysis, or if you lack the resources to run and maintain a Neo4j database. Also skip it if your team isn’t comfortable with graph databases: the learning curve is real, and simpler SQL-based asset inventory tools might provide adequate visibility with less cognitive overhead.
