Mapping the Cloud: How to Track 50 Million IP Addresses Across AWS, Azure, and GCP

Hook

Every hour, cloud providers silently add or remove thousands of IP addresses from their infrastructure. If your firewall rules or threat detection systems aren't tracking these changes, you're either blocking legitimate traffic or leaving backdoors open.

Context

Modern infrastructure teams face a deceptively simple problem: identifying whether an IP address belongs to AWS, Azure, GCP, or another major cloud provider. This matters enormously for security policies—you might want to whitelist connections from your cloud infrastructure, block suspicious requests originating from cloud VPS providers commonly used for attacks, or audit whether shadow IT has spun up resources in unauthorized regions.

Historically, this required maintaining bookmarks to each provider's documentation page, manually downloading JSON or XML files, and writing custom parsers for each format. AWS publishes a JSON endpoint. Azure uses a weekly-updated JSON file buried in the Microsoft Download Center. GCP has exposed its ranges through DNS TXT records and also publishes a JSON file. Oracle uses yet another structure. The cloud-ranges repository solves this fragmentation by providing a single Ruby entry point that fetches, normalizes, and outputs IP ranges from all major providers in consistent formats. It's the kind of unsexy plumbing code that spares security teams from writing and maintaining that glue themselves.

Technical Insight

The architecture is deliberately simple: a collection of Ruby scripts that make HTTP requests to official provider endpoints, parse the responses, and write standardized output. The main entry point is fetch.rb, which orchestrates calls to provider-specific fetchers in the lib/ directory. Each provider gets its own module that handles the peculiarities of that vendor's data format.
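To make the orchestration idea concrete, here is a minimal sketch of how an entry point could loop over provider modules. It is not the repository's actual fetch.rb: `collect_ranges`, `StubProviderA`, and `StubProviderB` are hypothetical names, and the stubs stand in for real fetchers so the example runs without network access.

```ruby
# Hypothetical stand-ins for provider modules; real ones would make HTTP requests.
module StubProviderA
  def self.fetch
    [{ ip: '192.0.2.0/24', region: 'example-1', service: 'compute' }]
  end
end

module StubProviderB
  def self.fetch
    raise IOError, 'endpoint unreachable'
  end
end

# Calls fetch on every provider, tags each range with the provider name,
# and records failures per provider instead of aborting the whole run.
def collect_ranges(providers)
  ranges = []
  errors = {}
  providers.each do |name, mod|
    begin
      mod.fetch.each { |range| ranges << range.merge(provider: name) }
    rescue StandardError => e
      errors[name] = e.message
    end
  end
  [ranges, errors]
end

ranges, errors = collect_ranges('a' => StubProviderA, 'b' => StubProviderB)
puts "#{ranges.length} ranges collected, #{errors.length} provider(s) failed"
```

Returning the errors alongside the ranges is a design choice in this sketch: the caller can decide whether a partial result is acceptable rather than discovering the gap later.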

Let's examine the AWS fetcher to understand the pattern. Amazon publishes IP ranges at https://ip-ranges.amazonaws.com/ip-ranges.json in a well-structured format:

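require 'json'
require 'net/http'
require 'uri'

module AWS
  RANGES_URL = 'https://ip-ranges.amazonaws.com/ip-ranges.json'

  # Fetches the published IPv4 prefixes and normalizes each one to the
  # { ip:, region:, service: } shape shared by all providers.
  # IPv6 ranges live under the separate 'ipv6_prefixes' key and are not read here.
  def self.fetch
    response = Net::HTTP.get(URI(RANGES_URL))
    data = JSON.parse(response)

    data['prefixes'].map do |prefix|
      {
        ip: prefix['ip_prefix'],
        region: prefix['region'],
        service: prefix['service']
      }
    end
  end
end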
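The fetcher above reads only the `prefixes` key, which holds IPv4 ranges. AWS lists IPv6 ranges in the same file under `ipv6_prefixes`, with the CIDR stored in `ipv6_prefix`. The sketch below shows one way to normalize both. `parse_aws_ranges` and the sample document are illustrative assumptions, and the function takes a JSON string so it can run offline.

```ruby
require 'json'

# Parses an ip-ranges.json document (passed as a string) into normalized hashes,
# covering both the IPv4 and the IPv6 prefix lists.
def parse_aws_ranges(json_text)
  data = JSON.parse(json_text)

  v4 = data.fetch('prefixes', []).map do |p|
    { ip: p['ip_prefix'], region: p['region'], service: p['service'] }
  end
  v6 = data.fetch('ipv6_prefixes', []).map do |p|
    { ip: p['ipv6_prefix'], region: p['region'], service: p['service'] }
  end

  v4 + v6
end

# Made-up sample using documentation address space, not real AWS data.
SAMPLE_AWS_JSON = <<~EOS
  {
    "prefixes": [
      {"ip_prefix": "192.0.2.0/24", "region": "us-east-1", "service": "EC2"}
    ],
    "ipv6_prefixes": [
      {"ipv6_prefix": "2001:db8::/32", "region": "eu-west-1", "service": "S3"}
    ]
  }
EOS

parse_aws_ranges(SAMPLE_AWS_JSON).each { |r| puts r[:ip] }
```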

The Azure implementation gets messier because Microsoft doesn't provide a stable URL—you need to scrape their download page to find the current week's JSON file. The script uses Nokogiri to parse HTML and extract the latest link, then downloads and parses that JSON file. This brittleness is precisely why consolidation tools like cloud-ranges are valuable: they absorb the complexity of each provider's idiosyncrasies.
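The link-extraction step can be sketched as follows. This is a simplified illustration, not the repository's code: it swaps Nokogiri for a plain regular expression so it runs with the standard library alone, and `extract_json_link`, the sample HTML, and the example.com URL are all hypothetical.

```ruby
# Returns the first absolute https link ending in .json found in the page, or nil.
# A regex is enough for a sketch; a real scraper would use an HTML parser.
def extract_json_link(html)
  match = html.match(%r{href="(https://[^"]+\.json)"}i)
  match && match[1]
end

# Hypothetical download page fragment with a made-up file name and host.
SAMPLE_DOWNLOAD_PAGE = <<~HTML
  <html><body>
    <a href="https://example.com/help">Help</a>
    <a href="https://download.example.com/ServiceTags_20240101.json">Download</a>
  </body></html>
HTML

puts extract_json_link(SAMPLE_DOWNLOAD_PAGE)
```

Returning nil when no link is found lets the caller treat a changed page layout as an explicit failure instead of downloading the wrong file.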

Output formatting happens in separate modules. The JSON formatter simply serializes the aggregated data structure. The CSV formatter flattens the nested data, which loses some metadata but produces spreadsheet-friendly output for compliance reports. The plaintext formatter outputs one IP range per line, which suits a shell loop that builds iptables rules or an ipset, and security tools that expect simple lists.
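The three formatters described above could look roughly like this. The function names are assumptions made for illustration, and the input is the array of `{ ip:, region:, service: }` hashes the fetchers return.

```ruby
require 'json'
require 'csv'

# Serializes the full structure, keeping every key.
def to_json_output(ranges)
  JSON.pretty_generate(ranges)
end

# Flattens each range to one row under a fixed header; other keys are dropped.
def to_csv_output(ranges)
  CSV.generate do |csv|
    csv << %w[ip region service]
    ranges.each { |r| csv << [r[:ip], r[:region], r[:service]] }
  end
end

# Emits one CIDR per line and nothing else.
def to_plaintext_output(ranges)
  ranges.map { |r| r[:ip] }.join("\n")
end

sample = [
  { ip: '192.0.2.0/24', region: 'us-east-1', service: 'EC2' },
  { ip: '198.51.100.0/24', region: 'eu-west-1', service: 'S3' }
]
puts to_plaintext_output(sample)
```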

Here's how you'd extend the tool to add a new provider, using DigitalOcean as an example:

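# lib/digitalocean.rb
require 'net/http'
require 'csv'

module DigitalOcean
  # CSV feed; the first column is assumed to hold the CIDR prefix
  RANGES_URL = 'https://www.digitalocean.com/geo/google.csv'

  def self.fetch
    response = Net::HTTP.get(URI(RANGES_URL))
    ranges = []

    CSV.parse(response, headers: false) do |row|
      # Skip blank lines and comment lines, which carry no prefix
      next if row[0].nil? || row[0].start_with?('#')

      ranges << {
        ip: row[0],
        region: 'unknown', # location columns are ignored in this example
        service: 'digitalocean'
      }
    end

    ranges
  end
end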

The modular design means adding providers requires only implementing a fetch method that returns an array of hashes with standardized keys. The main script handles deduplication, sorting, and output formatting.
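As a rough illustration of the deduplication and sorting step, the sketch below uses Ruby's standard `IPAddr` class. How the main script actually does this is not shown here, so `dedupe_and_sort` and its ordering (IPv4 before IPv6, then by network address, then by prefix length) are assumptions.

```ruby
require 'ipaddr'

# Drops repeated CIDRs (first occurrence wins) and sorts numerically,
# so 9.0.0.0/8 sorts before 10.0.0.0/8, unlike a plain string sort.
def dedupe_and_sort(ranges)
  ranges
    .uniq { |r| r[:ip] }
    .sort_by do |r|
      net = IPAddr.new(r[:ip])
      [net.ipv4? ? 4 : 6, net.to_i, net.prefix]
    end
end

sample = [
  { ip: '198.51.100.0/24', service: 'b' },
  { ip: '2001:db8::/32', service: 'c' },
  { ip: '192.0.2.0/24', service: 'a' },
  { ip: '192.0.2.0/24', service: 'a-duplicate' }
]
dedupe_and_sort(sample).each { |r| puts r[:ip] }
```

Deduplicating on the CIDR alone discards entries that share a prefix but differ in service or region. If that metadata matters, key the `uniq` call on the whole hash instead.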

For security teams, the real power comes from automation. You can schedule this script to run daily via cron, commit the output to a git repository, and trigger CI/CD pipelines when changes are detected:

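#!/bin/bash
set -euo pipefail  # abort on any error so a failed fetch is never committed

# Write to a temp file first; replace the tracked file only if the fetch succeeded
ruby fetch.rb --format json > cloud-ranges.json.tmp
mv cloud-ranges.json.tmp cloud-ranges.json

git add cloud-ranges.json
if git diff --staged --quiet; then
  echo "No changes"
else
  git commit -m "Update cloud ranges $(date +%Y-%m-%d)"
  git push
  # Trigger firewall rule update
  ansible-playbook update-firewall-rules.yml
fi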

This pattern lets you maintain auditable history of cloud provider IP changes while automatically propagating updates to production systems. For reconnaissance use cases, security researchers use cloud-ranges output to build exclusion lists—filtering out cloud provider IPs from scan results to focus on non-cloud infrastructure that might represent the actual target organization's owned hardware.
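The exclusion-list idea can be sketched with `IPAddr#include?` from the standard library. `non_cloud_hosts` is a hypothetical helper, and the addresses come from documentation ranges rather than real scan data.

```ruby
require 'ipaddr'

# Returns the scanned addresses that fall outside every listed cloud CIDR.
def non_cloud_hosts(scan_ips, cloud_cidrs)
  networks = cloud_cidrs.map { |cidr| IPAddr.new(cidr) }
  scan_ips.reject do |ip|
    addr = IPAddr.new(ip)
    networks.any? { |net| net.include?(addr) }
  end
end

cloud = ['192.0.2.0/24', '2001:db8::/32']
scan  = ['192.0.2.10', '203.0.113.5', '2001:db8::1']
puts non_cloud_hosts(scan, cloud)
```

This checks every address against every range, which is fine for small lists. For large range sets, a sorted interval lookup or a prefix trie would scale better.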

Gotcha

The most significant limitation is data freshness. The tool requires manual execution or external scheduling: there's no daemon mode, no built-in caching, and no awareness of how stale the data might be. If AWS adds a new IP range at 2 PM and your daily job ran at 9 AM, you operate on outdated information for 19 hours, until the next morning's run; a change published just after a run goes unnoticed for nearly 24 hours. For security-critical applications, this lag could mean rejected legitimate requests or undetected malicious traffic.
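One way to give downstream consumers some awareness of staleness is to stamp each snapshot with its fetch time and check the age before using it. The sketch below is an add-on you would write yourself, not a feature of the tool. `wrap_with_timestamp`, `stale?`, and the `fetched_at` key are hypothetical names.

```ruby
require 'time'

# Wraps the fetched ranges with the time they were retrieved (UTC, ISO 8601).
def wrap_with_timestamp(ranges, fetched_at: Time.now.utc)
  { 'fetched_at' => fetched_at.iso8601, 'ranges' => ranges }
end

# True when the snapshot is older than the allowed age.
def stale?(snapshot, max_age_hours: 24, now: Time.now.utc)
  age_seconds = now - Time.iso8601(snapshot.fetch('fetched_at'))
  age_seconds > max_age_hours * 3600
end

snapshot = wrap_with_timestamp([{ ip: '192.0.2.0/24' }])
puts stale?(snapshot) ? 'refresh needed' : 'snapshot is fresh'
```

A firewall updater could refuse to apply a stale snapshot, or at least raise an alert, so that a broken cron job does not go unnoticed.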

Error handling is minimal. If a provider's endpoint is down, the script typically fails silently or returns partial results without clear indication that data is incomplete. There's no validation that fetched ranges are sensible (checking for obviously wrong CIDR notation, duplicate entries, or comparing against expected range counts). In production, you'd want to wrap this in additional logic that checks response sizes, validates JSON schemas, and alerts on anomalies. The tool also provides no historical tracking—you can't easily answer questions like "when did AWS add the 18.x.x.x range?" without maintaining your own versioned storage. For compliance or security investigation purposes, that historical context can be crucial.
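The sanity checks described above can be sketched as a small validation pass run before the output is used. `valid_cidr?`, `validate_ranges`, and the `min_count` threshold are illustrative assumptions. Pick a threshold from counts you have observed for each provider.

```ruby
require 'ipaddr'

# True when the text parses as an IPv4 or IPv6 address or CIDR block.
def valid_cidr?(text)
  return false if text.nil? || text.strip.empty?

  IPAddr.new(text)
  true
rescue IPAddr::Error
  false
end

# Returns a list of human-readable problems; an empty list means the data passed.
def validate_ranges(ranges, min_count: 1)
  problems = []
  if ranges.length < min_count
    problems << "only #{ranges.length} ranges, expected at least #{min_count}"
  end

  seen = {}
  ranges.each do |r|
    cidr = r[:ip]
    unless valid_cidr?(cidr)
      problems << "invalid CIDR: #{cidr.inspect}"
      next
    end
    problems << "duplicate entry: #{cidr}" if seen[cidr]
    seen[cidr] = true
  end
  problems
end

sample = [{ ip: '192.0.2.0/24' }, { ip: '192.0.2.0/24' }, { ip: '10.0.0.0/33' }]
validate_ranges(sample, min_count: 2).each { |problem| puts problem }
```

An unexpectedly low count is often the first sign that a provider endpoint returned an error page or a truncated file.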

Verdict

Use if: You're building security infrastructure that needs to identify traffic from multiple cloud providers, creating firewall rules for multi-cloud environments, conducting attack surface analysis where you need to distinguish cloud-hosted assets from on-premises infrastructure, or building threat intelligence feeds that categorize attacks by origin type. The tool excels as a component in larger automation pipelines where you control scheduling and can add your own validation logic.

Skip if: You need real-time or near-real-time updates (the manual execution model creates unacceptable lag), you require historical tracking of IP range changes, you only work with a single cloud provider (use that vendor's native API instead), or you need enterprise-grade reliability with guaranteed uptime and support.

For single-provider scenarios or mission-critical applications, commercial services like ipinfo.io or MaxMind provide better data quality, APIs, and SLAs, though at significant cost.
