Building a Serverless Honeytoken System with AWS Lambda: Inside honeyλ

Hook

A security monitoring system that costs exactly $0.00 per month until someone actually attacks you—this is the economic inversion that makes serverless honeypots so compelling.

Context

Traditional honeypots require dedicated infrastructure: servers to maintain, patches to apply, and monthly bills regardless of whether anyone takes the bait. This creates a paradox where security tools designed to detect occasional intrusions consume continuous resources. Honeytokens—URLs, credentials, or other breadcrumbs planted to detect unauthorized access—face the same problem. Deploy a honey URL in a document to detect data leakage, and you're paying for 24/7 server uptime to catch an event that might happen once or never.

The serverless revolution promised pay-per-execution economics, but security tools were slow to adopt the model. honeyλ bridges this gap by reimagining honeytokens as ephemeral Lambda functions behind API Gateway endpoints. Each fake URL exists as a configured endpoint that springs to life only when accessed, collects forensic data, performs threat intelligence lookups, and sends alerts—all within a single invocation. For security teams practicing deception at scale, this means deploying hundreds of honeytokens across documents, emails, and systems with costs measured in fractions of cents per actual detection.

Technical Insight

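[Figure: System architecture (auto-generated). An HTTP request reaches API Gateway (honeytoken URLs), which invokes the Lambda function. The request parser extracts metadata, sends the IP address to the Cymon API (threat intel), and passes headers and User-Agent plus the returned enrichment data to the alert formatter. The formatted alert goes to the notification dispatcher, which delivers it to Slack (webhook), AWS SES (email), and Twilio (SMS). The Lambda function loads its config from an S3 bucket (config storage); the Serverless Framework deploys and manages API Gateway and the Lambda function.]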

The architectural elegance of honeyλ lies in its simplicity: API Gateway routes act as the honeytoken URLs, Lambda functions handle the logic, and the Serverless Framework orchestrates everything. When you deploy, you're essentially creating a configuration-driven alert system where each endpoint is a potential tripwire.

The core Lambda function does four critical tasks in sequence. First, it extracts request metadata—source IP, headers, query parameters, and user agent strings. Second, it queries the Cymon threat intelligence API to enrich the source IP with known malicious indicators. Third, it formats this data into an alert payload. Finally, it dispatches notifications through configured channels (Slack, email via SES, or SMS via Twilio). Here's the heart of the threat intelligence enrichment:

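import logging

import requests

logger = logging.getLogger(__name__)


def cymon_ip_lookup(ip):
    """Query the Cymon API for IP reputation data; return None on any failure."""
    url = f"https://api.cymon.io/v2/ioc/search/ip/{ip}"
    headers = {'Accept': 'application/json'}

    try:
        # A timeout keeps a slow API from stalling the honeytoken response.
        response = requests.get(url, headers=headers, timeout=5)
        if response.status_code == 200:
            data = response.json()
            return {
                'malicious': data.get('hits', 0) > 0,
                'sources': data.get('sources', []),
                'tags': data.get('tags', [])
            }
        logger.warning(f"Cymon lookup returned HTTP {response.status_code}")
    except Exception as e:
        logger.error(f"Cymon lookup failed: {e}")
    return None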
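
The first step in the sequence, metadata extraction, is not shown in the article. A minimal sketch follows, assuming the function sits behind an API Gateway REST API with Lambda proxy integration, where the caller's address arrives under requestContext.identity.sourceIp. The function name extract_metadata and the output field names are hypothetical, not taken from honeyλ.

```python
def extract_metadata(event):
    """Pull the request details worth alerting on out of a proxy event."""
    # Header casing depends on the client, so normalize names to lowercase.
    headers = {k.lower(): v for k, v in (event.get("headers") or {}).items()}
    identity = (event.get("requestContext") or {}).get("identity") or {}
    return {
        "source_ip": identity.get("sourceIp"),
        "user_agent": headers.get("user-agent"),
        "path": event.get("path"),
        "method": event.get("httpMethod"),
        # API Gateway sends null when there is no query string.
        "query": event.get("queryStringParameters") or {},
        "headers": headers,
    }


# Hypothetical event, trimmed to the fields the sketch reads.
sample_event = {
    "path": "/docs/confidential/q4-2024",
    "httpMethod": "GET",
    "headers": {"User-Agent": "curl/8.0", "Accept": "*/*"},
    "queryStringParameters": None,
    "requestContext": {"identity": {"sourceIp": "203.0.113.7"}},
}
metadata = extract_metadata(sample_event)
```

Every lookup tolerates a missing key, so a malformed or unexpected event still yields a dictionary that the later steps can alert on.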

The configuration flexibility is where honeyλ shines for real-world deployment. You can define different response behaviors per token, enabling sophisticated deception scenarios. Want a tracking pixel that returns a 1x1 transparent GIF? Configure a binary response. Need a fake API endpoint that returns plausible JSON to string along a scraper? Define custom response bodies. The configuration lives in JSON and can be stored locally or pulled from S3 for centralized management:

{
  "tokens": [
    {
      "name": "document-leak-detector",
      "path": "/docs/confidential/q4-2024",
      "response": {
        "statusCode": 200,
        "body": "File not found",
        "headers": {"Content-Type": "text/plain"}
      },
      "alert_channels": ["slack", "email"]
    },
    {
      "name": "tracking-pixel",
      "path": "/pixel.gif",
      "response": {
        "statusCode": 200,
        "isBase64Encoded": true,
        "body": "R0lGODlhAQABAIAAAP///wAAACH5BAEAAAAALAAAAAABAAEAAAICRAEAOw==",
        "headers": {"Content-Type": "image/gif"}
      },
      "alert_channels": ["slack"]
    }
  ]
}
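
To show how a configuration like the one above might drive the function, here is a small sketch that indexes tokens by path and turns a token's response section into the dictionary shape that Lambda proxy integration expects. The helper names index_tokens and build_response are assumptions for illustration; honeyλ's own loader may be organized differently.

```python
import json

# One token from the configuration, inlined so the sketch is self-contained.
CONFIG_JSON = """
{"tokens": [
  {"name": "document-leak-detector",
   "path": "/docs/confidential/q4-2024",
   "response": {"statusCode": 200, "body": "File not found",
                "headers": {"Content-Type": "text/plain"}},
   "alert_channels": ["slack", "email"]}
]}
"""


def index_tokens(config):
    """Map each configured path to its token entry for direct lookup."""
    return {token["path"]: token for token in config["tokens"]}


def build_response(token):
    """Convert a token's 'response' section into a Lambda proxy response."""
    response = token["response"]
    return {
        "statusCode": response.get("statusCode", 200),
        "headers": response.get("headers", {}),
        "body": response.get("body", ""),
        # Binary bodies (such as the GIF pixel) set this flag to true.
        "isBase64Encoded": response.get("isBase64Encoded", False),
    }


tokens_by_path = index_tokens(json.loads(CONFIG_JSON))
token = tokens_by_path.get("/docs/confidential/q4-2024")
reply = build_response(token)
```

Loading the same JSON from S3 would only change where the string comes from, for example the body returned by a boto3 get_object call; the lookup and response logic would stay the same.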

The Serverless Framework configuration (serverless.yml) abstracts away the complexity of Lambda permissions, IAM roles, and API Gateway setup. Each honeytoken becomes an HTTP event trigger, and the framework automatically provisions the necessary AWS resources. This Infrastructure-as-Code approach means you can version control your entire honeytoken deployment and replicate it across AWS accounts or regions with minimal effort.

One architectural decision worth noting: honeyλ performs synchronous threat intelligence lookups within the Lambda execution. This adds latency to the HTTP response (potentially revealing the honeypot nature if an attacker measures timing), but ensures enriched data reaches your alert channels immediately. In production deployments handling sophisticated adversaries, you might modify this to return the HTTP response instantly and perform TI lookups asynchronously via SNS/SQS.
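
The asynchronous variant suggested above could be sketched as follows. This is not how honeyλ works today. It assumes a second, hypothetical Lambda function subscribed to the queue or topic performs enrichment and alerting. The publish argument stands in for whatever wrapper you write around an SNS publish or SQS send call, and the hard-coded response body is a placeholder.

```python
import json
import logging

logger = logging.getLogger(__name__)


def handle_request(event, publish):
    """Queue the raw request details, then answer without waiting for lookups.

    `publish` is any callable that accepts a JSON string.
    """
    identity = (event.get("requestContext") or {}).get("identity") or {}
    message = {
        "source_ip": identity.get("sourceIp"),
        "path": event.get("path"),
        "headers": event.get("headers") or {},
    }
    try:
        publish(json.dumps(message))
    except Exception:
        # A queueing failure must not change what the visitor sees.
        logger.exception("could not queue honeytoken event")
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "text/plain"},
        "body": "File not found",
    }


# Usage with an in-memory list standing in for the queue.
queued = []
reply = handle_request(
    {"path": "/pixel.gif",
     "requestContext": {"identity": {"sourceIp": "198.51.100.4"}}},
    publish=queued.append,
)
```

Passing the publisher in as an argument keeps the handler testable without AWS credentials, and makes the response time depend on one queue write rather than on a third-party API.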

The alert dispatch mechanism supports multiple channels through a pluggable design. Slack notifications use incoming webhooks, SES handles email, and Twilio provides SMS capabilities. This multi-channel approach ensures critical alerts reach responders regardless of their monitoring setup. For teams with existing SIEM infrastructure, you could extend the codebase to POST alerts to a central logging endpoint instead.
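
One way to picture a pluggable dispatcher is a registry that maps channel names to sender callables. The sketch below is an assumption about the shape of such a design, not honeyλ's actual code. The names send_slack, dispatch, and senders are hypothetical, and the Slack sender relies only on the documented behavior that incoming webhooks accept a JSON body with a text field.

```python
import json
import urllib.request


def send_slack(alert, webhook_url):
    """POST the alert text to a Slack incoming webhook URL."""
    payload = json.dumps({"text": alert["text"]}).encode("utf-8")
    request = urllib.request.Request(
        webhook_url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status == 200


def dispatch(alert, channels, senders):
    """Send to every requested channel; one failure must not block the rest."""
    results = {}
    for name in channels:
        sender = senders.get(name)
        if sender is None:
            results[name] = "unknown channel"
            continue
        try:
            sender(alert)
            results[name] = "sent"
        except Exception as exc:
            results[name] = f"failed: {exc}"
    return results


# Usage with stand-in senders so the example makes no network calls.
delivered = []


def broken_email(alert):
    raise ConnectionError("SES unreachable")


senders = {"slack": delivered.append, "email": broken_email}
results = dispatch({"text": "Honeytoken tripped"}, ["slack", "email", "sms"], senders)
```

A real registry would bind the webhook URL, for instance with a lambda that calls send_slack. Adding a SIEM endpoint then means registering one more callable rather than editing the dispatch loop.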

Gotcha

The biggest operational gotcha is binary media type handling. To serve tracking pixels or other image-based honeytokens, API Gateway must have the relevant binary media types (such as image/gif or image/png) added to the API's settings, and the honeyλ workflow described here does this by hand in the console. A manual step breaks the Infrastructure-as-Code promise: every redeploy or new stack requires remembering to configure the setting again. For teams running multiple environments or frequent redeployments, this becomes a documentation burden and a potential failure point. Recent Serverless Framework releases accept a binaryMediaTypes list under provider.apiGateway in serverless.yml, so check whether your framework version lets you declare the setting instead.

The threat intelligence integration relies entirely on the Cymon API, which introduces an external dependency and a potential point of failure. If Cymon experiences downtime, rate limits your requests, or changes its API (it's currently on v2), your enrichment pipeline breaks. There's no fallback to alternative TI sources like AbuseIPDB, GreyNoise, or VirusTotal. For production deployments, you'd want to implement a circuit breaker pattern and graceful degradation, so that an alert still goes out even if TI enrichment fails.

Additionally, while the Serverless Framework theoretically supports multiple cloud providers, honeyλ's code is AWS-specific in its use of boto3 for S3 config loading and in its assumptions about Lambda's execution environment. Porting to Azure Functions or Google Cloud Functions would require non-trivial refactoring.
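
The circuit breaker and graceful degradation recommended above for threat intelligence lookups could look roughly like this. It is a simplified sketch under stated assumptions, not part of honeyλ: CircuitBreaker and build_alert are hypothetical names, and the breaker's state lives in memory, so in Lambda it would persist only for as long as a warm container is reused.

```python
import time


class CircuitBreaker:
    """Stop calling a failing dependency until a cooldown has passed."""

    def __init__(self, max_failures=3, reset_after=60.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args):
        """Return func(*args), or None if the circuit is open or the call fails."""
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return None  # open: skip the lookup entirely
            # Cooldown elapsed: allow one trial call. The failure count is
            # kept, so a single failed trial reopens the circuit.
            self.opened_at = None
        try:
            result = func(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return None
        self.failures = 0
        return result


def build_alert(metadata, lookup, breaker):
    """Always produce an alert; attach enrichment only when it is available."""
    enrichment = breaker.call(lookup, metadata["source_ip"])
    if enrichment is None:
        enrichment = "unavailable"
    return {**metadata, "threat_intel": enrichment}


# Usage with a lookup that always times out and a controllable clock.
calls = []


def flaky_lookup(ip):
    calls.append(ip)
    raise TimeoutError("TI provider timed out")


now = [0.0]
breaker = CircuitBreaker(max_failures=2, reset_after=30.0, clock=lambda: now[0])
alerts = [
    build_alert({"source_ip": "203.0.113.7"}, flaky_lookup, breaker)
    for _ in range(4)
]
```

In this run the provider is contacted twice, the circuit then opens, and the remaining alerts are built without waiting on a dead dependency. Falling back to a second TI provider would be a natural extension but is not shown here.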

Verdict

Use if: You're running AWS infrastructure and need cost-effective deception capabilities to detect document leakage, unauthorized reconnaissance, or content scraping. It's particularly valuable if you want to deploy dozens or hundreds of honeytokens without worrying about infrastructure costs or maintenance; the serverless economics make this viable where traditional honeypots wouldn't be. Also ideal if you're already using the Serverless Framework and want Infrastructure-as-Code for your security monitoring.

Skip if: You need real-time, high-volume monitoring (Lambda cold starts add latency), require non-HTTP protocols (this is strictly web-based), or are committed to non-AWS cloud providers. Also skip if you need a comprehensive deception platform with attacker interaction capabilities: honeyλ detects access but doesn't engage adversaries. For those scenarios, consider full honeypot frameworks like OpenCanary or managed services like Thinkst Canary that offer broader protocol support and more sophisticated interaction capabilities.