Building a Serverless Screenshot Service with AWS Lambda and PhantomJS

Hook

In 2024, one of the most popular serverless screenshot implementations on GitHub still runs PhantomJS, a headless browser whose last release shipped in 2016 and whose development was suspended in 2018. Here's why it still works, and what you should know before deploying it.

Context

Screenshot services are deceptively complex. You need a headless browser to render web pages, compute resources to execute JavaScript, storage for images, a CDN for delivery, and ideally some thumbnail generation for different viewport sizes. Traditionally, this meant maintaining long-running servers with browsers installed, managing memory leaks from browser processes, and scaling infrastructure to handle traffic spikes.

The serverless-screenshot project tackles this complexity by leveraging AWS Lambda's event-driven architecture. Instead of maintaining servers, it uses Lambda functions triggered by API requests and S3 events. The entire pipeline—from URL submission to thumbnail generation—runs on ephemeral compute that only costs money when screenshots are actually being captured. It's an elegant proof-of-concept for serverless image processing pipelines, even if the underlying browser engine shows its age.

Technical Insight

The architecture uses two Lambda functions orchestrated through S3 events rather than direct invocation. When you POST a URL to the API Gateway endpoint, the first Lambda function receives it, hashes the URL for deduplication, and launches PhantomJS to capture a full-page screenshot. This screenshot gets uploaded to S3, which automatically triggers the second Lambda function to generate thumbnails.

Here's how the screenshot capture works in the first Lambda function:

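const AWS = require('aws-sdk'); // AWS SDK v2, matching the .promise() calls below
const phantom = require('phantom');
const crypto = require('crypto');

const s3 = new AWS.S3();

exports.handler = async (event) => {
  // Assumes an API Gateway proxy event whose JSON body looks like {"url": "..."}
  const { url } = JSON.parse(event.body);

  // Generate a consistent hash from the URL for deduplication
  const hash = crypto.createHash('md5').update(url).digest('hex');
  const filename = `screenshots/${hash}.png`;

  // Launch PhantomJS and capture the screenshot
  const instance = await phantom.create();
  try {
    const page = await instance.createPage();
    await page.property('viewportSize', { width: 1280, height: 1024 });
    const status = await page.open(url);
    if (status !== 'success') throw new Error(`Failed to open ${url}`);

    // Fixed 3-second wait so client-side JavaScript can run
    await new Promise(resolve => setTimeout(resolve, 3000));

    // Render the full page as a base64-encoded PNG and upload it
    const base64Png = await page.renderBase64('PNG');
    await s3.putObject({
      Bucket: process.env.BUCKET_NAME,
      Key: filename,
      Body: Buffer.from(base64Png, 'base64'),
      ContentType: 'image/png'
    }).promise();
  } finally {
    // Always shut down the PhantomJS process, even when capture fails
    await instance.exit();
  }

  // Illustrative response shape; the project's actual response may differ
  return { statusCode: 200, body: JSON.stringify({ key: filename }) };
};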

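Hashing the URL gives every page a deterministic S3 key, which is what makes deduplication possible: when the same URL is submitted twice, the service returns the existing screenshot instead of capturing a new one. This saves compute time and storage costs while providing near-instant responses for previously captured URLs. MD5 is acceptable here because the hash serves only as a lookup key, not as a security control. Note that the excerpt above shows only the key derivation, not the lookup itself.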
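The deduplication step can be sketched as an existence check against S3 before the browser is launched. This is a minimal sketch rather than the project's code: `screenshotExists` and `captureOnce` are hypothetical names, and the S3 client is passed in as a parameter, assuming the AWS SDK v2 `headObject(...).promise()` call style used elsewhere in the article.

```javascript
const crypto = require('crypto');

// Derive the deterministic S3 key for a URL (same scheme as the capture excerpt).
function screenshotKey(url) {
  const hash = crypto.createHash('md5').update(url).digest('hex');
  return `screenshots/${hash}.png`;
}

// Returns true when the object already exists. `s3` is any client exposing the
// AWS SDK v2 style headObject(params).promise() call.
async function screenshotExists(s3, bucket, key) {
  try {
    await s3.headObject({ Bucket: bucket, Key: key }).promise();
    return true;
  } catch (err) {
    // headObject reports a missing key as NotFound (HTTP 404)
    if (err.code === 'NotFound' || err.statusCode === 404) return false;
    throw err; // permission or network errors should not be swallowed
  }
}

// Hypothetical wrapper: run `capture` only when no screenshot is stored yet.
async function captureOnce(s3, bucket, url, capture) {
  const key = screenshotKey(url);
  const cached = await screenshotExists(s3, bucket, key);
  if (!cached) await capture(url, key);
  return { key, cached };
}
```

Passing the client and the capture function as arguments keeps the check independent of PhantomJS, so the same wrapper would still apply if the browser were swapped out later.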

What makes this architecture elegant is the S3 trigger pattern. The first Lambda doesn't need to know about thumbnail generation at all. Once it uploads the screenshot to S3, its job is done. The S3 bucket has a trigger configured to invoke the second Lambda function whenever a new screenshot appears:

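const AWS = require('aws-sdk'); // AWS SDK v2, matching the .promise() calls below
const s3 = new AWS.S3();

// execImageMagick(buffer, args) is a helper that is not shown in this excerpt;
// it is expected to run ImageMagick on the buffer and resolve with the result.

const sizes = [
  { width: 320, height: 240, suffix: 'small' },
  { width: 640, height: 480, suffix: 'medium' },
  { width: 1024, height: 768, suffix: 'large' }
];

// Triggered automatically when a screenshot lands in S3
exports.handler = async (event) => {
  for (const record of event.Records) {
    const bucket = record.s3.bucket.name;
    // Keys in S3 event notifications are URL-encoded, with spaces sent as '+'
    const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, ' '));

    // Thumbnails are written to the same bucket, so skip them here;
    // otherwise every thumbnail would trigger this function again
    if (sizes.some(size => key.endsWith(`-${size.suffix}.png`))) continue;

    // Get the original screenshot
    const original = await s3.getObject({ Bucket: bucket, Key: key }).promise();

    for (const size of sizes) {
      // ImageMagick was bundled with older Lambda runtimes; newer runtimes
      // need it supplied separately, for example through a layer
      const resized = await execImageMagick(
        original.Body,
        ['-resize', `${size.width}x${size.height}`, '-quality', '85']
      );

      const thumbnailKey = key.replace(/\.png$/, `-${size.suffix}.png`);
      await s3.putObject({
        Bucket: bucket,
        Key: thumbnailKey,
        Body: resized,
        ContentType: 'image/png'
      }).promise();
    }
  }
};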
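The handler calls `execImageMagick` without showing it. One plausible shape for that helper, assuming ImageMagick's `convert` binary is on the function's PATH, is to pipe the image through the process over stdin and stdout. The name `pipeThroughCommand` is introduced here for illustration and is not taken from the project.

```javascript
const { spawn } = require('child_process');

// Run `command` with `args`, write `input` to its stdin, and resolve with
// everything it wrote to stdout. Rejects on spawn errors or a non-zero exit.
function pipeThroughCommand(command, args, input) {
  return new Promise((resolve, reject) => {
    const proc = spawn(command, args);
    const stdout = [];
    const stderr = [];
    proc.stdout.on('data', chunk => stdout.push(chunk));
    proc.stderr.on('data', chunk => stderr.push(chunk));
    proc.on('error', reject); // e.g. the binary is not installed
    proc.on('close', code => {
      if (code === 0) {
        resolve(Buffer.concat(stdout));
      } else {
        const details = Buffer.concat(stderr).toString();
        reject(new Error(`${command} exited with code ${code}: ${details}`));
      }
    });
    proc.stdin.on('error', () => {}); // ignore EPIPE if the process exits early
    proc.stdin.end(input);
  });
}

// Assumed shape of the helper: read a PNG from stdin, apply `args`,
// and write a PNG to stdout using ImageMagick's `convert`.
function execImageMagick(input, args) {
  return pipeThroughCommand('convert', ['png:-', ...args, 'png:-'], input);
}
```

With `-resize 320x240`, ImageMagick fits the image inside that box while preserving its aspect ratio, so thumbnails of tall full-page screenshots will be narrower than the nominal width.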

This event-driven decoupling is a textbook example of serverless patterns. The two functions scale independently, failures in thumbnail generation don't affect screenshot capture, and you can modify thumbnail sizes without touching the capture logic. It's also cost-efficient—you only pay for the exact compute time needed for each step.

The PhantomJS binary is pre-compiled for Amazon Linux and bundled directly in the Lambda deployment package. This is necessary because Lambda doesn't provide a headless browser by default, and you can't install system packages at runtime. The binary is about 50MB, which fits within Lambda's 250MB limit on the unzipped deployment package. After the page loads, the implementation waits a hardcoded three seconds to let JavaScript execute. That works for simple pages but fails for single-page applications whose content takes longer than three seconds to render.
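One common alternative to a fixed sleep is to poll for a readiness condition with an upper time bound. The sketch below is not part of the project; `waitFor` and its option names are hypothetical, and the readiness check itself is left to the caller.

```javascript
// Poll `check` until it returns a truthy value or `timeoutMs` elapses.
// Resolves to true when the condition was met, false on timeout.
async function waitFor(check, { timeoutMs = 10000, intervalMs = 250 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if (await check()) return true;
    // Give up if another full interval would overshoot the deadline
    if (Date.now() + intervalMs > deadline) return false;
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
}

// Example with the phantom page object from the capture function
// ('#app' is a placeholder for whatever element marks the page as rendered):
//
//   const ready = await waitFor(() => page.evaluate(function () {
//     return document.querySelector('#app') !== null;
//   }));
```

Fast pages return as soon as the condition holds, and slow pages get up to the full timeout instead of being cut off at three seconds.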

CloudFront sits in front of the S3 bucket to serve screenshots efficiently. When you request a screenshot through the GET endpoint, the first request at a given edge location is fetched from S3. CloudFront then caches the image at that location and serves subsequent requests from the cache, reducing latency and S3 request costs. This matters for screenshot services, where the same images are requested repeatedly.
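How long CloudFront keeps an image can be influenced from the upload side, since it takes the origin's `Cache-Control` header into account within the distribution's TTL settings. As a small sketch, assuming the upload code is free to add headers, the `putObject` parameters could be built like this. The helper name and the one-day default are illustrative, not values from the project.

```javascript
// Build putObject parameters that include a Cache-Control header.
// maxAgeSeconds is an illustrative default, not a value from the project.
function cachedPngParams(bucket, key, body, maxAgeSeconds = 86400) {
  return {
    Bucket: bucket,
    Key: key,
    Body: body,
    ContentType: 'image/png',
    CacheControl: `public, max-age=${maxAgeSeconds}`
  };
}
```

The result can be passed directly to `s3.putObject(...)`. Because keys are derived from the URL hash, a long lifetime also means a re-captured page may be served stale until the cached copy expires or is invalidated.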

Gotcha

PhantomJS is the elephant in the room. Its last release dates from 2016, and modern websites increasingly use features it doesn't support: CSS Grid, modern JavaScript syntax, newer Web APIs. You'll get broken screenshots on many contemporary sites, especially those built with React, Vue, or other frameworks that rely on modern browser features. The three-second wait after page load isn't configurable either, so slow-loading sites will produce partial screenshots with missing content.

The thumbnail generation Lambda fails silently. Because it is triggered by S3 rather than by the client's request, an ImageMagick error never propagates back to the caller: the screenshot is captured successfully, but the thumbnails are missing with no indication why. CloudWatch logs show the errors, but there's no monitoring or alerting built in. For production use, you'd need to add dead-letter queues, CloudWatch alarms, and retry logic. Deployment also requires creating the CloudFormation stack manually, with no parameterization for different environments.
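The retry part of that list can be sketched in a few lines. This is an illustrative helper, not something the project ships; `withRetry` and its option names are hypothetical.

```javascript
// Retry an async operation with exponential backoff between attempts.
async function withRetry(operation, { attempts = 3, baseDelayMs = 200 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      // Log every failure so it shows up in CloudWatch Logs
      console.error(`Attempt ${attempt} of ${attempts} failed: ${err.message}`);
      if (attempt < attempts) {
        const delay = baseDelayMs * 2 ** (attempt - 1);
        await new Promise(resolve => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError; // rethrow so the invocation is marked as failed
}
```

Rethrowing after the final attempt matters: S3 invokes the function asynchronously, so a thrown error lets Lambda's own retry behavior and any configured dead-letter queue take over instead of the failure disappearing.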

Verdict

Use if: You need a quick proof-of-concept screenshot service for internal tools, you're capturing screenshots of simple static websites that don't require modern browser features, or you want to learn serverless event-driven architectures with a concrete example. This project excels as educational material for understanding S3 triggers and Lambda function chaining.

Skip if: You need production-grade screenshot capabilities for modern websites, you require reliable rendering of JavaScript-heavy single-page applications, or you need support and ongoing maintenance. The PhantomJS dependency is a non-starter for 2024 production workloads. Instead, investigate Puppeteer with @sparticuz/chromium, the maintained successor to chrome-aws-lambda, which provides headless Chromium on Lambda with current browser rendering. For commercial projects, SaaS options like Urlbox or ScreenshotAPI eliminate infrastructure concerns entirely while providing better rendering quality and reliability.
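For comparison, the capture step with the Puppeteer alternative might look roughly like the sketch below. It assumes `puppeteer` is `require('puppeteer-core')` and `chromium` is `require('@sparticuz/chromium')`. Both are passed in as arguments here, and launch options can differ between package versions, so treat the details as a starting point rather than a verified configuration.

```javascript
// Capture a full-page PNG with headless Chromium.
// `puppeteer` is expected to be require('puppeteer-core') and `chromium`
// require('@sparticuz/chromium'); both are injected rather than imported,
// so this function has no hard dependency on either package.
async function captureWithChromium({ puppeteer, chromium }, url) {
  const browser = await puppeteer.launch({
    args: chromium.args,
    executablePath: await chromium.executablePath(),
    headless: true
  });
  try {
    const page = await browser.newPage();
    await page.setViewport({ width: 1280, height: 1024 });
    // Wait for the network to go idle instead of sleeping for a fixed time
    await page.goto(url, { waitUntil: 'networkidle0' });
    return await page.screenshot({ type: 'png', fullPage: true });
  } finally {
    await browser.close(); // always release the browser process
  }
}
```

The returned image data can be uploaded with the same `putObject` call the PhantomJS version uses, which leaves the S3-triggered thumbnail function unchanged.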
