Building a Screenshot API with Qt WebKit: Inside url2img's Zero-Dependency Architecture

Hook

Most screenshot services bundle an entire Chromium installation that weighs 300MB+. url2img delivers the same functionality in a single 50MB static binary with zero runtime dependencies.

Context

Web screenshot services are surprisingly common infrastructure components. They power PDF generators, monitoring systems that capture visual regressions, social media preview cards, archival systems, and automated testing pipelines. The traditional approach uses headless Chrome or Chromium through Puppeteer or Selenium—powerful solutions that come with significant operational overhead. You need Node.js or Python runtimes, Chrome binaries, system libraries, font packages, and often containerization to wrangle all these dependencies into something deployable.

url2img takes a radically different approach. Built by gen2brain, it wraps Qt WebKit, the Qt port of the WebKit engine that Safari is built on, into a simple HTTP API server. The entire application compiles into a single static binary using musl libc, eliminating all external dependencies. No runtime, no browser installation, no shared libraries. Just one executable that starts an HTTP server and renders web pages to images. It's a throwback to an era of simpler deployment models, packaged for modern container and microservice architectures.

Technical Insight

url2img's architecture centers on Qt WebKit bindings through therecipe/qt, a comprehensive Go wrapper for the Qt framework. QtWebKit provides the full web rendering stack—HTML parser, CSS engine, JavaScript interpreter, and raster graphics output—as a C++ library. The Go bindings expose this through CGO, letting url2img control page loading, viewport configuration, and image capture without spawning external processes.

The HTTP API is straightforward. You send a GET request with the target URL as a parameter, and the server returns a rendered image:

# Basic screenshot as PNG
curl 'http://localhost:8080/?url=https://example.com' > screenshot.png

# Full page capture with custom viewport
curl 'http://localhost:8080/?url=https://example.com&width=1920&height=1080&full=true' > full_page.png

# JPEG with quality control and zoom
curl 'http://localhost:8080/?url=https://example.com&format=jpg&quality=85&zoom=2.0' > screenshot.jpg

# Base64-encoded output for embedding
curl 'http://localhost:8080/?url=https://example.com&output=base64'
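
The same requests can be assembled programmatically. The following is a minimal Python sketch, not part of url2img: the helper names `build_screenshot_url` and `save_screenshot` are hypothetical, and only the query parameters shown in the curl examples above are assumed to exist. It percent-encodes the target URL, which matters once the target carries its own query string. Sending `false` for a boolean option is an assumption, since only `full=true` appears above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def build_screenshot_url(base, target, **options):
    """Return a request URL for `target` with optional query parameters.

    Option names use underscores in Python and are sent with hyphens,
    so max_age=3600 becomes max-age=3600.
    """
    params = {"url": target}
    for name, value in options.items():
        if isinstance(value, bool):
            # Assumption: booleans are spelled "true"/"false" in the query.
            value = "true" if value else "false"
        params[name.replace("_", "-")] = value
    # urlencode percent-encodes the target URL so its own "&" and "?"
    # characters cannot be mistaken for screenshot parameters.
    return base.rstrip("/") + "/?" + urlencode(params)


def save_screenshot(request_url, path, timeout=60):
    """Fetch the rendered image and write the raw bytes to `path`."""
    with urlopen(request_url, timeout=timeout) as response, open(path, "wb") as out:
        out.write(response.read())
```

For example, `build_screenshot_url("http://localhost:8080", "https://example.com", format="jpg", quality=85)` produces the request behind the JPEG example, with the target URL percent-encoded.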

The rendering pipeline is synchronous by design. When a request arrives, url2img creates a QWebView instance, sets viewport dimensions, loads the target URL, and blocks until the page signals completion. A configurable delay parameter (&delay=5000) lets you wait for JavaScript-heavy pages to finish rendering. Once stable, the view's contents are rasterized to QImage, encoded to PNG or JPEG, and streamed back through the HTTP response.

The caching implementation is where url2img shows production-ready thinking. It implements RFC 7234 (HTTP caching) on disk, storing rendered screenshots keyed by the request URL and parameters. When an identical request arrives, the server checks cache freshness using Cache-Control headers and serves the cached image if it is still valid. You control cache duration with the max-age parameter:

# Cache for 1 hour (3600 seconds)
curl 'http://localhost:8080/?url=https://example.com&max-age=3600'

# Force cache bypass
curl -H 'Cache-Control: no-cache' 'http://localhost:8080/?url=https://example.com'
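
To make the idea of keying a cache on the request and checking freshness concrete, here is a small Python sketch. It illustrates the general technique only and is not url2img's implementation: the function names, the SHA-256 key, and the choice to leave `max-age` out of the key are all assumptions made for this example.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode

# Assumption: max-age controls caching, not rendering, so it is left out of the key.
CACHE_CONTROL_PARAMS = {"max-age"}


def cache_key(query_string):
    """Hash the rendering parameters so equivalent requests share one key."""
    pairs = [
        (name, value)
        for name, value in parse_qsl(query_string, keep_blank_values=True)
        if name not in CACHE_CONTROL_PARAMS
    ]
    # Sorting makes the key independent of parameter order.
    canonical = urlencode(sorted(pairs))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_fresh(stored_at, max_age, now):
    """A stored entry is fresh while its age in seconds is below max_age."""
    return (now - stored_at) < max_age
```

A request that sends `Cache-Control: no-cache` would skip the freshness check and render again, which is the bypass behavior shown in the second curl command.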

Authentication uses HTTP Basic Auth verified against htpasswd files generated with Apache's htpasswd utility or compatible tools. The server loads the htpasswd file at startup and checks credentials on every request:

# Generate credentials
htpasswd -c .htpasswd username

# Start server with auth
url2img -htpasswd=.htpasswd

# Make authenticated request
curl -u username:password 'http://localhost:8080/?url=https://example.com'
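
For clients that do not use curl, the `-u` flag corresponds to an `Authorization: Basic ...` header. The sketch below shows how that header is built and parsed under the standard Basic scheme. The function names are hypothetical, and comparing the recovered password against the htpasswd hash is deliberately left out because it depends on the hash format in the file.

```python
import base64


def basic_auth_header(username, password):
    """Build the Authorization header value that `curl -u` would send."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def parse_basic_auth(header):
    """Return (username, password), or None if the header is malformed."""
    scheme, _, token = header.partition(" ")
    if scheme.lower() != "basic" or not token:
        return None
    try:
        decoded = base64.b64decode(token, validate=True).decode("utf-8")
    except ValueError:
        # Covers invalid Base64 and invalid UTF-8 alike.
        return None
    # Only the first colon separates the fields; passwords may contain colons.
    username, separator, password = decoded.partition(":")
    return (username, password) if separator else None
```

Because Base64 is an encoding and not encryption, credentials sent this way are only protected when the connection itself uses TLS.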

The static compilation process is url2img's secret weapon for zero-dependency deployment. By linking against musl libc instead of glibc and statically embedding Qt WebKit, the build produces a fully self-contained binary. This binary runs on any Linux system of the matching CPU architecture regardless of installed libraries, which is critical for containerized deployments where you want minimal base images. The Dockerfile shows this approach clearly: it starts from a Qt build container, compiles the Go application with static flags, and the final image needs only the binary itself.

One architectural detail worth noting: url2img doesn't implement request queuing or concurrency limiting. Each incoming request spawns a QtWebKit rendering context and blocks a goroutine until completion. For low-to-moderate traffic or pre-cached scenarios this works fine, but under heavy load with slow-loading pages, you'll want to front the service with a queue system or rate limiter to prevent resource exhaustion.
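
One way to add the missing concurrency limit is in whatever sits in front of the service. The sketch below is a hypothetical Python wrapper, not something url2img provides: `RenderLimiter` caps the number of renders in flight with a semaphore and rejects the excess immediately instead of queueing it. The `render` argument stands in for any callable that performs the request.

```python
import threading


class RenderLimiter:
    """Allow at most `limit` renders at once; reject the rest immediately."""

    def __init__(self, limit):
        self._slots = threading.BoundedSemaphore(limit)

    def run(self, render, *args):
        # Non-blocking acquire: a full limiter rejects instead of queueing.
        if not self._slots.acquire(blocking=False):
            raise RuntimeError("too many renders in flight")
        try:
            return render(*args)
        finally:
            # Always free the slot, even if the render raised.
            self._slots.release()
```

A caller would typically translate the `RuntimeError` into an HTTP 429 or 503 response so that clients back off. Swapping the non-blocking acquire for one with a timeout turns the same class into a bounded queue.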

Gotcha

The elephant in the room is QtWebKit's deprecated status. Google forked WebKit to create Blink (Chrome's engine) in 2013, while Apple continues to develop WebKit for Safari. The Qt port did not keep pace: QtWebKit was deprecated in Qt 5.5 and removed from official Qt releases in 5.6 in favor of the Chromium-based QtWebEngine. The version url2img uses represents roughly 2013-era web standards. Modern JavaScript features such as async/await and ES6 modules, along with many CSS Grid and Flexbox properties, don't exist in this engine. Single-page applications built with React, Vue, or Angular often fail to render correctly because they rely on APIs QtWebKit doesn't support.

In practice, this limits url2img to relatively static websites. Corporate landing pages, documentation sites, traditional CMSs like WordPress (depending on the theme), and server-rendered applications generally work fine. But try to screenshot a modern web app dashboard, a GitHub repository page, or any site with heavy client-side rendering, and you'll get broken layouts or blank pages. There's no workaround beyond switching to a Chromium-based solution.

Additionally, synchronous request processing means a single slow-loading page can block the entire server. If someone requests a screenshot of a site that takes 30 seconds to load, all subsequent requests wait in line. This isn't a problem for internal tools with controlled inputs, but it's a serious limitation for public-facing APIs or high-traffic scenarios.

Verdict

Use if: You're capturing screenshots of traditional, server-rendered websites in a controlled environment. You need a self-contained binary that runs anywhere without dependencies. You're building internal tooling (monitoring dashboards, PDF generators for documentation, archival systems) where you control the input URLs and can verify compatibility. You value operational simplicity and deployment convenience over bleeding-edge web standards support. You're running in restricted environments (air-gapped networks, minimal containers, embedded systems) where installing Chrome is impractical.

Skip if: You need to capture modern single-page applications or JavaScript-heavy sites. You require high concurrency with protection against slow-loading pages blocking the server. You want ongoing browser engine updates and modern CSS/JavaScript support. You're building a public-facing API where users submit arbitrary URLs, because the compatibility limitations will generate too many support issues. In those cases, accept the extra operational complexity of Puppeteer, Playwright, or headless Chrome.
