urlscan-go: A Minimalist Go Client for Automated URL Threat Analysis
Hook
Most security automation tools force you to choose between simple-but-fragile and complex-but-robust. This 500-line Go library picks simple, and it's refreshingly honest about the tradeoffs.
Context
When you're building security tooling, automation pipelines, or incident response systems, you often need to analyze suspicious URLs at scale. You could scan them manually through urlscan.io's web interface, but that doesn't scale. You could write raw HTTP calls against their REST API, but then you're back to writing the same JSON marshaling boilerplate for the hundredth time.
This is where API client libraries earn their keep: they abstract away the tedious parts while giving you type-safe access to results. The urlscan.io service itself is powerful. It loads URLs in a real browser, captures screenshots, extracts HTTP requests, cookies, and DOM structure, and provides threat intelligence signals. But its API returns deeply nested JSON structures that are painful to work with directly. The urlscan-go library by Mizutani attempts to bridge that gap for Go developers, wrapping the submission and retrieval workflow in a handful of idiomatic Go methods.
Technical Insight
The architecture of urlscan-go follows a classic client-wrapper pattern with one interesting twist: it makes async operations feel synchronous through a polling abstraction. At its core, you instantiate a Client with your API key, submit a URL for scanning, get back a Task object, then call Wait() to block until results are ready.
Here's what basic usage looks like:
package main

import (
	"fmt"
	"log"

	"github.com/m-mizutani/urlscan-go"
)

func main() {
	client := urlscan.NewClient("your-api-key")

	// Submit URL for scanning
	task, err := client.Submit("https://suspicious-site.example")
	if err != nil {
		log.Fatal(err)
	}

	// Block until scan completes
	if err := task.Wait(); err != nil {
		log.Fatal(err)
	}

	// Access structured results
	for _, cookie := range task.Result.Cookies {
		fmt.Printf("Cookie: %s=%s\n", cookie.Name, cookie.Value)
	}
	for _, req := range task.Result.Requests {
		fmt.Printf("Request: %s %s\n", req.Method, req.URL)
	}
}
The Wait() method is where things get interesting. Under the hood, it's polling the urlscan.io API repeatedly until the scan status transitions from "pending" to "completed". The library hardcodes a polling interval and timeout, abstracting away the state machine you'd otherwise implement yourself. This is brilliant for quick scripts and CLI tools where you want synchronous behavior without callback hell.
The Result struct maps to urlscan.io's comprehensive response format, including fields like Page (title, URL, metadata), Cookies (domain, path, security flags), Requests (method, headers, response codes), and DOM structure. This type-safe mapping means you get autocomplete in your IDE and compile-time guarantees that you're accessing valid fields—a huge improvement over parsing raw JSON with map[string]interface{}.
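The difference between the two styles is easiest to see side by side. The sketch below decodes the same JSON once into structs and once into map[string]interface{}. The `cookie` and `result` types and the sample payload are hypothetical, trimmed-down shapes made up for illustration; they are not the library's actual Result definition.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// cookie and result are hypothetical shapes for illustration only.
type cookie struct {
	Name   string `json:"name"`
	Value  string `json:"value"`
	Secure bool   `json:"secure"`
}

type result struct {
	Cookies []cookie `json:"cookies"`
}

const sample = `{"cookies":[{"name":"sid","value":"abc123","secure":true}]}`

// firstCookieTyped decodes into structs; field access is checked at compile time.
func firstCookieTyped(data []byte) (string, error) {
	var r result
	if err := json.Unmarshal(data, &r); err != nil {
		return "", err
	}
	if len(r.Cookies) == 0 {
		return "", fmt.Errorf("no cookies")
	}
	return r.Cookies[0].Name, nil
}

// firstCookieUntyped does the same with map[string]interface{}; every step
// needs a runtime type assertion, and a misspelled key only fails at runtime.
func firstCookieUntyped(data []byte) (string, error) {
	var m map[string]interface{}
	if err := json.Unmarshal(data, &m); err != nil {
		return "", err
	}
	list, ok := m["cookies"].([]interface{})
	if !ok || len(list) == 0 {
		return "", fmt.Errorf("no cookies")
	}
	first, ok := list[0].(map[string]interface{})
	if !ok {
		return "", fmt.Errorf("unexpected cookie shape")
	}
	name, ok := first["name"].(string)
	if !ok {
		return "", fmt.Errorf("cookie name missing")
	}
	return name, nil
}

func main() {
	typed, _ := firstCookieTyped([]byte(sample))
	untyped, _ := firstCookieUntyped([]byte(sample))
	fmt.Println(typed, untyped)
}
```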
One clever detail: the Task object separates the submission lifecycle from result retrieval. This means you can submit multiple URLs, collect their Task objects, and wait on them concurrently:
// Submit everything first, keeping only the tasks that were accepted
var tasks []*urlscan.Task
for _, url := range suspiciousURLs {
	task, err := client.Submit(url)
	if err != nil {
		log.Printf("failed to submit %s: %v", url, err)
		continue
	}
	tasks = append(tasks, task)
}

// Wait concurrently (requires the "sync" import)
var wg sync.WaitGroup
for _, task := range tasks {
	wg.Add(1)
	go func(t *urlscan.Task) {
		defer wg.Done()
		if err := t.Wait(); err != nil {
			log.Printf("scan failed: %v", err)
			return
		}
		// process t.Result
	}(task)
}
wg.Wait()
This concurrent pattern works because each Task maintains its own state. The library doesn't manage a global request queue or connection pool beyond what Go's http.Client does by default, which keeps the implementation simple but puts the orchestration burden on you.
The HTTP layer uses standard library components—no external dependencies beyond basic testing utilities. The Client wraps Go's http.Client, constructs requests with proper headers (including the API key), and unmarshals responses into pre-defined structs. Error handling follows Go idioms: methods return error values that you check explicitly. No panics, no exceptions, no surprises.
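As a rough picture of what such a wrapper does on submission, the sketch below builds the request by hand with the standard library only. `newSubmitRequest` is a hypothetical helper, not the library's internals, and the `/api/v1/scan/` path and `API-Key` header reflect my reading of urlscan.io's public API documentation, so treat both as assumptions to verify there.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// newSubmitRequest builds a scan-submission request. baseURL is a parameter
// so the example can point at a test server instead of urlscan.io.
// The endpoint path and header name are assumptions taken from the
// service's public API docs.
func newSubmitRequest(baseURL, apiKey, target string) (*http.Request, error) {
	body, err := json.Marshal(map[string]string{"url": target})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/api/v1/scan/", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("API-Key", apiKey)
	return req, nil
}

func main() {
	// Only builds the request; nothing is sent over the network.
	req, err := newSubmitRequest("https://urlscan.io", "your-api-key", "https://suspicious-site.example")
	if err != nil {
		fmt.Println("build failed:", err)
		return
	}
	fmt.Println(req.Method, req.URL.String(), req.Header.Get("API-Key"))
}
```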
Gotcha
The simplicity comes at a cost. First, there's zero documentation beyond a minimal README example. Want to know what fields are available in Result.Page? You're reading source code. Need to understand how verdicts work or what the security scores mean? You're heading to urlscan.io's API docs. The library makes no attempt to explain the domain model it's wrapping.
More critically, there's no rate limiting, backoff, or retry logic. If you're scanning thousands of URLs, you'll hit API quotas and get 429 responses. The library will surface these as errors, but you're writing the retry loop yourself. Similarly, network timeouts are handled by whatever defaults the underlying http.Client has, and a zero-value http.Client never times out. The Wait() method polls on the library's hardcoded interval and timeout, neither of which you can tune. And because Wait() as used above takes no context.Context, the only way to impose your own deadline on a slow or stuck scan is to run the call in a goroutine and select on a context or timer.
With only 8 stars and no recent commit activity visible, this feels like a side project rather than actively maintained infrastructure. If urlscan.io changes their API schema—adds fields, deprecates endpoints, modifies response formats—you might find yourself forking and patching. The low adoption also means no community around it: no open issues discussing patterns, no Stack Overflow answers, no battle-tested examples from production deployments.
Verdict
Use if: You're building a one-off security automation script, incident response tool, or internal CLI where you need quick integration with urlscan.io and the blocking Wait() semantics match your workflow. The type-safe result structs save time compared to hand-parsing JSON, and you're comfortable reading source code when documentation is lacking. Also use if you're only scanning URLs occasionally and don't need sophisticated error handling or rate limiting.
Skip if: You're building production services that need robust retry logic, rate limiting, comprehensive error handling, or detailed documentation for team onboarding. Also skip if you need confidence in long-term maintenance: the low star count and unclear activity suggest this could go stale. In those cases, either use the urlscan.io REST API directly with a battle-tested HTTP client library like go-resty (giving you retries and better error handling), or evaluate whether a different threat intelligence platform with more mature Go support fits your needs. The raw API approach takes maybe 100 lines of code and gives you full control over edge cases this library doesn't handle.