Scraping GitHub Trending: How go-trending Fills the Gap in GitHub's API
Hook
GitHub's API has over 600 endpoints covering nearly every aspect of the platform—except one conspicuous omission: trending repositories. If you want to programmatically access what's hot on GitHub, you're forced to scrape HTML like it's 2005.
Context
The GitHub trending page is one of the platform's most visited destinations. Developers check it daily to discover new projects, find inspiration, and keep their finger on the pulse of the open-source world. Yet despite GitHub's comprehensive REST and GraphQL APIs, there's no official endpoint for trending data.
This creates a genuine problem for anyone building developer tools, dashboards, or analytics platforms that need trending information. Want to build a newsletter featuring hot repos? A Slack bot announcing trending projects in your team's tech stack? A research tool analyzing trending patterns? You're stuck either manually copying data or building a scraper. Enter go-trending: a focused Go library that does the dirty work of parsing GitHub's trending HTML and returning clean, structured data through an idiomatic Go interface.
Technical Insight
The architecture of go-trending is refreshingly straightforward: it's essentially a specialized HTTP client paired with an HTML parser. The library wraps all complexity behind a simple Trending struct that acts as your entry point. Here's how you'd use it to fetch today's trending Go repositories:
package main

import (
	"fmt"
	"log"

	"github.com/andygrunwald/go-trending"
)

func main() {
	trend := trending.NewTrending()

	// Fetch trending Go repos for today
	projects, err := trend.GetProjects(trending.TimeToday, "go")
	if err != nil {
		log.Fatal(err)
	}

	for _, project := range projects {
		fmt.Printf("%s: %s\n", project.Name, project.Description)
		// Stars holds the stars gained in the requested period, not the all-time total.
		fmt.Printf("  Stars today: %d\n", project.Stars)
	}
}
Under the hood, the library constructs URLs pointing to GitHub's trending pages (e.g., github.com/trending/go?since=daily), fetches the HTML, and uses goquery, a jQuery-like selector library for Go, to extract data from specific DOM elements. The returned Project struct contains fields like Name, Owner, RepositoryName, Description, Language, Stars, and URL. Note that Stars reflects the stars gained during the requested time period, not the repository's all-time total.
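To make the URL-construction step concrete, here is a minimal sketch using only the standard library. It is not go-trending's actual code: the helper name buildTrendingURL is hypothetical, and the path layout and since parameter are taken from the example URL above.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
)

// buildTrendingURL is a hypothetical helper (not part of go-trending) that
// assembles a trending-page URL from a base URL, a language slug, and a
// time period such as "daily", "weekly", or "monthly".
func buildTrendingURL(base, language, since string) (string, error) {
	u, err := url.Parse(base)
	if err != nil {
		return "", err
	}

	// path.Join drops empty elements, so an empty language yields /trending.
	u.Path = path.Join("/", u.Path, "trending", language)

	// Only add the query string when a time period was requested.
	if since != "" {
		q := u.Query()
		q.Set("since", since)
		u.RawQuery = q.Encode()
	}
	return u.String(), nil
}

func main() {
	u, err := buildTrendingURL("https://github.com", "go", "daily")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(u) // prints https://github.com/trending/go?since=daily
}
```

Keeping this logic in one place is what lets callers pass a constant and a language name without ever touching the URL format.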
What makes this design smart is its separation of concerns. The library defines clear time period constants (TimeToday, TimeWeek, TimeMonth) and handles URL construction internally. You don't need to know that GitHub uses since=daily query parameters or understand the HTML structure—you just specify what data you want.
The library also supports fetching trending developers, not just repositories:
developers, err := trend.GetDevelopers(trending.TimeWeek, "rust")
if err != nil {
	log.Fatal(err)
}
for _, dev := range developers {
	// DisplayName is the login; FullName is the profile name and may be empty.
	fmt.Printf("%s (%s)\n", dev.DisplayName, dev.FullName)
	fmt.Printf("  Profile: %s\n", dev.URL)
}
One particularly thoughtful feature is the GetLanguages() method, which returns all programming languages GitHub supports for trending filters. This prevents hardcoding language names and keeps your code resilient if GitHub adds new languages. The library returns a slice of Language structs containing both the display name and the URL parameter value.
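As a rough illustration of why the two values matter, the sketch below maps a user-supplied display name to the value used in the URL. The Language type here is a local stand-in with assumed field names, and slugFor is a hypothetical helper; in real code the slice would come from GetLanguages() and you should check the library's own type.

```go
package main

import (
	"fmt"
	"strings"
)

// Language is a local stand-in for the data described above. The field
// names are assumptions made for this sketch.
type Language struct {
	Name    string // display name, e.g. "C++"
	URLName string // value used in the trending URL, e.g. "c++"
}

// slugFor returns the URL value for a display name, matching
// case-insensitively. The boolean reports whether a match was found.
func slugFor(langs []Language, display string) (string, bool) {
	for _, l := range langs {
		if strings.EqualFold(l.Name, display) {
			return l.URLName, true
		}
	}
	return "", false
}

func main() {
	// Sample data only; a real program would fetch this list at startup.
	langs := []Language{
		{Name: "Go", URLName: "go"},
		{Name: "C++", URLName: "c++"},
		{Name: "Jupyter Notebook", URLName: "jupyter-notebook"},
	}

	if slug, ok := slugFor(langs, "jupyter notebook"); ok {
		fmt.Println(slug) // prints jupyter-notebook
	} else {
		fmt.Println("unknown language")
	}
}
```

Validating input this way lets you reject an unknown language up front instead of fetching a page that may not match what the user meant.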
For organizations running GitHub Enterprise, the library supports custom base URLs:
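trend := trending.NewTrending()
// BaseURL is a *url.URL (import "net/url"), not a string; keep the trailing
// slash and handle the parse error in real code.
trend.BaseURL, _ = url.Parse("https://github.yourcompany.com/")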
The implementation wisely uses Go's standard net/http package rather than introducing heavyweight dependencies. Error handling follows Go conventions, returning errors rather than panicking, which allows callers to decide how to handle failures. The parsing logic includes nil checks and safe navigation to prevent panics when GitHub's HTML structure varies slightly.
One architectural decision worth noting: the library performs synchronous, blocking HTTP requests. There's no built-in concurrency, which keeps the API simple but means you'll need to manage goroutines yourself if you're fetching multiple time periods or languages concurrently. This is actually a reasonable choice—concurrency patterns vary widely depending on use case, and forcing an opinion here would reduce flexibility.
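If you do want to fetch several languages in parallel, one common pattern is a goroutine per request with a sync.WaitGroup. The sketch below is only one way to do it. The fetchFunc type is a hypothetical stand-in for a blocking call such as GetProjects, so the example runs without network access, and fetchAll and result are illustrative names.

```go
package main

import (
	"fmt"
	"sync"
)

// fetchFunc stands in for a blocking call such as
// trend.GetProjects(trending.TimeToday, language). It is a hypothetical
// signature chosen so this sketch runs offline.
type fetchFunc func(language string) ([]string, error)

// result pairs a language with whatever its fetch returned.
type result struct {
	Language string
	Names    []string
	Err      error
}

// fetchAll runs one goroutine per language and returns results in the
// same order as the input slice.
func fetchAll(languages []string, fetch fetchFunc) []result {
	results := make([]result, len(languages))
	var wg sync.WaitGroup

	for i, lang := range languages {
		wg.Add(1)
		go func(i int, lang string) {
			defer wg.Done()
			names, err := fetch(lang)
			// Each goroutine writes only to its own slot, so no mutex is needed.
			results[i] = result{Language: lang, Names: names, Err: err}
		}(i, lang)
	}

	wg.Wait()
	return results
}

func main() {
	// fakeFetch replaces the real network call for demonstration purposes.
	fakeFetch := func(language string) ([]string, error) {
		return []string{language + "/example-repo"}, nil
	}

	for _, r := range fetchAll([]string{"go", "rust", "python"}, fakeFetch) {
		if r.Err != nil {
			fmt.Printf("%s: error: %v\n", r.Language, r.Err)
			continue
		}
		fmt.Printf("%s: %v\n", r.Language, r.Names)
	}
}
```

With a scraper, unbounded fan-out is rarely a good idea. For more than a handful of languages, consider capping the number of in-flight requests, for example with a buffered channel used as a semaphore.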
Gotcha
The fundamental limitation is inherent to all web scraping: fragility. GitHub can change their HTML structure at any time—adding new CSS classes, restructuring the DOM, or switching to a client-side rendered framework—and break this library instantly. There's no SLA, no deprecation notice, no compatibility guarantee. The library's test suite can't protect against GitHub's unannounced changes.
In practice, you'll also run into rate limiting, which the library does not handle. The trending pages are served to unauthenticated clients, so make too many requests in quick succession and you'll start seeing 429 responses or temporary blocks. The library doesn't implement exponential backoff, request queuing, or caching; you'll need to build that yourself. There's also no way to authenticate requests, so you're stuck with the public limits even if you have a GitHub token.
Timeouts are a second gap. The library's methods take no context.Context, so there's no per-call deadline or cancellation. Unless the underlying HTTP client is configured with a timeout, a hanging request can block indefinitely: if GitHub's trending page takes 30 seconds to load (or never responds), your application hangs with it.
Verdict
Use if: You're building internal tools, personal projects, or analytics dashboards where occasional downtime is acceptable and you need programmatic access to GitHub trending data. This library provides the cleanest Go interface available for a feature GitHub refuses to officially support. It's perfect for weekend projects, research scripts, or proof-of-concepts where the alternative is manual data collection. The code is readable enough that you can fork and fix it yourself when GitHub inevitably changes their HTML.
Skip if: You're building production systems where reliability matters, customer-facing features that need uptime guarantees, or anything mission-critical. The scraping approach is fundamentally unstable: you're one GitHub redesign away from complete failure. Also skip this if you need historical data or high-frequency polling; scraping isn't appropriate for either. Consider GitHub Archive for historical analysis, or periodic manual snapshots if you don't actually need real-time trending data.