Scraping GitHub’s Trending Page: Why go-trending Exists and When It’ll Break
Hook
GitHub deliberately leaves trending repositories out of its official API. If you want that data programmatically, you have exactly one option: scrape it.
Context
GitHub’s trending page is one of the platform’s most visited destinations. Developers browse it to discover hot new projects, track emerging technologies, and spot rising contributors. Yet despite its popularity, GitHub has never exposed trending data through their REST or GraphQL APIs. This isn’t an oversight—it’s intentional. Trending algorithms are complex, resource-intensive to compute, and prone to gaming. By keeping trending calculations server-side and UI-only, GitHub maintains control over how this data is consumed.
This creates a dilemma for developers building tools that need trending data: dashboards for engineering teams, newsletter generators, research projects analyzing open-source trends, or internal metrics for GitHub Enterprise instances. Without an official API, the only path forward is web scraping. That’s where andygrunwald/go-trending comes in. It’s a purpose-built Go library that fetches and parses GitHub’s trending pages, transforming HTML into structured data. It handles the HTTP requests, HTML parsing, and data extraction so you don’t have to maintain your own scraper. For Go developers who need trending data and accept the fragility that comes with scraping, it’s the most straightforward solution available.
Technical Insight
The library’s architecture is refreshingly simple: it’s essentially an HTTP client wrapper with an HTML parser bolted on. Under the hood, it uses Go’s standard net/http package to fetch trending pages and the goquery library (a jQuery-style selector engine for Go) to extract data from the DOM. The public API exposes two primary functions: fetching trending repositories and fetching trending developers, both with optional filters for time range (today, this week, this month) and programming language.
Here’s what basic usage looks like:
package main

import (
	"fmt"

	"github.com/andygrunwald/go-trending"
)

func main() {
	trend := trending.NewTrending()

	// Get trending Go repositories for today
	projects, err := trend.GetProjects(trending.TimeToday, "go")
	if err != nil {
		panic(err)
	}

	for _, project := range projects {
		fmt.Printf("%s by %s\n", project.Name, project.Owner)
		fmt.Printf("  Description: %s\n", project.Description)
		fmt.Printf("  Stars today: %d\n", project.StarsToday)
		fmt.Printf("  Language: %s\n", project.Language)
	}
}
The Project struct returned by GetProjects() contains all the metadata you’d expect: repository name, owner, description, URL, star count, stars added today, language, and language color (for UI rendering). The library also supports fetching trending developers with similar filtering:
developers, err := trend.GetDevelopers(trending.TimeWeek, "rust")
if err != nil {
	panic(err)
}

for _, dev := range developers {
	fmt.Printf("%s (%s)\n", dev.DisplayName, dev.Username)
	if dev.PopularRepository != nil {
		fmt.Printf("  Popular repo: %s\n", dev.PopularRepository.Name)
	}
}
One architectural detail worth noting: the library supports GitHub Enterprise instances. You can point it at your organization’s internal GitHub by constructing a custom client:
trend := trending.NewTrendingWithURL("https://github.company.com")
This is particularly valuable for companies that want to surface trending internal projects in dashboards or Slack bots. Since GitHub Enterprise uses the same UI as public GitHub, the scraping logic works identically.
The parsing itself relies on CSS selectors hardcoded to match GitHub’s trending page structure. When GitHub renders https://github.com/trending/go?since=daily, the HTML contains article elements with specific classes. The library walks the DOM tree, extracting text from known selectors like .h3.lh-condensed for repository names and .f6.text-gray.mt-2 for metadata. This approach is fast and requires zero authentication, but it’s also the library’s Achilles’ heel. GitHub can (and does) change their HTML structure during redesigns, immediately breaking scrapers.
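Before any parsing happens, the fetch itself is just a GET against a predictable URL built from the two filters. Here is a minimal standard-library sketch of how such a URL can be assembled; the helper name and structure are illustrative assumptions, not go-trending's internals:

```go
package main

import (
	"fmt"
	"net/url"
)

// trendingURL builds a GitHub trending URL for an optional language and
// time range. Illustrative helper, not code from go-trending itself.
func trendingURL(base, language, since string) string {
	u, _ := url.Parse(base + "/trending")
	if language != "" {
		u.Path += "/" + url.PathEscape(language)
	}
	if since != "" {
		q := u.Query()
		q.Set("since", since) // "daily", "weekly", or "monthly"
		u.RawQuery = q.Encode()
	}
	return u.String()
}

func main() {
	fmt.Println(trendingURL("https://github.com", "go", "daily"))
	// https://github.com/trending/go?since=daily
}
```

Because the URL scheme is this regular, pointing the same logic at a GitHub Enterprise hostname requires no structural changes, only a different base.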
The library includes no retry logic, no caching layer, and no fallback mechanisms. It makes a single HTTP request, parses the response, and returns structured data or an error. This simplicity is both a strength and a weakness: it keeps the codebase small and auditable, but it pushes reliability concerns to the caller. If you need resilience, wrap calls in your own retry logic and cache responses so you avoid hammering GitHub during outages or rate limiting.
Gotcha
The fundamental limitation is brittleness. Web scraping breaks when the scraped site changes its HTML structure, and GitHub redesigns their UI periodically. When they do, every hardcoded CSS selector in go-trending becomes a potential failure point. The library has 146 stars, suggesting a relatively small user base. When breakage occurs, you may wait days or weeks for a fix, depending on maintainer availability. If you’re building a production system where trending data is critical, this delay is unacceptable.
There’s also no rate limiting protection built into the library. GitHub doesn’t publish rate limits for their web UI, but excessive scraping can trigger throttling or temporary IP bans. If you’re polling trending pages every few minutes across multiple languages and time ranges, you risk getting blocked. The library provides no guidance on safe polling intervals or exponential backoff strategies. You’ll need to implement that yourself, likely using a time-based cache to avoid redundant requests. Additionally, the trending page only shows the top 25 repositories per filter—there’s no pagination support because GitHub’s UI doesn’t paginate. If you need deeper trending data beyond the top 25, this library can’t help you.
Verdict
Use if: You’re building internal tools, dashboards, or hobby projects where occasional breakage is acceptable. You need trending data for GitHub Enterprise instances where official APIs don’t reach. You’re comfortable monitoring the library for updates and can tolerate maintenance windows when GitHub redesigns its UI. You have modest data needs (top 25 repos, no historical analysis) and can implement your own caching and retry logic.

Skip if: You’re building production features where reliability is critical. You need historical trending data or want to track trends over time beyond daily/weekly/monthly snapshots. You require real-time updates or sub-hour latency. You’re uncomfortable with web scraping’s legal and ethical gray areas. You need trending data for languages or time ranges GitHub doesn’t support in its UI.

In those cases, consider building your own trending algorithm using GitHub’s official GraphQL API to track star velocity, fork rates, and activity metrics. It’s more work upfront but far more reliable long-term.
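The DIY route can be sketched concretely: GitHub’s GraphQL API exposes stargazerCount and forkCount on repositories, so sampling those periodically and diffing successive samples yields your own star-velocity ranking. Below is a minimal standard-library sketch of building such a request; the search query string and the sampling cadence are illustrative choices, not a prescribed design:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// A GraphQL query against GitHub's official API: fetch star and fork
// counts for a set of repositories, to be sampled on a schedule and
// diffed to compute velocity.
const query = `query {
  search(query: "language:Go stars:>100", type: REPOSITORY, first: 25) {
    nodes {
      ... on Repository { nameWithOwner stargazerCount forkCount }
    }
  }
}`

// buildRequest prepares an authenticated GraphQL POST. The caller
// supplies a personal access token and executes it with an http.Client.
func buildRequest(token string) (*http.Request, error) {
	body, err := json.Marshal(map[string]string{"query": query})
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest("POST", "https://api.github.com/graphql", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Authorization", "Bearer "+token)
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := buildRequest("YOUR_TOKEN")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL) // POST https://api.github.com/graphql
}
```

Unlike scraping, this path is authenticated, rate-limit documented, and stable across UI redesigns; the cost is that you must store samples and define your own trending score.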