deedy/edgar: The Undocumented Python Library That 104 Developers Found Anyway
Hook
A Python repository with 104 stars, zero documentation, no README, and no description somehow became popular enough for developers to find and use. What does that tell us about the state of financial data tooling?
Context
The SEC's EDGAR (Electronic Data Gathering, Analysis, and Retrieval) system is a goldmine of financial data—10-Ks, 10-Qs, proxy statements, and insider trading reports from every public company in America. But accessing this data programmatically has historically been a nightmare. The EDGAR interface was designed for humans in the 1990s, not for APIs in 2024.
For years, developers have been building scrapers, parsers, and downloaders to extract structured data from EDGAR's labyrinthine directory structure and SGML-formatted filings. When deedy/edgar appeared, it likely solved a specific pain point well enough that over a hundred developers starred it despite the complete absence of documentation. This phenomenon—useful code spreading through word-of-mouth and Stack Overflow snippets rather than proper documentation—represents both the best and worst of open-source culture. The repository's popularity suggests it works, but its lack of documentation makes it a liability for any serious project.
Technical Insight
Without a README or description, understanding deedy/edgar requires detective work. Going by the repository name and common Python ecosystem patterns, it is most likely a wrapper around SEC EDGAR's HTTP interfaces. Older EDGAR tools also used the SEC's FTP server, which has since been retired. Most EDGAR libraries follow the same pipeline: they construct URLs into EDGAR's directory structure, parse the index files, download filings, and optionally extract structured data from the SGML wrapper or from XBRL exhibits.
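The last step of that pipeline is the least obvious. An EDGAR full-submission `.txt` file wraps each document in a `<DOCUMENT>` block with `<TYPE>`, `<FILENAME>`, and `<TEXT>` tags. The sketch below splits such a file using regular expressions. It is not deedy/edgar's code. `split_submission` and `Document` are hypothetical names, and a real parser would also need to handle uuencoded binary exhibits and other edge cases.

```python
import re
from typing import NamedTuple

# Each document in a full-submission file sits between <DOCUMENT> and </DOCUMENT>.
DOC_RE = re.compile(r"<DOCUMENT>(.*?)</DOCUMENT>", re.S)
TEXT_RE = re.compile(r"<TEXT>(.*?)</TEXT>", re.S)


class Document(NamedTuple):
    doc_type: str   # e.g. "10-K" or an exhibit type such as "EX-21"
    filename: str
    text: str       # raw body: HTML, plain text, XBRL, or encoded binary


def _header_field(block: str, tag: str) -> str:
    # Header tags are unclosed, one per line, e.g. "<TYPE>10-K"
    match = re.search(rf"<{tag}>([^\n<]*)", block)
    return match.group(1).strip() if match else ""


def split_submission(raw: str) -> list[Document]:
    documents = []
    for block in DOC_RE.findall(raw):
        body = TEXT_RE.search(block)
        documents.append(
            Document(
                doc_type=_header_field(block, "TYPE"),
                filename=_header_field(block, "FILENAME"),
                text=body.group(1).strip() if body else "",
            )
        )
    return documents
```

With this in place, filtering on `doc_type == "10-K"` would separate the main report from its exhibits.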
A typical query against EDGAR's company-browse endpoint, which lists a company's filings, looks like this:
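```python
from urllib.parse import urlencode

# EDGAR company-browse endpoint (lists a company's filings, not a single filing)
base_url = "https://www.sec.gov/cgi-bin/browse-edgar"
params = {
    "action": "getcompany",
    "CIK": "0000320193",  # Apple Inc.
    "type": "10-K",       # filter by form type
    "dateb": "",          # "prior to" date; empty means no cutoff
    "owner": "exclude",   # leave out insider ownership filings
    "count": "100",       # results per page
}
url = f"{base_url}?{urlencode(params)}"
```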
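Most EDGAR libraries also need to handle the quarterly full-index files, which list every filing made in a given quarter. `master.idx` is pipe-delimited, while `company.idx` and `form.idx` are fixed-width. Fetching a single filing is simpler once you know its CIK and accession number. Without seeing deedy/edgar's code we can only guess, but it likely implements something similar to this: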
```python
import requests


class EdgarDownloader:
    BASE_URL = "https://www.sec.gov/Archives/edgar/data"

    def __init__(self, user_agent):
        # SEC requires a User-Agent identifying the requester,
        # e.g. "Sample Company admin@example.com"
        self.headers = {"User-Agent": user_agent}

    def get_filing(self, cik, accession_number):
        # CIK: Central Index Key (numeric company identifier)
        # accession_number: unique filing ID, formatted NNNNNNNNNN-YY-NNNNNN
        cik_path = str(int(cik))  # Archives paths use the CIK without leading zeros
        acc_no_clean = accession_number.replace("-", "")
        url = f"{self.BASE_URL}/{cik_path}/{acc_no_clean}/{accession_number}.txt"
        response = requests.get(url, headers=self.headers, timeout=30)
        response.raise_for_status()
        return response.text
```
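The quarterly index files mentioned above can be handled in a similar spirit. This sketch assumes the standard `master.idx` layout: a free-text preamble, a column header, a dashed separator line, then one pipe-delimited row per filing. `parse_master_index`, `master_index_url`, and `IndexEntry` are illustrative names, not deedy/edgar's API. The downloaded bytes must be decoded before parsing; latin-1 is a forgiving choice because it accepts any byte.

```python
from typing import Iterator, NamedTuple


class IndexEntry(NamedTuple):
    cik: int
    company: str
    form_type: str
    date_filed: str  # YYYY-MM-DD
    filename: str    # path relative to https://www.sec.gov/Archives/


def master_index_url(year: int, quarter: int) -> str:
    # Full-index layout: /Archives/edgar/full-index/<year>/QTR<n>/master.idx
    return f"https://www.sec.gov/Archives/edgar/full-index/{year}/QTR{quarter}/master.idx"


def parse_master_index(text: str) -> Iterator[IndexEntry]:
    lines = iter(text.splitlines())
    # Skip the preamble and column header, up to and including the dashed separator.
    for line in lines:
        if line.startswith("----"):
            break
    # Each remaining row: CIK|Company Name|Form Type|Date Filed|Filename
    for line in lines:
        parts = line.split("|")
        if len(parts) != 5:
            continue  # skip blank or malformed lines
        cik, company, form_type, date_filed, filename = parts
        yield IndexEntry(int(cik), company, form_type, date_filed, filename)
```

To collect a quarter's annual reports, filter the entries on `form_type == "10-K"`. Appending each `filename` to `https://www.sec.gov/Archives/` then gives its download URL.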
The critical architectural challenge with EDGAR tools is rate limiting and politeness. The SEC's fair-access policy asks automated clients to send a User-Agent that identifies who is making the requests, and it caps traffic at 10 requests per second. Clients that ignore these rules can be temporarily blocked. Any production EDGAR library therefore needs request throttling, retry logic with exponential backoff, and proper error handling for the SEC's occasionally flaky infrastructure.
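Those requirements fit in a small amount of plain Python. The SEC's fair-access guidance limits automated clients to 10 requests per second. The hypothetical `Throttle` class below enforces a minimum gap between calls. `polite_get` retries on 429 and 5xx responses with exponential backoff. The `fetch` callable, the retryable status set, and the delay schedule are illustrative assumptions, not anything deedy/edgar is known to implement.

```python
import time
from typing import Any, Callable

# Status codes worth retrying: rate-limited (429) or transient server errors.
RETRYABLE = {429, 500, 502, 503, 504}


class Throttle:
    """Enforce a minimum interval between requests (not thread-safe)."""

    def __init__(self, max_per_second: float = 10.0):
        self.min_interval = 1.0 / max_per_second
        self._last = float("-inf")

    def wait(self) -> None:
        # Sleep just long enough to stay under max_per_second calls per second.
        delay = self._last + self.min_interval - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        self._last = time.monotonic()


def backoff_delays(retries: int, base: float = 0.5, cap: float = 30.0) -> list[float]:
    """Exponential schedule: base, 2*base, 4*base, ... capped at `cap` seconds."""
    return [min(cap, base * 2 ** i) for i in range(retries)]


def polite_get(
    fetch: Callable[[], Any],
    throttle: Throttle,
    retries: int = 4,
    base: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fetch() (anything returning an object with .status_code),
    throttled, retrying retryable statuses with exponential backoff."""
    delays = backoff_delays(retries, base)
    for attempt in range(retries + 1):
        throttle.wait()
        response = fetch()
        if response.status_code not in RETRYABLE:
            return response
        if attempt < retries:
            sleep(delays[attempt])
    return response  # still failing after all retries; the caller decides
```

In practice you might call `polite_get(lambda: requests.get(url, headers=headers, timeout=30), throttle)` and share a single `Throttle(10)` across the whole process, because the limit applies to all of your traffic rather than to each call site. As an alternative, the same backoff can be configured through urllib3's `Retry` class mounted on a `requests` `HTTPAdapter`.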
The 104 stars suggest deedy/edgar was good enough for its original use case, but stars say nothing about whether it throttles requests or retries failures. Without documentation explaining rate limits, supported filing types, or error conditions, every user must rediscover these constraints through trial and error, or worse, by getting their IP temporarily blocked by the SEC.
Another likely feature is CIK lookup functionality. Companies can be identified by ticker symbol, but EDGAR uses numeric CIK codes internally. Converting "AAPL" to "0000320193" requires parsing SEC's company list, which changes as companies go public, merge, or delist. This mapping logic is where many EDGAR tools differ in reliability and freshness.
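One way to build that mapping is from the JSON file the SEC publishes at `https://www.sec.gov/files/company_tickers.json`. That file is keyed by row index, and each entry has `cik_str`, `ticker`, and `title` fields. The sketch below assumes that format stays stable. `build_cik_map` and `fetch_cik_map` are hypothetical helper names. The file is a snapshot, so a long-running tool would need to refresh it periodically.

```python
import json
import urllib.request

# Published by the SEC; assumed shape:
# {"0": {"cik_str": 320193, "ticker": "AAPL", "title": "Apple Inc."}, ...}
TICKER_URL = "https://www.sec.gov/files/company_tickers.json"


def build_cik_map(payload: dict) -> dict[str, str]:
    """Map upper-case ticker -> 10-digit zero-padded CIK string."""
    return {
        entry["ticker"].upper(): str(entry["cik_str"]).zfill(10)
        for entry in payload.values()
    }


def fetch_cik_map(user_agent: str) -> dict[str, str]:
    # The SEC expects a descriptive User-Agent on automated requests.
    request = urllib.request.Request(TICKER_URL, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=30) as response:
        return build_cik_map(json.load(response))
```

With this helper, looking up `"AAPL"` in the returned dictionary yields the padded CIK `"0000320193"`. Some companies list several tickers (for example, multiple share classes), so several keys can point to the same CIK.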
Gotcha
The elephant in the room is documentation—or rather, its complete absence. Even if deedy/edgar contains production-quality code with excellent error handling and comprehensive features, you'll need to read every line of source to understand it. This isn't just inconvenient; it's a security and maintenance risk. You can't audit what you can't understand quickly, and you can't fix what breaks if you don't know how it was supposed to work.
The second gotcha is maintenance status. The SEC has evolved its systems over the years. It retired FTP access, added XBRL (and later Inline XBRL) structured data alongside the older SGML-wrapped filings, introduced new filing types, implemented new rate-limiting rules, and changed URL structures. A library that worked perfectly in 2015 might fail silently or violently in 2024. Unless you dig through the commit history yourself, and with no issue discussion or maintainer communication to go on, you're adopting an unknown quantity. The 104 stars tell you it was useful once, to someone, but not whether it works today or will work tomorrow. For financial data, where errors can carry legal and compliance consequences, this uncertainty is unacceptable for anything beyond personal research projects.
Verdict
Use if: You're doing one-off personal research, you're comfortable reading Python source code to understand API contracts, and you have time to test thoroughly before depending on it. The 104 stars suggest something useful exists here, making it potentially worth a code review if you're evaluating EDGAR library options. It might also serve as inspiration for building your own tool.
Skip if: You need production-ready financial data infrastructure, you're building anything that could have legal or compliance implications, you work on a team where others need to maintain your code, or you value your time. The documentation vacuum makes this unsuitable for professional use. Instead, evaluate sec-edgar-downloader or edgartools—both offer similar functionality with actual documentation, tests, and active maintenance. The few hours you'd save by possibly reusing deedy/edgar's code will be lost many times over when something breaks and you're debugging undocumented behavior.