GitHack: Weaponizing Exposed .git Folders to Reconstruct Source Code
Hook
Over 3,500 developers have starred a tool that can steal your entire source code from a single misconfigured nginx directive. If your production server exposes a .git folder, an attacker needs just minutes to download your intellectual property.
Context
Web developers routinely deploy applications by cloning Git repositories directly onto production servers or using Git-based deployment pipelines. The .git folder contains the complete history of your project, including every file, commit message, and even secrets accidentally committed then removed. When web servers aren't configured to block access to dotfiles, this treasure trove becomes publicly accessible.
The problem is deceptively simple: a developer runs git clone in their web root, forgets to add a deny rule for .git in their nginx or Apache configuration, and suddenly anyone who appends /.git/config to the domain gets a readable file. Traditional security scanners might flag this, but they don't demonstrate the true impact. GitHack bridges that gap by automating the complete reconstruction of your source code, turning a theoretical vulnerability into downloaded Python files, database schemas, and API keys in seconds.
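A status code alone does not prove exposure, because some servers answer every path with the same page. A minimal sketch of a content check on whatever came back for /.git/config, assuming the body has already been fetched as text; the function name `looks_like_git_config` is hypothetical and the heuristic is deliberately loose.

```python
import re

# A real Git config normally contains a [core] section header on its own line.
CORE_SECTION = re.compile(r"^\s*\[core\]\s*$", re.MULTILINE)


def looks_like_git_config(body: str) -> bool:
    """Heuristic check that a response body is a Git config, not an error page."""
    if "<html" in body.lower():
        # Catch-all error pages are usually HTML, which a Git config never is.
        return False
    return bool(CORE_SECTION.search(body))
```

A body that passes this check is a strong hint that the rest of the .git folder is readable too, though only fetching further files confirms it.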
Technical Insight
GitHack's elegance lies in understanding Git's internal storage mechanism. Git doesn't store files under their names; it stores them as content-addressable objects named by their SHA-1 hashes. The .git/index file acts as a manifest, mapping file paths to their corresponding object hashes. GitHack exploits this by parsing the index, extracting the hash list, and methodically downloading each object from .git/objects/.
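To make the manifest idea concrete, here is a minimal sketch of reading path-to-hash pairs from a version-2 index file. It is an illustration under stated assumptions, not GitHack's own parser: the function name `parse_index_v2` is hypothetical, and index extensions, the trailing checksum, and index versions 3 and 4 are ignored.

```python
import struct


def parse_index_v2(data: bytes) -> list[tuple[str, str]]:
    """Return (path, sha1_hex) pairs from the raw bytes of a version-2 Git index."""
    # Header: 4-byte signature "DIRC", 4-byte version, 4-byte entry count.
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("not a Git index file")
    if version != 2:
        raise ValueError(f"unsupported index version {version}")

    entries = []
    offset = 12
    for _ in range(count):
        # 40 bytes of stat data (ctime, mtime, dev, ino, mode, uid, gid, size)
        # come first, then the 20-byte SHA-1, then a 2-byte flags field.
        sha1_hex = data[offset + 40:offset + 60].hex()
        path_start = offset + 62
        path_end = data.index(b"\0", path_start)
        path = data[path_start:path_end].decode("utf-8", "replace")
        entries.append((path, sha1_hex))
        # Each entry is NUL-padded (1 to 8 bytes) to a multiple of 8 bytes.
        entry_len = 62 + (path_end - path_start)
        offset += (entry_len + 8) & ~7
    return entries
```

Each returned hash maps directly to a URL under .git/objects/, which is all an attacker needs to start downloading.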
The tool uses the gin library to parse Git's binary index format, a structure that holds cached metadata for every tracked file. Once the index is parsed, GitHack has a complete map of which files exist and their SHA-1 identifiers. The sketch below illustrates how a single object can be fetched and decoded:
import zlib

import requests


def download_object(base_url, sha1_hash):
    """Fetch one loose Git object and return its content, or None on failure."""
    # Git stores loose objects as .git/objects/ab/cdef123...
    # First 2 hex chars of the SHA-1 = directory, remaining 38 = filename
    dir_name = sha1_hash[:2]
    file_name = sha1_hash[2:]
    object_url = f"{base_url}/.git/objects/{dir_name}/{file_name}"

    # A timeout keeps one unresponsive server from hanging the whole run
    response = requests.get(object_url, timeout=10)
    if response.status_code != 200:
        return None

    try:
        # Loose objects are zlib-compressed
        decompressed = zlib.decompress(response.content)
    except zlib.error:
        # Not a Git object (for example, an HTML error page served with 200 OK)
        return None

    # Format: "blob <size>\0<content>"
    null_byte = decompressed.index(b"\0")
    return decompressed[null_byte + 1:]
This approach is bandwidth-efficient because GitHack only downloads the exact objects referenced in the index, not the entire .git folder. On a typical web application with hundreds of files, this might mean downloading 50MB instead of potentially gigabytes of Git history.
The tool also handles Git's directory structure intelligently. The index does not list directories as separate entries; it holds a flat list of files with their full paths. GitHack reconstructs the directory tree by parsing file paths from the index and creating directories as needed before writing file contents. This means a path like src/api/handlers/user.py automatically creates the nested folder structure.
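The directory reconstruction described above can be sketched in a few lines. This is an illustrative helper, not GitHack's code; the name `write_reconstructed_file` and its parameters are assumptions, and it expects paths that have already been vetted as safe.

```python
import os


def write_reconstructed_file(output_root: str, rel_path: str, content: bytes) -> str:
    """Write content to output_root/rel_path, creating parent folders as needed."""
    # NOTE: no path validation happens here; rel_path is assumed to be trusted.
    target = os.path.join(output_root, rel_path)
    parent = os.path.dirname(target)
    if parent:
        # exist_ok avoids an error when several files share a directory.
        os.makedirs(parent, exist_ok=True)
    with open(target, "wb") as handle:
        handle.write(content)
    return target
```

Calling it with `src/api/handlers/user.py` creates `src`, `src/api`, and `src/api/handlers` under the output root before the file is written.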
One particularly clever aspect is how GitHack handles concurrent downloads. Modern web servers can handle dozens of simultaneous connections, and since each object is independent of the others, the tool parallelizes requests. On a responsive server, this can turn what would be a 10-minute sequential download into something closer to a 30-second operation.
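Because the objects are independent, the parallel download can be expressed with a standard thread pool. The sketch below is one way to do it, not a description of GitHack's internals: `fetch_all` is a hypothetical name, `fetch_one` stands for any function that takes a hash and returns the object's bytes (or None), and the default of 10 workers is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional


def fetch_all(
    hashes: Iterable[str],
    fetch_one: Callable[[str], Optional[bytes]],
    max_workers: int = 10,
) -> Dict[str, Optional[bytes]]:
    """Fetch every object concurrently and return a hash -> content mapping."""
    hash_list = list(hashes)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with hash_list.
        results = pool.map(fetch_one, hash_list)
        return dict(zip(hash_list, results))
```

Threads suit this workload because each worker spends nearly all of its time waiting on the network rather than computing.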
GitHack itself has not been free of security flaws. In 2022, an arbitrary file write vulnerability was reported in the tool: a malicious web server could send crafted responses that caused GitHack to write files outside the intended output directory. The fix validates that reconstructed paths don't contain directory traversal sequences like ../. The irony of a security tool having its own vulnerability underscores an important principle: never run penetration testing tools with elevated privileges, especially against servers you don't control.
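The general defense against this class of bug is to resolve each path and confirm it still lies inside the output directory before writing. The following is a sketch of that idea, not the actual GitHack patch; the function name `is_within_directory` is hypothetical.

```python
import os


def is_within_directory(output_root: str, rel_path: str) -> bool:
    """Return True only if rel_path resolves to a location inside output_root."""
    root = os.path.realpath(output_root)
    # realpath collapses ../ segments and follows symlinks before comparing.
    target = os.path.realpath(os.path.join(root, rel_path))
    try:
        return os.path.commonpath([root, target]) == root
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        return False
```

Comparing resolved paths also rejects absolute paths such as /etc/passwd, which a simple search for ../ would miss.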
Gotcha
GitHack only works on the current index state; it can't recover deleted files or browse commit history. If a developer committed AWS credentials, then removed them in a subsequent commit, GitHack won't find them unless you can access .git/logs or enumerate commit objects directly. The limitation is a deliberate scope choice: the tool focuses on quick wins rather than comprehensive Git forensics.
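If .git/logs/HEAD is readable, it offers a starting point for the history that GitHack skips, since every reflog line begins with the old and new commit hashes. A minimal sketch of extracting them, assuming a SHA-1 repository (40 hex characters per hash); `commit_hashes_from_reflog` is a hypothetical helper and not part of GitHack.

```python
import re

# Each reflog line starts with "<old-sha> <new-sha> " followed by the author,
# timestamp, and message.
REFLOG_LINE = re.compile(r"^([0-9a-f]{40}) ([0-9a-f]{40}) ")
NULL_SHA = "0" * 40  # placeholder used as the "old" hash of the first entry


def commit_hashes_from_reflog(reflog_text: str) -> list[str]:
    """Return the distinct commit hashes in a reflog, in first-seen order."""
    seen: list[str] = []
    for line in reflog_text.splitlines():
        match = REFLOG_LINE.match(line)
        if not match:
            continue
        for sha in match.groups():
            if sha != NULL_SHA and sha not in seen:
                seen.append(sha)
    return seen
```

Each hash found this way names a commit object that can be requested from .git/objects/ in the same way as a blob, provided the server still stores it as a loose object.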
The tool also fails silently when servers implement rate limiting or IP blocking. After a few hundred rapid requests to .git/objects/, many web application firewalls (WAFs) will detect the pattern and block your IP. GitHack doesn't include built-in retry logic or rate limiting, so you'll need to wrap it with your own request throttling if you're testing a hardened target. Additionally, some servers return 200 OK with error HTML instead of proper 404s; such responses are not valid Git objects, so files can end up missing or corrupted. Always validate the reconstructed files manually rather than assuming completeness.
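Part of that validation can be automated, because a Git object's name is the SHA-1 of its decompressed bytes (header included). The sketch below rejects anything that fails to decompress or whose hash doesn't match the one listed in the index. It assumes loose objects in a SHA-1 repository, and `verify_loose_object` is a hypothetical name rather than a GitHack function.

```python
import hashlib
import zlib
from typing import Optional


def verify_loose_object(raw: bytes, expected_sha1: str) -> Optional[bytes]:
    """Return the object's content if raw is the object named expected_sha1, else None."""
    try:
        decompressed = zlib.decompress(raw)
    except zlib.error:
        # Typical for an HTML error page that was served with 200 OK.
        return None
    # The object ID covers the "<type> <size>\0" header plus the content.
    if hashlib.sha1(decompressed).hexdigest() != expected_sha1:
        return None
    _header, _sep, content = decompressed.partition(b"\0")
    return content
```

A None result marks the file as unrecovered, which is more useful than silently writing bad data to disk.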
Verdict
Use if: You're conducting authorized penetration tests and need to quickly demonstrate the severity of .git folder exposure to clients who think "it's just configuration files." GitHack proves that misconfiguration equals total source code disclosure, which makes for a compelling finding. Also use it when you're auditing your own infrastructure and want to verify that your nginx deny rules actually work before an attacker tests them.
Skip if: You need to analyze Git history or recover deleted commits. In that case, download the entire .git folder (wget can mirror it when directory listing is enabled) and use standard Git commands instead. Also skip if you're dealing with modern cloud platforms like Vercel or Netlify that never expose .git folders by design; the vulnerability simply doesn't exist in those environments. Finally, skip if you don't have explicit written authorization: reconstructing source code without permission crosses from security research into legal liability territory in most jurisdictions.