GitHack: Weaponizing Exposed .git Folders to Reconstruct Source Code

Hook

Over 3,500 developers have starred a tool that can steal your entire source code from a single misconfigured nginx directive. If your production server exposes a .git folder, an attacker needs just minutes to download your intellectual property.

Context

Web developers routinely deploy applications by cloning Git repositories directly onto production servers or using Git-based deployment pipelines. The .git folder contains the complete history of your project, including every file, commit message, and even secrets accidentally committed then removed. When web servers aren't configured to block access to dotfiles, this treasure trove becomes publicly accessible.

The problem is deceptively simple: a developer runs git clone in their web root, forgets to add a deny rule for .git in their nginx or Apache configuration, and suddenly anyone who appends /.git/config to the domain gets a readable file. Traditional security scanners might flag this, but they don't demonstrate the true impact. GitHack bridges that gap by automating the reconstruction of your source code, turning a theoretical vulnerability into downloaded Python files, database schemas, and API keys within minutes.
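If you are auditing your own servers, the /.git/config check described above can be approximated in a few lines. This is a minimal sketch using only the Python standard library. The function names looks_like_git_config and git_config_exposed are made up for illustration, and the [core] heuristic is an assumption about what a typical .git/config contains, not an exhaustive test.

```python
import urllib.request


def looks_like_git_config(body: bytes) -> bool:
    """Heuristic (assumption): a real .git/config normally has a [core] section."""
    return b"[core]" in body


def git_config_exposed(base_url: str, timeout: float = 10.0) -> bool:
    """Return True if <base_url>/.git/config is served and looks genuine."""
    url = base_url.rstrip("/") + "/.git/config"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # Read a bounded amount; a config file is small.
            body = response.read(4096)
    except OSError:
        # HTTP errors (403, 404, ...), connection failures, and timeouts
        # are all OSError subclasses.
        return False
    return looks_like_git_config(body)
```

Checking the body rather than just the status code matters because some servers answer every path with 200 OK and an HTML page.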

Technical Insight

[Figure: System architecture (auto-generated). The pipeline starts at a web server exposing .git/ and runs through these stages: download .git/index over HTTP GET; parse the binary index data with the gin library; build the file path to SHA1 map; for each SHA1, fetch .git/objects/XX/YYYY... over HTTP GET; zlib-decompress the blob; remove the header and write the content to disk, preserving directory structure. The output is the reconstructed source code.]

GitHack's elegance lies in its use of Git's internal storage model. Git stores file contents as content-addressable objects, each named by its SHA1 hash, rather than as ordinary files. The .git/index file acts as a manifest, mapping every tracked file path to its object hash. GitHack exploits this by parsing the index, extracting the hash list, and downloading each object from .git/objects/.
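To make the manifest idea concrete, here is a minimal sketch of reading the path-to-hash map out of a version 2 index file. It is not the parser GitHack uses: parse_index_v2 is a hypothetical helper that handles only index version 2 and ignores extensions and the trailing checksum.

```python
import struct
from typing import Dict

ENTRY_FIXED_LEN = 62  # 40 bytes of stat data + 20-byte SHA-1 + 2-byte flags


def parse_index_v2(data: bytes) -> Dict[str, str]:
    """Map each path in a version 2 Git index to its SHA-1 (hex)."""
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"DIRC":
        raise ValueError("not a Git index file")
    if version != 2:
        raise ValueError(f"unsupported index version: {version}")

    entries: Dict[str, str] = {}
    offset = 12
    for _ in range(count):
        sha1 = data[offset + 40:offset + 60].hex()
        # The path starts after the fixed-size fields and is NUL-terminated.
        name_start = offset + ENTRY_FIXED_LEN
        name_end = data.index(b"\0", name_start)
        path = data[name_start:name_end].decode("utf-8", errors="replace")
        entries[path] = sha1
        # Entries are NUL-padded (1 to 8 bytes) to a multiple of 8 bytes.
        name_len = name_end - name_start
        offset += (ENTRY_FIXED_LEN + name_len + 8) // 8 * 8
    return entries
```

The returned dictionary is the whole work list: each value is an object to fetch, and each key is where its content belongs on disk.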

The tool uses the gin library to parse Git's binary index format, a structure that holds cached metadata for every tracked file. Once the index is parsed, GitHack has a complete map of which files exist and their SHA1 identifiers. A simplified illustration of how a single file is then reconstructed:

import zlib
import requests

def download_object(base_url, sha1_hash):
    """Fetch one Git object and return the blob content, or None on failure."""
    # Git stores loose objects at .git/objects/ab/cdef123...
    # First 2 hex chars of the SHA1 = directory, the rest = filename
    dir_name = sha1_hash[:2]
    file_name = sha1_hash[2:]

    object_url = f"{base_url.rstrip('/')}/.git/objects/{dir_name}/{file_name}"
    response = requests.get(object_url, timeout=10)
    if response.status_code != 200:
        return None

    try:
        # Loose objects are zlib-compressed
        decompressed = zlib.decompress(response.content)
    except zlib.error:
        # e.g. a 200 OK error page that is not a Git object
        return None

    # Format: b"blob <size>\0<content>"
    header, _, file_content = decompressed.partition(b"\0")
    if not header.startswith(b"blob "):
        return None
    return file_content

This approach is bandwidth-efficient because GitHack downloads only the objects referenced in the index, not the entire .git folder. The transfer is roughly the compressed size of the current set of tracked files, whereas the full history of a long-lived repository can be many times larger.

The tool also handles directory structure sensibly. The index doesn't list directories as separate entries, only files with their full paths. GitHack rebuilds the directory tree by reading each file path from the index and creating directories as needed before writing the file contents. A path like src/api/handlers/user.py therefore creates the nested folder structure automatically.
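The directory handling can be sketched as follows. This is an illustration, not GitHack's own code: write_reconstructed is a hypothetical helper, and it assumes rel_path has already been checked, since paths taken from a remote index are untrusted input.

```python
from pathlib import Path


def write_reconstructed(out_dir: str, rel_path: str, content: bytes) -> Path:
    """Write content to out_dir/rel_path, creating parent folders as needed."""
    target = Path(out_dir) / rel_path
    # parents=True builds every missing level; exist_ok=True tolerates reuse.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(content)
    return target
```

Calling write_reconstructed("out", "src/api/handlers/user.py", data) creates out/src/api/handlers/ before writing the file.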

GitHack also downloads objects concurrently. Web servers can usually handle dozens of simultaneous connections, and because each object is independent of the others, the requests parallelize cleanly. On a repository with many files, this can cut a download that would take minutes sequentially to well under a minute.
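One way to parallelize independent fetches is a thread pool. The sketch below is an assumption about the general technique, not GitHack's actual threading code: fetch_all is a hypothetical helper, the fetch callable stands in for whatever function downloads one object, and the default of 10 workers is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional


def fetch_all(
    hashes: Iterable[str],
    fetch: Callable[[str], Optional[bytes]],
    max_workers: int = 10,
) -> Dict[str, Optional[bytes]]:
    """Run fetch(sha1) for every hash on a thread pool and collect results."""
    hash_list = list(hashes)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with hash_list.
        results = list(pool.map(fetch, hash_list))
    return dict(zip(hash_list, results))
```

Threads suit this workload because the time is spent waiting on the network, not on computation. Lowering max_workers is also the simplest way to be gentler on a target.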

GitHack itself has not been free of security flaws. In 2022, an arbitrary file write vulnerability was found in the tool: a malicious web server could send crafted responses that caused GitHack to write files outside the intended output directory. The fix validates that reconstructed paths don't contain directory traversal sequences such as ../. A security tool with its own security hole underscores an important principle: never run untrusted penetration testing tools with elevated privileges.
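A containment check of this kind can be sketched as follows. This is not GitHack's actual patch. It shows one common approach, resolving the final path and confirming it still sits under the output directory, and resolve_inside is a hypothetical name.

```python
from pathlib import Path


def resolve_inside(out_dir: str, rel_path: str) -> Path:
    """Return the absolute target path, or raise ValueError if it escapes out_dir."""
    root = Path(out_dir).resolve()
    # Joining an absolute rel_path discards root, so that case is caught too.
    target = (root / rel_path).resolve()
    # relative_to() raises ValueError when target is not located under root.
    target.relative_to(root)
    return target
```

Resolving before comparing catches inputs such as a/../../evil.py that a simple string check on the leading characters would miss.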

Gotcha

GitHack only works on the current index state: it can't recover deleted files or browse commit history. If a developer committed AWS credentials and removed them in a later commit, GitHack won't find them; you would have to read .git/logs or enumerate commit objects yourself. The tool is built for quick wins, not comprehensive Git forensics.
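Enumerating commits by hand means starting from a known commit hash (for example, one recorded in .git/logs/HEAD), downloading and decompressing that object, and reading its tree and parent lines to find what to fetch next. The sketch below covers only the parsing step. parse_commit is a hypothetical helper and expects an already-decompressed object.

```python
from typing import List, Optional, Tuple


def parse_commit(decompressed: bytes) -> Tuple[Optional[str], List[str]]:
    """Return (tree_sha1, parent_sha1s) from a decompressed commit object."""
    header, _, body = decompressed.partition(b"\0")
    if not header.startswith(b"commit "):
        raise ValueError("not a commit object")

    tree: Optional[str] = None
    parents: List[str] = []
    for line in body.split(b"\n"):
        if not line:
            # A blank line separates the header fields from the message.
            break
        key, _, value = line.partition(b" ")
        if key == b"tree":
            tree = value.decode("ascii")
        elif key == b"parent":
            parents.append(value.decode("ascii"))
    return tree, parents
```

Following each parent hash in turn walks the history backwards, provided the objects are still reachable as individual files on the server.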

The tool also copes poorly with rate limiting and IP blocking. After a few hundred rapid requests to .git/objects/, many WAFs (Web Application Firewalls) will detect the pattern and block your IP. GitHack does not throttle its own requests, so you'll need to add throttling yourself when testing a hardened target. Additionally, some servers return 200 OK with an HTML error page instead of a proper 404, which can leave GitHack with garbage data that fails decompression. Always validate the reconstructed files rather than assuming the result is complete.
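One way to validate a download is to use Git's own naming scheme: an object's name is the SHA1 of its decompressed bytes, header included, so every response can be checked against the hash it was requested by. The sketch below assumes a repository using SHA1 object names, and verify_object is a hypothetical helper.

```python
import hashlib
import zlib


def verify_object(compressed: bytes, expected_sha1: str) -> bool:
    """True if the payload is a zlib stream whose SHA-1 matches expected_sha1."""
    try:
        decompressed = zlib.decompress(compressed)
    except zlib.error:
        # Typical for a 200 OK response that is really an HTML error page.
        return False
    # Git names an object by the SHA-1 of its header plus content.
    return hashlib.sha1(decompressed).hexdigest() == expected_sha1.lower()
```

A response that fails this check is either not a Git object at all or has been altered in transit, and in both cases it should not be written to disk.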

Verdict

Use if: You're conducting authorized penetration tests and need to quickly demonstrate the severity of .git folder exposure to clients who think "it's just configuration files." GitHack shows that the misconfiguration amounts to full source code disclosure, which makes for a compelling finding. Also use it when you're auditing your own infrastructure and want to verify that your nginx deny rules actually work before an attacker tests them.

Skip if: You need to analyze Git history or recover deleted commits; if the server allows directory listing, download the entire .git folder with wget and use standard Git commands instead. Also skip if you're dealing with platforms like Vercel or Netlify, which deploy build output rather than a Git checkout, so a .git folder is unlikely to be exposed there. Finally, skip if you don't have explicit written authorization: reconstructing source code without permission crosses from security research into legal liability in most jurisdictions.
