GitHacker: Exploiting Exposed .git Directories When Directory Listings Are Disabled
Hook
Exposed .git directories remain a common production misconfiguration, leaking complete source code, credentials, and commit history, even when directory listings are disabled.
Context
The .git directory contains everything about a repository: every file version, every commit message, every branch and tag, and often credentials or API keys accidentally committed. When developers deploy web applications without properly excluding the .git folder, attackers can reconstruct the entire codebase. General-purpose mirroring tools like wget could copy exposed .git directories when Apache or nginx had directory listings enabled, but once administrators disabled autoindexing, these tools became ineffective. Simple scrapers could only recover source code at HEAD by parsing index files, missing the full history and all branches. GitHacker emerged to solve this harder problem: how do you reconstruct a complete Git repository when you can only make blind HTTP requests, without seeing what files exist? The tool needed to understand Git's internal object model deeply enough to bootstrap from minimal information (a handful of well-known files such as .git/HEAD) and recursively discover every object SHA-1 by parsing downloaded content for references.
Technical Insight
GitHacker works in one of two modes, depending on whether directory listings are available. When indexing is enabled, it recursively crawls all files under .git/. When it is disabled, which is the more interesting case, it starts by downloading critical entry points like .git/HEAD, .git/index, and .git/config, then uses these to seed a recursive object discovery process.
The core insight is that Git objects contain SHA-1 references to other objects. A commit object references its parent commits and its tree object. A tree object lists blob and subtree SHA-1s. By parsing each downloaded object, GitHacker extracts these references and adds them to a download queue. Here's the fundamental pattern in the ObjectDownloader class:
    import re

    def parse_object_for_references(self, obj_content):
        """Extract SHA-1 references from a decompressed Git object."""
        # Matches hex SHA-1s, as they appear in commit and tag objects.
        sha1_pattern = re.compile(rb'[0-9a-f]{40}')
        references = sha1_pattern.findall(obj_content)
        for ref in references:
            # Loose objects live at objects/<first 2 hex chars>/<remaining 38>.
            obj_path = f"objects/{ref[:2].decode()}/{ref[2:].decode()}"
            if obj_path not in self.downloaded:
                self.queue.put(obj_path)
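A hex regex like the one above only finds references stored as text. As served over HTTP, a loose object is zlib-compressed, and tree entries store child SHA-1s as 20 raw bytes rather than 40 hex characters. The following is a minimal sketch of reference extraction that handles both cases, assuming a SHA-1 repository; the function name `extract_references` is hypothetical and is not taken from GitHacker's source.

```python
import zlib


def extract_references(raw: bytes) -> list[str]:
    """Return hex SHA-1s referenced by one zlib-compressed loose object."""
    data = zlib.decompress(raw)
    # A loose object is "<type> <size>\0<body>".
    header, _, body = data.partition(b"\x00")
    obj_type = header.split(b" ", 1)[0]

    refs = []
    if obj_type in (b"commit", b"tag"):
        # Header lines such as "tree <hex>", "parent <hex>", "object <hex>"
        # end at the first blank line, before the free-form message.
        for line in body.split(b"\n\n", 1)[0].split(b"\n"):
            key, _, value = line.partition(b" ")
            if key in (b"tree", b"parent", b"object"):
                refs.append(value.decode())
    elif obj_type == b"tree":
        # Each entry is "<mode> <name>\0" followed by 20 raw SHA-1 bytes.
        pos = 0
        while pos < len(body):
            nul = body.index(b"\x00", pos)
            refs.append(body[nul + 1:nul + 21].hex())
            pos = nul + 21
    return refs  # blobs reference nothing
```

Restricting commit parsing to the header lines avoids chasing 40-character hex strings that merely appear in a commit message.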
This recursive approach discovers all reachable objects from known references, but what about branches, tags, and stashes that aren't directly referenced from HEAD? GitHacker brute-forces common names. It tries refs/heads/master, refs/heads/main, refs/heads/dev, refs/heads/develop, and dozens of other conventional branch names. For each successful hit, it downloads the commit SHA-1 and adds it to the discovery queue, which then pulls in that branch's entire history.
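The name-guessing step can be sketched as follows. The wordlist here is a deliberately short, hypothetical stand-in for the tool's real list, and `candidate_ref_paths` and `parse_ref` are illustrative names rather than GitHacker functions. A ref file holds either a 40-character hex SHA-1 or a symbolic pointer of the form `ref: <target>`, so anything else (for example an HTML error page returned with status 200) should be discarded.

```python
import re

# Hypothetical, deliberately short wordlist; a real one would be much longer.
COMMON_BRANCHES = ["master", "main", "dev", "develop"]
SHA1_RE = re.compile(r"[0-9a-f]{40}")


def candidate_ref_paths(branches=COMMON_BRANCHES):
    """Paths under .git/ that are worth requesting blindly."""
    paths = [f"refs/heads/{name}" for name in branches]
    paths += [f"refs/remotes/origin/{name}" for name in branches]
    paths.append("refs/stash")  # where `git stash` records its latest entry
    return paths


def parse_ref(text: str):
    """Classify a ref file body as ('sha', hex), ('symref', target) or None."""
    text = text.strip()
    if text.startswith("ref: "):
        return ("symref", text[5:].strip())
    if SHA1_RE.fullmatch(text):
        return ("sha", text)
    return None  # not a ref file, e.g. an HTML error page
```

Each `('sha', ...)` result would seed the object queue; each `('symref', ...)` result names another ref path to request.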
The packed objects problem presents an interesting challenge. Git stores objects in two formats: loose objects (individual files at .git/objects/XX/YYYY...) and packed objects (compressed in .git/objects/pack/*.pack). GitHacker handles loose objects elegantly but can't efficiently download pack files because pack file names are SHA-1 hashes of their contents—unknowable without directory listings. The tool works around this by focusing on loose objects and relying on the fact that recent commits are usually unpacked.
The multi-threading implementation uses Python's ThreadPoolExecutor to parallelize HTTP requests:
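    from concurrent.futures import ThreadPoolExecutor, as_completed

    with ThreadPoolExecutor(max_workers=self.threads) as executor:
        # Parsing a batch can enqueue new paths, so keep going until the
        # queue is still empty after a full batch has been processed.
        while not self.queue.empty():
            futures = []
            while not self.queue.empty():
                path = self.queue.get()
                future = executor.submit(self.download_object, path)
                futures.append(future)
            for future in as_completed(futures):
                content = future.result()
                if content:
                    self.parse_object_for_references(content)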
This architecture achieves impressive throughput: testing against a medium-sized repository, GitHacker can download and reconstruct 1,000+ objects in under a minute with 10 threads. The workload is I/O-bound rather than CPU-bound, so threads suit it well despite Python's global interpreter lock.
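The same worklist pattern can be sketched in a self-contained form. This version keeps looping until parsing stops producing new keys, and swaps the HTTP and parsing steps for caller-supplied functions so it can run against an in-memory graph. `crawl`, `fetch`, and `extract` are hypothetical names, not GitHacker's API.

```python
from concurrent.futures import ThreadPoolExecutor


def crawl(seeds, fetch, extract, threads=10):
    """Download everything reachable from `seeds`.

    fetch(key) returns content or None; extract(content) returns new keys.
    Both are caller-supplied stand-ins for the HTTP and parsing steps.
    """
    seen = set(seeds)
    frontier = list(seeds)
    results = {}
    with ThreadPoolExecutor(max_workers=threads) as executor:
        while frontier:
            # Fetch the whole frontier in parallel, then parse on this thread
            # so `seen` and `results` need no locking.
            contents = list(executor.map(fetch, frontier))
            next_frontier = []
            for key, content in zip(frontier, contents):
                if content is None:
                    continue  # missing object, e.g. an HTTP 404
                results[key] = content
                for ref in extract(content):
                    if ref not in seen:
                        seen.add(ref)
                        next_frontier.append(ref)
            frontier = next_frontier
    return results
```

Tracking `seen` at enqueue time, rather than at download time, means an object referenced by many trees is requested only once.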
Once all objects are downloaded, GitHacker leverages native Git commands to reconstruct the repository state. It runs git fsck to verify object integrity, git checkout to extract files for each discovered branch, and git stash list to recover stashed changes. This hybrid approach—custom Python for downloading, native Git for reconstruction—is elegant because it doesn't reimplement Git's complex merge and checkout logic.
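One of the checks that git fsck performs can also be done before Git is ever invoked: a loose object's name is the SHA-1 of its decompressed bytes, so a downloaded file that fails this test is corrupt or is not a Git object at all. The helper below is a sketch of that single check under the assumption of a SHA-1 repository. `loose_object_is_valid` is a hypothetical name, and this is not a description of how GitHacker validates downloads.

```python
import hashlib
import zlib


def loose_object_is_valid(expected_sha: str, raw: bytes) -> bool:
    """True if `raw` decompresses to an object whose SHA-1 is `expected_sha`."""
    try:
        data = zlib.decompress(raw)
    except zlib.error:
        return False  # not zlib data, e.g. an HTML error page
    return hashlib.sha1(data).hexdigest() == expected_sha
```

Rejecting such files early keeps error pages served with a 200 status out of the reconstructed object store.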
The security vulnerabilities discovered in GitHacker itself are instructive. Early versions were vulnerable to path traversal attacks: a malicious server could serve a .git/HEAD file containing ../../../../etc/passwd, causing GitHacker to write outside the target directory. Later, researchers found that downloading .git/hooks could lead to arbitrary code execution if those hooks were triggered during reconstruction. These vulnerabilities highlight a meta-lesson: tools that process adversarial input must be sandboxed. The current recommendation to run GitHacker in Docker isn't paranoia—it's acknowledging that when you download untrusted Git repositories, you're pulling potentially malicious data that could exploit your tooling.
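The path traversal class of bug is usually closed by resolving every server-influenced path and refusing anything that lands outside the output directory. A minimal sketch of such a guard follows; `safe_join` is a hypothetical helper, not the fix that GitHacker actually shipped.

```python
import os


def safe_join(base_dir: str, untrusted: str) -> str:
    """Resolve `untrusted` under `base_dir`, or raise ValueError if it escapes."""
    base = os.path.realpath(base_dir)
    # os.path.join discards `base` when `untrusted` is absolute, so the
    # containment check below covers absolute paths as well as "../".
    target = os.path.realpath(os.path.join(base, untrusted))
    if os.path.commonpath([base, target]) != base:
        raise ValueError(f"path escapes target directory: {untrusted!r}")
    return target
```

A guard like this addresses only file writes. It does nothing about hooks or configuration that Git itself might act on, which is why isolation in a container is still advised.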
Gotcha
GitHacker's most significant limitation is that it requires containerized execution due to inherent security risks. When you're downloading a .git folder from a potentially hostile server, you must assume it contains malicious content designed to exploit your tooling. The discovered vulnerabilities—path traversal, arbitrary file writes, hook-based RCE—mean you cannot safely run this tool on your host system. If Docker isn't available in your environment, you're stuck with less capable alternatives or manual approaches.
The packed objects limitation is more subtle but impactful. Git periodically consolidates loose objects into pack files: automatic garbage collection kicks in once the number of loose objects passes a threshold (gc.auto, 6700 by default), and a freshly cloned repository typically arrives as a single pack. When directory listings are disabled, GitHacker can't discover pack file names and falls back to reconstructing from loose objects only. This works for commits made since the last repack but may miss deeper history, so the impact depends on how the repository reached the server and when it was last garbage collected rather than on its age alone. Where loose objects exist, you'll recover the current state and recent branches, but older tags or historical commits might be inaccessible. Guessing is not an option: a pack file name embeds a 160-bit SHA-1, which is computationally infeasible to brute-force. If you encounter this scenario, check whether the server accidentally exposes .git/objects/pack/ with directory listings enabled, allowing you to download pack files directly.
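A second avenue worth checking by hand is .git/objects/info/packs. Git's dumb HTTP transport relies on this file, which git update-server-info writes, and it lists pack names one per line in the form `P pack-<sha1>.pack`. It is not guaranteed to exist on any given target. The sketch below, which is not taken from GitHacker's source, turns the file's contents into download paths; `pack_paths` is a hypothetical name.

```python
import re

# Strict pattern: only well-formed pack names are accepted, so a hostile
# server cannot smuggle "../" sequences in through this file.
PACK_LINE = re.compile(r"^P (pack-[0-9a-f]{40}\.pack)$", re.MULTILINE)


def pack_paths(info_packs_text: str) -> list[str]:
    """Return .pack and .idx paths (relative to .git/) listed in info/packs."""
    paths = []
    for name in PACK_LINE.findall(info_packs_text):
        paths.append(f"objects/pack/{name}")
        paths.append(f"objects/pack/{name[:-5]}.idx")  # swap ".pack" for ".idx"
    return paths
```

If the file is present, the listed packs can be fetched with ordinary requests and unpacked with native Git tooling.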
Verdict
Use if: You're conducting legitimate penetration tests or security assessments where you've found exposed .git directories and need complete repository reconstruction, including branches, tags, and stashes. GitHacker is among the more thorough tools available and works even when directory listings are disabled, which is the common configuration. The Docker overhead is worth it for comprehensive results. Also use it if you're doing security research and want to understand how much information leaks from improperly secured .git folders.
Skip if: You cannot run Docker or containerized environments in your workflow, because the security risks are too high to run natively. Skip it if you only need the current source code at HEAD (a simpler index-parsing scraper suffices). Skip it if the target repository is massive and heavily packed, since you'll hit the packed objects limitation and waste time. Also skip if you're not conducting authorized security testing; this is an offensive tool for legitimate assessments only, not for unauthorized access.