AuthTables: A Minimalist Approach to Detecting Credential Theft with Bloom Filters

Hook

A large share of account takeover attacks succeed because attackers simply reuse stolen passwords from a different location. What if you could detect this in under a millisecond without calling any third-party service?

Context

Account takeover (ATO) is the invisible epidemic of modern authentication. While security teams obsess over sophisticated attack vectors—browser exploits, zero-days, advanced persistent threats—the vast majority of compromised accounts fall to the simplest attack: an attacker types in a stolen password from a different city. Password breaches from third-party sites leak billions of credentials annually, and users stubbornly reuse passwords across services. The result? Attackers log in as your users without breaking a sweat.

Traditional solutions throw complexity at the problem. Machine learning models analyze behavioral patterns. Third-party risk feeds score IP addresses. Device fingerprinting services track browser attributes across a dozen dimensions. These approaches work, but they come with baggage: API latency, ongoing costs, model training pipelines, and the inherent complexity of integrating external services into your authentication flow. AuthTables takes the opposite approach: what if you could detect the most common ATO vector—remote credential reuse—using only your own authentication data, an in-memory bloom filter, and a simple graph of user-location relationships?

Technical Insight

AuthTables implements a Trust-On-First-Use (TOFU) model backed by what its author calls a "user-location graph." Every time a user authenticates, the service receives three pieces of information: a user identifier, an IP address, and a machine identifier (typically a persistent cookie). It stores combinations of these attributes as edges in a graph where nodes represent users, IP addresses, and devices. When a new authentication attempt arrives, AuthTables checks whether the IP or device has been seen before for that user. If either is familiar, it returns 'OK'. If both are completely new, it returns 'BAD'.

The elegance lies in the data structure. AuthTables uses a bloom filter, a probabilistic data structure that answers "have I seen this before?" with certainty when the answer is 'no' and with a small false positive rate when the answer is 'yes'. The bloom filter sits entirely in memory, enabling sub-millisecond lookups without touching a database. Here's how authentication events flow through the system:

// Simplified example based on the AuthTables architecture.
// filter and store are hypothetical stand-ins for a bloom filter library
// and a Redis client; they are not AuthTables' actual types.
package authtables

import "fmt"

type AuthEvent struct {
    UserID string `json:"user_id"`
    IP     string `json:"ip"`
    MID    string `json:"mid"`
}

type filter interface {
    Test(key []byte) bool
    Add(key []byte)
}

type store interface {
    SAdd(key string, members ...string)
}

type Service struct {
    bloom filter
    redis store
}

// lookupKeys builds the user-IP and user-device keys. The "ip" and "mid"
// prefixes keep the two kinds of key apart, so a client-supplied machine ID
// that equals a known IP string cannot match a recorded user-IP entry.
func lookupKeys(event AuthEvent) (userIPKey, userMIDKey string) {
    userIPKey = fmt.Sprintf("ip:%s:%s", event.UserID, event.IP)
    userMIDKey = fmt.Sprintf("mid:%s:%s", event.UserID, event.MID)
    return userIPKey, userMIDKey
}

func (s *Service) CheckAuth(event AuthEvent) string {
    userIPKey, userMIDKey := lookupKeys(event)

    // Check if we've seen this user from this IP before
    ipKnown := s.bloom.Test([]byte(userIPKey))

    // Check if we've seen this user with this device before
    midKnown := s.bloom.Test([]byte(userMIDKey))

    // If either is known, this authentication looks legitimate
    if ipKnown || midKnown {
        return "OK"
    }

    // Both IP and device are completely new: suspicious
    return "BAD"
}

func (s *Service) RecordAuth(event AuthEvent) {
    // After the parent app validates this auth (maybe via MFA),
    // add it to the graph for future reference
    userIPKey, userMIDKey := lookupKeys(event)

    s.bloom.Add([]byte(userIPKey))
    s.bloom.Add([]byte(userMIDKey))

    // Also persist to Redis for durability and fraud investigation
    s.redis.SAdd("auth:"+event.UserID, userIPKey, userMIDKey)
}

The bloom filter provides the speed; Redis provides the durability. When the service starts, it rebuilds the bloom filter from Redis, ensuring that authentication history survives restarts. Redis also serves as a queryable store for fraud investigations—when you detect a compromised account, you can examine the full authentication history to understand the attack timeline.
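The startup rebuild described above can be sketched as a replay of every persisted key into a fresh filter. This is a minimal illustration under stated assumptions, not AuthTables' actual code: `tinyBloom` is a deliberately small hand-rolled filter, the `persisted` map stands in for the per-user sets that would be read back from Redis, and the key strings are placeholders.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// tinyBloom is a small illustrative bloom filter (hypothetical, not a real library).
type tinyBloom struct {
	bits []bool
	k    uint64 // number of bit positions probed per key
}

func newTinyBloom(m int, k uint64) *tinyBloom {
	return &tinyBloom{bits: make([]bool, m), k: k}
}

// positions derives k bit positions from two FNV hashes (double hashing).
func (b *tinyBloom) positions(key string) []uint64 {
	h1 := fnv.New64a()
	h1.Write([]byte(key))
	h2 := fnv.New64()
	h2.Write([]byte(key))
	base, step := h1.Sum64(), h2.Sum64()|1 // odd step so successive probes differ
	m := uint64(len(b.bits))
	out := make([]uint64, b.k)
	for i := uint64(0); i < b.k; i++ {
		out[i] = (base + i*step) % m
	}
	return out
}

// Add sets every bit position derived from key.
func (b *tinyBloom) Add(key string) {
	for _, p := range b.positions(key) {
		b.bits[p] = true
	}
}

// Test reports false only if key was definitely never added.
func (b *tinyBloom) Test(key string) bool {
	for _, p := range b.positions(key) {
		if !b.bits[p] {
			return false
		}
	}
	return true
}

// rebuild replays every persisted key into a fresh filter of m bits and k hashes.
func rebuild(persisted map[string][]string, m int, k uint64) *tinyBloom {
	b := newTinyBloom(m, k)
	for _, keys := range persisted {
		for _, key := range keys {
			b.Add(key)
		}
	}
	return b
}

func main() {
	// Stand-in for the per-user sets read back from the durable store.
	persisted := map[string][]string{
		"alice": {"alice:203.0.113.7", "alice:laptop-cookie"},
		"bob":   {"bob:198.51.100.4"},
	}
	b := rebuild(persisted, 1024, 3)
	fmt.Println(b.Test("alice:203.0.113.7")) // true: a replayed key always tests positive
}
```

Because the filter never produces false negatives, every combination that was persisted before the restart is recognized again once the replay finishes.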

What makes this architecture particularly clever is the self-reinforcing feedback loop. Every legitimate authentication strengthens the graph. A user who logs in from home adds their home IP to the graph. Next time they authenticate from home, it's instant recognition. When they log in from their office for the first time, the IP is new but the device is recognized—still 'OK'. Over time, the graph captures the authentic geography and device patterns of your user base. Attackers in different countries with stolen passwords trigger immediate 'BAD' signals because both attributes are foreign.
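The home, office, and attacker sequence above can be traced with a few lines of code. This sketch assumes an exact in-memory set in place of the bloom filter so the results are deterministic; the `graph` type, the key layout, and the sample addresses are all hypothetical.

```go
package main

import "fmt"

// graph is an exact set of user-IP and user-device edges (illustrative only).
type graph map[string]bool

// check returns "OK" if either the IP or the device is already known for the user.
func (g graph) check(user, ip, mid string) string {
	if g["ip|"+user+"|"+ip] || g["mid|"+user+"|"+mid] {
		return "OK"
	}
	return "BAD"
}

// record stores both edges after the parent application has validated the login.
func (g graph) record(user, ip, mid string) {
	g["ip|"+user+"|"+ip] = true
	g["mid|"+user+"|"+mid] = true
}

func main() {
	g := graph{}

	// First validated login from home seeds the graph.
	g.record("alice", "203.0.113.7", "laptop")
	fmt.Println(g.check("alice", "203.0.113.7", "laptop")) // OK: known IP and device

	// First login from the office: new IP, same laptop.
	fmt.Println(g.check("alice", "198.51.100.20", "laptop")) // OK: known device
	g.record("alice", "198.51.100.20", "laptop")

	// Stolen password used from elsewhere on an unknown machine.
	fmt.Println(g.check("alice", "192.0.2.99", "attacker-box")) // BAD: both are new
}
```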

The microservice design is intentionally minimal. AuthTables exposes two HTTP endpoints: one for checking authentications (GET with query parameters or POST with JSON) and one for recording validated authentications. It doesn't implement policy decisions like "should I trigger MFA?" or "should I block this login?" Those decisions belong in your application logic. AuthTables simply provides a fast, reliable signal: this authentication is consistent with the user's history, or it isn't. This separation of concerns makes it easy to integrate into existing authentication flows without coupling your auth system to a specific risk-scoring strategy.
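A two-endpoint service of this shape might look like the following sketch. The route names `/check` and `/add`, the response bodies, and the `service` type are assumptions made for illustration rather than AuthTables' documented API, and an exact set again stands in for the bloom filter. The handlers are exercised in-process with `net/http/httptest` so the example runs without opening a port.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// authEvent mirrors the three fields described in the text.
type authEvent struct {
	UserID string `json:"user_id"`
	IP     string `json:"ip"`
	MID    string `json:"mid"`
}

// service uses an exact in-memory set where AuthTables would use a bloom filter.
type service struct {
	mu   sync.Mutex
	seen map[string]bool
}

func (s *service) check(e authEvent) string {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.seen["ip|"+e.UserID+"|"+e.IP] || s.seen["mid|"+e.UserID+"|"+e.MID] {
		return "OK"
	}
	return "BAD"
}

func (s *service) record(e authEvent) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.seen["ip|"+e.UserID+"|"+e.IP] = true
	s.seen["mid|"+e.UserID+"|"+e.MID] = true
}

// decode reads an event from query parameters (GET) or a JSON body (other methods).
func decode(r *http.Request) (authEvent, error) {
	var e authEvent
	if r.Method == http.MethodGet {
		q := r.URL.Query()
		e = authEvent{UserID: q.Get("user_id"), IP: q.Get("ip"), MID: q.Get("mid")}
	} else if err := json.NewDecoder(r.Body).Decode(&e); err != nil {
		return e, err
	}
	if e.UserID == "" {
		return e, fmt.Errorf("user_id is required")
	}
	return e, nil
}

// newMux wires the two hypothetical routes.
func newMux(s *service) *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/check", func(w http.ResponseWriter, r *http.Request) {
		e, err := decode(r)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		fmt.Fprint(w, s.check(e)) // only the signal; policy stays with the caller
	})
	mux.HandleFunc("/add", func(w http.ResponseWriter, r *http.Request) {
		e, err := decode(r)
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		s.record(e)
		w.WriteHeader(http.StatusNoContent)
	})
	return mux
}

// call sends one in-process request and returns the response body.
func call(h http.Handler, method, target, body string) string {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(method, target, strings.NewReader(body)))
	return rec.Body.String()
}

func main() {
	mux := newMux(&service{seen: map[string]bool{}})
	event := `{"user_id":"alice","ip":"203.0.113.7","mid":"laptop"}`

	fmt.Println(call(mux, "POST", "/check", event)) // BAD: nothing recorded yet
	call(mux, "POST", "/add", event)                // parent app validated the login
	fmt.Println(call(mux, "GET", "/check?user_id=alice&ip=192.0.2.1&mid=laptop", "")) // OK: known device
}
```

The handlers return only the signal. Whether a 'BAD' result triggers MFA, an alert, or a block is left to the calling application, which matches the separation of concerns described above.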

The performance characteristics are remarkable. Bloom filter lookups take a fixed number of hash computations and no I/O. The author reports sub-millisecond response times even under load. The false positive rate, where the bloom filter incorrectly claims it has seen a combination before, can be tuned by adjusting the filter size and the number of hash functions, but the default configuration targets error rates low enough to be negligible in practice. False negatives are impossible with bloom filters: if the filter says it has not seen a combination, it definitely has not, so a recorded user-IP or user-device pair is never flagged as new. A false positive is the riskier direction from a security perspective, because an unfamiliar combination is waved through as 'OK', which is why the filter should be sized to keep that rate small.
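To make the tuning concrete, the standard bloom filter sizing formulas can be evaluated directly. For n stored keys and a target false positive rate p, the usual choices are m = -n·ln(p) / (ln 2)² bits and k = (m/n)·ln 2 hash functions, and the resulting rate is approximately (1 - e^(-kn/m))^k. The sketch below is a generic calculation, not AuthTables' default configuration. The figures of one million keys and a 0.1% target are assumptions chosen for illustration, and each recorded authentication contributes up to two keys.

```go
package main

import (
	"fmt"
	"math"
)

// optimalSize returns the bit count m and hash count k for n expected keys
// and a target false positive rate p, using the standard formulas.
func optimalSize(n uint, p float64) (m, k uint) {
	mf := math.Ceil(-float64(n) * math.Log(p) / (math.Ln2 * math.Ln2))
	kf := math.Round(mf / float64(n) * math.Ln2)
	if kf < 1 {
		kf = 1
	}
	return uint(mf), uint(kf)
}

// estimatedFPR approximates the false positive rate as (1 - e^(-kn/m))^k.
func estimatedFPR(m, k, n uint) float64 {
	return math.Pow(1-math.Exp(-float64(k)*float64(n)/float64(m)), float64(k))
}

func main() {
	const n = 1_000_000 // assumed number of stored keys
	m, k := optimalSize(n, 0.001)
	fmt.Printf("bits=%d (~%.1f MiB), hashes=%d, estimated FPR=%.5f\n",
		m, float64(m)/8/1024/1024, k, estimatedFPR(m, k, n))
}
```

Under these assumptions the filter needs about 14.4 bits per key, roughly 1.7 MiB in total, and 10 hash functions, which is why the whole structure fits comfortably in memory.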

Gotcha

The biggest limitation is right in the design: AuthTables only detects attacks where both the IP address and device identifier are new. This makes it highly effective against remote credential reuse—an attacker in a different country logging in with a stolen password—but blind to local attacks. If an attacker installs malware on the user's device, they inherit the trusted device identifier. If they compromise the user's network (coffee shop MITM, home router exploit), they inherit the trusted IP. The attack becomes invisible to AuthTables. Similarly, sophisticated attackers who steal session cookies rather than passwords bypass the authentication flow entirely.

The false alarm problem hits mobile users and privacy-conscious users hard. Imagine a user who connects through a VPN that assigns random exit nodes, and who regularly clears cookies or uses incognito mode. Every authentication looks like both a new IP and a new device, so every authentication returns 'BAD'. These users will face frequent MFA challenges, degrading their experience. The TOFU model helps: after you challenge them once and they pass, that IP and that device are recorded for the user. But if they're genuinely on a different IP and device every time, AuthTables becomes a source of friction rather than security. This isn't a bug; it's a fundamental tradeoff of the user-location graph approach. You're betting that most users authenticate from a small number of stable locations and devices, and that attackers authenticate from different locations and devices. When that assumption holds, AuthTables works beautifully. When it doesn't, you're challenging legitimate users.

Finally, the project is explicitly unmaintained. The author states clearly in the README that they're no longer supporting it. The codebase is more reference implementation than production-ready software. There are no recent commits, no active issue triage, no security patches. If you discover a bug or need to adapt it to your infrastructure, you're on your own. This isn't necessarily disqualifying—the code is straightforward enough to understand and modify—but it does mean you're committing to ownership if you deploy it.

Verdict

Use if: You're building authentication systems for applications with relatively stable user bases (employees logging into corporate tools, customers accessing banking apps from home and work) and you need a lightweight way to detect remote credential reuse without adding latency or external dependencies. AuthTables shines when most users authenticate from predictable locations and you have the engineering capacity to build the surrounding infrastructure: MFA challenge flows, alert systems, and policy logic to act on the 'BAD' signals. It's particularly valuable if you want to learn how user-location graphs and bloom filters work in practice; the codebase is clean enough to serve as an educational resource.

Skip if: Your users are highly mobile, frequently use VPNs, or value anonymity (privacy-focused apps, VPN services, cryptocurrency platforms). Also skip if you need protection against sophisticated local attacks or if you're looking for a maintained, production-ready solution with ongoing support. The unmaintained status makes it risky for critical security infrastructure unless you're prepared to fork and maintain it yourself. If you need comprehensive ATO protection beyond remote credential reuse, you're better off with commercial solutions like Auth0 Anomaly Detection or Castle.io that handle device fingerprinting, behavioral analysis, and threat intelligence feeds.

Treat AuthTables as inspiration and reference architecture rather than a drop-in security solution.
