HASSH: Fingerprinting SSH Implementations Before the Encryption Kicks In

Hook

Every SSH connection reveals its true identity before a single byte gets encrypted. HASSH exploits this brief window of transparency to catch attackers, rogue IoT devices, and data exfiltration tools that think they're being stealthy.

Context

SSH has been the backbone of remote administration for decades, but its ubiquity makes it a double-edged sword. Security teams face a fundamental problem: how do you identify what's actually connecting to your systems when IP addresses rotate, NAT obscures origins, and attackers can trivially spoof version strings? Traditional network monitoring sees 'SSH-2.0-OpenSSH_7.4' and shrugs—but is that a legitimate admin or Metasploit's SSH module masquerading as OpenSSH?

Developed by Ben Reardon at Salesforce in 2018, HASSH addresses this blind spot by creating fingerprints from the algorithm negotiation phase of SSH connections. During the initial handshake, before any encryption occurs, SSH clients and servers exchange SSH_MSG_KEXINIT packets listing their supported cryptographic algorithms in priority order. These algorithm preferences are remarkably stable across implementations and versions—OpenSSH makes different choices than Paramiko, which differs from PuTTY, which differs from the SSH library embedded in your IoT camera. HASSH captures these preferences, hashes them into a compact MD5 fingerprint, and gives security teams a way to track SSH implementations regardless of what they claim to be.

Technical Insight

HASSH works by reading the SSH_MSG_KEXINIT packet during the SSH handshake, which is sent in clear text before key exchange completes. Per RFC 4253 this packet carries ten name-lists; the client-side HASSH uses four of them: key exchange methods, plus the client-to-server encryption, MAC (Message Authentication Code), and compression algorithms. Each list is kept as a comma-separated string in the order it was sent, the four strings are joined with semicolons, and the HASSH fingerprint is the MD5 hash of the result.

Here's what the algorithm extraction looks like in practice. The Python implementation parses pcap files and extracts the relevant fields:

# From the HASSH codebase - simplified for clarity
import hashlib

def hassh_from_kexinit(kex_algorithms, encryption_algorithms_client_to_server,
                       mac_algorithms_client_to_server, compression_algorithms_client_to_server):
    # Join each list with commas, then join the four lists with semicolons
    hassh_string = ';'.join([
        ','.join(kex_algorithms),
        ','.join(encryption_algorithms_client_to_server),
        ','.join(mac_algorithms_client_to_server),
        ','.join(compression_algorithms_client_to_server)
    ])
    # Return the MD5 fingerprint together with the raw string it was computed from
    return hashlib.md5(hassh_string.encode()).hexdigest(), hassh_string

# Example algorithm lists of the kind an SSH client advertises
kex = ['curve25519-sha256', 'ecdh-sha2-nistp256', 'diffie-hellman-group14-sha1']
enc = ['aes128-ctr', 'aes192-ctr', 'aes256-ctr', 'aes128-gcm@openssh.com']
mac = ['hmac-sha2-256', 'hmac-sha2-512', 'hmac-sha1']
comp = ['none', 'zlib@openssh.com']

fingerprint, raw = hassh_from_kexinit(kex, enc, mac, comp)
print(f"HASSH: {fingerprint}")
print(f"Raw: {raw}")
# Prints the 32-character hex MD5 digest of the raw string, then the raw string itself:
# Raw: curve25519-sha256,ecdh-sha2-nistp256,diffie-hellman-group14-sha1;aes128-ctr,aes192-ctr,aes256-ctr,aes128-gcm@openssh.com;hmac-sha2-256,hmac-sha2-512,hmac-sha1;none,zlib@openssh.com
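
The function above takes algorithm lists that have already been pulled out of the packet. As a rough sketch of that earlier step, the snippet below splits a raw SSH_MSG_KEXINIT payload into its ten name-lists using the layout from RFC 4253 (a message-type byte of 20, a 16-byte cookie, then length-prefixed lists). It is not taken from the HASSH codebase: `parse_kexinit_payload` and `encode_name_list` are hypothetical names, and the payload is assumed to have its binary-packet framing already stripped.

```python
import struct

SSH_MSG_KEXINIT = 20  # message number assigned in RFC 4253

# The ten name-lists, in the order RFC 4253 section 7.1 defines them.
NAME_LIST_FIELDS = (
    "kex_algorithms",
    "server_host_key_algorithms",
    "encryption_algorithms_client_to_server",
    "encryption_algorithms_server_to_client",
    "mac_algorithms_client_to_server",
    "mac_algorithms_server_to_client",
    "compression_algorithms_client_to_server",
    "compression_algorithms_server_to_client",
    "languages_client_to_server",
    "languages_server_to_client",
)


def parse_kexinit_payload(payload: bytes) -> dict:
    """Split an SSH_MSG_KEXINIT payload into its ten name-lists.

    Assumes `payload` starts at the message-type byte, i.e. the packet
    length and padding length fields were already removed.
    """
    if len(payload) < 17 or payload[0] != SSH_MSG_KEXINIT:
        raise ValueError("not an SSH_MSG_KEXINIT payload")
    offset = 17  # skip 1 message-type byte and the 16-byte random cookie
    fields = {}
    for field in NAME_LIST_FIELDS:
        if offset + 4 > len(payload):
            raise ValueError(f"payload truncated before {field}")
        (length,) = struct.unpack_from(">I", payload, offset)  # big-endian uint32
        offset += 4
        if offset + length > len(payload):
            raise ValueError(f"payload truncated inside {field}")
        text = payload[offset:offset + length].decode("ascii", errors="replace")
        offset += length
        fields[field] = text.split(",") if text else []
    return fields


def encode_name_list(names) -> bytes:
    """Hypothetical helper, used here only to build a synthetic payload."""
    data = ",".join(names).encode("ascii")
    return struct.pack(">I", len(data)) + data


# Synthetic payload: type byte, zeroed cookie, ten name-lists, then the
# first_kex_packet_follows flag and the reserved uint32.
sample_lists = [["curve25519-sha256"], ["ssh-ed25519"], ["aes128-ctr"], ["aes128-ctr"],
                ["hmac-sha2-256"], ["hmac-sha2-256"], ["none"], ["none"], [], []]
payload = (bytes([SSH_MSG_KEXINIT]) + bytes(16)
           + b"".join(encode_name_list(names) for names in sample_lists)
           + b"\x00" + struct.pack(">I", 0))
parsed = parse_kexinit_payload(payload)
print(parsed["kex_algorithms"])
```

In a real capture the payload would come from a packet parser rather than being built by hand; the synthetic payload only keeps the sketch self-contained.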

The strength of this approach is that it fingerprints the implementation, not the connection. OpenSSH 7.4 on Ubuntu, in its default configuration, will produce the same HASSH regardless of who's running it, what IP they're coming from, or what hostname they claim. Only a change to the advertised algorithm lists (a different version, or an edited cipher or key exchange configuration) alters it. This creates a searchable, shareable identifier that works across your entire infrastructure.

HASSH also generates a complementary hasshServer fingerprint, computed the same way from the server's own SSH_MSG_KEXINIT but using the server-to-client encryption, MAC, and compression lists. This enables identification in both directions, which matters for detecting compromised servers or identifying what SSH daemon is actually running on a host, and is critical when hunting for backdoors or unauthorized services.
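
To show how the two fingerprints relate, here is a minimal sketch that computes a client hassh and a hasshServer from dictionaries of parsed name-lists. The field names follow RFC 4253; the function names and the sample algorithm lists are illustrative assumptions, not code or data from the repository.

```python
import hashlib


def _md5_of_lists(*name_lists) -> str:
    # Comma-join each list, semicolon-join the lists, then hash the result.
    joined = ";".join(",".join(names) for names in name_lists)
    return hashlib.md5(joined.encode("ascii")).hexdigest()


def hassh(client_kexinit: dict) -> str:
    """Client fingerprint: kex plus the client-to-server lists."""
    return _md5_of_lists(
        client_kexinit["kex_algorithms"],
        client_kexinit["encryption_algorithms_client_to_server"],
        client_kexinit["mac_algorithms_client_to_server"],
        client_kexinit["compression_algorithms_client_to_server"],
    )


def hassh_server(server_kexinit: dict) -> str:
    """Server fingerprint: kex plus the server-to-client lists."""
    return _md5_of_lists(
        server_kexinit["kex_algorithms"],
        server_kexinit["encryption_algorithms_server_to_client"],
        server_kexinit["mac_algorithms_server_to_client"],
        server_kexinit["compression_algorithms_server_to_client"],
    )


# Illustrative name-lists; a real KEXINIT usually advertises many more entries.
client = {
    "kex_algorithms": ["curve25519-sha256", "diffie-hellman-group14-sha1"],
    "encryption_algorithms_client_to_server": ["aes128-ctr", "aes256-ctr"],
    "mac_algorithms_client_to_server": ["hmac-sha2-256"],
    "compression_algorithms_client_to_server": ["none"],
}
server = {
    "kex_algorithms": ["curve25519-sha256"],
    "encryption_algorithms_server_to_client": ["aes256-ctr"],
    "mac_algorithms_server_to_client": ["hmac-sha2-512"],
    "compression_algorithms_server_to_client": ["none", "zlib@openssh.com"],
}

print("hassh:      ", hassh(client))
print("hasshServer:", hassh_server(server))
```

Each fingerprint is computed from one side's KEXINIT only, so a single connection yields one hassh for the client and one hasshServer for the server.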

The repository includes utilities to process packet captures from tcpdump or Zeek/Bro:

# Capture SSH traffic
tcpdump -i eth0 'tcp port 22' -w ssh_traffic.pcap

# Extract HASSH fingerprints (-r reads from a pcap file)
python3 hassh.py -r ssh_traffic.pcap

# Output shows client and server fingerprints for each connection
# timestamp, src_ip, dst_ip, hassh, hassh_algorithms, hasshServer, hasshServer_algorithms

The real power emerges when you correlate these fingerprints against known-bad signatures. Metasploit's SSH module has a distinctive HASSH (6b4b60ccff7726a4c68dbabfce1079cb), as does Paramiko's default configuration and various IoT device implementations. Security teams can maintain watchlists and alert when known-malicious or unauthorized fingerprints appear.
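
A watchlist check of this kind can be sketched in a few lines. The record layout (`src_ip`, `hassh`) and the function name are assumptions for illustration, the addresses come from a documentation range, and the single watchlist entry simply reuses the Metasploit hash quoted above rather than an independently verified value.

```python
# Hypothetical watchlist: fingerprint -> label shown in the alert.
WATCHLIST = {
    "6b4b60ccff7726a4c68dbabfce1079cb": "Metasploit SSH module (hash as quoted in this article)",
}


def match_watchlist(records, watchlist=WATCHLIST):
    """Return the records whose hassh appears in the watchlist, with a label added."""
    alerts = []
    for record in records:
        label = watchlist.get(record["hassh"].lower())
        if label is not None:
            alerts.append({**record, "label": label})
    return alerts


# Illustrative connection records using documentation addresses.
records = [
    {"src_ip": "192.0.2.10", "hassh": "6b4b60ccff7726a4c68dbabfce1079cb"},
    {"src_ip": "192.0.2.11", "hassh": "0" * 32},
]

alerts = match_watchlist(records)
for alert in alerts:
    print(f"ALERT {alert['src_ip']}: {alert['label']}")
```

In practice the watchlist would be loaded from a maintained feed or file rather than hard-coded.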

One particularly clever application is detecting data exfiltration via algorithm list manipulation. Since the SSH_MSG_KEXINIT packet is clear-text and can contain arbitrary algorithm names, attackers can embed data in fake algorithm lists and exfiltrate it before the connection even fully establishes. HASSH's algorithm visibility makes these covert channels detectable through anomalous fingerprints or unusually long algorithm lists.
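
One way to look for such covert channels is to sanity-check the name-lists themselves. The sketch below flags lists that are unusually long, contain names over the 64-character limit that RFC 4251 sets for algorithm identifiers, or contain names missing from a known set. The known set and the entry-count threshold are hypothetical placeholders that would need tuning against real traffic.

```python
MAX_NAME_LENGTH = 64      # RFC 4251 limit on algorithm and method identifiers
MAX_NAMES_PER_LIST = 40   # hypothetical threshold, tune against your own traffic

# Hypothetical, deliberately incomplete set of expected algorithm names.
KNOWN_NAMES = frozenset({
    "curve25519-sha256", "ecdh-sha2-nistp256", "diffie-hellman-group14-sha1",
    "aes128-ctr", "aes192-ctr", "aes256-ctr", "aes128-gcm@openssh.com",
    "hmac-sha2-256", "hmac-sha2-512", "hmac-sha1",
    "none", "zlib@openssh.com",
})


def name_list_anomalies(names, known_names=KNOWN_NAMES, max_names=MAX_NAMES_PER_LIST):
    """Return a list of human-readable reasons why a name-list looks suspicious."""
    reasons = []
    if len(names) > max_names:
        reasons.append(f"{len(names)} entries exceeds threshold of {max_names}")
    too_long = [name for name in names if len(name) > MAX_NAME_LENGTH]
    if too_long:
        reasons.append(f"{len(too_long)} name(s) longer than {MAX_NAME_LENGTH} characters")
    unknown = [name for name in names if name not in known_names]
    if unknown:
        reasons.append(f"{len(unknown)} unrecognized name(s): {unknown[:3]}")
    return reasons


normal = ["curve25519-sha256", "diffie-hellman-group14-sha1"]
# A made-up entry standing in for base64 data smuggled as an algorithm name.
covert = ["curve25519-sha256", "c2VjcmV0LWRhdGEtY2h1bmstMDAx@example.com"]

print(name_list_anomalies(normal))
print(name_list_anomalies(covert))
```

An unrecognized name is not proof of exfiltration, since private `name@domain` algorithms are legitimate, so results like these are better treated as leads for review than as verdicts.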

Gotcha

The first major gotcha: this repository is effectively archived. Salesforce stopped maintaining it, and active development has moved to Corelight's fork. The code here works, but you won't get bug fixes, new features, or support. If you're deploying this in production, you should be looking at Corelight's version, not this one. The Salesforce repository serves primarily as a reference implementation and the original research.

The second limitation is environmental noise. HASSH works beautifully when you have a controlled environment with known-good SSH clients, but becomes challenging in heterogeneous environments. If your organization uses twenty different SSH clients across various platforms, you'll spend significant time building and maintaining a baseline of legitimate fingerprints before you can effectively detect anomalies.

The technique also struggles with SSH implementations that randomize their algorithm preferences or frequently update their algorithm support, though most implementations are quite stable in this regard.

Finally, MD5 collisions are theoretically possible, though the algorithm space is large enough that accidental collisions are rare. Forging a match for a specific existing fingerprint would require a second-preimage attack, which remains impractical for MD5, and most attackers won't bother when easier evasion exists: a client that simply advertises the same algorithm lists as a legitimate implementation produces the same HASSH.
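
The baselining effort can start very simply. As a minimal sketch, assuming you already have a list of hassh values observed during a trusted period, the code below keeps fingerprints seen at least a few times and then reports values outside that baseline. The function names, the `min_count` threshold, and the placeholder hash strings are all illustrative.

```python
from collections import Counter


def build_baseline(observed_hashes, min_count=3):
    """Keep fingerprints seen at least `min_count` times during the learning window."""
    counts = Counter(observed_hashes)
    return {value for value, count in counts.items() if count >= min_count}


def unseen_fingerprints(new_hashes, baseline):
    """Return fingerprints not in the baseline, de-duplicated, in first-seen order."""
    reported = set()
    result = []
    for value in new_hashes:
        if value not in baseline and value not in reported:
            reported.add(value)
            result.append(value)
    return result


# Placeholder values standing in for real 32-character hassh digests.
learning_window = ["a" * 32] * 5 + ["b" * 32] * 3 + ["c" * 32]
baseline = build_baseline(learning_window)

today = ["a" * 32, "c" * 32, "d" * 32, "d" * 32]
print(unseen_fingerprints(today, baseline))
```

The `min_count` cut-off trades noise against coverage: a higher value keeps one-off clients out of the baseline but means more legitimate fingerprints need manual review.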

Verdict

Use if: You're on a security or network defense team that needs to identify, track, or control SSH clients and servers across your infrastructure, especially if you're hunting for lateral movement, malicious toolkits like Metasploit or Cobalt Strike, or unauthorized IoT devices, or if you're establishing a zero-trust SSH posture. HASSH excels when you can maintain a whitelist of approved SSH implementations and alert on deviations. It's particularly valuable in incident response and forensics where you need to reconstruct attacker behavior from packet captures.

Skip if: You're working in highly diverse or dynamic environments where maintaining fingerprint baselines would be prohibitive, you need actively maintained open-source tooling (use Corelight's fork instead), or you're looking for general SSH monitoring rather than implementation-specific fingerprinting. This is a specialized security tool, not a general-purpose SSH analyzer, so skip it if your use case doesn't involve threat detection or security operations.
