Inside jeanphorn/wordlist: A Curated Arsenal for Credential Testing

Hook

The most common IoT camera password isn't 'admin' or '12345'—it's a blank field. This repository catalogs thousands of these real-world authentication failures, organized by the exact services hackers target first.

Context

Credential-based attacks remain the easiest entry point in penetration testing and red team operations. While sophisticated exploit chains make headlines, the reality is that default passwords, common patterns, and reused credentials account for the majority of initial access scenarios. Security professionals need wordlists—curated collections of likely passwords and usernames—but maintaining these lists is tedious. Scraped breach data needs cleaning, vendor documentation must be parsed for default credentials, and cultural contexts (like Chinese pinyin passwords) require specialized generation logic.

The jeanphorn/wordlist repository emerged as a practical solution to this curation problem. Rather than building yet another password cracker or brute-force tool, it focuses on the unsexy but critical task of organizing known-good wordlists by attack surface. SSH, RDP, FTP, databases, IP cameras, and routers each get dedicated lists drawn from multiple authoritative sources including SecLists, documented breaches, and vendor manuals. The repository's 1,700+ stars reflect its utility as a reference collection that practitioners actually use in authorized testing scenarios, not an academic exercise in password theory.

Technical Insight

The architecture is deliberately minimalist: flat text files organized by service type, structured JSON for default credentials, and standalone Python utilities for wordlist manipulation. This isn't a framework; it's a curated dataset with tooling attached. The main value lies in the categorization strategy. Instead of one monolithic password list, you get /ssh/, /rdp/, /db/, and /iot/ directories, each containing credentials specific to that service.

The default credentials structure demonstrates the practical design philosophy. Rather than plain text, device defaults are stored as JSON objects with metadata:

{
  "device": "Hikvision IP Camera",
  "port": 80,
  "protocol": "HTTP",
  "username": "admin",
  "password": "12345",
  "notes": "Default for DS-2CD series"
}

This structured approach means automated tools can consume the data programmatically, filtering by port or device type instead of manually parsing text files. The JSON schema isn't documented, but the pattern is consistent enough to script against.
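As a rough illustration of that kind of filtering, here is a minimal sketch. It assumes the records are stored as a JSON array of objects shaped like the example above; the file layout, the `filter_defaults` helper, and the second sample record are hypothetical and not taken from the repository.

```python
import json

# Hypothetical sample data in the shape shown above.
# The second record is invented purely for illustration.
SAMPLE = """
[
  {"device": "Hikvision IP Camera", "port": 80, "protocol": "HTTP",
   "username": "admin", "password": "12345", "notes": "Default for DS-2CD series"},
  {"device": "Example Router", "port": 23, "protocol": "Telnet",
   "username": "root", "password": "", "notes": "Hypothetical entry"}
]
"""


def filter_defaults(records, port=None, device_contains=None):
    """Return (username, password) pairs matching the optional filters."""
    matches = []
    for rec in records:
        # Skip records on a different port when a port filter is given
        if port is not None and rec.get("port") != port:
            continue
        # Case-insensitive substring match on the device name
        if device_contains and device_contains.lower() not in rec.get("device", "").lower():
            continue
        matches.append((rec.get("username", ""), rec.get("password", "")))
    return matches


records = json.loads(SAMPLE)
print(filter_defaults(records, port=80))
print(filter_defaults(records, device_contains="router"))
```

Using `rec.get(...)` rather than direct indexing is a deliberate choice here: with an undocumented schema, some records may be missing fields.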

The Python tooling ecosystem transforms static lists into dynamic wordlist generation workflows. The mangler.py script implements common password transformation rules—leetspeak substitutions, capitalization variants, appending numbers—that mirror how humans actually modify base passwords:

def mangle_word(word):
    """Return common human-style variants of a base password."""
    variants = [word]
    # Capitalization patterns
    variants.append(word.capitalize())
    variants.append(word.upper())
    # Common leetspeak substitutions (a -> @, e -> 3, o -> 0)
    leet = word.replace('a', '@').replace('e', '3').replace('o', '0')
    variants.append(leet)
    # Numeric and punctuation suffixes (sequences, years, symbols)
    for suffix in ['123', '2024', '!', '!!']:
        variants.append(word + suffix)
    # Deduplicate while keeping insertion order (a set would shuffle the output)
    return list(dict.fromkeys(variants))

While this implementation is simplistic compared to hashcat's rule engine, it's self-contained and requires no external dependencies. For quick wordlist expansion during time-constrained engagements, this pragmatism wins over sophistication.

The Chinese password generation tooling (chinese_password_gen.py) addresses a genuine gap in Western-focused security tools. It generates passwords based on pinyin romanization of common surnames, lucky numbers (6, 8, 88, 168), and cultural patterns like birth years formatted as Chinese zodiac cycles. This isn't just academic—testing authentication systems in Chinese-speaking regions with Western wordlists produces dramatically lower success rates. The generator combines cultural context with technical patterns:

import itertools

# Common Chinese surnames in pinyin
surnames = ['wang', 'li', 'zhang', 'liu', 'chen']
# Lucky numbers and patterns
lucky = ['6', '8', '88', '168', '520', '888']

def generate_candidates():
    """Yield surname + number combinations (yield is only valid inside a function)."""
    for surname, number in itertools.product(surnames, lucky):
        yield f"{surname}{number}"
        yield f"{surname.capitalize()}{number}"
        # Phone-like patterns (Chinese mobile prefixes)
        yield f"{surname}13{number}"

The repository's real strength is aggregation—it combines breach databases (rockyou.txt variants), vendor documentation parsing (default IoT credentials), and culturally-informed generation. The merge_dedupe.sh shell script demonstrates the assembly process, merging disparate sources and removing duplicates using Unix pipeline semantics rather than heavyweight Python libraries. This keeps the toolchain portable and transparent.
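The script's contents aren't reproduced here, so the following is only a Python sketch of the merge-and-deduplicate idea, not the shell pipeline itself. The `merge_dedupe` name and the sample inputs are hypothetical. Unlike a `sort -u` style pipeline, this version keeps first-seen order, which matters if the sources are already ranked by likelihood.

```python
def merge_dedupe(sources):
    """Merge iterables of lines, dropping duplicates and keeping first-seen order.

    Blank lines are dropped, so an empty password has to be tested separately.
    """
    seen = set()
    merged = []
    for lines in sources:
        for line in lines:
            word = line.rstrip("\r\n")  # strip line endings only, keep inner spaces
            if word and word not in seen:
                seen.add(word)
                merged.append(word)
    return merged


# Hypothetical inputs standing in for two source lists
breach_list = ["123456\n", "password\n", "admin\n"]
vendor_defaults = ["admin\n", "\n", "12345\n", "password\r\n"]

print(merge_dedupe([breach_list, vendor_defaults]))
```

Open file objects can be passed in place of the lists, since they iterate line by line.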

One underappreciated aspect is the categorical organization by protocol and service. When testing RDP access, you don't need every password ever leaked—you need passwords users actually choose for remote desktop authentication. These tend toward memorable phrases and corporate password policies (8+ characters, mixed case). The /rdp/ directory reflects this reality with lists curated for that specific attack surface, including common Active Directory patterns and Citrix defaults.
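The same reasoning can be applied to any generic list: discard candidates that a corporate policy would have rejected. The sketch below assumes the policy mentioned above (at least 8 characters, mixed case). The function names and sample candidates are hypothetical, and real policies vary by organization.

```python
def meets_policy(password, min_length=8):
    """True if the candidate satisfies the assumed policy: length plus mixed case."""
    return (
        len(password) >= min_length
        and any(c.islower() for c in password)
        and any(c.isupper() for c in password)
    )


def filter_by_policy(candidates, min_length=8):
    """Keep only the candidates a user could actually have set under the policy."""
    return [p for p in candidates if meets_policy(p, min_length)]


candidates = ["password", "Password1", "Summer2024", "ADMIN123", "Ab1"]
print(filter_by_policy(candidates))
```

Filtering before an engagement shrinks the list without lowering the hit rate, because the rejected candidates could never have been valid on the target.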

Gotcha

The most critical limitation is staleness. Wordlists are snapshot artifacts—effective when compiled, increasingly obsolete as breach data evolves and password trends shift. This repository lacks any automated update mechanism. The last significant commit determines your dataset's relevance, and given the nature of static file repositories, you're trusting the maintainer's curation cadence. For production red team operations, you'd need to supplement these lists with fresh breach compilations and OSINT-derived passwords specific to your target organization.

The Python tooling has all the hallmarks of utility scripts written for personal use: minimal error handling, no unit tests, and sparse documentation. The clean.py script fails silently on malformed input. The mangler.py tool has no configuration file support, so adjusting transformation rules means editing the source. These aren't production-grade security tools; they're conveniences for manual wordlist work, and integrating them into automated pipelines requires defensive wrappers and extensive testing. The scripts also aren't optimized for scale: they build the full result in memory instead of streaming it, so mangling a 10-million-word list can exhaust RAM. For serious wordlist work, you'd still reach for hashcat rules or John the Ripper's word mangler, using this repository's curated sources as input datasets rather than its tooling as the processing engine.
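One way such a wrapper might avoid the memory problem is to yield variants lazily. The sketch below is an assumption about how that could look, not code from the repository. `mangle_stream` is a hypothetical name, and it reuses the transformation rules shown earlier.

```python
def mangle_stream(words, suffixes=("123", "2024", "!", "!!")):
    """Yield variants one at a time instead of building a full list in memory."""
    for raw in words:
        word = raw.strip()
        if not word:
            continue  # skip blank lines instead of failing
        leet = word.replace("a", "@").replace("e", "3").replace("o", "0")
        candidates = [word, word.capitalize(), word.upper(), leet]
        candidates += [word + s for s in suffixes]
        # Deduplicate only within one word's variants so memory use stays bounded
        seen = set()
        for variant in candidates:
            if variant not in seen:
                seen.add(variant)
                yield variant


for variant in mangle_stream(["admin\n", "123\n"]):
    print(variant)
```

Because the function accepts any iterable of lines, an open file can be passed directly and each variant written out as it is produced. The trade-off is that duplicates across different base words are no longer removed, so a separate deduplication pass may still be needed.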

Verdict

Use if you're conducting authorized penetration tests and need categorized starting wordlists for specific attack surfaces—particularly SSH, RDP, database authentication, or IoT/IP camera defaults. The Chinese password generation is invaluable for testing systems in Chinese-speaking markets where Western wordlists fall flat. It's also useful as a reference collection when building custom wordlists, providing curated sources to merge and extend. Skip if you need actively maintained, continuously updated breach compilations (use commercial threat intelligence feeds instead), if you're targeting sophisticated environments where generic wordlists won't achieve acceptable hit rates, or if you require production-grade tooling with error handling and performance optimization. The repository's value is in curation and categorization, not in technical sophistication or freshness guarantees.
