Building a Spellcheck Dictionary for Cybersecurity: Inside Bishop Fox's Cyberdic

Hook

Technical writers waste an average of 23 minutes per document right-clicking to dismiss spellcheck warnings for perfectly valid cybersecurity terms—time that Bishop Fox decided to eliminate entirely with a 7,000+ word curated dictionary.

Context

Anyone who's written technical documentation knows the pain: you're drafting a penetration test report or security advisory, and every third word gets the dreaded red squiggle. "OAuth2," "plaintext," "cybersecurity" itself—your word processor flags them all as errors. You right-click, add to dictionary, and move on. But this process is Sisyphean. Each new document, each new machine, each new team member starts from scratch.

Bishop Fox, a cybersecurity consultancy that produces thousands of pages of technical documentation annually, recognized this wasn't just annoying—it was a quality problem. When everything is marked wrong, writers develop warning blindness and miss actual typos. When teams don't have standardized terminology, reports use "cyber security" in one section and "cybersecurity" in another. The company had already published a Cybersecurity Style Guide to address terminology consistency, but enforcement relied on manual review. Cyberdic emerged as the technical implementation of that style guide, transforming editorial guidelines into machine-readable wordlists that could be distributed across their writing infrastructure.

Technical Insight

[Figure: System architecture (auto-generated). Nodes: Bishop Fox Style Guide, Master Wordlist, Word Processor Filters, Microsoft Word Dictionary, LibreOffice Dictionary, Document Spellcheck, Cybersecurity Terms, Clean Technical Writing. Edge labels: Input Source, Manual Curation, Hunspell Format, Filter Special Chars, Generate .dic, Validates Against, Eliminates False Positives.]

At its core, cyberdic is deceptively simple: a Hunspell-format dictionary file containing approved cybersecurity terminology. Hunspell is the spellcheck engine behind LibreOffice, Chrome, Firefox, and macOS, making it the de facto standard for open-source spellchecking. The format is straightforward: a plain-text .dic file with one entry per line, paired with an affix (.aff) file that defines how entries inflect.

Here's what a basic Hunspell dictionary looks like:

7245
API
APIs
AES
backdoor
backdoors
botnet
botnets
cybersecurity
cyberattack
firewall
malware
phishing
zero-day

The first line gives the approximate entry count (7245 in cyberdic's case), which Hunspell uses as a sizing hint when loading the file; the wordlist follows, abbreviated here. But the interesting architectural decision isn't in the format; it's in what Bishop Fox chose to exclude. The repository contains processor-specific filtering logic that removes terms with special characters, numbers, or punctuation that might break word processor implementations. Terms like "Wi-Fi" or "OAuth2.0" work in Hunspell's CLI but may be mishandled by Microsoft Word's custom dictionary parser.
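The filtering step described above might look roughly like the following. This is a minimal sketch, not Bishop Fox's actual script: the function name `filter_plain_words` and the letters-only rule are assumptions made for illustration, and the repository's real filter may well accept some entries that this one rejects.

```shell
# Hypothetical filter (not from the cyberdic repository).
# Reads a Hunspell-style .dic on stdin, skips the count header on line 1,
# and keeps only entries made purely of ASCII letters.
filter_plain_words() {
    tail -n +2 | LC_ALL=C grep -E '^[A-Za-z]+$'
}

# Example: a tiny in-memory wordlist whose first line is the entry count.
# Entries containing digits, dots, or hyphens are dropped.
printf '5\nAPI\nOAuth2.0\nWi-Fi\nmalware\nzero-day\n' | filter_plain_words
```

With this deliberately strict rule only `API` and `malware` survive; a production filter would likely be tuned per word processor rather than applying one rule everywhere.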

This filtering reveals a critical insight about building tools for legacy systems: theoretical standards and practical implementations diverge. The Hunspell format supports rich affix rules for morphological analysis, allowing a single entry like "backdoor/S" to match both "backdoor" and "backdoors" when the accompanying affix file defines the S flag as a plural suffix. Cyberdic opts for explicit enumeration instead, listing both forms separately. Why? Because Word's custom dictionaries are plain wordlists that do not interpret affix flags, and LibreOffice's implementation has quirks around proper noun handling. By keeping the dictionary as a simple wordlist without affix dependencies, cyberdic achieves maximum compatibility at the cost of file size, an acceptable tradeoff when disk space is cheap but broken spellcheck is expensive.
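To make the difference between a flagged entry and explicit enumeration concrete, here is a small sketch that expands flagged entries into a flat list. It assumes, purely for illustration, that the flag `S` means "append s"; real affix files define that flag in the .aff file and usually cover more cases. The function name `expand_plural_flag` is hypothetical.

```shell
# Hypothetical expander (illustration only).
# Treats the flag "S" as "append s"; all other flags are ignored.
expand_plural_flag() {
    awk -F/ '
        NF == 1  { print $1; next }                # no flags: emit the word unchanged
        $2 ~ /S/ { print $1; print $1 "s"; next }  # flag S: emit singular and plural
                 { print $1 }                      # any other flag: emit the bare word
    '
}

# Example: two flagged entries and one plain entry.
printf 'backdoor/S\nbotnet/S\nmalware\n' | expand_plural_flag
```

The output is the kind of flat, affix-free list the article describes: each inflected form sits on its own line, so a consumer that ignores affix flags still sees every form.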

Curation is manual. The dictionary draws terms from Bishop Fox's style guide, technology vendor documentation, and community feedback, and each addition goes through editorial review to ensure it matches the style guide's preferences. For example, the dictionary includes "cybersecurity" (one word) but not "cyber security" (two words), enforcing the style guide's position on compound terms. It includes "plaintext" but not "plain text," preferring the compound form for cryptographic contexts.
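A quick way to see which forms a wordlist carries is an exact whole-line lookup. The sketch below uses a two-entry stand-in list rather than the real dictionary, and the helper name `in_wordlist` is hypothetical. Note that a real spellchecker checks one word at a time, so this only illustrates which forms the list contains, not how a word processor would treat a two-word phrase.

```shell
# Hypothetical helper (illustration only).
# Succeeds if the exact term appears as a whole line in the wordlist file.
in_wordlist() {
    grep -Fxq -- "$1" "$2"
}

# Stand-in wordlist with two of the preferred forms named in the article.
wordlist=$(mktemp)
printf 'cybersecurity\nplaintext\n' > "$wordlist"

for term in cybersecurity "cyber security" plaintext "plain text"; do
    if in_wordlist "$term" "$wordlist"; then
        echo "accepted: $term"
    else
        echo "flagged:  $term"
    fi
done
rm -f "$wordlist"
```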

Integration into a word processor requires just three steps:

# 1. Download the appropriate dictionary
wget https://raw.githubusercontent.com/BishopFox/cyberdic/main/en_US-cyberdic.dic

# 2. For LibreOffice, copy it to the user wordbook directory
mkdir -p ~/.config/libreoffice/4/user/wordbook/
cp en_US-cyberdic.dic ~/.config/libreoffice/4/user/wordbook/

# 3. Enable in LibreOffice: Tools → Options → Language Settings → Writing Aids
# Check "en_US-cyberdic" under User-defined dictionaries

For Microsoft Word, the process is similarly short: add the .dic file as a custom dictionary (on Windows, File → Options → Proofing → Custom Dictionaries → Add). This low-friction installation is by design. Bishop Fox could have built a Word add-in with auto-update capabilities, but that would require code signing, distribution infrastructure, and per-user installation permissions. A static dictionary file sidesteps all that complexity.

The repository structure reflects this minimalist philosophy: a README with installation instructions, the dictionary file itself, and a contribution guide. No build system, no dependencies, no package management. It's a throwback to UNIX philosophy—do one thing well and compose with existing tools. The spellcheck engine already exists in every word processor; cyberdic just supplies better data.

Gotcha

Cyberdic's biggest limitation is its static nature. Unlike modern spellcheck systems that learn from your writing or synchronize across devices, this is a file you download, install, and manually update. There's no notification when new terms are added, no auto-update mechanism, and no cloud sync. If Bishop Fox updates the dictionary next month with 500 new terms, you won't benefit unless you manually download and reinstall. For organizations deploying cyberdic across teams, this means building your own distribution mechanism—perhaps packaging it with configuration management tools or including it in workstation provisioning scripts.
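For teams that build their own distribution mechanism, the core of it can be small. The sketch below is one possible shape, under the assumption that some earlier step (a provisioning script, a scheduled job) has already fetched a fresh copy of the dictionary; the function name `sync_dictionary` and both path arguments are hypothetical.

```shell
# Hypothetical sync helper (illustration only).
# Copies a freshly fetched dictionary over the installed one only when the
# contents differ, and reports what it did.
#   $1 = path to the freshly fetched file
#   $2 = path to the installed file
sync_dictionary() {
    new_file=$1
    installed_file=$2
    if [ -f "$installed_file" ] && cmp -s "$new_file" "$installed_file"; then
        echo "up to date"
    else
        cp "$new_file" "$installed_file" && echo "updated"
    fi
}

# Example call (paths are placeholders, not real locations):
#   sync_dictionary /tmp/en_US-cyberdic.dic "$HOME/dictionaries/en_US-cyberdic.dic"
```

Comparing contents with `cmp` rather than timestamps keeps the check meaningful even when the file is re-downloaded on every run.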

The processor-specific filtering also means you're getting a subset of cybersecurity terminology, not the complete set. Terms that would be useful for technical writers—like "OAuth2" or "Wi-Fi"—are excluded because they contain characters that break certain word processors. You'll still need to manually add these to your personal dictionary, partially defeating the purpose. This is a fundamental constraint of targeting lowest-common-denominator compatibility: you lose access to features that work in sophisticated implementations. A Hunspell dictionary used with the command-line tools could handle these terms perfectly, but Word and LibreOffice's implementations are more restrictive. Bishop Fox chose broad compatibility over completeness, which is probably the right call for a general-purpose tool, but it means power users will still need supplementary wordlists.
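Power users whose tools tolerate such entries could keep a supplementary list and merge it with the base dictionary. The following is a minimal sketch, assuming the base file has a count header on its first line and the supplementary file is a bare list with one term per line; the function name `merge_wordlists` is hypothetical.

```shell
# Hypothetical merge (illustration only).
# Combines a base .dic (count header on line 1) with a headerless
# supplementary list, removes duplicates and blank lines, and prints the
# result with a fresh count header.
#   $1 = base dictionary file
#   $2 = supplementary wordlist file
merge_wordlists() {
    { tail -n +2 "$1"; cat "$2"; } |
        LC_ALL=C sort -u |
        awk 'NF { words[++n] = $0 }
             END { print n + 0; for (i = 1; i <= n; i++) print words[i] }'
}

# Example call (file names are placeholders):
#   merge_wordlists en_US-cyberdic.dic my-extra-terms.txt > merged.dic
```

The merged file is only appropriate for consumers that accept entries with digits or hyphens, which is exactly the compatibility question the base dictionary avoids.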

Verdict

Use cyberdic if you write security documentation in Microsoft Word or LibreOffice Writer more than once a week and you're tired of maintaining personal dictionaries across machines. It needs no configuration beyond installation (though updates are manual), is curated by professionals who understand the domain, and fits into your existing workflow. It's especially valuable for teams that need terminology consistency: deploying a shared dictionary file is much easier than training everyone on style guide nuances. Skip it if you primarily write in markdown editors, IDEs, or web-based tools like Google Docs, which may not load a custom dictionary file without extra plugins or configuration, or if you need a living, auto-updating solution that learns your organization's specific terminology. Also skip it if you work in adjacent technical domains: this is cybersecurity-specific, so network engineers or cloud architects will find coverage spotty outside security contexts. For those use cases, invest time in vale or textlint with custom vocabulary files instead.
