# Swark: Why Architecture Diagrams Should Be Generated, Not Maintained
## Hook
Every architecture diagram in your repository is probably outdated. Swark treats them as disposable artifacts you regenerate in seconds, not sacred documents you meticulously maintain.
## Context
Architecture documentation has a fundamental problem: it rots faster than milk in summer. You inherit a legacy Python service, spend three hours understanding the module structure, and by the time you’re ready to contribute, the diagram you found is six months stale. Traditional solutions require language-specific parsers—Doxygen for C++, Javadoc for Java, TypeDoc for TypeScript—each with its own learning curve and maintenance burden.
Swark takes a radically different approach: instead of parsing code deterministically, it throws your source files at an LLM and asks it to explain the architecture. Because it leverages GitHub Copilot’s Language Model API, it requires zero authentication setup—if you can use Copilot autocomplete, you can generate diagrams. This eliminates the setup friction that kills most documentation tools before they’re even configured. The bet Swark makes is simple: for exploratory documentation and onboarding, a good-enough diagram generated in 30 seconds beats a perfect diagram that takes three hours to maintain.
## Technical Insight
Under the hood, Swark implements a four-stage pipeline that balances LLM context limits against codebase coverage. The file retrieval stage uses configurable glob patterns and file extension filters to gather source files from your selected directory. Critically, it adjusts the file count automatically to stay within the model's maximum token limit: if your folder contains more files than the model can process, Swark reduces the number of files it sends rather than failing at runtime.
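As a rough illustration of that adjustment, the sketch below trims a file list to a token budget. It is not Swark's actual code: the `SourceFile` shape, the `fitToTokenBudget` name, and the four-characters-per-token estimate are all assumptions made for the example, and the source does not say which files Swark drops first.

```typescript
// Hypothetical shape for a retrieved source file (not Swark's real type).
interface SourceFile {
  path: string;
  content: string;
}

// Assumption: roughly 4 characters per token. A real extension would
// more likely ask the model for an exact count.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keep files in their given order until the next one would exceed the budget.
function fitToTokenBudget(files: SourceFile[], maxTokens: number): SourceFile[] {
  const kept: SourceFile[] = [];
  let used = 0;
  for (const file of files) {
    const cost = estimateTokens(file.content);
    if (used + cost > maxTokens) {
      break; // stop at the first file that does not fit
    }
    kept.push(file);
    used += cost;
  }
  return kept;
}

const sampleFiles: SourceFile[] = [
  { path: "src/a.ts", content: "12345678" },     // about 2 tokens
  { path: "src/b.ts", content: "1234567890ab" }, // about 3 tokens
  { path: "src/c.ts", content: "1234" },         // about 1 token
];

// With a budget of 5 tokens, only src/a.ts and src/b.ts are kept.
const selectedFiles = fitToTokenBudget(sampleFiles, 5);
```

Stopping at the first file that does not fit keeps the logic simple; a different policy, such as skipping oversized files and continuing, would be just as plausible.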
The prompt construction stage is where Swark’s language-agnostic philosophy shines. Instead of parsing abstract syntax trees or analyzing import statements, it simply embeds your source code into a structured prompt with diagram generation instructions. Here’s what the output structure looks like:
`swark-output/2025-01-09__20-18-38__diagram.md`

```mermaid
graph TD
    A[Extension Entry Point] --> B[File Retriever]
    B --> C[Prompt Builder]
    C --> D[LLM Client]
    D --> E[Mermaid Renderer]
    E --> F[VS Code Preview]
    B --> G[Token Budget Manager]
    G --> B
```
The extension outputs both the Mermaid diagram and a detailed log file containing configuration details, file lists, and run information—essential for debugging when diagrams don't match expectations. This logging-first approach provides visibility into what the model actually received.
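To make the prompt construction stage described earlier more concrete, here is a minimal sketch of embedding source files into a structured prompt. The instruction wording, the `### File:` delimiter, and the `buildDiagramPrompt` name are illustrative assumptions; Swark's real prompt is not shown in the source.

```typescript
// Hypothetical shape for a file that goes into the prompt.
interface PromptFile {
  path: string;
  content: string;
}

// Build one prompt string: instructions first, then each file under a path header.
// No parsing of the code takes place, which is what keeps the approach language-agnostic.
function buildDiagramPrompt(files: PromptFile[]): string {
  const instructions = [
    "You are given the source files of a software project.",
    "Describe its high-level architecture as a Mermaid diagram.",
    "Reply with a single Mermaid code block that starts with graph TD.",
  ].join("\n");

  const sections = files.map(
    (file) => "### File: " + file.path + "\n" + file.content
  );

  return instructions + "\n\n" + sections.join("\n\n");
}

// Example input: two files from an imaginary Python service.
const demoPrompt = buildDiagramPrompt([
  { path: "app/main.py", content: "from app import routes" },
  { path: "app/routes.py", content: "def index(): ..." },
]);
```

Because the files are passed through as plain text, the same function works for any language the model can read.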
Swark integrates with VS Code's official Language Model API rather than calling external services directly. This architectural choice leverages Copilot's existing authentication (eliminating API key management) and keeps your code within GitHub's infrastructure (addressing privacy concerns). The API call is remarkably simple—the extension doesn't need to handle OAuth flows, token refresh, or rate limiting because VS Code's API abstracts those concerns.
The rendering stage outputs Mermaid.js syntax specifically because it's human-editable. Unlike binary diagram formats or proprietary tools, you can open the generated markdown file and manually tweak node labels, add missing connections, or restructure the layout. Swark also implements cycle fixing to prevent Mermaid rendering failures—circular dependencies in your code won't break the diagram viewer. The output integrates seamlessly with existing documentation workflows: commit the markdown file to your repo, and platforms like GitHub, GitLab, and modern static site generators will render it automatically.
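The source does not describe how the cycle fixing works or which kind of cycle it targets. As a loosely related, generic illustration (not Swark's algorithm), the sketch below extracts simple `A --> B` edges from Mermaid text and uses a depth-first search to report one cycle if any exists. The function names and the restricted edge syntax are assumptions made for the example.

```typescript
type Edge = [string, string];

// Extract edges of the simple form `A --> B` or `A[Label] --> B[Label]`.
// Other Mermaid arrow styles are ignored (assumption, for brevity).
function parseEdges(mermaid: string): Edge[] {
  const pattern = /^\s*(\w+)(?:\[[^\]]*\])?\s*-->\s*(\w+)(?:\[[^\]]*\])?\s*$/;
  const edges: Edge[] = [];
  for (const line of mermaid.split("\n")) {
    const match = pattern.exec(line);
    if (match) {
      edges.push([match[1], match[2]]);
    }
  }
  return edges;
}

// Return the node ids along one cycle, or null if the graph is acyclic.
function findCycle(edges: Edge[]): string[] | null {
  const adjacency: { [node: string]: string[] } = Object.create(null);
  for (const [from, to] of edges) {
    if (!adjacency[from]) adjacency[from] = [];
    if (!adjacency[to]) adjacency[to] = [];
    adjacency[from].push(to);
  }

  // Missing = unvisited, 1 = on the current DFS path, 2 = fully explored.
  const state: { [node: string]: number } = Object.create(null);
  const path: string[] = [];

  function visit(node: string): string[] | null {
    state[node] = 1;
    path.push(node);
    for (const next of adjacency[node]) {
      if (state[next] === 1) {
        // Back edge: the cycle runs from `next` to the current node.
        return path.slice(path.indexOf(next));
      }
      if (!state[next]) {
        const found = visit(next);
        if (found) return found;
      }
    }
    path.pop();
    state[node] = 2;
    return null;
  }

  for (const node of Object.keys(adjacency)) {
    if (!state[node]) {
      const found = visit(node);
      if (found) return found;
    }
  }
  return null;
}
```

A fixer could then decide what to do with the reported nodes, for example dropping or relabeling one edge; which strategy Swark uses is not stated in the source.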
The privacy model deserves emphasis: Swark explicitly excludes source code from telemetry collection and only shares files with GitHub Copilot. For teams working on proprietary codebases, this matters. You're not sending your trade secrets to a random third-party API—you're using the same service that already has access to your code through Copilot autocomplete. The README is transparent about this boundary: "Your source code is shared only with GitHub Copilot — no other external APIs or providers involved."
## Gotcha
Swark's biggest limitation is baked into its core value proposition: you're dependent on GitHub Copilot. No Copilot subscription? No diagrams. This is an intentional trade-off for zero-config convenience, but it rules Swark out for anyone who doesn't, or can't, use Copilot.
LLM-generated diagrams may produce varying interpretations when run multiple times on the same codebase—one diagram might emphasize different architectural aspects than another. The extension implements cycle fixing for Mermaid syntax, but the underlying architectural interpretation comes from the LLM. This approach works well for Swark's stated use cases (quick exploration and onboarding) but may be less suitable for documentation requiring strict reproducibility.
Token limits bound how much code can be analyzed at once. Large codebases can't be processed in a single pass: when a folder exceeds the model's limit, the automatic adjustment reduces the file count, and the tool decides which files to include. To get a complete picture you may need to run Swark on individual subfolders and synthesize the results yourself.
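One way to plan those per-subfolder runs is to group file paths by their top-level folder first. The helper below is a hypothetical planning aid, not part of Swark, and it assumes forward-slash relative paths.

```typescript
// Group relative paths by their first path segment; files at the root go under ".".
function groupByTopFolder(paths: string[]): { [folder: string]: string[] } {
  const groups: { [folder: string]: string[] } = Object.create(null);
  for (const path of paths) {
    const slash = path.indexOf("/");
    const folder = slash === -1 ? "." : path.slice(0, slash);
    if (!groups[folder]) groups[folder] = [];
    groups[folder].push(path);
  }
  return groups;
}

// Example: an imaginary service with two packages and a root-level file.
const runPlan = groupByTopFolder([
  "api/handlers.py",
  "api/models.py",
  "worker/tasks.py",
  "setup.py",
]);
```

Each key in `runPlan` is a candidate folder to select for a separate diagram run.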
## Verdict
Use Swark if you're onboarding to unfamiliar codebases, documenting AI-generated projects, or need quick architectural overviews for internal wikis—situations where speed trumps precision and you already have GitHub Copilot access. It excels at answering "what does this repository even do?" in under a minute, making it invaluable for developers context-switching between microservices or evaluating open-source dependencies. The zero-config promise actually delivers: no YAML files, no language-specific plugins, no authentication headaches. Skip Swark if you need deterministic diagrams for compliance audits or require comprehensive analysis of massive codebases in a single pass. This isn't a tool for canonical architectural documentation; it's a tool for making documentation so cheap that keeping it current becomes realistic.