Back to Articles

Attacking and Defending Generative AI: A Security Reference Arsenal

[ View on GitHub ]

Attacking and Defending Generative AI: A Security Reference Arsenal

Hook

A Chevy dealership’s chatbot was tricked into selling a 2024 Tahoe for one dollar through prompt injection. When teachers tried catching students using ChatGPT with hidden prompts, the AI revealed its instructions instead. These aren’t proof-of-concepts—they’re documented real-world exploits catalogued in a growing security knowledge base.

Context

Large language models arrived in production environments faster than their security posture could mature. While traditional software security has decades of established frameworks—OWASP, CVE databases, penetration testing methodologies—generative AI introduced entirely new attack surfaces. Prompt injection doesn’t map to SQL injection. Jailbreaking isn’t privilege escalation. Adversarial attacks on aligned models have no equivalent in conventional application security.

The NetsecExplained/Attacking-and-Defending-Generative-AI repository emerged as a curated reference guide addressing this gap. Rather than building yet another security tool, it functions as an organized taxonomy connecting three critical layers: security frameworks (OWASP LLM Top 10, MITRE ATLAS), practical attack research (from academic papers to Twitter threads documenting exploits), and defensive tooling (scanners, guardrails, moderation APIs). For security teams suddenly responsible for protecting chatbots, RAG systems, and agentic workflows, this repository provides the map to a rapidly evolving threat landscape.

Technical Insight

Knowledge Hub

Security Frameworks

OWASP/MITRE ATLAS

Threat Modeling

Attack Techniques

Taxonomy

Prompt Injection

Direct & Indirect

Jailbreaking

Adversarial Attacks

Real-World

Exploit Examples

Security Tools

Garak/PyRIT/NeMo

Mitigation

Strategies

Academic Papers

Tool References

System architecture — auto-generated

The repository’s architecture reveals the full attack surface through categorical organization. At the foundation layer sit security frameworks that provide threat modeling structure. OWASP’s LLM Top 10 catalogs vulnerabilities like insecure output handling and excessive agency, while MITRE ATLAS extends traditional ATT&CK tactics to machine learning systems. These frameworks help security teams ask the right questions: Where can untrusted input reach prompt context? What happens when an LLM’s tool-use capability is exploited?

The attack technique taxonomy bridges theory and practice. Prompt injection—the LLM equivalent of code injection—comes in direct and indirect variants. Direct injection manipulates user input fields, as demonstrated in the Chevy dealership case where conversational override commands bypassed business logic. Indirect injection is more insidious: embedding malicious instructions in documents, web pages, or emails that an LLM later processes. The research paper “Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection” demonstrates cross-plugin exploitation chains where a poisoned email could trigger unauthorized API calls through ChatGPT’s ecosystem.

Invisible prompt injection leverages non-standard Unicode characters that render as whitespace to humans but execute as instructions for models, as documented in research linked from the repository. Jailbreaking techniques like Microsoft’s documented “Skeleton Key” attack use carefully crafted multi-turn conversations that incrementally shift model behavior outside safety boundaries. The repository links to the L1B3RT45 collection documenting jailbreak techniques across flagship AI models.

On the defensive side, the repository catalogs practical mitigation tools. NeMo Guardrails from NVIDIA implements programmable constraints for LLM applications. Garak functions as a vulnerability scanner for LLMs, probing models with various attack vectors across categories like encoding exploits, toxicity generation, and hallucination triggers. PyRIT (Python Risk Identification Tool) from Microsoft’s AI Red Team appears to provide red teaming capabilities for generative AI systems, as described in their linked video on AI red teaming approaches.

The repository’s real value emerges in connecting attack vectors to both research provenance and defensive tooling. The “Universal and Transferable Adversarial Attacks” paper demonstrates gradient-based suffix generation that jailbreaks aligned models—then links to testing tools like Garak. Real-world exploit documentation, like the ChatGPT plugin cross-plugin request forgery chain, shows how theoretical attacks manifest in production systems where plugins share context without proper isolation.

Gotcha

This repository is a static snapshot, not a living security platform. LLM security research moves at social media velocity—new jailbreaks appear on Twitter weekly, model providers patch vulnerabilities in days, and attack techniques evolve faster than academic publication cycles can track. Links decay, papers get superseded, and tools become unmaintained. Unlike actively maintained databases like CVE or AVID, there’s no guarantee these references stay current or that dead links get pruned.

More critically, it’s a reference list without implementation guidance or risk context. You get links to papers describing universal adversarial attacks, but no assessment of which attacks pose realistic threats versus theoretical curiosities for your specific use case. The repository doesn’t differentiate between defending a customer service chatbot versus an agentic system with database access—threat profiles that require radically different security postures. There’s no code to run, no tutorials to follow, and no comparative analysis of defensive tools. You’re handed a reading list and expected to independently synthesize operational security guidance. For teams without existing LLM security expertise, the repository provides breadth but not the depth needed to actually secure a production system.

Verdict

Use if you’re building a security program around generative AI applications and need comprehensive threat modeling starting points. This repository excels as a research springboard for red teams planning LLM penetration tests, security architects evaluating defensive tools, or engineers needing to quickly understand the attack surface beyond generic vendor marketing. It’s particularly valuable when you encounter a novel LLM vulnerability and need to trace it back to academic research or find practical testing tools. Skip if you need executable security solutions, step-by-step implementation guides, or maintained threat intelligence. Skip if you’re securing your first LLM application without security engineering experience—the lack of contextualization and prioritization makes it overwhelming rather than actionable. For continuously updated threats, active communities like the OWASP LLM project’s working groups or following the Twitter accounts listed in the repository provide better ongoing visibility than this static collection.

// ADD TO YOUR README
[![Featured on Starlog](https://starlog.is/api/badge/developer-tools/netsecexplained-attacking-and-defending-generative-ai.svg)](https://starlog.is/api/badge-click/developer-tools/netsecexplained-attacking-and-defending-generative-ai)