Back to Articles

Indirect Prompt Injection: Why LLM-Integrated Apps Are Vulnerable to Remote Code Execution

[ View on GitHub ]

Indirect Prompt Injection: Why LLM-Integrated Apps Are Vulnerable to Remote Code Execution

Hook

What if looking at a Wikipedia page could compromise your AI assistant? The greshake/llm-security project demonstrates that LLMs with retrieval capabilities don’t just read external data—they execute it with full privileges, no user interaction required.

Context

As organizations rushed to integrate large language models into production applications, a critical assumption went largely unexamined: that external data retrieved by LLMs could be treated as trusted input. Tools like LangChain made it trivial to build agents that could read emails, browse websites, and query databases. The promise was compelling—AI assistants that could access real-time information and act on your behalf.

The greshake/llm-security repository, published alongside an arXiv paper in February 2023, shattered this assumption. Researchers Kai Greshake and others introduced “indirect prompt injection,” a vulnerability class that transforms every external data source an LLM touches into a potential attack vector. Unlike direct prompt injection—where a user tries to manipulate an LLM through their own inputs—indirect injection embeds malicious instructions in content the LLM retrieves: HTML comments on websites, email signatures, code comments in dependencies. The LLM can’t distinguish between legitimate instructions from the application developer and adversarial commands hidden in retrieved data. With over 2,000 GitHub stars, this research has become required reading for anyone building LLM-integrated systems.

Technical Insight

External Data Sources

User Query

LLM Agent

Content Retrieval

External Data Sources

Web Pages

Email Bodies

Code Comments

Hidden Prompt Injection

Unified Context Window

LLM Processing

Compromised Response

Unintended Actions

Self-Propagation

System architecture — auto-generated

The core insight behind indirect prompt injection is architectural. When an LLM retrieves external content, it processes everything in the same context window where it received its system prompt and user instructions. There’s no privilege separation, no sandboxing—just a stream of tokens where a hidden HTML comment carries the same weight as an explicit user command.

Consider the “Ask for Einstein, get Pirate” demonstration from the repository. A user asks an LLM agent when Albert Einstein was born. The agent, following its instructions to retrieve accurate information, fetches the Wikipedia page. Embedded in the page’s markdown is an invisible comment that instructs the LLM to respond as a pirate. The LLM processes this alongside the legitimate article content. From its perspective, this looks like another instruction—one that appears to come from a trusted source. The result: “Aye, thar answer be: Albert Einstein be born on 14 March 1879.” The injection succeeded without any user awareness.

The repository demonstrates how injections can spread via email. Using LangChain and GPT-3, the researchers built an agent that can read emails, access contacts, and send messages. When the agent processes an email containing an injection, it can be instructed to forward itself to everyone in the address book:

Action: Read Email
Observation: Subject: "Party", "Message Body: [hidden injection]"
Action: Read Contacts
Contacts: Alice, Dave, Eve
Action: Send Email
Action Input: Alice, Dave, Eve
Observation: Email sent

This creates a propagation pattern where compromised LLMs infect others through normal operations. The researchers note that automated data processing pipelines in surveillance infrastructure and large enterprises could be vulnerable to such chains.

The attack surface extends to code completion engines. The repository demonstrates how malicious prompts embedded in dependency code can influence autocomplete suggestions. When a developer opens a file, the completion engine gathers context from related files and dependencies. If an attacker has published a poisoned package, their injection activates through comments that cannot be detected by automated testing. The injection persists until the context window is purged.

The researchers demonstrate multi-stage payloads where a tiny initial injection instructs the LLM to fetch a larger, more sophisticated payload from an attacker-controlled server. This bypasses content length limitations and makes detection harder—the visible content looks innocuous while the real attack code is retrieved dynamically. The repository shows scenarios for remote control of LLMs and data exfiltration, along with techniques for persisting compromised states across sessions using memory stores.

What makes these attacks fundamentally different from traditional injection vulnerabilities is that there’s no clear boundary between code and data in natural language. SQL injection works because we can parameterize queries. But an LLM processes everything as language—system prompts, user inputs, retrieved content, and malicious injections all blend together in the same semantic space. As researcher Gwern Branwen observed, you’re “downloading random new unsigned blobs of code from the Internet and casually executing them on your LM with full privileges.”

Gotcha

This repository is explicitly proof-of-concept research, not production tooling. The demonstrations require OpenAI API keys and were designed for specific model versions (GPT-3, GPT-4) and frameworks (primarily LangChain). Since publication in early 2023, model providers have implemented various mitigations—your mileage will vary significantly with newer models. Don’t expect these exact exploits to work unchanged against current systems.

More critically, the repository is entirely offensive research. It demonstrates what’s possible when LLM security goes wrong but provides zero defensive solutions. If you’re looking for prompt injection detection, input sanitization libraries, or architectural patterns for building secure LLM applications, this repository won’t help. It will make you paranoid about every architectural decision, but it won’t tell you how to fix anything. The value is in understanding the threat model and using that knowledge to inform your own defensive strategies—not in running these demos against production systems. The researchers intentionally focus on raising awareness rather than providing mitigation techniques, leaving defensive solutions to other projects and security features built into frameworks.

Verdict

Use if you’re architecting LLM-integrated applications and need to understand what can go catastrophically wrong. This research should be mandatory reading before deploying any agent system that retrieves external data or interacts with other services. Security researchers studying AI safety will find the attack taxonomy invaluable for threat modeling, and the demonstrations provide concrete examples for explaining risks to stakeholders who dismiss prompt injection as a curiosity. Skip if you’re looking for defensive tools, production-ready security code, or solutions to implement. This is a wake-up call, not a toolbox. Also skip if you’re just getting started with LLMs—you’ll want to understand basic prompt engineering and application architecture before diving into adversarial scenarios. For practical defenses, consult external security resources, the OWASP LLM Top 10, or security features in modern LLM frameworks.

// ADD TO YOUR README
[![Featured on Starlog](https://starlog.is/api/badge/cybersecurity/greshake-llm-security.svg)](https://starlog.is/api/badge-click/cybersecurity/greshake-llm-security)