GEIA: How Sentence Embeddings Leak Your Original Text Through Generative Inversion
Hook
That sentence embedding you’re storing in your vector database? A fine-tuned GPT-2 can reverse-engineer the original text with alarming accuracy, even though embeddings are supposed to be semantic abstractions.
Context
Sentence embeddings have become the foundation of modern semantic search, retrieval-augmented generation, and recommendation systems. Models like sentence-BERT compress natural language into dense vectors that capture semantic meaning, ostensibly abstracting away the specific words used. The implicit assumption has been that these embeddings are relatively safe to store and share—after all, they’re just mathematical representations in high-dimensional space, not the actual text. This assumption underlies countless production systems that store user queries, conversation logs, and sensitive documents as embeddings.
The GEIA (Generative Embedding Inversion Attack) repository from HKUST's Knowledge Computing Lab challenges this assumption with empirical evidence published at ACL 2023 Findings. The research demonstrates that pre-trained language models can be fine-tuned to function as effective inverters: given only an embedding vector, they can reconstruct the original sentence with high fidelity across multiple datasets and embedding methods. This isn't a theoretical vulnerability. The repository provides working attack implementations against sentence-bert and HuggingFace embedding models (the specific supported models are documented in the model_cards dictionary within the code), raising serious questions about privacy guarantees in systems that treat embeddings as anonymized data.
Technical Insight
GEIA’s architecture consists of two adversarial components: victim models that produce embeddings, and attacker models that learn to reverse them. The victim models are sentence encoders (the README indicates support for sentence-bert models and HuggingFace models, with a model_cards dictionary providing details), while the attackers are generative language models—GPT-2, OPT, T5, or DialoGPT—that learn a mapping from embedding space back to token sequences.
The attack works by training the generative model with a custom objective. Instead of feeding text tokens directly, the attacker takes sentence embeddings as input and learns to decode them into the original text. The repository implements this through separate scripts for different attacker architectures. For GPT-2-based attacks using attacker.py, you configure the victim model and dataset, then train:
python attacker.py \
--model_dir gpt2-large \
--num_epochs 5 \
--batch_size 16 \
--dataset personachat \
--data_type train \
--embed_model <victim_model_identifier> \
--decode beam
This trains GPT-2-large to invert embeddings from a specified sentence encoder on PersonaChat conversations. The framework supports multiple decoding algorithms, including beam search and sampling-based approaches, selected via the --decode argument.
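The essence of the inversion objective can be sketched with a toy model: project the victim embedding into the decoder's hidden space, use it as a prefix, and minimize next-token cross-entropy over the original sentence under teacher forcing. Everything below (the dimensions, the linear "decoder" standing in for GPT-2, the parameter names) is invented for illustration and is not taken from the repository:

```python
import numpy as np

rng = np.random.default_rng(0)

EMB_DIM, HID, VOCAB = 16, 32, 50  # toy sizes, not the real models'

# Hypothetical parameters: a projection from embedding space into the
# decoder's hidden space, plus a toy linear decoder (stand-in for GPT-2).
W_proj = rng.normal(0, 0.1, (EMB_DIM, HID))
W_tok = rng.normal(0, 0.1, (VOCAB, HID))   # token input embeddings
W_out = rng.normal(0, 0.1, (HID, VOCAB))   # hidden state -> vocab logits

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def inversion_loss(sent_embedding, target_tokens):
    """Cross-entropy of the target sentence given only its embedding.

    The embedding is projected to a prefix hidden state; each target token
    is then predicted from the prefix plus the previous token's embedding
    (teacher forcing). Training the attacker means minimizing this loss.
    """
    prefix = sent_embedding @ W_proj                 # (HID,)
    loss, prev = 0.0, prefix
    for t in target_tokens:
        logits = prev @ W_out                        # (VOCAB,)
        probs = softmax(logits)
        loss += -np.log(probs[t] + 1e-12)            # next-token NLL
        prev = prefix + W_tok[t]                     # condition on prefix + token
    return loss / len(target_tokens)

# A fake victim embedding and the token ids of its "original" sentence.
embedding = rng.normal(size=EMB_DIM)
tokens = [3, 17, 8, 42, 1, 9]
print(round(float(inversion_loss(embedding, tokens)), 3))
```

The real attackers replace the linear decoder with GPT-2/OPT/T5, but the shape of the objective (embedding in, token-level cross-entropy out) is the same.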
The repository also implements baseline attackers with simpler architectures: neural network or RNN projection layers, demonstrating that even non-generative approaches achieve partial reconstruction. These baselines use projection.py with a --model_type argument to switch between NN and RNN architectures. According to the paper's findings, the generative attackers outperform these baselines, suggesting that pre-trained language model knowledge is critical for effective inversion.
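To see why even a simple projection leaks information, consider a stripped-down stand-in for the "NN" baseline: predict the *set* of tokens in the original sentence directly from its embedding with independent sigmoids. This is an illustrative sketch, not the code in projection.py (which also includes an RNN variant):

```python
import numpy as np

rng = np.random.default_rng(1)
EMB_DIM, VOCAB = 16, 50  # toy sizes

# Hypothetical linear baseline: one weight matrix from embedding space
# to per-token "did this word occur?" logits.
W = rng.normal(0, 0.1, (EMB_DIM, VOCAB))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def set_prediction_loss(sent_embedding, target_tokens):
    """Binary cross-entropy over the vocabulary: for each token id,
    predict whether it occurred in the original sentence."""
    probs = sigmoid(sent_embedding @ W)              # (VOCAB,)
    target = np.zeros(VOCAB)
    target[list(target_tokens)] = 1.0
    return -np.mean(target * np.log(probs + 1e-12)
                    + (1 - target) * np.log(1 - probs + 1e-12))

embedding = rng.normal(size=EMB_DIM)
print(round(float(set_prediction_loss(embedding, {3, 17, 8, 42})), 3))
```

A model trained this way recovers word choice but not word order, which is roughly why such baselines achieve only partial reconstruction compared with the generative attackers.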
What makes this particularly concerning is the generalization across domains. The code evaluates attacks on seven diverse datasets: PersonaChat dialogues, QNLI and MNLI natural language inference pairs, SST-2 sentiment sentences, WMT16 translations, MultiWOZ task-oriented dialogues, and ABCD customer service conversations. The attacker models appear to transfer reasonably well across these domains when trained on sufficient data, indicating that the vulnerability likely isn’t dataset-specific.
Evaluation spans three dimensions, implemented in separate scripts. eval_generation.py computes generation metrics that measure the quality of reconstructed sentences. eval_classification.py tests whether embeddings of reconstructed sentences preserve the original's label in downstream tasks: if an attacker recovers a sentence from an embedding that encoded negative sentiment, does the reconstruction still classify as negative? Finally, eval_ppl.py calculates the perplexity of reconstructed text under a separate language model, measuring fluency and naturalness.
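As a flavor of the first dimension, a token-overlap F1 between the original and the reconstruction can be computed in a few lines. This is a simple stand-in, not the exact metrics eval_generation.py reports:

```python
from collections import Counter

def token_f1(reference: str, reconstruction: str) -> float:
    """Token-overlap F1 between an original sentence and its
    reconstruction (a crude proxy for generation-quality metrics)."""
    ref, rec = Counter(reference.split()), Counter(reconstruction.split())
    overlap = sum((ref & rec).values())   # shared tokens, with multiplicity
    if overlap == 0:
        return 0.0
    precision = overlap / sum(rec.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

original = "my favorite band is playing downtown tonight"
recovered = "my favorite band plays downtown tonight"
print(round(token_f1(original, recovered), 3))  # → 0.769
```

Even this crude metric makes the privacy point: a reconstruction scoring well above chance means the embedding retained recoverable lexical content, not just abstract semantics.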
The framework’s modularity allows researchers to plug in different victim-attacker combinations systematically. The model_cards dictionary in the code maps embedding model names to their identifiers, making it possible to attack different embedding methods. However, each attacker architecture requires its own script (attacker.py, attacker_opt.py, attacker_t5.py) due to different model interfaces and decoding implementations, which creates some code duplication but maintains clarity about architecture-specific details.
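The README does not reproduce the contents of model_cards, so the following is only a hypothetical sketch of how such a name-to-identifier mapping is typically used; the keys and identifiers here are placeholders, not the repository's actual entries:

```python
# Hypothetical shape of a model_cards-style registry; the real keys and
# HuggingFace identifiers live in the GEIA code and will differ.
model_cards = {
    "sbert": "hypothetical/sentence-encoder",
    "simcse": "hypothetical/simcse-encoder",
}

def resolve_victim(name: str) -> str:
    """Map a short --embed_model name to a full model identifier."""
    try:
        return model_cards[name]
    except KeyError:
        raise ValueError(f"unknown embed_model {name!r}; "
                         f"choices: {sorted(model_cards)}")

print(resolve_victim("sbert"))
```

The practical upshot is that attacking a new embedding method is mostly a matter of adding one entry to this mapping, provided the encoder exposes a compatible interface.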
Gotcha
This is research code, not a production-ready security toolkit, and the limitations matter if you’re planning to use it. First, computational requirements appear substantial—training GPT-2-large or OPT attackers likely requires GPU resources and time comparable to fine-tuning these models for other tasks. The README doesn’t specify training times or hardware requirements.
Second, attack effectiveness likely degrades with distribution shift based on typical machine learning behavior. An attacker trained on PersonaChat dialogues may not invert embeddings from medical notes or legal documents as effectively. The repository evaluates on seven datasets, but cross-domain transfer characteristics aren’t detailed in the README.
Third, documentation is minimal. The README provides command-line arguments but no architectural diagrams, no detailed explanation of the inversion mechanism, and limited guidance on hyperparameter selection. You’ll need to read the ACL 2023 paper and examine the code to understand implementation details. The evaluation scripts require manually setting result paths inside the Python files rather than accepting command-line arguments, which feels brittle. There’s no unified pipeline—training, inference, and evaluation are separate manual steps across different scripts. This is appropriate for reproducing paper results but may be frustrating for extending the work.
Data preparation requires some manual steps: PC (PersonaChat) data is included in the data/ folder, the ABCD dataset must be downloaded from a provided Google Drive link, and other datasets are downloaded via the datasets package at runtime.
Verdict
Use GEIA if you’re a security researcher evaluating privacy risks in embedding-based systems, an academic studying information leakage in learned representations, or a privacy engineer who needs to demonstrate vulnerability before implementing defenses. It’s valuable for benchmarking how much information different embedding methods leak, understanding attack methodologies to inform threat models, and making informed decisions about whether embeddings can be treated as anonymized data in your application. The multi-dataset, multi-model evaluation framework is genuinely useful for systematic privacy analysis. Skip it if you need production security tools, want defensive techniques rather than attack demonstrations, or expect well-documented, modular libraries with clear APIs. The code serves its purpose as a research artifact for reproducing ACL 2023 Findings results, but it’s not designed for integration into existing systems or for users unfamiliar with the academic paper. If you’re looking for privacy-preserving embedding methods rather than ways to break them, you’ll need to look elsewhere—this shows the problem, not the solution.