LLMmap: Fingerprinting Language Models Through Behavioral Traces

Hook

What if you could identify which LLM is actually running behind an API endpoint with just a few carefully crafted questions? LLMmap does exactly that, turning behavioral quirks into a fingerprint.

Context

The proliferation of LLM-powered APIs has created a transparency problem. Vendors claim to use GPT-4 but might serve GPT-3.5. Third-party services switch models without notice. Security researchers need to verify which models are actually deployed in production environments. Traditional approaches—checking API headers, examining response metadata, or trusting vendor documentation—are trivial to spoof or obscure.

LLMmap tackles this by treating model identification as a behavioral fingerprinting problem. Just as network scanners like nmap identify services by analyzing response patterns, LLMmap identifies LLMs by observing how they respond to carefully designed prompts. The tool leverages PyTorch-based distance metric learning to match a target model’s outputs against pre-computed behavioral templates for 52 known models. Version 0.2 represents a complete rebuild in PyTorch with support for incremental template addition without retraining.

Technical Insight

LLMmap’s architecture centers on open-set distance metric learning rather than closed-set classification. Instead of training a neural network to output model labels directly, it learns an embedding space where behavioral signatures from the same model cluster together. During inference, the system computes distances between a target model’s responses and pre-computed templates, ranking candidates by proximity.
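
The matching step can be sketched in a few lines of plain Python: embed a target model's responses into a fixed-dimensional vector, then rank the known-model templates by Euclidean distance. The toy embeddings below are hypothetical stand-ins for the signatures produced by LLMmap's learned encoder; only the ranking logic mirrors the described approach.

```python
import math

def rank_templates(target_emb, templates):
    """Rank known-model templates by Euclidean distance to a target embedding.

    templates: dict mapping model name -> embedding (list of floats).
    Returns a list of (distance, model_name) pairs, closest first.
    """
    def euclidean(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    return sorted((euclidean(target_emb, emb), name)
                  for name, emb in templates.items())

# Toy 4-d embeddings standing in for learned behavioral signatures
templates = {
    "model-a": [1.0, 0.0, 0.0, 0.0],
    "model-b": [0.0, 1.0, 0.0, 0.0],
    "model-c": [5.0, 5.0, 5.0, 5.0],
}
target = [0.9, 0.1, 0.0, 0.0]
for dist, name in rank_templates(target, templates):
    print(f"[Distance: {dist:.2f}] {name}")
```

Because the output is a full ranking rather than a single label, close seconds and outliers stay visible, which is exactly what the open-set design exploits.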

The identification workflow starts with loading a pre-trained model and querying your target LLM:

from LLMmap.inference import load_LLMmap

# Load pre-trained model with 52 LLM templates
conf, llmmap = load_LLMmap('./data/pretrained_models/default/')

# Collect responses from your target LLM
answers = [
    "Response to query 1",
    "Response to query 2",
    "Response to query 3",
]

# Get ranked predictions with distance scores
llmmap.print_result(llmmap(answers))
# Output:
#   [Distance: 32.96]  --> LiquidAI/LFM2-1.2B <--
#   [Distance: 40.79]     microsoft/Phi-3-mini-128k-instruct
#   [Distance: 43.67]     Qwen/Qwen2-1.5B-Instruct

The llmmap.queries attribute contains the actual prompts to send to your target model. These queries are generated from configurable prompt templates that explore diverse behavioral dimensions: instruction-following patterns, reasoning approaches, stylistic choices, and even adversarial scenarios. The diversity is critical—models may behave similarly on standard queries but reveal distinctive signatures when prompted with edge cases.
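
Collecting the answers is then a matter of sending each probe query to the target and preserving the order. The `collect_answers` helper and `fake_target` stub below are hypothetical; in practice `ask` would wrap whatever HTTP client or SDK reaches your target endpoint.

```python
# Hypothetical helper: send each of LLMmap's probe queries to a target
# endpoint and collect the raw text responses, preserving query order.
def collect_answers(queries, ask):
    """`ask` is any callable mapping a prompt string to a response string,
    e.g. a thin wrapper around the target API's client."""
    return [ask(q) for q in queries]

# Stub target for illustration only; a real `ask` would call the live API.
def fake_target(prompt):
    return f"echo: {prompt}"

queries = ["What model are you?", "Repeat the word 'zebra' five times."]
answers = collect_answers(queries, fake_target)
```

The resulting `answers` list is what gets passed to the inference model, as in the earlier example.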

Behind the scenes, the system uses a PromptConfFactory that generates prompt configurations from templates stored in ./confs/prompt_configurations. Each configuration combines structural elements (system prompts, user message formatting, temperature settings) with query strategies.
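
Conceptually, that factory is a cross-product of structural choices. The sketch below is illustrative only: the field names and values are assumptions, not the actual schema stored in ./confs/prompt_configurations.

```python
from itertools import product

# Illustrative sketch: combine structural elements into prompt
# configurations. Field names and values here are hypothetical, not
# the real schema used by PromptConfFactory.
system_prompts = ["You are a helpful assistant.", ""]
temperatures = [0.0, 0.7]
query_strategies = ["direct", "adversarial"]

prompt_confs = [
    {"system": s, "temperature": t, "strategy": q}
    for s, t, q in product(system_prompts, temperatures, query_strategies)
]
print(len(prompt_confs))  # 2 * 2 * 2 = 8 combinations
```

Even a handful of axes multiplies quickly, which is how a small set of templates yields the diverse probe set the method depends on.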

The open-set architecture enables runtime extensibility without retraining the core model. Adding a new LLM template is straightforward:

python add_new_template.py gpt-4.1 1 \
  --llmmap_path ./data/pretrained_models/default \
  --num_prompt_confs 100

The second argument (1) specifies the backend type: 0 for HuggingFace, 1 for OpenAI, 2 for Anthropic. However, the README notes that “At the moment, it supports only Hugging Face LLMs” for template addition, with support for the other backends planned. The script generates prompt configurations, queries the target model, computes behavioral embeddings, and adds the template to the existing model, with no full retraining required.

For researchers building custom datasets, LLMmap provides make_dataset.py to generate training corpora. You define your target LLMs in a JSON file, specify query configurations, and the script systematically queries each model:

python make_dataset.py \
    custom_dataset \
    ./confs/LLMs/my_models.json \
    ./confs/queries/default.json \
    --num_prompt_conf_train 150 \
    --num_prompt_conf_test 20 \
    --prompt_conf_path ./confs/prompt_configurations \
    --dataset_root ./data/datasets

This produces a JSONL file where each line contains a model name, prompt configuration, query, and response—ready for training a new inference model from scratch. The architecture separates data generation from model training, allowing researchers to experiment with different prompt strategies or expand model coverage without rebuilding infrastructure.
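
Consuming such a dataset is standard JSONL handling: parse one JSON object per line. The field names in the record below are assumed for illustration and may differ from what make_dataset.py actually emits.

```python
import json

# One JSONL record per query/response pair. Field names here are
# hypothetical; check the generated file for the actual schema.
line = ('{"model": "Qwen/Qwen2-1.5B-Instruct", '
        '"prompt_conf": {"temperature": 0.7}, '
        '"query": "What model are you?", '
        '"response": "I am an AI assistant."}')

record = json.loads(line)
print(record["model"], "->", record["response"])

# A full dataset file would be read the same way, one json.loads per line.
```

Keeping one self-contained record per line makes it cheap to filter, sample, or merge datasets with ordinary line-oriented tools.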

The distance-based approach has practical advantages over classification. Distance scores provide confidence estimates: a target model with a minimum distance of 10 is a much stronger match than one with 45. The system naturally handles partial matches—if your target is a fine-tuned variant of Llama-3-8B, it will likely rank base Llama-3-8B highly even if it’s not an exact match. And the open-set design means encountering a completely unknown model produces high distances across all templates rather than forcing a potentially incorrect classification.
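
One way to act on those scores is a simple open-set threshold: report "unknown" when even the closest template is far away. The cutoff value below is an arbitrary illustration for this sketch, not a threshold the project prescribes.

```python
UNKNOWN_CUTOFF = 38.0  # arbitrary illustrative threshold, not from LLMmap

def interpret(ranked):
    """ranked: list of (distance, model_name) pairs, closest first."""
    best_dist, best_name = ranked[0]
    if best_dist > UNKNOWN_CUTOFF:
        return "unknown model (no template within threshold)"
    return f"best match: {best_name} (distance {best_dist:.2f})"

# Distances mirroring the earlier example output
ranked = [(32.96, "LiquidAI/LFM2-1.2B"),
          (40.79, "microsoft/Phi-3-mini-128k-instruct"),
          (43.67, "Qwen/Qwen2-1.5B-Instruct")]
print(interpret(ranked))
```

A target that sits far from every template, say with a minimum distance above the cutoff, would be flagged as unknown instead of being forced into the nearest class.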

Gotcha

LLMmap’s effectiveness may degrade with models that deviate significantly from their base architectures through heavy fine-tuning, custom alignment layers, or aggressive output filtering—these modifications could obscure the behavioral signatures the system relies on. The system fingerprints behavior, not model weights, so any post-training modification that substantially alters response patterns could reduce accuracy.

The computational requirements for dataset generation can be substantial. Creating templates for a single new model with 100 prompt configurations means querying that model 100+ times across diverse prompts. For local inference with HuggingFace models, this requires significant GPU resources for larger models. If you’re working with paid APIs or rate-limited endpoints, costs and time can add up.

The template addition feature currently supports only HuggingFace models, though the README indicates other backends will be extended soon. This limits your ability to add templates for OpenAI or Anthropic models to an existing pre-trained model.

The PyTorch rebuild (v0.2) explicitly warns it’s “not a one-to-one conversion” from the original paper’s implementation. If you’re trying to reproduce published research results or compare against academic benchmarks, expect potential discrepancies. The models and procedures differ slightly, which matters for reproducibility. The README doesn’t provide accuracy metrics for the default pre-trained model, though it includes a test_model.py script for evaluating top-k accuracy on your own datasets.

Verdict

Use LLMmap if you need to verify which LLM actually powers an API endpoint, audit third-party services for model changes, or conduct research on model fingerprinting and provenance. The behavioral fingerprinting approach offers a way to identify models when vendor documentation can’t be trusted or when you need programmatic verification.

Skip it if you’re working with heavily customized fine-tuned models whose behavioral signatures may diverge significantly from their base models, where accuracy could suffer. Avoid it for real-time identification at scale; the query overhead isn’t optimized for high-throughput scenarios. Also skip it if you lack the computational resources for template generation or can’t access your target models for the initial query collection. Currently, adding new templates is limited to HuggingFace models.

And remember: this is probabilistic fingerprinting, not cryptographic proof. Distance scores provide evidence, not absolute certainty, so don’t rely on them in contexts requiring legal-grade verification.
