AIVSS: Adapting CVSS Vulnerability Scoring for the Age of AI Systems
Hook
Your AI system might survive every traditional security audit while remaining completely defenseless against adversarial inputs that flip critical decisions. Traditional vulnerability scoring wasn’t built for this.
Context
The Common Vulnerability Scoring System (CVSS) has served the security community well, providing a standardized language for communicating the severity of software vulnerabilities. But CVSS was designed for a world of buffer overflows, SQL injections, and authentication bypasses—not for systems that can be manipulated through carefully crafted inputs invisible to traditional security tools, or that silently degrade as their training data becomes stale.
The Artificial Intelligence Vulnerability Scoring System (AIVSS) attempts to bridge this gap by extending CVSS methodology to cover AI-specific threat vectors. This framework recognizes that AI systems—particularly Large Language Models deployed in cloud environments—face an entirely different threat landscape. Model poisoning, concept drift, adversarial perturbations, bias amplification, and lifecycle-stage vulnerabilities don’t map cleanly to traditional metrics like “Attack Vector” and “Privileges Required.” AIVSS proposes a scoring methodology that preserves CVSS’s familiar structure while introducing nine AI-specific metric categories.
Technical Insight
AIVSS extends the familiar CVSS structure—Base and Environmental metrics—with an entirely new AI-Specific Metrics component. The Base Metrics remain largely unchanged, preserving compatibility with existing security workflows. You still evaluate Attack Vector (Network: 0.85, Local: 0.55, Physical: 0.2), Attack Complexity (Low: 0.77, High: 0.44), and Privileges Required (None: 0.85, High: 0.27). This grounding in CVSS means security teams don’t need to learn an entirely new scoring vocabulary.
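Those Base Metric weights can be sketched as a small lookup plus a product, in the style of CVSS exploitability scoring. The values below are standard CVSS v3.1 weights (the ones quoted above plus the remaining levels), but collapsing them into a single factor this way is a simplification for illustration; real CVSS also folds in User Interaction and a scaling constant.

```python
# CVSS v3.1 Base Metric weights retained by AIVSS (illustrative sketch).
ATTACK_VECTOR = {"network": 0.85, "adjacent": 0.62, "local": 0.55, "physical": 0.2}
ATTACK_COMPLEXITY = {"low": 0.77, "high": 0.44}
PRIVILEGES_REQUIRED = {"none": 0.85, "low": 0.62, "high": 0.27}

def base_exploitability_factor(av: str, ac: str, pr: str) -> float:
    """Multiply the three exploitability weights (simplified: real CVSS
    also includes User Interaction and an 8.22 scaling constant)."""
    return ATTACK_VECTOR[av] * ATTACK_COMPLEXITY[ac] * PRIVILEGES_REQUIRED[pr]

# A remotely reachable, low-complexity, unauthenticated flaw scores highest:
print(round(base_exploitability_factor("network", "low", "none"), 4))
```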
The innovation lies in the AI-Specific Metrics formula, which multiplies nine distinct vulnerability dimensions:
AISpecificMetrics = [MR × DS × EI × DC × AD × AA × LL × GV × CS] × ModelComplexityMultiplier
Each component addresses threats unique to AI systems. Model Robustness (MR) assesses resilience against adversarial attacks and degradation—can an attacker cause misclassification by adding imperceptible noise to inputs? Data Sensitivity (DS) evaluates risks around training data confidentiality, integrity, and provenance. Ethical Implications (EI) quantifies bias, transparency failures, and accountability gaps. Decision Criticality (DC) measures consequence severity if the model produces incorrect outputs—a content recommendation algorithm scores differently than a medical diagnosis system.
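The nine-factor product can be sketched directly from the formula above. The 0-to-1 rating scale here is an assumption for illustration (the AIVSS spec defines its own rubrics), and the field names simply mirror the formula's abbreviations.

```python
from dataclasses import dataclass

@dataclass
class AISpecificRatings:
    """Hypothetical per-dimension ratings on a 0-1 scale. The scale is an
    assumption; field names follow the abbreviations in the AIVSS formula."""
    mr: float  # Model Robustness
    ds: float  # Data Sensitivity
    ei: float  # Ethical Implications
    dc: float  # Decision Criticality
    ad: float  # (abbreviation from the formula; expanded in the spec)
    aa: float  # Adversarial Attack Surface
    ll: float  # Lifecycle Vulnerabilities
    gv: float  # (abbreviation from the formula; expanded in the spec)
    cs: float  # CSA LLM Taxonomy alignment

def ai_specific_metrics(r: AISpecificRatings, complexity_multiplier: float) -> float:
    """Straight product of the nine dimensions, scaled by model complexity
    (1.0 for simple linear models up to 1.5 for transformer-based LLMs)."""
    product = r.mr * r.ds * r.ei * r.dc * r.ad * r.aa * r.ll * r.gv * r.cs
    return product * complexity_multiplier
```

Note what the multiplicative form implies: a single near-zero dimension drags the entire score toward zero, which is exactly the design choice the Gotcha section questions.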
The Adversarial Attack Surface (AA) metric is particularly noteworthy. It evaluates exposure to specific attack classes like evasion attacks (manipulating inputs to fool the model), poisoning attacks (corrupting training data), model extraction (stealing the model through API queries), and inference attacks (extracting training data through model responses). This granularity matters because mitigations differ radically—defending against evasion requires input validation and adversarial training, while defending against extraction requires rate limiting and output perturbation.
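The attack-class-to-mitigation pairing described above lends itself to a simple lookup. The groupings for evasion and extraction follow the paragraph directly; the entries for poisoning and inference are plausible additions, and the table itself is mine, not part of the AIVSS spec.

```python
# Illustrative mapping of the attack classes under the Adversarial Attack
# Surface (AA) metric to typical mitigations. Not part of the AIVSS spec.
AA_MITIGATIONS = {
    "evasion":    ["input validation", "adversarial training"],
    "poisoning":  ["data provenance checks", "training-set auditing"],
    "extraction": ["rate limiting", "output perturbation"],
    "inference":  ["differential privacy", "response filtering"],
}

def mitigations_for(attack_class: str) -> list[str]:
    """Return the defensive measures for a given AA attack class."""
    return AA_MITIGATIONS.get(attack_class, [])

print(mitigations_for("extraction"))
```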
Lifecycle Vulnerabilities (LL) acknowledges that AI systems have attack surfaces at every development stage. Data collection might involve scraping sensitive information without proper consent. Training could occur on compromised infrastructure. Deployment might expose models through poorly secured APIs. Maintenance phases risk introducing concept drift if monitoring fails. Unlike traditional software where deployment is a discrete event, AI systems exist in continuous lifecycle flux.
The ModelComplexityMultiplier scales risk based on model sophistication, ranging from 1.0 for simple linear models to 1.5 for complex architectures like transformer-based LLMs. This reflects reality: a logistic regression classifier has a fundamentally different attack surface than a transformer-based large language model. More complex models expose more attack vectors but may also exhibit more robust behavior; the multiplier captures this tension.
Environmental Metrics adapt traditional Confidentiality/Integrity/Availability requirements with a Societal Impact Requirement (SIR). A facial recognition system deployed by law enforcement scores differently than the same system used for photo organization, even if the technical vulnerability is identical. SIR modifiers range from Low (0.5) to High (1.5), allowing organizations to weight scores based on deployment context. This is CVSS’s “Scope” metric reimagined for AI’s broader societal implications.
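The SIR adjustment can be sketched as a context-dependent multiplier on a base score. The Low (0.5) and High (1.5) modifiers come from the text above; the Medium value of 1.0, and the clamping to CVSS's 0-10 range, are assumptions for illustration, since the spec defines its own combination rule.

```python
# Societal Impact Requirement (SIR) modifiers. Low and High are from the
# framework; Medium (1.0) and the 0-10 clamp are assumptions.
SIR = {"low": 0.5, "medium": 1.0, "high": 1.5}

def environmental_score(base_score: float, sir_level: str) -> float:
    """Scale a base score by deployment context, capped at the CVSS maximum."""
    return min(10.0, base_score * SIR[sir_level])

# The same technical flaw scores differently by deployment context:
print(environmental_score(6.0, "low"))   # e.g. photo organization
print(environmental_score(6.0, "high"))  # e.g. law-enforcement facial recognition
```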
The Cloud Security Alliance LLM Taxonomy integration (CS) is the framework’s most contemporary aspect. It explicitly addresses prompt injection, insecure output handling, training data poisoning, model denial of service, supply chain vulnerabilities in pre-trained models, and sensitive information disclosure through LLM responses. Each threat category receives its own scoring dimension, making AIVSS immediately applicable to the current wave of LLM deployments.
Gotcha
Here’s the critical limitation: AIVSS is a specification document, not a turnkey solution. The repository provides a detailed framework document outlining scoring rubrics and methodology, but building an actual scoring engine from this specification requires development effort—you’ll need to implement the metric calculations, build interfaces for assessors to input evaluations, create reporting tools, and probably build integration hooks with existing vulnerability management systems.
The scoring rubrics, while detailed, raise methodological questions. Why does Model Robustness multiply by Data Sensitivity rather than add to it? Why does the ModelComplexityMultiplier cap at 1.5 instead of 2.0? The framework provides a comprehensive structure, but organizations adopting AIVSS should recognize they're working with a relatively new methodology. Traditional CVSS has benefited from extensive industry feedback; AIVSS is comparatively nascent.
The framework also faces a challenge inherent to scoring dynamic systems. AI models degrade, training data distributions shift, attack techniques evolve, and ethical standards change. A model scoring 5.0 today might score 8.0 next quarter after a new adversarial technique emerges, even without any system changes. The specification addresses adaptability considerations but implementing effective re-scoring processes remains an organizational challenge.
Verdict
Use AIVSS if you’re building security assessment programs for AI systems and need to communicate risks to stakeholders already familiar with CVSS scoring, or if you’re designing organizational AI governance frameworks and want a structured approach to threat categorization. The detailed rubrics serve excellently as audit checklists even without automated scoring—walking through Model Robustness, Data Sensitivity, and Lifecycle Vulnerabilities will surface risks that traditional security reviews miss. It’s particularly valuable for compliance teams who need to demonstrate systematic risk assessment processes for AI deployments.

Skip AIVSS if you need production-ready tooling you can deploy immediately, or if your security program isn’t already using CVSS—there’s no point adopting CVSS conventions for AI if your organization doesn’t use CVSS for traditional vulnerabilities.

Consider MITRE ATLAS for actionable attack patterns with real exploitation examples, or the OWASP Top 10 for LLM Applications if you want immediately practical guidance over theoretical frameworks.