
BlindAI: Building Confidential AI Inference with Intel SGX Enclaves

Hook

What if your cloud provider could run AI inference on your medical records without ever seeing them in plaintext—not even in RAM? That's the promise of confidential computing with secure enclaves.

Context

The standard cloud AI deployment model has a fundamental trust problem. When you send data to an API for inference—whether it's medical images, financial documents, or biometric data—you're trusting the service provider completely. Your data sits in plaintext in server memory during processing. The cloud provider, their employees, and anyone who compromises the server can access it. Even encryption at rest and in transit doesn't help during the actual computation.

This isn't theoretical paranoia. Regulations like GDPR and HIPAA create real liability around data exposure. Enterprises routinely avoid cloud AI services because they can't prove data confidentiality to auditors. BlindAI, developed by Mithril Security, explored using Intel SGX (Software Guard Extensions) to solve this through hardware-enforced isolation. SGX creates 'enclaves'—protected memory regions that encrypt their contents, invisible even to the operating system, hypervisor, or BIOS. The concept: run your entire AI inference pipeline inside an enclave where data remains encrypted in memory, provably inaccessible to everyone except the specific code you've verified. While the project is now unmaintained, its architecture reveals both the potential and practical challenges of building confidential AI systems.

Technical Insight

BlindAI's architecture splits into two distinct components: a Rust-based server that runs inside an SGX enclave, and a Python client library that handles remote attestation and encrypted communication. The server uses Tract, a pure-Rust ONNX inference engine, chosen because Rust's memory safety guarantees align well with SGX's security model: safe Rust rules out whole classes of memory-safety bugs, such as buffer overflows, that could otherwise expose enclave secrets.

The client workflow demonstrates the core value proposition. Before sending any sensitive data, clients perform remote attestation—a cryptographic proof that the server is running the exact expected code inside a genuine SGX enclave:

from blindai.client import BlindAIClient
import numpy as np

# Connect and verify the enclave
client = BlindAIClient()
client.connect_server(addr="localhost", port=50051)

# Remote attestation - cryptographically verify we're talking to
# the right code in a real SGX enclave
client.attest_server(policy="path/to/policy.toml")

# Upload your ONNX model - it stays encrypted in the enclave
model_id = client.upload_model("medical_classifier.onnx")

# Send sensitive data for inference
patient_scan = np.load("patient_mri.npy")
response = client.run_model(model_id, patient_scan)

# Only the prediction comes back - raw data never left the enclave
print(f"Diagnosis confidence: {response}")

The attestation step is critical and often misunderstood. SGX hardware produces a signed quote containing the enclave's measurement (MRENCLAVE, a hash of the code and data loaded into the enclave when it was initialized), the identity of the key that signed the enclave (MRSIGNER), and security-version information. The client verifies the quote's signature against Intel's attestation infrastructure and compares the measurement against a policy file defining what code is acceptable. If verification fails, the client refuses to send data. This means that even if Mithril Security (or a malicious cloud provider) modified the server code to exfiltrate data, the enclave's measurement would change, attestation would fail, and clients would detect the tampering.
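To make the policy check concrete, here is a minimal Python sketch of the comparison a client performs once the quote's signature has been validated against Intel's attestation infrastructure (that step is not shown). The field names follow SGX terminology (MRENCLAVE, MRSIGNER, ISVSVN, and the debug attribute), but the `Quote` and `Policy` classes and `check_policy` are hypothetical illustrations, not BlindAI's actual API or policy format.

```python
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class Quote:
    """Hypothetical view of a quote whose signature has already been verified."""
    mr_enclave: bytes  # MRENCLAVE: hash of the enclave's initial code and data
    mr_signer: bytes   # MRSIGNER: hash of the enclave signer's public key
    isv_svn: int       # ISVSVN: security version number of the enclave
    debug: bool        # True if the enclave was launched in debug mode


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy, standing in for what a policy file would pin."""
    mr_enclave: bytes
    mr_signer: bytes
    min_isv_svn: int
    allow_debug: bool = False


def check_policy(quote: Quote, policy: Policy) -> bool:
    """Return True only if the attested enclave matches every policy constraint."""
    # Constant-time comparison of the measurements
    if not hmac.compare_digest(quote.mr_enclave, policy.mr_enclave):
        return False  # different code: tampered or unexpected build
    if not hmac.compare_digest(quote.mr_signer, policy.mr_signer):
        return False  # signed by someone other than the expected vendor
    if quote.isv_svn < policy.min_isv_svn:
        return False  # enclave version older than the policy allows
    if quote.debug and not policy.allow_debug:
        return False  # debug enclaves can be inspected by the host
    return True


trusted = Policy(mr_enclave=bytes.fromhex("aa" * 32),
                 mr_signer=bytes.fromhex("bb" * 32), min_isv_svn=1)
tampered = Quote(mr_enclave=bytes.fromhex("cc" * 32),
                 mr_signer=bytes.fromhex("bb" * 32), isv_svn=1, debug=False)

if not check_policy(tampered, trusted):
    print("attestation failed: refusing to send data")
```

Rejecting debug-mode enclaves matters because the host can read a debug enclave's memory, so a matching measurement alone says nothing about confidentiality in that case.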

Inside the enclave, the Rust server handles ONNX model loading and inference through Tract. The choice of Tract over more popular runtimes like ONNX Runtime or TensorFlow Lite was architectural: Tract is pure Rust with minimal dependencies, reducing the trusted computing base (TCB)—the amount of code you must trust. A smaller TCB means fewer potential vulnerabilities inside the enclave. Here's a simplified view of how the server processes requests:

// Simplified server-side enclave code
use tract_onnx::prelude::*;

pub struct EnclaveModel {
    // Optimized, runnable Tract execution plan
    model: SimplePlan<TypedFact, Box<dyn TypedOp>, Graph<TypedFact, Box<dyn TypedOp>>>,
}

impl EnclaveModel {
    pub fn load_model(model_bytes: &[u8]) -> TractResult<Self> {
        // Model bytes arrive over TLS that terminates inside the enclave,
        // so they are only ever in plaintext within protected memory
        let model = tract_onnx::onnx()
            .model_for_read(&mut &model_bytes[..])?
            .into_optimized()?
            .into_runnable()?;

        Ok(EnclaveModel { model })
    }

    pub fn run_inference(
        &self,
        input: tract_ndarray::ArrayD<f32>,
    ) -> TractResult<tract_ndarray::ArrayD<f32>> {
        // Input data is encrypted in transit and only decrypted inside the enclave
        let tensor = input.into_tensor();
        // .into() adapts the Tensor to whatever input type this Tract version's run() expects
        let result = self.model.run(tvec![tensor.into()])?;

        // Only inference results exit the enclave
        Ok(result[0].to_array_view::<f32>()?.to_owned())
    }
}

The communication layer uses gRPC with TLS, but crucially, the TLS termination happens inside the enclave. The private key never exists in host memory. This prevents the cloud provider from performing TLS interception—a common enterprise security practice that would completely undermine confidentiality in traditional deployments.
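One common way to make in-enclave TLS termination verifiable, used by RA-TLS-style designs, is for the enclave to generate its key pair internally and commit to the public key in the 64-byte REPORTDATA field of its attestation quote. Whether BlindAI bound its TLS key in exactly this way is an assumption here; the helpers below are hypothetical and only show the idea: the client accepts a TLS session only if the key presented in the handshake hashes to the value the enclave attested to.

```python
import hashlib
import hmac

REPORT_DATA_SIZE = 64  # size of the SGX REPORTDATA field, in bytes


def report_data_for_key(tls_public_key: bytes) -> bytes:
    """Hypothetical enclave-side helper: commit to the TLS public key.

    The enclave would pass this value as REPORTDATA when producing its
    report, so the signed quote vouches for the key.
    """
    digest = hashlib.sha256(tls_public_key).digest()  # 32 bytes
    return digest.ljust(REPORT_DATA_SIZE, b"\x00")     # zero-pad to 64 bytes


def key_matches_quote(report_data: bytes, presented_key: bytes) -> bool:
    """Hypothetical client-side check, run after the quote itself is verified.

    Accept the TLS session only if the key from the handshake is the one
    the enclave committed to.
    """
    if len(report_data) != REPORT_DATA_SIZE:
        return False
    expected = hashlib.sha256(presented_key).digest()
    return hmac.compare_digest(report_data[:32], expected)


enclave_key = b"public key generated inside the enclave"
attested = report_data_for_key(enclave_key)
print(key_matches_quote(attested, enclave_key))                 # True
print(key_matches_quote(attested, b"key held by the host OS"))  # False
```

Because the private key is generated inside the enclave and never leaves it, a host that tries to intercept TLS with its own certificate has to present a different public key, and the check fails.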

BlindAI supported two deployment models: a managed API service (now shut down) where Mithril hosted popular models, and 'BlindAI Core' for self-hosting. The Core version let you deploy custom models while maintaining the same confidentiality guarantees. The implementation handled model quantization and optimization within the enclave to work around SGX's memory constraints, typically using int8 quantization so that larger models fit into the enclave page cache (EPC), which on many SGX platforms is only 128 MB or 256 MB.
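For readers unfamiliar with int8 quantization, the sketch below shows the generic symmetric per-tensor variant: each float32 weight w is stored as an 8-bit integer q with w ≈ scale × q, cutting weight storage to a quarter (1 byte instead of 4 per parameter). It illustrates the technique in plain Python and is not BlindAI's implementation; production toolchains often add per-channel scales and activation calibration.

```python
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8 quantization: w ~= scale * q, q in [-127, 127]."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # Map the largest magnitude onto 127; guard against an all-zero tensor
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize_int8(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights for computation."""
    return [v * scale for v in q]


weights = [0.6, -1.0, 0.25, 0.0]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
print(q)  # [76, -127, 32, 0]
# Rounding error per weight is at most half a quantization step
print(max(abs(a - b) for a, b in zip(weights, restored)) <= scale / 2)  # True
```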

The architecture's elegance lies in making strong cryptographic guarantees accessible through a simple Python API. Developers don't need to write enclave code or understand SGX programming; they verify attestation and send data, trusting mathematics and hardware rather than legal contracts.

Gotcha

The elephant in the room: BlindAI's GitHub prominently warns 'This project is no longer maintained' and explicitly advises against using it for sensitive data processing. This isn't a minor caveat—it's a dealbreaker for production use. Unmaintained security-critical infrastructure is a vulnerability waiting to happen.

Beyond maintenance status, Intel SGX itself has fundamental limitations that plagued BlindAI. SGX has suffered numerous side-channel and transient-execution attacks (Spectre variants such as SgxPectre, Foreshadow, SGAxe, and others) that can leak data from enclaves. Mitigations exist, but they impose performance penalties and don't eliminate every attack vector. Intel also deprecated SGX on client CPUs starting with 11th-generation Core processors, limiting availability to specific Xeon server SKUs.

Even on supported hardware, enclave memory is constrained. On many SGX platforms the EPC tops out at 128 MB or 256 MB, barely enough for moderately sized neural networks. Newer Xeon Scalable processors raise this limit substantially, but on EPC-constrained machines large language models or high-resolution vision models simply won't fit, and anything that spills past the EPC triggers costly paging. Memory encryption and enclave transitions add further overhead; slowdowns of 2-10x relative to native execution are possible for memory-heavy workloads, making real-time inference challenging.

The project's reliance on ONNX and Tract also means you're locked into frameworks with good ONNX export support: PyTorch works well, but newer architectures with custom operators often struggle. Finally, the threat model assumes you trust Intel's hardware implementation and attestation infrastructure. If that trust is misplaced, or if nation-state adversaries compromise Intel's signing keys, the entire security model collapses.
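The memory constraint is easy to sanity-check with back-of-the-envelope arithmetic: parameter count times bytes per parameter, compared against the EPC with some headroom left for activations and the runtime. In the sketch below the EPC size is a parameter because it depends on the hardware generation, and the 30% headroom is an arbitrary assumption, not a measured figure.

```python
# Bytes per stored parameter for common weight formats
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_bytes(num_params: int, dtype: str) -> int:
    """Storage needed for the weights alone."""
    return num_params * BYTES_PER_PARAM[dtype]


def fits_in_epc(num_params: int, dtype: str, epc_bytes: int,
                headroom: float = 0.3) -> bool:
    """Rough check: do the weights fit in the EPC after reserving
    `headroom` (an assumed fraction) for activations and the runtime?"""
    usable = epc_bytes * (1.0 - headroom)
    return weight_bytes(num_params, dtype) <= usable


EPC_256_MIB = 256 * 2**20  # example EPC size; varies by platform

print(fits_in_epc(25_000_000, "float32", EPC_256_MIB))  # True
print(fits_in_epc(7_000_000_000, "int8", EPC_256_MIB))  # False
```

With a 256 MiB EPC, a 25-million-parameter model stored as float32 (about 100 MB of weights) passes this check, while a 7-billion-parameter language model fails even in int8 (about 7 GB).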

Verdict

Use if: You're researching confidential computing architectures and want to understand practical trusted execution environment (TEE) implementation patterns, you're building an educational project to learn about SGX and ONNX inference pipelines, or you have the resources to fork and maintain the codebase yourself while migrating to newer TEE technologies like AMD SEV-SNP or Intel TDX.

Skip if: You need production-ready confidential AI (look at Azure Confidential Computing or Opaque Systems instead), you're working with models larger than 200MB or requiring real-time performance, you're uncomfortable maintaining critical security infrastructure yourself, or you need support for GPUs and modern accelerators.

BlindAI's value today is educational: it's a well-architected example of confidential computing principles, but not a solution you should deploy with actual sensitive data.