Deep-Live-Cam: Real-Time Face Swapping with ONNX Runtime and InsightFace

Hook

With 80,240 GitHub stars, Deep-Live-Cam has become one of the most popular deepfake repositories—proving that accessible AI tools will always find an audience, ethics be damned.

Context

Traditional deepfake generation has been computationally expensive and time-intensive. Tools like DeepFaceLab require hours of training on datasets of hundreds or thousands of images, followed by lengthy rendering processes. Even consumer-friendly apps often need multiple photos and cloud processing time. Deep-Live-Cam attacks this friction from a different angle: what if you could swap faces in real-time with just a single source image? This isn’t just about novelty—real-time face swapping enables live streaming applications, interactive content creation, and immediate feedback loops that were previously impossible. The project leverages pre-trained ONNX models (specifically InsightFace’s inswapper) to eliminate the training phase entirely, making deepfake technology accessible to anyone with a webcam and a GPU. Built by hacksider and released as open-source Python software, it represents the democratization of technology that was, until recently, confined to research labs and high-budget productions. Whether that democratization is a net positive for society remains hotly debated, but the technical achievement is undeniable.

Technical Insight

[System architecture (auto-generated diagram): Video Input (webcam/file) → Face Detection (ONNX model) → Landmark Extraction (InsightFace) → Face Swap (inswapper model, conditioned on a Face Embedding extracted from the single Source Face Image) → optional Enhancement (GFPGAN) → Post-Processing → Output Renderer (display/record). ONNX Runtime provides hardware-accelerated inference to each model stage via CUDA, CoreML, or DirectML.]

Deep-Live-Cam’s architecture is built on three pillars: ONNX Runtime for cross-platform inference, InsightFace’s inswapper model for the actual face swapping, and GFPGAN for post-processing enhancement. The ONNX Runtime provides the critical hardware acceleration layer—supporting CUDA for NVIDIA GPUs, CoreML for Apple Silicon, DirectML for AMD/Intel GPUs on Windows, and graceful CPU fallback when no accelerator is available. This design decision allows the same Python codebase to run across Windows, macOS, and Linux without platform-specific forks.
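The fallback order can be sketched in a few lines. `PROVIDER_PRIORITY` and `select_providers` below are illustrative, not code from the repository; the provider names themselves are the real ONNX Runtime identifiers.

```python
# Illustrative sketch of the execution-provider fallback described above.
# select_providers is a hypothetical helper, not Deep-Live-Cam's own code;
# the provider strings are ONNX Runtime's actual identifiers.

PROVIDER_PRIORITY = [
    "CUDAExecutionProvider",    # NVIDIA GPUs
    "DmlExecutionProvider",     # DirectML: AMD/Intel GPUs on Windows
    "CoreMLExecutionProvider",  # Apple Silicon
    "CPUExecutionProvider",     # universal fallback
]

def select_providers(available):
    """Keep only the providers actually available, in priority order."""
    return [p for p in PROVIDER_PRIORITY if p in available]

# With onnxruntime installed, you would feed the result to a session:
#   import onnxruntime as ort
#   providers = select_providers(ort.get_available_providers())
#   session = ort.InferenceSession("models/inswapper_128_fp16.onnx",
#                                  providers=providers)
```

ONNX Runtime itself also accepts an ordered provider list and silently falls through to the next entry it can load, which is why the same codebase runs unmodified on all three platforms.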

The face swapping pipeline operates in distinct stages. First, face detection identifies all faces in the current frame using a lightweight detection model. For each detected face, the system extracts facial landmarks to establish precise geometry. The inswapper model then performs the actual swap operation—taking the embedding from your single source image and mapping it onto the target face’s geometry while preserving pose, lighting, and expression. Finally, GFPGAN (Generative Facial Prior GAN) enhances the swapped region to reduce artifacts and improve realism.
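The per-frame loop can be sketched roughly as follows. The four stage functions are stand-ins for the underlying ONNX model calls, not Deep-Live-Cam's actual function names:

```python
import numpy as np

# Rough per-frame pipeline sketch. detect_faces, extract_landmarks,
# swap_face, and enhance_face are stand-ins for the real ONNX model calls.

def detect_faces(frame):            # stand-in: return face bounding boxes
    return [(0, 0, frame.shape[1], frame.shape[0])]

def extract_landmarks(frame, box):  # stand-in: return landmark points
    return []

def swap_face(frame, box, landmarks, source_embedding):
    return frame                    # stand-in: inswapper would run here

def enhance_face(frame, box):
    return frame                    # stand-in: GFPGAN would run here

def process_frame(frame, source_embedding, enhance=True):
    """Run every detected face through the swap (and optional enhance) stages."""
    for box in detect_faces(frame):
        landmarks = extract_landmarks(frame, box)
        frame = swap_face(frame, box, landmarks, source_embedding)
        if enhance:
            frame = enhance_face(frame, box)
    return frame

out = process_frame(np.zeros((480, 640, 3), dtype=np.uint8), source_embedding=None)
```

The key property to notice is that the source embedding is computed once, while detection, landmarks, swap, and enhancement run every frame; that is what makes the single-image approach cheap enough for real time.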

The installation requirements reveal the technical dependencies:

# Clone and setup
git clone https://github.com/hacksider/Deep-Live-Cam.git
cd Deep-Live-Cam

# Create isolated environment (Python 3.11 recommended for Windows/Linux)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

You’ll need to manually download two ONNX models: GFPGANv1.4.onnx for face enhancement and inswapper_128_fp16.onnx for the core face swapping operation. These pre-trained models are what eliminate the training phase—they’re frozen networks that have already learned face representations from massive datasets.
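A small pre-flight check along these lines can save debugging time; the `models/` directory layout is assumed from the project's setup instructions, and `missing_models` is an illustrative helper, not part of the codebase.

```python
from pathlib import Path

# The two pre-trained models named above, assumed (per the setup docs) to
# live in a "models" directory alongside the code.
REQUIRED_MODELS = ["GFPGANv1.4.onnx", "inswapper_128_fp16.onnx"]

def missing_models(models_dir="models"):
    """Return the required model files that have not been downloaded yet."""
    root = Path(models_dir)
    return [name for name in REQUIRED_MODELS if not (root / name).exists()]
```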

The “single image” capability is particularly interesting from an ML perspective. Unlike traditional deepfake approaches that train a subject-specific model, inswapper uses a face recognition embedding space. Your source image is encoded into a high-dimensional vector representing that person’s facial identity. During inference, this embedding is injected into the target face while preserving spatial features like pose and expression. It’s similar to style transfer, but operating in facial identity space rather than artistic style space.
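The identity-space idea can be illustrated with toy vectors. The 512-dimension figure matches InsightFace's recognition embeddings, but the vectors below are random stand-ins, not real model outputs:

```python
import numpy as np

# Toy illustration of identity-embedding space. InsightFace's recognition
# models emit 512-dimensional identity vectors; these are random stand-ins.

rng = np.random.default_rng(0)

def normalize(v):
    return v / np.linalg.norm(v)

def cosine(a, b):
    return float(a @ b)  # unit vectors, so the dot product is the cosine

source_id = normalize(rng.normal(size=512))                       # source photo
same_person = normalize(source_id + 0.02 * rng.normal(size=512))  # small perturbation
other_person = normalize(rng.normal(size=512))                    # unrelated identity

# Embeddings of the same identity sit close together (cosine near 1), while
# unrelated identities are near-orthogonal in high dimensions.
assert cosine(source_id, same_person) > cosine(source_id, other_person)
```

Inswapper exploits exactly this geometry: it conditions its generator on the source identity vector, so "who the face is" and "what the face is doing" are carried by separate inputs.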

Deep-Live-Cam also implements advanced features like mouth masking and face mapping. Mouth masking preserves the original speaker’s mouth region, allowing their actual lip movements to show through while the rest of the face is swapped—critical for maintaining synchronization in dialogue. Face mapping allows different source faces to be applied to multiple subjects in the same frame simultaneously, tracked via a simple UI where you assign source images to detected face positions.
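Face mapping reduces to a small assignment problem: keep each detected face attached to the source image registered nearest to it. The coordinates and nearest-center matching rule below are illustrative, not the project's actual tracking logic:

```python
# Minimal sketch of face mapping: assign each detected face to the nearest
# registered anchor position, so per-face source images stay consistent
# across frames. The matching rule is illustrative, not the project's own.

def assign_sources(detections, registered):
    """detections: list of (x, y) face centers in the current frame.
    registered: dict mapping (x, y) anchor -> source image id."""
    assignments = {}
    for cx, cy in detections:
        anchor = min(registered,
                     key=lambda a: (a[0] - cx) ** 2 + (a[1] - cy) ** 2)
        assignments[(cx, cy)] = registered[anchor]
    return assignments
```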

Performance optimization is handled through execution providers. The system attempts to load CUDA first (for NVIDIA), then DirectML (for AMD/Intel on Windows), then CoreML (for Apple Silicon), and finally falls back to CPU. Frame processing on a modern GPU appears capable of real-time performance based on the project’s demo materials, though specific benchmarks aren’t provided. CPU-only mode will be significantly slower, likely not suitable for true real-time applications in most scenarios.
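Since no official benchmarks exist, the honest approach is to time the pipeline on your own hardware. `measure_fps` is a generic harness, not part of the project:

```python
import time

# Generic timing harness (illustrative, not part of Deep-Live-Cam): wrap
# any per-frame callable and report sustained throughput.

def measure_fps(process_one_frame, frames=30):
    start = time.perf_counter()
    for _ in range(frames):
        process_one_frame()
    elapsed = time.perf_counter() - start
    return frames / elapsed

# Smooth real-time webcam swapping needs roughly 24-30 FPS end to end;
# anything much below ~15 FPS will read as a slideshow.
fps = measure_fps(lambda: time.sleep(0.001))
```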

Gotcha

The installation process is complex for non-technical users, with multiple platform-specific requirements. Manual installation requires Python (3.11 recommended for Windows/Linux, but 3.10 specifically for macOS), platform-specific Visual Studio runtimes on Windows, correct ffmpeg binaries in PATH, and careful virtual environment management. The macOS installation is particularly specific—requiring Homebrew, python-tk@3.10 packages, and explicit Python 3.10 selection for CoreML support, which may conflict with system Python on many Macs. Version compatibility between Python, CUDA (12.8.0), and cuDNN (v8.9.7) must be carefully matched for GPU acceleration.
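A few of these pitfalls can be caught up front. The `preflight` helper below is illustrative, not part of Deep-Live-Cam; the version set reflects the guidance above (3.11 on Windows/Linux, 3.10 on macOS):

```python
import shutil
import sys

# Illustrative pre-flight check for the dependency pitfalls listed above;
# this helper is not part of Deep-Live-Cam itself.

def preflight():
    """Return a list of human-readable issues found in the environment."""
    issues = []
    if sys.version_info[:2] not in {(3, 10), (3, 11)}:
        issues.append("Python %d.%d is untested; use 3.11 (3.10 on macOS)"
                      % sys.version_info[:2])
    if shutil.which("ffmpeg") is None:
        issues.append("ffmpeg not found on PATH")
    return issues
```

CUDA/cuDNN version matching cannot be checked this cheaply; for that you still need to compare `nvcc --version` and your installed cuDNN against the versions the README pins.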

The ethical safeguards mentioned in the README exist but lack technical implementation details. The documentation mentions “built-in checks to prevent processing inappropriate media” including nudity, graphic content, and sensitive material, but provides no information about how these checks work, their accuracy, or their robustness. Given that this is open-source code, the practical enforceability of these safeguards is limited. The disclaimer places legal responsibility entirely on users while providing a tool explicitly designed for realistic face swapping. Requiring consent and labeling deepfakes is mentioned as user responsibility, but there’s no technical enforcement mechanism. For commercial applications, you’re entering legally grey territory with potential liability for misuse, fraud, or rights-of-publicity violations.

Verdict

Use Deep-Live-Cam if you’re experimenting with real-time video processing pipelines, need to understand ONNX Runtime optimization for live inference, or have legitimate creative projects with full consent from subjects (VTuber avatars, theatrical performances, educational demonstrations of AI capabilities). The technical architecture is genuinely impressive and the pre-trained model approach is instructive for understanding modern face recognition systems. Skip if you lack GPU hardware (CPU performance will be significantly slower), need production reliability (this is a research project with minimal error handling), can’t navigate complex Python dependency management with platform-specific requirements, or—most importantly—if you can’t articulate a specific ethical use case that wouldn’t raise legal concerns. The absence of detailed technical safeguards means you bear full legal and moral responsibility for outputs. Consider commercial alternatives like Reface or professional tools like RunwayML if you need accountability, support, and legal cover.
