NegMerge: Solving Machine Unlearning’s Hyperparameter Problem Through Sign-Consensual Weight Merging
Hook
Making a model forget specific data sounds simple, but current machine unlearning methods are notoriously sensitive to hyperparameters. Researchers from NAVER AI Lab and Sogang University propose a solution: instead of selecting a single fine-tuned model, merge multiple fine-tuned models through sign-consensual weight merging, keeping only the weight changes they agree on.
Context
Machine unlearning has evolved into an increasingly important problem in modern machine learning. Whether removing unwanted training data, addressing privacy concerns, or eliminating biases from foundation models, practitioners need reliable ways to excise specific knowledge without retraining from scratch. The dominant approach—task arithmetic—appears elegant: fine-tune a model on the data you want to forget (the “forget set”), compute the weight difference from the original model (the “task vector”), then subtract that vector to reverse the learning. Simple, theoretically grounded, and computationally cheaper than full retraining.
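In code, task arithmetic for unlearning boils down to two tensor operations. The sketch below uses hypothetical single-parameter "models" represented as plain dicts of tensors to keep things minimal; `scaling` is the usual tunable coefficient, not something taken from the paper's released code:

```python
import torch

def compute_task_vector(pretrained, finetuned):
    """Task vector: the weight delta induced by fine-tuning on the forget set."""
    return {k: finetuned[k] - pretrained[k] for k in pretrained}

def negate_task_vector(pretrained, task_vector, scaling=1.0):
    """Unlearn by subtracting the (scaled) task vector from the original weights."""
    return {k: pretrained[k] - scaling * task_vector[k] for k in pretrained}

# Toy example with a single-parameter "model"
pretrained = {"w": torch.tensor([1.0, 2.0, 3.0])}
finetuned = {"w": torch.tensor([1.5, 1.0, 3.0])}
tv = compute_task_vector(pretrained, finetuned)   # delta: [0.5, -1.0, 0.0]
unlearned = negate_task_vector(pretrained, tv)    # reverses the fine-tuning
```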
However, as the NegMerge authors document, task arithmetic is highly sensitive to hyperparameters. Changing the learning rate, batch size, or number of epochs can swing results from effective forgetting (with collateral damage to the retain set) to barely any forgetting at all. Practitioners often train dozens of models with different hyperparameters, carefully validating each one to select the best candidate. The computational savings evaporate, and high-stakes decisions about knowledge removal hinge on brittle hyperparameter selection. NegMerge addresses this by treating the multiple fine-tuned models not as candidates to choose from, but as an ensemble from which to extract consensus.
Technical Insight
The core insight behind NegMerge is that if a weight update consistently points in the same direction across models trained with different hyperparameters, that represents signal; if it flips signs, that represents noise. Instead of selecting which single fine-tuned model to use, NegMerge creates task vectors from multiple models and performs sign-consensual merging—only keeping weight components where task vectors agree on the direction of change.
The algorithm operates in three phases. First, you fine-tune multiple models on your forget set using different hyperparameter configurations: different learning rates, batch sizes, optimizers, or other parameters you’d normally search over. Second, you compute a task vector for each fine-tuned model by subtracting the original model’s weights. Third, and this is the key operation, you compare the signs of corresponding weights element-wise across all task vectors. If every task vector shows a positive change for a particular weight, you keep that positive component; if every one shows a negative change, you keep the negative component; if they disagree, you zero out that weight entirely. The resulting sign-consensual task vector is negated and added back to the original model, performing the unlearning.
Here’s what the sign-consensual merging looks like conceptually in PyTorch (simplified and adapted from the approach described):
```python
import torch

def sign_consensual_merge(task_vectors):
    """Merge task vectors, keeping only sign-consistent components.

    Args:
        task_vectors: List of task vector dicts, each mapping parameter
            names to weight deltas.
    Returns:
        Merged task vector containing only consensual components.
    """
    merged = {}
    num_vectors = len(task_vectors)
    for key in task_vectors[0].keys():
        # Stack this parameter's deltas from all task vectors
        stacked = torch.stack([tv[key] for tv in task_vectors])
        # Get signs: positive = 1, negative = -1, zero = 0
        signs = torch.sign(stacked)
        # Unanimous consensus: |sum of signs| equals the number of vectors
        sign_sum = signs.sum(dim=0)
        consensus_mask = torch.abs(sign_sum) == num_vectors
        # Keep the mean of the actual values where consensus exists
        mean_vector = stacked.mean(dim=0)
        merged[key] = mean_vector * consensus_mask.float()
    return merged

# Apply unlearning: original - merged_task_vector
merged_tv = sign_consensual_merge(task_vectors)
unlearned_model = apply_task_vector(original_model, merged_tv, scaling=-1.0)
```
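The `apply_task_vector` helper is left undefined in the snippet above. A minimal implementation consistent with its usage (an assumption on my part, not the released code) could look like this:

```python
import torch

def apply_task_vector(model_state, task_vector, scaling=1.0):
    """Add a scaled task vector to a model's weights.

    With scaling=-1.0 this subtracts the vector, i.e. performs unlearning.
    Assumes `model_state` and `task_vector` are plain dicts of tensors;
    real checkpoints may also contain buffers that should be skipped.
    """
    return {
        key: weight + scaling * task_vector.get(key, torch.zeros_like(weight))
        for key, weight in model_state.items()
    }
```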
A key advantage of this approach is that it’s architecture-agnostic. The paper demonstrates effectiveness on both CLIP vision-language models (removing concepts like specific objects or artistic styles) and standard image classification models (removing entire classes). For CLIP unlearning, which the released code supports, you can follow the Jupyter notebook in the repository to see it in action on forget sets.
Notably, NegMerge exploits existing computational requirements. Task arithmetic baselines already require training multiple models for hyperparameter search—practitioners typically select one afterward. NegMerge uses those same models but extracts more signal by merging their consensus. No additional training, no extra compute, just better utilization of the models you were training anyway. According to the paper’s abstract, this approach delivers “more effective unlearning without incurring additional computational costs” and demonstrates “improved unlearning performance with minimal degradation on the retain set, outperforming state-of-the-art techniques.”
The sign-consensus mechanism acts as a natural regularizer. Hyperparameter-specific artifacts—weight changes that only occur with particular training configurations—get filtered out because they don’t appear consistently. The robust forgetting signal, the weight changes fundamentally necessary to remove the target knowledge, remains because it persists across training configurations.
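A toy example makes this filtering concrete. Below, two task vectors with hypothetical values agree on the direction of the first and third components, while the second flips sign between runs and is zeroed out as noise:

```python
import torch

# Hypothetical weight deltas from two runs with different hyperparameters
tv_a = torch.tensor([0.4, -0.2, 0.1])
tv_b = torch.tensor([0.2,  0.3, 0.3])

stacked = torch.stack([tv_a, tv_b])
sign_sum = torch.sign(stacked).sum(dim=0)        # [2, 0, 2]
consensus = (torch.abs(sign_sum) == 2).float()   # [1, 0, 1]
merged = stacked.mean(dim=0) * consensus
# The sign-flipping middle component is filtered out as noise
```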
Gotcha
The repository currently only includes code for CLIP unlearning (in the CLIP_MU directory), despite the paper demonstrating results on standard image classification models. If you’re working with non-CLIP architectures, you’ll need to adapt the implementation yourself. The README notes that code is available “for CLIP unlearning” and the updates section mentions code being “under internal review,” but there’s no explicit timeline for additional implementations. The Jupyter notebook (NegMerge.ipynb) provides a starting point, but production deployments will need more robust error handling and validation logic.
More fundamentally, NegMerge doesn’t eliminate the computational cost of fine-tuning multiple models—it aims to make better use of models you’d train anyway for hyperparameter search. If you’re working with very large models where even a single fine-tuning run is prohibitively expensive, this approach still requires multiple fine-tuning passes. The method also inherits the general limitations of task arithmetic: it assumes the forget and retain sets are sufficiently separable that you can cleanly subtract knowledge. For highly entangled concepts or cases where forgetting requires architectural changes rather than weight modifications, sign-consensual merging may not be sufficient. Documentation beyond the single notebook is limited—there’s no comprehensive API reference or CLI tools provided in the current release.
Verdict
Use NegMerge if you’re already planning to fine-tune multiple models for hyperparameter search in a machine unlearning pipeline and want potentially more stable unlearning at no additional compute cost—especially for CLIP-based vision-language models where you need to remove specific concepts, classes, or associations while preserving general capabilities. Sign-consensual merging may provide robustness that single-model selection cannot, and the ICML 2025 acceptance suggests both theoretical soundness and empirical validation. Consider alternatives if you need production-ready implementations for non-CLIP models (currently only CLIP code is available), require extensive documentation and API references for deployment, or work with models so large that training even several fine-tuned variants exceeds your compute budget. Likewise, if your unlearning requirements are simple enough that other methods already work reliably, look elsewhere: NegMerge’s value proposition is specifically addressing hyperparameter brittleness in task arithmetic.