NegMerge: Fixing Machine Unlearning's Hyperparameter Lottery Problem
Hook
What if the reason your machine unlearning fails isn't the algorithm, but the fact that you picked a learning rate of 1e-4 instead of 5e-5? Task arithmetic unlearning methods are devastatingly sensitive to hyperparameters—NegMerge turns that weakness into a strength.
Context
Machine unlearning has become critical infrastructure as privacy regulations like GDPR's "right to be forgotten" force production systems to selectively remove training data influences from deployed models. The nuclear option—retraining from scratch after excluding specific data—is computationally prohibitive for modern large models that cost hundreds of thousands of dollars to train. This has spawned numerous research directions: gradient ascent methods that explicitly maximize loss on "forget set" data, architectural approaches like SISA that shard training for efficient retraining, and increasingly popular task arithmetic techniques that manipulate model weight differences.
Task arithmetic methods are elegant. Fine-tune your pretrained model on the data you want to forget, compute the task vector (the difference between fine-tuned and original weights), negate it, and add it back to the original model. It's conceptually clean and computationally cheaper than full retraining. But there's a dirty secret: it's wildly sensitive to fine-tuning hyperparameters. Choose the wrong learning rate, batch size, or number of epochs, and your model either fails to forget or catastrophically loses performance on data you wanted to retain. Researchers typically run expensive validation sweeps to find the "best" hyperparameters—essentially a lottery where you hope to pick the right ticket. NegMerge, from NAVER AI Lab and accepted at ICML 2025, reframes this problem: instead of hoping one set of hyperparameters is correct, what if we could extract consensus from multiple hyperparameter configurations?
Technical Insight
The core insight of NegMerge is deceptively simple: when multiple task vectors derived from different hyperparameters agree on the sign of a weight update, that signal is likely robust. When they disagree, that component is probably noise induced by the specific hyperparameter choice. By only merging weight components with sign consensus, NegMerge constructs a more reliable task vector that captures what actually needs to change to forget the data.
Here's how it works in practice. Start with a pretrained model θ₀ and create multiple fine-tuned variants on your forget set using different hyperparameters (learning rates, batch sizes, etc.). For each configuration i, compute the task vector τᵢ = θᵢ - θ₀. Now comes the sign-consensus mechanism:
import torch

def sign_consensual_merge(task_vectors, threshold=1.0):
    """
    Merge task vectors by keeping only elements with sign consensus.

    Args:
        task_vectors: List of same-shaped tensors (weight differences)
        threshold: Fraction of vectors that must agree (1.0 = unanimous)
    Returns:
        Merged task vector with non-consensual elements set to zero
    """
    stacked = torch.stack(task_vectors)  # Shape: [num_vectors, ...weight_dims]
    signs = torch.sign(stacked)  # Get signs: -1, 0, or +1

    # Count how many vectors are positive vs negative at each element
    pos_count = (signs > 0).sum(dim=0)
    neg_count = (signs < 0).sum(dim=0)
    total = len(task_vectors)

    # Consensus mask: where one sign reaches the threshold
    pos_consensus = pos_count >= (threshold * total)
    neg_consensus = neg_count >= (threshold * total)
    mask = pos_consensus | neg_consensus

    # Average across all vectors, then zero out elements without consensus
    merged = torch.mean(stacked, dim=0)
    return merged * mask.to(merged.dtype)

# Usage for unlearning. finetune_on_forget_set and subtract_weights are
# placeholder helpers, not library functions; each task vector is assumed
# to be a single tensor of weight differences.
forget_task_vectors = []
for lr in [1e-5, 5e-5, 1e-4, 5e-4]:
    finetuned = finetune_on_forget_set(pretrained_model, forget_data, lr=lr)
    task_vector = subtract_weights(finetuned, pretrained_model)
    forget_task_vectors.append(task_vector)

# Create sign-consensual task vector
merged_vector = sign_consensual_merge(forget_task_vectors)

# Negate and apply to achieve unlearning: theta_unlearned = theta_0 - tau_merged
# (shorthand for subtracting from the model's weights, not from the model object)
unlearned_model = pretrained_model - merged_vector
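The usage snippet writes `pretrained_model - merged_vector` as shorthand. With PyTorch modules, the subtraction would typically happen per parameter tensor through the state dict. Here is a minimal sketch of that step, assuming unanimous consensus. The names `task_vector`, `merge_unanimous`, `apply_negated`, and the `scale` argument are hypothetical and are not taken from the NegMerge repository. Randomly perturbed copies of a tiny `nn.Linear` stand in for real fine-tuning runs on a forget set.

```python
import copy

import torch
import torch.nn as nn


def task_vector(finetuned: nn.Module, pretrained: nn.Module) -> dict:
    """Per-parameter difference theta_i - theta_0, keyed by parameter name."""
    base = pretrained.state_dict()
    return {
        name: tensor - base[name]
        for name, tensor in finetuned.state_dict().items()
        if tensor.is_floating_point()  # skip integer buffers (e.g. step counters)
    }


def merge_unanimous(task_vectors: list) -> dict:
    """Keep an element only if every task vector gives it the same nonzero sign."""
    merged = {}
    for name in task_vectors[0]:
        stacked = torch.stack([tv[name] for tv in task_vectors])
        signs = torch.sign(stacked)
        all_pos = (signs > 0).all(dim=0)
        all_neg = (signs < 0).all(dim=0)
        mask = (all_pos | all_neg).to(stacked.dtype)
        merged[name] = stacked.mean(dim=0) * mask
    return merged


def apply_negated(pretrained: nn.Module, merged: dict, scale: float = 1.0) -> nn.Module:
    """Return a copy of the pretrained model with scale * merged subtracted.

    `scale` is an assumed knob; the text above corresponds to scale=1.0.
    """
    unlearned = copy.deepcopy(pretrained)
    state = unlearned.state_dict()
    for name, delta in merged.items():
        state[name] = state[name] - scale * delta
    unlearned.load_state_dict(state)
    return unlearned


# Toy demonstration: perturbed copies stand in for real fine-tuning runs.
torch.manual_seed(0)
pretrained = nn.Linear(4, 2)
finetuned_models = []
for _ in range(3):
    model = copy.deepcopy(pretrained)
    with torch.no_grad():
        for p in model.parameters():
            p.add_(0.1 * torch.randn_like(p))
    finetuned_models.append(model)

vectors = [task_vector(m, pretrained) for m in finetuned_models]
merged = merge_unanimous(vectors)
unlearned = apply_negated(pretrained, merged)
```

Working on a copy keeps the pretrained weights intact, so the same merged vector can be re-applied with a different `scale` without re-running any fine-tuning.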
The beauty is in what this achieves: you're essentially running hyperparameter search anyway (most unlearning papers do), but instead of discarding all but the "best" configuration based on expensive validation, you extract value from all of them. The sign consensus acts as a robust aggregation mechanism that filters out hyperparameter-specific noise.
The implementation in the repository focuses on CLIP unlearning, where this approach shines particularly well. CLIP's dual-encoder architecture (image and text) creates interesting dynamics for unlearning: you might want to forget certain image-text associations while preserving general visual or linguistic understanding. Task vectors act directly on the model's weights rather than on individual embeddings, and sign-consensual merging reduces the risk of destroying useful representations because of one unlucky hyperparameter choice.
One architectural decision worth noting: NegMerge doesn't require any changes to the model architecture itself or the fine-tuning procedure. It's purely a post-processing step on already-computed task vectors. This makes it remarkably easy to retrofit into existing unlearning pipelines. If you're already running hyperparameter sweeps (and let's be honest, you probably are), you can apply NegMerge at negligible additional compute: the merge is a single elementwise pass over the task vectors, though you do need to keep every checkpoint from the sweep rather than only the best one.
The threshold parameter (defaulting to 1.0 for unanimous consensus) provides a tunable knob for how conservative you want to be. Lowering it to 0.75 or 0.5 allows partial consensus, potentially retaining more of the task vector signal but with less guarantee of robustness. In practice, the paper shows that even unanimous consensus (1.0) retains enough signal for effective unlearning while dramatically improving stability across hyperparameter choices.
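One detail the threshold knob hides: in the `sign_consensual_merge` sketch, the mean is taken over all vectors, so once the threshold drops below 1.0 the dissenting values still pull each surviving element toward zero. A possible variant averages only the vectors that agree with the dominant sign. This is a hypothetical adaptation for illustration, not something taken from the paper or the repository, and `merge_agreeing_only` is an invented name.

```python
import torch


def merge_agreeing_only(task_vectors, threshold=1.0):
    """Sign-consensus merge that averages only entries matching the dominant sign."""
    stacked = torch.stack(task_vectors)
    signs = torch.sign(stacked)
    pos_count = (signs > 0).sum(dim=0)
    neg_count = (signs < 0).sum(dim=0)

    # Dominant sign per element: +1, -1, or 0 when the counts tie.
    dominant = torch.sign((pos_count - neg_count).to(stacked.dtype))
    dominant_count = torch.maximum(pos_count, neg_count)

    # An element survives if enough vectors hold the dominant sign and there is no tie.
    needed = threshold * len(task_vectors)
    keep = (dominant_count >= needed) & (dominant != 0)

    # Sum only the entries whose sign matches the dominant sign, then divide by their count.
    agrees = (signs == dominant.unsqueeze(0)) & keep.unsqueeze(0)
    summed = (stacked * agrees.to(stacked.dtype)).sum(dim=0)
    counts = agrees.sum(dim=0).clamp(min=1)  # avoid dividing by zero on dropped elements
    return summed / counts


vectors = [
    torch.tensor([1.0, 0.5]),
    torch.tensor([3.0, -0.5]),
    torch.tensor([-2.0, 0.5]),
    torch.tensor([2.0, -0.5]),
]
relaxed = merge_agreeing_only(vectors, threshold=0.75)  # first element kept, second is a tie
strict = merge_agreeing_only(vectors, threshold=1.0)    # nothing is unanimous here
```

For the first element, three of four vectors are positive, so it survives at a threshold of 0.75. The original sketch would return the plain mean of 1, 3, -2, and 2, which is 1.0, while this variant averages only 1, 3, and 2 and returns 2.0. Whether that is preferable depends on how much you trust the majority; it is a design choice, not a recommendation from the paper.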
What makes this particularly clever is that it solves a meta-problem: how do you validate your unlearning when you can't directly measure whether the model has "forgotten"? Traditional approaches require a validation set of forget data to tune hyperparameters, creating a philosophical paradox (you need to keep data you're supposed to delete). NegMerge sidesteps this by making hyperparameter choice less critical, reducing the need for extensive forget-set validation.
Gotcha
The first limitation you'll hit is practical: the repository only includes the CLIP implementation as a Jupyter notebook. If you're working on standard image classification, language models, or any other domain, you're on your own to translate the methodology. The paper mentions experiments on image classification (CIFAR-10, ImageNet), but that code isn't released. For a research prototype accepted at a top venue, this is expected, but it means you'll need to invest engineering time to adapt it to your use case.
More fundamentally, NegMerge doesn't reduce computational costs—it just makes better use of computation you're probably already doing. You still need to fine-tune multiple models with different hyperparameters, which can be expensive for large models. If you're working in a constrained environment where you can only afford a single fine-tuning run, NegMerge doesn't help. It's an optimization for extracting better results from hyperparameter sweeps, not a replacement for them.
There's also a subtle assumption: that sign consensus indicates robust signal rather than systematic bias. If all your hyperparameters come from the same family (e.g., all small learning rates), the consensus might just reflect that regime's characteristics rather than ground truth about what needs forgetting. The paper doesn't extensively explore adversarial hyperparameter selections that might game the consensus mechanism. Additionally, with only 15 GitHub stars and being freshly accepted to ICML 2025, this hasn't seen extensive community validation. Early research code often has rough edges, incomplete documentation, or subtle bugs that only emerge with diverse real-world usage.
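A cheap way to probe that assumption is to look at how much the task vectors agree before merging. The sketch below is a hypothetical diagnostic, not part of the paper or repository: `unanimous_fraction` reports how many elements would survive a unanimous mask, and `pairwise_sign_agreement` reports sign agreement for each pair of vectors. It implies no particular cutoff; near-total agreement between every pair may simply mean the configurations are too similar for the consensus filter to remove much.

```python
import itertools

import torch


def unanimous_fraction(task_vectors):
    """Fraction of elements whose sign is identical and nonzero in every vector."""
    signs = torch.sign(torch.stack(task_vectors))
    unanimous = (signs > 0).all(dim=0) | (signs < 0).all(dim=0)
    return unanimous.float().mean().item()


def pairwise_sign_agreement(task_vectors):
    """Map each pair of vector indices to the fraction of elements sharing a sign."""
    signs = [torch.sign(tv) for tv in task_vectors]
    return {
        (i, j): (signs[i] == signs[j]).float().mean().item()
        for i, j in itertools.combinations(range(len(signs)), 2)
    }


# Toy vectors: the first two share every sign, the third differs on two elements.
a = torch.tensor([0.2, -0.1, 0.3, -0.4])
b = torch.tensor([0.3, -0.2, 0.1, -0.5])
c = torch.tensor([0.1, 0.2, -0.3, -0.1])

survival = unanimous_fraction([a, b, c])
agreement = pairwise_sign_agreement([a, b, c])
print(survival)
print(agreement)
```

In this toy case half of the elements survive the unanimous mask, and the pair `(0, 1)` agrees everywhere while both pairs involving the third vector agree on only half the elements. On real task vectors the same numbers would be computed per parameter tensor or on flattened weights.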
Verdict
Use if: You're already running hyperparameter sweeps for task arithmetic unlearning and want to extract better performance from that computational investment. You're working with CLIP or willing to adapt the methodology to your architecture. You're in a research context where improving unlearning effectiveness matters more than production-ready tooling. You need a theoretically grounded way to aggregate multiple unlearning candidates without expensive forget-set validation.
Skip if: You need battle-tested production code with comprehensive documentation and community support; this is early-stage research software. You're working on standard image classification or other domains beyond CLIP and need ready-to-use implementations. You can only afford a single fine-tuning run and need methods that reduce computational costs rather than improve aggregation. You need guarantees about unlearning effectiveness in adversarial settings where the methodology hasn't been extensively tested.
For researchers pushing on machine unlearning, NegMerge offers a clever solution to hyperparameter sensitivity, but practitioners should wait for more mature implementations and broader validation.