How Reverse-SynthID Exploits Phase Coherence to Bypass AI Image Watermarks
Hook
Google's SynthID watermark is designed to survive crops, filters, and compression, yet one researcher reports bypassing it by analyzing what solid-color backgrounds reveal in the Fourier domain.
Context
As generative AI floods the internet with synthetic imagery, watermarking has become the frontline defense for attribution and authenticity. Google's SynthID represents the state of the art: an imperceptible watermark embedded directly into images generated by Gemini models, designed to survive common transformations like JPEG compression, resizing, and color adjustments. Unlike metadata-based approaches, which a single exiftool command can strip, SynthID operates in the pixel domain itself, making it, in principle, robust against casual tampering.
But watermarking systems live in an adversarial ecosystem. Security researchers need to probe these systems not to enable misuse, but to identify weaknesses before malicious actors do. The reverse-SynthID project by aloshdenny represents precisely this kind of adversarial research: a systematic attempt to reverse-engineer and bypass SynthID detection through pure signal processing. With over 3,700 GitHub stars, it has sparked both fascination and controversy by demonstrating that even sophisticated frequency-domain watermarks can have exploitable structural vulnerabilities, particularly when attackers can observe how watermarks behave across controlled input variations.
Technical Insight
The core insight driving reverse-SynthID is elegantly counterintuitive: watermarks reveal themselves most clearly when you remove the image content. The system exploits what the author calls "content-independence": the premise that SynthID's watermark carriers keep consistent phase relationships regardless of image content, while the phases of natural image frequencies scramble and decorrelate across different scenes.
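One way to make "content-independence" precise is a simple additive model. This is our assumption, not something the project documents: each image's spectrum at frequency bin k is content C_n plus a fixed watermark W, and coherence across N images is measured by the mean resultant length R of the unit phasors (circular variance is 1 - R).

```latex
\begin{aligned}
X_n(\mathbf{k}) &= C_n(\mathbf{k}) + W(\mathbf{k}), \qquad n = 1,\dots,N \\
R(\mathbf{k}) &= \Bigl|\tfrac{1}{N}\textstyle\sum_{n=1}^{N} e^{\,i\arg X_n(\mathbf{k})}\Bigr|, \qquad V(\mathbf{k}) = 1 - R(\mathbf{k}) \\
\mathbb{E}\bigl[R(\mathbf{k})^2\bigr] &= \tfrac{1}{N^2}\textstyle\sum_{n}\sum_{m}\mathbb{E}\bigl[e^{\,i(\varphi_n-\varphi_m)}\bigr] = \tfrac{1}{N} \quad \text{(content-dominated bin, independent uniform phases } \varphi_n\text{)}
\end{aligned}
```

Where |W| dominates, every arg X_n is close to arg W and R approaches 1. Where content dominates with unrelated phases, only the n = m terms survive the expectation. With N = 5 colors that gives an RMS coherence of about √0.2 ≈ 0.45, well below a carrier bin whose phases all agree.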
The attack begins with dataset construction across three dimensions: model type (gemini-3.1-flash-image-preview vs. nano-banana-pro-preview), solid color (red, blue, green, white, black), and resolution. By generating multiple watermarked images of pure solid backgrounds, the system builds a spectral profile in which genuine watermark frequencies exhibit phase coherence across all color variants, while content-driven artifacts do not. The multi-resolution spectral subtraction stage then identifies consensus patterns: frequencies that appear consistently whether the background is red or blue.
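The idea can be tried on synthetic data. The setup below is a toy of our own construction, not the project's dataset: five flat images at different gray levels each carry the same hypothetical cosine carrier at bin (5, 9), plus independent Gaussian noise. The per-bin mean resultant length across the set should single out the carrier, because its phase is the only one the images share. The helper names and parameter values are illustrative.

```python
import numpy as np

def phase_coherence(images):
    """Mean resultant length R per frequency bin across same-shape 2D images."""
    spectra = np.stack([np.fft.fft2(img) for img in images])
    return np.abs(np.exp(1j * np.angle(spectra)).mean(axis=0))

def make_solid_color_set(levels, size=64, carrier=(5, 9), amplitude=2.0, noise=1.0, seed=0):
    """Hypothetical test data: flat gray levels + one fixed-phase cosine carrier + noise."""
    rng = np.random.default_rng(seed)
    yy, xx = np.mgrid[0:size, 0:size]
    ky, kx = carrier
    # Identical frequency and phase in every image: the content-independent part
    pattern = amplitude * np.cos(2 * np.pi * (ky * yy + kx * xx) / size + 0.7)
    return [level + pattern + rng.normal(0.0, noise, (size, size)) for level in levels]

# Five gray levels standing in for red, blue, green, white, and black backgrounds
images = make_solid_color_set([200.0, 40.0, 120.0, 255.0, 0.0])
R = phase_coherence(images)
R[0, 0] = 0.0  # ignore DC, which only encodes each image's flat level
peak = tuple(int(i) for i in np.unravel_index(np.argmax(R), R.shape))
print("most phase-coherent bin:", peak)
```

Grayscale levels stand in for colors here. Real RGB data would need per-channel or luminance spectra. The mirror bin (59, 55) is equally coherent because the images are real-valued, so it is the conjugate of the carrier bin.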
The code structure reveals a hierarchical attack pipeline. The full implementation is not reproduced here, but the conceptual flow of the first stage looks like this:
import numpy as np

def detect_watermark_carriers(solid_color_images, threshold=0.85, min_magnitude=1e-6):
    """
    Identify frequency bins showing phase coherence across color variants.
    True watermark carriers maintain consistent phase; content frequencies don't.
    Expects a list of same-shape 2D (single-channel) arrays.
    """
    spectra = np.stack([np.fft.fft2(img) for img in solid_color_images])
    # Circular variance per bin: 1 - |mean of the unit phasors e^(i*phase)|
    phasors = np.exp(1j * np.angle(spectra))
    phase_variance = 1 - np.abs(phasors.mean(axis=0))
    # np.angle(0) == 0, so empty bins would look perfectly coherent; exclude them
    has_energy = np.abs(spectra).min(axis=0) > min_magnitude
    coherent = (phase_variance < (1 - threshold)) & has_energy
    return coherent.astype(float)  # Binary mask of suspected watermark carriers

def spectral_subtraction_attack(target_image, carrier_mask, strength=0.3):
    """
    Attenuate the identified watermark frequencies in the Fourier domain.
    """
    spectrum = np.fft.fft2(target_image)
    # Scale flagged bins by (1 - strength) and leave the rest untouched; with a
    # binary mask this is a hard cut, so tapering the mask would reduce ringing
    spectrum *= (1 - strength * carrier_mask)
    return np.real(np.fft.ifft2(spectrum))
This spectral subtraction is just the first stage. The project then chains six additional transformations, each targeting a different watermark resilience mechanism described in Google's paper. VAE reconstruction passes the image through a variational autoencoder bottleneck, exploiting the tendency of imperceptible watermarks to be lost in the information compression of latent-space encoding. Elastic spatial deformation, arguably the most sophisticated component, applies independent sub-pixel offsets to ~50-pixel neighborhoods. This fragments the spatial phase coherence that frequency-domain watermarks depend on without introducing visible warping.
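Neighborhood-wise sub-pixel displacement can be sketched with `scipy.ndimage.map_coordinates`. This is not the project's code. The 50-pixel cell and the ±0.5 px shift range are assumptions chosen to match the "~50-pixel neighborhoods" and "sub-pixel" description. Blending offsets bilinearly between cells, rather than using hard tile edges, is also our choice, made to avoid visible seams.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def elastic_subpixel_warp(image, cell=50, max_shift=0.5, seed=0):
    """Shift each ~cell x cell neighborhood of a 2D image by its own sub-pixel offset.

    Hypothetical parameters: cell size, shift range, and bilinear blending of
    offsets between neighborhoods are assumptions, not the project's settings.
    """
    rng = np.random.default_rng(seed)
    h, w = image.shape
    # One random (dy, dx) offset per coarse grid node, each within +/- max_shift pixels
    grid_shape = (h // cell + 2, w // cell + 2)
    coarse_dy = rng.uniform(-max_shift, max_shift, size=grid_shape)
    coarse_dx = rng.uniform(-max_shift, max_shift, size=grid_shape)
    yy, xx = np.mgrid[0:h, 0:w].astype(float)
    # Bilinearly upsample the coarse offsets to a dense, seam-free displacement field
    node_coords = np.array([yy / cell, xx / cell])
    dy = map_coordinates(coarse_dy, node_coords, order=1, mode="nearest")
    dx = map_coordinates(coarse_dx, node_coords, order=1, mode="nearest")
    # Resample the image at the displaced positions with bilinear interpolation
    return map_coordinates(image.astype(float), np.array([yy + dy, xx + dx]),
                           order=1, mode="nearest")
```

Every pixel moves by at most half a pixel, so a horizontal ramp image changes by at most 0.5 gray levels per pixel. Neighboring cells, however, no longer share a common phase shift, and that common shift is the structure the text says this stage targets.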
The pixel-domain squeeze transformations and color-space manipulations add further perturbation layers, while a final JPEG compression pass exploits block-based quantization to degrade high-frequency watermark components even more. The system's "all-in-one" attack combines these stages with per-model calibration weights stored in what the documentation calls "codebooks."
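The block-quantization effect can be illustrated with a JPEG-like sketch built on `scipy.fft.dctn`/`idctn`. Each 8x8 block's DCT coefficients are rounded to multiples of a step that grows with frequency. The step formula `base_step * (1 + u + v)` is a hypothetical placeholder for a real JPEG quantization table, and the sketch has no entropy coding or chroma subsampling.

```python
import numpy as np
from scipy.fft import dctn, idctn

def blockwise_dct_quantize(image, base_step=4.0, block=8):
    """JPEG-like sketch: quantize each block's 2D DCT coefficients.

    The step grows with frequency index (u + v), a hypothetical stand-in for a
    real JPEG quantization table; no entropy coding or chroma handling.
    """
    h, w = image.shape
    assert h % block == 0 and w % block == 0, "sketch assumes dimensions divisible by block"
    u, v = np.meshgrid(np.arange(block), np.arange(block), indexing="ij")
    steps = base_step * (1 + u + v)  # coarser steps for higher frequencies
    out = np.empty((h, w), dtype=float)
    for y in range(0, h, block):
        for x in range(0, w, block):
            coeffs = dctn(image[y:y + block, x:x + block].astype(float), norm="ortho")
            # Round to the nearest multiple of the step: small high-frequency terms become 0
            out[y:y + block, x:x + block] = idctn(np.round(coeffs / steps) * steps, norm="ortho")
    return out
```

In the test case, a ±1 checkerboard on a flat gray level has all of its energy in high-frequency coefficients smaller than half their step. It therefore rounds away entirely while the flat level survives. A weak high-frequency watermark component would be exposed to the same rounding.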
Perhaps most revealing is the human-in-the-loop calibration architecture. The attacker generates modified images, manually uploads them to Gemini's detection interface, and records whether a watermark is detected. This binary feedback is then used to refine the attack weights (spectral subtraction strength and elastic intensity):
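# Conceptual calibration loop; apply_attack_pipeline and save_codebook are placeholders
def calibrate(original_image, current_weights, apply_attack_pipeline, save_codebook,
              max_calibration_rounds=10):
    for iteration in range(max_calibration_rounds):
        modified_image = apply_attack_pipeline(original_image, current_weights)
        # A human uploads modified_image to the Gemini app and reports the verdict
        answer = input(f"Round {iteration}: was the watermark detected? (y/n): ")
        detected = answer.strip().lower() == "y"
        if detected:
            # Increase aggression on identified carriers
            current_weights["spectral_strength"] *= 1.2
            current_weights["elastic_intensity"] *= 1.1
        else:
            # Success: record the winning configuration
            save_codebook(current_weights)
            return current_weights
    return None  # No bypass found within the round budget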
This creates an adaptive adversarial loop that evolves with each detection attempt. The project reports 90% accuracy in detecting SynthID presence and a successful bypass (labeled "Round 06" in its documentation) with a peak signal-to-noise ratio above 43 dB. That means pixel-level distortion stays very small even after aggressive watermark removal. The multi-stage approach trades computational complexity for surgical precision, avoiding the quality degradation typical of blunt-force attacks like heavy JPEG compression.
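To put the 43 dB figure in concrete terms: PSNR = 10 log10(MAX^2 / MSE). For 8-bit images (MAX = 255), 43 dB corresponds to an MSE of about 3.26, or an RMS error of about 1.8 gray levels. The helper names below are illustrative.

```python
import numpy as np

def psnr(reference, distorted, max_value=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((np.asarray(reference, float) - np.asarray(distorted, float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_value ** 2 / mse)

def mse_for_psnr(db, max_value=255.0):
    """Invert the PSNR formula: the mean squared error that yields a given dB value."""
    return max_value ** 2 / 10 ** (db / 10.0)

# RMS error (in 8-bit gray levels) implied by a 43 dB PSNR
rms_error = float(np.sqrt(mse_for_psnr(43.0)))
print(f"43 dB -> RMS error of {rms_error:.2f} gray levels")
```

PSNR measures pixel-level fidelity rather than perceptual quality. A high score therefore does not by itself rule out the forensic artifacts discussed below.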
Gotcha
The Achilles' heel of reverse-SynthID is its dependency on manual verification. Every calibration iteration requires a human to upload images to Gemini's detection interface and report results back to the system. This semi-automated workflow makes the attack impractical for batch processing and vulnerable to rate limiting or access restrictions. If Google implements API-based detection with usage caps, the calibration loop breaks down. The human-in-the-loop design also means reproducibility suffers—different operators may interpret borderline detection results inconsistently, introducing noise into the calibration feedback.
The spectral codebook approach also reveals a fundamental limitation: the attack requires extensive training data. You need multiple watermarked samples across resolution and color combinations to build accurate phase consensus profiles. For a new Gemini model or updated watermarking scheme, you're back to square one, generating dozens of calibration images before attacks become effective. The project's success against current SynthID implementations doesn't guarantee transferability to future versions—particularly if Google responds by introducing randomized carrier selection or time-varying watermark patterns that break the phase coherence assumption.
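The phase-coherence assumption is easy to stress-test in isolation. The toy comparison below is our own construction, and randomized per-image phase is a hypothetical countermeasure, not a documented SynthID feature. It computes the mean resultant length of a single carrier bin across 50 images twice: once with a fixed phase plus small jitter, and once with a fresh random phase per image.

```python
import numpy as np

def carrier_coherence(phases_per_image):
    """Mean resultant length of one bin's phases across images (1 = perfectly coherent)."""
    return float(np.abs(np.exp(1j * np.asarray(phases_per_image)).mean()))

rng = np.random.default_rng(0)
n_images = 50
# Fixed scheme: every image carries the carrier at the same phase, plus small jitter
fixed = 0.7 + rng.normal(0.0, 0.05, n_images)
# Hypothetical countermeasure: an independent random phase per image (e.g. keyed by a secret seed)
randomized = rng.uniform(-np.pi, np.pi, n_images)

print(f"fixed-phase coherence:      {carrier_coherence(fixed):.3f}")
print(f"randomized-phase coherence: {carrier_coherence(randomized):.3f}")
```

With random phases the expected squared coherence is 1/N, so for 50 images the statistic typically lands near 1/√50 ≈ 0.14. That is far below the 0.85 threshold used in the detection sketch, and the carrier would become indistinguishable from content.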
Finally, while 43+ dB PSNR sounds impressive, the seven-stage pipeline can introduce subtle artifacts that trained observers or forensic tools might detect. The elastic deformation stage, despite careful tuning, can create microscopic geometric inconsistencies. The VAE reconstruction may smooth fine textures differently than the original generation process. These artifacts aren't visible to casual inspection, but they can leave forensic fingerprints. The "nuke" preset mentioned in the documentation trades even more quality for higher bypass confidence, acknowledging that complete undetectability and perfect fidelity remain fundamentally in tension.
Verdict
Use reverse-SynthID if you're a security researcher auditing watermarking robustness, an ML engineer benchmarking detection systems against adversarial attacks, or a watermarking developer who needs to understand real-world threat models before deployment. It's invaluable for exposing the gap between theoretical robustness and practical resilience. Skip it if you're looking for production watermark removal tools (massive legal and ethical concerns), lack the signal processing expertise to interpret spectral analysis results, need fully automated operation without manual feedback, or want attacks that transfer to arbitrary watermarking schemes beyond SynthID. This is a research artifact demonstrating specific vulnerabilities in frequency-domain approaches, not a universal watermark eraser. Treat it as a security audit tool that illuminates what "imperceptible and robust" actually means when subjected to informed adversarial pressure.