Behavior Computing
Safety
Defend for Self-Vocoding: A Novel Enhanced Decoder Network for Watermark Recovery
Abstract
Recent advances in voice cloning technology have raised security concerns: the technology can generate highly realistic synthetic speech, which makes malicious use hard to detect. Proactive watermarking approaches embed authentication information in target voices to prevent unauthorized synthesis. While existing methods are resilient against traditional preprocessing attacks, we identify a novel threat, self-vocoding, which reconstructs audio using neural vocoders and can severely degrade the watermark while preserving high audio fidelity. To address this, we propose an enhanced decoding framework that handles self-vocoding distortions on watermarks. Beyond general vocoder distortions, we systematically categorize these distortions according to two vocoder types for further analysis. Experimental results demonstrate that our approach significantly improves watermark decoding accuracy, offering an effective defense against self-vocoding attacks.
Figures
Introduction
“Self-vocoding” watermark preprocessing attack
Keywords
voice cloning | watermark recovery | vocoder | self-vocoding
Publication Date
2025/08/17
Conference
Interspeech 2025
DOI
10.21437/Interspeech.2025-1091