Defend for Self-Vocoding: A Novel Enhanced Decoder Network for Watermark Recovery｜BIIC Lab - NTHU

ISCA Archive

Abstract

Recent advances in voice cloning technology have raised security concerns due to its ability to generate highly realistic synthetic speech, making malicious usage difficult to detect. Proactive watermarking approaches embed authentication information in target voices to prevent unauthorized synthesis. While existing methods are resilient to traditional preprocessing attacks, we identify a novel threat, self-vocoding, which reconstructs audio using neural vocoders and can severely degrade the watermark while preserving high audio fidelity. To address this, we propose an enhanced decoding framework that handles self-vocoding distortions on watermarks. Beyond considering vocoder distortions in general, we systematically categorize them by two vocoder types for further analysis. Experimental results demonstrate that our approach significantly improves watermark decoding accuracy, offering an effective defense against self-vocoding attacks.
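To make "watermark decoding accuracy" concrete, here is a minimal sketch that assumes the watermark is a binary payload and that accuracy is the fraction of bits recovered correctly. The abstract does not state the paper's exact metric, so the function name `bit_accuracy`, the 8-bit payload, and both decoder outputs are hypothetical illustrations, not the authors' method or data.

```python
def bit_accuracy(embedded_bits, decoded_bits):
    """Fraction of watermark bits recovered correctly (1.0 means perfect recovery)."""
    if len(embedded_bits) != len(decoded_bits):
        raise ValueError("payloads must have the same length")
    if not embedded_bits:
        raise ValueError("payload must not be empty")
    matches = sum(1 for a, b in zip(embedded_bits, decoded_bits) if a == b)
    return matches / len(embedded_bits)


# Hypothetical 8-bit payload embedded in the audio (illustration only).
payload = [1, 0, 1, 1, 0, 0, 1, 0]

# Hypothetical decoder outputs: one from untouched audio, one from audio
# that was resynthesized by a vocoder (the "self-vocoding" scenario).
decoded_clean = [1, 0, 1, 1, 0, 0, 1, 0]
decoded_vocoded = [1, 1, 0, 1, 0, 1, 1, 1]

print(bit_accuracy(payload, decoded_clean))    # 1.0
print(bit_accuracy(payload, decoded_vocoded))  # 0.5
```

For a binary payload, an accuracy near 0.5 is what random guessing would produce, so a drop to that level means the watermark carries essentially no recoverable information even if the audio still sounds intact.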

Figures

“Self-vocoding” watermark preprocessing attack

Keywords

voice cloning ｜ watermark recovery ｜ vocoder ｜ self-vocoding

Authors