Abstract
In recent years, one-shot voice conversion (VC) has advanced significantly, enabling the alteration of speaker traits from just a single sentence. However, as this technology matures and produces increasingly realistic utterances, it raises serious privacy concerns. In this paper, we propose RW-VoiceShield to shield a speaker's voice from replication. This is achieved by attacking one-shot VC models with imperceptible noise generated by a raw waveform-based generative model. We evaluate our method on the latest one-shot VC model, conducting subjective and objective evaluations under both black-box and white-box scenarios. The results show significant disparities in speaker characteristics between the utterances generated by the VC model and those of the protected speaker. Furthermore, even with adversarial noise added to the protected utterances, the speaker's distinct characteristics remain recognizable.