Vaccinating SER to Neutralize Adversarial Attacks with Self-Supervised Augmentation Strategy
Abstract
Speech emotion recognition (SER) is being actively deployed in multiple real-world application scenarios, and users tend to become closely connected to these services. However, most existing models are vulnerable to malicious attackers and unable to robustly defend against adversarial attacks. The degraded performance can lead to dreadful user experiences and dissatisfaction. To improve the robustness of SER models against attacks, we propose a self-supervised augmentation defense (SSAD) model that uses a single purifying model as a general defense (i.e., without knowing the type of attack beforehand) against adversarial attacks, instead of training a custom-made defense model for each type of attack. In this work, we evaluate our defense approach on an emotion recognition task using the well-known IEMOCAP corpus and examine model performance under multiple adversarial attacks. Our proposed SSAD model achieves an average UAR of 43.53% and 34.99% under the Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD) attacks, respectively, across significantly different intensity settings. Furthermore, our proposed SSAD yields a 7.29% increase in protection efficacy and a 3.98% increase in recovery rate.
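For readers unfamiliar with the attacks named above, the following is a minimal sketch of how an FGSM adversarial example is generated against an SER-style classifier, and where a purifying defense such as SSAD would sit. The toy model, feature dimensions, and epsilon value are illustrative assumptions, not the authors' actual implementation.

```python
# Hedged sketch: FGSM attack on a toy SER-style classifier (assumed setup, not the paper's code).
import torch
import torch.nn as nn

# Toy stand-in for an SER classifier: 40-dim acoustic features -> 4 emotion classes.
classifier = nn.Sequential(nn.Linear(40, 64), nn.ReLU(), nn.Linear(64, 4))
loss_fn = nn.CrossEntropyLoss()

def fgsm_attack(model, x, y, epsilon):
    """Fast Gradient Sign Method: perturb x in the direction that increases the loss."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    return (x_adv + epsilon * x_adv.grad.sign()).detach()

# Attack a random batch of features (shapes and epsilon are assumptions).
x = torch.randn(8, 40)          # clean acoustic features
y = torch.randint(0, 4, (8,))   # emotion labels
x_adv = fgsm_attack(classifier, x, y, epsilon=0.01)

# In SSAD, a single purifying model would map x_adv back toward the clean feature
# manifold before classification; the purifier below is hypothetical.
# x_purified = purifier(x_adv)
logits = classifier(x_adv)
```

PGD, the second attack evaluated in the paper, can be viewed as applying a similar gradient-sign step iteratively with projection back into an epsilon-ball; the defense model is intended to handle both without attack-specific training.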
Figures
An overall scheme of our self-supervised augmentation defense model (SSAD), which includes training and inference stages.
Keywords
speech emotion recognition | adversarial attacks | self-supervised learning | augmentation
Authors
Publication Date
2022/09/18
Conference
Interspeech 2022
DOI
10.21437/Interspeech.2022-10453
Publisher
ISCA