Vaccinating SER to Neutralize Adversarial Attacks with Self-Supervised Augmentation Strategy｜BIIC Lab - NTHU

Speech and Language

Other: Signal Modeling for Understanding

Abstract

Speech emotion recognition (SER) is being actively developed for multiple real-world application scenarios, and users tend to become closely attached to such services. However, most existing models are vulnerable to malicious attackers and cannot robustly defend against adversarial attacks. The resulting performance degradation can lead to poor user experience and dissatisfaction. To improve the robustness of SER models against attacks, we propose a self-supervised augmentation defense (SSAD) model that uses a single 'purifying' model as a general defense (i.e., one that does not require knowing the attack type beforehand) against adversarial attacks, instead of training a custom-made defense model for each type of attack. We evaluate our defense approach on an emotion recognition task using the well-known IEMOCAP corpus and examine model performance under multiple adversarial attacks. Our proposed SSAD model achieves an average unweighted average recall (UAR) of 43.53% and 34.99% under Fast Gradient Sign Method (FGSM) and Projected Gradient Descent (PGD) attacks, respectively, across significantly different intensity settings. Furthermore, SSAD yields a 7.29% increase in protection efficacy and a 3.98% increase in recovery rate.
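For readers unfamiliar with the terms in the abstract, the following is a minimal NumPy sketch of the textbook definitions of UAR, FGSM, and PGD. It is not the authors' implementation and does not model SSAD itself. The function names, the `grad_fn` callback, and the toy values are hypothetical and chosen only for illustration.

```python
import numpy as np

def unweighted_average_recall(y_true, y_pred):
    """UAR: mean of per-class recalls, so every emotion class counts equally."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))

def fgsm_step(x, grad, epsilon):
    """FGSM: one step of size epsilon along the sign of the loss gradient."""
    return x + epsilon * np.sign(grad)

def pgd_attack(x, grad_fn, epsilon, step_size, n_steps):
    """PGD: repeated signed-gradient steps, each projected back into the
    L-infinity ball of radius epsilon around the clean input x."""
    x_adv = x.copy()
    for _ in range(n_steps):
        x_adv = x_adv + step_size * np.sign(grad_fn(x_adv))
        x_adv = np.clip(x_adv, x - epsilon, x + epsilon)
    return x_adv

# Toy usage (assumption: a linear loss whose input gradient is the constant w).
w = np.array([3.0, -0.5])
x = np.array([0.1, -0.2])
x_fgsm = fgsm_step(x, w, epsilon=0.01)
x_pgd = pgd_attack(x, lambda z: w, epsilon=0.05, step_size=0.02, n_steps=5)
uar = unweighted_average_recall([0, 0, 0, 1], [0, 0, 1, 1])
```

Here `epsilon` plays the role of the attack intensity: larger values permit larger perturbations of the input features. UAR is a common choice for class-imbalanced emotion corpora because a classifier that favours the majority class is not rewarded for it.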

Figures

An overall scheme of our self-supervised augmentation defense model (SSAD) which includes training and inference stages.

Keywords

speech emotion recognition ｜ adversarial attacks ｜ self-supervised learning ｜ augmentation

Authors