MetricAug: A Distortion Metric-Lead Augmentation Strategy for Training Noise-Robust Speech Emotion Recognizer｜BIIC Lab - NTHU

Abstract

Noise-robust speech emotion recognition (SER) systems are important in real-world applications. Conventionally, noise robustness is achieved by training on a noise-augmented dataset. In this work, instead of pre-defining noise SNRs to augment the clean set, we propose an augment-while-train strategy that references a speech distortion metric. This strategy (MetricAug) constructs an augmented set at each training epoch by assessing how much different distortion levels degrade SER performance. That is, at each learning epoch we dynamically augment more of the noisy data that degrade SER performance the most. We evaluate our framework on two databases, MSP-Podcast and MELD. Our framework shows consistent robustness against varying noise levels and even unseen noise types. Further analysis reveals that choosing STOI as the noise distortion metric guides the construction of augmented sets better than PESQ or fwSNRseg.
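
The per-epoch idea in the abstract (augment more of the distortion levels that hurt SER the most) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the allocation rule, so the proportional weighting, the function name `allocate_augmentation`, the bin labels, the scores, and the `budget` parameter are all assumptions made for illustration.

```python
from typing import Dict


def allocate_augmentation(
    clean_score: float,
    bin_scores: Dict[str, float],
    budget: int,
) -> Dict[str, int]:
    """Split an augmentation budget across distortion bins.

    Hypothetical rule (an assumption, not taken from the paper): each bin
    receives a share proportional to how far its SER score falls below
    the score on clean speech.
    """
    if not bin_scores:
        raise ValueError("bin_scores must contain at least one distortion bin")

    # Degradation per bin, floored at zero so helpful bins get no extra weight.
    drops = {b: max(clean_score - s, 0.0) for b, s in bin_scores.items()}
    total = sum(drops.values())

    if total == 0.0:
        # No bin degrades performance: fall back to a uniform split.
        raw = {b: budget / len(drops) for b in drops}
    else:
        raw = {b: d * budget / total for b, d in drops.items()}

    # Largest-remainder rounding so the counts sum exactly to the budget.
    counts = {b: int(raw[b]) for b in drops}
    leftover = budget - sum(counts.values())
    by_remainder = sorted(drops, key=lambda b: raw[b] - counts[b], reverse=True)
    for b in by_remainder[:leftover]:
        counts[b] += 1
    return counts


# Hypothetical validation scores (in percent) for three STOI-like bins.
epoch_counts = allocate_augmentation(
    clean_score=60.0,
    bin_scores={"low": 30.0, "mid": 45.0, "high": 55.0},
    budget=100,
)
```

In a training loop, a function like this would be called once per epoch after scoring the current model on noisy validation data grouped by distortion level, and the returned counts would decide how many noisy samples to draw from each level for the next epoch.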

Figures

Illustration of MetricAug: An epoch-wise distortion metric-lead noise augmentation.

Keywords

speech emotion recognition ｜ speech distortion metrics ｜ noise robustness

Authors

Publication Date

2023/08/22

Conference

Interspeech 2023

Publisher

RESEARCH
