Behavior Computing
Speech and Language
Affect
Achieving Fair Speech Emotion Recognition via Perceptual Fairness
Abstract
Speech emotion recognition (SER) is a key technological module for many voice-based solutions. One fairness issue unique to SER comes from the inherently biased emotion perceptions that raters provide as ground-truth labels. Mitigating rater bias is therefore central if SER is to optimize both recognition and fairness performance. In this work, we propose a two-stage framework. The first stage produces debiased representations using a fairness-constrained adversarial framework. The second stage performs gender-wise perceptual learning, after which users can toggle between specified gender-wise perceptions on demand. We further evaluate our results on two important fairness metrics to show that both the distributions and the predictions are fair across genders.
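The abstract does not name its two fairness metrics. As a hedged illustration only, the sketch below computes a demographic parity gap, a common group-fairness metric that may or may not match the metrics used in the paper: for one emotion class, it compares how often each gender group receives that prediction. The function name `demographic_parity_gap` and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import List


def demographic_parity_gap(predictions: List[str], genders: List[str], target_class: str) -> float:
    """Hypothetical metric sketch: the largest difference, across gender groups,
    in the rate at which `target_class` is predicted. 0.0 means every group
    receives that prediction equally often."""
    counts = defaultdict(int)  # number of samples per gender group
    hits = defaultdict(int)    # samples per group predicted as target_class
    for pred, gender in zip(predictions, genders):
        counts[gender] += 1
        if pred == target_class:
            hits[gender] += 1
    if not counts:
        return 0.0  # no samples: no measurable gap
    rates = [hits[g] / counts[g] for g in counts]
    return max(rates) - min(rates)


# Toy, made-up predictions for illustration only (not results from the paper).
preds = ["happy", "sad", "happy", "angry", "happy", "sad"]
genders = ["F", "F", "F", "M", "M", "M"]
print(demographic_parity_gap(preds, genders, "happy"))
```

Here "happy" is predicted for two of three F samples but one of three M samples, so the gap is about 0.33; a debiased system would aim to push such gaps toward zero.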
Figures
Overview of the fair speech emotion recognition (SER) architecture using our proposed two-stage framework. The first stage trains a fair representation; the second stage then uses this fair representation to train gender-wise perceptual predictions. In the corresponding application scenario, once the technology enabler provides a fair embedding from the FairRep, the service provider gives users the ability to toggle between options on demand.
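To make the on-demand toggle concrete, here is a minimal sketch under stated assumptions: a shared, already debiased embedding (standing in for the FairRep output) feeds several gender-wise prediction heads, and the user's choice selects which head produces the emotion label. The class name `PerceptionToggle`, the linear heads, and all weights are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


class PerceptionToggle:
    """Hypothetical sketch: one linear head per gender-wise perception,
    all sharing the same (assumed debiased) input embedding."""

    def __init__(self, heads: Dict[str, List[List[float]]], labels: List[str]):
        self.heads = heads    # perception name -> weight matrix [num_labels][embedding_dim]
        self.labels = labels  # emotion label for each row of a weight matrix

    def predict(self, embedding: List[float], perception: str) -> str:
        weights = self.heads[perception]  # the user's toggle selects the head
        # Linear score per emotion label: dot product of each row with the embedding.
        scores = [sum(w * x for w, x in zip(row, embedding)) for row in weights]
        return self.labels[scores.index(max(scores))]  # highest-scoring label


# Made-up weights for illustration only.
toggle = PerceptionToggle(
    heads={
        "female": [[0.9, 0.1], [0.2, 0.8]],
        "male": [[0.1, 0.9], [0.7, 0.3]],
    },
    labels=["happy", "neutral"],
)
fair_embedding = [1.0, 0.0]  # stand-in for a FairRep embedding
print(toggle.predict(fair_embedding, "female"))
print(toggle.predict(fair_embedding, "male"))
```

Because only the final head changes, switching perceptions needs no recomputation of the shared embedding, which fits the split between a technology enabler (embedding) and a service provider (toggle) described in the caption.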
Keywords
speech emotion recognition | rater bias | fair representation | perceptual fairness
Authors
Conference
IEEE ICASSP
2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023)
DOI
10.1109/ICASSP49357.2023.10094984
Publisher
IEEE