Achieving Fair Speech Emotion Recognition via Perceptual Fairness
Abstract
Speech emotion recognition (SER) is a key technological module to be integrated into many voice-based solutions. One of the unique fairness issues in SER arises from the inherently biased emotion perception of the raters who provide the ground-truth labels. Mitigating rater bias is therefore central to moving SER toward jointly optimizing recognition and fairness performance. In this work, we propose a two-stage framework: the first stage produces debiased representations using a fairness-constrained adversarial framework; in the second stage, after gender-wise perceptual learning, users are given the ability to toggle between specified gender-wise perceptions on demand. We further evaluate our results on two important fairness metrics to show that the distributions and predictions across genders are fair.
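The page does not include an implementation, so the PyTorch sketch below only illustrates one plausible reading of the first stage: an encoder trained against a gradient-reversal gender adversary so that the learned representation carries little gender information. All names (GradReverse, FairRepEncoder, gender_adversary) and dimensions are illustrative assumptions, not the authors' code.

import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    # Identity in the forward pass, negated (scaled) gradient in the backward pass.
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class FairRepEncoder(nn.Module):
    # Stage 1 (assumed form): encode acoustic features into a representation
    # from which an adversary cannot recover speaker gender.
    def __init__(self, in_dim=88, rep_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, rep_dim)
        )
        self.gender_adversary = nn.Linear(rep_dim, 2)

    def forward(self, x, lambd=1.0):
        z = self.encoder(x)
        # The adversary receives reversed gradients, which pushes the encoder
        # to remove gender cues from z (the debiased representation).
        gender_logits = self.gender_adversary(GradReverse.apply(z, lambd))
        return z, gender_logits

Training would combine an emotion loss on z with the adversary's gender loss; the exact fairness constraint used in the paper may differ from this sketch.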
Figures
Overview of the fair speech emotion recognition (SER) architecture using our proposed two-stage framework. The first stage trains a fair representation, which is then used in the second stage to train the gender-wise perceptual predictions. In the corresponding application scenario, once the technology enabler provides a fair embedding from the FairRep, the service provider gives users the ability to toggle options on demand.
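As a companion to the figure, this is a minimal sketch of how the second stage's on-demand toggle could look in code: one prediction head per gender-wise perception, selected by the user at inference time. The head names, embedding size, and four-class emotion output are assumptions for illustration only.

import torch
import torch.nn as nn


class GenderWisePerception(nn.Module):
    # Stage 2 (assumed form): separate heads approximating the gender-wise
    # perceptual labels, selectable at inference time.
    def __init__(self, rep_dim=128, n_emotions=4):
        super().__init__()
        self.heads = nn.ModuleDict({
            "male": nn.Linear(rep_dim, n_emotions),
            "female": nn.Linear(rep_dim, n_emotions),
        })

    def forward(self, z, view="female"):
        # 'view' is the user-facing toggle between gender-wise perceptions.
        return self.heads[view](z)


# Usage: the service provider feeds the fair embedding z from the first stage
# and lets the user switch views on demand.
perception = GenderWisePerception()
z = torch.randn(8, 128)  # placeholder for FairRep embeddings
logits_male_view = perception(z, view="male")
logits_female_view = perception(z, view="female")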
Keywords
speech emotion recognition | rater bias | fair representation | perceptual fairness
Authors
Woan-Shiuan Chien, Chi-Chun Lee
Conference
IEEE ICASSP
2023 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2023)
DOI
10.1109/ICASSP49357.2023.10094984
Publisher
IEEE