Balancing Speaker-Rater Fairness for Gender-Neutral Speech Emotion Recognition｜BIIC Lab - NTHU

Balancing Speaker-Rater Fairness for Gender-Neutral Speech Emotion Recognition

Abstract

Speech emotion recognition (SER) adds to the humane aspects of voice technologies to enhance user experiences. The ground-truth emotion annotations provided by human raters and the attributes of the speakers themselves give rise to a compounded fairness issue in SER. While prior work on fair SER exists, ours is one of the first studies to address the joint speaker-rater (two-sided) bias, focusing on gender fairness. Our cross-reference evaluation demonstrates that a fair SER model that mitigates only one-sided bias introduces biases when examined from the other viewpoint. Furthermore, to maintain model stability when optimizing under these compounded speaker-rater constraints, we introduce a flexible control mechanism that dynamically balances the contribution of each viewpoint. Our analyses show the efficacy of our approach in achieving a fair SER model that meets the dual speaker-rater gender-neutrality criterion.

Figures

Overview of the fair speech emotion recognition (SER) architecture using both the one-sided and our proposed two-sided learning frameworks. The one-sided debiased embeddings are first trained independently; we then compute the WD distance between these embeddings per batch to derive the flexible control parameter α, which dynamically adjusts the other side’s contribution. Finally, the two-sided fair SER model is optimized batch-wise with the total loss L_Total.
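The per-batch control step described in the caption can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes WD denotes a Wasserstein distance, approximates it by averaging 1-D distances over embedding dimensions, and uses a hypothetical squashing function to map the distance to α. The function names, the form of the total loss, and the direction in which α scales the other side's loss are all assumptions made for illustration.

```python
from typing import List, Sequence


def wasserstein_1d(u: Sequence[float], v: Sequence[float]) -> float:
    """Wasserstein-1 distance between two equal-sized 1-D samples.

    For equal-sized empirical samples this is the mean absolute
    difference between the sorted values.
    """
    if len(u) != len(v) or len(u) == 0:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(u), sorted(v))) / len(u)


def batch_wd(emb_a: List[List[float]], emb_b: List[List[float]]) -> float:
    """Average per-dimension Wasserstein-1 distance between two batches.

    Assumption: a per-dimension average stands in for the distance
    between the two one-sided debiased embeddings.
    """
    dims = len(emb_a[0])
    per_dim = [
        wasserstein_1d([row[d] for row in emb_a], [row[d] for row in emb_b])
        for d in range(dims)
    ]
    return sum(per_dim) / dims


def control_alpha(wd: float) -> float:
    """Hypothetical mapping from a non-negative distance to alpha in [0, 1)."""
    return wd / (1.0 + wd)


def total_loss(loss_emotion: float, loss_own_side: float,
               loss_other_side: float, alpha: float) -> float:
    """Assumed total loss: alpha scales the other side's fairness term."""
    return loss_emotion + loss_own_side + alpha * loss_other_side


# Toy batch of two 2-D embeddings per side (made-up values).
speaker_emb = [[0.0, 1.0], [2.0, 3.0]]
rater_emb = [[1.0, 1.0], [3.0, 3.0]]

wd = batch_wd(speaker_emb, rater_emb)
alpha = control_alpha(wd)
loss = total_loss(1.0, 0.5, 0.6, alpha)
```

In a real training loop the same computation would be repeated for every batch, so α changes as the two embeddings drift closer together or further apart.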

Keywords

Speech emotion recognition ｜ Fairness ｜ Gender neutrality ｜ Speaker-rater biases

Authors

Publication Date

2024/04/17

Conference

ICASSP 2024

DOI

10.1109/ICASSP48485.2024.10447167

Publisher
