Balancing Speaker-Rater Fairness for Gender-Neutral Speech Emotion Recognition
Abstract
Speech emotion recognition (SER) adds to the humane aspects of voice technologies and enhances the user experience. The ground-truth emotion annotations provided by human raters, together with attributes of the speakers themselves, give rise to a compounded fairness issue in SER. While prior work on fair SER exists, our work presents one of the first studies addressing the joint speaker-rater (two-sided) bias, focusing on gender fairness. Our cross-reference evaluation demonstrates that a fair SER model that mitigates only one-sided bias introduces biases when examined from the other viewpoint. Furthermore, to maintain model stability when optimizing for these compounded speaker-rater constraints, we introduce a flexible control mechanism that dynamically balances the contribution of each viewpoint. Our analyses show the efficacy of our approach in achieving a fair SER model that meets the dual speaker-rater gender-neutrality criterion.
Figures
Overview of the fair speech emotion recognition (SER) architecture using both the one-sided and our proposed two-sided learning frameworks. The one-sided debiased embeddings are first trained independently; we then compute the WD distance between these embeddings per batch to derive the flexible control parameter α, which dynamically adjusts the other side’s contribution. Finally, the two-sided fair SER model is optimized batch-wise with the total loss L_Total.
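The batch-wise control step described in the caption can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes WD denotes a Wasserstein distance and approximates it with a per-dimension 1-D Wasserstein-1 distance averaged over embedding dimensions. The mapping from distance to α (`control_alpha`) and the way the losses are combined (`total_loss`) are hypothetical choices, since this page does not give the formulas. All function and parameter names are made up for the example.

```python
import numpy as np


def batch_wd(emb_a, emb_b):
    """Approximate distance between two equal-sized embedding batches.

    ASSUMPTION: "WD" is read as a Wasserstein distance. For equal-sized
    1-D samples, Wasserstein-1 is the mean absolute difference of the
    sorted values; here it is computed per dimension and averaged,
    which is a cheap stand-in for a full multivariate distance.
    """
    a_sorted = np.sort(emb_a, axis=0)  # sort each dimension over the batch
    b_sorted = np.sort(emb_b, axis=0)
    return float(np.mean(np.abs(a_sorted - b_sorted)))


def control_alpha(wd, scale=1.0):
    """HYPOTHETICAL mapping from distance to a weight in [0, 1).

    A larger gap between the two one-sided embeddings gives the other
    side a larger weight. The source does not state the actual formula.
    """
    return wd / (wd + scale)


def total_loss(l_emotion, l_own_side, l_other_side, alpha):
    """HYPOTHETICAL total loss: alpha scales the other side's term."""
    return l_emotion + l_own_side + alpha * l_other_side


# Toy batch: 4 utterances, 3-dimensional embeddings.
speaker_emb = np.arange(12, dtype=float).reshape(4, 3)
rater_emb = speaker_emb + 2.0  # shifted copy, so every dimension differs by 2

wd = batch_wd(speaker_emb, rater_emb)
alpha = control_alpha(wd)
loss = total_loss(l_emotion=1.0, l_own_side=0.5, l_other_side=0.4, alpha=alpha)
```

In a training loop, `alpha` would be recomputed for every batch, so the other side's fairness term is weighted more heavily when the two one-sided embeddings disagree and fades when they coincide.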
Keywords
Speech emotion recognition | Fairness | Gender neutrality | Speaker-rater biases
Authors
Woan-Shiuan Chien, Shreya G Upadhyay, Chi-Chun Lee
Publication Date
2024/04/17
Conference
ICASSP 2024
Publisher