Behavior Computing
Every Rating Matters: Joint Learning of Subjective Labels and Individual Annotators for Speech Emotion Classification
Abstract
Emotion perception is subjective and varies across individuals because of natural human biases related to factors such as gender, culture, and age. Conventionally, emotion recognition relies on consensus, e.g., the majority of annotations (hard label) or the distribution of annotations (soft label), and does not include rater-specific models. In this paper, we propose a joint learning methodology that simultaneously considers label uncertainty and annotator idiosyncrasy by using hard and soft emotion labels together with individual and crowd annotator modeling. Our proposed model achieves an unweighted average recall (UAR) of 61.48% on the benchmark emotion corpus. Further analyses reveal that emotion perception is indeed rater-dependent, that the hard label and the soft emotion distribution provide complementary affect modeling information, and that joint learning of subjective emotion perception and individual rater models provides the best discriminative power.
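To make the terms in the abstract concrete, the following is a minimal sketch of how a hard label (majority vote), a soft label (annotation distribution), and UAR could be computed from per-utterance annotator ratings. It is an illustration only, not the paper's implementation: the emotion set `EMOTIONS`, the function names, and the tie-breaking rule are assumptions introduced here.

```python
from collections import Counter

# Hypothetical label set, chosen for illustration only.
EMOTIONS = ["angry", "happy", "neutral", "sad"]

def hard_label(ratings):
    """Majority vote over one utterance's ratings.

    Ties are broken by order in EMOTIONS (an assumption of this sketch).
    """
    counts = Counter(ratings)
    return max(EMOTIONS, key=lambda e: counts[e])

def soft_label(ratings):
    """Distribution of annotations over EMOTIONS for one utterance."""
    counts = Counter(ratings)
    total = len(ratings)
    return [counts[e] / total for e in EMOTIONS]

def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall, ignoring classes absent from y_true."""
    recalls = []
    for e in EMOTIONS:
        idx = [i for i, t in enumerate(y_true) if t == e]
        if not idx:
            continue
        hits = sum(1 for i in idx if y_pred[i] == e)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)

# Four hypothetical annotators rating a single utterance.
ratings = ["happy", "happy", "neutral", "sad"]
print(hard_label(ratings))
print(soft_label(ratings))
```

The hard label discards the minority opinions, while the soft label keeps them as class proportions, which is why the two targets can carry complementary information.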
Figures
(a) Learning target (b) Final recognition Layer (c) Individual rater model (d) Component framework
Keywords
speech emotion recognition | BLSTM | annotator modeling | soft label learning
Authors
Publication Date
2019/05/12
Conference
ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI
10.1109/icassp.2019.8682170
Publisher
IEEE