Every Rating Matters: Joint Learning of Subjective Labels and Individual Annotators for Speech Emotion Classification｜BIIC Lab - NTHU

Abstract

Emotion perception is subjective and varies across individuals because of natural human biases related to factors such as gender, culture, and age. Conventionally, emotion recognition relies on a consensus of annotations, e.g., the majority vote (hard label) or the distribution of annotations (soft label), and does not include rater-specific models. In this paper, we propose a joint learning methodology that simultaneously considers label uncertainty and annotator idiosyncrasy by using hard and soft emotion labels together with individual and crowd annotator modeling. Our proposed model achieves an unweighted average recall (UAR) of 61.48% on the benchmark emotion corpus. Further analyses reveal that emotion perception is indeed rater-dependent, that using the hard label and the soft emotion distribution provides complementary affect modeling information, and that joint learning of subjective emotion perception and individual rater models provides the best discriminative power.
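To make the terms in the abstract concrete, the following minimal sketch shows how a hard label (majority vote), a soft label (distribution of annotations), and UAR could be computed from per-utterance ratings. It is an illustration only, not the paper's implementation: the four-class label set `EMOTIONS`, the tie-breaking rule, and all function names are assumptions made for this example.

```python
from collections import Counter

# Hypothetical label set, assumed for illustration only.
EMOTIONS = ["angry", "happy", "neutral", "sad"]

def hard_label(ratings):
    """Majority vote over one utterance's ratings.

    Ties are broken by order in EMOTIONS (an assumption of this sketch).
    """
    counts = Counter(ratings)
    return max(EMOTIONS, key=lambda e: counts[e])

def soft_label(ratings):
    """Fraction of annotators choosing each emotion, in EMOTIONS order."""
    counts = Counter(ratings)
    total = len(ratings)
    return [counts[e] / total for e in EMOTIONS]

def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    recalls = []
    for emotion in sorted(set(y_true)):
        indices = [i for i, t in enumerate(y_true) if t == emotion]
        hits = sum(1 for i in indices if y_pred[i] == emotion)
        recalls.append(hits / len(indices))
    return sum(recalls) / len(recalls)

# Three annotators rate the same utterance.
ratings = ["happy", "happy", "neutral"]
print(hard_label(ratings))
print(soft_label(ratings))

# An imbalanced toy test set: two "happy" and four "sad" utterances.
y_true = ["happy", "happy", "sad", "sad", "sad", "sad"]
y_pred = ["happy", "sad", "sad", "sad", "sad", "sad"]
print(unweighted_average_recall(y_true, y_pred))
```

In the toy test set, five of six predictions are correct, but UAR averages the per-class recalls (0.5 for "happy", 1.0 for "sad"), which is why it is a common choice for class-imbalanced emotion data.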

Figures

(a) Learning target (b) Final recognition layer (c) Individual rater model (d) Component framework

Keywords

speech emotion recognition ｜ BLSTM ｜ annotator modeling ｜ soft label learning

Authors