Learning to Recognize Per-Rater's Emotion Perception Using Co-Rater Training Strategy with Soft and Hard Labels
Abstract
An individual’s emotion perception plays a key role in decision-making and task performance. Previous speech emotion recognition research focuses mainly on recognizing the emotion label derived from the majority vote (hard label) of the speaker (i.e., the producer) rather than on recognizing each rater’s individual emotion perception. In this work, we propose a framework that integrates the viewpoints of other co-raters (excluding the target rater) through soft- and hard-label learning to improve recognition of the target rater’s emotion perception. Our methods achieve [3.97%, 1.48%] and [1.71%, 2.87%] improvements in average unweighted average recall (UAR) on the three-class (low, middle, and high) [valence, activation (arousal)] emotion recognition task for four different raters on the IEMOCAP and NNIME databases, respectively. Further analyses show that learning from the co-raters’ soft labels yields the most robust accuracy even without the target rater’s labels. By adding only 50% of a target rater’s annotations, our framework mostly surpasses a model trained with 100% of that rater’s annotations.
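As a rough illustration of the label construction the abstract describes, the sketch below derives a soft label (the co-raters' vote distribution over the three classes) and a hard label (their majority vote) from co-rater annotations. This is a hypothetical helper for exposition only; the paper's actual aggregation and training procedure may differ.

```python
from collections import Counter

# Three-class scheme from the abstract: low, middle, high.
CLASSES = ["low", "middle", "high"]

def co_rater_labels(ratings):
    """Given the co-raters' class ratings (excluding the target rater),
    return a soft label (vote distribution over CLASSES) and a hard
    label (majority vote). Illustrative only, not the paper's code."""
    counts = Counter(ratings)
    total = len(ratings)
    soft = [counts[c] / total for c in CLASSES]
    # Ties are broken by class order in CLASSES.
    hard = max(CLASSES, key=lambda c: counts[c])
    return soft, hard

# Example: two co-raters perceive "low", one perceives "middle".
soft, hard = co_rater_labels(["low", "low", "middle"])
```

The soft label preserves rater disagreement as a probability distribution, which is what allows the model to learn from co-rater perception even when the target rater's own annotations are unavailable.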
Figures
An illustration of the per-rater emotion perception recognition model (L, M, and H denote the low, middle, and high classes).
Keywords
Speech Emotion Recognition (SER) | rater perception | BLSTM-DNN | soft label learning
Authors
Huang-Cheng Chou, Chi-Chun Lee
Publication Date
2020/10/25
Conference
Interspeech 2020
DOI
10.21437/Interspeech.2020-1714
Publisher
ISCA