Abstract
An individual’s emotion perception plays a key role in their decision-making and task performance. Previous speech emotion recognition research focuses mainly on recognizing the emotion label derived from the majority vote (hard label) of the speaker (i.e., producer) rather than on recognizing each rater’s emotion perception. In this work, we propose a framework that integrates different viewpoints of emotion perception from other co-raters (excluding the target rater) using soft- and hard-label learning to improve the recognition of the target rater’s emotion perception. Our methods achieve [3.97%, 1.48%] and [1.71%, 2.87%] improvements in average unweighted average recall (UAR) on the three-class (low, middle, high) [valence, activation (arousal)] emotion recognition task for four different raters on the IEMOCAP and the NNIME databases, respectively. Further analyses show that learning from the soft labels of co-raters provides the most robust accuracy even without access to the target rater’s labels. By adding only 50% of a target rater’s annotations, our framework mostly surpasses a model trained with 100% of that rater’s annotations.