Most speech emotion recognition studies focus on recognizing a fixed set of emotion classes. In real-world applications, however, the task definition may change when attention shifts to a previously unseen class. This cross-task setting has not been addressed before: lengthy data re-collection and model retraining are impractical, and traditional adaptation and transfer learning approaches do not apply. This study proposes an enroll-to-verify framework that avoids model retraining and rapidly performs prediction on a new task using only a handful of enrolled samples. Specifically, we pretrain a multiclass network with a negative angular margin prototypical loss and use it as an emotion encoder. We then enroll a few samples for each emotion class in the new task definition and perform recognition simply by comparing distances between the encoded embeddings. In experiments on the IEMOCAP dataset, given a four-class pretrained emotion encoder, we achieved a 71.9% unweighted average recall on the frustration (unseen) recognition task. On the MELD dataset, where the unseen class was surprise, fear, or disgust, the results revealed that enrolling only 20 samples without retraining was comparable to supervised training on the complete dataset. Further analyses demonstrate the working mechanism of the proposed enroll-to-verify approach.
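The enroll-then-verify step can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes a pretrained encoder has already mapped each utterance to an embedding vector, averages the enrolled embeddings per class into prototypes, and assigns a test embedding to the class whose prototype is closest in cosine similarity. All names and the toy 2-D embeddings here are hypothetical.

```python
import numpy as np

def make_prototypes(embeddings, labels):
    """Average the enrolled embeddings of each class into a single prototype."""
    classes = sorted(set(labels))
    return {c: np.mean([e for e, l in zip(embeddings, labels) if l == c], axis=0)
            for c in classes}

def verify(embedding, prototypes):
    """Return the class whose prototype is most cosine-similar to the embedding."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
    return max(prototypes, key=lambda c: cos(embedding, prototypes[c]))

# Toy example: pretend the encoder produced these 2-D embeddings.
enrolled = [np.array([1.0, 0.0]), np.array([0.9, 0.1]),
            np.array([0.0, 1.0]), np.array([0.1, 0.9])]
labels = ["frustration", "frustration", "neutral", "neutral"]
protos = make_prototypes(enrolled, labels)
print(verify(np.array([0.8, 0.2]), protos))  # closest to the frustration prototype
```

Because only the prototypes change when the task definition changes, adding an unseen class requires enrolling a few samples rather than retraining the encoder.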