Behavior Computing
States and Traits
Speech and Language
An Enroll-to-Verify Approach for Cross-Task Unseen Emotion Class Recognition
Abstract
Most speech emotion recognition studies focus on recognizing a pre-set list of emotion classes. In real-world applications, however, the task definition may change when the focus shifts to a previously unseen class. This cross-task modeling has not been addressed previously. Lengthy data re-collection and model retraining, as well as traditional adaptation and transfer learning approaches, are not applicable to this cross-task setting. This study proposes an enroll-to-verify framework that avoids model retraining and rapidly performs prediction for a new task using only a handful of enrolled samples. Specifically, we pretrain a multiclass network with a negative angular margin prototypical loss and use it as an emotion encoder. We then enroll a few samples for each emotion class in the new task definition and perform recognition simply by comparing distances between the encoded embeddings. In experiments on the IEMOCAP dataset, given a four-class pretrained emotion encoder, we achieved a 71.9% unweighted average recall on the recognition task for frustration, an unseen class. We also evaluated on the MELD dataset, where the unseen class was surprise, fear, or disgust. The results showed that enrolling only 20 samples without retraining was comparable to supervised training using the complete dataset. Further analyses were conducted to demonstrate the working mechanism of the proposed enroll-to-verify approach.
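The enroll-then-compare step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a pretrained emotion encoder has already turned each utterance into a fixed-length embedding (the toy 2-D vectors below stand in for those embeddings), and it assumes that enrolled samples are averaged into one prototype per class and compared by cosine similarity. The function names `enroll` and `verify` are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row vector to unit length."""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norm, eps)


def enroll(embeddings, labels):
    """Build one prototype per class from a few enrolled embeddings.

    Assumption: a prototype is the mean of the unit-normalized
    embeddings that share a label; the paper may aggregate differently.
    """
    embeddings = l2_normalize(np.asarray(embeddings, dtype=float))
    labels = np.asarray(labels)
    classes = sorted(set(labels.tolist()))
    prototypes = np.stack(
        [embeddings[labels == c].mean(axis=0) for c in classes]
    )
    return classes, l2_normalize(prototypes)


def verify(query_embeddings, classes, prototypes):
    """Assign each query to the class whose prototype is most similar.

    Assumption: similarity is cosine similarity between unit vectors.
    """
    queries = l2_normalize(np.asarray(query_embeddings, dtype=float))
    sims = queries @ prototypes.T  # shape: (num_queries, num_classes)
    predictions = [classes[i] for i in sims.argmax(axis=1)]
    return predictions, sims


# Toy stand-ins for encoder outputs (hypothetical values).
enrolled = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.9]]
enrolled_labels = ["frustration", "frustration", "other", "other"]

classes, prototypes = enroll(enrolled, enrolled_labels)
predictions, sims = verify([[0.8, 0.2], [0.2, 0.7]], classes, prototypes)
print(predictions)
```

No encoder weights are updated in this sketch: moving to a new task definition only requires calling `enroll` again with samples of the new classes.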
Figures
This is the overall enroll-to-verify approach for cross-task emotion recognition.
Keywords
negative margin | prototypical loss | unseen class | cross-task modeling
Authors
Jeng-Lin Li, Chi-Chun Lee
Publication Date
2022/06/14
Journal
IEEE TAFFC
IEEE Transactions on Affective Computing
DOI
10.1109/TAFFC.2022.3183166
Publisher
IEEE