Attentive to Individual: A Multimodal Emotion Recognition Network with Personalized Attention Profile
Abstract
A growing number of human-centered applications benefit from continuous advancements in emotion recognition technology. Many emotion recognition algorithms have been designed to model multimodal behavior cues to achieve high performance. However, most of them do not consider the modulating effect of an individual's personal attributes on his/her expressive behaviors. In this work, we propose a Personalized Attributes-Aware Attention Network (PAaAN) with a novel personalized attention mechanism to perform emotion recognition using speech and language cues. The attention profile is learned from embeddings of an individual's profile, acoustic, and lexical behavior data. The profile embedding is derived using Linguistic Inquiry and Word Count (LIWC) features computed between the target speaker and a large set of movie scripts. Our method achieves a state-of-the-art 70.3% unweighted accuracy in a four-class emotion recognition task on the IEMOCAP corpus. Further analysis reveals that affect-related semantic categories are emphasized differently for each speaker in the corpus, demonstrating the effectiveness of our attention mechanism for personalization.
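To make the profile-conditioned attention idea concrete, here is a minimal sketch (not the authors' implementation; the layer sizes, network shape, and all variable names below are hypothetical) in which attention weights over frame-level features are scored from each frame concatenated with a fixed-size speaker profile embedding, so different speakers induce different attention profiles:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PersonalizedAttention(nn.Module):
    """Sketch: attention over frame features conditioned on a profile embedding."""

    def __init__(self, feat_dim: int, profile_dim: int, hidden_dim: int = 64):
        super().__init__()
        # Scores each frame from [frame features ; profile embedding].
        self.score = nn.Sequential(
            nn.Linear(feat_dim + profile_dim, hidden_dim),
            nn.Tanh(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, feats: torch.Tensor, profile: torch.Tensor) -> torch.Tensor:
        # feats: (batch, time, feat_dim); profile: (batch, profile_dim)
        expanded = profile.unsqueeze(1).expand(-1, feats.size(1), -1)
        scores = self.score(torch.cat([feats, expanded], dim=-1))  # (B, T, 1)
        weights = F.softmax(scores, dim=1)                          # over time
        return (weights * feats).sum(dim=1)  # profile-weighted pooled feature

# Toy usage: 2 utterances, 50 frames of 40-dim acoustic features,
# 8-dim profile embeddings (all sizes illustrative).
pooled = PersonalizedAttention(40, 8)(torch.randn(2, 50, 40), torch.randn(2, 8))
print(pooled.shape)  # torch.Size([2, 40])
```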
Figures
Figure: The overall PAaAN framework. We compute the dot product of each target speaker's LIWC features with those of a large speaker set from movie scripts to project the target speaker into a personal profile space.
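The projection step in the figure can be sketched in a few lines: each reference speaker's LIWC category vector contributes one axis of the profile space, and the target speaker's coordinates are dot-product similarities. The array sizes, random data, and normalization below are assumptions for illustration, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n_refs, n_liwc = 500, 64            # illustrative sizes, not from the paper
liwc_refs = rng.random((n_refs, n_liwc))   # LIWC features of movie-script speakers
liwc_target = rng.random(n_liwc)           # LIWC features of the target speaker

# Dot product with every reference speaker places the target speaker
# in a profile space whose axes are the reference speakers.
profile = liwc_refs @ liwc_target          # shape: (n_refs,)
profile /= np.linalg.norm(profile)         # normalization is an assumption
print(profile.shape)
```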
Keywords
personal attribute | multimodal emotion recognition | attention | psycholinguistic norm
Authors
Jeng-Lin Li, Chi-Chun Lee
Publication Date
2019/09/15
Conference
Interspeech 2019
DOI
10.21437/Interspeech.2019-2044
Publisher
ISCA