Abstract
A growing number of human-centered applications benefit from continuous advancements in the emotion recognition technology. Many emotion recognition algorithms have been designed to model multimodal behavior cues to achieve high performances. However, most of them do not consider the modulating factors of an individual’s personal attributes in his/her expressive behaviors. In this work, we propose a Personalized Attributes-Aware Attention Network (PAaAN) with a novel personalized attention mechanism to perform emotion recognition using speech and language cues. The attention profile is learned from embeddings of an individual’s profile, acoustic, and lexical behavior data. The profile embedding is derived using linguistics inquiry word count computed between the target speaker and a large set of movie scripts. Our method achieves the stateof-the-art 70.3% unweighted accuracy in a four class emotion recognition task on the IEMOCAP. Further analysis reveals that affect-related semantic categories are emphasized differently for each speaker in the corpus showing the effectiveness of our attention mechanism for personalization.