A Dialogical Emotion Decoder for Speech Emotion Recognition in Spoken Dialog
Abstract
Developing a robust speech emotion recognition (SER) system for human dialog is important in advancing conversational agent design. In this paper, we propose a novel inference algorithm, the dialogical emotion decoding (DED) algorithm, which treats a dialog as a sequence and consecutively decodes the emotion state of each utterance over time with a given recognition engine. The decoder is trained by incorporating intra- and inter-speaker emotion influences within a conversation. Our approach achieves 70.1% in four-class emotion recognition on the IEMOCAP database, which is 3% above the state-of-the-art model. The evaluation is further conducted on a multi-party interaction database, MELD, which shows a similar effect. Our proposed DED is in essence a conversational emotion rescoring decoder that can be flexibly combined with different SER engines.
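To make the idea of a conversational rescoring decoder concrete, the following is a minimal illustrative sketch and not the authors' DED algorithm. It assumes the SER engine supplies a per-utterance class posterior, and it greedily rescores each utterance in dialog order using two hand-set priors: a dialog-level count prior (emotions already decoded in the conversation become more likely for every speaker) and a per-speaker shift prior (a speaker tends to keep their previous emotion). The function name `rescore_dialog`, the parameters `shift_prob` and `smoothing`, and the greedy search are all assumptions made for illustration; the abstract states that the actual decoder is trained rather than hand-set.

```python
import numpy as np


def rescore_dialog(posteriors, speakers, shift_prob=0.3, smoothing=1.0):
    """Greedily rescore per-utterance SER posteriors in dialog order.

    posteriors: (T, K) array; row t is the SER engine's class distribution
                for utterance t (assumed input format).
    speakers:   length-T sequence of speaker ids.
    shift_prob: hypothetical probability that a speaker changes emotion
                between two of their consecutive utterances.
    smoothing:  hypothetical additive pseudo-count for the dialog-level prior.
    Returns a list of T decoded class indices.
    """
    posteriors = np.asarray(posteriors, dtype=float)
    n_classes = posteriors.shape[1]
    counts = np.zeros(n_classes)   # emotions decoded so far in this dialog
    last_by_speaker = {}           # speaker id -> last decoded emotion index
    decoded = []

    for probs, spk in zip(posteriors, speakers):
        # Dialog-level prior: favors emotions already present in the dialog,
        # regardless of who expressed them (inter-speaker influence).
        assign = (counts + smoothing) / (counts.sum() + smoothing * n_classes)

        # Speaker-level prior: favors the speaker's previous emotion
        # (intra-speaker influence); uniform on the speaker's first turn.
        if spk in last_by_speaker:
            shift = np.full(n_classes, shift_prob / (n_classes - 1))
            shift[last_by_speaker[spk]] = 1.0 - shift_prob
        else:
            shift = np.full(n_classes, 1.0 / n_classes)

        # Combine the engine's evidence with both priors in the log domain.
        score = np.log(probs + 1e-12) + np.log(assign) + np.log(shift)
        label = int(np.argmax(score))

        decoded.append(label)
        counts[label] += 1
        last_by_speaker[spk] = label

    return decoded


# Toy dialog: class indices 0..3 stand for four emotion categories.
toy_posteriors = [
    [0.70, 0.10, 0.10, 0.10],  # speaker A, clearly class 0
    [0.10, 0.70, 0.10, 0.10],  # speaker B, clearly class 1
    [0.30, 0.32, 0.28, 0.10],  # speaker A, ambiguous
]
toy_speakers = ["A", "B", "A"]
toy_decoded = rescore_dialog(toy_posteriors, toy_speakers)
```

In this toy dialog the third utterance is ambiguous: the raw posterior slightly favors class 1, but the speaker-level prior pulls the decision back to class 0, which speaker A expressed earlier. Because the function only consumes posteriors, any SER engine that outputs class probabilities could be placed in front of it.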
Figures
An illustration of the dialogical emotion decoder, which combines emotion assignment based on the previously predicted sequence with an emotion shift component that models the probabilistic change of emotion states over time.
Keywords
speech emotion recognition | conversation | dialogical emotion decoder
Authors
Publication Date
2020/05/04
Conference
IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI
10.1109/icassp40776.2020.9053561
Publisher
IEEE