Improving Speech Emotion Recognition Using Graph Attentive Bi-Directional Gated Recurrent Unit Network
Abstract
The manner in which humans encode emotion information within an utterance is often complex and can result in diverse salient acoustic profiles that are conditioned on emotion type. In this work, we propose a framework that imposes a graph attention mechanism on a gated recurrent unit network (GA-GRU) to improve utterance-based speech emotion recognition (SER). Our proposed GA-GRU combines long-range time-series modeling of speech with a graph structure that further integrates complex saliency. We evaluate GA-GRU on the IEMOCAP and MSP-IMPROV databases and achieve 63.8% UAR and 57.47% UAR, respectively, in a four-class emotion recognition task. GA-GRU consistently outperforms recent state-of-the-art per-utterance emotion classification models. We further observe that different emotion categories require distinct, flexible structures for modeling emotion information in the acoustic data, beyond conventional left-to-right or right-to-left modeling.
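The results above are reported as UAR (unweighted average recall), which is the mean of the per-class recalls and so gives each emotion class equal weight regardless of how many utterances it has. The sketch below shows how that metric can be computed. The function name, the four label strings, and the toy predictions are assumptions made for illustration; they are not taken from the paper or its data.

```python
def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall; every class counts equally."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        # Indices of all utterances whose reference label is class c.
        idx = [i for i, t in enumerate(y_true) if t == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical four-class labels and predictions (illustrative only).
y_true = ["ang", "ang", "ang", "ang", "hap", "hap", "neu", "sad"]
y_pred = ["ang", "ang", "ang", "hap", "hap", "neu", "neu", "sad"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.4f}, accuracy = {accuracy:.4f}")
```

On imbalanced emotion corpora the two numbers can differ noticeably, which is one reason UAR is commonly preferred over plain accuracy in SER.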
Figures
Architecture of GA-GRU. The input is 78-dimensional Emobase LLDs, and the graph is built from the hidden outputs of the Bi-GRU. After the graph attention mechanism, the representation is classified into 4 categories.
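The pipeline in the figure caption (Bi-GRU hidden outputs treated as graph nodes, graph attention, then a 4-way classifier) can be sketched roughly as follows. This is a minimal NumPy illustration using a generic GAT-style attention layer, not the authors' implementation: the fully connected graph, the layer sizes, the mean pooling, and the random matrix standing in for real Bi-GRU outputs are all assumptions, since this page does not specify how the graph is constructed or how nodes are pooled.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def graph_attention(H, W, a, adj):
    """Generic GAT-style layer over node features H (T x d_hidden).

    score(i, j) = LeakyReLU(a^T [W h_i || W h_j]), normalised over the
    neighbours j of node i given by the boolean adjacency matrix adj.
    """
    Z = H @ W                                  # (T, d_out) projected nodes
    d = Z.shape[1]
    src = Z @ a[:d]                            # contribution of node i
    dst = Z @ a[d:]                            # contribution of node j
    scores = leaky_relu(src[:, None] + dst[None, :])  # (T, T)
    scores = np.where(adj, scores, -np.inf)    # mask non-edges
    alpha = softmax(scores, axis=1)            # each row sums to 1
    return alpha @ Z, alpha


rng = np.random.default_rng(0)
T, d_hidden, d_out, n_classes = 50, 128, 64, 4  # assumed sizes

# Stand-in for the Bi-GRU hidden outputs of one utterance (assumption:
# a real system would compute these from the 78-dim LLD frame sequence).
H = rng.standard_normal((T, d_hidden))

W = rng.standard_normal((d_hidden, d_out)) * 0.1
a = rng.standard_normal(2 * d_out) * 0.1
adj = np.ones((T, T), dtype=bool)               # assumed fully connected

nodes, alpha = graph_attention(H, W, a, adj)
utterance = nodes.mean(axis=0)                  # assumed mean pooling

W_cls = rng.standard_normal((d_out, n_classes)) * 0.1
probs = softmax(utterance @ W_cls)              # 4 emotion-class scores
print(probs.shape, alpha.shape)
```

Because every frame can attend to every other frame under this assumed adjacency, the layer is not tied to a left-to-right or right-to-left ordering, which is the kind of flexibility the abstract refers to.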
Keywords
speech emotion recognition | graph | attention mechanism | recurrent neural network
Authors
Bo-Hao Su, Chun-Min Chang, Chi-Chun Lee
Publication Date
2020/10/25
Conference
Interspeech 2020
DOI
10.21437/Interspeech.2020-1733
Publisher
ISCA