Improving Speech Emotion Recognition Using Graph Attentive Bi-Directional Gated Recurrent Unit Network｜BIIC Lab - NTHU

Abstract

The way humans encode emotional information within an utterance is often complex and can result in diverse salient acoustic profiles that are conditioned on emotion type. In this work, we propose a framework that imposes a graph attention mechanism on a gated recurrent unit network (GA-GRU) to improve utterance-based speech emotion recognition (SER). Our proposed GA-GRU combines long-range, time-series-based modeling of speech with a graph structure that integrates complex saliency. We evaluate GA-GRU on the IEMOCAP and MSP-IMPROV databases and achieve unweighted average recall (UAR) of 63.8% and 57.47%, respectively, in a four-class emotion recognition task. GA-GRU consistently outperforms recent state-of-the-art per-utterance emotion classification models. We further observe that different emotion categories require distinct, flexible structures for modeling emotion information in the acoustic data, beyond conventional left-to-right or right-to-left processing.
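Since the results above are reported as UAR, a minimal sketch of how that metric is computed may help. UAR is the mean of the per-class recalls, so each emotion class counts equally regardless of how many utterances it has. The function name, the label names, and the toy predictions below are hypothetical and are not taken from the paper or its data.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls (every class weighted equally)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of reference samples per class
    correct = defaultdict(int)  # correctly predicted samples per class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical four-class labels (illustrative only, not data from the paper).
y_true = ["angry", "angry", "happy", "neutral", "neutral", "neutral", "sad", "sad"]
y_pred = ["angry", "happy", "happy", "neutral", "neutral", "sad", "sad", "angry"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.4f}, accuracy = {accuracy:.4f}")
```

On imbalanced data the two numbers differ: accuracy is dominated by the larger classes, whereas UAR is not, which is why it is a common choice for emotion corpora.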

Figures

Architecture of GA-GRU. The input is 78-dimensional Emobase low-level descriptors (LLDs), and the graph is built from the hidden outputs of the Bi-GRU. After the graph attention mechanism, the resulting representation is classified into 4 categories.
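To illustrate the general idea of attending over a graph whose nodes are recurrent hidden outputs, here is a rough sketch in plain Python. It is not the paper's implementation: the fixed-radius temporal neighborhood, the scaled dot-product scoring, the mean pooling, and the toy 4-dimensional vectors are all assumptions made for illustration, and no learned weights or actual Bi-GRU are involved.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def window_neighbors(num_nodes, radius):
    """Assumed graph: each frame is linked to frames within `radius` steps."""
    return [
        list(range(max(0, i - radius), min(num_nodes, i + radius + 1)))
        for i in range(num_nodes)
    ]


def graph_attention(nodes, neighbors):
    """Replace each node by an attention-weighted sum of its neighbors.

    Scores are scaled dot products (an assumption; the paper's scoring
    function is not described on this page).
    """
    scale = math.sqrt(len(nodes[0]))
    attended = []
    for i, h_i in enumerate(nodes):
        idx = neighbors[i]
        weights = softmax([dot(h_i, nodes[j]) / scale for j in idx])
        attended.append([
            sum(w * nodes[j][d] for w, j in zip(weights, idx))
            for d in range(len(h_i))
        ])
    return attended


def mean_pool(vectors):
    """Collapse node vectors into one utterance-level vector."""
    count = len(vectors)
    return [sum(v[d] for v in vectors) / count for d in range(len(vectors[0]))]


# Toy stand-in for Bi-GRU hidden outputs: 5 frames x 4 dimensions (made up).
hidden = [
    [0.1, 0.0, 0.2, 0.1],
    [0.3, 0.1, 0.0, 0.2],
    [0.9, 0.8, 0.7, 0.9],
    [0.2, 0.1, 0.1, 0.0],
    [0.0, 0.2, 0.1, 0.1],
]
neighbors = window_neighbors(len(hidden), radius=1)
attended = graph_attention(hidden, neighbors)
utterance_vec = mean_pool(attended)  # would feed a 4-way classifier
print(utterance_vec)
```

In a trained model the utterance-level vector would be passed to a classifier over the four emotion categories. The graph structure is the interesting design choice, because it lets a frame draw on frames that are not adjacent in a strict left-to-right or right-to-left order.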

Keywords

speech emotion recognition ｜ graph ｜ attention mechanism ｜ recurrent neural network

Authors

Publication Date

2020/10/25

Conference

Interspeech 2020

DOI

10.21437/Interspeech.2020-1733
