Modeling Perceivers' Neural-Responses Using Lobe-Dependent Convolutional Neural Network to Improve Speech Emotion Recognition
Abstract
Developing automatic emotion recognition by modeling expressive behaviors is becoming crucial in enabling the next-generation design of human-machine interfaces. With the availability of functional magnetic resonance imaging (fMRI), researchers have also conducted studies toward a quantitative understanding of the vocal emotion perception mechanism. In this work, our aim is two-fold: 1) investigating whether neural responses can be used to automatically decode the emotion labels of vocal stimuli, and 2) combining acoustic and fMRI features to improve speech emotion recognition accuracy. We introduce a novel framework of lobe-dependent convolutional neural network (LD-CNN) to provide better modeling of perceivers' neural responses to vocal emotion. Furthermore, by fusing LD-CNN with acoustic features, we demonstrate an overall accuracy of 63.17% in a four-class emotion recognition task (9.89% and 14.42% relative improvements over the acoustic-only and fMRI-only features). Our analysis further shows that the temporal lobe possesses the most information for decoding emotion labels; the fMRI and acoustic information are complementary to each other, where neural responses and acoustic features are better at discriminating along the valence and activation dimensions, respectively.
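The following is a minimal, hypothetical sketch of the lobe-dependent fusion idea described above: one small CNN per brain lobe operating on that lobe's voxels, whose learned representations are concatenated with an utterance-level acoustic feature vector (e.g., a Fisher-vector encoding) before a four-class emotion classifier. The lobe list, layer sizes, and input shapes are illustrative assumptions, not the architecture reported in the paper.

```python
# Hypothetical sketch of a lobe-dependent CNN (LD-CNN) fused with acoustic
# features; all shapes and hyperparameters are assumptions for illustration.
import torch
import torch.nn as nn

LOBES = ["frontal", "temporal", "parietal", "occipital"]  # assumed lobe partition


class LobeBranch(nn.Module):
    """A small 3-D CNN applied to the voxels of a single lobe."""

    def __init__(self, out_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),  # pool each channel to a single value
            nn.Flatten(),
            nn.Linear(8, out_dim),
            nn.ReLU(),
        )

    def forward(self, x):  # x: (batch, 1, D, H, W) lobe volume
        return self.net(x)


class LDCNNFusion(nn.Module):
    """Fuse lobe-dependent fMRI embeddings with acoustic features."""

    def __init__(self, acoustic_dim: int = 128, n_classes: int = 4):
        super().__init__()
        self.branches = nn.ModuleDict({lobe: LobeBranch() for lobe in LOBES})
        fused_dim = 32 * len(LOBES) + acoustic_dim
        self.classifier = nn.Linear(fused_dim, n_classes)

    def forward(self, lobe_volumes: dict, acoustic_feats: torch.Tensor):
        # One embedding per lobe, concatenated with the acoustic feature vector.
        lobe_embs = [self.branches[lobe](lobe_volumes[lobe]) for lobe in LOBES]
        fused = torch.cat(lobe_embs + [acoustic_feats], dim=1)
        return self.classifier(fused)


if __name__ == "__main__":
    # Toy usage: random tensors stand in for real fMRI volumes and
    # Fisher-vector acoustic features (placeholder shapes).
    model = LDCNNFusion()
    volumes = {lobe: torch.randn(2, 1, 16, 16, 16) for lobe in LOBES}
    audio = torch.randn(2, 128)
    logits = model(volumes, audio)  # (2, 4) class scores
    print(logits.shape)
```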
Figures
A schematic of multimodal emotion recognition from audio (Fisher-vector feature representation) and fMRI (lobe-dependent convolutional neural network-derived feature representation) data
Keywords
speech emotion recognition | convolutional neural network (CNN) | affective computing | fMRI
Authors
Ya-Tse Wu, Hsuan-Yu Chen, Chi-Chun Lee
Publication Date
2017/08/20
Conference
Interspeech 2017
DOI
10.21437/Interspeech.2017-562
Publisher
ISCA