Behavior Computing
States and Traits
Speech and Language
Learning Enhanced Acoustic Latent Representation for Small Scale Affective Corpus with Adversarial Cross Corpora Integration
Abstract
Achieving robust cross-context speech emotion recognition (SER) has become a critical next direction of research for the wide adoption of SER technology. The core challenge lies in the large variability of affective speech, which is highly contextualized. Prior work has treated this as a transfer learning problem, focusing mostly on developing domain adaptation strategies. However, many existing speech emotion corpora, even those considered large scale, are still limited in size, which leads to unsatisfactory transfer results. On the other hand, directly collecting a context-specific corpus often yields an even smaller data set and therefore inevitably non-robust accuracy. To mitigate this issue, we propose enhancing the affect-related variability when learning the in-context acoustic latent representation by integrating out-of-context emotion data. Specifically, we use an adversarial autoencoder network as our backbone, with multiple out-of-context emotion labels derived for each in-context sample that serve as an auxiliary constraint in learning the latent representation. We extensively evaluate our framework using three in-context databases with three out-of-context databases. In this work, we not only demonstrate improved recognition accuracy but also provide a comprehensive analysis of the effectiveness of this representation learning strategy.
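As a rough illustration of the idea in the abstract, the sketch below shows one way the loss terms of an adversarial autoencoder with auxiliary emotion heads could be combined. It is not the authors' implementation: the linear encoder, decoder, discriminator and heads, the Gaussian prior, the class counts, the random placeholder labels, and the weights `lam_adv` and `lam_aux` are all assumptions made for illustration, and only a forward pass is computed (no training loop).

```python
import math
import random

def matmul(a, b):
    # Plain-Python matrix product of two lists of rows.
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def softmax(row):
    m = max(row)
    e = [math.exp(v - m) for v in row]
    s = sum(e)
    return [v / s for v in e]

def rand_matrix(rng, rows, cols, scale=0.1):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def aae_losses(x, aux_labels, params, rng, lam_adv=1.0, lam_aux=1.0):
    """Forward pass of a toy linear adversarial autoencoder (hypothetical setup)."""
    eps = 1e-12
    n, latent_dim = len(x), len(params["dec"])
    z = matmul(x, params["enc"])        # latent codes of in-context samples
    x_hat = matmul(z, params["dec"])    # reconstruction of the acoustic features
    recon = sum((a - b) ** 2 for ra, rb in zip(x, x_hat)
                for a, b in zip(ra, rb)) / (n * len(x[0]))

    # Discriminator separates prior samples ("real") from encoded codes ("fake").
    z_prior = [[rng.gauss(0.0, 1.0) for _ in range(latent_dim)] for _ in range(n)]
    d_real = [sigmoid(r[0]) for r in matmul(z_prior, params["disc"])]
    d_fake = [sigmoid(r[0]) for r in matmul(z, params["disc"])]
    disc = (-sum(math.log(p + eps) for p in d_real) / n
            - sum(math.log(1.0 - p + eps) for p in d_fake) / n)
    gen = -sum(math.log(p + eps) for p in d_fake) / n  # encoder tries to fool it

    # Auxiliary constraint: one classification head per out-of-context label set.
    aux = 0.0
    for head, labels in zip(params["heads"], aux_labels):
        probs = [softmax(r) for r in matmul(z, head)]
        aux += -sum(math.log(p[y] + eps) for p, y in zip(probs, labels)) / n
    aux /= len(params["heads"])

    return {"recon": recon, "disc": disc, "gen": gen, "aux": aux,
            "encoder_total": recon + lam_adv * gen + lam_aux * aux}

def demo(seed=0):
    rng = random.Random(seed)
    n, feat_dim, latent_dim = 8, 6, 3
    class_counts = [4, 3, 5]  # made-up label sets, one per out-of-context corpus
    x = rand_matrix(rng, n, feat_dim, scale=1.0)  # stand-in acoustic features
    # Random placeholders for the out-of-context labels of each in-context sample.
    aux_labels = [[rng.randrange(k) for _ in range(n)] for k in class_counts]
    params = {
        "enc": rand_matrix(rng, feat_dim, latent_dim),
        "dec": rand_matrix(rng, latent_dim, feat_dim),
        "disc": rand_matrix(rng, latent_dim, 1),
        "heads": [rand_matrix(rng, latent_dim, k) for k in class_counts],
    }
    return aae_losses(x, aux_labels, params, rng)

if __name__ == "__main__":
    for name, value in demo().items():
        print(f"{name}: {value:.4f}")
```

In this sketch the encoder would be updated to minimize `encoder_total`, while the discriminator would be updated separately to minimize `disc`. The three heads mirror the three out-of-context databases mentioned in the abstract. How the paper actually derives the out-of-context labels and weights the terms is not stated here.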
Figures
Emotion-enriched, adversarially learned acoustic latent representations for in-context emotion data, learned by leveraging out-of-context emotion corpora and trained with a neural network as a classifier.
Keywords
speech emotion recognition | adversarial network | acoustic representation | cross corpus learning
Authors
Chi-Chun Lee
Publication Date
2021/11/15
Journal
IEEE Transactions on Affective Computing
DOI
10.1109/TAFFC.2021.3126145
Publisher
IEEE