Learning Enhanced Acoustic Latent Representation for Small Scale Affective Corpus with Adversarial Cross Corpora Integration｜BIIC Lab - NTHU

Speech and Language

Abstract

Achieving robust cross-context speech emotion recognition (SER) has become a critical next direction of research for the wide adoption of SER technology. The core challenge lies in the large variability of affective speech, which is highly contextualized. Prior work has treated this as a transfer learning problem, focusing mostly on developing domain adaptation strategies. However, many existing speech emotion corpora, even those considered large scale, are still limited in size, which leads to unsatisfactory transfer results. On the other hand, directly collecting a context-specific corpus often yields an even smaller dataset, which inevitably leads to non-robust accuracy. To mitigate this issue, we propose enhancing the affect-related variability when learning the in-context acoustic latent representation by integrating out-of-context emotion data. Specifically, we use an adversarial autoencoder network as our backbone, with multiple out-of-context emotion labels derived for each in-context sample serving as an auxiliary constraint in learning the latent representation. We extensively evaluate our framework using three in-context databases and three out-of-context databases. In this work, we not only demonstrate improved recognition accuracy but also provide a comprehensive analysis of the effectiveness of this representation learning strategy.
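The abstract does not give the training objective, so the following is only a minimal sketch of how an adversarial autoencoder objective with an auxiliary label constraint might be combined for a single in-context sample. It assumes a standard adversarial autoencoder setup (a reconstruction term plus an encoder-side adversarial term computed from a discriminator's output) and adds one cross-entropy term per out-of-context corpus. The function names, the weights `lambda_adv` and `lambda_aux`, and the way the out-of-context labels are supplied are all hypothetical and are not taken from the paper.

```python
import math


def mse(x, x_hat):
    """Mean squared reconstruction error between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def cross_entropy(probs, label, eps=1e-12):
    """Negative log-likelihood of an integer label under a probability vector."""
    return -math.log(probs[label] + eps)


def encoder_objective(x, x_hat, d_of_z, aux_probs, aux_labels,
                      lambda_adv=1.0, lambda_aux=1.0, eps=1e-12):
    """Illustrative encoder-side loss for one in-context sample.

    x, x_hat    : acoustic feature vector and its reconstruction
    d_of_z      : discriminator's probability that the latent code
                  was drawn from the prior (a value in (0, 1])
    aux_probs   : one predicted label distribution per out-of-context corpus
    aux_labels  : one derived integer label per out-of-context corpus
    lambda_*    : hypothetical weights, not values from the paper
    """
    recon = mse(x, x_hat)
    # Encoder tries to make the discriminator believe the code is a prior sample.
    adv = -math.log(d_of_z + eps)
    # Auxiliary constraint: sum of cross-entropies over out-of-context label sets.
    aux = sum(cross_entropy(p, y) for p, y in zip(aux_probs, aux_labels))
    total = recon + lambda_adv * adv + lambda_aux * aux
    return {"recon": recon, "adv": adv, "aux": aux, "total": total}


# Example: one sample with labels derived from two out-of-context corpora.
losses = encoder_objective(
    x=[0.0, 0.0],
    x_hat=[1.0, 1.0],
    d_of_z=0.5,
    aux_probs=[[0.5, 0.5], [0.25, 0.5, 0.25]],
    aux_labels=[0, 1],
)
```

In this sketch each out-of-context corpus contributes its own label set, so the latent code is pushed to stay predictive of several emotion labelings at once while still reconstructing the in-context input.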

Figures

Emotion-enriched, adversarially learned acoustic latent representations for in-context emotion data, obtained by leveraging out-of-context emotion corpora, with a neural network used as the classifier.

Keywords

speech emotion recognition ｜ adversarial network ｜ acoustic representation ｜ cross corpus learning

Authors

Publication Date

2021/11/15

Journal

IEEE Transactions on Affective Computing

DOI

10.1109/TAFFC.2021.3126145

Publisher
