Encoding Individual Acoustic Features Using Dyad-Augmented Deep Variational Representations for Dialog-level Emotion Recognition｜BIIC Lab - NTHU

Abstract

Face-to-face dyadic spoken dialog is a fundamental unit of human interaction. Despite ample empirical evidence that interlocutors’ behaviors depend on each other in dyadic interactions, few technical works leverage these unique interaction dynamics to advance emotion recognition in face-to-face settings. In this work, we propose a framework for encoding an individual’s acoustic features with dyad-augmented deep networks. The dyad-augmented deep networks include a general variational deep Gaussian mixture embedding network and a dyad-specific fine-tuned network. Our framework uses the augmented dyad-specific feature space to incorporate the unique behavior patterns that emerge when two people interact. We perform dialog-level emotion regression tasks on both the CreativeIT and the NNIME databases. We obtain affect regression accuracy of 0.544 and 0.387 for activation and valence on the CreativeIT database (relative improvements of 4.41% and 4.03% over features without the dyad-specific augmentation), and 0.700 and 0.604 (relative improvements of 4.48% and 4.14%) for activation and valence on the NNIME database.
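
To make the relative-improvement figures concrete, the short sketch below backs out the baseline score implied by each reported pair. It assumes the usual definition, relative improvement = (new - baseline) / baseline. The function name `implied_baseline` is hypothetical, and the resulting baselines are derived here by arithmetic; they are not values quoted from the paper.

```python
def implied_baseline(score, relative_improvement_pct):
    """Back out baseline b from score = b * (1 + r / 100).

    Assumes relative improvement is defined as (score - b) / b.
    """
    return score / (1.0 + relative_improvement_pct / 100.0)


# (score, relative improvement in %) pairs quoted in the abstract
reported = {
    ("CreativeIT", "activation"): (0.544, 4.41),
    ("CreativeIT", "valence"): (0.387, 4.03),
    ("NNIME", "activation"): (0.700, 4.48),
    ("NNIME", "valence"): (0.604, 4.14),
}

# Derived values, not reported results
baselines = {key: implied_baseline(s, r) for key, (s, r) in reported.items()}

for (database, dimension), baseline in baselines.items():
    print(f"{database} {dimension}: implied baseline ~ {baseline:.3f}")
```

Because the reported scores and percentages are rounded, the implied baselines are only approximate.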

Figures

This is the overall framework for an individual’s dialog-level emotion recognition. We first extract low-level descriptors (LLDs). The LLDs are then encoded using two networks: a general VaDE and a dyad-specific VaDE. The general representation acts as a behavior representation learned from the entire database, while the dyad-specific representation embeds dyadic interaction dynamics.
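
The augmentation step in the caption can be illustrated with a minimal sketch. Several parts are assumptions made for illustration only: each VaDE encoder is replaced by a hypothetical linear map with a tanh nonlinearity, frame-level codes are mean-pooled to the dialog level, and the two representations are combined by concatenation. The page does not specify any of these details, and all names and array sizes are made up.

```python
import numpy as np


def encode(llds, weights):
    """Hypothetical stand-in for a VaDE encoder.

    Maps each frame of LLDs through a linear layer + tanh, then
    mean-pools over frames to get one dialog-level vector.
    """
    return np.tanh(llds @ weights).mean(axis=0)


def augment(llds, general_weights, dyad_weights):
    """Concatenate the general and dyad-specific representations (assumed)."""
    general_repr = encode(llds, general_weights)
    dyad_repr = encode(llds, dyad_weights)
    return np.concatenate([general_repr, dyad_repr])


rng = np.random.default_rng(0)
llds = rng.normal(size=(200, 45))          # 200 frames x 45 LLDs (illustrative sizes)
general_weights = rng.normal(size=(45, 10))
# Stand-in for fine-tuning on one dyad: a small perturbation of the general encoder
dyad_weights = general_weights + 0.01 * rng.normal(size=(45, 10))

features = augment(llds, general_weights, dyad_weights)
print(features.shape)
```

The resulting vector would then be passed to a regressor for activation or valence. The first half carries database-wide behavior patterns and the second half carries the dyad-specific ones.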

Keywords

variational deep embedding ｜ dyadic interaction ｜ emotion recognition ｜ feature augmentation ｜ frozen fine-tuning

Authors

Publication Date

2018/09/02

Conference

Interspeech 2018

DOI

10.21437/Interspeech.2018-1455

Publisher
