Abstract
Speech emotion recognition (SER) is an important research area with direct impact on everyday applications spanning education, health care, security and defense, entertainment, and human–computer interaction. Advances in other speech signal modeling tasks, such as automatic speech recognition, text-to-speech synthesis, and speaker identification, have driven the current proliferation of speech-based technology, and incorporating SER into existing and future systems can substantially extend the capabilities of these voice-based solutions. Speech is a highly nonstationary signal with dynamically evolving spatio-temporal patterns, and modeling it often requires a sophisticated representation framework to develop algorithms capable of handling real-life complexity.