Deep Representation Learning for Affective Speech Signal Analysis and Processing: Preventing unwanted signal disparities
Abstract
Speech emotion recognition (SER) is an important research area, with direct impacts in applications of our daily lives, spanning education, health care, security and defense, entertainment, and human–computer interaction. The advances in many other speech signal modeling tasks, such as automatic speech recognition, text-to-speech synthesis, and speaker identification, have led to the current proliferation of speech-based technology. Incorporating SER solutions into existing and future systems can take these voice-based solutions to the next level. Speech is a highly nonstationary signal, with dynamically evolving spatial-temporal patterns. It often requires a sophisticated representation modeling framework to develop algorithms capable of handling real-life complexities. 
Figures
An overview of a deep representation learning scheme for SER, and the three real-world modeling challenges: robustness, generalization, and usability.
Keywords
speech emotion recognition | deep representation learning | robustness | generalization | usability
Authors
Chi-Chun Lee, Jeng-Lin Li, Bo-Hao Su
Publication Date
2021/10/27
Journal
IEEE Signal Processing Society
IEEE Signal Processing Magazine
DOI
10.1109/MSP.2021.3105939
Publisher
IEEE