Monologue versus Conversation: Differences in Emotion Perception and Acoustic Expressivity
Abstract
Advancing speech emotion recognition (SER) depends heavily on the source used to train the model, i.e., the emotional speech corpora. By permuting different design parameters, researchers have released versions of corpora that attempt to provide a better-quality source for training SER. In this work, we focus on studying communication modes of collection. In particular, we analyze the patterns of emotional speech collected during interpersonal conversations or monologues. While it is well known that conversation provides a better protocol for eliciting authentic emotion expressions, there is a lack of systematic analyses to determine whether conversational speech provides a “better-quality” source. Specifically, we examine this research question from three perspectives: perceptual differences, acoustic variability, and SER model learning. Our analyses on the MSP-Podcast corpus show that: 1) raters' consistency for conversation recordings is higher when evaluating categorical emotions, 2) the perceptions and acoustic patterns observed in conversations have properties that are better aligned with expected trends discussed in the emotion literature, and 3) a more robust SER model can be trained from conversational data. This work provides initial evidence that samples of conversations may be a better-quality source than samples of monologues for building a SER model.
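The abstract's first finding concerns annotator consistency on categorical emotion labels. As a minimal sketch (not the authors' code), one common way to quantify such consistency is Fleiss' kappa over the per-clip label counts; the emotion set, the fixed number of raters per clip, and the toy data below are illustrative assumptions, since the paper does not specify its exact agreement metric.

```python
# Hedged sketch: comparing rater consistency between conversation (Conv) and
# monologue (Mono) subsets with Fleiss' kappa. All names and data are toy
# assumptions for illustration only.
import numpy as np

EMOTIONS = ["angry", "happy", "sad", "neutral"]  # assumed category set

def fleiss_kappa(counts: np.ndarray) -> float:
    """counts[i, j] = number of raters assigning clip i to category j.
    Assumes every clip was rated by the same number of raters."""
    n_raters = counts.sum(axis=1)[0]
    # Per-clip agreement: proportion of rater pairs that agree on the clip.
    p_i = (np.square(counts).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))
    p_bar = p_i.mean()
    # Chance agreement from the marginal category distribution.
    p_j = counts.sum(axis=0) / counts.sum()
    p_e = np.square(p_j).sum()
    return (p_bar - p_e) / (1 - p_e)

def rating_counts(labels_per_clip: list) -> np.ndarray:
    """Convert raw labels (one list of rater labels per clip) into the
    clip-by-category count matrix that Fleiss' kappa expects."""
    counts = np.zeros((len(labels_per_clip), len(EMOTIONS)), dtype=int)
    for i, labels in enumerate(labels_per_clip):
        for lab in labels:
            counts[i, EMOTIONS.index(lab)] += 1
    return counts

# Toy example with 3 raters per clip; the Conv subset shows higher agreement.
conv = [["angry", "angry", "angry"], ["happy", "happy", "neutral"]]
mono = [["angry", "sad", "neutral"], ["happy", "sad", "neutral"]]
print("conversation kappa:", fleiss_kappa(rating_counts(conv)))
print("monologue kappa:", fleiss_kappa(rating_counts(mono)))
```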
Figures
Scatter plot for the categorical samples from Mono and Conv in valence-arousal (V-A) plane; each panel corresponds to a different categorical emotion and each quadrant also shows the occupancy rate for Mono (M) and Conv (C).
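The figure reports, per categorical emotion, the occupancy rate of each quadrant of the valence-arousal plane for the Mono and Conv subsets. Below is a minimal sketch (not the authors' code) of that statistic; the 1-7 rating scale centered at 4 is an assumption based on MSP-Podcast conventions, and the toy data are purely illustrative.

```python
# Hedged sketch: quadrant occupancy rates in the valence-arousal (V-A) plane.
# The scale center and toy samples are assumptions for illustration only.
import numpy as np

def quadrant_occupancy(valence, arousal, center: float = 4.0) -> dict:
    """Return the proportion of samples in each V-A quadrant.
    Q1: +V +A, Q2: -V +A, Q3: -V -A, Q4: +V -A."""
    v = np.asarray(valence, dtype=float) - center
    a = np.asarray(arousal, dtype=float) - center
    return {
        "Q1 (+V,+A)": float(np.mean((v >= 0) & (a >= 0))),
        "Q2 (-V,+A)": float(np.mean((v < 0) & (a >= 0))),
        "Q3 (-V,-A)": float(np.mean((v < 0) & (a < 0))),
        "Q4 (+V,-A)": float(np.mean((v >= 0) & (a < 0))),
    }

# Toy usage: occupancy for hypothetical "happy" samples from the Conv subset.
rng = np.random.default_rng(0)
val = rng.normal(5.5, 0.8, size=200)   # happy: high valence
aro = rng.normal(5.0, 1.0, size=200)   # happy: moderately high arousal
print(quadrant_occupancy(val, aro))
```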
Keywords
speech emotion recognition | emotion perception | acoustic expression | conversation | monologue
Authors
Woan-Shiuan Chien, Shreya G. Upadhyay, Ya-Tse Wu, Bo-Hao Su, Chi-Chun Lee
Publication Date
2022/10/18
Conference
ACII
2022 10th International Conference on Affective Computing and Intelligent Interaction (ACII)
DOI
10.1080/02699931.2018.1454403
Publisher
IEEE