An analysis of the relationship between signal-derived vocal arousal score and human emotion production and perception
Abstract
Bone et al. recently proposed an unsupervised, signal-derived vocal arousal score (VC-AS) based on the fusion of three intuitive acoustic features (pitch, intensity, and HF500), and showed that it quantifies human perceptual ratings of arousal robustly across multiple corpora. Because the system is readily applicable, this objective quantification scheme could foreseeably serve multiple fields of behavioral science as an objective measure of affect. In this work, we investigate in detail the relationship of this signal-derived measure to both intended arousal expression (i.e., the production aspect) and perceived arousal rating (i.e., the perception aspect). On the perception side, our results on three databases (EMA, VAM, and IEMOCAP) indicate that VC-AS agrees with the mean perceptual rating at least as well as an average individual rater does. On the production side, we observe that intended arousal correlates more strongly with VC-AS than with mean perception (EMA and IEMOCAP), and that VC-AS correlates more strongly with intended arousal than with perceived arousal (EMA). These findings are surprising given that the framework is motivated by extensive affective perception studies, although they have physiological backing. Implications for the use of VC-AS in novel scientific studies (e.g., to mitigate rater subjectivity) are further discussed.
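The abstract describes fusing three per-utterance acoustic features (pitch, intensity, HF500) into a single arousal score. As a minimal sketch of that idea, the hypothetical function below z-normalizes each feature across utterances and averages the normalized values into one fused score per utterance; the function name, the unweighted averaging, and the z-score normalization are illustrative assumptions, not the authors' published fusion procedure.

```python
import numpy as np

def fused_arousal_score(pitch, intensity, hf500):
    """Hypothetical fusion sketch: z-normalize each acoustic feature
    across utterances, then average the three normalized values to
    obtain one scalar score per utterance (higher = more aroused)."""
    # Stack features as columns: shape (n_utterances, 3).
    feats = np.column_stack([pitch, intensity, hf500])
    # Z-score each feature column over the utterance set.
    z = (feats - feats.mean(axis=0)) / feats.std(axis=0)
    # Unweighted average across features gives the fused score.
    return z.mean(axis=1)

# Toy usage: three utterances whose pitch, intensity, and HF500
# all rise together should receive monotonically increasing scores.
scores = fused_arousal_score([100.0, 150.0, 200.0],
                             [60.0, 65.0, 70.0],
                             [0.1, 0.2, 0.3])
```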
Figures
Arousal rating system diagram showing progression from raw data (utterance ‘j’), to features, to individual feature scores, and finally to fused score pj.
Keywords
vocal arousal rating | affective perception | affective production
Authors
Chi-Chun Lee
Publication Date
2015/09/06
Conference
Interspeech 2015
DOI
10.21437/Interspeech.2015-325
Publisher
ISCA