Multimodal arousal rating using unsupervised fusion technique
Abstract
Arousal is essential in understanding human behavior and decision-making. In this work, we present a multimodal arousal rating framework that incorporates a minimal set of vocal and non-verbal behavior descriptors. The rating framework and fusion techniques are unsupervised in nature to ensure that they are readily applicable and interpretable. Our proposed multimodal framework improves correlation with human judgment from 0.66 (vocal-only) to 0.68 (multimodal); analysis shows that a supervised fusion framework does not improve the correlation further. Lastly, an interesting piece of empirical evidence demonstrates that the signal-based quantification of arousal achieves higher agreement with each individual rater than the agreement among the raters themselves. This further supports the idea that machine-based rating is a viable way of measuring humans' subjective internal states by observing behavior features objectively.
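
To make the unsupervised-fusion idea concrete, the sketch below z-normalizes per-session arousal scores from each modality and combines them by simple averaging, then reports Pearson correlation against the mean human rating. This is one plausible reading of the abstract, not the paper's exact pipeline; the fusion rule, function names, and synthetic data are all assumptions for illustration.

import numpy as np
from scipy.stats import pearsonr

def zscore(x):
    # Standardize per-session ratings; no labels are used (unsupervised).
    return (x - x.mean()) / x.std()

def fuse_unsupervised(vocal, nonverbal):
    # Hypothetical fusion rule: average of z-scored modality ratings.
    # The paper's actual intra-/inter-modality fusion may differ; this
    # only illustrates that the combination step needs no supervision.
    return (zscore(vocal) + zscore(nonverbal)) / 2.0

# Synthetic stand-ins for per-session arousal scores (not real data).
rng = np.random.default_rng(0)
truth = rng.normal(size=200)                     # latent arousal
vocal = truth + 0.6 * rng.normal(size=200)       # vocal-only estimate
nonverbal = truth + 0.9 * rng.normal(size=200)   # non-verbal estimate
human_mean = truth + 0.5 * rng.normal(size=200)  # averaged human ratings

fused = fuse_unsupervised(vocal, nonverbal)
print("vocal-only r:  %.2f" % pearsonr(vocal, human_mean)[0])
print("multimodal r:  %.2f" % pearsonr(fused, human_mean)[0])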
Figures
a) A workflow for computing arousal by fusing intra- (within) and inter- (cross) modality information; b) a depiction of the evaluation setup for experiments I, II, and III, where experiments I and II assess the correlation of the framework with the average of the human raters, and experiment III assesses the correlation between the framework and each individual rater.
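
The experiment-III comparison can be illustrated with the short sketch below, which contrasts the machine score's average agreement with each individual rater against the average pairwise agreement among the raters themselves. The use of Pearson correlation, the number of raters, and the synthetic ratings are assumptions for illustration, not the paper's reported setup.

import itertools
import numpy as np
from scipy.stats import pearsonr

def machine_rater_agreement(machine, raters):
    # Average Pearson correlation between the machine score and each rater.
    return float(np.mean([pearsonr(machine, r)[0] for r in raters]))

def inter_rater_agreement(raters):
    # Average pairwise Pearson correlation among the human raters.
    pairs = itertools.combinations(raters, 2)
    return float(np.mean([pearsonr(a, b)[0] for a, b in pairs]))

# Synthetic stand-ins (assumption): five noisy raters and one machine score.
rng = np.random.default_rng(1)
truth = rng.normal(size=150)
raters = [truth + 1.0 * rng.normal(size=150) for _ in range(5)]
machine = truth + 0.7 * rng.normal(size=150)

print("machine vs. each rater: %.2f" % machine_rater_agreement(machine, raters))
print("rater vs. rater:        %.2f" % inter_rater_agreement(raters))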
Keywords
behavioral signal processing | affective computing | arousal rating | multimodal signal processing
Authors
Chi-Chun Lee
Publication Date
2015/04/19
Conference
2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
DOI
10.1109/ICASSP.2015.7178982
Publisher
IEEE