Abstract
Understanding the neuro-perceptual mechanism underlying humans' ability to decode emotional content in vocal signals is an important research direction. In this paper, we describe our initial research effort toward quantitatively modeling the joint dynamics between measures of vocal arousal and blood oxygen level-dependent (BOLD) signals. We use a Gaussian mixture regression approach to predict the evoked BOLD signal response as a subject is exposed to continuous vocal stimuli of varying arousal levels. The proposed framework is built upon measures of vocal arousal computed from acoustically derived features, and it achieves a reasonable predictive correlation with the true BOLD signal in seven emotion-related brain regions. A further experiment shows that a signal-derived arousal measure has greater explanatory power for the internal BOLD signal responses than human-annotated arousal when each is used to construct the Gaussian mixture regression model.
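To make the regression step concrete, the following is a minimal sketch of Gaussian mixture regression for a scalar input and a scalar output. It is an illustration under stated assumptions, not the implementation used in this work: the names `Component` and `gmr_predict` are hypothetical, the mixture parameters are taken as already fitted (for example by expectation-maximization), and the input is reduced to one arousal value and the output to one BOLD value, whereas the actual feature dimensions are not specified here. The prediction is the standard conditional expectation of the output given the input under the joint mixture.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Component:
    """One joint Gaussian over (x, y); all fields are hypothetical, pre-fitted values."""
    weight: float   # mixing weight of this component
    mean_x: float   # mean of the input (e.g., an arousal measure)
    mean_y: float   # mean of the output (e.g., a BOLD value)
    var_x: float    # variance of the input
    cov_xy: float   # covariance between input and output


def gaussian_pdf(x: float, mean: float, var: float) -> float:
    """Density of a univariate Gaussian evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def gmr_predict(x: float, components: Sequence[Component]) -> float:
    """Return E[y | x] under the joint Gaussian mixture."""
    # Weighted likelihood of x under each component's input marginal.
    likelihoods = [c.weight * gaussian_pdf(x, c.mean_x, c.var_x) for c in components]
    total = sum(likelihoods)
    if total == 0.0:
        raise ValueError("x has zero likelihood under every component")

    prediction = 0.0
    for c, lik in zip(components, likelihoods):
        responsibility = lik / total
        # Per-component linear regression of y on x.
        conditional_mean = c.mean_y + (c.cov_xy / c.var_x) * (x - c.mean_x)
        prediction += responsibility * conditional_mean
    return prediction
```

With a single component the predictor reduces to ordinary linear regression; with several components it blends the per-component regression lines according to how likely each component is to have generated the observed input.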