Abstract
This paper presents an unsupervised method for producing a bounded rating of affective arousal from speech. One of the major challenges in such behavioral signal classification is the design of methods that generalize well across domains and datasets. We propose a framework that provides robustness across databases by: selecting coherent features based on empirical and theoretical evidence, fusing activation confidences from multiple features, and effectively weighting the soft-labels without knowing the true labels. Spearman's rankcorrelation (and binary classification accuracy) on four arousal databases are: 0.62 (73%), 0.77 (86%), 0.70 (82%), and 0.65 (73%).