Abstract
The overall interaction atmosphere often results from a complex interplay between individual interlocutors' behavior expressions and the joint manifestation of dyadic interaction dynamics. There is very limited work, if any, that has computationally analyzed a human interaction at the dyad level. Hence, in this work, we propose to compute an extensive novel set of features representing multi-faceted aspects of a dyadic interaction. These features are grouped into two broad categories, expressive and structural behavior dynamics, which together capture within-speaker behavior manifestation, inter-speaker behavior dynamics, and durational and transitional statistics, providing holistic behavior quantification at the dyad level. We carry out an experiment on recognizing the targeted affective atmosphere using the proposed expressive and structural behavior dynamics features derived from audio and video modalities. Our experiment shows that the inclusion of both expressive and structural behavior dynamics is essential in achieving promising recognition accuracies across six different classes (72.5%), where structural-based features improve the recognition rates for the sad and surprise classes. Further analyses reveal important aspects of multimodal behavior dynamics within dyadic interactions that are related to the affective atmospheric scene.