Deriving Dyad-Level Interaction Representation Using Interlocutors Structural and Expressive Multimodal Behavior Features
Abstract
The overall interaction atmosphere is often the result of a complex interplay between individual interlocutors' behavior expressions and the joint manifestation of dyadic interaction dynamics. Very limited work, if any, has computationally analyzed human interaction at the dyad level. Hence, in this work, we propose to compute an extensive novel set of features representing multi-faceted aspects of a dyadic interaction. These features are grouped into two broad categories, expressive and structural behavior dynamics, each capturing information about within-speaker behavior manifestation, inter-speaker behavior dynamics, and durational and transitional statistics, providing holistic behavior quantification at the dyad level. We carry out an experiment on recognizing the targeted affective atmosphere using the proposed expressive and structural behavior dynamics features derived from audio and video modalities. Our experiment shows that including both expressive and structural behavior dynamics is essential for achieving promising recognition accuracy across six different classes (72.5%), where structural features improve the recognition rates for the sad and surprise classes. Further analyses reveal important aspects of multimodal behavior dynamics within dyadic interactions that relate to the affective atmospheric scene.
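To make the structural behavior dynamics concrete, the sketch below illustrates how durational and transitional statistics might be computed at the dyad level from per-frame behavior state sequences (the preprocessing that produces these states is described in the figure caption below). The three state names, the inter-speaker co-occurrence term, and all helper functions are hypothetical illustrations under our own assumptions, not the authors' implementation.

```python
import itertools
import numpy as np

# Hypothetical three behavior states per frame (produced by the preprocessing
# step described in the figure below); the concrete state definitions are an
# assumption for illustration only.
STATES = ["silent", "neutral", "active"]

def durational_stats(states):
    """Mean and standard deviation of run lengths for each state."""
    runs = {s: [] for s in STATES}
    for state, group in itertools.groupby(states):
        runs[state].append(len(list(group)))
    feats = []
    for s in STATES:
        lengths = runs[s] or [0]
        feats.extend([float(np.mean(lengths)), float(np.std(lengths))])
    return feats

def transitional_stats(states):
    """Normalized state-transition counts (a flattened 3x3 transition matrix)."""
    idx = {s: i for i, s in enumerate(STATES)}
    counts = np.zeros((len(STATES), len(STATES)))
    for a, b in zip(states[:-1], states[1:]):
        counts[idx[a], idx[b]] += 1
    total = counts.sum() or 1.0
    return (counts / total).flatten().tolist()

def dyad_structural_features(states_a, states_b):
    """Concatenate per-speaker statistics with a simple inter-speaker term."""
    per_speaker = [durational_stats(s) + transitional_stats(s)
                   for s in (states_a, states_b)]
    # Fraction of frames in which both speakers occupy the same state: a crude
    # stand-in for inter-speaker behavior dynamics, not the paper's exact feature.
    overlap = float(np.mean([a == b for a, b in zip(states_a, states_b)]))
    return per_speaker[0] + per_speaker[1] + [overlap]
```

A dyad-level feature vector for the recognition experiment would then concatenate such structural statistics with the expressive behavior features from both interlocutors and both modalities.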
Figures
A schematic of our complete multimodal structural and expressive features. The framework involves two steps: 1) preprocessing to assign each audio and video frame to one of three distinct states, and 2) computing structural and expressive features that capture individual speakers' behavioral manifestation, inter-speaker behavioral dynamics, and durational and transitional statistics.
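Step 1 of the framework assigns each frame to one of three discrete states. Since the exact state definitions are not spelled out here, the sketch below assumes a simple percentile-threshold discretization of a continuous per-frame behavior signal (e.g., vocal energy or facial movement magnitude); the thresholds and state labels are illustrative assumptions only.

```python
import numpy as np

def assign_three_states(signal, low_pct=33, high_pct=66):
    """Discretize a continuous per-frame behavior signal into three states.

    The percentile cut-offs and state labels are illustrative assumptions,
    not the thresholds used in the paper.
    """
    sig = np.asarray(signal, dtype=float)
    low, high = np.percentile(sig, [low_pct, high_pct])
    states = np.full(sig.shape, "neutral", dtype=object)  # mid-activity frames
    states[sig < low] = "silent"                           # low-activity frames
    states[sig >= high] = "active"                         # high-activity frames
    return states.tolist()

# Example: discretize a stand-in per-frame energy contour for one speaker
energy = np.random.rand(500)
frame_states = assign_three_states(energy)
```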
Keywords
affect recognition | face-to-face interaction | multimodal behaviors | dyad-level affect
Authors
Chi-Chun Lee
Publication Date
2017/08/20
Conference
Interspeech 2017
DOI
10.21437/Interspeech.2017-569
Publisher
ISCA