Speaking State Decoder with Transition Detection for Next Speaker Prediction｜BIIC Lab - NTHU

Small Group

Abstract

Next speaker prediction and turn change prediction are two important tasks in group interaction and human-agent interaction. To carry out a fluent conversation, we need to identify who is currently speaking, who the next speaker is, and when the next speaker starts to speak. These questions are computationally framed as the task of next speaker prediction. Behaviors such as gaze direction, speaking prosody, and gestures have been modeled to perform this task. In this work, we propose a decoder-based speaking state decoder (SSD) for next speaker prediction, which jointly considers current behavior features, the past history of talking, and a speaking state transition detection model. Our decoder approach achieves next speaker prediction with an unweighted average recall (UAR) of 78.11%, a 3.41% improvement over the champion model in the MultiMediate Challenge 2021.
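The abstract reports its result as UAR (unweighted average recall), which is the mean of the per-class recalls, so a rare class such as "speaks next" counts as much as a frequent one. The following is a minimal sketch of that metric only; the function name and the toy labels are assumptions made for illustration and are not taken from the paper or its data.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls, weighting every class equally."""
    hits = defaultdict(int)    # correct predictions per true class
    totals = defaultdict(int)  # number of samples per true class
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        if truth == pred:
            hits[truth] += 1
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical labels: 1 = participant speaks next, 0 = does not.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.3f}, accuracy = {accuracy:.3f}")
```

On these toy labels the recall is 3/4 for class 0 and 1/2 for class 1, so UAR is 0.625 while plain accuracy is 4/6; the gap shows why UAR is the more informative score when non-speaking frames dominate.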

Figures

Structure of speaking state decoder (SSD).

Keywords

next speaker prediction ｜ transition detection ｜ attention mechanism ｜ decoder

Authors

Publication Date

2023/08/22

Conference

Interspeech 2023

Publisher

RESEARCH
