Speaking State Decoder with Transition Detection for Next Speaker Prediction
Abstract
Next speaker prediction and turn change prediction are two important tasks in group interaction and human-agent interaction. To carry out a fluent conversation, we need to identify who is currently speaking, who the next speaker is, and when the next speaker starts to speak. These questions are computationally formulated as the task of next speaker prediction. Behaviors such as gaze direction, speaking prosody, and gestures have been modeled to perform this task. In this work, we propose a decoder-based speaking state decoder (SSD) for next speaker prediction, which jointly considers current behavior features, the past history of talking, and a speaking state transition detection model. Our decoder approach achieves an unweighted average recall (UAR) of 78.11% on next speaker prediction, a 3.41% improvement over the champion model of the MultiMediate challenge 2021.
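For readers unfamiliar with the reported metric, the following is a minimal sketch of how unweighted average recall is commonly computed: recall is calculated per class and then averaged, so a rare class (a participant who does speak next) counts as much as a frequent one. The function name, the binary label encoding, and the example labels are illustrative assumptions, not taken from the paper or the challenge evaluation code.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recall, weighting every class equally."""
    hits = defaultdict(int)    # correct predictions per true class
    totals = defaultdict(int)  # number of samples per true class
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        if truth == pred:
            hits[truth] += 1
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Hypothetical labels (assumption): 1 = participant speaks next, 0 = does not.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.4f}, accuracy = {accuracy:.4f}")
```

On imbalanced data such as this toy example, UAR and plain accuracy differ, which is one reason UAR is often preferred when most frames contain no speaker change.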
Figures
Structure of speaking state decoder (SSD).
Keywords
next speaker prediction | transition detection | attention mechanism | decoder
Authors
Chi-Chun Lee
Publication Date
2023/08/22
Conference
Interspeech 2023
Publisher
ISCA