A multimodal approach for automatic assessment of school principals' oral presentation during pre-service training program｜BIIC Lab - NTHU

Abstract

Developing systems that automatically recognize subjective ratings from behavioral data collected with audio-video recording devices has been at the forefront of many interdisciplinary research efforts between behavioral science and engineering, with the aim of providing objective decision-making tools. In the field of education, pre-service training programs for school principals have become more critical due to the increasingly complex and demanding nature of the job. In this work, we collaborate with researchers from the National Academy for Educational Research to develop a system that assesses pre-service principals’ oral presentation skills. Our recognition framework incorporates multimodal behavioral data, i.e., audio and video information. With proper handling of label normalization and binarization, we achieve an unweighted average recall of (0.63, 0.70, 0.67) or (0.67, 0.68, 0.67), depending on the choice of labeling scheme (original or rank-normalized), in differentiating between high- and low-performing scores. The three oral presentation rating dimensions used in this work are Dim1: content + structure + word, Dim2: prosody, and Dim3: total score.
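The abstract names three ingredients without giving their details: rank normalization of the labels, binarization into high versus low, and unweighted average recall (UAR) as the metric. The following is a minimal sketch of one common way to realize each. The average-rank tie handling, the median split, and all function names are assumptions made for illustration, not the procedure reported in the paper.

```python
from statistics import median


def rank_normalize(scores):
    """Map raw scores to (0, 1] by rank; tied scores share their mean rank.

    Assumption: average-rank tie handling. The paper's scheme is not given here.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return [r / len(scores) for r in ranks]


def binarize(scores):
    """Label 1 (high) if strictly above the median, else 0 (low).

    Assumption: a median split; other thresholds are equally plausible.
    """
    m = median(scores)
    return [1 if s > m else 0 for s in scores]


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls, so each class counts equally."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)
```

Because UAR averages recall over classes rather than over samples, a classifier that always predicts the majority class scores 0.5 on a two-class task regardless of class imbalance, which is why it is a natural choice for a high-versus-low split.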

Figures

Our experimental setup: the raw recording is first manually segmented into utterances, and each utterance is run through the audio and video feature extractors. The video-only system is trained on individual utterances, and the audio-only system is trained on the entire speech using a second stage of statistical functional computation. The classifier of choice is a support vector machine, and multimodal fusion is done by training logistic regression on the decision scores of each modality.
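Two steps in this setup can be sketched in a few lines: collapsing utterance-level features into one speech-level vector with statistical functionals, and fusing per-modality decision scores with logistic regression. The sketch below is illustrative only. The choice of functionals (mean, standard deviation, minimum, maximum), the gradient-descent fitting routine, its hyperparameters, and all names are assumptions. The support vector machine stage is omitted, and its decision scores are taken as given inputs.

```python
import math
from statistics import mean, pstdev


def functionals(utterance_features):
    """Collapse per-utterance feature vectors into one speech-level vector.

    Assumption: mean, std, min, max per feature; the paper's set is not listed here.
    """
    out = []
    for col in zip(*utterance_features):  # iterate over feature dimensions
        out += [mean(col), pstdev(col), min(col), max(col)]
    return out


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_fusion(scores, labels, lr=0.5, epochs=2000):
    """Fit logistic regression on per-modality decision scores.

    Each row of `scores` is (audio_score, video_score), e.g. SVM decision values.
    Plain batch gradient descent on the log loss; hyperparameters are arbitrary.
    """
    w = [0.0] * len(scores[0])
    b = 0.0
    n = len(scores)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(scores, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def fuse(w, b, x):
    """Return the fused high (1) / low (0) decision for one score pair."""
    return 1 if sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= 0.5 else 0
```

Fusing at the decision-score level keeps the two modalities independent until the last step, so the audio system can operate on the whole speech while the video system operates on individual utterances, as the setup describes.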

Keywords

behavioral signal processing (BSP) ｜ oral presentation ｜ multimodal signal processing ｜ education research

Authors

Publication Date

2015/09/06

Conference

Interspeech 2015

DOI

10.21437/Interspeech.2015-545

Publisher
