Toward Automating Oral Presentation Scoring During Principal Certification Program Using Audio-Video Low-Level Behavior Profiles
Abstract
Effective leadership bears a strong relationship to emotion contagion, positive mood, and social intelligence. In fact, leadership quality has been shown to manifest in exhibited communicative behaviors, especially in public-speaking settings. While theories of leadership have received much attention, comparatively little progress has been made on computational methods for measuring it. In this work, we present behavioral signal processing (BSP) research to assess the quality of oral presentations in the education domain; specifically, we propose a multimodal framework for automating the scoring of pre-service school principals' oral presentations given at the yearly certification program. We extract dense unit-level audio-video features and derive session-level behavior profile representations via bag-of-words and Fisher-vector encoding. Furthermore, inspired by psychological evidence on humans' decision-making mechanisms, we design a scoring framework that uses the confidence measures output by a support vector machine classifier, trained on a distinctive subset of data samples, as the regressed scores. Our proposed approach achieves an average absolute improvement of 0.049 (9.8 percent relative) over support vector regression. We further demonstrate that the framework is reliable and consistent compared to human experts.
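The encoding step above maps a variable-length sequence of unit-level feature vectors to one fixed-length session profile. As a rough illustration (not the paper's implementation), here is a minimal Python sketch of the k-means bag-of-words variant; the codebook size, feature dimensionality, and helper names are assumptions for illustration, and the Fisher-vector variant would analogously replace the hard codeword counts with GMM gradient statistics.

```python
# Minimal sketch of session-level bag-of-words encoding. All names and
# parameters (e.g., n_clusters=50) are illustrative assumptions, not the
# paper's actual configuration.
import numpy as np
from sklearn.cluster import KMeans

def fit_codebook(train_sequences, n_clusters=50):
    """Learn a k-means codebook over all unit-level feature vectors."""
    stacked = np.vstack(train_sequences)  # (total_units, feat_dim)
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(stacked)

def bow_profile(sequence, codebook):
    """Map a variable-length (n_units, feat_dim) sequence to a fixed-length
    normalized histogram of codeword assignments (the session profile)."""
    assignments = codebook.predict(sequence)
    hist = np.bincount(assignments, minlength=codebook.n_clusters).astype(float)
    return hist / hist.sum()

# Example: three sessions of differing lengths, 20-dim unit-level features.
rng = np.random.default_rng(0)
sessions = [rng.normal(size=(n, 20)) for n in (120, 300, 75)]
codebook = fit_codebook(sessions)
profiles = np.stack([bow_profile(s, codebook) for s in sessions])
print(profiles.shape)  # (3, 50): one fixed-length profile per session
```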
Figures
A complete diagram of the computational framework and experiments in this work: dense unit-level audio-video feature extraction is performed on the raw audio-video recordings, and the k-means bag-of-words (BOW) and Fisher-vector (FV) encodings map the varying-length sequences of audio-video feature vectors to a single fixed-length vector serving as the speech-level behavior profile. We then carry out automatic scoring of the 10 ratings, i.e., 2 (original & rank-normalized) × 5 dimensions of interest.
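For the scoring step described above, the sketch below illustrates one plausible reading of the abstract: train a binary SVM on only the distinctively high- and low-rated sessions, then use its class-probability output on every session as the regressed score (in place of direct support vector regression). The quantile threshold, kernel choice, and function names are illustrative assumptions, not the authors' exact procedure.

```python
# Hedged sketch: SVM classifier confidence as a regressed score.
import numpy as np
from sklearn.svm import SVC

def confidence_scores(profiles, ratings, low_q=0.25, high_q=0.75):
    """Return scores in [0, 1] from an SVC trained on extreme-rated samples."""
    lo, hi = np.quantile(ratings, [low_q, high_q])
    mask = (ratings <= lo) | (ratings >= hi)    # the "distinctive" subset
    labels = (ratings[mask] >= hi).astype(int)  # 1 = high-rated class
    clf = SVC(kernel="rbf", probability=True, random_state=0)
    clf.fit(profiles[mask], labels)
    return clf.predict_proba(profiles)[:, 1]    # P(high) used as the score

# Toy usage: 60 session profiles with synthetic expert ratings.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 50))        # session-level behavior profiles
y = rng.uniform(0.0, 1.0, size=60)   # expert ratings
print(confidence_scores(X, y)[:5])
```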
Keywords
Behavioral signal processing (BSP) | oral presentation | multimodal signal processing | educational research
Authors
Chi-Chun Lee
Publication Date
2017/09/07
Journal
IEEE Transactions on Affective Computing (Volume 10)
DOI
10.1109/TAFFC.2017.2749569
Publisher
IEEE