Toward Automating Oral Presentation Scoring During Principal Certification Program Using Audio-Video Low-Level Behavior Profiles
Abstract
Effective leadership bears a strong relationship to emotion contagion, positive mood, and social intelligence. In fact, leadership quality has been shown to manifest in exhibited communicative behaviors, especially in public-speaking settings. While theories of leadership have received much attention, comparatively little progress has been made on computational methods for measuring it. In this work, we present behavioral signal processing (BSP) research to assess the quality of oral presentations in the education domain; specifically, we propose a multimodal framework for automating the scoring of pre-service school principals' oral presentations given at the yearly certification program. We extract dense unit-level audio-video features and derive session-level behavior profile representations via bag-of-words and Fisher-vector encoding. Furthermore, inspired by psychological evidence on humans' decision-making mechanisms, we design a scoring framework that uses the confidence measures output by a support vector machine classifier, trained on a distinctive subset of data samples, as the regressed scores. Our proposed approach achieves an average absolute improvement of 0.049 (9.8 percent relative) over support vector regression. We further demonstrate that the framework is reliable and consistent compared to human experts.
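The encoding step above maps a variable-length sequence of unit-level feature vectors to one fixed-length session profile. As a rough illustration (not the paper's implementation), here is a minimal Python sketch of the k-means bag-of-words variant; the codebook size, feature dimensionality, and helper names are assumptions for illustration, and the Fisher-vector variant would analogously replace the hard codeword counts with GMM gradient statistics.

```python
# Minimal sketch of session-level bag-of-words encoding. All names and
# parameters (e.g., n_clusters=50) are illustrative assumptions, not the
# paper's actual configuration.
import numpy as np
from sklearn.cluster import KMeans

def fit_codebook(train_sequences, n_clusters=50):
    """Learn a k-means codebook over all unit-level feature vectors."""
    stacked = np.vstack(train_sequences)  # (total_units, feat_dim)
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(stacked)

def bow_profile(sequence, codebook):
    """Map a variable-length (n_units, feat_dim) sequence to a fixed-length
    normalized histogram of codeword assignments (the session profile)."""
    assignments = codebook.predict(sequence)
    hist = np.bincount(assignments, minlength=codebook.n_clusters).astype(float)
    return hist / hist.sum()

# Example: three sessions of differing lengths, 20-dim unit-level features.
rng = np.random.default_rng(0)
sessions = [rng.normal(size=(n, 20)) for n in (120, 300, 75)]
codebook = fit_codebook(sessions)
profiles = np.stack([bow_profile(s, codebook) for s in sessions])
print(profiles.shape)  # (3, 50): one fixed-length profile per session
```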
Figures
A complete diagram of the computational framework and experiments in this work: dense unit-level audio-video feature extraction is performed on the raw audio-video recordings, and the k-means bag-of-words (BOW) and Fisher-vector (FV) encodings map the varying-length sequences of audio-video feature vectors to a single fixed-length vector serving as the speech-level behavior profile. We then carry out automatic scoring of the 10 ratings, i.e., 2 (original & rank-normalized) × 5 dimensions of interest.
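For the scoring step described above, the sketch below illustrates one plausible reading of the abstract: train a binary SVM on only the distinctively high- and low-rated sessions, then use its class-probability output on every session as the regressed score (in place of direct support vector regression). The quantile threshold, kernel choice, and function names are illustrative assumptions, not the authors' exact procedure.

```python
# Hedged sketch: SVM classifier confidence as a regressed score.
import numpy as np
from sklearn.svm import SVC

def confidence_scores(profiles, ratings, low_q=0.25, high_q=0.75):
    """Return scores in [0, 1] from an SVC trained on extreme-rated samples."""
    lo, hi = np.quantile(ratings, [low_q, high_q])
    mask = (ratings <= lo) | (ratings >= hi)    # the "distinctive" subset
    labels = (ratings[mask] >= hi).astype(int)  # 1 = high-rated class
    clf = SVC(kernel="rbf", probability=True, random_state=0)
    clf.fit(profiles[mask], labels)
    return clf.predict_proba(profiles)[:, 1]    # P(high) used as the score

# Toy usage: 60 session profiles with synthetic expert ratings.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 50))        # session-level behavior profiles
y = rng.uniform(0.0, 1.0, size=60)   # expert ratings
print(confidence_scores(X, y)[:5])
```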
Keywords
Behavioral signal processing (BSP) | oral presentation | multimodal signal processing | educational research
Authors
Chi-Chun Lee
Publication Date
2017/09/07
Journal
IEEE Transactions on Affective Computing (Volume 10)
DOI
10.1109/TAFFC.2017.2749569
Publisher
IEEE