Abstract
Developing systems that automatically recognize subjective ratings from behavioral data, collected with audio-video recording devices, has been at the forefront of many interdisciplinary research efforts between behavioral science and engineering, with the goal of providing objective decision-making tools. In the field of education, pre-service training programs for school principals have become more critical due to the increasingly complex and demanding nature of the job. In this work, we collaborate with researchers from the National Academy for Educational Research to develop a system that assesses pre-service principals’ oral presentation skills. Our recognition framework incorporates multimodal behavioral data, i.e., audio and video information. With proper handling of label normalization and binarization, we achieve unweighted average recalls of (0.63, 0.70, 0.67) with the original labels and (0.67, 0.68, 0.67) with the rank-normalized labels when differentiating between high- and low-performing scores. The three values in each tuple correspond to the three oral presentation rating dimensions used in this work: Dim1 (content + structure + word), Dim2 (prosody), and Dim3 (total score).
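To make the evaluation terms concrete, the following is a minimal sketch of rank normalization, binarization, and unweighted average recall (the mean of per-class recalls). It is illustrative only: the abstract does not specify how ranks are computed or where the high/low threshold lies, so the average-rank handling of ties, the median split, the function names, and the toy scores are all assumptions.

```python
from collections import defaultdict
from statistics import median


def rank_normalize(scores):
    """Map raw scores to (0, 1] by rank; tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next score ties with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # 1-based average rank of the tie group
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank / len(scores)
        start = end + 1
    return ranks


def binarize(values):
    """Assumed median split: 1 (high) if strictly above the median, else 0 (low)."""
    cut = median(values)
    return [1 if v > cut else 0 for v in values]


def unweighted_average_recall(y_true, y_pred):
    """Average the recall of each class, giving every class equal weight."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        hits[t] += int(t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical ratings and classifier outputs, for illustration only.
raw_scores = [62, 75, 75, 88, 91, 70]
labels = binarize(rank_normalize(raw_scores))
predictions = [0, 1, 0, 1, 1, 1]
uar = unweighted_average_recall(labels, predictions)
```

Because each class contributes equally to the average, this metric is not inflated by always predicting the majority class, which matters when a binarization threshold yields unbalanced high and low groups.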