Enhancement of Automatic Oral Presentation Assessment System Using Latent N-Grams Word Representation and Part-of-Speech Information
Abstract
Developing an automatic oral presentation assessment system is important for educational researchers who assess and train the communication ability of school leaders. In this work, we aim to enhance the performance of an existing presentation scoring system for pre-service school principals by including lexical information as an additional modality. We propose to use latent n-grams distributed word representations and weighted counts of part-of-speech tags to derive features from the speech transcripts in the National Academy for Educational Research (NAER) oral presentation database. We carry out two experiments: Exp I is a binary classification task between high- and low-performing speeches, and Exp II is a continuous scoring task on the entire dataset. In Exp I, the proposed framework achieves a competitive accuracy of 0.79. In Exp II, by fusing this text-based system with the existing audio-video-based system, we obtain a Spearman correlation of 0.641 (an 18.05% relative improvement). The two experiments demonstrate the modeling power of the proposed framework and indicate that the lexical modality carries substantial complementary information for assessing the quality of an oral presentation.
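As an illustration of the Exp II metric, the following is a minimal sketch of how a Spearman correlation and a relative improvement could be computed. The score lists are hypothetical toy values, not data from the NAER database, and the function names are ours. The abstract does not state the baseline correlation; the value computed at the end is back-calculated under the assumption that relative improvement means (new - baseline) / baseline.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def relative_improvement(new, baseline):
    """Assumed definition: (new - baseline) / baseline."""
    return (new - baseline) / baseline


# Hypothetical human ratings and system predictions (toy values only).
human_scores = [3.0, 4.5, 2.0, 5.0, 3.5]
predicted_scores = [2.8, 3.7, 2.5, 4.9, 3.9]
rho = spearman(human_scores, predicted_scores)

# Baseline implied by the reported 0.641 and 18.05% under the assumed definition.
implied_baseline = 0.641 / (1 + 0.1805)
```

Under that assumed definition, the audio-video-only system would have a correlation of roughly 0.54; this is an inference from the two reported numbers, not a figure given in the abstract.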
Figures
The figure depicts the complete workflow of our lexical feature extraction: the top portion shows the part-of-speech-based feature extraction, and the bottom portion shows the latent n-grams distributed word representations. We further present the performance of both feature sets in the binary classification experiment and the continuous scoring assessment.
Keywords
behavioral signal processing | multimodal signal processing | educational research | natural language processing
Authors
Chi-Chun Lee
Publication Date
2016/09/08
Conference
Interspeech 2016
DOI
10.21437/Interspeech.2016-400
Publisher
ISCA