Multimedia Modeling
Multimodal Model
Other: Media Processing for Interpretation
Through the Eyes of Viewers: A Comment-Enhanced Media Content Representation for TED Talks Impression Recognition
Abstract
Developing computational frameworks for personalized content query and recommendation has sparked extensive research into automatic indexing and retrieval of multimedia data. Assessing viewer impression as an index of media content is especially important because it links directly to audience preferences toward that content. Most existing machine learning frameworks assess viewer impressions by modeling the media content alone, without considering the potential usefulness of user feedback. In this work, we develop a cross-modal network that projects the multimodal media content through the viewer's comment space in order to learn a joint (content and viewer) embedding space for viewer impression recognition. Specifically, we gather a large corpus of TED talks that includes viewers' online comments for each presentation video. Our proposed cross-modal projection network achieves an unweighted average recall (UAR) of 80.8%, 79.5%, and 80.8% in binary classification tasks for three viewer impression ratings (inspiring, persuasive, and funny, respectively). Our experiments confirm the intuition that online user comments reflect the viewer impression most strongly. A more interesting finding is that projecting the content's information into the user comment space, i.e., viewing the content through the eyes of the comments, yields better recognition accuracy than simply concatenating content and comment features directly.
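For readers unfamiliar with the metric, UAR is the mean of the per-class recalls, so each class counts equally regardless of how many samples it has. The following is a minimal sketch of that computation; the function name and the toy labels are illustrative assumptions and are not taken from the paper or its data.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recall; every class has equal weight."""
    hits = defaultdict(int)    # correctly predicted samples per true class
    totals = defaultdict(int)  # number of samples per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


# Hypothetical binary labels (1 = "inspiring", 0 = "not inspiring").
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.3f}, accuracy = {accuracy:.3f}")
```

On this imbalanced toy set the recall is 3/4 for class 1 and 1/2 for class 0, so UAR is 0.625 while plain accuracy is about 0.667, which illustrates why UAR is the more conservative figure when classes are unevenly represented.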
Figures
We extract features from the audio, transcripts, and comments of TED talks, and then apply a cross-modal network to project the content space onto the comment space.
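The difference between direct concatenation and projection into the comment space can be illustrated with a deliberately simplified sketch. This is not the paper's network: it substitutes a single least-squares linear map for the learned cross-modal projection, and the feature dimensions, random data, and variable names are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pre-extracted features for 200 talks (random placeholders).
n_talks, d_content, d_comment = 200, 16, 8
content = rng.normal(size=(n_talks, d_content))  # e.g. audio + transcript
comment = rng.normal(size=(n_talks, d_comment))  # e.g. viewer comments

# Baseline: concatenate content and comment features directly.
concat_features = np.concatenate([content, comment], axis=1)

# Linear stand-in for a cross-modal projection: find W that minimizes
# ||content @ W - comment||^2, i.e. map content into the comment space.
W, *_ = np.linalg.lstsq(content, comment, rcond=None)
projected_content = content @ W

# Joint representation in which both parts live in the comment space.
joint_features = np.concatenate([projected_content, comment], axis=1)

print(concat_features.shape, joint_features.shape)
```

Either feature matrix could then be passed to a binary classifier for one impression rating. In the sketch, the only difference is whether the content enters in its own space or after being mapped into the comment space.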
Keywords
cross-modal projection | viewer impressions | TED talks
Authors
Publication Date
2019/11/18
Conference
2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)
DOI
10.1109/apsipaasc47483.2019.9023066
Publisher
IEEE