Through the Eyes of Viewers: A Comment-Enhanced Media Content Representation for TED Talks Impression Recognition｜BIIC Lab - NTHU

Multimodal Model

Other: Media Processing for Interpretation



Abstract

Developing computational frameworks for personalized content query and recommendation has sparked extensive research into the automatic indexing and retrieval of multimedia data. Assessing viewer impression as an index of media content is especially important because it links directly to audience preferences toward that content. Most existing machine learning frameworks model the media content alone, without considering the potential usefulness of user feedback for assessing viewer impressions. In this work, we develop a cross-modal network that projects the multimodal media content through the viewers' comment space to learn a joint (content and viewer) embedding space for viewer impression recognition. Specifically, we gather a large corpus of TED talks, including viewers' online comments for each presentation video. Our proposed cross-modal projection network achieves unweighted average recalls (UARs) of 80.8%, 79.5%, and 80.8% in binary classification tasks for three viewer impression ratings (inspiring, persuasive, and funny, respectively). Our experiments confirm the intuition that online user comments reflect viewer impressions most directly. More interestingly, they show that projecting the content information into the user comment space, i.e., viewing it through the eyes of the comments, yields higher recognition accuracy than simply concatenating content and comment features.
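Unweighted average recall (UAR), the metric reported above, is the mean of the per-class recalls, so each class counts equally regardless of how many talks it contains. The following is a minimal Python sketch, assuming integer class labels; the function name `unweighted_average_recall` is hypothetical and not taken from the authors' code.

```python
from typing import Sequence


def unweighted_average_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of per-class recalls over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("UAR is undefined for empty input")
    recalls = []
    for c in sorted(set(y_true)):
        # Indices of samples whose true label is class c
        idx = [i for i, t in enumerate(y_true) if t == c]
        # Recall for class c: fraction of those samples predicted as c
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    # Unweighted mean: every class contributes equally
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # A classifier that always predicts 1 on an imbalanced binary set
    print(unweighted_average_recall([1, 1, 1, 1, 0, 0], [1, 1, 1, 1, 1, 1]))  # 0.5
```

In this example plain accuracy would be 4/6, but UAR is 0.5 because the minority class is never recalled, which is why UAR is a common choice when class sizes are imbalanced.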

Figures

We extract features from the audio, transcripts, and comments of TED talks, and then apply a cross-modal network to project the content space onto the comment space.
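This page does not describe the layers of the authors' cross-modal network. As a loose, hypothetical illustration of the idea of projecting content features into the comment space, the sketch below swaps in a linear ridge-regression map fitted from per-talk content features to per-talk comment features. It also shows the concatenation baseline that the abstract compares against. The fixed-length feature vectors, the function names, and the `l2` parameter are all assumptions made for this example.

```python
import numpy as np


def fit_linear_projection(content: np.ndarray, comments: np.ndarray, l2: float = 1.0) -> np.ndarray:
    """Fit W so that content @ W approximates comments (ridge regression).

    content:  (n_talks, d_content) per-talk content features (hypothetical)
    comments: (n_talks, d_comment) per-talk comment features (hypothetical)
    Returns W with shape (d_content, d_comment).
    """
    d = content.shape[1]
    # Closed-form ridge solution: W = (X^T X + l2 * I)^-1 X^T Y
    gram = content.T @ content + l2 * np.eye(d)
    return np.linalg.solve(gram, content.T @ comments)


def project_to_comment_space(content: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Map content features into the comment feature space."""
    return content @ W


def concat_baseline(content: np.ndarray, comments: np.ndarray) -> np.ndarray:
    """Baseline representation: plain concatenation of content and comment features."""
    return np.concatenate([content, comments], axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 8))      # stand-in content features
    W_true = rng.normal(size=(8, 4))
    Y = X @ W_true                     # stand-in comment features
    W = fit_linear_projection(X, Y, l2=1e-8)
    print(project_to_comment_space(X, W).shape)  # (200, 4)
    print(concat_baseline(X, Y).shape)           # (200, 12)
```

A learned network would typically replace the linear map with nonlinear layers and could be trained together with the impression classifier. The model described on this page may differ from this sketch in both respects.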

Keywords

cross-modal projection ｜ viewer impressions ｜ TED talks

Authors

Publication Date

2019/11/18

Conference

2019 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC)

DOI

10.1109/apsipaasc47483.2019.9023066

Publisher

RESEARCH

Related Research