Watching movies is one of the most popular leisure activities in daily life. Box office revenue, especially in the first week, is critical for financial planning in the movie industry. Most existing box office prediction methods rely on metadata, viewers’ comments, and trailer content. However, when viewers are immersed in a movie, they naturally manifest expressions evoked by the media content. In this work, we propose a novel movie box office prediction framework that jointly models meta attributes, trailer content, and viewers’ natural expressions gathered from YouTube reactor videos. The proposed network learns discriminability-enhanced content and expression embeddings using a minimal intra-genre distance loss function. The proposed architecture achieves 79.07%, 73.79%, and 76.82% on low/high movie box office tier classification (top 30%, top 10%, and top 5%, respectively) on a large-scale trailer-reactor database. Furthermore, we analyze the effectiveness of viewers’ reactions and of our intra-genre projection.
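To make the idea of a minimal intra-genre distance loss more concrete, the following is a minimal sketch of one plausible reading: the mean squared Euclidean distance between each embedding and the centroid of its genre. This exact formulation is an assumption and is not confirmed by the abstract, and the function name `intra_genre_distance_loss` and its arguments are hypothetical.

```python
from collections import defaultdict


def intra_genre_distance_loss(embeddings, genre_ids):
    """Mean squared distance from each embedding to its genre centroid.

    Assumed formulation for illustration only; the actual loss used by the
    proposed network may differ.
    """
    # Group embedding vectors by their genre label.
    groups = defaultdict(list)
    for vec, genre in zip(embeddings, genre_ids):
        groups[genre].append(vec)

    total = 0.0
    for members in groups.values():
        dim = len(members[0])
        # Centroid of all embeddings that share this genre.
        centroid = [sum(v[d] for v in members) / len(members) for d in range(dim)]
        # Accumulate squared Euclidean distances to the centroid.
        for v in members:
            total += sum((v[d] - centroid[d]) ** 2 for d in range(dim))

    # Average over all samples, so tighter genre clusters give a lower loss.
    return total / len(embeddings)


# Toy usage: two genres, two 2-D embeddings each (values are made up).
toy_embeddings = [[0.0, 0.0], [2.0, 0.0], [5.0, 5.0], [5.0, 5.0]]
toy_genres = ["action", "action", "drama", "drama"]
toy_loss = intra_genre_distance_loss(toy_embeddings, toy_genres)
```

Minimizing such a term pulls embeddings of the same genre together. In a trained network it would be combined with the classification objective, and how the two are weighted is not stated in the abstract.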