Humor has been studied for centuries. It has recently attracted technical efforts to compute humor automatically from data, especially humor in speech. How the same speech is comprehended, and whether a humorous event is perceived at all, varies with each audience member's background and experience. Most previous work on automatic humor detection or impression recognition models only the produced textual content, without considering audience responses. We collect a corpus of TED Talks that includes the audience comments for each talk. We propose a novel network architecture that treats the natural entanglement between speech transcripts and users' online feedback as an integrative graph structure, in which the speech content and the online feedback are nodes and the edges link them through their common words. Our model achieves 61.2% accuracy in three-class humor impression recognition on TED Talks; our experiments further demonstrate that viewers' comments are essential for improving recognition, and that joint content-comment modeling achieves the best recognition.
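To make the graph structure described above concrete, the following is a minimal sketch, not the architecture itself. It assumes a simple lowercase regular-expression tokenizer, treats the transcript and each comment as one node, and adds an edge whenever two nodes share at least one word. The names `tokenize` and `build_graph`, the node labels, and the choice to store the shared words on each edge are all hypothetical; the abstract does not specify tokenization, stop-word handling, or edge weighting.

```python
import re
from itertools import combinations


def tokenize(text):
    """Return the set of lowercase word tokens (an assumed, deliberately simple tokenizer)."""
    return set(re.findall(r"[a-z']+", text.lower()))


def build_graph(transcript, comments):
    """Build a content-comment graph.

    Nodes: the talk transcript plus one node per comment.
    Edges: any pair of nodes that share at least one word,
    labeled with the sorted list of shared words (an assumption).
    """
    nodes = {"talk": tokenize(transcript)}
    for i, comment in enumerate(comments):
        nodes[f"comment_{i}"] = tokenize(comment)

    edges = {}
    for a, b in combinations(nodes, 2):
        shared = nodes[a] & nodes[b]
        if shared:
            edges[(a, b)] = sorted(shared)
    return nodes, edges


nodes, edges = build_graph(
    "The joke about cats made everyone laugh",
    ["I love cats", "Great talk"],
)
```

In this toy input only the first comment shares a word with the transcript, so the graph has three nodes and a single edge. A real system would likely filter stop words and feed the resulting graph to a learned model, which this sketch does not attempt.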