Abstract
It is well known that humans are poor at detecting deception because of a natural truth bias. However, when an interlocutor (the interrogator) in a conversation is explicitly asked to assess whether his/her partner (the deceiver) is lying, this perceptual judgment depends heavily on how the interrogator interprets the context of the conversation. Although deceptive behaviors are difficult to model because they manifest heterogeneously, we hypothesize that this contextual information, i.e., whether the interrogator trusts or distrusts what his/her partner is saying, provides an important condition under which the deceiver's deceptive behaviors are more consistently distinct. In this work, we propose a Judgmental-Enhanced Automatic Deception Detection Network (JEADDN) that explicitly considers the interrogator's perceived truths and deceptions together with three types of speech-language features (acoustic-prosodic, linguistic, and conversational temporal dynamics features) extracted during a conversation. We evaluate our framework on a large Mandarin Chinese Deception Dialog Database. The results show that, on this database, our method significantly outperforms the current state-of-the-art approach, which does not condition on the interrogators' judgments. We further demonstrate that the interrogators' behaviors are important for detecting deception when the interrogators distrust the deceivers. Finally, with late fusion of audio, text, and turn-taking dynamics (TTD) features, we obtain promising deception detection accuracies of 87.27% and 94.18% under the conditions that the interrogators trust and distrust the deceivers, respectively, improvements of 7.27% and 13.57% over the model that does not consider the interrogators' judgments.
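The idea of conditioning on the interrogator's judgment and then fusing modalities at the decision level can be illustrated with a minimal sketch. This is not the JEADDN architecture: the unweighted averaging rule, the routing to a separate set of classifiers per condition, and all names (`late_fusion`, `predict`, `constant`, `toy_models`) are assumptions made purely for illustration, and the constant scorers stand in for trained audio, text, and TTD models.

```python
from statistics import fmean

MODALITIES = ("audio", "text", "ttd")


def late_fusion(scores):
    """Fuse per-modality deception probabilities by unweighted averaging.

    Averaging is an assumed fusion rule chosen for simplicity.
    """
    return fmean([scores[m] for m in MODALITIES])


def predict(sample, models_by_judgement, threshold=0.5):
    """Route a sample to the classifiers for its judgment condition,
    then fuse their outputs at the decision level."""
    # "trust" or "distrust": the interrogator's perceived judgment.
    models = models_by_judgement[sample["judgement"]]
    scores = {m: models[m](sample["features"][m]) for m in MODALITIES}
    fused = late_fusion(scores)
    label = "deception" if fused >= threshold else "truth"
    return label, fused


def constant(p):
    """Hypothetical stand-in for a trained classifier: always returns p."""
    return lambda features: p


# Toy classifiers, one set per judgment condition (illustrative values only).
toy_models = {
    "trust": {"audio": constant(0.2), "text": constant(0.4), "ttd": constant(0.3)},
    "distrust": {"audio": constant(0.9), "text": constant(0.6), "ttd": constant(0.75)},
}

sample = {
    "judgement": "distrust",
    "features": {"audio": [], "text": [], "ttd": []},
}
label, score = predict(sample, toy_models)
print(label, round(score, 2))
```

In this sketch the judgment label only selects which classifiers are applied; a real system could instead feed the judgment into a single network as an additional input.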