Abstract
Emotion recognition in conversation (ERC) is an increasingly important topic as it improves user experiences when adopting speech technology in our daily life. In this work, we propose an emotion-shift aware decoder based on formulation of conditional random field (CRF) to address the perennial issue of poor performances when handling emotion shift in dialogues. We conduct speech emotion recognition experiments on the IEMOCAP and the NNIME and achieve a 74.47% unweighted accuracy, which is the current state-of-the-art performance in the four class emotion recognition on the IEMOCAP. This is also the first work for ERC on the NNIME that obtains an outstanding performance of 61.02% weighted accuracy.