Abstract
The uncertainty in modeling emotions makes speech emotion recognition (SER) systems less reliable. An intuitive way to increase trust in SER is to reject predictions with low confidence. This approach assumes that the SER system is well calibrated, i.e., that high-confidence predictions are usually correct and low-confidence predictions are usually wrong. Hence, it is desirable to calibrate the confidence of SER classifiers. We evaluate the reliability of SER systems by exploring the relationship between confidence and accuracy, using the expected calibration error (ECE) metric. We develop a multi-label variant of the post-hoc temperature scaling (TS) method to calibrate SER systems while preserving their accuracy. The best method combines an emotion co-occurrence weight penalty function, a class-balanced objective function, and the proposed multi-label TS calibration method. The experiments demonstrate the effectiveness of the developed multi-label calibration method in terms of both accuracy and ECE.
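To make the two building blocks of the abstract concrete, the following Python sketch shows a minimal post-hoc temperature scaling and ECE computation on synthetic multi-label data. It is an illustration only, not the paper's exact method: the multi-label handling here (flattening per-label binary decisions, fitting one shared temperature by grid search over the binary cross-entropy) and all names and parameters are assumptions introduced for this example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ece(confidences, correct, n_bins=10):
    """Expected calibration error: |accuracy - confidence| per bin, weighted by bin size."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            total += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return total

def fit_temperature(logits, labels, grid=np.linspace(0.5, 5.0, 91)):
    """Grid-search the temperature T minimizing binary cross-entropy on held-out data."""
    best_T, best_nll = 1.0, np.inf
    for T in grid:
        p = np.clip(sigmoid(logits / T), 1e-7, 1 - 1e-7)
        nll = -(labels * np.log(p) + (1 - labels) * np.log(1 - p)).mean()
        if nll < best_nll:
            best_T, best_nll = T, nll
    return best_T

def label_ece(logits, labels, T=1.0):
    """ECE over flattened per-label binary decisions at temperature T."""
    p = sigmoid(logits / T).ravel()
    conf = np.maximum(p, 1.0 - p)  # confidence in the thresholded decision
    correct = ((p > 0.5) == labels.ravel().astype(bool)).astype(float)
    return ece(conf, correct)

# Synthetic multi-label data: 4 hypothetical emotion labels, 5000 utterances.
rng = np.random.default_rng(0)
p_true = rng.uniform(0.05, 0.95, size=(5000, 4))
labels = (rng.random(p_true.shape) < p_true).astype(float)
# Simulate an overconfident classifier: logits three times too sharp.
logits = 3.0 * np.log(p_true / (1.0 - p_true))

T = fit_temperature(logits.ravel(), labels.ravel())
print("ECE before:", label_ece(logits, labels))
print("ECE after: ", label_ece(logits, labels, T), "at T =", T)
```

Because temperature scaling divides all logits by a single positive scalar, the thresholded (and argmax) decisions are unchanged, which is why accuracy is preserved while ECE drops for the overconfident model.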