Abstract
In this paper we tackle the Cold sub-challenge of the INTERSPEECH 2017 ComParE Challenge, whose goal is to determine whether a given speech utterance was produced by a speaker suffering from a cold. We present two frameworks. The first is based on a neural network autoencoder trained with two loss functions: the standard reconstruction error of an unsupervised autoencoder, and a hinge loss applied to the middle layer that pulls utterances spoken under the same condition toward similar identity codes. Classification is then carried out by comparing the cosine similarity between the identity code of a target utterance and the mean identity codes of the cold and non-cold training utterances. With a simple logistic regression combining the predictions of our method and the baseline systems, we achieve 65.81% and 66% unweighted average recall (UAR) on the development and test sets provided by the 2017 ComParE Challenge, respectively. The second approach is based on strength modeling, in which the confidence outputs of diverse classifiers are concatenated with the original feature space and used as input to a support vector machine. The feature representations are derived from multiple sub-dictionaries within a GMM Fisher-vector encoding framework, together with eGeMAPS functional features, combined with the diverse classifiers. This approach achieves 70.2% and 65.5% UAR on the development and test sets provided by the 2017 ComParE Challenge, respectively.
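To make the decision rule of the first framework concrete, the following is a minimal sketch (not the authors' implementation) of the cosine-similarity comparison against class-mean identity codes; it assumes the identity codes (middle-layer activations of the autoencoder) have already been extracted, and all variable and function names here are hypothetical.

    import numpy as np

    def cosine_similarity(a, b):
        # Cosine similarity between two identity-code vectors.
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    def classify_utterance(target_code, cold_codes, noncold_codes):
        # cold_codes, noncold_codes: (num_utterances, code_dim) arrays of
        # training identity codes; target_code: (code_dim,) test identity code.
        mean_cold = cold_codes.mean(axis=0)
        mean_noncold = noncold_codes.mean(axis=0)
        sim_cold = cosine_similarity(target_code, mean_cold)
        sim_noncold = cosine_similarity(target_code, mean_noncold)
        return "cold" if sim_cold > sim_noncold else "non-cold"

In this sketch, the only trainable component is the autoencoder that produces the identity codes; the classifier itself is parameter-free, which is what allows its score to be fused with the baseline systems' predictions by a simple logistic regression.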