Abstract
The rich personal information contained in speech signals can lead to privacy leakage and unfair predictions in speech-based technologies. In this work, we propose a feature-scoring variational autoencoder (FS-VAE) that addresses these issues by performing attribute alignment during speech representation learning. FS-VAE achieves this alignment with attention-based scoring machines guided by two additional penalty terms. Given the attribute-aligned representation, we can then select and mask the nodes that contain a specific attribute of interest, according to the requirements of the downstream task. We evaluate our method on two tasks: identity-free emotion recognition (PP-SER) and emotion-less speaker verification (PP-SV). The proposed method maintains utility better and offers competitive privacy protection compared to the most recent attribute-aligned representation learning method.