An Attention-Based Method for Guiding Attribute-Aligned Speech Representation Learning｜BIIC Lab - NTHU

Abstract

The rich personal information contained in speech signals can lead to privacy leakage and unfair predictions in speech-based technologies. In this work, we propose a feature-scoring variational autoencoder (FS-VAE) that addresses these issues by performing attribute alignment during speech representation learning. FS-VAE aligns attributes using attention-based scoring machines guided by two additional penalty terms. Given the attribute-aligned representation, we can then select and mask the nodes that contain a specific attribute of interest, according to the requirements of the downstream task. We evaluate our method on two tasks: PP-SER (identity-free emotion recognition) and PP-SV (emotion-less speaker verification). Our proposed method achieves better utility maintenance and competitive privacy protection compared with the most recent attribute-aligned representation learning method.
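The masking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `mask_attribute_nodes`, the per-node `scores` vector, the top-k selection rule, and the zero fill value are all assumptions made for illustration, standing in for whatever scores the attention-based scoring machines actually produce.

```python
from typing import List, Sequence, Tuple


def mask_attribute_nodes(
    z: Sequence[Sequence[float]],
    scores: Sequence[float],
    k: int,
    fill_value: float = 0.0,
) -> Tuple[List[List[float]], List[bool]]:
    """Hide the k latent nodes that score highest for one attribute.

    z:      batch of latent vectors, each of length dim
    scores: hypothetical per-node relevance scores for the attribute
            to hide (for example, speaker identity), length dim
    k:      number of nodes to mask (an assumed selection rule)
    """
    dim = len(scores)
    if any(len(row) != dim for row in z):
        raise ValueError("every latent vector must have len(scores) nodes")
    if not 0 <= k <= dim:
        raise ValueError("k must be between 0 and the number of nodes")

    # Rank node indices from most to least relevant to the attribute.
    ranked = sorted(range(dim), key=lambda i: scores[i], reverse=True)
    hidden = set(ranked[:k])

    # keep[i] is False for nodes that will be masked.
    keep = [i not in hidden for i in range(dim)]
    masked = [
        [value if keep[i] else fill_value for i, value in enumerate(row)]
        for row in z
    ]
    return masked, keep


if __name__ == "__main__":
    latents = [[0.0, 1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0]]
    identity_scores = [0.1, 0.7, 0.05, 0.15]  # made-up values
    masked, keep = mask_attribute_nodes(latents, identity_scores, k=2)
    print(keep)
    print(masked)
```

In this sketch, a downstream task such as identity-free emotion recognition would consume `masked` instead of the full representation. How many nodes to mask and what value replaces them are design choices that the abstract does not specify.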

Figures

An illustration of our proposed FS-VAE.

Keywords

speech representation ｜ feature scoring ｜ privacy ｜ fair ｜ attribute alignment

Authors