An Attention-Based Method for Guiding Attribute-Aligned Speech Representation Learning
Abstract
The rich personal information contained in speech signals can lead to privacy leakage and unfair predictions in speech-based technology. In this work, we propose a feature-scoring variational autoencoder (FS-VAE) that addresses these issues by performing attribute alignment during speech representation learning. FS-VAE performs attribute alignment using attention-based scoring machines guided by two additional penalty terms. Once the attribute-aligned representation is obtained, we can select and mask the nodes that contain a specific attribute of interest, according to the requirements of the downstream task. We evaluate our method on two tasks: PP-SER (identity-free emotion recognition) and PP-SV (emotion-less speaker verification). Our proposed method achieves better utility maintenance and competitive privacy protection compared to the most recent attribute-aligned representation learning method.
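The masking step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `mask_attribute_nodes`, the top-k selection rule, zero as the mask value, and the example numbers are all assumptions made for illustration. The abstract only states that nodes containing an attribute of interest are chosen and masked.

```python
def mask_attribute_nodes(representation, scores, k):
    """Zero out the k nodes with the highest attribute scores.

    Hypothetical helper: top-k selection and zero-masking are
    assumptions, not details taken from the paper.
    """
    if len(representation) != len(scores):
        raise ValueError("representation and scores must have equal length")
    if not 0 <= k <= len(representation):
        raise ValueError("k must be between 0 and the number of nodes")
    # Rank node indices from highest to lowest attribute score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    to_mask = set(ranked[:k])
    # Keep unmasked nodes unchanged; replace masked nodes with 0.0.
    return [0.0 if i in to_mask else v for i, v in enumerate(representation)]


# Made-up latent vector and made-up per-node speaker-identity scores.
z = [0.8, -1.2, 0.3, 2.1]
speaker_scores = [0.05, 0.60, 0.10, 0.25]

# Mask the two nodes scored as most speaker-related (indices 1 and 3).
masked = mask_attribute_nodes(z, speaker_scores, k=2)
print(masked)
```

In a setting like PP-SER, the masked vector would then be passed to the emotion recognizer in place of the full representation. How many nodes to mask is a trade-off between utility and privacy that this sketch leaves to the caller.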
Figures
An illustration of our proposed FS-VAE.
Keywords
speech representation | feature scoring | privacy | fair | attribute alignment
Authors
Publication Date
2022/09/18
Conference
Interspeech
Interspeech 2022
DOI
10.21437/Interspeech.2022-10419
Publisher
ISCA