Noise-Robust Bandwidth Expansion for 8K Speech Recordings｜BIIC Lab - NTHU | 人本訊號運算研究室


Noise-Robust Bandwidth Expansion for 8K Speech Recordings


Abstract

Speech recordings in call centers are narrowband and mixed with various noises. Developing a bandwidth expansion (BWE) model is important for narrowing the automated speech recognition (ASR) performance gap between low and high sampling rate speech data. To further address the in-the-wild noise in call center settings, we propose an Embedding-Polished Wave-U-Net (EP-WUN) that includes an additional speech quality classifier to handle noise and bandwidth expansion of 8k audio simultaneously. Our framework shows improved speech quality metrics on a well-known BWE dataset (the Valentini-Botinhao corpus) compared with the current state-of-the-art noise-robust BWE model, while using 33% fewer parameters. It also achieves an 11.71% word error rate reduction when evaluated on a real-world interactive voice response system from E.SUN Bank.
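For readers unfamiliar with the metric, the following is a minimal sketch of how word error rate (WER) and a WER reduction are commonly computed. It is not the paper's evaluation code. The function names `word_error_rate` and `relative_reduction` are hypothetical, and the abstract does not state whether the 11.71% figure is a relative or an absolute reduction, so the relative form shown here is only one possible reading.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction (assumed reading; could also be absolute)."""
    return (baseline_wer - new_wer) / baseline_wer
```

With this sketch, a hypothesis that substitutes one word and drops another out of a four-word reference has a WER of 0.5.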

Figures

The proposed EP-WUN is composed of a Wave-U-Net (WUN) and a speech quality classifier (SQC). Robust learning is done by applying the modified triplet loss to the hidden feature h of the SQC, where h_a is the anchor, and h_p and h_n come from clean and noisy speech, respectively.
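As a rough illustration of the idea in the caption, the sketch below implements the standard triplet margin loss on embeddings h_a, h_p, and h_n. The paper uses a modified triplet loss whose exact form is not given on this page, so this is the textbook version only. The Euclidean distance, the margin value, and the function name `triplet_loss` are assumptions made for illustration.

```python
import numpy as np


def triplet_loss(h_a, h_p, h_n, margin=1.0):
    """Standard triplet margin loss (not the paper's modified variant).

    h_a, h_p, h_n: arrays of shape (batch, dim) holding the anchor,
    positive (clean speech), and negative (noisy speech) embeddings.
    """
    d_ap = np.linalg.norm(h_a - h_p, axis=1)  # anchor-to-positive distance
    d_an = np.linalg.norm(h_a - h_n, axis=1)  # anchor-to-negative distance
    # Penalize triplets where the positive is not closer than the
    # negative by at least `margin`.
    return float(np.mean(np.maximum(d_ap - d_an + margin, 0.0)))


# Usage with random embeddings (hypothetical batch of 4, dimension 8).
rng = np.random.default_rng(0)
h_a, h_p, h_n = (rng.normal(size=(4, 8)) for _ in range(3))
loss = triplet_loss(h_a, h_p, h_n)
```

The loss is zero once every anchor is closer to its clean counterpart than to its noisy one by at least the margin, which is what pushes the hidden feature toward clean-speech embeddings.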

Keywords

Bandwidth expansion ｜ Robust speech representation learning ｜ Automated speech recognition

Authors

Publication Date

2023/08/22

Conference

Interspeech 2023

Publisher

RESEARCH
