Noise-Robust Bandwidth Expansion for 8K Speech Recordings
Abstract
Speech recordings in call centers are narrowband and mixed with various noises. Developing a bandwidth expansion (BWE) model is important to mitigate the automated speech recognition (ASR) performance gap between low and high sampling rate speech data. To further address the in-the-wild noise in call center settings, we propose an Embedding-Polished Wave-U-Net (EP-WUN) that includes an additional speech quality classifier to handle the noise and bandwidth expansion of 8k audio simultaneously. Our framework shows improved speech quality metrics on a well-known BWE dataset (Valentini-Botinhao corpus) when compared to the current state-of-the-art noise-robust BWE model, with 33% fewer parameters. It also achieves an 11.71% word error rate reduction when evaluated on a real-world interactive voice response system from the E.SUN bank.
Figures
The proposed EP-WUN is composed of a Wave-U-Net (WUN) and a speech quality classifier (SQC). Robust learning is done by applying the modified triplet loss on the hidden feature h of the SQC, where h_a is the anchor, and h_p and h_n come from the clean and noisy speech, respectively.
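For orientation, the role of the anchor, clean, and noisy embeddings can be illustrated with the standard triplet margin loss. This is only a minimal sketch: the listing does not specify how the loss is modified, which signal the anchor comes from, or which distance is used, so the Euclidean distance, the margin value, and the function names below are assumptions for illustration and not the authors' formulation.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def triplet_loss(
    h_a: Sequence[float],
    h_p: Sequence[float],
    h_n: Sequence[float],
    margin: float = 1.0,  # assumed value, not taken from the paper
) -> float:
    """Standard (unmodified) triplet margin loss.

    h_a: anchor embedding
    h_p: positive embedding (from clean speech)
    h_n: negative embedding (from noisy speech)

    The loss is zero once the anchor is closer to the positive than
    to the negative by at least `margin`.
    """
    return max(0.0, euclidean(h_a, h_p) - euclidean(h_a, h_n) + margin)


# Toy 2-D embeddings (hypothetical values, not real SQC features).
anchor = [0.0, 0.0]
clean = [0.0, 1.0]
noisy_far = [3.0, 4.0]   # well separated from the anchor
noisy_near = [0.0, 1.5]  # still too close to the anchor

loss_far = triplet_loss(anchor, clean, noisy_far)
loss_near = triplet_loss(anchor, clean, noisy_near)
```

In this toy case the well-separated negative contributes no loss, while the nearby negative is penalised, which is the behaviour that pulls the anchor towards the clean embedding and away from the noisy one.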
Keywords
Bandwidth expansion | Robust speech representation learning | Automated speech recognition
Authors
Bo-Hao Su Chi-Chun Lee
Publication Date
2023/08/22
Conference
Interspeech
Interspeech 2023
Publisher
ISCA