Abstract
Speech recordings are frequently degraded by a variety of distortions, and removing them is essential yet challenging. In this study, leveraging the recent success of score-based generative modeling (SGM), we propose a novel noise-robust bandwidth expansion (BWE) framework based on a new parameterized stochastic diffusion process that expands the bandwidth stepwise in the spectrogram. Our proposed Step-Wised Bandwidth Expansion (SWiBE) method outperforms the baseline approaches, including the current state-of-the-art noise-robust BWE model and various diffusion- and GAN-based models, on the metrics considered. Moreover, we analyze how the hyperparameters interact with performance across different aspects, including perceptual quality and spectral reconstruction. Our findings reveal that the score-based model exhibits distinct characteristics under different parameterizations.