Abstract
Automated auscultation and cardiovascular screening systems for detecting cardiac abnormalities have received growing interest in clinical applications, but they remain vulnerable to missing or invalid recordings caused by technical issues. To address this, we introduce a novel framework based on the masked autoencoder strategy that treats each heart valve recording as a distinct token. Our approach reconstructs missing valve data from the representations of the available recordings and learnable mask tokens, and integrates information across valves through positional embeddings and temporal convolutional networks (TCNs). The framework achieves state-of-the-art (SOTA) performance on the CirCor DigiScope dataset, outperforming top participants and SOTA imputation methods in terms of mean cost of patient outcome, accuracy, F1-measure, and macro F1 score. Furthermore, our analysis shows improved predictive accuracy when input data are limited, and the generative results indicate that the framework can reconstruct complete auscultation recordings for further clinical evaluation.
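To make the valve-as-token idea concrete, the following minimal sketch shows how a token sequence might be assembled when some valve recordings are missing. It is an illustration under stated assumptions, not the implementation used in this work: the valve labels, the embedding size, the function name `build_token_sequence`, and the fixed numeric values standing in for learned parameters are all hypothetical.

```python
from typing import Dict, List, Optional

# Assumed auscultation locations (aortic, pulmonary, tricuspid, mitral);
# the labels and their order are illustrative.
VALVES = ["AV", "PV", "TV", "MV"]
EMBED_DIM = 4  # hypothetical, kept tiny for readability


def build_token_sequence(
    valve_embeddings: Dict[str, Optional[List[float]]],
    mask_token: List[float],
    positional: Dict[str, List[float]],
) -> List[List[float]]:
    """Return one token per valve in a fixed order.

    A valve whose recording is missing (None) is represented by the shared
    mask token. A per-valve positional embedding is added to every token so
    that a downstream model can tell the valves apart.
    """
    tokens = []
    for valve in VALVES:
        embedding = valve_embeddings.get(valve)
        base = mask_token if embedding is None else embedding
        tokens.append([b + p for b, p in zip(base, positional[valve])])
    return tokens


# In a trained model the mask token and positional embeddings would be
# learned parameters; fixed values are used here purely for illustration.
mask_token = [0.0] * EMBED_DIM
positional = {valve: [float(i)] * EMBED_DIM for i, valve in enumerate(VALVES)}

# Hypothetical patient with the PV and MV recordings missing.
recordings = {
    "AV": [0.1, 0.2, 0.3, 0.4],
    "PV": None,
    "TV": [0.5, 0.5, 0.5, 0.5],
    "MV": None,
}
tokens = build_token_sequence(recordings, mask_token, positional)
```

Because every patient yields a sequence of the same length regardless of which recordings are absent, a decoder can be trained to reconstruct the masked positions from the available ones.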