Abstract
Automatic speech recognition (ASR) should serve all users, but adapting it to people with Alzheimer’s disease (AD) is challenging because of irregular speech patterns and privacy concerns. Federated learning (FL), a privacy-preserving training paradigm, offers a potential solution. However, FL-based ASR suffers from both acoustic and text heterogeneity. Although advanced model-based and cluster-based FL methods aim to address this issue, they lack a mechanism that directly accounts for the high intra-speaker heterogeneity exhibited by AD individuals and for ASR-related properties. This study presents cluster-based personalized federated learning (CPFL), a strategy that mitigates heterogeneity by clustering ASR output tokens using the proposed CharDiv, a metric that characterizes pause and word usage distributions. Evaluation on the ADReSS challenge dataset shows a 3.6% improvement in word error rate (WER). Analysis of per-cluster WER improvements and CharDiv distributions indicates reduced heterogeneity and highlights pause usage as a potential key factor in AD-oriented ASR.
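Because the results above are reported as overall and per-cluster WER, the following is a minimal sketch of how WER is conventionally computed: the word-level Levenshtein distance (substitutions + deletions + insertions) divided by the number of reference words, pooled over utterances. It is not the evaluation code used in this study. Whitespace tokenization without text normalization is an assumption, and the function names and integer cluster labels are hypothetical placeholders for whatever grouping CPFL produces.

```python
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple


def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum number of word substitutions, deletions, and insertions."""
    # prev[j] holds the distance between the first i-1 reference words
    # and the first j hypothesis words (one DP row at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or exact match)
            )
        prev = curr
    return prev[len(hyp)]


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Pooled WER: total edits divided by total reference words."""
    edits, total = 0, 0
    for reference, hypothesis in pairs:
        ref, hyp = reference.split(), hypothesis.split()  # assumed tokenization
        edits += word_edit_distance(ref, hyp)
        total += len(ref)
    if total == 0:
        raise ValueError("references must contain at least one word")
    return edits / total


def per_cluster_wer(samples: Iterable[Tuple[int, str, str]]) -> Dict[int, float]:
    """Group (cluster, reference, hypothesis) triples and score each group."""
    groups: Dict[int, list] = defaultdict(list)
    for cluster, reference, hypothesis in samples:
        groups[cluster].append((reference, hypothesis))
    return {cluster: corpus_wer(pairs) for cluster, pairs in groups.items()}


if __name__ == "__main__":
    # Illustrative strings only; not data from the ADReSS dataset.
    print(corpus_wer([("the cat sat on the mat", "the cat sat on mat")]))
    print(per_cluster_wer([
        (0, "a b c", "a x c d"),
        (1, "hello world", "hello world"),
        (0, "yes", "no"),
    ]))
```

Pooling edits and reference words within a cluster, rather than averaging per-utterance WERs, keeps long utterances from being underweighted; comparing the per-cluster values before and after personalization is one way to read "per-cluster WER improvements."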