Improving Young Stroke Prediction by Learning with Active Data Augmenter in a Large-Scale Electronic Medical Claims Database｜BIIC Lab - NTHU

Predictive Model

Clinical Attributes


Download PDF IEEE Xplore

Abstract

Electronic medical claims (EMC) databases have been used successfully to predict the occurrence of stroke and a variety of other diseases. However, predictive performance remains inadequate for rare events because of both insufficient training samples and a highly imbalanced class distribution. In this work, we aim to improve stroke prediction, especially for the young age group (25-45 years old), in a large population-based EMC database of 552,898 subjects. We train a deep neural network for young stroke prediction using a novel active data augmenter, which selects the most informative EHR samples from old-age stroke patients. Compared with training only on young-age-group data and training on data from all age groups, this approach improves the area under the receiver operating characteristic curve (AUC) by 9.3% and 8.2%, respectively. We further analyze how the AUC varies with the training data size and with the amount and type of augmented samples.
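The abstract does not say how the augmenter scores "informativeness". As a minimal sketch, assuming an uncertainty-sampling criterion (a standard active-learning heuristic swapped in here, not necessarily the authors' criterion) and a plain logistic-regression model standing in for the paper's deep neural network, the loop below repeatedly retrains on the young-group training set and moves the k auxiliary (old-age) samples whose predicted stroke probability is closest to 0.5 into it. The names `predict`, `train_logreg`, and `active_augment` and the parameters `rounds` and `k` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

# (feature vector, label) where label 1 = stroke, 0 = no stroke
Sample = Tuple[List[float], int]


def predict(w: List[float], x: List[float]) -> float:
    """Logistic model: sigmoid of w.x + bias (bias stored as w[-1])."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + w[-1]
    z = max(-30.0, min(30.0, z))  # clamp so math.exp cannot overflow
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(data: Sequence[Sample], epochs: int = 200, lr: float = 0.1) -> List[float]:
    """Stand-in classifier (NOT the paper's DNN): batch gradient descent
    on the logistic loss. Returns weights with the bias as the last entry."""
    dim = len(data[0][0])
    w = [0.0] * (dim + 1)
    for _ in range(epochs):
        grad = [0.0] * (dim + 1)
        for x, y in data:
            err = predict(w, x) - y  # derivative of log loss w.r.t. z
            for j, xj in enumerate(x):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * g / len(data) for wj, g in zip(w, grad)]
    return w


def active_augment(target: List[Sample], pool: List[Sample],
                   rounds: int = 3, k: int = 5) -> List[Sample]:
    """Hypothetical augmenter using uncertainty sampling: each round,
    retrain on the current set, then move the k pool samples whose
    predicted probability is closest to 0.5 into the training set."""
    train = list(target)
    remaining = list(pool)
    for _ in range(rounds):
        if not remaining:
            break
        w = train_logreg(train)
        remaining.sort(key=lambda s: abs(predict(w, s[0]) - 0.5))
        train.extend(remaining[:k])
        remaining = remaining[k:]
    return train


if __name__ == "__main__":
    rng = random.Random(0)

    def make(n: int, shift: float) -> List[Sample]:
        # Synthetic 2-feature cohort; label 1 when a noisy risk score is positive.
        out = []
        for _ in range(n):
            x = [rng.gauss(shift, 1.0), rng.gauss(0.0, 1.0)]
            out.append((x, int(x[0] + 0.5 * x[1] + rng.gauss(0.0, 0.5) > 0)))
        return out

    young = make(20, -1.0)  # small, low-prevalence target cohort
    old = make(200, 0.5)    # larger auxiliary cohort
    augmented = active_augment(young, old, rounds=3, k=5)
    print(len(augmented))  # 35 = 20 original + 3 rounds x 5 selected
```

Uncertainty sampling is only one way to define "most informative"; entropy-, margin-, or loss-based scores would slot into the same `sort` key without changing the loop.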

Figures

Construction of the expanded augmented datasets for the six learning procedures in this study.

AUC values for young stroke prediction when using 10%, 20%, 40%, and 80% of the data.
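For experiments like these, AUC can be read as the probability that a randomly chosen positive (stroke) case receives a higher score than a randomly chosen negative case. The sketch below computes AUC in that pairwise form and draws the 10%, 20%, 40%, and 80% subsets, assuming class-stratified random subsampling (the page does not state how the subsets were drawn). The helper names `auc` and `stratified_subsample` are hypothetical.

```python
import random
from typing import List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as P(score of random positive > score of random negative),
    i.e. the Mann-Whitney U formulation; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def stratified_subsample(labels: Sequence[int], fraction: float, seed: int = 0) -> List[int]:
    """Indices of a random subset holding about `fraction` of each class
    (at least one sample per class), so rare positives are not lost."""
    rng = random.Random(seed)
    chosen: List[int] = []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        chosen.extend(idx[:max(1, round(fraction * len(idx)))])
    return sorted(chosen)


if __name__ == "__main__":
    labels = [0] * 90 + [1] * 10  # toy cohort with 10% prevalence
    for frac in (0.1, 0.2, 0.4, 0.8):
        subset = stratified_subsample(labels, frac)
        print(frac, len(subset))
    print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

The pairwise loop is O(P·N) and meant only for illustration; for a database of this size, a sort-based implementation such as scikit-learn's `sklearn.metrics.roc_auc_score` is the practical choice.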

Authors