Health Analytics
Predictive Model
Clinical Attributes
Improving Young Stroke Prediction by Learning with Active Data Augmenter in a Large-Scale Electronic Medical Claims Database
Abstract
Electronic medical claims (EMC) databases have been successfully used to predict the occurrence of stroke and a variety of other diseases. However, predictive performance has been inadequate for rare events because of both insufficient training samples and a highly imbalanced class distribution. In this work, our aim is to improve stroke prediction, especially for the young age group (25 to 45 years old), in a large population-based EMC database (552,898 subjects). We learn a deep neural network model for young stroke prediction using a novel active data augmenter. The augmenter selects the most informative electronic health record (EHR) data samples from old age stroke patients. This approach improves the area under the receiver operating characteristic curve (AUC) by 9.3% and 8.2% compared to training directly on young age group data only and training on data from all age groups, respectively. We further analyze the AUC values obtained as a function of the training data size and of the amount and type of augmented data samples.
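The abstract does not say how the augmenter scores a sample as "informative". As a rough illustration only, the sketch below assumes an uncertainty-based rule: old age samples whose predicted stroke probability lies closest to 0.5 are picked first. The names `select_informative`, `toy_predict_proba`, and `old_age_pool` are hypothetical, and the toy scores are made up for the example.

```python
# Hypothetical sketch: uncertainty-based selection of augmentation samples.
# The selection rule (closeness of predicted probability to 0.5) is an
# assumption for illustration, not the criterion used in the paper.

def select_informative(candidates, predict_proba, k):
    """Return the k candidates whose predicted probability is closest to 0.5."""
    ranked = sorted(candidates, key=lambda x: abs(predict_proba(x) - 0.5))
    return ranked[:k]


def toy_predict_proba(sample):
    """Stand-in for a trained young-stroke model; returns a stored score."""
    return sample["score"]


# Toy pool of old age patient records (illustrative values only).
old_age_pool = [
    {"id": "a", "score": 0.95},
    {"id": "b", "score": 0.52},
    {"id": "c", "score": 0.10},
    {"id": "d", "score": 0.45},
]

# Pick the two most uncertain samples to add to the young age training set.
selected = select_informative(old_age_pool, toy_predict_proba, k=2)
selected_ids = [s["id"] for s in selected]
```

In a setup like this, the selected samples would be appended to the young age training data before the next round of model training, and the amount `k` would be one of the quantities varied in the analysis.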
Figures
Construction of expanded augmented datasets for the six learning procedures in this study.
Experiments for young age stroke prediction by using 10%, 20%, 40% and 80% datasets (AUC values).
Authors
Publication Date
2018/07/18
Conference
2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)
DOI
10.1109/embc.2018.8513479
Publisher
IEEE