Abstract
Machine learning (ML) models that use electronic health records (EHRs) to accurately predict diseases or medication-related complications can help improve healthcare quality. EHRs inherently contain multiple modalities of clinical data from heterogeneous sources, so they require an appropriate fusion strategy. Deep neural networks (DNNs) learn feature representations and the classifier jointly, which makes them well suited to this setting. In this study, we assemble a large in-hospital EHR database and use it to develop models that predict gastrointestinal (GI) bleeding hospitalizations within 1 year for patients taking anticoagulants or antiplatelet drugs. The study uses a total of 815,499 records from 16,757 unique patients, covering three EHR modalities: disease diagnoses, medication usage, and laboratory test measurements. We compare the performance of four deep multimodal fusion models with that of other ML approaches. The neural networks achieve higher predictive performance than random forest (RF), gradient boosting decision tree (GBDT), and logistic regression (LR). We further show that a deep multimodal neural network with early fusion achieves the best GI bleeding predictive performance (area under the receiver operating characteristic curve [AUROC] 0.876), significantly better than the HAS-BLED score (AUROC 0.668).
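The abstract does not describe the fusion architecture in detail. The following is only a minimal sketch that assumes "early fusion" means concatenating the per-patient feature vectors of all three modalities before they enter a single shared network. The function names, the one-hidden-layer design, and the layer sizes are hypothetical, and the weights are untrained placeholders. The `auroc` helper computes the rank-based (Mann-Whitney) estimate of the AUROC metric reported above.

```python
import numpy as np


def early_fusion(diagnoses, medications, labs):
    """Hypothetical early fusion: concatenate per-modality feature matrices
    (rows = patients, columns = features) into one input matrix."""
    for m in (medications, labs):
        if m.shape[0] != diagnoses.shape[0]:
            raise ValueError("all modalities must have the same number of patients")
    return np.concatenate([diagnoses, medications, labs], axis=1)


def mlp_forward(x, w1, b1, w2, b2):
    """Assumed shared network: one ReLU hidden layer, then a sigmoid output
    interpreted as P(GI bleeding hospitalization within 1 year)."""
    h = np.maximum(0.0, x @ w1 + b1)
    logits = h @ w2 + b2
    return 1.0 / (1.0 + np.exp(-logits))


def auroc(y_true, y_score):
    """AUROC as the probability that a randomly chosen positive case is scored
    higher than a randomly chosen negative case (ties count as 1/2)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true]
    neg = y_score[~y_true]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: 5 patients, 10 diagnosis codes, 4 drugs, 3 lab values (sizes are arbitrary).
    x = early_fusion(rng.integers(0, 2, (5, 10)), rng.integers(0, 2, (5, 4)), rng.normal(size=(5, 3)))
    p = mlp_forward(x, rng.normal(size=(17, 8)), np.zeros(8), rng.normal(size=(8, 1)), np.zeros(1))
    print(x.shape, p.ravel())
    print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 3 of 4 positive/negative pairs ranked correctly
```

A late-fusion variant would instead pass each modality through its own sub-network and combine the resulting representations or predictions. Early fusion is the simplest option because the shared network sees all raw modality features together from its first layer.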