How well do AI algorithms work for treating patients in real-world scenarios? How reliable are they at predicting rare cases and handling special circumstances? Can we quantitatively measure and compare complex new solutions using benchmarks before real-world deployment? What are the fundamental limitations of AI algorithms' prediction capabilities? These are the important research questions we aim to address.
For example, in our bias correction work, we examine how data imbalance impacts minority patients. Many clinical datasets are intrinsically imbalanced, dominated by an overwhelming majority group. Off-the-shelf machine learning models that optimize prognosis for the majority patient type (e.g., the healthy class) may produce substantial errors on the minority prediction class (e.g., the disease class) and on demographic subgroups (e.g., Black or young patients). For instance, we found that missed death cases were 3.14 times more frequent than missed survival cases in a mortality prediction model. In the typical one-machine-learning-model-fits-all paradigm, racial and age disparities are likely to exist but go unreported. What makes this worse is the deceptive nature of widely used whole-population metrics, such as AUC-ROC: we show that such metrics can fail to reflect serious prediction deficiencies. To mitigate representational biases, we designed a double prioritized (DP) bias correction technique. Our method trains customized models for specific ethnicity or age groups, a substantial departure from the one-model-predicts-all convention. We report findings on four prognosis tasks over two imbalanced clinical datasets. DP reduces relative disparities among race and age groups by 5.6% to 86.8% more than the eight existing sampling solutions we compared against, in terms of the minority class's recall. This work is in collaboration with Dr. Chang Lu at Virginia Tech Chemical Engineering and Dr. Charles Nemeroff at the University of Texas at Austin Dell Medical School. Find out more about this work here; the manuscript is under review at Communications Medicine.
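To make the two ideas above concrete, here is a minimal, hedged sketch on synthetic data. It is not our DP method or the datasets from the paper; it only illustrates (1) how a whole-population score like AUC-ROC can look reasonable while minority-class recall is poor, and (2) the general idea of training a customized model for a specific subgroup instead of one model for everyone. All data, subgroup sizes, and model choices below are illustrative assumptions.

```python
# Illustrative sketch only (synthetic data, NOT the DP bias correction method):
# a pooled one-model-fits-all classifier vs. a classifier trained just for a
# small demographic subgroup, compared on that subgroup's minority-class recall.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score, roc_auc_score

rng = np.random.default_rng(0)

def make_group(n, pos_frac, shift):
    """Synthetic patients: the positive (e.g., disease) label is rare,
    and `shift` moves the subgroup's feature distribution."""
    y = (rng.random(n) < pos_frac).astype(int)
    X = rng.normal(shift, 1.0, size=(n, 4)) + y[:, None] * 1.5
    return X, y

# A large majority subgroup dominates the pooled training data;
# a small subgroup has a different feature distribution.
X_maj, y_maj = make_group(2000, pos_frac=0.05, shift=0.0)
X_sub, y_sub = make_group(400,  pos_frac=0.05, shift=2.0)
X_all = np.vstack([X_maj, X_sub])
y_all = np.concatenate([y_maj, y_sub])

# One-model-fits-all baseline, trained on the pooled data.
pooled = LogisticRegression(max_iter=1000).fit(X_all, y_all)

# Subgroup-specific model, trained (with class weighting to counter the
# label imbalance) only on the small subgroup's data.
custom = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X_sub, y_sub)

# Whole-population AUC-ROC for the pooled model can look fine even when
# its recall on the subgroup's rare disease class is weak. (For brevity,
# metrics here are computed on the same data the models saw.)
auc_pooled = roc_auc_score(y_all, pooled.predict_proba(X_all)[:, 1])
rec_pooled = recall_score(y_sub, pooled.predict(X_sub))
rec_custom = recall_score(y_sub, custom.predict(X_sub))

print(f"pooled model, whole-population AUC-ROC: {auc_pooled:.2f}")
print(f"pooled model, subgroup disease-class recall: {rec_pooled:.2f}")
print(f"custom model, subgroup disease-class recall: {rec_custom:.2f}")
```

The point of the sketch is the evaluation habit, not the numbers: report per-subgroup, per-class metrics (such as the minority class's recall) alongside any whole-population score before trusting a clinical model.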
Our group has several other exciting ongoing projects in AI digital health, including investigating the trustworthiness of machine learning models in critical medical applications and designing effective medical knowledge discovery algorithms along with their quantitative evaluation.