Precision and Fairness in AI Digital Health

How well do AI algorithms work in treating patients in real-world scenarios? How reliable are they in predicting rare cases and handling special circumstances? Can we quantitatively measure and compare new and complex solutions using benchmarks before their real-world deployment? What are the fundamental limitations of AI algorithms in their prediction capability? These are the important research questions we aim to address.

For example, in our bias correction work, we inspect how data imbalance impacts minority patients. Many clinical datasets are intrinsically imbalanced, dominated by overwhelming majority groups. Off-the-shelf machine learning models that optimize the prognosis of majority patient types (e.g., the healthy class) may cause substantial errors on the minority prediction class (e.g., the disease class) and on demographic subgroups (e.g., Black or young patients). In a mortality prediction model, for instance, our work found that missed death cases are 3.14 times higher than missed survival cases. In the typical one-machine-learning-model-fits-all paradigm, racial and age disparities are likely to exist but go unreported. What makes it worse is the deceptive nature of widely used whole-population metrics, such as AUC-ROC. We show that some of these metrics fail to reflect serious prediction deficiencies.

To mitigate representational biases, we design a double prioritized (DP) bias correction technique. Our method trains customized models for specific ethnicity or age groups, a substantial departure from the one-model-predicts-all convention. We report our findings on four prognosis tasks over two imbalanced clinical datasets. DP reduces relative disparities among race and age groups and, in terms of the minority class's recall, is 5.6% to 86.8% better than the 8 existing sampling solutions we compared it with. This work is in collaboration with Dr. Chang Lu at Virginia Tech Chemical Engineering and Dr. Charles Nemeroff at University of Texas at Austin Dell Medical School. Find out about this work here; the manuscript is under review at Communications Medicine.
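To illustrate why a whole-population metric can hide subgroup deficiencies, here is a minimal Python sketch that computes minority-class recall separately for each demographic group. It is not the DP method itself. The function names, the toy labels, and the "best recall divided by worst recall" definition of disparity are assumptions made for illustration and may differ from the definitions used in the manuscript.

```python
from collections import defaultdict


def recall_by_group(y_true, y_pred, groups, positive=1):
    """Recall of the positive (minority) class, computed per subgroup."""
    tp = defaultdict(int)  # true positives per group
    fn = defaultdict(int)  # false negatives (missed cases) per group
    for truth, pred, group in zip(y_true, y_pred, groups):
        if truth != positive:
            continue  # recall only looks at actual positive cases
        if pred == positive:
            tp[group] += 1
        else:
            fn[group] += 1
    seen = set(tp) | set(fn)
    return {g: tp[g] / (tp[g] + fn[g]) for g in seen}


def relative_disparity(per_group_recall):
    """Illustrative disparity: best group recall divided by worst (1.0 = parity)."""
    worst = min(per_group_recall.values())
    best = max(per_group_recall.values())
    return float("inf") if worst == 0 else best / worst


def accuracy(y_true, y_pred):
    """Whole-population accuracy, shown only for contrast."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: group "A" is the majority, group "B" the minority.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1]
groups = ["A"] * 8 + ["B"] * 4

overall = accuracy(y_true, y_pred)
per_group = recall_by_group(y_true, y_pred, groups)
disparity = relative_disparity(per_group)
print(overall, per_group, disparity)
```

On this toy data only one prediction in twelve is wrong, yet that single error is a missed positive case in group B, so group B's recall is half of group A's. The overall number alone would not reveal this.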

Our group has several other super exciting ongoing projects on AI digital health, including investigating the trustworthiness of machine learning models in critical medical applications, as well as designing effective medical knowledge discovery algorithms and their quantitative measurement.

Deployment-grade High-precision Vulnerability Screening

Software vulnerabilities are costly. NIST estimates that cost to be $60 billion each year, which includes the costs for developing and distributing software patches and reinstalling infected systems and the lost productivity due to malware and errors.

The problem of software vulnerabilities is not new. What is new and promising is the increasing adoption of cryptography and security mechanisms in common software applications. However, it is difficult to write crypto code correctly.

Surprisingly, the practical task of securing cryptographic implementation is still in its infancy. This status is in sharp contrast with the multi-decade advancement of modern cryptography.

This gap became particularly alarming after multiple high-profile discoveries of cryptography-related vulnerable code in widely used network libraries and tools (e.g., the lack of authenticated encryption in iMessage, the Diffie-Hellman key exchange downgrade vulnerability in TLS, and the exposure of random seeds in Juniper Networks products).

Our ongoing effort is on cryptographic program analysis (CPA), where we design rigorous static program analysis to detect crypto vulnerabilities in C programs (IEEE SecDev 2017) and Java programs (ACM CCS 2019).

Our ICSE '18 work reports empirical findings from the Stack Overflow forum that motivate the need for effective crypto coding assistance. Our CCS '19 work CryptoGuard exposes cryptographic API misuses, such as exposed secrets, predictable random numbers, and vulnerable certificate verification. Running our tool on massive codebases (e.g., millions of LoC), including 46 high-impact large-scale Apache projects and 6,181 Android apps, generated many security insights. Our findings helped multiple popular Apache projects harden their code, including Spark, Ranger, and Ofbiz. At the same time, our refinements in CryptoGuard reduce false alerts by 76% to 80% in our experiments.
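As a concrete illustration of one misuse class named above, predictable random numbers, the following Python sketch contrasts a token drawn from a seeded general-purpose generator with one drawn from the operating system's cryptographic source. CryptoGuard itself analyzes Java code; this example only shows the kind of flaw such tools look for, not how CryptoGuard detects it. The function names and the seed value are hypothetical.

```python
import random
import secrets


def weak_token(seed: int, nbytes: int = 16) -> str:
    """Misuse: a security token from a non-cryptographic PRNG with a guessable seed."""
    rng = random.Random(seed)  # Mersenne Twister: output is fully determined by the seed
    return bytes(rng.getrandbits(8) for _ in range(nbytes)).hex()


def strong_token(nbytes: int = 16) -> str:
    """Safer alternative: a token from the OS cryptographic random source."""
    return secrets.token_hex(nbytes)


# Anyone who learns or guesses the seed can reproduce the "secret" token.
victim = weak_token(seed=20190101)
attacker = weak_token(seed=20190101)
print("weak token reproduced by attacker:", victim == attacker)

# Tokens from the secrets module are not derived from a caller-supplied seed.
print("strong token length (hex chars):", len(strong_token()))
```

The same pattern appears in Java when `java.util.Random` is used where `java.security.SecureRandom` is needed.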

Oracle Labs Australia has integrated CryptoGuard's detection approach into its internal code screening framework Parfait and uses it to harden large-scale production codebases (ACM DTRAP '22).

This work has been supported by NSF and ONR.

DATA BREACHES AND CERTIFICATION SECURITY

Enterprise Data Security and Breach Prevention

The massive payment card industry (PCI) involves various entities such as merchants, issuer banks, acquirer banks, and card brands. Ensuring security for all entities that process payment card information is a challenging task. The PCI Security Standards Council requires all entities to be compliant with the PCI Data Security Standard (DSS), which specifies a series of security requirements. However, little is known regarding how well PCI DSS is enforced in practice.

We have highlighted data leaks due to human mistakes (TIFS '19) and data breaches in enterprises (WIREs Data Mining Knowl Discov '17), and reported on the Target Corporation data breach (CoRR '17), the second most devastating data breach in history.

Our CCS '19 work BuggyCart identified an alarming gap between the security standard and its real-world enforcement. Using 35 PCI DSS-related vulnerabilities, BuggyCart can examine the capabilities and limitations of PCI scanners and the rigor of the certification process.

Our new lightweight scanning tool, PciCheckerLite, reveals that none of the 6 PCI scanners we tested is fully compliant with the PCI scanning guidelines; they issue certificates to merchants that still have major vulnerabilities. By scanning 1,203 e-commerce websites across various business sectors, we confirmed that 86% of the websites have at least one PCI DSS violation that should have disqualified them as non-compliant. Our in-depth accuracy analysis also shows that PciCheckerLite’s output is more precise than w3af's. We have reached out to the PCI Security Standards Council to share our research results and help improve enforcement in practice.

Our IEEE TIFS 2015 work on data-loss detection as a service and fuzzy fingerprints was a top 25 most downloaded article of the IEEE Signal Processing Society in 2018. Also check out the many keynote presentations HERE, including explanations of the high-profile Target, Equifax, and Office of Personnel Management (OPM) data breaches, as part of our efforts to democratize data protection knowledge.

Welcome to Yao Group!

Human-centric Machine Intelligence Lab
