Human-centric Machine Intelligence Lab

Our current research thrusts include:
Software vulnerability screening, enterprise data loss prevention, and high-precision anomaly detection; machine learning technologies in healthcare and patient-oriented digital health.

Current Research


Software vulnerabilities are costly. NIST estimates the cost at $60 billion each year, which includes the costs of developing and distributing software patches, reinstalling infected systems, and productivity lost to malware and errors.

The problem of software vulnerabilities is not new. What is new and promising is the increasing adoption of cryptography and security mechanisms in common software applications. However, it is difficult to write crypto code correctly.

Surprisingly, the practical task of securing cryptographic implementations is still in its infancy. This stands in sharp contrast to the multi-decade advancement of modern cryptography.

This gap became particularly alarming after multiple high-profile discoveries of cryptography-related vulnerabilities in widely used network libraries and tools (e.g., the lack of authenticated encryption in iMessage, the Diffie-Hellman key exchange downgrade vulnerability in TLS, and the exposure of random seeds in Juniper Networks products).

Our ongoing effort is on cryptographic program analysis (CPA), where we design rigorous static program analyses to detect crypto vulnerabilities in C programs (IEEE SecDev 2017) and Java programs (ACM CCS 2019).

Our ICSE '18 work reports empirical findings from the Stack Overflow forum that motivate the need for effective crypto coding assistance. Our CCS '19 work, CryptoGuard, exposes cryptographic API misuses, such as exposed secrets, predictable random numbers, and vulnerable certificate verification. Running our tool on massive code bases (e.g., millions of LoC), including 46 high-impact large-scale Apache projects and 6,181 Android apps, generated many security insights. Our findings helped multiple popular Apache projects harden their code, including Spark, Ranger, and OFBiz. At the same time, our refinements in CryptoGuard reduce false alerts by 76% to 80% in our experiments.
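To make the three misuse categories named above concrete, here is a minimal sketch of a purely syntactic checker for Python source. It is an illustration only and not CryptoGuard's analysis: CryptoGuard targets Java, and the function name `find_crypto_misuses`, the `SECRET_NAMES` list, and the three patterns are assumptions chosen for this example.

```python
import ast

# Hypothetical list of variable-name fragments that suggest a secret.
SECRET_NAMES = ("key", "secret", "password", "passwd", "token")


def find_crypto_misuses(source: str) -> list[tuple[int, str]]:
    """Return sorted (line, message) pairs for three toy misuse patterns."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # Pattern 1: a string/bytes literal assigned to a secret-looking name.
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Constant):
            if isinstance(node.value.value, (str, bytes)):
                for target in node.targets:
                    if isinstance(target, ast.Name) and any(
                        s in target.id.lower() for s in SECRET_NAMES
                    ):
                        findings.append((node.lineno, "hard-coded secret"))
        elif isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Attribute):
                # Pattern 2: any call into the non-cryptographic `random` module.
                if isinstance(func.value, ast.Name) and func.value.id == "random":
                    findings.append((node.lineno, "predictable random number"))
                # Pattern 3a: an explicitly unverified TLS context.
                if func.attr == "_create_unverified_context":
                    findings.append(
                        (node.lineno, "certificate verification disabled")
                    )
            # Pattern 3b: a `verify=False` keyword argument on any call.
            for kw in node.keywords:
                if (
                    kw.arg == "verify"
                    and isinstance(kw.value, ast.Constant)
                    and kw.value.value is False
                ):
                    findings.append(
                        (node.lineno, "certificate verification disabled")
                    )
    return sorted(findings)


# Illustrative input containing one instance of each pattern.
SAMPLE = '''
import random, ssl
api_key = "hunter2"
nonce = random.getrandbits(128)
ctx = ssl._create_unverified_context()
'''

if __name__ == "__main__":
    for line, message in find_crypto_misuses(SAMPLE):
        print(f"line {line}: {message}")
```

A name-based match like this one is prone to false alerts (for example, a variable called `monkey` contains `key`) and misses secrets that flow through several assignments, which is the kind of limitation that motivates a more rigorous static analysis.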

This work has been supported by NSF and ONR.


The massive payment card industry (PCI) involves various entities such as merchants, issuer banks, acquirer banks, and card brands. Ensuring security for all entities that process payment card information is a challenging task. The PCI Security Standards Council requires all entities to be compliant with the PCI Data Security Standard (DSS), which specifies a series of security requirements. However, little is known regarding how well PCI DSS is enforced in practice.

We have highlighted data leaks due to human mistakes (TIFS '19) and data breaches in enterprises (WIREs Data Mining Knowl Discov '17), and reported on the Target Corporation data breach (CoRR '17), the second most devastating data breach in history.

Our CCS '19 work, BuggyCart, identified an alarming gap between the security standard and its real-world enforcement. Using 35 PCI DSS-related vulnerabilities, BuggyCart can examine the capabilities and limitations of PCI scanners and the rigor of the certification process.

Our new lightweight scanning tool, PciCheckerLite, reveals that none of the 6 PCI scanners we tested are fully compliant with the PCI scanning guidelines: they issue certificates to merchants that still have major vulnerabilities. By scanning 1,203 e-commerce websites across various business sectors, we found that 86% of the websites have at least one PCI DSS violation that should have rendered them non-compliant. Our in-depth accuracy analysis also shows that PciCheckerLite’s output is more precise than that of w3af. We have reached out to the PCI Security Standards Council to share our research results and improve enforcement in practice.

Our IEEE TIFS 2015 work on data-loss detection as a service and fuzzy fingerprints was among the top 25 most downloaded articles of the IEEE Signal Processing Society in 2018. Also check out our many keynote presentations, including explanations of the high-profile Target, Equifax, and Office of Personnel Management (OPM) data breaches, as part of our efforts to democratize data protection knowledge.
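As a rough illustration of fingerprint-based data-loss detection, the sketch below hashes sliding byte windows (shingles) of sensitive data, releases only a masked ("fuzzy") version of each fingerprint, and scores traffic against that set. This is a simplified sketch under stated assumptions, not the scheme from the TIFS paper: the use of SHA-256, the 8-byte shingle length, the 32-bit fingerprints, the bit mask, and all function names are hypothetical choices for this example.

```python
import hashlib


def shingle_fingerprints(data: bytes, n: int = 8) -> set[int]:
    """Hash every n-byte sliding window (shingle) to a 32-bit fingerprint."""
    return {
        int.from_bytes(hashlib.sha256(data[i:i + n]).digest()[:4], "big")
        for i in range(len(data) - n + 1)
    }


def fuzzify(fingerprints: set[int], mask: int = 0xFFFFFF00) -> set[int]:
    """Keep only the bits selected by `mask`; the masked-out bits are hidden."""
    return {fp & mask for fp in fingerprints}


def leak_score(
    traffic: bytes,
    fuzzy_sensitive: set[int],
    n: int = 8,
    mask: int = 0xFFFFFF00,
) -> float:
    """Fraction of traffic shingles whose masked fingerprint is in the fuzzy set."""
    fingerprints = shingle_fingerprints(traffic, n)
    if not fingerprints:
        return 0.0  # input shorter than one shingle
    hits = sum(1 for fp in fingerprints if fp & mask in fuzzy_sensitive)
    return hits / len(fingerprints)


if __name__ == "__main__":
    # Made-up sensitive record, for illustration only.
    sensitive = b"card=4111111111111111;cvv=123"
    fuzzy_set = fuzzify(shingle_fingerprints(sensitive))
    print(leak_score(b"POST /pay " + sensitive, fuzzy_set))
```

Because the mask discards bits, each released value is consistent with many possible fingerprints, so a party holding only the fuzzy set learns less about the sensitive data. The trade-off in this sketch is that a match is only a candidate and may be a false positive.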


Program and system anomaly detection analyzes normal program and system behaviors and discovers aberrant executions caused by attacks, misconfigurations, program bugs, and unusual usage patterns. It was first introduced as an analogy between intrusion detection for programs and the immune mechanism in biology. Program and system anomaly detection has been a long-standing security approach with versatile applications, ranging from securing server programs in critical environments, to detecting insider threats in enterprises, to abuse detection for online social networks.
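The idea of learning normal behavior and flagging deviations can be sketched with a classic sliding-window baseline over system call traces. This is a textbook-style illustration and not one of the methods from our papers: the window length, the function names, and the toy traces are assumptions made for the example.

```python
def ngrams(trace: list[str], n: int = 3) -> list[tuple[str, ...]]:
    """Return all length-n sliding windows over a call trace."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]


def train_normal_profile(traces: list[list[str]], n: int = 3) -> set[tuple[str, ...]]:
    """Collect every n-gram observed in the normal (attack-free) traces."""
    profile: set[tuple[str, ...]] = set()
    for trace in traces:
        profile.update(ngrams(trace, n))
    return profile


def anomaly_score(trace: list[str], profile: set[tuple[str, ...]], n: int = 3) -> float:
    """Fraction of the trace's n-grams that never appeared during training."""
    windows = ngrams(trace, n)
    if not windows:
        return 0.0  # trace shorter than one window
    unseen = sum(1 for window in windows if window not in profile)
    return unseen / len(windows)


if __name__ == "__main__":
    # Toy traces, invented for illustration.
    normal_traces = [
        ["open", "read", "write", "close"],
        ["open", "read", "read", "write", "close"],
    ]
    profile = train_normal_profile(normal_traces)
    print(anomaly_score(["open", "read", "write", "close"], profile))
    print(anomaly_score(["open", "read", "execve", "socket", "close"], profile))
```

A set of local windows like this cannot capture long-range correlations between calls, and any legitimate but previously unseen behavior raises the score, which is one source of the false alarms discussed on this page.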

Despite the seemingly diverse application domains, anomaly detection solutions share similar technical challenges, such as how to accurately recognize various normal patterns, how to reduce false alarms, how to adapt to concept drift, and how to minimize performance impact. They also share similar detection approaches and evaluation methods, such as feature extraction, dimension reduction, and experimental evaluation.

Among our many contributions to the diverse domain of anomaly detection, to list a few, we have developed machine learning algorithms to detect diverse normal call-correlation patterns (CCS '15), a formal framework for program anomaly detection (RAID '16), and context-sensitive probabilistic reasoning about program behaviors (DSN '16). Our CCS '16 tutorial introduces the problem of program attacks and the anomaly detection approach against these threats.

Our book, Anomaly Detection as a Service: Challenges, Advances, and Opportunities, highlights the real-world adoption and deployment of anomaly detection technologies by systematizing the body of existing knowledge on anomaly detection. The book focuses on data-driven anomaly detection for software, systems, and networks against advanced exploits and attacks, but also touches on a number of applications, including fraud detection and insider threats.

The Yao group holds two U.S. patents on traffic anomaly detection: U.S. Patents No. 8,763,127 and No. 9,888,030.