PhD Oral Exam - Paria Shirani, Individualized Program
Binary Code Fingerprinting with Application to Automated Vulnerability Detection
This event is free
School of Graduate Studies
When studying for a doctoral degree (PhD), candidates submit a thesis that provides a critical review of the current state of knowledge of the thesis subject as well as the student’s own contributions to the subject. The distinguishing criterion of doctoral graduate research is a significant and original contribution to knowledge.
Once accepted, the candidate presents the thesis orally. This oral exam is open to the public.
With the growing popularity of emerging technologies, digital systems are more prevalent than ever. Security, however, still lags behind, as evidenced by the increasing rate of recent attacks. Cyber-attacks are often initiated by running malicious code or by exploiting vulnerabilities in the underlying software. Analyzing software binary code (a.k.a. binary analysis) is recognized as a promising way to mitigate such threats. This thesis answers the following research question: how can cross-architecture binary code, compiled with optimization and obfuscation, be automatically fingerprinted by attributing compiler provenance, identifying library functions, and detecting vulnerable functions? Specifically, it first analyzes the syntax, structure, and semantics of functions to extract compiler provenance in cross-compiled binaries. Second, it introduces a single robust function signature based on heterogeneous features to solve the library function identification problem. Third, it addresses the vulnerable function detection problem through a multi-stage fuzzy matching approach on firmware images. Finally, it addresses the vulnerability detection problem in cross-architecture obfuscated binaries and firmware images through a neural machine translation-based approach. This thesis advances the state of the art by improving the accuracy, scalability, and efficiency of binary code analysis. All of the proposed approaches are implemented in a prototype system, and their performance is evaluated with real data.