When studying for a doctoral degree (PhD), candidates submit a thesis that provides a critical review of the current state of knowledge of the thesis subject as well as the student’s own contributions to the subject. The distinguishing criterion of doctoral graduate research is a significant and original contribution to knowledge.
Once accepted, the candidate presents the thesis orally. This oral exam is open to the public.
Abstract
Software defects can have serious consequences, degrading system performance and causing critical failures. The objective of Just-In-Time Software Defect Prediction (JIT-SDP) techniques is to identify potential defects at an early stage of development, thereby enhancing the reliability and maintainability of software. This thesis contributes novel advancements to JIT-SDP, specifically addressing project clusters, data imbalance, and classifier combination challenges. All contributions are evaluated on diverse software projects and 34 datasets, encompassing a total of 259k commits.
The first contribution introduces ClusterCommit, a JIT-SDP approach tailored for project clusters sharing libraries and functionalities. Unlike traditional methods, ClusterCommit employs a machine learning model trained on commits from various projects within a cluster. The study incorporates six machine learning and three deep learning models. The results reveal noteworthy improvements in mean Area Under the Curve (AUC), ranging from 4% to 12%, which are particularly prominent in complex models such as Random Forest (RF) and Support Vector Machine (SVM) when dealing with large clusters. In contrast, simpler models like Naive Bayes (NB), Logistic Regression (LR), Decision Tree (DT), and k-Nearest Neighbors (k-NN) do not perform as well when applied to clusters of projects.
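The cluster-level training idea can be illustrated with a minimal Python sketch. It is not the thesis implementation: the two commit features, the toy values, the project names, and the helper names (pool_cluster, train_centroid_scorer, auc) are all assumptions made for illustration, and a simple nearest-centroid scorer stands in for the RF, SVM, and deep learning models the study actually evaluates. The sketch pools commits from every project in a cluster into one training set, fits a single scorer on the pool, and measures AUC on held-out commits.

```python
from typing import Callable, Dict, List, Tuple

# A commit is (feature_vector, is_defective). The two features used below
# (lines changed, files touched) are hypothetical, not the thesis's metrics.
Commit = Tuple[List[float], int]


def pool_cluster(projects: Dict[str, List[Commit]]) -> List[Commit]:
    """Merge the commits of every project in a cluster into one training set."""
    pooled: List[Commit] = []
    for name in sorted(projects):
        pooled.extend(projects[name])
    return pooled


def train_centroid_scorer(commits: List[Commit]) -> Callable[[List[float]], float]:
    """Stand-in model: higher score means closer to the defective centroid."""
    def centroid(label: int) -> List[float]:
        rows = [x for x, y in commits if y == label]
        if not rows:
            raise ValueError("training data needs both classes")
        return [sum(col) / len(rows) for col in zip(*rows)]

    defective, clean = centroid(1), centroid(0)

    def sq_dist(a: List[float], b: List[float]) -> float:
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return lambda features: sq_dist(features, clean) - sq_dist(features, defective)


def auc(scores: List[float], labels: List[int]) -> float:
    """AUC as the chance a defective commit outranks a clean one (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one commit of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy cluster of two projects; all values are made up for illustration.
cluster = {
    "proj_a": [([120.0, 5.0], 1), ([10.0, 1.0], 0), ([200.0, 8.0], 1)],
    "proj_b": [([15.0, 1.0], 0), ([150.0, 6.0], 1), ([5.0, 2.0], 0)],
}
pooled = pool_cluster(cluster)            # one training set for the whole cluster
scorer = train_centroid_scorer(pooled)    # one model shared by all projects

held_out = [([180.0, 7.0], 1), ([8.0, 1.0], 0), ([90.0, 4.0], 1), ([20.0, 2.0], 0)]
scores = [scorer(x) for x, _ in held_out]
labels = [y for _, y in held_out]
print(f"cluster-level AUC on held-out commits: {auc(scores, labels):.2f}")
```

In a per-project baseline, the same scorer would be trained separately on each project's own commits; comparing the two AUC values is one way the benefit of cluster-level training could be assessed.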