Master Thesis Defense: Ahmad Tahmid
Speaker: Ahmad Tahmid
Supervisor: Dr. N. Tsantalis
Drs. T.-H. Chen, E. Shihab, Y.-G. Gueheneuc (Chair)
Title: Frequently Refactored Code Idioms
Date: Wednesday, November 27, 2019
Place: EV 2.260
It is important to refactor software source code from time to time to preserve its maintainability and understandability. Despite this, software developers rarely dedicate time for refactoring due to deadline pressure or resource limitations. To help developers take advantage of refactoring opportunities, researchers have been working towards automated refactoring recommendation systems for a long time. However, these techniques suffer from low precision, as many of their recommendations are irrelevant to the developers. As a result, most developers do not rely on these systems and prefer to perform refactoring based on their own experience and intuition. To shed more light on the practice of refactoring, we investigate the characteristics of the code fragments that get refactored most frequently across software repositories.
Finding the most repeatedly refactored fragments can be very challenging due to the following factors: i) Refactorings are highly influenced by the context of the code. Therefore, it is difficult to remove context-specific information and find similarities. ii) Refactorings are usually more complex than simple code edits such as line additions or deletions. Therefore, basic source code matching techniques, such as token sequence matching do not produce good results. iii) A higher level of abstraction is required to match refactored code fragments within projects of a different domain. At the same time, the structural detail of the code must be preserved to find accurate results. In this study, we tried to overcome the above-mentioned challenges and explore several ways to compare refactored code from different contexts. We transformed the refactored code to different source code representations and attempted to find non-trivial refactored code idioms, which cannot be identified using the standard code comparison techniques.
In this thesis, we are presenting the findings of our study, in which we examine the repetitiveness of refactorings on a large java codebase of 1,025 projects. We discuss how we analyze our dataset of 47,647 refactored code fragments using a combination of state-of-art code matching techniques. Finally, we report the most common refactoring actions and discuss the motivations behind those.