PhD Oral Exam - Zhenhao Li, Computer Science
Towards Providing Automated Supports to Developers on Making Logging Decisions and Log Analysis
This event is free
School of Graduate Studies
When studying for a doctoral degree (PhD), candidates submit a thesis that provides a critical review of the current state of knowledge of the thesis subject as well as the student’s own contributions to the subject. The distinguishing criterion of doctoral graduate research is a significant and original contribution to knowledge.
Once accepted, the candidate presents the thesis orally. This oral exam is open to the public.
Developers write logging statements to generate logs and record system execution behaviors. Such logs are widely used for a variety of tasks, such as debugging, testing, program comprehension, and performance analysis. However, there are no practical guidelines on how to write logging statements, which makes logging decisions a very challenging task.
Moreover, the large volume of logs makes manual analysis difficult. It is also challenging to effectively parse raw logs into a more structured format for further analysis (i.e., log abstraction).
In this thesis, we focus on two main challenges that developers face in logging practice: the difficulty of making proper logging decisions and the difficulty of analyzing logs efficiently.
We propose a series of approaches to address these challenges and help developers with logging practices in two aspects: 1) making logging decisions, and 2) analyzing logs.
For logging decisions, we tackle the challenge by providing suggestions on writing logging statements. We first provide suggestions on logging locations. We conduct a comprehensive manual study on the characteristics of logging locations, uncover six categories of logging locations, and find that developers usually insert logging statements to record execution information in various types of code blocks. We then model the source code at the code block level using syntactic and semantic information and propose a deep learning framework to automatically suggest logging locations at the block level. We find that our models are effective in suggesting logging locations at the block level.
We then study the verbosity levels (i.e., log levels) of logging statements.
We first conduct a manual study on the characteristics of log levels. We find that the syntactic context of logging statements and their messages, as well as the ordinal nature of log levels, might be leveraged to help determine proper log levels.
We then propose a deep learning-based approach that leverages the ordinal nature of log levels to suggest appropriate log levels.
Our approach outperforms the baseline approaches and is effective at suggesting log levels in both within-system and cross-system scenarios.
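To make "leveraging the ordinal nature of log levels" concrete, one common technique is cumulative (ordinal) label encoding: a level of rank k among K levels becomes K-1 binary targets, where target j is 1 if and only if the rank exceeds j. A model trained on such targets is penalized more for predicting "error" when the truth is "debug" than for predicting "info". The abstract does not state which encoding the thesis uses, so the sketch below is only an illustration under assumptions: the five-level set, the helper names, and the 0.5 decision threshold are all hypothetical.

```python
from typing import List

# Assumed ordering of log levels (SLF4J-style); the thesis may use a different set.
LEVELS = ["trace", "debug", "info", "warn", "error"]


def encode_ordinal(level: str) -> List[int]:
    """Encode a level as K-1 cumulative binary targets: target j is 1 iff rank > j."""
    rank = LEVELS.index(level)
    return [1 if rank > j else 0 for j in range(len(LEVELS) - 1)]


def decode_ordinal(probs: List[float], threshold: float = 0.5) -> str:
    """Map K-1 predicted probabilities back to a level.

    The rank is the number of leading targets predicted as 'rank > j';
    counting stops at the first target below the (hypothetical) threshold.
    """
    rank = 0
    for p in probs:
        if p > threshold:
            rank += 1
        else:
            break
    return LEVELS[rank]


def ordinal_distance(predicted: str, actual: str) -> int:
    """Number of level steps between a prediction and the ground truth."""
    return abs(LEVELS.index(predicted) - LEVELS.index(actual))


if __name__ == "__main__":
    print(encode_ordinal("info"))                    # [1, 1, 0, 0]
    print(decode_ordinal([0.9, 0.8, 0.3, 0.1]))      # info
    print(ordinal_distance("debug", "error"))        # 3
```

A distance such as `ordinal_distance` is one way to evaluate how far off a suggestion is, rather than treating every wrong level as equally wrong.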
Finally, we investigate practitioners' expectations on the readability of log messages by conducting a series of semi-structured interviews with industrial practitioners. We derive three aspects that are related to the readability of log messages. We then explore the potential of automatically classifying the readability of log messages and find that both deep learning and traditional machine learning approaches are effective at such classification.
For log analysis, we focus on log abstraction, which is a crucial step in automated log analysis. In particular, we study dynamic variables, which prior log abstraction techniques usually abstract away completely, even though these variables may contain information that is useful for a given task. We first empirically study the dynamic variables in logs through a manual study and a survey of industrial practitioners.
We find that different categories of dynamic variables record valuable information that can be important for different tasks; distinguishing between these categories may therefore help log-based downstream tasks. We then propose a deep learning-based log abstraction approach, which can identify different categories of dynamic variables and abstract only the specified categories. Our approach outperforms state-of-the-art log abstraction techniques on general log abstraction, which abstracts all identified dynamic variables, and also achieves promising results on variable-aware log abstraction, which further considers the category of each dynamic variable. We also find that variable-aware log abstraction can help improve the performance of log-based anomaly detection.
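The difference between general and variable-aware log abstraction can be illustrated with a small sketch. The thesis identifies variable categories with a deep learning model; here, purely as a stand-in, a few regular expressions tag hypothetical categories (`IP`, `PATH`, `NUM`), and the caller chooses which categories to replace with placeholders. The category names, patterns, and function names are assumptions made for illustration, not the thesis's taxonomy or implementation.

```python
import re
from typing import List, Optional, Set, Tuple

# Hypothetical categories and regex detectors (a stand-in for a learned tagger).
# Earlier detectors take priority when spans overlap.
DETECTORS: List[Tuple[str, re.Pattern]] = [
    ("IP", re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b")),
    ("PATH", re.compile(r"(?<!\S)/[\w./-]+")),
    ("NUM", re.compile(r"\b\d+\b")),
]


def tag_variables(message: str) -> List[Tuple[int, int, str]]:
    """Return non-overlapping (start, end, category) spans sorted by position."""
    spans: List[Tuple[int, int, str]] = []
    for category, pattern in DETECTORS:
        for m in pattern.finditer(message):
            # Keep the match only if it does not overlap an existing span.
            if all(m.end() <= s or m.start() >= e for s, e, _ in spans):
                spans.append((m.start(), m.end(), category))
    return sorted(spans)


def abstract(message: str, categories: Optional[Set[str]] = None) -> str:
    """Replace tagged variables with <CATEGORY> placeholders.

    categories=None abstracts every identified variable (general abstraction);
    a set abstracts only those categories (variable-aware abstraction).
    """
    out, cursor = [], 0
    for start, end, category in tag_variables(message):
        out.append(message[cursor:start])
        if categories is None or category in categories:
            out.append(f"<{category}>")
        else:
            out.append(message[start:end])  # keep the concrete value
        cursor = end
    out.append(message[cursor:])
    return "".join(out)


if __name__ == "__main__":
    msg = "Connected to 10.0.0.5:8080 after 3 retries, reading /var/log/app.log"
    print(abstract(msg))                  # Connected to <IP> after <NUM> retries, reading <PATH>
    print(abstract(msg, {"IP", "PATH"}))  # Connected to <IP> after 3 retries, reading <PATH>
```

Keeping a category such as the retry count concrete is the kind of choice a variable-aware abstraction allows when that value matters for a downstream task like anomaly detection.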
Overall, the outcomes of these studies show the potential of mining software repositories and leveraging practitioners' knowledge to provide useful suggestions and support to developers when writing logging statements and analyzing logs.