Thesis Title: Modeling the Linux Page Cache for Accurate Simulation of Data-Intensive Applications
Date & Time: April 29th, 2021 @ 12:00 pm
Online (Zoom)
Examining Committee:
Dr. Essam Mansour - Chair
Dr. Tristan Glatard - Supervisor
Dr. Essam Mansour - Examiner
Dr. Todd Eavis - Examiner
Abstract:
The emergence of Big Data in recent years has led to a growing need for data processing and an increasing number of data-intensive applications. Processing and storing such massive amounts of data requires large-scale solutions, and data-intensive applications must therefore be executed on large-scale infrastructures such as cloud or High Performance Computing (HPC) clusters. Although advances in the hardware/software stack enable ever-larger computing platforms, relevant challenges remain in resource management, performance, scheduling, and scalability. As a result, there is an increasing demand for performance quantification and optimization when executing data-intensive applications on these platforms. While infrastructures with sufficient compute power and storage capacity are available, disk I/O performance remains a bottleneck. To address this problem, beyond hardware improvements, the Linux page cache is an efficient architectural approach to reducing I/O overheads, but few experimental studies of its interactions with Big Data applications exist, partly due to the limitations of real-world experiments. Simulation is a popular approach to address these issues; however, existing simulation frameworks simulate page caching only partially, or not at all. As a result, simulation-based performance studies of data-intensive applications yield inaccurate results.
This thesis proposes an I/O simulation model that captures the key features of the Linux page cache. We have implemented this model as part of the WRENCH workflow simulation framework, which itself builds on the popular SimGrid distributed-systems simulation framework. Our model and its implementation enable the simulation of both single-threaded and multithreaded applications, and of both writeback and writethrough caches, for local and network-based filesystems. We evaluate the accuracy of our model under different conditions, including sequential and concurrent applications, as well as local and remote I/O. The results show that our page cache model reduces simulation error by up to an order of magnitude compared to state-of-the-art cacheless simulations.
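As a rough intuition for the writeback/writethrough distinction above, the following toy sketch estimates total write time under each policy. It is an illustrative back-of-the-envelope model only, not the thesis's simulation model: the latency constants, function names, and the flush threshold are all made-up assumptions.

```python
# Toy model (illustrative assumptions, not the thesis's model):
# estimated time to complete n block writes under two cache policies.

MEM_WRITE_S = 1e-7   # assumed per-block write latency to the page cache
DISK_WRITE_S = 1e-3  # assumed per-block write latency to disk

def writethrough_time(n_blocks: int) -> float:
    """Writethrough: every write is committed to both the cache and
    the disk before the application proceeds."""
    return n_blocks * (MEM_WRITE_S + DISK_WRITE_S)

def writeback_time(n_blocks: int, cache_capacity_blocks: int) -> float:
    """Writeback: writes land in the cache at memory speed; only the
    blocks that exceed the (assumed) cache capacity must be flushed
    to disk within the measured interval."""
    flushed = max(0, n_blocks - cache_capacity_blocks)
    return n_blocks * MEM_WRITE_S + flushed * DISK_WRITE_S

if __name__ == "__main__":
    # With a cache large enough to absorb the burst, writeback is
    # roughly four orders of magnitude faster in this toy setting.
    print(writethrough_time(1000))      # all 1000 blocks hit disk
    print(writeback_time(1000, 1000))   # all 1000 blocks stay in cache
```

A cacheless simulator effectively charges every write the `writethrough_time` cost, which is one way to see why ignoring the page cache can inflate predicted I/O times so dramatically.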