
Master's Thesis Defense: Hoang Dung Do


Date & time
Thursday, April 29, 2021
12 p.m. – 2 p.m.
Speaker(s)

Hoang Dung Do

Cost

This event is free

Where

Online

Candidate: Hoang Dung Do

Thesis Title: Modeling the Linux Page Cache for Accurate Simulation of Data-Intensive Applications

Date & Time: April 29th, 2021 @ 12:00 pm

Online (Zoom)


Examining Committee:

Dr. Essam Mansour - Chair

Dr. Tristan Glatard - Supervisor

Dr. Essam Mansour - Examiner

Dr. Todd Eavis - Examiner


Abstract:

The emergence of Big Data in recent years has led to a growing need for data processing and an increasing number of data-intensive applications. Processing and storing these massive amounts of data requires large-scale solutions, and data-intensive applications must therefore be executed on large-scale infrastructures such as cloud or High Performance Computing (HPC) clusters. Although advances in the hardware/software stack enable larger computing platforms, relevant challenges remain in resource management, performance, scheduling, scalability, and other areas. As a result, there is an increasing demand for performance quantification and optimization when executing data-intensive applications on these platforms. While infrastructures with sufficient compute power and storage capacity are available, I/O performance on disks remains a bottleneck. Apart from hardware improvements, the Linux page cache is an efficient architectural approach to reducing I/O overheads, but few experimental studies of its interactions with Big Data applications exist, partly due to the limitations of real-world experiments. Simulation is a popular approach to address these issues; however, existing simulation frameworks simulate page caching only partially, or not at all. As a result, simulation-based performance studies of data-intensive applications yield inaccurate results.

This thesis proposes an I/O simulation model that captures the key features of the Linux page cache. We have implemented this model as part of the WRENCH workflow simulation framework, which itself builds on the popular SimGrid distributed-systems simulation framework. Our model and its implementation enable the simulation of both single-threaded and multithreaded applications, and of both writeback and writethrough caches for local or network-based filesystems. We evaluate the accuracy of our model under different conditions, including sequential and concurrent applications, as well as local and remote I/O. The results show that our page cache model reduces the simulation error by up to an order of magnitude compared to state-of-the-art, cacheless simulations.
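For readers unfamiliar with the mechanism being modeled, the sketch below illustrates, in simplified Python, why a page cache matters for simulated I/O time: reads served from cached pages skip the disk, writes under a write-back policy are acknowledged at memory speed and flushed later, while write-through pays the disk cost immediately. This is a conceptual illustration only; the class name, parameters (cache_pages, mem_latency, disk_latency), and logic are assumptions made for exposition, not the WRENCH or SimGrid implementation described in the thesis.

```python
from collections import OrderedDict

# Simplified page cache model (illustrative only; not the
# WRENCH/SimGrid implementation described in the thesis).
class PageCacheModel:
    def __init__(self, cache_pages=1024, mem_latency=1e-6, disk_latency=1e-3):
        self.cache_pages = cache_pages      # capacity in pages
        self.mem_latency = mem_latency      # simulated cost of a cache access (s)
        self.disk_latency = disk_latency    # simulated cost of a disk access (s)
        self.pages = OrderedDict()          # page_id -> dirty flag, in LRU order

    def _evict_if_full(self):
        # Evict least-recently-used pages; dirty pages are written back first.
        cost = 0.0
        while len(self.pages) >= self.cache_pages:
            _, dirty = self.pages.popitem(last=False)
            if dirty:
                cost += self.disk_latency   # write-back on eviction
        return cost

    def read(self, page_id):
        # Cache hit: serve from memory. Miss: pay disk latency and cache the page.
        if page_id in self.pages:
            self.pages.move_to_end(page_id)
            return self.mem_latency
        cost = self._evict_if_full() + self.disk_latency
        self.pages[page_id] = False         # clean page
        return cost + self.mem_latency

    def write(self, page_id, writeback=True):
        # Write-back: mark the page dirty and defer disk I/O.
        # Write-through: pay the disk cost immediately.
        cost = self._evict_if_full() + self.mem_latency
        self.pages[page_id] = writeback     # dirty only under write-back
        self.pages.move_to_end(page_id)
        if not writeback:
            cost += self.disk_latency
        return cost

    def flush(self):
        # Flush of dirty pages (akin to the kernel's background writeback).
        dirty = [p for p, d in self.pages.items() if d]
        for p in dirty:
            self.pages[p] = False
        return len(dirty) * self.disk_latency


# Tiny usage example: re-reading recently written data costs memory, not disk, time.
cache = PageCacheModel()
t = cache.write(0) + cache.read(0)   # read served from the cache
t += cache.flush()                   # dirty page written back later
print(f"simulated I/O time: {t:.6f} s")
```

In a cacheless simulation, every read and write in the example above would incur the full disk latency, which is one intuition for why ignoring the page cache can overestimate I/O time by a large factor.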
