Disk Access Tracing for Data-Intensive Systems
In the past decade, the data management community has focused on main-memory and main-memory-optimized systems. This focus has put the commodity memory hierarchy (DRAM and processor caches) at the center of workload characterization studies. Today, with the evolution of persistent storage technologies such as NVRAM (e.g., Intel's Optane persistent memory) and NVMe SSDs, data systems have access to persistent storage that is orders of magnitude faster than traditional hard disks. Recently developed data management systems (e.g., Umbra, LeanStore) therefore treat fast persistent storage as a design target from the beginning. However, unlike their memory access behavior, the data access behavior of modern, popular data-intensive systems over persistent storage has not been studied thoroughly. The goal of this project is to identify a set of popular state-of-the-art data-intensive systems and trace their data access patterns over persistent storage. This type of analysis is necessary to understand what these systems need from storage and how those needs can be adapted to take better advantage of modern fast storage.
The project can be adjusted to run as an MSc thesis, a BSc thesis, or a regular semester project.