PROPOSAL

Going Beyond Memory with GPU-based Data Analytics


Supervisors: Pınar Tözün
Semester: Fall 2025
Tags: SSDs, GPU-centric IO, data analytics, modern storage

The rise of hardware accelerators to meet the demands of AI workloads has also led to a variety of novel methods for leveraging GPUs in traditional data analytics workloads. A key concern for any data-intensive system using GPUs is how efficiently data can be moved to the accelerator. In this project, we will investigate ways to improve data movement to GPUs by examining the individual steps of the data path and comparing the performance of different options for moving data from the network or SSDs to GPUs, such as GPUDirect or BaM [1].

The first step is to identify a target data analytics platform to use as a testbed (e.g., Polars, Crystal, Proteus). Then, we will integrate the target data movement options into this platform, unless it already supports them. Finally, we will carry out a performance analysis based on this integration.

If you are interested in data management systems, GPUs, storage devices, benchmarking, and performance analysis in general, this project would be a great fit for you. It is suitable as a standalone project, a research project with an MSc thesis follow-up, or a BSc thesis. We can adjust the scope depending on the project type and group size.

[1] Torp et al. “Path to GPU-Initiated I/O for Data-Intensive Systems”, DaMoN 2025.