PROPOSAL

Framework for Systematic Performance Experiments for Machine Learning

Supervisors: Pınar Tözün, Ties Robroek
Semester: Fall 2025
Tags: benchmarking, data management, data visualization

Observing how well machine learning systems utilize hardware resources is a crucial first step toward improving system performance and reducing hardware waste. Making such observations requires collecting a large amount of monitoring data on hardware behavior through experiments. In our group, we have recently built a framework, called Resource-Aware Data Systems Tracker (radT), to help manage such monitoring data efficiently. There is a variety of work to be done to improve this framework.

(1) Currently, radT relies on the table schema set by the MLflow framework to organize the experimental data it collects. We would like to investigate alternative data organizations or table schemas for managing the collected data in a scalable manner. If you are interested in databases and data management, this direction will be ideal for you.

(2) So far, we have used this framework mainly on server-grade hardware with AMD or Intel CPUs and NVIDIA GPUs. We would like to investigate its behavior on other hardware platforms, such as servers with AMD GPUs or edge devices with various forms of accelerators.

(3) The framework currently supports a variety of profiling and monitoring tools for CPUs and NVIDIA GPUs, as well as Carbontracker. We would like to add support for similar tools on a wider variety of hardware.

If you are interested in the systems and performance aspects of machine learning, scalable data management, sustainability, or benchmarking in general, one or more of these project directions would be a great fit for you.

This project would be suitable as a standalone project or as a BSc or MSc thesis. Based on the size of the project and the interests of the student(s), we can target all or a subset of the tasks above.