A Framework for Systematic Performance Experiments for Machine Learning

Supervisors: Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: benchmarking, data management, data visualization

Observing how well machine learning systems utilize hardware resources is a crucial first step toward improving system performance and reducing hardware waste. Such observations require collecting large amounts of monitoring data on hardware behavior through experiments. In our group, we have recently built a framework, called Resource-Aware Data Systems Tracker, to help manage such monitoring data efficiently. Several parts of this framework could be improved:

(1) Currently, this framework uses the Postgres database management system to store the experimental data. We would like to explore alternatives to Postgres in our design. More specifically, we would like to evaluate the impact of using DuckDB instead, as it offers a leaner system design for data analytics and visualization. We already have an initial feasibility study with DuckDB and will build on the lessons learned from that study.

(2) So far, we have used this framework mainly on server-grade hardware with AMD or Intel CPUs and NVIDIA GPUs. We would like to investigate its behavior on other hardware platforms, such as servers with AMD GPUs or edge devices with various kinds of accelerators.

(3) The framework currently supports a variety of profiling and monitoring tools for CPUs and NVIDIA GPUs. We would like to add support for similar tools on a wider variety of hardware.

If you are interested in the systems and performance aspects of machine learning, scalable data management, and benchmarking in general, this project would be a great fit for you.

This project is suitable as a standalone project or as a BSc or MSc thesis at ITU during Fall 2024. Depending on the size of the project and the interests of the student(s), we can target all or a subset of the tasks above.