PROPOSAL

BLOX for Deep Learning Task Scheduling with GPU Collocation


Supervisors: Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2024
Tags: machine learning systems, scheduling, resource management, workload collocation

Workload collocation has been shown to be an effective method for reducing the hardware requirements of certain deep learning (DL) training tasks. However, there have not been many robust open-source implementations of schedulers that incorporate workload collocation on GPUs for DL.

BLOX is a framework that aims to standardize the way deep learning schedulers are implemented. In this project, our goal is to investigate ways to integrate a collocation-aware scheduler into this framework.

This project is suitable as a BSc or MSc thesis, or as a standalone project, at ITU during Fall 2024. If you are interested in machine learning systems and their efficiency in general, this project would be a great fit for you. We can adjust the project tasks depending on the size of the project or thesis (BSc, MSc, etc.) and the number of students in the group.