Workload collocation has been shown to be an effective method for reducing the hardware requirements of certain deep learning (DL) training tasks. However, there have not been many robust open-source implementations of schedulers that incorporate workload collocation on GPUs for DL.
BLOX is a framework that aims to standardize the way deep learning schedulers are implemented. In this …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2024
Tags: machine learning systems, scheduling, resource management, workload collocation
State-of-the-art machine learning models are known to be compute- and power-hungry. At the same time, modern servers come equipped with very powerful CPU-GPU co-processors, and not all machine learning models are able to use all of the hardware resources available on such servers.
Workload collocation is a mechanism to increase hardware utilization when a single workload is not able to utilize all the …
Supervisors:
Pınar Tözün
Semester: Fall 2022
Tags: benchmarking, workload collocation, machine learning