Extending Spark Scheduler for Heterogeneous Clusters

Supervisor: Iman Elghandour
Semester: Fall 2019

Spark assumes that it executes its applications on a homogeneous cluster of similar nodes. However, in-house clusters increasingly contain heterogeneous compute resources, and it is worthwhile to exploit all of them as efficiently as possible. The objective of this master's thesis is to extend the Spark scheduler to be resource-aware and to efficiently schedule Spark tasks on all the available resources in the cluster.

Deliverables of the master thesis project
  • An overview of Spark applications and how they are divided into tasks.
  • A study of the Spark scheduler.
  • An implementation to extend the Spark scheduler to account for heterogeneous computing resources in the Spark cluster.
  • An experimental validation of the developed scheduler.
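
To make the idea of resource-aware scheduling concrete, the following is a minimal, standalone sketch of one possible placement policy: a greedy heuristic that assigns each task to the execution slot where it would finish earliest, given a per-node relative speed. It is not Spark code and does not use any Spark API. The names Node, speed, slots and schedule are hypothetical and chosen only for illustration, and the thesis may well arrive at a different policy.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical description of a cluster node (not a Spark class)."""
    name: str
    speed: float  # assumed relative throughput: work units per time unit
    slots: int    # assumed number of tasks the node can run concurrently


def schedule(tasks, nodes):
    """Greedy earliest-finish-time placement.

    tasks: dict mapping task id -> amount of work (arbitrary units)
    nodes: list of Node
    Returns (assignment, makespan), where assignment maps task id -> node name.
    """
    # Time at which each (node, slot) pair becomes free again.
    free_at = {(n.name, s): 0.0 for n in nodes for s in range(n.slots)}
    speed = {n.name: n.speed for n in nodes}
    assignment = {}

    # Place the largest tasks first; ties are broken by task id for determinism.
    for task_id, work in sorted(tasks.items(), key=lambda kv: (-kv[1], kv[0])):
        # Pick the slot on which this task would finish earliest.
        best = min(
            free_at,
            key=lambda slot: (free_at[slot] + work / speed[slot[0]], slot),
        )
        free_at[best] += work / speed[best[0]]
        assignment[task_id] = best[0]

    return assignment, max(free_at.values())


# Example: one fast node and one slow node, three equally sized tasks.
nodes = [Node("fast", speed=2.0, slots=1), Node("slow", speed=1.0, slots=1)]
assignment, makespan = schedule({"t0": 4.0, "t1": 4.0, "t2": 4.0}, nodes)
print(assignment, makespan)
```

In this example the fast node receives two tasks and the slow node one, so all work completes after 4 time units. A policy that ignores node speed and places two of the tasks on the slow node would need 8 time units, which is the kind of gap the experimental validation could quantify.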