PROPOSAL
Extending the Spark Scheduler for Heterogeneous Clusters
Spark assumes that its applications run on a homogeneous cluster in which all nodes offer similar resources. However, it is becoming increasingly common for in-house clusters to contain heterogeneous compute resources, and it is desirable to exploit all of them as efficiently as possible. The objective of this master thesis is to extend the Spark scheduler to be resource-aware and to efficiently schedule Spark tasks on all the available resources in the cluster.
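As a reference point for this work, recent Spark releases (3.1 and later) already let an application declare per-stage resource requirements through the ResourceProfile API, which the study of the Spark scheduler can take as a starting point. The sketch below shows that existing API only; it assumes a Spark 3.1+ deployment with a "gpu" resource configured and dynamic allocation enabled, and the class name, amounts, and placeholder computation are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.resource.{ExecutorResourceRequests, ResourceProfileBuilder, TaskResourceRequests}

object ResourceProfileSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("resource-profile-sketch").getOrCreate()
    val sc = spark.sparkContext

    // Declare what the executors used for one stage should provide ...
    val executorReqs = new ExecutorResourceRequests()
      .cores(4)
      .memory("8g")
      .resource("gpu", 1)      // illustrative: assumes GPUs are configured on the cluster

    // ... and what each task of that stage consumes.
    val taskReqs = new TaskResourceRequests()
      .cpus(1)
      .resource("gpu", 1)      // one GPU per task

    val gpuProfile = new ResourceProfileBuilder()
      .require(executorReqs)
      .require(taskReqs)
      .build()

    // Attach the profile to an RDD; only this stage is scheduled with these requirements.
    val result = sc.parallelize(1 to 1000, numSlices = 8)
      .withResources(gpuProfile)
      .map(x => x * x)         // placeholder for GPU-accelerated work
      .sum()

    println(result)
    spark.stop()
  }
}
```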
Deliverables of the master thesis project
- An overview of Spark applications and how they are divided into tasks.
- A study of the Spark scheduler.
- An implementation that extends the Spark scheduler to account for heterogeneous computing resources in the Spark cluster (see the sketch after this list).
- An experimental validation of the developed scheduler.
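To make the implementation deliverable concrete, the sketch below illustrates the kind of placement decision a heterogeneity-aware scheduler has to make: keep only the executors that can satisfy a task's demand, rank them, and prefer the best match. The types, scoring rule, and weights are invented for this illustration and do not correspond to Spark internals.

```scala
// Hypothetical illustration only: not part of Spark's codebase.
final case class ExecutorProfile(id: String, cores: Int, memoryGb: Int, gpus: Int, cpuSpeedGhz: Double)
final case class TaskDemand(cores: Int, memoryGb: Int, gpus: Int)

object PlacementSketch {
  /** Returns None if the executor cannot run the task, otherwise a score (higher is better). */
  def score(task: TaskDemand, exec: ExecutorProfile): Option[Double] = {
    val fits = exec.cores >= task.cores && exec.memoryGb >= task.memoryGb && exec.gpus >= task.gpus
    if (!fits) None
    else {
      // Prefer faster CPUs; penalize leaving a lot of unused capacity on the node.
      val waste = (exec.cores - task.cores) + (exec.memoryGb - task.memoryGb) / 4.0
      Some(exec.cpuSpeedGhz - 0.1 * waste)
    }
  }

  /** Pick the feasible executor with the highest score, if any. */
  def pickExecutor(task: TaskDemand, execs: Seq[ExecutorProfile]): Option[ExecutorProfile] =
    execs.flatMap(e => score(task, e).map(s => (e, s))).sortBy(-_._2).headOption.map(_._1)

  def main(args: Array[String]): Unit = {
    val cluster = Seq(
      ExecutorProfile("fast-cpu", cores = 16, memoryGb = 64, gpus = 0, cpuSpeedGhz = 3.6),
      ExecutorProfile("gpu-node", cores = 8,  memoryGb = 32, gpus = 2, cpuSpeedGhz = 2.4))
    println(pickExecutor(TaskDemand(cores = 4, memoryGb = 8, gpus = 1), cluster)) // -> gpu-node
  }
}
```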