Predicting the Execution Times of Queries Executed on Accelerated Distributed Platforms

Supervisor: Iman Elghandour
Semester: Fall 2019

In the last few years, it has become common to accelerate Hadoop and Spark by enabling them to execute tasks and jobs on accelerators such as GPUs and FPGAs. The objective of this master thesis is to study new approaches that efficiently predict the execution time of Spark tasks and jobs executed on GPUs. Part of the work will be to build a performance prediction model for GPUs, using machine learning or other techniques.

Deliverables of the master thesis project
  • An overview of Spark applications and how they are divided into tasks.
  • A study of the architecture of GPUs and the main factors that affect the performance of code executed on them.
  • An implementation of a performance prediction model for Spark tasks executed on GPUs.
  • An experimental validation of the developed model(s).