Apache Wayang (Big Data)

Supervisors: Jorge Quiané
Semester: Fall 2022
Tags: big data, database, cross-platform data processing, open source, Apache

Do you like open-source systems? Would you like to experience working with an open-source system? Do you want to learn about big data research in practice? Then, this project is for you!

We have a number of thesis/project topics under the umbrella of Apache Wayang. Wayang is the first cross-platform framework that lets users specify a task/query in a system-agnostic manner. Wayang then determines which system (or combination of systems) should execute the task/query, with the goal of optimizing performance (or even monetary cost). For a general overview, check this paper.
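
To give a feel for the idea of picking an execution system by estimated cost, here is a minimal toy sketch in plain Java. It is not Wayang's API: the class `PlatformChooser`, the `Platform` type, the platform names, and the cost formulas are all hypothetical and made up for illustration. A real cross-platform optimizer also has to account for things like moving data between systems.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.LongToDoubleFunction;

// Toy illustration only: every name here is hypothetical, none of it is Apache Wayang's API.
public class PlatformChooser {

    /** A candidate execution platform with a made-up cost model. */
    static final class Platform {
        final String name;
        // Estimated runtime in seconds as a function of the number of input records.
        final LongToDoubleFunction estimatedSeconds;

        Platform(String name, LongToDoubleFunction estimatedSeconds) {
            this.name = name;
            this.estimatedSeconds = estimatedSeconds;
        }
    }

    /** Returns the candidate with the lowest estimated runtime for the given input size. */
    static Platform choose(List<Platform> candidates, long inputRecords) {
        return candidates.stream()
                .min(Comparator.comparingDouble(
                        (Platform p) -> p.estimatedSeconds.applyAsDouble(inputRecords)))
                .orElseThrow(() -> new IllegalArgumentException("no candidate platforms"));
    }

    /** Two invented platforms: low startup cost vs. low per-record cost. */
    static List<Platform> exampleCandidates() {
        return Arrays.asList(
                new Platform("single-node Java", n -> 0.1 + n * 1e-6),
                new Platform("Spark-like cluster", n -> 5.0 + n * 1e-8));
    }

    public static void main(String[] args) {
        List<Platform> candidates = exampleCandidates();
        // The same task description is routed differently depending on input size.
        System.out.println(choose(candidates, 10_000L).name);
        System.out.println(choose(candidates, 1_000_000_000L).name);
    }
}
```

With these invented numbers, the small input goes to the single-node option because the cluster's startup overhead dominates, while the large input goes to the cluster because its per-record cost is lower. The user's task description stays the same in both cases.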

Potential topics:

  • Visual data analytics
  • Data debugging
  • Big data processing
  • Database building
  • … just drop me an email ;)

Potential Intended Learning Outcomes depending on the concrete topic:

  • Ability to contribute to large open source codebases
  • Ability to use new technology
  • Ability to use data management techniques to improve systems performance
  • Ability to leverage machine learning techniques to improve systems performance

Prerequisites: good programming skills in Java; preferably also knowledge of big data systems (e.g., Apache Flink, Apache Spark).