Project Proposals


Here you can see a list of all currently proposed projects. For a list of all previous proposals, see the proposal archive.

Supervisors
  1. Zoi Kaoudi
  2. Niclas Hedam
  3. Philippe Bonnet
  4. Veronika Cheplygina

Supervisor: Zoi Kaoudi

PROPOSAL

Are you interested in working with a big data open source project? You are welcome to conduct your thesis/project in Apache Wayang. Apache Wayang is the first cross-platform framework that allows users to specify their task/query in a system-agnostic manner; Wayang then determines which system(s) are best suited to execute the task, with the goal of optimizing performance. For a general overview …
Supervisors: Zoi Kaoudi
Semester: Spring 2023
Tags: big data, database, cross-platform data processing, open source, Apache

PROPOSAL

Query optimization is crucial for any data management system to achieve good performance. Recent advancements in AI have led academia and industry to investigate learning-based techniques in query optimization. In particular, many works propose replacing the cost model used during plan enumeration with a machine learning model (typically a regression model) that estimates the runtime of a query …
Supervisors: Zoi Kaoudi
Semester: Spring 2023
Tags: machine learning, database, query optimization, ranking

PROPOSAL

Query optimization is crucial for any data management system to achieve good performance. Recent advancements in AI have led academia and industry to investigate learning-based techniques in query optimization. In particular, many works propose replacing the cost model used during plan enumeration with a machine learning model that estimates the runtime of a plan. However, to build such a model …
Supervisors: Zoi Kaoudi
Semester: Spring 2023
Tags: machine learning, training data, query optimizer


Supervisor: Niclas Hedam

PROPOSAL

The emergence of computational storage platforms like Delilah has transformed the data storage landscape, enabling new computing paradigms and facilitating data-intensive applications. Delilah is a cutting-edge computational storage platform developed by the IT University of Copenhagen. It runs on the Daisy OpenSSD and exposes an asynchronous computational storage protocol to the host, facilitated …
Supervisors: Niclas Hedam, Philippe Bonnet
Semester: Fall 2023
Tags: Open Source, Testing, Computational Storage, Hardware, FPGA


Supervisor: Philippe Bonnet

PROPOSAL

The emergence of computational storage platforms like Delilah has transformed the data storage landscape, enabling new computing paradigms and facilitating data-intensive applications. Delilah is a cutting-edge computational storage platform developed by the IT University of Copenhagen. It runs on the Daisy OpenSSD and exposes an asynchronous computational storage protocol to the host, facilitated …
Supervisors: Niclas Hedam, Philippe Bonnet
Semester: Fall 2023
Tags: Open Source, Testing, Computational Storage, Hardware, FPGA


Supervisor: Veronika Cheplygina

PROPOSAL

Machine learning models, especially larger models such as those used for image or text datasets, can be expensive to train. During development, models are usually trained multiple times, for example to optimize hyperparameters, which can result in a large carbon footprint. This project focuses specifically on medical data. There are some recent efforts, for example …
Supervisors: Veronika Cheplygina
Semester: Spring 2023
Tags: machine learning, medical imaging, data analysis, resource consumption

PROPOSAL

There have been several situations where machine learning classifiers, trained to diagnose a particular disease (for example, lung cancer from chest x-rays), overfit on hidden features within the data. Examples include gridlines, surgical markers, evidence of treatment, or text present in the images (see references for examples). This causes the classifier to fail on other types of images. …
Supervisors: Veronika Cheplygina
Semester: Spring 2023
Tags: machine learning, data science, medical imaging

PROPOSAL

GANs have been proposed for generation of synthetic cell image data [1], or in other words data augmentation. We want to perform a critical survey of the field of GANs as used for data augmentation and examine some alternatives. We believe that the notion that GANs are generating “new” data should be challenged; they in fact generate variations of the data they are fed for training …
Supervisors: Veronika Cheplygina
Semester: Spring 2023
Tags: machine learning, data science, medical imaging