PROPOSAL
Efficient Data Selection Methods for Machine Learning
Today’s foundation models are trained on vast amounts of data. The quality and size of this data have a large impact on model accuracy. Selecting the right amount and variety of data for a given task, however, is a resource-intensive process. In this project, we would like to investigate state-of-the-art data selection mechanisms from the perspective of hardware requirements and resource efficiency, in addition to their impact on model accuracy.
If your interests lie at the intersection of machine learning and systems performance, this project would be a great fit for you.
This project would be suitable as a standalone project or as a BSc or MSc thesis at ITU during Fall 2024. Depending on the size of the project and the interests of the student(s), we can target all or a subset of the tasks above.