Today’s foundation models are trained on vast amounts of data. The quality and size of this data have a large impact on the accuracy of these models. Selecting the right amount and variety of data for a given task, however, is a resource-intensive process. In this project, which is part of a larger collaboration, we would like to expand our investigation of state-of-the-art data selection mechanisms …
      
Supervisors: Pınar Tözün, Ties Robroek

Semester: Fall 2025

Tags: data selection, deep learning, machine learning, resource efficiency