We are actively developing a new prototype for analysing large multimedia collections in virtual reality, based on the ObjectCube data model. There are many ways in which students can contribute to the project, including work on the user interface and the back-end, and later on running large-scale user experiments.
Supervisors:
Aaron Duane, Björn Þór Jónsson
Semester: Fall 2021
Tags: virtual reality, multimedia analytics
It has been observed that deep learning models are able to identify patient characteristics such as age, sex, and self-reported race with high accuracy from medical images such as chest x-ray recordings, even when medical doctors cannot. This raises the potential for such models to learn to (falsely) diagnose patients of different demographics differently, even if they present with the same …
Supervisors:
Amelia Jiménez-Sánchez, Eike Petersen, Veronika Cheplygina
Semester: Fall 2024
Tags: machine learning, data science, medical imaging
A medical Visual Question Answering (VQA) system can provide meaningful references for both doctors and patients during the treatment process. Unlike with normal images, learning from medical images is more challenging due to limited amounts of data, class imbalance, and the presence of label noise in diagnosis tasks. Moreover, little attention is paid to how the images and meta-data is …
Supervisors:
Amelia Jiménez-Sánchez
Semester: Fall 2023
Tags: medical imaging, deep learning, machine learning, transfer learning, meta-learning
In relevance feedback, the choice of images to present to the user is a difficult problem, as a naïve approach may present too many similar images. The challenge addressed in this project is to ensure diversity (aka “one of each”) as well as relevance. A particularly interesting project for students interested in efficient algorithms.
Supervisors:
Björn Þór Jónsson
Semester: Fall 2021
Tags: multimedia analytics, scalability, diversity
In interactive learning systems, such as Exquisitor, the system presents potentially relevant images to users who label them as either relevant or irrelevant. Currently, Exquisitor uses a cluster-based index, which allows it to return results from a collection of 100 million images in 0.3 seconds. The goal of this project is to study the application of hash-based indexing to interactive learning …
Supervisors:
Björn Þór Jónsson
Semester: Fall 2021
Tags: multimedia analytics, diversity
The goal of this project is to enhance PhotoCube as a competitor for the Video Browser Showdown, an international video retrieval competition where competing systems are judged based on speed, accuracy and recall. We propose to develop new versions of the C++-based media server and JS-based media browser, to expand the data model to videos and improve the performance sufficiently to take part in …
Supervisors:
Björn Þór Jónsson
Semester: Fall 2021
Tags: video search, multimedia analytics, photocube
The index structure used for Exquisitor is eCP, a very scalable index for high-dimensional retrieval. However, while multimedia collections are typically updated constantly, the index is static. The goal of this project is to implement and compare approaches for index maintenance.
Supervisor: Björn Þór Jónsson
Semester: Fall 2020
Tags: index maintenance, high-dimensional indexing
In this project, we propose to implement a media server and a media browser encapsulating a new data model for analysing media collections, called Multimedia Analytics Data Services (MADS). To validate the design, some scalability experiments should be performed.
Supervisor: Björn Þór Jónsson
Semester: Fall 2020
Tags: multimedia analytics, scalability
Students at ITU have made a prototype version of the Exquisitor system for the Android mobile phone! The system is missing some of Exquisitor’s advanced functionality, such as search and indexing, and the goal is to add and evaluate this functionality.
Supervisor: Björn Þór Jónsson
Semester: Fall 2020
Tags: multimedia analytics, android
We propose to develop a new prototype for analysing large multimedia collections in Virtual Reality, using the new Valve Indexes. There are many ways in which students can contribute to the project, including work on the user interface and the back-end, and later on running large-scale user experiments.
Supervisor: Björn Þór Jónsson
Semester: Fall 2020
Tags: virtual reality, multimedia analytics
The goal of this project is to integrate Exquisitor with other pieces of existing technology and turn it into a competitor for a live video retrieval competition. The project is suitable for 3-4 well-qualified MSc students.
The Video Browser Showdown (VBS) is a live competition for video search and retrieval, held at the International Conference on Multimedia Modeling (MMM). In VBS, the competition …
Supervisor: Björn Þór Jónsson
Semester: Fall 2019
The goal of this project is to ensure diversity in the relevance feedback results, to improve the quality of the user experience.
The project is suitable for 1-3 well-qualified MSc students.
In many creative tasks, the designer will know that some stock image is good for a design just by stumbling upon the image. This “Aha!” moment requires browsing thousands of images by categories. In other words, it requires …
Supervisor: Björn Þór Jónsson
Semester: Fall 2019
The goal of this project is to use the state of the art in eye tracking to design, implement and evaluate different eye-tracking interfaces for Exquisitor.
The project is suitable for 1-3 well-qualified MSc students.
Image and media collections are becoming a central information resource for a growing number of domains. This calls for very effective tools for interactive exploration of the contents of …
Supervisor: Björn Þór Jónsson
Semester: Fall 2019
The goal of this project is to build a prototype of the Exquisitor system for mobile devices.
The project is suitable for 1-3 well-qualified MSc students.
Image and media collections are becoming a central information resource for a growing number of domains. This calls for very effective tools for interactive exploration of the contents of those collections [1]. Based on past research results [2], we …
Supervisor: Björn Þór Jónsson
Semester: Fall 2019
Deep neural networks have been revolutionary in computer vision and publicly available image datasets played an important role in this success. Due to their size, neural networks require vast amounts of data for training. Yet when it comes to medical settings dataset sizes are very limited due to the cost of data annotation, privacy concerns, differences in imaging techniques, and others. In such …
Supervisors:
Dovile Juodelyte
Semester: Fall 2023
Tags: transfer learning, deep learning, medical imaging
GPUs offer massive computational power and parallelism through their Streaming Multiprocessors (SMs). Efficient GPU utilization is critical for maximizing performance and optimizing compute resource usage, which is measured using various metrics such as SMACT (SM Activity), SMOCC (SM Occupancy), and DRAMA (DRAM Active). These metrics provide insight into how effectively the GPU’s SMs and …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Spring 2024
Tags: machine learning systems, GPU Utilization, resource management, resource interference
Deep learning has changed the landscape of many applications, such as computer vision and natural language processing. On the other hand, deep learning requires the gigantic computing power offered by modern hardware. As a result, data scientists rely on powerful hardware resources offered by shared high-performance computing (HPC) clusters or the cloud. Due to the long-running times of deep learning …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2024
Tags: machine learning systems, checkpointing, scheduling, resource management
Workload collocation has been shown to be an effective method to reduce the hardware requirements for certain deep learning (DL) training tasks. On the other hand, there haven’t been many robust open-source implementations of schedulers that incorporate workload collocation on GPUs for DL.
BLOX is a framework that aims at standardizing the way we implement deep learning schedulers. In this …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2024
Tags: machine learning systems, scheduling, resource management, workload collocation
Today, there are many compute- and memory-hungry data-intensive workloads from big data analytics applications to deep learning. These workloads increasingly run on shared hardware resources, which requires building hardware resource managers that can both serve the needs of workloads and utilize hardware well. Predicting the resource utilization of applications can aid such resource managers …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2022
Tags: benchmarking, hardware resource consumption estimation, machine learning
Geospatial data refers to information that is tied to specific geographic locations on the Earth’s surface. It includes both the location coordinates (such as latitude, longitude, and, potentially, altitude) and attribute data associated with those locations. Geospatial data is categorized into two types: raster and vector.
Vector data represents geographic features as points, lines, and …
Supervisors:
Eleni Tzirita Zacharatou
Semester: Fall 2023
Tags: spatial data analysis, data science, data loading, GIS file formats, geospatial data
It is now common to query terabytes of spatial data. Several new frameworks extend distributed computing platforms such as Hadoop and Spark to enable them to efficiently process spatial queries by providing (1) mechanisms to efficiently store spatial data and index them; and (2) packages of built-in spatial operations for these platforms. Meanwhile, it is now common to accelerate Hadoop and Spark …
Supervisor: Iman Elghandour
Semester: Fall 2019
Spark assumes that it executes its applications on a homogeneous cluster of similar nodes. However, it is becoming common that in-house clusters have heterogeneous compute resources and it is good to exploit all of them in the most efficient way. The objective of this master thesis is to extend the Spark scheduler to be resource-aware and to efficiently schedule Spark tasks on all the …
Supervisor: Iman Elghandour
Semester: Fall 2019
Distributed computing platforms such as Hadoop and Spark focus on addressing the following challenges in large systems: (1) latency, (2) scalability, and (3) fault tolerance. Dedicating computing resources for each application executed by Spark can lead to a waste of resources. Unified distributed file systems such as Alluxio have provided a platform for computing results among simultaneously …
Supervisor: Iman Elghandour
Semester: Fall 2019
In the last few years, it has become common to accelerate Hadoop and Spark by enabling them to execute tasks and jobs on accelerators such as GPUs and FPGAs. The objective of this master thesis is to study new approaches that efficiently predict the execution time of Spark tasks and jobs executed on GPUs. Part of the work will be to build a performance prediction model for GPUs, which can be built …
Supervisor: Iman Elghandour
Semester: Fall 2019
Do you like open-source systems? Would you like to experience working with an open-source system? Do you want to learn about big data research in practice? Then, this project is for you!
We have a number of thesis/project topics under the umbrella of Apache Wayang. Wayang is the first cross-platform framework that allows users to specify their task/query in a system-agnostic manner and Wayang will …
Supervisors:
Jorge Quiané
Semester: Fall 2022
Tags: big data, database, cross-platform data processing, open source, Apache
Do you have the feeling that only a few players are controlling the AI game? Would you like to make AI technology accessible to everyone? Then, come and help us to make Agora a reality!
We have a number of thesis/project topics under the umbrella of the Agora project. This project aims at building a unified data infrastructure for supporting AI ecosystems that bring together data, algorithms, …
Supervisors:
Jorge Quiané
Semester: Fall 2022
Tags: big data, AI ecosystems, compliant data processing, federated analytics, data markets
The DISCO-2 project is driven by students and aims to develop and deploy a 3-unit CubeSat into low Earth orbit. Its mission focuses on conducting Earth observations over Greenland and supporting various research objectives. The satellite has three cameras onboard: infrared, wide-angle, and standard (main camera). Due to the limitations of the imaging hardware and the challenging conditions on the …
Supervisors:
Yucheng Lu, Julian Priest
Semester: Fall 2024
Tags: Image enhancement, Image processing, Machine learning
ITU is a partner of the Danish Student Cubesat Program, DISCOSAT. We launched our first satellite, DISCO-1, into Low Earth Orbit in April 2023, and we will launch a second, DISCO-2, in 2024. In this project you will gain experience with automating live satellite operations and communications, completing a ground station at the Rued Langgaards Vej site for use with both satellites.
The DISCO satellite …
Supervisors:
Julian Priest
Semester: Fall 2023
Tags: satellite, ground station, software defined radio, automation, csp
The DISCO-2 satellite is an Earth observation satellite developed in collaboration with the Arctic Research Center in Aarhus and is designed to complement ground-based field studies in Greenland. The satellite instrument consists of 2 high-quality visible light cameras and 1 infrared camera, as well as an attitude control system and a Coral TPU ML coprocessor.
In this project you will develop software to control …
Supervisors:
Julian Priest
Semester: Fall 2023
Tags: satellite, climate change, image processing, ML, csp, embedded, space
The DISCO-2 satellite will have accelerated machine learning capability based on the inclusion of a Coral TPU ML accelerator module. This will allow images taken by the satellite to be analysed on the satellite using a variety of ML models, with only select images sent back to Earth. This approach allows for more flexibility in image acquisition and saves downlink bandwidth which is very constrained …
Supervisors:
Julian Priest, Robert Bayer
Semester: Fall 2023
Tags: satellite, ground station, software defined radio, automation, csp
The Danish Student Cubesat Program is an inter-university collaboration that will launch 3 cubesats into Low Earth Orbit over the next 4 years. The satellites will be designed, operated, programmed and built by students and the project offers an opportunity for Master’s students to take part in a live satellite project. ITU is partnering with Aarhus University on DISCOSAT2 which will be an …
Supervisors:
Sebastian Büttrich, Julian Priest
Semester: Fall 2021
Tags: Satellite, Cubesat, Image processing, Machine Learning, edge, constrained computing
ITU is a partner in the Danish Student Cubesat Program, DISCO, which will launch a series of small satellites into orbit, starting with DISCO 1 in 2023 and followed by DISCO2 in 2024.
As part of this project, ITU is installing a satellite ground station with a range of antenna rotators on the roof of the Rued Langgaards Vej building, and the equipment has been purchased. The ground station will track the …
Supervisors:
Julian Priest
Semester: archive
Tags: Satellite, Image processing, Edge, Constrained Computing, Networks, Machine Learning, Embedded, Software Defined Radio
ITU is a partner in the Danish Student Cubesat Program DISCO, which will launch a series of small satellites into orbit, starting with DISCO 1 in 2023 and followed by DISCO2 in 2024.
ITU is developing a high-resolution multi-camera imaging payload for Earth observation, primarily in the Arctic. We are developing an on-satellite machine learning capability using an ML coprocessor, as well as models that can …
Supervisors:
Julian Priest
Semester: archive
Tags: Satellite, Image processing, Edge, Constrained Computing, Networks, Machine Learning, Embedded, Radio
The idea behind “15-minute cities” is that within a short walk or bike ride people should have access to all necessary facilities that constitute the essence of urban living, such as parks, shops, cafes, schools, and hospitals. Initiatives to transform cities according to this paradigm are currently being implemented across the world, in an attempt to make urban spaces more liveable, …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: spatial data analysis, visualisation, Python, OSM data
As a response to increased traffic congestion and the need to reduce carbon emissions, cities consider ways to modernise, build and extend transit systems. Transit network design solutions can benefit from analysing the large amount of crowd-sourced location data available, which provides valuable insights into population mobility needs. Designing efficient metro lines, bicycle paths, or bus …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: spatial data analysis, network design, Python, OSM data
The idea behind “15-minute cities” is that within a short walk or bike ride people should have access to all necessary facilities that constitute the essence of urban living, such as parks, shops, cafes, schools, and hospitals. Initiatives to transform cities according to this paradigm are currently being implemented across the world, in an attempt to make urban spaces more liveable, …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: spatial data analysis, graph summaries, Python, OSM data
Musical genres are inherently ambiguous and difficult to define. Even more so is the task of establishing how genres relate to one another. Yet, genre is perhaps the most common and effective way of describing musical experience. The number of possible genre classifications (e.g. Spotify has over 4000 genre tags, LastFM over 500,000 tags) has made the idea of manually creating music taxonomies …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: scalable algorithms, hyperbolic embeddings, Python, Spotify data
The integration of wind power in the energy grid is dependent on accurate production forecasts. The power output curves between neighbouring wind farms are often correlated temporally and spatially, but currently, these spatiotemporal dependencies are under-utilised in prediction models. Graph neural networks allow for modelling these dependencies. In this project the student will implement a …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: spatial data analysis, graph neural networks, Python, timeseries data
Open-source JavaScript applications, such as browser-based web games, are typically developed by individual software engineers or small teams. These teams often have limited financial resources to use commercial logging frameworks and cloud-based analysis systems and may also lack knowledge and expertise in logging. However, log analysis is highly important for many reasons: monitoring application …
Supervisors:
Martin Hentschel
Semester: Fall 2024
Tags: open source, performance
The Deconstructed Cloud Databases project stems from a simple question: What are the minimum components required to build a data management system in the cloud? Our motivation for this project is based on the idea that reducing a system to its minimum set of components makes it easier to build, test, and maintain cloud data management systems. This approach requires less engineering effort, …
Supervisors:
Martin Hentschel
Semester: Fall 2024
Tags: data management, performance, benchmarking, hacking
The Deconstructed Cloud Databases project stems from a simple question: What are the minimum components required to build a data management system in the cloud? Our motivation for this project is based on the idea that reducing a system to its minimum set of components makes it easier to build, test, and maintain cloud data management systems. This approach requires less engineering effort, …
Supervisors:
Martin Hentschel
Semester: Fall 2024
Tags: data management, security, open source, open standards
The emergence of computational storage platforms like Delilah has transformed the data storage landscape, enabling new computing paradigms and facilitating data-intensive applications. Delilah is a cutting-edge computational storage platform developed by the IT University of Copenhagen. It runs on the Daisy OpenSSD and exposes an asynchronous computational storage protocol to the host, facilitated …
Supervisors:
Niclas Hedam, Philippe Bonnet
Semester: Fall 2023
Tags: Open Source, Testing, Computational Storage, Hardware, FPGA
Hash-based authentication is an effective way of protecting passwords in software systems. Hashing obscures the original passwords, such that they cannot be recovered in case of a database breach. However, as demonstrated by our paper titled Hash-Based Authentication Revisited in the Age of High-Performance Computers, the practical security depends on which hashing algorithm is used as well as the …
Supervisors:
Pınar Tözün, Niclas Hedam
Tags: benchmarking, hashing, security, GPU, hacking, HPC
With the recent hunger for being “data driven”, many organizations are eager to integrate ML in their decision-making process. Unfortunately, competent data scientists are still relatively scarce, and manual model development cannot keep up with the demand for magic AI solutions. This is no less true when it comes to forecasting. Knowing the future is extremely handy when making …
Supervisors:
Niels Ørbæk Chemnitz
Semester: Spring 2021
Tags: AutoML, ML, Forecasting, Energy Data, Smart Meters, Python, Data Science, Time Series Data
How much do our smart meter readings disclose about us? Can we disentangle the oven from the washing machine from the kettle? Can we identify demographics and behavior patterns from the stream of electricity data?
Most Danish homes are now equipped with so-called “smart meters” - networked electricity meters that report consumption and load at a much higher rate than conventional meters. …
Supervisors:
Niels Ørbæk Chemnitz
Semester: Spring 2021
Tags: NILM, ML, IoT, Energy Data, Smart Meters, Python, Data Science, Time Series Data
Outlier detection is carried out when the information is stored at the server. However, with the new IoT computational capabilities, outlier detection can be developed locally. Therefore, it is necessary to know how much RAM/Flash is needed for this step and which IoT brands can handle it. This project is divided into two parts. The first is implementing light-heavy ML algorithms in single points …
Supervisor: Paul Rosero
Semester: Spring 2022
Tags: data analysis, IoT, Python, Embedded systems
TinyML is a new trend to deploy deep learning in tiny devices. Therefore, it is necessary to deploy several applications to understand the challenges and opportunities which TinyML brings us. In this scenario, any idea involving embedded computer vision, voice recognition, or sensors is welcome.
Supervisor: Paul Rosero
Semester: Spring 2022
Tags: data analysis, IoT, Python, Embedded systems, Computer vision, Voice recognition
Disaggregated storage has gained acceptance in data centers. With disaggregated storage, storage resources are decoupled from compute resources, and made available through fabric. We are particularly interested in storage resources composed of an ARM-based smartNIC, which acts as fabric target as well as storage controller for a collection of SSDs.
The performance characteristics of the storage …
Supervisors:
Philippe Bonnet
Semester: Fall 2021
Tags: benchmarking, ARM, SoC, fabric, SSD, computational storage
Reproducibility is a cornerstone of the scientific method. There are systems available today to build reproducible and sharable data and analysis pipelines including workflow engines (e.g., GWL, Nextflow), package managers (e.g., bioconda), and container systems (e.g., Singularity). However, validating their executions on high-performance computers remains an open issue. Indeed, there are many …
Supervisors:
Philippe Bonnet
Semester: Fall 2020
Tags: ML, reproducibility, workflow, HPC
Reproducibility is a cornerstone of the scientific method. It is also a core element of compliance requirements for sensitive equipment, e.g., audit trails for medical equipment. Often, a prerequisite for computational reproducibility is the availability of software and data. However, this is problematic for edge devices whose goal is to reduce the amount of data transferred to the backend. On …
Supervisors:
Philippe Bonnet
Semester: Fall 2020
Tags: reproducibility, edge
SSDs are not a uniform class of devices. The SSD landscape is now quite diverse, with many new-generation, much faster and customizable devices being available. Understanding their performance characteristics is crucial when determining what their impact on the data systems software stack should be. In this project, we would like to characterize the performance of a broad range of such SSDs (e.g., ZNS, Samsung Z-SSD, …
Supervisors:
Pınar Tözün, Philippe Bonnet
Semester: Fall 2020
Tags: SSD, benchmarking
The Things Network Stack v3 for LoRaWAN is an open source LoRaWAN network stack suitable for large, global and geo-distributed public and private networks as well as smaller networks. The architecture follows the LoRaWAN Network Reference Model for standards compliancy and interoperability. - https://github.com/TheThingsNetwork/lorawan-stack This stack, currently in pre-rollout testing, however …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Offloading processing to storage is a means to avoid data movement and thus deal efficiently with very large volumes of stored data. In the 90s, there were pioneering efforts to develop Processing-in-Memory as well as Active Disks. We are considering data stored on Open-Channel SSDs with a programmable storage controller (i.e., a Linux-based ARM processor) integrated into a network switch (e.g., …
Supervisor: Philippe Bonnet
Semester: Fall 2019
Characterize the performance of commercial database systems on an NVIDIA Titan GPU, or characterize the performance of DB2 PureScale on a cluster equipped with shared storage, using a range of different benchmarks. Design and conduct experiments with a range of tuning strategies to measure their impact on performance and reliability.
Supervisor: Philippe Bonnet
Semester: Fall 2019
In the context of the Orkney Cloud project, we are preparing the deployment of a decentralized cloud infrastructure on the archipelago. The infrastructure is composed of a collection of Pods (point of delivery) and a wireless core (5G + Wifi). Each Pod is equipped with storage, computing and communication components (so that it is connected to the core and to local endpoints). Each Pod is powered …
Supervisor: Philippe Bonnet
Semester: Fall 2019
New types of networks such as LoRaWAN and Sigfox enable us to deploy inexpensive mobile Air Quality Sensors, e.g. on bicycles or vehicles, boats or trains, and help map and understand urban pollution. Such data would be valuable for e.g. the international citizen science project Luftdaten https://luftdaten.info/en/home-en/ in which we are participating. We have an ongoing collaboration with the …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Can we combine satellite data with terrestrial and marine sensor data, to study their correlation, and benefit our understanding of environmental processes, in both urban and agricultural/rural contexts? Potential collaboration with several organizations in Kenya and the Orkney Islands.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Not far from the IT University, citizens are building a “Small Smart City”, aiming to measure and analyze e.g. water and resource consumption. Help build the Small Smart City!
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Survey of potential sensor modalities (sounds, ultrasounds, vibrations) and related work (e.g., wind turbines). Starting with sound: piezo contact mics/transducers, MEMS sensors. Characterization of state based on known signatures (classification problem). Characterization of state transitions (HMM). Experimentation on coffee machine/blender/3D printer at PitLab. 1..k sensors; local/cloud-based …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Novo Nordisk
New forms of Solid State Drives have interesting characteristics in terms of performance (10 to 100x faster than previous generations of SSDs) and in terms of functionalities (SSDs can now suspend the execution of write or erase operations to minimize read latency). The performance characteristics of these devices are not well understood yet. The topic of this thesis is to design and conduct …
Supervisor: Philippe Bonnet
Semester: Fall 2019
Survey of potential sensor modalities (IR temp sensor, thermal imager) and related work. Starting with consumer USB cams or phone cams to generate series of images. Characterization of state based on known signatures (classification problem). Characterization of state transitions (HMM). Experimentation on coffee machine/blender/3D printer at PitLab. 1..k sensors; local/cloud-based processing. …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Novo Nordisk
Most data collection in IoT does not critically depend on latency or speed from data collection to data analytics. Occasionally though we meet tasks that would benefit from near-realtime features, such as collection of wave and tidal dynamics around marine energy infrastructures. This project explores the limits of speed by bringing together a LoRa PHY, a LoRaWAN gateway, LoRaWAN stack, ultrafast …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
In collaboration with the IoT Lab at Computer Science Dept at Kathmandu University, Nepal, we are developing a potential service for tracking trekkers, i.e. offering a security service for tourists trekking the Himalayas, in particular Mt. Everest. This service very critically depends on having a robust hardware component, the actual GPS/GNSS tracker. Requirements with respect to battery life, …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Wind turbine electricity production data is sensitive for Energinet (and for the wind turbine producers). Energinet would like to publish wind turbine electricity production data sets that can be used to train relevant models and to develop innovative applications, without giving away sensitive data. The goal of the project is to explore various data publishing methods for that purpose.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, data publication
Energinet has a model that describes the electricity production of a given wind turbine under given wind conditions. The current model, based on kNN, is trained with DMI weather data and historical electricity production data for the wind turbine. The goal of the project is to improve the current model with lifelong learning, extended weather data and different models for a range of different wind …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Data Analysis
The goal of the project is to explore the accuracy of electricity production predictions based on historical data and weather predictions. This may be tackled as a sequence prediction problem using recurrent neural networks. The long-term goal is to incorporate wind turbines in the reserve market for electricity.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Forecasting, Machine Learning, Deep Learning
The goal of the project is to explore new ways of gathering data about wind turbines as well as local wind/weather conditions. To this end, sound/vibration-based and/or image-based instrumentation as well as innovative experiments such as balloons and lightweight weather stations might be considered.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Instrumentation, Sensors
Field Programmable Gate Arrays are now an integral part of public cloud infrastructures. You can for example run customized FPGA instances on AWS. A project focuses on FPGA-based hardware acceleration at the level of an SSD Flash Translation Layer, at the level of a Database storage manager or at the level of the database client. You will be able to experiment with FPGAs in the lab and on AWS. We …
Supervisor: Philippe Bonnet
Semester: Fall 2020
Tags: FPGA, SSD, computational storage
Field Programmable Gate Arrays are now an integral part of public cloud infrastructures. You can for example run customized FPGA instances on AWS. A project focuses on FPGA-based hardware acceleration at the level of an SSD Flash Translation Layer, at the level of a Database storage manager or at the level of the database client. You will be able to experiment with FPGAs in the lab and on AWS. We …
Supervisor: Philippe Bonnet
Semester: Fall 2019
Tags: FPGA, SSD
GPUs offer massive computational power and parallelism through their Streaming Multiprocessors (SMs). Efficient GPU utilization is critical for maximizing performance and optimizing compute resource usage, and it is measured using various metrics such as SMACT (SM Activity), SMOCC (SM Occupancy), and DRAMA (DRAM Active). These metrics provide insight into how effectively the GPU’s SMs and …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Spring 2024
Tags: machine learning systems, GPU Utilization, resource management, resource interference
The work on running data-intensive applications on very powerful, expensive, and power-hungry server hardware is very popular thanks to the growing size of data centers and high-performance computing (HPC) platforms. However, with the rise of new generation internet of things (IoT) applications, the lower-power and lower-budget hardware devices that specifically target IoT, the edge platforms, …
Supervisors:
Pınar Tözün
Semester: Fall 2024
Tags: edge, benchmarking, data-intensive applications, resource-constrained hardware
Observing how well machine learning systems utilize hardware resources is a crucial preliminary step to improve system performance and reduce hardware waste. To make such observations, one has to collect a lot of monitoring data on hardware behavior through experiments. In our group, we have recently built a framework to aid the management of such monitoring data efficiently, called Resource-Aware …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: benchmarking, data management, data visualization
Deep learning has changed the landscape of many applications like computer vision, natural language processing, etc. On the other hand, deep learning requires gigantic computing power offered by modern hardware. As a result, data scientists rely on powerful hardware resources offered by shared high-performance computing (HPC) clusters or the cloud. Due to the long running times of deep learning …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2024
Tags: machine learning systems, checkpointing, scheduling, resource management
Workload collocation has been shown to be an effective method to reduce the hardware requirements for certain deep learning (DL) training tasks. On the other hand, there haven’t been many robust open-source implementations of schedulers that incorporate workload collocation on GPUs for DL.
BLOX is a framework that aims at standardizing the way we implement deep learning schedulers. In this …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2024
Tags: machine learning systems, scheduling, resource management, workload collocation
Deep convolutional networks are able to learn representations of images, scoring well in tasks such as image classification and object detection. During model training, these networks have the ability to process different input sizes without requiring changes to their architecture. In this project, we would like to investigate the effects that changing input sizes has on these kinds of models. We …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: data attribution, deep learning, machine learning, resource efficiency
Today’s foundation models are trained on vast amounts of data. The quality and size of this data have a huge impact on the accuracy of these models. Selecting the right amount and variety of data for a given task, however, is a resource-intensive process. In this project, we would like to investigate various state-of-the-art data selection mechanisms from a hardware requirements and …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: data selection, deep learning, machine learning, resource efficiency
Traditionally, solid-state drives (SSDs) do not give users the ability to control the data placement on the SSD. This often leads to suboptimal performance and lowers SSD lifetime, since SSDs internally don’t allow in-place updates. The updated disk pages are written elsewhere and the old versions have to be garbage collected. This poses problems if data with different lifetimes and …
Supervisors:
Pınar Tözün
Semester: Fall 2024
Tags: SSDs, data management systems, modern storage
In this project, we would specifically like to quantify the data movement savings of applying techniques like compression and model-based data filtering in the context of resource-constrained hardware and edge/IoT applications.
Today, many data sources are small, low-powered, and hardware-constrained devices such as mobile phones, wearable or self-driving smart platforms, etc. Processing the data on …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, ML model updates, tinyML
One of the key challenges with enabling efficient machine learning on resource-constrained devices is keeping the machine learning models deployed on these devices up-to-date without frequent retraining. This requires exploring the impact of different model update mechanisms at the edge.
This project would be suitable as a standalone project or BSc or MSc thesis at ITU during Fall 2024. If you are …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, ML model updates, tinyML
Enabling efficient data processing and machine learning on resource-constrained devices poses many challenges. One is fitting the models into the restrictive memory and compute resources of these devices. In this project, first, we would like to explore the landscape of foundational, generative-AI, language, etc. models with respect to their size and compute needs to understand what could be a fit …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, ML model updates, tinyML
Today, many data sources are small, low-powered, and hardware-constrained devices such as mobile phones, wearable or self-driving smart platforms, etc. Edge computing is a broad term that refers to computations performed on such edge devices. It becomes increasingly important to enable techniques that get more value out of data at the edge rather than always sending the data to a remote and more …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, resource management, tinyML
Observing how well data-intensive systems utilize hardware resources is a crucial preliminary step to improve system performance and reduce hardware waste. To make such observations, one has to collect a lot of monitoring data on hardware behavior through experiments. In our group, we have recently built a framework to aid the management of such monitoring data efficiently, called Resource-Aware …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2023
Tags: benchmarking, data management, data visualization
It is common to process data to clean it, filter it, restructure it, get metadata out of it, etc. before feeding the data into a data analysis or machine learning pipeline. There are many tools and libraries out there to aid with this process, with different strengths and functionality (DALI, RAPIDS, HoloClean, DAPHNE, DuckDB, etc.). In this project, we would like to analyze pros/cons of some of …
Supervisors:
Pınar Tözün
Semester: Fall 2022
Tags: data preprocessing libraries, heterogeneous hardware, machine learning
In the past decade, the data management community has focused on main-memory systems or main-memory-optimized systems. This focus has put the commodity memory hierarchy (DRAM and processor caches) at the center when it comes to workload characterization studies. Today, with the evolution of persistent storage technologies such as NVRAM (persistent memory solution of Intel) and NVMe SSDs, data systems …
Supervisors:
Pınar Tözün
Semester: Fall 2022
Tags: workload characterization, tracing, modern storage, data-intensive systems
DAPHNE is an EU project that aims at building a data system targeting integrated data analysis pipelines across data management and processing, high-performance computing (HPC), and machine learning (ML) training and scoring. The project had its first code release back in March. This project aims at adding a profiling infrastructure for the DAPHNE codebase. If you are interested in learning about …
Supervisors:
Pınar Tözün
Semester: Fall 2022
Tags: integrated data analysis pipelines, profiling big data systems
State-of-the-art machine learning models are known to be compute- and power-hungry. On the other hand, modern servers come equipped with really powerful CPU-GPU co-processors. Not all machine learning models are able to use all the available hardware resources on such servers.
Workload collocation is a mechanism to increase hardware utilization when a single workload is not able to utilize all the …
Supervisors:
Pınar Tözün
Semester: Fall 2022
Tags: benchmarking, workload collocation, machine learning
Today, there are many compute- and memory-hungry data-intensive workloads from big data analytics applications to deep learning. These workloads increasingly run on shared hardware resources, which requires building hardware resource managers that can both serve the needs of workloads and utilize hardware well. Predicting the resource utilization of applications can aid such resource managers …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2022
Tags: benchmarking, hardware resource consumption estimation, machine learning
NVMe SSDs are not a uniform class of devices. The IO software stack is not uniform either. Understanding the performance characteristics of new-generation SSDs and the impact of the IO stack on their performance is crucial when determining how to design data-intensive systems. In this project, we would like to characterize the performance of a range of NVMe SSDs (e.g., Samsung Z-SSD, Intel Optane, …
Supervisors:
Pınar Tözün
Semester: Fall 2021
Tags: SSD, benchmarking
A data science infrastructure orchestrates the execution of widely used machine learning frameworks (e.g., TensorFlow, PyTorch) on a heterogeneous set of processing units (e.g., CPU, GPU, TPU, FPGA) while powering an increasingly diverse and complex range of applications (e.g., fraud detection, healthcare, virtual assistance, automatic driving). Understanding the resource consumption …
Supervisor: Pınar Tözün
Semester: Fall 2021
Tags: benchmarking, hardware resource consumption, deep learning frameworks
Hash-based authentication is an effective way of protecting passwords in software systems. Hashing obscures the original passwords, such that they cannot be recovered in case of a database breach. However, as demonstrated by our paper titled Hash-Based Authentication Revisited in the Age of High-Performance Computers, the practical security depends on which hashing algorithm is used as well as the …
Supervisors:
Pınar Tözün, Niclas Hedam
Tags: benchmarking, hashing, security, GPU, hacking, HPC
The variety and complexity of data-intensive applications and systems have been increasing drastically over the past decade. Tasks from a SQL-based big data analytics request running on Apache Spark can be very different from tasks from deep learning training using the TensorFlow framework. Nevertheless, these data-intensive applications increasingly run on shared hardware resources in data centers or …
Supervisor: Pınar Tözün
Semester: Fall 2020
Tags: hardware-software co-design
Improvements in modern processor hardware do not automatically enable more complex and higher quality games, animations, and virtual reality applications. A paradigm shift is necessary when it comes to how we develop these applications in order to exploit the resources of modern hardware (i.e., main memory, multicores) effectively. Unity Technologies has recently developed Data-Oriented …
Supervisor: Pınar Tözün
Semester: Fall 2020
Tags: memory hierarchy, concurrency
SSDs are not a uniform class of devices. The SSD landscape is quite diverse now, with many new-generation, much faster and customizable devices available. Understanding their performance characteristics is crucial when determining what their impact on the data systems software stack should be. In this project, we would like to characterize the performance of a broad range of such SSDs (e.g., ZNS, Samsung Z-SSD, …
Supervisors:
Pınar Tözün, Philippe Bonnet
Semester: Fall 2020
Tags: SSD, benchmarking
Spreading the computation of similar concurrent tasks that have a large instruction footprint over multiple cores via thread migration is shown to improve the instruction cache utilization drastically since it allows instruction re-use across the concurrent tasks. However, thread migrations are costly due to the context switching overhead. To reduce this overhead, recent work mainly proposed …
Supervisor: Pınar Tözün
Semester: Fall 2019
The computer architecture community is moving toward commoditization of hardware specialization instead of general purpose CPUs and more agile hardware development instead of years-long production cycles to enable faster, more energy-efficient, and more cost-effective hardware/software co-designs. This will lead to a disruption in the way we design and maintain the emerging data management systems …
Supervisor: Pınar Tözün
Semester: Fall 2019
Apache SystemML is an open-source platform to run machine learning tasks efficiently thanks to the hardware-conscious query compilation techniques it adopts. It can be run standalone or on top of Apache Spark. It is considered to be state-of-the-art when running machine learning tasks (e.g., in ACM SIGMOD 2017, there were ~5 papers that used SystemML as a comparison point). This project aims at …
Supervisor: Pınar Tözün
Semester: Fall 2019
The popularity of large-scale real-time analytics applications (real-time inventory/pricing, recommendations from mobile apps, fraud detection, risk analysis, IoT, etc.) keeps rising. These applications require distributed data management systems that can handle fast concurrent transactions (OLTP) and analytics on the recent data. Some of them even need to run analytical queries (OLAP) as part of …
Supervisor: Pınar Tözün
Semester: Fall 2019
The Transaction Processing Performance Council (TPC) is a non-profit IT organization founded to define database benchmarks and disseminate objective, verifiable performance data to the industry. TPC has standardized several new benchmarks (e.g., TPCx-HS and TPCx-BB) in recent years. Older popular benchmarks, like TPC-C (representing high-performance transaction processing) and TPC-H (representing …
Supervisor: Pınar Tözün
Semester: Fall 2019
The DISCO-2 satellite will have accelerated machine learning capability based on the inclusion of a Coral TPU ML accelerator module. This will allow images taken by the satellite to be analysed on the satellite using a variety of ML models, with only select images sent back to Earth. This approach allows for more flexibility in image acquisition and saves downlink bandwidth, which is very constrained …
Supervisors:
Julian Priest, Robert Bayer
Semester: Fall 2023
Tags: satellite, ground station, software defined radio, automation, csp
This is not a single project, but rather a larger cluster of potential projects in the field of what could be summarized as extreme networking.
The networks we are interested in are typically wireless, and can be extreme in different senses of the word:
distance - hundreds of kilometers terrestrial, 10,000s of km to satellite
latency - sub-ms latencies
autonomy - off-grid
quality - extreme remote …
Supervisors:
Sebastian Büttrich
Semester: Fall 2024
Tags: network, IoT, LoRa, LoRaWAN, satellites
LoRa is a long-range, low-bandwidth networking protocol widely used in Internet of Things projects, sensor networks, low power, low cost and embedded systems. LoRa’s encoding schema allows for extremely long distance communications with small power usage and small simple antennas. This combination of features has made it attractive to small satellite operators flying cubesats and LoRa is now …
Supervisors:
Sebastian Büttrich
Semester: Fall 2024
Tags: satellites, LoRa, cubesat, IoT, embedded, electronics
LoRa is a long-range, low-bandwidth networking protocol widely used in Internet of Things projects, sensor networks, low power, low cost and embedded systems. LoRa’s encoding schema allows for extremely long distance communications with small power usage and small simple antennas. This combination of features has made it attractive to small satellite operators flying cubesats and LoRa is now …
Supervisors:
Sebastian Büttrich
Semester: Fall 2024
Tags: IoT, LoRa, LoRaWAN, satellites
Optical fiber is the backbone of the internet’s communication, e.g. in the form of submarine fiber cables. It can also be employed as a sensor device, by means of combined opto-acoustic methods such as Distributed Acoustic Sensing (DAS) or State of Polarisation (SoP) sensing. Fiber is capable of sensing all kinds of vibrational/acoustic events, from animal sounds over seismic activity to …
Supervisors:
Sebastian Büttrich
Semester: Fall 2024
Tags: fiber, acoustics, audio, machine learning, DAS, SOP
The Danish Student Cubesat Program is an inter-university collaboration that will launch 3 cubesats into Low Earth Orbit over the next 4 years. The satellites will be designed, operated, programmed and built by students, and the project offers an opportunity for Master’s students to take part in a live satellite project. ITU is partnering with Aarhus University on DISCOSAT2 which will be an …
Supervisors:
Sebastian Büttrich, Julian Priest
Semester: Fall 2021
Tags: Satellite, Cubesat, Image processing, Machine Learning, edge, constrained computing
Invasive bird species can be a serious problem in cities, towns and in agriculture. The common pigeon is a very unwelcome guest on many balconies, roofs and terraces. Conventional scarecrows often show no effect, as these birds are known to be quite intelligent, and capable of learning fast. The idea is to build a sensor/camera-enhanced scarecrow that - can recognize birds present within its …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: IoT, ML, machineLearning, sensors, security
For this project, you would be working with a partner company who are looking to re-establish wood as a building material for sustainable architecture, and thus are using sensors for quality control - to detect damage and deterioration in buildings. Wood such as timber may be analyzed by non-intrusive acoustic impact testing and subsequent waveform analysis, and the expectation is that machine …
Supervisors:
Sebastian Büttrich
Semester: Fall 2021
Tags: IoT, sensors, machine learning, acoustics
In LoRaWAN networks such as The Things Network, long distance transmissions, well beyond the limitations of line of sight in terrestrial geometry, are frequently observed. Tropospheric effects are seen as responsible for bending or guiding radio waves around the Earth’s curvature. As an example, under the right weather conditions, the LoRaWAN gateway at ITU may collect packets from northern Germany, …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: IoT, LoRaWAN, LPWAN, satellite, networks, troposphere, weather
Recent progress in LoRaWAN development has made a new generation of satellite communications offerings available to IoT devices. In these, the LoRaWAN gateway is satellite-borne and collects data from small inexpensive ground stations. So far, this is predominantly seen as a means of communication for remote sensor data, e.g. in agriculture, logistics or wildlife monitoring. However, one can …
Supervisors:
Sebastian Büttrich
Semester: Fall 2021
Tags: IoT, LoRaWAN, LPWAN, satellite, networks, edge, security
There is currently a lot of progress in really small, yet powerful visual machine learning / computer vision, on hardware like the OpenMV Cam H7, Arduino Portenta Vision Shield, Luxonis LUX-ESP32, Himax WE-I Plus, Arducam Pico4ML, and Raspberry Pi, and on software platforms such as TinyML or OpenMV IDE.
While many popular use cases stem from fields like traffic analysis, wildlife monitoring, we …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: IoT, sensors, machine learning, computer vision
The Things Network Stack v3 for LoRaWAN is an open source LoRaWAN network stack suitable for large, global and geo-distributed public and private networks as well as smaller networks. The architecture follows the LoRaWAN Network Reference Model for standards compliance and interoperability. - https://github.com/TheThingsNetwork/lorawan-stack This stack, currently in pre-rollout testing, however …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
New types of networks such as LoRaWAN and Sigfox enable us to deploy inexpensive mobile Air Quality Sensors, e.g. on bicycles or vehicles, boats or trains, and help map and understand urban pollution. Such data would be valuable for e.g. the international citizen science project Luftdaten https://luftdaten.info/en/home-en/ in which we are participating. We have an ongoing collaboration with the …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Can we combine satellite data with terrestrial and marine sensor data, to study their correlation, and benefit our understanding of environmental processes, both in urban and agricultural/rural contexts? Potential collaboration with several organizations in Kenya and the Orkney Islands.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Not far from the IT University, citizens are building a “Small Smart City”, aiming to measure and analyze e.g. water and resource consumption. Help build the Small Smart City!
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Survey of potential sensor modalities (sounds, ultrasounds, vibrations) and related work (e.g., wind turbines); starting with sound: piezo contact mics/transducers, MEMS sensors; characterization of state based on known signatures (classification problem); characterization of state transitions (HMM); experimentation on coffee machine/blender/3D printer at PitLab; 1..k sensors; local/cloud-based …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Novo Nordisk
Deliberately scoped very wide, this group contains a number of projects in different possible directions, from
Location services via LPWAN time-of-flight and GPS/GNSS,
Vessel tracking and management in fisheries, tourism and logistics,
Water quality and chemistry sensing for Aquaculture, specifically Mariculture,
Wave and tidal dynamics, e.g. in energy harvesting
and variations/combinations of …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: Satellite, Image processing, Machine Learning, edge, constrained computing, IoT, sensors, location
swarm.space is a commercial company providing low-bandwidth satellite connectivity using ultra-small (quarter cubesat unit) satellites in a low orbit. Swarm satellites cover every point on Earth, enabling IoT devices to affordably operate in any location. Swarm uses a form of LoRa network.
While ground terminals for satellite networks traditionally were both big and expensive, modems and antennas …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: Satellite, IoT
LoRa is a relatively new long-range low-power standard allowing battery-powered, pocket-sized nodes to transmit over 100s to 1000s of kilometers. ThingSat is a CubeSat communication payload consisting of an electronic board with several LoRa transmitters and a patch antenna operating in (868 MHz, 2.4 GHz). It is a guest payload of a shared 3U CubeSat.
Available projects under this platform include …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: Satellite, IoT
Observing how well machine learning systems utilize hardware resources is a crucial preliminary step to improve system performance and reduce hardware waste. To do such observations, one has to collect a lot of monitoring data on hardware behavior through experiments. In our group, we have recently built a framework to aid the management of such monitoring data efficiently, called Resource-Aware …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: benchmarking, data management, data visualization
Deep convolutional networks are able to learn representations of images, scoring well in tasks such as image classification and object detection. During model training, these networks have the ability to process different input sizes without requiring changes to their architecture. In this project, we would like to investigate the effects that changing input sizes has on these kinds of models. We …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: data attribution, deep learning, machine learning, resource efficiency
Today’s foundation models are trained on vast amounts of data. The quality and size of this data have a huge impact on the accuracy of these models. Selecting the right amount and variety of data for a given task, however, is a resource-intensive process. In this project, we would like to investigate various state-of-the-art data selection mechanisms from a hardware requirements and …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: data selection, deep learning, machine learning, resource efficiency
Observing how well data-intensive systems utilize hardware resources is a crucial preliminary step to improve system performance and reduce hardware waste. To make such observations, one has to collect a lot of monitoring data on hardware behavior through experiments. In our group, we have recently built a framework to aid the efficient management of such monitoring data, called Resource-Aware …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2023
Tags: benchmarking, data management, data visualization
Spectral learning priority is a useful tool for analyzing a model’s focus during training; it describes how a model may understand a given image from a spectral perspective. For example, to distinguish cats from tortoises, learning to recognize their shapes would be enough; such an embedding results in a higher learning priority at the low frequencies that represent shapes, while learning to …
Supervisors:
Yucheng Lu, Veronika Cheplygina
Semester: Fall 2024
Tags: Spectral analysis, Image classification, Medical imaging
It has been observed that deep learning models are able to identify patient characteristics such as age, sex, and self-reported race with high accuracy from medical images such as chest x-ray recordings, even when medical doctors cannot. This raises the potential for such models to learn to (falsely) diagnose patients of different demographics differently, even if they present with the same …
Supervisors:
Amelia Jiménez-Sánchez, Eike Petersen, Veronika Cheplygina
Semester: Fall 2024
Tags: machine learning, data science, medical imaging
There is pressure on hospitals to implement AI systems which promise to improve diagnoses and save time for the doctors. One use-case could be related to the automation of protocoling based on a physician referral. Currently, this requires a referral letter from a physician who has examined a patient and evaluates that there is a need for additional imaging studies. In this case, the physician …
Supervisors:
Veronika Cheplygina
Semester: Fall 2023
Tags: machine learning, medical imaging, data analysis
Machine learning models, especially the larger models used on, for example, image or text datasets, can be expensive to train. During development, models are usually trained multiple times, for example to optimize hyperparameters, which can result in a large carbon footprint.
This project focuses specifically on medical data. There are some recent efforts, for example by Selvan et …
Supervisors:
Veronika Cheplygina
Semester: Fall 2023
Tags: machine learning, medical imaging, data analysis, resource consumption
Machine learning is used extensively in different applications, including medical imaging and natural language processing. As different types of data are involved, it is reasonable to assume that different methods are needed for each application. However, there are also opportunities in translating a method that is successful in one application to another application where it is not widely used.
The …
Supervisors:
Veronika Cheplygina
Semester: Fall 2021
Tags: machine learning, natural language processing, medical imaging, literature review
Machine learning algorithms for skin lesion classification typically learn from images which have been labeled as malignant (for example, melanoma) or not. Such tasks can still suffer from overfitting due to limited dataset size. In other computer vision tasks, crowdsourcing labels has been effective, but the average person typically does not have the background to classify skin lesions. However, …
Supervisors:
Veronika Cheplygina
Semester: Fall 2021
Tags: machine learning, medical imaging, crowdsourcing, similarity
The DISCO-2 project is driven by students and aims to develop and deploy a 3-unit CubeSat into low Earth orbit. Its mission focuses on conducting Earth observations over Greenland and supporting various research objectives. The satellite has three cameras onboard: infrared, wide-angle, and standard (main camera). Due to the limitations of the imaging hardware and the challenging conditions on the …
Supervisors:
Yucheng Lu, Julian Priest
Semester: Fall 2024
Tags: Image enhancement, Image processing, Machine learning
Are you interested in working with a big data open source project?
You are welcome to conduct your thesis/project in Apache Wayang. Apache Wayang is the first cross-platform framework that allows users to specify their task/query in a system-agnostic manner and Wayang will determine which is the best system(s) to execute this task with the goal of optimizing performance. For a general overview …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: big data, database, cross-platform data processing, open source, Apache
Knowledge graphs (KGs) are extensively used in many application domains, such as search engines, product recommendation, and bioinformatics. Knowledge graph completion (a.k.a. link prediction), i.e., the task of inferring missing information from knowledge graphs, is a widely used task in the above applications. This project will investigate how to loosely couple the data-driven power of knowledge …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: knowledge graph, LLMs, reasoning
Are you interested in working with a big data open source project and helping the environment?
You are welcome to conduct your thesis/project in Apache Wayang. Apache Wayang is the first cross-platform framework that allows users to specify their task/query in a system-agnostic manner and Wayang will determine which is the best system(s) to execute this task with the goal of optimizing performance. …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: big data, database, cross-platform data processing, open source, Apache
Query optimization is crucial for any data management system to achieve good performance. Recent advancements in AI have led academia and industry to investigate learning-based techniques in query optimization. In particular, many works propose replacing the cost model used during plan enumeration with a machine learning model that estimates the runtime of a plan. However, to build such a model …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: machine learning, training data, query optimizer
Query optimization is crucial for any data management system to achieve good performance. Recent advancements in AI have led academia and industry to investigate learning-based techniques in query optimization. In particular, many works propose replacing the cost model used during plan enumeration with a machine learning model (typically a regression model) that estimates the runtime of a query …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: machine learning, database, query optimization, ranking
(This project will be carried out in collaboration with Xilinx Research Labs in Dublin)
Machine Learning operators are increasingly used in data management systems and, in this project, we will explore the challenges and benefits of integrating inference operators from FINN [1] within a so-called Smart Storage system [2]. Both the inference and data management aspects will be …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: FPGA, Data Management, Machine Learning
(This topic is going to be co-supervised by Bernardo Machado David [http://www.bmdavid.com/])
Database systems managing private data may leak sensitive information when queries are done in the clear, even if the data itself is encrypted. A recent line of research has looked into combining database engines supporting standard SQL queries with techniques for secure Multiparty Computation (MPC), …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: Theoretical Computer Science, Data Management, Security and Privacy
Consensus mechanisms for ensuring consistency are some of the most expensive operations in managing large amounts of data. Often, there is a trade-off that involves reducing the coordination overhead at the price of accepting possible data loss or inconsistencies. As the demand for more efficient data centers increases, it is important to provide better ways of ensuring consistency without …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: Benchmarking, Distributed Systems
Given a private database that I can access only through specific queries, there is still a lot I can learn about its entries [1]. Differential Privacy (DP) tackles this: letting me learn the (approximate) result of complex queries on a database, but preventing me from learning much about its specific entries. The basic approach of DP often boils down to: “apply a privacy-preserving transformation T …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: Theoretical Computer Science, Data Management, Security and Privacy
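To illustrate what a privacy-preserving transformation can look like, here is a minimal sketch of the standard Laplace mechanism applied to a counting query. It is one well-known instance of DP, not necessarily the approach this project has in mind; the names `laplace_noise` and `private_count` and the example data are hypothetical.

```python
import random


def laplace_noise(scale, rng=random):
    # The difference of two independent exponential variables with mean
    # `scale` follows a Laplace distribution centred at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(rows, predicate, epsilon, rng=random):
    """Return a noisy count of the rows matching `predicate`.

    A counting query changes by at most 1 when a single row is added or
    removed (sensitivity 1), so Laplace noise with scale 1/epsilon gives
    epsilon-differential privacy for this one query.
    """
    true_count = sum(1 for row in rows if predicate(row))
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    ages = [23, 35, 41, 29, 62, 57, 33]  # hypothetical private column
    print(private_count(ages, lambda age: age >= 40, epsilon=0.5))
```

A smaller `epsilon` means more noise and stronger privacy; repeated queries consume additional privacy budget, which this sketch does not track.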
Blockchains are often used synonymously with crypto-currencies and unspent transaction output (UTXO) data models, but there are emerging blockchain platforms that offer a more general data model and smart contracts that can manipulate this data freely (e.g. Hyperledger Fabric [1]). As such, these platforms resemble in many ways distributed databases, storing a collection of records, organized as …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: Blockchain, Data Management, Benchmarking
Modern data analytics systems are composed of two types of nodes: compute and storage (e.g., Amazon S3, Redis, MongoDB, etc.). The storage nodes typically offer a key-value interface and are often used to store data encoded in a columnar format (e.g., Parquet files). Due to growing data sizes in datacenters, there is an increasing interest in using specialized hardware devices, namely Field …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: FPGA, Hardware-software Co-design, Security and Privacy
We are actively developing a new prototype for analysing large multimedia collections in virtual reality, based on the ObjectCube data model. There are many ways in which students can contribute to the project, including work on the user interface and the back-end, and later on running large-scale user experiments.
Supervisors:
Aaron Duane, Björn Þór Jónsson
Semester: Fall 2021
Tags: virtual reality, multimedia analytics
A medical Visual Question Answering (VQA) system can provide meaningful references for both doctors and patients during the treatment process. Compared with normal images, a learning setting with medical images is more challenging due to limited amounts of data, class imbalance and the presence of label noise for diagnosis tasks. Moreover, little attention is paid to how the images and meta-data are …
Supervisors:
Amelia Jiménez-Sánchez
Semester: Fall 2023
Tags: medical imaging, deep learning, machine learning, transfer learning, meta-learning
In relevance feedback, the choice of images to present to the user is a difficult problem, as a naïve approach may present too many similar images. The challenge addressed in this project is to ensure diversity (aka “one of each”) as well as relevance. A particularly interesting project for students interested in efficient algorithms.
Supervisors:
Björn Þór Jónsson
Semester: Fall 2021
Tags: multimedia analytics, scalability, diversity
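The relevance/diversity trade-off described in this project can be made concrete with a small sketch. The code below uses a greedy selection in the style of maximal marginal relevance (MMR), a generic technique chosen here for illustration and not the project's own method. The function names, the `lam` weight and the toy feature vectors are all assumptions.

```python
import math


def cosine(a, b):
    # Cosine similarity between two equal-length feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_diverse(items, k, lam=0.5):
    """Greedily pick k items, trading relevance against redundancy.

    items: list of (item_id, relevance, feature_vector) tuples.
    lam:   1.0 ranks purely by relevance; lower values penalise items
           that resemble something already selected.
    """
    selected = []
    remaining = list(items)
    while remaining and len(selected) < k:
        def score(item):
            _, relevance, vec = item
            # Similarity to the closest already-selected item (0 if none yet).
            redundancy = max((cosine(vec, s[2]) for s in selected), default=0.0)
            return lam * relevance - (1.0 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [item[0] for item in selected]


if __name__ == "__main__":
    images = [
        ("a", 0.90, [1.0, 0.0]),
        ("b", 0.85, [1.0, 0.01]),  # near-duplicate of "a"
        ("c", 0.60, [0.0, 1.0]),
    ]
    print(select_diverse(images, k=2, lam=0.5))
```

Each step rescans all remaining items against all selected ones, so this naive version costs roughly O(n·k²) similarity computations; doing better at scale is the kind of efficiency question the project raises.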
In interactive learning systems, such as Exquisitor, the system presents potentially relevant images to users who label them as either relevant or irrelevant. Currently, Exquisitor uses a cluster-based index, which allows it to return results from a collection of 100 million images in 0.3 seconds. The goal of this project is to study the application of hash-based indexing to interactive learning …
Supervisors:
Björn Þór Jónsson
Semester: Fall 2021
Tags: multimedia analytics, diversity
The goal of this project is to enhance PhotoCube as a competitor for the Video Browser Showdown, an international video retrieval competition where competing systems are judged based on speed, accuracy and recall. We propose to develop new versions of the C++-based media server and JS-based media browser, to expand the data model to videos and improve the performance sufficiently to take part in …
Supervisors:
Björn Þór Jónsson
Semester: Fall 2021
Tags: video search, multimedia analytics, photocube
The index structure used for Exquisitor is eCP, a very scalable index for high-dimensional retrieval. However, while multimedia collections are typically updated constantly, the index is unfortunately static. The goal of this project is to implement and compare approaches for index maintenance.
Supervisor: Björn Þór Jónsson
Semester: Fall 2020
Tags: index maintenance, high-dimensional indexing
In this project, we propose to implement a media server and media browser encapsulating a new data model for analysing media collections, called Multimedia Analytics Data Services (MADS). To validate the design, some scalability experiments should be performed.
Supervisor: Björn Þór Jónsson
Semester: Fall 2020
Tags: multimedia analytics, scalability
Students at ITU have made a prototype version of the Exquisitor system for the Android mobile phone! The system is missing some of Exquisitor’s advanced functionality, such as search and indexing, and the goal is to add and evaluate this functionality.
Supervisor: Björn Þór Jónsson
Semester: Fall 2020
Tags: multimedia analytics, android
We propose to develop a new prototype for analysing large multimedia collections in Virtual Reality, using the new Valve Indexes. There are many ways in which students can contribute to the project, including work on the user interface and the back-end, and later on running large-scale user experiments.
Supervisor: Björn Þór Jónsson
Semester: Fall 2020
Tags: virtual reality, multimedia analytics
The goal of this project is to integrate Exquisitor with other pieces of existing technology and turn it into a competitor for a live video retrieval competition. The project is suitable for 3-4 well-qualified MSc students.
The Video Browser Showdown (VBS) is a live competition for video search and retrieval, held at the International Conference on Multimedia Modeling (MMM). In VBS, the competition …
Supervisor: Björn Þór Jónsson
Semester: Fall 2019
The goal of this project is to ensure diversity in the relevance feedback results, to improve the quality of the user experience.
The project is suitable for 1-3 well-qualified MSc students.
In many creative tasks, the designer will know that some stock image is good for a design just by stumbling upon the image. This “Aha!” moment requires browsing thousands of images by categories. In other words, it requires …
Supervisor: Björn Þór Jónsson
Semester: Fall 2019
The goal of this project is to use the state of the art in eye tracking to design, implement and evaluate different eye-tracking interfaces for Exquisitor.
The project is suitable for 1-3 well-qualified MSc students.
Image and media collections are becoming a central information resource for a growing number of domains. This calls for very effective tools for interactive exploration of the contents of …
Supervisor: Björn Þór Jónsson
Semester: Fall 2019
The goal of this project is to build a prototype of the Exquisitor system for mobile devices.
The project is suitable for 1-3 well-qualified MSc students.
Image and media collections are becoming a central information resource for a growing number of domains. This calls for very effective tools for interactive exploration of the contents of those collections [1]. Based on past research results [2], we …
Supervisor: Björn Þór Jónsson
Semester: Fall 2019
Deep neural networks have been revolutionary in computer vision, and publicly available image datasets played an important role in this success. Due to their size, neural networks require vast amounts of data for training. Yet when it comes to medical settings, dataset sizes are very limited due to the cost of data annotation, privacy concerns, differences in imaging techniques, and others. In such …
Supervisors:
Dovile Juodelyte
Semester: Fall 2023
Tags: transfer learning, deep learning, medical imaging
GPUs offer massive computational power and parallelism through their Streaming Multiprocessors (SMs). Efficient GPU utilization is critical for maximizing performance and optimizing compute resource usage; it is measured using metrics such as SMACT (SM Activity), SMOCC (SM Occupancy), and DRAMA (DRAM Active). These metrics provide insight into how effectively the GPU’s SMs and …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Spring 2024
Tags: machine learning systems, GPU Utilization, resource management, resource interference
Deep learning changed the landscape of many applications like computer vision, natural language processing, etc. On the other hand, deep learning requires gigantic computing power offered by modern hardware. As a result, data scientists rely on powerful hardware resources offered by shared high-performance computing (HPC) clusters or the cloud. Due to the long-running times of deep learning …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2024
Tags: machine learning systems, checkpointing, scheduling, resource management
Workload collocation has been shown to be an effective method to reduce the hardware requirements for certain deep learning (DL) training tasks. On the other hand, there haven’t been many robust open-source implementations of schedulers that incorporate workload collocation on GPUs for DL.
BLOX is a framework that aims at standardizing the way we implement deep learning schedulers. In this …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2024
Tags: machine learning systems, scheduling, resource management, workload collocation
Today, there are many compute- and memory-hungry data-intensive workloads from big data analytics applications to deep learning. These workloads increasingly run on shared hardware resources, which requires building hardware resource managers that can both serve the needs of workloads and utilize hardware well. Predicting the resource utilization of applications can aid such resource managers …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2022
Tags: benchmarking, hardware resource consumption estimation, machine learning
Geospatial data refers to information that is tied to specific geographic locations on the Earth’s surface. It includes both the location coordinates (such as latitude, longitude, and, potentially, altitude) and attribute data associated with those locations. Geospatial data is categorized into two types: raster and vector.
Vector data represents geographic features as points, lines, and …
Supervisors:
Eleni Tzirita Zacharatou
Semester: Fall 2023
Tags: spatial data analysis, data science, data loading, GIS file formats, geospatial data
It is now common to query terabytes of spatial data. Several new frameworks extend distributed computing platforms such as Hadoop and Spark to enable them to efficiently process spatial queries by providing (1) mechanisms to efficiently store and index spatial data; and (2) packages of built-in spatial operations for these platforms. Meanwhile, it is now common to accelerate Hadoop and Spark …
Supervisor: Iman Elghandour
Semester: Fall 2019
Spark assumes that it executes its applications on a homogeneous cluster of similar nodes. However, it is becoming common that in-house clusters have heterogeneous compute resources and it is good to exploit all of them in the most efficient way. The objective of this master thesis is to extend the Spark scheduler to be resource-aware and to efficiently schedule Spark tasks on all the …
Supervisor: Iman Elghandour
Semester: Fall 2019
Distributed computing platforms such as Hadoop and Spark focus on addressing the following challenges in large systems: (1) latency, (2) scalability, and (3) fault tolerance. Dedicating computing resources for each application executed by Spark can lead to a waste of resources. Unified distributed file systems such as Alluxio have provided a platform for computing results among simultaneously …
Supervisor: Iman Elghandour
Semester: Fall 2019
In the last few years, it has become common to accelerate Hadoop and Spark by enabling them to execute tasks and jobs on accelerators such as GPUs and FPGAs. The objective of this master thesis is to study new approaches that efficiently predict the execution time of Spark tasks and jobs executed on GPUs. Part of the work will be to build a performance prediction model for GPUs, which can be built …
Supervisor: Iman Elghandour
Semester: Fall 2019
Do you like open-source systems? Would you like to experience working with an open-source system? Do you want to learn about big data research in practice? Then, this project is for you!
We have a number of thesis/project topics under the umbrella of Apache Wayang. Wayang is the first cross-platform framework that allows users to specify their task/query in a system-agnostic manner and Wayang will …
Supervisors:
Jorge Quiané
Semester: Fall 2022
Tags: big data, database, cross-platform data processing, open source, Apache
Do you have the feeling that only a few players are controlling the AI game? Would you like to make AI technology accessible to everyone? Then, come and help us to make Agora a reality!
We have a number of thesis/project topics under the umbrella of the Agora project. This project aims at building a unified data infrastructure for supporting AI ecosystems that bring together data, algorithms, …
Supervisors:
Jorge Quiané
Semester: Fall 2022
Tags: big data, AI ecosystems, compliant data processing, federated analytics, data markets
ITU is a partner of the Danish Student Cubesat Program, DISCOSAT. We launched our first satellite, DISCO-1, into Low Earth Orbit in April 2023 and we will launch a second, DISCO-2, in 2024. In this project you will gain experience with automating live satellite operations and communications, completing a ground station at the Rued Langaards Vej site for use with both satellites.
The DISCO satellite …
Supervisors:
Julian Priest
Semester: Fall 2023
Tags: satellite, ground station, software defined radio, automation, csp
The DISCO-2 satellite is an Earth observation satellite in collaboration with the Arctic Research Center in Aarhus and is designed to complement ground-based field studies in Greenland. The satellite instrument consists of two high-quality visible-light cameras and one infrared camera, as well as an attitude control system and a Coral TPU ML coprocessor.
In this project you will develop software to control …
Supervisors:
Julian Priest
Semester: Fall 2023
Tags: satellite, climate change, image processing, ML, csp, embedded, space
The DISCO-2 satellite will have accelerated machine learning capability based on the inclusion of a Coral TPU ML accelerator module. This will allow images taken by the satellite to be analysed on the satellite using a variety of ML models, with only select images sent back to Earth. This approach allows for more flexibility in image acquisition and saves downlink bandwidth, which is very constrained …
Supervisors:
Julian Priest, Robert Bayer
Semester: Fall 2023
Tags: satellite, ground station, software defined radio, automation, csp
The Danish Student Cubesat Program is an inter-university collaboration that will launch 3 cubesats into Low Earth Orbit over the next 4 years. The satellites will be designed, operated, programmed and built by students, and the project offers an opportunity for Master’s students to take part in a live satellite project. ITU is partnering with Aarhus University on DISCOSAT2 which will be an …
Supervisors:
Sebastian Büttrich, Julian Priest
Semester: Fall 2021
Tags: Satellite, Cubesat, Image processing, Machine Learning, edge, constrained computing
ITU is a partner in the Danish Student Cubesat Program, DISCO, which will launch a series of small satellites into orbit, starting with DISCO 1 in 2023 and followed by DISCO2 in 2024.
As part of this project, ITU is installing a satellite ground station with a range of antenna rotators on the roof of the Rued Langaards Vej building, and the equipment has been purchased. The ground station will track the …
Supervisors:
Julian Priest
Semester: archive
Tags: Satellite, Image processing, Edge, Constrained Computing, Networks, Machine Learning, Embedded, Software Defined Radio
ITU is a partner in the Danish Student Cubesat Program DISCO, which will launch a series of small satellites into orbit, starting with DISCO 1 in 2023 and followed by DISCO2 in 2024.
ITU is developing a high-resolution multi-camera imaging payload for Earth observation, primarily in the Arctic. We are developing an on-satellite machine learning capability using an ML coprocessor, as well as models that can …
Supervisors:
Julian Priest
Semester: archive
Tags: Satellite, Image processing, Edge, Constrained Computing, Networks, Machine Learning, Embedded, Radio
The idea behind “15-minute cities” is that within a short walk or bike ride people should have access to all necessary facilities that constitute the essence of urban living, such as parks, shops, cafes, schools and hospitals. Initiatives to transform cities according to this paradigm are currently being implemented across the world, in an attempt to make urban spaces more liveable, …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: spatial data analysis, visualisation, Python, OSM data
As a response to increased traffic congestion and the need to reduce carbon emissions, cities consider ways to modernise, build and extend transit systems. Transit network design solutions can benefit from analysing the large amount of crowd-sourced location data available, which provides valuable insights into population mobility needs. Designing efficient metro lines, bicycle paths, or bus …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: spatial data analysis, network design, Python, OSM data
The idea behind “15-minute cities” is that within a short walk or bike ride people should have access to all necessary facilities that constitute the essence of urban living, such as parks, shops, cafes, schools and hospitals. Initiatives to transform cities according to this paradigm are currently being implemented across the world, in an attempt to make urban spaces more liveable, …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: spatial data analysis, graph summaries, Python, OSM data
Musical genres are inherently ambiguous and difficult to define. Even more so is the task of establishing how genres relate to one another. Yet, genre is perhaps the most common and effective way of describing musical experience. The number of possible genre classifications (e.g. Spotify has over 4000 genre tags, LastFM over 500,000 tags) has made the idea of manually creating music taxonomies …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: scalable algorithms, hyperbolic embeddings, Python, Spotify data
The integration of wind power in the energy grid is dependent on accurate production forecasts. The power output curves between neighbouring wind farms are often correlated temporally and spatially, but currently, these spatiotemporal dependencies are under-utilised in prediction models. Graph neural networks allow for modelling these dependencies. In this project the student will implement a …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: spatial data analysis, graph neural networks, Python, timeseries data
Open-source JavaScript applications, such as browser-based web games, are typically developed by individual software engineers or small teams. These teams often have limited financial resources to use commercial logging frameworks and cloud-based analysis systems and may also lack knowledge and expertise in logging. However, log analysis is highly important for many reasons: monitoring application …
Supervisors:
Martin Hentschel
Semester: Fall 2024
Tags: open source, performance
The Deconstructed Cloud Databases project stems from a simple question: What are the minimum components required to build a data management system in the cloud? Our motivation for this project is based on the idea that reducing a system to its minimum set of components makes it easier to build, test, and maintain cloud data management systems. This approach requires less engineering effort, …
Supervisors:
Martin Hentschel
Semester: Fall 2024
Tags: data management, performance, benchmarking, hacking
The Deconstructed Cloud Databases project stems from a simple question: What are the minimum components required to build a data management system in the cloud? Our motivation for this project is based on the idea that reducing a system to its minimum set of components makes it easier to build, test, and maintain cloud data management systems. This approach requires less engineering effort, …
Supervisors:
Martin Hentschel
Semester: Fall 2024
Tags: data management, security, open source, open standards
The emergence of computational storage platforms like Delilah has transformed the data storage landscape, enabling new computing paradigms and facilitating data-intensive applications. Delilah is a cutting-edge computational storage platform developed by the IT University of Copenhagen. It runs on the Daisy OpenSSD and exposes an asynchronous computational storage protocol to the host, facilitated …
Supervisors:
Niclas Hedam, Philippe Bonnet
Semester: Fall 2023
Tags: Open Source, Testing, Computational Storage, Hardware, FPGA
Hash-based authentication is an effective way of protecting passwords in software systems. Hashing obscures the original passwords, such that they cannot be recovered in case of a database breach. However, as demonstrated by our paper titled Hash-Based Authentication Revisited in the Age of High-Performance Computers, the practical security depends on which hashing algorithm is used as well as the …
Supervisors:
Pınar Tözün, Niclas Hedam
Tags: benchmarking, hashing, security, GPU, hacking, HPC
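To give a feel for what hash-based authentication looks like in code, here is a minimal sketch using PBKDF2-HMAC-SHA256 from the Python standard library. The algorithm choice, the iteration count, the salt size, and the helper names `hash_password` and `verify_password` are assumptions made for illustration; they are not taken from the paper or the project.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Hypothetical parameters, chosen only for illustration.
ITERATIONS = 100_000
SALT_BYTES = 16


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest) for a password using PBKDF2-HMAC-SHA256."""
    if salt is None:
        # A fresh random salt per password prevents identical passwords
        # from producing identical digests.
        salt = os.urandom(SALT_BYTES)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


if __name__ == "__main__":
    salt, digest = hash_password("correct horse")
    print(verify_password("correct horse", salt, digest))
    print(verify_password("wrong guess", salt, digest))
```

Only the salt and the digest would be stored. The iteration count controls how expensive each guess is for an attacker, which is one reason the choice of algorithm and parameters matters.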
With the recent hunger for being “data driven”, many organizations are eager to integrate ML into their decision-making process. Unfortunately, competent data scientists are still relatively scarce, and manual model development cannot keep up with the demand for magic AI solutions. This is no less true when it comes to forecasting. Knowing the future is extremely handy when making …
Supervisors:
Niels Ørbæk Chemnitz
Semester: Spring 2021
Tags: AutoML, ML, Forecasting, Energy Data, Smart Meters, Python, Data Science, Time Series Data
How much do our smart meter readings disclose about us? Can we disentangle the oven from the washing machine from the kettle? Can we identify demographics and behavior patterns from the stream of electricity data?
Most Danish homes are now equipped with so-called “smart meters” - networked electricity meters that report consumption and load at a much higher rate than conventional meters. …
Supervisors:
Niels Ørbæk Chemnitz
Semester: Spring 2021
Tags: NILM, ML, IoT, Energy Data, Smart Meters, Python, Data Science, Time Series Data
Outlier detection is typically carried out once the data is stored on the server. However, with the new computational capabilities of IoT devices, outlier detection can be performed locally. Therefore, it is necessary to know how much RAM/Flash is needed for this step and which IoT brands can handle it. This project is divided into two parts. The first is implementing light-heavy ML algorithms in single points …
Supervisor: Paul Rosero
Semester: Spring 2022
Tags: data analysis, IoT, Python, Embedded systems
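As an illustration of why memory footprint matters for on-device outlier detection, the sketch below flags outliers with a running z-score whose statistics are maintained with Welford's online algorithm, so it keeps only three numbers in memory regardless of how many readings arrive. This is one possible lightweight technique, not the method prescribed by the project; the class name, the threshold, and the warm-up length are assumptions.

```python
import math


class StreamingOutlierDetector:
    """Constant-memory z-score outlier detector (Welford's online algorithm)."""

    def __init__(self, threshold: float = 3.0, warmup: int = 10):
        self.threshold = threshold  # hypothetical: flag readings beyond 3 std devs
        self.warmup = warmup        # hypothetical: readings needed before flagging
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations from the mean

    def update(self, x: float) -> bool:
        """Return True if x is an outlier relative to the readings seen so far."""
        is_outlier = False
        if self.count >= self.warmup:
            std = math.sqrt(self.m2 / (self.count - 1))
            if std > 0 and abs(x - self.mean) > self.threshold * std:
                is_outlier = True
        # Welford update. For simplicity, every reading (including flagged
        # ones) is folded into the running statistics.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)
        return is_outlier


if __name__ == "__main__":
    detector = StreamingOutlierDetector()
    # Made-up temperature readings with one obvious spike.
    readings = [20.0, 20.1, 19.9, 20.2, 19.8, 20.0, 20.1, 19.9, 20.0, 20.1, 35.0, 20.0]
    print([detector.update(r) for r in readings])
```

Because the state is a handful of floats, the RAM cost is easy to reason about, which makes a detector of this kind a convenient baseline before trying heavier models on a given device.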
TinyML is a new trend of deploying deep learning on tiny devices. Therefore, it is necessary to deploy several applications to understand the challenges and opportunities that tinyML brings. In this scenario, any idea involving embedded computer vision, voice recognition, or sensors is welcome.
Supervisor: Paul Rosero
Semester: Spring 2022
Tags: data analysis, IoT, Python, Embedded systems, Computer vision, Voice recognition
Disaggregated storage has gained acceptance in data centers. With disaggregated storage, storage resources are decoupled from compute resources, and made available through fabric. We are particularly interested in storage resources composed of an ARM-based smartNIC, which acts as fabric target as well as storage controller for a collection of SSDs.
The performance characteristics of the storage …
Supervisors:
Philippe Bonnet
Semester: Fall 2021
Tags: benchmarking, ARM, SoC, fabric, SSD, computational storage
Reproducibility is a cornerstone of the scientific method. There are systems available today to build reproducible and sharable data and analysis pipelines including workflow engines (e.g., GWL, Nextflow), package managers (e.g., bioconda), and container systems (e.g., Singularity). However, validating their executions on high-performance computers remains an open issue. Indeed, there are many …
Supervisors:
Philippe Bonnet
Semester: Fall 2020
Tags: ML, reproducibility, workflow, HPC
Reproducibility is a cornerstone of the scientific method. It is also a core element of compliance requirements for sensitive equipment, e.g., audit trails for medical equipment. Often, a prerequisite for computational reproducibility is the availability of software and data. However, this is problematic for edge devices whose goal is to reduce the amount of data transferred to the backend. On …
Supervisors:
Philippe Bonnet
Semester: Fall 2020
Tags: reproducibility, edge
SSDs are not a uniform class of devices. The SSD landscape is now quite diverse, with many new-generation, much faster and customizable devices available. Understanding their performance characteristics is crucial when determining what their impact on the data systems software stack should be. In this project, we would like to characterize the performance of a broad range of such SSDs (e.g., ZNS, Samsung Z-SSD, …
Supervisors:
Pınar Tözün, Philippe Bonnet
Semester: Fall 2020
Tags: SSD, benchmarking
The Things Network Stack v3 for LoRaWAN is an open source LoRaWAN network stack suitable for large, global and geo-distributed public and private networks as well as smaller networks. The architecture follows the LoRaWAN Network Reference Model for standards compliance and interoperability. - https://github.com/TheThingsNetwork/lorawan-stack This stack, currently in pre-rollout testing, however …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Offloading processing to storage is a means to avoid data movement and thus deal efficiently with very large volumes of stored data. In the 90s, there were pioneering efforts to develop Processing-in-Memory as well as Active Disks. We are considering data stored on Open-Channel SSDs with a programmable storage controller (i.e., a Linux-based ARM processor) integrated into a network switch (e.g., …
Supervisor: Philippe Bonnet
Semester: Fall 2019
Characterize the performance of commercial database systems on an NVIDIA Titan GPU, or characterize the performance of DB2 PureScale on a cluster equipped with shared storage, with a range of different benchmarks. Design and conduct experiments with a range of tuning strategies to measure their impact on performance and reliability.
Supervisor: Philippe Bonnet
Semester: Fall 2019
In the context of the Orkney Cloud project, we are preparing the deployment of a decentralized cloud infrastructure on the archipelago. The infrastructure is composed of a collection of Pods (point of delivery) and a wireless core (5G + Wifi). Each Pod is equipped with storage, computing and communication components (so that it is connected to the core and to local endpoints). Each Pod is powered …
Supervisor: Philippe Bonnet
Semester: Fall 2019
New types of networks such as LoRaWAN and Sigfox enable us to deploy inexpensive mobile Air Quality Sensors, e.g. on bicycles or vehicles, boats or trains, and help map and understand urban pollution. Such data would be valuable for e.g. the international citizen science project Luftdaten https://luftdaten.info/en/home-en/ in which we are participating. We have an ongoing collaboration with the …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Can we combine satellite data with terrestrial and marine sensor data, to study their correlation, and benefit our understanding of environmental processes, both in urban and agricultural/rural contexts? Potential collaboration with several organizations in Kenya and the Orkney Islands.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Not far from the IT University, citizens are building a “Small Smart City”, aiming to measure and analyze e.g. water and resource consumption. Help build the Small Smart City!
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Survey of potential sensor modalities (sounds, ultrasounds, vibrations) and related work (e.g., wind turbines)
Starting with sound: Piezo contact mics/transducers, MEMS sensors
Characterization of state based on known signatures (classification problem)
Characterization of state transitions (HMM)
Experimentation on Coffee machine/Blender/3D printer at PitLab
1..k sensors; Local/cloud-based …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Novo Nordisk
New forms of Solid State Drives have interesting characteristics in terms of performance (10 to 100x faster than previous generations of SSDs) and in terms of functionalities (SSDs can now suspend the execution of writes or erase operations to minimize read latency). The performance characteristics of these devices are not well understood yet. The topic of this thesis is to design and conduct …
Supervisor: Philippe Bonnet
Semester: Fall 2019
Survey of potential sensor modalities (IR temp sensor, thermal imager) and related work
Starting with consumer USB cams or phone cams to generate series of images
Characterization of state based on known signatures (classification problem)
Characterization of state transitions (HMM)
Experimentation on Coffee machine/Blender/3D printer at PitLab
1..k sensors; Local/cloud-based processing. …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Novo Nordisk
Most data collection in IoT does not critically depend on latency or speed from data collection to data analytics. Occasionally though we meet tasks that would benefit from near-realtime features, such as collection of wave and tidal dynamics around marine energy infrastructures. This project explores the limits of speed by bringing together a LoRa PHY, a LoRaWAN gateway, LoRaWAN stack, ultrafast …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
In collaboration with the IoT Lab at Computer Science Dept at Kathmandu University, Nepal, we are developing a potential service for tracking trekkers, i.e. offering a security service for tourists trekking the Himalayas, in particular Mt. Everest. This service very critically depends on having a robust hardware component, the actual GPS/GNSS tracker. Requirements with respect to battery life, …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Wind turbine electricity production data is sensitive for Energinet (and for the wind turbine producers). Energinet would like to publish wind turbine electricity production data sets that can be used to train relevant models and to develop innovative applications, without giving away sensitive data. The goal of the project is to explore various data publishing methods for that purpose.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, data publication
Energinet has a model that describes the electricity production of a given wind turbine given wind conditions. The current model based on kNN is trained with DMI weather data and historical electricity production data for the wind turbine. The goal of the project is to improve the current model with lifelong learning, extended weather data and different models for a range of different wind …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Data Analysis
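For readers unfamiliar with kNN regression, the following minimal sketch predicts production as the average of the k most similar historical observations. It is not Energinet's model: the single wind-speed feature, the toy values, and the function name `knn_predict` are all made up for illustration.

```python
import math
from typing import List, Sequence, Tuple

# Each training row pairs a feature vector (here only wind speed in m/s)
# with an observed production value. All values are invented toy data.
Row = Tuple[Sequence[float], float]


def knn_predict(train: List[Row], query: Sequence[float], k: int = 3) -> float:
    """Average the production of the k training rows closest to the query."""
    by_distance = sorted(train, key=lambda row: math.dist(row[0], query))
    nearest = by_distance[:k]
    return sum(production for _, production in nearest) / len(nearest)


if __name__ == "__main__":
    history = [
        ((3.0,), 50.0),
        ((5.0,), 200.0),
        ((7.0,), 600.0),
        ((9.0,), 1200.0),
        ((11.0,), 1800.0),
    ]
    # The two closest observations are at 5 and 7 m/s.
    print(knn_predict(history, (6.0,), k=2))
```

With several weather features, each feature would normally be scaled first so that no single unit dominates the distance; extending the feature vector is one way the "extended weather data" mentioned above could enter such a model.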
The goal of the project is to explore the accuracy of electricity production predictions based on historical data and weather predictions. This may be tackled as a sequence prediction problem using recurrent neural networks. The long-term goal is to incorporate wind turbines in the reserve market for electricity.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Forecasting, Machine Learning, Deep Learning
The goal of the project is to explore new ways of gathering data about wind turbines as well as local wind/weather conditions. To this end, sound/vibration-based and/or image-based instrumentation as well as innovative experiments such as balloons and lightweight weather stations might be considered.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Instrumentation, Sensors
Field Programmable Gate Arrays are now an integral part of public cloud infrastructures. You can for example run customized FPGA instances on AWS. A project focuses on FPGA-based hardware acceleration at the level of an SSD Flash Translation Layer, at the level of a Database storage manager or at the level of the database client. You will be able to experiment with FPGAs in the lab and on AWS. We …
Supervisor: Philippe Bonnet
Semester: Fall 2020
Tags: FPGA, SSD, computational storage
Field Programmable Gate Arrays are now an integral part of public cloud infrastructures. You can for example run customized FPGA instances on AWS. A project focuses on FPGA-based hardware acceleration at the level of an SSD Flash Translation Layer, at the level of a Database storage manager or at the level of the database client. You will be able to experiment with FPGAs in the lab and on AWS. We …
Supervisor: Philippe Bonnet
Semester: Fall 2019
Tags: FPGA, SSD
A GPU offers massive computational power and parallelism through its Streaming Multiprocessors (SMs). Efficient GPU utilization is critical for maximizing performance and optimizing compute resource usage, which is measured using various metrics such as SMACT (SM Activity), SMOCC (SM Occupancy), and DRAMA (DRAM Active). These metrics provide insight into how effectively the GPU’s SMs and …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Spring 2024
Tags: machine learning systems, GPU Utilization, resource management, resource interference
The work on running data-intensive applications on very powerful, expensive, and power-hungry server hardware is very popular thanks to the growing size of data centers and high-performance computing (HPC) platforms. However, with the rise of new generation internet of things (IoT) applications, the lower-power and lower-budget hardware devices that specifically target IoT, the edge platforms, …
Supervisors:
Pınar Tözün
Semester: Fall 2024
Tags: edge, benchmarking, data-intensive applications, resource-constrained hardware
Observing how well machine learning systems utilize hardware resources is a crucial preliminary step to improve system performance and reduce hardware waste. To make such observations, one has to collect a lot of monitoring data on hardware behavior through experiments. In our group, we have recently built a framework to aid the management of such monitoring data efficiently, called Resource-Aware …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: benchmarking, data management, data visualization
Deep learning changed the landscape of many applications like computer vision, natural language processing, etc. On the other hand, deep learning requires gigantic computing power offered by modern hardware. As a result, data scientists rely on powerful hardware resources offered by shared high-performance computing (HPC) clusters or the cloud. Due to the long-running times of deep learning …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2024
Tags: machine learning systems, checkpointing, scheduling, resource management
Workload collocation has been shown to be an effective method to reduce the hardware requirements for certain deep learning (DL) training tasks. On the other hand, there haven’t been many robust open-source implementations of schedulers that incorporate workload collocation on GPUs for DL.
BLOX is a framework that aims at standardizing the way we implement deep learning schedulers. In this …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2024
Tags: machine learning systems, scheduling, resource management, workload collocation
Deep convolutional networks are able to learn representations of images, scoring well in tasks such as image classification and object detection. During model training, these networks have the ability to process different input sizes without requiring changes to their architecture. In this project, we would like to investigate the effects that changing input sizes has on these kinds of models. We …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: data attribution, deep learning, machine learning, resource efficiency
Today’s foundation models are trained on vast amounts of data. The quality and size of this data have a huge impact on the accuracy of these models. Selecting the right amount and variety of data for a given task, however, is a resource-intensive process. In this project, we would like to investigate various state-of-the-art data selection mechanisms from a hardware requirements and …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: data selection, deep learning, machine learning, resource efficiency
Traditionally, solid-state drives (SSDs) do not give users the ability to control data placement on the SSD. This often leads to suboptimal performance and lowers SSD lifetime, since SSDs internally don’t allow in-place updates. The updated disk pages are written elsewhere and the old versions have to be garbage collected. This poses problems if data with different lifetimes and …
Supervisors:
Pınar Tözün
Semester: Fall 2024
Tags: SSDs, data management systems, modern storage
In this project, we would specifically like to quantify the data movement savings of applying techniques like compression and model-based data filtering in the context of resource-constrained hardware and edge/IoT applications.
Today many data sources are small low-powered and hardware-constrained devices such as mobile phones, wearable or self-driving smart platforms, etc. Processing the data on …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, ML model updates, tinyML
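To make "data movement savings" concrete, here is a minimal sketch that measures the fraction of bytes saved by compressing a payload before it leaves the device. It uses `zlib` from the Python standard library as a stand-in compressor; the JSON payload layout, the helper name `movement_savings`, and the compression level are assumptions for illustration, not part of the project.

```python
import json
import zlib


def movement_savings(payload: bytes, level: int = 6) -> float:
    """Return the fraction of bytes saved by sending the zlib-compressed payload."""
    compressed = zlib.compress(payload, level)
    return 1.0 - len(compressed) / len(payload)


if __name__ == "__main__":
    # Made-up batch of sensor readings, serialized the way a device might send it.
    readings = [{"sensor": "temp", "value": 20 + (i % 5)} for i in range(500)]
    payload = json.dumps(readings).encode("utf-8")
    print(f"raw bytes: {len(payload)}")
    print(f"fraction saved: {movement_savings(payload):.2f}")
```

On real hardware the byte savings would have to be weighed against the CPU time and energy spent compressing, which is exactly the kind of trade-off that is harder to predict on resource-constrained devices.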
One of the key challenges with enabling efficient machine learning on resource-constrained devices is keeping the machine learning models deployed on these devices up-to-date without frequent retraining. This requires exploring the impact of different model update mechanisms at the edge.
This project would be suitable as a standalone project or BSc or MSc thesis at ITU during Fall 2024. If you are …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, ML model updates, tinyML
Enabling efficient data processing and machine learning on resource-constrained devices poses many challenges. One is fitting the models into the restrictive memory and compute resources of these devices. In this project, first, we would like to explore the landscape of foundational, generative-AI, language, etc. models with respect to their size and compute needs to understand what could be a fit …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, ML model updates, tinyML
Today many data sources are small low-powered and hardware-constrained devices such as mobile phones, wearable or self-driving smart platforms, etc. Edge computing is a broad term that refers to computations performed on such edge devices. It becomes increasingly important to enable techniques that get more value out of data at the edge rather than always sending the data to a remote and more …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, resource management, tinyML
Observing how well data-intensive systems utilize hardware resources is a crucial preliminary step to improve system performance and reduce hardware waste. To make such observations, one has to collect a lot of monitoring data on hardware behavior through experiments. In our group, we have recently built a framework to aid the management of such monitoring data efficiently, called Resource-Aware …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2023
Tags: benchmarking, data management, data visualization
It is common to process data to clean it, filter it, restructure it, get metadata out of it, etc. before feeding the data into a data analysis or machine learning pipeline. There are many tools and libraries out there to aid with this process with different strengths and functionality (DALI, RAPIDS, HoloClean, DAPHNE, DuckDB, etc.). In this project, we would like to analyze pros/cons of some of …
Supervisors:
Pınar Tözün
Semester: Fall 2022
Tags: data preprocessing libraries, heterogeneous hardware, machine learning
In the past decade, the data management community has focused on main-memory systems or main-memory-optimized systems. This focus has put the commodity memory hierarchy (DRAM and processor caches) at the center of workload characterization studies. Today, with the evolution of persistent storage technologies such as NVRAM (persistent memory solution of Intel) and NVMe SSDs, data systems …
Supervisors:
Pınar Tözün
Semester: Fall 2022
Tags: workload characterization, tracing, modern storage, data-intensive systems
DAPHNE is an EU project that aims at building a data system targeting integrated data analysis pipelines across data management and processing, high-performance computing (HPC), and machine learning (ML) training and scoring. The project had its first code release back in March. This project aims at adding a profiling infrastructure for the DAPHNE codebase. If you are interested in learning about …
Supervisors:
Pınar Tözün
Semester: Fall 2022
Tags: integrated data analysis pipelines, profiling big data systems
State-of-the-art machine learning models are known to be compute- and power-hungry. On the other hand, modern servers come equipped with really powerful CPU-GPU co-processors. Not all machine learning models are able to use all the available hardware resources on such servers.
Workload collocation is a mechanism to increase hardware utilization when a single workload is not able to utilize all the …
Supervisors:
Pınar Tözün
Semester: Fall 2022
Tags: benchmarking, workload collocation, machine learning
Today, there are many compute- and memory-hungry data-intensive workloads from big data analytics applications to deep learning. These workloads increasingly run on shared hardware resources, which requires building hardware resource managers that can both serve the needs of workloads and utilize hardware well. Predicting the resource utilization of applications can aid such resource managers …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2022
Tags: benchmarking, hardware resource consumption estimation, machine learning
NVMe SSDs are not a uniform class of devices. The IO software stack is not uniform either. Understanding the performance characteristics of new-generation SSDs and the impact of the IO stack on their performance is crucial when determining how to design data-intensive systems. In this project, we would like to characterize the performance of a range of NVMe SSDs (e.g., Samsung Z-SSD, Intel Optane, …
Supervisors:
Pınar Tözün
Semester: Fall 2021
Tags: SSD, benchmarking
A data science infrastructure orchestrates the execution of widely used machine learning frameworks (e.g., TensorFlow, PyTorch) on a heterogeneous set of processing units (e.g., CPU, GPU, TPU, FPGA) while powering an increasingly diverse and complex range of applications (e.g., fraud detection, healthcare, virtual assistance, automatic driving). Understanding the resource consumption …
Supervisor: Pınar Tözün
Semester: Fall 2021
Tags: benchmarking, hardware resource consumption, deep learning frameworks
The variety and complexity of data-intensive applications and systems have been increasing drastically over the past decade. Tasks from a SQL-based big data analytics request running on Apache Spark can be very different from tasks from deep learning training using the TensorFlow framework. Nevertheless, these data-intensive applications increasingly run on shared hardware resources in data centers or …
Supervisor: Pınar Tözün
Semester: Fall 2020
Tags: hardware-software co-design
Improvements in modern processor hardware do not automatically enable more complex and higher quality games, animations, and virtual reality applications. A paradigm shift is necessary when it comes to how we develop these applications in order to exploit the resources of modern hardware (i.e., main memory, multicores) effectively. Unity Technologies has recently developed Data-Oriented …
Supervisor: Pınar Tözün
Semester: Fall 2020
Tags: memory hierarchy, concurrency
Spreading the computation of similar concurrent tasks that have a large instruction footprint over multiple cores via thread migration is shown to improve the instruction cache utilization drastically since it allows instruction re-use across the concurrent tasks. However, thread migrations are costly due to the context switching overhead. To reduce this overhead, recent work mainly proposed …
Supervisor: Pınar Tözün
Semester: Fall 2019
The computer architecture community is moving toward commoditization of hardware specialization instead of general purpose CPUs and more agile hardware development instead of years-long production cycles to enable faster, more energy-efficient, and more cost-effective hardware/software co-designs. This will lead to a disruption in the way we design and maintain the emerging data management systems …
Supervisor: Pınar Tözün
Semester: Fall 2019
Apache SystemML is an open-source platform to run machine learning tasks efficiently thanks to the hardware-conscious query compilation techniques it adopts. It can be run standalone or on top of Apache Spark. It is considered to be state-of-the-art when running machine learning tasks (i.e., in ACM SIGMOD 2017, there were ~5 papers that used SystemML as a comparison point). This project aims at …
Supervisor: Pınar Tözün
Semester: Fall 2019
The popularity of large-scale real-time analytics applications (real-time inventory/pricing, recommendations from mobile apps, fraud detection, risk analysis, IoT, etc.) keeps rising. These applications require distributed data management systems that can handle fast concurrent transactions (OLTP) and analytics on the recent data. Some of them even need running analytical queries (OLAP) as part of …
Supervisor: Pınar Tözün
Semester: Fall 2019
The Transaction Processing Performance Council (TPC) is a non-profit IT organization founded to define database benchmarks and disseminate objective, verifiable performance data to the industry. TPC has standardized several new benchmarks (e.g., TPCx-HS and TPCx-BB) in recent years. Older popular benchmarks, like TPC-C (representing high-performance transaction processing) and TPC-H (representing …
Supervisor: Pınar Tözün
Semester: Fall 2019
The DISCO-2 satellite will have accelerated machine learning capability based on the inclusion of a Coral TPU ML accelerator module. This will allow images taken by the satellite to be analysed on the satellite using a variety of ML models, with only select images sent back to Earth. This approach allows for more flexibility in image acquisition and saves downlink bandwidth, which is very constrained …
Supervisors:
Julian Priest, Robert Bayer
Semester: Fall 2023
Tags: satellite, ground station, software defined radio, automation, csp
This is not a single project, but rather a larger cluster of potential projects in the field of what could be summarized as extreme networking.
The networks we are interested in are typically wireless, and can be extreme in different senses of the word:
distance - hundreds of kilometers terrestrial, 10,000s of km to satellite
latency - sub-ms latencies
autonomy - off-grid
quality - extreme remote …
Supervisors:
Sebastian Büttrich
Semester: Fall 2024
Tags: network, IoT, LoRa, LoRaWAN, satellites
LoRa is a long-range, low-bandwidth networking protocol widely used in Internet of Things projects, sensor networks, and low-power, low-cost embedded systems. LoRa’s encoding scheme allows for extremely long-distance communications with low power usage and small, simple antennas. This combination of features has made it attractive to small satellite operators flying cubesats and LoRa is now …
Supervisors:
Sebastian Büttrich
Semester: Fall 2024
Tags: satellites, LoRa, cubesat, IoT, embedded, electronics
LoRa is a long-range, low-bandwidth networking protocol widely used in Internet of Things projects, sensor networks, and low-power, low-cost embedded systems. LoRa’s encoding scheme allows for extremely long-distance communications with low power usage and small, simple antennas. This combination of features has made it attractive to small satellite operators flying cubesats and LoRa is now …
Supervisors:
Sebastian Büttrich
Semester: Fall 2024
Tags: IoT, LoRa, LoRaWAN, satellites
Optical fiber is the backbone of the internet’s communication, e.g. in the form of submarine fiber cables. It can also be employed as a sensor device, by means of combined opto-acoustic methods such as Distributed Acoustic Sensing (DAS) or State of Polarisation (SoP) sensing. Fiber is capable of sensing all kinds of vibrational/acoustic events, from animal sounds through seismic activity to …
Supervisors:
Sebastian Büttrich
Semester: Fall 2024
Tags: fiber, acoustics, audio, machine learning, DAS, SOP
The Danish Student Cubesat Program is an inter-university collaboration that will launch 3 cubesats into Low Earth Orbit over the next 4 years. The satellites will be designed, built, programmed and operated by students, and the project offers an opportunity for Master’s students to take part in a live satellite project. ITU is partnering with Aarhus University on DISCOSAT2 which will be an …
Supervisors:
Sebastian Büttrich, Julian Priest
Semester: Fall 2021
Tags: Satellite, Cubesat, Image processing, Machine Learning, edge, constrained computing
Invasive bird species can be a serious problem in cities, towns and in agriculture. The common pigeon is a very unwelcome guest on many balconies, roofs and terraces. Conventional scarecrows often show no effect, as these birds are known to be quite intelligent and capable of learning fast. The idea is to build a sensor/camera-enhanced scarecrow that - can recognize birds present within its …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: IoT, ML, machineLearning, sensors, security
For this project, you would be working with a partner company that is looking to re-establish wood as a building material for sustainable architecture, and thus is using sensors for quality control - to detect damage and deterioration in buildings. Wood such as timber may be analyzed by non-intrusive acoustic impact testing and subsequent waveform analysis, and the expectation is that machine …
Supervisors:
Sebastian Büttrich
Semester: Fall 2021
Tags: IoT, sensors, machine learning, acoustics
In LoRaWAN networks such as The Things Network, long-distance transmissions, well beyond the limitations of line of sight in terrestrial geometry, are frequently observed. Tropospheric effects are seen as responsible for bending or guiding radio waves around the Earth's curvature. As an example, under the right weather conditions, the LoRaWAN gateway at ITU may collect packets from northern Germany, …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: IoT, LoRaWAN, LPWAN, satellite, networks, troposphere, weather
Recent progress in LoRaWAN development has made a new generation of satellite communications offerings available to IoT devices. In these, the LoRaWAN gateway is satellite-borne and collects data from small inexpensive ground stations. So far, this is predominantly seen as a means of communication for remote sensor data, e.g. in agriculture, logistics or wildlife monitoring. However, one can …
Supervisors:
Sebastian Büttrich
Semester: Fall 2021
Tags: IoT, LoRaWAN, LPWAN, satellite, networks, edge, security
There is currently a lot of progress in really small, yet powerful visual machine learning / computer vision, on hardware like the OpenMV Cam H7, Arduino Portenta Vision Shield, Luxonis LUX-ESP32, Himax WE-I Plus, Arducam Pico4ML, and Raspberry Pi, and on software platforms such as TinyML or OpenMV IDE.
While many popular use cases stem from fields like traffic analysis, wildlife monitoring, we …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: IoT, sensors, machine learning, computer vision
The Things Network Stack v3 for LoRaWAN is an open source LoRaWAN network stack suitable for large, global and geo-distributed public and private networks as well as smaller networks. The architecture follows the LoRaWAN Network Reference Model for standards compliance and interoperability. - https://github.com/TheThingsNetwork/lorawan-stack This stack, currently in pre-rollout testing, however …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
New types of networks such as LoRaWAN and Sigfox enable us to deploy inexpensive mobile Air Quality Sensors, e.g. on bicycles or vehicles, boats or trains, and help map and understand urban pollution. Such data would be valuable for e.g. the international citizen science project Luftdaten https://luftdaten.info/en/home-en/ in which we are participating. We have an ongoing collaboration with the …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Can we combine satellite data with terrestrial and marine sensor data to study their correlation and benefit our understanding of environmental processes, in both urban and agricultural/rural contexts? Potential collaboration with several organizations in Kenya and the Orkney Islands.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Not far from the IT University, citizens are building a “Small Smart City”, aiming to measure and analyze e.g. water and resource consumption. Help build the Small Smart City!
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Survey of potential sensor modalities (sounds, ultrasounds, vibrations) and related work (e.g., wind turbines)
Starting with sound: Piezo contact mics/transducers, MEMS sensors
Characterization of state based on known signatures (classification problem)
Characterization of state transitions (HMM)
Experimentation on Coffee machine/Blender/3D printer at PitLab
1..k sensors; Local/cloud-based …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Novo Nordisk
Survey of potential sensor modalities (IR temp sensor, thermal imager) and related work
Starting with consumer USB cams generate series of images or phone cams
Characterization of state based on known signatures (classification problem)
Characterization of state transitions (HMM)
Experimentation on Coffee machine/Blender/3D printer at PitLab
1..k sensors; Local/cloud-based processing. …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Novo Nordisk
Most data collection in IoT does not critically depend on latency or speed from data collection to data analytics. Occasionally though we meet tasks that would benefit from near-realtime features, such as collection of wave and tidal dynamics around marine energy infrastructures. This project explores the limits of speed by bringing together a LoRa PHY, a LoRaWAN gateway, LoRaWAN stack, ultrafast …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
In collaboration with the IoT Lab at Computer Science Dept at Kathmandu University, Nepal, we are developing a potential service for tracking trekkers, i.e. offering a security service for tourists trekking the Himalayas, in particular Mt. Everest. This service very critically depends on having a robust hardware component, the actual GPS/GNSS tracker. Requirements with respect to battery life, …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Wind turbine electricity production data is sensitive for Energinet (and for the wind turbine producers). Energinet would like to publish wind turbine electricity production data sets that can be used to train relevant models and to develop innovative applications, without giving away sensitive data. The goal of the project is to explore various data publishing methods for that purpose.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, data publication
Energinet has a model that describes the electricity production of a given wind turbine given wind conditions. The current model based on kNN is trained with DMI weather data and historical electricity production data for the wind turbine. The goal of the project is to improve the current model with lifelong learning, extended weather data and different models for a range of different wind …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Data Analysis
The goal of the project is to explore the accuracy of electricity production predictions based on historical data and weather predictions. This may be tackled as a sequence prediction problem using recurrent neural networks. The long-term goal is to incorporate wind turbines in the reserve market for electricity.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Forecasting, Machine Learning, Deep Learning
The goal of the project is to explore new ways of gathering data about wind turbines as well as local wind/weather conditions. To this end, sound/vibration-based and/or image-based instrumentation as well as innovative experiments such as balloons and light weight weather stations might be considered.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Instrumentation, Sensors
Deliberately scoped very wide, this group contains a number of projects in different possible directions, from
Location services via LPWAN time-of-flight and GPS/GNSS,
Vessel tracking and management in fisheries, tourism and logistics,
Water quality and chemistry sensing for aquaculture, specifically mariculture,
Wave and tidal dynamics, e.g. in energy harvesting
and variations/combinations of …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: Satellite, Image processing, Machine Learning, edge, constrained computing, IoT, sensors, location
swarm.space is a commercial company providing low-bandwidth satellite connectivity using ultra-small (quarter cubesat unit) satellites in a low orbit. Swarm satellites cover every point on Earth, enabling IoT devices to affordably operate in any location. Swarm uses a form of LoRa network.
While ground terminals for satellite networks traditionally were both big and expensive, modems and antennas …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: Satellite, IoT
LoRa is a relatively new long-range, low-power standard allowing battery-powered, pocket-sized nodes to transmit over 100s to 1000s of kilometers. ThingSat is a CubeSat communication payload consisting of an electronic board with several LoRa transmitters and a patch antenna, operating in the 868 MHz and 2.4 GHz bands. It is a guest payload of a shared 3U CubeSat.
Available projects under this platform include …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: Satellite, IoT
Observing how well machine learning systems utilize hardware resources is a crucial preliminary step to improve system performance and reduce hardware waste. To make such observations, one has to collect a lot of monitoring data on hardware behavior through experiments. In our group, we have recently built a framework to aid the efficient management of such monitoring data, called Resource-Aware …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: benchmarking, data management, data visualization
Deep convolutional networks are able to learn representations of images, scoring well in tasks such as image classification and object detection. During model training, these networks have the ability to process different input sizes without requiring changes to their architecture. In this project, we would like to investigate the effects that changing input sizes has on these kinds of models. We …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: data attribution, deep learning, machine learning, resource efficiency
Today’s foundation models are trained on vast amounts of data. The quality and size of this data have a huge impact on the accuracy of these models. Selecting the right amount and variety of data for a given task, however, is a resource-intensive process. In this project, we would like to investigate various state-of-the-art data selection mechanisms from a hardware requirements and …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: data selection, deep learning, machine learning, resource efficiency
Observing how well data-intensive systems utilize hardware resources is a crucial preliminary step to improve system performance and reduce hardware waste. To make such observations, one has to collect a lot of monitoring data on hardware behavior through experiments. In our group, we have recently built a framework to aid the efficient management of such monitoring data, called Resource-Aware …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2023
Tags: benchmarking, data management, data visualization
Spectral learning priority is a useful tool for analyzing a model’s focus during training; it describes how a model may understand a given image from the spectrum perspective. For example, to distinguish cats and tortoises, learning to recognize their shapes would be enough; such an embedding will result in higher learning priority at low frequencies representing shapes, while learning to …
Supervisors:
Yucheng Lu, Veronika Cheplygina
Semester: Fall 2024
Tags: Spectral analysis, Image classification, Medical imaging
It has been observed that deep learning models are able to identify patient characteristics such as age, sex, and self-reported race with high accuracy from medical images such as chest x-ray recordings, even when medical doctors cannot. This raises the potential for such models to learn to (falsely) diagnose patients of different demographics differently, even if they present with the same …
Supervisors:
Amelia Jiménez-Sánchez, Eike Petersen, Veronika Cheplygina
Semester: Fall 2024
Tags: machine learning, data science, medical imaging
There is pressure on hospitals to implement AI systems which promise to improve diagnoses and save time for the doctors. One use-case could be related to the automation of protocoling based on a physician referral. Currently, this requires a referral letter from a physician who has examined a patient and evaluates that there is a need for additional imaging studies. In this case, the physician …
Supervisors:
Veronika Cheplygina
Semester: Fall 2023
Tags: machine learning, medical imaging, data analysis
Machine learning models, especially larger models that are used, for example, on image or text datasets, can be expensive to train. During development, models are usually trained multiple times, for example to optimize hyperparameters, which can result in a large carbon footprint.
This project focuses specifically on medical data. There are some recent efforts, for example by Selvan et …
Supervisors:
Veronika Cheplygina
Semester: Fall 2023
Tags: machine learning, medical imaging, data analysis, resource consumption
Machine learning is used extensively in different applications, including medical imaging and natural language processing. As different types of data are involved, it is reasonable to assume that different methods are needed for each application. However, there are also opportunities in translating a method successful in one application, to the other application where it is not widely used.
The …
Supervisors:
Veronika Cheplygina
Semester: Fall 2021
Tags: machine learning, natural language processing, medical imaging, literature review
Machine learning algorithms for skin lesion classification typically learn from images which have been labeled as malignant (for example, melanoma) or not. Such tasks can still suffer from overfitting due to limited dataset size. In other computer vision tasks, crowdsourcing labels has been effective, but the average person typically does not have the background to classify skin lesions. However, …
Supervisors:
Veronika Cheplygina
Semester: Fall 2021
Tags: machine learning, medical imaging, crowdsourcing, similarity
Spectral learning priority is a useful tool for analyzing a model’s focus during training; it describes how a model may understand a given image from the spectrum perspective. For example, to distinguish cats and tortoises, learning to recognize their shapes would be enough; such an embedding will result in higher learning priority at low frequencies representing shapes, while learning to …
Supervisors:
Yucheng Lu, Veronika Cheplygina
Semester: Fall 2024
Tags: Spectral analysis, Image classification, Medical imaging
The DISCO-2 project is driven by students and aims to develop and deploy a 3-unit CubeSat into low Earth orbit. Its mission focuses on conducting Earth observations over Greenland and supporting various research objectives. The satellite has three cameras onboard: infrared, wide-angle, and standard (main camera). Due to the limitations of the imaging hardware and the challenging conditions on the …
Supervisors:
Yucheng Lu, Julian Priest
Semester: Fall 2024
Tags: Image enhancement, Image processing, Machine learning
Are you interested in working with a big data open source project?
You are welcome to conduct your thesis/project in Apache Wayang. Apache Wayang is the first cross-platform framework that allows users to specify their task/query in a system-agnostic manner and Wayang will determine which is the best system(s) to execute this task with the goal of optimizing performance. For a general overview …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: big data, database, cross-platform data processing, open source, Apache
Knowledge graphs (KGs) are extensively used in many application domains, such as search engines, product recommendation, and bioinformatics. Knowledge graph completion (a.k.a. link prediction), i.e., the task of inferring missing information from knowledge graphs, is a widely used task in the above applications. This project will investigate how to loosely couple the data-driven power of knowledge …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: knowledge graph, LLMs, reasoning
Are you interested in working with a big data open source project and helping the environment?
You are welcome to conduct your thesis/project in Apache Wayang. Apache Wayang is the first cross-platform framework that allows users to specify their task/query in a system-agnostic manner and Wayang will determine which is the best system(s) to execute this task with the goal of optimizing performance. …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: big data, database, cross-platform data processing, open source, Apache
Query optimization is crucial for any data management system to achieve good performance. Recent advancements in AI have led academia and industry to investigate learning-based techniques in query optimization. In particular, many works propose replacing the cost model used during plan enumeration with a machine learning model that estimates the runtime of a plan. However, to build such a model …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: machine learning, training data, query optimizer
Query optimization is crucial for any data management system to achieve good performance. Recent advancements in AI have led academia and industry to investigate learning-based techniques in query optimization. In particular, many works propose replacing the cost model used during plan enumeration with a machine learning model (typically a regression model) that estimates the runtime of a query …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: machine learning, database, query optimization, ranking
(This project will be carried out in collaboration with Xilinx Research Labs in Dublin)
Machine Learning operators are becoming increasingly common in data management systems and, in this project, we will explore the challenges and benefits of integrating inference operators from FINN [1] within a so-called Smart Storage system [2]. Both the inference and data management aspects will be …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: FPGA, Data Management, MachineLearning
(This topic is going to be co-supervised by Bernardo Machado David [http://www.bmdavid.com/])
Database systems managing private data may leak sensitive information when queries are done in the clear, even if the data itself is encrypted. A recent line of research has looked into combining database engines supporting standard SQL queries with techniques for secure Multiparty Computation (MPC), …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: Theoretical Computer Science, Data Management, Security and Privacy
Consensus mechanisms for ensuring consistency are some of the most expensive operations in managing large amounts of data. Often, there is a trade-off that involves reducing the coordination overhead at the price of accepting possible data loss or inconsistencies. As the demand for more efficient data centers increases, it is important to provide better ways of ensuring consistency without …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: Benchmarking, Distributed Systems
Given a private database that I can access only through specific queries, there is still a lot I can learn about its entries [1]. Differential Privacy (DP) tackles this: letting me learn the (approximate) result of complex queries on a database, but preventing me from learning much about its specific entries. The basic approach of DP often boils down to: “apply a privacy-preserving transformation T …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: Theoretical Computer Science, Data Management, Security and Privacy
Blockchains are often used synonymously with crypto-currencies and unspent transaction output (UTXO) data models, but there are emerging blockchain platforms that offer a more general data model and smart contracts that can manipulate this data freely (e.g. Hyperledger Fabric [1]). As such, these platforms resemble in many ways distributed databases, storing a collection of records, organized as …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: Blockchain, Data Management, Benchmarking
Modern data analytics systems are composed of two types of nodes: compute and storage (e.g., Amazon S3, Redis, MongoDB, etc.). The storage nodes typically offer a key-value interface and are often used to store data encoded in a columnar format (e.g., Parquet files). Due to growing data sizes in datacenters, there is an increasing interest in using specialized hardware devices, namely Field …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: FPGA, Hardware-software Co-design, Security and Privacy
We are actively developing a new prototype for analysing large multimedia collections in virtual reality, based on the ObjectCube data model. There are many ways in which students can contribute to the project, including work on the user interface and the back-end, and later on running large-scale user experiments.
Supervisors:
Aaron Duane, Björn Þór Jónsson
Semester: Fall 2021
Tags: virtual reality, multimedia analytics
It has been observed that deep learning models are able to identify patient characteristics such as age, sex, and self-reported race with high accuracy from medical images such as chest x-ray recordings, even when medical doctors cannot. This raises the potential for such models to learn to (falsely) diagnose patients of different demographics differently, even if they present with the same …
Supervisors:
Amelia Jiménez-Sánchez, Eike Petersen, Veronika Cheplygina
Semester: Fall 2024
Tags: machine learning, data science, medical imaging
A medical Visual Question Answering (VQA) system can provide meaningful references for both doctors and patients during the treatment process. Different from normal images, a learning setting with medical images is more challenging due to limited amounts of data, class imbalance and the presence of label noise for diagnosis tasks. Moreover, little attention is paid to how the images and meta-data are …
Supervisors:
Amelia Jiménez-Sánchez
Semester: Fall 2023
Tags: medical imaging, deep learning, machine learning, transfer learning, meta-learning
In relevance feedback, the choice of images to present to the user is a difficult problem, as a naïve approach may present too many similar images. The challenge addressed in this project is to ensure diversity (aka “one of each”) as well as relevance. A particularly interesting project for students interested in efficient algorithms.
Supervisors:
Björn Þór Jónsson
Semester: Fall 2021
Tags: multimedia analytics, scalability, diversity
In interactive learning systems, such as Exquisitor, the system presents potentially relevant images to users who label them as either relevant or irrelevant. Currently, Exquisitor uses a cluster-based index, which allows it to return results from a collection of 100 million images in 0.3 seconds. The goal of this project is to study the application of hash-based indexing to interactive learning …
Supervisors:
Björn Þór Jónsson
Semester: Fall 2021
Tags: multimedia analytics, diversity
The goal of this project is to enhance PhotoCube as a competitor for the Video Browser Showdown, an international video retrieval competition where competing systems are judged based on speed, accuracy and recall. We propose to develop new versions of the C++-based media server and JS-based media browser, to expand the data model to videos and improve the performance sufficiently to take part in …
Supervisors:
Björn Þór Jónsson
Semester: Fall 2021
Tags: video search, multimedia analytics, photocube
We are actively developing a new prototype for analysing large multimedia collections in virtual reality, based on the ObjectCube data model. There are many ways in which students can contribute to the project, including work on the user interface and the back-end, and later on running large-scale user experiments.
Supervisors:
Aaron Duane, Björn Þór Jónsson
Semester: Fall 2021
Tags: virtual reality, multimedia analytics
The index structure used for Exquisitor is eCP, a very scalable index for high-dimensional retrieval. However, while multimedia collections are typically updated constantly, the index is unfortunately static. The goal of this project is to implement and compare approaches for index maintenance.
Supervisor: Björn Þór Jónsson
Semester: Fall 2020
Tags: index maintenance, high-dimensional indexing
In this project, we propose to implement a media server and media browser encapsulating a new data model for analysing media collections, called Multimedia Analytics Data Services (MADS). To validate the design, some scalability experiments should be performed.
Supervisor: Björn Þór Jónsson
Semester: Fall 2020
Tags: multimedia analytics, scalability
Students at ITU have made a prototype version of the Exquisitor system for the Android mobile phone! The system is missing some of Exquisitor’s advanced functionality, such as search and indexing, and the goal is to add and evaluate this functionality.
Supervisor: Björn Þór Jónsson
Semester: Fall 2020
Tags: multimedia analytics, android
We propose to develop a new prototype for analysing large multimedia collections in Virtual Reality, using the new Valve Indexes. There are many ways in which students can contribute to the project, including work on the user interface and the back-end, and later on running large-scale user experiments.
Supervisor: Björn Þór Jónsson
Semester: Fall 2020
Tags: virtual reality, multimedia analytics
The goal of this project is to integrate Exquisitor with other pieces of existing technology and turn it into a competitor for a live video retrieval competition. The project is suitable for 3-4 well-qualified MSc students.
The Video Browser Showdown (VBS) is a live competition for video search and retrieval, held at the International Conference on Multimedia Modeling (MMM). In VBS, the competition …
Supervisor: Björn Þór Jónsson
Semester: Fall 2019
The goal of this project is to ensure diversity in the relevance feedback results, to improve the quality of the user experience.
The project is suitable for 1-3 well-qualified MSc students.
In many creative tasks, the designer will know some stock image is good for a design just by stumbling upon the image. This “Aha!” moment requires browsing thousands of images by categories. In other words, it requires …
Supervisor: Björn Þór Jónsson
Semester: Fall 2019
The goal of this project is to use the state of the art in eye tracking to design, implement and evaluate different eye-tracking interfaces for Exquisitor.
The project is suitable for 1-3 well-qualified MSc students.
Image and media collections are becoming a central information resource for a growing number of domains. This calls for very effective tools for interactive exploration of the contents of …
Supervisor: Björn Þór Jónsson
Semester: Fall 2019
The goal of this project is to build a prototype of the Exquisitor system for mobile devices.
The project is suitable for 1-3 well-qualified MSc students.
Image and media collections are becoming a central information resource for a growing number of domains. This calls for very effective tools for interactive exploration of the contents of those collections [1]. Based on past research results [2], we …
Supervisor: Björn Þór Jónsson
Semester: Fall 2019
Deep neural networks have been revolutionary in computer vision, and publicly available image datasets played an important role in this success. Due to their size, neural networks require vast amounts of data for training. Yet when it comes to medical settings, dataset sizes are very limited due to the cost of data annotation, privacy concerns, differences in imaging techniques, and others. In such …
Supervisors:
Dovile Juodelyte
Semester: Fall 2023
Tags: transfer learning, deep learning, medical imaging
A GPU offers massive computational power and parallelism through its Streaming Multiprocessors (SMs). Efficient GPU utilization is critical for maximizing performance and optimizing compute resource usage, which is measured using various metrics such as SMACT (SM Activity), SMOCC (SM Occupancy), and DRAMA (DRAM Active). These metrics provide insight into how effectively the GPU’s SMs and …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Spring 2024
Tags: machine learning systems, GPU Utilization, resource management, resource interference
Deep learning changed the landscape of many applications like computer vision, natural language processing, etc. On the other hand, deep learning requires gigantic computing power offered by modern hardware. As a result, data scientists rely on powerful hardware resources offered by shared high-performance computing (HPC) clusters or the cloud. Due to the long-running times of deep learning …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2024
Tags: machine learning systems, checkpointing, scheduling, resource management
Workload collocation has been shown to be an effective method to reduce the hardware requirements for certain deep learning (DL) training tasks. On the other hand, there haven’t been many robust open-source implementations of schedulers that incorporate workload collocation on GPUs for DL.
BLOX is a framework that aims at standardizing the way we implement deep learning schedulers. In this …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2024
Tags: machine learning systems, scheduling, resource management, workload collocation
Today, there are many compute- and memory-hungry data-intensive workloads, from big data analytics applications to deep learning. These workloads increasingly run on shared hardware resources, which requires building hardware resource managers that can both serve the needs of workloads and utilize hardware well. Predicting the resource utilization of applications can aid such resource managers …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2022
Tags: benchmarking, hardware resource consumption estimation, machine learning
It has been observed that deep learning models are able to identify patient characteristics such as age, sex, and self-reported race with high accuracy from medical images such as chest x-ray recordings, even when medical doctors cannot. This raises the potential for such models to learn to (falsely) diagnose patients of different demographics differently, even if they present with the same …
Supervisors:
Amelia Jiménez-Sánchez, Eike Petersen, Veronika Cheplygina
Semester: Fall 2024
Tags: machine learning, data science, medical imaging
Geospatial data refers to information that is tied to specific geographic locations on the Earth’s surface. It includes both the location coordinates (such as latitude, longitude, and, potentially, altitude) and attribute data associated with those locations. Geospatial data is categorized into two types: raster and vector.
Vector data represents geographic features as points, lines, and …
Supervisors:
Eleni Tzirita Zacharatou
Semester: Fall 2023
Tags: spatial data analysis, data science, data loading, GIS file formats, geospatial data
It is now common to query terabytes of spatial data. Several new frameworks extend distributed computing platforms such as Hadoop and Spark to enable them to efficiently process spatial queries by providing (1) mechanisms to efficiently store spatial data and index them; and (2) packages of built-in spatial operations for these platforms. Meanwhile, it is now common to accelerate Hadoop and Spark …
Supervisor: Iman Elghandour
Semester: Fall 2019
Spark assumes that it executes its applications on a homogeneous cluster of similar nodes. However, it is becoming common that in-house clusters have heterogeneous compute resources and it is good to exploit all of them in the most efficient way. The objective of this master thesis is to extend the Spark scheduler to be resource-aware and to efficiently schedule Spark tasks on all the …
Supervisor: Iman Elghandour
Semester: Fall 2019
Distributed computing platforms such as Hadoop and Spark focus on addressing the following challenges in large systems: (1) latency, (2) scalability, and (3) fault tolerance. Dedicating computing resources for each application executed by Spark can lead to a waste of resources. Unified distributed file systems such as Alluxio have provided a platform for computing results among simultaneously …
Supervisor: Iman Elghandour
Semester: Fall 2019
In the last few years, it has become common to accelerate Hadoop and Spark by enabling them to execute tasks and jobs on accelerators such as GPUs and FPGAs. The objective of this master thesis is to study new approaches that efficiently predict the execution time of Spark tasks and jobs executed on GPUs. Part of the work will be to build a performance prediction model for GPUs, which can be built …
Supervisor: Iman Elghandour
Semester: Fall 2019
Do you like open-source systems? Would you like to experience working with an open-source system? Do you want to learn about big data research in practice? Then, this project is for you!
We have a number of thesis/project topics under the umbrella of Apache Wayang. Wayang is the first cross-platform framework that allows users to specify their task/query in a system-agnostic manner and Wayang will …
Supervisors:
Jorge Quiané
Semester: Fall 2022
Tags: big data, database, cross-platform data processing, open source, Apache
Do you have the feeling that only a few players are controlling the AI game? Would you like to make AI technology accessible to everyone? Then, come and help us to make Agora a reality!
We have a number of thesis/project topics under the umbrella of the Agora project. This project aims at building a unified data infrastructure for supporting AI ecosystems that bring together data, algorithms, …
Supervisors:
Jorge Quiané
Semester: Fall 2022
Tags: big data, AI ecosystems, compliant data processing, federated analytics, data markets
The DISCO-2 project is driven by students and aims to develop and deploy a 3-unit CubeSat into low Earth orbit. Its mission focuses on conducting Earth observations over Greenland and supporting various research objectives. The satellite has three cameras onboard: infrared, wide-angle, and standard (main camera). Due to the limitations of the imaging hardware and the challenging conditions on the …
Supervisors:
Yucheng Lu, Julian Priest
Semester: Fall 2024
Tags: Image enhancement, Image processing, Machine learning
ITU is a partner of the Danish Student Cubesat Program, DISCOSAT. We launched our first satellite, DISCO-1, into Low Earth Orbit in April 2023 and we will launch a second, DISCO-2, in 2024. In this project you will gain experience with automating live satellite operations and communications, completing a ground station at the Rued Langaards Vej site for use with both satellites.
The DISCO satellite …
Supervisors:
Julian Priest
Semester: Fall 2023
Tags: satellite, ground station, software defined radio, automation, csp
The DISCO-2 satellite is an Earth observation satellite in collaboration with the Arctic Research Center in Aarhus and is designed to complement ground-based field studies in Greenland. The satellite instrument consists of two high-quality visible-light cameras and one infrared camera, as well as an attitude control system and a Coral TPU ML coprocessor.
In this project you will develop software to control …
Supervisors:
Julian Priest
Semester: Fall 2023
Tags: satellite, climate change, image processing, ML, csp, embedded, space
The DISCO-2 satellite will have accelerated machine learning capability based on the inclusion of a Coral TPU ML accelerator module. This will allow images taken by the satellite to be analysed on the satellite using a variety of ML models, with only select images sent back to Earth. This approach allows for more flexibility in image acquisition and saves downlink bandwidth, which is very constrained …
Supervisors:
Julian Priest, Robert Bayer
Semester: Fall 2023
Tags: satellite, ground station, software defined radio, automation, csp
The Danish Student Cubesat Program is an inter-university collaboration that will launch 3 cubesats into Low Earth Orbit over the next 4 years. The satellites will be designed, operated, programmed and built by students, and the project offers an opportunity for Master’s students to take part in a live satellite project. ITU is partnering with Aarhus University on DISCOSAT2 which will be an …
Supervisors:
Sebastian Büttrich, Julian Priest
Semester: Fall 2021
Tags: Satellite, Cubesat, Image processing, Machine Learning, edge, constrained computing
ITU is a partner in the Danish Student Cubesat Program, DISCO, which will launch a series of small satellites into orbit, starting with DISCO 1 in 2023 and followed by DISCO2 in 2024.
As part of this project ITU is installing a satellite ground station with a range of antenna rotators on the roof of the Rued Langaards Vej building, and the equipment has been purchased. The ground station will track the …
Supervisors:
Julian Priest
Semester: archive
Tags: Satellite, Image processing, Edge, Constrained Computing, Networks, Machine Learning, Embedded, Software Defined Radio
ITU is a partner in the Danish Student Cubesat Program DISCO, which will launch a series of small satellites into orbit, starting with DISCO 1 in 2023 and followed by DISCO2 in 2024.
ITU is developing a hi-res multi-camera imaging payload for Earth observation, primarily in the Arctic. We are developing an on-satellite machine learning capability using an ML coprocessor, as well as models that can …
Supervisors:
Julian Priest
Semester: archive
Tags: Satellite, Image processing, Edge, Constrained Computing, Networks, Machine Learning, Embedded, Radio
The idea behind “15-minute cities” is that within a short walk or bike ride people should have access to all necessary facilities that constitute the essence of urban living, such as parks, shops, cafes, schools, and hospitals. Initiatives to transform cities according to this paradigm are currently being implemented across the world, in an attempt to make urban spaces more liveable, …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: spatial data analysis, visualisation, Python, OSM data
As a response to increased traffic congestion and the need to reduce carbon emissions, cities consider ways to modernise, build and extend transit systems. Transit network design solutions can benefit from analysing the large amount of crowd-sourced location data available, which provides valuable insights into population mobility needs. Designing efficient metro lines, bicycle paths, or bus …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: spatial data analysis, network design, Python, OSM data
The idea behind “15-minute cities” is that within a short walk or bike ride people should have access to all necessary facilities that constitute the essence of urban living, such as parks, shops, cafes, schools, and hospitals. Initiatives to transform cities according to this paradigm are currently being implemented across the world, in an attempt to make urban spaces more liveable, …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: spatial data analysis, graph summaries, Python, OSM data
Musical genres are inherently ambiguous and difficult to define. Even more so is the task of establishing how genres relate to one another. Yet, genre is perhaps the most common and effective way of describing musical experience. The number of possible genre classifications (e.g. Spotify has over 4000 genre tags, LastFM over 500,000 tags) has made the idea of manually creating music taxonomies …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: scalable algorithms, hyperbolic embeddings, Python, Spotify data
The integration of wind power in the energy grid is dependent on accurate production forecasts. The power output curves between neighbouring wind farms are often correlated temporally and spatially, but currently, these spatiotemporal dependencies are under-utilised in prediction models. Graph neural networks allow for modelling these dependencies. In this project the student will implement a …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: spatial data analysis, graph neural networks, Python, timeseries data
Open-source JavaScript applications, such as browser-based web games, are typically developed by individual software engineers or small teams. These teams often have limited financial resources to use commercial logging frameworks and cloud-based analysis systems and may also lack knowledge and expertise in logging. However, log analysis is highly important for many reasons: monitoring application …
Supervisors:
Martin Hentschel
Semester: Fall 2024
Tags: open source, performance
The Deconstructed Cloud Databases project stems from a simple question: What are the minimum components required to build a data management system in the cloud? Our motivation for this project is based on the idea that reducing a system to its minimum set of components makes it easier to build, test, and maintain cloud data management systems. This approach requires less engineering effort, …
Supervisors:
Martin Hentschel
Semester: Fall 2024
Tags: data management, performance, benchmarking, hacking
The Deconstructed Cloud Databases project stems from a simple question: What are the minimum components required to build a data management system in the cloud? Our motivation for this project is based on the idea that reducing a system to its minimum set of components makes it easier to build, test, and maintain cloud data management systems. This approach requires less engineering effort, …
Supervisors:
Martin Hentschel
Semester: Fall 2024
Tags: data management, security, open source, open standards
The emergence of computational storage platforms like Delilah has transformed the data storage landscape, enabling new computing paradigms and facilitating data-intensive applications. Delilah is a cutting-edge computational storage platform developed by the IT University of Copenhagen. It runs on the Daisy OpenSSD and exposes an asynchronous computational storage protocol to the host, facilitated …
Supervisors:
Niclas Hedam, Philippe Bonnet
Semester: Fall 2023
Tags: Open Source, Testing, Computational Storage, Hardware, FPGA
Hash-based authentication is an effective way of protecting passwords in software systems. Hashing obscures the original passwords, such that they cannot be recovered in case of a database breach. However, as demonstrated by our paper titled Hash-Based Authentication Revisited in the Age of High-Performance Computers, the practical security depends on which hashing algorithm is used as well as the …
Supervisors:
Pınar Tözün, Niclas Hedam
Tags: benchmarking, hashing, security, GPU, hacking, HPC
With the recent hunger for being “data driven”, many organizations are eager to integrate ML into their decision-making processes. Unfortunately, competent data scientists are still relatively scarce, and manual model development cannot keep up with the demand for magic AI solutions. This is no less true when it comes to forecasting. Knowing the future is extremely handy when making …
Supervisors:
Niels Ørbæk Chemnitz
Semester: Spring 2021
Tags: AutoML, ML, Forecasting, Energy Data, Smart Meters, Python, Data Science, Time Series Data
How much do our smart meter readings disclose about us? Can we disentangle the oven from the washing machine from the kettle? Can we identify demographics and behavior patterns from the stream of electricity data?
Most Danish homes are now equipped with so-called “smart meters” - networked electricity meters that report consumption and load at a much higher rate than conventional meters. …
Supervisors:
Niels Ørbæk Chemnitz
Semester: Spring 2021
Tags: NILM, ML, IoT, Energy Data, Smart Meters, Python, Data Science, Time Series Data
Outlier detection is carried out when the information is stored at the server. However, with the new IoT computational capabilities, outlier detection can be developed locally. Therefore, it is necessary to know how much RAM/Flash is needed for this step and which IoT brands can handle it. This project is divided into two parts. The first is implementing light-heavy ML algorithms in single points …
Supervisor: Paul Rosero
Semester: Spring 2022
Tags: data analysis, IoT, Python, Embedded systems
TinyML is a new trend to deploy deep learning on tiny devices. Therefore, it is necessary to deploy several applications to understand the challenges and opportunities that tinyML brings us. In this scenario, any idea involving embedded computer vision, voice recognition, or sensors is welcome.
Supervisor: Paul Rosero
Semester: Spring 2022
Tags: data analysis, IoT, Python, Embedded systems, Computer vision, Voice recognition
The emergence of computational storage platforms like Delilah has transformed the data storage landscape, enabling new computing paradigms and facilitating data-intensive applications. Delilah is a cutting-edge computational storage platform developed by the IT University of Copenhagen. It runs on the Daisy OpenSSD and exposes an asynchronous computational storage protocol to the host, facilitated …
Supervisors:
Niclas Hedam, Philippe Bonnet
Semester: Fall 2023
Tags: Open Source, Testing, Computational Storage, Hardware, FPGA
Disaggregated storage has gained acceptance in data centers. With disaggregated storage, storage resources are decoupled from compute resources, and made available through fabric. We are particularly interested in storage resources composed of an ARM-based smartNIC, which acts as fabric target as well as storage controller for a collection of SSDs.
The performance characteristics of the storage …
Supervisors:
Philippe Bonnet
Semester: Fall 2021
Tags: benchmarking, ARM, SoC, fabric, SSD, computational storage
Reproducibility is a cornerstone of the scientific method. There are systems available today to build reproducible and sharable data and analysis pipelines including workflow engines (e.g., GWL, Nextflow), package managers (e.g., bioconda), and container systems (e.g., Singularity). However, validating their executions on high-performance computers remains an open issue. Indeed, there are many …
Supervisors:
Philippe Bonnet
Semester: Fall 2020
Tags: ML, reproducibility, workflow, HPC
Reproducibility is a cornerstone of the scientific method. It is also a core element of compliance requirements for sensitive equipment, e.g., audit trails for medical equipment. Often, a prerequisite for computational reproducibility is the availability of software and data. However, this is problematic for edge devices whose goal is to reduce the amount of data transferred to the backend. On …
Supervisors:
Philippe Bonnet
Semester: Fall 2020
Tags: reproducibility, edge
SSDs are not a uniform class of devices. The SSD landscape is quite diverse now, with many new-generation, much faster and more customizable devices available. Understanding their performance characteristics is crucial when determining what their impact on the data systems software stack should be. In this project, we would like to characterize the performance of a broad range of such SSDs (e.g., ZNS, Samsung Z-SSD, …
Supervisors:
Pınar Tözün, Philippe Bonnet
Semester: Fall 2020
Tags: SSD, benchmarking
The Things Network Stack v3 for LoRaWAN is an open source LoRaWAN network stack suitable for large, global and geo-distributed public and private networks as well as smaller networks. The architecture follows the LoRaWAN Network Reference Model for standards compliancy and interoperability. - https://github.com/TheThingsNetwork/lorawan-stack This stack, currently in pre-rollout testing, however …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Offloading processing to storage is a means to avoid data movement and thus deal efficiently with very large volumes of stored data. In the 90s, there were pioneering efforts to develop Processing-in-Memory as well as Active Disks. We are considering data stored on Open-Channel SSDs with a programmable storage controller (i.e., a Linux-based ARM processor) integrated into a network switch (e.g., …
Supervisor: Philippe Bonnet
Semester: Fall 2019
Characterize the performance of commercial database systems on an NVIDIA Titan GPU, or characterize the performance of DB2 PureScale on a cluster equipped with shared storage, with a range of different benchmarks. Design and conduct experiments with a range of tuning strategies to measure their impact on performance and reliability.
Supervisor: Philippe Bonnet
Semester: Fall 2019
In the context of the Orkney Cloud project, we are preparing the deployment of a decentralized cloud infrastructure on the archipelago. The infrastructure is composed of a collection of Pods (point of delivery) and a wireless core (5G + Wifi). Each Pod is equipped with storage, computing and communication components (so that it is connected to the core and to local endpoints). Each Pod is powered …
Supervisor: Philippe Bonnet
Semester: Fall 2019
New types of networks such as LoRaWAN and Sigfox enable us to deploy inexpensive mobile Air Quality Sensors, e.g. on bicycles or vehicles, boats or trains, and help map and understand urban pollution. Such data would be valuable for e.g. the international citizen science project Luftdaten https://luftdaten.info/en/home-en/ in which we are participating. We have an ongoing collaboration with the …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Can we combine satellite data with terrestrial and marine sensor data to study their correlation and benefit our understanding of environmental processes, in both urban and agricultural/rural contexts? Potential collaboration with several organizations in Kenya and the Orkney Islands.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Not far from the IT University, citizens are building a “Small Smart City”, aiming to measure and analyze e.g. water and resource consumption. Help build the Small Smart City!
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Survey of potential sensor modalities (sounds, ultrasounds, vibrations) and related work (e.g., wind turbines). Starting with sound: piezo contact mics/transducers, MEMS sensors. Characterization of state based on known signatures (classification problem). Characterization of state transitions (HMM). Experimentation on coffee machine/blender/3D printer at PitLab. 1..k sensors; local/cloud-based …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Novo Nordisk
New forms of Solid State Drives have interesting characteristics in terms of performance (10 to 100x faster than previous generations of SSDs) and in terms of functionalities (SSDs can now suspend the execution of write or erase operations to minimize read latency). The performance characteristics of these devices are not well understood yet. The topic of this thesis is to design and conduct …
Supervisor: Philippe Bonnet
Semester: Fall 2019
Survey of potential sensor modalities (IR temp sensor, thermal imager) and related work. Starting with consumer USB cams (generating series of images) or phone cams. Characterization of state based on known signatures (classification problem). Characterization of state transitions (HMM). Experimentation on coffee machine/blender/3D printer at PitLab. 1..k sensors; local/cloud-based processing. …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Novo Nordisk
Most data collection in IoT does not critically depend on latency or speed from data collection to data analytics. Occasionally though we meet tasks that would benefit from near-realtime features, such as collection of wave and tidal dynamics around marine energy infrastructures. This project explores the limits of speed by bringing together a LoRa PHY, a LoRaWAN gateway, LoRaWAN stack, ultrafast …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
In collaboration with the IoT Lab at Computer Science Dept at Kathmandu University, Nepal, we are developing a potential service for tracking trekkers, i.e. offering a security service for tourists trekking the Himalayas, in particular Mt. Everest. This service very critically depends on having a robust hardware component, the actual GPS/GNSS tracker. Requirements with respect to battery life, …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Wind turbine electricity production data is sensitive for Energinet (and for the wind turbine producers). Energinet would like to publish wind turbine electricity production data sets that can be used to train relevant models and to develop innovative applications, without giving away sensitive data. The goal of the project is to explore various data publishing methods for that purpose.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, data publication
Energinet has a model that describes the electricity production of a given wind turbine given wind conditions. The current model based on kNN is trained with DMI weather data and historical electricity production data for the wind turbine. The goal of the project is to improve the current model with lifelong learning, extended weather data and different models for a range of different wind …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Data Analysis
The goal of the project is to explore the accuracy of electricity production predictions based on historical data and weather predictions. This may be tackled as a sequence prediction problem using recurrent neural networks. The long-term goal is to incorporate wind turbines in the reserve market for electricity.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Forecasting, Machine Learning, Deep Learning
The goal of the project is to explore new ways of gathering data about wind turbines as well as local wind/weather conditions. To this end, sound/vibration-based and/or image-based instrumentation as well as innovative experiments such as balloons and lightweight weather stations might be considered.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Instrumentation, Sensors
Field Programmable Gate Arrays are now an integral part of public cloud infrastructures. You can for example run customized FPGA instances on AWS. A project focuses on FPGA-based hardware acceleration at the level of an SSD Flash Translation Layer, at the level of a Database storage manager or at the level of the database client. You will be able to experiment with FPGAs in the lab and on AWS. We …
Supervisor: Philippe Bonnet
Semester: Fall 2020
Tags: FPGA, SSD, computational storage
Field Programmable Gate Arrays are now an integral part of public cloud infrastructures. You can for example run customized FPGA instances on AWS. A project focuses on FPGA-based hardware acceleration at the level of an SSD Flash Translation Layer, at the level of a Database storage manager or at the level of the database client. You will be able to experiment with FPGAs in the lab and on AWS. We …
Supervisor: Philippe Bonnet
Semester: Fall 2019
Tags: FPGA, SSD
GPUs offer massive computational power and parallelism through their Streaming Multiprocessors (SMs). Efficient GPU utilization is critical for maximizing performance and optimizing compute resource usage; it is measured using various metrics such as SMACT (SM Activity), SMOCC (SM Occupancy), and DRAMA (DRAM Active). These metrics provide insight into how effectively the GPU’s SMs and …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Spring 2024
Tags: machine learning systems, GPU Utilization, resource management, resource interference
The work on running data-intensive applications on very powerful, expensive, and power-hungry server hardware is very popular thanks to the growing size of data centers and high-performance computing (HPC) platforms. However, with the rise of new generation internet of things (IoT) applications, the lower-power and lower-budget hardware devices that specifically target IoT, the edge platforms, …
Supervisors:
Pınar Tözün
Semester: Fall 2024
Tags: edge, benchmarking, data-intensive applications, resource-constrained hardware
Observing how well machine learning systems utilize hardware resources is a crucial preliminary step to improve system performance and reduce hardware waste. To do such observations, one has to collect a lot of monitoring data on hardware behavior through experiments. In our group, we have recently built a framework to aid the management of such monitoring data efficiently, called Resource-Aware …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: benchmarking, data management, data visualization
Deep learning changed the landscape of many applications like computer vision, natural language processing, etc. On the other hand, deep learning requires gigantic computing power offered by modern hardware. As a result, data scientists rely on powerful hardware resources offered by shared high-performance computing (HPC) clusters or the cloud. Due to the long running times of deep learning …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2024
Tags: machine learning systems, checkpointing, scheduling, resource management
Workload collocation has been shown to be an effective method to reduce the hardware requirements for certain deep learning (DL) training tasks. On the other hand, there haven’t been many robust open-source implementations of schedulers that incorporate workload collocation on GPUs for DL.
BLOX is a framework that aims at standardizing the way we implement deep learning schedulers. In this …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2024
Tags: machine learning systems, scheduling, resource management, workload collocation
Deep convolutional networks are able to learn representations of images, scoring well in tasks such as image classification and object detection. During model training, these networks have the ability to process different input sizes without requiring changes to their architecture. In this project, we would like to investigate the effects that changing input sizes has on these kinds of models. We …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: data attribution, deep learning, machine learning, resource efficiency
Today’s foundation models are trained on vast amounts of data. The quality and size of this data have a huge impact on the accuracy of these models. Selecting the right amount and variety of data for a given task, however, is a resource-intensive process. In this project, we would like to investigate various state-of-the-art data selection mechanisms from a hardware requirements and …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: data selection, deep learning, machine learning, resource efficiency
Traditionally, solid-state drives (SSDs) do not give users the ability to control data placement on the SSD. This often leads to suboptimal performance and lowers SSD lifetime, since SSDs internally don’t allow in-place updates. The updated disk pages are written elsewhere and the old versions have to be garbage collected. This poses problems if data with different lifetimes and …
Supervisors:
Pınar Tözün
Semester: Fall 2024
Tags: SSDs, data management systems, modern storage
In this project, we would specifically like to quantify the data movement savings of applying techniques like compression and model-based data filtering in the context of resource-constrained hardware and edge/IoT applications.
Today, many data sources are small, low-powered and hardware-constrained devices such as mobile phones, wearable or self-driving smart platforms, etc. Processing the data on …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, ML model updates, tinyML
One of the key challenges with enabling efficient machine learning on resource-constrained devices is keeping the machine learning models deployed on these devices up-to-date without frequent retraining. This requires exploring the impact of different model update mechanisms at the edge.
This project would be suitable as a standalone project or BSc or MSc thesis at ITU during Fall 2024. If you are …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, ML model updates, tinyML
Enabling efficient data processing and machine learning on resource-constrained devices poses many challenges. One is fitting the models into the restrictive memory and compute resources of these devices. In this project, first, we would like to explore the landscape of foundational, generative-AI, language, etc. models with respect to their size and compute needs to understand what could be a fit …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, ML model updates, tinyML
Today, many data sources are small, low-powered and hardware-constrained devices such as mobile phones, wearable or self-driving smart platforms, etc. Edge computing is a broad term that refers to computations performed on such edge devices. It becomes increasingly important to enable techniques that get more value out of data at the edge rather than always sending the data to a remote and more …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, resource management, tinyML
Observing how well data-intensive systems utilize hardware resources is a crucial preliminary step to improve system performance and reduce hardware waste. To do such observations, one has to collect a lot of monitoring data on hardware behavior through experiments. In our group, we have recently built a framework to aid the management of such monitoring data efficiently, called Resource-Aware …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2023
Tags: benchmarking, data management, data visualization
It is common to process data to clean it, filter it, restructure it, get metadata out of it, etc. before feeding the data into a data analysis or machine learning pipeline. There are many tools and libraries out there to aid with this process with different strengths and functionality (DALI, RAPIDS, HoloClean, DAPHNE, DuckDB, etc.). In this project, we would like to analyze pros/cons of some of …
Supervisors:
Pınar Tözün
Semester: Fall 2022
Tags: data preprocessing libraries, heterogeneous hardware, machine learning
In the past decade, the data management community has focused on main-memory systems or main-memory-optimized systems. This focus has put the commodity memory hierarchy (DRAM and processor caches) at the center of workload characterization studies. Today, with the evolution of persistent storage technologies such as NVRAM (Intel's persistent memory solution) and NVMe SSDs, data systems …
Supervisors:
Pınar Tözün
Semester: Fall 2022
Tags: workload characterization, tracing, modern storage, data-intensive systems
DAPHNE is an EU project that aims at building a data system targeting integrated data analysis pipelines across data management and processing, high-performance computing (HPC), and machine learning (ML) training and scoring. The project had its first code release back in March. This project aims at adding a profiling infrastructure for the DAPHNE codebase. If you are interested in learning about …
Supervisors:
Pınar Tözün
Semester: Fall 2022
Tags: integrated data analysis pipelines, profiling big data systems
State-of-the-art machine learning models are known to be compute- and power-hungry. On the other hand, modern servers come equipped with really powerful CPU-GPU co-processors. Not all machine learning models are able to use all the available hardware resources on such servers.
Workload collocation is a mechanism to increase hardware utilization when a single workload is not able to utilize all the …
Supervisors:
Pınar Tözün
Semester: Fall 2022
Tags: benchmarking, workload collocation, machine learning
Today, there are many compute- and memory-hungry data-intensive workloads from big data analytics applications to deep learning. These workloads increasingly run on shared hardware resources, which requires building hardware resource managers that can both serve the needs of workloads and utilize hardware well. Predicting the resource utilization of applications can aid such resource managers …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2022
Tags: benchmarking, hardware resource consumption estimation, machine learning
NVMe SSDs are not a uniform class of devices. The IO software stack is not uniform either. Understanding the performance characteristics of new-generation SSDs and the impact of the IO stack on their performance is crucial when determining how to design data-intensive systems. In this project, we would like to characterize the performance of a range of NVMe SSDs (e.g., Samsung Z-SSD, Intel Optane, …
Supervisors:
Pınar Tözün
Semester: Fall 2021
Tags: SSD, benchmarking
A data science infrastructure orchestrates the execution of widely used machine learning frameworks (e.g., TensorFlow, PyTorch) on a heterogeneous set of processing units (e.g., CPU, GPU, TPU, FPGA) while powering an increasingly diverse and complex range of applications (e.g., fraud detection, healthcare, virtual assistance, automatic driving). Understanding the resource consumption …
Supervisor: Pınar Tözün
Semester: Fall 2021
Tags: benchmarking, hardware resource consumption, deep learning frameworks
Hash-based authentication is an effective way of protecting passwords in software systems. Hashing obscures the original passwords, such that they cannot be recovered in case of a database breach. However, as demonstrated by our paper titled Hash-Based Authentication Revisited in the Age of High-Performance Computers, the practical security depends on which hashing algorithm is used as well as the …
Supervisors:
Pınar Tözün, Niclas Hedam
Tags: benchmarking, hashing, security, GPU, hacking, HPC
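As a rough illustration of the hash-based authentication idea in the entry above, the following minimal sketch stores a salted, iterated hash instead of the password and verifies a login attempt against it. PBKDF2 with SHA-256, the iteration count, and the function names `hash_password` and `verify_password` are assumptions chosen for the example; they are not taken from the paper or the project.

```python
import hashlib
import hmac
import os
from typing import Tuple


def hash_password(password: str, iterations: int = 100_000) -> Tuple[bytes, bytes]:
    """Return (salt, digest); only these are stored, never the password."""
    salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes,
                    iterations: int = 100_000) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, expected)
```

The iteration count is the knob that trades login latency against the cost of a brute-force attack, which is the kind of dependence on algorithm choice the entry refers to.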
The variety and complexity of data-intensive applications and systems have been increasing drastically over the past decade. Tasks from a SQL-based big data analytics request running on Apache Spark can be very different from tasks from deep learning training using the TensorFlow framework. Nevertheless, these data-intensive applications increasingly run on shared hardware resources in data centers or …
Supervisor: Pınar Tözün
Semester: Fall 2020
Tags: hardware-software co-design
Improvements in modern processor hardware do not automatically enable more complex and higher quality games, animations, and virtual reality applications. A paradigm shift is necessary when it comes to how we develop these applications in order to exploit the resources of modern hardware (i.e., main memory, multicores) effectively. Unity Technologies has recently developed Data-Oriented …
Supervisor: Pınar Tözün
Semester: Fall 2020
Tags: memory hierarchy, concurrency
SSDs are not a uniform class of devices. The SSD landscape is now quite diverse, with many new-generation devices available that are much faster or customizable. Understanding their performance characteristics is crucial when determining what their impact on the data systems software stack should be. In this project, we would like to characterize the performance of a broad range of such SSDs (e.g., ZNS, Samsung Z-SSD, …
Supervisors:
Pınar Tözün, Philippe Bonnet
Semester: Fall 2020
Tags: SSD, benchmarking
Spreading the computation of similar concurrent tasks that have a large instruction footprint over multiple cores via thread migration is shown to improve the instruction cache utilization drastically since it allows instruction re-use across the concurrent tasks. However, thread migrations are costly due to the context switching overhead. To reduce this overhead, recent work mainly proposed …
Supervisor: Pınar Tözün
Semester: Fall 2019
The computer architecture community is moving toward commoditization of hardware specialization instead of general purpose CPUs and more agile hardware development instead of years-long production cycles to enable faster, more energy-efficient, and more cost-effective hardware/software co-designs. This will lead to a disruption in the way we design and maintain the emerging data management systems …
Supervisor: Pınar Tözün
Semester: Fall 2019
Apache SystemML is an open-source platform to run machine learning tasks efficiently thanks to the hardware-conscious query compilation techniques it adopts. It can be run standalone or on top of Apache Spark. It is considered to be state-of-the-art when running machine learning tasks (i.e., in ACM SIGMOD 2017, there were ~5 papers that used SystemML as a comparison point). This project aims at …
Supervisor: Pınar Tözün
Semester: Fall 2019
The popularity of large-scale real-time analytics applications (real-time inventory/pricing, recommendations from mobile apps, fraud detection, risk analysis, IoT, etc.) keeps rising. These applications require distributed data management systems that can handle fast concurrent transactions (OLTP) and analytics on the recent data. Some of them even need running analytical queries (OLAP) as part of …
Supervisor: Pınar Tözün
Semester: Fall 2019
The Transaction Processing Performance Council (TPC) is a non-profit IT organization founded to define database benchmarks and disseminate objective, verifiable performance data to the industry. TPC has standardized several new benchmarks (e.g., TPCx-HS and TPCx-BB), in recent years. Older popular benchmarks, like TPC-C (representing high-performance transaction processing) and TPC-H (representing …
Supervisor: Pınar Tözün
Semester: Fall 2019
The DISCO-2 satellite will have accelerated machine learning capability based on the inclusion of a Coral TPU ML accelerator module. This will allow images taken by the satellite to be analysed on the satellite using a variety of ML models, with only select images sent back to Earth. This approach allows for more flexibility in image acquisition and saves downlink bandwidth which is very constrained …
Supervisors:
Julian Priest, Robert Bayer
Semester: Fall 2023
Tags: satellite, ground station, software defined radio, automation, csp
This is not a single project, but rather a larger cluster of potential projects in the field of what could be summarized as extreme networking.
The networks we are interested in are typically wireless, and can be extreme in different senses of the word:
distance - hundreds of kilometers terrestrial, 10,000s of km to satellite
latency - sub-ms latencies
autonomy - off-grid
quality - extreme remote …
Supervisors:
Sebastian Büttrich
Semester: Fall 2024
Tags: network, IoT, LoRa, LoRaWAN, satellites
LoRa is a long-range, low-bandwidth networking protocol widely used in Internet of Things projects, sensor networks, low power, low cost and embedded systems. LoRa’s encoding schema allows for extremely long distance communications with small power usage and small simple antennas. This combination of features has made it attractive to small satellite operators flying cubesats and LoRa is now …
Supervisors:
Sebastian Büttrich
Semester: Fall 2024
Tags: satellites, LoRa, cubesat, IoT, embedded, electronics
LoRa is a long-range, low-bandwidth networking protocol widely used in Internet of Things projects, sensor networks, low power, low cost and embedded systems. LoRa’s encoding schema allows for extremely long distance communications with small power usage and small simple antennas. This combination of features has made it attractive to small satellite operators flying cubesats and LoRa is now …
Supervisors:
Sebastian Büttrich
Semester: Fall 2024
Tags: IoT, LoRa, LoRaWAN, satellites
Optical fiber is the backbone of the internet’s communication, e.g. in the form of submarine fiber cables. It can also be employed as a sensor device, by means of combined opto-acoustic methods such as Distributed acoustic sensing (DAS) or State of Polarisation (SoP) sensing. Fiber is capable of sensing all kinds of vibrational/acoustic events, from animal sounds over seismic activity to …
Supervisors:
Sebastian Büttrich
Semester: Fall 2024
Tags: fiber, acoustics, audio, machine learning, DAS, SOP
The Danish Student Cubesat Program is an inter-university collaboration that will launch 3 cubesats into Low Earth Orbit over the next 4 years. The satellites will be designed, operated, programmed and built by students and the project offers an opportunity for Master’s students to take part in a live satellite project. ITU is partnering with Aarhus University on DISCOSAT2 which will be an …
Supervisors:
Sebastian Büttrich, Julian Priest
Semester: Fall 2021
Tags: Satellite, Cubesat, Image processing, Machine Learning, edge, constrained computing
Invasive bird species can be a serious problem in cities, towns and in agriculture. The common pigeon is a very unwelcome guest on many balconies, roofs, terraces. Conventional scarecrows often show no effect, as these birds are known to be quite intelligent, and capable of learning fast. The idea is to build a sensor/camera enhanced scarecrow that - can recognize birds present within its …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: IoT, ML, machineLearning, sensors, security
For this project, you would be working with a partner company who are looking to re-establish wood as a building material for sustainable architecture, and thus are using sensors for quality control - to detect damages and deterioration in buildings. Wood such as timber may be analyzed by non-intrusive acoustic impact testing and subsequent waveform analysis, and the expectation is that machine …
Supervisors:
Sebastian Büttrich
Semester: Fall 2021
Tags: IoT, sensors, machine learning, acoustics
In LoRaWAN networks such as The Things Network, long distance transmissions, well beyond the limitations of line of sight in terrestrial geometry, are frequently observed. Tropospheric effects are seen as responsible for bending or guiding radio waves around the Earth's curvature. As an example, under the right weather conditions, the LoRaWAN gateway at ITU may collect packets from northern Germany, …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: IoT, LoRaWAN, LPWAN, satellite, networks, troposphere, weather
Recent progress in LoRaWAN development has made a new generation of satellite communications offerings available to IoT devices. In these, the LoRaWAN gateway is satellite-borne and collects data from small inexpensive ground stations. So far, this is predominantly seen as a means of communication for remote sensor data, e.g. in agriculture, logistics or wildlife monitoring. However, one can …
Supervisors:
Sebastian Büttrich
Semester: Fall 2021
Tags: IoT, LoRaWAN, LPWAN, satellite, networks, edge, security
There is currently a lot of progress in really small, yet powerful visual machine learning / computer vision, on hardware like the OpenMV Cam H7, Arduino Portenta Vision Shield, Luxonis LUX-ESP32, Himax WE-I Plus, Arducam Pico4ML, and Raspberry Pi, and on software platforms such as TinyML or OpenMV IDE.
While many popular use cases stem from fields like traffic analysis, wildlife monitoring, we …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: IoT, sensors, machine learning, computer vision
The Things Network Stack v3 for LoRaWAN is an open source LoRaWAN network stack suitable for large, global and geo-distributed public and private networks as well as smaller networks. The architecture follows the LoRaWAN Network Reference Model for standards compliance and interoperability. - https://github.com/TheThingsNetwork/lorawan-stack This stack, currently in pre-rollout testing, however …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
New types of networks such as LoRaWAN and Sigfox enable us to deploy inexpensive mobile Air Quality Sensors, e.g. on bicycles or vehicles, boats or trains, and help map and understand urban pollution. Such data would be valuable for e.g. the international citizen science project Luftdaten https://luftdaten.info/en/home-en/ in which we are participating. We have an ongoing collaboration with the …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Can we combine satellite data with terrestrial and marine sensor data, to study their correlation, and benefit our understanding of environmental processes, both in urban and agricultural/rural context? Potential collaboration with several organizations in Kenya and Orkney islands.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Not far from the IT University, citizens are building a “Small Smart City”, aiming to measure and analyze e.g. water and resource consumption. Help build the Small Smart City!
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Survey of potential sensor modalities (sounds, ultrasounds, vibrations) and related work (e.g., wind turbines)
Starting with sound: Piezo contact mics/transducers, MEMS sensors
Characterization of state based on known signatures (classification problem)
Characterization of state transitions (HMM)
Experimentation on Coffee machine/Blender/3D printer at PitLab
1..k sensors; Local/cloud-based …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Novo Nordisk
Survey of potential sensor modalities (IR temp sensor, thermal imager) and related work
Starting with consumer USB cams generate series of images or phone cams
Characterization of state based on known signatures (classification problem)
Characterization of state transitions (HMM)
Experimentation on Coffee machine/Blender/3D printer at PitLab
1..k sensors; Local/cloud-based processing. …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Novo Nordisk
Most data collection in IoT does not critically depend on latency or speed from data collection to data analytics. Occasionally though we meet tasks that would benefit from near-realtime features, such as collection of wave and tidal dynamics around marine energy infrastructures. This project explores the limits of speed by bringing together a LoRa PHY, a LoRaWAN gateway, LoRaWAN stack, ultrafast …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
In collaboration with the IoT Lab at Computer Science Dept at Kathmandu University, Nepal, we are developing a potential service for tracking trekkers, i.e. offering a security service for tourists trekking the Himalayas, in particular Mt. Everest. This service very critically depends on having a robust hardware component, the actual GPS/GNSS tracker. Requirements with respect to battery life, …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Wind turbine electricity production data is sensitive for Energinet (and for the wind turbine producers). Energinet would like to publish wind turbine electricity production data sets that can be used to train relevant models and to develop innovative applications, without giving away sensitive data. The goal of the project is to explore various data publishing methods for that purpose.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, data publication
Energinet has a model that describes the electricity production of a given wind turbine given wind conditions. The current model based on kNN is trained with DMI weather data and historical electricity production data for the wind turbine. The goal of the project is to improve the current model with lifelong learning, extended weather data and different models for a range of different wind …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Data Analysis
The goal of the project is to explore the accuracy of electricity production predictions based on historical data and weather predictions. This may be tackled as a sequence prediction problem using recurrent neural networks. The long-term goal is to incorporate wind turbines in the reserve market for electricity.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Forecasting, Machine Learning, Deep Learning
The goal of the project is to explore new ways of gathering data about wind turbines as well as local wind/weather conditions. To this end, sound/vibration-based and/or image-based instrumentation as well as innovative experiments such as balloons and light weight weather stations might be considered.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Instrumentation, Sensors
Deliberately scoped very wide, this group contains a number of projects in different possible directions, from
Location services via LPWAN time-of-flight and GPS/GNSS,
Vessel tracking and management in fisheries, tourism and logistics,
Water quality and chemistry sensing for aquaculture, specifically mariculture,
Wave and tidal dynamics, e.g. in energy harvesting
and variations/combinations of …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: Satellite, Image processing, Machine Learning, edge, constrained computing, IoT, sensors, location
swarm.space is a commercial company providing low-bandwidth satellite connectivity using ultra-small (quarter cubesat unit) satellites in a low orbit. Swarm satellites cover every point on Earth, enabling IoT devices to affordably operate in any location. Swarm uses a form of LoRa network.
While ground terminals for satellite networks traditionally were both big and expensive, modems and antennas …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: Satellite, IoT
LoRa is a relatively new long-range low-power standard allowing for battery-powered pocket-sized nodes to transmit over 100s to 1000s of kilometers. ThingSat is a CubeSat communication payload consisting of an electronic board of several LoRa transmitters and a patch antenna operating in the 868 MHz and 2.4 GHz bands. It is a guest payload of a shared 3U CubeSat.
Available projects under this platform include …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: Satellite, IoT
Observing how well machine learning systems utilize hardware resources is a crucial preliminary step to improve system performance and reduce hardware waste. To do such observations, one has to collect a lot of monitoring data on hardware behavior through experiments. In our group, we have recently built a framework to aid the management of such monitoring data efficiently, called Resource-Aware …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: benchmarking, data management, data visualization
Deep convolutional networks are able to learn representations of images, scoring well in tasks such as image classification and object detection. During model training, these networks have the ability to process different input sizes without requiring changes to their architecture. In this project, we would like to investigate the effects that changing input sizes has on these kinds of models. We …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: data attribution, deep learning, machine learning, resource efficiency
Today’s foundation models are trained on vast amounts of data. The quality and size of this data has a huge impact on the accuracy of these models. Selecting the right amount and variety of data for a given task, however, is a resource-intensive process. In this project, we would like to investigate various state-of-the-art data selection mechanisms from a hardware requirements and …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: data selection, deep learning, machine learning, resource efficiency
Spectral learning priority is a useful tool in analyzing a model’s focus during training: it describes how a model may understand a given image from the spectrum perspective. For example, to distinguish cats and tortoises, learning to recognize their shapes would be enough; such an embedding will result in higher learning priority at low frequencies representing shapes, while learning to …
Supervisors:
Yucheng Lu, Veronika Cheplygina
Semester: Fall 2024
Tags: Spectral analysis, Image classification, Medical imaging
There is pressure on hospitals to implement AI systems which promise to improve diagnoses and save time for the doctors. One use-case could be related to the automation of protocoling based on a physician referral. Currently, this requires a referral letter from a physician who has examined a patient and evaluates that there is a need for additional imaging studies. In this case, the physician …
Supervisors:
Veronika Cheplygina
Semester: Fall 2023
Tags: machine learning, medical imaging, data analysis
Machine learning models, especially larger models that are used on, for example, image or text datasets, can be expensive to train. During development, models are usually trained multiple times, for example to optimize hyperparameters, which can result in a large carbon footprint.
This project focuses specifically on medical data. There are some recent efforts, for example by Selvan et …
Supervisors:
Veronika Cheplygina
Semester: Fall 2023
Tags: machine learning, medical imaging, data analysis, resource consumption
Machine learning is used extensively in different applications, including medical imaging and natural language processing. As different types of data are involved, it is reasonable to assume that different methods are needed for each application. However, there are also opportunities in translating a method successful in one application, to the other application where it is not widely used.
The …
Supervisors:
Veronika Cheplygina
Semester: Fall 2021
Tags: machine learning, natural language processing, medical imaging, literature review
Machine learning algorithms for skin lesion classification typically learn from images which have been labeled as malignant (for example, melanoma) or not. Such tasks can still suffer from overfitting due to limited dataset size. In other computer vision tasks, crowdsourcing labels has been effective, but the average person typically does not have the background to classify skin lesions. However, …
Supervisors:
Veronika Cheplygina
Semester: Fall 2021
Tags: machine learning, medical imaging, crowdsourcing, similarity
The DISCO-2 project is driven by students and aims to develop and deploy a 3-unit CubeSat into low Earth orbit. Its mission focuses on conducting Earth observations over Greenland and supporting various research objectives. The satellite has three cameras onboard: infrared, wide-angle, and standard (main camera). Due to the limitations of the imaging hardware and the challenging conditions on the …
Supervisors:
Yucheng Lu, Julian Priest
Semester: Fall 2024
Tags: Image enhancement, Image processing, Machine learning
Are you interested in working with a big data open source project?
You are welcome to conduct your thesis/project in Apache Wayang. Apache Wayang is the first cross-platform framework that allows users to specify their task/query in a system-agnostic manner and Wayang will determine which is the best system(s) to execute this task with the goal of optimizing performance. For a general overview …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: big data, database, cross-platform data processing, open source, Apache
Knowledge graphs (KGs) are extensively used in many application domains, such as search engines, product recommendation, and bioinformatics. Knowledge graph completion (a.k.a. link prediction), i.e., the task of inferring missing information from knowledge graphs, is a widely used task in the above applications. This project will investigate how to loosely couple the data-driven power of knowledge …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: knowledge graph, LLMs, reasoning
Are you interested in working with a big data open source project and helping the environment?
You are welcome to conduct your thesis/project in Apache Wayang. Apache Wayang is the first cross-platform framework that allows users to specify their task/query in a system-agnostic manner and Wayang will determine which is the best system(s) to execute this task with the goal of optimizing performance. …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: big data, database, cross-platform data processing, open source, Apache
Query optimization is crucial for any data management system to achieve good performance. Recent advancements in AI have led academia and industry to investigate learning-based techniques in query optimization. In particular, many works propose replacing the cost model used during plan enumeration with a machine learning model that estimates the runtime of a plan. However, to build such a model …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: machine learning, training data, query optimizer
Query optimization is crucial for any data management system to achieve good performance. Recent advancements in AI have led academia and industry to investigate learning-based techniques in query optimization. In particular, many works propose replacing the cost model used during plan enumeration with a machine learning model (typically a regression model) that estimates the runtime of a query …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: machine learning, database, query optimization, ranking
(This project will be carried out in collaboration with Xilinx Research Labs in Dublin)
Machine Learning operators are becoming increasingly commonly used in data management systems and, in this project, we will explore the challenges and benefits of integrating inference operators from FINN [1] within a so-called Smart Storage system [2]. Both the inference and data management aspects will be …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: FPGA, Data Management, Machine Learning
(This topic is going to be co-supervised by Bernardo Machado David [http://www.bmdavid.com/])
Database systems managing private data may leak sensitive information when queries are done in the clear, even if the data itself is encrypted. A recent line of research has looked into combining database engines supporting standard SQL queries with techniques for secure Multiparty Computation (MPC), …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: Theoretical Computer Science, Data Management, Security and Privacy
Consensus mechanisms for ensuring consistency are some of the most expensive operations in managing large amounts of data. Often, there is a trade-off that involves reducing the coordination overhead at the price of accepting possible data loss or inconsistencies. As the demand for more efficient data centers increases, it is important to provide better ways of ensuring consistency without …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: Benchmarking, Distributed Systems
Given a private database that I can access only through specific queries, there is still a lot I can learn about its entries [1]. Differential Privacy (DP) tackles this: letting me learn the (approximate) result of complex queries on a database, but preventing me from learning much about its specific entries. The basic approach of DP often boils down to: “apply a privacy-preserving transformation T …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: Theoretical Computer Science, Data Management, Security and Privacy
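As an illustration of what such a privacy-preserving transformation can look like, the following is a minimal sketch of the standard Laplace mechanism applied to a counting query. It is one textbook instance of DP and not necessarily the transformation this project studies; the function names and the toy data are assumptions made for the example.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(rows, predicate, epsilon: float, rng: random.Random) -> float:
    """Noisy count of the rows that satisfy predicate.

    Adding or removing one row changes the true count by at most 1
    (sensitivity 1), so Laplace noise with scale 1/epsilon gives
    epsilon-differential privacy for this single query.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for row in rows if predicate(row))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Toy database: the analyst sees only the noisy answer, never the rows.
rng = random.Random(0)
rows = [{"age": a} for a in range(100)]
answer = private_count(rows, lambda r: r["age"] >= 50, epsilon=1.0, rng=rng)
```

A smaller epsilon means more noise and stronger privacy; answering many queries consumes privacy budget, which this sketch does not track.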
Blockchains are often used synonymously with crypto-currencies and unspent transaction output (UTXO) data models, but there are emerging blockchain platforms that offer a more general data model and smart contracts that can manipulate this data freely (e.g. Hyperledger Fabric [1]). As such, these platforms resemble in many ways distributed databases, storing a collection of records, organized as …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: Blockchain, Data Management, Benchmarking
Modern data analytics systems are composed of two types of nodes: compute and storage (e.g., Amazon S3, Redis, MongoDB, etc.). The storage nodes typically offer a key-value interface and are often used to store data encoded in a columnar format (e.g., Parquet files). Due to growing data sizes in datacenters, there is an increasing interest in using specialized hardware devices, namely Field …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: FPGA, Hardware-software Co-design, Security and Privacy
We are actively developing a new prototype for analysing large multimedia collections in virtual reality, based on the ObjectCube data model. There are many ways in which students can contribute to the project, including work on the user interface and the back-end, and later on running large-scale user experiments.
Supervisors:
Aaron Duane, Björn Þór Jónsson
Semester: Fall 2021
Tags: virtual reality, multimedia analytics
It has been observed that deep learning models are able to identify patient characteristics such as age, sex, and self-reported race with high accuracy from medical images such as chest x-ray recordings, even when medical doctors cannot. This raises the potential for such models to learn to (falsely) diagnose patients of different demographics differently, even if they present with the same …
Supervisors:
Amelia Jiménez-Sánchez, Eike Petersen, Veronika Cheplygina
Semester: Fall 2024
Tags: machine learning, data science, medical imaging
A medical Visual Question Answering (VQA) system can provide meaningful references for both doctors and patients during the treatment process. Unlike with normal images, a learning setting with medical images is more challenging due to limited amounts of data, class imbalance, and the presence of label noise for diagnosis tasks. Moreover, little attention is paid to how the images and meta-data is …
Supervisors:
Amelia Jiménez-Sánchez
Semester: Fall 2023
Tags: medical imaging, deep learning, machine learning, transfer learning, meta-learning
In relevance feedback, the choice of images to present to the user is a difficult problem, as a naïve approach may present too many similar images. The challenge addressed in this project is to ensure diversity (aka “one of each”) as well as relevance. A particularly interesting project for students interested in efficient algorithms.
Supervisors:
Björn Þór Jónsson
Semester: Fall 2021
Tags: multimedia analytics, scalability, diversity
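One well-known way to trade relevance against diversity is greedy Maximal Marginal Relevance (MMR). The sketch below is only an illustration of that general idea, not the method proposed in this project; the feature vectors, relevance scores, and the trade_off parameter are assumptions made for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_diverse(vectors, relevance, k, trade_off=0.5):
    """Greedily pick k item ids, balancing relevance against redundancy.

    vectors:   dict mapping item id -> feature vector
    relevance: dict mapping item id -> relevance score
    trade_off: 1.0 means pure relevance, 0.0 means pure diversity
    """
    selected = []
    remaining = set(vectors)
    while remaining and len(selected) < k:

        def score(item):
            # Redundancy is the similarity to the closest already-chosen item.
            redundancy = max(
                (cosine(vectors[item], vectors[s]) for s in selected), default=0.0
            )
            return trade_off * relevance[item] - (1.0 - trade_off) * redundancy

        best = max(sorted(remaining), key=score)  # sorted() makes ties deterministic
        selected.append(best)
        remaining.remove(best)
    return selected


# "b" is a near-duplicate of "a"; "c" is less relevant but different.
vectors = {"a": [1.0, 0.0], "b": [0.99, 0.1], "c": [0.0, 1.0]}
relevance = {"a": 0.9, "b": 0.85, "c": 0.6}
picked = select_diverse(vectors, relevance, k=2)
```

With the default trade-off the near-duplicate "b" is skipped in favour of "c"; with trade_off=1.0 the selection falls back to the two most relevant items. Each greedy step compares every remaining item against all selected ones, which is the kind of cost an efficient algorithm would need to reduce at scale.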
In interactive learning systems, such as Exquisitor, the system presents potentially relevant images to users who label them as either relevant or irrelevant. Currently, Exquisitor uses a cluster-based index, which allows it to return results from a collection of 100 million images in 0.3 seconds. The goal of this project is to study the application of hash-based indexing to interactive learning …
Supervisors:
Björn Þór Jónsson
Semester: Fall 2021
Tags: multimedia analytics, diversity
The goal of this project is to enhance PhotoCube as a competitor for the Video Browser Showdown, an international video retrieval competition where competing systems are judged based on speed, accuracy and recall. We propose to develop new versions of the C++-based media server and JS-based media browser, to expand the data model to videos and improve the performance sufficiently to take part in …
Supervisors:
Björn Þór Jónsson
Semester: Fall 2021
Tags: video search, multimedia analytics, photocube
We are actively developing a new prototype for analysing large multimedia collections in virtual reality, based on the ObjectCube data model. There are many ways in which students can contribute to the project, including work on the user interface and the back-end, and later on running large-scale user experiments.
Supervisors:
Aaron Duane, Björn Þór Jónsson
Semester: Fall 2021
Tags: virtual reality, multimedia analytics
The index structure used for Exquisitor is eCP, a very scalable index for high-dimensional retrieval. However, while multimedia collections are typically updated constantly, the index is static. The goal of this project is to implement and compare approaches for index maintenance.
Supervisor: Björn Þór Jónsson
Semester: Fall 2020
Tags: index maintenance, high-dimensional indexing
In this project, we propose to implement a media server and a media browser encapsulating a new data model for analysing media collections, called Multimedia Analytics Data Services (MADS). To validate the design, some scalability experiments should be performed.
Supervisor: Björn Þór Jónsson
Semester: Fall 2020
Tags: multimedia analytics, scalability
Students at ITU have made a prototype version of the Exquisitor system for the Android mobile phone! The system is missing some of Exquisitor’s advanced functionality, such as search and indexing, and the goal is to add and evaluate this functionality.
Supervisor: Björn Þór Jónsson
Semester: Fall 2020
Tags: multimedia analytics, android
We propose to develop a new prototype for analysing large multimedia collections in Virtual Reality, using the new Valve Indexes. There are many ways in which students can contribute to the project, including work on the user interface and the back-end, and later on running large-scale user experiments.
Supervisor: Björn Þór Jónsson
Semester: Fall 2020
Tags: virtual reality, multimedia analytics
The goal of this project is to integrate Exquisitor with other pieces of existing technology and turn it into a competitor for a live video retrieval competition. The project is suitable for 3-4 well-qualified MSc students.
The Video Browser Showdown (VBS) is a live competition for video search and retrieval, held at the International Conference on Multimedia Modeling (MMM). In VBS, the competition …
Supervisor: Björn Þór Jónsson
Semester: Fall 2019
The goal of this project is to ensure diversity in the relevance feedback results, to improve the quality of the user experience.
The project is suitable for 1-3 well-qualified MSc students.
In many creative tasks, the designer will know some stock image is good for a design just by stumbling upon the image. This “Aha!” moment requires browsing thousands of images by categories. In other words, it requires …
Supervisor: Björn Þór Jónsson
Semester: Fall 2019
The goal of this project is to use the state of the art in eye tracking to design, implement and evaluate different eye-tracking interfaces for Exquisitor.
The project is suitable for 1-3 well-qualified MSc students.
Image and media collections are becoming a central information resource for a growing number of domains. This calls for very effective tools for interactive exploration of the contents of …
Supervisor: Björn Þór Jónsson
Semester: Fall 2019
The goal of this project is to build a prototype of the Exquisitor system for mobile devices.
The project is suitable for 1-3 well-qualified MSc students.
Image and media collections are becoming a central information resource for a growing number of domains. This calls for very effective tools for interactive exploration of the contents of those collections [1]. Based on past research results [2], we …
Supervisor: Björn Þór Jónsson
Semester: Fall 2019
Deep neural networks have been revolutionary in computer vision, and publicly available image datasets played an important role in this success. Due to their size, neural networks require vast amounts of data for training. Yet when it comes to medical settings, dataset sizes are very limited due to the cost of data annotation, privacy concerns, differences in imaging techniques, and others. In such …
Supervisors:
Dovile Juodelyte
Semester: Fall 2023
Tags: transfer learning, deep learning, medical imaging
GPUs offer massive computational power and parallelism through their Streaming Multiprocessors (SMs). Efficient GPU utilization is critical for maximizing performance and optimizing compute resource usage, and is measured using various metrics such as SMACT (SM Activity), SMOCC (SM Occupancy), and DRAMA (DRAM Active). These metrics provide insight into how effectively the GPU’s SMs and …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Spring 2024
Tags: machine learning systems, GPU Utilization, resource management, resource interference
Deep learning changed the landscape of many applications like computer vision, natural language processing, etc. On the other hand, deep learning requires gigantic computing power offered by modern hardware. As a result, data scientists rely on powerful hardware resources offered by shared high-performance computing (HPC) clusters or the cloud. Due to the long running times of deep learning …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2024
Tags: machine learning systems, checkpointing, scheduling, resource management
Workload collocation has been shown to be an effective method to reduce the hardware requirements for certain deep learning (DL) training tasks. On the other hand, there haven’t been many robust open-source implementations of schedulers that incorporate workload collocation on GPUs for DL.
BLOX is a framework that aims at standardizing the way we implement deep learning schedulers. In this …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2024
Tags: machine learning systems, scheduling, resource management, workload collocation
Today, there are many compute- and memory-hungry data-intensive workloads from big data analytics applications to deep learning. These workloads increasingly run on shared hardware resources, which requires building hardware resource managers that can both serve the needs of workloads and utilize hardware well. Predicting the resource utilization of applications can aid such resource managers …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2022
Tags: benchmarking, hardware resource consumption estimation, machine learning
It has been observed that deep learning models are able to identify patient characteristics such as age, sex, and self-reported race with high accuracy from medical images such as chest x-ray recordings, even when medical doctors cannot. This raises the potential for such models to learn to (falsely) diagnose patients of different demographics differently, even if they present with the same …
Supervisors:
Amelia Jiménez-Sánchez, Eike Petersen, Veronika Cheplygina
Semester: Fall 2024
Tags: machine learning, data science, medical imaging
Geospatial data refers to information that is tied to specific geographic locations on the Earth’s surface. It includes both the location coordinates (such as latitude, longitude, and, potentially, altitude) and attribute data associated with those locations. Geospatial data is categorized into two types: raster and vector.
Vector data represents geographic features as points, lines, and …
Supervisors:
Eleni Tzirita Zacharatou
Semester: Fall 2023
Tags: spatial data analysis, data science, data loading, GIS file formats, geospatial data
It is now common to query terabytes of spatial data. Several new frameworks extend distributed computing platforms such as Hadoop and Spark to enable them to efficiently process spatial queries by providing (1) mechanisms to efficiently store spatial data and index them; and (2) packages of built-in spatial operations for these platforms. Meanwhile, it is now common to accelerate Hadoop and Spark …
Supervisor: Iman Elghandour
Semester: Fall 2019
Spark assumes that it executes its applications on a homogeneous cluster of similar nodes. However, it is becoming common for in-house clusters to have heterogeneous compute resources, and it is good to exploit all of them in the most efficient way. The objective of this master thesis is to extend the Spark scheduler to be resource-aware and to efficiently schedule Spark tasks on all the …
Supervisor: Iman Elghandour
Semester: Fall 2019
Distributed computing platforms such as Hadoop and Spark focus on addressing the following challenges in large systems: (1) latency, (2) scalability, and (3) fault tolerance. Dedicating computing resources for each application executed by Spark can lead to a waste of resources. Unified distributed file systems such as Alluxio have provided a platform for computing results among simultaneously …
Supervisor: Iman Elghandour
Semester: Fall 2019
In the last few years, it has become common to accelerate Hadoop and Spark by enabling them to execute tasks and jobs on accelerators such as GPUs and FPGAs. The objective of this master thesis is to study new approaches that efficiently predict the execution time of Spark tasks and jobs executed on GPUs. Part of the work will be to build a performance prediction model for GPUs, which can be built …
Supervisor: Iman Elghandour
Semester: Fall 2019
Do you like open-source systems? Would you like to experience working with an open-source system? Do you want to learn about big data research in practice? Then, this project is for you!
We have a number of thesis/project topics under the umbrella of Apache Wayang. Wayang is the first cross-platform framework that allows users to specify their task/query in a system-agnostic manner and Wayang will …
Supervisors:
Jorge Quiané
Semester: Fall 2022
Tags: big data, database, cross-platform data processing, open source, Apache
Do you have the feeling that only a few players are controlling the AI game? Would you like to make AI technology accessible to everyone? Then, come and help us to make Agora a reality!
We have a number of thesis/project topics under the umbrella of the Agora project. This project aims at building a unified data infrastructure for supporting AI ecosystems that bring together data, algorithms, …
Supervisors:
Jorge Quiané
Semester: Fall 2022
Tags: big data, AI ecosystems, compliant data processing, federated analytics, data markets
The DISCO-2 project is driven by students and aims to develop and deploy a 3-unit CubeSat into low Earth orbit. Its mission focuses on conducting Earth observations over Greenland and supporting various research objectives. The satellite has three cameras onboard: infrared, wide-angle, and standard (main camera). Due to the limitations of the imaging hardware and the challenging conditions on the …
Supervisors:
Yucheng Lu, Julian Priest
Semester: Fall 2024
Tags: Image enhancement, Image processing, Machine learning
ITU is a partner of the Danish Student Cubesat Program, DISCOSAT. We launched our first satellite, DISCO-1, into Low Earth Orbit in April 2023 and we will launch a second, DISCO-2, in 2024. In this project you will gain experience with automating live satellite operations and communications, completing a ground station at the Rued Langgaards Vej site for use with both satellites.
The DISCO satellite …
Supervisors:
Julian Priest
Semester: Fall 2023
Tags: satellite, ground station, software defined radio, automation, csp
The DISCO-2 satellite is an Earth observation satellite, developed in collaboration with the Arctic Research Center in Aarhus, and is designed to complement ground-based field studies in Greenland. The satellite instrument consists of two high-quality visible-light cameras and one infrared camera, as well as an attitude control system and a Coral TPU ML coprocessor.
In this project you will develop software to control …
Supervisors:
Julian Priest
Semester: Fall 2023
Tags: satellite, climate change, image processing, ML, csp, embedded, space
The DISCO-2 satellite will have accelerated machine learning capability based on the inclusion of a Coral TPU ML accelerator module. This will allow images taken by the satellite to be analysed on the satellite using a variety of ML models, with only select images sent back to Earth. This approach allows for more flexibility in image acquisition and saves downlink bandwidth, which is very constrained …
Supervisors:
Julian Priest, Robert Bayer
Semester: Fall 2023
Tags: satellite, ground station, software defined radio, automation, csp
The Danish Student Cubesat Program is an inter-university collaboration that will launch 3 cubesats into Low Earth Orbit over the next 4 years. The satellites will be designed, operated, programmed and built by students, and the project offers an opportunity for Master’s students to take part in a live satellite project. ITU is partnering with Aarhus University on DISCOSAT2 which will be an …
Supervisors:
Sebastian Büttrich, Julian Priest
Semester: Fall 2021
Tags: Satellite, Cubesat, Image processing, Machine Learning, edge, constrained computing
ITU is a partner in the Danish Student Cubesat Program, DISCO, which will launch a series of small satellites into orbit, starting with DISCO 1 in 2023 and followed by DISCO2 in 2024.
As part of this project, ITU is installing a satellite ground station with a range of antenna rotators on the roof of the Rued Langgaards Vej building, and the equipment has been purchased. The ground station will track the …
Supervisors:
Julian Priest
Semester: archive
Tags: Satellite, Image processing, Edge, Constrained Computing, Networks, Machine Learning, Embedded, Software Defined Radio
ITU is a partner in the Danish Student Cubesat Program DISCO, which will launch a series of small satellites into orbit, starting with DISCO 1 in 2023 and followed by DISCO2 in 2024.
ITU is developing a hi-res multi-camera imaging payload for Earth observation, primarily in the Arctic. We are developing an on-satellite machine learning capability using an ML coprocessor, as well as models that can …
Supervisors:
Julian Priest
Semester: archive
Tags: Satellite, Image processing, Edge, Constrained Computing, Networks, Machine Learning, Embedded, Radio
The idea behind “15-minute cities” is that within a short walk or bike ride people should have access to all necessary facilities that constitute the essence of urban living, such as parks, shops, cafes, schools, hospitals. Initiatives to transform cities according to this paradigm are currently being implemented across the world, in an attempt to make urban spaces more liveable, …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: spatial data analysis, visualisation, Python, OSM data
As a response to increased traffic congestion and the need to reduce carbon emissions, cities consider ways to modernise, build and extend transit systems. Transit network design solutions can benefit from analysing the large amount of crowd-sourced location data available, which provides valuable insights into population mobility needs. Designing efficient metro lines, bicycle paths, or bus …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: spatial data analysis, network design, Python, OSM data
The idea behind “15-minute cities” is that within a short walk or bike ride people should have access to all necessary facilities that constitute the essence of urban living, such as parks, shops, cafes, schools, hospitals. Initiatives to transform cities according to this paradigm are currently being implemented across the world, in an attempt to make urban spaces more liveable, …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: spatial data analysis, graph summaries, Python, OSM data
Musical genres are inherently ambiguous and difficult to define. Even more so is the task of establishing how genres relate to one another. Yet, genre is perhaps the most common and effective way of describing musical experience. The number of possible genre classifications (e.g. Spotify has over 4000 genre tags, LastFM over 500,000 tags) has made the idea of manually creating music taxonomies …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: scalable algorithms, hyperbolic embeddings, Python, Spotify data
The integration of wind power in the energy grid is dependent on accurate production forecasts. The power output curves between neighbouring wind farms are often correlated temporally and spatially, but currently, these spatiotemporal dependencies are under-utilised in prediction models. Graph neural networks allow for modelling these dependencies. In this project the student will implement a …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: spatial data analysis, graph neural networks, Python, timeseries data
Open-source JavaScript applications, such as browser-based web games, are typically developed by individual software engineers or small teams. These teams often have limited financial resources to use commercial logging frameworks and cloud-based analysis systems and may also lack knowledge and expertise in logging. However, log analysis is highly important for many reasons: monitoring application …
Supervisors:
Martin Hentschel
Semester: Fall 2024
Tags: open source, performance
The Deconstructed Cloud Databases project stems from a simple question: What are the minimum components required to build a data management system in the cloud? Our motivation for this project is based on the idea that reducing a system to its minimum set of components makes it easier to build, test, and maintain cloud data management systems. This approach requires less engineering effort, …
Supervisors:
Martin Hentschel
Semester: Fall 2024
Tags: data management, performance, benchmarking, hacking
The Deconstructed Cloud Databases project stems from a simple question: What are the minimum components required to build a data management system in the cloud? Our motivation for this project is based on the idea that reducing a system to its minimum set of components makes it easier to build, test, and maintain cloud data management systems. This approach requires less engineering effort, …
Supervisors:
Martin Hentschel
Semester: Fall 2024
Tags: data management, security, open source, open standards
The emergence of computational storage platforms like Delilah has transformed the data storage landscape, enabling new computing paradigms and facilitating data-intensive applications. Delilah is a cutting-edge computational storage platform developed by the IT University of Copenhagen. It runs on the Daisy OpenSSD and exposes an asynchronous computational storage protocol to the host, facilitated …
Supervisors:
Niclas Hedam, Philippe Bonnet
Semester: Fall 2023
Tags: Open Source, Testing, Computational Storage, Hardware, FPGA
Hash-based authentication is an effective way of protecting passwords in software systems. Hashing obscures the original passwords, such that it cannot be recovered in case of a database breach. However, as demonstrated by our paper titled Hash-Based Authentication Revisited in the Age of High-Performance Computers, the practical security depends on which hashing algorithm is used as well as the …
Supervisors:
Pınar Tözün, Niclas Hedam
Tags: benchmarking, hashing, security, GPU, hacking, HPC
With the recent hunger for being “data driven”, many organizations are eager to integrate ML into their decision-making process. Unfortunately, competent data scientists are still relatively scarce, and manual model development cannot keep up with the demand for magic AI solutions. This is no less true when it comes to forecasting. Knowing the future is extremely handy when making …
Supervisors:
Niels Ørbæk Chemnitz
Semester: Spring 2021
Tags: AutoML, ML, Forecasting, Energy Data, Smart Meters, Python, Data Science, Time Series Data
How much do our smart meter readings disclose about us? Can we disentangle the oven from the washing machine from the kettle? Can we identify demographics and behavior patterns from the stream of electricity data?
Most Danish homes are now equipped with so-called “smart meters” - networked electricity meters that report consumption and load at a much higher rate than conventional meters. …
Supervisors:
Niels Ørbæk Chemnitz
Semester: Spring 2021
Tags: NILM, ML, IoT, Energy Data, Smart Meters, Python, Data Science, Time Series Data
Outlier detection is carried out when the information is stored at the server. However, with the new IoT computational capabilities, outlier detection can be developed locally. Therefore, it is necessary to know how much RAM/Flash is needed for this step and which IoT brands can handle it. This project is divided into two parts. The first is implementing light-heavy ML algorithms in single points …
Supervisor: Paul Rosero
Semester: Spring 2022
Tags: data analysis, IoT, Python, Embedded systems
TinyML is a new trend of deploying deep learning on tiny devices. Therefore, it is necessary to deploy several applications to understand the challenges and opportunities that TinyML brings. In this scenario, any idea involving embedded computer vision, voice recognition, or sensors is welcome.
Supervisor: Paul Rosero
Semester: Spring 2022
Tags: data analysis, IoT, Python, Embedded systems, Computer vision, Voice recognition
The emergence of computational storage platforms like Delilah has transformed the data storage landscape, enabling new computing paradigms and facilitating data-intensive applications. Delilah is a cutting-edge computational storage platform developed by the IT University of Copenhagen. It runs on the Daisy OpenSSD and exposes an asynchronous computational storage protocol to the host, facilitated …
Supervisors:
Niclas Hedam, Philippe Bonnet
Semester: Fall 2023
Tags: Open Source, Testing, Computational Storage, Hardware, FPGA
Disaggregated storage has gained acceptance in data centers. With disaggregated storage, storage resources are decoupled from compute resources, and made available through fabric. We are particularly interested in storage resources composed of an ARM-based smartNIC, which acts as fabric target as well as storage controller for a collection of SSDs.
The performance characteristics of the storage …
Supervisors:
Philippe Bonnet
Semester: Fall 2021
Tags: benchmarking, ARM, SoC, fabric, SSD, computational storage
Reproducibility is a cornerstone of the scientific method. There are systems available today to build reproducible and sharable data and analysis pipelines including workflow engines (e.g., GWL, Nextflow), package managers (e.g., bioconda), and container systems (e.g., Singularity). However, validating their executions on high-performance computers remains an open issue. Indeed, there are many …
Supervisors:
Philippe Bonnet
Semester: Fall 2020
Tags: ML, reproducibility, workflow, HPC
Reproducibility is a cornerstone of the scientific method. It is also a core element of compliance requirements for sensitive equipment, e.g., audit trails for medical equipment. Often, a prerequisite for computational reproducibility is the availability of software and data. However, this is problematic for edge devices whose goal is to reduce the amount of data transferred to the backend. On …
Supervisors:
Philippe Bonnet
Semester: Fall 2020
Tags: reproducibility, edge
SSDs are not a uniform class of devices. The SSD landscape is now quite diverse, with many new-generation, much faster and customizable devices available. Understanding their performance characteristics is crucial for determining what their impact on the data systems software stack should be. In this project, we would like to characterize the performance of a broad range of such SSDs (e.g., ZNS, Samsung Z-SSD, …
Supervisors:
Pınar Tözün, Philippe Bonnet
Semester: Fall 2020
Tags: SSD, benchmarking
The Things Network Stack v3 for LoRaWAN is an open source LoRaWAN network stack suitable for large, global and geo-distributed public and private networks as well as smaller networks. The architecture follows the LoRaWAN Network Reference Model for standards compliancy and interoperability. - https://github.com/TheThingsNetwork/lorawan-stack This stack, currently in pre-rollout testing, however …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Offloading processing to storage is a means to avoid data movement and thus deal efficiently with very large volumes of stored data. In the 90s, there were pioneering efforts to develop Processing-in-Memory as well as Active Disks. We are considering data stored on Open-Channel SSDs with a programmable storage controller (i.e., a Linux-based ARM processor) integrated into a network switch (e.g., …
Supervisor: Philippe Bonnet
Semester: Fall 2019
Characterize the performance of commercial database systems on an NVIDIA Titan GPU, or characterize the performance of DB2 PureScale on a cluster equipped with shared storage, using a range of different benchmarks. Design and conduct experiments with a range of tuning strategies to measure their impact on performance and reliability.
Supervisor: Philippe Bonnet
Semester: Fall 2019
In the context of the Orkney Cloud project, we are preparing the deployment of a decentralized cloud infrastructure on the archipelago. The infrastructure is composed of a collection of Pods (points of delivery) and a wireless core (5G + Wi-Fi). Each Pod is equipped with storage, computing and communication components (so that it is connected to the core and to local endpoints). Each Pod is powered …
Supervisor: Philippe Bonnet
Semester: Fall 2019
New types of networks such as LoRaWAN and Sigfox enable us to deploy inexpensive mobile Air Quality Sensors, e.g. on bicycles or vehicles, boats or trains, and help map and understand urban pollution. Such data would be valuable for e.g. the international citizen science project Luftdaten https://luftdaten.info/en/home-en/ in which we are participating. We have an ongoing collaboration with the …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Can we combine satellite data with terrestrial and marine sensor data, to study their correlation, and benefit our understanding of environmental processes, both in urban and agricultural/rural contexts? Potential collaboration with several organizations in Kenya and the Orkney islands.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Not far from the IT University, citizens are building a “Small Smart City”, aiming to measure and analyze e.g. water and resource consumption. Help build the Small Smart City!
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Survey of potential sensor modalities (sounds, ultrasounds, vibrations) and related work (e.g., wind turbines). Starting with sound: Piezo contact mics/transducers, MEMS sensors. Characterization of state based on known signatures (classification problem). Characterization of state transitions (HMM). Experimentation on Coffee machine/Blender/3D printer at PitLab. 1..k sensors; Local/cloud-based …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Novo Nordisk
New forms of Solid State Drives have interesting characteristics in terms of performance (10 to 100x faster than previous generations of SSDs) and in terms of functionality (SSDs can now suspend the execution of write or erase operations to minimize read latency). The performance characteristics of these devices are not well understood yet. The topic of this thesis is to design and conduct …
Supervisor: Philippe Bonnet
Semester: Fall 2019
Survey of potential sensor modalities (IR temp sensor, thermal imager) and related work. Starting with consumer USB cams or phone cams to generate series of images. Characterization of state based on known signatures (classification problem). Characterization of state transitions (HMM). Experimentation on Coffee machine/Blender/3D printer at PitLab. 1..k sensors; Local/cloud-based processing. …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Novo Nordisk
Most data collection in IoT does not critically depend on latency or speed from data collection to data analytics. Occasionally, though, we meet tasks that would benefit from near-real-time features, such as collection of wave and tidal dynamics around marine energy infrastructures. This project explores the limits of speed by bringing together a LoRa PHY, a LoRaWAN gateway, LoRaWAN stack, ultrafast …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
In collaboration with the IoT Lab at the Computer Science Dept. of Kathmandu University, Nepal, we are developing a potential service for tracking trekkers, i.e. offering a security service for tourists trekking the Himalayas, in particular Mt. Everest. This service depends critically on having a robust hardware component, the actual GPS/GNSS tracker. Requirements with respect to battery life, …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Wind turbine electricity production data is sensitive for Energinet (and for the wind turbine producers). Energinet would like to publish wind turbine electricity production data sets that can be used to train relevant models and to develop innovative applications, without giving away sensitive data. The goal of the project is to explore various data publishing methods for that purpose.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, data publication
Energinet has a model that describes the electricity production of a given wind turbine under given wind conditions. The current model, based on kNN, is trained with DMI weather data and historical electricity production data for the wind turbine. The goal of the project is to improve the current model with lifelong learning, extended weather data and different models for a range of different wind …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Data Analysis
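As a rough illustration of the kNN idea mentioned in the entry above, the following minimal sketch predicts production by averaging the k historical samples whose wind speed is closest to the query. It is not Energinet's model: the function name `knn_predict`, the single wind-speed feature, and the sample values are all hypothetical and made up for illustration.

```python
def knn_predict(history, wind_speed, k=3):
    """Average the production of the k samples with the closest wind speed.

    history is a list of (wind_speed_m_s, production_kw) pairs.
    """
    if not history:
        raise ValueError("history must not be empty")
    # Sort by distance to the query wind speed and keep the k nearest samples.
    nearest = sorted(history, key=lambda sample: abs(sample[0] - wind_speed))[:k]
    return sum(production for _, production in nearest) / len(nearest)


# Hypothetical (wind speed in m/s, production in kW) samples, invented for this sketch.
HISTORY = [
    (3.0, 50.0),
    (5.0, 200.0),
    (7.0, 600.0),
    (9.0, 1200.0),
    (11.0, 1800.0),
    (13.0, 2000.0),
]

if __name__ == "__main__":
    print(knn_predict(HISTORY, 8.0, k=2))
```

A real model would likely use more features (wind direction, temperature, turbine state) and a proper distance metric over them; the single-feature version only shows the mechanism.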
The goal of the project is to explore the accuracy of electricity production predictions based on historical data and weather predictions. This may be tackled as a sequence prediction problem using recurrent neural networks. The long-term goal is to incorporate wind turbines in the reserve market for electricity.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Forecasting, Machine Learning, Deep Learning
The goal of the project is to explore new ways of gathering data about wind turbines as well as local wind/weather conditions. To this end, sound/vibration-based and/or image-based instrumentation as well as innovative experiments such as balloons and lightweight weather stations might be considered.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Instrumentation, Sensors
Field Programmable Gate Arrays are now an integral part of public cloud infrastructures. You can for example run customized FPGA instances on AWS. A project focuses on FPGA-based hardware acceleration at the level of an SSD Flash Translation Layer, at the level of a Database storage manager or at the level of the database client. You will be able to experiment with FPGAs in the lab and on AWS. We …
Supervisor: Philippe Bonnet
Semester: Fall 2020
Tags: FPGA, SSD, computational storage
Field Programmable Gate Arrays are now an integral part of public cloud infrastructures. You can for example run customized FPGA instances on AWS. A project focuses on FPGA-based hardware acceleration at the level of an SSD Flash Translation Layer, at the level of a Database storage manager or at the level of the database client. You will be able to experiment with FPGAs in the lab and on AWS. We …
Supervisor: Philippe Bonnet
Semester: Fall 2019
Tags: FPGA, SSD
A GPU offers massive computational power and parallelism through its Streaming Multiprocessors (SMs). Efficient GPU utilization is critical for maximizing performance and optimizing compute resource usage. Utilization is measured using various metrics such as SMACT (SM Activity), SMOCC (SM Occupancy), and DRAMA (DRAM Active). These metrics provide insight into how effectively the GPU’s SMs and …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Spring 2024
Tags: machine learning systems, GPU Utilization, resource management, resource interference
The work on running data-intensive applications on very powerful, expensive, and power-hungry server hardware is very popular thanks to the growing size of data centers and high-performance computing (HPC) platforms. However, with the rise of new generation internet of things (IoT) applications, the lower-power and lower-budget hardware devices that specifically target IoT, the edge platforms, …
Supervisors:
Pınar Tözün
Semester: Fall 2024
Tags: edge, benchmarking, data-intensive applications, resource-constrained hardware
Observing how well machine learning systems utilize hardware resources is a crucial preliminary step to improve system performance and reduce hardware waste. To make such observations, one has to collect a lot of monitoring data on hardware behavior through experiments. In our group, we have recently built a framework to aid the management of such monitoring data efficiently, called Resource-Aware …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: benchmarking, data management, data visualization
Deep learning changed the landscape of many applications like computer vision, natural language processing, etc. On the other hand, deep learning requires gigantic computing power offered by modern hardware. As a result, data scientists rely on powerful hardware resources offered by shared high-performance computing (HPC) clusters or the cloud. Due to the long-running times of deep learning …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2024
Tags: machine learning systems, checkpointing, scheduling, resource management
Workload collocation has been shown to be an effective method to reduce the hardware requirements for certain deep learning (DL) training tasks. On the other hand, there haven’t been many robust open-source implementations of schedulers that incorporate workload collocation on GPUs for DL.
BLOX is a framework that aims at standardizing the way we implement deep learning schedulers. In this …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2024
Tags: machine learning systems, scheduling, resource management, workload collocation
Deep convolutional networks are able to learn representations of images, scoring well in tasks such as image classification and object detection. During model training, these networks have the ability to process different input sizes without requiring changes to their architecture. In this project, we would like to investigate the effects that changing input sizes has on these kinds of models. We …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: data attribution, deep learning, machine learning, resource efficiency
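One reason convolutional layers can accept different input sizes, as noted in the entry above, is that their weights do not depend on the spatial size of the input; only the size of the resulting feature map changes. The sketch below applies the standard output-size formula, floor((n + 2p - k) / s) + 1, to a hypothetical three-layer stack. The layer parameters in `LAYERS` are assumptions chosen for illustration and do not describe any particular model from the project.

```python
def conv_output_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial size after one convolution or pooling layer (floor convention)."""
    return (size + 2 * padding - kernel) // stride + 1


# Hypothetical stack of (kernel, stride, padding) layers, for illustration only.
LAYERS = [(7, 2, 3), (3, 2, 1), (3, 2, 1)]


def feature_map_size(input_size: int) -> int:
    """Propagate a square input of the given side length through LAYERS."""
    size = input_size
    for kernel, stride, padding in LAYERS:
        size = conv_output_size(size, kernel, stride, padding)
    return size


if __name__ == "__main__":
    for input_size in (128, 224, 320):
        print(input_size, "->", feature_map_size(input_size))
```

The same layer stack yields a larger feature map for a larger input, so compute and memory grow with input size even though the parameters stay fixed. A network that ends in a fixed-size dense layer would additionally need something like global pooling before that layer.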
Today’s foundation models are trained on vast amounts of data. The quality and size of this data have a huge impact on the accuracy of these models. Selecting the right amount and variety of data for a given task, however, is a resource-intensive process. In this project, we would like to investigate various state-of-the-art data selection mechanisms from a hardware requirements and …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: data selection, deep learning, machine learning, resource efficiency
Traditionally, solid-state drives (SSDs) do not give users the ability to control the data placement on the SSD. This often leads to suboptimal performance and lowers SSD lifetime, since SSDs internally don’t allow in-place updates. The updated disk pages are written elsewhere and the old versions have to be garbage collected. This poses problems if data with different lifetimes and …
Supervisors:
Pınar Tözün
Semester: Fall 2024
Tags: SSDs, data management systems, modern storage
In this project, we would specifically like to quantify the data movement savings of applying techniques like compression and model-based data filtering in the context of resource-constrained hardware and edge/IoT applications.
Today many data sources are small low-powered and hardware-constrained devices such as mobile phones, wearable or self-driving smart platforms, etc. Processing the data on …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, ML model updates, tinyML
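To make the notion of data movement savings in the entry above concrete, here is a minimal sketch that measures the fraction of bytes saved when a payload is compressed with Python's standard `zlib` module before being sent. The function name `movement_savings` and the synthetic sensor readings are hypothetical; the project itself may use different compression schemes, and model-based filtering is not covered here.

```python
import zlib


def movement_savings(payload: bytes, level: int = 6) -> float:
    """Fraction of bytes saved by sending the zlib-compressed payload instead of the raw one."""
    if not payload:
        return 0.0
    compressed = zlib.compress(payload, level)
    return 1.0 - len(compressed) / len(payload)


# Hypothetical sensor log: repetitive temperature readings, invented for this sketch.
READINGS = ",".join(f"{20.0 + (i % 5) * 0.1:.1f}" for i in range(1000)).encode()

if __name__ == "__main__":
    saved = movement_savings(READINGS)
    print(f"raw={len(READINGS)} B, saved={saved:.0%}")
```

On a resource-constrained device, the bytes saved have to be weighed against the CPU time, memory and energy spent on compression, which is the kind of trade-off such a project would need to quantify.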
One of the key challenges with enabling efficient machine learning on resource-constrained devices is keeping the machine learning models deployed on these devices up-to-date without frequent retraining. This requires exploring the impact of different model update mechanisms at the edge.
This project would be suitable as a standalone project or BSc or MSc thesis at ITU during Fall 2024. If you are …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, ML model updates, tinyML
Enabling efficient data processing and machine learning on resource-constrained devices poses many challenges. One is fitting the models into the restrictive memory and compute resources of these devices. In this project, first, we would like to explore the landscape of foundational, generative-AI, language, etc. models with respect to their size and compute needs to understand what could be a fit …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, ML model updates, tinyML
Today many data sources are small low-powered and hardware-constrained devices such as mobile phones, wearable or self-driving smart platforms, etc. Edge computing is a broad term that refers to computations performed on such edge devices. It becomes increasingly important to enable techniques that get more value out of data at the edge rather than always sending the data to a remote and more …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, resource management, tinyML
Observing how well data-intensive systems utilize hardware resources is a crucial preliminary step to improve system performance and reduce hardware waste. To make such observations, one has to collect a lot of monitoring data on hardware behavior through experiments. In our group, we have recently built a framework to aid the management of such monitoring data efficiently, called Resource-Aware …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2023
Tags: benchmarking, data management, data visualization
It is common to process data to clean it, filter it, restructure it, get metadata out of it, etc. before feeding the data into a data analysis or machine learning pipeline. There are many tools and libraries out there to aid with this process with different strengths and functionality (DALI, RAPIDS, HoloClean, DAPHNE, DuckDB, etc.). In this project, we would like to analyze pros/cons of some of …
Supervisors:
Pınar Tözün
Semester: Fall 2022
Tags: data preprocessing libraries, heterogeneous hardware, machine learning
In the past decade, the data management community has focused on main-memory systems or main-memory-optimized systems. This focus has put the commodity memory hierarchy (DRAM and processor caches) at the center when it comes to workload characterization studies. Today, with the evolution of persistent storage technologies such as NVRAM (persistent memory solution of Intel) and NVMe SSDs, data systems …
Supervisors:
Pınar Tözün
Semester: Fall 2022
Tags: workload characterization, tracing, modern storage, data-intensive systems
DAPHNE is an EU project that aims at building a data system targeting integrated data analysis pipelines across data management and processing, high-performance computing (HPC), and machine learning (ML) training and scoring. The project had its first code release back in March. This project aims at adding a profiling infrastructure for the DAPHNE codebase. If you are interested in learning about …
Supervisors:
Pınar Tözün
Semester: Fall 2022
Tags: integrated data analysis pipelines, profiling big data systems
State-of-the-art machine learning models are known to be compute- and power-hungry. On the other hand, modern servers come equipped with really powerful CPU-GPU co-processors. Not all machine learning models are able to use all the available hardware resources on such servers.
Workload collocation is a mechanism to increase hardware utilization when a single workload is not able to utilize all the …
Supervisors:
Pınar Tözün
Semester: Fall 2022
Tags: benchmarking, workload collocation, machine learning
Today, there are many compute- and memory-hungry data-intensive workloads from big data analytics applications to deep learning. These workloads increasingly run on shared hardware resources, which requires building hardware resource managers that can both serve the needs of workloads and utilize hardware well. Predicting the resource utilization of applications can aid such resource managers …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2022
Tags: benchmarking, hardware resource consumption estimation, machine learning
NVMe SSDs are not a uniform class of devices. The I/O software stack is not uniform either. Understanding the performance characteristics of new-generation SSDs and the impact of the I/O stack on their performance is crucial when determining how to design data-intensive systems. In this project, we would like to characterize the performance of a range of NVMe SSDs (e.g., Samsung Z-SSD, Intel Optane, …
Supervisors:
Pınar Tözün
Semester: Fall 2021
Tags: SSD, benchmarking
A data science infrastructure orchestrates the execution of widely used machine learning frameworks (e.g., TensorFlow, PyTorch) on a heterogeneous set of processing units (e.g., CPU, GPU, TPU, FPGA) while powering an increasingly diverse and complex range of applications (e.g., fraud detection, healthcare, virtual assistance, automatic driving). Understanding the resource consumption …
Supervisor: Pınar Tözün
Semester: Fall 2021
Tags: benchmarking, hardware resource consumption, deep learning frameworks
Hash-based authentication is an effective way of protecting passwords in software systems. Hashing obscures the original passwords, such that they cannot be recovered in case of a database breach. However, as demonstrated by our paper titled Hash-Based Authentication Revisited in the Age of High-Performance Computers, the practical security depends on which hashing algorithm is used as well as the …
Supervisors:
Pınar Tözün, Niclas Hedam
Tags: benchmarking, hashing, security, GPU, hacking, HPC
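For readers unfamiliar with hash-based authentication as described in the entry above, the following minimal sketch stores a salted, deliberately slow hash instead of the password and verifies a login attempt against it. It uses PBKDF2-HMAC-SHA256 from Python's standard `hashlib` as one possible algorithm; this choice, the `ITERATIONS` value, and the function names are assumptions for illustration and are not taken from the paper.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Assumed work factor, for illustration only; a deployment would tune this to its hardware.
ITERATIONS = 100_000


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); a fresh random salt is generated when none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


if __name__ == "__main__":
    salt, stored = hash_password("correct horse battery staple")
    print(verify_password("correct horse battery staple", salt, stored))
    print(verify_password("wrong guess", salt, stored))
```

The iteration count is the knob that trades login latency against the cost of brute-force guessing, which is why the choice of algorithm and its parameters matters when attackers have access to GPUs or HPC resources.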
The variety and complexity of data-intensive applications and systems have been increasing drastically over the past decade. Tasks from a SQL-based big data analytics request running on Apache Spark can be very different from tasks from deep learning training using the TensorFlow framework. Nevertheless, these data-intensive applications increasingly run on shared hardware resources in data centers or …
Supervisor: Pınar Tözün
Semester: Fall 2020
Tags: hardware-software co-design
Improvements in modern processor hardware do not automatically enable more complex and higher quality games, animations, and virtual reality applications. A paradigm shift is necessary when it comes to how we develop these applications in order to exploit the resources of modern hardware (i.e., main memory, multicores) effectively. Unity Technologies has recently developed Data-Oriented …
Supervisor: Pınar Tözün
Semester: Fall 2020
Tags: memory hierarchy, concurrency
SSDs are not a uniform class of devices. The SSD landscape is quite diverse now, with many new-generation, much faster and more customizable devices available. Understanding their performance characteristics is crucial when determining what their impact on the data systems software stack should be. In this project, we would like to characterize the performance of a broad range of such SSDs (e.g., ZNS, Samsung Z-SSD, …
Supervisors:
Pınar Tözün, Philippe Bonnet
Semester: Fall 2020
Tags: SSD, benchmarking
Spreading the computation of similar concurrent tasks that have a large instruction footprint over multiple cores via thread migration is shown to improve the instruction cache utilization drastically since it allows instruction re-use across the concurrent tasks. However, thread migrations are costly due to the context switching overhead. To reduce this overhead, recent work mainly proposed …
Supervisor: Pınar Tözün
Semester: Fall 2019
The computer architecture community is moving toward commoditization of hardware specialization instead of general purpose CPUs and more agile hardware development instead of years-long production cycles to enable faster, more energy-efficient, and more cost-effective hardware/software co-designs. This will lead to a disruption in the way we design and maintain the emerging data management systems …
Supervisor: Pınar Tözün
Semester: Fall 2019
Apache SystemML is an open-source platform to run machine learning tasks efficiently thanks to the hardware-conscious query compilation techniques it adopts. It can be run standalone or on top of Apache Spark. It is considered to be state-of-the-art when running machine learning tasks (e.g., in ACM SIGMOD 2017, there were ~5 papers that used SystemML as a comparison point). This project aims at …
Supervisor: Pınar Tözün
Semester: Fall 2019
The popularity of large-scale real-time analytics applications (real-time inventory/pricing, recommendations from mobile apps, fraud detection, risk analysis, IoT, etc.) keeps rising. These applications require distributed data management systems that can handle fast concurrent transactions (OLTP) and analytics on the recent data. Some of them even need to run analytical queries (OLAP) as part of …
Supervisor: Pınar Tözün
Semester: Fall 2019
The Transaction Processing Performance Council (TPC) is a non-profit IT organization founded to define database benchmarks and disseminate objective, verifiable performance data to the industry. TPC has standardized several new benchmarks (e.g., TPCx-HS and TPCx-BB) in recent years. Older popular benchmarks, like TPC-C (representing high-performance transaction processing) and TPC-H (representing …
Supervisor: Pınar Tözün
Semester: Fall 2019
The DISCO-2 satellite will have accelerated machine learning capability based on the inclusion of a Coral TPU ML accelerator module. This will allow images taken by the satellite to be analysed on the satellite using a variety of ML models, with only select images sent back to Earth. This approach allows for more flexibility in image acquisition and saves downlink bandwidth which is very constrained …
Supervisors:
Julian Priest, Robert Bayer
Semester: Fall 2023
Tags: satellite, ground station, software defined radio, automation, csp
This is not a single project, but rather a larger cluster of potential projects in the field of what could be summarized as extreme networking.
The networks we are interested in are typically wireless, and can be extreme in different senses of the word:
distance - hundreds of kilometers terrestrial, 10,000s of km to satellite; latency - sub-ms latencies; autonomy - off-grid; quality - extreme remote …
Supervisors:
Sebastian Büttrich
Semester: Fall 2024
Tags: network, IoT, LoRa, LoRaWAN, satellites
LoRa is a long-range, low-bandwidth networking protocol widely used in Internet of Things projects, sensor networks, and low-power, low-cost embedded systems. LoRa’s encoding schema allows for extremely long distance communications with small power usage and small simple antennas. This combination of features has made it attractive to small satellite operators flying cubesats and LoRa is now …
Supervisors:
Sebastian Büttrich
Semester: Fall 2024
Tags: satellites, LoRa, cubesat, IoT, embedded, electronics
LoRa is a long-range, low-bandwidth networking protocol widely used in Internet of Things projects, sensor networks, and low-power, low-cost embedded systems. LoRa’s encoding schema allows for extremely long distance communications with small power usage and small simple antennas. This combination of features has made it attractive to small satellite operators flying cubesats and LoRa is now …
Supervisors:
Sebastian Büttrich
Semester: Fall 2024
Tags: IoT, LoRa, LoRaWAN, satellites
Optical fiber is the backbone of the internet’s communication, e.g. in the form of submarine fiber cables. It can also be employed as a sensor device, by means of combined opto-acoustic methods such as Distributed Acoustic Sensing (DAS) or State of Polarisation (SoP) sensing. Fiber is capable of sensing all kinds of vibrational/acoustic events, from animal sounds over seismic activity to …
Supervisors:
Sebastian Büttrich
Semester: Fall 2024
Tags: fiber, acoustics, audio, machine learning, DAS, SOP
The Danish Student Cubesat Program is an inter-university collaboration that will launch 3 cubesats into Low Earth Orbit over the next 4 years. The satellites will be designed, built, programmed and operated by students, and the project offers an opportunity for Master’s students to take part in a live satellite project. ITU is partnering with Aarhus University on DISCOSAT2 which will be an …
Supervisors:
Sebastian Büttrich, Julian Priest
Semester: Fall 2021
Tags: Satellite, Cubesat, Image processing, Machine Learning, edge, constrained computing
Invasive bird species can be a serious problem in cities, towns and in agriculture. The common pigeon is a very unwelcome guest on many balconies, roofs and terraces. Conventional scarecrows often show no effect, as these birds are known to be quite intelligent, and capable of learning fast. The idea is to build a sensor/camera enhanced scarecrow that - can recognize birds present within its …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: IoT, ML, machineLearning, sensors, security
For this project, you would be working with a partner company who are looking to re-establish wood as a building material for sustainable architecture, and thus are using sensors for quality control - to detect damages and deterioration in buildings. Wood such as timber may be analyzed by non-intrusive acoustic impact testing and subsequent waveform analysis, and the expectation is that machine …
Supervisors:
Sebastian Büttrich
Semester: Fall 2021
Tags: IoT, sensors, machine learning, acoustics
In LoRaWAN networks such as The Things Network, long distance transmissions, well beyond the limitations of line of sight in terrestrial geometry, are frequently observed. Tropospheric effects are seen as responsible for bending or guiding radio waves around the Earth’s curvature. As an example, under the right weather conditions, the LoRaWAN gateway at ITU may collect packets from northern Germany, …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: IoT, LoRaWAN, LPWAN, satellite, networks, troposphere, weather
Recent progress in LoRaWAN development has made a new generation of satellite communications offerings available to IoT devices. In these, the LoRaWAN gateway is satellite-borne and collects data from small inexpensive ground stations. So far, this is predominantly seen as a means of communication for remote sensor data, e.g. in agriculture, logistics or wildlife monitoring. However, one can …
Supervisors:
Sebastian Büttrich
Semester: Fall 2021
Tags: IoT, LoRaWAN, LPWAN, satellite, networks, edge, security
There is currently a lot of progress in really small, yet powerful visual machine learning / computer vision, on hardware like the OpenMV Cam H7, Arduino Portenta Vision Shield, Luxonis LUX-ESP32, Himax WE-I Plus, Arducam Pico4ML, and Raspberry Pi, and on software platforms such as TinyML or OpenMV IDE.
While many popular use cases stem from fields like traffic analysis, wildlife monitoring, we …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: IoT, sensors, machine learning, computer vision
The Things Network Stack v3 for LoRaWAN is an open source LoRaWAN network stack suitable for large, global and geo-distributed public and private networks as well as smaller networks. The architecture follows the LoRaWAN Network Reference Model for standards compliance and interoperability. - https://github.com/TheThingsNetwork/lorawan-stack This stack, currently in pre-rollout testing, however …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
New types of networks such as LoRaWAN and Sigfox enable us to deploy inexpensive mobile Air Quality Sensors, e.g. on bicycles or vehicles, boats or trains, and help map and understand urban pollution. Such data would be valuable for e.g. the international citizen science project Luftdaten https://luftdaten.info/en/home-en/ in which we are participating. We have an ongoing collaboration with the …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Can we combine satellite data with terrestrial and marine sensor data, to study their correlation, and benefit our understanding of environmental processes, both in urban and agricultural/rural context? Potential collaboration with several organizations in Kenya and Orkney islands.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Not far from the IT University, citizens are buillding a “Small Smart City”, aiming to measure and analyze e.g. water and resource consumption. Help build the Small Smart City!
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Survey of potential sensor modalities (sounds, ultrasounds, vibrations) and related work (e.g., wind turbines) Starting with sound: Piezo contact mics/transducers, MEMS sensors Characterization of state based on known signatures (classification problem) Characterization of state transitions (HMM) Experimentation on Coffee machine/Blender/3D printer at PitLab 1..k sensors; Local/cloud-based …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Novo Nordisk
Survey of potential sensor modalities (IR temp sensor, thermal imager) and related work Starting with consumer USB cams generate series of images or phone cams Characterization of state based on known signatures (classification problem) Characterization of state transitions (HMM) Experimentation on Coffee machine/Blender/3D printer at PitLab 1..k sensors; Local/cloud-based processing. …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Novo Nordisk
Most data collection in IoT does not critically depend on latency or speed from data collection to data analytics. Occasionally though we meet tasks that would benefit from near-realtime features, such as collection of wave and tidal dynamics around marine energy infrastructures. This project explores the limits of speed by bringing together a LoRa PHY, a LoRaWAN gateway, LoRaWAN stack, ultrafast …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
In collaboration with the IoT Lab at Computer Science Dept at Kathmandu University, Nepal, we are developing a potential service for tracking trekkers, i.e. offering a security service for tourists trekking the Himalayas, in particular Mt. Everest. This service very critically depends on having a robust hardware component, the actual GPS/GNSS tracker. Requirements with respect to battery life, …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Wind turbine electricity production data is sensitive for Energinet (and for the wind turbine producers). Energinet would like to publish wind turbine electricity production data sets that can be used to train relevant models and to develop innovative applications, without giving away sensitive data. The goal of the project is to explore various data publishing methods for that purpose.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, data publication
Energinet has a model that describes the electricity production of a given wind turbine given wind conditions. The current model based on kNN is trained with DMI weather data and historical electricity production data for the wind turbine. The goal of the project is to improve the current model with lifelong learning, extended weather data and different models for a range of different wind …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Data Analysis
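As a rough illustration of what a kNN-based production model can look like, here is a minimal sketch. It is not Energinet's model: the two features (wind speed and wind direction), the 25 m/s speed scale, the function name `knn_predict` and all sample values are assumptions made purely for this example.

```python
import math


def knn_predict(history, query, k=3):
    """Predict production as the mean of the k nearest historical samples.

    history: list of ((wind_speed_m_s, wind_direction_deg), production_kw)
    query:   (wind_speed_m_s, wind_direction_deg)
    """
    def distance(a, b):
        # Wind direction is circular: 350 and 10 degrees are 20 degrees apart.
        d_dir = abs(a[1] - b[1]) % 360
        d_dir = min(d_dir, 360 - d_dir) / 180.0  # scaled to [0, 1]
        d_speed = (a[0] - b[0]) / 25.0           # assumed 25 m/s speed scale
        return math.hypot(d_speed, d_dir)

    nearest = sorted(history, key=lambda sample: distance(sample[0], query))[:k]
    return sum(production for _, production in nearest) / len(nearest)


# Made-up samples, purely for illustration.
history = [
    ((4.0, 270.0), 200.0),
    ((5.0, 265.0), 320.0),
    ((6.0, 280.0), 480.0),
    ((12.0, 90.0), 1900.0),
    ((13.0, 95.0), 2000.0),
]
print(knn_predict(history, (5.2, 272.0), k=3))
```

Extending such a model with additional weather features mostly means adding dimensions to the distance function and choosing a sensible scale for each.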
The goal of the project is to explore the accuracy of electricity production predictions based on historical data and weather predictions. This may be tackled as a sequence prediction problem using recurrent neural networks. The long-term goal is to incorporate wind turbines in the reserve market for electricity.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Forecasting, Machine Learning, Deep Learning
The goal of the project is to explore new ways of gathering data about wind turbines as well as local wind/weather conditions. To this end, sound/vibration-based and/or image-based instrumentation as well as innovative experiments such as balloons and lightweight weather stations might be considered.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Instrumentation, Sensors
Deliberately scoped very wide, this group contains a number of projects in different possible directions, from
Location services via LPWAN time-of-flight and GPS/GNSS; Vessel tracking and management in fisheries, tourism and logistics; Water quality and chemistry sensing for Aquaculture, specifically Mariculture; Wave and tidal dynamics, e.g. in energy harvesting
and variations/combinations of …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: Satellite, Image processing, Machine Learning, edge, constrained computing, IoT, sensors, location
swarm.space is a commercial company providing low-bandwidth satellite connectivity using ultra-small (quarter cubesat unit) satellites in a low orbit. Swarm satellites cover every point on Earth, enabling IoT devices to affordably operate in any location. Swarm uses a form of LoRa network.
While ground terminals for satellite networks traditionally were both big and expensive, modems and antennas …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: Satellite, IoT
LoRa is a relatively new long-range low-power standard allowing for battery-powered pocket-sized nodes to transmit over 100s to 1000s of kilometers. ThingSat is a CubeSat communication payload consisting of an electronic board with several LoRa transmitters and a patch antenna operating in two bands (868 MHz and 2.4 GHz). It is a guest payload of a shared 3U CubeSat.
Available projects under this platform include …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: Satellite, IoT
Observing how well machine learning systems utilize hardware resources is a crucial preliminary step to improve system performance and reduce hardware waste. To do such observations, one has to collect a lot of monitoring data on hardware behavior through experiments. In our group, we have recently built a framework to aid the management of such monitoring data efficiently, called Resource-Aware …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: benchmarking, data management, data visualization
Deep convolutional networks are able to learn representations of images, scoring well in tasks such as image classification and object detection. During model training, these networks have the ability to process different input sizes without requiring changes to their architecture. In this project, we would like to investigate the effects that changing input sizes has on these kinds of models. We …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: data attribution, deep learning, machine learning, resource efficiency
Today’s foundation models are trained on vast amounts of data. The quality and size of this data has a huge impact on the accuracy of these models. Selecting the right amount and variety of data for a given task, however, is a resource-intensive process. In this project, we would like to investigate various state-of-the-art data selection mechanisms from a hardware requirements and …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: data selection, deep learning, machine learning, resource efficiency
Observing how well data-intensive systems utilize hardware resources is a crucial preliminary step to improve system performance and reduce hardware waste. To do such observations, one has to collect a lot of monitoring data on hardware behavior through experiments. In our group, we have recently built a framework to aid the management of such monitoring data efficiently, called Resource-Aware …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2023
Tags: benchmarking, data management, data visualization
Spectral learning priority is a useful tool in analyzing a model’s focus during training; it describes how a model may understand a given image from the spectrum perspective. For example, to distinguish cats and tortoises, learning to recognize their shapes would be enough; such an embedding will result in higher learning priority at low frequencies representing shapes, while learning to …
Supervisors:
Yucheng Lu, Veronika Cheplygina
Semester: Fall 2024
Tags: Spectral analysis, Image classification, Medical imaging
It has been observed that deep learning models are able to identify patient characteristics such as age, sex, and self-reported race with high accuracy from medical images such as chest x-ray recordings, even when medical doctors cannot. This raises the potential for such models to learn to (falsely) diagnose patients of different demographics differently, even if they present with the same …
Supervisors:
Amelia Jiménez-Sánchez, Eike Petersen, Veronika Cheplygina
Semester: Fall 2024
Tags: machine learning, data science, medical imaging
There is pressure on hospitals to implement AI systems which promise to improve diagnoses and save time for the doctors. One use-case could be related to the automation of protocoling based on a physician referral. Currently, this requires a referral letter from a physician who has examined a patient and evaluates that there is a need for additional imaging studies. In this case, the physician …
Supervisors:
Veronika Cheplygina
Semester: Fall 2023
Tags: machine learning, medical imaging, data analysis
Machine learning models, especially the larger models used on, for example, image or text datasets, can be expensive to train. During development, models are usually trained multiple times, for example to optimize hyperparameters, which can result in a large carbon footprint.
This project focuses specifically on medical data. There are some recent efforts, for example by Selvan et …
Supervisors:
Veronika Cheplygina
Semester: Fall 2023
Tags: machine learning, medical imaging, data analysis, resource consumption
Machine learning is used extensively in different applications, including medical imaging and natural language processing. As different types of data are involved, it is reasonable to assume that different methods are needed for each application. However, there are also opportunities in translating a method successful in one application, to the other application where it is not widely used.
The …
Supervisors:
Veronika Cheplygina
Semester: Fall 2021
Tags: machine learning, natural language processing, medical imaging, literature review
Machine learning algorithms for skin lesion classification typically learn from images which have been labeled as malignant (for example, melanoma) or not. Such tasks can still suffer from overfitting due to limited dataset size. In other computer vision tasks, crowdsourcing labels has been effective, but the average person typically does not have the background to classify skin lesions. However, …
Supervisors:
Veronika Cheplygina
Semester: Fall 2021
Tags: machine learning, medical imaging, crowdsourcing, similarity
The DISCO-2 project is driven by students and aims to develop and deploy a 3-unit CubeSat into low Earth orbit. Its mission focuses on conducting Earth observations over Greenland and supporting various research objectives. The satellite has three cameras onboard: infrared, wide-angle, and standard (main camera). Due to the limitations of the imaging hardware and the challenging conditions on the …
Supervisors:
Yucheng Lu, Julian Priest
Semester: Fall 2024
Tags: Image enhancement, Image processing, Machine learning
Are you interested in working with a big data open source project?
You are welcome to conduct your thesis/project in Apache Wayang. Apache Wayang is the first cross-platform framework that allows users to specify their task/query in a system-agnostic manner and Wayang will determine which is the best system(s) to execute this task with the goal of optimizing performance. For a general overview …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: big data, database, cross-platform data processing, open source, Apache
Knowledge graphs (KGs) are extensively used in many application domains, such as search engines, product recommendation, and bioinformatics. Knowledge graph completion (a.k.a. link prediction), i.e., the task of inferring missing information from knowledge graphs, is a widely used task in the above applications. This project will investigate how to loosely couple the data-driven power of knowledge …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: knowledge graph, LLMs, reasoning
Are you interested in working with a big data open source project and helping the environment?
You are welcome to conduct your thesis/project in Apache Wayang. Apache Wayang is the first cross-platform framework that allows users to specify their task/query in a system-agnostic manner and Wayang will determine which is the best system(s) to execute this task with the goal of optimizing performance. …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: big data, database, cross-platform data processing, open source, Apache
Query optimization is crucial for any data management system to achieve good performance. Recent advancements in AI have led academia and industry to investigate learning-based techniques in query optimization. In particular, many works propose replacing the cost model used during plan enumeration with a machine learning model that estimates the runtime of a plan. However, to build such a model …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: machine learning, training data, query optimizer
Query optimization is crucial for any data management system to achieve good performance. Recent advancements in AI have led academia and industry to investigate learning-based techniques in query optimization. In particular, many works propose replacing the cost model used during plan enumeration with a machine learning model (typically a regression model) that estimates the runtime of a query …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: machine learning, database, query optimization, ranking
(This project will be carried out in collaboration with Xilinx Research Labs in Dublin)
Machine Learning operators are becoming increasingly commonly used in data management systems and, in this project, we will explore the challenges and benefits of integrating inference operators from FINN [1] within a so-called Smart Storage system [2]. Both the inference and data management aspects will be …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: FPGA, Data Management, Machine Learning
(This topic is going to be co-supervised by Bernardo Machado David [http://www.bmdavid.com/])
Database systems managing private data may leak sensitive information when queries are done in the clear, even if the data itself is encrypted. A recent line of research has looked into combining database engines supporting standard SQL queries with techniques for secure Multiparty Computation (MPC), …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: Theoretical Computer Science, Data Management, Security and Privacy
Consensus mechanisms for ensuring consistency are some of the most expensive operations in managing large amounts of data. Often, there is a trade-off that involves reducing the coordination overhead at the price of accepting possible data loss or inconsistencies. As the demand for more efficient data centers increases, it is important to provide better ways of ensuring consistency without …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: Benchmarking, Distributed Systems
Given a private database that I can access only through specific queries, there is still a lot I can learn about its entries [1]. Differential Privacy (DP) tackles this: letting me learn the (approximate) result of complex queries on a database, but preventing me from learning much about its specific entries. The basic approach of DP often boils down to: “apply a privacy-preserving transformation T …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: Theoretical Computer Science, Data Management, Security and Privacy
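To make the idea of a privacy-preserving transformation concrete, the sketch below uses the Laplace mechanism, a standard DP technique, on a counting query. It is one well-known instance and not necessarily the transformation studied in this project; the names `private_count` and `laplace_noise` and the sample rows are hypothetical.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential draws is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(rows, predicate, epsilon, rng=None):
    """Return an epsilon-differentially-private count of matching rows.

    A counting query has sensitivity 1: adding or removing one row changes
    the true answer by at most 1, so the noise scale is 1 / epsilon.
    """
    rng = rng or random.Random()
    true_count = sum(1 for row in rows if predicate(row))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Made-up rows, purely for illustration.
rows = [{"age": 30}, {"age": 45}, {"age": 52}, {"age": 61}]
print(private_count(rows, lambda row: row["age"] > 40, epsilon=0.5))
```

A smaller epsilon means more noise and stronger privacy; a larger epsilon gives answers closer to the true count.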
Blockchains are often used synonymously with crypto-currencies and unspent transaction output (UTXO) data models, but there are emerging blockchain platforms that offer a more general data model and smart contracts that can manipulate this data freely (e.g. Hyperledger Fabric [1]). As such, these platforms resemble in many ways distributed databases, storing a collection of records, organized as …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: Blockchain, Data Management, Benchmarking
Modern data analytics systems are composed of two types of nodes: compute and storage (e.g., Amazon S3, Redis, MongoDB, etc.). The storage nodes typically offer a key-value interface and are often used to store data encoded in a columnar format (e.g., Parquet files). Due to growing data sizes in datacenters, there is an increasing interest in using specialized hardware devices, namely Field …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: FPGA, Hardware-software Co-design, Security and Privacy
We are actively developing a new prototype for analysing large multimedia collections in virtual reality, based on the ObjectCube data model. There are many ways in which students can contribute to the project, including work on the user interface and the back-end, and later on running large-scale user experiments.
Supervisors:
Aaron Duane, Björn Þór Jónsson
Semester: Fall 2021
Tags: virtual reality, multimedia analytics
A medical Visual Question Answering (VQA) system can provide meaningful references for both doctors and patients during the treatment process. Different from normal images, a learning setting with medical images is more challenging due to limited amounts of data, class imbalance and the presence of label noise for diagnosis tasks. Moreover, little attention is paid to how the images and meta-data are …
Supervisors:
Amelia Jiménez-Sánchez
Semester: Fall 2023
Tags: medical imaging, deep learning, machine learning, transfer learning, meta-learning
In relevance feedback, the choice of images to present to the user is a difficult problem, as a naïve approach may present too many similar images. The challenge addressed in this project is to ensure diversity (aka “one of each”) as well as relevance. A particularly interesting project for students interested in efficient algorithms.
Supervisors:
Björn Þór Jónsson
Semester: Fall 2021
Tags: multimedia analytics, scalability, diversity
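One common way to trade relevance against diversity is a greedy selection in the style of maximal marginal relevance. The sketch below is only an illustration of that general idea, not the method used in this project; the function `select_diverse`, the `trade_off` parameter and the toy feature vectors are all assumptions.

```python
def select_diverse(candidates, k, trade_off=0.5):
    """Greedily pick k items, balancing relevance against redundancy.

    candidates: list of (item_id, relevance, feature_vector)
    trade_off:  1.0 ranks by relevance only; lower values favour diversity.
    """
    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm = (sum(a * a for a in u) ** 0.5) * (sum(b * b for b in v) ** 0.5)
        return dot / norm if norm else 0.0

    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def score(candidate):
            # Penalise similarity to the closest item already selected.
            max_sim = max(
                (cosine(candidate[2], chosen[2]) for chosen in selected),
                default=0.0,
            )
            return trade_off * candidate[1] - (1 - trade_off) * max_sim

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return [item_id for item_id, _, _ in selected]


# Toy candidates: two near-identical cat images and one dog image.
candidates = [
    ("cat_1", 0.95, [1.0, 0.0]),
    ("cat_2", 0.94, [0.99, 0.05]),
    ("dog_1", 0.80, [0.0, 1.0]),
]
print(select_diverse(candidates, k=2))
```

With the default trade-off, the second near-identical cat image is skipped in favour of the less relevant but different dog image. Each greedy step compares every remaining candidate against every selected item, which is where efficient algorithms become important for large collections.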
In interactive learning systems, such as Exquisitor, the system presents potentially relevant images to users who label them as either relevant or irrelevant. Currently, Exquisitor uses a cluster-based index, which allows it to return results from a collection of 100 million images in 0.3 seconds. The goal of this project is to study the application of hash-based indexing to interactive learning …
Supervisors:
Björn Þór Jónsson
Semester: Fall 2021
Tags: multimedia analytics, diversity
The goal of this project is to enhance PhotoCube as a competitor for the Video Browser Showdown, an international video retrieval competition where competing systems are judged based on speed, accuracy and recall. We propose to develop new versions of the C++-based media server and JS-based media browser, to expand the data model to videos and improve the performance sufficiently to take part in …
Supervisors:
Björn Þór Jónsson
Semester: Fall 2021
Tags: video search, multimedia analytics, photocube
The index structure used for Exquisitor is eCP, a very scalable index for high-dimensional retrieval. However, while multimedia collections are typically updated constantly, the index is static. The goal of this project is to implement and compare approaches for index maintenance.
Supervisor: Björn Þór Jónsson
Semester: Fall 2020
Tags: index maintenance, high-dimensional indexing
In this project, we propose to implement a media server and media browser encapsulating a new data model for analysing media collections, called Multimedia Analytics Data Services (MADS). To validate the design, some scalability experiments should be performed.
Supervisor: Björn Þór Jónsson
Semester: Fall 2020
Tags: multimedia analytics, scalability
Students at ITU have made a prototype version of the Exquisitor system for the Android mobile phone! The system is missing some of Exquisitor’s advanced functionality, such as search and indexing, and the goal is to add and evaluate this functionality.
Supervisor: Björn Þór Jónsson
Semester: Fall 2020
Tags: multimedia analytics, android
We propose to develop a new prototype for analysing large multimedia collections in Virtual Reality, using the new Valve Indexes. There are many ways in which students can contribute to the project, including work on the user interface and the back-end, and later on running large-scale user experiments.
Supervisor: Björn Þór Jónsson
Semester: Fall 2020
Tags: virtual reality, multimedia analytics
The goal of this project is to integrate Exquisitor with other pieces of existing technology and turn it into a competitor for a live video retrieval competition. The project is suitable for 3-4 well-qualified MSc students.
The Video Browser Showdown (VBS) is a live competition for video search and retrieval, held at the International Conference on Multimedia Modeling (MMM). In VBS, the competition …
Supervisor: Björn Þór Jónsson
Semester: Fall 2019
The goal of this project is to ensure diversity in the relevance feedback results, to improve the quality of the user experience.
The project is suitable for 1-3 well-qualified MSc students.
In many creative tasks, the designer will know that some stock image is good for a design just by stumbling upon the image. This “Aha!” moment requires browsing thousands of images by categories. In other words, it requires …
Supervisor: Björn Þór Jónsson
Semester: Fall 2019
The goal of this project is to use the state of the art in eye tracking to design, implement and evaluate different eye-tracking interfaces for Exquisitor.
The project is suitable for 1-3 well-qualified MSc students.
Image and media collections are becoming a central information resource for a growing number of domains. This calls for very effective tools for interactive exploration of the contents of …
Supervisor: Björn Þór Jónsson
Semester: Fall 2019
The goal of this project is to build a prototype of the Exquisitor system for mobile devices.
The project is suitable for 1-3 well-qualified MSc students.
Image and media collections are becoming a central information resource for a growing number of domains. This calls for very effective tools for interactive exploration of the contents of those collections [1]. Based on past research results [2], we …
Supervisor: Björn Þór Jónsson
Semester: Fall 2019
Deep neural networks have been revolutionary in computer vision and publicly available image datasets played an important role in this success. Due to their size, neural networks require vast amounts of data for training. Yet when it comes to medical settings dataset sizes are very limited due to the cost of data annotation, privacy concerns, differences in imaging techniques, and others. In such …
Supervisors:
Dovile Juodelyte
Semester: Fall 2023
Tags: transfer learning, deep learning, medical imaging
GPUs offer massive computational power and parallelism through their Streaming Multiprocessors (SMs). Efficient GPU utilization is critical for maximizing performance and optimizing compute resource usage; it is measured using various metrics such as SMACT (SM Activity), SMOCC (SM Occupancy), and DRAMA (DRAM Active). These metrics provide insight into how effectively the GPU’s SMs and …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Spring 2024
Tags: machine learning systems, GPU Utilization, resource management, resource interference
Deep learning changed the landscape of many applications like computer vision, natural language processing, etc. On the other hand, deep learning requires gigantic computing power offered by modern hardware. As a result, data scientists rely on powerful hardware resources offered by shared high-performance computing (HPC) clusters or the cloud. Due to the long running times of deep learning …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2024
Tags: machine learning systems, checkpointing, scheduling, resource management
Workload collocation has been shown to be an effective method to reduce the hardware requirements for certain deep learning (DL) training tasks. On the other hand, there haven’t been many robust open-source implementations of schedulers that incorporate workload collocation on GPUs for DL.
BLOX is a framework that aims at standardizing the way we implement deep learning schedulers. In this …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2024
Tags: machine learning systems, scheduling, resource management, workload collocation
Today, there are many compute- and memory-hungry data-intensive workloads from big data analytics applications to deep learning. These workloads increasingly run on shared hardware resources, which requires building hardware resource managers that can both serve the needs of workloads and utilize hardware well. Predicting the resource utilization of applications can aid such resource managers …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2022
Tags: benchmarking, hardware resource consumption estimation, machine learning
Geospatial data refers to information that is tied to specific geographic locations on the Earth’s surface. It includes both the location coordinates (such as latitude, longitude, and, potentially, altitude) and attribute data associated with those locations. Geospatial data is categorized into two types: raster and vector.
Vector data represents geographic features as points, lines, and …
Supervisors:
Eleni Tzirita Zacharatou
Semester: Fall 2023
Tags: spatial data analysis, data science, data loading, GIS file formats, geospatial data
It is now common to query terabytes of spatial data. Several new frameworks extend distributed computing platforms such as Hadoop and Spark to enable them to efficiently process spatial queries by providing (1) mechanisms to efficiently store spatial data and index them; and (2) packages of built-in spatial operations for these platforms. Meanwhile, it is now common to accelerate Hadoop and Spark …
Supervisor: Iman Elghandour
Semester: Fall 2019
Spark assumes that it executes its applications on a homogeneous cluster of similar nodes. However, it is becoming common that in-house clusters have heterogeneous compute resources and it is good to exploit all of them in the most efficient way. The objective of this master thesis is to extend the Spark scheduler to be resource-aware and to efficiently schedule Spark tasks on all the …
Supervisor: Iman Elghandour
Semester: Fall 2019
Distributed computing platforms such as Hadoop and Spark focus on addressing the following challenges in large systems: (1) latency, (2) scalability, and (3) fault tolerance. Dedicating computing resources for each application executed by Spark can lead to a waste of resources. Unified distributed file systems such as Alluxio have provided a platform for computing results among simultaneously …
Supervisor: Iman Elghandour
Semester: Fall 2019
In the last few years, it became common to accelerate Hadoop and Spark by enabling them to execute tasks and jobs on accelerators such as GPUs and FPGAs. The objective of this master thesis is to study new approaches that efficiently predict the execution time of Spark tasks and jobs executed on GPUs. Part of the work will be to build a performance prediction model for GPUs, which can be built …
Supervisor: Iman Elghandour
Semester: Fall 2019
Do you like open-source systems? Would you like to experience working with an open-source system? Do you want to learn about big data research in practice? Then, this project is for you!
We have a number of thesis/project topics under the umbrella of Apache Wayang. Wayang is the first cross-platform framework that allows users to specify their task/query in a system-agnostic manner and Wayang will …
Supervisors:
Jorge Quiané
Semester: Fall 2022
Tags: big data, database, cross-platform data processing, open source, Apache
Do you have the feeling that only a few players are controlling the AI game? Would you like to make AI technology accessible to everyone? Then, come and help us to make Agora a reality!
We have a number of thesis/project topics under the umbrella of the Agora project. This project aims at building a unified data infrastructure for supporting AI ecosystems that bring together data, algorithms, …
Supervisors:
Jorge Quiané
Semester: Fall 2022
Tags: big data, AI ecosystems, compliant data processing, federated analytics, data markets
ITU is a partner of the Danish Student Cubesat Program, DISCOSAT. We launched our first satellite, DISCO-1, into Low Earth Orbit in April 2023 and we will launch a second, DISCO-2, in 2024. In this project you will gain experience with automating live satellite operations and communications, completing a ground station at the Rued Langaards Vej site for use with both satellites.
The DISCO satellite …
Supervisors:
Julian Priest
Semester: Fall 2023
Tags: satellite, ground station, software defined radio, automation, csp
The DISCO-2 satellite is an Earth observation satellite in collaboration with the Arctic Research Center in Aarhus and is designed to complement ground-based field studies in Greenland. The satellite instrument consists of two high-quality visible-light cameras and one infrared camera, as well as an attitude control system and a Coral TPU ML coprocessor.
In this project you will develop software to control …
Supervisors:
Julian Priest
Semester: Fall 2023
Tags: satellite, climate change, image processing, ML, csp, embedded, space
The DISCO-2 satellite will have accelerated machine learning capability based on the inclusion of a Coral TPU ML accelerator module. This will allow images taken by the satellite to be analysed on the satellite using a variety of ML models, with only select images sent back to Earth. This approach allows for more flexibility in image acquisition and saves downlink bandwidth, which is very constrained …
Supervisors:
Julian Priest, Robert Bayer
Semester: Fall 2023
Tags: satellite, ground station, software defined radio, automation, csp
The Danish Student Cubesat Program is an inter-university collaboration that will launch 3 cubesats into Low Earth Orbit over the next 4 years. The satellites will be designed, operated, programmed and built by students and the project offers an opportunity for Master’s students to take part in a live satellite project. ITU is partnering with Aarhus University on DISCOSAT2 which will be an …
Supervisors:
Sebastian Büttrich, Julian Priest
Semester: Fall 2021
Tags: Satellite, Cubesat, Image processing, Machine Learning, edge, constrained computing
ITU is a partner in the Danish Student Cubesat Program, DISCO, which will launch a series of small satellites into orbit, starting with DISCO 1 in 2023 and followed by DISCO2 in 2024.
As part of this project ITU is installing a satellite ground station with a range of antenna rotators on the roof of the Rued Langaards Vej building and the equipment has been purchased. The ground station will track the …
Supervisors:
Julian Priest
Semester: archive
Tags: Satellite, Image processing, Edge, Constrained Computing, Networks, Machine Learning, Embedded, Software Defined Radio
ITU is a partner in the Danish Student Cubesat Program DISCO, which will launch a series of small satellites into orbit, starting with DISCO 1 in 2023 and followed by DISCO2 in 2024.
ITU is developing a hi-res multi-camera imaging payload for Earth observation primarily in the Arctic. We are developing an on-satellite machine learning capability using an ML coprocessor, as well as models that can …
Supervisors:
Julian Priest
Semester: archive
Tags: Satellite, Image processing, Edge, Constrained Computing, Networks, Machine Learning, Embedded, Radio
The idea behind “15-minute cities” is that within a short walk or bike ride people should have access to all necessary facilities that constitute the essence of urban living, such as parks, shops, cafes, schools, hospitals. Initiatives to transform cities according to this paradigm are currently being implemented across the world, in an attempt to make urban spaces more liveable, …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: spatial data analysis, visualisation, Python, OSM data
As a response to increased traffic congestion and the need to reduce carbon emissions, cities consider ways to modernise, build and extend transit systems. Transit network design solutions can benefit from analysing the large amount of crowd-sourced location data available, which provides valuable insights into population mobility needs. Designing efficient metro lines, bicycle paths, or bus …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: spatial data analysis, network design, Python, OSM data
The idea behind “15-minute cities” is that within a short walk or bike ride people should have access to all necessary facilities that constitute the essence of urban living, such as parks, shops, cafes, schools, and hospitals. Initiatives to transform cities according to this paradigm are currently being implemented across the world, in an attempt to make urban spaces more liveable, …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: spatial data analysis, graph summaries, Python, OSM data
Musical genres are inherently ambiguous and difficult to define. Even more so is the task of establishing how genres relate to one another. Yet, genre is perhaps the most common and effective way of describing musical experience. The number of possible genre classifications (e.g. Spotify has over 4000 genre tags, LastFM over 500,000 tags) has made the idea of manually creating music taxonomies …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: scalable algorithms, hyperbolic embeddings, Python, Spotify data
The integration of wind power in the energy grid is dependent on accurate production forecasts. The power output curves between neighbouring wind farms are often correlated temporally and spatially, but currently, these spatiotemporal dependencies are under-utilised in prediction models. Graph neural networks allow for modelling these dependencies. In this project the student will implement a …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: spatial data analysis, graph neural networks, Python, timeseries data
Open-source JavaScript applications, such as browser-based web games, are typically developed by individual software engineers or small teams. These teams often have limited financial resources to use commercial logging frameworks and cloud-based analysis systems and may also lack knowledge and expertise in logging. However, log analysis is highly important for many reasons: monitoring application …
Supervisors:
Martin Hentschel
Semester: Fall 2024
Tags: open source, performance
The Deconstructed Cloud Databases project stems from a simple question: What are the minimum components required to build a data management system in the cloud? Our motivation for this project is based on the idea that reducing a system to its minimum set of components makes it easier to build, test, and maintain cloud data management systems. This approach requires less engineering effort, …
Supervisors:
Martin Hentschel
Semester: Fall 2024
Tags: data management, performance, benchmarking, hacking
The Deconstructed Cloud Databases project stems from a simple question: What are the minimum components required to build a data management system in the cloud? Our motivation for this project is based on the idea that reducing a system to its minimum set of components makes it easier to build, test, and maintain cloud data management systems. This approach requires less engineering effort, …
Supervisors:
Martin Hentschel
Semester: Fall 2024
Tags: data management, security, open source, open standards
The emergence of computational storage platforms like Delilah has transformed the data storage landscape, enabling new computing paradigms and facilitating data-intensive applications. Delilah is a cutting-edge computational storage platform developed by the IT University of Copenhagen. It runs on the Daisy OpenSSD and exposes an asynchronous computational storage protocol to the host, facilitated …
Supervisors:
Niclas Hedam, Philippe Bonnet
Semester: Fall 2023
Tags: Open Source, Testing, Computational Storage, Hardware, FPGA
Hash-based authentication is an effective way of protecting passwords in software systems. Hashing obscures the original passwords, such that they cannot be recovered in case of a database breach. However, as demonstrated by our paper titled Hash-Based Authentication Revisited in the Age of High-Performance Computers, the practical security depends on which hashing algorithm is used as well as the …
Supervisors:
Pınar Tözün, Niclas Hedam
Tags: benchmarking, hashing, security, GPU, hacking, HPC
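As a rough illustration of the hash-based authentication described in the entry above, the following minimal sketch stores a salted hash instead of the password and verifies a login attempt against it. The choice of PBKDF2-HMAC-SHA256, the iteration count, the salt length, and the function names are assumptions made for this example; they are not taken from the paper or the project.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Assumed, illustrative parameters; not recommendations from the project.
ITERATIONS = 100_000
SALT_BYTES = 16


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); only these two values would be stored, never the password."""
    if salt is None:
        salt = os.urandom(SALT_BYTES)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the attempt with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

The cost of one guess for an attacker is governed by the hashing algorithm and its work factor (here `ITERATIONS`), which is why benchmarking these choices on fast hardware matters.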
With the recent hunger for being “data driven”, many organizations are eager to integrate ML into their decision-making process. Unfortunately, competent data scientists are still relatively scarce, and manual model development cannot keep up with the demand for magic AI solutions. This is no less true when it comes to forecasting. Knowing the future is extremely handy when making …
Supervisors:
Niels Ørbæk Chemnitz
Semester: Spring 2021
Tags: AutoML, ML, Forecasting, Energy Data, Smart Meters, Python, Data Science, Time Series Data
How much do our smart meter readings disclose about us? Can we disentangle the oven from the washing machine from the kettle? Can we identify demographics and behavior patterns from the stream of electricity data?
Most Danish homes are now equipped with so-called “smart meters” - networked electricity meters that report consumption and load at a much higher rate than conventional meters. …
Supervisors:
Niels Ørbæk Chemnitz
Semester: Spring 2021
Tags: NILM, ML, IoT, Energy Data, Smart Meters, Python, Data Science, Time Series Data
Outlier detection is carried out when the information is stored at the server. However, with the new IoT computational capabilities, outlier detection can be developed locally. Therefore, it is necessary to know how much RAM/Flash is needed for this step and which IoT brands can handle it. This project is divided into two parts. The first is implementing light-heavy ML algorithms in single points …
Supervisor: Paul Rosero
Semester: Spring 2022
Tags: data analysis, IoT, Python, Embedded systems
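To give a feel for what local outlier detection on a memory-constrained device could look like, here is a minimal sketch that keeps only three numbers of state, using Welford's running mean/variance update. The z-score rule, the threshold, the warm-up length, and the class name are assumptions chosen for illustration; the project itself may use entirely different algorithms.

```python
import math


class StreamingOutlierDetector:
    """Flag readings far from the running mean using constant memory (Welford's algorithm)."""

    def __init__(self, threshold: float = 3.0, warmup: int = 5) -> None:
        self.threshold = threshold  # number of standard deviations that counts as an outlier
        self.warmup = warmup        # readings to observe before flagging anything
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations from the mean

    def update(self, x: float) -> bool:
        """Return True if x is an outlier w.r.t. earlier readings, then absorb x into the state."""
        is_outlier = False
        if self.count >= self.warmup:
            std = math.sqrt(self.m2 / (self.count - 1))
            is_outlier = std > 0 and abs(x - self.mean) > self.threshold * std
        # Welford update: no history of past readings is stored.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)
        return is_outlier


detector = StreamingOutlierDetector()
readings = [20.0, 20.5, 19.8, 20.2, 20.1, 19.9, 35.0, 20.3]  # made-up sensor values
flags = [detector.update(r) for r in readings]
```

Because the state is a counter and two floats, the RAM footprint is fixed regardless of how long the device runs, which is the kind of budget question the project asks.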
TinyML is a new trend to deploy deep learning on tiny devices. Therefore, it is necessary to deploy several applications to understand the challenges and opportunities that tinyML brings us. In this scenario, any idea involving embedded computer vision, voice recognition, or sensors is welcome.
Supervisor: Paul Rosero
Semester: Spring 2022
Tags: data analysis, IoT, Python, Embedded systems, Computer vision, Voice recognition
Disaggregated storage has gained acceptance in data centers. With disaggregated storage, storage resources are decoupled from compute resources, and made available through fabric. We are particularly interested in storage resources composed of an ARM-based smartNIC, which acts as fabric target as well as storage controller for a collection of SSDs.
The performance characteristics of the storage …
Supervisors:
Philippe Bonnet
Semester: Fall 2021
Tags: benchmarking, ARM, SoC, fabric, SSD, computational storage
Reproducibility is a cornerstone of the scientific method. There are systems available today to build reproducible and sharable data and analysis pipelines including workflow engines (e.g., GWL, Nextflow), package managers (e.g., bioconda), and container systems (e.g., Singularity). However, validating their executions on high-performance computers remains an open issue. Indeed, there are many …
Supervisors:
Philippe Bonnet
Semester: Fall 2020
Tags: ML, reproducibility, workflow, HPC
Reproducibility is a cornerstone of the scientific method. It is also a core element of compliance requirements for sensitive equipment, e.g., audit trails for medical equipment. Often, a prerequisite for computational reproducibility is the availability of software and data. However, this is problematic for edge devices whose goal is to reduce the amount of data transferred to the backend. On …
Supervisors:
Philippe Bonnet
Semester: Fall 2020
Tags: reproducibility, edge
SSDs are not a uniform class of devices. The SSD landscape is now quite diverse, with many new-generation, much faster and customizable devices available. Understanding their performance characteristics is crucial when determining what their impact on the data systems software stack should be. In this project, we would like to characterize the performance of a broad range of such SSDs (e.g., ZNS, Samsung Z-SSD, …
Supervisors:
Pınar Tözün, Philippe Bonnet
Semester: Fall 2020
Tags: SSD, benchmarking
The Things Network Stack v3 for LoRaWAN is an open source LoRaWAN network stack suitable for large, global and geo-distributed public and private networks as well as smaller networks. The architecture follows the LoRaWAN Network Reference Model for standards compliancy and interoperability. - https://github.com/TheThingsNetwork/lorawan-stack This stack, currently in pre-rollout testing, however …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Offloading processing to storage is a means to avoid data movement and thus deal efficiently with very large volumes of stored data. In the 90s, there were pioneering efforts to develop Processing-in-Memory as well as Active Disks. We are considering data stored on Open-Channel SSDs with a programmable storage controller (i.e., a Linux-based ARM processor) integrated into a network switch (e.g., …
Supervisor: Philippe Bonnet
Semester: Fall 2019
Characterize the performance of commercial database systems on an NVIDIA Titan GPU, or characterize the performance of DB2 PureScale on a cluster equipped with shared storage with a range of different benchmarks. Design and conduct experiments with a range of tuning strategies to measure their impact on performance and reliability.
Supervisor: Philippe Bonnet
Semester: Fall 2019
In the context of the Orkney Cloud project, we are preparing the deployment of a decentralized cloud infrastructure on the archipelago. The infrastructure is composed of a collection of Pods (point of delivery) and a wireless core (5G + Wifi). Each Pod is equipped with storage, computing and communication components (so that it is connected to the core and to local endpoints). Each Pod is powered …
Supervisor: Philippe Bonnet
Semester: Fall 2019
New types of networks such as LoRaWAN and Sigfox enable us to deploy inexpensive mobile Air Quality Sensors, e.g. on bicycles or vehicles, boats or trains, and help map and understand urban pollution. Such data would be valuable for e.g. the international citizen science project Luftdaten https://luftdaten.info/en/home-en/ in which we are participating. We have an ongoing collaboration with the …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Can we combine satellite data with terrestrial and marine sensor data to study their correlation and improve our understanding of environmental processes, in both urban and agricultural/rural contexts? Potential collaboration with several organizations in Kenya and the Orkney Islands.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Not far from the IT University, citizens are building a “Small Smart City”, aiming to measure and analyze e.g. water and resource consumption. Help build the Small Smart City!
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Survey of potential sensor modalities (sounds, ultrasounds, vibrations) and related work (e.g., wind turbines). Starting with sound: piezo contact mics/transducers, MEMS sensors. Characterization of state based on known signatures (classification problem). Characterization of state transitions (HMM). Experimentation on coffee machine/blender/3D printer at PitLab. 1..k sensors; local/cloud-based …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Novo Nordisk
New forms of Solid State Drives have interesting characteristics in terms of performance (10 to 100x faster than previous generations of SSDs) and in terms of functionalities (SSDs can now suspend the execution of write or erase operations to minimize read latency). The performance characteristics of these devices are not well understood yet. The topic of this thesis is to design and conduct …
Supervisor: Philippe Bonnet
Semester: Fall 2019
Survey of potential sensor modalities (IR temp sensor, thermal imager) and related work. Starting with consumer USB cams or phone cams to generate series of images. Characterization of state based on known signatures (classification problem). Characterization of state transitions (HMM). Experimentation on coffee machine/blender/3D printer at PitLab. 1..k sensors; local/cloud-based processing. …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Novo Nordisk
Most data collection in IoT does not critically depend on latency or speed from data collection to data analytics. Occasionally, though, we meet tasks that would benefit from near-real-time features, such as collection of wave and tidal dynamics around marine energy infrastructures. This project explores the limits of speed by bringing together a LoRa PHY, a LoRaWAN gateway, LoRaWAN stack, ultrafast …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
In collaboration with the IoT Lab at the Computer Science Dept at Kathmandu University, Nepal, we are developing a potential service for tracking trekkers, i.e. offering a security service for tourists trekking the Himalayas, in particular Mt. Everest. This service depends critically on having a robust hardware component, the actual GPS/GNSS tracker. Requirements with respect to battery life, …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Wind turbine electricity production data is sensitive for Energinet (and for the wind turbine producers). Energinet would like to publish wind turbine electricity production data sets that can be used to train relevant models and to develop innovative applications, without giving away sensitive data. The goal of the project is to explore various data publishing methods for that purpose.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, data publication
Energinet has a model that describes the electricity production of a given wind turbine given wind conditions. The current model based on kNN is trained with DMI weather data and historical electricity production data for the wind turbine. The goal of the project is to improve the current model with lifelong learning, extended weather data and different models for a range of different wind …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Data Analysis
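For readers unfamiliar with kNN, the sketch below shows the general idea behind a nearest-neighbour production model: predict the output for new wind conditions as the average of the most similar historical observations. It is not Energinet's model; the single wind-speed feature, the function name, and the sample values are all made up for illustration.

```python
from typing import List, Tuple


def knn_predict(history: List[Tuple[float, float]], wind_speed: float, k: int = 3) -> float:
    """Predict power as the mean of the k historical samples with the closest wind speed."""
    if not history:
        raise ValueError("history must not be empty")
    # Sort (wind_speed, power) samples by distance to the query and keep the k nearest.
    nearest = sorted(history, key=lambda sample: abs(sample[0] - wind_speed))[:k]
    return sum(power for _, power in nearest) / len(nearest)


# Hypothetical (wind speed in m/s, power in kW) pairs; not real turbine data.
history = [(3.0, 50.0), (5.0, 300.0), (7.0, 900.0), (9.0, 1600.0), (11.0, 2000.0)]
print(knn_predict(history, 8.0, k=2))
```

A real model would use more features (for example wind direction and temperature from the weather data) and a distance measure over all of them.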
The goal of the project is to explore the accuracy of electricity production predictions based on historical data and weather predictions. This may be tackled as a sequence prediction problem using recurrent neural networks. The long-term goal is to incorporate wind turbines in the reserve market for electricity.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Forecasting, Machine Learning, Deep Learning
The goal of the project is to explore new ways of gathering data about wind turbines as well as local wind/weather conditions. To this end, sound/vibration-based and/or image-based instrumentation as well as innovative experiments such as balloons and lightweight weather stations might be considered.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Instrumentation, Sensors
Field Programmable Gate Arrays are now an integral part of public cloud infrastructures. You can for example run customized FPGA instances on AWS. A project focuses on FPGA-based hardware acceleration at the level of an SSD Flash Translation Layer, at the level of a Database storage manager or at the level of the database client. You will be able to experiment with FPGAs in the lab and on AWS. We …
Supervisor: Philippe Bonnet
Semester: Fall 2020
Tags: FPGA, SSD, computational storage
Field Programmable Gate Arrays are now an integral part of public cloud infrastructures. You can for example run customized FPGA instances on AWS. A project focuses on FPGA-based hardware acceleration at the level of an SSD Flash Translation Layer, at the level of a Database storage manager or at the level of the database client. You will be able to experiment with FPGAs in the lab and on AWS. We …
Supervisor: Philippe Bonnet
Semester: Fall 2019
Tags: FPGA, SSD
GPUs offer massive computational power and parallelism through their Streaming Multiprocessors (SMs). Efficient GPU utilization is critical for maximizing performance and optimizing compute resource usage, which is measured using various metrics such as SMACT (SM Activity), SMOCC (SM Occupancy), and DRAMA (DRAM Active). These metrics provide insight into how effectively the GPU’s SMs and …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Spring 2024
Tags: machine learning systems, GPU Utilization, resource management, resource interference
The work on running data-intensive applications on very powerful, expensive, and power-hungry server hardware is very popular thanks to the growing size of data centers and high-performance computing (HPC) platforms. However, with the rise of new generation internet of things (IoT) applications, the lower-power and lower-budget hardware devices that specifically target IoT, the edge platforms, …
Supervisors:
Pınar Tözün
Semester: Fall 2024
Tags: edge, benchmarking, data-intensive applications, resource-constrained hardware
Observing how well machine learning systems utilize hardware resources is a crucial preliminary step to improve system performance and reduce hardware waste. To make such observations, one has to collect a lot of monitoring data on hardware behavior through experiments. In our group, we have recently built a framework to aid the management of such monitoring data efficiently, called Resource-Aware …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: benchmarking, data management, data visualization
Deep learning has changed the landscape of many applications like computer vision, natural language processing, etc. On the other hand, deep learning requires gigantic computing power offered by modern hardware. As a result, data scientists rely on powerful hardware resources offered by shared high-performance computing (HPC) clusters or the cloud. Due to the long-running times of deep learning …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2024
Tags: machine learning systems, checkpointing, scheduling, resource management
Workload collocation has been shown to be an effective method to reduce the hardware requirements for certain deep learning (DL) training tasks. On the other hand, there haven’t been many robust open-source implementations of schedulers that incorporate workload collocation on GPUs for DL.
BLOX is a framework that aims at standardizing the way we implement deep learning schedulers. In this …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2024
Tags: machine learning systems, scheduling, resource management, workload collocation
Deep convolutional networks are able to learn representations of images, scoring well in tasks such as image classification and object detection. During model training, these networks have the ability to process different input sizes without requiring changes to their architecture. In this project, we would like to investigate the effects that changing input sizes has on these kinds of models. We …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: data attribution, deep learning, machine learning, resource efficiency
Today’s foundation models are trained on vast amounts of data. The quality and size of this data have a huge impact on the accuracy of these models. Selecting the right amount and variety of data for a given task, however, is a resource-intensive process. In this project, we would like to investigate various state-of-the-art data selection mechanisms from a hardware requirements and …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: data selection, deep learning, machine learning, resource efficiency
Traditionally, solid-state drives (SSDs) do not give users the ability to control data placement on the SSD. This often leads to suboptimal performance and shortens SSD lifetime, since SSDs internally don’t allow in-place updates. The updated disk pages are written elsewhere and the old versions have to be garbage collected. This poses problems if data with different lifetimes and …
Supervisors:
Pınar Tözün
Semester: Fall 2024
Tags: SSDs, data management systems, modern storage
In this project, we would specifically like to quantify the data movement savings of applying techniques like compression and model-based data filtering in the context of resource-constrained hardware and edge/IoT applications.
Today, many data sources are small, low-powered, and hardware-constrained devices such as mobile phones, wearable or self-driving smart platforms, etc. Processing the data on …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, ML model updates, tinyML
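One simple way to put a number on data movement savings from compression is to compare payload sizes before and after compressing. The sketch below does this with Python's standard `zlib` module; the function name, the compression level, and the sample payload are assumptions for illustration, and the project may measure savings differently (for example including energy or latency).

```python
import zlib


def movement_savings(payload: bytes, level: int = 6) -> float:
    """Return the fraction of bytes saved by compressing payload before sending it.

    The result can be negative for data that does not compress well.
    """
    if not payload:
        return 0.0
    compressed = zlib.compress(payload, level)
    return 1.0 - len(compressed) / len(payload)


# Hypothetical batch of repetitive sensor readings, as an edge device might buffer them.
readings = b"21.5,21.5,21.6,21.5;" * 200
print(f"bytes saved: {movement_savings(readings):.1%}")
```

On a constrained device, the bytes saved have to be weighed against the CPU time and memory that compression itself consumes.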
One of the key challenges with enabling efficient machine learning on resource-constrained devices is keeping the machine learning models deployed on these devices up-to-date without frequent retraining. This requires exploring the impact of different model update mechanisms at the edge.
This project would be suitable as a standalone project or BSc or MSc thesis at ITU during Fall 2024. If you are …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, ML model updates, tinyML
Enabling efficient data processing and machine learning on resource-constrained devices poses many challenges. One is fitting the models into the restrictive memory and compute resources of these devices. In this project, first, we would like to explore the landscape of foundational, generative-AI, language, etc. models with respect to their size and compute needs to understand what could be a fit …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, ML model updates, tinyML
Today, many data sources are small, low-powered, and hardware-constrained devices such as mobile phones, wearable or self-driving smart platforms, etc. Edge computing is a broad term that refers to computations performed on such edge devices. It becomes increasingly important to enable techniques that get more value out of data at the edge rather than always sending the data to a remote and more …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, resource management, tinyML
Observing how well data-intensive systems utilize hardware resources is a crucial preliminary step to improve system performance and reduce hardware waste. To make such observations, one has to collect a lot of monitoring data on hardware behavior through experiments. In our group, we have recently built a framework to aid the management of such monitoring data efficiently, called Resource-Aware …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2023
Tags: benchmarking, data management, data visualization
It is common to process data to clean it, filter it, restructure it, get metadata out of it, etc. before feeding the data into a data analysis or machine learning pipeline. There are many tools and libraries out there to aid with this process with different strengths and functionality (DALI, RAPIDS, HoloClean, DAPHNE, DuckDB, etc.). In this project, we would like to analyze pros/cons of some of …
Supervisors:
Pınar Tözün
Semester: Fall 2022
Tags: data preprocessing libraries, heterogeneous hardware, machine learning
In the past decade, the data management community has focused on main-memory systems or main-memory-optimized systems. This focus has put the commodity memory hierarchy (DRAM and processor caches) at the center when it comes to workload characterization studies. Today, with the evolution of persistent storage technologies such as NVRAM (persistent memory solution of Intel) and NVMe SSDs, data systems …
Supervisors:
Pınar Tözün
Semester: Fall 2022
Tags: workload characterization, tracing, modern storage, data-intensive systems
DAPHNE is an EU project that aims at building a data system targeting integrated data analysis pipelines across data management and processing, high-performance computing (HPC), and machine learning (ML) training and scoring. The project had its first code release back in March. This project aims at adding a profiling infrastructure for the DAPHNE codebase. If you are interested in learning about …
Supervisors:
Pınar Tözün
Semester: Fall 2022
Tags: integrated data analysis pipelines, profiling big data systems
State-of-the-art machine learning models are known to be compute- and power-hungry. On the other hand, modern servers come equipped with really powerful CPU-GPU co-processors. Not all machine learning models are able to use all the available hardware resources on such servers.
Workload collocation is a mechanism to increase hardware utilization when a single workload is not able to utilize all the …
Supervisors:
Pınar Tözün
Semester: Fall 2022
Tags: benchmarking, workload collocation, machine learning
Today, there are many compute- and memory-hungry data-intensive workloads from big data analytics applications to deep learning. These workloads increasingly run on shared hardware resources, which requires building hardware resource managers that can both serve the needs of workloads and utilize hardware well. Predicting the resource utilization of applications can aid such resource managers …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2022
Tags: benchmarking, hardware resource consumption estimation, machine learning
NVMe SSDs are not a uniform class of devices. The IO software stack is not uniform either. Understanding the performance characteristics of new-generation SSDs and the impact of the IO stack on their performance is crucial when determining how to design data-intensive systems. In this project, we would like to characterize the performance of a range of NVMe SSDs (e.g., Samsung Z-SSD, Intel Optane, …
Supervisors:
Pınar Tözün
Semester: Fall 2021
Tags: SSD, benchmarking
A data science infrastructure orchestrates the execution of widely used machine learning frameworks (e.g., TensorFlow, PyTorch) on a heterogeneous set of processing units (e.g., CPU, GPU, TPU, FPGA) while powering an increasingly diverse and complex range of applications (e.g., fraud detection, healthcare, virtual assistance, automatic driving). Understanding the resource consumption …
Supervisor: Pınar Tözün
Semester: Fall 2021
Tags: benchmarking, hardware resource consumption, deep learning frameworks
The variety and complexity of data-intensive applications and systems have been increasing drastically over the past decade. Tasks from a SQL-based big data analytics request running on Apache Spark can be very different from tasks from deep learning training using the TensorFlow framework. Nevertheless, these data-intensive applications increasingly run on shared hardware resources in data centers or …
Supervisor: Pınar Tözün
Semester: Fall 2020
Tags: hardware-software co-design
Improvements in modern processor hardware do not automatically enable more complex and higher quality games, animations, and virtual reality applications. A paradigm shift is necessary when it comes to how we develop these applications in order to exploit the resources of modern hardware (i.e., main memory, multicores) effectively. Unity Technologies has recently developed Data-Oriented …
Supervisor: Pınar Tözün
Semester: Fall 2020
Tags: memory hierarchy, concurrency
Spreading the computation of similar concurrent tasks that have a large instruction footprint over multiple cores via thread migration is shown to improve the instruction cache utilization drastically since it allows instruction re-use across the concurrent tasks. However, thread migrations are costly due to the context switching overhead. To reduce this overhead, recent work mainly proposed …
Supervisor: Pınar Tözün
Semester: Fall 2019
The computer architecture community is moving toward commoditization of hardware specialization instead of general purpose CPUs and more agile hardware development instead of years-long production cycles to enable faster, more energy-efficient, and more cost-effective hardware/software co-designs. This will lead to a disruption in the way we design and maintain the emerging data management systems …
Supervisor: Pınar Tözün
Semester: Fall 2019
Apache SystemML is an open-source platform to run machine learning tasks efficiently thanks to the hardware-conscious query compilation techniques it adopts. It can be run standalone or on top of Apache Spark. It is considered to be state-of-the-art when running machine learning tasks (e.g., in ACM SIGMOD 2017, there were ~5 papers that used SystemML as a comparison point). This project aims at …
Supervisor: Pınar Tözün
Semester: Fall 2019
The popularity of large-scale real-time analytics applications (real-time inventory/pricing, recommendations from mobile apps, fraud detection, risk analysis, IoT, etc.) keeps rising. These applications require distributed data management systems that can handle fast concurrent transactions (OLTP) and analytics on the recent data. Some of them even need to run analytical queries (OLAP) as part of …
Supervisor: Pınar Tözün
Semester: Fall 2019
The Transaction Processing Performance Council (TPC) is a non-profit IT organization founded to define database benchmarks and disseminate objective, verifiable performance data to the industry. TPC has standardized several new benchmarks (e.g., TPCx-HS and TPCx-BB) in recent years. Older popular benchmarks, like TPC-C (representing high-performance transaction processing) and TPC-H (representing …
Supervisor: Pınar Tözün
Semester: Fall 2019
In this project, we would specifically like to quantify the data movement savings of applying techniques like compression and model-based data filtering in the context of resource-constrained hardware and edge/IoT applications.
Today, many data sources are small, low-powered, and hardware-constrained devices such as mobile phones, wearable or self-driving smart platforms, etc. Processing the data on …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, ML model updates, tinyML
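As a rough illustration of what "quantifying data movement savings" from compression could look like for the project above, here is a minimal Python sketch. It assumes a made-up sensor trace and uses general-purpose `zlib` compression as a stand-in; the project description does not say which compression or filtering techniques would actually be studied, and the function name `transfer_savings` is hypothetical.

```python
import struct
import zlib


def transfer_savings(payload: bytes, level: int = 6):
    """Return (raw_bytes, compressed_bytes, fraction_saved) for a non-empty payload."""
    if not payload:
        raise ValueError("payload must not be empty")
    compressed = zlib.compress(payload, level)
    raw, packed = len(payload), len(compressed)
    return raw, packed, 1.0 - packed / raw


# Hypothetical sensor trace: 1,000 slowly varying 16-bit readings.
readings = [200 + (i // 50) for i in range(1000)]
payload = struct.pack(f"<{len(readings)}h", *readings)

raw, packed, saved = transfer_savings(payload)
print(f"raw={raw} B, compressed={packed} B, saved={saved:.1%}")
```

On a real device, the bytes saved would have to be weighed against the CPU time and energy spent compressing, which is the kind of trade-off the project proposes to measure.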
One of the key challenges with enabling efficient machine learning on resource-constrained devices is keeping the machine learning models deployed on these devices up-to-date without frequent retraining. This requires exploring the impact of different model update mechanisms at the edge.
This project would be suitable as a standalone project or BSc or MSc thesis at ITU during Fall 2024. If you are …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, ML model updates, tinyML
Enabling efficient data processing and machine learning on resource-constrained devices poses many challenges. One is fitting the models into the restrictive memory and compute resources of these devices. In this project, first, we would like to explore the landscape of models (foundational, generative-AI, language, etc.) with respect to their size and compute needs to understand what could be a fit …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, ML model updates, tinyML
Today, many data sources are small, low-powered, and hardware-constrained devices such as mobile phones, wearable or self-driving smart platforms, etc. Edge computing is a broad term that refers to computations performed on such edge devices. It becomes increasingly important to enable techniques that get more value out of data at the edge rather than always sending the data to a remote and more …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, resource management, tinyML
The DISCO-2 satellite will have accelerated machine learning capability based on the inclusion of a Coral TPU ML accelerator module. This will allow images taken by the satellite to be analysed on the satellite using a variety of ML models, with only select images sent back to Earth. This approach allows for more flexibility in image acquisition and saves downlink bandwidth, which is very constrained …
Supervisors:
Julian Priest, Robert Bayer
Semester: Fall 2023
Tags: satellite, ground station, software defined radio, automation, csp
This is not a single project, but rather a larger cluster of potential projects in the field of what could be summarized as extreme networking.
The networks we are interested in are typically wireless, and can be extreme in different senses of the word:
distance - hundreds of kilometers terrestrial, 10,000s of km to satellite
latency - sub-ms latencies
autonomy - off-grid
quality - extreme remote …
Supervisors:
Sebastian Büttrich
Semester: Fall 2024
Tags: network, IoT, LoRa, LoRaWAN, satellites
LoRa is a long-range, low-bandwidth networking protocol widely used in Internet of Things projects, sensor networks, low-power, low-cost and embedded systems. LoRa’s encoding scheme allows for extremely long-distance communication with low power usage and small, simple antennas. This combination of features has made it attractive to small satellite operators flying cubesats, and LoRa is now …
Supervisors:
Sebastian Büttrich
Semester: Fall 2024
Tags: satellites, LoRa, cubesat, IoT, embedded, electronics
LoRa is a long-range, low-bandwidth networking protocol widely used in Internet of Things projects, sensor networks, low-power, low-cost and embedded systems. LoRa’s encoding scheme allows for extremely long-distance communication with low power usage and small, simple antennas. This combination of features has made it attractive to small satellite operators flying cubesats, and LoRa is now …
Supervisors:
Sebastian Büttrich
Semester: Fall 2024
Tags: IoT, LoRa, LoRaWAN, satellites
Optical fiber is the backbone of the internet’s communication, e.g. in the form of submarine fiber cables. It can also be employed as a sensor device, by means of combined opto-acoustic methods such as Distributed Acoustic Sensing (DAS) or State of Polarisation (SoP) sensing. Fiber is capable of sensing all kinds of vibrational/acoustic events, from animal sounds through seismic activity to …
Supervisors:
Sebastian Büttrich
Semester: Fall 2024
Tags: fiber, acoustics, audio, machine learning, DAS, SOP
The Danish Student Cubesat Program is an inter-university collaboration that will launch 3 cubesats into Low Earth Orbit over the next 4 years. The satellites will be designed, built, programmed and operated by students, and the project offers an opportunity for Master’s students to take part in a live satellite project. ITU is partnering with Aarhus University on DISCOSAT2 which will be an …
Supervisors:
Sebastian Büttrich, Julian Priest
Semester: Fall 2021
Tags: Satellite, Cubesat, Image processing, Machine Learning, edge, constrained computing
Invasive bird species can be a serious problem in cities, towns and in agriculture. The common pigeon is a very unwelcome guest on many balconies, roofs and terraces. Conventional scarecrows often show no effect, as these birds are known to be quite intelligent and capable of learning fast. The idea is to build a sensor/camera-enhanced scarecrow that - can recognize birds present within its …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: IoT, ML, machineLearning, sensors, security
For this project, you would be working with a partner company who are looking to re-establish wood as a building material for sustainable architecture, and thus are using sensors for quality control - to detect damages and deterioration in buildings. Wood such as timber may be analyzed by non-intrusive acoustic impact testing and subsequent waveform analysis, and the expectation is that machine …
Supervisors:
Sebastian Büttrich
Semester: Fall 2021
Tags: IoT, sensors, machine learning, acoustics
In LoRaWAN networks such as The Things Network, long-distance transmissions, well beyond the limitations of line of sight in terrestrial geometry, are frequently observed. Tropospheric effects are seen as responsible for bending or guiding radio waves around the Earth’s curvature. As an example, under the right weather conditions, the LoRaWAN gateway at ITU may collect packets from northern Germany, …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: IoT, LoRaWAN, LPWAN, satellite, networks, troposphere, weather
Recent progress in LoRaWAN development has made a new generation of satellite communications offerings available to IoT devices. In these, the LoRaWAN gateway is satellite-borne and collects data from small, inexpensive ground stations. So far, this is predominantly seen as a means of communication for remote sensor data, e.g. in agriculture, logistics or wildlife monitoring. However, one can …
Supervisors:
Sebastian Büttrich
Semester: Fall 2021
Tags: IoT, LoRaWAN, LPWAN, satellite, networks, edge, security
There is currently a lot of progress in really small, yet powerful visual machine learning / computer vision, on hardware like the OpenMV Cam H7, Arduino Portenta Vision Shield, Luxonis LUX-ESP32, Himax WE-I Plus, Arducam Pico4ML, and Raspberry Pi, and on software platforms such as TinyML or OpenMV IDE.
While many popular use cases stem from fields like traffic analysis, wildlife monitoring, we …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: IoT, sensors, machine learning, computer vision
The Things Network Stack v3 for LoRaWAN is an open source LoRaWAN network stack suitable for large, global and geo-distributed public and private networks as well as smaller networks. The architecture follows the LoRaWAN Network Reference Model for standards compliance and interoperability. - https://github.com/TheThingsNetwork/lorawan-stack This stack, currently in pre-rollout testing, however …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
New types of networks such as LoRaWAN and Sigfox enable us to deploy inexpensive mobile Air Quality Sensors, e.g. on bicycles or vehicles, boats or trains, and help map and understand urban pollution. Such data would be valuable for e.g. the international citizen science project Luftdaten https://luftdaten.info/en/home-en/ in which we are participating. We have an ongoing collaboration with the …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Can we combine satellite data with terrestrial and marine sensor data to study their correlation and improve our understanding of environmental processes, in both urban and agricultural/rural contexts? Potential collaboration with several organizations in Kenya and the Orkney Islands.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Not far from the IT University, citizens are building a “Small Smart City”, aiming to measure and analyze e.g. water and resource consumption. Help build the Small Smart City!
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Survey of potential sensor modalities (sounds, ultrasounds, vibrations) and related work (e.g., wind turbines)
Starting with sound: Piezo contact mics/transducers, MEMS sensors
Characterization of state based on known signatures (classification problem)
Characterization of state transitions (HMM)
Experimentation on Coffee machine/Blender/3D printer at PitLab
1..k sensors; Local/cloud-based …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Novo Nordisk
Survey of potential sensor modalities (IR temp sensor, thermal imager) and related work
Starting with consumer USB cams or phone cams to generate series of images
Characterization of state based on known signatures (classification problem)
Characterization of state transitions (HMM)
Experimentation on Coffee machine/Blender/3D printer at PitLab
1..k sensors; Local/cloud-based processing. …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Novo Nordisk
Most data collection in IoT does not critically depend on latency or speed from data collection to data analytics. Occasionally though we meet tasks that would benefit from near-realtime features, such as collection of wave and tidal dynamics around marine energy infrastructures. This project explores the limits of speed by bringing together a LoRa PHY, a LoRaWAN gateway, LoRaWAN stack, ultrafast …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
In collaboration with the IoT Lab at Computer Science Dept at Kathmandu University, Nepal, we are developing a potential service for tracking trekkers, i.e. offering a security service for tourists trekking the Himalayas, in particular Mt. Everest. This service very critically depends on having a robust hardware component, the actual GPS/GNSS tracker. Requirements with respect to battery life, …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Wind turbine electricity production data is sensitive for Energinet (and for the wind turbine producers). Energinet would like to publish wind turbine electricity production data sets that can be used to train relevant models and to develop innovative applications, without giving away sensitive data. The goal of the project is to explore various data publishing methods for that purpose.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, data publication
Energinet has a model that describes the electricity production of a given wind turbine given wind conditions. The current model based on kNN is trained with DMI weather data and historical electricity production data for the wind turbine. The goal of the project is to improve the current model with lifelong learning, extended weather data and different models for a range of different wind …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Data Analysis
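For readers unfamiliar with kNN-based production models like the one mentioned above, the following is a minimal Python sketch of k-nearest-neighbour regression. It is not Energinet's model: the feature set (a single wind speed value), the sample data, and the function name `knn_predict` are all assumptions made for illustration.

```python
import math


def knn_predict(history, query, k=3):
    """Average the production of the k historical samples closest to `query`.

    history: list of (features, production_kw) pairs, features being a tuple
    query:   feature tuple of the same length as the stored features
    """
    if not history:
        raise ValueError("history must not be empty")
    # Rank stored samples by Euclidean distance to the query conditions.
    nearest = sorted(history, key=lambda item: math.dist(item[0], query))[:k]
    return sum(production for _, production in nearest) / len(nearest)


# Hypothetical samples: (wind speed in m/s,) -> production in kW.
history = [((4.0,), 100.0), ((5.0,), 200.0), ((6.0,), 300.0), ((12.0,), 1500.0)]
print(knn_predict(history, (5.0,), k=3))
```

Extending the model with more weather data, as the project suggests, would amount to widening the feature tuples; features on different scales would then need normalising before distances are compared.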
The goal of the project is to explore the accuracy of electricity production predictions based on historical data and weather predictions. This may be tackled as a sequence prediction problem using recurrent neural networks. The long-term goal is to incorporate wind turbines in the reserve market for electricity.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Forecasting, Machine Learning, Deep Learning
The goal of the project is to explore new ways of gathering data about wind turbines as well as local wind/weather conditions. To this end, sound/vibration-based and/or image-based instrumentation as well as innovative experiments such as balloons and lightweight weather stations might be considered.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Instrumentation, Sensors
Deliberately scoped very wide, this group contains a number of projects in different possible directions, from
Location services via LPWAN time-of-flight and GPS/GNSS,
Vessel tracking and management in fisheries, tourism and logistics,
Water quality and chemistry sensing for Aquaculture, specifically Mariculture,
Wave and tidal dynamics, e.g. in energy harvesting
and variations/combinations of …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: Satellite, Image processing, Machine Learning, edge, constrained computing, IoT, sensors, location
swarm.space is a commercial company providing low-bandwidth satellite connectivity using ultra-small (quarter cubesat unit) satellites in a low orbit. Swarm satellites cover every point on Earth, enabling IoT devices to affordably operate in any location. Swarm uses a form of LoRa network.
While ground terminals for satellite networks traditionally were both big and expensive, modems and antennas …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: Satellite, IoT
LoRa is a relatively new long-range, low-power standard allowing battery-powered, pocket-size nodes to transmit over 100s to 1000s of kilometers. ThingSat is a CubeSat communication payload consisting of an electronic board with several LoRa transmitters and a patch antenna operating in the 868 MHz and 2.4 GHz bands. It is a guest payload of a shared 3U CubeSat.
Available projects under this platform include …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: Satellite, IoT
Observing how well machine learning systems utilize hardware resources is a crucial preliminary step to improve system performance and reduce hardware waste. To do such observations, one has to collect a lot of monitoring data on hardware behavior through experiments. In our group, we have recently built a framework to aid the management of such monitoring data efficiently, called Resource-Aware …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: benchmarking, data management, data visualization
Deep convolutional networks are able to learn representations of images, scoring well in tasks such as image classification and object detection. During model training, these networks have the ability to process different input sizes without requiring changes to their architecture. In this project, we would like to investigate the effects that changing input sizes has on these kinds of models. We …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: data attribution, deep learning, machine learning, resource efficiency
Today’s foundation models are trained on vast amounts of data. The quality and size of this data have a huge impact on the accuracy of these models. Selecting the right amount and variety of data for a given task, however, is a resource-intensive process. In this project, we would like to investigate various state-of-the-art data selection mechanisms from a hardware requirements and …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: data selection, deep learning, machine learning, resource efficiency
Observing how well data-intensive systems utilize hardware resources is a crucial preliminary step to improve system performance and reduce hardware waste. To do such observations, one has to collect a lot of monitoring data on hardware behavior through experiments. In our group, we have recently built a framework to aid the management of such monitoring data efficiently, called Resource-Aware …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2023
Tags: benchmarking, data management, data visualization
Spectral learning priority is a useful tool for analyzing a model’s focus during training; it describes how a model may understand a given image from the spectrum perspective. For example, to distinguish cats from tortoises, learning to recognize their shapes would be enough; such an embedding will result in higher learning priority at low frequencies, which represent shapes, while learning to …
Supervisors:
Yucheng Lu, Veronika Cheplygina
Semester: Fall 2024
Tags: Spectral analysis, Image classification, Medical imaging
It has been observed that deep learning models are able to identify patient characteristics such as age, sex, and self-reported race with high accuracy from medical images such as chest x-ray recordings, even when medical doctors cannot. This raises the potential for such models to learn to (falsely) diagnose patients of different demographics differently, even if they present with the same …
Supervisors:
Amelia Jiménez-Sánchez, Eike Petersen, Veronika Cheplygina
Semester: Fall 2024
Tags: machine learning, data science, medical imaging
There is pressure on hospitals to implement AI systems which promise to improve diagnoses and save time for the doctors. One use-case could be related to the automation of protocoling based on a physician referral. Currently, this requires a referral letter from a physician who has examined a patient and evaluates that there is a need for additional imaging studies. In this case, the physician …
Supervisors:
Veronika Cheplygina
Semester: Fall 2023
Tags: machine learning, medical imaging, data analysis
Machine learning models, especially larger models that are used on, for example, image or text datasets, can be expensive to train. During development, models are usually trained multiple times, for example to optimize hyperparameters, which can result in a large carbon footprint.
This project focuses specifically on medical data. There are some recent efforts, for example by Selvan et …
Supervisors:
Veronika Cheplygina
Semester: Fall 2023
Tags: machine learning, medical imaging, data analysis, resource consumption
Machine learning is used extensively in different applications, including medical imaging and natural language processing. As different types of data are involved, it is reasonable to assume that different methods are needed for each application. However, there are also opportunities in translating a method successful in one application, to the other application where it is not widely used.
The …
Supervisors:
Veronika Cheplygina
Semester: Fall 2021
Tags: machine learning, natural language processing, medical imaging, literature review
Machine learning algorithms for skin lesion classification typically learn from images which have been labeled as malignant (for example, melanoma) or not. Such tasks can still suffer from overfitting due to limited dataset size. In other computer vision tasks, crowdsourcing labels has been effective, but the average person typically does not have the background to classify skin lesions. However, …
Supervisors:
Veronika Cheplygina
Semester: Fall 2021
Tags: machine learning, medical imaging, crowdsourcing, similarity
The DISCO-2 project is driven by students and aims to develop and deploy a 3-unit CubeSat into low Earth orbit. Its mission focuses on conducting Earth observations over Greenland and supporting various research objectives. The satellite has three cameras onboard: infrared, wide-angle, and standard (main camera). Due to the limitations of the imaging hardware and the challenging conditions on the …
Supervisors:
Yucheng Lu, Julian Priest
Semester: Fall 2024
Tags: Image enhancement, Image processing, Machine learning
Are you interested in working with a big data open source project?
You are welcome to conduct your thesis/project in Apache Wayang. Apache Wayang is the first cross-platform framework that allows users to specify their task/query in a system-agnostic manner and Wayang will determine which is the best system(s) to execute this task with the goal of optimizing performance. For a general overview …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: big data, database, cross-platform data processing, open source, Apache
Knowledge graphs (KGs) are extensively used in many application domains, such as search engines, product recommendation, and bioinformatics. Knowledge graph completion (a.k.a. link prediction), i.e., the task of inferring missing information from knowledge graphs, is a widely used task in the above applications. This project will investigate how to loosely couple the data-driven power of knowledge …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: knowledge graph, LLMs, reasoning
Are you interested in working with a big data open source project and helping the environment?
You are welcome to conduct your thesis/project in Apache Wayang. Apache Wayang is the first cross-platform framework that allows users to specify their task/query in a system-agnostic manner and Wayang will determine which is the best system(s) to execute this task with the goal of optimizing performance. …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: big data, database, cross-platform data processing, open source, Apache
Query optimization is crucial for any data management system to achieve good performance. Recent advancements in AI have led academia and industry to investigate learning-based techniques in query optimization. In particular, many works propose replacing the cost model used during plan enumeration with a machine learning model that estimates the runtime of a plan. However, to build such a model …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: machine learning, training data, query optimizer
Query optimization is crucial for any data management system to achieve good performance. Recent advancements in AI have led academia and industry to investigate learning-based techniques in query optimization. In particular, many works propose replacing the cost model used during plan enumeration with a machine learning model (typically a regression model) that estimates the runtime of a query …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: machine learning, database, query optimization, ranking
(This project will be carried out in collaboration with Xilinx Research Labs in Dublin)
Machine Learning operators are becoming increasingly commonly used in data management systems and, in this project, we will explore the challenges and benefits of integrating inference operators from FINN [1] within a so-called Smart Storage system [2]. Both the inference and data management aspects will be …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: FPGA, Data Management, MachineLearning
(This topic is going to be co-supervised by Bernardo Machado David [http://www.bmdavid.com/])
Database systems managing private data may leak sensitive information when queries are done in the clear, even if the data itself is encrypted. A recent line of research has looked into combining database engines supporting standard SQL queries with techniques for secure Multiparty Computation (MPC), …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: Theoretical Computer Science, Data Management, Security and Privacy
Consensus mechanisms for ensuring consistency are some of the most expensive operations in managing large amounts of data. Often, there is a trade-off that involves reducing the coordination overhead at the price of accepting possible data loss or inconsistencies. As the demand for more efficient data centers increases, it is important to provide better ways of ensuring consistency without …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: Benchmarking, Distributed Systems
Given a private database that I can access only through specific queries, there is still a lot I can learn about its entries [1]. Differential Privacy (DP) tackles this: letting me learn the (approximate) result of complex queries on a database, but preventing me from learning much about its specific entries. The basic approach of DP often boils down to: “apply a privacy-preserving transformation T …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: Theoretical Computer Science, Data Management, Security and Privacy
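To make the differential privacy idea in the description above concrete, here is a minimal Python sketch of the Laplace mechanism applied to a counting query. This is one standard DP building block, chosen for illustration; the description does not say which transformation the project would study. The data, the function names, and the choice of `epsilon` are assumptions.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Answer 'how many records satisfy predicate?' with epsilon-DP.

    Adding or removing one record changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical database of ages; the analyst only sees the noisy answer.
ages = [23, 35, 41, 52, 67, 70]
rng = random.Random(42)
print(private_count(ages, lambda age: age > 50, epsilon=1.0, rng=rng))
```

A smaller `epsilon` gives stronger privacy but noisier answers. Repeated queries consume privacy budget, which this sketch does not track.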
Blockchains are often used synonymously with crypto-currencies and unspent transaction output (UTXO) data models, but there are emerging blockchain platforms that offer a more general data model and smart contracts that can manipulate this data freely (e.g. Hyperledger Fabric [1]). As such, these platforms resemble in many ways distributed databases, storing a collection of records, organized as …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: Blockchain, Data Management, Benchmarking
Modern data analytics systems are composed of two types of nodes: compute and storage (e.g., Amazon S3, Redis, MongoDB, etc.). The storage nodes typically offer a key-value interface and are often used to store data encoded in a columnar format (e.g., Parquet files). Due to growing data sizes in datacenters, there is an increasing interest in using specialized hardware devices, namely Field …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: FPGA, Hardware-software Co-design, Security and Privacy
We are actively developing a new prototype for analysing large multimedia collections in virtual reality, based on the ObjectCube data model. There are many ways in which students can contribute to the project, including work on the user interface and the back-end, and later on running large-scale user experiments.
Supervisors:
Aaron Duane, Björn Þór Jónsson
Semester: Fall 2021
Tags: virtual reality, multimedia analytics
A medical Visual Question Answering (VQA) system can provide meaningful references for both doctors and patients during the treatment process. Compared with natural images, a learning setting with medical images is more challenging due to limited amounts of data, class imbalance and the presence of label noise in diagnosis tasks. Moreover, little attention is paid to how the images and meta-data are …
Supervisors:
Amelia Jiménez-Sánchez
Semester: Fall 2023
Tags: medical imaging, deep learning, machine learning, transfer learning, meta-learning
In relevance feedback, the choice of images to present to the user is a difficult problem, as a naïve approach may present too many similar images. The challenge addressed in this project is to ensure diversity (aka “one of each”) as well as relevance. A particularly interesting project for students interested in efficient algorithms.
Supervisors:
Björn Þór Jónsson
Semester: Fall 2021
Tags: multimedia analytics, scalability, diversity
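One common way to trade relevance against diversity, as described in the project above, is greedy Maximal Marginal Relevance (MMR). The Python sketch below is only an illustration of that general technique, not the approach the project prescribes; the feature vectors, relevance scores, and the `trade_off` parameter are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_select(features, relevance, k, trade_off=0.5):
    """Greedily pick k item indices, penalising similarity to items already picked."""
    selected = []
    remaining = list(range(len(features)))
    while remaining and len(selected) < k:
        def score(i):
            redundancy = max(
                (cosine(features[i], features[j]) for j in selected), default=0.0
            )
            return trade_off * relevance[i] - (1.0 - trade_off) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical images: items 0 and 1 are near-duplicates, item 2 is different.
features = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
relevance = [0.9, 0.85, 0.6]
print(mmr_select(features, relevance, k=2))
```

Each greedy step compares every remaining item against every selected one, so the cost grows quickly with collection size, which is where the efficient-algorithms angle of the project comes in.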
In interactive learning systems, such as Exquisitor, the system presents potentially relevant images to users who label them as either relevant or irrelevant. Currently, Exquisitor uses a cluster-based index, which allows it to return results from a collection of 100 million images in 0.3 seconds. The goal of this project is to study the application of hash-based indexing to interactive learning …
Supervisors:
Björn Þór Jónsson
Semester: Fall 2021
Tags: multimedia analytics, diversity
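As background for the hash-based indexing mentioned in the project above, here is a minimal Python sketch of locality-sensitive hashing with random hyperplanes. It illustrates the general idea only; it is not Exquisitor's index or the design the project would arrive at, and all names and parameters are assumptions.

```python
import random
from collections import defaultdict


def make_hyperplanes(dim, n_bits, rng):
    """Draw n_bits random hyperplanes (normal vectors) in dim dimensions."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_vector(vec, hyperplanes):
    """Build an integer hash with one bit per hyperplane: which side vec lies on."""
    bits = 0
    for plane in hyperplanes:
        dot = sum(v * p for v, p in zip(vec, plane))
        bits = (bits << 1) | (1 if dot >= 0 else 0)
    return bits


def build_index(vectors, hyperplanes):
    """Map each hash bucket to the ids of the vectors that fall into it."""
    index = defaultdict(list)
    for vec_id, vec in enumerate(vectors):
        index[hash_vector(vec, hyperplanes)].append(vec_id)
    return index


def lookup(query, index, hyperplanes):
    """Return ids sharing the query's bucket; only these need exact scoring."""
    return index.get(hash_vector(query, hyperplanes), [])


rng = random.Random(1)
planes = make_hyperplanes(dim=4, n_bits=8, rng=rng)
vectors = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [2.0, 0.0, 0.0, 0.0]]
index = build_index(vectors, planes)
print(lookup([3.0, 0.0, 0.0, 0.0], index, planes))
```

Vectors pointing in similar directions tend to share a bucket, so a query inspects one bucket instead of the whole collection. Practical systems typically use several hash tables to reduce missed neighbours.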
The goal of this project is to enhance PhotoCube as a competitor for the Video Browser Showdown, an international video retrieval competition where competing systems are judged based on speed, accuracy and recall. We propose to develop new versions of the C++-based media server and JS-based media browser, to expand the data model to videos and improve the performance sufficiently to take part in …
Supervisors:
Björn Þór Jónsson
Semester: Fall 2021
Tags: video search, multimedia analytics, photocube
The index structure used for Exquisitor is eCP, a very scalable index for high-dimensional retrieval. However, while multimedia collections are typically updated constantly, the index is static. The goal of this project is to implement and compare approaches for index maintenance.
Supervisor: Björn Þór Jónsson
Semester: Fall 2020
Tags: index maintenance, high-dimensional indexing
In this project, we propose to implement a media server and media browser encapsulating a new data model for analysing media collections, called Multimedia Analytics Data Services (MADS). To validate the design, some scalability experiments should be performed.
Supervisor: Björn Þór Jónsson
Semester: Fall 2020
Tags: multimedia analytics, scalability
Students at ITU have made a prototype version of the Exquisitor system for the Android mobile phone! The system is missing some of Exquisitor’s advanced functionality, such as search and indexing, and the goal is to add and evaluate this functionality.
Supervisor: Björn Þór Jónsson
Semester: Fall 2020
Tags: multimedia analytics, android
We propose to develop a new prototype for analysing large multimedia collections in Virtual Reality, using the new Valve Indexes. There are many ways in which students can contribute to the project, including work on the user interface and the back-end, and later on running large-scale user experiments.
Supervisor: Björn Þór Jónsson
Semester: Fall 2020
Tags: virtual reality, multimedia analytics
The goal of this project is to integrate Exquisitor with other pieces of existing technology and turn it into a competitor for a live video retrieval competition. The project is suitable for 3-4 well-qualified MSc students.
The Video Browser Showdown (VBS) is a live competition for video search and retrieval, held at the International Conference on Multimedia Modeling (MMM). In VBS, the competition …
Supervisor: Björn Þór Jónsson
Semester: Fall 2019
The goal of this project is to ensure diversity in the relevance feedback results, to improve the quality of the user experience.
The project is suitable for 1-3 well-qualified MSc students.
In many creative tasks, the designer will know some stock image is good for a design just by stumbling upon the image. This “Aha!” moment requires browsing thousands of images by categories. In other words, it requires …
Supervisor: Björn Þór Jónsson
Semester: Fall 2019
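As a rough illustration of the relevance-versus-diversity trade-off in the entry above, the sketch below uses a greedy re-ranking in the style of maximal marginal relevance (MMR). This technique is swapped in purely for illustration; the project description does not name a method. The function names, the `trade_off` parameter, and the toy feature vectors are all hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_diverse(
    features: Dict[str, Vector],
    relevance: Dict[str, float],
    k: int,
    trade_off: float = 0.5,
) -> List[str]:
    """Greedily pick up to k items, balancing relevance against redundancy.

    trade_off = 1.0 ranks purely by relevance; lower values penalise items
    that resemble something already selected (an illustrative assumption).
    """
    selected: List[str] = []
    remaining = set(features)
    while remaining and len(selected) < k:

        def score(item: str) -> float:
            # Redundancy is the highest similarity to any already selected item.
            redundancy = max(
                (cosine_similarity(features[item], features[s]) for s in selected),
                default=0.0,
            )
            return trade_off * relevance[item] - (1.0 - trade_off) * redundancy

        # Sorting first makes tie-breaking deterministic.
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: "b" is a near-duplicate of "a", "c" is different.
features = {"a": (1.0, 0.0), "b": (0.99, 0.14), "c": (0.0, 1.0)}
relevance = {"a": 0.9, "b": 0.85, "c": 0.6}
print(select_diverse(features, relevance, k=2))
```

With the default trade-off, the near-duplicate "b" is skipped in favour of the less relevant but different "c"; with `trade_off=1.0` the selection falls back to pure relevance ranking.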
The goal of this project is to use the state of the art in eye tracking to design, implement and evaluate different eye-tracking interfaces for Exquisitor.
The project is suitable for 1-3 well-qualified MSc students.
Image and media collections are becoming a central information resource for a growing number of domains. This calls for very effective tools for interactive exploration of the contents of …
Supervisor: Björn Þór Jónsson
Semester: Fall 2019
The goal of this project is to build a prototype of the Exquisitor system for mobile devices.
The project is suitable for 1-3 well-qualified MSc students.
Image and media collections are becoming a central information resource for a growing number of domains. This calls for very effective tools for interactive exploration of the contents of those collections [1]. Based on past research results [2], we …
Supervisor: Björn Þór Jónsson
Semester: Fall 2019
Deep neural networks have been revolutionary in computer vision and publicly available image datasets played an important role in this success. Due to their size, neural networks require vast amounts of data for training. Yet when it comes to medical settings dataset sizes are very limited due to the cost of data annotation, privacy concerns, differences in imaging techniques, and others. In such …
Supervisors:
Dovile Juodelyte
Semester: Fall 2023
Tags: transfer learning, deep learning, medical imaging
A GPU offers massive computational power and parallelism through its Streaming Multiprocessors (SMs). Efficient GPU utilization is critical for maximizing performance and optimizing compute resource usage, which is measured using various metrics such as SMACT (SM Activity), SMOCC (SM Occupancy), and DRAMA (DRAM Active). These metrics provide insight into how effectively the GPU’s SMs and …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Spring 2024
Tags: machine learning systems, GPU Utilization, resource management, resource interference
Deep learning changed the landscape of many applications like computer vision, natural language processing, etc. On the other hand, deep learning requires gigantic computing power offered by modern hardware. As a result, data scientists rely on powerful hardware resources offered by shared high-performance computing (HPC) clusters or the cloud. Due to the long-running times of deep learning …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2024
Tags: machine learning systems, checkpointing, scheduling, resource management
Workload collocation has been shown to be an effective method to reduce the hardware requirements for certain deep learning (DL) training tasks. On the other hand, there haven’t been many robust open-source implementations of schedulers that incorporate workload collocation on GPUs for DL.
BLOX is a framework that aims at standardizing the way we implement deep learning schedulers. In this …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2024
Tags: machine learning systems, scheduling, resource management, workload collocation
Today, there are many compute- and memory-hungry data-intensive workloads from big data analytics applications to deep learning. These workloads increasingly run on shared hardware resources, which requires building hardware resource managers that can both serve the needs of workloads and utilize hardware well. Predicting the resource utilization of applications can aid such resource managers …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2022
Tags: benchmarking, hardware resource consumption estimation, machine learning
Geospatial data refers to information that is tied to specific geographic locations on the Earth’s surface. It includes both the location coordinates (such as latitude, longitude, and, potentially, altitude) and attribute data associated with those locations. Geospatial data is categorized into two types: raster and vector.
Vector data represents geographic features as points, lines, and …
Supervisors:
Eleni Tzirita Zacharatou
Semester: Fall 2023
Tags: spatial data analysis, data science, data loading, GIS file formats, geospatial data
It is now common to query terabytes of spatial data. Several new frameworks extend distributed computing platforms such as Hadoop and Spark to enable them to efficiently process spatial queries by providing (1) mechanisms to efficiently store spatial data and index them; and (2) packages of built-in spatial operations for these platforms. Meanwhile, it is now common to accelerate Hadoop and Spark …
Supervisor: Iman Elghandour
Semester: Fall 2019
Spark assumes that it executes its applications on a homogeneous cluster of similar nodes. However, it is becoming common for in-house clusters to have heterogeneous compute resources, and it is good to exploit all of them in the most efficient way. The objective of this master thesis is to extend the Spark scheduler to be resource-aware and to efficiently schedule Spark tasks on all the …
Supervisor: Iman Elghandour
Semester: Fall 2019
Distributed computing platforms such as Hadoop and Spark focus on addressing the following challenges in large systems: (1) latency, (2) scalability, and (3) fault tolerance. Dedicating computing resources for each application executed by Spark can lead to a waste of resources. Unified distributed file systems such as Alluxio have provided a platform for computing results among simultaneously …
Supervisor: Iman Elghandour
Semester: Fall 2019
In the last few years, it has become common to accelerate Hadoop and Spark by enabling them to execute tasks and jobs on accelerators such as GPUs and FPGAs. The objective of this master thesis is to study new approaches that efficiently predict the execution time of Spark tasks and jobs executed on GPUs. Part of the work will be to build a performance prediction model for GPUs, which can be built …
Supervisor: Iman Elghandour
Semester: Fall 2019
Do you like open-source systems? Would you like to experience working with an open-source system? Do you want to learn about big data research in practice? Then, this project is for you!
We have a number of thesis/project topics under the umbrella of Apache Wayang. Wayang is the first cross-platform framework that allows users to specify their task/query in a system-agnostic manner and Wayang will …
Supervisors:
Jorge Quiané
Semester: Fall 2022
Tags: big data, database, cross-platform data processing, open source, Apache
Do you have the feeling that only a few players are controlling the AI game? Would you like to make AI technology accessible to everyone? Then, come and help us to make Agora a reality!
We have a number of thesis/project topics under the umbrella of the Agora project. This project aims at building a unified data infrastructure for supporting AI ecosystems that bring together data, algorithms, …
Supervisors:
Jorge Quiané
Semester: Fall 2022
Tags: big data, AI ecosystems, compliant data processing, federated analytics, data markets
The DISCO-2 project is driven by students and aims to develop and deploy a 3-unit CubeSat into low Earth orbit. Its mission focuses on conducting Earth observations over Greenland and supporting various research objectives. The satellite has three cameras onboard: infrared, wide-angle, and standard (main camera). Due to the limitations of the imaging hardware and the challenging conditions on the …
Supervisors:
Yucheng Lu, Julian Priest
Semester: Fall 2024
Tags: Image enhancement, Image processing, Machine learning
ITU is a partner of the Danish Student Cubesat Program, DISCOSAT. We launched our first satellite, DISCO-1, into Low Earth Orbit in April 2023 and we will launch a second, DISCO-2, in 2024. In this project you will gain experience with automating live satellite operations and communications, completing a ground station at the Rued Langgaards Vej site for use with both satellites.
The DISCO satellite …
Supervisors:
Julian Priest
Semester: Fall 2023
Tags: satellite, ground station, software defined radio, automation, csp
The DISCO-2 satellite is an Earth observation satellite developed in collaboration with the Arctic Research Center in Aarhus and is designed to complement ground-based field studies in Greenland. The satellite instrument consists of two high-quality visible-light cameras and one infrared camera, as well as an attitude control system and a Coral TPU ML coprocessor.
In this project you will develop software to control …
Supervisors:
Julian Priest
Semester: Fall 2023
Tags: satellite, climate change, image processing, ML, csp, embedded, space
The DISCO-2 satellite will have accelerated machine learning capability based on the inclusion of a Coral TPU ML accelerator module. This will allow images taken by the satellite to be analysed on the satellite using a variety of ML models, with only select images sent back to Earth. This approach allows for more flexibility in image acquisition and saves downlink bandwidth, which is very constrained …
Supervisors:
Julian Priest, Robert Bayer
Semester: Fall 2023
Tags: satellite, ground station, software defined radio, automation, csp
The Danish Student Cubesat Program is an inter-university collaboration that will launch 3 cubesats into Low Earth Orbit over the next 4 years. The satellites will be designed, operated, programmed and built by students, and the project offers an opportunity for Master’s students to take part in a live satellite project. ITU is partnering with Aarhus University on DISCOSAT2 which will be an …
Supervisors:
Sebastian Büttrich, Julian Priest
Semester: Fall 2021
Tags: Satellite, Cubesat, Image processing, Machine Learning, edge, constrained computing
ITU is a partner in the Danish Student Cubesat Program, DISCO, which will launch a series of small satellites into orbit, starting with DISCO 1 in 2023 and followed by DISCO2 in 2024.
As part of this project, ITU is installing a satellite ground station with a range of antenna rotators on the roof of the Rued Langgaards Vej building, and the equipment has been purchased. The ground station will track the …
Supervisors:
Julian Priest
Semester: archive
Tags: Satellite, Image processing, Edge, Constrained Computing, Networks, Machine Learning, Embedded, Software Defined Radio
ITU is a partner in the Danish Student Cubesat Program DISCO, which will launch a series of small satellites into orbit, starting with DISCO 1 in 2023 and followed by DISCO2 in 2024.
ITU is developing a hi-res multi-camera imaging payload for Earth observation, primarily in the Arctic. We are developing an on-satellite machine learning capability using an ML coprocessor, as well as models that can …
Supervisors:
Julian Priest
Semester: archive
Tags: Satellite, Image processing, Edge, Constrained Computing, Networks, Machine Learning, Embedded, Radio
The idea behind “15-minute cities” is that within a short walk or bike ride people should have access to all necessary facilities that constitute the essence of urban living, such as parks, shops, cafes, schools, hospitals. Initiatives to transform cities according to this paradigm are currently being implemented across the world, in an attempt to make urban spaces more liveable, …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: spatial data analysis, visualisation, Python, OSM data
As a response to increased traffic congestion and the need to reduce carbon emissions, cities consider ways to modernise, build and extend transit systems. Transit network design solutions can benefit from analysing the large amount of crowd-sourced location data available, which provides valuable insights into population mobility needs. Designing efficient metro lines, bicycle paths, or bus …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: spatial data analysis, network design, Python, OSM data
The idea behind “15-minute cities” is that within a short walk or bike ride people should have access to all necessary facilities that constitute the essence of urban living, such as parks, shops, cafes, schools, hospitals. Initiatives to transform cities according to this paradigm are currently being implemented across the world, in an attempt to make urban spaces more liveable, …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: spatial data analysis, graph summaries, Python, OSM data
Musical genres are inherently ambiguous and difficult to define. Even more so is the task of establishing how genres relate to one another. Yet, genre is perhaps the most common and effective way of describing musical experience. The number of possible genre classifications (e.g. Spotify has over 4000 genre tags, LastFM over 500,000 tags) has made the idea of manually creating music taxonomies …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: scalable algorithms, hyperbolic embeddings, Python, Spotify data
The integration of wind power in the energy grid is dependent on accurate production forecasts. The power output curves between neighbouring wind farms are often correlated temporally and spatially, but currently, these spatiotemporal dependencies are under-utilised in prediction models. Graph neural networks allow for modelling these dependencies. In this project the student will implement a …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: spatial data analysis, graph neural networks, Python, timeseries data
Open-source JavaScript applications, such as browser-based web games, are typically developed by individual software engineers or small teams. These teams often have limited financial resources to use commercial logging frameworks and cloud-based analysis systems and may also lack knowledge and expertise in logging. However, log analysis is highly important for many reasons: monitoring application …
Supervisors:
Martin Hentschel
Semester: Fall 2024
Tags: open source, performance
The Deconstructed Cloud Databases project stems from a simple question: What are the minimum components required to build a data management system in the cloud? Our motivation for this project is based on the idea that reducing a system to its minimum set of components makes it easier to build, test, and maintain cloud data management systems. This approach requires less engineering effort, …
Supervisors:
Martin Hentschel
Semester: Fall 2024
Tags: data management, performance, benchmarking, hacking
The Deconstructed Cloud Databases project stems from a simple question: What are the minimum components required to build a data management system in the cloud? Our motivation for this project is based on the idea that reducing a system to its minimum set of components makes it easier to build, test, and maintain cloud data management systems. This approach requires less engineering effort, …
Supervisors:
Martin Hentschel
Semester: Fall 2024
Tags: data management, security, open source, open standards
The emergence of computational storage platforms like Delilah has transformed the data storage landscape, enabling new computing paradigms and facilitating data-intensive applications. Delilah is a cutting-edge computational storage platform developed by the IT University of Copenhagen. It runs on the Daisy OpenSSD and exposes an asynchronous computational storage protocol to the host, facilitated …
Supervisors:
Niclas Hedam, Philippe Bonnet
Semester: Fall 2023
Tags: Open Source, Testing, Computational Storage, Hardware, FPGA
Hash-based authentication is an effective way of protecting passwords in software systems. Hashing obscures the original passwords, such that they cannot be recovered in case of a database breach. However, as demonstrated by our paper titled Hash-Based Authentication Revisited in the Age of High-Performance Computers, the practical security depends on which hashing algorithm is used as well as the …
Supervisors:
Pınar Tözün, Niclas Hedam
Tags: benchmarking, hashing, security, GPU, hacking, HPC
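A minimal sketch of the hash-based authentication idea described in the entry above, using only the Python standard library. The choice of PBKDF2 with SHA-256, the iteration count, and the function names are illustrative assumptions; they are not taken from the paper mentioned in the entry.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Illustrative work factor (an assumption): higher values slow down
# brute-force attacks, but also slow down every legitimate login.
ITERATIONS = 100_000


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); only these are stored, never the password."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the candidate password with the stored salt and compare."""
    _, digest = hash_password(password, salt)
    # Constant-time comparison avoids leaking where the digests differ.
    return hmac.compare_digest(digest, expected)


salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))
print(verify_password("wrong guess", salt, stored))
```

The work factor is the knob that matters when attackers have high-performance hardware: the cheaper a single hash evaluation is, the more guesses per second an attacker can make against a leaked database.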
With the recent hunger for being “data driven”, many organizations are eager to integrate ML in their decision-making process. Unfortunately, competent data scientists are still relatively scarce, and manual model development cannot keep up with the demand for magic AI solutions. This is no less true when it comes to forecasting. Knowing the future is extremely handy when making …
Supervisors:
Niels Ørbæk Chemnitz
Semester: Spring 2021
Tags: AutoML, ML, Forecasting, Energy Data, Smart Meters, Python, Data Science, Time Series Data
How much do our smart meter readings disclose about us? Can we disentangle the oven from the washing machine from the kettle? Can we identify demographics and behavior patterns from the stream of electricity data?
Most Danish homes are now equipped with so-called “smart meters” - networked electricity meters that report consumption and load at a much higher rate than conventional meters. …
Supervisors:
Niels Ørbæk Chemnitz
Semester: Spring 2021
Tags: NILM, ML, IoT, Energy Data, Smart Meters, Python, Data Science, Time Series Data
Outlier detection is carried out when the information is stored at the server. However, with the new IoT computational capabilities, outlier detection can be developed locally. Therefore, it is necessary to know how much RAM/Flash is needed for this step and which IoT brands can handle it. This project is divided into two parts. The first is implementing light-heavy ML algorithms in single points …
Supervisor: Paul Rosero
Semester: Spring 2022
Tags: data analysis, IoT, Python, Embedded systems
TinyML is a new trend to deploy deep learning on tiny devices. Therefore, it is necessary to deploy several applications to understand the challenges and opportunities that tinyML brings us. In this scenario, any idea involving embedded computer vision, voice recognition, or sensors is welcome.
Supervisor: Paul Rosero
Semester: Spring 2022
Tags: data analysis, IoT, Python, Embedded systems, Computer vision, Voice recognition
Disaggregated storage has gained acceptance in data centers. With disaggregated storage, storage resources are decoupled from compute resources, and made available through fabric. We are particularly interested in storage resources composed of an ARM-based smartNIC, which acts as fabric target as well as storage controller for a collection of SSDs.
The performance characteristics of the storage …
Supervisors:
Philippe Bonnet
Semester: Fall 2021
Tags: benchmarking, ARM, SoC, fabric, SSD, computational storage
Reproducibility is a cornerstone of the scientific method. There are systems available today to build reproducible and sharable data and analysis pipelines including workflow engines (e.g., GWL, Nextflow), package managers (e.g., bioconda), and container systems (e.g., Singularity). However, validating their executions on high-performance computers remains an open issue. Indeed, there are many …
Supervisors:
Philippe Bonnet
Semester: Fall 2020
Tags: ML, reproducibility, workflow, HPC
Reproducibility is a cornerstone of the scientific method. It is also a core element of compliance requirements for sensitive equipment, e.g., audit trails for medical equipment. Often, a prerequisite for computational reproducibility is the availability of software and data. However, this is problematic for edge devices whose goal is to reduce the amount of data transferred to the backend. On …
Supervisors:
Philippe Bonnet
Semester: Fall 2020
Tags: reproducibility, edge
SSDs are not a uniform class of devices. The SSD landscape is now quite diverse, with many new-generation devices available that are much faster or customizable. Understanding their performance characteristics is crucial when determining what their impact on the data systems software stack should be. In this project, we would like to characterize the performance of a broad range of such SSDs (e.g., ZNS, Samsung Z-SSD, …
Supervisors:
Pınar Tözün, Philippe Bonnet
Semester: Fall 2020
Tags: SSD, benchmarking
The Things Network Stack v3 for LoRaWAN is an open source LoRaWAN network stack suitable for large, global and geo-distributed public and private networks as well as smaller networks. The architecture follows the LoRaWAN Network Reference Model for standards compliance and interoperability. - https://github.com/TheThingsNetwork/lorawan-stack This stack, currently in pre-rollout testing, however …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Offloading processing to storage is a means to avoid data movement and thus deal efficiently with very large volumes of stored data. In the 90s, there were pioneering efforts to develop Processing-in-Memory as well as Active Disks. We are considering data stored on Open-Channel SSDs with a programmable storage controller (i.e., a Linux-based ARM processor) integrated into a network switch (e.g., …
Supervisor: Philippe Bonnet
Semester: Fall 2019
Characterize the performance of commercial database systems on an NVIDIA Titan GPU, or characterize the performance of DB2 PureScale on a cluster equipped with shared storage with a range of different benchmarks. Design and conduct experiments with a range of tuning strategies to measure their impact on performance and reliability.
Supervisor: Philippe Bonnet
Semester: Fall 2019
In the context of the Orkney Cloud project, we are preparing the deployment of a decentralized cloud infrastructure on the archipelago. The infrastructure is composed of a collection of Pods (point of delivery) and a wireless core (5G + Wifi). Each Pod is equipped with storage, computing and communication components (so that it is connected to the core and to local endpoints). Each Pod is powered …
Supervisor: Philippe Bonnet
Semester: Fall 2019
New types of networks such as LoRaWAN and Sigfox enable us to deploy inexpensive mobile Air Quality Sensors, e.g. on bicycles or vehicles, boats or trains, and help map and understand urban pollution. Such data would be valuable for e.g. the international citizen science project Luftdaten https://luftdaten.info/en/home-en/ in which we are participating. We have an ongoing collaboration with the …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Can we combine satellite data with terrestrial and marine sensor data, to study their correlation, and benefit our understanding of environmental processes, both in urban and agricultural/rural context? Potential collaboration with several organizations in Kenya and Orkney islands.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Not far from the IT University, citizens are building a “Small Smart City”, aiming to measure and analyze e.g. water and resource consumption. Help build the Small Smart City!
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Survey of potential sensor modalities (sounds, ultrasounds, vibrations) and related work (e.g., wind turbines). Starting with sound: piezo contact mics/transducers, MEMS sensors. Characterization of state based on known signatures (classification problem). Characterization of state transitions (HMM). Experimentation on Coffee machine/Blender/3D printer at PitLab. 1..k sensors; local/cloud-based …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Novo Nordisk
New forms of Solid State Drives have interesting characteristics in terms of performance (10 to 100x faster than previous generations of SSDs) and in terms of functionalities (SSDs can now suspend the execution of writes or erase operations to minimize read latency). The performance characteristics of these devices are not well understood yet. The topic of this thesis is to design and conduct …
Supervisor: Philippe Bonnet
Semester: Fall 2019
Survey of potential sensor modalities (IR temp sensor, thermal imager) and related work. Starting with consumer USB cams or phone cams to generate series of images. Characterization of state based on known signatures (classification problem). Characterization of state transitions (HMM). Experimentation on Coffee machine/Blender/3D printer at PitLab. 1..k sensors; local/cloud-based processing. …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Novo Nordisk
Most data collection in IoT does not critically depend on latency or speed from data collection to data analytics. Occasionally though we meet tasks that would benefit from near-realtime features, such as collection of wave and tidal dynamics around marine energy infrastructures. This project explores the limits of speed by bringing together a LoRa PHY, a LoRaWAN gateway, LoRaWAN stack, ultrafast …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
In collaboration with the IoT Lab at Computer Science Dept at Kathmandu University, Nepal, we are developing a potential service for tracking trekkers, i.e. offering a security service for tourists trekking the Himalayas, in particular Mt. Everest. This service very critically depends on having a robust hardware component, the actual GPS/GNSS tracker. Requirements with respect to battery life, …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Wind turbine electricity production data is sensitive for Energinet (and for the wind turbine producers). Energinet would like to publish wind turbine electricity production data sets that can be used to train relevant models and to develop innovative applications, without giving away sensitive data. The goal of the project is to explore various data publishing methods for that purpose.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, data publication
Energinet has a model that describes the electricity production of a given wind turbine given wind conditions. The current model based on kNN is trained with DMI weather data and historical electricity production data for the wind turbine. The goal of the project is to improve the current model with lifelong learning, extended weather data and different models for a range of different wind …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Data Analysis
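To make the kNN idea in the entry above concrete, here is a minimal sketch of k-nearest-neighbour regression: the predicted production is the average of the k historical observations whose wind conditions are closest to the query. The feature choice (wind speed only), the sample values, and the function name are hypothetical. They do not describe Energinet's actual model or data.

```python
import math
from typing import List, Sequence, Tuple

# One historical observation: (wind-condition features, production in kW).
Sample = Tuple[Sequence[float], float]


def knn_predict(history: List[Sample], query: Sequence[float], k: int = 3) -> float:
    """Average the production of the k observations nearest to the query."""
    if not history:
        raise ValueError("history must contain at least one observation")
    # Euclidean distance in feature space; sorted() is stable, so ties keep
    # their original order in the history.
    nearest = sorted(history, key=lambda sample: math.dist(sample[0], query))[:k]
    return sum(production for _, production in nearest) / len(nearest)


# Hypothetical observations: (wind speed in m/s,) -> production in kW.
history = [
    ((3.0,), 50.0),
    ((5.0,), 200.0),
    ((7.0,), 600.0),
    ((9.0,), 1200.0),
    ((11.0,), 1800.0),
]
print(knn_predict(history, (6.0,), k=2))
```

With more than one feature (for example wind speed and direction), the features would normally be rescaled first so that no single unit dominates the distance.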
The goal of the project is to explore the accuracy of electricity production predictions based on historical data and weather predictions. This may be tackled as a sequence prediction problem using recurrent neural networks. The long-term goal is to incorporate wind turbines in the reserve market for electricity.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Forecasting, Machine Learning, Deep Learning
The goal of the project is to explore new ways of gathering data about wind turbines as well as local wind/weather conditions. To this end, sound/vibration-based and/or image-based instrumentation as well as innovative experiments such as balloons and lightweight weather stations might be considered.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Instrumentation, Sensors
Field Programmable Gate Arrays are now an integral part of public cloud infrastructures. You can for example run customized FPGA instances on AWS. A project focuses on FPGA-based hardware acceleration at the level of an SSD Flash Translation Layer, at the level of a Database storage manager or at the level of the database client. You will be able to experiment with FPGAs in the lab and on AWS. We …
Supervisor: Philippe Bonnet
Semester: Fall 2020
Tags: FPGA, SSD, computational storage
Field Programmable Gate Arrays are now an integral part of public cloud infrastructures. You can for example run customized FPGA instances on AWS. A project focuses on FPGA-based hardware acceleration at the level of an SSD Flash Translation Layer, at the level of a Database storage manager or at the level of the database client. You will be able to experiment with FPGAs in the lab and on AWS. We …
Supervisor: Philippe Bonnet
Semester: Fall 2019
Tags: FPGA, SSD
The work on running data-intensive applications on very powerful, expensive, and power-hungry server hardware is very popular thanks to the growing size of data centers and high-performance computing (HPC) platforms. However, with the rise of new generation internet of things (IoT) applications, the lower-power and lower-budget hardware devices that specifically target IoT, the edge platforms, …
Supervisors:
Pınar Tözün
Semester: Fall 2024
Tags: edge, benchmarking, data-intensive applications, resource-constrained hardware
Observing how well machine learning systems utilize hardware resources is a crucial preliminary step to improve system performance and reduce hardware waste. To do such observations, one has to collect a lot of monitoring data on hardware behavior through experiments. In our group, we have recently built a framework to aid the management of such monitoring data efficiently, called Resource-Aware …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: benchmarking, data management, data visualization
Deep convolutional networks are able to learn representations of images, scoring well in tasks such as image classification and object detection. During model training, these networks have the ability to process different input sizes without requiring changes to their architecture. In this project, we would like to investigate the effects that changing input sizes has on these kinds of models. We …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: data attribution, deep learning, machine learning, resource efficiency
Today’s foundation models are trained on vast amounts of data. The quality and size of this data have a huge impact on the accuracy of these models. Selecting the right amount and variety of data for a given task, however, is a resource-intensive process. In this project, we would like to investigate various state-of-the-art data selection mechanisms from a hardware requirements and …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: data selection, deep learning, machine learning, resource efficiency
Traditionally, solid-state drives (SSDs) do not give users the ability to control data placement on the SSD. This often leads to suboptimal performance and shortens SSD lifetime, since SSDs internally don’t allow in-place updates. The updated disk pages are written elsewhere and the old versions have to be garbage collected. This poses problems if data with different lifetimes and …
Supervisors:
Pınar Tözün
Semester: Fall 2024
Tags: SSDs, data management systems, modern storage
In this project, we would specifically like to quantify the data movement savings of applying techniques like compression and model-based data filtering in the context of resource-constrained hardware and edge/IoT applications.
Today, many data sources are small, low-powered, and hardware-constrained devices such as mobile phones, wearable or self-driving smart platforms, etc. Processing the data on …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, ML model updates, tinyML
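As a rough illustration of what quantifying data movement savings from compression could look like, the sketch below measures the fraction of bytes saved by compressing a payload before sending it. It uses `zlib` from the Python standard library as a stand-in compressor; the function name `movement_savings` and the sample sensor payload are assumptions made for illustration and are not part of the project description.

```python
import zlib


def movement_savings(payload: bytes, level: int = 6) -> float:
    """Fraction of bytes saved by compressing before transfer.

    Returns 0.0 for an empty payload; the value can be negative when
    compression expands the data (e.g. already-compressed input).
    """
    if not payload:
        return 0.0
    compressed = zlib.compress(payload, level)
    return 1.0 - len(compressed) / len(payload)


# Hypothetical sensor log: repetitive readings tend to compress well.
readings = b"temp=21.5;hum=40;" * 200
print(f"saved {movement_savings(readings):.0%} of the bytes")
```

On real edge hardware the compute and energy cost of compressing would have to be weighed against the bytes saved; this sketch only covers the byte count.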
One of the key challenges with enabling efficient machine learning on resource-constrained devices is keeping the machine learning models deployed on these devices up-to-date without frequent retraining. This requires exploring the impact of different model update mechanisms at the edge.
This project would be suitable as a standalone project or BSc or MSc thesis at ITU during Fall 2024. If you are …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, ML model updates, tinyML
Enabling efficient data processing and machine learning on resource-constrained devices poses many challenges. One is fitting the models into the restrictive memory and compute resources of these devices. In this project, first, we would like to explore the landscape of foundational, generative-AI, language, etc. models with respect to their size and compute needs to understand what could be a fit …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, ML model updates, tinyML
Today, many data sources are small, low-powered, and hardware-constrained devices such as mobile phones, wearable or self-driving smart platforms, etc. Edge computing is a broad term that refers to computations performed on such edge devices. It is becoming increasingly important to enable techniques that get more value out of data at the edge rather than always sending the data to a remote and more …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, resource management, tinyML
Observing how well data-intensive systems utilize hardware resources is a crucial preliminary step to improve system performance and reduce hardware waste. To make such observations, one has to collect a lot of monitoring data on hardware behavior through experiments. In our group, we have recently built a framework to aid the management of such monitoring data efficiently, called Resource-Aware …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2023
Tags: benchmarking, data management, data visualization
It is common to process data to clean it, filter it, restructure it, get metadata out of it, etc. before feeding the data into a data analysis or machine learning pipeline. There are many tools and libraries out there to aid this process, with different strengths and functionality (DALI, RAPIDS, HoloClean, DAPHNE, DuckDB, etc.). In this project, we would like to analyze pros/cons of some of …
Supervisors:
Pınar Tözün
Semester: Fall 2022
Tags: data preprocessing libraries, heterogeneous hardware, machine learning
In the past decade, the data management community has focused on main-memory systems or main-memory-optimized systems. This focus has put the commodity memory hierarchy (DRAM and processor caches) at the center of workload characterization studies. Today, with the evolution of persistent storage technologies such as NVRAM (Intel’s persistent memory solution) and NVMe SSDs, data systems …
Supervisors:
Pınar Tözün
Semester: Fall 2022
Tags: workload characterization, tracing, modern storage, data-intensive systems
DAPHNE is an EU project that aims at building a data system targeting integrated data analysis pipelines across data management and processing, high-performance computing (HPC), and machine learning (ML) training and scoring. The project had its first code release back in March. This project aims at adding a profiling infrastructure for the DAPHNE codebase. If you are interested in learning about …
Supervisors:
Pınar Tözün
Semester: Fall 2022
Tags: integrated data analysis pipelines, profiling big data systems
State-of-the-art machine learning models are known to be compute- and power-hungry. On the other hand, modern servers come equipped with really powerful CPU-GPU co-processors. Not all machine learning models are able to use all the available hardware resources on such servers.
Workload collocation is a mechanism to increase hardware utilization when a single workload is not able to utilize all the …
Supervisors:
Pınar Tözün
Semester: Fall 2022
Tags: benchmarking, workload collocation, machine learning
Today, there are many compute- and memory-hungry data-intensive workloads from big data analytics applications to deep learning. These workloads increasingly run on shared hardware resources, which requires building hardware resource managers that can both serve the needs of workloads and utilize hardware well. Predicting the resource utilization of applications can aid such resource managers …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2022
Tags: benchmarking, hardware resource consumption estimation, machine learning
NVMe SSDs are not a uniform class of devices. The IO software stack is not uniform either. Understanding the performance characteristics of new-generation SSDs and the impact of the IO stack on their performance is crucial when determining how to design data-intensive systems. In this project, we would like to characterize the performance of a range of NVMe SSDs (e.g., Samsung Z-SSD, Intel Optane, …
Supervisors:
Pınar Tözün
Semester: Fall 2021
Tags: SSD, benchmarking
A data science infrastructure orchestrates the execution of widely used machine learning frameworks (e.g., TensorFlow, PyTorch) on a heterogeneous set of processing units (e.g., CPU, GPU, TPU, FPGA) while powering an increasingly diverse and complex range of applications (e.g., fraud detection, healthcare, virtual assistance, automatic driving). Understanding the resource consumption …
Supervisor: Pınar Tözün
Semester: Fall 2021
Tags: benchmarking, hardware resource consumption, deep learning frameworks
Hash-based authentication is an effective way of protecting passwords in software systems. Hashing obscures the original passwords so that they cannot be recovered in case of a database breach. However, as demonstrated by our paper titled Hash-Based Authentication Revisited in the Age of High-Performance Computers, the practical security depends on which hashing algorithm is used as well as the …
Supervisors:
Pınar Tözün, Niclas Hedam
Tags: benchmarking, hashing, security, GPU, hacking, HPC
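As a minimal sketch of the hash-based authentication idea described above, the example below stores a salted, deliberately slow hash instead of the password itself. It uses PBKDF2 from the Python standard library; the function names `hash_password` and `verify_password` and the iteration count are illustrative assumptions and are not taken from the paper or the project.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Illustrative iteration count (an assumption); more iterations make
# brute-force guessing slower, also on high-performance hardware.
ITERATIONS = 100_000


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest) for storage; the plaintext is never stored."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the candidate with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

The choice of algorithm and its cost parameter is exactly the kind of knob whose practical security could be benchmarked on GPUs or HPC clusters.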
The variety and complexity of data-intensive applications and systems have been increasing drastically over the past decade. Tasks from a SQL-based big data analytics request running on Apache Spark can be very different from tasks from deep learning training using the TensorFlow framework. Nevertheless, these data-intensive applications increasingly run on shared hardware resources in data centers or …
Supervisor: Pınar Tözün
Semester: Fall 2020
Tags: hardware-software co-design
Improvements in modern processor hardware do not automatically enable more complex and higher quality games, animations, and virtual reality applications. A paradigm shift is necessary when it comes to how we develop these applications in order to exploit the resources of modern hardware (i.e., main memory, multicores) effectively. Unity Technologies has recently developed Data-Oriented …
Supervisor: Pınar Tözün
Semester: Fall 2020
Tags: memory hierarchy, concurrency
SSDs are not a uniform class of devices. The SSD landscape is quite diverse now, with many new-generation, much faster, and customizable devices available. Understanding their performance characteristics is crucial when determining what their impact on the data systems software stack should be. In this project, we would like to characterize the performance of a broad range of such SSDs (e.g., ZNS, Samsung Z-SSD, …
Supervisors:
Pınar Tözün, Philippe Bonnet
Semester: Fall 2020
Tags: SSD, benchmarking
Spreading the computation of similar concurrent tasks that have a large instruction footprint over multiple cores via thread migration is shown to improve the instruction cache utilization drastically since it allows instruction re-use across the concurrent tasks. However, thread migrations are costly due to the context switching overhead. To reduce this overhead, recent work mainly proposed …
Supervisor: Pınar Tözün
Semester: Fall 2019
The computer architecture community is moving toward commoditization of hardware specialization instead of general purpose CPUs and more agile hardware development instead of years-long production cycles to enable faster, more energy-efficient, and more cost-effective hardware/software co-designs. This will lead to a disruption in the way we design and maintain the emerging data management systems …
Supervisor: Pınar Tözün
Semester: Fall 2019
Apache SystemML is an open-source platform to run machine learning tasks efficiently thanks to the hardware-conscious query compilation techniques it adopts. It can be run standalone or on top of Apache Spark. It is considered to be state-of-the-art when running machine learning tasks (e.g., in ACM SIGMOD 2017, there were ~5 papers that used SystemML as a comparison point). This project aims at …
Supervisor: Pınar Tözün
Semester: Fall 2019
The popularity of large-scale real-time analytics applications (real-time inventory/pricing, recommendations from mobile apps, fraud detection, risk analysis, IoT, etc.) keeps rising. These applications require distributed data management systems that can handle fast concurrent transactions (OLTP) and analytics on the recent data. Some of them even need to run analytical queries (OLAP) as part of …
Supervisor: Pınar Tözün
Semester: Fall 2019
The Transaction Processing Performance Council (TPC) is a non-profit IT organization founded to define database benchmarks and disseminate objective, verifiable performance data to the industry. TPC has standardized several new benchmarks (e.g., TPCx-HS and TPCx-BB) in recent years. Older popular benchmarks, like TPC-C (representing high-performance transaction processing) and TPC-H (representing …
Supervisor: Pınar Tözün
Semester: Fall 2019
The DISCO-2 satellite will have accelerated machine learning capability based on the inclusion of a Coral TPU ML accelerator module. This will allow images taken by the satellite to be analysed on the satellite using a variety of ML models, with only select images sent back to Earth. This approach allows for more flexibility in image acquisition and saves downlink bandwidth, which is very constrained …
Supervisors:
Julian Priest, Robert Bayer
Semester: Fall 2023
Tags: satellite, ground station, software defined radio, automation, csp
This is not a single project, but rather a larger cluster of potential projects in the field of what could be summarized as extreme networking.
The networks we are interested in are typically wireless, and can be extreme in different senses of the word:
distance - hundreds of kilometers terrestrial, 10,000s of km to satellite
latency - sub-ms latencies
autonomy - off-grid
quality - extreme remote …
Supervisors:
Sebastian Büttrich
Semester: Fall 2024
Tags: network, IoT, LoRa, LoRaWAN, satellites
LoRa is a long-range, low-bandwidth networking protocol widely used in Internet of Things projects, sensor networks, and low-power, low-cost embedded systems. LoRa’s encoding scheme allows for extremely long-distance communications with low power usage and small, simple antennas. This combination of features has made it attractive to small satellite operators flying cubesats and LoRa is now …
Supervisors:
Sebastian Büttrich
Semester: Fall 2024
Tags: satellites, LoRa, cubesat, IoT, embedded, electronics
LoRa is a long-range, low-bandwidth networking protocol widely used in Internet of Things projects, sensor networks, and low-power, low-cost embedded systems. LoRa’s encoding scheme allows for extremely long-distance communications with low power usage and small, simple antennas. This combination of features has made it attractive to small satellite operators flying cubesats and LoRa is now …
Supervisors:
Sebastian Büttrich
Semester: Fall 2024
Tags: IoT, LoRa, LoRaWAN, satellites
Optical fiber is the backbone of the internet’s communication, e.g. in the form of submarine fiber cables. It can also be employed as a sensor device, by means of combined opto-acoustic methods such as Distributed Acoustic Sensing (DAS) or State of Polarisation (SoP) sensing. Fiber is capable of sensing all kinds of vibrational/acoustic events, from animal sounds over seismic activity to …
Supervisors:
Sebastian Büttrich
Semester: Fall 2024
Tags: fiber, acoustics, audio, machine learning, DAS, SOP
The Danish Student Cubesat Program is an inter-university collaboration that will launch 3 cubesats into Low Earth Orbit over the next 4 years. The satellites will be designed, built, programmed and operated by students, and the project offers an opportunity for Master’s students to take part in a live satellite project. ITU is partnering with Aarhus University on DISCOSAT2 which will be an …
Supervisors:
Sebastian Büttrich, Julian Priest
Semester: Fall 2021
Tags: Satellite, Cubesat, Image processing, Machine Learning, edge, constrained computing
Invasive bird species can be a serious problem in cities, towns and in agriculture. The common pigeon is a very unwelcome guest on many balconies, roofs, and terraces. Conventional scarecrows often show no effect, as these birds are known to be quite intelligent and capable of learning fast. The idea is to build a sensor/camera-enhanced scarecrow that - can recognize birds present within its …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: IoT, ML, machineLearning, sensors, security
For this project, you would be working with a partner company that is looking to re-establish wood as a building material for sustainable architecture, and thus is using sensors for quality control, to detect damage and deterioration in buildings. Wood such as timber may be analyzed by non-intrusive acoustic impact testing and subsequent waveform analysis, and the expectation is that machine …
Supervisors:
Sebastian Büttrich
Semester: Fall 2021
Tags: IoT, sensors, machine learning, acoustics
In LoRaWAN networks such as The Things Network, long-distance transmissions, well beyond the limitations of line of sight in terrestrial geometry, are frequently observed. Tropospheric effects are seen as responsible for bending or guiding radio waves around the Earth’s curvature. As an example, under the right weather conditions, the LoRaWAN gateway at ITU may collect packets from northern Germany, …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: IoT, LoRaWAN, LPWAN, satellite, networks, troposphere, weather
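To give a sense of scale for the line-of-sight limit mentioned above, the sketch below computes the purely geometric horizon distance between two antennas over a smooth spherical Earth, using d = sqrt(2·R·h) per antenna. It ignores terrain and atmospheric refraction, which is precisely what the project is about, and the function name `line_of_sight_km` and the example heights are assumptions for illustration.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def line_of_sight_km(h_tx_m: float, h_rx_m: float) -> float:
    """Geometric horizon distance (km) between two antennas.

    Assumes a smooth spherical Earth and antenna heights much smaller
    than the Earth's radius; no refraction, no terrain.
    """
    d_tx = math.sqrt(2 * EARTH_RADIUS_M * h_tx_m)
    d_rx = math.sqrt(2 * EARTH_RADIUS_M * h_rx_m)
    return (d_tx + d_rx) / 1000


# Hypothetical heights: a rooftop gateway at 30 m and a node at 2 m.
print(f"{line_of_sight_km(30, 2):.1f} km")
```

Receptions from well beyond this geometric limit are the kind of event that would point to tropospheric bending or ducting.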
Recent progress in LoRaWAN development has made a new generation of satellite communications offerings available to IoT devices. In these, the LoRaWAN gateway is satellite-borne and collects data from small, inexpensive ground stations. So far, this is predominantly seen as a means of communication for remote sensor data, e.g. in agriculture, logistics or wildlife monitoring. However, one can …
Supervisors:
Sebastian Büttrich
Semester: Fall 2021
Tags: IoT, LoRaWAN, LPWAN, satellite, networks, edge, security
There is currently a lot of progress in really small, yet powerful visual machine learning / computer vision, on hardware like the OpenMV Cam H7, Arduino Portenta Vision Shield, Luxonis LUX-ESP32, Himax WE-I Plus, Arducam Pico4ML, and Raspberry Pi, and on software platforms such as TinyML or OpenMV IDE.
While many popular use cases stem from fields like traffic analysis, wildlife monitoring, we …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: IoT, sensors, machine learning, computer vision
The Things Network Stack v3 for LoRaWAN is an open source LoRaWAN network stack suitable for large, global and geo-distributed public and private networks as well as smaller networks. The architecture follows the LoRaWAN Network Reference Model for standards compliance and interoperability. - https://github.com/TheThingsNetwork/lorawan-stack This stack, currently in pre-rollout testing, however …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
New types of networks such as LoRaWAN and Sigfox enable us to deploy inexpensive mobile Air Quality Sensors, e.g. on bicycles or vehicles, boats or trains, and help map and understand urban pollution. Such data would be valuable for e.g. the international citizen science project Luftdaten https://luftdaten.info/en/home-en/ in which we are participating. We have an ongoing collaboration with the …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Can we combine satellite data with terrestrial and marine sensor data to study their correlation and improve our understanding of environmental processes, in both urban and agricultural/rural contexts? Potential collaboration with several organizations in Kenya and the Orkney Islands.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Not far from the IT University, citizens are building a “Small Smart City”, aiming to measure and analyze e.g. water and resource consumption. Help build the Small Smart City!
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Survey of potential sensor modalities (sounds, ultrasounds, vibrations) and related work (e.g., wind turbines)
Starting with sound: Piezo contact mics/transducers, MEMS sensors
Characterization of state based on known signatures (classification problem)
Characterization of state transitions (HMM)
Experimentation on Coffee machine/Blender/3D printer at PitLab
1..k sensors; Local/cloud-based …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Novo Nordisk
Survey of potential sensor modalities (IR temp sensor, thermal imager) and related work
Starting with consumer USB cams generate series of images or phone cams
Characterization of state based on known signatures (classification problem)
Characterization of state transitions (HMM)
Experimentation on Coffee machine/Blender/3D printer at PitLab
1..k sensors; Local/cloud-based processing. …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Novo Nordisk
Most data collection in IoT does not critically depend on latency or speed from data collection to data analytics. Occasionally, though, we encounter tasks that would benefit from near-realtime features, such as collection of wave and tidal dynamics around marine energy infrastructures. This project explores the limits of speed by bringing together a LoRa PHY, a LoRaWAN gateway, LoRaWAN stack, ultrafast …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
In collaboration with the IoT Lab at Computer Science Dept at Kathmandu University, Nepal, we are developing a potential service for tracking trekkers, i.e. offering a security service for tourists trekking the Himalayas, in particular Mt. Everest. This service very critically depends on having a robust hardware component, the actual GPS/GNSS tracker. Requirements with respect to battery life, …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Wind turbine electricity production data is sensitive for Energinet (and for the wind turbine producers). Energinet would like to publish wind turbine electricity production data sets that can be used to train relevant models and to develop innovative applications, without giving away sensitive data. The goal of the project is to explore various data publishing methods for that purpose.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, data publication
Energinet has a model that describes the electricity production of a given wind turbine given wind conditions. The current model based on kNN is trained with DMI weather data and historical electricity production data for the wind turbine. The goal of the project is to improve the current model with lifelong learning, extended weather data and different models for a range of different wind …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Data Analysis
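As a loose illustration of the kind of kNN model mentioned above, the sketch below predicts production as the average of the k historical samples whose weather features are closest to the query. This is not Energinet's model: the function name `knn_predict`, the choice of features (wind speed and air temperature), and all sample values are made up for illustration, and the features are left unscaled for brevity.

```python
import math


def knn_predict(history, query, k=3):
    """Average the production of the k samples nearest to `query`.

    history: list of (features, production) pairs, where features is a
    tuple of floats with the same length as `query`.
    """
    nearest = sorted(history, key=lambda sample: math.dist(sample[0], query))[:k]
    return sum(production for _, production in nearest) / len(nearest)


# Made-up samples: (wind speed in m/s, air temperature in C) -> production in kW
history = [
    ((4.0, 10.0), 300.0),
    ((5.0, 11.0), 450.0),
    ((6.0, 12.0), 650.0),
    ((12.0, 8.0), 2000.0),
]
print(knn_predict(history, (5.0, 11.0), k=3))
```

Lifelong learning in this setting could be as simple as appending new observations to `history`, though the project may well explore quite different models.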
The goal of the project is to explore the accuracy of electricity production predictions based on historical data and weather predictions. This may be tackled as a sequence prediction problem using recurrent neural networks. The long-term goal is to incorporate wind turbines in the reserve market for electricity.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Forecasting, Machine Learning, Deep Learning
The goal of the project is to explore new ways of gathering data about wind turbines as well as local wind/weather conditions. To this end, sound/vibration-based and/or image-based instrumentation as well as innovative experiments such as balloons and light weight weather stations might be considered.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Instrumentation, Sensors
Deliberately scoped very wide, this group contains a number of projects in different possible directions, from
Location services via LPWAN time-of-flight and GPS/GNSS,
Vessel tracking and management in fisheries, tourism and logistics,
Water quality and chemistry sensing for Aquaculture, specifically Mariculture,
Wave and tidal dynamics, e.g. in energy harvesting
and variations/combinations of …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: Satellite, Image processing, Machine Learning, edge, constrained computing, IoT, sensors, location
swarm.space is a commercial company providing low-bandwidth satellite connectivity using ultra-small (quarter cubesat unit) satellites in a low orbit. Swarm satellites cover every point on Earth, enabling IoT devices to affordably operate in any location. Swarm uses a form of LoRa network.
While ground terminals for satellite networks traditionally were both big and expensive, modems and antennas …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: Satellite, IoT
LoRa is a relatively new long-range low-power standard allowing battery-powered pocket-size nodes to transmit over 100s to 1000s of kilometers. ThingSat is a CubeSat communication payload consisting of an electronic board with several LoRa transmitters and a patch antenna operating at 868 MHz and 2.4 GHz. It is a guest payload of a shared 3U CubeSat.
Available projects under this platform include …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: Satellite, IoT
Spectral learning priority is a useful tool for analyzing a model’s focus during training; it describes how a model may understand a given image from the spectrum perspective. For example, to distinguish cats and tortoises, learning to recognize their shapes would be enough; such an embedding will result in higher learning priority at low frequencies representing shapes, while learning to …
Supervisors:
Yucheng Lu, Veronika Cheplygina
Semester: Fall 2024
Tags: Spectral analysis, Image classification, Medical imaging
It has been observed that deep learning models are able to identify patient characteristics such as age, sex, and self-reported race with high accuracy from medical images such as chest x-ray recordings, even when medical doctors cannot. This raises the potential for such models to learn to (falsely) diagnose patients of different demographics differently, even if they present with the same …
Supervisors:
Amelia Jiménez-Sánchez, Eike Petersen, Veronika Cheplygina
Semester: Fall 2024
Tags: machine learning, data science, medical imaging
There is pressure on hospitals to implement AI systems which promise to improve diagnoses and save time for the doctors. One use-case could be related to the automation of protocoling based on a physician referral. Currently, this requires a referral letter from a physician who has examined a patient and evaluates that there is a need for additional imaging studies. In this case, the physician …
Supervisors:
Veronika Cheplygina
Semester: Fall 2023
Tags: machine learning, medical imaging, data analysis
Machine learning models, especially larger models that are used on, for example, image or text datasets, can be expensive to train. During development, models are usually trained multiple times, for example to optimize hyperparameters, which can result in a large carbon footprint.
This project focuses specifically on medical data. There are some recent efforts, for example by Selvan et …
Supervisors:
Veronika Cheplygina
Semester: Fall 2023
Tags: machine learning, medical imaging, data analysis, resource consumption
Machine learning is used extensively in different applications, including medical imaging and natural language processing. As different types of data are involved, it is reasonable to assume that different methods are needed for each application. However, there are also opportunities in translating a method successful in one application, to the other application where it is not widely used.
The …
Supervisors:
Veronika Cheplygina
Semester: Fall 2021
Tags: machine learning, natural language processing, medical imaging, literature review
Machine learning algorithms for skin lesion classification typically learn from images which have been labeled as malignant (for example, melanoma) or not. Such tasks can still suffer from overfitting due to limited dataset size. In other computer vision tasks, crowdsourcing labels has been effective, but the average person typically does not have the background to classify skin lesions. However, …
Supervisors:
Veronika Cheplygina
Semester: Fall 2021
Tags: machine learning, medical imaging, crowdsourcing, similarity
The DISCO-2 project is driven by students and aims to develop and deploy a 3-unit CubeSat into low Earth orbit. Its mission focuses on conducting Earth observations over Greenland and supporting various research objectives. The satellite has three cameras onboard: infrared, wide-angle, and standard (main camera). Due to the limitations of the imaging hardware and the challenging conditions on the …
Supervisors:
Yucheng Lu, Julian Priest
Semester: Fall 2024
Tags: Image enhancement, Image processing, Machine learning
Are you interested in working with a big data open source project?
You are welcome to conduct your thesis/project in Apache Wayang. Apache Wayang is the first cross-platform framework that allows users to specify their task/query in a system-agnostic manner; Wayang then determines which system(s) are best suited to execute the task, with the goal of optimizing performance. For a general overview …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: big data, database, cross-platform data processing, open source, Apache
Knowledge graphs (KGs) are extensively used in many application domains, such as search engines, product recommendation, and bioinformatics. Knowledge graph completion (a.k.a. link prediction), i.e., the task of inferring missing information from knowledge graphs, is a widely used task in the above applications. This project will investigate how to loosely couple the data-driven power of knowledge …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: knowledge graph, LLMs, reasoning
Are you interested in working with a big data open source project and helping the environment?
You are welcome to conduct your thesis/project in Apache Wayang. Apache Wayang is the first cross-platform framework that allows users to specify their task/query in a system-agnostic manner; Wayang then determines which system(s) are best suited to execute the task, with the goal of optimizing performance. …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: big data, database, cross-platform data processing, open source, Apache
Query optimization is crucial for any data management system to achieve good performance. Recent advancements in AI have led academia and industry to investigate learning-based techniques in query optimization. In particular, many works propose replacing the cost model used during plan enumeration with a machine learning model that estimates the runtime of a plan. However, to build such a model …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: machine learning, training data, query optimizer
Query optimization is crucial for any data management system to achieve good performance. Recent advancements in AI have led academia and industry to investigate learning-based techniques in query optimization. In particular, many works propose replacing the cost model used during plan enumeration with a machine learning model (typically a regression model) that estimates the runtime of a query …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: machine learning, database, query optimization, ranking
(This project will be carried out in collaboration with Xilinx Research Labs in Dublin)
Machine Learning operators are becoming increasingly commonly used in data management systems and, in this project, we will explore the challenges and benefits of integrating inference operators from FINN [1] within a so-called Smart Storage system [2]. Both the inference and data management aspects will be …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: FPGA, Data Management, Machine Learning
(This topic is going to be co-supervised by Bernardo Machado David [http://www.bmdavid.com/])
Database systems managing private data may leak sensitive information when queries are done in the clear, even if the data itself is encrypted. A recent line of research has looked into combining database engines supporting standard SQL queries with techniques for secure Multiparty Computation (MPC), …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: Theoretical Computer Science, Data Management, Security and Privacy
Consensus mechanisms for ensuring consistency are some of the most expensive operations in managing large amounts of data. Often, there is a trade-off that involves reducing the coordination overhead at the price of accepting possible data loss or inconsistencies. As the demand for more efficient data centers increases, it is important to provide better ways of ensuring consistency without …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: Benchmarking, Distributed Systems
Given a private database that I can access only through specific queries, there is still a lot I can learn about its entries [1]. Differential Privacy (DP) tackles this: letting me learn the (approximate) result of complex queries on a database, but preventing me from learning much about its specific entries. The basic approach of DP often boils down to: “apply a privacy-preserving transformation T …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: Theoretical Computer Science, Data Management, Security and Privacy
Blockchains are often used synonymously with crypto-currencies and unspent transaction output (UTXO) data models, but there are emerging blockchain platforms that offer a more general data model and smart contracts that can manipulate this data freely (e.g. Hyperledger Fabric [1]). As such, these platforms resemble in many ways distributed databases, storing a collection of records, organized as …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: Blockchain, Data Management, Benchmarking
Modern data analytics systems are composed of two types of nodes: compute and storage (e.g., Amazon S3, Redis, MongoDB, etc.). The storage nodes typically offer a key-value interface and are often used to store data encoded in a columnar format (e.g., Parquet files). Due to growing data sizes in datacenters, there is an increasing interest in using specialized hardware devices, namely Field …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: FPGA, Hardware-software Co-design, Security and Privacy
We are actively developing a new prototype for analysing large multimedia collections in virtual reality, based on the ObjectCube data model. There are many ways in which students can contribute to the project, including work on the user interface and the back-end, and later on running large-scale user experiments.
Read more…
Supervisors:
Aaron Duane, Björn Þór Jónsson
Semester: Fall 2021
Tags: virtual reality, multimedia analytics
It has been observed that deep learning models are able to identify patient characteristics such as age, sex, and self-reported race with high accuracy from medical images such as chest x-ray recordings, even when medical doctors cannot. This raises the potential for such models to learn to (falsely) diagnose patients of different demographics differently, even if they present with the same …
Supervisors:
Amelia Jiménez-Sánchez, Eike Petersen, Veronika Cheplygina
Semester: Fall 2024
Tags: machine learning, data science, medical imaging
A medical Visual Question Answering (VQA) system can provide meaningful references for both doctors and patients during the treatment process. Different from normal images, a learning setting with medical images is more challenging due to limited amounts of data, class imbalance and the presence of label noise for diagnosis tasks. Moreover, little attention is paid to how the images and meta-data are …
Supervisors:
Amelia Jiménez-Sánchez
Semester: Fall 2023
Tags: medical imaging, deep learning, machine learning, transfer learning, meta-learning
In relevance feedback, the choice of images to present to the user is a difficult problem, as a naïve approach may present too many similar images. The challenge addressed in this project is to ensure diversity (aka “one of each”) as well as relevance. A particularly interesting project for students interested in efficient algorithms.
Read more…
Supervisors:
Björn Þór Jónsson
Semester: Fall 2021
Tags: multimedia analytics, scalability, diversity
In interactive learning systems, such as Exquisitor, the system presents potentially relevant images to users who label them as either relevant or irrelevant. Currently, Exquisitor uses a cluster-based index, which allows it to return results from a collection of 100 million images in 0.3 seconds. The goal of this project is to study the application of hash-based indexing to interactive learning …
Supervisors:
Björn Þór Jónsson
Semester: Fall 2021
Tags: multimedia analytics, diversity
The goal of this project is to enhance PhotoCube as a competitor for the Video Browser Showdown, an international video retrieval competition where competing systems are judged based on speed, accuracy and recall. We propose to develop new versions of the C++-based media server and JS-based media browser, to expand the data model to videos and improve the performance sufficiently to take part in …
Supervisors:
Björn Þór Jónsson
Semester: Fall 2021
Tags: video search, multimedia analytics, photocube
The index structure used for Exquisitor is eCP, a very scalable index for high-dimensional retrieval. However, while multimedia collections are typically updated constantly, the index is static. The goal of this project is to implement and compare approaches for index maintenance.
Read more…
Supervisor: Björn Þór Jónsson
Semester: Fall 2020
Tags: index maintenance, high-dimensional indexing
In this project, we propose to implement a media server and a media browser encapsulating a new data model for analysing media collections, called Multimedia Analytics Data Services (MADS). To validate the design, some scalability experiments should be performed.
Read more…
Supervisor: Björn Þór Jónsson
Semester: Fall 2020
Tags: multimedia analytics, scalability
Students at ITU have made a prototype version of the Exquisitor system for the Android mobile phone! The system is missing some of Exquisitor’s advanced functionality, such as search and indexing, and the goal is to add and evaluate this functionality.
Read more…
Supervisor: Björn Þór Jónsson
Semester: Fall 2020
Tags: multimedia analytics, android
We propose to develop a new prototype for analysing large multimedia collections in Virtual Reality, using the new Valve Indexes. There are many ways in which students can contribute to the project, including work on the user interface and the back-end, and later on running large-scale user experiments.
Read more…
Supervisor: Björn Þór Jónsson
Semester: Fall 2020
Tags: virtual reality, multimedia analytics
The goal of this project is to integrate Exquisitor with other pieces of existing technology and turn it into a competitor for a live video retrieval competition. The project is suitable for 3-4 well-qualified MSc students.
The Video Browser Showdown (VBS) is a live competition for video search and retrieval, held at the International Conference on Multimedia Modeling (MMM). In VBS, the competition …
Supervisor: Björn Þór Jónsson
Semester: Fall 2019
The goal of this project is to ensure diversity in the relevance feedback results, to improve the quality of the user experience.
The project is suitable for 1-3 well-qualified MSc students.
In many creative tasks, the designer will know that some stock image is good for a design just by stumbling upon the image. This “Aha!” moment requires browsing thousands of images by categories. In other words, it requires …
Supervisor: Björn Þór Jónsson
Semester: Fall 2019
The goal of this project is to use the state of the art in eye tracking to design, implement and evaluate different eye-tracking interfaces for Exquisitor.
The project is suitable for 1-3 well-qualified MSc students.
Image and media collections are becoming a central information resource for a growing number of domains. This calls for very effective tools for interactive exploration of the contents of …
Supervisor: Björn Þór Jónsson
Semester: Fall 2019
The goal of this project is to build a prototype of the Exquisitor system for mobile devices.
The project is suitable for 1-3 well-qualified MSc students.
Image and media collections are becoming a central information resource for a growing number of domains. This calls for very effective tools for interactive exploration of the contents of those collections [1]. Based on past research results [2], we …
Supervisor: Björn Þór Jónsson
Semester: Fall 2019
Deep neural networks have been revolutionary in computer vision, and publicly available image datasets played an important role in this success. Due to their size, neural networks require vast amounts of data for training. Yet when it comes to medical settings, dataset sizes are very limited due to the cost of data annotation, privacy concerns, differences in imaging techniques, and others. In such …
Supervisors:
Dovile Juodelyte
Semester: Fall 2023
Tags: transfer learning, deep learning, medical imaging
GPUs offer massive computational power and parallelism through their Streaming Multiprocessors (SMs). Efficient GPU utilization is critical for maximizing performance and optimizing compute resource usage; it is measured using various metrics such as SMACT (SM Activity), SMOCC (SM Occupancy), and DRAMA (DRAM Active). These metrics provide insight into how effectively the GPU’s SMs and …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Spring 2024
Tags: machine learning systems, GPU Utilization, resource management, resource interference
Deep learning changed the landscape of many applications like computer vision, natural language processing, etc. On the other hand, deep learning requires gigantic computing power offered by modern hardware. As a result, data scientists rely on powerful hardware resources offered by shared high-performance computing (HPC) clusters or the cloud. Due to the long-running times of deep learning …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2024
Tags: machine learning systems, checkpointing, scheduling, resource management
Workload collocation has been shown to be an effective method to reduce the hardware requirements for certain deep learning (DL) training tasks. On the other hand, there haven’t been many robust open-source implementations of schedulers that incorporate workload collocation on GPUs for DL.
BLOX is a framework that aims at standardizing the way we implement deep learning schedulers. In this …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2024
Tags: machine learning systems, scheduling, resource management, workload collocation
Today, there are many compute- and memory-hungry data-intensive workloads from big data analytics applications to deep learning. These workloads increasingly run on shared hardware resources, which requires building hardware resource managers that can both serve the needs of workloads and utilize hardware well. Predicting the resource utilization of applications can aid such resource managers …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2022
Tags: benchmarking, hardware resource consumption estimation, machine learning
Geospatial data refers to information that is tied to specific geographic locations on the Earth’s surface. It includes both the location coordinates (such as latitude, longitude, and, potentially, altitude) and attribute data associated with those locations. Geospatial data is categorized into two types: raster and vector.
Vector data represents geographic features as points, lines, and …
Supervisors:
Eleni Tzirita Zacharatou
Semester: Fall 2023
Tags: spatial data analysis, data science, data loading, GIS file formats, geospatial data
It is now common to query terabytes of spatial data. Several new frameworks extend distributed computing platforms such as Hadoop and Spark to enable them to efficiently process spatial queries by providing (1) mechanisms to efficiently store spatial data and index them; and (2) packages of built-in spatial operations for these platforms. Meanwhile, it is now common to accelerate Hadoop and Spark …
Supervisor: Iman Elghandour
Semester: Fall 2019
Spark assumes that it executes its applications on a homogeneous cluster of similar nodes. However, it is becoming common that in-house clusters have heterogeneous compute resources and it is good to exploit all of them in the most efficient way. The objective of this master thesis is to extend the Spark scheduler to be resource-aware and to efficiently schedule Spark tasks on all the …
Supervisor: Iman Elghandour
Semester: Fall 2019
Distributed computing platforms such as Hadoop and Spark focus on addressing the following challenges in large systems: (1) latency, (2) scalability, and (3) fault tolerance. Dedicating computing resources for each application executed by Spark can lead to a waste of resources. Unified distributed file systems such as Alluxio have provided a platform for computing results among simultaneously …
Supervisor: Iman Elghandour
Semester: Fall 2019
In the last few years, it became common to accelerate Hadoop and Spark by enabling them to execute tasks and jobs on accelerators such as GPUs and FPGAs. The objective of this master thesis is to study new approaches that efficiently predict the execution time of Spark tasks and jobs executed on GPUs. Part of the work will be to build a performance prediction model for GPUs, which can be built …
Supervisor: Iman Elghandour
Semester: Fall 2019
Do you like open-source systems? Would you like to experience working with an open-source system? Do you want to learn about big data research in practice? Then, this project is for you!
We have a number of thesis/project topics under the umbrella of Apache Wayang. Wayang is the first cross-platform framework that allows users to specify their task/query in a system-agnostic manner and Wayang will …
Supervisors:
Jorge Quiané
Semester: Fall 2022
Tags: big data, database, cross-platform data processing, open source, Apache
Do you have the feeling that only a few players are controlling the AI game? Would you like to make AI technology accessible to everyone? Then, come and help us to make Agora a reality!
We have a number of thesis/project topics under the umbrella of the Agora project. This project aims at building a unified data infrastructure for supporting AI ecosystems that bring together data, algorithms, …
Supervisors:
Jorge Quiané
Semester: Fall 2022
Tags: big data, AI ecosystems, compliant data processing, federated analytics, data markets
ITU is a partner of the Danish Student Cubesat Program, DISCOSAT. We launched our first satellite, DISCO-1, into Low Earth Orbit in April 2023 and we will launch a second, DISCO-2, in 2024. In this project you will gain experience with automating live satellite operations and communications, completing a ground station at the Rued Langgaards Vej site for use with both satellites.
The DISCO satellite …
Supervisors:
Julian Priest
Semester: Fall 2023
Tags: satellite, ground station, software defined radio, automation, csp
The DISCO-2 satellite is an Earth observation satellite in collaboration with the Arctic Research Center in Aarhus and is designed to complement ground-based field studies in Greenland. The satellite instrument consists of two high-quality visible-light cameras and one infrared camera, as well as an attitude control system and a Coral TPU ML coprocessor.
In this project you will develop software to control …
Supervisors:
Julian Priest
Semester: Fall 2023
Tags: satellite, climate change, image processing, ML, csp, embedded, space
The DISCO-2 satellite will have accelerated machine learning capability based on the inclusion of a Coral TPU ML accelerator module. This will allow images taken by the satellite to be analysed on the satellite using a variety of ML models, with only select images sent back to Earth. This approach allows for more flexibility in image acquisition and saves downlink bandwidth which is very constrained …
Supervisors:
Julian Priest, Robert Bayer
Semester: Fall 2023
Tags: satellite, ground station, software defined radio, automation, csp
The Danish Student Cubesat Program is an inter-university collaboration that will launch 3 cubesats into Low Earth Orbit over the next 4 years. The satellites will be designed, operated, programmed and built by students and the project offers an opportunity for Master’s students to take part in a live satellite project. ITU is partnering with Aarhus University on DISCOSAT2 which will be an …
Supervisors:
Sebastian Büttrich, Julian Priest
Semester: Fall 2021
Tags: Satellite, Cubesat, Image processing, Machine Learning, edge, constrained computing
ITU is a partner in the Danish Student Cubesat Program, DISCO, which will launch a series of small satellites into orbit, starting with DISCO-1 in 2023 and followed by DISCO-2 in 2024.
As part of this project ITU is installing a satellite ground station with a range of antenna rotators on the roof of the Rued Langgaards Vej building and the equipment has been purchased. The ground station will track the …
Supervisors:
Julian Priest
Semester: archive
Tags: Satellite, Image processing, Edge, Constrained Computing, Networks, Machine Learning, Embedded, Software Defined Radio
ITU is a partner in the Danish Student Cubesat Program DISCO, which will launch a series of small satellites into orbit, starting with DISCO-1 in 2023 and followed by DISCO-2 in 2024.
ITU is developing a hi-res multi-camera imaging payload for Earth observation, primarily in the Arctic. We are developing an on-satellite machine learning capability using an ML coprocessor, as well as models that can …
Supervisors:
Julian Priest
Semester: archive
Tags: Satellite, Image processing, Edge, Constrained Computing, Networks, Machine Learning, Embedded, Radio
The idea behind “15-minute cities” is that within a short walk or bike ride people should have access to all necessary facilities that constitute the essence of urban living, such as parks, shops, cafes, schools, hospitals. Initiatives to transform cities according to this paradigm are currently being implemented across the world, in an attempt to make urban spaces more liveable, …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: spatial data analysis, visualisation, Python, OSM data
As a response to increased traffic congestion and the need to reduce carbon emissions, cities consider ways to modernise, build and extend transit systems. Transit network design solutions can benefit from analysing the large amount of crowd-sourced location data available, which provides valuable insights into population mobility needs. Designing efficient metro lines, bicycle paths, or bus …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: spatial data analysis, network design, Python, OSM data
The idea behind “15-minute cities” is that within a short walk or bike ride people should have access to all necessary facilities that constitute the essence of urban living, such as parks, shops, cafes, schools, hospitals. Initiatives to transform cities according to this paradigm are currently being implemented across the world, in an attempt to make urban spaces more liveable, …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: spatial data analysis, graph summaries, Python, OSM data
Musical genres are inherently ambiguous and difficult to define. Even more so is the task of establishing how genres relate to one another. Yet, genre is perhaps the most common and effective way of describing musical experience. The number of possible genre classifications (e.g. Spotify has over 4000 genre tags, LastFM over 500,000 tags) has made the idea of manually creating music taxonomies …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: scalable algorithms, hyperbolic embeddings, Python, Spotify data
The integration of wind power in the energy grid is dependent on accurate production forecasts. The power output curves between neighbouring wind farms are often correlated temporally and spatially, but currently, these spatiotemporal dependencies are under-utilised in prediction models. Graph neural networks allow for modelling these dependencies. In this project the student will implement a …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: spatial data analysis, graph neural networks, Python, timeseries data
Open-source JavaScript applications, such as browser-based web games, are typically developed by individual software engineers or small teams. These teams often have limited financial resources to use commercial logging frameworks and cloud-based analysis systems and may also lack knowledge and expertise in logging. However, log analysis is highly important for many reasons: monitoring application …
Supervisors:
Martin Hentschel
Semester: Fall 2024
Tags: open source, performance
The Deconstructed Cloud Databases project stems from a simple question: What are the minimum components required to build a data management system in the cloud? Our motivation for this project is based on the idea that reducing a system to its minimum set of components makes it easier to build, test, and maintain cloud data management systems. This approach requires less engineering effort, …
Supervisors:
Martin Hentschel
Semester: Fall 2024
Tags: data management, performance, benchmarking, hacking
The Deconstructed Cloud Databases project stems from a simple question: What are the minimum components required to build a data management system in the cloud? Our motivation for this project is based on the idea that reducing a system to its minimum set of components makes it easier to build, test, and maintain cloud data management systems. This approach requires less engineering effort, …
Supervisors:
Martin Hentschel
Semester: Fall 2024
Tags: data management, security, open source, open standards
The emergence of computational storage platforms like Delilah has transformed the data storage landscape, enabling new computing paradigms and facilitating data-intensive applications. Delilah is a cutting-edge computational storage platform developed by the IT University of Copenhagen. It runs on the Daisy OpenSSD and exposes an asynchronous computational storage protocol to the host, facilitated …
Supervisors:
Niclas Hedam, Philippe Bonnet
Semester: Fall 2023
Tags: Open Source, Testing, Computational Storage, Hardware, FPGA
Hash-based authentication is an effective way of protecting passwords in software systems. Hashing obscures the original passwords, such that they cannot be recovered in case of a database breach. However, as demonstrated by our paper titled Hash-Based Authentication Revisited in the Age of High-Performance Computers, the practical security depends on which hashing algorithm is used as well as the …
Supervisors:
Pınar Tözün, Niclas Hedam
Tags: benchmarking, hashing, security, GPU, hacking, HPC
With the recent hunger for being “data driven”, many organizations are eager to integrate ML in their decision-making process. Unfortunately, competent data scientists are still relatively scarce, and manual model development cannot keep up with the demand for magic AI solutions. This is no less true when it comes to forecasting. Knowing the future is extremely handy when making …
Supervisors:
Niels Ørbæk Chemnitz
Semester: Spring 2021
Tags: AutoML, ML, Forecasting, Energy Data, Smart Meters, Python, Data Science, Time Series Data
How much do our smart meter readings disclose about us? Can we disentangle the oven from the washing machine from the kettle? Can we identify demographics and behavior patterns from the stream of electricity data?
Most Danish homes are now equipped with so-called “smart meters” - networked electricity meters that report consumption and load at a much higher rate than conventional meters. …
Supervisors:
Niels Ørbæk Chemnitz
Semester: Spring 2021
Tags: NILM, ML, IoT, Energy Data, Smart Meters, Python, Data Science, Time Series Data
Outlier detection is carried out when the information is stored at the server. However, with the new IoT computational capabilities, outlier detection can be developed locally. Therefore, it is necessary to know how much RAM/Flash is needed for this step and which IoT brands can handle it. This project is divided into two parts. The first is implementing light-heavy ML algorithms in single points …
Supervisor: Paul Rosero
Semester: Spring 2022
Tags: data analysis, IoT, Python, Embedded systems
TinyML is a new trend to deploy deep learning in tiny devices. Therefore, it is necessary to deploy several applications to understand the challenges and opportunities which TinyML brings us. In this scenario, any ideas involving embedded computer vision, voice recognition, and sensors are welcome.
Supervisor: Paul Rosero
Semester: Spring 2022
Tags: data analysis, IoT, Python, Embedded systems, Computer vision, Voice recognition
Disaggregated storage has gained acceptance in data centers. With disaggregated storage, storage resources are decoupled from compute resources, and made available through fabric. We are particularly interested in storage resources composed of an ARM-based smartNIC, which acts as fabric target as well as storage controller for a collection of SSDs.
The performance characteristics of the storage …
Supervisors:
Philippe Bonnet
Semester: Fall 2021
Tags: benchmarking, ARM, SoC, fabric, SSD, computational storage
Reproducibility is a cornerstone of the scientific method. There are systems available today to build reproducible and sharable data and analysis pipelines including workflow engines (e.g., GWL, Nextflow), package managers (e.g., bioconda), and container systems (e.g., Singularity). However, validating their executions on high-performance computers remains an open issue. Indeed, there are many …
Supervisors:
Philippe Bonnet
Semester: Fall 2020
Tags: ML, reproducibility, workflow, HPC
Reproducibility is a cornerstone of the scientific method. It is also a core element of compliance requirements for sensitive equipment, e.g., audit trails for medical equipment. Often, a prerequisite for computational reproducibility is the availability of software and data. However, this is problematic for edge devices whose goal is to reduce the amount of data transferred to the backend. On …
Supervisors:
Philippe Bonnet
Semester: Fall 2020
Tags: reproducibility, edge
SSDs are not a uniform class of devices. The SSD landscape is now quite diverse, with many new-generation, much faster, and customizable devices available. Understanding their performance characteristics is crucial when determining what their impact on the data systems software stack should be. In this project, we would like to characterize the performance of a broad range of such SSDs (e.g., ZNS, Samsung Z-SSD, …
Supervisors:
Pınar Tözün, Philippe Bonnet
Semester: Fall 2020
Tags: SSD, benchmarking
The Things Network Stack v3 for LoRaWAN is an open source LoRaWAN network stack suitable for large, global and geo-distributed public and private networks as well as smaller networks. The architecture follows the LoRaWAN Network Reference Model for standards compliancy and interoperability. - https://github.com/TheThingsNetwork/lorawan-stack This stack, currently in pre-rollout testing, however …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Offloading processing to storage is a means to avoid data movement and thus deal efficiently with very large volumes of stored data. In the 90s, there were pioneering efforts to develop Processing-in-Memory as well as Active Disks. We are considering data stored on Open-Channel SSDs with a programmable storage controller (i.e., a Linux-based ARM processor) integrated into a network switch (e.g., …
Supervisor: Philippe Bonnet
Semester: Fall 2019
Characterize the performance of commercial database systems on an NVIDIA Titan GPU, or characterize the performance of DB2 PureScale on a cluster equipped with shared storage with a range of different benchmarks. Design and conduct experiments with a range of tuning strategies to measure their impact on performance and reliability.
Supervisor: Philippe Bonnet
Semester: Fall 2019
In the context of the Orkney Cloud project, we are preparing the deployment of a decentralized cloud infrastructure on the archipelago. The infrastructure is composed of a collection of Pods (point of delivery) and a wireless core (5G + Wifi). Each Pod is equipped with storage, computing and communication components (so that it is connected to the core and to local endpoints). Each Pod is powered …
Supervisor: Philippe Bonnet
Semester: Fall 2019
New types of networks such as LoRaWAN and Sigfox enable us to deploy inexpensive mobile Air Quality Sensors, e.g. on bicycles or vehicles, boats or trains, and help map and understand urban pollution. Such data would be valuable for e.g. the international citizen science project Luftdaten https://luftdaten.info/en/home-en/ in which we are participating. We have an ongoing collaboration with the …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Can we combine satellite data with terrestrial and marine sensor data to study their correlation and improve our understanding of environmental processes, in both urban and agricultural/rural contexts? Potential collaboration with several organizations in Kenya and the Orkney Islands.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Not far from the IT University, citizens are building a “Small Smart City”, aiming to measure and analyze e.g. water and resource consumption. Help build the Small Smart City!
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Survey of potential sensor modalities (sounds, ultrasounds, vibrations) and related work (e.g., wind turbines). Starting with sound: piezo contact mics/transducers, MEMS sensors. Characterization of state based on known signatures (classification problem). Characterization of state transitions (HMM). Experimentation on coffee machine/blender/3D printer at PitLab. 1..k sensors; local/cloud-based …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Novo Nordisk
New forms of Solid State Drives have interesting characteristics in terms of performance (10 to 100x faster than previous generations of SSDs) and in terms of functionalities (SSDs can now suspend the execution of write or erase operations to minimize read latency). The performance characteristics of these devices are not well understood yet. The topic of this thesis is to design and conduct …
Supervisor: Philippe Bonnet
Semester: Fall 2019
Survey of potential sensor modalities (IR temp sensor, thermal imager) and related work. Starting with consumer USB cams or phone cams to generate series of images. Characterization of state based on known signatures (classification problem). Characterization of state transitions (HMM). Experimentation on coffee machine/blender/3D printer at PitLab. 1..k sensors; local/cloud-based processing. …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Novo Nordisk
Most data collection in IoT does not critically depend on latency or speed from data collection to data analytics. Occasionally though we meet tasks that would benefit from near-realtime features, such as collection of wave and tidal dynamics around marine energy infrastructures. This project explores the limits of speed by bringing together a LoRa PHY, a LoRaWAN gateway, LoRaWAN stack, ultrafast …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
In collaboration with the IoT Lab at Computer Science Dept at Kathmandu University, Nepal, we are developing a potential service for tracking trekkers, i.e. offering a security service for tourists trekking the Himalayas, in particular Mt. Everest. This service very critically depends on having a robust hardware component, the actual GPS/GNSS tracker. Requirements with respect to battery life, …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Wind turbine electricity production data is sensitive for Energinet (and for the wind turbine producers). Energinet would like to publish wind turbine electricity production data sets that can be used to train relevant models and to develop innovative applications, without giving away sensitive data. The goal of the project is to explore various data publishing methods for that purpose.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, data publication
Energinet has a model that describes the electricity production of a given wind turbine given wind conditions. The current model based on kNN is trained with DMI weather data and historical electricity production data for the wind turbine. The goal of the project is to improve the current model with lifelong learning, extended weather data and different models for a range of different wind …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Data Analysis
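As a rough illustration of the kNN idea mentioned in the wind turbine modelling project above, the sketch below predicts production by averaging the k historical samples closest in wind speed. It is not Energinet's model: the function name `knn_predict`, the single wind-speed feature, and the numbers in `toy_history` are all made up for illustration.

```python
def knn_predict(history, wind_speed, k=3):
    """Average the production of the k historical samples closest in wind speed.

    history: list of (wind_speed, production) pairs.
    """
    # Sort samples by distance to the query wind speed and keep the k closest.
    nearest = sorted(history, key=lambda sample: abs(sample[0] - wind_speed))[:k]
    return sum(production for _, production in nearest) / len(nearest)


# Made-up (wind speed in m/s, production in kW) pairs, for illustration only.
toy_history = [(3.0, 50.0), (5.0, 200.0), (7.0, 600.0), (9.0, 1200.0), (11.0, 1800.0)]

if __name__ == "__main__":
    print(knn_predict(toy_history, 6.0, k=2))
```

A real model would use more weather features than wind speed alone; the single feature here only keeps the distance computation easy to follow.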
The goal of the project is to explore the accuracy of electricity production predictions based on historical data and weather predictions. This may be tackled as a sequence prediction problem using recurrent neural networks. The long-term goal is to incorporate wind turbines in the reserve market for electricity.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Forecasting, Machine Learning, Deep Learning
The goal of the project is to explore new ways of gathering data about wind turbines as well as local wind/weather conditions. To this end, sound/vibration-based and/or image-based instrumentation as well as innovative experiments such as balloons and lightweight weather stations might be considered.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Instrumentation, Sensors
Field Programmable Gate Arrays are now an integral part of public cloud infrastructures. You can for example run customized FPGA instances on AWS. A project focuses on FPGA-based hardware acceleration at the level of an SSD Flash Translation Layer, at the level of a Database storage manager or at the level of the database client. You will be able to experiment with FPGAs in the lab and on AWS. We …
Supervisor: Philippe Bonnet
Semester: Fall 2020
Tags: FPGA, SSD, computational storage
Field Programmable Gate Arrays are now an integral part of public cloud infrastructures. You can for example run customized FPGA instances on AWS. A project focuses on FPGA-based hardware acceleration at the level of an SSD Flash Translation Layer, at the level of a Database storage manager or at the level of the database client. You will be able to experiment with FPGAs in the lab and on AWS. We …
Supervisor: Philippe Bonnet
Semester: Fall 2019
Tags: FPGA, SSD
GPUs offer massive computational power and parallelism through their Streaming Multiprocessors (SMs). Efficient GPU utilization is critical for maximizing performance and optimizing compute resource usage, and it is measured using metrics such as SMACT (SM Activity), SMOCC (SM Occupancy), and DRAMA (DRAM Active). These metrics provide insight into how effectively the GPU’s SMs and …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Spring 2024
Tags: machine learning systems, GPU Utilization, resource management, resource interference
The work on running data-intensive applications on very powerful, expensive, and power-hungry server hardware is very popular thanks to the growing size of data centers and high-performance computing (HPC) platforms. However, with the rise of new generation internet of things (IoT) applications, the lower-power and lower-budget hardware devices that specifically target IoT, the edge platforms, …
Supervisors:
Pınar Tözün
Semester: Fall 2024
Tags: edge, benchmarking, data-intensive applications, resource-constrained hardware
Observing how well machine learning systems utilize hardware resources is a crucial preliminary step to improve system performance and reduce hardware waste. To make such observations, one has to collect a lot of monitoring data on hardware behavior through experiments. In our group, we have recently built a framework to aid the management of such monitoring data efficiently, called Resource-Aware …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: benchmarking, data management, data visualization
Deep learning changed the landscape of many applications like computer vision, natural language processing, etc. On the other hand, deep learning requires gigantic computing power offered by modern hardware. As a result, data scientists rely on powerful hardware resources offered by shared high-performance computing (HPC) clusters or the cloud. Due to the long running times of deep learning …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2024
Tags: machine learning systems, checkpointing, scheduling, resource management
Workload collocation has been shown to be an effective method to reduce the hardware requirements for certain deep learning (DL) training tasks. On the other hand, there haven’t been many robust open-source implementations of schedulers that incorporate workload collocation on GPUs for DL.
BLOX is a framework that aims at standardizing the way we implement deep learning schedulers. In this …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2024
Tags: machine learning systems, scheduling, resource management, workload collocation
Deep convolutional networks are able to learn representations of images, scoring well in tasks such as image classification and object detection. During model training, these networks have the ability to process different input sizes without requiring changes to their architecture. In this project, we would like to investigate the effects that changing input sizes has on these kinds of models. We …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: data attribution, deep learning, machine learning, resource efficiency
Today’s foundation models are trained on vast amounts of data. The quality and size of this data have a huge impact on the accuracy of these models. Selecting the right amount and variety of data for a given task, however, is a resource-intensive process. In this project, we would like to investigate various state-of-the-art data selection mechanisms from a hardware requirements and …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: data selection, deep learning, machine learning, resource efficiency
Traditionally, solid-state drives (SSDs) do not give users the ability to control data placement on the SSD. This often leads to suboptimal performance and shortens SSD lifetime, since SSDs internally don’t allow in-place updates. The updated disk pages are written elsewhere and the old versions have to be garbage collected. This poses problems if data with different lifetimes and …
Supervisors:
Pınar Tözün
Semester: Fall 2024
Tags: SSDs, data management systems, modern storage
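To make the out-of-place update behaviour described in the SSD data placement project above more concrete, here is a minimal toy sketch. It is not a real flash translation layer: the class `ToyFlash`, its page states, and the sequential allocation policy are simplifying assumptions made only for illustration.

```python
class ToyFlash:
    """Toy model of out-of-place updates on flash; not a real FTL."""

    def __init__(self, num_physical_pages: int):
        # Each physical page is "free", "valid", or "invalid" (stale data).
        self.state = ["free"] * num_physical_pages
        self.mapping = {}   # logical page number -> physical page number
        self.next_free = 0  # assumption: pages are allocated sequentially

    def write(self, logical_page: int) -> int:
        """Write (or update) a logical page and return the physical page used."""
        if self.next_free >= len(self.state):
            raise RuntimeError("no free pages left: garbage collection needed")
        old = self.mapping.get(logical_page)
        if old is not None:
            # No in-place update: the old version just becomes garbage.
            self.state[old] = "invalid"
        physical_page = self.next_free
        self.state[physical_page] = "valid"
        self.mapping[logical_page] = physical_page
        self.next_free += 1
        return physical_page

    def count(self, page_state: str) -> int:
        return self.state.count(page_state)


if __name__ == "__main__":
    flash = ToyFlash(8)
    for logical_page in (0, 1, 2):
        flash.write(logical_page)
    flash.write(0)  # update: lands on a new physical page
    flash.write(0)  # update again: one more stale page
    print(flash.count("valid"), flash.count("invalid"), flash.count("free"))
```

Every update consumes a fresh physical page and leaves a stale one behind, which is why the device eventually has to garbage collect.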
In this project, we would specifically like to quantify the data movement savings of applying techniques like compression and model-based data filtering in the context of resource-constrained hardware and edge/IoT applications.
Today, many data sources are small, low-powered, and hardware-constrained devices such as mobile phones, wearable or self-driving smart platforms, etc. Processing the data on …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, ML model updates, tinyML
One of the key challenges with enabling efficient machine learning on resource-constrained devices is keeping the machine learning models deployed on these devices up-to-date without frequent retraining. This requires exploring the impact of different model update mechanisms at the edge.
This project would be suitable as a standalone project or BSc or MSc thesis at ITU during Fall 2024. If you are …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, ML model updates, tinyML
Enabling efficient data processing and machine learning on resource-constrained devices poses many challenges. One is fitting the models into the restrictive memory and compute resources of these devices. In this project, first, we would like to explore the landscape of foundational, generative-AI, language, etc. models with respect to their size and compute needs to understand what could be a fit …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, ML model updates, tinyML
Today, many data sources are small, low-powered, and hardware-constrained devices such as mobile phones, wearable or self-driving smart platforms, etc. Edge computing is a broad term that refers to computations performed on such edge devices. It becomes increasingly important to enable techniques that get more value out of data at the edge rather than always sending the data to a remote and more …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, resource management, tinyML
Observing how well data-intensive systems utilize hardware resources is a crucial preliminary step to improve system performance and reduce hardware waste. To make such observations, one has to collect a lot of monitoring data on hardware behavior through experiments. In our group, we have recently built a framework to aid the management of such monitoring data efficiently, called Resource-Aware …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2023
Tags: benchmarking, data management, data visualization
It is common to process data to clean it, filter it, restructure it, get metadata out of it, etc. before feeding the data into a data analysis or machine learning pipeline. There are many tools and libraries out there to aid with this process, with different strengths and functionality (DALI, RAPIDS, HoloClean, DAPHNE, DuckDB, etc.). In this project, we would like to analyze pros/cons of some of …
Supervisors:
Pınar Tözün
Semester: Fall 2022
Tags: data preprocessing libraries, heterogeneous hardware, machine learning
In the past decade, the data management community has focused on main-memory systems or main-memory-optimized systems. This focus has put the commodity memory hierarchy (DRAM and processor caches) at the center of workload characterization studies. Today, with the evolution of persistent storage technologies such as NVRAM (persistent memory solution of Intel) and NVMe SSDs, data systems …
Supervisors:
Pınar Tözün
Semester: Fall 2022
Tags: workload characterization, tracing, modern storage, data-intensive systems
DAPHNE is an EU project that aims at building a data system targeting integrated data analysis pipelines across data management and processing, high-performance computing (HPC), and machine learning (ML) training and scoring. The project had its first code release back in March. This project aims at adding a profiling infrastructure for the DAPHNE codebase. If you are interested in learning about …
Supervisors:
Pınar Tözün
Semester: Fall 2022
Tags: integrated data analysis pipelines, profiling big data systems
State-of-the-art machine learning models are known to be compute- and power-hungry. On the other hand, modern servers come equipped with really powerful CPU-GPU co-processors. Not all machine learning models are able to use all the available hardware resources on such servers.
Workload collocation is a mechanism to increase hardware utilization when a single workload is not able to utilize all the …
Supervisors:
Pınar Tözün
Semester: Fall 2022
Tags: benchmarking, workload collocation, machine learning
Today, there are many compute- and memory-hungry data-intensive workloads from big data analytics applications to deep learning. These workloads increasingly run on shared hardware resources, which requires building hardware resource managers that can both serve the needs of workloads and utilize hardware well. Predicting the resource utilization of applications can aid such resource managers …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2022
Tags: benchmarking, hardware resource consumption estimation, machine learning
NVMe SSDs are not a uniform class of devices. The IO software stack is not uniform either. Understanding the performance characteristics of new-generation SSDs and the impact of the IO stack on their performance is crucial when determining how to design data-intensive systems. In this project, we would like to characterize the performance of a range of NVMe SSDs (e.g., Samsung Z-SSD, Intel Optane, …
Supervisors:
Pınar Tözün
Semester: Fall 2021
Tags: SSD, benchmarking
A data science infrastructure orchestrates the execution of widely used machine learning frameworks (e.g., TensorFlow, PyTorch) on a heterogeneous set of processing units (e.g., CPU, GPU, TPU, FPGA) while powering an increasingly diverse and complex range of applications (e.g., fraud detection, healthcare, virtual assistance, automatic driving). Understanding the resource consumption …
Supervisor: Pınar Tözün
Semester: Fall 2021
Tags: benchmarking, hardware resource consumption, deep learning frameworks
Hash-based authentication is an effective way of protecting passwords in software systems. Hashing obscures the original passwords, such that they cannot be recovered in case of a database breach. However, as demonstrated by our paper titled Hash-Based Authentication Revisited in the Age of High-Performance Computers, the practical security depends on which hashing algorithm is used as well as the …
Supervisors:
Pınar Tözün, Niclas Hedam
Tags: benchmarking, hashing, security, GPU, hacking, HPC
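For readers unfamiliar with the hash-based authentication project above, the following is a minimal sketch of salted password hashing using Python's standard library. It is not taken from the paper; the choice of PBKDF2-HMAC-SHA256, the constants `ITERATIONS` and `SALT_BYTES`, and the function names are assumptions made for illustration.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Hypothetical parameters, chosen for illustration only.
ITERATIONS = 100_000
SALT_BYTES = 16


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); only these are stored, never the password."""
    if salt is None:
        salt = os.urandom(SALT_BYTES)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the candidate with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


if __name__ == "__main__":
    salt, digest = hash_password("correct horse")
    print(verify_password("correct horse", salt, digest))
    print(verify_password("wrong guess", salt, digest))
```

The iteration count determines how much work each guess costs an attacker, which is one reason the choice of algorithm and its parameters matters when hardware gets faster.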
The variety and complexity of data-intensive applications and systems have been increasing drastically over the past decade. Tasks from a SQL-based big data analytics request running on Apache Spark can be very different from tasks from deep learning training using the TensorFlow framework. Nevertheless, these data-intensive applications increasingly run on shared hardware resources in data centers or …
Supervisor: Pınar Tözün
Semester: Fall 2020
Tags: hardware-software co-design
Improvements in modern processor hardware do not automatically enable more complex and higher quality games, animations, and virtual reality applications. A paradigm shift is necessary when it comes to how we develop these applications in order to exploit the resources of modern hardware (i.e., main memory, multicores) effectively. Unity Technologies has recently developed Data-Oriented …
Supervisor: Pınar Tözün
Semester: Fall 2020
Tags: memory hierarchy, concurrency
Spreading the computation of similar concurrent tasks that have a large instruction footprint over multiple cores via thread migration is shown to improve the instruction cache utilization drastically since it allows instruction re-use across the concurrent tasks. However, thread migrations are costly due to the context switching overhead. To reduce this overhead, recent work mainly proposed …
Supervisor: Pınar Tözün
Semester: Fall 2019
The computer architecture community is moving toward commoditization of hardware specialization instead of general purpose CPUs and more agile hardware development instead of years-long production cycles to enable faster, more energy-efficient, and more cost-effective hardware/software co-designs. This will lead to a disruption in the way we design and maintain the emerging data management systems …
Supervisor: Pınar Tözün
Semester: Fall 2019
Apache SystemML is an open-source platform to run machine learning tasks efficiently thanks to the hardware-conscious query compilation techniques it adopts. It can be run standalone or on top of Apache Spark. It is considered to be state-of-the-art when running machine learning tasks (e.g., in ACM SIGMOD 2017, there were ~5 papers that used SystemML as a comparison point). This project aims at …
Supervisor: Pınar Tözün
Semester: Fall 2019
The popularity of large-scale real-time analytics applications (real-time inventory/pricing, recommendations from mobile apps, fraud detection, risk analysis, IoT, etc.) keeps rising. These applications require distributed data management systems that can handle fast concurrent transactions (OLTP) and analytics on the recent data. Some of them even need to run analytical queries (OLAP) as part of …
Supervisor: Pınar Tözün
Semester: Fall 2019
The Transaction Processing Performance Council (TPC) is a non-profit IT organization founded to define database benchmarks and disseminate objective, verifiable performance data to the industry. TPC has standardized several new benchmarks (e.g., TPCx-HS and TPCx-BB), in recent years. Older popular benchmarks, like TPC-C (representing high-performance transaction processing) and TPC-H (representing …
Supervisor: Pınar Tözün
Semester: Fall 2019
The DISCO-2 satellite will have accelerated machine learning capability based on the inclusion of a Coral TPU ML accelerator module. This will allow images taken by the satellite to be analysed on the satellite using a variety of ML models, with only select images sent back to Earth. This approach allows for more flexibility in image acquisition and saves downlink bandwidth, which is very constrained …
Supervisors:
Julian Priest, Robert Bayer
Semester: Fall 2023
Tags: satellite, ground station, software defined radio, automation, csp
This is not a single project, but rather a larger cluster of potential projects in the field of what could be summarized as extreme networking.
The networks we are interested in are typically wireless, and can be extreme in different senses of the word:
distance - hundreds of kilometers terrestrial, 10,000s of km to satellite; latency - sub-ms latencies; autonomy - off-grid; quality - extreme remote …
Supervisors:
Sebastian Büttrich
Semester: Fall 2024
Tags: network, IoT, LoRa, LoRaWAN, satellites
LoRa is a long-range, low-bandwidth networking protocol widely used in Internet of Things projects, sensor networks, and low-power, low-cost and embedded systems. LoRa’s encoding schema allows for extremely long distance communications with small power usage and small simple antennas. This combination of features has made it attractive to small satellite operators flying cubesats and LoRa is now …
Supervisors:
Sebastian Büttrich
Semester: Fall 2024
Tags: satellites, LoRa, cubesat, IoT, embedded, electronics
LoRa is a long-range, low-bandwidth networking protocol widely used in Internet of Things projects, sensor networks, and low-power, low-cost and embedded systems. LoRa’s encoding schema allows for extremely long distance communications with small power usage and small simple antennas. This combination of features has made it attractive to small satellite operators flying cubesats and LoRa is now …
Supervisors:
Sebastian Büttrich
Semester: Fall 2024
Tags: IoT, LoRa, LoRaWAN, satellites
Optical fiber is the backbone of the internet’s communication, e.g. in the form of submarine fiber cables. It can also be employed as a sensor device, by means of combined opto-acoustic methods such as Distributed Acoustic Sensing (DAS) or State of Polarisation (SoP) sensing. Fiber is capable of sensing all kinds of vibrational/acoustic events, from animal sounds over seismic activity to …
Supervisors:
Sebastian Büttrich
Semester: Fall 2024
Tags: fiber, acoustics, audio, machine learning, DAS, SOP
The Danish Student Cubesat Program is an inter-university collaboration that will launch 3 cubesats into Low Earth Orbit over the next 4 years. The satellites will be designed, operated, programmed and built by students, and the project offers an opportunity for Master’s students to take part in a live satellite project. ITU is partnering with Aarhus University on DISCOSAT2 which will be an …
Supervisors:
Sebastian Büttrich, Julian Priest
Semester: Fall 2021
Tags: Satellite, Cubesat, Image processing, Machine Learning, edge, constrained computing
Invasive bird species can be a serious problem in cities, towns and in agriculture. The common pigeon is a very unwelcome guest on many balconies, roofs, and terraces. Conventional scarecrows often show no effect, as these birds are known to be quite intelligent and capable of learning fast. The idea is to build a sensor/camera enhanced scarecrow that - can recognize birds present within its …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: IoT, ML, machineLearning, sensors, security
For this project, you would be working with a partner company who are looking to re-establish wood as a building material for sustainable architecture, and thus are using sensors for quality control - to detect damages and deterioration in buildings. Wood such as timber may be analyzed by non-intrusive acoustic impact testing and subsequent waveform analysis, and the expectation is that machine …
Supervisors:
Sebastian Büttrich
Semester: Fall 2021
Tags: IoT, sensors, machine learning, acoustics
In LoRaWAN networks such as The Things Network, long distance transmissions, well beyond the limitations of line of sight in terrestrial geometry, are frequently observed. Tropospheric effects are seen as responsible for bending or guiding radio waves around the earth’s curvature. As an example, under the right weather conditions, the LoRaWAN gateway at ITU may collect packets from northern Germany, …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: IoT, LoRaWAN, LPWAN, satellite, networks, troposphere, weather
Recent progress in LoRaWAN development has made a new generation of satellite communications offerings available to IoT devices. In these, the LoRaWAN gateway is satellite-borne and collects data from small inexpensive ground stations. So far, this is predominantly seen as a means of communication for remote sensor data, e.g. in agriculture, logistics or wildlife monitoring. However, one can …
Supervisors:
Sebastian Büttrich
Semester: Fall 2021
Tags: IoT, LoRaWAN, LPWAN, satellite, networks, edge, security
There is currently a lot of progress in really small, yet powerful visual machine learning / computer vision, on hardware like the OpenMV Cam H7, Arduino Portenta Vision Shield, Luxonis LUX-ESP32, Himax WE-I Plus, Arducam Pico4ML, and Raspberry Pi, and on software platforms such as TinyML or OpenMV IDE.
While many popular use cases stem from fields like traffic analysis, wildlife monitoring, we …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: IoT, sensors, machine learning, computer vision
The Things Network Stack v3 for LoRaWAN is an open source LoRaWAN network stack suitable for large, global and geo-distributed public and private networks as well as smaller networks. The architecture follows the LoRaWAN Network Reference Model for standards compliancy and interoperability. - https://github.com/TheThingsNetwork/lorawan-stack This stack, currently in pre-rollout testing, however …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
New types of networks such as LoRaWAN and Sigfox enable us to deploy inexpensive mobile Air Quality Sensors, e.g. on bicycles or vehicles, boats or trains, and help map and understand urban pollution. Such data would be valuable for e.g. the international citizen science project Luftdaten https://luftdaten.info/en/home-en/ in which we are participating. We have an ongoing collaboration with the …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Can we combine satellite data with terrestrial and marine sensor data, to study their correlation, and benefit our understanding of environmental processes, both in urban and agricultural/rural context? Potential collaboration with several organizations in Kenya and Orkney islands.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Not far from the IT University, citizens are buillding a “Small Smart City”, aiming to measure and analyze e.g. water and resource consumption. Help build the Small Smart City!
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Survey of potential sensor modalities (sounds, ultrasounds, vibrations) and related work (e.g., wind turbines). Starting with sound: Piezo contact mics/transducers, MEMS sensors. Characterization of state based on known signatures (classification problem). Characterization of state transitions (HMM). Experimentation on Coffee machine/Blender/3D printer at PitLab. 1..k sensors; Local/cloud-based …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Novo Nordisk
Survey of potential sensor modalities (IR temp sensor, thermal imager) and related work. Starting with consumer USB cams or phone cams to generate series of images. Characterization of state based on known signatures (classification problem). Characterization of state transitions (HMM). Experimentation on Coffee machine/Blender/3D printer at PitLab. 1..k sensors; Local/cloud-based processing. …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Novo Nordisk
Most data collection in IoT does not critically depend on latency or speed from data collection to data analytics. Occasionally though we meet tasks that would benefit from near-realtime features, such as collection of wave and tidal dynamics around marine energy infrastructures. This project explores the limits of speed by bringing together a LoRa PHY, a LoRaWAN gateway, LoRaWAN stack, ultrafast …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
In collaboration with the IoT Lab at Computer Science Dept at Kathmandu University, Nepal, we are developing a potential service for tracking trekkers, i.e. offering a security service for tourists trekking the Himalayas, in particular Mt. Everest. This service very critically depends on having a robust hardware component, the actual GPS/GNSS tracker. Requirements with respect to battery life, …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Wind turbine electricity production data is sensitive for Energinet (and for the wind turbine producers). Energinet would like to publish wind turbine electricity production data sets that can be used to train relevant models and to develop innovative applications, without giving away sensitive data. The goal of the project is to explore various data publishing methods for that purpose.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, data publication
Energinet has a model that describes the electricity production of a given wind turbine given wind conditions. The current model based on kNN is trained with DMI weather data and historical electricity production data for the wind turbine. The goal of the project is to improve the current model with lifelong learning, extended weather data and different models for a range of different wind …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Data Analysis
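The kNN idea mentioned in the entry above can be illustrated with a minimal sketch. It assumes a single input feature (wind speed) and uses made-up sample values; the function name `knn_predict` and the data are hypothetical and do not represent Energinet's actual model or the DMI data.

```python
def knn_predict(history, wind_speed, k=3):
    """Predict production as the mean of the k nearest historical samples.

    history: list of (wind_speed_m_s, production_kw) pairs (hypothetical data).
    """
    if not history:
        raise ValueError("history must not be empty")
    # Rank historical samples by distance to the query wind speed.
    nearest = sorted(history, key=lambda s: abs(s[0] - wind_speed))[:k]
    # Average the production of the k closest samples.
    return sum(production for _, production in nearest) / len(nearest)


# Made-up samples, for illustration only.
samples = [(3.0, 50.0), (5.0, 200.0), (7.0, 600.0), (9.0, 1200.0), (11.0, 1800.0)]
estimate = knn_predict(samples, 6.0, k=2)
```

A real model would presumably use several weather features and a distance over all of them; the one-dimensional case only shows the mechanism.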
The goal of the project is to explore the accuracy of electricity production predictions based on historical data and weather predictions. This may be tackled as a sequence prediction problem using recurrent neural networks. The long-term goal is to incorporate wind turbines in the reserve market for electricity.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Forecasting, Machine Learning, Deep Learning
The goal of the project is to explore new ways of gathering data about wind turbines as well as local wind/weather conditions. To this end, sound/vibration-based and/or image-based instrumentation as well as innovative experiments such as balloons and light weight weather stations might be considered.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Instrumentation, Sensors
Deliberately scoped very wide, this group contains a number of projects in different possible directions, from
Location services via LPWAN time-of-flight and GPS/GNSS; vessel tracking and management in fisheries, tourism and logistics; water quality and chemistry sensing for aquaculture, specifically mariculture; wave and tidal dynamics, e.g. in energy harvesting
and variations/combinations of …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: Satellite, Image processing, Machine Learning, edge, constrained computing, IoT, sensors, location
swarm.space is a commercial company providing low-bandwidth satellite connectivity using ultra-small (quarter cubesat unit) satellites in a low orbit. Swarm satellites cover every point on Earth, enabling IoT devices to affordably operate in any location. Swarm uses a form of LoRa network.
While ground terminals for satellite networks traditionally were both big and expensive, modems and antennas …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: Satellite, IoT
LoRa is a relatively new long-range low-power standard allowing for battery-powered pocket-size nodes to transmit over 100s to 1000s of kilometers. ThingSat is a CubeSat communication payload consisting of an electronic board with several LoRa transmitters and a patch antenna, operating in the 868 MHz and 2.4 GHz bands. It is a guest payload of a shared 3U CubeSat.
Available projects under this platform include …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: Satellite, IoT
Observing how well machine learning systems utilize hardware resources is a crucial preliminary step to improve system performance and reduce hardware waste. To do such observations, one has to collect a lot of monitoring data on hardware behavior through experiments. In our group, we have recently built a framework to aid the management of such monitoring data efficiently, called Resource-Aware …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: benchmarking, data management, data visualization
Deep convolutional networks are able to learn representations of images, scoring well in tasks such as image classification and object detection. During model training, these networks have the ability to process different input sizes without requiring changes to their architecture. In this project, we would like to investigate the effects that changing input sizes has on these kinds of models. We …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: data attribution, deep learning, machine learning, resource efficiency
Today’s foundation models are trained on vast amounts of data. The quality and size of this data have a huge impact on the accuracy of these models. Selecting the right amount and variety of data for a given task, however, is a resource-intensive process. In this project, we would like to investigate various state-of-the-art data selection mechanisms from a hardware requirements and …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: data selection, deep learning, machine learning, resource efficiency
Observing how well data-intensive systems utilize hardware resources is a crucial preliminary step to improve system performance and reduce hardware waste. To do such observations, one has to collect a lot of monitoring data on hardware behavior through experiments. In our group, we have recently built a framework to aid the management of such monitoring data efficiently, called Resource-Aware …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2023
Tags: benchmarking, data management, data visualization
Spectral learning priority is a useful tool in analyzing a model’s focus during training; it describes how a model may understand a given image from the spectrum perspective. For example, to distinguish cats and tortoises, learning to recognize their shapes would be enough; such an embedding will result in higher learning priority at low frequencies representing shapes, while learning to …
Supervisors:
Yucheng Lu, Veronika Cheplygina
Semester: Fall 2024
Tags: Spectral analysis, Image classification, Medical imaging
It has been observed that deep learning models are able to identify patient characteristics such as age, sex, and self-reported race with high accuracy from medical images such as chest x-ray recordings, even when medical doctors cannot. This raises the potential for such models to learn to (falsely) diagnose patients of different demographics differently, even if they present with the same …
Supervisors:
Amelia Jiménez-Sánchez, Eike Petersen, Veronika Cheplygina
Semester: Fall 2024
Tags: machine learning, data science, medical imaging
There is pressure on hospitals to implement AI systems which promise to improve diagnoses and save time for the doctors. One use-case could be related to the automation of protocoling based on a physician referral. Currently, this requires a referral letter from a physician who has examined a patient and evaluates that there is a need for additional imaging studies. In this case, the physician …
Supervisors:
Veronika Cheplygina
Semester: Fall 2023
Tags: machine learning, medical imaging, data analysis
Machine learning models, especially larger models that are used on, for example, image or text datasets, can be expensive to train. During development, models are usually trained multiple times, for example to optimize hyperparameters, which can result in a large carbon footprint.
This project focuses specifically on medical data. There are some recent efforts, for example by Selvan et …
Supervisors:
Veronika Cheplygina
Semester: Fall 2023
Tags: machine learning, medical imaging, data analysis, resource consumption
Machine learning is used extensively in different applications, including medical imaging and natural language processing. As different types of data are involved, it is reasonable to assume that different methods are needed for each application. However, there are also opportunities in translating a method successful in one application, to the other application where it is not widely used.
The …
Supervisors:
Veronika Cheplygina
Semester: Fall 2021
Tags: machine learning, natural language processing, medical imaging, literature review
Machine learning algorithms for skin lesion classification typically learn from images which have been labeled as malignant (for example, melanoma) or not. Such tasks can still suffer from overfitting due to limited dataset size. In other computer vision tasks, crowdsourcing labels has been effective, but the average person typically does not have the background to classify skin lesions. However, …
Supervisors:
Veronika Cheplygina
Semester: Fall 2021
Tags: machine learning, medical imaging, crowdsourcing, similarity
The DISCO-2 project is driven by students and aims to develop and deploy a 3-unit CubeSat into low Earth orbit. Its mission focuses on conducting Earth observations over Greenland and supporting various research objectives. The satellite has three cameras onboard: infrared, wide-angle, and standard (main camera). Due to the limitations of the imaging hardware and the challenging conditions on the …
Supervisors:
Yucheng Lu, Julian Priest
Semester: Fall 2024
Tags: Image enhancement, Image processing, Machine learning
Are you interested in working with a big data open source project?
You are welcome to conduct your thesis/project in Apache Wayang. Apache Wayang is the first cross-platform framework that allows users to specify their task/query in a system-agnostic manner and Wayang will determine which is the best system(s) to execute this task with the goal of optimizing performance. For a general overview …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: big data, database, cross-platform data processing, open source, Apache
Knowledge graphs (KGs) are extensively used in many application domains, such as search engines, product recommendation, and bioinformatics. Knowledge graph completion (a.k.a. link prediction), i.e., the task of inferring missing information from knowledge graphs, is a widely used task in the above applications. This project will investigate how to loosely-couple the data-driven power of knowledge …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: knowledge graph, LLMs, reasoning
Are you interested in working with a big data open source project and helping the environment?
You are welcome to conduct your thesis/project in Apache Wayang. Apache Wayang is the first cross-platform framework that allows users to specify their task/query in a system-agnostic manner and Wayang will determine which is the best system(s) to execute this task with the goal of optimizing performance. …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: big data, database, cross-platform data processing, open source, Apache
Query optimization is crucial for any data management system to achieve good performance. Recent advancements in AI have led academia and industry to investigate learning-based techniques in query optimization. In particular, many works propose replacing the cost model used during plan enumeration with a machine learning model that estimates the runtime of a plan. However, to build such a model …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: machine learning, training data, query optimizer
Query optimization is crucial for any data management system to achieve good performance. Recent advancements in AI have led academia and industry to investigate learning-based techniques in query optimization. In particular, many works propose replacing the cost model used during plan enumeration with a machine learning model (typically a regression model) that estimates the runtime of a query …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: machine learning, database, query optimization, ranking
(This project will be carried out in collaboration with Xilinx Research Labs in Dublin)
Machine Learning operators are becoming increasingly commonly used in data management systems and, in this project, we will explore the challenges and benefits of integrating inference operators from FINN [1] within a so-called Smart Storage system [2]. Both the inference and data management aspects will be …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: FPGA, Data Management, Machine Learning
(This topic is going to be co-supervised by Bernardo Machado David [http://www.bmdavid.com/])
Database systems managing private data may leak sensitive information when queries are done in the clear, even if the data itself is encrypted. A recent line of research has looked into combining database engines supporting standard SQL queries with techniques for secure Multiparty Computation (MPC), …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: Theoretical Computer Science, Data Management, Security and Privacy
Consensus mechanisms for ensuring consistency are some of the most expensive operations in managing large amounts of data. Often, there is a trade-off that involves reducing the coordination overhead at the price of accepting possible data loss or inconsistencies. As the demand for more efficient data centers increases, it is important to provide better ways of ensuring consistency without …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: Benchmarking, Distributed Systems
Given a private database that I can access only through specific queries, there is still a lot I can learn about its entries [1]. Differential Privacy (DP) tackles this: letting me learn the (approximate) result of complex queries on a database, but preventing me from learning much about its specific entries. The basic approach of DP often boils down to: “apply a privacy-preserving transformation T …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: Theoretical Computer Science, Data Management, Security and Privacy
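To make the Differential Privacy idea in the entry above concrete, one standard instance of such a transformation is the Laplace mechanism applied to a counting query. The sketch below is only an illustration under that assumption, not necessarily the transformation T the project has in mind; the function name `private_count` and the sample data are hypothetical.

```python
import random


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records matching predicate.

    Uses the Laplace mechanism: a counting query has sensitivity 1
    (adding or removing one record changes the count by at most 1),
    so Laplace noise with scale 1/epsilon is added to the true count.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    scale = 1.0 / epsilon
    # The difference of two independent Exp(1) draws is Laplace(0, 1).
    noise = scale * (rng.expovariate(1.0) - rng.expovariate(1.0))
    return true_count + noise


# Made-up data: how many people are 40 or older? (true answer: 3)
ages = [23, 35, 41, 52, 67]
noisy = private_count(ages, lambda age: age >= 40, epsilon=1.0)
```

A smaller `epsilon` gives stronger privacy at the cost of noisier answers; a larger one does the opposite.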
Blockchains are often used synonymously with crypto-currencies and unspent transaction output (UTXO) data models, but there are emerging blockchain platforms that offer a more general data model and smart contracts that can manipulate this data freely (e.g. Hyperledger Fabric [1]). As such, these platforms resemble in many ways distributed databases, storing a collection of records, organized as …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: Blockchain, Data Management, Benchmarking
Modern data analytics systems are composed of two types of nodes: compute and storage (e.g., Amazon S3, Redis, MongoDB, etc.). The storage nodes typically offer a key-value interface and are often used to store data encoded in a columnar format (e.g., Parquet files). Due to growing data sizes in datacenters, there is an increasing interest in using specialized hardware devices, namely Field …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: FPGA, Hardware-software Co-design, Security and Privacy
We are actively developing a new prototype for analysing large multimedia collections in virtual reality, based on the ObjectCube data model. There are many ways in which students can contribute to the project, including work on the user interface and the back-end, and later on running large-scale user experiments.
Supervisors:
Aaron Duane, Björn Þór Jónsson
Semester: Fall 2021
Tags: virtual reality, multimedia analytics
A medical Visual Question Answering (VQA) system can provide meaningful references for both doctors and patients during the treatment process. Different from normal images, a learning setting with medical images is more challenging due to limited amounts of data, class imbalance and the presence of label noise for diagnosis tasks. Moreover, little attention is paid to how the images and meta-data are …
Supervisors:
Amelia Jiménez-Sánchez
Semester: Fall 2023
Tags: medical imaging, deep learning, machine learning, transfer learning, meta-learning
In relevance feedback, the choice of images to present to the user is a difficult problem, as a naïve approach may present too many similar images. The challenge addressed in this project is to ensure diversity (aka “one of each”) as well as relevance. A particularly interesting project for students interested in efficient algorithms.
Supervisors:
Björn Þór Jónsson
Semester: Fall 2021
Tags: multimedia analytics, scalability, diversity
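The tension between relevance and diversity described in the entry above can be illustrated with Maximal Marginal Relevance (MMR), a well-known greedy heuristic. This is a sketch of one possible approach, not the method used in the project; the function name `mmr_select`, the scores and the toy similarity function are all hypothetical.

```python
def mmr_select(relevance, similarity, k, trade_off=0.5):
    """Greedily pick k items, balancing relevance against redundancy.

    relevance: dict mapping item -> relevance score
    similarity: function (a, b) -> similarity in [0, 1]
    trade_off: 1.0 means pure relevance, 0.0 means pure diversity
    """
    selected = []
    candidates = set(relevance)
    while candidates and len(selected) < k:
        def score(item):
            # Redundancy is the similarity to the closest already-selected item.
            redundancy = max((similarity(item, s) for s in selected), default=0.0)
            return trade_off * relevance[item] - (1 - trade_off) * redundancy
        # Sorting makes tie-breaking deterministic.
        best = max(sorted(candidates), key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Toy example: two near-identical cat images and one dog image.
scores = {"cat1": 0.9, "cat2": 0.85, "dog1": 0.6}
same_animal = lambda a, b: 1.0 if a[:3] == b[:3] else 0.0
diverse = mmr_select(scores, same_animal, k=2)                      # cat1, then dog1
relevant_only = mmr_select(scores, same_animal, k=2, trade_off=1.0) # cat1, then cat2
```

The naive loop recomputes similarities for every candidate in every round, which is where the efficiency questions raised in the entry come in for large collections.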
In interactive learning systems, such as Exquisitor, the system presents potentially relevant images to users who label them as either relevant or irrelevant. Currently, Exquisitor uses a cluster-based index, which allows it to return results from a collection of 100 million images in 0.3 seconds. The goal of this project is to study the application of hash-based indexing to interactive learning …
Supervisors:
Björn Þór Jónsson
Semester: Fall 2021
Tags: multimedia analytics, diversity
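As a rough idea of what hash-based indexing can look like for high-dimensional feature vectors, the sketch below uses random-hyperplane locality-sensitive hashing (LSH). It is a generic illustration under that assumption and says nothing about how Exquisitor or its cluster-based index work; all function names and vectors are hypothetical.

```python
import random
from collections import defaultdict


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (Gaussian normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector lies on."""
    return tuple(
        1 if sum(v * h for v, h in zip(vector, plane)) >= 0 else 0
        for plane in hyperplanes
    )


def build_index(vectors, hyperplanes):
    """Group item ids into buckets keyed by their bit signature."""
    index = defaultdict(list)
    for item_id, vec in vectors.items():
        index[signature(vec, hyperplanes)].append(item_id)
    return index


# Made-up 3-dimensional "image features"; "b" points in the same direction as "a".
planes = make_hyperplanes(dim=3, n_bits=8, seed=1)
features = {"a": [1.0, 2.0, 3.0], "b": [2.0, 4.0, 6.0], "c": [-1.0, 0.5, -2.0]}
index = build_index(features, planes)
bucket_of_a = index[signature(features["a"], planes)]
```

Vectors pointing in similar directions tend to share a signature, so a query only needs to inspect its own bucket (or a few nearby ones) rather than the whole collection.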
The goal of this project is to enhance PhotoCube as a competitor for the Video Browser Showdown, an international video retrieval competition where competing systems are judged based on speed, accuracy and recall. We propose to develop new versions of the C++-based media server and JS-based media browser, to expand the data model to videos and improve the performance sufficiently to take part in …
Supervisors:
Björn Þór Jónsson
Semester: Fall 2021
Tags: video search, multimedia analytics, photocube
The index structure used for Exquisitor is eCP, a very scalable index for high-dimensional retrieval. However, while multimedia collections are typically updated constantly, the index is static. The goal of this project is to implement and compare approaches for index maintenance.
Supervisor: Björn Þór Jónsson
Semester: Fall 2020
Tags: index maintenance, high-dimensional indexing
In this project, we propose to implement a media server and media browser encapsulating a new data model for analysing media collections, called Multimedia Analytics Data Services (MADS). To validate the design, some scalability experiments should be performed.
Supervisor: Björn Þór Jónsson
Semester: Fall 2020
Tags: multimedia analytics, scalability
Students at ITU have made a prototype version of the Exquisitor system for the Android mobile phone! The system is missing some of Exquisitor’s advanced functionality, such as search and indexing, and the goal is to add and evaluate this functionality.
Supervisor: Björn Þór Jónsson
Semester: Fall 2020
Tags: multimedia analytics, android
We propose to develop a new prototype for analysing large multimedia collections in Virtual Reality, using the new Valve Indexes. There are many ways in which students can contribute to the project, including work on the user interface and the back-end, and later on running large-scale user experiments.
Supervisor: Björn Þór Jónsson
Semester: Fall 2020
Tags: virtual reality, multimedia analytics
The goal of this project is to integrate Exquisitor with other pieces of existing technology and turn it into a competitor for a live video retrieval competition. The project is suitable for 3-4 well-qualified MSc students.
The Video Browser Showdown (VBS) is a live competition for video search and retrieval, held at the International Conference on Multimedia Modeling (MMM). In VBS, the competition …
Supervisor: Björn Þór Jónsson
Semester: Fall 2019
The goal of this project is to ensure diversity in the relevance feedback results, to improve the quality of the user experience.
The project is suitable for 1-3 well-qualified MSc students.
In many creative tasks, the designer will know some stock image is good for a design just by stumbling upon the image. This “Aha!” moment requires browsing thousands of images by categories. In other words, it requires …
Supervisor: Björn Þór Jónsson
Semester: Fall 2019
The goal of this project is to use the state of the art in eye tracking to design, implement and evaluate different eye-tracking interfaces for Exquisitor.
The project is suitable for 1-3 well-qualified MSc students.
Image and media collections are becoming a central information resource for a growing number of domains. This calls for very effective tools for interactive exploration of the contents of …
Supervisor: Björn Þór Jónsson
Semester: Fall 2019
The goal of this project is to build a prototype of the Exquisitor system for mobile devices.
The project is suitable for 1-3 well-qualified MSc students.
Image and media collections are becoming a central information resource for a growing number of domains. This calls for very effective tools for interactive exploration of the contents of those collections [1]. Based on past research results [2], we …
Supervisor: Björn Þór Jónsson
Semester: Fall 2019
Deep neural networks have been revolutionary in computer vision and publicly available image datasets played an important role in this success. Due to their size, neural networks require vast amounts of data for training. Yet when it comes to medical settings dataset sizes are very limited due to the cost of data annotation, privacy concerns, differences in imaging techniques, and others. In such …
Supervisors:
Dovile Juodelyte
Semester: Fall 2023
Tags: transfer learning, deep learning, medical imaging
A GPU offers massive computational power and parallelism through its Streaming Multiprocessors (SMs). Efficient GPU utilization is critical for maximizing performance and optimizing compute resource usage, which is measured using various metrics such as SMACT (SM Activity), SMOCC (SM Occupancy), and DRAMA (DRAM Active). These metrics provide insight into how effectively the GPU’s SMs and …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Spring 2024
Tags: machine learning systems, GPU Utilization, resource management, resource interference
Deep learning changed the landscape of many applications like computer vision, natural language processing, etc. On the other hand, deep learning requires gigantic computing power offered by modern hardware. As a result, data scientists rely on powerful hardware resources offered by shared high-performance computing (HPC) clusters or the cloud. Due to the long-running times of deep learning …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2024
Tags: machine learning systems, checkpointing, scheduling, resource management
Workload collocation has been shown as an effective method to reduce the hardware requirements for certain deep learning (DL) training tasks. On the other hand, there haven’t been many robust open-source implementations of schedulers that incorporate workload collocation on GPUs for DL.
BLOX is a framework that aims at standardizing the way we implement deep learning schedulers. In this …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2024
Tags: machine learning systems, scheduling, resource management, workload collocation
Today, there are many compute- and memory-hungry data-intensive workloads from big data analytics applications to deep learning. These workloads increasingly run on shared hardware resources, which requires building hardware resource managers that can both serve the needs of workloads and utilize hardware well. Predicting the resource utilization of applications can aid such resource managers …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2022
Tags: benchmarking, hardware resource consumption estimation, machine learning
Geospatial data refers to information that is tied to specific geographic locations on the Earth’s surface. It includes both the location coordinates (such as latitude, longitude, and, potentially, altitude) and attribute data associated with those locations. Geospatial data is categorized into two types: raster and vector.
Vector data represents geographic features as points, lines, and …
Supervisors:
Eleni Tzirita Zacharatou
Semester: Fall 2023
Tags: spatial data analysis, data science, data loading, GIS file formats, geospatial data
It is now common to query terabytes of spatial data. Several new frameworks extend distributed computing platforms such as Hadoop and Spark to enable them to efficiently process spatial queries by providing (1) mechanisms to efficiently store spatial data and index them; and (2) packages of built-in spatial operations for these platforms. Meanwhile, it is now common to accelerate Hadoop and Spark …
Supervisor: Iman Elghandour
Semester: Fall 2019
Spark assumes that it executes its applications on a homogeneous cluster of similar nodes. However, it is becoming common that in-house clusters have heterogeneous compute resources and it is good to exploit all of them in the most efficient way. The objective of this master thesis is to extend the Spark scheduler to be resource-aware and to efficiently schedule Spark tasks on all the …
Supervisor: Iman Elghandour
Semester: Fall 2019
Distributed computing platforms such as Hadoop and Spark focus on addressing the following challenges in large systems: (1) latency, (2) scalability, and (3) fault tolerance. Dedicating computing resources for each application executed by Spark can lead to a waste of resources. Unified distributed file systems such as Alluxio have provided a platform for computing results among simultaneously …
Supervisor: Iman Elghandour
Semester: Fall 2019
In the last few years, it became common to accelerate Hadoop and Spark by enabling them to execute tasks and jobs on accelerators such as GPUs and FPGAs. The objective of this master thesis is to study new approaches that efficiently predict the execution time of Spark tasks and jobs executed on GPUs. Part of the work will be to build a performance prediction model for GPUs, which can be built …
Supervisor: Iman Elghandour
Semester: Fall 2019
Do you like open-source systems? Would you like to experience working with an open-source system? Do you want to learn about big data research in practice? Then, this project is for you!
We have a number of thesis/project topics under the umbrella of Apache Wayang. Wayang is the first cross-platform framework that allows users to specify their task/query in a system-agnostic manner and Wayang will …
Supervisors:
Jorge Quiané
Semester: Fall 2022
Tags: big data, database, cross-platform data processing, open source, Apache
Do you have the feeling that only a few players are controlling the AI game? Would you like to make AI technology accessible to everyone? Then, come and help us to make Agora a reality!
We have a number of thesis/project topics under the umbrella of the Agora project. This project aims at building a unified data infrastructure for supporting AI ecosystems that bring together data, algorithms, …
Supervisors:
Jorge Quiané
Semester: Fall 2022
Tags: big data, AI ecosystems, compliant data processing, federated analytics, data markets
ITU is a partner of the Danish Student Cubesat Program, DISCOSAT. We launched our first satellite DISCO-1 into Low Earth Orbit in April 2023 and we will launch a second DISCO-2 in 2024. In this project you will gain experience with automating live satellite operations and communications, completing a groundstation at the Rued Langaards Vej site for use with both satellites.
The DISCO satellite …
Supervisors:
Julian Priest
Semester: Fall 2023
Tags: satellite, ground station, software defined radio, automation, csp
The DISCO-2 satellite is an Earth observation satellite in collaboration with the Arctic Research Center in Aarhus and is designed to complement ground-based field studies in Greenland. The satellite instrument consists of 2 high-quality visible light cameras and 1 infrared camera, as well as an attitude control system and a Coral TPU ML coprocessor.
In this project you will develop software to control …
Supervisors:
Julian Priest
Semester: Fall 2023
Tags: satellite, climate change, image processing, ML, csp, embedded, space
The DISCO-2 satellite will have accelerated machine learning capability based on the inclusion of a Coral TPU ML accelerator module. This will allow images taken by the satellite to be analysed on the satellite using a variety of ML models, with only select images sent back to Earth. This approach allows for more flexibility in image acquisition and saves downlink bandwidth, which is very constrained …
Supervisors:
Julian Priest, Robert Bayer
Semester: Fall 2023
Tags: satellite, ground station, software defined radio, automation, csp
The Danish Student Cubesat Program is an inter-university collaboration that will launch 3 cubesats into Low Earth Orbit over the next 4 years. The satellites will be designed, operated, programmed and built by students, and the project offers an opportunity for Master’s students to take part in a live satellite project. ITU is partnering with Aarhus University on DISCOSAT2 which will be an …
Supervisors:
Sebastian Büttrich, Julian Priest
Semester: Fall 2021
Tags: Satellite, Cubesat, Image processing, Machine Learning, edge, constrained computing
ITU is a partner in the Danish Student Cubesat Program, DISCO, which will launch a series of small satellites into orbit, starting with DISCO 1 in 2023 and followed by DISCO2 in 2024.
As part of this project, ITU is installing a satellite ground station with a range of antenna rotators on the roof of the Rued Langgaards Vej building, and the equipment has been purchased. The ground station will track the …
Supervisors:
Julian Priest
Semester: archive
Tags: Satellite, Image processing, Edge, Constrained Computing, Networks, Machine Learning, Embedded, Software Defined Radio
ITU is a partner in the Danish Student Cubesat Program DISCO, which will launch a series of small satellites into orbit, starting with DISCO 1 in 2023 and followed by DISCO2 in 2024.
ITU is developing a high-resolution multi-camera imaging payload for Earth observation, primarily in the Arctic. We are developing an on-satellite machine learning capability using an ML coprocessor, as well as models that can …
Supervisors:
Julian Priest
Semester: archive
Tags: Satellite, Image processing, Edge, Constrained Computing, Networks, Machine Learning, Embedded, Radio
The idea behind “15-minute cities” is that within a short walk or bike ride people should have access to all necessary facilities that constitute the essence of urban living, such as parks, shops, cafes, schools, hospitals. Initiatives to transform cities according to this paradigm are currently being implemented across the world, in an attempt to make urban spaces more liveable, …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: spatial data analysis, visualisation, Python, OSM data
As a response to increased traffic congestion and the need to reduce carbon emissions, cities consider ways to modernise, build and extend transit systems. Transit network design solutions can benefit from analysing the large amount of crowd-sourced location data available, which provides valuable insights into population mobility needs. Designing efficient metro lines, bicycle paths, or bus …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: spatial data analysis, network design, Python, OSM data
The idea behind “15-minute cities” is that within a short walk or bike ride people should have access to all necessary facilities that constitute the essence of urban living, such as parks, shops, cafes, schools, hospitals. Initiatives to transform cities according to this paradigm are currently being implemented across the world, in an attempt to make urban spaces more liveable, …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: spatial data analysis, graph summaries, Python, OSM data
Musical genres are inherently ambiguous and difficult to define. Even more so is the task of establishing how genres relate to one another. Yet, genre is perhaps the most common and effective way of describing musical experience. The number of possible genre classifications (e.g. Spotify has over 4000 genre tags, LastFM over 500,000 tags) has made the idea of manually creating music taxonomies …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: scalable algorithms, hyperbolic embeddings, Python, Spotify data
The integration of wind power in the energy grid is dependent on accurate production forecasts. The power output curves between neighbouring wind farms are often correlated temporally and spatially, but currently, these spatiotemporal dependencies are under-utilised in prediction models. Graph neural networks allow for modelling these dependencies. In this project the student will implement a …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: spatial data analysis, graph neural networks, Python, timeseries data
Open-source JavaScript applications, such as browser-based web games, are typically developed by individual software engineers or small teams. These teams often have limited financial resources to use commercial logging frameworks and cloud-based analysis systems and may also lack knowledge and expertise in logging. However, log analysis is highly important for many reasons: monitoring application …
Supervisors:
Martin Hentschel
Semester: Fall 2024
Tags: open source, performance
The Deconstructed Cloud Databases project stems from a simple question: What are the minimum components required to build a data management system in the cloud? Our motivation for this project is based on the idea that reducing a system to its minimum set of components makes it easier to build, test, and maintain cloud data management systems. This approach requires less engineering effort, …
Supervisors:
Martin Hentschel
Semester: Fall 2024
Tags: data management, performance, benchmarking, hacking
The Deconstructed Cloud Databases project stems from a simple question: What are the minimum components required to build a data management system in the cloud? Our motivation for this project is based on the idea that reducing a system to its minimum set of components makes it easier to build, test, and maintain cloud data management systems. This approach requires less engineering effort, …
Supervisors:
Martin Hentschel
Semester: Fall 2024
Tags: data management, security, open source, open standards
The emergence of computational storage platforms like Delilah has transformed the data storage landscape, enabling new computing paradigms and facilitating data-intensive applications. Delilah is a cutting-edge computational storage platform developed by the IT University of Copenhagen. It runs on the Daisy OpenSSD and exposes an asynchronous computational storage protocol to the host, facilitated …
Supervisors:
Niclas Hedam, Philippe Bonnet
Semester: Fall 2023
Tags: Open Source, Testing, Computational Storage, Hardware, FPGA
Hash-based authentication is an effective way of protecting passwords in software systems. Hashing obscures the original passwords, such that they cannot be recovered in case of a database breach. However, as demonstrated by our paper titled Hash-Based Authentication Revisited in the Age of High-Performance Computers, the practical security depends on which hashing algorithm is used as well as the …
Supervisors:
Pınar Tözün, Niclas Hedam
Tags: benchmarking, hashing, security, GPU, hacking, HPC
With the recent hunger for being “data driven”, many organizations are eager to integrate ML into their decision-making process. Unfortunately, competent data scientists are still relatively scarce, and manual model development cannot keep up with the demand for magic AI solutions. This is no less true when it comes to forecasting. Knowing the future is extremely handy when making …
Supervisors:
Niels Ørbæk Chemnitz
Semester: Spring 2021
Tags: AutoML, ML, Forecasting, Energy Data, Smart Meters, Python, Data Science, Time Series Data
How much do our smart meter readings disclose about us? Can we disentangle the oven from the washing machine from the kettle? Can we identify demographics and behavior patterns from the stream of electricity data?
Most Danish homes are now equipped with so-called “smart meters” - networked electricity meters that report consumption and load at a much higher rate than conventional meters. …
Supervisors:
Niels Ørbæk Chemnitz
Semester: Spring 2021
Tags: NILM, ML, IoT, Energy Data, Smart Meters, Python, Data Science, Time Series Data
Outlier detection is carried out when the information is stored at the server. However, with the new IoT computational capabilities, outlier detection can be developed locally. Therefore, it is necessary to know how much RAM/Flash is needed for this step and which IoT brands can handle it. This project is divided into two parts. The first is implementing light-heavy ML algorithms in single points …
Supervisor: Paul Rosero
Semester: Spring 2022
Tags: data analysis, IoT, Python, Embedded systems
TinyML is a new trend to deploy deep learning on tiny devices. Therefore, it is necessary to deploy several applications to understand the challenges and opportunities that tinyML brings us. In this scenario, any idea involving embedded computer vision, voice recognition, or sensors is welcome.
Supervisor: Paul Rosero
Semester: Spring 2022
Tags: data analysis, IoT, Python, Embedded systems, Computer vision, Voice recognition
The emergence of computational storage platforms like Delilah has transformed the data storage landscape, enabling new computing paradigms and facilitating data-intensive applications. Delilah is a cutting-edge computational storage platform developed by the IT University of Copenhagen. It runs on the Daisy OpenSSD and exposes an asynchronous computational storage protocol to the host, facilitated …
Supervisors:
Niclas Hedam, Philippe Bonnet
Semester: Fall 2023
Tags: Open Source, Testing, Computational Storage, Hardware, FPGA
Disaggregated storage has gained acceptance in data centers. With disaggregated storage, storage resources are decoupled from compute resources, and made available through fabric. We are particularly interested in storage resources composed of an ARM-based smartNIC, which acts as fabric target as well as storage controller for a collection of SSDs.
The performance characteristics of the storage …
Supervisors:
Philippe Bonnet
Semester: Fall 2021
Tags: benchmarking, ARM, SoC, fabric, SSD, computational storage
Reproducibility is a cornerstone of the scientific method. There are systems available today to build reproducible and sharable data and analysis pipelines including workflow engines (e.g., GWL, Nextflow), package managers (e.g., bioconda), and container systems (e.g., Singularity). However, validating their executions on high-performance computers remains an open issue. Indeed, there are many …
Supervisors:
Philippe Bonnet
Semester: Fall 2020
Tags: ML, reproducibility, workflow, HPC
Reproducibility is a cornerstone of the scientific method. It is also a core element of compliance requirements for sensitive equipment, e.g., audit trails for medical equipment. Often, a prerequisite for computational reproducibility is the availability of software and data. However, this is problematic for edge devices whose goal is to reduce the amount of data transferred to the backend. On …
Supervisors:
Philippe Bonnet
Semester: Fall 2020
Tags: reproducibility, edge
SSDs are not a uniform class of devices. The SSD landscape is now quite diverse, with many new-generation, much faster and more customizable devices available. Understanding their performance characteristics is crucial when determining what their impact on the data systems software stack should be. In this project, we would like to characterize the performance of a broad range of such SSDs (e.g., ZNS, Samsung Z-SSD, …
Supervisors:
Pınar Tözün, Philippe Bonnet
Semester: Fall 2020
Tags: SSD, benchmarking
The Things Network Stack v3 for LoRaWAN is an open source LoRaWAN network stack suitable for large, global and geo-distributed public and private networks as well as smaller networks. The architecture follows the LoRaWAN Network Reference Model for standards compliancy and interoperability. - https://github.com/TheThingsNetwork/lorawan-stack This stack, currently in pre-rollout testing, however …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Offloading processing to storage is a means to avoid data movement and thus deal efficiently with very large volumes of stored data. In the 90s, there were pioneering efforts to develop Processing-in-Memory as well as Active Disks. We are considering data stored on Open-Channel SSDs with a programmable storage controller (i.e., a Linux-based ARM processor) integrated into a network switch (e.g., …
Supervisor: Philippe Bonnet
Semester: Fall 2019
Characterize the performance of commercial database systems on an NVIDIA Titan GPU, or characterize the performance of DB2 PureScale on a cluster equipped with shared storage with a range of different benchmarks. Design and conduct experiments with a range of tuning strategies to measure their impact on performance and reliability.
Supervisor: Philippe Bonnet
Semester: Fall 2019
In the context of the Orkney Cloud project, we are preparing the deployment of a decentralized cloud infrastructure on the archipelago. The infrastructure is composed of a collection of Pods (point of delivery) and a wireless core (5G + Wifi). Each Pod is equipped with storage, computing and communication components (so that it is connected to the core and to local endpoints). Each Pod is powered …
Supervisor: Philippe Bonnet
Semester: Fall 2019
New types of networks such as LoRaWAN and Sigfox enable us to deploy inexpensive mobile Air Quality Sensors, e.g. on bicycles or vehicles, boats or trains, and help map and understand urban pollution. Such data would be valuable for e.g. the international citizen science project Luftdaten https://luftdaten.info/en/home-en/ in which we are participating. We have an ongoing collaboration with the …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Can we combine satellite data with terrestrial and marine sensor data to study their correlation and improve our understanding of environmental processes, in both urban and agricultural/rural contexts? Potential collaboration with several organizations in Kenya and the Orkney Islands.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Not far from the IT University, citizens are building a “Small Smart City”, aiming to measure and analyze e.g. water and resource consumption. Help build the Small Smart City!
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Survey of potential sensor modalities (sounds, ultrasounds, vibrations) and related work (e.g., wind turbines). Starting with sound: piezo contact mics/transducers, MEMS sensors. Characterization of state based on known signatures (classification problem). Characterization of state transitions (HMM). Experimentation on coffee machine/blender/3D printer at PitLab. 1..k sensors; local/cloud-based …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Novo Nordisk
New forms of Solid State Drives have interesting characteristics in terms of performance (10 to 100x faster than previous generations of SSDs) and in terms of functionalities (SSDs can now suspend the execution of write or erase operations to minimize read latency). The performance characteristics of these devices are not yet well understood. The topic of this thesis is to design and conduct …
Supervisor: Philippe Bonnet
Semester: Fall 2019
Survey of potential sensor modalities (IR temp sensor, thermal imager) and related work. Starting with consumer USB cams or phone cams to generate series of images. Characterization of state based on known signatures (classification problem). Characterization of state transitions (HMM). Experimentation on coffee machine/blender/3D printer at PitLab. 1..k sensors; local/cloud-based processing. …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Novo Nordisk
Most data collection in IoT does not critically depend on latency or speed from data collection to data analytics. Occasionally, though, we meet tasks that would benefit from near-realtime features, such as collection of wave and tidal dynamics around marine energy infrastructures. This project explores the limits of speed by bringing together a LoRa PHY, a LoRaWAN gateway, LoRaWAN stack, ultrafast …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
In collaboration with the IoT Lab at the Computer Science Dept at Kathmandu University, Nepal, we are developing a potential service for tracking trekkers, i.e. offering a security service for tourists trekking the Himalayas, in particular Mt. Everest. This service depends critically on having a robust hardware component, the actual GPS/GNSS tracker. Requirements with respect to battery life, …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Wind turbine electricity production data is sensitive for Energinet (and for the wind turbine producers). Energinet would like to publish wind turbine electricity production data sets that can be used to train relevant models and to develop innovative applications, without giving away sensitive data. The goal of the project is to explore various data publishing methods for that purpose.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, data publication
Energinet has a model that describes the electricity production of a given wind turbine given wind conditions. The current model based on kNN is trained with DMI weather data and historical electricity production data for the wind turbine. The goal of the project is to improve the current model with lifelong learning, extended weather data and different models for a range of different wind …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Data Analysis
The goal of the project is to explore the accuracy of electricity production predictions based on historical data and weather predictions. This may be tackled as a sequence prediction problem using recurrent neural networks. The long-term goal is to incorporate wind turbines in the reserve market for electricity.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Forecasting, Machine Learning, Deep Learning
The goal of the project is to explore new ways of gathering data about wind turbines as well as local wind/weather conditions. To this end, sound/vibration-based and/or image-based instrumentation as well as innovative experiments such as balloons and lightweight weather stations might be considered.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Instrumentation, Sensors
Field Programmable Gate Arrays are now an integral part of public cloud infrastructures. You can for example run customized FPGA instances on AWS. A project focuses on FPGA-based hardware acceleration at the level of an SSD Flash Translation Layer, at the level of a Database storage manager or at the level of the database client. You will be able to experiment with FPGAs in the lab and on AWS. We …
Supervisor: Philippe Bonnet
Semester: Fall 2020
Tags: FPGA, SSD, computational storage
Field Programmable Gate Arrays are now an integral part of public cloud infrastructures. You can for example run customized FPGA instances on AWS. A project focuses on FPGA-based hardware acceleration at the level of an SSD Flash Translation Layer, at the level of a Database storage manager or at the level of the database client. You will be able to experiment with FPGAs in the lab and on AWS. We …
Supervisor: Philippe Bonnet
Semester: Fall 2019
Tags: FPGA, SSD
A GPU offers massive computational power and parallelism through its Streaming Multiprocessors (SMs). Efficient GPU utilization is critical for maximizing performance and optimizing compute resource usage, and it is measured using various metrics such as SMACT (SM Activity), SMOCC (SM Occupancy), and DRAMA (DRAM Active). These metrics provide insight into how effectively the GPU’s SMs and …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Spring 2024
Tags: machine learning systems, GPU Utilization, resource management, resource interference
The work on running data-intensive applications on very powerful, expensive, and power-hungry server hardware is very popular thanks to the growing size of data centers and high-performance computing (HPC) platforms. However, with the rise of new generation internet of things (IoT) applications, the lower-power and lower-budget hardware devices that specifically target IoT, the edge platforms, …
Supervisors:
Pınar Tözün
Semester: Fall 2024
Tags: edge, benchmarking, data-intensive applications, resource-constrained hardware
Observing how well machine learning systems utilize hardware resources is a crucial preliminary step to improve system performance and reduce hardware waste. To make such observations, one has to collect a lot of monitoring data on hardware behavior through experiments. In our group, we have recently built a framework to aid the management of such monitoring data efficiently, called Resource-Aware …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: benchmarking, data management, data visualization
Deep learning has changed the landscape of many applications like computer vision, natural language processing, etc. On the other hand, deep learning requires gigantic computing power offered by modern hardware. As a result, data scientists rely on powerful hardware resources offered by shared high-performance computing (HPC) clusters or the cloud. Due to the long-running times of deep learning …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2024
Tags: machine learning systems, checkpointing, scheduling, resource management
Workload collocation has been shown to be an effective method to reduce the hardware requirements for certain deep learning (DL) training tasks. On the other hand, there haven’t been many robust open-source implementations of schedulers that incorporate workload collocation on GPUs for DL.
BLOX is a framework that aims at standardizing the way we implement deep learning schedulers. In this …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2024
Tags: machine learning systems, scheduling, resource management, workload collocation
Deep convolutional networks are able to learn representations of images, scoring well in tasks such as image classification and object detection. During model training, these networks have the ability to process different input sizes without requiring changes to their architecture. In this project, we would like to investigate the effects that changing input sizes has on these kinds of models. We …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: data attribution, deep learning, machine learning, resource efficiency
Today’s foundation models are trained on vast amounts of data. The quality and size of this data have a huge impact on the accuracy of these models. Selecting the right amount and variety of data for a given task, however, is a resource-intensive process. In this project, we would like to investigate various state-of-the-art data selection mechanisms from a hardware requirements and …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: data selection, deep learning, machine learning, resource efficiency
Traditionally, solid-state drives (SSDs) do not give users the ability to control data placement on the SSD. This often leads to suboptimal performance and shortens SSD lifetime, since SSDs internally don’t allow in-place updates. The updated disk pages are written elsewhere and the old versions have to be garbage collected. This poses problems if data with different lifetimes and …
Supervisors:
Pınar Tözün
Semester: Fall 2024
Tags: SSDs, data management systems, modern storage
In this project, we would specifically like to quantify the data movement savings of applying techniques like compression and model-based data filtering in the context of resource-constrained hardware and edge/IoT applications.
Today many data sources are small low-powered and hardware-constrained devices such as mobile phones, wearable or self-driving smart platforms, etc. Processing the data on …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, ML model updates, tinyML
One of the key challenges with enabling efficient machine learning on resource-constrained devices is keeping the machine learning models deployed on these devices up-to-date without frequent retraining. This requires exploring the impact of different model update mechanisms at the edge.
This project would be suitable as a standalone project or BSc or MSc thesis at ITU during Fall 2024. If you are …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, ML model updates, tinyML
Enabling efficient data processing and machine learning on resource-constrained devices poses many challenges. One is fitting the models into the restrictive memory and compute resources of these devices. In this project, first, we would like to explore the landscape of foundational, generative-AI, language, etc. models with respect to their size and compute needs to understand what could be a fit …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, ML model updates, tinyML
Today many data sources are small low-powered and hardware-constrained devices such as mobile phones, wearable or self-driving smart platforms, etc. Edge computing is a broad term that refers to computations performed on such edge devices. It becomes increasingly important to enable techniques that get more value out of data at the edge rather than always sending the data to a remote and more …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, resource management, tinyML
Observing how well data-intensive systems utilize hardware resources is a crucial preliminary step to improve system performance and reduce hardware waste. To make such observations, one has to collect a lot of monitoring data on hardware behavior through experiments. In our group, we have recently built a framework to aid the management of such monitoring data efficiently, called Resource-Aware …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2023
Tags: benchmarking, data management, data visualization
It is common to process data to clean it, filter it, restructure it, get metadata out of it, etc. before feeding the data into a data analysis or machine learning pipeline. There are many tools and libraries out there to aid with this process, with different strengths and functionality (DALI, RAPIDS, HoloClean, DAPHNE, DuckDB, etc.). In this project, we would like to analyze pros/cons of some of …
Supervisors:
Pınar Tözün
Semester: Fall 2022
Tags: data preprocessing libraries, heterogeneous hardware, machine learning
In the past decade, the data management community has focused on main-memory systems or main-memory-optimized systems. This focus has put the commodity memory hierarchy (DRAM and processor caches) at the center of workload characterization studies. Today, with the evolution of persistent storage technologies such as NVRAM (persistent memory solution of Intel) and NVMe SSDs, data systems …
Supervisors:
Pınar Tözün
Semester: Fall 2022
Tags: workload characterization, tracing, modern storage, data-intensive systems
DAPHNE is an EU project that aims at building a data system targeting integrated data analysis pipelines across data management and processing, high-performance computing (HPC), and machine learning (ML) training and scoring. The project had its first code release back in March. This project aims at adding a profiling infrastructure for the DAPHNE codebase. If you are interested in learning about …
Supervisors:
Pınar Tözün
Semester: Fall 2022
Tags: integrated data analysis pipelines, profiling big data systems
State-of-the-art machine learning models are known to be compute- and power-hungry. On the other hand, modern servers come equipped with really powerful CPU-GPU co-processors. Not all machine learning models are able to use all the available hardware resources on such servers.
Workload collocation is a mechanism to increase hardware utilization when a single workload is not able to utilize all the …
Supervisors:
Pınar Tözün
Semester: Fall 2022
Tags: benchmarking, workload collocation, machine learning
Today, there are many compute- and memory-hungry data-intensive workloads from big data analytics applications to deep learning. These workloads increasingly run on shared hardware resources, which requires building hardware resource managers that can both serve the needs of workloads and utilize hardware well. Predicting the resource utilization of applications can aid such resource managers …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2022
Tags: benchmarking, hardware resource consumption estimation, machine learning
NVMe SSDs are not a uniform class of devices. The IO software stack is not uniform either. Understanding the performance characteristics of new-generation SSDs and the impact of the IO stack on their performance is crucial when determining how to design data-intensive systems. In this project, we would like to characterize the performance of a range of NVMe SSDs (e.g., Samsung Z-SSD, Intel Optane, …
Supervisors:
Pınar Tözün
Semester: Fall 2021
Tags: SSD, benchmarking
A data science infrastructure orchestrates the execution of widely used machine learning frameworks (e.g., TensorFlow, PyTorch) on a heterogeneous set of processing units (e.g., CPU, GPU, TPU, FPGA) while powering an increasingly diverse and complex range of applications (e.g., fraud detection, healthcare, virtual assistance, automatic driving). Understanding the resource consumption …
Supervisor: Pınar Tözün
Semester: Fall 2021
Tags: benchmarking, hardware resource consumption, deep learning frameworks
Hash-based authentication is an effective way of protecting passwords in software systems. Hashing obscures the original passwords, such that they cannot be recovered in case of a database breach. However, as demonstrated by our paper titled Hash-Based Authentication Revisited in the Age of High-Performance Computers, the practical security depends on which hashing algorithm is used as well as the …
Supervisors:
Pınar Tözün, Niclas Hedam
Tags: benchmarking, hashing, security, GPU, hacking, HPC
The variety and complexity of data-intensive applications and systems have been increasing drastically over the past decade. Tasks from a SQL-based big data analytics request running on Apache Spark can be very different from tasks from deep learning training using the TensorFlow framework. Nevertheless, these data-intensive applications increasingly run on shared hardware resources in data centers or …
Supervisor: Pınar Tözün
Semester: Fall 2020
Tags: hardware-software co-design
Improvements in modern processor hardware do not automatically enable more complex and higher quality games, animations, and virtual reality applications. A paradigm shift is necessary when it comes to how we develop these applications in order to exploit the resources of modern hardware (i.e., main memory, multicores) effectively. Unity Technologies has recently developed Data-Oriented …
Supervisor: Pınar Tözün
Semester: Fall 2020
Tags: memory hierarchy, concurrency
SSDs are not a uniform class of devices. The SSD landscape is now quite diverse, with many new-generation, much faster and more customizable devices available. Understanding their performance characteristics is crucial when determining what their impact on the data systems software stack should be. In this project, we would like to characterize the performance of a broad range of such SSDs (e.g., ZNS, Samsung Z-SSD, …
Supervisors:
Pınar Tözün, Philippe Bonnet
Semester: Fall 2020
Tags: SSD, benchmarking
Spreading the computation of similar concurrent tasks that have a large instruction footprint over multiple cores via thread migration is shown to improve the instruction cache utilization drastically since it allows instruction re-use across the concurrent tasks. However, thread migrations are costly due to the context switching overhead. To reduce this overhead, recent work mainly proposed …
Supervisor: Pınar Tözün
Semester: Fall 2019
The computer architecture community is moving toward commoditization of hardware specialization instead of general purpose CPUs and more agile hardware development instead of years-long production cycles to enable faster, more energy-efficient, and more cost-effective hardware/software co-designs. This will lead to a disruption in the way we design and maintain the emerging data management systems …
Supervisor: Pınar Tözün
Semester: Fall 2019
Apache SystemML is an open-source platform to run machine learning tasks efficiently thanks to the hardware-conscious query compilation techniques it adopts. It can be run standalone or on top of Apache Spark. It is considered to be state-of-the-art when running machine learning tasks (e.g., in ACM SIGMOD 2017, there were ~5 papers that used SystemML as a comparison point). This project aims at …
Supervisor: Pınar Tözün
Semester: Fall 2019
The popularity of large-scale real-time analytics applications (real-time inventory/pricing, recommendations from mobile apps, fraud detection, risk analysis, IoT, etc.) keeps rising. These applications require distributed data management systems that can handle fast concurrent transactions (OLTP) and analytics on the recent data. Some of them even need to run analytical queries (OLAP) as part of …
Supervisor: Pınar Tözün
Semester: Fall 2019
The Transaction Processing Performance Council (TPC) is a non-profit IT organization founded to define database benchmarks and disseminate objective, verifiable performance data to the industry. TPC has standardized several new benchmarks (e.g., TPCx-HS and TPCx-BB) in recent years. Older popular benchmarks, like TPC-C (representing high-performance transaction processing) and TPC-H (representing …
Supervisor: Pınar Tözün
Semester: Fall 2019
In this project, we would specifically like to quantify the data movement savings of applying techniques like compression and model-based data filtering in the context of resource-constrained hardware and edge/IoT applications.
Today, many data sources are small, low-powered, and hardware-constrained devices such as mobile phones and wearable or self-driving smart platforms. Processing the data on …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, ML model updates, tinyML
One of the key challenges with enabling efficient machine learning on resource-constrained devices is keeping the machine learning models deployed on these devices up-to-date without frequent retraining. This requires exploring the impact of different model update mechanisms at the edge.
This project would be suitable as a standalone project or BSc or MSc thesis at ITU during Fall 2024. If you are …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, ML model updates, tinyML
Enabling efficient data processing and machine learning on resource-constrained devices poses many challenges. One is fitting the models into the restrictive memory and compute resources of these devices. In this project, first, we would like to explore the landscape of foundational, generative-AI, language, etc. models with respect to their size and compute needs to understand what could be a fit …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, ML model updates, tinyML
Today, many data sources are small, low-powered, and hardware-constrained devices such as mobile phones and wearable or self-driving smart platforms. Edge computing is a broad term that refers to computations performed on such edge devices. It becomes increasingly important to enable techniques that get more value out of data at the edge rather than always sending the data to a remote and more …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, resource management, tinyML
The DISCO-2 satellite will have accelerated machine learning capability based on the inclusion of a Coral TPU ML accelerator module. This will allow images taken by the satellite to be analysed on the satellite using a variety of ML models, with only select images sent back to Earth. This approach allows for more flexibility in image acquisition and saves downlink bandwidth, which is very constrained …
Supervisors:
Julian Priest, Robert Bayer
Semester: Fall 2023
Tags: satellite, ground station, software defined radio, automation, csp
This is not a single project, but rather a larger cluster of potential projects in the field of what could be summarized as extreme networking.
The networks we are interested in are typically wireless, and can be extreme in different senses of the word:
distance - hundreds of kilometers terrestrial, 10,000s of km to satellite
latency - sub-ms latencies
autonomy - off-grid
quality - extreme remote …
Supervisors:
Sebastian Büttrich
Semester: Fall 2024
Tags: network, IoT, LoRa, LoRaWAN, satellites
LoRa is a long-range, low-bandwidth networking protocol widely used in Internet of Things projects, sensor networks, and low-power, low-cost embedded systems. LoRa’s encoding schema allows for extremely long distance communications with small power usage and small simple antennas. This combination of features has made it attractive to small satellite operators flying cubesats, and LoRa is now …
Supervisors:
Sebastian Büttrich
Semester: Fall 2024
Tags: satellites, LoRa, cubesat, IoT, embedded, electronics
LoRa is a long-range, low-bandwidth networking protocol widely used in Internet of Things projects, sensor networks, and low-power, low-cost embedded systems. LoRa’s encoding schema allows for extremely long distance communications with small power usage and small simple antennas. This combination of features has made it attractive to small satellite operators flying cubesats, and LoRa is now …
Supervisors:
Sebastian Büttrich
Semester: Fall 2024
Tags: IoT, LoRa, LoRaWAN, satellites
Optical fiber is the backbone of the internet’s communication, e.g. in the form of submarine fiber cables. It can also be employed as a sensor device, by means of combined opto-acoustic methods such as Distributed Acoustic Sensing (DAS) or State of Polarisation (SoP) sensing. Fiber is capable of sensing all kinds of vibrational/acoustic events, from animal sounds and seismic activity to …
Supervisors:
Sebastian Büttrich
Semester: Fall 2024
Tags: fiber, acoustics, audio, machine learning, DAS, SOP
The Danish Student Cubesat Program is an inter-university collaboration that will launch 3 cubesats into Low Earth Orbit over the next 4 years. The satellites will be designed, built, programmed and operated by students, and the project offers an opportunity for Master’s students to take part in a live satellite project. ITU is partnering with Aarhus University on DISCOSAT2 which will be an …
Supervisors:
Sebastian Büttrich, Julian Priest
Semester: Fall 2021
Tags: Satellite, Cubesat, Image processing, Machine Learning, edge, constrained computing
Invasive bird species can be a serious problem in cities, towns and in agriculture. The common pigeon is a very unwelcome guest on many balconies, roofs and terraces. Conventional scarecrows often show no effect, as these birds are known to be quite intelligent, and capable of learning fast. The idea is to build a sensor/camera enhanced scarecrow that - can recognize birds present within its …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: IoT, ML, machineLearning, sensors, security
For this project, you would be working with a partner company who are looking to re-establish wood as a building material for sustainable architecture, and thus are using sensors for quality control - to detect damage and deterioration in buildings. Wood such as timber may be analyzed by non-intrusive acoustic impact testing and subsequent waveform analysis, and the expectation is that machine …
Supervisors:
Sebastian Büttrich
Semester: Fall 2021
Tags: IoT, sensors, machine learning, acoustics
In LoRaWAN networks such as The Things Network, long distance transmissions, well beyond the limitations of line of sight in terrestrial geometry, are frequently observed. Tropospheric effects are seen as responsible for bending or guiding radio waves around the Earth's curvature. As an example, under the right weather conditions, the LoRaWAN gateway at ITU may collect packets from northern Germany, …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: IoT, LoRaWAN, LPWAN, satellite, networks, troposphere, weather
Recent progress in LoRaWAN development has made a new generation of satellite communications offerings available to IoT devices. In these, the LoRaWAN gateway is satellite-borne and collects data from small inexpensive ground stations. So far, this is predominantly seen as a means of communication for remote sensor data, e.g. in agriculture, logistics or wildlife monitoring. However, one can …
Supervisors:
Sebastian Büttrich
Semester: Fall 2021
Tags: IoT, LoRaWAN, LPWAN, satellite, networks, edge, security
There is currently a lot of progress in really small, yet powerful visual machine learning / computer vision, on hardware like the OpenMV Cam H7, Arduino Portenta Vision Shield, Luxonis LUX-ESP32, Himax WE-I Plus, Arducam Pico4ML, and Raspberry Pi, and on software platforms such as TinyML or OpenMV IDE.
While many popular use cases stem from fields like traffic analysis and wildlife monitoring, we …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: IoT, sensors, machine learning, computer vision
The Things Network Stack v3 for LoRaWAN is an open source LoRaWAN network stack suitable for large, global and geo-distributed public and private networks as well as smaller networks. The architecture follows the LoRaWAN Network Reference Model for standards compliance and interoperability. - https://github.com/TheThingsNetwork/lorawan-stack This stack, currently in pre-rollout testing, however …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
New types of networks such as LoRaWAN and Sigfox enable us to deploy inexpensive mobile Air Quality Sensors, e.g. on bicycles or vehicles, boats or trains, and help map and understand urban pollution. Such data would be valuable for e.g. the international citizen science project Luftdaten https://luftdaten.info/en/home-en/ in which we are participating. We have an ongoing collaboration with the …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Can we combine satellite data with terrestrial and marine sensor data to study their correlation and benefit our understanding of environmental processes, in both urban and agricultural/rural contexts? Potential collaboration with several organizations in Kenya and the Orkney Islands.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Not far from the IT University, citizens are building a “Small Smart City”, aiming to measure and analyze e.g. water and resource consumption. Help build the Small Smart City!
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Survey of potential sensor modalities (sounds, ultrasounds, vibrations) and related work (e.g., wind turbines)
Starting with sound: Piezo contact mics/transducers, MEMS sensors
Characterization of state based on known signatures (classification problem)
Characterization of state transitions (HMM)
Experimentation on Coffee machine/Blender/3D printer at PitLab
1..k sensors; Local/cloud-based …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Novo Nordisk
Survey of potential sensor modalities (IR temp sensor, thermal imager) and related work
Starting with consumer USB cams or phone cams to generate series of images
Characterization of state based on known signatures (classification problem)
Characterization of state transitions (HMM)
Experimentation on Coffee machine/Blender/3D printer at PitLab
1..k sensors; Local/cloud-based processing. …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Novo Nordisk
Most data collection in IoT does not critically depend on latency or speed from data collection to data analytics. Occasionally though we meet tasks that would benefit from near-realtime features, such as collection of wave and tidal dynamics around marine energy infrastructures. This project explores the limits of speed by bringing together a LoRa PHY, a LoRaWAN gateway, LoRaWAN stack, ultrafast …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
In collaboration with the IoT Lab at the Computer Science Dept at Kathmandu University, Nepal, we are developing a potential service for tracking trekkers, i.e. offering a security service for tourists trekking the Himalayas, in particular Mt. Everest. This service depends critically on having a robust hardware component, the actual GPS/GNSS tracker. Requirements with respect to battery life, …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Wind turbine electricity production data is sensitive for Energinet (and for the wind turbine producers). Energinet would like to publish wind turbine electricity production data sets that can be used to train relevant models and to develop innovative applications, without giving away sensitive data. The goal of the project is to explore various data publishing methods for that purpose.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, data publication
Energinet has a model that describes the electricity production of a given wind turbine given wind conditions. The current model based on kNN is trained with DMI weather data and historical electricity production data for the wind turbine. The goal of the project is to improve the current model with lifelong learning, extended weather data and different models for a range of different wind …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Data Analysis
The goal of the project is to explore the accuracy of electricity production predictions based on historical data and weather predictions. This may be tackled as a sequence prediction problem using recurrent neural networks. The long-term goal is to incorporate wind turbines in the reserve market for electricity.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Forecasting, Machine Learning, Deep Learning
The goal of the project is to explore new ways of gathering data about wind turbines as well as local wind/weather conditions. To this end, sound/vibration-based and/or image-based instrumentation as well as innovative experiments such as balloons and lightweight weather stations might be considered.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Instrumentation, Sensors
Deliberately scoped very wide, this group contains a number of projects in different possible directions, from
Location services via LPWAN time-of-flight and GPS/GNSS,
Vessel tracking and management in fisheries, tourism and logistics,
Water quality and chemistry sensing for aquaculture, specifically mariculture,
Wave and tidal dynamics, e.g. in energy harvesting
and variations/combinations of …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: Satellite, Image processing, Machine Learning, edge, constrained computing, IoT, sensors, location
swarm.space is a commercial company providing low-bandwidth satellite connectivity using ultra-small (quarter cubesat unit) satellites in a low orbit. Swarm satellites cover every point on Earth, enabling IoT devices to affordably operate in any location. Swarm uses a form of LoRa network.
While ground terminals for satellite networks traditionally were both big and expensive, modems and antennas …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: Satellite, IoT
LoRa is a relatively new long-range, low-power standard allowing battery-powered, pocket-sized nodes to transmit over 100s to 1000s of kilometers. ThingSat is a CubeSat communication payload consisting of an electronic board with several LoRa transmitters and a patch antenna operating at 868 MHz and 2.4 GHz. It is a guest payload of a shared 3U CubeSat.
Available projects under this platform include …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: Satellite, IoT
Observing how well machine learning systems utilize hardware resources is a crucial preliminary step to improve system performance and reduce hardware waste. To make such observations, one has to collect a lot of monitoring data on hardware behavior through experiments. In our group, we have recently built a framework to aid the efficient management of such monitoring data, called Resource-Aware …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: benchmarking, data management, data visualization
Deep convolutional networks are able to learn representations of images, scoring well in tasks such as image classification and object detection. During model training, these networks have the ability to process different input sizes without requiring changes to their architecture. In this project, we would like to investigate the effects that changing input sizes has on these kinds of models. We …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: data attribution, deep learning, machine learning, resource efficiency
Today’s foundation models are trained on vast amounts of data. The quality and size of this data have a huge impact on the accuracy of these models. Selecting the right amount and variety of data for a given task, however, is a resource-intensive process. In this project, we would like to investigate various state-of-the-art data selection mechanisms from a hardware requirements and …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: data selection, deep learning, machine learning, resource efficiency
Observing how well data-intensive systems utilize hardware resources is a crucial preliminary step to improve system performance and reduce hardware waste. To make such observations, one has to collect a lot of monitoring data on hardware behavior through experiments. In our group, we have recently built a framework to aid the efficient management of such monitoring data, called Resource-Aware …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2023
Tags: benchmarking, data management, data visualization
Spectral learning priority is a useful tool for analyzing a model’s focus during training; it describes how a model may understand a given image from the spectrum perspective. For example, to distinguish cats and tortoises, learning to recognize their shapes would be enough; such an embedding will result in higher learning priority at low frequencies representing shapes, while learning to …
Supervisors:
Yucheng Lu, Veronika Cheplygina
Semester: Fall 2024
Tags: Spectral analysis, Image classification, Medical imaging
It has been observed that deep learning models are able to identify patient characteristics such as age, sex, and self-reported race with high accuracy from medical images such as chest x-ray recordings, even when medical doctors cannot. This raises the potential for such models to learn to (falsely) diagnose patients of different demographics differently, even if they present with the same …
Supervisors:
Amelia Jiménez-Sánchez, Eike Petersen, Veronika Cheplygina
Semester: Fall 2024
Tags: machine learning, data science, medical imaging
There is pressure on hospitals to implement AI systems which promise to improve diagnoses and save time for the doctors. One use-case could be related to the automation of protocoling based on a physician referral. Currently, this requires a referral letter from a physician who has examined a patient and evaluates that there is a need for additional imaging studies. In this case, the physician …
Supervisors:
Veronika Cheplygina
Semester: Fall 2023
Tags: machine learning, medical imaging, data analysis
Machine learning models, especially larger models that are used, for example, on image or text datasets, can be expensive to train. During development, models are usually trained multiple times, for example to optimize hyperparameters, which can result in a large carbon footprint.
This project focuses specifically on medical data. There are some recent efforts, for example by Selvan et …
Supervisors:
Veronika Cheplygina
Semester: Fall 2023
Tags: machine learning, medical imaging, data analysis, resource consumption
Machine learning is used extensively in different applications, including medical imaging and natural language processing. As different types of data are involved, it is reasonable to assume that different methods are needed for each application. However, there are also opportunities in translating a method successful in one application, to the other application where it is not widely used.
The …
Supervisors:
Veronika Cheplygina
Semester: Fall 2021
Tags: machine learning, natural language processing, medical imaging, literature review
Machine learning algorithms for skin lesion classification typically learn from images which have been labeled as malignant (for example, melanoma) or not. Such tasks can still suffer from overfitting due to limited dataset size. In other computer vision tasks, crowdsourcing labels has been effective, but the average person typically does not have the background to classify skin lesions. However, …
Supervisors:
Veronika Cheplygina
Semester: Fall 2021
Tags: machine learning, medical imaging, crowdsourcing, similarity
Spectral learning priority is a useful tool for analyzing a model’s focus during training; it describes how a model may understand a given image from the spectrum perspective. For example, to distinguish cats and tortoises, learning to recognize their shapes would be enough; such an embedding will result in higher learning priority at low frequencies representing shapes, while learning to …
Supervisors:
Yucheng Lu, Veronika Cheplygina
Semester: Fall 2024
Tags: Spectral analysis, Image classification, Medical imaging
The DISCO-2 project is driven by students and aims to develop and deploy a 3-unit CubeSat into low Earth orbit. Its mission focuses on conducting Earth observations over Greenland and supporting various research objectives. The satellite has three cameras onboard: infrared, wide-angle, and standard (main camera). Due to the limitations of the imaging hardware and the challenging conditions on the …
Supervisors:
Yucheng Lu, Julian Priest
Semester: Fall 2024
Tags: Image enhancement, Image processing, Machine learning
Are you interested in working with a big data open source project?
You are welcome to conduct your thesis/project in Apache Wayang. Apache Wayang is the first cross-platform framework that allows users to specify their task/query in a system-agnostic manner; Wayang then determines which system(s) are best suited to execute the task, with the goal of optimizing performance. For a general overview …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: big data, database, cross-platform data processing, open source, Apache
Knowledge graphs (KGs) are extensively used in many application domains, such as search engines, product recommendation, and bioinformatics. Knowledge graph completion (a.k.a. link prediction), i.e., the task of inferring missing information from knowledge graphs, is a widely used task in the above applications. This project will investigate how to loosely couple the data-driven power of knowledge …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: knowledge graph, LLMs, reasoning
Are you interested in working with a big data open source project and helping the environment?
You are welcome to conduct your thesis/project in Apache Wayang. Apache Wayang is the first cross-platform framework that allows users to specify their task/query in a system-agnostic manner; Wayang then determines which system(s) are best suited to execute the task, with the goal of optimizing performance. …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: big data, database, cross-platform data processing, open source, Apache
Query optimization is crucial for any data management system to achieve good performance. Recent advancements in AI have led academia and industry to investigate learning-based techniques in query optimization. In particular, many works propose replacing the cost model used during plan enumeration with a machine learning model that estimates the runtime of a plan. However, to build such a model …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: machine learning, training data, query optimizer
Query optimization is crucial for any data management system to achieve good performance. Recent advancements in AI have led academia and industry to investigate learning-based techniques in query optimization. In particular, many works propose replacing the cost model used during plan enumeration with a machine learning model (typically a regression model) that estimates the runtime of a query …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: machine learning, database, query optimization, ranking
(This project will be carried out in collaboration with Xilinx Research Labs in Dublin)
Machine Learning operators are becoming increasingly common in data management systems and, in this project, we will explore the challenges and benefits of integrating inference operators from FINN [1] within a so-called Smart Storage system [2]. Both the inference and data management aspects will be …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: FPGA, Data Management, MachineLearning
(This topic is going to be co-supervised by Bernardo Machado David [http://www.bmdavid.com/])
Database systems managing private data may leak sensitive information when queries are done in the clear, even if the data itself is encrypted. A recent line of research has looked into combining database engines supporting standard SQL queries with techniques for secure Multiparty Computation (MPC), …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: Theoretical Computer Science, Data Management, Security and Privacy
Consensus mechanisms for ensuring consistency are some of the most expensive operations in managing large amounts of data. Often, there is a trade-off that involves reducing the coordination overhead at the price of accepting possible data loss or inconsistencies. As the demand for more efficient data centers increases, it is important to provide better ways of ensuring consistency without …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: Benchmarking, Distributed Systems
Given a private database that I can access only through specific queries, there is still a lot I can learn about its entries [1]. Differential Privacy (DP) tackles this: letting me learn the (approximate) result of complex queries on a database, but preventing me from learning much about its specific entries. The basic approach of DP often boils down to: “apply a privacy-preserving transformation T …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: Theoretical Computer Science, Data Management, Security and Privacy
Blockchains are often used synonymously with crypto-currencies and unspent transaction output (UTXO) data models, but there are emerging blockchain platforms that offer a more general data model and smart contracts that can manipulate this data freely (e.g. Hyperledger Fabric [1]). As such, these platforms resemble in many ways distributed databases, storing a collection of records, organized as …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: Blockchain, Data Management, Benchmarking
Modern data analytics systems are composed of two types of nodes: compute and storage (e.g., Amazon S3, Redis, MongoDB, etc.). The storage nodes typically offer a key-value interface and are often used to store data encoded in a columnar format (e.g., Parquet files). Due to growing data sizes in datacenters, there is an increasing interest in using specialized hardware devices, namely Field …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: FPGA, Hardware-software Co-design, Security and Privacy
We are actively developing a new prototype for analysing large multimedia collections in virtual reality, based on the ObjectCube data model. There are many ways in which students can contribute to the project, including work on the user interface and the back-end, and later on running large-scale user experiments.
Supervisors:
Aaron Duane, Björn Þór Jónsson
Semester: Fall 2021
Tags: virtual reality, multimedia analytics
It has been observed that deep learning models are able to identify patient characteristics such as age, sex, and self-reported race with high accuracy from medical images such as chest x-ray recordings, even when medical doctors cannot. This raises the potential for such models to learn to (falsely) diagnose patients of different demographics differently, even if they present with the same …
Supervisors:
Amelia Jiménez-Sánchez, Eike Petersen, Veronika Cheplygina
Semester: Fall 2024
Tags: machine learning, data science, medical imaging
A medical Visual Question Answering (VQA) system can provide meaningful references for both doctors and patients during the treatment process. Different from normal images, a learning setting with medical images is more challenging due to limited amounts of data, class imbalance and the presence of label noise for diagnosis tasks. Moreover, little attention is paid to how the images and meta-data are …
Supervisors:
Amelia Jiménez-Sánchez
Semester: Fall 2023
Tags: medical imaging, deep learning, machine learning, transfer learning, meta-learning
In relevance feedback, the choice of images to present to the user is a difficult problem, as a naïve approach may present too many similar images. The challenge addressed in this project is to ensure diversity (aka “one of each”) as well as relevance. A particularly interesting project for students interested in efficient algorithms.
Supervisors:
Björn Þór Jónsson
Semester: Fall 2021
Tags: multimedia analytics, scalability, diversity
In interactive learning systems, such as Exquisitor, the system presents potentially relevant images to users who label them as either relevant or irrelevant. Currently, Exquisitor uses a cluster-based index, which allows it to return results from a collection of 100 million images in 0.3 seconds. The goal of this project is to study the application of hash-based indexing to interactive learning …
Supervisors:
Björn Þór Jónsson
Semester: Fall 2021
Tags: multimedia analytics, diversity
The goal of this project is to enhance PhotoCube as a competitor for the Video Browser Showdown, an international video retrieval competition where competing systems are judged based on speed, accuracy and recall. We propose to develop new versions of the C++-based media server and JS-based media browser, to expand the data model to videos and improve the performance sufficiently to take part in …
Supervisors:
Björn Þór Jónsson
Semester: Fall 2021
Tags: video search, multimedia analytics, photocube
We are actively developing a new prototype for analysing large multimedia collections in virtual reality, based on the ObjectCube data model. There are many ways in which students can contribute to the project, including work on the user interface and the back-end, and later on running large-scale user experiments.
Read more…
Supervisors:
Aaron Duane, Björn Þór Jónsson
Semester: Fall 2021
Tags: virtual reality, multimedia analytics
The index structure used for Exquisitor is eCP, a very scalable index for high-dimensional retrieval. However, while multimedia collections are typically updated constantly, the index is static. The goal of this project is to implement and compare approaches for index maintenance.
Read more…
Supervisor: Björn Þór Jónsson
Semester: Fall 2020
Tags: index maintenance, high-dimensional indexing
In this project, we propose to implement a media server and a media browser encapsulating a new data model for analysing media collections, called Multimedia Analytics Data Services (MADS). To validate the design, some scalability experiments should be performed.
Read more…
Supervisor: Björn Þór Jónsson
Semester: Fall 2020
Tags: multimedia analytics, scalability
Students at ITU have made a prototype version of the Exquisitor system for the Android mobile phone! The system is missing some of Exquisitor’s advanced functionality, such as search and indexing, and the goal is to add and evaluate this functionality.
Read more…
Supervisor: Björn Þór Jónsson
Semester: Fall 2020
Tags: multimedia analytics, android
We propose to develop a new prototype for analysing large multimedia collections in Virtual Reality, using the new Valve Indexes. There are many ways in which students can contribute to the project, including work on the user interface and the back-end, and later on running large-scale user experiments.
Read more…
Supervisor: Björn Þór Jónsson
Semester: Fall 2020
Tags: virtual reality, multimedia analytics
The goal of this project is to integrate Exquisitor with other pieces of existing technology and turn it into a competitor for a live video retrieval competition. The project is suitable for 3-4 well-qualified MSc students.
The Video Browser Showdown (VBS) is a live competition for video search and retrieval, held at the International Conference on Multimedia Modeling (MMM). In VBS, the competition …
Supervisor: Björn Þór Jónsson
Semester: Fall 2019
The goal of this project is to ensure diversity in the relevance feedback results, to improve the quality of the user experience.
The project is suitable for 1-3 well-qualified MSc students.
In many creative tasks, the designer will know some stock image is good for a design just by stumbling upon the image. This “Aha!” moment requires browsing thousands of images by categories. In other words, it requires …
Supervisor: Björn Þór Jónsson
Semester: Fall 2019
The goal of this project is to use the state of the art in eye tracking to design, implement and evaluate different eye-tracking interfaces for Exquisitor.
The project is suitable for 1-3 well-qualified MSc students.
Image and media collections are becoming a central information resource for a growing number of domains. This calls for very effective tools for interactive exploration of the contents of …
Supervisor: Björn Þór Jónsson
Semester: Fall 2019
The goal of this project is to build a prototype of the Exquisitor system for mobile devices.
The project is suitable for 1-3 well-qualified MSc students.
Image and media collections are becoming a central information resource for a growing number of domains. This calls for very effective tools for interactive exploration of the contents of those collections [1]. Based on past research results [2], we …
Supervisor: Björn Þór Jónsson
Semester: Fall 2019
Deep neural networks have been revolutionary in computer vision, and publicly available image datasets played an important role in this success. Due to their size, neural networks require vast amounts of data for training. Yet when it comes to medical settings, dataset sizes are very limited due to the cost of data annotation, privacy concerns, differences in imaging techniques, and other factors. In such …
Supervisors:
Dovile Juodelyte
Semester: Fall 2023
Tags: transfer learning, deep learning, medical imaging
A GPU offers massive computational power and parallelism through its Streaming Multiprocessors (SMs). Efficient GPU utilization is critical for maximizing performance and optimizing compute resource usage, which is measured using various metrics such as SMACT (SM Activity), SMOCC (SM Occupancy), and DRAMA (DRAM Active). These metrics provide insight into how effectively the GPU’s SMs and …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Spring 2024
Tags: machine learning systems, GPU Utilization, resource management, resource interference
Deep learning changed the landscape of many applications like computer vision, natural language processing, etc. On the other hand, deep learning requires gigantic computing power offered by modern hardware. As a result, data scientists rely on powerful hardware resources offered by shared high-performance computing (HPC) clusters or the cloud. Due to the long-running times of deep learning …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2024
Tags: machine learning systems, checkpointing, scheduling, resource management
Workload collocation has been shown to be an effective method to reduce the hardware requirements for certain deep learning (DL) training tasks. On the other hand, there haven’t been many robust open-source implementations of schedulers that incorporate workload collocation on GPUs for DL.
BLOX is a framework that aims at standardizing the way we implement deep learning schedulers. In this …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2024
Tags: machine learning systems, scheduling, resource management, workload collocation
Today, there are many compute- and memory-hungry data-intensive workloads from big data analytics applications to deep learning. These workloads increasingly run on shared hardware resources, which requires building hardware resource managers that can both serve the needs of workloads and utilize hardware well. Predicting the resource utilization of applications can aid such resource managers …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2022
Tags: benchmarking, hardware resource consumption estimation, machine learning
It has been observed that deep learning models are able to identify patient characteristics such as age, sex, and self-reported race with high accuracy from medical images such as chest x-ray recordings, even when medical doctors cannot. This raises the potential for such models to learn to (falsely) diagnose patients of different demographics differently, even if they present with the same …
Supervisors:
Amelia Jiménez-Sánchez, Eike Petersen, Veronika Cheplygina
Semester: Fall 2024
Tags: machine learning, data science, medical imaging
Geospatial data refers to information that is tied to specific geographic locations on the Earth’s surface. It includes both the location coordinates (such as latitude, longitude, and, potentially, altitude) and attribute data associated with those locations. Geospatial data is categorized into two types: raster and vector.
Vector data represents geographic features as points, lines, and …
Supervisors:
Eleni Tzirita Zacharatou
Semester: Fall 2023
Tags: spatial data analysis, data science, data loading, GIS file formats, geospatial data
It is now common to query terabytes of spatial data. Several new frameworks extend distributed computing platforms such as Hadoop and Spark to enable them to efficiently process spatial queries by providing (1) mechanisms to efficiently store spatial data and index them; and (2) packages of built-in spatial operations for these platforms. Meanwhile, it is now common to accelerate Hadoop and Spark …
Supervisor: Iman Elghandour
Semester: Fall 2019
Spark assumes that it executes its applications on a homogeneous cluster of similar nodes. However, it is becoming common that in-house clusters have heterogeneous compute resources and it is good to exploit all of them in the most efficient way. The objective of this master thesis is to extend the Spark scheduler to be resource-aware and to efficiently schedule Spark tasks on all the …
Supervisor: Iman Elghandour
Semester: Fall 2019
Distributed computing platforms such as Hadoop and Spark focus on addressing the following challenges in large systems: (1) latency, (2) scalability, and (3) fault tolerance. Dedicating computing resources for each application executed by Spark can lead to a waste of resources. Unified distributed file systems such as Alluxio have provided a platform for computing results among simultaneously …
Supervisor: Iman Elghandour
Semester: Fall 2019
In the last few years, it has become common to accelerate Hadoop and Spark by enabling them to execute tasks and jobs on accelerators such as GPUs and FPGAs. The objective of this master thesis is to study new approaches that efficiently predict the execution time of Spark tasks and jobs executed on GPUs. Part of the work will be to build a performance prediction model for GPUs, which can be built …
Supervisor: Iman Elghandour
Semester: Fall 2019
Do you like open-source systems? Would you like to experience working with an open-source system? Do you want to learn about big data research in practice? Then, this project is for you!
We have a number of thesis/project topics under the umbrella of Apache Wayang. Wayang is the first cross-platform framework that allows users to specify their task/query in a system-agnostic manner and Wayang will …
Supervisors:
Jorge Quiané
Semester: Fall 2022
Tags: big data, database, cross-platform data processing, open source, Apache
Do you have the feeling that only a few players are controlling the AI game? Would you like to make AI technology accessible to everyone? Then, come and help us to make Agora a reality!
We have a number of thesis/project topics under the umbrella of the Agora project. This project aims at building a unified data infrastructure for supporting AI ecosystems that bring together data, algorithms, …
Supervisors:
Jorge Quiané
Semester: Fall 2022
Tags: big data, AI ecosystems, compliant data processing, federated analytics, data markets
The DISCO-2 project is driven by students and aims to develop and deploy a 3-unit CubeSat into low Earth orbit. Its mission focuses on conducting Earth observations over Greenland and supporting various research objectives. The satellite has three cameras onboard: infrared, wide-angle, and standard (main camera). Due to the limitations of the imaging hardware and the challenging conditions on the …
Supervisors:
Yucheng Lu, Julian Priest
Semester: Fall 2024
Tags: Image enhancement, Image processing, Machine learning
ITU is a partner of the Danish Student Cubesat Program, DISCOSAT. We launched our first satellite DISCO-1 into Low Earth Orbit in April 2023 and we will launch a second DISCO-2 in 2024. In this project you will gain experience with automating live satellite operations and communications, completing a groundstation at the Rued Langaards Vej site for use with both satellites.
The DISCO satellite …
Supervisors:
Julian Priest
Semester: Fall 2023
Tags: satellite, ground station, software defined radio, automation, csp
The DISCO-2 satellite is an Earth observation satellite in collaboration with the Arctic Research Center in Aarhus and is designed to complement ground-based field studies in Greenland. The satellite instrument consists of two high-quality visible-light cameras and one infrared camera, as well as an attitude control system and a Coral TPU ML coprocessor.
In this project you will develop software to control …
Supervisors:
Julian Priest
Semester: Fall 2023
Tags: satellite, climate change, image processing, ML, csp, embedded, space
The DISCO-2 satellite will have accelerated machine learning capability based on the inclusion of a Coral TPU ML accelerator module. This will allow images taken by the satellite to be analysed on the satellite using a variety of ML models, with only select images sent back to Earth. This approach allows for more flexibility in image acquisition and saves downlink bandwidth, which is very constrained …
Supervisors:
Julian Priest, Robert Bayer
Semester: Fall 2023
Tags: satellite, ground station, software defined radio, automation, csp
The Danish Student Cubesat Program is an inter-university collaboration that will launch 3 cubesats into Low Earth Orbit over the next 4 years. The satellites will be designed, operated, programmed and built by students, and the project offers an opportunity for Master’s students to take part in a live satellite project. ITU is partnering with Aarhus University on DISCOSAT2 which will be an …
Supervisors:
Sebastian Büttrich, Julian Priest
Semester: Fall 2021
Tags: Satellite, Cubesat, Image processing, Machine Learning, edge, constrained computing
ITU is a partner in the Danish Student Cubesat Program, DISCO which will launch a series of small satellites into orbit, starting with DISCO 1 in 2023 and followed by DISCO2 in 2024.
As part of this project ITU is installing a satellite ground station with a range of antenna rotators on the roof of Rued Langaards Vej building and the equipment has been purchased. The ground station will track the …
Supervisors:
Julian Priest
Semester: archive
Tags: Satellite, Image processing, Edge, Constrained Computing, Networks, Machine Learning, Embedded, Software Defined Radio
ITU is a partner in the Danish Student Cubesat Program DISCO, which will launch a series of small satellites into orbit, starting with DISCO 1 in 2023 and followed by DISCO2 in 2024.
ITU is developing a hi-res multi camera imaging payload for earth observation primarily in the Arctic. We are developing an on satellite machine learning capability using an ML coprocessor, as well as models that can …
Supervisors:
Julian Priest
Semester: archive
Tags: Satellite, Image processing, Edge, Constrained Computing, Networks, Machine Learning, Embedded, Radio
The idea behind “15-minutes cities” is that within a short walk or bike ride people should have access to all necessary facilities that constitute the essence of urban living, such as parks, shops, cafes, schools, hospitals. Initiatives to transform cities according to this paradigm are currently being implemented across the world, in an attempt to make urban spaces more liveable, …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: spatial data analysis, visualisation, Python, OSM data
As a response to increased traffic congestion and the need to reduce carbon emissions, cities consider ways to modernise, build and extend transit systems. Transit network design solutions can benefit from analysing the large amount of crowd-sourced location data available, which provides valuable insights into population mobility needs. Designing efficient metro lines, bicycle paths, or bus …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: spatial data analysis, network design, Python, OSM data
The idea behind “15-minutes cities” is that within a short walk or bike ride people should have access to all necessary facilities that constitute the essence of urban living, such as parks, shops, cafes, schools, hospitals. Initiatives to transform cities according to this paradigm are currently being implemented across the world, in an attempt to make urban spaces more liveable, …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: spatial data analysis, graph summaries, Python, OSM data
Musical genres are inherently ambiguous and difficult to define. Even more so is the task of establishing how genres relate to one another. Yet, genre is perhaps the most common and effective way of describing musical experience. The number of possible genre classifications (e.g. Spotify has over 4000 genre tags, LastFM over 500,000 tags) has made the idea of manually creating music taxonomies …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: scalable algorithms, hyperbolic embeddings, Python, Spotify data
The integration of wind power in the energy grid is dependent on accurate production forecasts. The power output curves between neighbouring wind farms are often correlated temporally and spatially, but currently, these spatiotemporal dependencies are under-utilised in prediction models. Graph neural networks allow for modelling these dependencies. In this project the student will implement a …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: spatial data analysis, graph neural networks, Python, timeseries data
Open-source JavaScript applications, such as browser-based web games, are typically developed by individual software engineers or small teams. These teams often have limited financial resources to use commercial logging frameworks and cloud-based analysis systems and may also lack knowledge and expertise in logging. However, log analysis is highly important for many reasons: monitoring application …
Supervisors:
Martin Hentschel
Semester: Fall 2024
Tags: open source, performance
The Deconstructed Cloud Databases project stems from a simple question: What are the minimum components required to build a data management system in the cloud? Our motivation for this project is based on the idea that reducing a system to its minimum set of components makes it easier to build, test, and maintain cloud data management systems. This approach requires less engineering effort, …
Supervisors:
Martin Hentschel
Semester: Fall 2024
Tags: data management, performance, benchmarking, hacking
The Deconstructed Cloud Databases project stems from a simple question: What are the minimum components required to build a data management system in the cloud? Our motivation for this project is based on the idea that reducing a system to its minimum set of components makes it easier to build, test, and maintain cloud data management systems. This approach requires less engineering effort, …
Supervisors:
Martin Hentschel
Semester: Fall 2024
Tags: data management, security, open source, open standards
The emergence of computational storage platforms like Delilah has transformed the data storage landscape, enabling new computing paradigms and facilitating data-intensive applications. Delilah is a cutting-edge computational storage platform developed by the IT University of Copenhagen. It runs on the Daisy OpenSSD and exposes an asynchronous computational storage protocol to the host, facilitated …
Supervisors:
Niclas Hedam, Philippe Bonnet
Semester: Fall 2023
Tags: Open Source, Testing, Computational Storage, Hardware, FPGA
Hash-based authentication is an effective way of protecting passwords in software systems. Hashing obscures the original passwords, such that they cannot be recovered in case of a database breach. However, as demonstrated by our paper titled Hash-Based Authentication Revisited in the Age of High-Performance Computers, the practical security depends on which hashing algorithm is used as well as the …
Supervisors:
Pınar Tözün, Niclas Hedam
Tags: benchmarking, hashing, security, GPU, hacking, HPC
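As a rough illustration of what hash-based authentication looks like in code, the sketch below stores a salted, iterated hash instead of the password itself. It assumes PBKDF2-HMAC-SHA256 from Python's standard library purely because it is readily available. The algorithm choice, the iteration count and the function names are assumptions for illustration, not the setup evaluated in the paper mentioned above.

```python
import hashlib
import hmac
import os

# Assumed work factor; a real deployment would tune this to its hardware.
ITERATIONS = 600_000


def hash_password(password, salt=None, iterations=ITERATIONS):
    """Return (salt, iterations, digest) to store instead of the password."""
    if salt is None:
        salt = os.urandom(16)  # a fresh random salt for each password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return salt, iterations, digest


def verify_password(password, salt, iterations, expected_digest):
    """Recompute the hash for a login attempt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations
    )
    return hmac.compare_digest(candidate, expected_digest)


salt, rounds, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, rounds, stored))
print(verify_password("wrong guess", salt, rounds, stored))
```

The iteration count is the knob that trades login latency against the cost of a brute-force attack, which is why the choice of algorithm and parameters matters as attacker hardware gets faster.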
With the recent hunger for being “data driven”, many organizations are eager to integrate ML in their decision-making process. Unfortunately, competent data scientists are still relatively scarce, and manual model development cannot keep up with the demand for magic AI solutions. This is no less true when it comes to forecasting. Knowing the future is extremely handy when making …
Supervisors:
Niels Ørbæk Chemnitz
Semester: Spring 2021
Tags: AutoML, ML, Forecasting, Energy Data, Smart Meters, Python, Data Science, Time Series Data
How much do our smart meter readings disclose about us? Can we disentangle the oven from the washing machine from the kettle? Can we identify demographics and behavior patterns from the stream of electricity data?
Most Danish homes are now equipped with so-called “smart meters” - networked electricity meters that report consumption and load at a much higher rate than conventional meters. …
Supervisors:
Niels Ørbæk Chemnitz
Semester: Spring 2021
Tags: NILM, ML, IoT, Energy Data, Smart Meters, Python, Data Science, Time Series Data
Outlier detection is carried out when the information is stored at the server. However, with the new IoT computational capabilities, outlier detection can be developed locally. Therefore, it is necessary to know how much RAM/Flash is needed for this step and which IoT brands can handle it. This project is divided into two parts. The first is implementing light-heavy ML algorithms in single points …
Supervisor: Paul Rosero
Semester: Spring 2022
Tags: data analysis, IoT, Python, Embedded systems
TinyML is a new trend to deploy deep learning in tiny devices. Therefore, it is necessary to deploy several applications to understand the challenges and opportunities which tinyML brings us. In this scenario, any idea involving embedded computer vision, voice recognition, or sensors is welcome.
Supervisor: Paul Rosero
Semester: Spring 2022
Tags: data analysis, IoT, Python, Embedded systems, Computer vision, Voice recognition
The emergence of computational storage platforms like Delilah has transformed the data storage landscape, enabling new computing paradigms and facilitating data-intensive applications. Delilah is a cutting-edge computational storage platform developed by the IT University of Copenhagen. It runs on the Daisy OpenSSD and exposes an asynchronous computational storage protocol to the host, facilitated …
Supervisors:
Niclas Hedam, Philippe Bonnet
Semester: Fall 2023
Tags: Open Source, Testing, Computational Storage, Hardware, FPGA
Disaggregated storage has gained acceptance in data centers. With disaggregated storage, storage resources are decoupled from compute resources, and made available through fabric. We are particularly interested in storage resources composed of an ARM-based smartNIC, which acts as fabric target as well as storage controller for a collection of SSDs.
The performance characteristics of the storage …
Supervisors:
Philippe Bonnet
Semester: Fall 2021
Tags: benchmarking, ARM, SoC, fabric, SSD, computational storage
Reproducibility is a cornerstone of the scientific method. There are systems available today to build reproducible and sharable data and analysis pipelines including workflow engines (e.g., GWL, Nextflow), package managers (e.g., bioconda), and container systems (e.g., Singularity). However, validating their executions on high-performance computers remains an open issue. Indeed, there are many …
Supervisors:
Philippe Bonnet
Semester: Fall 2020
Tags: ML, reproducibility, workflow, HPC
Reproducibility is a cornerstone of the scientific method. It is also a core element of compliance requirements for sensitive equipment, e.g., audit trails for medical equipment. Often, a prerequisite for computational reproducibility is the availability of software and data. However, this is problematic for edge devices whose goal is to reduce the amount of data transferred to the backend. On …
Supervisors:
Philippe Bonnet
Semester: Fall 2020
Tags: reproducibility, edge
SSDs are not a uniform class of devices. The SSD landscape is now quite diverse, with many new-generation devices that are much faster or customizable. Understanding their performance characteristics is crucial for determining what their impact on the data systems software stack should be. In this project, we would like to characterize the performance of a broad range of such SSDs (e.g., ZNS, Samsung Z-SSD, …
Supervisors:
Pınar Tözün, Philippe Bonnet
Semester: Fall 2020
Tags: SSD, benchmarking
The Things Network Stack v3 for LoRaWAN is an open source LoRaWAN network stack suitable for large, global and geo-distributed public and private networks as well as smaller networks. The architecture follows the LoRaWAN Network Reference Model for standards compliancy and interoperability. - https://github.com/TheThingsNetwork/lorawan-stack This stack, currently in pre-rollout testing, however …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Offloading processing to storage is a means to avoid data movement and thus deal efficiently with very large volumes of stored data. In the 90s, there were pioneering efforts to develop Processing-in-Memory as well as Active Disks. We are considering data stored on Open-Channel SSDs with a programmable storage controller (i.e., a Linux-based ARM processor) integrated into a network switch (e.g., …
Supervisor: Philippe Bonnet
Semester: Fall 2019
Characterize the performance of commercial database systems on an NVIDIA Titan GPU, or characterize the performance of DB2 PureScale on a cluster equipped with shared storage, with a range of different benchmarks. Design and conduct experiments with a range of tuning strategies to measure their impact on performance and reliability.
Supervisor: Philippe Bonnet
Semester: Fall 2019
In the context of the Orkney Cloud project, we are preparing the deployment of a decentralized cloud infrastructure on the archipelago. The infrastructure is composed of a collection of Pods (point of delivery) and a wireless core (5G + Wifi). Each Pod is equipped with storage, computing and communication components (so that it is connected to the core and to local endpoints). Each Pod is powered …
Supervisor: Philippe Bonnet
Semester: Fall 2019
New types of networks such as LoRaWAN and Sigfox enable us to deploy inexpensive mobile Air Quality Sensors, e.g. on bicycles or vehicles, boats or trains, and help map and understand urban pollution. Such data would be valuable for e.g. the international citizen science project Luftdaten https://luftdaten.info/en/home-en/ in which we are participating. We have an ongoing collaboration with the …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Can we combine satellite data with terrestrial and marine sensor data, to study their correlation, and benefit our understanding of environmental processes, both in urban and agricultural/rural contexts? Potential collaboration with several organizations in Kenya and the Orkney Islands.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Not far from the IT University, citizens are building a “Small Smart City”, aiming to measure and analyze e.g. water and resource consumption. Help build the Small Smart City!
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Survey of potential sensor modalities (sounds, ultrasounds, vibrations) and related work (e.g., wind turbines). Starting with sound: piezo contact mics/transducers, MEMS sensors. Characterization of state based on known signatures (classification problem). Characterization of state transitions (HMM). Experimentation on coffee machine/blender/3D printer at PitLab. 1..k sensors; local/cloud-based …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Novo Nordisk
New forms of Solid State Drives have interesting characteristics in terms of performance (10 to 100x faster than previous generations of SSDs) and in terms of functionalities (SSDs can now suspend the execution of write or erase operations to minimize read latency). The performance characteristics of these devices are not well understood yet. The topic of this thesis is to design and conduct …
Supervisor: Philippe Bonnet
Semester: Fall 2019
Survey of potential sensor modalities (IR temp sensor, thermal imager) and related work. Starting with consumer USB cams or phone cams to generate series of images. Characterization of state based on known signatures (classification problem). Characterization of state transitions (HMM). Experimentation on coffee machine/blender/3D printer at PitLab. 1..k sensors; local/cloud-based processing. …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Novo Nordisk
Most data collection in IoT does not critically depend on latency or speed from data collection to data analytics. Occasionally though we meet tasks that would benefit from near-realtime features, such as collection of wave and tidal dynamics around marine energy infrastructures. This project explores the limits of speed by bringing together a LoRa PHY, a LoRaWAN gateway, LoRaWAN stack, ultrafast …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
In collaboration with the IoT Lab at Computer Science Dept at Kathmandu University, Nepal, we are developing a potential service for tracking trekkers, i.e. offering a security service for tourists trekking the Himalayas, in particular Mt. Everest. This service very critically depends on having a robust hardware component, the actual GPS/GNSS tracker. Requirements with respect to battery life, …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Wind turbine electricity production data is sensitive for Energinet (and for the wind turbine producers). Energinet would like to publish wind turbine electricity production data sets that can be used to train relevant models and to develop innovative applications, without giving away sensitive data. The goal of the project is to explore various data publishing methods for that purpose.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, data publication
Energinet has a model that describes the electricity production of a given wind turbine given wind conditions. The current model based on kNN is trained with DMI weather data and historical electricity production data for the wind turbine. The goal of the project is to improve the current model with lifelong learning, extended weather data and different models for a range of different wind …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Data Analysis
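For readers unfamiliar with kNN regression, the sketch below shows the general idea behind such a model: predict production as the average of the k most similar historical observations. It is only an illustration and says nothing about how Energinet's actual model is built. The function name, the feature layout (wind speed only) and all numbers are made up.

```python
def knn_predict(history, query, k=3):
    """Predict power output as the mean of the k nearest historical records.

    history: list of (features, power) pairs, e.g. ((wind_speed_m_s,), power_kw)
    query:   feature tuple with the same layout as the historical features
    Features are assumed to be on comparable scales (no normalisation here).
    """
    if not history:
        raise ValueError("history must contain at least one observation")

    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = sorted(history, key=lambda rec: squared_distance(rec[0], query))[:k]
    return sum(power for _, power in nearest) / len(nearest)


# Made-up observations: (wind speed in m/s,) -> production in kW.
observations = [
    ((3.0,), 100.0),
    ((5.0,), 400.0),
    ((7.0,), 900.0),
    ((9.0,), 1500.0),
    ((11.0,), 2000.0),
]
print(knn_predict(observations, (6.0,), k=2))
```

Extending the feature tuple (for example with wind direction or temperature) is straightforward in this form, but the features would then need scaling so that no single one dominates the distance.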
The goal of the project is to explore the accuracy of electricity production predictions based on historical data and weather predictions. This may be tackled as a sequence prediction problem using recurrent neural networks. The long-term goal is to incorporate wind turbines in the reserve market for electricity.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Forecasting, Machine Learning, Deep Learning
The goal of the project is to explore new ways of gathering data about wind turbines as well as local wind/weather conditions. To this end, sound/vibration-based and/or image-based instrumentation as well as innovative experiments such as balloons and light weight weather stations might be considered.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Instrumentation, Sensors
Field Programmable Gate Arrays are now an integral part of public cloud infrastructures. You can for example run customized FPGA instances on AWS. A project focuses on FPGA-based hardware acceleration at the level of an SSD Flash Translation Layer, at the level of a Database storage manager or at the level of the database client. You will be able to experiment with FPGAs in the lab and on AWS. We …
Supervisor: Philippe Bonnet
Semester: Fall 2020
Tags: FPGA, SSD, computational storage
Field Programmable Gate Arrays are now an integral part of public cloud infrastructures. You can for example run customized FPGA instances on AWS. A project focuses on FPGA-based hardware acceleration at the level of an SSD Flash Translation Layer, at the level of a Database storage manager or at the level of the database client. You will be able to experiment with FPGAs in the lab and on AWS. We …
Supervisor: Philippe Bonnet
Semester: Fall 2019
Tags: FPGA, SSD
A GPU offers massive computational power and parallelism through its Streaming Multiprocessors (SMs). Efficient GPU utilization is critical for maximizing performance and optimizing compute resource usage, which is measured using various metrics such as SMACT (SM Activity), SMOCC (SM Occupancy), and DRAMA (DRAM Active). These metrics provide insight into how effectively the GPU’s SMs and …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Spring 2024
Tags: machine learning systems, GPU Utilization, resource management, resource interference
The work on running data-intensive applications on very powerful, expensive, and power-hungry server hardware is very popular thanks to the growing size of data centers and high-performance computing (HPC) platforms. However, with the rise of new generation internet of things (IoT) applications, the lower-power and lower-budget hardware devices that specifically target IoT, the edge platforms, …
Supervisors:
Pınar Tözün
Semester: Fall 2024
Tags: edge, benchmarking, data-intensive applications, resource-constrained hardware
Observing how well machine learning systems utilize hardware resources is a crucial preliminary step to improve system performance and reduce hardware waste. To do such observations, one has to collect a lot of monitoring data on hardware behavior through experiments. In our group, we have recently built a framework to aid the management of such monitoring data efficiently, called Resource-Aware …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: benchmarking, data management, data visualization
Deep learning has changed the landscape of many applications such as computer vision and natural language processing. On the other hand, deep learning requires gigantic computing power offered by modern hardware. As a result, data scientists rely on powerful hardware resources offered by shared high-performance computing (HPC) clusters or the cloud. Due to the long-running times of deep learning …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2024
Tags: machine learning systems, checkpointing, scheduling, resource management
Workload collocation has been shown to be an effective method to reduce the hardware requirements for certain deep learning (DL) training tasks. On the other hand, there haven’t been many robust open-source implementations of schedulers that incorporate workload collocation on GPUs for DL.
BLOX is a framework that aims at standardizing the way we implement deep learning schedulers. In this …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2024
Tags: machine learning systems, scheduling, resource management, workload collocation
Deep convolutional networks are able to learn representation of images, scoring well in tasks such as image classification and object detection. During model training, these networks have the ability to process different input sizes without requiring changes to their architecture. In this project, we would like to investigate the effects that changing input sizes has on these kinds of models. We …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: data attribution, deep learning, machine learning, resource efficiency
Today’s foundation models are trained on vast amounts of data. The quality and size of this data has a huge impact on the accuracy of these models. Selecting the right amount and variety of data for a given task, however, is a resource-intensive process. In this project, we would like to investigate various state-of-the-art data selection mechanisms from a hardware requirements and …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: data selection, deep learning, machine learning, resource efficiency
Traditionally, solid-state drives (SSDs) do not give users the ability to control data placement on the SSD. This often leads to suboptimal performance and lowers SSD lifetime, since SSDs internally don’t allow in-place updates. The updated disk pages are written elsewhere and the old versions have to be garbage collected. This poses problems if data with different lifetimes and …
Supervisors:
Pınar Tözün
Semester: Fall 2024
Tags: SSDs, data management systems, modern storage
In this project, we would specifically like to quantify the data movement savings of applying techniques like compression and model-based data filtering in the context of resource-constrained hardware and edge/IoT applications.
Today, many data sources are small, low-powered, and hardware-constrained devices such as mobile phones, wearables, or self-driving smart platforms. Processing the data on …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, ML model updates, tinyML
One of the key challenges with enabling efficient machine learning on resource-constrained devices is keeping the machine learning models deployed on these devices up-to-date without frequent retraining. This requires exploring the impact of different model update mechanisms at the edge.
This project would be suitable as a standalone project or BSc or MSc thesis at ITU during Fall 2024. If you are …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, ML model updates, tinyML
Enabling efficient data processing and machine learning on resource-constrained devices poses many challenges. One is fitting the models into the restrictive memory and compute resources of these devices. In this project, first, we would like to explore the landscape of foundational, generative-AI, language, etc. models with respect to their size and compute needs to understand what could be a fit …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, ML model updates, tinyML
Today, many data sources are small, low-powered, and hardware-constrained devices such as mobile phones, wearables, or self-driving smart platforms. Edge computing is a broad term that refers to computations performed on such edge devices. It becomes increasingly important to enable techniques that get more value out of data at the edge rather than always sending the data to a remote and more …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, resource management, tinyML
Observing how well data-intensive systems utilize hardware resources is a crucial preliminary step to improve system performance and reduce hardware waste. To do such observations, one has to collect a lot of monitoring data on hardware behavior through experiments. In our group, we have recently built a framework to aid the management of such monitoring data efficiently, called Resource-Aware …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2023
Tags: benchmarking, data management, data visualization
It is common to process data to clean it, filter it, restructure it, get metadata out of it, etc. before feeding the data into a data analysis or machine learning pipeline. There are many tools and libraries out there to aid this process, with different strengths and functionality (DALI, RAPIDS, HoloClean, DAPHNE, DuckDB, etc.). In this project, we would like to analyze pros/cons of some of …
Supervisors:
Pınar Tözün
Semester: Fall 2022
Tags: data preprocessing libraries, heterogeneous hardware, machine learning
In the past decade, the data management community has focused on main-memory systems or main-memory-optimized systems. This focus has put the commodity memory hierarchy (DRAM and processor caches) at the center of workload characterization studies. Today, with the evolution of persistent storage technologies such as NVRAM (persistent memory solution of Intel) and NVMe SSDs, data systems …
Supervisors:
Pınar Tözün
Semester: Fall 2022
Tags: workload characterization, tracing, modern storage, data-intensive systems
DAPHNE is an EU project that aims at building a data system targeting integrated data analysis pipelines across data management and processing, high-performance computing (HPC), and machine learning (ML) training and scoring. The project had its first code release back in March. This project aims at adding a profiling infrastructure for the DAPHNE codebase. If you are interested in learning about …
Supervisors:
Pınar Tözün
Semester: Fall 2022
Tags: integrated data analysis pipelines, profiling big data systems
State-of-the-art machine learning models are known to be compute- and power-hungry. On the other hand, modern servers come equipped with really powerful CPU-GPU co-processors. Not all machine learning models are able to use all the available hardware resources on such servers.
Workload collocation is a mechanism to increase hardware utilization when a single workload is not able to utilize all the …
Supervisors:
Pınar Tözün
Semester: Fall 2022
Tags: benchmarking, workload collocation, machine learning
Today, there are many compute- and memory-hungry data-intensive workloads from big data analytics applications to deep learning. These workloads increasingly run on shared hardware resources, which requires building hardware resource managers that can both serve the needs of workloads and utilize hardware well. Predicting the resource utilization of applications can aid such resource managers …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2022
Tags: benchmarking, hardware resource consumption estimation, machine learning
NVMe SSDs are not a uniform class of devices. The IO software stack is not uniform either. Understanding the performance characteristics of new-generation SSDs and the impact of the IO stack on their performance is crucial when determining how to design data-intensive systems. In this project, we would like to characterize the performance of a range of NVMe SSDs (e.g., Samsung Z-SSD, Intel Optane, …
Supervisors:
Pınar Tözün
Semester: Fall 2021
Tags: SSD, benchmarking
A data science infrastructure orchestrates the execution of widely used machine learning frameworks (e.g., TensorFlow, PyTorch) on a heterogeneous set of processing units (e.g., CPU, GPU, TPU, FPGA) while powering an increasingly diverse and complex range of applications (e.g., fraud detection, healthcare, virtual assistance, automatic driving). Understanding the resource consumption …
Supervisor: Pınar Tözün
Semester: Fall 2021
Tags: benchmarking, hardware resource consumption, deep learning frameworks
Hash-based authentication is an effective way of protecting passwords in software systems. Hashing obscures the original passwords, such that they cannot be recovered in case of a database breach. However, as demonstrated by our paper titled Hash-Based Authentication Revisited in the Age of High-Performance Computers, the practical security depends on which hashing algorithm is used as well as the …
Supervisors:
Pınar Tözün, Niclas Hedam
Tags: benchmarking, hashing, security, GPU, hacking, HPC
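As a rough illustration of the hash-based authentication described above, the sketch below stores a salted hash instead of the password and checks a login attempt against it. The function names, the iteration count, and the choice of PBKDF2-HMAC-SHA256 are assumptions made for this example only; they are not taken from the paper mentioned in the description.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Assumed work factor, for illustration only; a real system should tune this.
ITERATIONS = 100_000


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest). Only these two values are stored, never the password."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the attempt with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

The iteration count is the knob that makes each guess more expensive for an attacker with fast hardware, which is one reason the choice of algorithm and its parameters matters in practice.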
The variety and complexity of data-intensive applications and systems have been increasing drastically over the past decade. Tasks from a SQL-based big data analytics request running on Apache Spark can be very different from tasks from deep learning training using the TensorFlow framework. Nevertheless, these data-intensive applications increasingly run on shared hardware resources in data centers or …
Supervisor: Pınar Tözün
Semester: Fall 2020
Tags: hardware-software co-design
Improvements in modern processor hardware do not automatically enable more complex and higher quality games, animations, and virtual reality applications. A paradigm shift is necessary when it comes to how we develop these applications in order to exploit the resources of modern hardware (i.e., main memory, multicores) effectively. Unity Technologies has recently developed Data-Oriented …
Supervisor: Pınar Tözün
Semester: Fall 2020
Tags: memory hierarchy, concurrency
SSDs are not a uniform class of devices. The SSD landscape is quite diverse now, with many new-generation, much faster and customizable devices available. Understanding their performance characteristics is crucial when determining what their impact on the data systems software stack should be. In this project, we would like to characterize the performance of a broad range of such SSDs (e.g., ZNS, Samsung Z-SSD, …
Supervisors:
Pınar Tözün, Philippe Bonnet
Semester: Fall 2020
Tags: SSD, benchmarking
Spreading the computation of similar concurrent tasks that have a large instruction footprint over multiple cores via thread migration is shown to improve the instruction cache utilization drastically since it allows instruction re-use across the concurrent tasks. However, thread migrations are costly due to the context switching overhead. To reduce this overhead, recent work mainly proposed …
Supervisor: Pınar Tözün
Semester: Fall 2019
The computer architecture community is moving toward commoditization of hardware specialization instead of general purpose CPUs and more agile hardware development instead of years-long production cycles to enable faster, more energy-efficient, and more cost-effective hardware/software co-designs. This will lead to a disruption in the way we design and maintain the emerging data management systems …
Supervisor: Pınar Tözün
Semester: Fall 2019
Apache SystemML is an open-source platform to run machine learning tasks efficiently thanks to the hardware-conscious query compilation techniques it adopts. It can be run standalone or on top of Apache Spark. It is considered to be state-of-the-art when running machine learning tasks (e.g., in ACM SIGMOD 2017, there were ~5 papers that used SystemML as a comparison point). This project aims at …
Supervisor: Pınar Tözün
Semester: Fall 2019
The popularity of large-scale real-time analytics applications (real-time inventory/pricing, recommendations from mobile apps, fraud detection, risk analysis, IoT, etc.) keeps rising. These applications require distributed data management systems that can handle fast concurrent transactions (OLTP) and analytics on the recent data. Some of them even need running analytical queries (OLAP) as part of …
Supervisor: Pınar Tözün
Semester: Fall 2019
The Transaction Processing Performance Council (TPC) is a non-profit IT organization founded to define database benchmarks and disseminate objective, verifiable performance data to the industry. TPC has standardized several new benchmarks (e.g., TPCx-HS and TPCx-BB) in recent years. Older popular benchmarks, like TPC-C (representing high-performance transaction processing) and TPC-H (representing …
Supervisor: Pınar Tözün
Semester: Fall 2019
The DISCO-2 satellite will have accelerated machine learning capability based on the inclusion of a Coral TPU ML accelerator module. This will allow images taken by the satellite to be analysed on the satellite using a variety of ML models, with only select images sent back to Earth. This approach allows for more flexibility in image acquisition and saves downlink bandwidth, which is very constrained …
Supervisors:
Julian Priest, Robert Bayer
Semester: Fall 2023
Tags: satellite, ground station, software defined radio, automation, csp
This is not a single project, but rather a larger cluster of potential projects in the field of what could be summarized as extreme networking.
The networks we are interested in are typically wireless, and can be extreme in different senses of the word:
distance - hundreds of kilometers terrestrial, 10,000s of km to satellite
latency - sub-ms latencies
autonomy - off-grid
quality - extreme remote …
Supervisors:
Sebastian Büttrich
Semester: Fall 2024
Tags: network, IoT, LoRa, LoRaWAN, satellites
LoRa is a long-range, low-bandwidth networking protocol widely used in Internet of Things projects, sensor networks, and low-power, low-cost embedded systems. LoRa’s encoding schema allows for extremely long distance communications with small power usage and small simple antennas. This combination of features has made it attractive to small satellite operators flying cubesats and LoRa is now …
Supervisors:
Sebastian Büttrich
Semester: Fall 2024
Tags: satellites, LoRa, cubesat, IoT, embedded, electronics
LoRa is a long-range, low-bandwidth networking protocol widely used in Internet of Things projects, sensor networks, and low-power, low-cost embedded systems. LoRa’s encoding schema allows for extremely long distance communications with small power usage and small simple antennas. This combination of features has made it attractive to small satellite operators flying cubesats and LoRa is now …
Supervisors:
Sebastian Büttrich
Semester: Fall 2024
Tags: IoT, LoRa, LoRaWAN, satellites
Optical fiber is the backbone of the internet’s communication, e.g. in the form of submarine fiber cables. It can also be employed as a sensor device, by means of combined opto-acoustic methods such as Distributed Acoustic Sensing (DAS) or State of Polarisation (SoP) sensing. Fiber is capable of sensing all kinds of vibrational/acoustic events, from animal sounds over seismic activity to …
Supervisors:
Sebastian Büttrich
Semester: Fall 2024
Tags: fiber, acoustics, audio, machine learning, DAS, SOP
The Danish Student Cubesat Program is an inter-university collaboration that will launch 3 cubesats into Low Earth Orbit over the next 4 years. The satellites will be designed, built, programmed and operated by students, and the project offers an opportunity for Master’s students to take part in a live satellite project. ITU is partnering with Aarhus University on DISCOSAT2 which will be an …
Supervisors:
Sebastian Büttrich, Julian Priest
Semester: Fall 2021
Tags: Satellite, Cubesat, Image processing, Machine Learning, edge, constrained computing
Invasive bird species can be a serious problem in cities, towns and in agriculture. The common pigeon is a very unwelcome guest on many balconies, roofs, and terraces. Conventional scarecrows often show no effect, as these birds are known to be quite intelligent and capable of learning fast. The idea is to build a sensor/camera enhanced scarecrow that - can recognize birds present within its …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: IoT, ML, machineLearning, sensors, security
For this project, you would be working with a partner company who are looking to re-establish wood as a building material for sustainable architecture, and thus are using sensors for quality control - to detect damages and deterioration in buildings. Wood such as timber may be analyzed by non-intrusive acoustic impact testing and subsequent waveform analysis, and the expectation is that machine …
Supervisors:
Sebastian Büttrich
Semester: Fall 2021
Tags: IoT, sensors, machine learning, acoustics
In LoRaWAN networks such as The Things Network, long distance transmissions, well beyond the limitations of line of sight in terrestrial geometry, are frequently observed. Tropospheric effects are seen as responsible for bending or guiding radio waves around the earth curvature. As an example, under the right weather conditions, the LoRaWAN gateway at ITU may collect packets from northern Germany, …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: IoT, LoRaWAN, LPWAN, satellite, networks, troposphere, weather
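To make the "limitations of line of sight" mentioned above concrete, the sketch below estimates the geometric radio horizon for an antenna at a given height, using the standard approximation d = sqrt(2·k·R·h). The function names and the example heights are assumptions for illustration. The factor k = 4/3 is a commonly used effective Earth radius factor for a standard atmosphere; the long-distance tropospheric effects described in the project go beyond what this simple model captures.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def horizon_km(height_m: float, k: float = 1.0) -> float:
    """Distance to the horizon in km for an antenna `height_m` metres above ground.

    k scales the Earth radius: k=1.0 is pure geometry, k=4/3 is a common
    approximation for refraction in a standard atmosphere.
    """
    return math.sqrt(2.0 * k * EARTH_RADIUS_M * height_m) / 1000.0


def max_link_km(h_tx_m: float, h_rx_m: float, k: float = 1.0) -> float:
    """Longest line-of-sight link between two antennas over smooth terrain."""
    return horizon_km(h_tx_m, k) + horizon_km(h_rx_m, k)


if __name__ == "__main__":
    # Hypothetical heights: a 50 m rooftop gateway and a 2 m handheld node.
    print(round(max_link_km(50.0, 2.0), 1), "km (geometric)")
    print(round(max_link_km(50.0, 2.0, k=4 / 3), 1), "km (k = 4/3)")
```

Packets received from well beyond these distances are the kind of observation that points to tropospheric bending or ducting rather than ordinary line-of-sight propagation.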
Recent progress in LoRaWAN development has made a new generation of satellite communications offerings available to IoT devices. In these, the LoRaWAN gateway is satellite-borne and collects data from small inexpensive ground stations. So far, this is predominantly seen as a means of communication for remote sensor data, e.g. in agriculture, logistics or wildlife monitoring. However, one can …
Supervisors:
Sebastian Büttrich
Semester: Fall 2021
Tags: IoT, LoRaWAN, LPWAN, satellite, networks, edge, security
There is currently a lot of progress in really small, yet powerful visual machine learning / computer vision, on hardware like the OpenMV Cam H7, Arduino Portenta Vision Shield, Luxonis LUX-ESP32, Himax WE-I Plus, Arducam Pico4ML, and Raspberry Pi, and on software platforms such as TinyML or OpenMV IDE.
While many popular use cases stem from fields like traffic analysis, wildlife monitoring, we …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: IoT, sensors, machine learning, computer vision
The Things Network Stack v3 for LoRaWAN is an open source LoRaWAN network stack suitable for large, global and geo-distributed public and private networks as well as smaller networks. The architecture follows the LoRaWAN Network Reference Model for standards compliancy and interoperability. - https://github.com/TheThingsNetwork/lorawan-stack This stack, currently in pre-rollout testing, however …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
New types of networks such as LoRaWAN and Sigfox enable us to deploy inexpensive mobile Air Quality Sensors, e.g. on bicycles or vehicles, boats or trains, and help map and understand urban pollution. Such data would be valuable for e.g. the international citizen science project Luftdaten https://luftdaten.info/en/home-en/ in which we are participating. We have an ongoing collaboration with the …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Can we combine satellite data with terrestrial and marine sensor data, to study their correlation, and benefit our understanding of environmental processes, both in urban and agricultural/rural context? Potential collaboration with several organizations in Kenya and Orkney islands.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Not far from the IT University, citizens are building a “Small Smart City”, aiming to measure and analyze e.g. water and resource consumption. Help build the Small Smart City!
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Survey of potential sensor modalities (sounds, ultrasounds, vibrations) and related work (e.g., wind turbines)
Starting with sound: Piezo contact mics/transducers, MEMS sensors
Characterization of state based on known signatures (classification problem)
Characterization of state transitions (HMM)
Experimentation on Coffee machine/Blender/3D printer at PitLab
1..k sensors; Local/cloud-based …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Novo Nordisk
Survey of potential sensor modalities (IR temp sensor, thermal imager) and related work
Starting with consumer USB cams or phone cams to generate series of images
Characterization of state based on known signatures (classification problem)
Characterization of state transitions (HMM)
Experimentation on Coffee machine/Blender/3D printer at PitLab
1..k sensors; Local/cloud-based processing. …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Novo Nordisk
Most data collection in IoT does not critically depend on latency or speed from data collection to data analytics. Occasionally though we meet tasks that would benefit from near-realtime features, such as collection of wave and tidal dynamics around marine energy infrastructures. This project explores the limits of speed by bringing together a LoRa PHY, a LoRaWAN gateway, LoRaWAN stack, ultrafast …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
In collaboration with the IoT Lab at Computer Science Dept at Kathmandu University, Nepal, we are developing a potential service for tracking trekkers, i.e. offering a security service for tourists trekking the Himalayas, in particular Mt. Everest. This service very critically depends on having a robust hardware component, the actual GPS/GNSS tracker. Requirements with respect to battery life, …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Wind turbine electricity production data is sensitive for Energinet (and for the wind turbine producers). Energinet would like to publish wind turbine electricity production data sets that can be used to train relevant models and to develop innovative applications, without giving away sensitive data. The goal of the project is to explore various data publishing methods for that purpose.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, data publication
Energinet has a model that describes the electricity production of a given wind turbine given wind conditions. The current model based on kNN is trained with DMI weather data and historical electricity production data for the wind turbine. The goal of the project is to improve the current model with lifelong learning, extended weather data and different models for a range of different wind …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Data Analysis
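As a minimal sketch of the kind of kNN model mentioned above, the example below predicts production as the average of the k most similar historical observations. The function name, the single wind-speed feature, and all sample values are made up for illustration; they are not Energinet's model or data.

```python
import math


def knn_predict(train, query, k=3):
    """Average the targets of the k training points closest to `query`.

    `train` is a list of (features, target) pairs, where features is a tuple
    of floats with the same length as `query`.
    """
    if not train:
        raise ValueError("training set is empty")
    ranked = sorted(train, key=lambda pair: math.dist(pair[0], query))
    nearest = ranked[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Made-up samples: (wind speed in m/s,) -> production in kW
samples = [
    ((4.0,), 150.0),
    ((6.0,), 520.0),
    ((8.0,), 1100.0),
    ((10.0,), 1800.0),
    ((12.0,), 2000.0),
]

if __name__ == "__main__":
    print(knn_predict(samples, (7.0,), k=2))
```

Extending the weather data, as the project proposes, would correspond to adding more entries to the feature tuple; features on different scales would then need normalising before distances are compared.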
The goal of the project is to explore the accuracy of electricity production predictions based on historical data and weather predictions. This may be tackled as a sequence prediction problem using recurrent neural networks. The long-term goal is to incorporate wind turbines in the reserve market for electricity.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Forecasting, Machine Learning, Deep Learning
Deliberately scoped very wide, this group contains a number of projects in different possible directions, from
Location services via LPWAN time-of-flight and GPS/GNSS,
Vessel tracking and management in fisheries, tourism and logistics,
Water quality and chemistry sensing for aquaculture, specifically mariculture,
Wave and tidal dynamics, e.g. in energy harvesting
and variations/combinations of …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: Satellite, Image processing, Machine Learning, edge, constrained computing, IoT, sensors, location
swarm.space is a commercial company providing low-bandwidth satellite connectivity using ultra-small (quarter cubesat unit) satellites in a low orbit. Swarm satellites cover every point on Earth, enabling IoT devices to affordably operate in any location. Swarm uses a form of LoRa network.
While ground terminals for satellite networks traditionally were both big and expensive, modems and antennas …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: Satellite, IoT
LoRa is a relatively new long-range low-power standard allowing for battery powered pocketsize nodes to transmit over 100s to 1000s of kilometers. ThingSat is a CubeSat communication payload constituted of an electronic board of several LoRa transmitters and a patch antenna operating in (868MHz, 2.4GHz). It is a guest payload of a shared 3U CubeSat.
Available projects under this platform include …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: Satellite, IoT
Spectral learning priority is a useful tool for analyzing a model’s focus during training; it describes how a model may understand a given image from the spectrum perspective. For example, to distinguish cats and tortoises, learning to recognize their shapes would be enough, and such an embedding will result in higher learning priority at low frequencies representing shapes; while learning to …
Supervisors:
Yucheng Lu, Veronika Cheplygina
Semester: Fall 2024
Tags: Spectral analysis, Image classification, Medical imaging
It has been observed that deep learning models are able to identify patient characteristics such as age, sex, and self-reported race with high accuracy from medical images such as chest x-ray recordings, even when medical doctors cannot. This raises the potential for such models to learn to (falsely) diagnose patients of different demographics differently, even if they present with the same …
Supervisors:
Amelia Jiménez-Sánchez, Eike Petersen, Veronika Cheplygina
Semester: Fall 2024
Tags: machine learning, data science, medical imaging
There is pressure on hospitals to implement AI systems which promise to improve diagnoses and save time for the doctors. One use-case could be related to the automation of protocoling based on a physician referral. Currently, this requires a referral letter from a physician who has examined a patient and evaluates that there is a need for additional imaging studies. In this case, the physician …
Supervisors:
Veronika Cheplygina
Semester: Fall 2023
Tags: machine learning, medical imaging, data analysis
Machine learning models, especially larger models that are used on, for example, image or text datasets, can be expensive to train. During development, models are usually trained multiple times, for example to optimize hyperparameters, which can result in a large carbon footprint.
This project focuses specifically on medical data. There are some recent efforts, for example by Selvan et …
Supervisors:
Veronika Cheplygina
Semester: Fall 2023
Tags: machine learning, medical imaging, data analysis, resource consumption
Machine learning is used extensively in different applications, including medical imaging and natural language processing. As different types of data are involved, it is reasonable to assume that different methods are needed for each application. However, there are also opportunities in translating a method that is successful in one application to another application where it is not widely used.
The …
Supervisors:
Veronika Cheplygina
Semester: Fall 2021
Tags: machine learning, natural language processing, medical imaging, literature review
Machine learning algorithms for skin lesion classification typically learn from images which have been labeled as malignant (for example, melanoma) or not. Such tasks can still suffer from overfitting due to limited dataset size. In other computer vision tasks, crowdsourcing labels has been effective, but the average person typically does not have the background to classify skin lesions. However, …
Supervisors:
Veronika Cheplygina
Semester: Fall 2021
Tags: machine learning, medical imaging, crowdsourcing, similarity
Spectral learning priority is a useful tool for analyzing a model’s focus during training; it describes how a model may understand a given image from the spectrum perspective. For example, to distinguish cats and tortoises, learning to recognize their shapes would be enough; such an embedding will result in higher learning priority at low frequencies representing shapes, while learning to …
Supervisors:
Yucheng Lu, Veronika Cheplygina
Semester: Fall 2024
Tags: Spectral analysis, Image classification, Medical imaging
The DISCO-2 project is driven by students and aims to develop and deploy a 3-unit CubeSat into low Earth orbit. Its mission focuses on conducting Earth observations over Greenland and supporting various research objectives. The satellite has three cameras onboard: infrared, wide-angle, and standard (main camera). Due to the limitations of the imaging hardware and the challenging conditions on the …
Supervisors:
Yucheng Lu, Julian Priest
Semester: Fall 2024
Tags: Image enhancement, Image processing, Machine learning
Are you interested in working with a big data open source project?
You are welcome to conduct your thesis/project in Apache Wayang. Apache Wayang is the first cross-platform framework that allows users to specify their task/query in a system-agnostic manner; Wayang then determines which system(s) are best suited to execute the task, with the goal of optimizing performance. For a general overview …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: big data, database, cross-platform data processing, open source, Apache
Knowledge graphs (KGs) are extensively used in many application domains, such as search engines, product recommendation, and bioinformatics. Knowledge graph completion (a.k.a. link prediction), i.e., the task of inferring missing information from knowledge graphs, is a widely used task in the above applications. This project will investigate how to loosely couple the data-driven power of knowledge …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: knowledge graph, LLMs, reasoning
Are you interested in working with a big data open source project and helping the environment?
You are welcome to conduct your thesis/project in Apache Wayang. Apache Wayang is the first cross-platform framework that allows users to specify their task/query in a system-agnostic manner; Wayang then determines which system(s) are best suited to execute the task, with the goal of optimizing performance. …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: big data, database, cross-platform data processing, open source, Apache
Query optimization is crucial for any data management system to achieve good performance. Recent advancements in AI have led academia and industry to investigate learning-based techniques in query optimization. In particular, many works propose replacing the cost model used during plan enumeration with a machine learning model that estimates the runtime of a plan. However, to build such a model …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: machine learning, training data, query optimizer
Query optimization is crucial for any data management system to achieve good performance. Recent advancements in AI have led academia and industry to investigate learning-based techniques in query optimization. In particular, many works propose replacing the cost model used during plan enumeration with a machine learning model (typically a regression model) that estimates the runtime of a query …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: machine learning, database, query optimization, ranking
(This project will be carried out in collaboration with Xilinx Research Labs in Dublin)
Machine Learning operators are becoming increasingly commonly used in data management systems and, in this project, we will explore the challenges and benefits of integrating inference operators from FINN [1] within a so-called Smart Storage system [2]. Both the inference and data management aspects will be …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: FPGA, Data Management, Machine Learning
(This topic is going to be co-supervised by Bernardo Machado David [http://www.bmdavid.com/])
Database systems managing private data may leak sensitive information when queries are done in the clear, even if the data itself is encrypted. A recent line of research has looked into combining database engines supporting standard SQL queries with techniques for secure Multiparty Computation (MPC), …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: Theoretical Computer Science, Data Management, Security and Privacy
Consensus mechanisms for ensuring consistency are some of the most expensive operations in managing large amounts of data. Often, there is a trade-off that involves reducing the coordination overhead at the price of accepting possible data loss or inconsistencies. As the demand for more efficient data centers increases, it is important to provide better ways of ensuring consistency without …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: Benchmarking, Distributed Systems
Given a private database that I can access only through specific queries, there is still a lot I can learn about its entries [1]. Differential Privacy (DP) tackles this: letting me learn the (approximate) result of complex queries on a database, but preventing me from learning much about its specific entries. The basic approach of DP often boils down to: “apply a privacy-preserving transformation T …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: Theoretical Computer Science, Data Management, Security and Privacy
Blockchains are often used synonymously with crypto-currencies and unspent transaction output (UTXO) data models, but there are emerging blockchain platforms that offer a more general data model and smart contracts that can manipulate this data freely (e.g. Hyperledger Fabric [1]). As such, these platforms resemble in many ways distributed databases, storing a collection of records, organized as …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: Blockchain, Data Management, Benchmarking
Modern data analytics systems are composed of two types of nodes: compute and storage (e.g., Amazon S3, Redis, MongoDB, etc.). The storage nodes typically offer a key-value interface and are often used to store data encoded in a columnar format (e.g., Parquet files). Due to growing data sizes in datacenters, there is an increasing interest in using specialized hardware devices, namely Field …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: FPGA, Hardware-software Co-design, Security and Privacy
We are actively developing a new prototype for analysing large multimedia collections in virtual reality, based on the ObjectCube data model. There are many ways in which students can contribute to the project, including work on the user interface and the back-end, and later on running large-scale user experiments.
Supervisors:
Aaron Duane, Björn Þór Jónsson
Semester: Fall 2021
Tags: virtual reality, multimedia analytics
It has been observed that deep learning models are able to identify patient characteristics such as age, sex, and self-reported race with high accuracy from medical images such as chest x-ray recordings, even when medical doctors cannot. This raises the potential for such models to learn to (falsely) diagnose patients of different demographics differently, even if they present with the same …
Supervisors:
Amelia Jiménez-Sánchez, Eike Petersen, Veronika Cheplygina
Semester: Fall 2024
Tags: machine learning, data science, medical imaging
A medical Visual Question Answering (VQA) system can provide meaningful references for both doctors and patients during the treatment process. Different from normal images, a learning setting with medical images is more challenging due to limited amounts of data, class imbalance and the presence of label noise for diagnosis tasks. Moreover, little attention is paid to how the images and meta-data are …
Supervisors:
Amelia Jiménez-Sánchez
Semester: Fall 2023
Tags: medical imaging, deep learning, machine learning, transfer learning, meta-learning
In relevance feedback, the choice of images to present to the user is a difficult problem, as a naïve approach may present too many similar images. The challenge addressed in this project is to ensure diversity (aka “one of each”) as well as relevance. A particularly interesting project for students interested in efficient algorithms.
Supervisors:
Björn Þór Jónsson
Semester: Fall 2021
Tags: multimedia analytics, scalability, diversity
In interactive learning systems, such as Exquisitor, the system presents potentially relevant images to users who label them as either relevant or irrelevant. Currently, Exquisitor uses a cluster-based index, which allows it to return results from a collection of 100 million images in 0.3 seconds. The goal of this project is to study the application of hash-based indexing to interactive learning …
Supervisors:
Björn Þór Jónsson
Semester: Fall 2021
Tags: multimedia analytics, diversity
The goal of this project is to enhance PhotoCube as a competitor for the Video Browser Showdown, an international video retrieval competition where competing systems are judged based on speed, accuracy and recall. We propose to develop new versions of the C++-based media server and JS-based media browser, to expand the data model to videos and improve the performance sufficiently to take part in …
Supervisors:
Björn Þór Jónsson
Semester: Fall 2021
Tags: video search, multimedia analytics, photocube
We are actively developing a new prototype for analysing large multimedia collections in virtual reality, based on the ObjectCube data model. There are many ways in which students can contribute to the project, including work on the user interface and the back-end, and later on running large-scale user experiments.
Supervisors:
Aaron Duane, Björn Þór Jónsson
Semester: Fall 2021
Tags: virtual reality, multimedia analytics
The index structure used for Exquisitor is eCP, a very scalable index for high-dimensional retrieval. While multimedia collections are typically updated constantly, the index is unfortunately static. The goal of this project is to implement and compare approaches for index maintenance.
Supervisor: Björn Þór Jónsson
Semester: Fall 2020
Tags: index maintenance, high-dimensional indexing
In this project, we propose to implement a media server and a media browser encapsulating a new data model for analysing media collections, called Multimedia Analytics Data Services (MADS). To validate the design, some scalability experiments should be performed.
Supervisor: Björn Þór Jónsson
Semester: Fall 2020
Tags: multimedia analytics, scalability
Students at ITU have made a prototype version of the Exquisitor system for the Android mobile phone! The system is missing some of Exquisitor’s advanced functionality, such as search and indexing, and the goal is to add and evaluate this functionality.
Supervisor: Björn Þór Jónsson
Semester: Fall 2020
Tags: multimedia analytics, android
We propose to develop a new prototype for analysing large multimedia collections in Virtual Reality, using the new Valve Indexes. There are many ways in which students can contribute to the project, including work on the user interface and the back-end, and later on running large-scale user experiments.
Supervisor: Björn Þór Jónsson
Semester: Fall 2020
Tags: virtual reality, multimedia analytics
The goal of this project is to integrate Exquisitor with other pieces of existing technology and turn it into a competitor for a live video retrieval competition. The project is suitable for 3-4 well-qualified MSc students.
The Video Browser Showdown (VBS) is a live competition for video search and retrieval, held at the International Conference on Multimedia Modeling (MMM). In VBS, the competition …
Supervisor: Björn Þór Jónsson
Semester: Fall 2019
The goal of this project is to ensure diversity in the relevance feedback results, to improve the quality of the user experience.
The project is suitable for 1-3 well-qualified MSc students.
In many creative tasks, the designer will know that some stock image is good for a design just by stumbling upon the image. This “Aha!” moment requires browsing thousands of images by categories. In other words, it requires …
Supervisor: Björn Þór Jónsson
Semester: Fall 2019
The goal of this project is to use the state of the art in eye tracking to design, implement and evaluate different eye-tracking interfaces for Exquisitor.
The project is suitable for 1-3 well-qualified MSc students.
Image and media collections are becoming a central information resource for a growing number of domains. This calls for very effective tools for interactive exploration of the contents of …
Supervisor: Björn Þór Jónsson
Semester: Fall 2019
The goal of this project is to build a prototype of the Exquisitor system for mobile devices.
The project is suitable for 1-3 well-qualified MSc students.
Image and media collections are becoming a central information resource for a growing number of domains. This calls for very effective tools for interactive exploration of the contents of those collections [1]. Based on past research results [2], we …
Supervisor: Björn Þór Jónsson
Semester: Fall 2019
Deep neural networks have been revolutionary in computer vision and publicly available image datasets played an important role in this success. Due to their size, neural networks require vast amounts of data for training. Yet when it comes to medical settings dataset sizes are very limited due to the cost of data annotation, privacy concerns, differences in imaging techniques, and others. In such …
Supervisors:
Dovile Juodelyte
Semester: Fall 2023
Tags: transfer learning, deep learning, medical imaging
A GPU offers massive computational power and parallelism through its Streaming Multiprocessors (SMs). Efficient GPU utilization is critical for maximizing performance and optimizing compute resource usage, and it is measured using various metrics such as SMACT (SM Activity), SMOCC (SM Occupancy), and DRAMA (DRAM Active). These metrics provide insight into how effectively the GPU’s SMs and …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Spring 2024
Tags: machine learning systems, GPU Utilization, resource management, resource interference
Deep learning changed the landscape of many applications like computer vision, natural language processing, etc. On the other hand, deep learning requires gigantic computing power offered by modern hardware. As a result, data scientists rely on powerful hardware resources offered by shared high-performance computing (HPC) clusters or the cloud. Due to the long-running times of deep learning …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2024
Tags: machine learning systems, checkpointing, scheduling, resource management
Workload collocation has been shown to be an effective method to reduce the hardware requirements for certain deep learning (DL) training tasks. On the other hand, there haven’t been many robust open-source implementations of schedulers that incorporate workload collocation on GPUs for DL.
BLOX is a framework that aims at standardizing the way we implement deep learning schedulers. In this …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2024
Tags: machine learning systems, scheduling, resource management, workload collocation
Today, there are many compute- and memory-hungry data-intensive workloads from big data analytics applications to deep learning. These workloads increasingly run on shared hardware resources, which requires building hardware resource managers that can both serve the needs of workloads and utilize hardware well. Predicting the resource utilization of applications can aid such resource managers …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2022
Tags: benchmarking, hardware resource consumption estimation, machine learning
It has been observed that deep learning models are able to identify patient characteristics such as age, sex, and self-reported race with high accuracy from medical images such as chest x-ray recordings, even when medical doctors cannot. This raises the potential for such models to learn to (falsely) diagnose patients of different demographics differently, even if they present with the same …
Supervisors:
Amelia Jiménez-Sánchez, Eike Petersen, Veronika Cheplygina
Semester: Fall 2024
Tags: machine learning, data science, medical imaging
Geospatial data refers to information that is tied to specific geographic locations on the Earth’s surface. It includes both the location coordinates (such as latitude, longitude, and, potentially, altitude) and attribute data associated with those locations. Geospatial data is categorized into two types: raster and vector.
Vector data represents geographic features as points, lines, and …
Supervisors:
Eleni Tzirita Zacharatou
Semester: Fall 2023
Tags: spatial data analysis, data science, data loading, GIS file formats, geospatial data
It is now common to query terabytes of spatial data. Several new frameworks extend distributed computing platforms such as Hadoop and Spark to enable them to efficiently process spatial queries by providing (1) mechanisms to efficiently store spatial data and index them; and (2) packages of built-in spatial operations for these platforms. Meanwhile, it is now common to accelerate Hadoop and Spark …
Supervisor: Iman Elghandour
Semester: Fall 2019
Spark assumes that it executes its applications on a homogeneous cluster of similar nodes. However, it is becoming common that in-house clusters have heterogeneous compute resources, and it is good to exploit all of them in the most efficient way. The objective of this master thesis is to extend the Spark scheduler to be resource-aware and to efficiently schedule Spark tasks on all the …
Supervisor: Iman Elghandour
Semester: Fall 2019
Distributed computing platforms such as Hadoop and Spark focus on addressing the following challenges in large systems: (1) latency, (2) scalability, and (3) fault tolerance. Dedicating computing resources for each application executed by Spark can lead to a waste of resources. Unified distributed file systems such as Alluxio have provided a platform for computing results among simultaneously …
Supervisor: Iman Elghandour
Semester: Fall 2019
In the last few years, it became common to accelerate Hadoop and Spark by enabling them to execute tasks and jobs on accelerators such as GPUs and FPGAs. The objective of this master thesis is to study new approaches that efficiently predict the execution time of Spark tasks and jobs executed on GPUs. Part of the work will be to build a performance prediction model for GPUs, which can be built …
Supervisor: Iman Elghandour
Semester: Fall 2019
Do you like open-source systems? Would you like to experience working with an open-source system? Do you want to learn about big data research in practice? Then, this project is for you!
We have a number of thesis/project topics under the umbrella of Apache Wayang. Wayang is the first cross-platform framework that allows users to specify their task/query in a system-agnostic manner and Wayang will …
Supervisors:
Jorge Quiané
Semester: Fall 2022
Tags: big data, database, cross-platform data processing, open source, Apache
Do you have the feeling that only a few players are controlling the AI game? Would you like to make AI technology accessible to everyone? Then, come and help us to make Agora a reality!
We have a number of thesis/project topics under the umbrella of the Agora project. This project aims at building a unified data infrastructure for supporting AI ecosystems that bring together data, algorithms, …
Supervisors:
Jorge Quiané
Semester: Fall 2022
Tags: big data, AI ecosystems, compliant data processing, federated analytics, data markets
The DISCO-2 project is driven by students and aims to develop and deploy a 3-unit CubeSat into low Earth orbit. Its mission focuses on conducting Earth observations over Greenland and supporting various research objectives. The satellite has three cameras onboard: infrared, wide-angle, and standard (main camera). Due to the limitations of the imaging hardware and the challenging conditions on the …
Supervisors:
Yucheng Lu, Julian Priest
Semester: Fall 2024
Tags: Image enhancement, Image processing, Machine learning
ITU is a partner of the Danish Student Cubesat Program, DISCOSAT. We launched our first satellite, DISCO-1, into Low Earth Orbit in April 2023 and we will launch a second, DISCO-2, in 2024. In this project you will gain experience with automating live satellite operations and communications, completing a ground station at the Rued Langaards Vej site for use with both satellites.
The DISCO satellite …
Supervisors:
Julian Priest
Semester: Fall 2023
Tags: satellite, ground station, software defined radio, automation, csp
The DISCO-2 satellite is an Earth observation satellite in collaboration with the Arctic Research Center in Aarhus and is designed to complement ground-based field studies in Greenland. The satellite instrument consists of 2 high-quality visible light cameras and 1 infrared camera, as well as an attitude control system and a Coral TPU ML coprocessor.
In this project you will develop software to control …
Supervisors:
Julian Priest
Semester: Fall 2023
Tags: satellite, climate change, image processing, ML, csp, embedded, space
The DISCO-2 satellite will have accelerated machine learning capability based on the inclusion of a Coral TPU ML accelerator module. This will allow images taken by the satellite to be analysed on the satellite using a variety of ML models, with only select images sent back to Earth. This approach allows for more flexibility in image acquisition and saves downlink bandwidth, which is very constrained …
Supervisors:
Julian Priest, Robert Bayer
Semester: Fall 2023
Tags: satellite, ground station, software defined radio, automation, csp
The Danish Student Cubesat Program is an inter-university collaboration that will launch 3 cubesats into Low Earth Orbit over the next 4 years. The satellites will be designed, operated, programmed and built by students and the project offers an opportunity for Master’s students to take part in a live satellite project. ITU is partnering with Aarhus University on DISCOSAT2 which will be an …
Supervisors:
Sebastian Büttrich, Julian Priest
Semester: Fall 2021
Tags: Satellite, Cubesat, Image processing, Machine Learning, edge, constrained computing
ITU is a partner in the Danish Student Cubesat Program, DISCO, which will launch a series of small satellites into orbit, starting with DISCO 1 in 2023 and followed by DISCO2 in 2024.
As part of this project ITU is installing a satellite ground station with a range of antenna rotators on the roof of the Rued Langaards Vej building, and the equipment has been purchased. The ground station will track the …
Supervisors:
Julian Priest
Semester: archive
Tags: Satellite, Image processing, Edge, Constrained Computing, Networks, Machine Learning, Embedded, Software Defined Radio
ITU is a partner in the Danish Student Cubesat Program DISCO, which will launch a series of small satellites into orbit, starting with DISCO 1 in 2023 and followed by DISCO2 in 2024.
ITU is developing a hi-res multi-camera imaging payload for Earth observation, primarily in the Arctic. We are developing an on-satellite machine learning capability using an ML coprocessor, as well as models that can …
Supervisors:
Julian Priest
Semester: archive
Tags: Satellite, Image processing, Edge, Constrained Computing, Networks, Machine Learning, Embedded, Radio
The idea behind “15-minute cities” is that within a short walk or bike ride people should have access to all necessary facilities that constitute the essence of urban living, such as parks, shops, cafes, schools, hospitals. Initiatives to transform cities according to this paradigm are currently being implemented across the world, in an attempt to make urban spaces more liveable, …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: spatial data analysis, visualisation, Python, OSM data
As a response to increased traffic congestion and the need to reduce carbon emissions, cities consider ways to modernise, build and extend transit systems. Transit network design solutions can benefit from analysing the large amount of crowd-sourced location data available, which provides valuable insights into population mobility needs. Designing efficient metro lines, bicycle paths, or bus …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: spatial data analysis, network design, Python, OSM data
The idea behind “15-minute cities” is that within a short walk or bike ride people should have access to all necessary facilities that constitute the essence of urban living, such as parks, shops, cafes, schools, hospitals. Initiatives to transform cities according to this paradigm are currently being implemented across the world, in an attempt to make urban spaces more liveable, …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: spatial data analysis, graph summaries, Python, OSM data
Musical genres are inherently ambiguous and difficult to define. Even more so is the task of establishing how genres relate to one another. Yet, genre is perhaps the most common and effective way of describing musical experience. The number of possible genre classifications (e.g. Spotify has over 4000 genre tags, LastFM over 500,000 tags) has made the idea of manually creating music taxonomies …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: scalable algorithms, hyperbolic embeddings, Python, Spotify data
The integration of wind power in the energy grid is dependent on accurate production forecasts. The power output curves between neighbouring wind farms are often correlated temporally and spatially, but currently, these spatiotemporal dependencies are under-utilised in prediction models. Graph neural networks allow for modelling these dependencies. In this project the student will implement a …
Supervisor: Maria Astefanoaei
Semester: Fall 2021
Tags: spatial data analysis, graph neural networks, Python, timeseries data
Open-source JavaScript applications, such as browser-based web games, are typically developed by individual software engineers or small teams. These teams often have limited financial resources to use commercial logging frameworks and cloud-based analysis systems and may also lack knowledge and expertise in logging. However, log analysis is highly important for many reasons: monitoring application …
Supervisors:
Martin Hentschel
Semester: Fall 2024
Tags: open source, performance
The Deconstructed Cloud Databases project stems from a simple question: What are the minimum components required to build a data management system in the cloud? Our motivation for this project is based on the idea that reducing a system to its minimum set of components makes it easier to build, test, and maintain cloud data management systems. This approach requires less engineering effort, …
Supervisors:
Martin Hentschel
Semester: Fall 2024
Tags: data management, performance, benchmarking, hacking
The Deconstructed Cloud Databases project stems from a simple question: What are the minimum components required to build a data management system in the cloud? Our motivation for this project is based on the idea that reducing a system to its minimum set of components makes it easier to build, test, and maintain cloud data management systems. This approach requires less engineering effort, …
Supervisors:
Martin Hentschel
Semester: Fall 2024
Tags: data management, security, open source, open standards
The emergence of computational storage platforms like Delilah has transformed the data storage landscape, enabling new computing paradigms and facilitating data-intensive applications. Delilah is a cutting-edge computational storage platform developed by the IT University of Copenhagen. It runs on the Daisy OpenSSD and exposes an asynchronous computational storage protocol to the host, facilitated …
Supervisors:
Niclas Hedam, Philippe Bonnet
Semester: Fall 2023
Tags: Open Source, Testing, Computational Storage, Hardware, FPGA
Hash-based authentication is an effective way of protecting passwords in software systems. Hashing obscures the original passwords, such that they cannot be recovered in case of a database breach. However, as demonstrated by our paper titled Hash-Based Authentication Revisited in the Age of High-Performance Computers, the practical security depends on which hashing algorithm is used as well as the …
Supervisors:
Pınar Tözün, Niclas Hedam
Tags: benchmarking, hashing, security, GPU, hacking, HPC
With the recent hunger for being “data driven”, many organizations are eager to integrate ML in their decision-making process. Unfortunately, competent data scientists are still relatively scarce, and manual model development cannot keep up with the demand for magic AI solutions. This is no less true when it comes to forecasting. Knowing the future is extremely handy when making …
Supervisors:
Niels Ørbæk Chemnitz
Semester: Spring 2021
Tags: AutoML, ML, Forecasting, Energy Data, Smart Meters, Python, Data Science, Time Series Data
How much do our smart meter readings disclose about us? Can we disentangle the oven from the washing machine from the kettle? Can we identify demographics and behavior patterns from the stream of electricity data?
Most Danish homes are now equipped with so-called “smart meters” - networked electricity meters that report consumption and load at a much higher rate than conventional meters. …
Supervisors:
Niels Ørbæk Chemnitz
Semester: Spring 2021
Tags: NILM, ML, IoT, Energy Data, Smart Meters, Python, Data Science, Time Series Data
Outlier detection is carried out when the information is stored at the server. However, with the new IoT computational capabilities, outlier detection can be developed locally. Therefore, it is necessary to know how much RAM/Flash is needed for this step and which IoT brands can handle it. This project is divided into two parts. The first is implementing light-heavy ML algorithms in single points …
Supervisor: Paul Rosero
Semester: Spring 2022
Tags: data analysis, IoT, Python, Embedded systems
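To illustrate what local outlier detection with a small, fixed memory footprint could look like, here is a minimal sketch of a streaming z-score detector based on Welford's online algorithm. It is not the project's method: the class name `StreamingOutlierDetector`, the threshold, and the warm-up length are assumptions chosen for illustration.

```python
import math


class StreamingOutlierDetector:
    """Flags readings far from the running mean, using O(1) memory."""

    def __init__(self, threshold: float = 3.0, warmup: int = 10):
        self.threshold = threshold  # number of standard deviations (assumed value)
        self.warmup = warmup        # readings to observe before flagging anything
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations

    def update(self, x: float) -> bool:
        """Return True if x looks like an outlier, then fold x into the statistics."""
        is_outlier = False
        if self.count >= self.warmup:
            std = math.sqrt(self.m2 / (self.count - 1))
            if std > 0 and abs(x - self.mean) > self.threshold * std:
                is_outlier = True
        # Welford's online update of mean and squared deviations
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)
        return is_outlier
```

The detector keeps only three numbers of state, which makes its RAM needs easy to reason about on a microcontroller. Note that this simple version also folds flagged readings into the statistics; a real deployment might exclude them.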
TinyML is a new trend to deploy deep learning on tiny devices. Therefore, it is necessary to deploy several applications to understand the challenges and opportunities that tinyML brings us. In this scenario, any idea involving embedded computer vision, voice recognition, or sensors is welcome.
Supervisor: Paul Rosero
Semester: Spring 2022
Tags: data analysis, IoT, Python, Embedded systems, Computer vision, Voice recognition
The emergence of computational storage platforms like Delilah has transformed the data storage landscape, enabling new computing paradigms and facilitating data-intensive applications. Delilah is a cutting-edge computational storage platform developed by the IT University of Copenhagen. It runs on the Daisy OpenSSD and exposes an asynchronous computational storage protocol to the host, facilitated …
Supervisors:
Niclas Hedam, Philippe Bonnet
Semester: Fall 2023
Tags: Open Source, Testing, Computational Storage, Hardware, FPGA
Disaggregated storage has gained acceptance in data centers. With disaggregated storage, storage resources are decoupled from compute resources, and made available through fabric. We are particularly interested in storage resources composed of an ARM-based smartNIC, which acts as fabric target as well as storage controller for a collection of SSDs.
The performance characteristics of the storage …
Supervisors:
Philippe Bonnet
Semester: Fall 2021
Tags: benchmarking, ARM, SoC, fabric, SSD, computational storage
Reproducibility is a cornerstone of the scientific method. There are systems available today to build reproducible and sharable data and analysis pipelines including workflow engines (e.g., GWL, Nextflow), package managers (e.g., bioconda), and container systems (e.g., Singularity). However, validating their executions on high-performance computers remains an open issue. Indeed, there are many …
Supervisors:
Philippe Bonnet
Semester: Fall 2020
Tags: ML, reproducibility, workflow, HPC
Reproducibility is a cornerstone of the scientific method. It is also a core element of compliance requirements for sensitive equipment, e.g., audit trails for medical equipment. Often, a prerequisite for computational reproducibility is the availability of software and data. However, this is problematic for edge devices whose goal is to reduce the amount of data transferred to the backend. On …
Supervisors:
Philippe Bonnet
Semester: Fall 2020
Tags: reproducibility, edge
SSDs are not a uniform class of devices. The SSD landscape is now quite diverse, with many new-generation, much faster and customizable devices available. Understanding their performance characteristics is crucial when determining what their impact on the data systems software stack should be. In this project, we would like to characterize the performance of a broad range of such SSDs (e.g., ZNS, Samsung Z-SSD, …
Supervisors:
Pınar Tözün, Philippe Bonnet
Semester: Fall 2020
Tags: SSD, benchmarking
The Things Network Stack v3 for LoRaWAN is an open source LoRaWAN network stack suitable for large, global and geo-distributed public and private networks as well as smaller networks. The architecture follows the LoRaWAN Network Reference Model for standards compliance and interoperability. - https://github.com/TheThingsNetwork/lorawan-stack This stack, currently in pre-rollout testing, however …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Offloading processing to storage is a means to avoid data movement and thus deal efficiently with very large volumes of stored data. In the 90s, there were pioneering efforts to develop Processing-in-Memory as well as Active Disks. We are considering data stored on Open-Channel SSDs with a programmable storage controller (i.e., a Linux-based ARM processor) integrated into a network switch (e.g., …
Supervisor: Philippe Bonnet
Semester: Fall 2019
Characterize the performance of commercial database systems on an NVIDIA Titan GPU, or characterize the performance of DB2 PureScale on a cluster equipped with shared storage with a range of different benchmarks. Design and conduct experiments with a range of tuning strategies to measure their impact on performance and reliability.
Supervisor: Philippe Bonnet
Semester: Fall 2019
In the context of the Orkney Cloud project, we are preparing the deployment of a decentralized cloud infrastructure on the archipelago. The infrastructure is composed of a collection of Pods (points of delivery) and a wireless core (5G + Wi-Fi). Each Pod is equipped with storage, computing and communication components (so that it is connected to the core and to local endpoints). Each Pod is powered …
Supervisor: Philippe Bonnet
Semester: Fall 2019
New types of networks such as LoRaWAN and Sigfox enable us to deploy inexpensive mobile Air Quality Sensors, e.g. on bicycles or vehicles, boats or trains, and help map and understand urban pollution. Such data would be valuable for e.g. the international citizen science project Luftdaten https://luftdaten.info/en/home-en/ in which we are participating. We have an ongoing collaboration with the …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Can we combine satellite data with terrestrial and marine sensor data to study their correlation and improve our understanding of environmental processes, in both urban and agricultural/rural contexts? Potential collaboration with several organizations in Kenya and the Orkney Islands.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Not far from the IT University, citizens are building a “Small Smart City”, aiming to measure and analyze e.g. water and resource consumption. Help build the Small Smart City!
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Survey of potential sensor modalities (sounds, ultrasounds, vibrations) and related work (e.g., wind turbines)
Starting with sound: Piezo contact mics/transducers, MEMS sensors
Characterization of state based on known signatures (classification problem)
Characterization of state transitions (HMM)
Experimentation on Coffee machine/Blender/3D printer at PitLab
1..k sensors; Local/cloud-based …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Novo Nordisk
New forms of Solid State Drives have interesting characteristics in terms of performance (10 to 100x faster than previous generations of SSDs) and in terms of functionalities (SSDs can now suspend the execution of write or erase operations to minimize read latency). The performance characteristics of these devices are not yet well understood. The topic of this thesis is to design and conduct …
Supervisor: Philippe Bonnet
Semester: Fall 2019
Survey of potential sensor modalities (IR temp sensor, thermal imager) and related work
Starting with consumer USB cams or phone cams to generate series of images
Characterization of state based on known signatures (classification problem)
Characterization of state transitions (HMM)
Experimentation on Coffee machine/Blender/3D printer at PitLab
1..k sensors; Local/cloud-based processing. …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Novo Nordisk
Most data collection in IoT does not critically depend on latency or speed from data collection to data analytics. Occasionally though we meet tasks that would benefit from near-realtime features, such as collection of wave and tidal dynamics around marine energy infrastructures. This project explores the limits of speed by bringing together a LoRa PHY, a LoRaWAN gateway, LoRaWAN stack, ultrafast …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
In collaboration with the IoT Lab at the Computer Science Dept. at Kathmandu University, Nepal, we are developing a potential service for tracking trekkers, i.e. offering a security service for tourists trekking the Himalayas, in particular Mt. Everest. This service critically depends on having a robust hardware component, the actual GPS/GNSS tracker. Requirements with respect to battery life, …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Wind turbine electricity production data is sensitive for Energinet (and for the wind turbine producers). Energinet would like to publish wind turbine electricity production data sets that can be used to train relevant models and to develop innovative applications, without giving away sensitive data. The goal of the project is to explore various data publishing methods for that purpose.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, data publication
Energinet has a model that describes the electricity production of a given wind turbine given wind conditions. The current model based on kNN is trained with DMI weather data and historical electricity production data for the wind turbine. The goal of the project is to improve the current model with lifelong learning, extended weather data and different models for a range of different wind …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Data Analysis
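For readers unfamiliar with kNN regression, the following is a minimal, generic sketch of the technique; it is not Energinet's model. The function name `knn_predict`, the choice of features, and the toy numbers in the usage note are assumptions made for illustration.

```python
from typing import Sequence, Tuple

# One historical sample: (weather features, measured production)
Sample = Tuple[Sequence[float], float]


def knn_predict(history: Sequence[Sample], query: Sequence[float], k: int = 3) -> float:
    """Average the production of the k historical samples closest to the query."""
    if not history:
        raise ValueError("history must not be empty")

    def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
        # Squared Euclidean distance between two feature vectors
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = sorted(history, key=lambda sample: sq_dist(sample[0], query))[:k]
    return sum(production for _, production in nearest) / len(nearest)
```

With a toy history such as `[((5.0,), 300.0), ((7.0,), 700.0), ((11.0,), 1500.0)]`, where the single feature stands for wind speed, `knn_predict(history, (6.0,), k=2)` averages the two closest samples. Extending the feature vectors with more weather variables is one way such a model could use extended weather data.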
The goal of the project is to explore the accuracy of electricity production predictions based on historical data and weather predictions. This may be tackled as a sequence prediction problem using recurrent neural networks. The long-term goal is to incorporate wind turbines in the reserve market for electricity.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Forecasting, Machine Learning, Deep Learning
The goal of the project is to explore new ways of gathering data about wind turbines as well as local wind/weather conditions. To this end, sound/vibration-based and/or image-based instrumentation as well as innovative experiments such as balloons and lightweight weather stations might be considered.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Instrumentation, Sensors
Field Programmable Gate Arrays are now an integral part of public cloud infrastructures. You can for example run customized FPGA instances on AWS. A project focuses on FPGA-based hardware acceleration at the level of an SSD Flash Translation Layer, at the level of a Database storage manager or at the level of the database client. You will be able to experiment with FPGAs in the lab and on AWS. We …
Supervisor: Philippe Bonnet
Semester: Fall 2020
Tags: FPGA, SSD, computational storage
Field Programmable Gate Arrays are now an integral part of public cloud infrastructures. You can for example run customized FPGA instances on AWS. A project focuses on FPGA-based hardware acceleration at the level of an SSD Flash Translation Layer, at the level of a Database storage manager or at the level of the database client. You will be able to experiment with FPGAs in the lab and on AWS. We …
Supervisor: Philippe Bonnet
Semester: Fall 2019
Tags: FPGA, SSD
GPUs offer massive computational power and parallelism through their Streaming Multiprocessors (SMs). Efficient GPU utilization is critical for maximizing performance and optimizing compute resource usage, which is measured using various metrics such as SMACT (SM Activity), SMOCC (SM Occupancy), and DRAMA (DRAM Active). These metrics provide insight into how effectively the GPU’s SMs and …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Spring 2024
Tags: machine learning systems, GPU Utilization, resource management, resource interference
The work on running data-intensive applications on very powerful, expensive, and power-hungry server hardware is very popular thanks to the growing size of data centers and high-performance computing (HPC) platforms. However, with the rise of new generation internet of things (IoT) applications, the lower-power and lower-budget hardware devices that specifically target IoT, the edge platforms, …
Supervisors:
Pınar Tözün
Semester: Fall 2024
Tags: edge, benchmarking, data-intensive applications, resource-constrained hardware
Observing how well machine learning systems utilize hardware resources is a crucial preliminary step to improve system performance and reduce hardware waste. To make such observations, one has to collect a lot of monitoring data on hardware behavior through experiments. In our group, we have recently built a framework to aid the management of such monitoring data efficiently, called Resource-Aware …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: benchmarking, data management, data visualization
Deep learning changed the landscape of many applications like computer vision, natural language processing, etc. On the other hand, deep learning requires gigantic computing power offered by modern hardware. As a result, data scientists rely on powerful hardware resources offered by shared high-performance computing (HPC) clusters or the cloud. Due to the long-running times of deep learning …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2024
Tags: machine learning systems, checkpointing, scheduling, resource management
Workload collocation has been shown to be an effective method to reduce the hardware requirements for certain deep learning (DL) training tasks. On the other hand, there haven’t been many robust open-source implementations of schedulers that incorporate workload collocation on GPUs for DL.
BLOX is a framework that aims at standardizing the way we implement deep learning schedulers. In this …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2024
Tags: machine learning systems, scheduling, resource management, workload collocation
Deep convolutional networks are able to learn representations of images, scoring well in tasks such as image classification and object detection. During model training, these networks have the ability to process different input sizes without requiring changes to their architecture. In this project, we would like to investigate the effects that changing input sizes has on these kinds of models. We …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: data attribution, deep learning, machine learning, resource efficiency
Today’s foundation models are trained on vast amounts of data. The quality and size of this data have a huge impact on the accuracy of these models. Selecting the right amount and variety of data for a given task, however, is a resource-intensive process. In this project, we would like to investigate various state-of-the-art data selection mechanisms from a hardware requirements and …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: data selection, deep learning, machine learning, resource efficiency
Traditionally, solid-state drives (SSDs) do not give users the ability to control data placement on the SSD. This often leads to suboptimal performance and lowers SSD lifetime, since SSDs internally don’t allow in-place updates. The updated disk pages are written elsewhere and the old versions have to be garbage collected. This poses problems if data with different lifetimes and …
Supervisors:
Pınar Tözün
Semester: Fall 2024
Tags: SSDs, data management systems, modern storage
In this project, we would specifically like to quantify the data movement savings of applying techniques like compression and model-based data filtering in the context of resource-constrained hardware and edge/IoT applications.
Today, many data sources are small, low-powered, and hardware-constrained devices such as mobile phones, wearable or self-driving smart platforms, etc. Processing the data on …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, ML model updates, tinyML
One of the key challenges with enabling efficient machine learning on resource-constrained devices is keeping the machine learning models deployed on these devices up-to-date without frequent retraining. This requires exploring the impact of different model update mechanisms at the edge.
This project would be suitable as a standalone project or BSc or MSc thesis at ITU during Fall 2024. If you are …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, ML model updates, tinyML
Enabling efficient data processing and machine learning on resource-constrained devices poses many challenges. One is fitting the models into the restrictive memory and compute resources of these devices. In this project, first, we would like to explore the landscape of foundational, generative-AI, language, etc. models with respect to their size and compute needs to understand what could be a fit …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, ML model updates, tinyML
Today, many data sources are small, low-powered, and hardware-constrained devices such as mobile phones, wearable or self-driving smart platforms, etc. Edge computing is a broad term that refers to computations performed on such edge devices. It becomes increasingly important to enable techniques that get more value out of data at the edge rather than always sending the data to a remote and more …
Supervisors:
Pınar Tözün, Robert Bayer
Semester: Fall 2024
Tags: resource-constrained hardware, data management, resource management, tinyML
Observing how well data-intensive systems utilize hardware resources is a crucial preliminary step to improve system performance and reduce hardware waste. To make such observations, one has to collect a lot of monitoring data on hardware behavior through experiments. In our group, we have recently built a framework to aid the management of such monitoring data efficiently, called Resource-Aware …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2023
Tags: benchmarking, data management, data visualization
It is common to process data to clean it, filter it, restructure it, get metadata out of it, etc. before feeding the data into a data analysis or machine learning pipeline. There are many tools and libraries out there to aid this process, with different strengths and functionality (DALI, RAPIDS, HoloClean, DAPHNE, DuckDB, etc.). In this project, we would like to analyze pros/cons of some of …
Supervisors:
Pınar Tözün
Semester: Fall 2022
Tags: data preprocessing libraries, heterogeneous hardware, machine learning
In the past decade, the data management community has focused on main-memory systems or main-memory-optimized systems. This focus has put the commodity memory hierarchy (DRAM and processor caches) at the center when it comes to workload characterization studies. Today, with the evolution of persistent storage technologies such as NVRAM (persistent memory solution of Intel) and NVMe SSDs, data systems …
Supervisors:
Pınar Tözün
Semester: Fall 2022
Tags: workload characterization, tracing, modern storage, data-intensive systems
DAPHNE is an EU project that aims at building a data system targeting integrated data analysis pipelines across data management and processing, high-performance computing (HPC), and machine learning (ML) training and scoring. The project had its first code release back in March. This project aims at adding a profiling infrastructure for the DAPHNE codebase. If you are interested in learning about …
Supervisors:
Pınar Tözün
Semester: Fall 2022
Tags: integrated data analysis pipelines, profiling big data systems
State-of-the-art machine learning models are known to be compute- and power-hungry. On the other hand, modern servers come equipped with really powerful CPU-GPU co-processors. Not all machine learning models are able to use all the available hardware resources on such servers.
Workload collocation is a mechanism to increase hardware utilization when a single workload is not able to utilize all the …
Supervisors:
Pınar Tözün
Semester: Fall 2022
Tags: benchmarking, workload collocation, machine learning
Today, there are many compute- and memory-hungry data-intensive workloads from big data analytics applications to deep learning. These workloads increasingly run on shared hardware resources, which requires building hardware resource managers that can both serve the needs of workloads and utilize hardware well. Predicting the resource utilization of applications can aid such resource managers …
Supervisors:
Pınar Tözün, Ehsan Yousefzadeh-Asl-Miandoab
Semester: Fall 2022
Tags: benchmarking, hardware resource consumption estimation, machine learning
NVMe SSDs are not a uniform class of devices. The IO software stack is not uniform either. Understanding the performance characteristics of new-generation SSDs and the impact of the IO stack on their performance is crucial when determining how to design data-intensive systems. In this project, we would like to characterize the performance of a range of NVMe SSDs (e.g., Samsung Z-SSD, Intel Optane, …
Supervisors:
Pınar Tözün
Semester: Fall 2021
Tags: SSD, benchmarking
A data science infrastructure orchestrates the execution of widely used machine learning frameworks (e.g., TensorFlow, PyTorch) on a heterogeneous set of processing units (e.g., CPU, GPU, TPU, FPGA) while powering an increasingly diverse and complex range of applications (e.g., fraud detection, healthcare, virtual assistance, automatic driving). Understanding the resource consumption …
Supervisor: Pınar Tözün
Semester: Fall 2021
Tags: benchmarking, hardware resource consumption, deep learning frameworks
The variety and complexity of data-intensive applications and systems have been increasing drastically over the past decade. Tasks from a SQL-based big data analytics request running on Apache Spark can be very different from tasks from deep learning training using the TensorFlow framework. Nevertheless, these data-intensive applications increasingly run on shared hardware resources in data centers or …
Supervisor: Pınar Tözün
Semester: Fall 2020
Tags: hardware-software co-design
Improvements in modern processor hardware do not automatically enable more complex and higher quality games, animations, and virtual reality applications. A paradigm shift is necessary when it comes to how we develop these applications in order to exploit the resources of modern hardware (i.e., main memory, multicores) effectively. Unity Technologies has recently developed Data-Oriented …
Supervisor: Pınar Tözün
Semester: Fall 2020
Tags: memory hierarchy, concurrency
Spreading the computation of similar concurrent tasks that have a large instruction footprint over multiple cores via thread migration is shown to improve the instruction cache utilization drastically since it allows instruction re-use across the concurrent tasks. However, thread migrations are costly due to the context switching overhead. To reduce this overhead, recent work mainly proposed …
Supervisor: Pınar Tözün
Semester: Fall 2019
The computer architecture community is moving toward commoditization of hardware specialization instead of general purpose CPUs and more agile hardware development instead of years-long production cycles to enable faster, more energy-efficient, and more cost-effective hardware/software co-designs. This will lead to a disruption in the way we design and maintain the emerging data management systems …
Supervisor: Pınar Tözün
Semester: Fall 2019
Apache SystemML is an open-source platform to run machine learning tasks efficiently thanks to the hardware-conscious query compilation techniques it adopts. It can be run standalone or on top of Apache Spark. It is considered to be state-of-the-art when running machine learning tasks (e.g., in ACM SIGMOD 2017, there were ~5 papers that used SystemML as a comparison point). This project aims at …
Supervisor: Pınar Tözün
Semester: Fall 2019
The popularity of large-scale real-time analytics applications (real-time inventory/pricing, recommendations from mobile apps, fraud detection, risk analysis, IoT, etc.) keeps rising. These applications require distributed data management systems that can handle fast concurrent transactions (OLTP) and analytics on the recent data. Some of them even need to run analytical queries (OLAP) as part of …
Supervisor: Pınar Tözün
Semester: Fall 2019
The Transaction Processing Performance Council (TPC) is a non-profit IT organization founded to define database benchmarks and disseminate objective, verifiable performance data to the industry. TPC has standardized several new benchmarks (e.g., TPCx-HS and TPCx-BB) in recent years. Older popular benchmarks, like TPC-C (representing high-performance transaction processing) and TPC-H (representing …
Supervisor: Pınar Tözün
Semester: Fall 2019
The DISCO-2 satellite will have accelerated machine learning capability based on the inclusion of a Coral TPU ML accelerator module. This will allow images taken by the satellite to be analysed on the satellite using a variety of ML models, with only select images sent back to Earth. This approach allows for more flexibility in image acquisition and saves downlink bandwidth, which is very constrained …
Supervisors:
Julian Priest, Robert Bayer
Semester: Fall 2023
Tags: satellite, ground station, software defined radio, automation, csp
This is not a single project, but rather a larger cluster of potential projects in the field of what could be summarized as extreme networking.
The networks we are interested in are typically wireless, and can be extreme in different senses of the word:
distance - hundreds of kilometers terrestrial, 10,000s of km to satellite
latency - sub-ms latencies
autonomy - off-grid
quality - extreme remote …
Supervisors:
Sebastian Büttrich
Semester: Fall 2024
Tags: network, IoT, LoRa, LoRaWAN, satellites
LoRa is a long-range, low-bandwidth networking protocol widely used in Internet of Things projects, sensor networks, low power, low cost and embedded systems. LoRa’s encoding scheme allows for extremely long distance communications with small power usage and small simple antennas. This combination of features has made it attractive to small satellite operators flying cubesats and LoRa is now …
Supervisors:
Sebastian Büttrich
Semester: Fall 2024
Tags: satellites, LoRa, cubesat, IoT, embedded, electronics
LoRa is a long-range, low-bandwidth networking protocol widely used in Internet of Things projects, sensor networks, low power, low cost and embedded systems. LoRa’s encoding scheme allows for extremely long distance communications with small power usage and small simple antennas. This combination of features has made it attractive to small satellite operators flying cubesats and LoRa is now …
Supervisors:
Sebastian Büttrich
Semester: Fall 2024
Tags: IoT, LoRa, LoRaWAN, satellites
Optical fiber is the backbone of the internet’s communication, e.g. in the form of submarine fiber cables. It can also be employed as a sensor device, by means of combined opto-acoustic methods such as Distributed Acoustic Sensing (DAS) or State of Polarisation (SoP) sensing. Fiber is capable of sensing all kinds of vibrational/acoustic events, from animal sounds over seismic activity to …
Supervisors:
Sebastian Büttrich
Semester: Fall 2024
Tags: fiber, acoustics, audio, machine learning, DAS, SOP
The Danish Student Cubesat Program is an inter-university collaboration that will launch 3 cubesats into Low Earth Orbit over the next 4 years. The satellites will be designed, operated, programmed and built by students and the project offers an opportunity for Master’s students to take part in a live satellite project. ITU is partnering with Aarhus University on DISCOSAT2 which will be an …
Supervisors:
Sebastian Büttrich, Julian Priest
Semester: Fall 2021
Tags: Satellite, Cubesat, Image processing, Machine Learning, edge, constrained computing
Invasive bird species can be a serious problem in cities, towns and in agriculture. The common pigeon is a very unwelcome guest on many balconies, roofs, and terraces. Conventional scarecrows often show no effect, as these birds are known to be quite intelligent and capable of learning fast. The idea is to build a sensor/camera-enhanced scarecrow that - can recognize birds present within its …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: IoT, ML, machineLearning, sensors, security
For this project, you would be working with a partner company who are looking to re-establish wood as a building material for sustainable architecture, and thus are using sensors for quality control - to detect damages and deterioration in buildings. Wood such as timber may be analyzed by non-intrusive acoustic impact testing and subsequent waveform analysis, and the expectation is that machine …
Supervisors:
Sebastian Büttrich
Semester: Fall 2021
Tags: IoT, sensors, machine learning, acoustics
In LoRaWAN networks such as The Things Network, long distance transmissions, well beyond the limitations of line of sight in terrestrial geometry, are frequently observed. Tropospheric effects are seen as responsible for bending or guiding radio waves around the earth curvature. As an example, under the right weather conditions, the LoRaWAN gateway at ITU may collect packets from northern Germany, …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: IoT, LoRaWAN, LPWAN, satellite, networks, troposphere, weather
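To make "beyond the limitations of line of sight" in the tropospheric propagation project above concrete, the smooth-earth radio horizon can be estimated from antenna heights. The sketch below is only an illustration and not part of the project description: the function names and any antenna heights passed in are assumptions, and the effective-earth-radius factor k = 4/3 is the conventional value for a standard atmosphere. Received packets from well beyond this distance are candidates for the tropospheric effects the project studies.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def radio_horizon_km(antenna_height_m, k_factor=4.0 / 3.0):
    """Distance to the radio horizon for one antenna over smooth earth.

    k_factor scales the Earth radius to account for standard atmospheric
    refraction (k = 4/3); k = 1 gives the purely geometric horizon.
    """
    if antenna_height_m < 0:
        raise ValueError("antenna height must be non-negative")
    effective_radius_km = k_factor * EARTH_RADIUS_KM
    height_km = antenna_height_m / 1000.0
    return math.sqrt(2.0 * effective_radius_km * height_km)


def max_line_of_sight_km(tx_height_m, rx_height_m, k_factor=4.0 / 3.0):
    """Longest smooth-earth line-of-sight path between two antennas."""
    return (radio_horizon_km(tx_height_m, k_factor)
            + radio_horizon_km(rx_height_m, k_factor))


def is_beyond_line_of_sight(distance_km, tx_height_m, rx_height_m):
    """True if a received packet travelled further than the k = 4/3 horizon."""
    return distance_km > max_line_of_sight_km(tx_height_m, rx_height_m)
```

For example, `is_beyond_line_of_sight(300.0, 30.0, 2.0)` asks whether a hypothetical 300 km link between a 30 m gateway antenna and a 2 m node exceeds the standard-atmosphere horizon. Terrain and obstacles are ignored in this estimate.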
Recent progress in LoRaWAN development has made a new generation of satellite communications offerings available to IoT devices. In these, the LoRaWAN gateway is satellite-borne and collects data from small inexpensive ground stations. So far, this is predominantly seen as a means of communication for remote sensor data, e.g. in agriculture, logistics or wildlife monitoring. However, one can …
Supervisors:
Sebastian Büttrich
Semester: Fall 2021
Tags: IoT, LoRaWAN, LPWAN, satellite, networks, edge, security
There is currently a lot of progress in really small, yet powerful visual machine learning / computer vision, on hardware like the OpenMV Cam H7, Arduino Portenta Vision Shield, Luxonis LUX-ESP32, Himax WE-I Plus, Arducam Pico4ML, and Raspberry Pi, and on software platforms such as TinyML or OpenMV IDE.
While many popular use cases stem from fields like traffic analysis, wildlife monitoring, we …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: IoT, sensors, machine learning, computer vision
The Things Network Stack v3 for LoRaWAN is an open source LoRaWAN network stack suitable for large, global and geo-distributed public and private networks as well as smaller networks. The architecture follows the LoRaWAN Network Reference Model for standards compliance and interoperability. - https://github.com/TheThingsNetwork/lorawan-stack This stack, currently in pre-rollout testing, however …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
New types of networks such as LoRaWAN and Sigfox enable us to deploy inexpensive mobile Air Quality Sensors, e.g. on bicycles or vehicles, boats or trains, and help map and understand urban pollution. Such data would be valuable for e.g. the international citizen science project Luftdaten https://luftdaten.info/en/home-en/ in which we are participating. We have an ongoing collaboration with the …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Can we combine satellite data with terrestrial and marine sensor data to study their correlation and benefit our understanding of environmental processes, in both urban and agricultural/rural contexts? Potential collaboration with several organizations in Kenya and the Orkney Islands.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Not far from the IT University, citizens are building a “Small Smart City”, aiming to measure and analyze e.g. water and resource consumption. Help build the Small Smart City!
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Survey of potential sensor modalities (sounds, ultrasounds, vibrations) and related work (e.g., wind turbines)
Starting with sound: Piezo contact mics/transducers, MEMS sensors
Characterization of state based on known signatures (classification problem)
Characterization of state transitions (HMM)
Experimentation on Coffee machine/Blender/3D printer at PitLab
1..k sensors; Local/cloud-based …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Novo Nordisk
Survey of potential sensor modalities (IR temp sensor, thermal imager) and related work
Starting with consumer USB cams or phone cams to generate series of images
Characterization of state based on known signatures (classification problem)
Characterization of state transitions (HMM)
Experimentation on Coffee machine/Blender/3D printer at PitLab
1..k sensors; Local/cloud-based processing. …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Novo Nordisk
Most data collection in IoT does not critically depend on latency or speed from data collection to data analytics. Occasionally though we meet tasks that would benefit from near-realtime features, such as collection of wave and tidal dynamics around marine energy infrastructures. This project explores the limits of speed by bringing together a LoRa PHY, a LoRaWAN gateway, LoRaWAN stack, ultrafast …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
In collaboration with the IoT Lab at Computer Science Dept at Kathmandu University, Nepal, we are developing a potential service for tracking trekkers, i.e. offering a security service for tourists trekking the Himalayas, in particular Mt. Everest. This service very critically depends on having a robust hardware component, the actual GPS/GNSS tracker. Requirements with respect to battery life, …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Wind turbine electricity production data is sensitive for Energinet (and for the wind turbine producers). Energinet would like to publish wind turbine electricity production data sets that can be used to train relevant models and to develop innovative applications, without giving away sensitive data. The goal of the project is to explore various data publishing methods for that purpose.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, data publication
Energinet has a model that describes the electricity production of a given wind turbine given wind conditions. The current model based on kNN is trained with DMI weather data and historical electricity production data for the wind turbine. The goal of the project is to improve the current model with lifelong learning, extended weather data and different models for a range of different wind …
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Data Analysis
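As a rough illustration of the kind of kNN model mentioned in the project above, the following sketch predicts production for a new set of wind conditions by averaging the production of the k most similar historical observations. It is not Energinet's model: the feature layout, the function name `knn_predict` and the choice of Euclidean distance are all assumptions made for the example.

```python
import math


def knn_predict(history, conditions, k=3):
    """Predict production as the mean of the k nearest historical samples.

    history:    list of (features, production) pairs, where features is a
                tuple of numbers (hypothetically wind speed, temperature, ...)
    conditions: feature tuple for which a prediction is wanted
    """
    if not history:
        raise ValueError("history must contain at least one sample")
    if k < 1:
        raise ValueError("k must be at least 1")
    # Sort historical samples by Euclidean distance to the query conditions.
    nearest = sorted(history, key=lambda row: math.dist(row[0], conditions))[:k]
    return sum(production for _, production in nearest) / len(nearest)
```

In practice the features would need to be scaled to comparable ranges before computing distances, and circular quantities such as wind direction would need a different distance measure.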
The goal of the project is to explore the accuracy of electricity production predictions based on historical data and weather predictions. This may be tackled as a sequence prediction problem using recurrent neural networks. The long-term goal is to incorporate wind turbines in the reserve market for electricity.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Forecasting, Machine Learning, Deep Learning
The goal of the project is to explore new ways of gathering data about wind turbines as well as local wind/weather conditions. To this end, sound/vibration-based and/or image-based instrumentation as well as innovative experiments such as balloons and lightweight weather stations might be considered.
Supervisors:
Philippe Bonnet, Sebastian Büttrich
Semester: Fall 2019
Tags: Wind Energy, Energinet, Instrumentation, Sensors
Deliberately scoped very wide, this group contains a number of projects in different possible directions, from
Location services via LPWAN time-of-flight and GPS/GNSS,
Vessel tracking and management in fisheries, tourism and logistics,
Water quality and chemistry sensing for Aquaculture, specifically Mariculture,
Wave and tidal dynamics, e.g. in energy harvesting
and variations/combinations of …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: Satellite, Image processing, Machine Learning, edge, constrained computing, IoT, sensors, location
swarm.space is a commercial company providing low-bandwidth satellite connectivity using ultra-small (quarter cubesat unit) satellites in a low orbit. Swarm satellites cover every point on Earth, enabling IoT devices to affordably operate in any location. Swarm uses a form of LoRa network.
While ground terminals for satellite networks traditionally were both big and expensive, modems and antennas …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: Satellite, IoT
LoRa is a relatively new long-range low-power standard allowing battery-powered pocket-size nodes to transmit over 100s to 1000s of kilometers. ThingSat is a CubeSat communication payload consisting of an electronic board with several LoRa transmitters and a patch antenna operating at 868 MHz and 2.4 GHz. It is a guest payload of a shared 3U CubeSat.
Available projects under this platform include …
Supervisors:
Sebastian Büttrich
Semester: Fall 2022
Tags: Satellite, IoT
Observing how well machine learning systems utilize hardware resources is a crucial preliminary step to improve system performance and reduce hardware waste. To make such observations, one has to collect a lot of monitoring data on hardware behavior through experiments. In our group, we have recently built a framework to help manage such monitoring data efficiently, called Resource-Aware …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: benchmarking, data management, data visualization
Deep convolutional networks are able to learn representations of images, scoring well in tasks such as image classification and object detection. During model training, these networks have the ability to process different input sizes without requiring changes to their architecture. In this project, we would like to investigate the effects that changing input sizes has on these kinds of models. We …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: data attribution, deep learning, machine learning, resource efficiency
Today’s foundation models are trained on vast amounts of data. The quality and size of this data have a huge impact on the accuracy of these models. Selecting the right amount and variety of data for a given task, however, is a resource-intensive process. In this project, we would like to investigate various state-of-the-art data selection mechanisms from a hardware requirements and …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2024
Tags: data selection, deep learning, machine learning, resource efficiency
Observing how well data-intensive systems utilize hardware resources is a crucial preliminary step to improve system performance and reduce hardware waste. To make such observations, one has to collect a lot of monitoring data on hardware behavior through experiments. In our group, we have recently built a framework to help manage such monitoring data efficiently, called Resource-Aware …
Supervisors:
Pınar Tözün, Ties Robroek
Semester: Fall 2023
Tags: benchmarking, data management, data visualization
Spectral learning priority is a useful tool for analyzing a model’s focus during training; it describes how a model may understand a given image from the spectrum perspective. For example, to distinguish cats from tortoises, learning to recognize their shapes would be enough; such an embedding will result in higher learning priority at low frequencies representing shapes, while learning to …
Supervisors:
Yucheng Lu, Veronika Cheplygina
Semester: Fall 2024
Tags: Spectral analysis, Image classification, Medical imaging
It has been observed that deep learning models are able to identify patient characteristics such as age, sex, and self-reported race with high accuracy from medical images such as chest x-ray recordings, even when medical doctors cannot. This raises the potential for such models to learn to (falsely) diagnose patients of different demographics differently, even if they present with the same …
Supervisors:
Amelia Jiménez-Sánchez, Eike Petersen, Veronika Cheplygina
Semester: Fall 2024
Tags: machine learning, data science, medical imaging
There is pressure on hospitals to implement AI systems which promise to improve diagnoses and save time for the doctors. One use-case could be related to the automation of protocoling based on a physician referral. Currently, this requires a referral letter from a physician who has examined a patient and judges that there is a need for additional imaging studies. In this case, the physician …
Supervisors:
Veronika Cheplygina
Semester: Fall 2023
Tags: machine learning, medical imaging, data analysis
Machine learning models, especially larger models that are used on, for example, image or text datasets, can be expensive to train. During development, models are usually trained multiple times, for example to optimize hyperparameters, which can result in a large carbon footprint.
This project focuses specifically on medical data. There are some recent efforts, for example by Selvan et …
Supervisors:
Veronika Cheplygina
Semester: Fall 2023
Tags: machine learning, medical imaging, data analysis, resource consumption
Machine learning is used extensively in different applications, including medical imaging and natural language processing. As different types of data are involved, it is reasonable to assume that different methods are needed for each application. However, there are also opportunities in translating a method successful in one application, to the other application where it is not widely used.
The …
Supervisors:
Veronika Cheplygina
Semester: Fall 2021
Tags: machine learning, natural language processing, medical imaging, literature review
Machine learning algorithms for skin lesion classification typically learn from images which have been labeled as malignant (for example, melanoma) or not. Such tasks can still suffer from overfitting due to limited dataset size. In other computer vision tasks, crowdsourcing labels has been effective, but the average person typically does not have the background to classify skin lesions. However, …
Supervisors:
Veronika Cheplygina
Semester: Fall 2021
Tags: machine learning, medical imaging, crowdsourcing, similarity
Spectral learning priority is a useful tool for analyzing a model’s focus during training; it describes how a model may understand a given image from the spectrum perspective. For example, to distinguish cats from tortoises, learning to recognize their shapes would be enough; such an embedding will result in higher learning priority at low frequencies representing shapes, while learning to …
Supervisors:
Yucheng Lu, Veronika Cheplygina
Semester: Fall 2024
Tags: Spectral analysis, Image classification, Medical imaging
The DISCO-2 project is driven by students and aims to develop and deploy a 3-unit CubeSat into low Earth orbit. Its mission focuses on conducting Earth observations over Greenland and supporting various research objectives. The satellite has three cameras onboard: infrared, wide-angle, and standard (main camera). Due to the limitations of the imaging hardware and the challenging conditions on the …
Supervisors:
Yucheng Lu, Julian Priest
Semester: Fall 2024
Tags: Image enhancement, Image processing, Machine learning
Are you interested in working with a big data open source project?
You are welcome to conduct your thesis/project in Apache Wayang. Apache Wayang is the first cross-platform framework that allows users to specify their task/query in a system-agnostic manner; Wayang then determines which system(s) are best suited to execute this task, with the goal of optimizing performance. For a general overview …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: big data, database, cross-platform data processing, open source, Apache
Knowledge graphs (KGs) are extensively used in many application domains, such as search engines, product recommendation, and bioinformatics. Knowledge graph completion (a.k.a. link prediction), i.e., the task of inferring missing information from knowledge graphs, is a widely used task in the above applications. This project will investigate how to loosely couple the data-driven power of knowledge …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: knowledge graph, LLMs, reasoning
Are you interested in working with a big data open source project and helping the environment?
You are welcome to conduct your thesis/project in Apache Wayang. Apache Wayang is the first cross-platform framework that allows users to specify their task/query in a system-agnostic manner; Wayang then determines which system(s) are best suited to execute this task, with the goal of optimizing performance. …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: big data, database, cross-platform data processing, open source, Apache
Query optimization is crucial for any data management system to achieve good performance. Recent advancements in AI have led academia and industry to investigate learning-based techniques in query optimization. In particular, many works propose replacing the cost model used during plan enumeration with a machine learning model that estimates the runtime of a plan. However, to build such a model …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: machine learning, training data, query optimizer
Query optimization is crucial for any data management system to achieve good performance. Recent advancements in AI have led academia and industry to investigate learning-based techniques in query optimization. In particular, many works propose replacing the cost model used during plan enumeration with a machine learning model (typically a regression model) that estimates the runtime of a query …
Supervisors:
Zoi Kaoudi
Semester: Fall 2024
Tags: machine learning, database, query optimization, ranking
(This project will be carried out in collaboration with Xilinx Research Labs in Dublin)
Machine Learning operators are increasingly used in data management systems and, in this project, we will explore the challenges and benefits of integrating inference operators from FINN [1] within a so-called Smart Storage system [2]. Both the inference and data management aspects will be …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: FPGA, Data Management, MachineLearning
(This topic is going to be co-supervised by Bernardo Machado David [http://www.bmdavid.com/])
Database systems managing private data may leak sensitive information when queries are done in the clear, even if the data itself is encrypted. A recent line of research has looked into combining database engines supporting standard SQL queries with techniques for secure Multiparty Computation (MPC), …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: Theoretical Computer Science, Data Management, Security and Privacy
Consensus mechanisms for ensuring consistency are some of the most expensive operations in managing large amounts of data. Often, there is a trade-off that involves reducing the coordination overhead at the price of accepting possible data loss or inconsistencies. As the demand for more efficient data centers increases, it is important to provide better ways of ensuring consistency without …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: Benchmarking, Distributed Systems
Given a private database that I can access only through specific queries, there is still a lot I can learn about its entries [1]. Differential Privacy (DP) tackles this: letting me learn the (approximate) result of complex queries on a database, but preventing me from learning much about its specific entries. The basic approach of DP often boils down to: “apply a privacy-preserving transformation T …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: Theoretical Computer Science, Data Management, Security and Privacy
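For readers new to Differential Privacy, the idea of answering a query approximately while hiding individual entries can be sketched with the Laplace mechanism applied to a counting query. This is one standard DP mechanism, chosen here only for illustration; it is not necessarily the transformation the project above has in mind, and the function names are hypothetical. A counting query has sensitivity 1 (adding or removing one record changes the count by at most 1), so Laplace noise with scale 1/epsilon suffices.

```python
import random


def laplace_noise(scale, rng=random):
    """Sample Laplace(0, scale) noise.

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=random):
    """Answer "how many records satisfy predicate?" with epsilon-DP noise."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    sensitivity = 1.0  # one record changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)
```

A smaller epsilon gives stronger privacy but noisier answers. Passing a seeded `random.Random` instance as `rng` makes runs reproducible for experiments, though a real deployment would need a cryptographically secure noise source.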
Blockchains are often used synonymously with crypto-currencies and unspent transaction output (UTXO) data models, but there are emerging blockchain platforms that offer a more general data model and smart contracts that can manipulate this data freely (e.g. Hyperledger Fabric [1]). As such, these platforms resemble in many ways distributed databases, storing a collection of records, organized as …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: Blockchain, Data Management, Benchmarking
Modern data analytics systems are composed of two types of nodes: compute and storage (e.g., Amazon S3, Redis, MongoDB, etc.). The storage nodes typically offer a key-value interface and are often used to store data encoded in a columnar format (e.g., Parquet files). Due to growing data sizes in datacenters, there is an increasing interest in using specialized hardware devices, namely Field …
Supervisors:
Zsolt István
Semester: Spring 2021
Tags: FPGA, Hardware-software Co-design, Security and Privacy