Current Project Topics (updated August 2019)

Projects with …

Philippe Bonnet

Projects with our partner Energinet [PB, SB]

1. Wind turbine electricity production analysis

Energinet has a model that describes the electricity production of a given wind turbine given wind conditions. The current model based on kNN is trained with DMI weather data and historical electricity production data for the wind turbine. The goal of the project is to improve the current model with lifelong learning, extended weather data and different models for a range of different wind turbines. Testing models in reverse, it should also be possible to derive wind conditions from a turbine’s electricity production.
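
As an illustration of the kNN approach mentioned above, the following minimal Python sketch predicts production as the average of the k closest historical samples. The function name knn_predict, the single wind-speed feature and the sample values are assumptions made for this example; Energinet's actual model and features are not described here.

```python
from typing import Sequence, Tuple

# Hypothetical sample layout: (wind_speed_m_s, production_kw).
Sample = Tuple[float, float]


def knn_predict(history: Sequence[Sample], wind_speed: float, k: int = 3) -> float:
    """Predict production as the mean of the k nearest historical samples."""
    if k <= 0 or k > len(history):
        raise ValueError("k must be between 1 and len(history)")
    # Sort samples by distance to the queried wind speed and keep the k closest.
    nearest = sorted(history, key=lambda s: abs(s[0] - wind_speed))[:k]
    return sum(production for _, production in nearest) / k


# Illustrative, made-up observations (not Energinet data).
HISTORY = [(3.0, 50.0), (5.0, 200.0), (6.0, 320.0), (8.0, 700.0), (12.0, 1500.0)]

if __name__ == "__main__":
    print(knn_predict(HISTORY, wind_speed=5.5, k=2))
```

Swapping the two fields of each sample gives a crude way to run the same estimator in reverse, although the mapping from production back to wind speed need not be unique.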

2. Wind turbine electricity production prediction

The goal of the project is to explore the accuracy of electricity production predictions based on historical data and weather predictions. This may be tackled as a sequence prediction problem using recurrent neural networks. The long-term goal is to incorporate wind turbines in the reserve market for electricity.

3. Wind turbine non-invasive instrumentation

The goal of the project is to explore new ways of gathering data about wind turbines as well as local wind/weather conditions. To this end, sound/vibration-based and/or image-based instrumentation, as well as innovative experiments such as balloons and lightweight weather stations, might be considered.

4. Wind turbine data publication

Wind turbine electricity production data is sensitive for Energinet (and for the wind turbine producers). Energinet would like to publish wind turbine electricity production data sets that can be used to train relevant models and to develop innovative applications, without giving away sensitive data. The goal of the project is to explore various data publishing methods for that purpose.

Projects with our partner Novo Nordisk [PB, SB]

A / Sound-based Predictive Maintenance

Survey of potential sensor modalities (sounds, ultrasounds, vibrations) and related work (e.g., wind turbines)
- Starting with sound: Piezo contact mics/transducers, MEMS sensors
Characterization of state based on known signatures (classification problem)
Characterization of state transitions (HMM)
Experimentation on Coffee machine/Blender/3D printer at PitLab
- 1..k sensors; Local/cloud-based processing.
Experimentation on roller compactor/tableting machines at NN
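
The characterization of state transitions with an HMM, listed above, can be sketched as Viterbi decoding over a sequence of classified sound observations. This is only a minimal sketch: the state names, observation symbols and all probabilities below are made-up assumptions, not measurements from any machine.

```python
import math
from typing import Dict, List, Sequence


def viterbi(
    obs: Sequence[str],
    states: Sequence[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> List[str]:
    """Return the most likely hidden state sequence for the observations."""
    if not obs:
        return []

    def log(p: float) -> float:
        return math.log(p) if p > 0 else float("-inf")

    # Log-probability of the best path ending in each state at the first step.
    score = {s: log(start_p[s]) + log(emit_p[s][obs[0]]) for s in states}
    back: List[Dict[str, str]] = []
    for o in obs[1:]:
        prev, score, pointers = score, {}, {}
        for s in states:
            best = max(states, key=lambda r: prev[r] + log(trans_p[r][s]))
            score[s] = prev[best] + log(trans_p[best][s]) + log(emit_p[s][o])
            pointers[s] = best
        back.append(pointers)
    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical machine states and sound classes with invented probabilities.
STATES = ["idle", "running", "faulty"]
START_P = {"idle": 0.8, "running": 0.15, "faulty": 0.05}
TRANS_P = {
    "idle": {"idle": 0.7, "running": 0.29, "faulty": 0.01},
    "running": {"idle": 0.1, "running": 0.8, "faulty": 0.1},
    "faulty": {"idle": 0.05, "running": 0.05, "faulty": 0.9},
}
EMIT_P = {
    "idle": {"quiet": 0.9, "hum": 0.09, "rattle": 0.01},
    "running": {"quiet": 0.05, "hum": 0.85, "rattle": 0.1},
    "faulty": {"quiet": 0.05, "hum": 0.25, "rattle": 0.7},
}

if __name__ == "__main__":
    print(viterbi(["quiet", "hum", "hum", "rattle", "rattle"],
                  STATES, START_P, TRANS_P, EMIT_P))
```

In a real experiment the observation symbols would come from the signature classifier, and the probabilities would be estimated from recorded data rather than chosen by hand.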

B / Temperature-based Predictive Maintenance

Survey of potential sensor modalities (IR temp sensor, thermal imager) and related work
- Starting with consumer USB cams or phone cams to generate series of images
Characterization of state based on known signatures (classification problem)
Characterization of state transitions (HMM)
Experimentation on Coffee machine/Blender/3D printer at PitLab
- 1..k sensors; Local/cloud-based processing.
Experimentation on roller compactor/tableting machines at NN

Computational Storage [PB]

Offloading processing to storage is a means to avoid data movement and thus deal efficiently with very large volumes of stored data. In the 90s, there were pioneering efforts to develop Processing-in-Memory as well as Active Disks. We are considering data stored on Open-Channel SSDs with a programmable storage controller (i.e., a Linux-based ARM processor) integrated into a network switch (e.g., Broadcom Stingray or NXP LS2). Thesis topics include (1) the design/implementation and evaluation of a prototype key-value store running on the storage controller, (2) the design/implementation/evaluation of a 100Gb Ethernet-based RPC connection between host and storage controller, and (3) the development of a new recovery scheme for a user-space Flash Translation Layer embedded on the storage controller.

Database Performance Characteristics [PB]

Characterize the performance of commercial database systems on an NVIDIA Titan GPU, or characterize the performance of DB2 PureScale on a cluster equipped with shared storage with a range of different benchmarks. Design and conduct experiments with a range of tuning strategies to measure their impact on performance and reliability.

Storage System Performance Characteristics [PB]

New forms of Solid State Drives have interesting characteristics in terms of performance (10 to 100x faster than previous generations of SSDs) and in terms of functionalities (SSDs can now suspend the execution of write or erase operations to minimize read latency). The performance characteristics of these devices are not yet well understood. The topic of this thesis is to design and conduct experiments to fully characterize the performance of such SSDs.

FPGA-based Hardware Acceleration [PB]

Field Programmable Gate Arrays are now an integral part of public cloud infrastructures. You can for example run customized FPGA instances on AWS. A project focuses on FPGA-based hardware acceleration at the level of an SSD Flash Translation Layer, at the level of a Database storage manager or at the level of the database client. You will be able to experiment with FPGAs in the lab and on AWS.

Decentralized Cloud Management [PB]

In the context of the Orkney Cloud project, we are preparing the deployment of a decentralized cloud infrastructure on the archipelago. The infrastructure is composed of a collection of Pods (point of delivery) and a wireless core (5G + Wi-Fi). Each Pod is equipped with storage, computing and communication components (so that it is connected to the core and to local endpoints). Each Pod is powered directly by a renewable source, via a power conditioner, and is thus equipped with batteries. It is necessary to assume that such Pods are intermittently powered and connected to the core. This thesis focuses on the design and implementation of prototype Pods in the lab, followed by a deployment in Orkney. We also consider Pods as gateways to sensor nodes. Theses around this topic include projects on time-sensitive networking and projects on mobile gateways connecting to Pods.

Mobile Air Quality Monitoring [SB, PB]

New types of networks such as LoRaWAN and Sigfox enable us to deploy inexpensive mobile Air Quality Sensors, e.g. on bicycles or vehicles, boats or trains, and help map and understand urban pollution. Such data would be valuable for e.g. the international citizen science project Luftdaten https://luftdaten.info/en/home-en/ in which we are participating. We have an ongoing collaboration with the Asian Institute of Technology, Bangkok, experience in collaborating with the Institute for Environmental Science, and a concrete request from a municipality here in Denmark.

A GUI for The Things Network LoRaWAN stack v3 [SB]

The Things Network Stack v3 for LoRaWAN is an open source LoRaWAN network stack suitable for large, global and geo-distributed public and private networks as well as smaller networks. The architecture follows the LoRaWAN Network Reference Model for standards compliance and interoperability (see https://github.com/TheThingsNetwork/lorawan-stack). However, this stack, currently in pre-rollout testing, has no web GUI yet and is administered solely via a CLI. This project aims at building a user-friendly, inclusive and innovative web GUI, with hooks to new informative features such as node and gateway statistics, location services and others. An example of such additional features is the gateway statistics made available by https://ttngw.rexfue.de/Copenhagen. This project is largely a full stack web development project, ambitious in that it combines the need for understanding UI development with knowledge of basic Linux, Docker, databases, CLI features and LoRaWAN architecture.

The TrekkTracker [SB]

In collaboration with the IoT Lab at Computer Science Dept at Kathmandu University, Nepal, we are developing a potential service for tracking trekkers, i.e. offering a security service for tourists trekking the Himalayas, in particular Mt. Everest. This service very critically depends on having a robust hardware component, the actual GPS/GNSS tracker. Requirements with respect to battery life, temperature, physical environment are extremely challenging, and need to be tackled in a combination of hardware and software optimization. The final product would be a miniature location sensor with additional environmental data sensors and explicit interaction features, such as simple buttons for confirming status. This project combines hardware engineering with embedded systems aspects.

The ms IoT gateway [PB, SB]

Most data collection in IoT does not critically depend on latency or speed from data collection to data analytics. Occasionally, though, we meet tasks that would benefit from near-realtime features, such as collection of wave and tidal dynamics around marine energy infrastructures. This project explores the limits of speed by bringing together a LoRa PHY, a LoRaWAN gateway, a LoRaWAN stack, ultrafast storage and near-data processing in one physical unit. This project combines hardware, software and network aspects of embedded computing.

Small Smart City [SB, PB]

Not far from the IT University, citizens are building a “Small Smart City”, aiming to measure and analyze e.g. water and resource consumption. Help build the Small Smart City!

Satellite Data [SB, PB]

Can we combine satellite data with terrestrial and marine sensor data, to study their correlation, and benefit our understanding of environmental processes, both in urban and agricultural/rural contexts? Potential collaboration with several organizations in Kenya and the Orkney islands.

Pinar Tözün

Leveraging Heterogeneous Hardware [PT]

The computer architecture community is moving toward commoditization of hardware specialization instead of general purpose CPUs and more agile hardware development instead of years-long production cycles to enable faster, more energy-efficient, and more cost-effective hardware/software co-designs. This will lead to a disruption in the way we design and maintain the emerging data management systems as well. As the heterogeneity of hardware resources increases, it becomes essential for the data management systems to decide on the optimal design options based on the processor types they are running on. This project targets identifying the granularity of data management tasks for different workloads that can be offloaded from a general purpose CPU to specialized hardware (e.g., GPU, FPGA) or low-power cores (e.g., ARM), and figuring out how to perform this offloading efficiently. This would be split into several sub-projects targeting specific workloads and hardware types.

What is HTAP? [PT]

The popularity of large-scale real-time analytics applications (real-time inventory/pricing, recommendations from mobile apps, fraud detection, risk analysis, IoT, etc.) keeps rising. These applications require distributed data management systems that can handle fast concurrent transactions (OLTP) and analytics on the recent data. Some of them even need to run analytical queries (OLAP) as part of transactions. Efficient processing of individual transactional and analytical requests, however, leads to different optimizations and architectural decisions while building a data management system. For the kind of data processing that requires both analytics and transactions, Gartner recently coined the term Hybrid Transactional/Analytical Processing (HTAP). Many HTAP solutions that target these new applications are emerging from both industry and academia. However, there is no standard set of capabilities that all of these systems support. The goal of this project is to understand the HTAP landscape and develop a benchmark suite that would be representative of the different sets of use cases that fall under the HTAP umbrella.

Workload Characterization for Big Data Management [PT]

The Transaction Processing Performance Council (TPC) is a non-profit IT organization founded to define database benchmarks and disseminate objective, verifiable performance data to the industry. TPC has standardized several new benchmarks (e.g., TPCx-HS and TPCx-BB) in recent years. Older popular benchmarks, like TPC-C (representing high-performance transaction processing) and TPC-H (representing traditional analytical processing), are not suitable to explain the behavior of the emerging big data applications (with heavy ingest rates and complex analytical queries involving machine learning) on modern hardware. While the behavior of TPC-C and TPC-H on commodity multicore servers has been heavily studied, the behavior of the newer benchmarks is still a mystery to many people in the database community. A comprehensive workload characterization of these new benchmarks is crucial to understand in more detail how they differ from the older benchmarks and what they require from the data management systems and hardware.

Micro-architectural Analysis of SystemML [PT]

Apache SystemML is an open-source platform to run machine learning tasks efficiently thanks to the hardware-conscious query compilation techniques it adopts. It can be run standalone or on top of Apache Spark. It is considered to be state-of-the-art when running machine learning tasks (e.g., at ACM SIGMOD 2017, there were ~5 papers that used SystemML as a comparison point). This project aims at understanding how efficiently SystemML utilizes the resources of commodity server hardware, and how this differs from some other widely used systems for running machine learning (e.g., Apache Spark MLlib).

Efficient OS-level Context-Switching for Thread Migration [PT]

Spreading the computation of similar concurrent tasks that have a large instruction footprint over multiple cores via thread migration has been shown to improve the instruction cache utilization drastically, since it allows instruction re-use across the concurrent tasks. However, thread migrations are costly due to the context switching overhead. To reduce this overhead, recent work has mainly proposed techniques at the hardware level. However, developing techniques at the level of the OS might be more effective in terms of the adoption of such thread migration mechanisms. The goal of this project is to investigate how to implement lightweight thread migration at the OS level, targeting data management workloads.

Iman Elghandour

Predicting Execution Times of Queries Executed on Accelerated Distributed Platforms

In the last few years, it has become common to accelerate Hadoop and Spark by enabling them to execute tasks and jobs on accelerators such as GPUs and FPGAs. The objective of this master thesis is to study new approaches that efficiently predict the execution time of Spark tasks and jobs executed on GPUs. Part of the work will be to build a performance prediction model for GPUs, which can be built using machine learning techniques among others.

Deliverables of the master thesis project

An overview of Spark applications and how they are divided into tasks.
A study of the architecture of GPUs and the main factors that affect the performance of code executed using them.
An implementation of a performance prediction model for Spark tasks executed on GPUs.
An experimental validation of the developed model(s).
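
As a rough illustration of what a performance prediction model could look like in its simplest form, the sketch below fits a straight line (fixed overhead plus per-megabyte cost) to measured task times by ordinary least squares. The single input-size feature, the function names and the sample measurements are assumptions made for this example; a thesis would likely use richer features and other learning techniques.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit of y = intercept + slope * x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict_seconds(model: Tuple[float, float], input_mb: float) -> float:
    """Predict the execution time of a task from its input size."""
    intercept, slope = model
    return intercept + slope * input_mb


# Made-up measurements: input size in MB and observed task time in seconds.
SIZES_MB = [1.0, 2.0, 4.0, 8.0]
TIMES_S = [0.75, 1.0, 1.5, 2.5]

if __name__ == "__main__":
    model = fit_line(SIZES_MB, TIMES_S)
    print(model, predict_seconds(model, 16.0))
```

Here the intercept can be read as a fixed per-task overhead and the slope as the cost per megabyte, which is one simple way to separate launch and transfer costs from compute costs.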

Extending Spark Scheduler for Heterogeneous Clusters

Spark assumes that it executes its applications on a homogeneous cluster of similar nodes. However, it is becoming common that in-house clusters have heterogeneous compute resources, and it is good to exploit all of them in the most efficient way. The objective of this master thesis is to extend the Spark scheduler to be resource-aware and to efficiently schedule Spark tasks on all the available resources in the cluster.

Deliverables of the master thesis project

An overview of Spark applications and how they are divided into tasks.
A study of Spark scheduler.
An implementation to extend the Spark scheduler to account for heterogeneous computing resources in the Spark cluster.
An experimental validation of the developed scheduler.
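
To make the idea of resource-aware scheduling concrete, here is a minimal standalone sketch of a greedy policy that places each task on the node where it would finish earliest. It does not use or extend Spark's scheduler API; the Node class, the relative speed factors and the task sizes are hypothetical and only illustrate the principle.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Node:
    name: str
    speed: float  # relative throughput; 1.0 is a baseline node (assumed unit)
    busy_until: float = 0.0


def schedule(tasks: Dict[str, float], nodes: List[Node]) -> Dict[str, Tuple[str, float]]:
    """Assign tasks, largest first, to the node that finishes each one earliest.

    Returns a mapping from task name to (node name, finish time).
    """
    plan: Dict[str, Tuple[str, float]] = {}
    for task, work in sorted(tasks.items(), key=lambda kv: -kv[1]):
        best = min(nodes, key=lambda n: n.busy_until + work / n.speed)
        best.busy_until += work / best.speed
        plan[task] = (best.name, best.busy_until)
    return plan


if __name__ == "__main__":
    cluster = [Node("cpu", 1.0), Node("gpu", 4.0)]
    work_units = {"t1": 8.0, "t2": 6.0, "t3": 4.0, "t4": 2.0}
    for task, (node, finish) in schedule(work_units, cluster).items():
        print(task, node, finish)
```

A homogeneous scheduler would treat both nodes alike; accounting for the speed difference lets the faster node take most of the work while the slower node still contributes.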

Multi-query Optimization in Spark

Distributed computing platforms such as Hadoop and Spark focus on addressing the following challenges in large systems: (1) latency, (2) scalability, and (3) fault tolerance. Dedicating computing resources to each application executed by Spark can lead to a waste of resources. Unified distributed file systems such as Alluxio provide a platform for sharing computed results among simultaneously running applications. However, it is up to the developers to decide what to share. The objective of this master thesis is to optimize the execution plans of the various applications running on a Spark platform by autonomously finding sharing opportunities, namely finding the RDDs that can be shared among these applications, and computing these shared plans once instead of multiple times for each query.

Deliverables of the master thesis project

An overview of the Apache Spark architecture.
A performance model for queries executed by Spark.
An implementation that optimizes queries executed by Spark and identifies sharing opportunities.
An experimental validation of the developed system.
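
Finding sharing opportunities can be sketched, in a very simplified form, as looking for common prefixes in the operation chains of concurrent queries. The representation of a plan as a tuple of operation labels, the function names and the example queries are assumptions for illustration; a real system would have to compare Spark's actual execution plans and RDD lineage.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical plan representation: an ordered chain of operation labels.
Plan = Tuple[str, ...]


def shared_prefixes(plans: Dict[str, Plan]) -> Dict[Plan, List[str]]:
    """Map every plan prefix used by two or more queries to those queries."""
    users: Dict[Plan, List[str]] = defaultdict(list)
    for query, plan in plans.items():
        for end in range(1, len(plan) + 1):
            users[plan[:end]].append(query)
    return {prefix: qs for prefix, qs in users.items() if len(qs) > 1}


def best_sharing(plans: Dict[str, Plan]) -> Tuple[Plan, List[str]]:
    """Return the longest shared prefix, a candidate to compute once and reuse."""
    shared = shared_prefixes(plans)
    if not shared:
        return (), []
    prefix = max(shared, key=len)
    return prefix, shared[prefix]


# Made-up queries over the same input.
QUERIES: Dict[str, Plan] = {
    "q1": ("read:logs", "filter:2019", "map:parse", "count"),
    "q2": ("read:logs", "filter:2019", "map:parse", "groupBy:user"),
    "q3": ("read:logs", "filter:2018", "count"),
}

if __name__ == "__main__":
    print(best_sharing(QUERIES))
```

The longest shared prefix is only one possible criterion; a performance model would be needed to decide whether materializing a shared result actually pays off.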

Accelerated Distributed Platform for Spatial Queries

It is now common to query terabytes of spatial data. Several new frameworks extend distributed computing platforms such as Hadoop and Spark to enable them to efficiently process spatial queries by providing (1) mechanisms to efficiently store spatial data and index them; and (2) packages of built-in spatial operations for these platforms. Meanwhile, it is now common to accelerate Hadoop and Spark using accelerators such as GPUs and FPGAs. The objective of this master thesis is to build a framework that efficiently executes spatial queries on an extended implementation of Spark that is enabled to run its tasks on GPUs.

Deliverables of the master thesis project

An overview of Spatial queries and frameworks for processing big spatial data.
A study of best approaches to represent spatial data while it is queried by Spark and GPUs.
An implementation of common spatial operations and computational geometry algorithms on GPUs and Spark.
An experimental validation of the developed system.
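
One common way to index spatial data for range queries is a uniform grid, sketched below on a single machine in plain Python. The GridIndex class, the cell size and the sample points are assumptions for illustration only; the sketch involves neither Spark nor GPUs, but the per-cell work is the kind of unit that could be distributed across partitions or accelerator threads.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


class GridIndex:
    """Uniform grid over 2D points; each point is stored in exactly one cell."""

    def __init__(self, cell_size: float) -> None:
        if cell_size <= 0:
            raise ValueError("cell_size must be positive")
        self.cell_size = cell_size
        self.cells: Dict[Cell, List[Point]] = defaultdict(list)

    def _cell(self, x: float, y: float) -> Cell:
        return (math.floor(x / self.cell_size), math.floor(y / self.cell_size))

    def insert(self, point: Point) -> None:
        self.cells[self._cell(*point)].append(point)

    def range_query(self, xmin: float, ymin: float, xmax: float, ymax: float) -> List[Point]:
        """Return all points inside the closed rectangle [xmin, xmax] x [ymin, ymax]."""
        cx0, cy0 = self._cell(xmin, ymin)
        cx1, cy1 = self._cell(xmax, ymax)
        hits: List[Point] = []
        # Only cells overlapping the rectangle are visited, then points are filtered.
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for x, y in self.cells.get((cx, cy), ()):
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append((x, y))
        return hits


if __name__ == "__main__":
    index = GridIndex(cell_size=10.0)
    for p in [(1.0, 1.0), (5.0, 5.0), (12.0, 3.0), (25.0, 25.0), (9.5, 9.5)]:
        index.insert(p)
    print(sorted(index.range_query(0.0, 0.0, 10.0, 10.0)))
```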

Björn Thór Jónsson

Media collections are becoming a central information resource for a growing number of domains. This calls for very effective tools for not only finding content in those collections, but also gaining insights into the collections and analyzing them. Below are some projects related to a unique large-scale media exploration system we are working on, called Exquisitor.

Competitor for the Video Browser Showdown

The goal of this project is to integrate Exquisitor with other pieces of existing technology and turn it into a competitor for a live video retrieval competition. The project is suitable for 3-4 well-qualified MSc students.

The Video Browser Showdown (VBS) is a live competition for video search and retrieval, held at the International Conference on Multimedia Modeling (MMM). In VBS, the competition tasks involve finding videos from a collection of 1000 hours of videos. The tasks are either based on a textual or a visual description, and the competing systems are judged based on speed, accuracy and recall, depending on the task. The goal is to develop a new competitor for VBS, based on existing pieces of technology developed at ITU and UvA. These are:

* The Exquisitor image browser (see figure), developed at ITU and UvA, which uses relevance feedback and high-dimensional indexing to rapidly find relevant images [1].
* The O3 media server, and the corresponding P3 photo browser, developed at ITU, which use a novel data model to filter media and present contents [2].
* A video engine, developed at UvA, to process, index and search for videos based on visual content.

The research question to answer is: how well do the video exploration concepts of these ITU/UvA tools address the workloads of VBS compared to existing tools? The task of the student group will be to integrate (and extend) the components into a single system, develop the communication with the VBS competition system, test the entire software stack with real users using competition data and workloads from VBS 2018, and prepare for VBS 2019. The project is suitable for 3-4 well-qualified MSc students. Participation in VBS includes an international conference paper. And the presentation of the MSc project should be exceptionally visual and interesting!

References

* [1] Jan Zahálka, Stevan Rudinac, Björn Þór Jónsson, Dennis C. Koelma, Marcel Worring. Blackthorn: Large-Scale Interactive Multimodal Learning. IEEE Transactions on Multimedia (TMM), 20(3), March 2018.
* [2] Snorri Gíslason, Björn Þór Jónsson, Laurent Amsaleg. Integration of Exploration and Search: A Case Study of the M3 Model. Proceedings of the International Conference on Multimedia Modeling (MMM), Thessaloniki, Greece, January 2019.

Diversity of Relevance Feedback

The goal of this project is to ensure diversity in the relevance feedback results, to improve the quality of the user experience.

The project is suitable for 1-3 well-qualified MSc students.

In many creative tasks, the designer will know some stock image is good for a design just by stumbling upon the image. This “Aha!” moment requires browsing thousands of images by categories. In other words, it requires labeled and categorized images. Labeling hundreds of millions of images, such as the 99.2M images of the YFCC100M, is a daunting task, even when using some sort of crowdsourcing.

An alternate approach consists of devising a mechanism to quickly explore the collection, presenting potentially relevant images to users who can label them as either relevant or irrelevant. Using the labeled examples, the system then refines the image selection presented to the user; this feedback loop continues until the user is satisfied. An example of such a relevance feedback system is the Exquisitor system pictured above (based on [1]).

The choice of images to present to the user is a difficult problem, especially for the images that the system believes may interest the user. Finding images similar to a query in a collection is a well-understood problem, well served with simple similarity-based approaches. The challenge resides in finding at the same time a diverse subset, something that can be thought of as selecting one of each in a collection. In other words, the challenge is to ensure diversity as well as proximity.

Central to our proposal is the Half-Space Proximity (HSP) graph [2]. This is a sparse subgraph of the complete graph, where each node (that we call the center) in the HSP is connected to a natural number of similar neighbors (the spike of neighbors). Every node in the spike acts as a proxy of a direction. There is one spike for each object, and computing each spike is linear using a naïve algorithm, hence it has quadratic complexity for the entire collection.

There are several challenges in this project, among them computing the HSP efficiently, computing the spike of a query, and assembling a prototype image browser with the described exploration mechanisms. The project is suitable for 2-3 well-qualified MSc students. The intention is to publish the results in international research venues, both as a conference paper and a journal paper. And the presentation of the MSc project should be exceptionally visual and interesting!
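
The naïve spike computation can be sketched as follows, using the half-space test as we understand it from [2]: repeatedly take the nearest remaining candidate as a neighbor, then discard every candidate that is closer to that neighbor than to the center. The function names, the Euclidean distance and the 2D sample points are assumptions for this example; real image features would be high-dimensional.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]


def dist(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hsp_spike(center: int, points: Sequence[Vector]) -> List[int]:
    """Return the indices of the HSP neighbors (the spike) of points[center]."""
    c = points[center]
    candidates = [i for i in range(len(points)) if i != center]
    spike: List[int] = []
    while candidates:
        nearest = min(candidates, key=lambda i: dist(c, points[i]))
        spike.append(nearest)
        # Keep only candidates that are not closer to the new neighbor
        # than to the center, i.e. those outside its half-space.
        candidates = [
            i for i in candidates
            if i != nearest and dist(points[i], c) <= dist(points[i], points[nearest])
        ]
    return spike


def hsp_graph(points: Sequence[Vector]) -> Dict[int, List[int]]:
    """Compute one spike per object, scanning all other objects each time."""
    return {i: hsp_spike(i, points) for i in range(len(points))}


# Made-up 2D points; index 0 is used as the center in the demo.
POINTS: List[Vector] = [
    (0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (0.0, 1.5), (-1.0, -1.0), (3.0, 0.5),
]

if __name__ == "__main__":
    print(hsp_spike(0, POINTS))
```

In this example the points at (2, 0) and (3, 0.5) are dropped because (1, 0) already represents their direction, which is the diversity effect the project aims to exploit.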

References

* [1] Jan Zahálka, Stevan Rudinac, Björn Þór Jónsson, Dennis C. Koelma, Marcel Worring. Blackthorn: Large-Scale Interactive Multimodal Learning. IEEE Transactions on Multimedia (TMM), 20(3), March 2018.
* [2] Chavez E. et al. (2006) Half-Space Proximal: A New Local Test for Extracting a Bounded Dilation Spanner of a Unit Disk Graph. In: Anderson J.H., Prencipe G., Wattenhofer R. (eds) Principles of Distributed Systems. OPODIS 2005. Lecture Notes in Computer Science, vol 3974. Springer, Berlin, Heidelberg.
* [3] Gylfi Þór Guðmundsson, Björn Þór Jónsson, Laurent Amsaleg. A Large-Scale Performance Study of Cluster-Based High-Dimensional Indexing. Proceedings of the Workshop on Very-Large-Scale Multimedia Corpus, Mining and Retrieval, Firenze, Italy, October 2010.

Mobile Version of Exquisitor

The goal of this project is to build a prototype of the Exquisitor system for mobile devices.

The project is suitable for 1-3 well-qualified MSc students.

Image and media collections are becoming a central information resource for a growing number of domains. This calls for very effective tools for interactive exploration of the contents of those collections [1]. Based on past research results [2], we have built a prototype of such an exploration tool, called Exquisitor. The goal of this project is to implement a version of Exquisitor that can run on mobile phones. The project is suitable for 1-3 well-qualified MSc students. The intention is to demonstrate the new prototype in an international research conference. And the presentation of the MSc project should be quite interactive!

References

* [1] Björn Þór Jónsson, Marcel Worring, Jan Zahálka, Stevan Rudinac, Laurent Amsaleg. Ten Research Questions for Scalable Multimedia Analytics. Proceedings of the International Conference on Multimedia Modeling (MMM), Miami, FL, USA, January 2016.
* [2] Jan Zahálka, Stevan Rudinac, Björn Þór Jónsson, Dennis C. Koelma, Marcel Worring. Blackthorn: Large-Scale Interactive Multimodal Learning. IEEE Transactions on Multimedia (TMM), 20(3), March 2018.

Exploring Image Collections via Eye Tracking

The goal of this project is to use the state of the art in eye tracking to design, implement and evaluate different eye-tracking interfaces for Exquisitor.

The project is suitable for 1-3 well-qualified MSc students.

Image and media collections are becoming a central information resource for a growing number of domains. This calls for very effective tools for interactive exploration of the contents of those collections [1]. Since humans typically look at the targets of interest before making a selection, the eyes can become a very efficient means of browsing media collections (e.g. to make selections and inform about user interest) [2]. We propose projects that intend to investigate eye tracking for efficient exploration of media using Exquisitor, a very scalable image browser based on relevance feedback [3]. The goal of the project is to use the state of the art in eye tracking to design, implement and evaluate different eye-tracking interfaces for the Exquisitor system. The project is suitable for 1-3 well-qualified MSc students. The intention is to publish the results, and demonstrate the system, in international research conferences. And the presentation of the MSc project should be quite interactive!

References

* [1] Björn Þór Jónsson, Marcel Worring, Jan Zahálka, Stevan Rudinac, Laurent Amsaleg. Ten Research Questions for Scalable Multimedia Analytics. Proceedings of the International Conference on Multimedia Modeling (MMM), Miami, FL, USA, January 2016.
* [2] Dan Witzner Hansen, Qiang Ji. In the Eye of the Beholder: A Survey of Models for Eyes and Gaze. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 32(3), March 2010.
* [3] Jan Zahálka, Stevan Rudinac, Björn Þór Jónsson, Dennis C. Koelma, Marcel Worring. Blackthorn: Large-Scale Interactive Multimodal Learning. IEEE Transactions on Multimedia (TMM), 20(3), March 2018.