Research


We build the infrastructure of data science: scalable and efficient systems and applications supporting the data lifecycle, including collection, transfer, storage, processing, analytics, and curation.

Data science enables forecasts, possibly in real time, at ever lower cost and with ever better accuracy for the benefit of society. Today, data scientists are able to collect more data, access that data faster, and apply more complex data analysis than ever. These advances are mainly due to radical hardware evolution combined with a diversification of data systems and the emergence of analytics frameworks, boosting the productivity of data scientists.

Research in our group covers the following topics:


Twenty years ago, Jim Gray wrote: "Put Everything in Future Disk Controllers (it's not 'if', it's 'when')". His argument was that running application code on disk controllers would be (a) possible, because disks would be equipped with powerful processors and connected to networks via high-level protocols, and (b) necessary, to minimize data movement. He concluded that there would be a need for a programming environment for disk controllers. Computational storage makes it possible to define a new storage interface directly on the storage controller or on a processing unit (FPGA or SoC) introduced on the data path.
Read more...
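Gray's data-movement argument can be illustrated with a minimal sketch (all class and method names here are hypothetical, for illustration only): a device that evaluates a filter predicate on its own controller returns only the matching records, whereas a conventional device ships every record to the host for filtering.

```python
# Minimal sketch of the computational-storage idea (hypothetical API):
# a device-side scan returns only matching records, so far less data
# crosses the interconnect than with a read-then-filter flow.

class StorageDevice:
    def __init__(self, records):
        self.records = records  # data resident on the device

    def read_all(self):
        # Conventional path: every record crosses the interconnect
        # to the host, which then filters it.
        return list(self.records)

    def scan(self, predicate):
        # Computational-storage path: the predicate runs on the
        # device controller; only matches cross the interconnect.
        return [r for r in self.records if predicate(r)]

device = StorageDevice(range(1_000_000))

# Host-side filtering: 1,000,000 records are moved, 1,000 are kept.
host_filtered = [r for r in device.read_all() if r % 1000 == 0]

# Device-side filtering: only the 1,000 matching records are moved.
pushed_down = device.scan(lambda r: r % 1000 == 0)

assert host_filtered == pushed_down
print(len(pushed_down))  # 1000
```

The result is identical either way; the difference is how many records travel over the data path, which is exactly the cost the computational-storage interface is designed to avoid.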

In the cloud today, the quantity of information that needs to be stored and processed is growing faster than the performance of general-purpose processors. For decades, Moore's Law ensured that CPUs could keep up, but it is unlikely that this trend can be maintained. To overcome this challenge, we investigate how software interacts with the underlying server hardware and explore ways in which we could tailor the latter to the application's needs.
Read more...

IoT systems deployed for predictive maintenance, asset management, or virtual power plants collect and process large amounts of data locally. This data is used by "prescriptive" models implemented in micro data centers at the edge of the network. These models are adapted to changing local conditions, maintained with the least possible interference from operators, and optimized for clear objectives agreed upon geographically or per sector. We research the methods and tools for developing AI-based solutions on edge-based software platforms.
Read more...

Multimedia analytics is a new and exciting research area that combines techniques from multimedia analysis, visual analytics, and data management, with a focus on creating systems for analyzing large-scale multimedia collections. The size and complexity of media collections are ever increasing, as is the desire to harvest useful information from these collections, with expected impacts ranging from the advancement of science to increased company profits. Indeed, multimedia analytics sees potential applications in diverse fields, including data journalism, urban computing, lifelogging, digital heritage, healthcare, digital forensics, natural sciences, and social media.
Read more...

The focus of machine learning research on larger datasets, novelty, and state-of-the-art results has led to a lot of progress, but also has negative consequences such as propagating bias, a huge carbon footprint, and de-democratization. We instead aim to recognize patterns within, and between, problems with few examples, in particular in the health domain. This includes:
- understanding similarity and diversity of datasets
- methods for learning with limited labeled data, such as transfer learning
- meta-research on machine learning in medical imaging

People involved: Veronika Cheplygina, Bethany Chamberlain, Dovile Juodelyte, Ralf Raumanns (guest, TU Eindhoven)
Read more...

The variety and complexity of data-intensive applications and systems have increased drastically over the past decade. Tasks from a SQL-based big data analytics request running on Apache Spark can be very different from tasks from deep learning training using the TensorFlow framework. Nevertheless, these data-intensive applications increasingly run on shared hardware resources in data centers or high-performance computing (HPC) platforms. These hardware resources are also diverse today, ranging from general-purpose CPUs and GPUs to programmable FPGAs and specialized machine learning hardware such as TPUs.
Read more...

Federated learning has been gaining adoption in mobile and cloud computing as a mechanism for decentralizing both computation and trust when building machine learning models from user data. Proposals for using federated learning on IoT devices have been emerging but, even though they can be used to gain valuable insights faster and more efficiently from IoT devices, their adoption has been lackluster. We argue that this slow adoption is due to two challenges: first, federated learning methods are designed for cloud-server and mobile-device class processors, which are significantly more powerful than typical IoT devices, and second, the level of trust we can have in code executed on IoT devices is significantly lower than on mobile devices or cloud servers.
Read more...
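The decentralization idea can be sketched with a toy federated-averaging round (a simplification in the style of FedAvg; all function names are hypothetical): each client fits a model to its local data only, and the server combines the client models weighted by how many samples each client trained on, so raw data never leaves the devices.

```python
# Toy federated-averaging round (FedAvg-style sketch, not a real
# framework API): the "model" is a single parameter, the mean.

def client_update(data):
    # Local training: each client fits its model to local data only
    # and reports the model plus its sample count, never the raw data.
    return sum(data) / len(data), len(data)

def server_aggregate(updates):
    # Aggregation: average client models weighted by the number of
    # samples each client trained on.
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total

clients = [[1.0, 2.0, 3.0], [10.0], [4.0, 6.0]]
updates = [client_update(d) for d in clients]
global_model = server_aggregate(updates)

# With sample-count weighting, the aggregate equals the mean over all
# data pooled together, even though no raw data left the clients.
pooled_mean = sum(sum(d) for d in clients) / sum(len(d) for d in clients)
assert abs(global_model - pooled_mean) < 1e-9
```

In a real deployment the model would be a vector of neural-network weights and each round would involve several local gradient steps, but the trust boundary is the same: only model updates, not data, cross it, which is why the compute capacity and code integrity of the participating devices matter so much.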