We build the infrastructure of data science: scalable and efficient systems and applications supporting the data lifecycle, including collection, transfer, storage, processing, analytics and curation.
Data science enables forecasts, possibly in real time, at ever lower cost and with better accuracy, for the benefit of our society. Today, data scientists are able to collect more data, access that data faster, and apply more complex data analysis than ever before. These advances are mainly due to radical hardware evolution combined with a diversification of data systems and the emergence of analytics frameworks, which boost the productivity of data scientists.
Research in our group covers the following topics:
Twenty years ago, Jim Gray wrote: Put Everything in Future Disk Controllers (it’s not “if”, it’s “when”). His argument was that running application code on disk controllers would be (a) possible because disks would be equipped with powerful processors and connected to networks via high-level protocols, and (b) necessary to minimize data movement. He concluded that there would be a need for a programming environment for disk controllers.
Computational storage makes it possible to define a new storage interface directly on the storage controller or on a processing unit (FPGA or SoC) introduced on the data path.
IoT systems deployed for predictive maintenance, asset management or virtual power plants collect and process large amounts of data locally. This data is used by “prescriptive” models implemented in micro data centers at the edge of the network. These models are adapted to changing local conditions, maintained with the least possible interference from operators and optimized for clear objectives agreed geographically or per sector.
We research the methods and tools for developing AI-based solutions on edge-based software platforms.
Multimedia analytics is a new and exciting research area that combines techniques from multimedia analysis, visual analytics, and data management, with a focus on creating systems for analysing large-scale multimedia collections.
The size and complexity of media collections are ever increasing, as is the desire to harvest useful information from these collections, with expected impacts ranging from the advancement of science to increased company profits. Indeed, multimedia analytics has potential applications in diverse fields, including data journalism, urban computing, lifelogging, digital heritage, healthcare, digital forensics, natural sciences, and social media.
The variety and complexity of data-intensive applications and systems have increased drastically over the past decade. The tasks generated by a SQL-based big data analytics request running on Apache Spark can be very different from those generated by deep learning training with the TensorFlow framework. Nevertheless, these data-intensive applications increasingly run on shared hardware resources in data centers or on high-performance computing (HPC) platforms. These hardware resources are also diverse today, ranging from general-purpose CPUs and GPUs to programmable FPGAs and specialized machine learning hardware like TPUs.