Resource-Aware Machine Learning and Data-Intensive Systems

The variety and complexity of data-intensive applications and systems have increased drastically over the past decade. The tasks generated by a SQL-based big data analytics query running on Apache Spark can differ considerably from those of a deep learning training job using the TensorFlow framework. Nevertheless, these data-intensive applications increasingly run on shared hardware resources in data centers or high-performance computing (HPC) platforms. These hardware resources are themselves increasingly diverse, ranging from general-purpose CPUs and GPUs to programmable FPGAs and specialized machine learning hardware such as TPUs. There is therefore a pressing need for a more resource-aware infrastructure that effectively orchestrates diverse data-intensive tasks across heterogeneous processing units. To achieve this, the goal is first to investigate the resource consumption characteristics of different data-intensive workloads, and then to establish and implement guidelines for hardware resource management in data-intensive systems.