Workload Characterization for Big Data Management

Supervisor: Pınar Tözün
Semester: Fall 2019

The Transaction Processing Performance Council (TPC) is a non-profit IT organization founded to define database benchmarks and disseminate objective, verifiable performance data to the industry. In recent years, TPC has standardized several new benchmarks (e.g., TPCx-HS and TPCx-BB). Older popular benchmarks, such as TPC-C (representing high-performance transaction processing) and TPC-H (representing traditional analytical processing), are not suitable for explaining the behavior of emerging big data applications (with heavy ingest rates and complex analytical queries involving machine learning) on modern hardware. While the behavior of TPC-C and TPC-H on commodity multicore servers has been heavily studied, the behavior of the newer benchmarks remains a mystery to many in the database community. A comprehensive workload characterization of these new benchmarks is crucial for understanding in more detail how they differ from the older benchmarks and what they require from data management systems and hardware.