PROPOSAL

Deconstructed Cloud Databases, Part I: Atomic Counters


Supervisors: Martin Hentschel
Semester: Fall 2024
Tags: data management, performance, benchmarking, hacking

The Deconstructed Cloud Databases project stems from a simple question: What are the minimum components required to build a data management system in the cloud? The project is motivated by the idea that a cloud data management system reduced to its minimum set of components is easier to build, test, and maintain. Fewer components require less engineering effort, reduce the cost of software development, and lower the risk of errors, because there are fewer parts to manage.

We hypothesize that the essential components are: (a) file-based storage, (b) atomic counters, (c) a versatile dictionary format for metadata, together with data formats, and (d) a compiler coupled with an execution engine. We will not focus on file storage or on compilers and execution engines. File storage is a well-established area with production-ready systems such as AWS S3 and Azure Storage, and compilers and execution engines are active fields of research and development whose advances we can build on. Atomic counters and dictionary formats, by contrast, are often overlooked even though they are vital components. By concentrating on them, we aim to explore new ideas and develop open-source solutions suitable for both scientific research and production workloads.
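To make the notion concrete: an atomic counter hands out values so that every caller receives a distinct value, even under concurrency, because reading and incrementing happen as one indivisible step. The following minimal Python sketch illustrates only this contract, within a single process. The class `LocalAtomicCounter`, its method `increment_and_get`, and the helper `draw_concurrently` are hypothetical names, and a local lock stands in for whatever synchronization a distributed service would actually use.

```python
import threading


class LocalAtomicCounter:
    """Hypothetical in-process stand-in for an atomic counter.

    increment_and_get() never returns the same value twice, even when called
    from many threads, because the read and the write happen under one lock.
    """

    def __init__(self, start: int = 0) -> None:
        self._value = start
        self._lock = threading.Lock()

    def increment_and_get(self) -> int:
        # Read-modify-write as one indivisible step.
        with self._lock:
            self._value += 1
            return self._value


def draw_concurrently(counter, threads=8, per_thread=1000):
    """Let several threads draw values and return everything they received."""
    results = [[] for _ in range(threads)]

    def worker(out):
        for _ in range(per_thread):
            out.append(counter.increment_and_get())

    pool = [threading.Thread(target=worker, args=(results[i],)) for i in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return [v for r in results for v in r]


values = draw_concurrently(LocalAtomicCounter())
print(len(values), len(set(values)))  # 8000 8000: no value was handed out twice
```

Without the lock, two threads could read the same old value and both return the same result, which is exactly the duplicate anomaly a counter service must rule out.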

Project Part I: Atomic Counters

The goal of this project is to develop an optimized, reliable, and scalable atomic counter service for an open cloud data management system. The service should address shortcomings in the performance and reliability of existing solutions; identifying these shortcomings is itself part of the project's initial research phase.
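One synchronization method a counter service can use when its backing store supports conditional writes is an optimistic compare-and-swap loop: read the current value and its version, write value + 1 only if the version is unchanged, and retry otherwise. The sketch below assumes a hypothetical in-memory `VersionedStore` with `get` and `put_if_version` methods and a hypothetical key `"txn_id"`; it is not the API of any particular product. Retries grow with contention, which is one reason throughput under concurrent load matters for the "optimized" and "scalable" goals.

```python
import threading


class VersionedStore:
    """Hypothetical in-memory store with a conditional write.

    Stands in for any backend that rejects a write when the key has changed
    since it was read (compare-and-swap semantics).
    """

    def __init__(self) -> None:
        self._data = {}                # key -> (value, version)
        self._lock = threading.Lock()  # makes each single call atomic, not read-then-write sequences

    def get(self, key):
        with self._lock:
            return self._data.get(key, (0, 0))  # missing keys read as value 0, version 0

    def put_if_version(self, key, value, expected_version) -> bool:
        with self._lock:
            _, version = self._data.get(key, (0, 0))
            if version != expected_version:
                return False  # another writer got in between: reject
            self._data[key] = (value, version + 1)
            return True


def cas_increment(store, key):
    """Optimistic increment: read, attempt a conditional write, retry on conflict.

    Returns (new_value, attempts).
    """
    attempts = 0
    while True:
        attempts += 1
        value, version = store.get(key)
        if store.put_if_version(key, value + 1, version):
            return value + 1, attempts


store = VersionedStore()
results = []
results_lock = threading.Lock()


def worker(n=500):
    for _ in range(n):
        outcome = cas_increment(store, "txn_id")
        with results_lock:
            results.append(outcome)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

new_values = [v for v, _ in results]
retries = sum(a for _, a in results) - len(results)
print(len(new_values), len(set(new_values)), retries)  # retry count varies from run to run
```

Because a rejected conditional write changes nothing, retries never produce duplicates; the cost of contention shows up as extra round trips rather than as wrong values.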

This project consists of the following tasks:

  • Exploration of state-of-the-art solutions for atomic counters and synchronization methods.
  • Comparison of the functionality of existing open-source tools, services, and commercial offerings.
  • Performance evaluation of available tools (for example, throughput and latency under concurrent load) to identify the fastest options.
  • Analysis of pitfalls and drawbacks, such as whether counters can go backwards, produce duplicate values, or be forced into an inconsistent state.
  • Implementation of our own service if none of the existing solutions meet our needs. This service could be offered as a cloud-based solution with an open interface.
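Some of the anomalies named in the task list can be checked mechanically against a recorded history. A minimal sketch, assuming a hypothetical log format of `(client_id, value)` pairs in the order each client received them: the hypothetical function `check_counter_history` flags values handed out more than once and values that do not strictly increase for a given client.

```python
def check_counter_history(observations):
    """Flag duplicates and per-client regressions in a counter history.

    observations: list of (client_id, value) pairs, in the order each client
    received the values (hypothetical log format).
    Returns {"duplicates": [...], "regressions": [...]}.
    """
    seen = {}            # value -> first client that received it
    last_by_client = {}  # client -> last value it received
    duplicates = []      # (value, first_client, later_client)
    regressions = []     # (client, previous_value, new_value)
    for client, value in observations:
        if value in seen:
            duplicates.append((value, seen[value], client))
        else:
            seen[value] = client
        prev = last_by_client.get(client)
        if prev is not None and value <= prev:
            # The counter did not move forward from this client's point of view.
            regressions.append((client, prev, value))
        last_by_client[client] = value
    return {"duplicates": duplicates, "regressions": regressions}


history = [("a", 1), ("b", 2), ("a", 3), ("b", 3), ("a", 2)]
print(check_counter_history(history))
# {'duplicates': [(3, 'a', 'b'), (2, 'b', 'a')], 'regressions': [('a', 3, 2)]}
```

This only covers ordering as seen by each individual client. Detecting values that go backwards across clients in real time (a linearizability check) would additionally require timestamps for when each request started and finished.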

Expected Outcomes:

  • A study on existing technologies for atomic counters and synchronization methods, including a comparison of functionality and performance.
  • An open-source counter system that addresses the potential shortcomings of existing solutions.