PROPOSAL

Deconstructed Cloud Databases, Part II: Open Dictionary Metadata


Supervisor: Martin Hentschel
Semester: Fall 2024
Tags: data management, security, open source, open standards

The Deconstructed Cloud Databases project stems from a simple question: What are the minimum components required to build a data management system in the cloud? The project is motivated by the idea that reducing a system to its minimum set of components makes cloud data management systems easier to build, test, and maintain. Fewer components mean less engineering effort, lower software development costs, and fewer opportunities for errors.

We hypothesize that the essential components are: (a) file-based storage, (b) atomic counters, (c) versatile formats for dictionary metadata and for data, and (d) a compiler coupled with an execution engine. We will not focus on file storage or on compilers and execution engines. File storage is a well-established area with production-ready systems such as AWS S3 and Azure Storage, and compilers and execution engines are active fields of research and development, so we can build on existing advances in both. Atomic counters and dictionary metadata formats, by contrast, are vital components that are often overlooked. By concentrating on them, we aim to explore new ideas and develop open-source solutions suitable for both scientific research and production workloads.

Project Part II: Open Dictionary Metadata

The Open Dictionary Metadata project aims to create an open standard for managing dictionary metadata in cloud data management systems, with a focus on security components such as user management and role-based access control (RBAC). No such open standard exists today; closing that gap is the goal of this project.
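To make the security components more concrete, the following minimal Python sketch shows one way users, roles, privileges, and grants could relate under RBAC, including role inheritance. All names (`Privilege`, `Grant`, `SecurityCatalog`, `is_allowed`) are hypothetical and not taken from any existing standard; modeling role hierarchies as role-to-role inheritance is one common design, but it is an assumption here, and logins (authentication) are left out entirely.

```python
from dataclasses import dataclass, field
from enum import Enum


class Privilege(Enum):
    # Hypothetical privilege set; a real standard would define its own.
    SELECT = "SELECT"
    INSERT = "INSERT"
    CREATE = "CREATE"


@dataclass(frozen=True)
class Grant:
    # A privilege on a named dictionary object (e.g. "sales.orders") granted to a role.
    privilege: Privilege
    object_name: str
    role: str


@dataclass
class SecurityCatalog:
    user_roles: dict[str, set[str]] = field(default_factory=dict)    # user -> directly assigned roles
    role_parents: dict[str, set[str]] = field(default_factory=dict)  # role -> roles whose privileges it inherits
    grants: set[Grant] = field(default_factory=set)

    def effective_roles(self, user: str) -> set[str]:
        # Follow role inheritance transitively; `seen` also guards against cycles.
        seen: set[str] = set()
        stack = list(self.user_roles.get(user, ()))
        while stack:
            role = stack.pop()
            if role not in seen:
                seen.add(role)
                stack.extend(self.role_parents.get(role, ()))
        return seen

    def is_allowed(self, user: str, privilege: Privilege, object_name: str) -> bool:
        # A user holds a privilege if any of their effective roles was granted it.
        return any(
            Grant(privilege, object_name, role) in self.grants
            for role in self.effective_roles(user)
        )


catalog = SecurityCatalog()
catalog.user_roles["alice"] = {"analyst"}
catalog.role_parents["analyst"] = {"reader"}
catalog.grants.add(Grant(Privilege.SELECT, "sales.orders", "reader"))
print(catalog.is_allowed("alice", Privilege.SELECT, "sales.orders"))  # True
print(catalog.is_allowed("alice", Privilege.INSERT, "sales.orders"))  # False
```

Keeping grants as plain immutable records makes them easy to serialize, which matters if the dictionary is to be stored in files rather than in a running database.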

This project consists of the following tasks:

  • Analysis of Dictionary Metadata Management: Examine how existing database systems and cloud data management systems manage dictionary metadata, particularly table management, user management, and RBAC. Identify best practices, limitations, and gaps in current approaches.
  • Creation of Minimum Requirements: Based on this analysis, derive a unified set of minimum requirements and features for dictionary metadata management.
  • Definition of an Open Standard: Define an open standard for dictionary metadata management that includes dictionary components (such as databases, tables, and views) as well as security components (such as users, logins, roles, privileges, and grants). The standard must be extensible to accommodate future needs.
  • Development of a Dictionary Metadata Service: Implement a reference dictionary metadata service based on the defined open standard. The service will store its contents in files and track changes over time using atomic counter values (see Part I of the Deconstructed Cloud Databases project). Optimize the reference implementation and demonstrate experimentally that it performs well.
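The storage model described in the last task can be illustrated with a small Python sketch, under several assumptions: every change to the dictionary is written as an immutable JSON file named after a counter value, and a reader rebuilds the dictionary as of any version by replaying the files up to that value. The in-process `LocalCounter` merely stands in for the atomic counter of Part I, and the class names, file layout, and record schema are all hypothetical.

```python
import json
import tempfile
import threading
from pathlib import Path
from typing import Optional


class LocalCounter:
    """Stand-in for the atomic counter of Part I: hands out strictly increasing values."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def next_value(self) -> int:
        with self._lock:
            self._value += 1
            return self._value


class DictionaryStore:
    """Hypothetical file-backed dictionary: one immutable JSON file per change."""

    def __init__(self, root: Path, counter: LocalCounter) -> None:
        self.root = root
        self.counter = counter
        self.root.mkdir(parents=True, exist_ok=True)

    def apply(self, op: str, kind: str, name: str, definition: Optional[dict] = None) -> int:
        version = self.counter.next_value()
        record = {"version": version, "op": op, "kind": kind,
                  "name": name, "definition": definition}
        # Zero-padded file names keep lexicographic order equal to version order.
        (self.root / f"{version:020d}.json").write_text(json.dumps(record))
        return version

    def snapshot(self, as_of: int) -> dict[tuple[str, str], dict]:
        # Replay all changes with version <= as_of to rebuild the dictionary state.
        state: dict[tuple[str, str], dict] = {}
        for path in sorted(self.root.glob("*.json")):
            record = json.loads(path.read_text())
            if record["version"] > as_of:
                break
            key = (record["kind"], record["name"])
            if record["op"] == "create":
                state[key] = record["definition"]
            elif record["op"] == "drop":
                state.pop(key, None)
        return state


with tempfile.TemporaryDirectory() as tmp:
    store = DictionaryStore(Path(tmp), LocalCounter())
    v1 = store.apply("create", "table", "sales.orders", {"columns": ["id", "amount"]})
    v2 = store.apply("drop", "table", "sales.orders")
    print(store.snapshot(v1))  # {('table', 'sales.orders'): {'columns': ['id', 'amount']}}
    print(store.snapshot(v2))  # {}
```

Because files are never modified after they are written, any past version remains readable, which is what makes tracking changes by counter value attractive. A production service would additionally need to handle partially written files, concurrent writers, and periodic compaction of the change log; this sketch ignores all three.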

Expected Outcomes:

The project is divided into three parts, each with its own deliverable:

  • Part 1: A study of existing approaches to dictionary metadata management, including an analysis of current practices and systems.
  • Part 2: Definition of a common open standard for dictionary metadata management that incorporates the requirements and features identified in Part 1.
  • Part 3: Implementation of the open standard as a cloud-based service, including its testing and deployment.