Platform Engineering Team/Personal Development Share Back/Distributed Storage Transactions

Idea
Distributed data stores can provide massive scalability, fault tolerance, and replication semantics that support robust geographic distribution, all compelling features for an organization like Wikimedia. However, these systems have also sacrificed important properties for the sake of distribution, such as joins or ACID transactions. We must therefore evaluate them as a set of trade-offs between their unique capabilities and what must be given up to use them.

It is unlikely that we will ever get to have our cake and eat it too, but a growing body of research explores adding transactions to distributed databases. Even limited support for multi-item transactions could be a game-changer, opening the door to use cases that would benefit from distribution but would otherwise be considered intractable.

What is proposed here is a long-term, open-ended project to evaluate, research, and experiment with technologies and techniques that address these missing capabilities. This work will focus particularly on Apache Cassandra, since it is already in use at Wikimedia.

Phase 1
Create a greenfield implementation of Cherry Garcia, an abstraction for the client-coordinated transaction commitment protocol outlined in "Scalable Distributed Transactions across Heterogeneous Stores".
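To make the shape of a client-coordinated commit more concrete, here is a deliberately simplified, single-process sketch based on our reading of the general approach: the client buffers its writes, installs PREPARED records using per-key conditional writes, and then commits by atomically inserting a single transaction status record (TSR). Any client that later finds a PREPARED record whose TSR exists rolls it forward. Everything here (`Store`, `Record`, `resolve`, `Transaction`, and the in-memory dicts standing in for data stores) is hypothetical and is not Cherry Garcia's actual API or record layout. The only primitive assumed is a single-key compare-and-set, roughly what a Cassandra lightweight transaction offers. Timeouts, recovery of abandoned PREPARED records, TSR cleanup, and real concurrency are all omitted.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Record:
    value: Optional[str]
    version: int                      # bumped by every successful conditional write
    state: str                        # "COMMITTED" or "PREPARED"
    txn_id: Optional[str] = None      # owning transaction while PREPARED
    prev_value: Optional[str] = None  # last committed value, used for rollback


class Store:
    """Hypothetical store with only single-key compare-and-set on a version."""

    def __init__(self) -> None:
        self._data: Dict[str, Record] = {}

    def get(self, key: str) -> Optional[Record]:
        return self._data.get(key)

    def cas(self, key: str, expected_version: Optional[int], new: Record) -> bool:
        current = self._data.get(key)
        if (current.version if current else None) != expected_version:
            return False
        self._data[key] = new
        return True


def resolve(store: Store, tsrs: Store, key: str) -> Optional[Record]:
    """Committed view of `key`; rolls forward PREPARED records whose TSR exists."""
    rec = store.get(key)
    if rec is None or rec.state == "COMMITTED":
        return rec
    if tsrs.get(rec.txn_id) is not None:
        # Owner passed its commit point but never finished: finish on its behalf.
        store.cas(key, rec.version, Record(rec.value, rec.version + 1, "COMMITTED"))
        return resolve(store, tsrs, key)
    # Owner has not reached its commit point: expose the last committed value.
    return Record(rec.prev_value, rec.version, "COMMITTED")


class Transaction:
    def __init__(self, store: Store, tsrs: Store) -> None:
        self.id = str(uuid.uuid4())
        self.store, self.tsrs = store, tsrs  # data store, TSR store
        self.read_versions: Dict[str, Optional[int]] = {}
        self.writes: Dict[str, str] = {}  # buffered client-side until commit

    def read(self, key: str) -> Optional[str]:
        if key in self.writes:
            return self.writes[key]
        rec = resolve(self.store, self.tsrs, key)
        self.read_versions[key] = rec.version if rec else None
        return rec.value if rec else None

    def write(self, key: str, value: str) -> None:
        if key not in self.read_versions:
            self.read(key)  # remember the version this write is conditioned on
        self.writes[key] = value

    def prepare(self) -> bool:
        """Phase 1: conditionally install PREPARED records in sorted key order."""
        done: List[str] = []
        for key in sorted(self.writes):
            old = self.store.get(key)
            expected = self.read_versions[key]
            new = Record(self.writes[key], (old.version if old else 0) + 1,
                         "PREPARED", self.id, old.value if old else None)
            conflict = ((old.version if old else None) != expected
                        or (old is not None and old.state == "PREPARED"))
            # cas() also fails if another client wrote the key since our get().
            if conflict or not self.store.cas(key, expected, new):
                self._rollback(done)  # undo what we prepared, then abort
                return False
            done.append(key)
        return True

    def commit(self) -> bool:
        if not self.prepare():
            return False
        # Commit point: a single conditional insert of this transaction's TSR.
        if not self.tsrs.cas(self.id, None, Record(None, 1, "COMMITTED")):
            self._rollback(sorted(self.writes))
            return False
        # Phase 2: flip our records to COMMITTED (readers would also roll forward).
        for key in sorted(self.writes):
            resolve(self.store, self.tsrs, key)
        return True

    def _rollback(self, keys: List[str]) -> None:
        for key in keys:
            rec = self.store.get(key)
            if rec and rec.state == "PREPARED" and rec.txn_id == self.id:
                self.store.cas(key, rec.version,
                               Record(rec.prev_value, rec.version + 1, "COMMITTED"))
```

In this sketch, two clients that read and then write the same key cannot both commit, because the second client's prepare fails its version check and rolls back whatever it had already prepared. The design point worth noting is that the commit point is a single-key write. Atomicity across many keys, and potentially across different stores, therefore reduces to a primitive that each store already provides on its own.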