Platform Engineering Team/Personal Development Share Back/Distributed Storage Transactions

Idea
Distributed data stores can provide massive scalability, fault tolerance, and the replication semantics necessary to power robust geographic distribution: compelling features for an organization like Wikimedia. However, these systems sacrifice important properties for the sake of distribution, including those that make ACID transactions possible. We therefore have to evaluate them as a set of trade-offs between their unique capabilities and what must be given up in the absence of transactions.

It is unlikely that we will ever be able to have our cake and eat it too, but there is a growing body of research that explores adding transactions to distributed databases. Even limited support for multi-item transactions could be a game-changer, opening the door to use cases that could benefit from distribution but would otherwise be considered intractable.

What is proposed here is a long-term, open-ended project to evaluate, research, and experiment with technologies and techniques to address these missing capabilities. This work will focus particularly on Apache Cassandra, since it is a system already in use at Wikimedia.

Phase 1
Create a greenfield implementation of Cherry Garcia, an abstraction for the client-coordinated transaction commitment protocol outlined in "Scalable Distributed Transactions across Heterogeneous Stores".
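
To make "client-coordinated" concrete, here is a minimal Python sketch of the general shape such a protocol can take. It is not the Cherry Garcia implementation, and it does not follow the paper in detail. It assumes only that each store offers a single-item compare-and-set, and that one store (the coordinator) holds a per-transaction status record whose creation serves as the commit point. All names (Record, Store, Transaction, committed_view) are hypothetical and chosen for illustration. Lease timeouts, recovery of abandoned transactions, cleanup of status records, and validation of keys that are read but not written are deliberately left out.

```python
import uuid
from dataclasses import dataclass, replace
from typing import Any, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Record:
    value: Any
    state: str = "COMMITTED"             # "PREPARED" or "COMMITTED"
    txid: Optional[str] = None           # transaction that wrote this version
    previous: Optional["Record"] = None  # last committed version, kept for rollback


class Store:
    """Toy in-memory store: single-item get and compare-and-set only."""

    def __init__(self) -> None:
        self._data: Dict[str, Record] = {}

    def get(self, key: str) -> Optional[Record]:
        return self._data.get(key)

    def cas(self, key: str, expected: Optional[Record], new: Optional[Record]) -> bool:
        if self._data.get(key) != expected:
            return False
        if new is None:
            self._data.pop(key, None)
        else:
            self._data[key] = new
        return True


def committed_view(rec: Optional[Record], coordinator: Store) -> Optional[Record]:
    """Return the latest committed version of a raw record, or None."""
    if rec is None or rec.state == "COMMITTED":
        return rec
    if coordinator.get(rec.txid) is not None:  # status record exists: roll forward
        return replace(rec, state="COMMITTED", previous=None)
    return rec.previous                        # undecided: fall back to prior version


class Transaction:
    def __init__(self, stores: Dict[str, Store], coordinator: Store) -> None:
        self.txid = uuid.uuid4().hex
        self.stores, self.coordinator = stores, coordinator
        self.seen: Dict[Tuple[str, str], Optional[Record]] = {}
        self.writes: Dict[Tuple[str, str], Any] = {}

    def read(self, store: str, key: str) -> Any:
        if (store, key) in self.writes:        # read-your-own-writes
            return self.writes[(store, key)]
        rec = committed_view(self.stores[store].get(key), self.coordinator)
        self.seen[(store, key)] = rec
        return rec.value if rec else None

    def write(self, store: str, key: str, value: Any) -> None:
        self.writes[(store, key)] = value      # buffered until commit

    def commit(self) -> bool:
        prepared: List[Tuple[Store, str, Record]] = []
        # Phase 1: prepare every written item, in a fixed order.
        for name, key in sorted(self.writes):
            store = self.stores[name]
            raw = store.get(key)
            undecided = (raw is not None and raw.state == "PREPARED"
                         and self.coordinator.get(raw.txid) is None)
            base = committed_view(raw, self.coordinator)
            stale = (name, key) in self.seen and self.seen[(name, key)] != base
            new = Record(self.writes[(name, key)], "PREPARED", self.txid, base)
            if undecided or stale or not store.cas(key, raw, new):
                for s, k, rec in prepared:     # abort: restore prior versions
                    s.cas(k, rec, rec.previous)
                return False
            prepared.append((store, key, new))
        # Commit point: a single-item write of the transaction status record.
        self.coordinator.cas(self.txid, None, Record("COMMITTED", txid=self.txid))
        # Phase 2: finalize records; readers can already roll them forward.
        for store, key, rec in prepared:
            store.cas(key, rec, replace(rec, state="COMMITTED", previous=None))
        return True
```

In this sketch, two transactions that read the same item and then both try to write it cannot both commit: the second one finds that the committed version no longer matches what it read and aborts. A reader that encounters a PREPARED record consults the coordinator and either rolls the record forward (status record present) or falls back to the previous version. Against Cassandra, the compare-and-set would presumably map onto lightweight transactions, but that mapping is an assumption to be validated as part of this phase.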