CMSC 34702-1: Cluster Computing Basics
Junchen Jiang
The University of Chicago
October 8, 2018
MapReduce: Simplified Data Processing on Large Clusters
The Google File System
Bigtable: A Distributed Storage System for Structured Data
Cassandra - A Decentralized Structured Storage System
Consistency, Availability, Partition Tolerance
Consistency: any read must return the last written value.
Availability: every request must result in a response.
Partition tolerance: the network can lose any messages between servers.
(Figure: Replica 1 and Replica 2 both start with x. When set(y) reaches both replicas, get() returns y. If the network loses the update, Replica 1 holds y while Replica 2 still holds x.)
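The replica scenario in the figure can be played out in a few lines of Python. This is a minimal sketch, assuming a single value, a write that always lands on Replica 1 first, and a network that either delivers every replication message or drops them all. `Replica`, `Network`, `set_value`, `get_value` and `run_scenario` are hypothetical names chosen for illustration, not part of any system discussed in this class.

```python
# Two-replica register illustrating consistency, availability and partitions.
# All names here are hypothetical, chosen for this sketch only.


class Replica:
    def __init__(self, name, value):
        self.name = name
        self.value = value


class Network:
    """Delivers replication messages unless the replicas are partitioned."""

    def __init__(self, partitioned=False):
        self.partitioned = partitioned

    def deliver(self, target, value):
        # Partition tolerance: the network may lose any message between servers.
        if self.partitioned:
            return False
        target.value = value
        return True


def set_value(primary, others, net, value):
    """set(): write on the primary, then try to replicate to the others."""
    primary.value = value
    return [net.deliver(r, value) for r in others]


def get_value(replica):
    """get(): always answer from local state (the replica stays available)."""
    return replica.value


def run_scenario(partitioned):
    # Both replicas start with x, as in the slide's figure.
    r1, r2 = Replica("Replica 1", "x"), Replica("Replica 2", "x")
    net = Network(partitioned)
    set_value(r1, [r2], net, "y")
    return get_value(r1), get_value(r2)


print(run_scenario(partitioned=False))  # ('y', 'y'): every read sees the last write
print(run_scenario(partitioned=True))   # ('y', 'x'): Replica 2 answers with stale x
```

Because `get_value` always answers, the partitioned run keeps availability but violates consistency; making Replica 2 refuse the read instead would keep consistency at the cost of availability.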
Cassandra: Gossip-based consensus protocol
Cassandra chooses availability and partition tolerance over consistency: every request gets a response, but after set(y), a get() served by a replica that the update has not yet reached can still return the old value x.
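The availability-first behavior can be sketched as follows. This is a simplification under stated assumptions, not Cassandra's actual replication path: a single key, a hypothetical logical clock (`_clock`) that stamps each write, last-write-wins reconciliation by timestamp, and a hypothetical pairwise `gossip_with` exchange standing in for the way replicas catch up once the partition heals. `APReplica` is likewise a name invented for this sketch.

```python
import itertools

# Hypothetical logical clock that stamps every write with an increasing number.
_clock = itertools.count(1)


class APReplica:
    """One replica of a single key; accepts reads and writes even when partitioned."""

    def __init__(self, value):
        self.ts, self.value = 0, value

    def set(self, value):
        # Accept the write locally and stamp it, so replicas can reconcile later.
        self.ts, self.value = next(_clock), value

    def get(self):
        # Answer from local state, which may be stale during a partition.
        return self.value

    def gossip_with(self, peer):
        # Exchange state; both sides keep the (timestamp, value) pair with the
        # highest timestamp, i.e. the last write wins.
        newest = max((self.ts, self.value), (peer.ts, peer.value))
        self.ts, self.value = newest
        peer.ts, peer.value = newest


r1, r2 = APReplica("x"), APReplica("x")
r1.set("y")          # the write lands on Replica 1 while Replica 2 is cut off
print(r2.get())      # x: the read is answered, but it is stale
r1.gossip_with(r2)   # the partition heals and the replicas exchange state
print(r2.get())      # y: the replicas have converged
```

The stale read is the price of never refusing a request; consistency is only reached eventually, after the replicas exchange state.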
Bigtable: Paxos-based consensus protocol (Chubby)
Bigtable chooses consistency and partition tolerance over availability: it relies on Chubby, a lock service whose replicas (a master and several slaves) agree on state using Paxos. During a partition, a request such as set(y) is not served until a quorum of Chubby replicas is reached.
(Figure: one Chubby master and three Chubby slaves.)
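The "unavailable until a quorum is reached" rule can be illustrated with a majority check. This is not Paxos: it only sketches the quorum condition that a Paxos-based service such as Chubby depends on, assuming the client knows how many replicas it can currently reach. `Unavailable`, `has_quorum` and `QuorumRegister` are hypothetical names, and the four-replica cell mirrors the figure (real Chubby cells typically use five replicas).

```python
class Unavailable(Exception):
    """Raised when too few replicas are reachable to form a quorum."""


def has_quorum(reachable, total):
    # A majority quorum needs strictly more than half of all replicas.
    return reachable > total // 2


class QuorumRegister:
    """Single value guarded by a majority-quorum check (not a Paxos implementation)."""

    def __init__(self, total, value):
        self.total = total
        self.reachable = total  # replicas the client can currently reach
        self.value = value

    def _require_quorum(self):
        if not has_quorum(self.reachable, self.total):
            # Refuse the request rather than risk an inconsistent answer.
            raise Unavailable(f"{self.reachable}/{self.total} replicas reachable")

    def set(self, value):
        self._require_quorum()
        self.value = value

    def get(self):
        self._require_quorum()
        return self.value


cell = QuorumRegister(total=4, value="x")  # one master plus three slaves
cell.set("y")
cell.reachable = 2  # a partition leaves only half the cell reachable
try:
    cell.get()
except Unavailable as err:
    print("unavailable:", err)  # unavailable: 2/4 replicas reachable
cell.reachable = 3  # a majority is reachable again
print(cell.get())  # y
```

Requiring a strict majority means two sides of a partition can never both accept writes, which is what preserves consistency while one side is refused service.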
Is it possible to achieve all three simultaneously?
(Figure: Cassandra is placed at availability plus partition tolerance, Bigtable at consistency plus partition tolerance, and the combination of all three is marked impossible.)
Unfortunately, no (CAP theorem): while the network is partitioned, a replicated system must give up either consistency or availability. (Eric Brewer. https://people.eecs.berkeley.edu/~brewer/cs262b-2004/PODC-keynote.pdf)
This Class: Stream Processing