Megastore: Scalable, Highly Available Storage for Interactive Systems
Jason Baker, Chris Bond, James C. Corbett, JJ Furman, Andrey Khorlin, James Larson, Jean-Michel Léon, Yawei Li, Alexander Lloyd, Vadim Yushprakh
Presented by: Hamid Seyedmoradi, Ayoub Hamidi, Ehsan Mohamad Nezamian
Advanced Database Systems, SRBIAU, Kurdistan Campus
10 May 2012
Megastore
- Motivation
- Introduction
- NoSQL & RDBMS
- Megastore
- Paxos
Megastore
More than 3 billion write and 20 billion read transactions daily
Key Contributions
- Data Model and Storage System
- Paxos Replication
- Report on Experience
AVAILABILITY & SCALE
- Replication: for availability, we implemented a synchronous, fault-tolerant log replicator optimized for long-distance links.
- Partitioning and Locality: for scale, we partitioned data into a vast space of small databases.
AVAILABILITY & SCALE: Replication
Strategies
- Asynchronous Master/Slave
- Synchronous Master/Slave
- Optimistic Replication
We decided to use Paxos.
Technology Options
AVAILABILITY & SCALE: Partitioning and Locality
Entity Groups
- Entity groups partition the datastore
- Each entity group is synchronously replicated across datacenters
- ACID semantics within an entity group
- Looser consistency across entity groups
- Entity group data and replication metadata are stored in scalable NoSQL datastores
[Figure: entity groups replicated across datacenters]
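The partitioning idea above can be sketched in a few lines. This is a toy, single-process illustration, not Megastore's implementation: the class names `EntityGroup` and `ToyDatastore` and the key format are assumptions made for the example. Each group owns its own log, and a commit is atomic only within that group.

```python
class EntityGroup:
    """Hypothetical stand-in for one small, independently logged database."""

    def __init__(self, root_key):
        self.root_key = root_key
        self.log = []    # per-group log of committed mutation batches
        self.data = {}   # current state of the entities in this group

    def commit(self, mutations):
        """Apply a batch of mutations to this group as one unit.

        Returns the log position of the new entry.
        """
        self.log.append(dict(mutations))
        self.data.update(mutations)
        return len(self.log) - 1


class ToyDatastore:
    """Hypothetical datastore partitioned into entity groups by root key."""

    def __init__(self):
        self.groups = {}

    def group_for(self, root_key):
        # Create the group lazily the first time its root key is seen.
        if root_key not in self.groups:
            self.groups[root_key] = EntityGroup(root_key)
        return self.groups[root_key]


ds = ToyDatastore()
alice = ds.group_for("user:alice")
position = alice.commit({"profile": "Alice", "photo:1": "beach.jpg"})
```

A commit to `user:alice` touches only that group's log, which is why groups can be replicated and scaled independently of one another.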
AVAILABILITY & SCALE: Partitioning and Locality
Operations
- Most transactions are within a single entity group
- Cross-entity-group transactions are supported via two-phase commit
- Asynchronous communication between entity groups is supported by queues
- Global indexes span entity groups but have weaker consistency
[Figure: entities (units of data) in Entity Group 1 and Entity Group 2, each with a local index, connected by send and receive queues]
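Queue-based communication between two entity groups might look like the sketch below. It is a simplified illustration under stated assumptions: the names `Group`, `send`, and `deliver` are hypothetical, and a real system would make the local update and the enqueue durable and atomic rather than two in-memory steps.

```python
from collections import deque


class Group:
    """Hypothetical entity group with local data and a receive queue."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.inbox = deque()  # messages waiting to be applied to this group


def send(sender, receiver, local_update, message):
    """One transaction in `sender`: update local data and enqueue a message."""
    sender.data.update(local_update)
    receiver.inbox.append(message)


def deliver(receiver):
    """A later, separate transaction in `receiver` applies pending messages."""
    delivered = 0
    while receiver.inbox:
        receiver.data.update(receiver.inbox.popleft())
        delivered += 1
    return delivered


a, b = Group("calendar:a"), Group("calendar:b")
send(a, b, {"invite_sent": True}, {"invite_from": "a"})
pending_before = dict(b.data)  # the receiver has not changed yet
count = deliver(b)
```

The receiver's state changes only when `deliver` runs, which is the sense in which communication between groups is asynchronous.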
AVAILABILITY & SCALE: Partitioning and Locality
- Entity Groups
- Selecting Entity Group Boundaries (example: Blogs)
- Physical Layout
Megastore
- API Design Philosophy
- Data Model
  - Pre-Joining with Keys
  - SCATTER
  - Indexes: Storing Clause, Repeated Indexes, Inline Indexes
  - Mapping to Bigtable
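Pre-joining with keys can be illustrated with ordinary tuple ordering. In this sketch, which assumes a hypothetical root table keyed by `(user_id,)` and a child table keyed by `(user_id, photo_id)`, sorting by key places each user next to that user's photos, much as a sorted key-value store would lay the rows out.

```python
# Hypothetical rows: a root entity keyed by (user_id,) and child entities
# keyed by (user_id, photo_id). Names and values are illustrative only.
rows = {
    (102,): "User 102",
    (101, 500): "Photo 500 of user 101",
    (101,): "User 101",
    (102, 600): "Photo 600 of user 102",
    (101, 502): "Photo 502 of user 101",
}


def scan(rows):
    """Return values in key order, as a sorted store would hold them."""
    return [rows[key] for key in sorted(rows)]


def entity_group(rows, root_id):
    """Return the root entity and its children, in key order."""
    return [rows[key] for key in sorted(rows) if key[0] == root_id]


ordered = scan(rows)
user_101 = entity_group(rows, 101)
```

Because a child's key starts with its root's key, reading a user together with that user's photos becomes a scan over adjacent rows rather than a join.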
Megastore
Megastore: Transactions and Concurrency Control
Reads: current, snapshot, inconsistent
Transaction Lifecycle
1. Read
2. Application logic
3. Commit
4. Apply
5. Clean up
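The lifecycle above follows an optimistic pattern: a transaction notes the log position it read at, and its commit succeeds only if no other writer has taken the next position in the meantime. The sketch below is a single-process toy under that assumption. `GroupLog` and `ConflictError` are hypothetical names, and a local position check stands in for the replicated consensus step a real deployment would use.

```python
class ConflictError(Exception):
    """Raised when another transaction committed first."""


class GroupLog:
    """Hypothetical per-entity-group log with optimistic concurrency."""

    def __init__(self):
        self.entries = []  # committed mutation batches, in order
        self.state = {}    # data after applying every committed entry

    def read(self):
        # Step 1 (Read): return the next log position and a snapshot.
        return len(self.entries), dict(self.state)

    def commit(self, read_position, mutations):
        # Step 3 (Commit): only one writer can win a given log position.
        if read_position != len(self.entries):
            raise ConflictError("log advanced since the read")
        self.entries.append(dict(mutations))
        # Step 4 (Apply): make the mutations visible.
        self.state.update(mutations)
        return read_position


log = GroupLog()
pos_a, snapshot_a = log.read()
pos_b, snapshot_b = log.read()

# Step 2 (Application logic): each writer builds its mutations from its snapshot.
log.commit(pos_a, {"counter": snapshot_a.get("counter", 0) + 1})

try:
    log.commit(pos_b, {"counter": snapshot_b.get("counter", 0) + 10})
    second_writer_failed = False
except ConflictError:
    second_writer_failed = True  # the second writer must re-read and retry
```

The losing writer sees a conflict instead of silently overwriting the first update, so it can re-read and retry.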
Megastore
- Queues
- Two-Phase Commit
REPLICATION
- Brief Summary of Paxos
- Megastore's Approach
  - Fast Reads
  - Fast Writes
  - Replica Types: Witness Replica
- Architecture
Architecture
Data Structures and Algorithms: Replicated Logs
Data Structures and Algorithms: Reads
1. Query Local
2. Find Position
   (a) Local read
   (b) Majority read
3. Catchup
4. Validate
5. Query Data
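The five read steps can be walked through with a small in-memory model. This is a sketch under assumptions, not the actual algorithm: `Replica`, its `up_to_date` flag (standing in for whatever the "Query Local" step consults), and the choice to poll the first majority of replicas are all hypothetical simplifications.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    log: list = field(default_factory=list)   # committed entries by position
    applied: int = 0                           # number of entries applied
    data: dict = field(default_factory=dict)
    up_to_date: bool = True                    # assumed answer to "Query Local"


def current_read(local, others, key):
    # 1. Query Local: is the local replica known to be up to date?
    if not local.up_to_date:
        # 2b. Majority read: find the highest log position a majority knows.
        replicas = [local] + others
        majority = len(replicas) // 2 + 1
        source = max(replicas[:majority], key=lambda r: len(r.log))
        # 3. Catchup: copy the log entries the local replica is missing.
        local.log.extend(source.log[len(local.log):])
    # 2a. Local read: otherwise the local log position is trusted as is.
    # 3. Catchup (continued): apply every entry not yet applied.
    while local.applied < len(local.log):
        local.data.update(local.log[local.applied])
        local.applied += 1
    # 4. Validate: record that the local replica is current again.
    local.up_to_date = True
    # 5. Query Data: serve the read from local data.
    return local.data.get(key)


full_log = [{"title": "draft"}, {"title": "final"}]
stale = Replica(log=full_log[:1], up_to_date=False)
peers = [Replica(log=list(full_log)), Replica(log=list(full_log))]
value = current_read(stale, peers, "title")
```

When the local replica is already up to date, the function skips the majority poll entirely, which is the fast-read path the slide's step 2(a) refers to.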
Data Structures and Algorithms
Data Structures and Algorithms: Writes
1. Accept Leader
2. Prepare
3. Accept
4. Invalidate
5. Apply
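The write steps can be sketched for a single log position with a toy Paxos-style acceptor. Everything beyond the five step names is an assumption for illustration: the `Replica` class, the `coordinator_valid` flag, peeking at replica state to pick a proposal number, and raising an error instead of retrying after a backoff are simplifications, not Megastore's behaviour.

```python
class ConflictError(Exception):
    """Raised when a different value was chosen for this log position."""


class Replica:
    def __init__(self, reachable=True):
        self.reachable = reachable
        self.promised = -1             # highest proposal number seen so far
        self.accepted = None           # (proposal number, value) or None
        self.coordinator_valid = True  # hypothetical "up to date" flag
        self.applied = None

    def prepare(self, n):
        if not self.reachable or n <= self.promised:
            return False, None
        self.promised = n
        return True, self.accepted

    def accept(self, n, value):
        if not self.reachable or n < self.promised:
            return False
        self.promised = n
        self.accepted = (n, value)
        return True


def write(leader, replicas, value):
    majority = len(replicas) // 2 + 1
    n, chosen, acked = 0, value, []
    # 1. Accept Leader: fast path, proposal number zero at the leader.
    if leader.accept(0, value):
        acked.append(leader)
    else:
        # 2. Prepare: use a higher proposal number (toy: read it directly).
        n = max(r.promised for r in replicas) + 1
        replies = [r.prepare(n) for r in replicas]
        granted = [prev for ok, prev in replies if ok]
        if len(granted) < majority:
            raise RuntimeError("prepare failed; a real writer would retry")
        prior = [p for p in granted if p is not None]
        if prior:
            chosen = max(prior, key=lambda p: p[0])[1]
    # 3. Accept: ask the remaining replicas to accept the value.
    for r in replicas:
        if r not in acked and r.accept(n, chosen):
            acked.append(r)
    if len(acked) < majority:
        raise RuntimeError("no majority; a real writer would back off and retry")
    # 4. Invalidate: mark replicas that did not accept as out of date.
    for r in replicas:
        if r not in acked:
            r.coordinator_valid = False
    # 5. Apply: apply the chosen value where it was accepted.
    for r in acked:
        r.applied = chosen
    if chosen != value:
        raise ConflictError("another value won this log position")
    return chosen


group = [Replica(), Replica(), Replica(reachable=False)]
result = write(group[0], group, "set title=final")
```

With one of three replicas unreachable, the write still succeeds on a majority, and the unreachable replica is flagged so that a later read there would know to catch up first.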
Feedback
END
With thanks
Questions?