PNUTS: Yahoo!’s Hosted Data Serving Platform
Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver and Ramana Yerneni
Yahoo! Research
Presented by Mina Farid, University of Waterloo
CS 848 Presentation, 8 February 2010
Outline
Motivation
Data and Query Model
Consistency
System Architecture
Applications
Experiments
Motivation
Scalability
Response time (SLAs)
High availability and fault tolerance
Relaxed consistency guarantees, somewhere between two extremes:
Serializable transactions
Eventual consistency: an update can be applied at any replica and is propagated to all replicas, but potentially in different orders at different replicas
Data and Query Model
Simplified relational data model (tables, records, attributes) with flexible schemas
Queries: selection and projection from a single table
Aimed at specific applications whose requests scan only a few records; no ad-hoc queries
Support for both hashed and ordered tables
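To make the query model concrete, here is a minimal in-memory sketch, assuming Python dicts as storage. The names `Table`, `OrderedTable`, `range_scan` and `select_project` are hypothetical and not part of PNUTS. A plain `Table` (a hash map) stands in for a hashed table that only serves point lookups; `OrderedTable` also keeps its keys sorted so it can serve range scans; and `select_project` does selection plus projection over rows of one table, with no joins.

```python
import bisect
from typing import Any, Callable, Dict, Iterable, Iterator, List, Optional

Record = Dict[str, Any]  # flexible schema: records may carry different attributes


class Table:
    """Hypothetical hash-organized table: point lookups only, no key order."""

    def __init__(self) -> None:
        self._rows: Dict[str, Record] = {}

    def put(self, key: str, record: Record) -> None:
        self._rows[key] = record

    def get(self, key: str) -> Optional[Record]:
        return self._rows.get(key)


class OrderedTable(Table):
    """Hypothetical ordered table: also keeps keys sorted to serve range scans."""

    def __init__(self) -> None:
        super().__init__()
        self._keys: List[str] = []

    def put(self, key: str, record: Record) -> None:
        if key not in self._rows:
            bisect.insort(self._keys, key)
        super().put(key, record)

    def range_scan(self, lo: str, hi: str) -> Iterator[Record]:
        # Yield records whose key k satisfies lo <= k < hi, in key order.
        start = bisect.bisect_left(self._keys, lo)
        for key in self._keys[start:]:
            if key >= hi:
                break
            yield self._rows[key]


def select_project(rows: Iterable[Record],
                   predicate: Callable[[Record], bool],
                   columns: List[str]) -> List[Record]:
    """Selection followed by projection over the rows of ONE table."""
    # A missing attribute projects to None, since schemas are flexible.
    return [{c: r.get(c) for c in columns} for r in rows if predicate(r)]


fruit = OrderedTable()
fruit.put("apple", {"name": "apple", "color": "red", "price": 3})
fruit.put("banana", {"name": "banana", "color": "yellow"})  # no "price" attribute
fruit.put("cherry", {"name": "cherry", "color": "red", "price": 5})
red_a_to_b = select_project(fruit.range_scan("a", "c"),
                            lambda r: r.get("color") == "red", ["name", "price"])
print(red_a_to_b)  # [{'name': 'apple', 'price': 3}]
```

A request touches only the handful of records in the scanned key range, which is the access pattern the slide targets.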
Consistency
Sits in between general serializability and eventual consistency
Guarantees cover updates to a single record
Per-record timeline consistency: all replicas of a record apply its updates in the same order
For any one version of a record, every replica holding that version contains the same data
Consistency (cont’d)
Each record has a master replica; updates to the record are forwarded to this master replica
The master record carries the version information
API calls, offering different consistency levels:
Read-any
Read-critical(required_version)
Read-latest
Write
Test-and-set-write(required_version)
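The sketch below models the five calls for a single record, assuming these semantics: read-any returns whatever a replica holds (possibly stale), read-critical returns a version at least as new as `required_version`, read-latest consults the master, write goes through the master, which assigns the next version, and test-and-set-write succeeds only if the current version equals `required_version`. Class and method names are hypothetical, and asynchronous replication is simulated by calling `propagate` by hand rather than through a message broker.

```python
from typing import Any, List, Tuple


class VersionMismatch(Exception):
    """Raised when test-and-set-write finds a version other than the required one."""


class Copy:
    """One copy (master or replica) of a single record."""

    def __init__(self) -> None:
        self.version = 0  # 0 means "no write applied yet"
        self.value: Any = None


class ReplicatedRecord:
    """Hypothetical model of one record: a master copy plus possibly stale replicas."""

    def __init__(self, n_replicas: int) -> None:
        self.master = Copy()
        self.replicas = [Copy() for _ in range(n_replicas)]
        self._log: List[Tuple[int, Any]] = []  # committed (version, value), timeline order

    def write(self, value: Any) -> int:
        # Every write goes to the master, which assigns the next version number.
        self.master.version += 1
        self.master.value = value
        self._log.append((self.master.version, value))
        return self.master.version

    def test_and_set_write(self, required_version: int, value: Any) -> int:
        # Write only if the record is still at `required_version`.
        if self.master.version != required_version:
            raise VersionMismatch(f"current {self.master.version}, required {required_version}")
        return self.write(value)

    def read_any(self, replica: int) -> Tuple[int, Any]:
        # Cheapest read: whatever this replica currently has, possibly stale.
        c = self.replicas[replica]
        return c.version, c.value

    def read_critical(self, required_version: int, replica: int) -> Tuple[int, Any]:
        # Return a version at least as new as `required_version`.
        c = self.replicas[replica]
        if c.version >= required_version:
            return c.version, c.value
        return self.master.version, self.master.value  # local copy too stale: ask master

    def read_latest(self) -> Tuple[int, Any]:
        # The master always holds the newest committed version.
        return self.master.version, self.master.value

    def propagate(self, replica: int, steps: int = 1) -> None:
        """Apply the next `steps` committed updates to one replica, strictly in version order."""
        c = self.replicas[replica]
        # Version v is stored at index v - 1, so slicing from c.version skips applied updates.
        for version, value in self._log[c.version:c.version + steps]:
            c.version, c.value = version, value


rec = ReplicatedRecord(n_replicas=2)
rec.write("v1")
rec.write("v2")
rec.propagate(0)                 # replica 0 has caught up to version 1 only
print(rec.read_any(0))           # (1, 'v1')
print(rec.read_critical(2, 0))   # (2, 'v2'), served by the master
```

Because `propagate` only ever applies the log in version order, every replica moves along the same timeline, which is the per-record guarantee described on the previous slide.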
System Architecture
[Figure: one region containing a tablet controller, routers, a message broker, and storage units 1 to N; tablet assignment T1 → SU1, T2 → SU2, T3 → SU3, T4 → SU1]
System Architecture – Data Storage and Retrieval
Each region contains a full complement of system components and a complete copy of the data
Tables are partitioned into tablets; a tablet is simply a group of records from one table
Tablets are stored on storage unit (SU) servers; one SU can hold several tablets
Storage units respond to get(), scan() and set() requests
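A storage unit can be pictured as a server holding several whole tablets and answering the three calls on the slide. This is a minimal sketch, assuming each tablet is an in-memory dict keyed by record key and that tablets are named by hypothetical ids such as `"T1"`; only the names get/scan/set come from the slide, and the signatures are illustrative.

```python
from typing import Any, Dict, Iterator, Optional, Tuple

Record = Dict[str, Any]


class StorageUnit:
    """Hypothetical storage unit: stores whole tablets and answers get/scan/set."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.tablets: Dict[str, Dict[str, Record]] = {}  # tablet id -> {key: record}

    def add_tablet(self, tablet_id: str) -> None:
        self.tablets.setdefault(tablet_id, {})

    def get(self, tablet_id: str, key: str) -> Optional[Record]:
        return self.tablets[tablet_id].get(key)

    def set(self, tablet_id: str, key: str, record: Record) -> None:
        self.tablets[tablet_id][key] = record

    def scan(self, tablet_id: str) -> Iterator[Tuple[str, Record]]:
        # Walk one tablet in key order, as an ordered table would.
        tablet = self.tablets[tablet_id]
        for key in sorted(tablet):
            yield key, tablet[key]


# One SU can host several tablets, e.g. SU1 holding T1 and T4.
su1 = StorageUnit("SU1")
su1.add_tablet("T1")
su1.add_tablet("T4")
su1.set("T1", "Apple", {"color": "red"})
su1.set("T1", "Apricot", {"color": "orange"})
su1.set("T4", "Mango", {"color": "yellow"})
print([k for k, _ in su1.scan("T1")])  # ['Apple', 'Apricot']
```

Keeping the tablet, rather than the record, as the unit stored on an SU means moving load between servers only requires reassigning tablets.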
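Routers’ Mapping – Ordered Table
Routers decide which tablet contains a given record, and which SU holds that tablet
[Figure: the key space from MIN_STRING to MAX_STRING is split at Banana, Grape and Lemon into Tablet 1 to Tablet 4]
Interval mapping (first key → tablet): MIN_STRING → T1, Banana → T2, Grape → T3, Lemon → T4 (the last tablet extends to MAX_STRING)
Tablet assignment: T1 → SU1, T2 → SU2, T3 → SU3, T4 → SU1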
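The router's two-step lookup for an ordered table can be sketched with a binary search over the tablets' first keys. This assumes the empty string plays the role of MIN_STRING, that keys compare as plain Python strings, and that the class name `Router` and its methods are hypothetical; the boundaries and assignments are the ones from the slide.

```python
import bisect
from typing import Dict, List, Tuple

MIN_STRING = ""  # the empty string sorts before every other string


class Router:
    """Hypothetical router for an ordered table.

    Holds the interval mapping (first key of each tablet) and the
    tablet-to-SU assignment. Keys compare as plain strings, so "apple" > "Lemon".
    """

    def __init__(self, boundaries: List[str], tablets: List[str],
                 placement: Dict[str, str]) -> None:
        assert boundaries and boundaries[0] == MIN_STRING
        assert boundaries == sorted(boundaries) and len(boundaries) == len(tablets)
        self.boundaries = boundaries  # boundaries[i] = first key of tablets[i]
        self.tablets = tablets
        self.placement = placement    # tablet id -> storage unit id

    def locate(self, key: str) -> Tuple[str, str]:
        # The owning tablet is the last one whose first key is <= key.
        i = bisect.bisect_right(self.boundaries, key) - 1
        tablet = self.tablets[i]
        return tablet, self.placement[tablet]

    def tablets_for_range(self, lo: str, hi: str) -> List[str]:
        # Tablets whose interval [start, next start) overlaps the query range [lo, hi).
        result = []
        for i, start in enumerate(self.boundaries):
            # None stands for MAX_STRING: the last tablet has no upper bound.
            end = self.boundaries[i + 1] if i + 1 < len(self.boundaries) else None
            if start < hi and (end is None or end > lo):
                result.append(self.tablets[i])
        return result


router = Router(boundaries=[MIN_STRING, "Banana", "Grape", "Lemon"],
                tablets=["T1", "T2", "T3", "T4"],
                placement={"T1": "SU1", "T2": "SU2", "T3": "SU3", "T4": "SU1"})
print(router.locate("Kiwi"))                       # ('T3', 'SU3')
print(router.tablets_for_range("Banana", "Kiwi"))  # ['T2', 'T3']
```

Splitting the lookup into "key → tablet" and "tablet → SU" means a tablet can move to another SU by changing one entry in `placement`, without touching the interval mapping.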
System Architecture
[Figure: a region with a tablet controller, routers, a message broker and storage units 1 to N; both the tablet controller and the routers hold the interval mapping (MIN_STRING → T1, Banana → T2, Grape → T3, Lemon → T4) and the tablet assignment (T1 → SU1, T2 → SU2, T3 → SU3, T4 → SU1)]
System Architecture
[Figure: two regions (Region 1 and Region 2), each with its own tablet controller, routers, message broker and storage units, and each holding the same interval mapping (MIN_STRING → T1, Banana → T2, Grape → T3, Lemon → T4) and tablet assignment (T1 → SU1, T2 → SU2, T3 → SU3, T4 → SU1)]
System Architecture – Replication and Consistency
1- Yahoo! Message Broker (YMB)
Reliable, topic-based publish/subscribe system
Updates are propagated asynchronously to all replicas
Provides only partial ordering: messages published to a particular YMB are delivered to all subscribers in the same order, but messages published to different YMBs may be delivered in any order
Solution: per-record mastership
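The partial-ordering guarantee, and why per-record mastership fixes it, can be simulated in a few lines. This is a toy model, assuming each broker is a FIFO stream and that a seeded random interleaving stands in for network timing between brokers; `deliver` and `replica_states` are hypothetical helpers. When two updates to record `x` go through different brokers, replicas can end up disagreeing; when all updates to `x` go through one broker (its master's), every replica applies them in the same order.

```python
import random
from collections import deque
from typing import Dict, List, Tuple

Update = Tuple[str, str]  # (record key, new value)


def deliver(streams: List[List[Update]], rng: random.Random) -> List[Update]:
    """One subscriber's delivery order across several brokers.

    Each broker's messages keep their publish order (the YMB guarantee), but
    messages from different brokers interleave arbitrarily (random here).
    """
    queues = [deque(s) for s in streams if s]
    delivered: List[Update] = []
    while queues:
        i = rng.randrange(len(queues))
        delivered.append(queues[i].popleft())
        if not queues[i]:
            queues.pop(i)
    return delivered


def replica_states(streams: List[List[Update]], n_replicas: int,
                   seed: int) -> List[Dict[str, str]]:
    """Final state of each replica after applying its own delivery order."""
    rng = random.Random(seed)
    states = []
    for _ in range(n_replicas):
        state: Dict[str, str] = {}
        for key, value in deliver(streams, rng):
            state[key] = value  # the last update applied wins
        states.append(state)
    return states


# Two updates to "x" published on different brokers: replicas may diverge.
split = [[("x", "a1")], [("x", "b1")]]
# Per-record mastership: both updates to "x" go through the master's broker.
mastered = [[("x", "a1"), ("x", "b1")], [("y", "c1")]]

print({s["x"] for seed in range(100) for s in replica_states(split, 3, seed)})     # can contain both 'a1' and 'b1'
print({s["x"] for seed in range(100) for s in replica_states(mastered, 3, seed)})  # {'b1'}
```

Updates to different records may still interleave differently at different replicas; that is acceptable because the guarantee is per record.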
System Architecture – Replication and Consistency
2- Consistency and Record Mastership
One copy of each record is designated the master
Updates are forwarded to that master copy, which publishes the update (commit)
Different records in the same table can be mastered in different clusters
Which copy is the master, and how is it selected?
Each record carries metadata identifying its master copy; this metadata can change
The master is the copy in the region that receives the most updates to that record
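Record mastership can be sketched as per-record metadata plus a migration rule. The rule here is a hypothetical stand-in for "the region receiving the most updates": remember the origin regions of the last `window` updates (assumed to be 3) and move mastership to any other region that issued a strict majority of them. `MasteredRecord` and its fields are illustrative names, not the system's.

```python
from collections import Counter, deque
from dataclasses import dataclass, field
from typing import Any, Deque, List, Tuple


@dataclass
class MasteredRecord:
    """One record plus its (hypothetical) mastership metadata."""
    key: str
    master_region: str      # metadata naming the master copy; changeable
    window: int = 3         # hypothetical size of the update-origin history
    version: int = 0
    value: Any = None
    published: List[Tuple[str, int, Any]] = field(default_factory=list)
    recent_origins: Deque[str] = field(default_factory=deque)

    def update(self, origin_region: str, value: Any) -> int:
        # A write issued in any region is forwarded to the master copy, which
        # assigns the next version and publishes (commits) it from its region.
        self.version += 1
        self.value = value
        self.published.append((self.master_region, self.version, value))
        self._maybe_migrate(origin_region)
        return self.version

    def _maybe_migrate(self, origin_region: str) -> None:
        # Keep only the origins of the last `window` updates.
        self.recent_origins.append(origin_region)
        while len(self.recent_origins) > self.window:
            self.recent_origins.popleft()
        region, count = Counter(self.recent_origins).most_common(1)[0]
        if region != self.master_region and count > self.window // 2:
            self.master_region = region  # later updates are forwarded here instead


rec = MasteredRecord(key="alice", master_region="West")
rec.update("West", "v1")
rec.update("East", "v2")   # forwarded to West; East issued 1 of the last 2 updates
rec.update("East", "v3")   # East issued 2 of the last 3 updates, so mastership moves
print(rec.master_region)   # East
print(rec.published)       # [('West', 1, 'v1'), ('West', 2, 'v2'), ('West', 3, 'v3')]
```

Moving the master toward the region that writes a record most often lets most writes commit locally instead of crossing regions.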
Query Processing
Multi-record queries are handled by a scatter-gather engine in the router, which:
splits a multi-record request into multiple single-record requests
issues those requests in parallel
assembles and evaluates the results, and sends them back to the client
The engine also handles range and scan queries, and supports top-k
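The scatter-gather pattern can be sketched with a thread pool, assuming `concurrent.futures.ThreadPoolExecutor` as the source of parallelism and a hypothetical `locate` function that maps each key to the storage unit holding it. `multi_get` splits a multi-record request into single-record gets; `scan_top_k` scatters a scan to every SU and evaluates top-k at the router. Both helpers and the tiny `StorageUnit` class are illustrative stand-ins.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Optional

Record = Dict[str, Any]


class StorageUnit:
    """Hypothetical minimal storage unit holding some records."""

    def __init__(self, records: Dict[str, Record]) -> None:
        self.records = dict(records)

    def get(self, key: str) -> Optional[Record]:
        return self.records.get(key)

    def scan(self, predicate: Callable[[Record], bool]) -> List[Record]:
        return [r for r in self.records.values() if predicate(r)]


def multi_get(keys: List[str], locate: Callable[[str], str],
              units: Dict[str, StorageUnit],
              pool: ThreadPoolExecutor) -> Dict[str, Optional[Record]]:
    # Scatter: one single-record get per key, sent to the SU that `locate` names.
    futures = {k: pool.submit(units[locate(k)].get, k) for k in keys}
    # Gather: wait for every reply and assemble one answer for the client.
    return {k: f.result() for k, f in futures.items()}


def scan_top_k(predicate: Callable[[Record], bool], score: Callable[[Record], Any],
               k: int, units: Dict[str, StorageUnit],
               pool: ThreadPoolExecutor) -> List[Record]:
    # Scatter the scan to every SU in parallel, then evaluate top-k at the router.
    futures = [pool.submit(u.scan, predicate) for u in units.values()]
    matches = [r for f in futures for r in f.result()]
    return heapq.nlargest(k, matches, key=score)


def locate(key: str) -> str:
    # Hypothetical key-to-SU lookup; a real router would consult its mapping.
    return "SU1" if key < "M" else "SU2"


units = {
    "SU1": StorageUnit({"Apple": {"key": "Apple", "stars": 3},
                        "Fig": {"key": "Fig", "stars": 7}}),
    "SU2": StorageUnit({"Pear": {"key": "Pear", "stars": 5}}),
}

with ThreadPoolExecutor(max_workers=4) as pool:
    found = multi_get(["Apple", "Pear", "Zucchini"], locate, units, pool)
    best = scan_top_k(lambda r: r["stars"] > 2, lambda r: r["stars"], 2, units, pool)

print([r["key"] for r in best])  # ['Fig', 'Pear']
```

Issuing the per-SU requests concurrently keeps the latency of a multi-record request close to that of its slowest single-record request rather than their sum.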
Applications
User databases: millions of records, frequent updates, important data; relaxed consistency is acceptable
Social applications: flexible schemas, a large number of small updates, no real-time requirements (relaxed consistency)
Content metadata: manage structured metadata; must be scalable and consistent
Session data: scalable storage to manage session state; only weak consistency required
Experiments
Main criterion: average request latency (response time)
Experiment setup: 3 regions (2 West, 1 East)
1- Inserting data
2- Varying load
3- Varying number of storage units
Future Enhancements
Planned features include:
Indexing and materialized views
Bundled updates (atomic, non-isolated updates to multiple records)
Conclusion
Thank You! Questions?
Google BigTable
Record-oriented access to very large tables
Does not support: geographic replication, secondary indexes, materialized views, hash-organized tables
Dynamo
Focuses on availability
Provides geographic replication via a ‘gossip’ mechanism
Its eventual consistency model does not suit all applications: “updates are committed in different orders at different replicas”, and replicas are reconciled later (updates may be rolled back)
Does not support ordered tables
Boxwood
Provides a B-tree implementation
Its design favors consistency over scalability (it scales to tens of machines)