PNUTS: Yahoo!’s Hosted Data Serving Platform. Jonathan Danaparamita (jdanap at umich dot edu), University of Michigan, EECS 584, Fall 2011.


PNUTS: Yahoo!’s Hosted Data Serving Platform
Jonathan Danaparamita, jdanap at umich dot edu, University of Michigan, EECS 584, Fall 2011
Some slides/illustrations from Adam Silberstein’s PNUTS presentation at OSCON, July 2011

What is PNUTS?
Trivia: the abbreviation and its real meaning
Goals/requirements:
–Scalability, scalability, scalability
–Low latency
–Availability (…or else, $--)
–A certain degree of consistency

PNUTS Applications
–Yahoo!’s user database
–Social applications
–Content meta-data
–Listings management
–Session data

Design Outline
Data and query model
–Simple relational model
–Key lookup / range scan (hash and ordered tables)
Global access
–Asynchronous (global) replication
–Low-latency local access
–Consistency
Fault tolerance and load balancing

Data Storage and Retrieval
Table is horizontally partitioned into tablets (groups of records)
Storage unit: where the tablets are stored
Router
–Interval mapping: the boundaries of each tablet
–Locates the storage unit for a given tablet
–Is a cached version of…
Tablet controller
–Stores all the mappings
–Polled by routers for changes
–Decides when to split/move tablets
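The router’s interval mapping can be sketched as a sorted list of tablet boundary keys searched with binary search. This is a minimal illustration, not PNUTS code; the boundary keys and storage-unit names are hypothetical:

```python
import bisect

# Hypothetical interval map: sorted upper-boundary keys of the tablets,
# paired with the storage unit currently holding each tablet.
# "~" sorts after any lowercase key used here, so it acts as "infinity".
boundaries = ["cantaloupe", "lime", "strawberry", "~"]
storage_units = ["SU-3", "SU-1", "SU-3", "SU-2"]

def route(key):
    """Locate the storage unit serving `key` via binary search on tablet boundaries."""
    i = bisect.bisect_left(boundaries, key)
    return storage_units[i]
```

The tablet controller would own the authoritative copy of `boundaries` and `storage_units`; each router only caches them and re-polls the controller when a lookup misses.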

PNUTS: Single Region
Tablet controller: maintains the map from database.table.key to tablet to storage unit
Router: routes client requests to the correct storage unit; caches the maps from the tablet controller
Storage unit: stores records; services get/set/delete requests
From Silberstein’s OSCON 2011 slide

Data Retrieval for Hash
The hash space is divided into intervals, each corresponding to a tablet
Key -> hash value -> interval -> storage unit

Data Model: Hash
Tablet boundaries in the hash space: 0x0000, 0x911F, 0x2AF3

Primary Key | Record
Grape | {"liquid" : "wine"}
Lime | {"color" : "green"}
Apple | {"quote" : "Apple a day keeps the …"}
Strawberry | {"spread" : "jam"}
Orange | {"color" : "orange"}
Avocado | {"spread" : "guacamole"}
Lemon | {"expression" : "expensive crap"}
Tomato | {"classification" : "yes… fruit"}
Banana | {"expression" : "goes bananas"}
Kiwi | {"expression" : "New Zealand"}
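The key -> hash value -> interval -> storage unit chain can be sketched as below. This is an illustration only, assuming MD5 truncated to 16 bits and made-up interval bounds and storage-unit names; PNUTS’s actual hash function and layout are not specified on the slide:

```python
import bisect
import hashlib

# Hypothetical partitioning of a 16-bit hash space into three tablets,
# identified by their sorted upper bounds.
upper_bounds = [0x2AF3, 0x911F, 0xFFFF]
storage_units = ["SU-1", "SU-2", "SU-3"]

def hash_route(key):
    """key -> hash value -> interval -> storage unit."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16) & 0xFFFF
    return storage_units[bisect.bisect_left(upper_bounds, h)]
```

Because placement follows the hash of the key rather than the key itself, record order within a tablet looks scrambled, which is exactly what the table above shows.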

Data Retrieval for Ordered Table
Key -> interval -> storage unit
–As opposed to hash: key -> hash value -> interval -> storage unit

Data Model: Ordered

Primary Key | Record
Apple | {"quote" : "Apple a day keeps the …"}
Avocado | {"spread" : "guacamole"}
Banana | {"expression" : "goes bananas"}
Grape | {"liquid" : "wine"}
Kiwi | {"expression" : "New Zealand"}
Lemon | {"expression" : "expensive crap"}
Lime | {"color" : "green"}
Orange | {"color" : "orange"}
Strawberry | {"spread" : "jam"}
Tomato | {"classification" : "yes… fruit"}
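An ordered tablet keeps records sorted by primary key, so a range scan is a contiguous slice, something a hash layout cannot provide without touching every tablet. A minimal in-memory sketch (the tablet contents are toy stand-ins, not a PNUTS API):

```python
import bisect

# Toy ordered tablet: (primary key, record) pairs kept sorted by key.
tablet = [("Apple", "..."), ("Avocado", "..."), ("Banana", "..."),
          ("Grape", "..."), ("Kiwi", "...")]
keys = [k for k, _ in tablet]

def range_scan(lo, hi):
    """Return records with lo <= key <= hi using two binary searches."""
    return tablet[bisect.bisect_left(keys, lo):bisect.bisect_right(keys, hi)]
```

This is why the experiments later in the talk run range scans against the ordered table only.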

Tablet Splitting & Balancing
Each storage unit has many tablets (horizontal partitions of the table)
Tablets may grow over time
Overfull tablets split
A storage unit may become a hotspot
Shed load by moving tablets to other servers
From Silberstein’s OSCON 2011 slide

Data Model: Limitations
No constraints
–A challenge in an asynchronous system
No ad hoc queries (joins, group-by, …)
–A challenge while maintaining the response-time SLA
MapReduce interface instead
–Hadoop: MapReduce implementation for large-scale data analysis
–Pig: the query language

Yahoo! Message Broker
Pub/sub (publish/subscribe)
–Clients subscribe to updates
–Updates made by one client are propagated to the other clients
–The subscribing clients are thus notified of the updates
Usages
–Updates: an update is “committed” only when it has been published to YMB
–Notifications
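The commit-via-publish idea can be sketched as follows: an update counts as committed once the broker has accepted it, while delivery to subscribers (the replicas) happens later and asynchronously. A toy stand-in, not YMB’s actual interface:

```python
from collections import deque

class Broker:
    """Toy pub/sub broker: publish() commits; deliver() runs asynchronously later."""

    def __init__(self):
        self.log = deque()        # updates accepted but not yet delivered
        self.subscribers = []     # callables, one per subscribing replica

    def publish(self, update):
        # Once the broker has logged the update, the writer sees it as committed,
        # even though no replica has applied it yet.
        self.log.append(update)
        return "committed"

    def deliver(self):
        # Asynchronous propagation: drain the log to every subscriber.
        while self.log:
            u = self.log.popleft()
            for s in self.subscribers:
                s(u)
```

The gap between `publish` returning and `deliver` running is exactly the window in which remote replicas still hold stale versions.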

Geographical Replication
Copies of a “single region” deployment across data centers
–Why? Reduced latency for regions distant from the “master” region; backup
–But… more latency for updates
On an update, propagate the change to all replicas
–This is done through the Yahoo! Message Broker
–This is where the update latency comes from

PNUTS Multi-Region
From Silberstein’s OSCON 2011 slide

Consistency Model
Per-record timeline consistency

Record-level Mastering
One cluster is assigned as the “master” for each record
A local update is forwarded to the record master for the complete “commit”
Choosing the right master
–85% of writes to a record typically come from the same datacenter
–Therefore, choose that “frequent writer” datacenter as the master
–If the writes move, reassign the master accordingly
The tablet master (not the record master) enforces primary key constraints
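Per-record mastering can be sketched as below: every write to a record funnels through that record’s master region, which serializes writes and stamps a monotonically increasing version, producing a single timeline per record. Names and the forwarding convention are illustrative, not PNUTS APIs:

```python
class Record:
    """Toy record: tracks its master region and its position on the timeline."""

    def __init__(self, master):
        self.master = master
        self.version = 0
        self.value = None

# Hypothetical records, each mastered in the region that writes it most often.
records = {"alice": Record(master="west"), "bob": Record(master="east")}

def write(region, key, value):
    rec = records[key]
    if region != rec.master:
        # Non-master regions forward the write to the record's master.
        return ("forward-to", rec.master)
    rec.version += 1          # the master serializes updates on one timeline
    rec.value = value
    return ("committed", rec.version)
```

Because only the master advances a record’s version, concurrent writers in different regions can never produce two conflicting version 2s of the same record.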

Recovery
–Find a replica to copy from
–Checkpoint the message broker so in-flight updates are applied
–Copy the lost tablets from the chosen replica
Which replica to choose?
–A nearby “backup region” can serve this need efficiently
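The three recovery steps above can be sketched in a few lines. The function and field names are hypothetical; the point is only the ordering: choose a nearby source, checkpoint the broker, then copy:

```python
def recover_tablet(tablet, replicas, broker_checkpoint, copy_from):
    """Recover a lost tablet: pick a nearby replica, checkpoint, then copy.

    replicas: list of dicts like {"name": ..., "distance": ...}
    broker_checkpoint: callable that flushes in-flight updates for the tablet
    copy_from: callable(source_region, tablet) performing the bulk copy
    """
    # Step 1: prefer the nearby "backup region" as the copy source.
    source = min(replicas, key=lambda r: r["distance"])
    # Step 2: checkpoint so updates still in the broker reach the source first.
    broker_checkpoint(tablet)
    # Step 3: bulk-copy the tablet from the chosen replica.
    return copy_from(source["name"], tablet)
```

Checkpointing before the copy matters: it guarantees the source replica has applied every committed update before its tablet image is taken.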

Asynchronous Replication From Silberstein’s OSCON 2011 slide

Experiments
Metric: latency
Compared: hash vs. ordered tables
Cluster: a three-region PNUTS cluster
–2 regions in the west, 1 in the east
Parameters varied per experiment

Inserting Data
One region (West 1) is the tablet master
Hash: 99 clients (33 per region); ordered (MySQL): 60 clients
1 million records, 1/3 inserted from each region
Results
–Hash: West 1: 75.6 ms; West 2: 131.5 ms; East: 315.5 ms
–Ordered: West 1: 33 ms; West 2: 105.8 ms; East: 324.5 ms
Lesson: MySQL (ordered) is faster than hash, although more vulnerable to contention
More observations

Varying Load
Requests vary between 1,200 and 3,600 requests/second, with 10% writes
Result: observation or anomaly?

Varying Read/Write Ratio
Write ratios vary between 0 and 50%
Fixed 1,200 requests/second

Varying Number of Storage Units
Storage units per region vary; …% writes, 1,200 requests/second

Varying Size of Range Scans
Range scans cover between 0.01% and 0.1% of the table
Ordered table only
30 clients vs. 300 clients

Thank you. Questions?