Distributed Systems Tutorial 11 – Yahoo! PNUTS
Written by Alex Libov, based on the OSCON 2011 presentation
Winter semester, 2014-2015

Yahoo! PNUTS
• A massively parallel and geographically distributed database system for Yahoo!’s web applications
• Provides data storage organized as hashed or ordered tables
• Low latency for large numbers of concurrent requests, including updates and queries
• Per-record consistency guarantees

Consistency
• Serializability of general transactions is inefficient and often unnecessary
    - If a user changes an avatar, posts new pictures, or invites several friends to connect, little harm is done if the new avatar is not initially visible to one friend
• Many distributed applications go to the other extreme and provide only eventual consistency
    - This is too weak and inadequate for web applications
• PNUTS proposes a consistency model that falls between these two extremes

SYSTEM ARCHITECTURE
• Data is organized into tables of records with attributes
• In addition to typical data types, “blob” is a valid data type, allowing arbitrary structures inside a record
• Data tables are horizontally partitioned into groups of records called tablets
• Tablets are scattered across many servers
    - Each server might have hundreds or thousands of tablets, but each tablet is stored on a single server within a region

Distributed Hash Table
[Figure: the hash space is divided into tablets; tablet boundaries shown include 0x0000, 0x2AF3 and 0x911F]
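For a hashed table, finding a record's tablet amounts to hashing the key and locating the hash value among the sorted tablet start points. The sketch below assumes a 16-bit hash space and uses the three boundaries from the slide as hypothetical tablet starts; the hash function (first two bytes of MD5) is an illustrative choice, not necessarily the one PNUTS uses.

```python
import bisect
import hashlib

# Hypothetical tablet start points, taken from the slide's figure: each tablet
# owns the hash values from its start up to (not including) the next start.
TABLET_STARTS = [0x0000, 0x2AF3, 0x911F]


def hash_key(key: str) -> int:
    """Map a record key into a 16-bit hash space (assumed: first 2 bytes of MD5)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:2], "big")


def tablet_for_hash(h: int) -> int:
    """Index of the tablet whose hash range contains h."""
    # bisect_right gives the position of the first start greater than h,
    # so the owning tablet is the one just before it.
    return bisect.bisect_right(TABLET_STARTS, h) - 1


def tablet_for_key(key: str) -> int:
    return tablet_for_hash(hash_key(key))
```

Because placement depends only on the hash, keys that are adjacent in sort order usually land in different tablets, which spreads load evenly but makes range scans expensive.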

Distributed Hash Table
[Figure: tablets clustered by key range]
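When tablets are clustered by key range (as in PNUTS ordered tables), a range scan only has to visit the tablets whose ranges overlap the query. The sketch below uses a hypothetical three-tablet table with in-memory dictionaries standing in for storage units.

```python
import bisect

# Hypothetical ordered table: tablet i owns keys in [TABLET_STARTS[i], TABLET_STARTS[i+1]).
TABLET_STARTS = ["", "g", "p"]
TABLETS = [
    {"apple": 1, "fig": 2},
    {"grape": 3, "kiwi": 4},
    {"pear": 5, "plum": 6},
]


def tablet_for_key(key: str) -> int:
    """Index of the tablet whose key range contains key."""
    return bisect.bisect_right(TABLET_STARTS, key) - 1


def range_scan(lo: str, hi: str):
    """Return (key, value) pairs with lo <= key < hi, in key order."""
    results = []
    for i in range(tablet_for_key(lo), len(TABLETS)):
        if TABLET_STARTS[i] >= hi:
            break  # this tablet, and every later one, starts past the range
        for k in sorted(TABLETS[i]):
            if lo <= k < hi:
                results.append((k, TABLETS[i][k]))
    return results
```

For example, `range_scan("f", "l")` touches only the first two tablets and returns fig, grape and kiwi; the third tablet is never contacted.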

Query model
• PNUTS supports very simple queries, sacrificing a rich API in favor of response time and overall simplicity
    - No joins, group-by, etc.
    - These are stated as future work
• The system is designed to work well with queries that read and write single records or small groups of records

PNUTS: Single Region
• Tablet controller
    - A single pair of active/standby servers
    - Maintains the map from database.table.key to tablet to storage unit
• Routers
    - Route client requests to the correct storage unit
    - Cache the maps from the tablet controller
• Storage units
    - Store records
    - Service get/set/delete requests
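The division of labor within a region can be sketched as three small classes. This is a toy model under stated assumptions: the class names, the `WrongStorageUnit` error, and the "refresh the cached map and retry once" policy are illustrative, not the actual PNUTS interfaces. The point is that routers keep only a cache, so the tablet controller stays off the request path except when a cached entry turns out to be stale.

```python
class WrongStorageUnit(Exception):
    """Hypothetical error: the request reached a unit that no longer holds the tablet."""


class StorageUnit:
    """Stores records and serves get/set requests (delete omitted for brevity)."""
    def __init__(self):
        self.tablets = {}  # tablet id -> {key: record}

    def get(self, tablet, key):
        if tablet not in self.tablets:
            raise WrongStorageUnit(tablet)
        return self.tablets[tablet].get(key)

    def set(self, tablet, key, record):
        if tablet not in self.tablets:
            raise WrongStorageUnit(tablet)
        self.tablets[tablet][key] = record


class TabletController:
    """Authoritative tablet -> storage-unit map (only the active server is modelled)."""
    def __init__(self, assignment):
        self.assignment = dict(assignment)

    def lookup(self, tablet):
        return self.assignment[tablet]


class Router:
    """Routes client requests using a cached copy of the controller's map."""
    def __init__(self, controller, tablet_of):
        self.controller = controller
        self.tablet_of = tablet_of  # key -> tablet id, e.g. a hash-range lookup
        self.cache = {}

    def _unit(self, tablet, refresh=False):
        if refresh or tablet not in self.cache:
            self.cache[tablet] = self.controller.lookup(tablet)
        return self.cache[tablet]

    def _call(self, op, key, *args):
        tablet = self.tablet_of(key)
        try:
            return getattr(self._unit(tablet), op)(tablet, key, *args)
        except WrongStorageUnit:
            # The cached map is stale (the tablet moved): refresh it and retry once.
            return getattr(self._unit(tablet, refresh=True), op)(tablet, key, *args)

    def get(self, key):
        return self._call("get", key)

    def set(self, key, record):
        return self._call("set", key, record)


# Usage: move a tablet behind the router's back; the next read still succeeds.
su1, su2 = StorageUnit(), StorageUnit()
su1.tablets["t0"] = {}
controller = TabletController({"t0": su1})
router = Router(controller, lambda key: "t0")
router.set("alice", "Awake")
su2.tablets["t0"] = su1.tablets.pop("t0")
controller.assignment["t0"] = su2
print(router.get("alice"))  # Awake
```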

Tablet Splitting & Balancing
• Each storage unit has many tablets (horizontal partitions of the table)
• Tablets may grow over time; overfull tablets split
• A storage unit may become a hotspot
    - It sheds load by moving tablets to other servers
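Both mechanisms on this slide can be sketched in a few lines. The size threshold, the median split point, and the "move the largest tablet that narrows the gap" heuristic are assumptions made for illustration; the slide does not say how PNUTS picks split points or which tablet to move.

```python
def split_tablet(tablet: dict, max_records: int):
    """Split an overfull tablet (key -> record) at its median key.

    Returns [tablet] if it is not overfull, otherwise [left, right].
    """
    if len(tablet) <= max_records:
        return [tablet]
    keys = sorted(tablet)
    mid = keys[len(keys) // 2]
    left = {k: v for k, v in tablet.items() if k < mid}
    right = {k: v for k, v in tablet.items() if k >= mid}
    return [left, right]


def rebalance_once(placement: dict, tablet_load: dict, servers: list):
    """Move one tablet from the hottest server to the coolest one.

    placement: tablet -> server; tablet_load: tablet -> load.
    Returns (tablet, from_server, to_server), or None if no move helps.
    """
    loads = {s: 0 for s in servers}
    for tablet, server in placement.items():
        loads[server] += tablet_load[tablet]
    hot = max(loads, key=loads.get)
    cold = min(loads, key=loads.get)
    gap = loads[hot] - loads[cold]
    # Only a tablet lighter than the gap leaves the pair more balanced than before.
    candidates = [t for t, s in placement.items() if s == hot and tablet_load[t] < gap]
    if not candidates:
        return None
    tablet = max(candidates, key=tablet_load.get)
    placement[tablet] = cold
    return tablet, hot, cold
```

Splitting keeps individual tablets small enough to move cheaply, which is what makes the load-shedding step practical.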

Consistency Options (ordered from more availability to more consistency)
• Eventual consistency
    - Low-latency updates and inserts, done locally
• Record timeline consistency
    - Each record is assigned a “master region”
    - Inserts succeed, but updates could fail during outages
• Primary key constraint + record timeline
    - Each tablet and record is assigned a “master region”
    - Inserts and updates could fail during outages

Record Timeline Consistency
• One of the replicas is designated as the master, per record
• All updates to that record are forwarded to the master
• If a replica is receiving the majority of write requests, it becomes the master
• Each update advances the generation of the record
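The per-record mastership rules can be sketched as follows. This is a single-process toy: forwarding is reduced to a flag, and the migration rule counts every write ever seen, whereas a real policy would presumably look only at recent writes. The class and field names are hypothetical.

```python
from collections import Counter


class Record:
    """One record under record-timeline consistency (simplified sketch)."""

    def __init__(self, master: str):
        self.master = master          # region that orders all updates to this record
        self.generation = 0           # advanced by every update
        self.value = None
        self.writes_from = Counter()  # region -> number of writes originating there

    def write(self, origin_region: str, value):
        """Apply an update issued in origin_region.

        Returns (new generation, whether the write had to be forwarded).
        """
        forwarded = origin_region != self.master
        # The master applies the update, so every replica sees the same order.
        self.generation += 1
        self.value = value
        self.writes_from[origin_region] += 1
        self._maybe_migrate()
        return self.generation, forwarded

    def _maybe_migrate(self):
        # Assumed rule from the slide: a region issuing the majority of writes becomes master.
        total = sum(self.writes_from.values())
        region, count = self.writes_from.most_common(1)[0]
        if region != self.master and count * 2 > total:
            self.master = region
```

Migrating mastership toward the region that writes most often turns future writes from that region into local, low-latency operations.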

Record Timeline Consistency: Example
Transactions:
• Alice changes her status from “Sleeping” to “Awake”
• Alice changes her location from “Home” to “Work”
Every region applies the two updates in the same order and ends at (Alice, Work, Awake):
• Region 1: (Alice, Home, Sleeping) → (Alice, Home, Awake) → (Alice, Work, Awake)
• Region 2 applies the same updates, possibly later, and also ends at (Alice, Work, Awake)
No replica should ever see the record as (Alice, Work, Sleeping).
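The guarantee in the Alice example can be checked mechanically: a replica that applies a prefix of the master's update sequence only ever exposes states on the record's timeline, while applying the same updates in a different order produces the forbidden state. The tuple layout and field map below are assumptions for the sketch.

```python
INITIAL = ("Alice", "Home", "Sleeping")                  # (name, location, status)
UPDATES = [("status", "Awake"), ("location", "Work")]    # order fixed by the master
FIELDS = {"location": 1, "status": 2}                     # field -> tuple position


def apply(state, update):
    """Return the state after applying one (field, value) update."""
    field, value = update
    s = list(state)
    s[FIELDS[field]] = value
    return tuple(s)


def states_seen(initial, updates):
    """Every state a replica passes through when applying updates in the given order."""
    states = [initial]
    for u in updates:
        states.append(apply(states[-1], u))
    return states


# A lagging replica has applied a prefix of UPDATES, so it is always somewhere on this timeline.
timeline = states_seen(INITIAL, UPDATES)
# Applying the same updates out of order would expose a state that is not on the timeline.
reordered = states_seen(INITIAL, list(reversed(UPDATES)))
```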

API calls
• Read-any
    - Returns a possibly stale version of the record
    - The returned record is always a valid one from the record’s history
    - This call has lower latency than other read calls with stricter guarantees
• Read-critical(required version)
    - Returns a version of the record that is strictly newer than, or the same as, the required version
• Read-latest
    - Returns the latest copy of the record, reflecting all writes that have succeeded
• Write
    - Gives the same ACID guarantees as a transaction with a single write operation in it
    - Useful for blind writes, e.g., a user updating his status on his profile
• Test-and-set-write(required version)
    - Performs the requested write to the record if and only if the present version of the record is the same as the required version
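The five calls can be illustrated against a single record with a local, possibly lagging replica. This is a hypothetical model, not the PNUTS client library: `history[v]` holds the value at version v on the master's timeline, and the assumption that read-critical falls back to the master when the local copy is too old is ours.

```python
class StaleVersion(Exception):
    """Test-and-set-write failed: the record has moved past the required version."""


class ReplicatedRecord:
    """One record as seen from a (possibly lagging) local replica."""

    def __init__(self, value):
        self.history = [value]   # history[v] = value at version v (master's timeline)
        self.local_version = 0   # how far the local replica has caught up

    def latest_version(self):
        return len(self.history) - 1

    def read_any(self):
        # Cheapest read: the local copy, possibly stale but always from the history.
        return self.local_version, self.history[self.local_version]

    def read_critical(self, required_version):
        if self.local_version >= required_version:
            return self.read_any()
        # Local copy too old: assumed to fall back to an up-to-date replica.
        return self.read_latest()

    def read_latest(self):
        v = self.latest_version()
        return v, self.history[v]

    def write(self, value):
        # Blind write: applied at the master, creating the next version.
        self.history.append(value)
        return self.latest_version()

    def test_and_set_write(self, required_version, value):
        if self.latest_version() != required_version:
            raise StaleVersion(f"at version {self.latest_version()}, not {required_version}")
        return self.write(value)

    def replicate(self):
        """Asynchronous propagation completes: the local replica catches up."""
        self.local_version = self.latest_version()
```

A typical read-modify-write loop reads with `read_latest()`, computes a new value, and submits it with `test_and_set_write(version, new_value)`, retrying on `StaleVersion`.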

Eventual Consistency
• Timeline consistency comes at a price
    - Writes not originating in the record’s master region are forwarded to the master and have longer latency
    - The mastership of a record can migrate between replicas
    - When the master region is down, the record is unavailable for writes
• Eventual consistency mode
    - On conflict, the latest write per field wins
• Target customers
    - Those that externally guarantee no conflicts
    - Those that understand conflicts and can cope with them
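“Latest write per field wins” can be sketched as a merge function. The representation is an assumption: each copy maps a field to a (timestamp, region, value) triple, and the region id breaks timestamp ties so that the merge is deterministic and every replica converges to the same result regardless of the order in which copies meet.

```python
def merge_record(a: dict, b: dict) -> dict:
    """Merge two copies of a record, keeping the latest write for each field.

    Each copy maps field -> (timestamp, region, value). The (timestamp, region)
    pair decides the winner; the region id is a hypothetical tie-breaker.
    """
    merged = dict(a)
    for field, entry in b.items():
        if field not in merged or entry[:2] > merged[field][:2]:
            merged[field] = entry
    return merged


def values(record: dict) -> dict:
    """Strip the metadata and keep only field values."""
    return {field: entry[2] for field, entry in record.items()}


east = {"status": (1, "east", "Awake"), "location": (5, "east", "Work")}
west = {"status": (3, "west", "Busy"), "location": (2, "west", "Home")}
print(values(merge_record(east, west)))  # {'status': 'Busy', 'location': 'Work'}
```

Note that the merged record mixes fields from both regions, which is exactly the kind of outcome the slide’s target customers must either prevent externally or be able to tolerate.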

Yahoo! Message Broker (YMB)
• A topic-based publish/subscribe system
• Data updates are considered “committed” when they have been published to YMB
• At some point after being committed, the update is asynchronously propagated to the other regions and applied to their replicas
• YMB guarantees that published messages will be delivered to all topic subscribers even in the presence of single broker machine failures
    - It does so by logging each message to multiple disks on different servers: two copies are logged initially, and more copies are logged as the message propagates
• A message is not purged from the YMB log until PNUTS has verified that the update has been applied to all replicas of the database
• YMB provides partial ordering of published messages
    - Messages published to a particular YMB cluster are delivered to all subscribers in the order they were published
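The commit point, in-order delivery, and purge rule can be modelled with a single-topic log. This toy `Broker` is not YMB’s API: it has one cluster and one topic, it omits replication of the log to multiple disks, and it treats a subscriber’s acknowledgement as “the update has been applied at that replica”.

```python
class Broker:
    """Single-topic, single-cluster log in the spirit of YMB (hypothetical API)."""

    def __init__(self):
        self.log = []        # (offset, message) pairs still retained, in publish order
        self.next_offset = 0
        self.acked = {}      # subscriber -> first offset it has not yet applied

    def subscribe(self, name):
        self.acked[name] = self.next_offset

    def publish(self, message):
        # Appending to the log is the commit point for the update.
        offset = self.next_offset
        self.log.append((offset, message))
        self.next_offset += 1
        return offset

    def pending(self, name):
        """Messages this subscriber has not applied yet, in publish order."""
        start = self.acked[name]
        return [m for off, m in self.log if off >= start]

    def ack(self, name, upto):
        """Subscriber confirms it applied every message with offset < upto."""
        self.acked[name] = max(self.acked[name], upto)

    def purge(self):
        """Drop messages that every subscriber (replica) has confirmed applying."""
        low = min(self.acked.values(), default=self.next_offset)
        self.log = [(off, m) for off, m in self.log if off >= low]
```

Because every replica consumes the same log in the same order, updates to a record mastered in that cluster reach all regions in one agreed order, which is what record timeline consistency relies on.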

Recovery
• Recovering from a failure involves copying lost tablets from another replica
• A three-step process:
    1. The tablet controller requests a copy from a particular remote replica (the “source tablet”)
    2. A “checkpoint message” is published to YMB, to ensure that any in-flight updates at the time the copy is initiated are applied to the source tablet
    3. The source tablet is copied to the destination region
• To support this recovery protocol, tablet boundaries are kept synchronized across replicas, and tablet splits are conducted by having all regions split a tablet at the same point
    - The split is coordinated by a two-phase commit between regions
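The recovery steps and the coordinated split can be sketched together. Everything here is a simplified assumption: each region’s inbox stands in for its YMB subscription, the checkpoint is appended after any in-flight updates (mirroring publish order), and the two-phase commit ignores failures between phases. Tablet ids such as `"t1/lo"` are hypothetical.

```python
class Replica:
    """One region's copy of a table: tablets plus an inbox of broker-delivered messages."""

    def __init__(self):
        self.tablets = {}  # tablet id -> {key: value}
        self.inbox = []    # publish order: ("UPDATE", tablet, key, value) or ("CHECKPOINT", tablet)

    def apply_until_checkpoint(self, tablet):
        """Apply queued updates in order until the checkpoint for `tablet` is consumed."""
        while self.inbox:
            msg = self.inbox.pop(0)
            if msg[0] == "CHECKPOINT":
                if msg[1] == tablet:
                    return
                continue  # checkpoint for some other tablet
            _, t, key, value = msg
            self.tablets.setdefault(t, {})[key] = value
        raise RuntimeError(f"checkpoint for {tablet} never arrived")


def recover(tablet, source, destination):
    # Step 1: the tablet controller has chosen `source` as the replica to copy from.
    # Step 2: the checkpoint lands after every in-flight update, so once the source
    # reaches it, all of those updates have been applied to the source tablet.
    source.inbox.append(("CHECKPOINT", tablet))
    source.apply_until_checkpoint(tablet)
    # Step 3: copy the now up-to-date tablet to the destination region.
    destination.tablets[tablet] = dict(source.tablets[tablet])


def split_everywhere(regions, tablet, split_key):
    """Split `tablet` at the same key in every region, or in none (two-phase commit)."""
    # Phase 1 (prepare): each region votes; here it votes yes if it holds the tablet.
    if not all(tablet in r.tablets for r in regions):
        return False  # abort: no region splits
    # Phase 2 (commit): every region applies the identical split.
    for r in regions:
        data = r.tablets.pop(tablet)
        r.tablets[tablet + "/lo"] = {k: v for k, v in data.items() if k < split_key}
        r.tablets[tablet + "/hi"] = {k: v for k, v in data.items() if k >= split_key}
    return True
```

Keeping split points identical across regions is what makes step 3 possible: a tablet copied from any region covers exactly the key range the destination expects.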
