Web-Scale Data Serving with PNUTS

Presentation transcript:

Web-Scale Data Serving with PNUTS Adam Silberstein Yahoo! Research

Outline
  PNUTS Architecture
  Recent Developments
    New features
    New challenges
  Adoption at Yahoo!

Yahoo! Cloud Data Systems
  Hadoop: Large Data Analysis
    Scan oriented workloads
    Focus on sequential disk I/O
  PNUTS: Structured Record Storage
    CRUD
    Point lookups and short scans
    Index organized table and random I/Os
  MobStor: Large Blob Storage
    Object retrieval and streaming
    Scalable file storage

What is PNUTS?
  Structured, flexible schema
  Geographic replication
  Parallel database
  Hosted, managed infrastructure
  Example table:
    CREATE TABLE Parts ( ID VARCHAR, StockNumber INT, Status VARCHAR … )
    Key1  42342  E
    Key2  42521  W
    Key3  66354
    Key4  12352
    Key5  75656  C
    Key6  15677

PNUTS Design Features
  Simplicity
    Scalability via commodity servers
    Elasticity: add capacity with growth
    APIs: key lookup or range scan
  Global Access
    Asynchronous replication across data centers
    Low latency local access
    Consistency: Timeline, Eventual
  Operability
    Resilience and automatic recovery
    Automatic load balancing
    Single multi-tenant hosted service

Distributed Hash Table
  Primary Key   Record
  Grape         {"liquid" : "wine"}
  Lime          {"color" : "green"}
  Apple         {"quote" : "Apple a day keeps the …"}
  Strawberry    {"spread" : "jam"}
  Orange        {"color" : "orange"}
  Avocado       {"spread" : "guacamole"}
  Lemon         {"expression" : "expensive crap"}
  Tomato        {"classification" : "yes… fruit"}
  Banana        {"expression" : "goes bananas"}
  Kiwi          {"expression" : "New Zealand"}
  Tablet boundaries shown in hash space: 0x0000, 0x2AF3, 0x911F
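
A minimal sketch of how a key could be mapped to a tablet in a hash-organized table. The three boundary values come from the slide; treating them as tablet lower bounds in a 16-bit hash space, and using MD5 as the hash function, are assumptions made only for illustration.

```python
import bisect
import hashlib

# Boundary values from the slide, assumed here to be the lower bound
# of each tablet in a 16-bit hash space.
TABLET_LOWER_BOUNDS = [0x0000, 0x2AF3, 0x911F]


def hash_key(key: str) -> int:
    """Map a primary key to a 16-bit hash (MD5 is an illustrative choice)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:2], "big")


def tablet_for_hash(h: int) -> int:
    """Return the index of the tablet whose hash interval contains h."""
    return bisect.bisect_right(TABLET_LOWER_BOUNDS, h) - 1


def tablet_for_key(key: str) -> int:
    """Hash the key, then find its tablet by binary search."""
    return tablet_for_hash(hash_key(key))
```

Because keys are placed by hash, neighbouring keys such as Lemon and Lime can land in different tablets, which is why this organization suits point lookups rather than range scans.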

Distributed Ordered Table
  Primary Key   Record
  Apple         {"quote" : "Apple a day keeps the …"}
  Avocado       {"spread" : "guacamole"}
  Banana        {"expression" : "goes bananas"}
  Grape         {"liquid" : "wine"}
  Kiwi          {"expression" : "New Zealand"}
  Lemon         {"expression" : "expensive crap"}
  Lime          {"color" : "green"}
  Orange        {"color" : "orange"}
  Strawberry    {"spread" : "jam"}
  Tomato        {"classification" : "yes… fruit"}
  Tablet clustered by key range
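
A minimal sketch of why clustering by key range makes short scans cheap: with keys kept in sorted order, a range scan is two binary searches plus a contiguous read. The class name OrderedTable and its scan method are hypothetical and are not the PNUTS API.

```python
import bisect


class OrderedTable:
    """Toy single-node ordered table (illustrative, not the PNUTS API)."""

    def __init__(self, records):
        self._records = dict(records)
        self._keys = sorted(self._records)  # keys kept in sorted order

    def scan(self, start, end):
        """Return (key, record) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._records[k]) for k in self._keys[lo:hi]]


fruit = OrderedTable({
    "Apple": {"quote": "Apple a day keeps the …"},
    "Avocado": {"spread": "guacamole"},
    "Banana": {"expression": "goes bananas"},
    "Grape": {"liquid": "wine"},
    "Kiwi": {"expression": "New Zealand"},
    "Lemon": {"expression": "expensive crap"},
    "Lime": {"color": "green"},
})
middle = [key for key, _ in fruit.scan("Banana", "Lemon")]
```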

PNUTS-Single Region
  Routers
    Route client requests to correct storage unit
    Cache the maps from the tablet controller
  Tablet controller
    Maintains map from database.table.key to tablet to storage-unit
  Storage units
    Store records
    Service get/set/delete requests

Tablet Splitting & Balancing
  Each storage unit has many tablets (horizontal partitions of the table)
  Tablets may grow over time
  Overfull tablets split
  Storage unit may become a hotspot
  Shed load by moving tablets to other servers
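
A minimal sketch of splitting an overfull tablet, assuming a tablet is a sorted list of keys and "overfull" means more than max_records keys. The function name, the size-based trigger, and the midpoint split are illustrative assumptions; the slides do not state the actual split policy.

```python
def split_if_overfull(tablet, max_records):
    """Split a sorted tablet at its midpoint until every piece fits.

    tablet: sorted list of keys. max_records: assumed capacity (>= 1).
    Returns a list of tablets that together hold the same keys in order.
    """
    if len(tablet) <= max_records:
        return [tablet]
    mid = len(tablet) // 2
    return (split_if_overfull(tablet[:mid], max_records)
            + split_if_overfull(tablet[mid:], max_records))


pieces = split_if_overfull(list("abcdefg"), 3)
```

Each resulting tablet can then be moved to another storage unit independently, which is what makes splitting useful for shedding load.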

PNUTS Multi-Region

Asynchronous Replication

Consistency Options (ordered from highest availability to strongest consistency)
  Eventual Consistency
    Low latency updates and inserts done locally
  Record Timeline Consistency
    Each record is assigned a “master region”
    Inserts succeed, but updates could fail during outages*
  Primary Key Constraint + Record Timeline
    Each tablet and record is assigned a “master region”
    Inserts and updates could fail during outages*

Record Timeline Consistency
  Transactions:
    Alice changes status from “Sleeping” to “Awake”
    Alice changes location from “Home” to “Work”
  Region 1: (Alice, Home, Sleeping) -> (Alice, Home, Awake) -> (Alice, Work, Awake)
  Region 2: (Alice, Home, Sleeping) -> (Alice, Work, Awake)
  No replica should see record as (Alice, Work, Sleeping)
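
A minimal sketch of the timeline guarantee in the Alice example: the record master numbers each update, and a replica applies updates strictly in that order. The Master and Replica classes and the buffering of early arrivals are illustrative assumptions, not the mechanism described in the slides.

```python
class Master:
    """Record master: assigns a sequence number to every update."""

    def __init__(self, record):
        self.record = dict(record)
        self.version = 0

    def update(self, changes):
        self.version += 1
        self.record.update(changes)
        return self.version


class Replica:
    """Replica that applies updates only in version order."""

    def __init__(self, record):
        self.record = dict(record)
        self.version = 0
        self._pending = {}  # version -> changes that arrived too early

    def receive(self, version, changes):
        self._pending[version] = dict(changes)
        # Apply every update whose predecessor has already been applied.
        while self.version + 1 in self._pending:
            self.version += 1
            self.record.update(self._pending.pop(self.version))


start = {"name": "Alice", "location": "Home", "status": "Sleeping"}
master, replica = Master(start), Replica(start)
v1 = master.update({"status": "Awake"})
v2 = master.update({"location": "Work"})

replica.receive(v2, {"location": "Work"})   # arrives first, is held back
state_after_early_arrival = dict(replica.record)
replica.receive(v1, {"status": "Awake"})    # now both apply, in order
```

The replica may lag behind the master, but it never exposes (Alice, Work, Sleeping), a state that did not occur on the record's timeline.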

Eventual Consistency
  Timeline consistency comes at a price
    Writes not originating in record master region forward to master and have longer latency
    When master region down, record is unavailable for write
  We added eventual consistency mode
    On conflict, latest write per field wins
  Target customers
    Those that externally guarantee no conflicts
    Those that understand/can cope
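
A minimal sketch of "latest write per field wins", assuming each replica stores a (timestamp, value) pair per field. The data layout and the merge function are hypothetical; how timestamps are assigned and how ties are broken is not covered by the slides.

```python
def merge(local, remote):
    """Merge two versions of a record, field by field.

    Each version maps field -> (timestamp, value). For every field the
    entry with the larger timestamp wins; on a tie the local entry is
    kept (a real system would need a deterministic tie-breaker).
    """
    merged = dict(local)
    for field, (ts, value) in remote.items():
        if field not in merged or ts > merged[field][0]:
            merged[field] = (ts, value)
    return merged


region1 = {"status": (1, "Sleeping"), "location": (5, "Work")}
region2 = {"status": (3, "Awake"), "location": (2, "Home")}
resolved = merge(region1, region2)
```

Note that the merged record can combine fields written in different regions, which is acceptable only for customers who, as the slide says, guarantee no conflicts externally or can cope with them.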

Ordered Table Challenges
  Carefully choose initial tablet boundaries
  Sample input keys
  Same goes for any big load
  Pre-split and move tablets if needed
  [Figure: keys apple, avocado, banana, carrot, lemon, tomato loaded into tablets bounded by MIN, B, L, MAX versus MIN, I, S, MAX]
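
A minimal sketch of choosing initial tablet boundaries by sampling input keys, so that each tablet receives roughly the same share of a big load. The function name, the uniform random sample, and the quantile rule are illustrative assumptions rather than the published pre-planning method.

```python
import random


def choose_boundaries(keys, num_tablets, sample_size, seed=0):
    """Pick num_tablets - 1 split keys from a random sample of the input.

    keys: non-empty list of input keys. The sample is sorted and cut at
    evenly spaced positions, so boundaries follow the key distribution
    instead of the alphabet.
    """
    rng = random.Random(seed)
    sample = sorted(rng.sample(keys, min(sample_size, len(keys))))
    return [sample[i * len(sample) // num_tablets]
            for i in range(1, num_tablets)]


input_keys = [f"key{i:04d}" for i in range(1000)]
boundaries = choose_boundaries(input_keys, num_tablets=4, sample_size=1000)
```

With skewed input (for example, most keys starting with "a" or "b"), evenly spaced alphabetical boundaries would send nearly all inserts to one tablet; sampled boundaries avoid that.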

Ordered Table Challenges
  Dealing with skewed workloads
    Tablet split, tablet moves
    Initially operator driven
    Now driven by Yak load balancer
  Yak
    Collect storage unit stats
    Issue move, split requests
    Be conservative, make sure loads are here to stay!
      Moves are expensive
      Splits not reversible
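
A minimal sketch of the "be conservative" rule: act on a storage unit only after it has been over a load threshold for several consecutive stat collections. The class name, the threshold, and the streak rule are assumptions for illustration; the slides do not describe Yak's actual policy.

```python
from collections import defaultdict


class ConservativeBalancer:
    """Flags a storage unit only after sustained overload (illustrative)."""

    def __init__(self, threshold, required_samples):
        self.threshold = threshold
        self.required_samples = required_samples
        self._streak = defaultdict(int)  # unit -> consecutive hot samples

    def observe(self, loads):
        """Take one round of stats ({unit: load}); return units to relieve."""
        to_relieve = []
        for unit, load in loads.items():
            if load > self.threshold:
                self._streak[unit] += 1
            else:
                self._streak[unit] = 0  # a single cool sample resets the streak
            if self._streak[unit] >= self.required_samples:
                to_relieve.append(unit)
                self._streak[unit] = 0
        return to_relieve


balancer = ConservativeBalancer(threshold=100, required_samples=3)
decisions = [balancer.observe({"su1": load, "su2": 50})
             for load in (150, 40, 150, 150, 150)]
```

The brief spike in the first sample triggers nothing; only the three hot samples in a row at the end lead to a move or split request.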

Notifications
  Many customers want a stream of updates made to their tables
    Update external indexes, e.g., Lucene-style index
    Maintain cache
    Dump as logs into Hadoop
  Under the covers, notification stream is actually our pub/sub replication layer, Tribble
  [Figure: PNUTS sends notifications to clients, which feed an index, logs, etc.]

Materialized Views
  Items table
    Key       Value
    item123   type=bike, price=100
    item456   type=toaster, price=20
    item789   type=bike, price=200
    Does not efficiently support list all bikes for sale!
  Index on type!
    Key               Value
    bike_item123      price=100
    bike_item789      price=200
    toaster_item456   price=20
  Async updates via pub/sub layer
    Adding/deleting item triggers add/delete on index
    Updating item type triggers delete and add on index
  Get bikes for sale with prefix scan: bike*
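
A minimal sketch of the index-on-type view and its prefix scan. For simplicity the index is maintained synchronously inside put and delete, whereas the slide says updates flow asynchronously through the pub/sub layer; the class and method names are hypothetical.

```python
import bisect


class ItemsWithTypeIndex:
    """Items table plus a sorted index keyed by '<type>_<item key>'."""

    def __init__(self):
        self.items = {}       # item key -> {"type": ..., "price": ...}
        self.index_keys = []  # sorted index keys

    def _index_key(self, key):
        return f"{self.items[key]['type']}_{key}"

    def put(self, key, value):
        if key in self.items:
            # Updating an item triggers delete and add on the index.
            self.index_keys.remove(self._index_key(key))
        self.items[key] = dict(value)
        bisect.insort(self.index_keys, self._index_key(key))

    def delete(self, key):
        self.index_keys.remove(self._index_key(key))
        del self.items[key]

    def scan_prefix(self, prefix):
        """Return (index key, price) pairs whose index key starts with prefix."""
        start = bisect.bisect_left(self.index_keys, prefix)
        result = []
        for index_key in self.index_keys[start:]:
            if not index_key.startswith(prefix):
                break
            item_key = index_key.split("_", 1)[1]
            result.append((index_key, self.items[item_key]["price"]))
        return result


table = ItemsWithTypeIndex()
table.put("item123", {"type": "bike", "price": 100})
table.put("item456", {"type": "toaster", "price": 20})
table.put("item789", {"type": "bike", "price": 200})
bikes = table.scan_prefix("bike_")
```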

Bulk Operations
  1) User click history logs stored in HDFS
  2) Hadoop job builds models of user preferences
  3) Hadoop reduce writes models to PNUTS user table
  4) Models read from PNUTS help decide users’ frontpage content
  [Figure: HDFS, PNUTS, candidate content]

PNUTS-Hadoop
  Writing to PNUTS
    1. Call PNUTS set to write output
    [Figure: Map or Reduce Hadoop tasks issue set calls to the PNUTS Router]
  Reading from PNUTS
    Split PNUTS table into ranges
    Each Hadoop task assigned a range
    Task uses PNUTS scan API to retrieve records in range
    Task feeds scanned records to map function
    [Figure: Hadoop tasks, each with a Record Reader, issue scan(0x0-0x2), scan(0x2-0x4), scan(0x8-0xa), scan(0xa-0xc), scan(0xc-0xe) against PNUTS]
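
A minimal sketch of the first reading step, splitting a table's hash space into one contiguous range per Hadoop task. The function name and the half-open (low, high) representation are assumptions; the actual input-format code is not shown in the slides.

```python
def split_ranges(start, end, num_tasks):
    """Divide [start, end) into num_tasks contiguous half-open ranges.

    Any remainder is spread over the first ranges so that range sizes
    differ by at most one.
    """
    step, extra = divmod(end - start, num_tasks)
    ranges = []
    low = start
    for i in range(num_tasks):
        high = low + step + (1 if i < extra else 0)
        ranges.append((low, high))
        low = high
    return ranges


# Eight tasks over a 4-bit hash space give ranges of width 2,
# matching the shape of the scan(0x0-0x2), scan(0x2-0x4), ... calls above.
task_ranges = split_ranges(0x0, 0x10, 8)
```

Each task would then scan only its own range, so the ranges can be read in parallel without overlap.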

Bulk w/Snapshot
  Send map to tasks
  Tasks write output to snapshot files
  Sender daemons send snapshots to PNUTS
  Receiver daemons load snapshots into PNUTS
  [Figure: PNUTS tablet map, Hadoop tasks, per-tablet snapshot files, snapshot daemons, PNUTS storage units]

Selective Replication
  PNUTS replicates at the table-level, potentially among 10+ data centers
    Some records only read in 1 or a few data centers
    Legal reasons prevent us from replicating user data except where created
    Tables are global, records may be local!
  Storing unneeded replicas wastes disk
  Maintaining unneeded replicas wastes network capacity

Selective Replication
  Static
    Per-record constraints
    Client sets mandatory, disallowed regions
  Dynamic
    Create replicas in regions where record is read
    Evict replicas from regions where record is not read
  Lease-based
    When a replica is read, it is guaranteed to survive for a time period
    Eviction is lazy; when lease expires, replica is deleted on next write
  Maintains minimum replication levels
  Respects explicit constraints
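
A minimal sketch of lease-based lazy eviction for one record: a read renews the lease in the reading region, and the next write drops replicas whose lease has expired, subject to mandatory regions and a minimum replica count. The class name, the parameters, and the use of plain numbers for time are all illustrative assumptions.

```python
class ReplicaLeases:
    """Tracks per-region replica leases for a single record (illustrative)."""

    def __init__(self, lease_seconds):
        self.lease_seconds = lease_seconds
        self.expires_at = {}  # region -> time at which its lease expires

    def on_read(self, region, now):
        """A read creates or renews the replica's lease in that region."""
        self.expires_at[region] = now + self.lease_seconds

    def on_write(self, now, mandatory=frozenset(), min_replicas=1):
        """Lazily evict expired replicas; return the evicted regions."""
        expired = sorted(region for region, expiry in self.expires_at.items()
                         if expiry <= now and region not in mandatory)
        evicted = []
        for region in expired:
            # Never drop below the minimum replication level.
            if len(self.expires_at) - len(evicted) <= min_replicas:
                break
            evicted.append(region)
        for region in evicted:
            del self.expires_at[region]
        return evicted


leases = ReplicaLeases(lease_seconds=60)
leases.on_read("us-west", now=0)
leases.on_read("eu", now=50)
first_eviction = leases.on_write(now=70)    # us-west lease ended at 60
second_eviction = leases.on_write(now=200)  # eu expired but is the last replica
```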

PNUTS in production
  Over 100 Yahoo! applications/platforms on PNUTS
    Movies, Travel, Answers
  Over 450 tables, 50K tablets
  Growth, past 18 months
    10s to 1000s of storage servers
    Fewer than 5 data centers to over 15

Customer Experience
  PNUTS is a hosted service
    Customers don’t install
    Customers usually don’t wait for hardware requests
  Customer interaction
    Architects and dev mailing list help with design
    Ticketing to get tables
    Latency SLA and REST API
  Ticketing ensures PNUTS stays sufficiently provisioned for all customers
    We check on intended use, expected load, etc.

Sandbox
  Self-provisioned system for getting test PNUTS tables
  Start using REST API in minutes
  No SLA
    Just running on a few storage servers, shared among many clients
  No replication
    Don’t put production data here!

Thanks!
  Adam Silberstein
  silberst@yahoo-inc.com
Further Reading
  System Overview: VLDB 2008
  Pre-planning for big loads: SIGMOD 2008
  Materialized views: SIGMOD 2009
  PNUTS-Hadoop: SIGMOD 2011
  Selective replication: VLDB 2011
  YCSB: https://github.com/brianfrankcooper/YCSB/, SOCC 2010