Presentation transcript:

Authors: Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver, Ramana Yerneni
Presenters: Daniel Burgener, Gautam Bhawsar

Road Map
- Introduction
- Background
- Requirements
- PNUTS Overview & Functionality
- System Architecture & Applications
- Experimental Results
- Comparison to Competitors
- Conclusion & Future Work

Introduction
PNUTS is
- A massively parallel, geographically distributed database system
- Designed by Yahoo!
- Used by its web applications
- Shared between several applications

Background
Pub/Sub Model
- Sending applications (publishers)
- Receiving applications (subscribers)
- Communicate through an asynchronous messaging paradigm
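The pub/sub model above can be illustrated with a minimal in-memory sketch. The `Broker` class, the topic name, and the subscriber names are hypothetical and chosen only for illustration; this is not the API of Yahoo! Message Broker or of any real messaging system. The point it shows is that publishers enqueue messages and return immediately, while each subscriber drains its own queue later.

```python
from collections import defaultdict, deque


class Broker:
    """Toy in-memory topic broker (hypothetical; not a real broker API)."""

    def __init__(self):
        # topic -> {subscriber name: queue of undelivered messages}
        self._queues = defaultdict(dict)

    def subscribe(self, topic, name):
        self._queues[topic][name] = deque()

    def publish(self, topic, message):
        # Asynchronous in spirit: the publisher only enqueues and returns;
        # it never waits for a subscriber to process the message.
        for queue in self._queues[topic].values():
            queue.append(message)

    def poll(self, topic, name):
        """Return and clear the messages queued for one subscriber."""
        queue = self._queues[topic][name]
        messages = list(queue)
        queue.clear()
        return messages


broker = Broker()
broker.subscribe("updates", "replica_east")
broker.subscribe("updates", "replica_west")
broker.publish("updates", {"key": "alice", "value": 1})
broker.publish("updates", {"key": "alice", "value": 2})
east = broker.poll("updates", "replica_east")
west = broker.poll("updates", "replica_west")
```

Both subscribers receive the same messages in publication order, and neither the publisher nor the other subscriber is blocked while one of them is slow to poll.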

Requirements
PNUTS is designed to meet the following requirements:
- Scalability
- Response time and geographic scope
- High availability and fault tolerance
- Relaxed consistency guarantees

PNUTS Overview & Functionality
- Data Model & Features
- Fault Tolerance
- Pub-Sub Message System
- Record-level Mastering
- Hosting

PNUTS Overview & Functionality (Cont’d)
Functionality
- Data & Query Model
- Consistency Model

Data & Query Model
Simplified relational data model
- Organizes data into tables of records with attributes
- Allows arbitrary structure inside a record (“blob”)
Schemas are flexible
- New attributes can be added without halting query or update activity
- Records may leave attributes empty
Query language
- Supports selection and projection on a single table
- Updates and deletes by primary key only
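A toy sketch of this data and query model, assuming a table is simply a dictionary from primary key to a record of attributes. The function names `upsert`, `delete`, and `select` are hypothetical and are not PNUTS calls. The sketch shows a flexible schema (records may omit attributes, and a new attribute appears the first time it is written), selection with projection on one table, and updates and deletes addressed by primary key only.

```python
table = {}  # primary key -> record (dict of attributes); hypothetical layout


def upsert(key, **attributes):
    """Insert or update a record by primary key; new attributes are allowed."""
    table.setdefault(key, {}).update(attributes)


def delete(key):
    """Delete a record by primary key."""
    table.pop(key, None)


def select(predicate, attributes):
    """Selection (predicate) plus projection (attributes) on a single table."""
    return [
        {name: record.get(name) for name in attributes}
        for record in table.values()
        if predicate(record)
    ]


upsert("u1", name="Alice", city="Paris")
upsert("u2", name="Bob")                  # "city" is left empty
upsert("u2", photo=b"\x89PNG")            # new blob attribute, no schema change step
in_paris = select(lambda r: r.get("city") == "Paris", ["name"])
```

Missing attributes are returned as `None` by the projection, which is one possible way to represent an empty attribute.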

Consistency Model
- Hides the complexity of replication
- Sits between general serializability and eventual consistency
- Per-record timeline consistency
  “All replicas of a given record apply all updates to the record in the same order”

Consistency Model (Cont’d)
Supports a range of API calls with different levels of consistency:
- Read-any
- Read-critical(required_version)
- Read-latest
- Write
- Test-and-set-write(required_version)
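The difference between these calls can be sketched with a toy model of one record that has a master copy and a single, possibly stale, local replica. The class and method names are hypothetical and this is not the PNUTS client API. The semantics assumed are: read-any may return a stale version, read-critical returns a version at least as new as `required_version`, read-latest returns the newest version, and test-and-set-write succeeds only if the current version equals `required_version`. Falling back to the master copy in `read_critical` is an assumption made for simplicity.

```python
class VersionMismatch(Exception):
    """Raised when a test-and-set write sees an unexpected version."""


class ReplicatedRecord:
    """One record: a master copy plus one possibly stale replica (toy model)."""

    def __init__(self, value):
        self.master = (1, value)    # (version, value) at the master copy
        self.replica = (1, value)   # local copy; may lag behind the master

    def write(self, value):
        version = self.master[0] + 1
        self.master = (version, value)
        return version

    def test_and_set_write(self, required_version, value):
        if self.master[0] != required_version:
            raise VersionMismatch(self.master[0])
        return self.write(value)

    def sync(self):
        # Stand-in for asynchronous replication catching up.
        self.replica = self.master

    def read_any(self):
        return self.replica         # fast, but possibly stale

    def read_critical(self, required_version):
        if self.replica[0] >= required_version:
            return self.replica
        return self.master          # assumption: fall back to the master copy

    def read_latest(self):
        return self.master


record = ReplicatedRecord("a")
v2 = record.write("b")              # the replica has not caught up yet
stale = record.read_any()
fresh = record.read_critical(v2)
```

A client that has just written version `v2` can use `read_critical(v2)` to avoid seeing its own write disappear, while callers that tolerate staleness can keep using the cheaper `read_any`.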

System Arch. & App.
Data tables are horizontally partitioned into groups of records called tablets

System Arch. & App. (Cont’d)
- Data Storage & Retrieval
- Replication & Consistency
- PNUTS Applications

Data Storage & Retrieval
Ordered table
- The primary-key space of a table is divided into intervals
- Each interval corresponds to one tablet
- The router stores the interval mapping
- For a given primary key, binary search over the interval mapping finds the tablet
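The ordered-table lookup can be sketched as a binary search over sorted interval boundaries. The `OrderedRouter` class, the boundary keys, and the tablet names are hypothetical; the sketch assumes that `boundaries[i]` is the smallest primary key stored in `tablets[i + 1]`.

```python
from bisect import bisect_right


class OrderedRouter:
    """Maps a primary key to a tablet via binary search (illustrative only)."""

    def __init__(self, boundaries, tablets):
        # boundaries[i] is the smallest key that belongs to tablets[i + 1].
        if len(tablets) != len(boundaries) + 1:
            raise ValueError("need exactly one more tablet than boundaries")
        if list(boundaries) != sorted(boundaries):
            raise ValueError("boundaries must be sorted")
        self.boundaries = list(boundaries)
        self.tablets = list(tablets)

    def tablet_for(self, key):
        # bisect_right counts the boundaries <= key, which is the tablet index.
        return self.tablets[bisect_right(self.boundaries, key)]


router = OrderedRouter(
    boundaries=["g", "n", "t"],
    tablets=["tablet-0", "tablet-1", "tablet-2", "tablet-3"],
)
first = router.tablet_for("apple")
middle = router.tablet_for("melon")
last = router.tablet_for("zebra")
```

With m tablets the lookup costs O(log m) comparisons, and splitting a tablet only requires inserting one new boundary into the mapping.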

Data Storage & Retrieval (Cont’d)
Hash-organized table
- An n-bit hash function H() produces values 0 ≤ H() < 2^n
- The hash space [0, 2^n) is divided into intervals
- Each interval corresponds to a single tablet
- To map a key to a tablet:
  1. Hash the key
  2. Search the set of intervals using binary search
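The two-step lookup for a hash-organized table can be sketched as follows. The choice of n = 16, the use of SHA-256 truncated to n bits, and the even split into four intervals are all assumptions made for illustration; the slides do not say which hash function or interval layout PNUTS uses.

```python
import hashlib
from bisect import bisect_right

N_BITS = 16  # assumed value of n, for illustration only


def h(key: str) -> int:
    """n-bit hash with 0 <= h(key) < 2**N_BITS (SHA-256 is an assumption)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % (1 << N_BITS)


def even_boundaries(num_tablets: int) -> list:
    """Split the hash space [0, 2**N_BITS) into equal-width intervals."""
    width = (1 << N_BITS) // num_tablets
    return [width * i for i in range(1, num_tablets)]


def tablet_for(key: str, boundaries: list) -> int:
    # Step 1: hash the key. Step 2: binary-search the interval boundaries.
    return bisect_right(boundaries, h(key))


boundaries = even_boundaries(4)
tablet_index = tablet_for("alice", boundaries)
```

The only difference from the ordered-table case is that the search runs over hash values instead of the keys themselves, so nearby keys usually land in unrelated tablets.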

Replication & Consistency
- No redo log
- The system uses asynchronous replication to ensure low-latency updates
- Yahoo! Message Broker (YMB) is used for replication and logging because:
  1. YMB takes multiple steps to ensure messages are not lost before they are applied to the database
  2. YMB is designed for wide-area replication

Replication & Consistency (Cont’d)
Consistency via YMB & mastership
- Per-record timeline consistency
- One copy of a record is designated as the master
- All updates are directed to the master copy
- This is called record-level mastering
- Mastership is assigned on a record-by-record basis
- Different records in the same table can be mastered in different clusters
- All updates are propagated to non-master replicas by publishing them to YMB, which delivers them in commit order
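Record-level mastering with log-based propagation can be sketched as a toy simulation. Everything here is hypothetical: the `Cluster` and `Region` classes, the rule that the first writer's region becomes the master of a new record, and the single in-memory list standing in for YMB. The sketch only illustrates that updates go through the record's master and that other regions apply them later in commit order.

```python
class Region:
    def __init__(self, name):
        self.name = name
        self.store = {}  # key -> (version, value)

    def apply(self, update):
        key, version, value = update
        self.store[key] = (version, value)


class Cluster:
    """Toy model: per-record masters plus one ordered log standing in for YMB."""

    def __init__(self, region_names):
        self.regions = {name: Region(name) for name in region_names}
        self.master_of = {}  # key -> name of the region mastering that record
        self.log = []        # (master region, update) in commit order
        self.delivered = {name: 0 for name in region_names}  # log offsets

    def write(self, origin, key, value):
        # Assumption: the first writer's region becomes the record's master.
        master = self.master_of.setdefault(key, origin)
        region = self.regions[master]  # writes are forwarded to the master
        version = region.store.get(key, (0, None))[0] + 1
        update = (key, version, value)
        region.apply(update)
        self.log.append((master, update))
        return master, version

    def deliver(self, name):
        """Apply pending updates from other regions, in commit order."""
        for master, update in self.log[self.delivered[name]:]:
            if master != name:
                self.regions[name].apply(update)
        self.delivered[name] = len(self.log)


cluster = Cluster(["west", "east"])
cluster.write("west", "alice", "v1")
forwarded = cluster.write("east", "alice", "v2")  # handled by alice's master
cluster.write("east", "bob", "x")                 # bob is mastered in east
before = cluster.regions["east"].store.get("alice")
cluster.deliver("east")
after = cluster.regions["east"].store.get("alice")
```

Because every update to a record is serialized at that record's master and replayed elsewhere in the same order, all replicas move through the same sequence of versions, which is the per-record timeline property.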

Replication & Consistency (Cont’d)
Recovery from failure (3 steps)
1. The tablet controller requests a copy from the source tablet
2. A “checkpoint message” is published to YMB
3. The source tablet is copied to the destination region

PNUTS Applications
- User Database
- Social Applications
- Content Meta-Data
- Listings Management
- Session Data

Experimental Results
- Three-region PNUTS cluster: 2 regions on the west coast and 1 on the east coast
- Storage engine for hash tables: “Yahoo! proprietary disk-based hashtable”
- Storage engine for ordered tables: MySQL using InnoDB
- Written primarily in C++
- Some components written in PHP & Perl

Experimental Results (Cont’d)
Experimental parameters: the following experiments show the impact of several factors on the average request latency

Varying Load

Varying Read/Write Ratio

Varying Skew

Varying Number of Storage Units

Varying Size of Range Scan

Comparison to Competitors
Google BigTable
- Geographic replication
- Secondary indexes
- Materialized views
- Create multiple tables
- Hash-organized tables

Comparison to Competitors
Amazon Dynamo
- Eventual consistency too weak
- No support for ordered tables

Comparison to Competitors
Sharding
- No automated data migration
- No shard splitting

Comparison to Competitors
DFS
- Hard to scale
- Less rich database functionality

Conclusion
PNUTS is
- A massively parallel, geographically distributed database system
- Designed by Yahoo! to be used by its web applications
- Yahoo!’s Hosted Data Serving Platform
The architecture of PNUTS is based on a record-level consistency model
PNUTS delivers data management as a hosted service

Future Work
- Improving query functionality
  - Enforce constraints such as referential integrity
  - Complex ad hoc queries such as join and group-by
- Query optimization techniques
  - Provide a better technique than simple incremental scanning
- Add more API calls to the consistency model:
  - Bundled updates
  - Relaxed consistency

Thank You