PNUTS: Yahoo!’s Hosted Data Serving Platform

Presentation transcript:

PNUTS: Yahoo!’s Hosted Data Serving Platform
Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver and Ramana Yerneni
Yahoo! Research

Motivation And Goals
Web applications:
– Simple query needs
– Relaxed consistency guarantees
– Example: Flickr.com
Widely Distributed Systems
– Earth’s round trip time: ms
Goals
– Response time guarantees
– Load balancing
– Scalability, high-availability, fault tolerance

Data Model and Query Language
Relational model of data
– Tuples with attributes
– BLOBs
– Flexible schema (JSON)
Simplified query language
– Point access (hash tables)
– Range access (ordered tables)
– Relaxed consistency
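The difference between point access and range access can be sketched as follows. This is a minimal in-memory illustration, not the PNUTS storage layer; the class names HashTable and OrderedTable and their methods are hypothetical.

```python
import bisect


class HashTable:
    """Hash-organized table (hypothetical name): point lookups only."""

    def __init__(self):
        self._rows = {}

    def put(self, key, record):
        self._rows[key] = record

    def get(self, key):
        return self._rows.get(key)


class OrderedTable:
    """Ordered table (hypothetical name): keys stay sorted, so range scans work."""

    def __init__(self):
        self._keys = []   # sorted list of primary keys
        self._rows = {}   # key -> record (a flexible, JSON-like dict)

    def put(self, key, record):
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = record

    def get(self, key):
        return self._rows.get(key)

    def scan(self, low, high):
        """Return (key, record) pairs with low <= key < high, in key order."""
        start = bisect.bisect_left(self._keys, low)
        end = bisect.bisect_left(self._keys, high)
        return [(k, self._rows[k]) for k in self._keys[start:end]]
```

A hash table spreads keys evenly but cannot answer "all keys between a and c"; the ordered table can, at the cost of keeping keys sorted on insert.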

System Overview

Consistency Model
Per-record serializability
– Record-level mastering
– Events: insert, update, delete
– Master is chosen by locality
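Record-level mastering can be sketched as below: every write to a record is ordered by that record's master, which gives one version timeline per record. The migration rule is an assumption made purely for illustration (mastership moves after a fixed number of consecutive writes from another region); Record, apply_write and MIGRATION_THRESHOLD are hypothetical names.

```python
from dataclasses import dataclass, field

# Hypothetical value: consecutive writes from one region before mastership moves.
MIGRATION_THRESHOLD = 3


@dataclass
class Record:
    value: dict
    master_region: str
    version: int = 0
    recent_origins: list = field(default_factory=list)


def apply_write(record, origin_region, new_value):
    """Order a write through the record's master and return the new version."""
    # The master assigns the next version, so all replicas see one timeline.
    record.version += 1
    record.value = new_value
    # Remember where the most recent writes came from.
    record.recent_origins = (record.recent_origins + [origin_region])[-MIGRATION_THRESHOLD:]
    # Locality rule (assumed): move the master to the region doing the writing.
    if (len(record.recent_origins) == MIGRATION_THRESHOLD
            and all(r == origin_region for r in record.recent_origins)
            and origin_region != record.master_region):
        record.master_region = origin_region
    return record.version
```

Because ordering is decided per record rather than per table, two records in the same table can have masters in different regions.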

Query Language
– Read-any
– Read-critical (version)
– Read-latest
– Write [blind write]
– Test-and-set (version) [optimistic transactions]
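The five calls can be illustrated with a toy model of one record that has a master copy and a possibly stale local replica. This is a sketch under stated assumptions, not the PNUTS client API: RecordStore, Replica, replicate and StaleVersionError are hypothetical names, and replication is triggered by hand.

```python
class StaleVersionError(Exception):
    """Raised when a test-and-set write sees a version other than the expected one."""


class Replica:
    """One copy of a single record; the local copy may lag behind the master."""

    def __init__(self):
        self.version = 0
        self.value = None


class RecordStore:
    def __init__(self):
        self.master = Replica()
        self.local = Replica()

    def replicate(self):
        # Stand-in for asynchronous update propagation to the local replica.
        self.local.version, self.local.value = self.master.version, self.master.value

    def read_any(self):
        # Cheapest read: whatever the local replica has, possibly stale.
        return self.local.version, self.local.value

    def read_critical(self, required_version):
        # Returns a copy at least as new as required_version.
        if self.local.version >= required_version:
            return self.local.version, self.local.value
        return self.read_latest()

    def read_latest(self):
        # Goes to the master copy for the newest version.
        return self.master.version, self.master.value

    def write(self, value):
        # Blind write: no check on the current version.
        self.master.version += 1
        self.master.value = value
        return self.master.version

    def test_and_set_write(self, required_version, value):
        # Optimistic write: succeeds only if nobody wrote since required_version.
        if self.master.version != required_version:
            raise StaleVersionError(
                f"expected version {required_version}, found {self.master.version}")
        return self.write(value)
```

A read-modify-write then becomes: read a version, compute the new value, call test_and_set_write with that version, and retry if StaleVersionError is raised.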

System Overview
Yahoo! Message Broker
– Topic based publish-subscribe
– Guaranteed delivery
Used for
– Distributing updates
– Notification service
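Topic-based publish-subscribe can be sketched in a few lines. The MessageBroker class below is a hypothetical in-memory stand-in, not the Yahoo! Message Broker interface; it keeps a per-topic log as a rough analogue of guaranteed delivery and does not model persistence or cross-region transport.

```python
from collections import defaultdict


class MessageBroker:
    """Hypothetical in-memory topic-based publish-subscribe broker."""

    def __init__(self):
        self._log = defaultdict(list)          # topic -> messages in publish order
        self._subscribers = defaultdict(list)  # topic -> subscriber callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Log the message, deliver it to every subscriber, return its sequence number."""
        self._log[topic].append(message)
        for callback in self._subscribers[topic]:
            callback(message)
        return len(self._log[topic])

    def replay(self, topic, from_sequence=1):
        """Return logged messages starting at a 1-based sequence number."""
        return self._log[topic][from_sequence - 1:]
```

In this picture each replica subscribes to the topic carrying updates for its data, and a replica that missed messages can catch up from the log.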

System Architecture

Query Processing
Scatter-gather engine
– Receives multi-record requests
– Splits them and executes the pieces in parallel
– Collects the results
– Better usage of TCP stack
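The scatter-gather steps can be sketched with a thread pool. This is a minimal illustration, not the PNUTS engine; scatter_gather and its route and fetch parameters are hypothetical, with route mapping a key to a storage unit and fetch reading a batch of keys from one unit.

```python
from concurrent.futures import ThreadPoolExecutor


def scatter_gather(keys, route, fetch):
    """Fetch many records by splitting the request per storage unit.

    route(key) -> storage unit id          (assumed helper)
    fetch(unit, keys) -> {key: record}     (assumed helper)
    """
    # Scatter: group the requested keys by the unit that holds them.
    groups = {}
    for key in keys:
        groups.setdefault(route(key), []).append(key)

    # Execute one sub-request per unit in parallel.
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(groups))) as pool:
        futures = [pool.submit(fetch, unit, unit_keys)
                   for unit, unit_keys in groups.items()]
        # Gather: merge the partial results into one response.
        for future in futures:
            results.update(future.result())
    return results
```

The client issues a single request and receives a single response, while the fan-out to storage units happens on the server side.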

Failure Tolerance
Three step recovery
– Request for a remote copy
– Checkpoint-message
– Actual tablet delivery
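The three steps can be sketched as below. The interpretation of the checkpoint message is an assumption: here it is taken to mean that updates in flight when the copy is requested are applied to the source before the tablet is shipped. Region, drain_until_checkpoint and recover_tablet are hypothetical names.

```python
class Region:
    """Hypothetical model of one region's tablets and its in-flight updates."""

    def __init__(self):
        self.tablets = {}   # tablet_id -> {key: value}
        self.pending = []   # in-flight updates: (tablet_id, key, value)

    def drain_until_checkpoint(self, tablet_id):
        """Apply every queued update for the tablet (assumed checkpoint semantics)."""
        remaining = []
        for tid, key, value in self.pending:
            if tid == tablet_id:
                self.tablets.setdefault(tid, {})[key] = value
            else:
                remaining.append((tid, key, value))
        self.pending = remaining


def recover_tablet(destination, source, tablet_id):
    """Rebuild a lost tablet in `destination` from the replica in `source`."""
    # Step 1: request a copy from a remote replica (the `source` argument).
    # Step 2: checkpoint message, so in-flight updates reach the source first.
    source.drain_until_checkpoint(tablet_id)
    # Step 3: deliver the tablet contents to the recovering region.
    destination.tablets[tablet_id] = dict(source.tablets.get(tablet_id, {}))
    return destination.tablets[tablet_id]
```

Without the checkpoint step, the copy could miss updates that were accepted but not yet applied at the source.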

Experiments
Setup
– Three regions (east, west1, west2)
– 128 tablets per region
– 1 KB records
– 100 client-threads per region
– Locality: 0.8

Experiment 1: INSERTs
Insertion of 1 million records
Hash tables (100 clients):
– West 1: 75.6 ms (per request)
– West 2: ms
– East: ms
Ordered tables (60 clients):
– West 1: 33 ms
– West 2: ms
– East: ms
Adding clients -> contention

Experiment 2: varying request rate

Experiment 3: varying write/read ratio

Experiment 4: Zipfian workload

Experiment 5: adding storage units

Experiment 6: range queries

Thank you! Q&A time!