Alireza Angabini Advanced DB class Dr. M.Rahgozar Fall 88.

Presentation transcript:

Outline
- Introduction
- PNUTS Overview
- Functionality
- Architecture
- Applications
- Experimental Results
- Conclusion
A.Angabini - PNUTS

Main requirements of Web Apps
- Scalability
- Response Time and Geographic Scope
- High Availability & Fault Tolerance
- Relaxed Consistency Guarantees

PNUTS is
- A massively parallel, geographically distributed database system
- Designed by Yahoo!
- Used by Yahoo!'s web applications
- Shared between several applications

- Data Model & Features
- Fault Tolerance
- Pub-Sub Message System
- Hosting

- Data & Query Model
- Consistency Model

Simplified relational data model
- Organizes data into tables of records with attributes
- Allows arbitrary structure inside a record ("blob")
Schemas are flexible
- A new attribute can be added without halting query or update activity
- A record may leave attributes empty
Query language
- Supports selection and projection on a single table
- Updates and deletes are specified by primary key only

- Hides the complexity of replication from applications
- Sits between general serializability and eventual consistency
- Per-record timeline consistency: "All replicas of a given record apply all updates to the record in the same order"
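Per-record timeline consistency can be illustrated with a toy model. The sketch below assumes that a single sequencer per record numbers every update (the names `RecordMaster` and `Replica` are hypothetical, not PNUTS code) and that a replica buffers any update that arrives early; the buffering step is an assumption made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One replica of a single record (hypothetical toy model)."""
    value: object = None
    version: int = 0                              # last update applied
    pending: dict = field(default_factory=dict)   # seq -> value, arrived early

    def deliver(self, seq: int, value: object) -> None:
        """Accept an update; apply it only once all earlier ones are applied."""
        if seq <= self.version:
            return                                # duplicate or stale delivery
        self.pending[seq] = value
        # Apply every buffered update that is now next in line.
        while self.version + 1 in self.pending:
            self.version += 1
            self.value = self.pending.pop(self.version)


class RecordMaster:
    """Assigns one sequence number per update to a record (assumed design)."""

    def __init__(self) -> None:
        self.seq = 0

    def next_update(self, value: object) -> tuple:
        self.seq += 1
        return (self.seq, value)


master = RecordMaster()
updates = [master.next_update(v) for v in ("a", "b", "c")]

in_order, reordered = Replica(), Replica()
for seq, value in updates:
    in_order.deliver(seq, value)
for seq, value in reversed(updates):              # simulate out-of-order arrival
    reordered.deliver(seq, value)
```

Both replicas walk through the same sequence of versions regardless of arrival order, which is the property the quoted definition asks for.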

Supports a range of API calls with different levels of consistency
- Read-any
- Read-critical(required_version)
- Read-latest
- Write
- Test-and-set-write(required_version)
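The difference between these calls is easier to see on a toy model. The following sketch is not the PNUTS client API; `ToyRecordStore`, `propagate`, and `StaleVersionError` are hypothetical names, and the model assumes one up-to-date master copy of a record plus replicas that lag until an update is propagated.

```python
class StaleVersionError(Exception):
    """Raised when a test-and-set write sees an unexpected version."""


class ToyRecordStore:
    """Hypothetical single-record model: a master copy plus lagging replicas."""

    def __init__(self, replica_count: int = 2) -> None:
        self.master = (0, None)                       # (version, value)
        self.replicas = [(0, None)] * replica_count   # possibly stale copies

    def write(self, value):
        """Write: always applied at the master, bumps the version."""
        version = self.master[0] + 1
        self.master = (version, value)
        return version

    def test_and_set_write(self, required_version, value):
        """Write only if the record is still at required_version."""
        if self.master[0] != required_version:
            raise StaleVersionError(self.master[0])
        return self.write(value)

    def propagate(self, replica_index):
        """Simulate asynchronous replication reaching one replica."""
        self.replicas[replica_index] = self.master

    def read_any(self, replica_index=0):
        """Read-any: return whatever the local replica holds, even if stale."""
        return self.replicas[replica_index]

    def read_critical(self, required_version, replica_index=0):
        """Read-critical: return a copy at least as new as required_version."""
        copy = self.replicas[replica_index]
        return copy if copy[0] >= required_version else self.master

    def read_latest(self):
        """Read-latest: always return the newest copy."""
        return self.master


store = ToyRecordStore()
v1 = store.write("draft")
stale = store.read_any()                 # replica has not seen the write yet
fresh = store.read_critical(v1)          # falls back to the master copy
v2 = store.test_and_set_write(v1, "final")
try:
    store.test_and_set_write(v1, "late")  # v1 is no longer current
    conflict = False
except StaleVersionError:
    conflict = True
```

In this model the cheaper calls may return stale data, while the stricter ones pay for freshness by going to the master copy.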

- Data tables are horizontally partitioned into groups of records called tablets

Ordered table
- The primary-key space of a table is divided into intervals
- Each interval corresponds to one tablet
- The router stores the interval mapping
- For a given primary key, binary search is used to find the tablet
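The router lookup for an ordered table can be sketched with Python's `bisect` module. The split points and tablet names below are made up for illustration; the only idea taken from the slide is a sorted interval mapping searched by binary search.

```python
from bisect import bisect_right

# Hypothetical interval mapping: tablet i holds keys in
# [split_points[i-1], split_points[i]), with open ends at both extremes.
split_points = ["g", "n", "t"]
tablets = ["tablet-0", "tablet-1", "tablet-2", "tablet-3"]


def route(key: str) -> str:
    """Return the tablet whose key interval contains `key` (binary search)."""
    return tablets[bisect_right(split_points, key)]


for key in ("apple", "g", "mango", "zebra"):
    print(key, "->", route(key))
```

`bisect_right` places a key equal to a split point into the interval that starts at that point, so each split point is the inclusive lower bound of its tablet.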

Hash-organized table
- An n-bit hash function H() produces values 0 ≤ H() < 2^n
- The hash space [0, 2^n) is divided into intervals
- Each interval corresponds to a single tablet
- To map a key to a tablet:
  1. Hash the key
  2. Search the set of intervals using binary search
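The two-step mapping for a hash-organized table can be sketched as follows. The choice of SHA-256 truncated to n bits, the value n = 16, and the four equal-width intervals are assumptions made for this example; the slides do not say which hash function or interval layout PNUTS uses.

```python
import hashlib
from bisect import bisect_right

N_BITS = 16            # assumed hash width n
TABLET_COUNT = 4       # assumed number of tablets


def H(key: str) -> int:
    """n-bit hash: keep the top N_BITS of a SHA-256 digest, so 0 <= H < 2**n."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (256 - N_BITS)


# Lower bounds of tablets 1..TABLET_COUNT-1; tablet 0 starts at hash value 0.
split_points = [(i * 2**N_BITS) // TABLET_COUNT for i in range(1, TABLET_COUNT)]


def tablet_for(key: str) -> int:
    """Step 1: hash the key. Step 2: binary-search the interval boundaries."""
    return bisect_right(split_points, H(key))


for key in ("alice", "bob", "carol"):
    print(key, "-> tablet", tablet_for(key))
```

Apart from the hashing step, the lookup is the same binary search an ordered table would use, applied to hash values instead of raw keys.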

The system uses asynchronous replication
- To ensure low-latency updates
Yahoo! Message Broker (YMB)
- Used for replication and logging because:
  1. YMB takes multiple steps to ensure messages are not lost before they are applied to the database
  2. YMB is designed for wide-area replication

Recovery from failure (3 steps)
1. The tablet controller requests a copy from the source tablet
2. A "checkpoint message" is published to YMB
3. The source tablet is copied to the destination region

- User Database
- Social Applications
- Content Meta-Data
- Listings Management
- Session Data

Three PNUTS regions
- 2 west coast, 1 east coast
- 5 storage units, 2 message brokers, 1 router
- West: dual 2.8 GHz Xeon, 4 GB RAM, 6-disk RAID 5 array
- East: quad 2.13 GHz Xeon, 4 GB RAM, 1 SATA disk
Workload
- requests/second
- 0-50% writes
- 80% locality
Storage engine for hash tables
- Yahoo! proprietary disk-based hash table
Storage engine for ordered tables
- MySQL using InnoDB

The following experiments show the impact of several factors on the average request latency.


- Rich database functionality and low latency at massive scale
- Trade-offs between functionality, performance, and scalability
- Asynchronous replication was chosen to ensure low write latency
- Delivers data management as a hosted service

B. Cooper, R. Ramakrishnan, U. Srivastava, A. Silberstein, P. Bohannon, H. Jacobsen, N. Puz, D. Weaver, and R. Yerneni, "PNUTS: Yahoo!'s hosted data serving platform," Proceedings of the VLDB Endowment, vol. 1, 2008, pp. 1277–1288.
Technical report, Raghu Ramakrishnan, Yahoo! Research and Platform Engineering Team.

Thanks for your attention. Questions?