Ymir Vigfusson, Adam Silberstein, Brian Cooper, Rodrigo Fonseca.

Similar presentations
Adam Jorgensen Pragmatic Works Performance Optimization in SQL Server Analysis Services 2008.

PNUTS: Yahoo!’s Hosted Data Serving Platform Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen,
Erhan Erdinç Pehlivan Computer Architecture Support for Database Applications.
Esma Yildirim Department of Computer Engineering Fatih University Istanbul, Turkey DATACLOUD 2013.
1. Aim High with Oracle Real World Performance Andrew Holdsworth Director Real World Performance Group Server Technologies.
1 Storage Today Victor Hatridge – CIO Nashville Electric Service (615)
Coordinated Workload Scheduling A New Application Domain for Mechanism Design Elie Krevat.
IBM Software Group ® Recommending Materialized Views and Indexes with the IBM DB2 Design Advisor (Automating Physical Database Design) Jarek Gryz.
Benchmarking Cloud Serving Systems with YCSB Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, Russell Sears Yahoo! Research Presenter.
Truthful Approximation Mechanisms for Scheduling Selfish Related Machines Motti Sorani, Nir Andelman & Yossi Azar Tel-Aviv University.
Disk Access Model. Using Secondary Storage Effectively In most studies of algorithms, one assumes the “RAM model”: –Data is in main memory, –Access to.
© 2011 Citrusleaf. All rights reserved.1 A Real-Time NoSQL DB That Preserves ACID Citrusleaf Srini V. Srinivasan Brian Bulkowski VLDB, 09/01/11.
Chapter 6: CPU Scheduling. 5.2 Silberschatz, Galvin and Gagne ©2005 Operating System Concepts – 7 th Edition, Feb 2, 2005 Chapter 6: CPU Scheduling Basic.
Database Implementation Issues CPSC 315 – Programming Studio Spring 2008 Project 1, Lecture 5 Slides adapted from those used by Jennifer Welch.
Chapter 1 Introduction 1.1A Brief Overview - Parallel Databases and Grid Databases 1.2Parallel Query Processing: Motivations 1.3Parallel Query Processing:
1 An Empirical Study on Large-Scale Content-Based Image Retrieval Group Meeting Presented by Wyman
E.G.M. PetrakisHashing1 Hashing on the Disk  Keys are stored in “disk pages” (“buckets”)  several records fit within one page  Retrieval:  find address.
Everest: scaling down peak loads through I/O off-loading D. Narayanan, A. Donnelly, E. Thereska, S. Elnikety, A. Rowstron Microsoft Research Cambridge,
Workload management in data warehouses: Methodology and case studies presentation to Information Resource Management Association of Canada January 2008.
PNUTS: YAHOO!’S HOSTED DATA SERVING PLATFORM FENGLI ZHANG.
Hadoop & Cheetah. Key words Cluster  data center – Lots of machines thousands Node  a server in a data center – Commodity device fails very easily Slot.
Lecture 11: DMBS Internals
Performance Concepts Mark A. Magumba. Introduction Research done on 1058 correspondents in 2006 found that 75% OF them would not return to a website that.
Physical Database Design & Performance. Optimizing for Query Performance For DBs with high retrieval traffic as compared to maintenance traffic, optimizing.
Extents, segments and blocks in detail. Database structure Database Table spaces Segment Extent Oracle block O/S block Data file logical physical.
Alireza Angabini Advanced DB class Dr. M.Rahgozar Fall 88.
Applications hitting a wall today with SQL Server Locking/Latching Scale-up Throughput or latency SLA Applications which do not use SQL Server.
A Survey of Distributed Task Schedulers Kei Takahashi (M1)
Maximum Network Lifetime in Wireless Sensor Networks with Adjustable Sensing Ranges Cardei, M.; Jie Wu; Mingming Lu; Pervaiz, M.O.; Wireless And Mobile.
Physical Database Design I, Ch. Eick 1 Physical Database Design I About 25% of Chapter 20 Simple queries:= no joins, no complex aggregate functions Focus.
1 CS 430 Database Theory Winter 2005 Lecture 16: Inside a DBMS.
Module 10 Administering and Configuring SharePoint Search.
NoDB: Querying Raw Data --Mrutyunjay. Overview ▪ Introduction ▪ Motivation ▪ NoDB Philosophy: PostgreSQL ▪ Results ▪ Opportunities “NoDB in Action: Adaptive.
Srik Raghavan Principal Lead Program Manager Kevin Cox Principal Program Manager SESSION CODE: DAT206.
Lecture 15- Parallel Databases (continued) Advanced Databases Masood Niazi Torshiz Islamic Azad University- Mashhad Branch
Authors Brian F. Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Philip Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver, Ramana.
Operational Research & ManagementOperations Scheduling Economic Lot Scheduling 1.Summary Machine Scheduling 2.ELSP (one item, multiple items) 3.Arbitrary.
1 Efficient Computation of Diverse Query Results Erik Vee joint work with Utkarsh Srivastava, Jayavel Shanmugasundaram, Prashant Bhat, Sihem Amer Yahia.
Copyright © 2006, GemStone Systems Inc. All Rights Reserved. Increasing computation throughput with Grid Data Caching Jags Ramnarayan Chief Architect GemStone.
Infrastructure for Data Warehouses. Basics Of Data Access Data Store Machine Memory Buffer Memory Cache Data Store Buffer Bus Structure.
Static Process Scheduling
1 Adaptive Parallelism for Web Search Myeongjae Jeon Rice University In collaboration with Yuxiong He (MSR), Sameh Elnikety (MSR), Alan L. Cox (Rice),
Introduction to Information Retrieval Introduction to Information Retrieval Lecture 4: Index Construction Related to Chapter 4:
Memory Management OS Fazal Rehman Shamil. swapping Swapping concept comes in terms of process scheduling. Swapping is basically implemented by Medium.
03/02/20061 Evaluating Top-k Queries Over Web-Accessible Databases Amelie Marian Nicolas Bruno Luis Gravano Presented By: Archana and Muhammed.
DMBS Internals I February 24 th, What Should a DBMS Do? Store large amounts of data Process queries efficiently Allow multiple users to access the.
Bigtable: A Distributed Storage System for Structured Data
1 Chapter 5: CPU Scheduling. 2 Basic Concepts Scheduling Criteria Scheduling Algorithms.
Log-Structured Memory for DRAM-Based Storage Stephen Rumble and John Ousterhout Stanford University.
Lecture 16: Data Storage Wednesday, November 6, 2006.
Introduction | Model | Solution | Evaluation
Chapter 6: CPU Scheduling
Database Implementation Issues
Disk Storage, Basic File Structures, and Buffer Management
Module 5: CPU Scheduling
Data Placement Problems in Database Applications
Computational Advertising and
Presentation transcript:

Ymir Vigfusson, Adam Silberstein, Brian Cooper, Rodrigo Fonseca

Introduction
Well-understood database tasks are challenging again at web scale:
– Millions of users
– Terabytes of data
– Low latency essential
Shared-nothing databases:
– Scale by adding servers with local disks

Introduction
PNUTS workload:
– Analogous to OLTP
– Random reads and updates of individual records
– Online applications

Range Queries
Important for web workloads:
– Time-ordered data
[Table: items ordered by listing date — Used cowbell, Toyota, Red Snuggie, Ford, Harpsichord; the animation shows new items being prepended as they arrive. Listing-date values did not survive transcription.]

Range Queries
Important for web workloads:
– Time-ordered data
– Secondary indexes
[Tables: a base table mapping Item ID to Item (1727 Used cowbell, Toyota, 971 Red Snuggie, 5578 Harpsichord, Ford) and a secondary index mapping Price to Item ID. Several price and ID values did not survive transcription.]

Range Queries
Important for web workloads:
– Time-ordered data
– Secondary indexes
– etc.
A query could scan a large fraction of the table:
– Short time to first result is crucial
– Possibly a large number of results
– Predicate might be selective
Examples:
FIND ITEMS UNDER $100 WHERE Category = 'Cars' AND Quantity > 0
FIND ITEMS OVER $100
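As an illustration, such a range scan with a selective predicate might be issued as below. This is a minimal sketch: the ScanClient class, the range_scan method, and all field names are assumptions for illustration, not the actual PNUTS API.

```python
# Hypothetical client-side range scan with a selective predicate.
# ScanClient / range_scan and the field names are illustrative only.

class ScanClient:
    """Minimal stand-in for a PNUTS-style scan client."""

    def __init__(self, table):
        self.table = table  # records ordered by the index key (price)

    def range_scan(self, price_under, predicate=None):
        """Stream records with price under `price_under` that match `predicate`."""
        for record in self.table:
            if record["price"] >= price_under:
                break  # records are ordered by price; stop at the range end
            if predicate is None or predicate(record):
                yield record  # results stream back as they are found


items = [
    {"item": "Used cowbell", "price": 20, "category": "Music", "qty": 1},
    {"item": "Red Snuggie", "price": 35, "category": "Apparel", "qty": 3},
    {"item": "Used Toyota", "price": 90, "category": "Cars", "qty": 1},
    {"item": "Ford", "price": 5000, "category": "Cars", "qty": 1},
]

client = ScanClient(items)
# FIND ITEMS UNDER $100 WHERE Category = 'Cars' AND Quantity > 0
for r in client.range_scan(100, lambda r: r["category"] == "Cars" and r["qty"] > 0):
    print(r)  # -> the Used Toyota record
```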

Executing Range Queries
Makes sense to parallelize queries:
– Tables are partitioned over many servers
How parallel should a query be?
– Too sequential: slow rate of results
– Too parallel: client overloaded, wasting server resources

PNUTS 101
[Diagram: clients issue get/set/delete requests to routers, which use a partition map to locate table partitions on the storage units; a partition controller handles load balancing and server monitoring]

PNUTS + Range Scans
[Diagram: a client's SCAN is routed through the routers to the storage units holding the table's partitions]
Notation (the symbols were images and did not survive transcription): a query; its ideal parallelism level (K on later slides); the ideal per-server scan concurrency.

Range Query Parameters
Ideal server concurrency:
– Equals the number of disks (experimentally)
– In our testbed, [value lost in transcription]
When does parallelism help?
– Range size: longer = more benefit
– Query selectivity: fewer results = more benefit
– Client speed: faster = more benefit
– Diminishing returns
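Why returns diminish — an illustrative back-of-envelope model, not a formula from the talk: with K servers scanning in parallel, the server-side scan time shrinks with K while the client-side uptake time does not:

```latex
T(K) \;\approx\; \max\!\left( \frac{S}{K \cdot B},\; \frac{R}{C} \right)
```

where S is the data to scan, B the per-server scan bandwidth, R the result size, and C the client's uptake rate. Once K reaches roughly S·C / (R·B), the client term dominates and extra servers add nothing — consistent with the slide's observations that longer ranges, fewer results, and faster clients all increase the benefit of parallelism.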

Determining K
– As shown, many influential factors
– End-to-end insight: match the rate of new results to the client's uptake rate
– Hard to measure the client's rate directly
[Plot: % client capacity]

Adaptive Server Allocation (ASA)
Phase 1: start with K = 1. Measure the client's uptake speed; if it is higher, increase K by 1 and repeat. When the client saturates, enter the next phase.
Phase 2: periodically, flip a coin.
– On heads: increase K by 1. Improvement? Keep increasing K. No? Revert to the old K.
– On tails: decrease K by 1. Performance the same? Keep decreasing K. No? Revert to the old K.

Adaptive Server Allocation (ASA), refined
– Start with K = 1. Measure the client's uptake speed; if it is higher, increase K by 1 and repeat. When the client saturates, enter the next phase.
– Periodically, flip a biased coin (heads with probability 2/3).
– On heads, increase or decrease K by 1. If no improvement, revert to the old K. Otherwise, keep K and go in the same direction the next time we change it.

Experimental Testbed
– 4 clients, 1 router, 10 storage units
– 20M records, 1024 partitions, 100MB/partition, 100 partitions/server
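A minimal sketch of the two-phase ASA loop described on the last two slides. The `measure_uptake(k)` hook, the class name, and details such as when the probe direction flips are assumptions for illustration, not the production implementation:

```python
import random


class ASA:
    """Two-phase adaptive server allocation, as sketched on the slides.

    measure_uptake(k) is an assumed hook that briefly runs the scan at
    parallelism level k and returns the client's uptake rate.
    """

    def __init__(self, measure_uptake):
        self.measure = measure_uptake
        self.k = 1                       # current parallelism level
        self.best = measure_uptake(1)    # best uptake rate seen so far
        self.ramping = True              # phase 1 until the client saturates
        self.direction = +1              # probe direction for phase 2

    def step(self):
        if self.ramping:
            # Phase 1: raise K while the client's uptake keeps improving.
            rate = self.measure(self.k + 1)
            if rate > self.best:
                self.k, self.best = self.k + 1, rate
            else:
                self.ramping = False     # client saturated; enter phase 2
        elif random.random() < 2 / 3:
            # Phase 2: with probability 2/3, probe K one step in the
            # current direction; revert if performance did not improve.
            trial = max(1, self.k + self.direction)
            rate = self.measure(trial)
            if rate >= self.best:
                self.k, self.best = trial, rate   # keep K; same direction next time
            else:
                self.direction = -self.direction  # revert K (guess: try other way)
        return self.k
```

A toy driver would construct `ASA(measure_uptake)` and call `step()` once per measurement interval, reading off the current K.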

Adaptive Server Allocation (ASA)
[Plot: ASA behavior with a single client (x 1)]

Adaptive Server Allocation (ASA)
[Plot: ASA behavior with four clients (x 4)]

Adaptive Server Allocation (ASA)
[Plot: two client groups (x 2, x 2); ASA settles at different parallelism levels, e.g., K=5, K=2, K=8]

Single Client
[Diagram: one client's SCAN routed through the routers to the storage units]

Multiple Clients
[Diagram: several clients' SCANs contending for the same storage units]

Multiple Clients
[Diagram: a central scheduler coordinating the clients' SCANs across the storage units]

Scheduler Service
How do we schedule multiple queries?
The scan engine shares query information with a central scheduler:
– Which servers hold the needed partitions
– The query's parallelism level
The scheduler decides who should go where, and when, to minimize contention.
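A sketch of the information a scan engine might hand to the scheduler. The QueryInfo structure and its field names are assumptions for illustration, not the actual PNUTS interface:

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class QueryInfo:
    """What the scan engine shares with the central scheduler (illustrative)."""
    query_id: str
    priority: int                   # e.g., gold/silver/bronze customer tiers
    parallelism: int                # K: the query's parallelism level
    # server id -> partitions of this query held on that server
    partitions_by_server: Dict[str, List[str]] = field(default_factory=dict)
```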

Scheduling Wishlist
a) All queries should finish quickly (= minimize makespan)
b) Give preference to higher-priority queries
c) Prefer a steady rate of results over bursts

Thm: The makespan of any greedy heuristic is at most 2·OPT.
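The slide does not include the proof. The bound is plausibly shown by the classic list-scheduling argument (in the spirit of Graham's bound), sketched here for intuition under the simplifying assumption of m identical servers and jobs of length p_j:

```latex
\mathrm{OPT} \;\ge\; \max_j p_j
\qquad\text{and}\qquad
\mathrm{OPT} \;\ge\; \frac{1}{m}\sum_j p_j
```

If job ℓ finishes last under a greedy (work-conserving) schedule, every server is busy until ℓ starts, so ℓ starts no later than the average load, giving:

```latex
\text{makespan} \;\le\; \frac{1}{m}\sum_{j \ne \ell} p_j \;+\; p_\ell
\;\le\; \mathrm{OPT} + \mathrm{OPT} \;=\; 2 \cdot \mathrm{OPT}
```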

Scheduling Heuristic
a) All queries should finish quickly
b) Give preference to higher-priority queries
c) Prefer a steady rate of results over bursts
Greedily schedule queries in order of their current idle period, weighted by a priority preference parameter. (The exact formula on the slide was an image and did not survive transcription.)
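A sketch of the greedy choice: whenever a server slot frees up, pick, among the queries that still need a partition on that server, the one with the largest priority-weighted idle period. The function and field names are assumptions, and since the slide's formula was lost, the particular weighting (idle period times priority raised to a preference parameter alpha) is a guess consistent with the labels above:

```python
import time


def pick_next_query(candidates, alpha=1.0, now=None):
    """Greedy scheduling choice for a freed server slot.

    `candidates` are queries that still need a partition on this server;
    each is a dict with a `priority` and a `last_result` timestamp.
    """
    now = time.time() if now is None else now

    def score(q):
        idle = now - q["last_result"]          # current idle period
        return idle * q["priority"] ** alpha   # priority-weighted idleness

    return max(candidates, key=score)


# Example: the higher-priority query wins the slot despite being less idle.
queries = [
    {"id": "q1", "priority": 1, "last_result": 100.0},  # idle 5s, score 5
    {"id": "q2", "priority": 4, "last_result": 103.0},  # idle 2s, score 8
]
print(pick_next_query(queries, alpha=1.0, now=105.0)["id"])  # -> "q2"
```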

Scheduler – Priority
[Plot: completion behavior for queries with Priority=1, Priority=2, Priority=4]

Scheduler – Result Stream
[Plot: result streams for queries with Priority=1 and Priority=4]

Scheduler (summary)
– All queries finish quickly
– Higher-priority queries are preferred
– Steady rate of results rather than bursts

Conclusion
Supporting range queries at web scale is challenging.
How parallel should a query be?
– Too little: the client suffers; too much: the servers suffer
– Adaptive server allocation (ASA) algorithm
How do we schedule multiple queries?
– Greedy algorithm run by a central scheduler
– Provably short makespan
– Respects priorities and provides a steady stream of results
Our solution is entering production soon.

Adaptive Server Allocation (ASA)
[Plot: single client (x 1); K stepping through K=2, K=3, K=5, K=8]

Scheduling Wishlist
a) All queries should finish quickly (= minimize makespan)
b) Give preference to higher-priority queries
c) Prefer a steady rate of results over bursts
Thm: The makespan of any greedy heuristic is at most 2·OPT.
Thm: OPT is exactly [formula lost in transcription].

Scheduling Wishlist (annotated)
a) All queries should finish quickly — minimum completion time = OPT
b) Give preference to higher-priority queries — e.g., gold/silver/bronze customers; priority bias parameter
c) Prefer a steady rate of results over bursts — greater penalty for long idle time, over the set of idle-time periods
(The formulas these labels annotated were images and did not survive transcription.)
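Reading the surviving labels, one plausible way to formalize (b) and (c) — my reconstruction, not the slide's actual formulas — is:

```latex
\text{(b) order queries by } \Delta_i \cdot p_i^{\alpha},
  \text{ where } \Delta_i \text{ is query } i\text{'s current idle period,}\;
  p_i \text{ its priority, and } \alpha \text{ the priority-bias parameter;}

\text{(c) penalize burstiness by } \sum_{g \in G_i} |g|^2,
  \text{ where } G_i \text{ is the set of idle-time periods of query } i
  \text{ (squaring penalizes long idle periods more heavily).}
```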

Impact of Parallelism
– Range size: longer queries = more benefit from parallelism (diminishing returns)
– Query selectivity: fewer results = more benefit from parallelism
– Client rate: faster clients = more benefit from parallelism (saturated clients do not benefit from extra servers)

Scheduler – Priority
Greedily schedule queries in order of their current idle period.
[Plot: query lengths 1, 2, 4, 8%; maximum completion time, average completion time, and time of first result]

Scheduler – Priority
Greedily schedule queries in priority-weighted order.
[Plot: query lengths 1, 2, 4, 8% with different priorities]

Road Map
– System and problem overview
– Adaptive server allocation
– Multi-query scheduling
– Conclusion

Experimental Testbed
– 4 clients, 1 router, 10 storage units
– 20M records, 1024 partitions, 100MB/partition, 100 partitions/server

PNUTS 101
[Diagram: clients, routers, and storage units holding table partitions; the tablet controller maintains the partition map, load balancer, and server monitor]