Eddies: Continuously Adaptive Query Processing. Based on a SIGMOD 2000 paper and talk by Avnur and Hellerstein.

State of the Art in Query Optimization
Given:
1. Database state and statistics known a priori
2. One (short) user query to process
3. Query may be run only once
Query processing:
1. Decide a priori on a (static) query plan
2. Run the query using this one plan
Also:
1. Possibly update statistics occasionally (in steady state)

Adaptive Systems: General Flavor
Repeat:
1. Observe (model) environment
2. Use observation to choose behavior
3. Take action

Adaptivity in Current DBs
Limited and coarse-grained
Repeat:
1. Observe (model) environment
–runstats (once per week!!): model changes in data
2. Use observation to choose behavior
–query optimization: fixes a single static query plan
3. Take action
–query execution: blindly follow plan

Query Optimization –Adaptivity at a per-week frequency! Not suited for volatile environments

A Networking Problem!?
Networks do dataflow!
Significant history of adaptive techniques
–E.g. TCP congestion control
–E.g. routing
But traditionally much lower function
–Ship bitstreams
–Minimal, fixed code
Lately, moving up the food chain?
–app-level routing
–active networks

Query Plans are Dataflow
Programming model: iterators
–old idea, widely used in DB query processing
–object with three methods: Init(), GetNext(), Close()
–input/output types
–query plan: graph of iterators
Pipelining: iterators that return results before their children Close()
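
The iterator model on this slide can be sketched in a few lines of Python. This is only an illustration: the class names `Scan` and `Select`, the helper `run`, and the use of `None` as the end-of-stream marker are assumptions made for the example, not part of the original talk.

```python
class Scan:
    """Leaf iterator over an in-memory list (a stand-in for a table scan)."""

    def __init__(self, rows):
        self.rows = rows
        self.pos = 0

    def init(self):
        self.pos = 0

    def get_next(self):
        if self.pos >= len(self.rows):
            return None  # assumed end-of-stream marker
        row = self.rows[self.pos]
        self.pos += 1
        return row

    def close(self):
        pass


class Select:
    """Pipelined filter: pulls from its child one row at a time."""

    def __init__(self, child, predicate):
        self.child = child
        self.predicate = predicate

    def init(self):
        self.child.init()

    def get_next(self):
        while True:
            row = self.child.get_next()
            if row is None:
                return None
            if self.predicate(row):
                return row  # returned before the child is closed

    def close(self):
        self.child.close()


def run(plan):
    """Drive a plan to completion through Init / GetNext / Close."""
    plan.init()
    out = []
    while True:
        row = plan.get_next()
        if row is None:
            break
        out.append(row)
    plan.close()
    return out
```

For instance, `run(Select(Scan([1, 2, 3, 4]), lambda r: r % 2 == 0))` pulls rows through the filter one at a time, so results appear before the scan has finished.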

Querying in Volatile Environments
Federated query processors
–No control over stats, performance, admin (DataJoiner)
Shared-nothing systems
–No control over “system balance”
User control of running queries
–No control over user interaction (online aggregation)
Sensor nets: the next killer app
–No control over anything!

Varying …
Computing resources
–Data flows unpredictably from sources
–Code performs unpredictably along flows
–Continuous volatility due to many decentralized systems
Data characteristics
–Distributions
–Burstiness
User preferences
–What to get fast
–How much data

Toward Continuous Adaptivity
Need much more frequent adaptivity
–Goal: adapt per tuple of each relation??
–The traditional runstats-optimize-execute loop is far too coarse-grained
–So, continuously perform all 3 functions, at runtime
–Aim for adaptivity over best-case performance (as the latter never exists for long)

Road Map
Adaptive query processing
Intra-join adaptivity
–Synchronization barriers
–Moments of symmetry
Eddies
–Encapsulated, adaptive dataflow

Adaptable Operators and Plans
Moments of symmetry = query processing stages during which pipelined query operators or inputs can be easily reordered (with no or minimal state management)
Synchronization barriers = points that require inputs from different sources to be coordinated, possibly restricting progress to the rate of the slower input
We need “good” operators.

Adaptable Joins, Issue 1
Synchronization barrier: merge join
–Right input frozen, waiting for left
–Can’t adapt while waiting for barrier!
–So, favor joins that have no barriers or only infrequent ones; at worst, adaptable barriers

Adaptable Joins, Issue 2
Would like to reorder in-flight (pipelined) joins
Base case: swap inputs to a join??
Moment of symmetry:
–inputs can be swapped with no/little state management
Aim for frequent moments of symmetry → more frequent adaptivity

Adaptable Joins, Issue 2: Moments of Symmetry
–Suppose you can adapt an in-flight query plan
How would you do it?
–Base case: reorder inputs of a single join
Nested loops join: cleaner if you wait until the end of the inner loop
–Hybrid hash: reorder while “building”?
[Figure: join inputs R and S, shown swapped]

Moments of Symmetry, cont.
Moment of symmetry:
–Can swap join inputs w/o state modification
–Nested loops join: end of each inner loop
–Hybrid hash join: never
–Sort-merge join: essentially always
More frequent moments of symmetry → more frequent adaptivity

Joins for Adaptivity
–Pipelined hash join (hash ripple or XJoin)
No synchronization barriers
Continuous symmetry
Good for equi-joins
–Simple (or block) ripple join
Synchronization barriers at “corners”
Moments of symmetry at “corners”
Good for non-equi-joins
–When is there symmetry? At corners, i.e., for each “new” tuple, once it has been processed against the given operator’s state
[Figure: inputs R and S]
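
A pipelined (symmetric) hash join of the kind listed above might look roughly like the following. The function name `symmetric_hash_join`, the `("R" | "S", tuple)` arrival encoding, and the key-extractor arguments are assumptions made for this sketch; it keeps everything in memory and ignores the spilling that XJoin adds.

```python
from collections import defaultdict


def symmetric_hash_join(arrivals, key_r, key_s):
    """Join R and S tuples as they arrive, in any interleaving.

    arrivals: iterable of ("R", tuple) or ("S", tuple) in arrival order.
    Yields (r_tuple, s_tuple) pairs as soon as both halves have been seen.
    """
    tables = {"R": defaultdict(list), "S": defaultdict(list)}
    keys = {"R": key_r, "S": key_s}
    for side, tup in arrivals:
        other = "S" if side == "R" else "R"
        k = keys[side](tup)
        # Build: remember the tuple in its own side's hash table.
        tables[side][k].append(tup)
        # Probe: match against everything seen so far on the other side.
        for match in tables[other].get(k, []):
            yield (tup, match) if side == "R" else (match, tup)
```

Because each arriving tuple is both built and probed immediately, neither input ever waits for the other (no synchronization barrier), and the roles of the two inputs are interchangeable after every tuple.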

Beyond Binary Joins
Think of swapping “inners”
–Can be done at a global moment of symmetry
Intuition: like an n-ary join
–Except that each pair can be joined by a different algorithm!
So…
–Need to introduce n-ary joins to a query engine

Need well-behaved join algorithms –Pipelining –Avoid synch barriers –Frequent moments of symmetry

Continuous Adaptivity Goal: Eddies
Avoid need for traditional cost estimation
Avoid generation of a ‘good’ query plan
[Figure: Eddy]

Continuous Adaptivity: Eddies
A pipelining n-ary tuple-routing iterator (an iterator just like join or sort)
–works well with ops that have frequent moments of symmetry
[Figure: Eddy]

Continuous Adaptivity: Eddies
Adjusts flow adaptively
–Tuples flow in different orders
–Visit each op once before output
[Figure: Eddy]
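
The "visit each op once before output" rule can be illustrated with a toy eddy over selection predicates. Everything named here (`eddy`, the `operators` dictionary, the `choose` callback, the per-tuple `done` set) is a hypothetical simplification: a real eddy routes many tuples concurrently and also handles joins.

```python
import random


def eddy(tuples, operators, choose=None, seed=0):
    """Route each tuple through every operator once, in a per-tuple order.

    operators: dict mapping a name to a predicate; False means "drop".
    choose: routing policy that picks one name from the pending list
            (defaults to a uniformly random choice).
    """
    rng = random.Random(seed)
    if choose is None:
        choose = lambda pending: rng.choice(pending)
    output = []
    for tup in tuples:
        done = set()  # operators this tuple has already visited
        alive = True
        while alive and len(done) < len(operators):
            pending = [name for name in operators if name not in done]
            name = choose(pending)
            alive = operators[name](tup)
            done.add(name)
        if alive:
            output.append(tup)  # visited every operator: emit
    return output
```

Whatever order the routing policy picks, the result set is the same; only the amount of work spent on tuples that are eventually dropped changes.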

Routing: Eddies
Naïve routing policy:
–All ops fetch from eddy as fast as possible
–Previously-seen tuples precede new tuples
[Figure: Eddy]

Schedule: Grab when Ready?
–Two expensive selections s1 and s2
Selectivity(s1) = Selectivity(s2) = 50%
Cost(s2) = 5. Vary Cost(s1).
–What do we expect? Does it make a difference at all?

Cost Factor? –Two expensive selections, 50% selectivity Cost(s2) = 5. Vary cost of s1. Favors faster operation

But is it Enough? –Given two expensive selections: Cost same, say cost(s1)=cost(s2)=5 Selectivity(s2) = 50%. Vary selectivity of s1. –Does that make a difference?

Selectivity-based? –Two expensive selections, cost 5 Selectivity(s2) = 50%. Vary selectivity of s1.

Schedule: Selectivity-based? –Conclude: Heavy tuple shedder early on is good.
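
One way to see why a heavy tuple shedder belongs early is to compute the expected per-tuple cost of a fixed ordering. The helper `expected_cost` below is an illustrative assumption, and it presumes independent predicates with known, static costs and selectivities, which is exactly what a volatile environment does not provide.

```python
def expected_cost(order):
    """Expected work per input tuple for selections applied left to right.

    order: list of (cost, selectivity) pairs; selectivity is the
    fraction of tuples that survive the operator.
    """
    total = 0.0
    surviving = 1.0  # fraction of tuples that reach the current operator
    for cost, selectivity in order:
        total += surviving * cost
        surviving *= selectivity
    return total


s1 = (5, 0.1)  # cost 5, passes 10% of tuples
s2 = (5, 0.5)  # cost 5, passes 50% of tuples
s1_first = expected_cost([s1, s2])  # 5 + 0.1 * 5
s2_first = expected_cost([s2, s1])  # 5 + 0.5 * 5
```

With equal costs, running the more selective operator first is cheaper per tuple, since fewer tuples survive to pay for the second operator.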

How to choose?
If we knew all selectivities and all costs (and they were static), maybe we could pick the best overall “schedule” here.
Otherwise, we need a cheap means to observe their changes.
And we need a means to react in a simple manner based on those perceived changes.

An Aside: How to choose?
A machine learning problem?
–Each agent pays off differently
–Explore or exploit?
–Heuristics?
Sometimes want to randomly choose one
Usually want to go with the best
If probabilities are stationary, dampen exploration over time

Eddies with Lottery Scheduling
Operator gets 1 ticket when it takes a tuple
–Favor operators that run fast (low cost)
Operator loses a ticket when it returns a tuple
–Favor operators that drop tuples (low selectivity)
Winner?
–Large number of tickets == measure of goodness
Lottery scheduling:
–When two operators vie for the same tuple, hold a lottery
–Never let any operator go to zero tickets
Support occasional random “exploration”
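
The ticket rules above can be sketched as follows. The class `LotteryEddy`, its method names, and the choice to start every operator with one ticket are assumptions for illustration. The sketch covers only the selectivity side of the scheme: it processes one tuple at a time, so the cost side (fast operators taking tuples more often) is not modeled.

```python
import random


class LotteryEddy:
    """Toy eddy over selection predicates with ticket-weighted routing."""

    def __init__(self, operators, seed=0):
        self.operators = operators  # name -> predicate; False means "drop"
        # Assumed starting point: one ticket each, so nobody begins at zero.
        self.tickets = {name: 1 for name in operators}
        self.rng = random.Random(seed)

    def _draw(self, pending):
        # Hold a lottery among the operators this tuple has not yet visited.
        weights = [self.tickets[name] for name in pending]
        return self.rng.choices(pending, weights=weights, k=1)[0]

    def route(self, tup):
        """Return True if the tuple survives every operator."""
        done = set()
        while len(done) < len(self.operators):
            pending = [n for n in self.operators if n not in done]
            name = self._draw(pending)
            self.tickets[name] += 1  # credit: operator took a tuple
            if not self.operators[name](tup):
                return False  # dropped: the operator keeps the ticket
            # Debit: operator returned the tuple; explicit floor of one ticket.
            self.tickets[name] = max(1, self.tickets[name] - 1)
            done.add(name)
        return True


ops = {"s1": lambda x: x % 10 == 0, "s2": lambda x: x % 2 == 0}
eddy = LotteryEddy(ops)
survivors = [x for x in range(1000) if eddy.route(x)]
```

In this run `s1` passes 10% of tuples and `s2` passes 50%, so `s1` accumulates tickets faster and is increasingly favored as the first stop, while `s2` still wins the occasional lottery, which provides the random exploration.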

Lottery-Based Eddy –Two expensive selections, cost 5 Selectivity(s2) = 50%. Vary selectivity of s1.

In a Volatile Environment Two index joins –Slow: 5 second delay; Fast: no delay –Toggle after 30 seconds

Related Work
–Late Binding: Dynamic, Parametric [HP88,GW89,IN+92,GC94,AC+96,LP97]
–Per Query: Mariposa [SA+96], ASE [CR94]
–Competition: RDB [AZ96]
–Inter-Op: [KD98], Tukwila [IF+99]
–Query Scrambling: [AF+96,UFA98]
Survey: Hellerstein, Franklin, et al., DE Bulletin 2000
[Figure: “Frequency of Adaptivity” axis with labels System R, Late Binding, Per Query, Competition & Sampling, Inter-Operator, Query Scrambling, Eddies, Ingres DECOMP, Future Work]

Summary
Eddies: Continuously Adaptive Dataflow
–Suited for volatile performance environments
Changes in operator/machine performance
Changes in selectivities (e.g. with sorted inputs)
Changes in data delivery
–Currently adapts join order
Competitive methods to adapt access & join methods?
Requires well-behaved join algorithms
–Pipelining
–Avoid synch barriers
–Frequent moments of symmetry
The end of the runstats/optimizer/executor boundary!
–At best, System R is good for “hints” on initial ticket distribution