Eddies: Continuously Adaptive Query Processing Ron Avnur Joseph M. Hellerstein UC Berkeley.

Slides:



Advertisements
Similar presentations
CS 245Notes 71 CS 245: Database System Principles Notes 7: Query Optimization Hector Garcia-Molina.
Advertisements

CS 4432query processing - lecture 161 CS4432: Database Systems II Lecture #16 Join Processing Algorithms Professor Elke A. Rundensteiner.
CS 540 Database Management Systems
CS CS4432: Database Systems II Operator Algorithms Chapter 15.
Database Management Systems 3ed, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 14, Part B.
Database Management Systems, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
Database Management Systems, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
Online Aggregation Joe Hellerstein UC Berkeley Online Aggregation: Motivation Select AVG(grade) from ENROLL; A “fancy” interface: + Query Results AVG.
IntroductionAQP FamiliesComparisonNew IdeasConclusions Adaptive Query Processing in the Looking Glass Shivnath Babu (Stanford Univ.) Pedro Bizarro (Univ.
Information Capture and Re-Use Joe Hellerstein. Scenario Ubiquitous computing is more than clients! –sensors and their data feeds are key –smart dust.
Continuously Adaptive Processing of Data and Query Streams Michael Franklin UC Berkeley April 2002 Joint work w/Joe Hellerstein and the Berkeley DB Group.
Midterm Review Spring Overview Sorting Hashing Selections Joins.
CS 245Notes 71 CS 245: Database System Principles Notes 7: Query Optimization Hector Garcia-Molina.
Federated Facts and Figures Joseph M. Hellerstein UC Berkeley.
Query Processing (overview)
Interactive query processing partial query results dynamic user control during query execution adaptive query execution interactive data cleaning and transformation.
Freddies: DHT-Based Adaptive Query Processing via Federated Eddies Ryan Huebsch Shawn Jeffery CS Peer-to-Peer Systems 12/9/03.
Telegraph: An Adaptive Global- Scale Query Engine Joe Hellerstein.
Query Optimization 3 Cost Estimation R&G, Chapters 12, 13, 14 Lecture 15.
Adaptive Dataflow Joe Hellerstein UC Berkeley. Overview Trends Driving Adaptive Dataflow Lessons –networking flow control, event programming, app-level.
1 Parallel DBMS Slides by Joe Hellerstein, UCB, with some material from Jim Gray, Microsoft Research. See also:
1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
Towards Adaptive Dataflow Infrastructure Joe Hellerstein, UC Berkeley.
Streaming Data, Continuous Queries, and Adaptive Dataflow Michael Franklin UC Berkeley NRC June 2002.
Telegraph: A Universal System for Information. Telegraph History & Plans Initial Vision –Carey, Hellerstein, Stonebraker –“Regres”, “B-1” Sweat, ideas.
Telegraph: Ideas & Status. Overview Folks –Amol Deshpande, Mohan Lakhamraju, VijayShankar Raman –Rob von Behren, Steve Gribble, Matt Welsh –Kris Hildrum.
CS 4432query processing - lecture 171 CS4432: Database Systems II Lecture #17 Join Processing Algorithms (cont). Professor Elke A. Rundensteiner.
Eddies: Continuously Adaptive Query Processing Based on a SIGMOD’2002 paper and talk by Avnur and Hellerstein.
1 04/18/2005 Flux Flux: An Adaptive Partitioning Operator for Continuous Query Systems M.A. Shah, J.M. Hellerstein, S. Chandrasekaran, M.J. Franklin UC.
CONTROL group Joe Hellerstein, Ron Avnur, Christian Hidber, Bruce Lo, Chris Olston, Vijayshankar Raman, Tali Roth, Kirk Wylie, UC Berkeley CONTROL: Continuous.
1 Database Query Execution Zack Ives CSE Principles of DBMS Ullman Chapter 6, Query Execution Spring 1999.
1 Relational Operators. 2 Outline Logical/physical operators Cost parameters and sorting One-pass algorithms Nested-loop joins Two-pass algorithms.
Parallel DBMS Instructor : Marina Gavrilova
施賀傑 何承恩 TelegraphCQ. Outline Introduction Data Movement Implies Adaptivity Telegraph - an Ancestor of TelegraphCQ Adaptive Building.
Telegraph Continuously Adaptive Dataflow Joe Hellerstein.
1 XJoin: Faster Query Results Over Slow And Bursty Networks IEEE Bulletin, 2000 by T. Urhan and M Franklin Based on a talk prepared by Asima Silva & Leena.
Ripple Joins for Online Aggregation by Peter J. Haas and Joseph M. Hellerstein published in June 1999 presented by Ronda Hilton.
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
1 Fjording The Stream An Architecture for Queries over Streaming Sensor Data Samuel Madden, Michael Franklin UC Berkeley.
Online aggregation Joseph M. Hellerstein University of California, Berkley Peter J. Haas IBM Research Division Helen J. Wang University of California,
Ripple Joins for Online Aggregation by Peter J. Haas and Joseph M. Hellerstein published in June 1999 presented by Nag Prajval B.C.
Lecture 15- Parallel Databases (continued) Advanced Databases Masood Niazi Torshiz Islamic Azad University- Mashhad Branch
Joseph M. Hellerstein Peter J. Haas Helen J. Wang Presented by: Calvin R Noronha ( ) Deepak Anand ( ) By:
MapReduce and Data Management Based on slides from Jimmy Lin’s lecture slides ( (licensed.
Eddies: Continuously Adaptive Query Processing Ross Rosemark.
CS4432: Database Systems II Query Processing- Part 2.
An Adaptive Query Execution Engine for Data Integration Zachary Ives, Daniela Florescu, Marc Friedman, Alon Levy, Daniel S. Weld University of Washington.
Query Processing CS 405G Introduction to Database Systems.
Query Execution. Where are we? File organizations: sorted, hashed, heaps. Indexes: hash index, B+-tree Indexes can be clustered or not. Data can be stored.
CS 440 Database Management Systems Lecture 5: Query Processing 1.
Computing & Information Sciences Kansas State University Wednesday, 08 Nov 2006CIS 560: Database System Concepts Lecture 32 of 42 Monday, 06 November 2006.
CS 540 Database Management Systems
Query Execution Query compiler Execution engine Index/record mgr. Buffer manager Storage manager storage User/ Application Query update Query execution.
By: Peter J. Haas and Joseph M. Hellerstein published in June 1999 : Presented By: Sthuti Kripanidhi 9/28/20101 CSE Data Exploration.
Implementation of Database Systems, Jarek Gryz 1 Parallel DBMS Chapter 22, Part A.
Query Optimization for Stream Databases Presented by: Guillermo Cabrera Fall 2008.
Ripple Joins for Online Aggregation
Telegraph: An Adaptive Global-Scale Query Engine
Evaluation of Relational Operations: Other Operations
Database Query Execution
Streaming Sensor Data Fjord / Sensor Proxy Multiquery Eddy
Chapters 15 and 16b: Query Optimization
Lecture 2- Query Processing (continued)
Implementation of Relational Operations
CS222P: Principles of Data Management Notes #13 Set operations, Aggregation, Query Plans Instructor: Chen Li.
Evaluation of Relational Operations: Other Techniques
Sorting We may build an index on the relation, and then use the index to read the relation in sorted order. May lead to one disk block access for each.
Evaluation of Relational Operations: Other Techniques
Information Capture and Re-Use
Parallel DBMS DBMS Textbook Chapter 22
Presentation transcript:

Eddies: Continuously Adaptive Query Processing Ron Avnur Joseph M. Hellerstein UC Berkeley

Road Map Adaptive Query Processing: Setting Intra-join adaptivity –Synchronization Barriers –Moments of Symmetry Eddies –Encapsulated, adaptive dataflow Future Work

Querying in Volatile Environments Federated query processors a reality –Cohera, DataJoiner, RDBMSs –No control over stats, performance, administration Shared-Nothing Systems “Scaling Out” –E.g. NOW-Sort –No control over “system balance” User “CONTROL” of running queries –E.g. Online Aggregation –No control over user interaction Sensor Nets: the next killer app –E.g. “Smart Dust” –No control over anything! Telegraph –Engine for these environments

Toward Continuous Adaptivity Adaptivity in System R: Repeat: 1.Observe (model) environment: daily/weekly (runstats) 2.Use observation to choose behavior (optimizer) 3.Take action (executor) –Adaptivity at a per-week frequency! Not suited for volatile environments Need much more frequent adaptivity –Goal: adapt per tuple of each relation –The traditional runstats-optimize-execute loop is far too coarse-grained –So, continuously perform all 3 functions, at runtime

Adaptable Joins, Issue 1 Synchronization Barriers –One input frozen, waiting for the other –Can’t adapt while waiting for barrier! –So, favor joins that have: no barriers at worst, adaptable barriers 

Adaptable Joins, Issue 2 Would like to reorder in-flight (pipelined) joins Base case: swap inputs to a join –What about per-input state? Moment of symmetry: –inputs can be swapped w/o state management E.g. –Nested Loops: at the end of each inner loop –Merge Join: any time* –Hybrid or Grace Hash: never! More frequent moments of symmetry  more frequent adaptivity

Ripple Joins: Prime for Adaptivity Ripple Joins –Pipelined hash join (a.k.a. hash ripple, Xjoin) No synchronization barriers Continuous symmetry Good for equi-join –Simple (or block) ripple join Synchronization barriers at “corners” Moments of symmetry at “corners” Good for non-equi-join –Index nested loops Short barriers No symmetry Note: Ripple corners are adaptable! –Accommodate barriers in simple/block ripple R S 

Beyond Binary Joins Think of swapping “inners” –A la KBZ/IK optimizers –Can be done at a global moment of symmetry Intuition: like an n-ary join –Except that each pair can be joined by a different algorithm! So… –Need to introduce n-ary joins to a traditional query engine

Continuous Adaptivity: Eddies A pipelining tuple-routing iterator (just like join or sort) –works well with ops that have frequent moments of symmetry Eddy

Continuous Adaptivity: Eddies Adjusts flow adaptively –Tuples flow in different orders –Visit each op once before output Naïve routing policy: –All ops fetch from eddy as fast as possible –Previously-seen tuples precede new tuples Eddy

Back-Pressure –Two expensive selections, 50% selectivity Cost(s2) = 5. Vary cost of s1. Backpressure favors faster op!

Back-Pressure Not Enough! –Two expensive selections, cost 5 Selectivity(s2) = 50%. Vary selectivity of s1.

An Aside: n-Arm Bandits A little machine learning problem: –Each arm pays off differently –Explore? Or Exploit? Sometimes want to randomly choose an arm Usually want to go with the best If probabilities are stationary, dampen exploration over time

Eddies with Lottery Scheduling Operator gets 1 ticket when it takes a tuple –Favor operators that run fast (low cost) Operator loses a ticket when it returns a tuple –Favor operators with low selectivity Lottery Scheduling: –When two ops vie for the same tuple, hold a lottery –Never let any operator go to zero tickets Support occasional random “exploration”

Lottery-Based Eddy –Two expensive selections, cost 5 Selectivity(s2) = 50%. Vary selectivity of s1.

In a Volatile Environment Two index joins –Slow: 5 second delay; Fast: no delay –Toggle after 30 seconds

Related Work –Late Binding: Dynamic, Parametric [HP88,GW89,IN+92,GC94,AC+96,LP97] –Per Query: Mariposa [SA+96], ASE [CR94] –Competition: RDB [AZ96] –Inter-Op: [KD98], Tukwila [IF+99] –Query Scrambling: [AF+96,UFA98] Survey: Hellerstein, Franklin, et al., DE Bulletin 2000 System R Late Binding Per Query Competition & Sampling Inter-Operator Query Scrambling Eddies Ingres DECOMP Frequency of Adaptivity Future Work

Tune & formalize ticket policy –E.g., Handle delayed sources better –Joint work w/ Hildrum, Papadimitriou, Russell, Sinclair Competitive Eddies –Access & Join method selection –Requires Duplicate Management Parallelism –Eddies + Rivers [AAT+99] Eddy R2R1 R3 S1S2 S3  hash blockindex1  index2

Summary Eddies: Continuously Adaptive Dataflow –Suited for volatile performance environments Changes in operator/machine peformance Changes in selectivities (e.g. with sorted inputs) Changes in data delivery Changes in user behavior (CONTROL, e.g. online agg) –Currently adapts join order Competitive methods to adapt access & join methods? Requires well-behaved join algorithms –Pipelining –Avoid synch barriers –Frequent moments of symmetry The end of the runstats/optimizer/executor boundary! –At best, System R is good for “hints” on initial ticket distribution

Backup slides The following slides are extra, in case of questions

Telegraph Today’s Focus: adaptive query processing –continuous adaptivity for volatile environments performance –adaptive configuration manageability & scalability –interactivity and partial results streaming results and the corresponding HCI issues Other issues –integrated storage ACID xacts, , files, etc. –exporting dataflow out of query processing –your favorite semantic integration problems wrapping via Cohera Net Query data cleaning via CONTROL project’s “Potter’s Wheel”

Adaptable Joins, Issue 2 Moments of Symmetry –Suppose you can adapt an in-flight query plan How would you do it? –Base case: reorder inputs of a single join Nested loops join R S R S S R

Moments of Symmetry –Suppose you can adapt an in-flight query plan How would you do it? –Base case: reorder inputs of a single join Nested loops join Cleaner if you wait til end of inner loop R S Adaptable Joins, Issue 2

Moments of Symmetry –Suppose you can adapt an in-flight query plan How would you do it? –Base case: reorder inputs of a single join Nested loops join Cleaner if you wait til end of inner loop –Hybrid Hash Reorder while “building”? R S

Moments of Symmetry, cont. Moment of Symmetry: –Can swap join inputs w/o state modification –Nested Loops join: end of each inner loop –Hybrid Hash join: never! –Sort-Merge join: essentially always But alas, has barrier problems More frequent moments of symmetry  more frequent adaptivity