Adaptive Query Processing Part II Zachary G. Ives University of Pennsylvania CIS 650 – Database & Information Systems October 23, 2008 Slides co-authored.

Slides:



Advertisements
Similar presentations
Query Optimization Reserves Sailors sid=sid bid=100 rating > 5 sname (Simple Nested Loops) Imperative query execution plan: SELECT S.sname FROM Reserves.
Advertisements

Query Optimization CS634 Lecture 12, Mar 12, 2014 Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
Distributed DBMS© M. T. Özsu & P. Valduriez Ch.6/1 Outline Introduction Background Distributed Database Design Database Integration Semantic Data Control.
EXECUTION PLANS By Nimesh Shah, Amit Bhawnani. Outline  What is execution plan  How are execution plans created  How to get an execution plan  Graphical.
Query Execution, Concluded Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems November 18, 2003 Some slide content may.
Adaptive Query Processing with Eddies Amol Deshpande University of Maryland.
Database Management Systems 3ed, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 14, Part B.
Database Management Systems, R. Ramakrishnan and Johannes Gehrke1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
Query Optimization Goal: Declarative SQL query
Query Evaluation. An SQL query and its RA equiv. Employees (sin INT, ename VARCHAR(20), rating INT, age REAL) Maintenances (sin INT, planeId INT, day.
Query Evaluation. SQL to ERA SQL queries are translated into extended relational algebra. Query evaluation plans are represented as trees of relational.
IntroductionAQP FamiliesComparisonNew IdeasConclusions Adaptive Query Processing in the Looking Glass Shivnath Babu (Stanford Univ.) Pedro Bizarro (Univ.
Optimizing Query Execution Zachary G. Ives University of Pennsylvania CIS 650 – Implementing Data Management Systems January 26, 2005 Content on hashing.
Evaluating Window Joins Over Unbounded Streams By Nishant Mehta and Abhishek Kumar.
Amol Deshpande, University of Maryland Zachary G. Ives, University of Pennsylvania Vijayshankar Raman, IBM Almaden Research Center Thanks to Joseph M.
Interactive query processing partial query results dynamic user control during query execution adaptive query execution interactive data cleaning and transformation.
Adaptive Query Processing with Eddies Amol Deshpande University of Maryland.
1 Anna Östlin Pagh and Rasmus Pagh IT University of Copenhagen Advanced Database Technology March 25, 2004 QUERY COMPILATION II Lecture based on [GUW,
1 Adaptive Query Processing | Dec 22, 2005 Adaptive Query Processing Amol Deshpande, University of Maryland Vijayshankar Raman, IBM Almaden Research Center.
CMSC724: Database Management Systems Instructor: Amol Deshpande
Query Optimization 3 Cost Estimation R&G, Chapters 12, 13, 14 Lecture 15.
16.5 Introduction to Cost- based plan selection Amith KC Student Id: 109.
Adapted from a Tutorial given at VLDB 2007 by: Amol Deshpande, University of Maryland Zachary G. Ives, University of Pennsylvania Vijayshankar Raman, IBM.
1 Evaluation of Relational Operations: Other Techniques Chapter 12, Part B.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Relational Query Optimization Chapter 15.
Adaptive Query Processing Amol Deshpande, University of Maryland Joseph M. Hellerstein, University of California, Berkeley Vijayshankar Raman, IBM Almaden.
Query Optimization Overview Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems December 2, 2004 Some slide content derived.
Adaptive Query Processing Adapted from a Tutorial given at SIGMOD 2006 by: Amol Deshpande, University of Maryland Joseph M. Hellerstein, University of.
1 Relational Operators. 2 Outline Logical/physical operators Cost parameters and sorting One-pass algorithms Nested-loop joins Two-pass algorithms.
Query Processing Presented by Aung S. Win.
Query Optimization, part 2 CS634 Lecture 13, Mar Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Query Evaluation Chapter 12: Overview.
Access Path Selection in a Relational Database Management System Selinger et al.
Query Optimization. overview Histograms A histogram is a data structure maintained by a DBMS to approximate a data distribution Equiwidth vs equidepth.
Optimizing Query Execution Zachary G. Ives University of Pennsylvania CIS 650 – Implementing Data Management Systems September 18, 2008 Content on hashing.
Querying Large Databases Rukmini Kaushik. Purpose Research for efficient algorithms and software architectures of query engines.
CPSC 404, Laks V.S. Lakshmanan1 Evaluation of Relational Operations: Other Operations Chapter 14 Ramakrishnan & Gehrke (Sections ; )
Relational Operator Evaluation. Overview Index Nested Loops Join If there is an index on the join column of one relation (say S), can make it the inner.
Unit 2 Architectural Styles and Case Studies | Website for Students | VTU NOTES | QUESTION PAPERS | NEWS | RESULTS 1.
Lecture 15- Parallel Databases (continued) Advanced Databases Masood Niazi Torshiz Islamic Azad University- Mashhad Branch
Joseph M. Hellerstein Peter J. Haas Helen J. Wang Presented by: Calvin R Noronha ( ) Deepak Anand ( ) By:
Introduction to Query Optimization, R. Ramakrishnan and J. Gehrke 1 Introduction to Query Optimization Chapter 13.
CS4432: Database Systems II Query Processing- Part 2.
Query Processing – Query Trees. Evaluation of SQL Conceptual order of evaluation – Cartesian product of all tables in from clause – Rows not satisfying.
Query Processing CS 405G Introduction to Database Systems.
CS 440 Database Management Systems Lecture 5: Query Processing 1.
Computing & Information Sciences Kansas State University Wednesday, 08 Nov 2006CIS 560: Database System Concepts Lecture 32 of 42 Monday, 06 November 2006.
CS 540 Database Management Systems
By: Peter J. Haas and Joseph M. Hellerstein published in June 1999 : Presented By: Sthuti Kripanidhi 9/28/20101 CSE Data Exploration.
Query Processing and Query Optimization Database System Implementation CSE 507 Some slides adapted from Silberschatz, Korth and Sudarshan Database System.
1 Overview of Query Evaluation Chapter Outline  Query Optimization Overview  Algorithm for Relational Operations.
CS 540 Database Management Systems
CS 440 Database Management Systems
Adaptive Query Processing Part I
Applying Control Theory to Stream Processing Systems
CS222P: Principles of Data Management Lecture #15 Query Optimization (System-R) Instructor: Chen Li.
Database Performance Tuning and Query Optimization
Introduction to Query Optimization
Evaluation of Relational Operations: Other Operations
Database Query Execution
Outline Introduction Background Distributed DBMS Architecture
Faloutsos/Pavlo C. Faloutsos – A. Pavlo Lecture#13: Query Evaluation
Overview of Query Evaluation
Chapter 11 Database Performance Tuning and Query Optimization
CS222P: Principles of Data Management Notes #13 Set operations, Aggregation, Query Plans Instructor: Chen Li.
Evaluation of Relational Operations: Other Techniques
Overview of Query Evaluation: JOINS
Yan Huang - CSCI5330 Database Implementation – Query Processing
CS222: Principles of Data Management Lecture #15 Query Optimization (System-R) Instructor: Chen Li.
Evaluation of Relational Operations: Other Techniques
Presentation transcript:

Adaptive Query Processing Part II Zachary G. Ives University of Pennsylvania CIS 650 – Database & Information Systems October 23, 2008 Slides co-authored by Amol Deshpande & Vijayshankar Raman

Last Time…  We considered two classes of techniques for adaptive query processing:  Partitioning the query plan into stages  Reordering selection operations  Today we discuss reordering joins, without staging execution 2

Pipelined Execution Part II: Adaptive Join Processing Title slide

Adaptive Join Processing: Outline  Single streaming relation  Left-deep pipelined plans  Multiple streaming relations  Execution strategies for multi-way joins  History-independent execution  History-dependent execution

Left-Deep Pipelined Plans Simplest method of joining tables  Pick a driver table (R). Call the rest driven tables  Pick access methods (AMs) on the driven tables (scan, hash, or index)  Order the driven tables  Flow R tuples through the driven tables For each r  R do: look for matches for r in A; for each match a do: look for matches for in B; … R B NLJ C A

Adapting a Left-deep Pipelined Plan Simplest method of joining tables  Pick a driver table (R). Call the rest driven tables  Pick access methods (AMs) on the driven tables  Order the driven tables  Flow R tuples through the driven tables For each r  R do: look for matches for r in A; for each match a do: look for matches for in B; … Almost identical to selection ordering R B NLJ C A

Adapting the Join Order  Let c i = cost/lookup into i’th driven table, s i = fanout of the lookup  As with selection, cost = |R| x (c 1 + s 1 c 2 + s 1 s 2 c 3 )  Caveats:  Fanouts s 1,s 2,… can be > 1  Precedence constraints  Caching issues  Can use rank ordering, A-greedy for adaptation (subject to the caveats) R B NLJ C A R C B A (c 1, s 1 )(c 2, s 2 )(c 3, s 3 )

Adapting a Left-deep Pipelined Plan Simplest method of joining tables  Pick a driver table (R). Call the rest driven tables  Pick access methods (AMs) on the driven tables  Order the driven tables  Flow R tuples through the driven tables For each r  R do: look for matches for r in A; for each match a do: look for matches for in B; … ? R B NLJ C A

Adapting a Left-deep Pipelined Plan  Key issue: Duplicates  Adapting the choice of driver table: [L+07] carefully use indexes to achieve this  Adapting the choice of access methods  Static optimization: explore all possibilities and pick best  Adaptive: Run multiple plans in parallel for a while, and then pick one and discard the rest [Antoshenkov’ 96]  Cannot easily explore combinatorial options  SteMs [RDH’03] handle both as well R B NLJ C A

Adaptive Join Processing: Outline  Single streaming relation  Left-deep pipelined plans  Multiple streaming relations  Execution strategies for multi-way joins  History-independent execution  MJoins  SteMs  History-dependent execution  Eddies with joins  Corrective query processing

Example Join Query & Database NameLevel JoeJunior JenSenior NameCourse JoeCS1 JenCS2 CourseInstructor CS2Smith select * from students, enrolled, courses where students.name = enrolled.name and enrolled.course = courses.course Students Enrolled NameLevelCourse JoeJuniorCS1 JenSeniorCS2 Enrolled Courses Students Enrolled Courses NameLevelCourseInstructor JenSeniorCS2Smith

Symmetric/Pipelined Hash Join [RS86, WA91] NameLevel JenSenior JoeJunior NameCourse JoeCS1 JenCS2 JoeCS2 select * from students, enrolled where students.name = enrolled.name NameLevelCourse JenSeniorCS2 JoeJuniorCS1 JoeSeniorCS2 Students Enrolled  Simultaneously builds and probes hash tables on both sides  Widely used:  adaptive query processing  stream joins  online aggregation ……  Naïve version degrades to NLJ once memory runs out  Quadratic time complexity  memory needed = sum of inputs  Improved by XJoins [UF 00], Tukwila DPJ [IFFLW 99]

Multi-way Pipelined Joins over Streaming Relations Three alternatives  Using binary join operators  Using a single n-ary join operator (MJoin) [VNB’03]  Using unary operators [RDH’03]

NameLevel JenSenior JoeJunior NameCourse JoeCS1 JenCS2 Enrolled HashTable E.Name HashTable S.Name Students CourseInstructor CS2Smith HashTable E.Course HashTable C.course Courses NameLevelCourse JenSeniorCS2 JoeJuniorCS1 NameLevelCourseInstructor JenSeniorCS2Smith Materialized state that depends on the query plan used History-dependent ! JenSeniorCS2

Multi-way Pipelined Joins over Streaming Relations Three alternatives  Using binary join operators  History-dependent execution  Hard to reason about the impact of adaptation  May need to migrate the state when changing plans  Using a single n-ary join operator (MJoin) [VNB’03]  Using unary operators [RDH’03]

NameCourse JoeCS1 JenCS2 NameLevel JoeJunior JenSenior Students HashTable S.Name HashTable E.Name Enrolled NameLevelCourseInstructor JenSeniorCS2Smith NameCourse JoeCS1 JenCS2 HashTable E.Course HashTable C.course Courses Probing Sequences Students tuple: Enrolled, then Courses Enrolled tuple: Students, then Courses Courses tuple: Enrolled, then Students Probe Hash tables contain all tuples that arrived so far Irrespective of the probing sequences used History-independent execution ! CourseInstructor CS2Smith JenCS2Smith JenCS2Senior

Multi-way Pipelined Joins over Streaming Relations Three alternatives  Using binary join operators  History-dependent execution  Using a single n-ary join operator (MJoin) [VNB’03]  History-independent execution  Well-defined state easy to reason about  Especially in data stream processing  Performance may be suboptimal [DH’04]  No intermediate tuples stored  need to recompute  Using unary operators [RDH’03]

Breaking the Atomicity of Probes and Builds in an N-ary Join [RDH’03] NameLevel JenSenior JoeJunior NameCourse JoeCS1 JenCS2 JoeCS2 Students HashTable S.Name HashTable E.Name Enrolled NameLevel JenSenior JoeJunior HashTable E.Course NameLevel JenSenior JoeJunior HashTable C.course Courses Eddy SteM S SteM E SteM C

Multi-way Pipelined Joins over Streaming Relations Three alternatives  Using binary join operators  History-dependent execution  Using a single n-ary join operator (MJoin) [VNB’03]  History-independent execution  Well-defined state easy to reason about  Especially in data stream processing  Performance may be suboptimal [DH’04]  No intermediate tuples stored  need to recompute  Using unary operators [RDH’03]  Similar to MJoins, but enables additional adaptation

Adaptive Join Processing: Outline  Single streaming relation  Left-deep pipelined plans  Multiple streaming relations  Execution strategies for multi-way joins  History-independent execution  MJoins  SteMs  History-dependent execution  Eddies with joins  Corrective query processing

MJoins [VNB’03] Choosing probing sequences  For each relation, use a left-deep pipelined plan (based on hash indexes)  Can use selection ordering algorithms Independently for each relation Adapting MJoins  Adapt each probing sequence independently e.g., StreaMon [BW’01] used A-Greedy for this purpose A-Caching [BMWM’05]  Maintain intermediate caches to avoid recomputation  Alleviates some of the performance concerns

State Modules (SteMs) [RDH’03] SteM is an abstraction of a unary operator  Encapsulates the state, access methods and the operations on a single relation By adapting the routing between SteMs, we can  Adapt the join ordering (as before)  Adapt access method choices  Adapt join algorithms  Hybridized join algorithms  e.g. on memory overflow, switch from hash join  index join  Much larger space of join algorithms  Adapt join spanning trees Also useful for sharing state across joins  Advantageous for continuous queries [MSHR’02, CF’03]

Adaptive Join Processing: Outline  Single streaming relation  Left-deep pipelined plans  Multiple streaming relations  Execution strategies for multi-way joins  History-independent execution  MJoins  SteMs  History-dependent execution  Eddies with binary joins  State management using STAIRs  Corrective query processing

Eddies with Binary Joins [AH’00] StudentsEnrolled Output Courses E C S E Eddy S E C S E E C Output S.Name like “..” s1 For correctness, must obey routing constraints !!

Eddies with Binary Joins [AH’00] StudentsEnrolled Output Courses E C S E Eddy S E C S E E C Output S.Name like “..” e1 For correctness, must obey routing constraints !!

Eddies with Binary Joins [AH’00] StudentsEnrolled Output Courses E C S E Eddy S E C S E E C Output S.Name like “..” e1c1 For correctness, must obey routing constraints !! Use some form of tuple-lineage

Eddies with Binary Joins [AH’00] StudentsEnrolled Output Courses E C S E Eddy S E C S E E C Output S.Name like “..” Can use any join algorithms But, pipelined operators preferred Provide quick feedback

Eddies with Symmetric Hash Joins Eddy S E C Output S E HashTable S.Name HashTable E.Name E C HashTable E.Course HashTable C.Course JoeJr JenSr CS2Smith JoeCS1 JoeJrCS1 JenCS2 JenCS2Smith

Burden of Routing History [DH’04] Eddy S E C Output S E HashTable S.Name HashTable E.Name E C HashTable E.Course HashTable C.Course JoeJr JenSr CS2Smith JoeCS1 JoeJrCS1 JenCS2 JenCS2Smith As a result of routing decisions, state gets embedded inside the operators History-dependent execution !!

Modifying State: STAIRs [DH’04] Observation:  Changing the operator ordering not sufficient  Must allow manipulation of state New operator: STAIR  Expose join state to the eddy  By splitting a join into two halves  Provide state management primitives  That guarantee correctness of execution  Able to lift the burden of history  Enable many other adaptation opportunities  e.g. adapting spanning trees, selective caching, pre- computation

Recap: Eddies with Binary Joins Routing constraints enforced using tuple-level lineage Must choose access methods, join spanning tree beforehand  SteMs relax this restriction [RDH’03] The operator state makes the behavior unpredictable  Unless only one streaming relation Routing policies explored are same as for selections  Can tune policy for interactivity metric [RH’02]

Adaptive Join Processing: Outline  Single streaming relation  Left-deep pipelined plans  Multiple streaming relations  Execution strategies for multi-way joins  History-independent execution  MJoins  SteMs  History-dependent execution  Eddies with binary joins  State management using STAIRs  Corrective query processing

F(fid,from,to,when) F 0 T(ssn,flight) T 0 C(parent,num) C 0 F 1 T 1 C 1 Carefully Managing State: Corrective Query Processing (CQP) [I’02,IHW’04] Focus on stateful queries:  Join cost grows over time  Early: few tuples join  Late: may get x-products  Group-by may not produce output until end Consider long-term cost, switch in mid-pipeline  Optimize with cost model  Use pipelining operators  Measure cardinalities, compare to estimates  Replan when different  Execute on new data inputs Stitch-up phase computes cross- phase results Group[fid,from] max(num) F 0 T 0 C 0 Plan0 1 C T 1 F 1 1 Shared Group- by Operator U T 0 C 0 T 1 C 1 F 0 F 1 T 0 C 0 Except T 0 C 0 F 0 T 1 C 1 F 1 Stitch-up Plan SELECT fid, from, max(num) FROM F, T, C WHERE fid=flight AND parent=ssn GROUP BY fid, from

CQP Discussion Each plan operates on a horizontal partition: Clean algebraic interpretation! Easy to extend to more complex queries  Aggregation, grouping, subqueries, etc. Separates two factors, conservatively creates state:  Scheduling is handled by pipelined operators  CQP chooses plans using long-term cost estimation  Postpones cross-phase results to final phase Assumes settings where computation cost, state are the bottlenecks  Contrast with STAIRS, which move state around once it’s created!

Putting it all in Context

How Do We Understand the Relationship between Techniques? Several different axes are useful:  When are the techniques applicable?  Adaptive selection ordering  History-independent joins  History-dependent joins  How do they handle the different aspects of adaptivity?  How to EXPLAIN adaptive query plans?

Adaptivity Loop Measure what ? Cardinalities/selectivities, operator costs, resource utilization Measure when ? Continuously (eddies); using a random sample (A-greedy); at materialization points (mid-query reoptimization) Measurement overhead ? Simple counter increments (mid-query) to very high Actuate Plan Analyze Measure

Adaptivity Loop Analyze/replan what decisions ? (Analyze actual vs. estimated selectivities) Evaluate costs of alternatives and switching (keep state in mind) Analyze / replan when ? Periodically; at materializations (mid-query); at conditions (A-greedy) Plan how far ahead ? Next tuple; batch; next stage (staged); possible remainder of plan (CQP) Planning overhead ? Switch stmt (parametric) to dynamic programming (CQP, mid-query) Actuate Measure Plan Analyze

Adaptivity Loop Actuation: How do they switch to the new plan/new routing strategy ? Actuation overhead ? At the end of pipelines  free (mid-query) During pipelines: History-independent  Essentially free (selections, MJoins) History-dependent  May need to migrate state (STAIRs, CAPE) Measure Plan Analyze Actuate

Adaptive Query Processing “Plans”: Post-Mortem Analyses After an adaptive technique has completed, we can explain what it did over time in terms of data partitions and relational algebra e.g., a selection ordering technique may effectively have partitioned the input relation into multiple partitions… … where each partition was run with a different order of application of selection predicates  These analyses highlight understanding how the technique manipulated the query plan

Research Roundup

Measurement & Models Combining static and runtime measurement Finding the right model granularity / measurement timescale  How often, how heavyweight? Active probing? Dealing with correlation in a tractable way There are clear connections here to:  Online algorithms  Machine learning and control theory  Bandit problems  Reinforcement learning  Operations research scheduling

Understanding Execution Space Identify the “complete” space of post-mortem executions:  Partitioning  Caching  State migration  Competition & redundant work  Sideways information passing  Distribution / parallelism! What aspects of this space are important? When?  A buried lesson of AQP work: “non-Selingerian” plans can win big!  Can we identify robust plans or strategies? Given this (much!) larger plan space, navigate it efficiently  Especially on-the-fly

Wrap-up Adaptivity is the future (and past!) of query processing Lessons and structure emerging  The adaptivity “loop” and its separable components Relationship between measurement, modeling / planning, actuation  Horizontal partitioning “post-mortems” as a logical framework for understanding/explaining adaptive execution in a post-mortem sense  Selection ordering as a clean “kernel”, and its limitations  The critical and tricky role of state in join processing A lot of science and engineering remain!!!