1
Introduction | AQP Families | Comparison | New Ideas | Conclusions
Adaptive Query Processing in the Looking Glass
Shivnath Babu (Stanford Univ.)
Pedro Bizarro (Univ. of Wisconsin, Madison)
2
Adaptive Query Processing (AQP) Systems: Publication Timeline
[Figure: timeline with year axis …, 1976, 1977, 1989 through 2004. Systems and techniques shown: Parametric opt., RedBrick, DEC-Rdb, Query Scrambling, Re-Opt, Tukwila, River, DQE, Conquest, Expected cost opt., Pipeline sch., Memory adap., POP, CAPE, Corrective processing, Eddies, NiagaraCQ, STREAM, Ingres]
3
Motivation
Plenty of recent work on Adaptive Query Processing (AQP) in different contexts
  – Conventional DBMS query processing, data integration, continuous queries in stream systems
No exhaustive, in-depth categorization and comparison of AQP systems to date
Difficult to answer questions like:
  – Will techniques from one system work on another?
  – What are the shortcomings of each system?
  – Which system is best for a new application domain?
4
Our Contributions
Detailed study of current AQP systems
Classification of AQP systems into 3 families
Comparison across families in terms of AQP tasks
Identification of shortcomings & new approaches to address them
5
Roadmap
Introduction to AQP
The three AQP system families
Comparison across families in terms of AQP tasks
Summary of what we learned
6
Primer on Traditional Query Processing
[Diagram components]
Query: input to the Optimizer
Optimizer: chooses best plan; uses stats from the Catalog to cost plans
Executor: runs chosen plan
Catalog: table sizes, histograms
Statistics Tracker: creates/updates stats (Runstats)
7
Need for Adaptive Query Processing
Correlated & skewed data distributions; errors in stats estimates; optimizer mistakes
  ⇒ Detect plan suboptimality, re-optimize
Stats & system conditions may change while query is running (continuous queries, long-running queries)
  ⇒ Monitor for changes, re-optimize
AQP is integral to the current CS-wide push towards autonomic computing
8
Our Focus: AQP for a Single Query
AQP System:
  – A system that interleaves the optimization and execution aspects of query processing, possibly multiple times, during the processing of a single query
10
AQP System Families
Plan-based AQP systems
  – AQP for traditional plan-based DBMSs
Continuous-Query-based (CQ-based) AQP systems
  – AQP for long-running continuous queries over data streams
Routing-based AQP systems
  – AQP for DBMSs and continuous queries based on adaptive tuple routing
11
AQP in Plan-based Systems
[Diagram components]
Query: input to the Optimizer
Optimizer: chooses best plan; uses stats from the Catalog to cost plans
Executor: runs chosen plan; produces collected stats
Catalog: table sizes, histograms
Statistics Tracker: creates/updates stats (Runstats + extra operators)
12
AQP in Plan-based Systems
[Diagram components]
Query: input to the Optimizer
Optimizer: chooses best plan; uses stats from the Catalog to cost plans
Executor: runs chosen plan; produces collected stats
Catalog: original + observed stats
Statistics Tracker: creates/updates stats (Runstats + extra operators)
Re-optimize: loop from the collected stats back to the Optimizer
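The loop on this slide (collect stats during execution, fold them into the catalog, re-optimize) can be sketched roughly as follows. This is a minimal illustration and not the mechanism of any system on the timeline; the function names, the factor-of-two tolerance, and the dictionary-based catalog are all assumptions made for the example.

```python
from typing import Dict


def merge_catalog(original: Dict[str, float], observed: Dict[str, float]) -> Dict[str, float]:
    """Return a catalog holding original stats, overridden by observed ones."""
    merged = dict(original)
    merged.update(observed)
    return merged


def should_reoptimize(estimated_rows: float, observed_rows: float,
                      tolerance: float = 2.0) -> bool:
    """True when the observed cardinality is off by more than `tolerance`x.

    `tolerance` is a hypothetical knob: 2.0 means "re-optimize if the
    estimate was wrong by more than a factor of two in either direction".
    """
    if estimated_rows <= 0 or observed_rows <= 0:
        # Degenerate case: any disagreement counts as a misestimate.
        return estimated_rows != observed_rows
    ratio = observed_rows / estimated_rows
    return ratio > tolerance or ratio < 1.0 / tolerance


# Example: the optimizer guessed 1000 rows, an extra operator counted 5000.
catalog = {"rows_after_filter": 1000.0}
observed = {"rows_after_filter": 5000.0}
if should_reoptimize(catalog["rows_after_filter"], observed["rows_after_filter"]):
    catalog = merge_catalog(catalog, observed)  # the optimizer would now re-cost plans
```

A real system would also have to decide what to do with work already done by the old plan, which this sketch leaves out.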
13
Example Plan-based AQP Systems
[Figure: publication timeline with year axis …, 1976, 1977, 1989 through 2004. Systems and techniques shown: Parametric opt., RedBrick, DEC-Rdb, Query Scrambling, Re-Opt, Tukwila, River, DQE, Conquest, Expected cost opt., Pipeline sch., Memory adap., POP, CAPE, Corrective processing, Eddies, NiagaraCQ, STREAM, Ingres]
14
Primer on Continuous Query Processing
Continuous Queries (CQs) are long-running queries, usually over data streams
  – Example CQ: Filtering packet streams
Stream properties or system conditions may change while query is running ⇒ best plan may change
[Diagram: Packets → σ1 → σ2 → σ3 → Chosen packets]
15
AQP in CQ-based Systems
[Diagram components]
Query: input to the Optimizer
Optimizer: chooses best plan; uses stats from the Catalog to cost plans
Executor: runs chosen plan
Catalog: table sizes, histograms
Statistics Tracker: creates/updates stats (Runstats)
16
AQP in CQ-based Systems
[Diagram components]
Continuous Query: input to the Optimizer
Optimizer: chooses best plan; uses stats from the Catalog to cost plans
Executor: runs chosen plan
Catalog: stream rates, data distr.
Statistics Tracker: monitors stream stats and system conditions
17
AQP in CQ-based Systems
[Diagram components]
Continuous Query: input to the Optimizer
Optimizer: ensures that plan is best for current stats; uses stats from the Catalog to cost plans
Executor: runs chosen plan
Catalog: stream rates, data distr.
Statistics Tracker: monitors stream stats and system conditions
18
AQP in CQ-based Systems
[Diagram components]
Continuous Query: input to the Optimizer
Optimizer: ensures that plan is best for current stats; uses stats from the Catalog to cost plans
Executor: runs chosen plan
Catalog: stream rates, data distr.
Statistics Tracker: monitors stream stats and system conditions
Arrows added on this slide: "Stats to track" and "Re-optimize"
Note on Optimizer and Statistics Tracker: combined in part for efficiency
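For the packet-filter example used in this deck, "ensuring that the plan is best for current stats" can be illustrated with a small sketch. It assumes the three filters are independent and commutative, in which case ordering them by cost / (1 − selectivity) is a standard heuristic; the class and function names, and the 10% switching threshold, are hypothetical and not taken from any of the systems discussed.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FilterStats:
    cost: float         # per-tuple processing cost (arbitrary units)
    selectivity: float  # fraction of tuples that pass, in [0, 1]


def plan_cost(order: List[str], stats: Dict[str, FilterStats]) -> float:
    """Expected per-tuple cost of applying the filters in the given order."""
    total, surviving = 0.0, 1.0
    for name in order:
        total += surviving * stats[name].cost
        surviving *= stats[name].selectivity
    return total


def best_order(stats: Dict[str, FilterStats]) -> List[str]:
    """Order filters by rank = cost / (1 - selectivity), cheapest drop first."""
    def rank(name: str) -> float:
        drop = 1.0 - stats[name].selectivity
        return stats[name].cost / drop if drop > 0 else float("inf")
    return sorted(stats, key=rank)


def maybe_reoptimize(current: List[str], stats: Dict[str, FilterStats],
                     threshold: float = 1.1) -> List[str]:
    """Switch plans only if the current one is > `threshold`x the best cost."""
    candidate = best_order(stats)
    if plan_cost(current, stats) > threshold * plan_cost(candidate, stats):
        return candidate
    return current
```

The threshold is one simple way to avoid switching plans for negligible gains; the thrashing problem raised later in the deck is harder than this.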
19
Example CQ-based AQP Systems
[Figure: publication timeline with year axis …, 1976, 1977, 1989 through 2004. Systems and techniques shown: Parametric opt., RedBrick, DEC-Rdb, Query Scrambling, Re-Opt, Tukwila, River, DQE, Conquest, Expected cost opt., Pipeline sch., Memory adap., POP, CAPE, Corrective processing, Eddies, NiagaraCQ, STREAM, Ingres]
20
Primer on Routing-based Processing
Non-plan-based architecture where tuples are routed individually through operators
No optimizer
Exemplified by Eddies [AH00]
[Diagram, using a plan: Packets → σ1 → σ2 → σ3 → Chosen packets]
[Diagram, using tuple routing: Packets → Tuple Router → Chosen packets, with the Tuple Router connected to σ1, σ2, and σ3]
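The contrast between the two diagrams can be made concrete with a short sketch. It is only loosely inspired by eddy-style routing and does not reproduce the algorithm of [AH00]; `run_plan`, `run_routed`, and the `choose_next` policy hook are names invented for this example.

```python
from typing import Callable, Dict, Iterable, List, Set

Predicate = Callable[[int], bool]
Policy = Callable[[Set[str]], str]  # picks the next operator among the pending ones


def run_plan(tuples: Iterable[int], ops: Dict[str, Predicate],
             order: List[str]) -> List[int]:
    """Plan-based: every tuple visits the operators in the same fixed order."""
    return [t for t in tuples if all(ops[name](t) for name in order)]


def run_routed(tuples: Iterable[int], ops: Dict[str, Predicate],
               choose_next: Policy) -> List[int]:
    """Routing-based: the order is decided per tuple, one operator at a time."""
    out = []
    for t in tuples:
        pending = set(ops)          # operators this tuple has not visited yet
        alive = True
        while pending and alive:
            name = choose_next(pending)
            pending.discard(name)
            alive = ops[name](t)    # a failed predicate drops the tuple
        if alive:
            out.append(t)
    return out


ops = {
    "s1": lambda p: p % 2 == 0,
    "s2": lambda p: p > 10,
    "s3": lambda p: p % 3 == 0,
}
fixed = run_plan(range(30), ops, ["s1", "s2", "s3"])
routed = run_routed(range(30), ops, choose_next=max)  # any policy gives the same answer
```

Because the filters commute, both executions return the same tuples; only the amount of work per tuple depends on the route.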
21
AQP in Routing-based Systems
[Diagram components]
Query: input to the Optimizer
Optimizer: chooses best plan; uses stats from the Catalog to cost plans
Executor: runs chosen plan
Catalog: table sizes, histograms
Statistics Tracker: creates/updates stats (Runstats)
22
AQP in Routing-based Systems
[Diagram components]
Query or Continuous Query: input to the Tuple Router
Tuple Router: integrated Optimizer & Stats Tracker; uses stats to choose efficient routes
Executor: pool of operators; selective routing of tuples
In-memory catalog: operator costs, selectivities, etc.
23
Example Routing-based AQP Systems
[Figure: publication timeline with year axis …, 1976, 1977, 1989 through 2004. Systems and techniques shown: Parametric opt., RedBrick, DEC-Rdb, Query Scrambling, Re-Opt, Tukwila, River, DQE, Conquest, Expected cost opt., Pipeline sch., Memory adap., POP, CAPE, Corrective processing, Eddies, NiagaraCQ, STREAM, Ingres]
25
Comparison Across AQP System Families
Goal: To bring out AQP algorithms and features, not performance numbers
Models, assumptions, and approach
Techniques for tracking statistics
Re-optimization subtasks
  – When and how to re-optimize
  – Switching between plans
  – Pros & cons of using a conventional optimizer
Performance issues
  – Quality of re-optimization
  – Run-time overhead & thrashing
  – Scalability
27
Techniques for Tracking Statistics
Observation
  – Mostly in Plan-based systems
Competition
  – Mostly in Plan-based systems
Profiling
  – Mostly in CQ-based systems
Exploration
  – In Routing-based systems
28
Tracking Statistics: Observation [KD98]
Collect statistics on operator behavior or intermediate subexpressions in a plan
[Diagram: Packets → σ1 → σ2 → σ3 → Chosen packets. Label at the output of σ1: selectivity of σ1 on the input stream can be observed here]
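A minimal sketch of observation on the packet-filter pipeline, assuming counters are attached to each filter (the `ObservedFilter` wrapper is a hypothetical name, not something from [KD98]). It also shows why this technique is limited by the plan: only the first filter is measured on the raw input stream.

```python
from typing import Callable, Iterable, List, Optional


class ObservedFilter:
    """Wraps a predicate and counts how many tuples it sees and passes."""

    def __init__(self, name: str, predicate: Callable[[int], bool]) -> None:
        self.name = name
        self.predicate = predicate
        self.seen = 0
        self.passed = 0

    def __call__(self, t: int) -> bool:
        self.seen += 1
        ok = self.predicate(t)
        self.passed += int(ok)
        return ok

    @property
    def selectivity(self) -> Optional[float]:
        """Observed pass rate on whatever tuples reached this operator."""
        return self.passed / self.seen if self.seen else None


def run_pipeline(tuples: Iterable[int], filters: List[ObservedFilter]) -> List[int]:
    # all() short-circuits, so later filters only see tuples that survived
    # the earlier ones, exactly as in a pipelined plan.
    return [t for t in tuples if all(f(t) for f in filters)]


s1 = ObservedFilter("s1", lambda p: p % 2 == 0)
s2 = ObservedFilter("s2", lambda p: p % 4 == 0)
chosen = run_pipeline(range(100), [s1, s2])
```

Here `s1.selectivity` is its true selectivity on the input stream (0.5), while `s2.selectivity` comes out as 0.5 because it is measured only on the output of `s1`; on the raw stream the same predicate passes 25% of the tuples.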
29
Tracking Statistics: Competition [A93]
Extra processing to collect statistics
[Diagram: Packets → σ1 → σ2 → σ3 → Chosen packets, plus an extra copy of σ2. Two labels read "Selectivity of … on input stream"; the operator symbols in the labels were lost in extraction]
30
Tracking Statistics: Profiling [BMM+04]
Extra processing on a fraction of the input tuples (e.g., a random sample) to collect statistics
Builds a “statistical profile” that can be used to estimate many individual statistics
[Diagram: profiled tuples are run through σ1, σ2, and σ3]
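One way to picture a statistical profile is sketched below. For each sampled tuple, every predicate is evaluated and the set of predicates it satisfies is recorded; many selectivities, including conditional ones, can then be estimated from the same profile. This is an illustrative data structure under those assumptions and not the one used in [BMM+04]; the function names and the seeded sampler are hypothetical.

```python
import random
from collections import Counter
from typing import Callable, Dict, FrozenSet, Iterable, Optional

Predicate = Callable[[int], bool]
Profile = Counter  # maps frozenset of satisfied predicate names -> count


def build_profile(tuples: Iterable[int], predicates: Dict[str, Predicate],
                  fraction: float, seed: int = 0) -> Profile:
    """Evaluate ALL predicates on a random `fraction` of the tuples."""
    rng = random.Random(seed)
    profile: Profile = Counter()
    for t in tuples:
        if rng.random() < fraction:  # this tuple is profiled
            outcome: FrozenSet[str] = frozenset(
                name for name, pred in predicates.items() if pred(t))
            profile[outcome] += 1
    return profile


def estimate_selectivity(profile: Profile, target: str,
                         given: Iterable[str] = ()) -> Optional[float]:
    """Estimate P(target passes | every predicate in `given` passed)."""
    required = set(given)
    base = sum(c for passed, c in profile.items() if required <= passed)
    hits = sum(c for passed, c in profile.items()
               if required <= passed and target in passed)
    return hits / base if base else None


predicates = {
    "s1": lambda p: p % 2 == 0,
    "s2": lambda p: p % 4 == 0,
    "s3": lambda p: p % 3 == 0,
}
# fraction=1.0 profiles every tuple, which keeps this example deterministic;
# a real profiler would use a small fraction to bound the extra work.
profile = build_profile(range(120), predicates, fraction=1.0)
```

With this profile, the selectivity of `s2` on the raw stream and its selectivity after `s1` are both available without changing the running plan, which is the coverage advantage the comparison slides attribute to profiling.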
31
Tracking Statistics: Exploration [AH00]
A fraction of tuples are routed along routes different from the current best route to track statistics along those routes
No redundant processing
[Diagram: Packets → Tuple Router → Chosen packets, with the Tuple Router connected to σ1, σ2, and σ3]
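Exploration can be sketched as an epsilon-greedy router: most tuples follow the route that currently looks best, and a small fraction follow a random route so that statistics on other routes stay fresh. This is a simplified stand-in, not the routing policy of [AH00]; the `ExploringRouter` class, the `epsilon` parameter, and the seeded random generator are assumptions for the example.

```python
import random
from typing import Callable, Dict, List

Predicate = Callable[[int], bool]


class ExploringRouter:
    def __init__(self, ops: Dict[str, Predicate],
                 epsilon: float = 0.1, seed: int = 0) -> None:
        self.ops = dict(ops)
        self.epsilon = epsilon            # fraction of tuples used to explore
        self.rng = random.Random(seed)
        self.seen = {name: 0 for name in ops}
        self.passed = {name: 0 for name in ops}

    def _selectivity(self, name: str) -> float:
        # Operators with no data yet get 0.0, so they are tried first.
        return self.passed[name] / self.seen[name] if self.seen[name] else 0.0

    def best_route(self) -> List[str]:
        """Current best guess: most selective operator first."""
        return sorted(self.ops, key=self._selectivity)

    def route(self, t: int) -> bool:
        if self.rng.random() < self.epsilon:
            order = list(self.ops)
            self.rng.shuffle(order)       # explore a random route
        else:
            order = self.best_route()     # exploit the current best route
        for name in order:
            # Each operator sees the tuple at most once: no redundant work.
            self.seen[name] += 1
            if not self.ops[name](t):
                return False
            self.passed[name] += 1
        return True


router = ExploringRouter({
    "s1": lambda p: p % 2 == 0,
    "s2": lambda p: p % 10 == 0,
    "s3": lambda p: p % 3 == 0,
})
chosen = [p for p in range(300) if router.route(p)]
```

The counters are updated only on tuples that actually reach an operator, so an operator that usually sits late in the route is measured mostly on survivors of the earlier ones. That is the routing bias mentioned in the accuracy comparison.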
32
Comparing Statistics-Tracking Techniques: Extra Overhead Introduced
Listed along the slide's "Increasing overhead" arrow:
Observation
Exploration (inefficient routes for some tuples)
Profiling (extra processing on some tuples)
Competition (lots of extra work)
33
Comparing Statistics-Tracking Techniques: Coverage of Different Statistics
Listed along the slide's "Increasing coverage" arrow:
Observation & Competition (limited by plan)
Exploration (limited by large number of routes)
Profiling (highest since it builds statistics profile)
34
Comparing Statistics-Tracking Techniques: Accuracy of Estimation
Listed along the slide's "Increasing accuracy" arrow:
Observation & Competition
Exploration (but susceptible to routing bias)
Profiling (depends on sampling fraction)
36
What have we learned? (1)
Many similarities in internals of different AQP families
Can re-use many current (and new) AQP techniques across families
Ex: Profiling from CQ-based systems
  – Enables, e.g., faster detection of plan suboptimality in Plan-based systems
  – Generates more accurate statistics at lower cost in Routing-based systems
Example query: σ_{p1 and p2}(R) ⋈ S
[Diagram: index nested-loop join (INLJ) plan over R and S, with an unclustered index]
37
What have we learned? (2)
Current AQP systems are reactive
  – E.g., do not consider sensitivity to errors/changes in stats
Proposal: Proactive Re-optimization
Example query: σ_{p1 and p2}(R) ⋈ S
[Figure: plot of Cost against |σ(R)|, with one curve for Hash Join and one for INLJ]
[Diagram: two candidate plans, a Hash Join of R and S, and an INLJ of R and S using an unclustered index]
38
What have we learned? (3)
Challenging meta problems in AQP for continuous queries need to be addressed
1. Larger and more complex plan spaces ⇒ higher costs for statistics tracking and re-optimization
2. Tracking “Return on Investment” on AQP
3. Avoiding thrashing, e.g., on bursty changes in statistics
Proposal: Plan Logging for Continuous Queries
39
Plan Logging for Continuous Queries
Log the statistics and re-optimization history
  – Query is long-running
  – Example view over log for R ⋈ S ⋈ T:

    Rate(R)   …   σ(R,S)   Plan   Cost
    1024      …   0.75     P1     12762
    5642      …   0.72     P2     72332
    934       …   0.76     P1     12003

[Figure: plans lying in a high-dimensional space of statistics; axes Rate(R), σ(R,S), and time, with regions labeled P1 and P2]
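A plan log like the example view could support two of the meta problems listed earlier: spotting thrashing and reusing a plan that was already chosen under similar statistics. The sketch below is one possible shape for such a log and is not described in the slides; `PlanLog`, the `rate_R` and `sel_RS` keys, and the relative-distance measure are all assumptions. The three records reuse the values from the example view, reading its (R,S) column as a selectivity, which is also an assumption, and omitting the elided columns.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class LogEntry:
    stats: Dict[str, float]  # statistics in effect when the plan was chosen
    plan: str
    cost: float


class PlanLog:
    def __init__(self) -> None:
        self.entries: List[LogEntry] = []

    def record(self, stats: Dict[str, float], plan: str, cost: float) -> None:
        self.entries.append(LogEntry(dict(stats), plan, cost))

    def switches(self) -> int:
        """Number of plan changes between consecutive log entries."""
        plans = [e.plan for e in self.entries]
        return sum(1 for a, b in zip(plans, plans[1:]) if a != b)

    @staticmethod
    def _distance(a: Dict[str, float], b: Dict[str, float]) -> float:
        # Squared relative difference per statistic, so that rates in the
        # thousands do not drown out selectivities in [0, 1].
        total = 0.0
        for key in a.keys() & b.keys():
            scale = max(abs(a[key]), abs(b[key]), 1e-12)
            total += ((a[key] - b[key]) / scale) ** 2
        return total

    def nearest_plan(self, stats: Dict[str, float]) -> Optional[str]:
        """Plan that was logged under the most similar statistics, if any."""
        if not self.entries:
            return None
        return min(self.entries, key=lambda e: self._distance(e.stats, stats)).plan


log = PlanLog()
log.record({"rate_R": 1024, "sel_RS": 0.75}, "P1", 12762)
log.record({"rate_R": 5642, "sel_RS": 0.72}, "P2", 72332)
log.record({"rate_R": 934, "sel_RS": 0.76}, "P1", 12003)
```

A high `switches()` count over a short stretch of the log would be one crude signal of thrashing, and `nearest_plan` gives a cheap candidate before paying for a full re-optimization.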
40
Summary
AQP is becoming important:
  – New data and application trends
  – CS-wide push towards Autonomic Computing
  – Significant amount of work on AQP in recent years
Our contributions:
  – In-depth categorization and comparison of AQP systems and techniques
  – Identified current shortcomings and new approaches to AQP