Elke A. Rundensteiner (Worcester Polytechnic Institute), Elisa Bertino (Purdue University), Rimma V. Nehme (Microsoft Jim Gray Systems Lab). Thanks go to NSF for partial support of this project.
A variety of modern applications, such as ubiquitous healthcare, location-based services, financial tickers, and network monitoring, face data with non-uniform characteristics.
[Figure: a traditional database engine. Data from the data sources enters the engine; for a query such as SELECT * FROM …, the query optimizer uses overall statistics and plan costs to choose a query execution plan, which the query executor runs to produce the query results.]
The user's view: "I want my results quickly. I don't care how exactly they are computed."
Typically, ONE execution plan is used for ALL data.
[Figure: a data stream management system (DSMS) receives data streams such as network packets; for a query such as SELECT * FROM …, its query optimizer produces a continuous query execution plan, and different subsets of the data could be processed by Plan 1, Plan 2, or Plan 3.]
Opportunity for improvement: it may be more efficient to use different plans for different subsets of data. This example uses streaming data; similar examples can be found with static data.
Outline: Introduction & Motivation; Background: Query Mesh (Model, Optimization, Execution); Dynamic Re-Optimization with Query Mesh (Challenges, Architecture, Details, Experimental Evaluation); Ongoing and Future Work; Conclusion
Query Mesh provides a middle ground between systems that use a single pre-computed route and systems that choose among multiple routes at runtime (here, route = execution plan):
Traditional query optimization: a single "route-oriented" solution; coarse optimization, small overhead.
Eddies and their descendants: a multi "route-less" solution, in which the Eddy routes tuples at runtime; fine-granularity optimization, significant overhead.
Query Mesh: a multi "route-oriented" solution, in which a classifier assigns data to multiple routes; fine-granularity optimization, less overhead.
Query Mesh lattice-shaped search space. Search space: the set of all possible solutions. For a set of training tuples {1,2,3,4} of cardinality n = 4, each solution partitions the set into subsets, and each subset gets its own route. We denote {{1},{2,3}} as "1/23" for brevity.
[Figure: the lattice of all partitions of {1,2,3,4}. Top: 1234 (one plan for all data). Two subsets: 1/234, 124/3, 13/24, 123/4, 134/2, 12/34, 14/23. Three subsets: 1/23/4, 14/2/3, 1/24/3, 13/2/4, 12/3/4, 1/2/34. Bottom: 1/2/3/4 (each subset has an individual route).]
Search space complexity: the number of solutions is the Bell number $B_n = \sum_{k=1}^{n} S(n,k)$, the sum of the Stirling numbers of the second kind, where $S(n,k)$ is the number of ways to partition a set of cardinality n into exactly k nonempty subsets.
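The size of this search space can be checked with a short sketch: it computes $S(n,k)$ with the standard recurrence $S(n,k) = S(n-1,k-1) + k\,S(n-1,k)$, sums them into $B_n$, and enumerates every partition of {1,2,3,4} in the "1/23" shorthand used above. The helper names (`stirling2`, `bell`, `partitions`, `notation`) are illustrative, not taken from the Query Mesh implementation.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n: int, k: int) -> int:
    """S(n, k): ways to split an n-element set into exactly k nonempty subsets."""
    if n == k:
        return 1  # includes S(0, 0) = 1
    if k == 0 or k > n:
        return 0
    # Element n either forms a subset of its own or joins one of k existing subsets.
    return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)


def bell(n: int) -> int:
    """B_n = sum of S(n, k) over k: the number of candidate Query Mesh solutions."""
    return sum(stirling2(n, k) for k in range(n + 1))


def partitions(items):
    """Yield every partition of `items` as a list of subsets (lists)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # Put `first` into each existing subset in turn ...
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        # ... or give it a subset of its own.
        yield [[first]] + smaller


def notation(partition) -> str:
    """Render a partition in the slide shorthand, e.g. {{1},{2,3}} -> "1/23"."""
    blocks = sorted(sorted(block) for block in partition)
    return "/".join("".join(str(x) for x in block) for block in blocks)


if __name__ == "__main__":
    solutions = sorted(notation(p) for p in partitions([1, 2, 3, 4]))
    print(len(solutions), bell(4))  # 15 15
    print([stirling2(4, k) for k in range(1, 5)])  # [1, 7, 6, 1]
    print([bell(n) for n in range(1, 11)])  # grows quickly with n
```

The counts per lattice level (1, 7, 6, 1) match the levels of the figure, and the rapid growth of $B_n$ is why exhaustive search over partitions does not scale.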
Query Mesh cost model (main idea): Cost(QM) = Cost of classifier + Cost of routes + Multi-route overhead.
Query Mesh search algorithms:
Optimal Query Mesh search (Opt-QM). Main idea: (1) form all possible sets, i.e., the powerset of the training tuples; (2) form partitions out of these sets. This is too expensive, so heuristics are needed.
Query Mesh search heuristics explore only part of the solutions and have three components:
(1) Start solution: 5 different approaches (extreme-1, extreme-N, random, content-driven, route-driven), evaluated experimentally.
(2) Search strategy: randomized algorithms (iterative improvement, simulated annealing).
(3) Stop condition: largely depends on the search strategy employed (K iterations, plateau, time-bounded, resource-bounded).
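A toy sketch of how the three-part cost model and one heuristic combination fit together: an extreme-1 start (one plan for all data), iterative improvement as the search strategy, and a plateau stop condition. The per-tuple route costs, the classifier cost per subset, and the per-route overhead are hypothetical constants chosen for illustration, not the paper's cost formulas; the move operator (relocate one tuple) is likewise an assumption.

```python
import random

ROUTES = ("rA", "rB")
# Hypothetical cost of processing each training tuple on each route.
ROUTE_COST = {
    1: {"rA": 1.0, "rB": 9.0},
    2: {"rA": 1.0, "rB": 9.0},
    3: {"rA": 9.0, "rB": 1.0},
    4: {"rA": 9.0, "rB": 1.0},
}
CLASSIFIER_COST_PER_SUBSET = 0.5  # hypothetical
MULTI_ROUTE_OVERHEAD = 0.5        # hypothetical, charged per extra route


def qm_cost(partition):
    """Cost(QM) = cost of classifier + cost of routes + multi-route overhead."""
    routes = sum(
        min(sum(ROUTE_COST[t][r] for t in block) for r in ROUTES)  # best route per subset
        for block in partition
    )
    # A single subset means one plan for all data, so no classifier is needed.
    classifier = CLASSIFIER_COST_PER_SUBSET * len(partition) if len(partition) > 1 else 0.0
    overhead = MULTI_ROUTE_OVERHEAD * (len(partition) - 1)
    return classifier + routes + overhead


def random_neighbor(partition, rng):
    """Move one random tuple into another subset or into a new subset."""
    blocks = [list(b) for b in partition]
    src = rng.randrange(len(blocks))
    t = blocks[src].pop(rng.randrange(len(blocks[src])))
    blocks = [b for b in blocks if b]  # drop a subset that became empty
    dst = rng.randrange(len(blocks) + 1)
    if dst == len(blocks):
        blocks.append([t])
    else:
        blocks[dst].append(t)
    return blocks


def iterative_improvement(tuples, rng, plateau=200):
    """Extreme-1 start, accept only improving moves, and stop once
    `plateau` consecutive moves fail to improve the cost."""
    current = [list(tuples)]
    current_cost = qm_cost(current)
    stale = 0
    while stale < plateau:
        candidate = random_neighbor(current, rng)
        cost = qm_cost(candidate)
        if cost < current_cost:
            current, current_cost, stale = candidate, cost, 0
        else:
            stale += 1
    return current, current_cost


if __name__ == "__main__":
    best, cost = iterative_improvement([1, 2, 3, 4], random.Random(7))
    print(sorted(sorted(b) for b in best), cost)
```

With these made-up costs, tuples 1 and 2 prefer route rA and tuples 3 and 4 prefer rB, so the search should settle on 12/34; the classifier and overhead terms are what keep it from splitting all the way down to 1/2/3/4. Simulated annealing would differ only in occasionally accepting a worse candidate.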
Query Mesh optimization (QM Optimizer): a sample of tuples t1, t2, … from the data stream serves as the training dataset. The QM Optimizer computes routes (i.e., plans) r1, …, r4 for the sampled tuples, labels each sampled tuple with a route, and induces a classifier from the labeled sample. The resulting Query Mesh (classifier plus routes) is executed by the QM Executor.
[NWRB09] R. Nehme, K. Works, E. Rundensteiner, and E. Bertino. Query Mesh: Multi-Route Query Processing Technology (Demo). In VLDB 2009.
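The training step can be sketched as two phases: label every sampled tuple with its cheapest route, then induce a classifier from those labels. This is only an illustration under loud assumptions: the one-attribute sample, the two route cost formulas, and the decision-stump learner are all hypothetical stand-ins, since the slide does not say which classifier Query Mesh induces.

```python
from dataclasses import dataclass

# Hypothetical training sample: one numeric attribute per tuple (e.g., packet size).
SAMPLE = [120, 80, 1500, 60, 1400, 900, 40, 1300]


def cheapest_route(value):
    """Label a sampled tuple with its cheapest route (hypothetical cost formulas)."""
    costs = {"r1": value * 0.01, "r2": 5.0}
    return min(costs, key=costs.get)


@dataclass
class Stump:
    """One-level decision tree: route by a threshold on the attribute."""
    threshold: float
    left: str   # route for values <= threshold
    right: str  # route for values > threshold

    def classify(self, value):
        return self.left if value <= self.threshold else self.right


def induce_stump(values, labels):
    """Choose the threshold and route pair with the fewest training errors."""
    routes = sorted(set(labels))
    best_errors, best = None, None
    for threshold in sorted(values):
        for left in routes:
            for right in routes:
                stump = Stump(threshold, left, right)
                errors = sum(stump.classify(v) != y for v, y in zip(values, labels))
                if best_errors is None or errors < best_errors:
                    best_errors, best = errors, stump
    return best


if __name__ == "__main__":
    labels = [cheapest_route(v) for v in SAMPLE]
    classifier = induce_stump(SAMPLE, labels)
    print(classifier)  # Stump(threshold=120, left='r1', right='r2')
```

Any supervised learner that maps tuple attributes to route labels could take the stump's place; the stump simply keeps per-tuple classification cheap.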
Query Mesh execution (QM Executor): incoming tuples from the data stream are collected in a classification window (a tumbling window) and classified. After classification, the tuples are grouped by route (route r1, route r2, route r3) into rusters, made up of r-tokens and data tuples, which are sent to the Self-Routing Fabric [NWRB09].
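A minimal sketch of the classification step, assuming tuples are identified by integer ids and that each ruster is represented as a pair (r-token naming the route, list of data tuples). The window size, the modulo-based `toy_classifier`, and ordering rusters by the first tuple seen are hypothetical choices; the slide only states that tuples in a tumbling classification window are classified and grouped into rusters.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

Ruster = Tuple[str, List[int]]  # (r-token naming the route, data tuples on that route)


def tumbling_windows(stream: Iterable[int], size: int) -> Iterator[List[int]]:
    """Cut the stream into consecutive, non-overlapping windows of `size` tuples."""
    it = iter(stream)
    while True:
        window = list(islice(it, size))
        if not window:
            return
        yield window


def make_rusters(window: List[int], classify: Callable[[int], str]) -> List[Ruster]:
    """Classify each tuple in the window and group tuples that share a route."""
    groups: Dict[str, List[int]] = {}  # dicts keep first-seen route order
    for t in window:
        groups.setdefault(classify(t), []).append(t)
    return list(groups.items())


def toy_classifier(t: int) -> str:
    """Hypothetical stand-in for the induced classifier: route by id modulo 3."""
    return f"r{t % 3 + 1}"


if __name__ == "__main__":
    stream = range(1, 13)  # tuple ids t1..t12
    for window in tumbling_windows(stream, size=5):
        for r_token, tuples in make_rusters(window, toy_classifier):
            # In Query Mesh each ruster is sent on to the self-routing fabric.
            print(r_token, tuples)
```

Grouping per window means the route decision travels once per ruster (in its r-token) rather than once per tuple.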
[Figure: snapshots at time T, T + 1, T + 2, and T + 3.]
Can we have an execution strategy that is plan-based, supports different plans for distinct subsets of data, and is as adaptive as Eddies?
[NRB09] R. Nehme, E. Rundensteiner, and E. Bertino. Self-Tuning Query Mesh for Adaptive Multi-Route Query Processing. In EDBT 2009.
Outline: Introduction & Motivation; Background: Query Mesh (Model, Optimization, Execution); Dynamic Re-Optimization with Query Mesh (Challenges, Architecture, Details); Conclusion; Current and Future Work
Self-Tuning Query Mesh (ST-QM) addresses three challenges for a Query Mesh (a classifier plus multiple routes) [NRB09]:
1. What should be monitored to determine whether the current QM solution is no longer adequate? Solution: data and statistics monitoring.
2. How can we determine whether the current QM solution should be adapted? Solution: concept drift analysis, the QM cost model, and an improvement measure.
3. How can the physical migration from the current QM to a new QM solution be executed efficiently while the query is running? Solution: a single lightweight operation to physically adapt the QM.
[Figure: the static QM framework, with a query optimizer and a query executor, compared with the adaptive QM framework, which adds ST-QM to the query optimizer and query executor.] [NRB09]
ST-QM architecture:
ST-QM Monitor: continuously samples data and execution statistics, which are used to determine whether a concept drift has occurred (i.e., whether the QM needs to be adapted).
ST-QM Analyzer: determines whether a concept drift has actually occurred and recommends whether and how the QM solution should be adapted.
ST-QM Actuator: takes these recommendations and physically adapts the QM solution.
[Figure: a feedback loop in which the Monitor samples the running Query Mesh and passes measurements to the Analyzer, the Analyzer passes recommendations to the Actuator, and the Actuator's actuation turns the Query Mesh into a new Query Mesh.]
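The monitor, analyzer, and actuator loop can be sketched as follows. This is a simplified stand-in, not the ST-QM algorithm: the "drift test" here is just a rise in mean per-tuple cost above a hypothetical baseline and tolerance, whereas the actual analyzer relies on concept drift analysis, the QM cost model, and an improvement measure. `QueryMesh` is a placeholder object whose `version` stands for a classifier plus its routes.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Optional


@dataclass
class QueryMesh:
    version: int  # stands in for a classifier together with its routes


@dataclass
class Monitor:
    """Samples execution statistics (here: per-tuple cost) in batches."""
    batch: int = 4
    samples: List[float] = field(default_factory=list)

    def sample(self, cost: float) -> Optional[float]:
        self.samples.append(cost)
        if len(self.samples) < self.batch:
            return None
        measurement = mean(self.samples)
        self.samples.clear()
        return measurement


@dataclass
class Analyzer:
    """Flags a drift when measured cost exceeds the baseline by `tolerance`."""
    baseline: float
    tolerance: float = 0.2  # hypothetical threshold

    def recommend(self, measurement: float) -> Optional[str]:
        if measurement > self.baseline * (1 + self.tolerance):
            return "adapt"
        return None


def actuate(qm: QueryMesh) -> QueryMesh:
    """Actuator: replace the running QM with a new one in a single step."""
    return QueryMesh(qm.version + 1)


def run(costs: List[float], baseline: float) -> QueryMesh:
    qm = QueryMesh(0)
    monitor, analyzer = Monitor(), Analyzer(baseline)
    for cost in costs:                           # sampling
        measurement = monitor.sample(cost)       # measurements
        if measurement is None:
            continue
        if analyzer.recommend(measurement):      # recommendations
            qm = actuate(qm)                     # actuation
    return qm


if __name__ == "__main__":
    print(run([1.0] * 4 + [2.0] * 4, baseline=1.0))  # QueryMesh(version=1)
```

Keeping the three roles separate mirrors the architecture: the monitor never decides, the analyzer never touches the running mesh, and actuation is a single replacement step.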
ST-QM Analyzer: from concept drifts to tuning recommendations. In response to detected concept drifts, the ST-QM Analyzer either ignores them or makes one of the following tuning recommendations for the Query Mesh:
R1 (new classifier + old routes): the recommendation for a virtual concept drift (Case 1).
R2 (old classifier + new routes): the recommendation for a real concept drift (Case 2).
R3 (new classifier + new routes): the recommendation for a hybrid concept drift (Case 3).
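The case analysis above amounts to a small decision table, sketched below. How virtual and real drift are detected, and how the expected improvement is estimated, are not shown on the slide: the two boolean flags, the `improvement` value, and the `min_improvement` cut-off for ignoring a drift are hypothetical inputs.

```python
from enum import Enum


class Recommendation(Enum):
    IGNORE = "ignore the concept drift"
    R1 = "new classifier + old routes"
    R2 = "old classifier + new routes"
    R3 = "new classifier + new routes"


def recommend(virtual_drift: bool, real_drift: bool, improvement: float,
              min_improvement: float = 0.05) -> Recommendation:
    """Map the detected drift type to a tuning recommendation.

    `improvement` is an assumed estimate of the relative gain from adapting;
    below `min_improvement` (a hypothetical cut-off) the drift is ignored.
    """
    if improvement < min_improvement or not (virtual_drift or real_drift):
        return Recommendation.IGNORE
    if virtual_drift and real_drift:
        return Recommendation.R3  # Case 3: hybrid concept drift
    if virtual_drift:
        return Recommendation.R1  # Case 1: virtual concept drift
    return Recommendation.R2      # Case 2: real concept drift


if __name__ == "__main__":
    print(recommend(virtual_drift=True, real_drift=False, improvement=0.3).value)
```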
Classifier modification.
[Figure: data enters the online classifier, which groups tuples into rusters for routes r1, r2, r3 and passes them to the Self-Routing Fabric; the fabric contains the op-modules (op_i, op_k, op_l) and an OI-array, and produces the query results. To adapt the QM, the current classifier is replaced by the new classifier.]
The beauty of the proposed design!
ST-QM was implemented inside CAPE, a Java-based continuous query engine. We compared the relative performance of adaptive QM against competitor systems: static (non-adaptive) QM; adaptive "plan-less" Eddies; and adaptive "plan-less" Eddies with a CBR (content-based routing) policy. Results can be found in EDBT’
ST-QM gave up to 44% improvement in execution time and output rate compared to non-adaptive QM, Eddies, and the single-plan execution approach. The runtime overhead of ST-QM relative to query execution is small (2% on average). The actuation cost of physical adaptivity is nearly negligible, amounting to 0.02% of the total execution cost. Even when no adaptivity is needed, ST-QM is at most 2-3% slower than static QM in the worst case.
Conclusion: Query Mesh is a practical query optimization approach. It eliminates the single-plan assumption; its feasibility has been shown; it has low overhead and high potential benefit; and it is easily implemented and integrated with existing systems. Query Mesh leads to novel solutions: the use of machine learning in query optimization and query processing, and the use of network-inspired techniques in query optimization and query processing.
Ongoing and future work: consider state caching and indexing in the QM stream context; work with alternative classification methods for route decisions; design customized query optimization and processing strategies; study multi-query processing and optimization; scale by applying distributed processing technologies; and determine whether QM principles also apply in a static database context.
Thank you to current and past DSRG members for stream engine development, feedback, collaboration, and much more.