Elke A. Rundensteiner (Worcester Polytechnic Institute), Elisa Bertino (Purdue University), Rimma V. Nehme (Microsoft).

Presentation transcript:

1 Elke A. Rundensteiner (Worcester Polytechnic Institute), Elisa Bertino (Purdue University), Rimma V. Nehme (Microsoft Jim Gray Systems Lab). Thanks go to NSF for partial support of this project.

2 A variety of modern applications face data with non-uniform characteristics: ubiquitous healthcare, location-based services, financial tickers, network monitoring…
[Diagram labels: Data Sources, Data, Database Engine, SELECT * FROM …, Query Optimizer, Overall Statistics, Plan Cost, Query Execution Plan, Query Executor, Query Results]
User: "I want my results quickly. I don’t care how exactly they are computed."
Typically ONE execution plan is used for ALL data.

3 [Diagram labels: Data Streams, Network packets, DSMS, SELECT * FROM …, Query Optimizer, Continuous Query, Execution Plan, Plan 1, Plan 2, Plan 3, Query Results]
Opportunity for improvement: it may be more efficient to use different plans for different subsets of data.
The example here uses streaming data; similar examples can be found with static data.

4
- Introduction & Motivation
- Background: Query Mesh
  - Model
  - Optimization
  - Execution
- Dynamic Re-Optimization with Query Mesh
  - Challenges
  - Architecture
  - Details
- Experimental Evaluation
- Ongoing and future work
- Conclusion

5 (Here, route = execution plan.) Query Mesh provides a middle ground between systems with a single pre-computed route and systems with multiple runtime routes.
- Single "route-oriented" solution: traditional query optimization. Coarse optimization, small overhead.
- Multi "route-less" solution: Eddies and its descendants. Fine-granularity optimization, significant overhead.
- Multi "route-oriented" solution: Query Mesh (classifier plus multiple routes). Fine-granularity optimization, less overhead.

6 Query Mesh: Lattice-Shaped Search Space
Search space: the set of all possible solutions.
The set of training tuples {1,2,3,4}* has cardinality n = 4.
Lattice of partitions:
  1/2/3/4   (each subset has an individual route)
  1/23/4   14/2/3   1/24/3   13/2/4   12/3/4   1/2/34
  14/23   1/234   124/3   13/24   123/4   134/2   12/34
  1234   (one plan for all data)
* We denote {{1},{2,3}} as "1/23" for brevity.
Search space complexity: the Bell number B_n = S(n, 1) + S(n, 2) + ... + S(n, n), where the Stirling number of the second kind S(n, k) is the number of ways to partition a set of cardinality n into exactly k nonempty subsets.
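The search-space size on this slide can be checked with a short sketch. The function names below (`stirling2`, `bell`, `partitions`) are illustrative and not from the slides; the recurrence S(n, k) = S(n-1, k-1) + k·S(n-1, k) is the standard one for Stirling numbers of the second kind.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def stirling2(n: int, k: int) -> int:
    """Ways to partition a set of n items into exactly k nonempty subsets."""
    if n == k:
        return 1  # also covers S(0, 0) = 1
    if k == 0 or k > n:
        return 0
    # The n-th item either forms its own subset or joins one of the k existing ones.
    return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)


def bell(n: int) -> int:
    """Bell number: total number of partitions of an n-element set."""
    return sum(stirling2(n, k) for k in range(n + 1))


def partitions(items):
    """Yield every partition of the list `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(part)):
            yield part[:i] + [[first] + part[i]] + part[i + 1:]
        # ... or give it a block of its own.
        yield [[first]] + part


print(bell(4))                              # 15 lattice nodes for n = 4
print(len(list(partitions([1, 2, 3, 4]))))  # 15, by explicit enumeration
print(bell(10))                             # 115975: the space grows very quickly
```

For n = 4 the counts per lattice level are S(4, 1) = 1, S(4, 2) = 7, S(4, 3) = 6 and S(4, 4) = 1, matching the rows of the lattice.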

7 Query Mesh Cost Model (main idea):
Cost(QM) = Cost of classifier + Cost of routes + Multi-route overhead
Query Mesh Search Algorithms
- Optimal Query Mesh Search (Opt-QM). Main idea: (1) form all possible sets for the given powerset; (2) form partitions out of the above sets. Too expensive! Need heuristics!
- Query Mesh Search Heuristics. Three components of search heuristics:
  (1) Start solution: 5 different approaches (extreme-1, extreme-N, random, content-driven, route-driven), experimentally evaluated.
  (2) Search strategy: randomized algorithms (iterative improvement, simulated annealing).
  (3) Stop condition: largely depends on the search strategy employed (K-iterations, plateau, time-bounded, resource-bounded).
[Figure legend: marked nodes = explored solutions]
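The three heuristic components can be illustrated with a minimal sketch of iterative improvement, using an extreme-1 start solution and a K-iterations stop condition. Everything here is an assumption for illustration: `toy_cost` is a stand-in that lumps the classifier cost and multi-route overhead into a per-route constant, not the cost model from the slides, and the neighbor move (relocating one tuple) is just one plausible way to step through the lattice.

```python
import random


def toy_cost(groups, values, per_route_overhead=1.0):
    """Toy stand-in for Cost(QM); NOT the cost model from the slides."""
    total = 0.0
    for members in groups:
        mean = sum(values[i] for i in members) / len(members)
        # Proxy for route cost: a route fits a group better when its tuples are alike.
        total += sum((values[i] - mean) ** 2 for i in members)
    # Classifier cost and multi-route overhead lumped into a per-route constant.
    return total + per_route_overhead * len(groups)


def random_neighbor(groups, rng):
    """Move one tuple to another existing group or to a brand-new group."""
    new = [list(g) for g in groups]
    src = rng.randrange(len(new))
    item = new[src].pop(rng.randrange(len(new[src])))
    targets = [j for j in range(len(new)) if j != src] + [len(new)]
    dst = rng.choice(targets)
    if dst == len(new):
        new.append([item])
    else:
        new[dst].append(item)
    return [g for g in new if g]  # drop a group that became empty


def iterative_improvement(values, k_iterations=2000, seed=0):
    rng = random.Random(seed)
    current = [list(range(len(values)))]  # extreme-1 start: one route for all tuples
    best = toy_cost(current, values)
    for _ in range(k_iterations):         # K-iterations stop condition
        candidate = random_neighbor(current, rng)
        cost = toy_cost(candidate, values)
        if cost < best:                    # accept only strict improvements
            current, best = candidate, cost
    return current, best


values = [1.0, 1.1, 0.9, 10.0, 10.2, 9.8]  # hypothetical per-tuple statistic
groups, cost = iterative_improvement(values)
print(groups, round(cost, 2))
```

Simulated annealing would differ only in the acceptance rule: it occasionally accepts a worse candidate, with a probability that shrinks over time, to escape local minima.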

8 [Diagram: a data stream (… t12, t11, t10, …, t1) feeds the Query Executor. The Query Optimizer takes a sample of tuples (the training dataset), computes routes (i.e., plans) r1 to r4 and induces a classifier, which together make up the Query Mesh; it then samples again, and so on. Legend: QM Optimizer, QM Executor.]
[NWRB09] R. Nehme, K. Works, E. Rundensteiner and E. Bertino, Query Mesh: Multi-Route Query Processing Technology (Demo). In VLDB 2009.

9 [Diagram: the QM Executor collects tuples t1 to t10 from the data stream (… t12, t11, t10, …, t1) in a classification window (tumbling window). After classification, the tuples are grouped by route (r1, r2, r3) and sent to the Self-Routing Fabric. Labels: r-tokens, data tuples, rusters. Legend: QM Optimizer, QM Executor.]
[NWRB09] R. Nehme, K. Works, E. Rundensteiner and E. Bertino, Query Mesh: Multi-Route Query Processing Technology (Demo). In VLDB 2009.
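A minimal sketch of the classification step, assuming a ruster is simply the group of tuples in one window that share a route label. The helper names (`tumbling_windows`, `form_rusters`) and the toy classifier are hypothetical; the real classifier is induced by the QM Optimizer from the training sample.

```python
from collections import defaultdict
from itertools import islice


def tumbling_windows(stream, size):
    """Cut a stream into consecutive, non-overlapping windows of `size` tuples."""
    it = iter(stream)
    while True:
        window = list(islice(it, size))
        if not window:
            return
        yield window


def form_rusters(window, classify):
    """Group the tuples of one window by the route label the classifier assigns."""
    rusters = defaultdict(list)
    for t in window:
        rusters[classify(t)].append(t)
    return dict(rusters)


def toy_classifier(t):
    # Hypothetical rule standing in for the induced classifier.
    return "r1" if t <= 4 else ("r2" if t <= 7 else "r3")


for window in tumbling_windows(range(1, 13), size=10):
    # Each (route label, tuples) pair would be handed to the self-routing fabric.
    print(form_rusters(window, toy_classifier))
```

Grouping per window means the route decision is made once per ruster rather than once per tuple, which is one plausible reading of why the slides batch tuples before routing.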

10 [Figure: snapshots at time T, T + 1, T + 2, and T + 3]

11 Can we have an execution strategy that
- is plan-based,
- supports different plans for distinct subsets of data, and
- is as adaptive "as Eddies"?
[NRB09] R. Nehme, E. Rundensteiner and E. Bertino, Self-Tuning Query Mesh for Adaptive Multi-Route Query Processing. In EDBT 2009.

12
- Introduction & Motivation
- Background: Query Mesh
  - Model
  - Optimization
  - Execution
- Dynamic Re-Optimization with Query Mesh
  - Challenges
  - Architecture
  - Details
- Conclusion
- Current and Future Work

13 From Query Mesh (classifier, multiple routes) to Self-Tuning Query Mesh:
1. What should be monitored to determine whether the current QM solution is no longer adequate? -> Data and statistics monitoring.
2. How to determine if the current QM solution should be adapted? -> Concept drift analysis, QM cost model, improvement measure.
3. How to efficiently execute the physical migration from the current QM to a new QM solution while the query is being executed? -> A single lightweight operation to physically adapt QM.
[NRB09] R. Nehme, E. Rundensteiner and E. Bertino, Self-Tuning Query Mesh for Adaptive Multi-Route Query Processing. In EDBT 2009.

14 [Diagram: the static QM framework (Query Optimizer, Query Executor) next to the adaptive QM framework (Query Optimizer, Query Executor, ST-QM).]
[NRB09] R. Nehme, E. Rundensteiner and E. Bertino, Self-Tuning Query Mesh for Adaptive Multi-Route Query Processing. In EDBT 2009.

15
- ST-QM Monitor continuously samples data and execution statistics that will be used to determine if a concept drift has occurred (i.e., QM needs to be adapted).
- ST-QM Analyzer determines if a concept drift has actually occurred and makes recommendations on whether and how the QM solution should be adapted.
- ST-QM Actuator takes these recommendations and physically adapts the QM solution.
[Diagram: Query Mesh -> (sampling) -> ST-QM Monitor -> (measurements) -> ST-QM Analyzer -> (recommendations) -> ST-QM Actuator -> (actuation) -> New Query Mesh]
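The monitor, analyzer and actuator form a feedback loop, which can be sketched as one function that runs a single cycle. This is only an illustration of the control flow: the function signature, the dictionary used as a "query mesh", and the toy components are all assumptions, not the ST-QM interfaces.

```python
def self_tuning_step(query_mesh, monitor, analyzer, actuator):
    """Run one monitor -> analyzer -> actuator cycle; return the (possibly new) mesh."""
    measurements = monitor()                             # sampling
    recommendation = analyzer(query_mesh, measurements)  # measurements -> recommendation
    if recommendation is None:                           # no drift worth acting on
        return query_mesh
    return actuator(query_mesh, recommendation)          # actuation


# Toy components, purely illustrative.
def toy_monitor():
    return {"avg_selectivity": 0.8}


def toy_analyzer(query_mesh, measurements):
    # Recommend new routes when the observed statistic strays from what the mesh was built for.
    drift = abs(measurements["avg_selectivity"] - query_mesh["built_for_selectivity"])
    return "new_routes" if drift > 0.2 else None


def toy_actuator(query_mesh, recommendation):
    updated = dict(query_mesh)
    if recommendation == "new_routes":
        updated["routes"] = "routes_v2"
        updated["built_for_selectivity"] = 0.8
    return updated


mesh = {"classifier": "c_v1", "routes": "routes_v1", "built_for_selectivity": 0.3}
mesh = self_tuning_step(mesh, toy_monitor, toy_analyzer, toy_actuator)
print(mesh["routes"])  # routes_v2
```

Keeping the three roles as separate callables mirrors the slide's separation of concerns: the analyzer can decide to do nothing, in which case the actuator is never invoked.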

16 ST-QM Analyzer: From Concept Drifts To Tuning Recommendations
In response to detected concept drifts, the ST-QM Analyzer may either
- ignore the concept drifts, or
- make one of the following tuning recommendations:
  - Case 1, virtual concept drift: recommendation R1, new classifier + old routes.
  - Case 2, real concept drift: recommendation R2, old classifier + new routes.
  - Case 3, hybrid concept drift: recommendation R3, new classifier + new routes.
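The analyzer's decision can be sketched as a small lookup, assuming the three drift cases pair with R1, R2 and R3 in the order the slide lists them. The `estimated_improvement` input and the `min_improvement` threshold are hypothetical stand-ins for the improvement measure mentioned elsewhere in the talk; the slides do not give concrete values.

```python
from enum import Enum


class Recommendation(Enum):
    IGNORE = "keep current query mesh"
    R1 = "new classifier + old routes"
    R2 = "old classifier + new routes"
    R3 = "new classifier + new routes"


def recommend(virtual_drift: bool, real_drift: bool,
              estimated_improvement: float,
              min_improvement: float = 0.05) -> Recommendation:
    """Map detected drift types to a tuning recommendation (illustrative only)."""
    if not (virtual_drift or real_drift):
        return Recommendation.IGNORE
    if estimated_improvement < min_improvement:
        return Recommendation.IGNORE  # drift detected, but adapting would not pay off
    if virtual_drift and real_drift:
        return Recommendation.R3      # hybrid concept drift
    return Recommendation.R1 if virtual_drift else Recommendation.R2


print(recommend(True, False, 0.20).value)  # new classifier + old routes
print(recommend(True, True, 0.01).value)   # keep current query mesh
```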

17 Classifier Modification
All possible recommendations:
- Case 1, virtual concept drift: recommendation R1, new classifier + old routes.
- Case 2, real concept drift: recommendation R2, old classifier + new routes.
- Case 3, hybrid concept drift: recommendation R3, new classifier + new routes.
[Diagram: Data -> Online Classifier (Current Classifier, New Classifier) -> rusters (r1, r2, r3) -> Self-Routing Fabric with op-modules (op i, op k, op l) and OI-array -> Query results]
The beauty of the proposed design!!!

18
- ST-QM was implemented inside a Java-based continuous query engine called CAPE.
- We compared the performance of adaptive QM against these competitor systems:
  - static (non-adaptive) QM,
  - adaptive "plan-less" Eddies,
  - adaptive "plan-less" Eddies with a CBR-based routing policy.
- Results can be found in EDBT’

19
- ST-QM gave up to 44% improvement in execution time and output rate compared to non-adaptive QM, Eddy, and the single-plan execution approach.
- The runtime overhead of ST-QM relative to query execution is small (on average 2%).
- The actuation cost of physical adaptivity is nearly negligible, at 0.02% of total execution cost.
- Even if no adaptivity is needed, ST-QM is at most 2-3% slower than static QM in the worst case.

20 Query Mesh is a practical query optimization approach:
- Eliminates the single-plan assumption
- Feasibility shown
- Has low overhead and high potential benefit
- Easily implemented and integrated with existing systems
Query Mesh leads to novel solutions:
- Use of machine learning in query optimization and query processing
- Use of network-inspired techniques in query optimization and query processing

21
- Consider state caching and indexing in the QM stream context
- Work with alternate classification methods for route decisions
- Design customized query optimization and processing strategies
- Study multi-query processing and optimization
- Scale by applying distributed processing technologies
- Do QM principles also apply in a static DB context?

22 Thank you to current and past DSRG members for stream engine development, feedback, collaboration, and much more.