NiagaraCQ: A Scalable Continuous Query System for Internet Databases

Outline
1. Problem
2. NiagaraCQ
3. Selection Placement Strategies
4. Dynamic Regrouping Algorithm

Problem
There is no scalable, efficient system that supports persistent (continuous) queries, which let users receive new results as they become available. Example: "Notify me whenever the price of Dell stock drops by more than 5% and the price of Intel stock remains unchanged over the next three months."

NiagaraCQ
- Supports continuous queries:
  - Change-based queries
  - Timer-based queries
- Scalability
- Performance
- Suited to the Internet
- User interface: a high-level query language

Command Language
Create a continuous query:
CREATE CQ_name
XML-QL query
DO action
{START start_time} {EVERY time_interval} {EXPIRE expiration_time}
Delete a continuous query:
DELETE CQ_name

Expression Signature
An expression signature represents queries that have the same syntactic structure but possibly different constant values. For example, these two queries differ only in the stock symbol:
Where <…>INTC</> element_as $g in "…" construct $g
Where <…>MSFT</> element_as $g in "…" construct $g

Expression Signature (2)
Both queries share the signature: Quotes.Quote.Symbol = constant, in quotes.xml.
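A minimal sketch of grouping queries by expression signature, assuming each simple selection query can be flattened into a (path, operator, constant, source) record. The `Selection` type and the `signature` and `group_by_signature` functions are hypothetical illustrations, not NiagaraCQ's internal representation.

```python
from collections import defaultdict
from typing import NamedTuple


class Selection(NamedTuple):
    """Hypothetical flattened form of a simple selection query."""
    path: str      # e.g. "Quotes.Quote.Symbol"
    op: str        # comparison operator, e.g. "="
    constant: str  # the query-specific value, e.g. "INTC"
    source: str    # data source, e.g. "quotes.xml"


def signature(sel: Selection) -> tuple:
    """Keep the syntactic structure and drop the constant value."""
    return (sel.path, sel.op, sel.source)


def group_by_signature(queries: dict) -> dict:
    """Map each expression signature to the ids of the queries that share it."""
    groups = defaultdict(list)
    for qid, sel in queries.items():
        groups[signature(sel)].append(qid)
    return dict(groups)


queries = {
    "I": Selection("Quotes.Quote.Symbol", "=", "INTC", "quotes.xml"),
    "J": Selection("Quotes.Quote.Symbol", "=", "MSFT", "quotes.xml"),
}
print(group_by_signature(queries))
# {('Quotes.Quote.Symbol', '=', 'quotes.xml'): ['I', 'J']}
```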

Query Plan
Query I: File Scan quotes.xml → Select Symbol = "INTC" → Trigger Action I
Query J: File Scan quotes.xml → Select Symbol = "MSFT" → Trigger Action J

Group Signature
The common expression signature of all queries in the group: Quotes.Quote.Symbol = constant, in quotes.xml.

Group Constant Table
Constant_value   Destination_buffer
…                …
INTC             Dest. I
MSFT             Dest. J
…                …

Group Plan
File Scan quotes.xml → Join with the Constant Table on Symbol = Constant_value → Split → Trigger Action I, Trigger Action J, …
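The group plan can be sketched as an equi-join of the scanned tuples with the constant table followed by a split into per-query destination buffers. This is a minimal in-memory illustration: a Python dict stands in for the constant table, `run_group_plan` is a hypothetical name, and the quote tuples are made-up sample data.

```python
from collections import defaultdict

# Group constant table: Constant_value -> Destination_buffer.
constant_table = {"INTC": "Dest. I", "MSFT": "Dest. J"}


def run_group_plan(quotes, constant_table):
    """Join quotes with the constant table on Symbol = Constant_value,
    then split the joined tuples into per-query destination buffers."""
    buffers = defaultdict(list)
    for quote in quotes:                               # file scan of quotes.xml
        dest = constant_table.get(quote["Symbol"])     # hash-join probe
        if dest is not None:
            buffers[dest].append(quote)                # split operator
    return dict(buffers)


quotes = [
    {"Symbol": "INTC", "Price": 30.1},
    {"Symbol": "IBM", "Price": 120.0},
    {"Symbol": "MSFT", "Price": 55.2},
]
print(run_group_plan(quotes, constant_table))
# {'Dest. I': [{'Symbol': 'INTC', 'Price': 30.1}], 'Dest. J': [{'Symbol': 'MSFT', 'Price': 55.2}]}
```

One scan and one join serve every query in the group; tuples whose symbol has no entry in the constant table (IBM here) are dropped before the split.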

Incremental Grouping Algorithm
1. The group optimizer traverses the query plan bottom-up.
2. It matches the query's expression signature against the signatures of existing groups.
Example of a new query plan: File Scan quotes.xml → Select Symbol = "AOL" → Trigger Action

Incremental Grouping Algorithm (2)
3. The group optimizer breaks the query plan into two parts: the lower part is removed, and the upper part is added onto the group plan.
4. It adds the constant ("AOL" in the example) to the constant table.
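Steps 2 to 4 can be sketched as follows, under the assumption that a simple selection query reduces to a signature (path, operator, source) plus one constant. `Group` and `add_query` are hypothetical names; the real optimizer works on query plans rather than flat records.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """Hypothetical group: one shared plan plus its constant table."""
    signature: tuple
    constant_table: dict = field(default_factory=dict)  # constant -> query ids


def add_query(groups, qid, path, op, constant, source):
    """Incrementally group a new selection query.

    If a group with the same signature exists, the query's lower plan is
    dropped and only its constant is added to the group's constant table;
    otherwise a new group is created for the signature.
    """
    sig = (path, op, source)
    group = groups.get(sig)
    if group is None:
        group = groups[sig] = Group(sig)
    group.constant_table.setdefault(constant, []).append(qid)
    return group


groups = {}
add_query(groups, "I", "Quotes.Quote.Symbol", "=", "INTC", "quotes.xml")
add_query(groups, "J", "Quotes.Quote.Symbol", "=", "MSFT", "quotes.xml")
g = add_query(groups, "K", "Quotes.Quote.Symbol", "=", "AOL", "quotes.xml")
print(len(groups), sorted(g.constant_table))  # 1 ['AOL', 'INTC', 'MSFT']
```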

Pipeline Approach
Tuples are pipelined from the output of one operator into the input of the next operator.
Disadvantages:
- It does not work for grouping timer-based queries.
- The split operator may become a bottleneck.
- Not all parts of the plan should be executed.

Intermediate Files

Intermediate Files (2)
Advantages:
- Intermediate files and data sources are monitored uniformly.
- Each query is scheduled independently.
- The potential bottleneck of the pipelined approach is avoided.
Disadvantages:
- Extra disk I/Os.
- The split operator becomes a blocking operator.

Virtual Intermediate Files
Where <Change_ratio>$c</> element_as $g in "quotes.xml", $c > 0.05 construct $g
Where <Change_ratio>$c</> element_as $g in "quotes.xml", $c > 0.15 construct $g
Both predicates share the signature Quotes.Quote.Change_Ratio > constant, in quotes.xml, and their result ranges overlap: every tuple with $c > 0.15 also satisfies $c > 0.05.

Virtual Intermediate Files (2)
- All outputs from the split operator are stored in one real intermediate file.
- This file has an index on the range attribute.
- Virtual intermediate files store a value range.
- Modification of a virtual intermediate file can trigger upper-level queries.
- The value range is used to retrieve data from the real intermediate file.
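A minimal in-memory sketch of the idea: one real intermediate file kept sorted on the range attribute (a sorted list stands in for the on-disk index), and virtual files that store only a value range. `RealIntermediateFile`, `VirtualIntermediateFile`, and `is_modified_by` are hypothetical names.

```python
import bisect


class RealIntermediateFile:
    """All split-operator outputs in one file, ordered on the range attribute."""

    def __init__(self):
        self.keys = []    # sorted values of the range attribute
        self.tuples = []  # tuples, in the same order as self.keys

    def insert(self, key, tup):
        i = bisect.bisect_right(self.keys, key)
        self.keys.insert(i, key)
        self.tuples.insert(i, tup)

    def scan_range(self, low, high=float("inf")):
        """Return tuples whose key lies in the interval (low, high]."""
        lo = bisect.bisect_right(self.keys, low)
        hi = bisect.bisect_right(self.keys, high)
        return self.tuples[lo:hi]


class VirtualIntermediateFile:
    """Stores only a value range; data is fetched from the real file."""

    def __init__(self, real, low, high=float("inf")):
        self.real, self.low, self.high = real, low, high

    def is_modified_by(self, key):
        # An insert in this range would trigger the upper-level queries.
        return self.low < key <= self.high

    def read(self):
        return self.real.scan_range(self.low, self.high)


real = RealIntermediateFile()
for ratio, symbol in [(0.07, "A"), (0.20, "B"), (0.02, "C")]:
    real.insert(ratio, symbol)
v05 = VirtualIntermediateFile(real, 0.05)  # $c > 0.05
v15 = VirtualIntermediateFile(real, 0.15)  # $c > 0.15
print(v05.read(), v15.read())  # ['A', 'B'] ['B']
```

The overlapping ranges share the tuple "B", which is stored only once in the real file.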

Event Detection
Types of events:
- Data-source change
- Timer
Types of data sources:
- Push-based
- Pull-based

Timer-based Events
- Timer events are stored in an event list, sorted in time order. Each entry stores query IDs.
- A query is fired only if its data source has been modified since its last firing time.
- After a timer event, the next firing times are calculated and the queries are added to the corresponding entries.
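The event list can be sketched with a min-heap in place of the sorted list. This is an assumption-laden illustration: `TimerEventList` and `on_timer` are hypothetical names, and queries are assumed to be registered at time 0 so that a query's last firing time defaults to 0.

```python
import heapq


class TimerEventList:
    """Timer events kept in time order; a min-heap stands in for the sorted list."""

    def __init__(self):
        self.heap = []        # (fire_time, [query ids]) entries
        self.last_fired = {}  # query id -> last firing time (defaults to 0)

    def schedule(self, fire_time, query_ids):
        heapq.heappush(self.heap, (fire_time, sorted(query_ids)))

    def on_timer(self, now, interval, source_of, last_modified):
        """Process every due event and return the ids of queries that fire.

        interval: query id -> firing interval
        source_of: query id -> data source name
        last_modified: data source name -> last modification time
        """
        fired, next_entries = [], {}
        while self.heap and self.heap[0][0] <= now:
            fire_time, qids = heapq.heappop(self.heap)
            for q in qids:
                # Fire only if the data source changed since the last firing.
                if last_modified[source_of[q]] > self.last_fired.get(q, 0):
                    fired.append(q)
                    self.last_fired[q] = fire_time
                # Compute the next firing time; queries sharing it share an entry.
                next_entries.setdefault(fire_time + interval[q], []).append(q)
        for t, qs in next_entries.items():
            self.schedule(t, qs)
        return fired


events = TimerEventList()
events.schedule(10, ["q1", "q2"])
interval = {"q1": 10, "q2": 20}
source_of = {"q1": "quotes.xml", "q2": "profiles.xml"}
print(events.on_timer(10, interval, source_of, {"quotes.xml": 5, "profiles.xml": 0}))  # ['q1']
```

Here q2 is skipped because profiles.xml has not changed since time 0; both queries are still rescheduled, at times 20 and 30.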

Incremental Evaluation
- Queries are invoked only on changed data.
- For each file, NiagaraCQ keeps a "delta file".
- Queries are run over delta files.
- Incremental evaluation of join operators requires the complete data files.
- A timestamp is added to each tuple in order to support timer-based queries.
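A minimal sketch of evaluation over a delta file, assuming each tuple carries the timestamp added on arrival. It shows why a selection needs only the delta while a join must still read the complete other file. `Row`, `changes_since`, `incremental_select`, and `incremental_join` are hypothetical names, and only changes on the left join input are handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    """Hypothetical quote tuple; ts is the timestamp added on arrival."""
    symbol: str
    price: float
    ts: int


def changes_since(delta_file, last_fire, now):
    """Tuples that arrived after the last firing, up to the current firing."""
    return [r for r in delta_file if last_fire < r.ts <= now]


def incremental_select(delta_file, last_fire, now, predicate):
    """A selection only needs the delta file."""
    return [r for r in changes_since(delta_file, last_fire, now) if predicate(r)]


def incremental_join(delta_file, full_other, last_fire, now, left_key, right_key):
    """New tuples must be joined against the complete other file.

    Only changes on the left input are handled; changes on the right input
    would need the symmetric term as well.
    """
    index = {}
    for other in full_other:
        index.setdefault(right_key(other), []).append(other)
    return [(r, o) for r in changes_since(delta_file, last_fire, now)
            for o in index.get(left_key(r), [])]


delta = [Row("INTC", 30.0, 1), Row("MSFT", 55.0, 3), Row("INTC", 28.0, 5)]
print(incremental_select(delta, 2, 5, lambda r: r.symbol == "INTC"))
# [Row(symbol='INTC', price=28.0, ts=5)]
```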

Memory Caching
- Query plans: an LRU policy that favors frequently fired queries.
- Data files: favors the delta files.
- Event list: only a "time window" of the list is kept in memory.

System Architecture

Continuous Query Processing
Components: Continuous Query Manager (CQM), Event Detector (ED), Data Manager (DM), Query Engine (QE).
1. CQM adds continuous queries, with their file and timer information, so that ED can monitor the events.
2. ED asks DM to monitor changes to files.
3. When a timer event happens, ED asks DM for the last-modified time of the files.
4. DM informs ED of changes to push-based data sources.
5. If the file-change and timer-event conditions are satisfied, ED provides CQM with a list of firing CQs.
6. CQM invokes QE to execute the firing CQs.
7. The file scan operator calls DM to retrieve the selected documents.
8. DM returns only the changes between the last fire time and the current fire time.

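Selection Placement Strategies
Two queries join quotes.xml and profiles.xml on the stock symbol $s and differ only in the constant of the price predicate on $p:
Where … $s … $p … element_as $g in "quotes.xml", $p > 90, … $s … element_as $t in "profiles.xml" construct $g, $t
Where … $s … $p … element_as $g in "quotes.xml", $p > 100, … $s … element_as $t in "profiles.xml" construct $g, $t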

Expression Signatures
- Selection: Quotes.Quote.Price > constant, in quotes.xml
- Join: Symbol = Symbol, between quotes.xml and profiles.xml

Where to place the selection operator?
- Below the join (PushDown): $(\sigma_1 R \bowtie S) \cup (\sigma_2 R \bowtie S) \cup \dots \cup (\sigma_n R \bowtie S)$
- Above the join (PullUp): $\sigma_1(R \bowtie S) \cup \sigma_2(R \bowtie S) \cup \dots \cup \sigma_n(R \bowtie S)$
PullUp achieves an average 10-fold performance improvement over PushDown.
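Both placements return the same answer for each query because a selection on an attribute of R commutes with the join; they differ in how many joins are evaluated. The toy sketch below, with hypothetical functions `join`, `push_down`, and `pull_up` over made-up sample tuples, checks the equivalence: PushDown runs one join per query, while PullUp runs one shared join. It does not attempt to reproduce the reported performance numbers.

```python
def join(quotes, profiles):
    """Equi-join (symbol, price) quotes with (symbol, info) profiles on symbol."""
    info_by_symbol = {}
    for symbol, info in profiles:
        info_by_symbol.setdefault(symbol, []).append(info)
    return [(s, p, i) for s, p in quotes for i in info_by_symbol.get(s, [])]


def push_down(quotes, profiles, thresholds):
    """sigma_i R join S: one selection and one join per query."""
    return {c: set(join([q for q in quotes if q[1] > c], profiles))
            for c in thresholds}


def pull_up(quotes, profiles, thresholds):
    """sigma_i (R join S): one shared join, selections applied above it."""
    joined = join(quotes, profiles)
    return {c: {t for t in joined if t[1] > c} for c in thresholds}


quotes = [("INTC", 95), ("MSFT", 105), ("IBM", 80)]
profiles = [("INTC", "Intel"), ("MSFT", "Microsoft"), ("IBM", "IBM Corp")]
print(push_down(quotes, profiles, [90, 100]) == pull_up(quotes, profiles, [90, 100]))  # True
```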

PushDown Query Plan
quotes.xml → Select Price > 90 → Join with profiles.xml

PushDown Group Plans

PullUp Group Plans

PullUp vs. PushDown
Advantages of PullUp:
- Only one join group and one selection group.
- Maintains a single intermediate file.
Disadvantages of PullUp:
- Irrelevant tuples are joined.
- Very large intermediate file.
- Changes in profiles.xml affect the intermediate file (file_k), which adds maintenance overhead.

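Filtered PullUp
Grouped join plan: a selection Price > 90 is applied to quotes.xml below the grouped join with profiles.xml. Here 90 is the smallest price constant among the grouped queries, so no tuple relevant to any query is filtered out.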

Filtered PullUp vs. PullUp
Advantages:
- Only relevant tuples are joined.
- The intermediate file is smaller.
- Reduces the cost of PullUp by 75%.
Disadvantage:
- Complexity: the selection predicate may need to be modified dynamically (e.g., when a query with price > 70 is added).
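A minimal sketch of the Filtered PullUp idea, assuming the filter below the join uses the smallest price constant in the group; adding a query with price > 70 lowers it, which is the dynamic modification noted above. `FilteredPullUpGroup` is a hypothetical class, one profile per symbol is assumed, and the tuples are made-up sample data.

```python
class FilteredPullUpGroup:
    """Hypothetical grouped join with a filter placed below it.

    The filter keeps only quotes that satisfy at least one grouped query,
    i.e. price > min(thresholds); each query's own selection stays above
    the join. Assumes one profile per symbol.
    """

    def __init__(self):
        self.thresholds = {}  # query id -> price constant

    def add_query(self, qid, threshold):
        # A smaller constant (e.g. price > 70) lowers the filter.
        self.thresholds[qid] = threshold

    def filter_constant(self):
        return min(self.thresholds.values())

    def evaluate(self, quotes, profiles):
        info = dict(profiles)                       # symbol -> profile info
        low = self.filter_constant()
        joined = [(s, p, info[s]) for s, p in quotes
                  if p > low and s in info]         # filter, then join
        return {q: [t for t in joined if t[1] > c]  # grouped selections
                for q, c in self.thresholds.items()}


g = FilteredPullUpGroup()
g.add_query("Q1", 90)
g.add_query("Q2", 100)
quotes = [("INTC", 95), ("MSFT", 105), ("IBM", 80)]
profiles = [("INTC", "Intel"), ("MSFT", "Microsoft"), ("IBM", "IBM Corp")]
print(g.filter_constant())  # 90
```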

Dynamic Re-grouping
Let Q1 ($A \bowtie B \bowtie C$) and Q2 ($B \bowtie C$) be two continuous queries submitted in sequence. For Q1, the incremental grouping algorithm chooses the plan $((A \bowtie B) \bowtie C)$. Neither of the resulting groups, $A \bowtie B$ or $(A \bowtie B) \bowtie C$, can be used for Q2.

Dynamic Re-grouping (2)
- Existing queries are not regrouped to exploit new grouping opportunities introduced by subsequent queries.
- Overall performance degrades as queries are continuously added and removed.
- A naive regrouping algorithm periodically performs a global query optimization, which is expensive and redoes work already done by the incremental optimizer.

Data Structures
A query graph: a directed acyclic graph in which each node represents an existing join expression in the group plan.
```cpp
#include <list>

typedef unsigned long SIG_TYPE;  // assumption: the signature type is not specified

struct Node;
struct Child { Node* node; float weight; };

struct Node {
    char* query;                // ASCII query plan
    SIG_TYPE sig;               // signature of the query string
    int final_node_count;       // number of users that require this query
                                // 0: non-final node; >0: final node
    std::list<Child> children;  // children of this node
    std::list<Node*> parents;   // parents of this node
    float updateFreq;           // update frequency of this node
    float cost;                 // the cost of computing this node
    // The following fields are used only for dynamic regrouping
    int reference_count;        // reference count
    bool visited;               // whether purgeSiblings has been performed on this node
};
```

Data Structures (2)
A group table: an array of hash tables. The i-th hash table holds the queries of length i (number of joins). Each hash-table entry maps a query string to the corresponding node in the graph.

Data Structures (3)
A query log: an array of vectors that stores the new nodes added since the last regrouping. It is cleared after regrouping.
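The group table and query log can be mirrored in Python as below. This is a sketch under a simplifying assumption: a query is identified by the set of relations it joins (ignoring predicates and join order), so a frozenset replaces the ASCII query string as the hash key. `Node` keeps only a subset of the fields above, and `GroupGraph`, `add_node`, and `clear_log` are hypothetical names; the 10-join limit comes from the cost analysis slides.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """Subset of the Node fields; a frozenset of relations stands in for the query string."""
    query: frozenset
    cost: float = 0.0
    final_node_count: int = 0   # 0: non-final node, >0: final node
    children: list = field(default_factory=list)
    parents: list = field(default_factory=list)

    @property
    def joins(self):
        return len(self.query) - 1


class GroupGraph:
    def __init__(self, max_joins=10):
        # group_table[i]: hash table for queries with i joins, query -> Node
        self.group_table = [dict() for _ in range(max_joins + 1)]
        # query_log[i]: nodes with i joins added since the last regrouping
        self.query_log = [[] for _ in range(max_joins + 1)]

    def lookup(self, query):
        return self.group_table[len(query) - 1].get(frozenset(query))

    def add_node(self, query, cost=0.0):
        node = Node(frozenset(query), cost)
        self.group_table[node.joins][node.query] = node
        self.query_log[node.joins].append(node)
        return node

    def clear_log(self):
        for level in self.query_log:
            level.clear()


g = GroupGraph()
ab = g.add_node({"A", "B"})
print(g.lookup({"B", "A"}) is ab, ab.joins)  # True 1
```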

Incremental Grouping Algorithm
Top-down local exhaustive search:
- If the query already exists, increase its final node count by 1.
- Otherwise:
  - Enumerate all possible sub-queries in a top-down manner and probe the group table to check whether a sub-query node exists.
  - Compute the minimal cost of using existing sub-query nodes.
  - Compute the minimal cost without using existing sub-query nodes.
  - Choose the least costly plan.
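A toy version of this search, under loudly stated assumptions: a query is a set of relation names, only left-deep plans are enumerated, and `join_cost` and `reuse_cost` are hypothetical cost functions supplied by the caller (the real optimizer's cost model is not described here). `existing` maps every node in the group table to its final node count; `best_plan` and `add_query` are hypothetical names.

```python
from functools import lru_cache


def best_plan(query, existing, join_cost, reuse_cost, use_existing):
    """Top-down search over left-deep plans for `query` (a frozenset of relations).

    Each step drops one relation, plans the remaining sub-query recursively,
    and joins the relation back on. With use_existing=True, sub-queries found
    in the group table may be reused at reuse_cost instead of being recomputed.
    Returns (cost, plan) with plan a nested tuple.
    """
    @lru_cache(maxsize=None)
    def solve(q):
        if len(q) == 1:
            (relation,) = q
            return 0.0, relation
        options = []
        if use_existing and q in existing:         # probe the group table
            options.append((reuse_cost(q), ("reuse", *sorted(q))))
        for relation in sorted(q):                 # enumerate sub-queries
            sub_cost, sub_plan = solve(q - {relation})
            options.append((sub_cost + join_cost(q), (sub_plan, relation)))
        return min(options, key=lambda option: option[0])

    return solve(frozenset(query))


def add_query(query, existing, join_cost, reuse_cost=lambda q: 0.0):
    """existing: frozenset -> final_node_count for every node in the group table."""
    q = frozenset(query)
    if q in existing:
        existing[q] += 1                           # query already exists
        return None
    with_reuse = best_plan(q, existing, join_cost, reuse_cost, True)
    without_reuse = best_plan(q, existing, join_cost, reuse_cost, False)
    return min(with_reuse, without_reuse, key=lambda option: option[0])


existing = {frozenset("AB"): 0, frozenset("ABC"): 1}
cost = lambda q: 10.0 * len(q)  # hypothetical: cost grows with the number of relations
print(add_query("ABD", existing, cost))  # (30.0, (('reuse', 'A', 'B'), 'D'))
```

With reuse treated as free in this toy model, the plan that reads the existing $A \bowtie B$ node and joins D wins over recomputing every join.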

Dynamic Regrouping Algorithm
- Phase 1: construct links among existing nodes and new nodes.
- Phase 2: find a minimal-weight solution from the current solution by removing redundant nodes.

Phase 1: Constructing links among existing nodes and new nodes
- Main idea: for any pair of nodes in the graph, if one node is a sub-query of the other, create a link between them if one does not already exist.
- Relationships are evaluated only between existing nodes and nodes added since the last regrouping.
- The difference in levels between a parent and a child is always 1.

Phase 1 – Algorithm
```
for each level i, bottom-up:
    for each node in the level-i query log:
        if node has parents in the level-(i+1) group table:
            connect node to those parents
        if node has children in the level-(i-1) group table:
            connect node to those children
```
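Phase 1 can be sketched in a few lines, assuming nodes are frozensets of relation names, so "sub-query" means a proper subset and adjacent levels differ by exactly one join. `link_new_nodes` is a hypothetical name; it returns the new (child, parent) links rather than mutating a graph.

```python
def link_new_nodes(group_table, query_log):
    """Phase 1 sketch: connect each new node to its parents one level up and
    its children one level down.

    group_table[i] and query_log[i] hold the nodes with i joins; nodes are
    frozensets of relation names. Returns a set of (child, parent) links.
    """
    links = set()
    for i, new_nodes in enumerate(query_log):       # bottom-up over levels
        for node in new_nodes:
            if i + 1 < len(group_table):
                for parent in group_table[i + 1]:
                    if node < parent:               # node is a sub-query of parent
                        links.add((node, parent))
            if i >= 1:
                for child in group_table[i - 1]:
                    if child < node:                # child is a sub-query of node
                        links.add((child, node))
    return links


AB, BC, ABC = frozenset("AB"), frozenset("BC"), frozenset("ABC")
group_table = [set(), {AB, BC}, {ABC}]
query_log = [[], [BC], []]  # BC was added since the last regrouping
print(link_new_nodes(group_table, query_log) == {(BC, ABC)})  # True
```

Only the new node BC is compared against its neighbouring levels; the existing AB to ABC link is assumed to be present already.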

Phase 2: A greedy algorithm for level-wise graph minimization
- Main idea: traverse the query graph level by level and attempt to remove redundant nodes one level at a time, starting from the second level from the top.
- A subset of the level-i nodes is retained such that:
  - every node at level i+1 has at least one child in this subset, and
  - the subset has minimum total cost.
- Nodes that are not selected are removed permanently.

Phase 2 – Algorithm
```
MinimizeGraph() {
    // L ranges from (maximum number of joins - 1) down to 1
    for each level L in group_table {
        clear final_set and remain_set
        for each node N in the level-L group table
            InitializeSet(N)
        for each node N in final_set
            PurgeSiblings(N)
        while (remain_set is not empty) {
            Current_minimum = infinity
            M = null
            for each node R in remain_set {
                if (R.reference_count == 0) {
                    remove R from remain_set
                    deleteNode(R)
                } else if (R.cost / R.reference_count < Current_minimum) {
                    M = R
                    Current_minimum = R.cost / R.reference_count
                }
            }
            if (M != null) {
                remove M from remain_set
                PurgeSiblings(M)
            }
        }
    }
}

InitializeSet(Node N) {
    if N is a final node
        add N into final_set
    else {
        add N into remain_set
        N.reference_count = number of parents of N
    }
    N.visited = false
}

PurgeSiblings(Node N) {
    for each parent P of N {
        if (!P.visited) {
            decrease the reference count of N's siblings under parent P by 1
            P.visited = true
        }
    }
}
```
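One level of MinimizeGraph can be sketched in Python as follows. This is a sketch, not the authors' code: instead of decrementing sibling counters, it computes each remaining node's reference count directly as the number of its parents not yet visited by PurgeSiblings, which yields the same counts. `minimize_level` is a hypothetical name, and node ids are plain strings.

```python
def minimize_level(level_nodes, parents_of, cost, final):
    """Greedy minimization of one level of the query graph (sketch).

    Keeps a subset of level_nodes so that every parent one level up still has
    at least one kept child, preferring nodes with a low cost per parent they
    would newly cover. Returns (kept, removed).

    level_nodes: node ids at this level, in scan order
    parents_of:  node id -> set of parent ids one level up
    cost:        node id -> cost of computing the node
    final:       ids of final nodes, which are always kept
    """
    kept = [n for n in level_nodes if n in final]           # InitializeSet
    remain = [n for n in level_nodes if n not in final]
    covered = set()  # parents already visited by PurgeSiblings

    def purge_siblings(n):
        # Once n is kept, each of its parents has a child, so n's siblings
        # under those parents lose one reference each.
        covered.update(parents_of[n])

    for n in kept:
        purge_siblings(n)

    removed = []
    while remain:
        best, best_ratio = None, float("inf")
        for r in list(remain):
            reference_count = len(parents_of[r] - covered)  # uncovered parents
            if reference_count == 0:                        # redundant node
                remain.remove(r)
                removed.append(r)
            elif cost[r] / reference_count < best_ratio:
                best, best_ratio = r, cost[r] / reference_count
        if best is not None:
            remain.remove(best)
            kept.append(best)
            purge_siblings(best)
    return kept, removed


parents_of = {"AB": {"ABC", "ABD"}, "BC": {"ABC"}, "AC": {"ABC"}}
cost = {"AB": 5, "BC": 2, "AC": 3}
print(minimize_level(["AB", "BC", "AC"], parents_of, cost, set()))
# (['BC', 'AB'], ['AC'])
```

BC is kept first (cost 2 for one uncovered parent), AB is then the only node that can cover ABD, and AC becomes redundant.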

Cost Analysis
- $N$ = number of queries.
- The number of nodes is proportional to the number of queries: $C N$.
- Each query contains no more than 10 joins, so each level contains about $C N / 10$ nodes.

Cost Analysis – Phase 1
- $R$ or $K R$: the regrouping frequency (one regrouping every $R$, or $K R$, new queries).
- With frequency $R$, there are $N/R$ regroupings.
- $C R$: the number of new nodes that must be linked with existing nodes at each regrouping.
- $m C R$: the number of nodes after $m-1$ regroupings.
- $m (C R)^2$: the number of comparisons for the $m$-th regrouping (ignoring a constant-factor reduction).

Cost Analysis – Phase 1 (2)
Total number of comparisons with frequency $R$:
$(CR)^2 + 2(CR)^2 + \dots + \frac{N}{R}(CR)^2 = \frac{N(N+R)C^2}{2} = O(N^2)$
Total number of comparisons with frequency $KR$:
$(CKR)^2 + \dots + \frac{N}{KR}(CKR)^2 = \frac{N(N+KR)C^2}{2}$
The ratio:
$\frac{N(N+KR)C^2/2}{N(N+R)C^2/2} = \frac{N+KR}{N+R}$

Cost Analysis – Phase 2
Worst case: each pass removes one node.
Cost for one level: $\frac{CN}{10} + \left(\frac{CN}{10} - 1\right) + \dots + 1 = \frac{CN(CN+10)}{200} = O(N^2)$
Purging siblings: $\frac{CN}{10} \cdot \frac{CN}{10} = \frac{(CN)^2}{100} = O(N^2)$
All 9 levels: $O(N^2)$
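The closed forms above can be checked numerically by summing the series directly. The sample values of $N$, $R$, $C$, and $K$ are arbitrary, and the sketch assumes $N$ is divisible by $R$ (and $K R$) and $C N$ by 10 so that the sums have whole numbers of terms. The function names are hypothetical.

```python
from fractions import Fraction


def phase1_comparisons(N, R, C):
    """Sum of m * (C*R)^2 over the N/R regroupings."""
    return sum(m * (C * R) ** 2 for m in range(1, N // R + 1))


def phase1_closed_form(N, R, C):
    return Fraction(N * (N + R) * C ** 2, 2)


def phase2_level_cost(N, C):
    """Worst case: passes over CN/10, CN/10 - 1, ..., 1 remaining nodes."""
    n = C * N // 10
    return sum(range(1, n + 1))


def phase2_closed_form(N, C):
    return Fraction(C * N * (C * N + 10), 200)


print(phase1_comparisons(100, 10, 2), phase1_closed_form(100, 10, 2))  # 22000 22000
print(phase2_level_cost(100, 2), phase2_closed_form(100, 2))           # 210 210
```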

References
- NiagaraCQ: A Scalable Continuous Query System for Internet Databases.
- Design and Evaluation of Alternative Selection Placement Strategies in Optimizing Continuous Queries.
- Dynamic Re-grouping of Continuous Queries.