NiagaraCQ : A Scalable Continuous Query System for Internet Databases (modified slides available on course webpage) Jianjun Chen et al Computer Sciences.

Presentation transcript:

NiagaraCQ: A Scalable Continuous Query System for Internet Databases
(modified slides available on course webpage)
Jianjun Chen et al., Computer Sciences Dept., University of Wisconsin-Madison
SIGMOD 2000
Talk by Naresh Kumar

Outline
- Motivation
- What is NiagaraCQ?
- General strategy of incremental group optimization
- Query split scheme with materialized intermediate files
- Incremental grouping of selection and join operators
- Experimental details

Data Stream Management System (DSMS)
The two fundamental differences between a DBMS and a DSMS:
- In addition to managing traditional stored data, a DSMS must handle multiple continuous, unbounded, possibly rapid data streams.
- A DSMS supports long-running continuous queries.

Continuous Queries (CQ)
Continuous queries are persistent queries that allow users to receive new results as they become available.
Example: "Notify me whenever the price of Dell stock drops by more than 5%."
A broad classification:
- Change-based
- Timer-based

Motivation
- Continuous queries are increasingly popular.
- They notify the user of new results without the user having to query repeatedly.
- They are especially useful on the Internet, where information changes frequently.
Challenge: the system must support millions of queries because of the scale of the Internet.
Solution: NiagaraCQ

Previous Attempts
Previous group optimization efforts focused on finding an optimal plan for a small number of similar queries. They are not applicable to a continuous query system for the following reasons:
- They are computationally too expensive to handle a large number of queries.
- They were not designed for an environment like the web, where CQs are added or removed dynamically.

What is NiagaraCQ?
NiagaraCQ is a CQ system for the Internet.
It is built on the assumptions that:
- Many queries tend to be similar to one another.
- Similar queries can be grouped together.
It supports scalable continuous query processing over multiple, distributed XML files.

NiagaraCQ - Approaches
Advantages of grouping:
- Grouped queries can share computation.
- Grouped queries can reside in memory, saving I/O cost.
- Unnecessary invocations are avoided by testing many CQs together.
- Change-based and timer-based queries are handled in a uniform way.
To ensure scalability:
- Incremental evaluation of CQs
- Memory caching

NiagaraCQ command language
Creating a CQ:
  Create CQ_name
    XML-QL query
    Do action
    {START start_time} {EVERY time_interval} {EXPIRE expiration_time}
  Delete CQ_name

Expression Signature
Query examples:
  Where INTC element_as $g in “ construct $g
  Where MSFT element_as $g in “ construct $g
Expression signature: Quotes.Quote.Symbol = constant, in quotes.xml

Query plans
[Figure: two query plans over quotes.xml. Plan 1: File Scan, then Select Symbol = “INTC”, then Trigger Action I. Plan 2: File Scan, then Select Symbol = “MSFT”, then Trigger Action J.]

Group
A group consists of a signature, a constant table, and a plan.
- Group signature: the common signature of all queries in the group.
- Group constant table:
    Dest_buffer   Constant_value
    Dest. J       MSFT
    Dest. I       INTC
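As a rough illustration of how a group plan might use its constant table, the sketch below scans the source once and routes each tuple to the destination buffer registered for its constant. This is a minimal Python sketch under stated assumptions, not NiagaraCQ's implementation: the function name `run_group_plan`, the dictionary-based tuples, and the `Price` field are all hypothetical.

```python
from collections import defaultdict

# Hypothetical constant table for the signature
# "Quotes.Quote.Symbol = constant in quotes.xml".
# Each constant maps to the destination buffers of the queries that use it
# (buffer names follow the slide: Dest. I, Dest. J).
constant_table = {"INTC": ["Dest. I"], "MSFT": ["Dest. J"]}


def run_group_plan(quotes, constant_table):
    """Scan the source once and route each matching tuple to its buffers."""
    buffers = defaultdict(list)
    for quote in quotes:
        # One lookup replaces evaluating every query's selection separately.
        for dest in constant_table.get(quote["Symbol"], []):
            buffers[dest].append(quote)
    return dict(buffers)


# Illustrative data only; the values are made up.
quotes = [
    {"Symbol": "INTC", "Price": 98.0},
    {"Symbol": "AOL", "Price": 55.0},
    {"Symbol": "MSFT", "Price": 71.5},
]
buffers = run_group_plan(quotes, constant_table)
```

The point of the sketch is that the cost of the scan and the selection is paid once for the whole group, regardless of how many queries share the signature.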

The group plan

Incremental Grouping Algorithm
When a new query is submitted:
- If the expression signature of the new query matches that of an existing group:
  - Break the query plan into two parts.
  - Remove the lower part.
  - Add the upper part onto the group plan.
- Else, create a new group.
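The matching step can be sketched as a lookup keyed by expression signature. The following Python sketch is a simplification under stated assumptions: it handles only the bottom-most operator of a query, represents a signature as a plain tuple, and uses hypothetical names (`Group`, `GroupOptimizer`, `add_query`) that do not come from the slides.

```python
class Group:
    """A group: a shared signature plus its constant table."""

    def __init__(self, signature):
        self.signature = signature
        self.constant_table = {}  # constant -> list of destination buffers


class GroupOptimizer:
    """Hypothetical registry of groups, keyed by expression signature."""

    def __init__(self):
        self.groups = {}

    def add_query(self, signature, constant, dest_buffer):
        """Install one query; return True if a new group had to be created."""
        group = self.groups.get(signature)
        created = group is None
        if created:
            # No existing group matches: create a new one.
            group = Group(signature)
            self.groups[signature] = group
        # The lower part of the plan is replaced by the shared group plan;
        # only the constant and the destination buffer are recorded.
        group.constant_table.setdefault(constant, []).append(dest_buffer)
        return created


optimizer = GroupOptimizer()
symbol_sig = ("quotes.xml", "Quotes.Quote.Symbol", "=")
first = optimizer.add_query(symbol_sig, "INTC", "Dest. I")
second = optimizer.add_query(symbol_sig, "MSFT", "Dest. J")
```

Because each new query is matched against existing groups only, the existing groups are never re-optimized, which is what keeps installation cheap as queries are added and removed.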

New Query
- AOL is added to the constant table.
- A new destination buffer is allocated.
- The matching process continues until the top of the plan is reached.

Query split scheme
- The matching process continues on the remainder of the query plan until the top of the plan is reached.
- Thus each continuous query is split into several smaller queries, such that the inputs of each smaller query are monitored using the same techniques that are used for the inputs of user-defined continuous queries.
- Incremental group optimization is very efficient because it requires only one traversal of the query plan.

Query split with materialized intermediate files
The destination buffer for the split operator can be implemented with either:
- a pipelined scheme, or
- an intermediate file scheme.
Disadvantages of the pipelined scheme:
- It does not work for grouping timer-based queries.
- It produces a single, complicated execution plan.
- The combined plan may be very large and require resources beyond the limits of the system.
- A large portion of the query plan may not need to be executed at each invocation.
- The split operator may block simple queries.

Query split with materialized intermediate files (cont.)
Using intermediate files:
- The split operator writes each output stream to a file.
- The query plan is cut into two parts at the split operator.
- A file scan operator is added to the upper part to read the intermediate file.
- Intermediate files are monitored just like other data sources.
- Intermediate file names are stored in the constant table.
- Grouped CQs with the same constant share the same intermediate file.
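The intermediate file scheme can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the JSON-lines file format, the file naming, and the function names `split_to_files` and `file_scan` are hypothetical and say nothing about how NiagaraCQ stores its files.

```python
import json
import os
import tempfile


def split_to_files(tuples, key, constants, out_dir):
    """Write each output stream of the split operator to its own file.

    Returns a dict mapping constant -> intermediate file path, playing the
    role of the constant table entries that hold intermediate file names.
    """
    paths = {c: os.path.join(out_dir, f"intermediate_{i}.jsonl")
             for i, c in enumerate(sorted(constants))}
    handles = {c: open(p, "w", encoding="utf-8") for c, p in paths.items()}
    try:
        for t in tuples:
            handle = handles.get(t[key])
            if handle is not None:  # tuples matching no constant are dropped
                handle.write(json.dumps(t) + "\n")
    finally:
        for handle in handles.values():
            handle.close()
    return paths


def file_scan(path):
    """File scan operator for the upper part of a split query plan."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f]


# Illustrative data only; the values are made up.
quotes = [
    {"Symbol": "INTC", "Price": 98.0},
    {"Symbol": "AOL", "Price": 55.0},
    {"Symbol": "MSFT", "Price": 71.5},
]
with tempfile.TemporaryDirectory() as out_dir:
    files = split_to_files(quotes, "Symbol", {"INTC", "MSFT"}, out_dir)
    intc_rows = file_scan(files["INTC"])
    msft_rows = file_scan(files["MSFT"])
```

Every query on the same constant reads the same path from `files`, which mirrors the sharing of one intermediate file among grouped CQs with the same constant.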

The query split scheme

Trade-offs
Other advantages of materialized intermediate files:
- Only the necessary queries are executed, so computation time is reduced.
- Each query keeps a tree-structured format, so it can easily be scheduled and executed by a general query engine.
- Intermediate files and original data source files are handled uniformly.
- The bottleneck problem is avoided.
Disadvantages:
- The split operator becomes a blocking operator.
- Extra disk I/Os are required.

Range Predicates
- E.g. R.a < val or val1 < R.a < val2
- Multiple such ranges, which may overlap
- Problem: intermediate files may contain duplicate tuples.
- Idea: virtual intermediate files, implemented with an index.
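One way to picture a virtual intermediate file is as a range lookup over a single sorted copy of the data, so that overlapping ranges never duplicate tuples on disk. The Python sketch below uses a sorted list with `bisect` as a stand-in for the index; the class name `VirtualIntermediateFiles` and its `scan` method are hypothetical, and the slides do not say which index structure NiagaraCQ uses.

```python
import bisect


class VirtualIntermediateFiles:
    """One physical sorted copy; each range predicate is a view onto it."""

    def __init__(self, tuples, attr):
        self.rows = sorted(tuples, key=lambda t: t[attr])
        self.keys = [t[attr] for t in self.rows]

    def scan(self, low=None, high=None):
        """Return rows with low < attr < high (either bound may be None)."""
        # First position whose key is strictly greater than low.
        start = 0 if low is None else bisect.bisect_right(self.keys, low)
        # First position whose key is greater than or equal to high.
        stop = len(self.keys) if high is None else bisect.bisect_left(self.keys, high)
        return self.rows[start:stop]


# Illustrative data only.
r = VirtualIntermediateFiles([{"a": 20}, {"a": 5}, {"a": 30}, {"a": 10}], "a")
below_20 = r.scan(high=20)            # R.a < 20
between = r.scan(low=5, high=30)      # 5 < R.a < 30
```

Here the tuple with `a = 10` satisfies both predicates but is stored once; each query obtains its own view through the index instead of its own materialized file.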

Incremental grouping of selection predicates
- A query may contain multiple selection predicates.
- Predicates on the same data source are put in conjunctive normal form (CNF).
- Incremental grouping: choose the most selective conjunct and implement a virtual file on this conjunct.
Example query:
  Where ”INTC” $p element_as $g in “quotes.xml”, $p < 100 Construct $g

Incremental grouping of join operators
Join operators are usually expensive, so sharing common join operations can significantly reduce the amount of computation.
A join query:
  Quotes.Quote.Change_Ratio constant in “quotes.xml”
  Where $s element_as $g in “quotes.xml”, $s element_as $t in “companies.xml” construct $g, $t

Queries that contain both join and selection
Example query:
  Where $s ”Computer Service” element_as $g in “quotes.xml”, $s element_as $t in “companies.xml” construct $g, $t
Where to place the selection operator?
- Below the join: eliminates irrelevant tuples.
- Above the join: allows sharing.
Pick based on a cost model.

Grouping timer-based queries
Challenges:
- It is hard to monitor the timer events of queries.
- Sharing common computation becomes difficult.
Event detection:
- Time events are stored sorted in time order.
- Each query has an ID.
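A time-ordered event list keyed by query ID can be sketched with a priority queue. This Python sketch swaps in a binary heap (`heapq`) for whatever sorted structure NiagaraCQ actually uses; the class name `EventList`, its methods, and the integer timestamps are assumptions for illustration.

```python
import heapq


class EventList:
    """Timer events kept in time order; each entry names a query by its ID."""

    def __init__(self):
        self._heap = []  # entries are (fire_time, query_id)

    def schedule(self, fire_time, query_id):
        heapq.heappush(self._heap, (fire_time, query_id))

    def pop_due(self, now):
        """Remove and return the IDs of all queries due at or before `now`."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[1])
        return due


events = EventList()
events.schedule(30, "q2")
events.schedule(10, "q1")
events.schedule(20, "q3")
fired = events.pop_due(20)
```

Only the earliest entry has to be inspected to decide whether anything is due, so the detector does not need to poll every installed query.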

Incremental evaluation
- Invoke queries only on changed data.
- For each source file, NiagaraCQ keeps a delta file; it does the same for intermediate files.
- A timestamp is stored with each tuple, for timer-based queries.
- Incremental evaluation of join operators requires complete data files.
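The difference between selections and joins under incremental evaluation can be shown with a short Python sketch. The function names, the dictionary-based tuples, and the sample data are hypothetical, and the join covers only the case where one input has changed; it is a sketch of the idea, not of NiagaraCQ's operators.

```python
def incremental_select(delta, predicate):
    """A selection only needs the delta: unchanged tuples cannot add results."""
    return [t for t in delta if predicate(t)]


def incremental_join(delta_left, full_right, key):
    """Join new left tuples against the complete right input.

    The delta alone is not enough here: a new left tuple may match right
    tuples that have not changed, so the full right file must be available.
    """
    index = {}
    for r in full_right:
        index.setdefault(r[key], []).append(r)
    return [(l, r) for l in delta_left for r in index.get(l[key], [])]


# Illustrative data only; the values are made up.
delta_quotes = [{"Symbol": "INTC", "Price": 95.0}, {"Symbol": "AOL", "Price": 120.0}]
companies = [{"Symbol": "INTC", "Sector": "Semiconductors"},
             {"Symbol": "MSFT", "Sector": "Software"}]

cheap = incremental_select(delta_quotes, lambda t: t["Price"] < 100)
joined = incremental_join(delta_quotes, companies, "Symbol")
```

If both inputs change between invocations, the right-side delta would also have to be joined against the complete left file, which is why complete data files are needed on both sides.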

Memory Caching
Thousands of continuous queries cannot all fit in memory.
What should we cache?
- Grouped query plans
What about non-grouped queries?
- Favor small delta files
- Time window of the event list

System Architecture

CQ processing

Experimental Results
Example query:
  Where ”INTC” element_as $g in “quotes.xml”, construct $g
- N = number of installed queries
- F = number of fired queries
- C = number of tuples modified

Performance Results Case 1: F=N, C=1000 Case 2: F=100, C=1000

Performance Results F=N=2000, vary data size

Thank You