1 Incremental Aggregation on Multiple Continuous Queries Chun Jin Carnegie Mellon University 09/28/2006 ISMIS, Bari Italy.

Presentation transcript:

1 Incremental Aggregation on Multiple Continuous Queries Chun Jin Carnegie Mellon University 09/28/2006 ISMIS, Bari Italy

2 Stream Processing
Applications: intelligence monitoring, fraud detection, epidemic onset patterns, network intrusion detection, geospatial changes.
Data streams: transactions, sensor network readings, network traffic data.

3 Problem Aggregate queries Continuous evaluation Multiple concurrent queries

4 Solutions Incremental aggregation Incremental multiple aggregate query optimization (incremental sharing)

5 Roadmap System overview Query examples Incremental Aggregation Incremental sharing Evaluation

6 System Architecture
Components: Query Network, Query Coordinator, System Catalog, Common Computation Identifier (CCI), Network Operation Manager (NOM), Code Assembler, Sharing Optimizer (SO), Projection Manager (PM), Engine Generator, Oracle.
New Query Insertion:
1. Index query network
2. Identify common computation
3. Select optimal sharing path
4. Expand query network
Query Network Execution:
1. Code assembly
2. Incremental aggregation
3. Periodic execution

7 Query Examples
(a) Query A:
  SELECT dis_cat, hospital, vdate, COUNT(*), AVERAGE(fee)
  FROM Med
  GROUP BY CAT(disease) AS dis_cat, hospital, DAY(visit_time) AS vdate
(b) Query B:
  SELECT hospital, vdate, AVERAGE(fee)
  FROM Med
  GROUP BY hospital, DAY(visit_time) AS vdate
Columns of A: dis_cat, hospital, vdate, COUNT(*), SUM(fee), AVERAGE(fee)
Columns of B: hospital, vdate, COUNT(*), SUM(fee), AVERAGE(fee)
Diagram labels: S, A, B; S_H, S_N, A_H, A_N

8 Roadmap System overview Query examples Incremental Aggregation Incremental sharing Evaluation

9 Aggregate Function Types (Incremental Aggregation)
Distributive: can be updated incrementally with the aggregate function itself. Examples: SUM, COUNT.
Algebraic: can be computed from a finite set of aggregate functions. Example: AVERAGE.
Holistic: no such finite set exists. Example: quantiles.
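
The three categories can be illustrated with a small Python sketch. The function names and the sample fee values are invented for illustration and do not come from the talk; the median stands in for the quantile example.

```python
from statistics import median

def merge_sum(sum_hist, sum_new):
    # Distributive: two partial SUMs combine with the same operation.
    return sum_hist + sum_new

def merge_count(count_hist, count_new):
    # Distributive: two partial COUNTs combine by addition.
    return count_hist + count_new

def average_from_parts(total, count):
    # Algebraic: AVERAGE is derived from a finite set of aggregates (SUM, COUNT).
    return total / count

hist = [10.0, 20.0, 30.0]   # hypothetical historical fees
new = [40.0, 100.0]         # hypothetical newly arrived fees

total = merge_sum(sum(hist), sum(new))
count = merge_count(len(hist), len(new))
avg = average_from_parts(total, count)

# Holistic: combining per-batch medians does not give the overall median,
# so the whole history has to be revisited.
true_median = median(hist + new)
naive_median = median([median(hist), median(new)])
```

Here the merged SUM and COUNT reproduce the exact average, while the two medians disagree, which is why holistic aggregates fall back to re-evaluation over the full history.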

10 Holistic Aggregation (Incremental Aggregation)
Revisits the entire history on every evaluation.
Usage:
– For holistic aggregates.
– For aggregates that follow non-incrementally-evaluated aggregates.
– As the baseline for incremental aggregation.

11 Algorithm (Incremental Aggregation)
Columns of A_H and A_N: GID, COUNT(*) AS COUNTA, SUM(fee) AS SUMA, AVERAGE(fee) AS AVGA
Diagram labels: S_H, S_N, A_H, A_N
0: PreUpdate State
1: Aggregate A_N
2: Merge Groups (t1: A_H, t2: A_N)
   t2.COUNTA = t1.COUNTA + t2.COUNTA
   t2.SUMA = t1.SUMA + t2.SUMA
3: Compute Algebraic Aggregate
4: Drop Duplicates
5: Insert New Results
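
The five steps of the algorithm slide can be sketched in Python with dictionaries standing in for the A_H and A_N tables. This is only a sketch under assumptions: the talk runs these steps inside a DBMS, the function names are hypothetical, and "drop duplicates" is read here as removing the stale A_H rows whose groups reappear in A_N.

```python
from collections import defaultdict

def aggregate(s_n):
    """Step 1: aggregate the new tuples S_N into A_N, keyed by group id."""
    a_n = defaultdict(lambda: {"COUNTA": 0, "SUMA": 0.0})
    for gid, fee in s_n:
        a_n[gid]["COUNTA"] += 1
        a_n[gid]["SUMA"] += fee
    return dict(a_n)

def incremental_update(a_h, s_n):
    a_n = aggregate(s_n)                          # step 1
    for gid, t2 in a_n.items():                   # step 2: merge groups
        t1 = a_h.get(gid)
        if t1 is not None:
            t2["COUNTA"] = t1["COUNTA"] + t2["COUNTA"]
            t2["SUMA"] = t1["SUMA"] + t2["SUMA"]
    for t2 in a_n.values():                       # step 3: algebraic aggregate
        t2["AVGA"] = t2["SUMA"] / t2["COUNTA"]
    for gid in a_n:                               # step 4: drop duplicates (assumed: stale A_H rows)
        a_h.pop(gid, None)
    a_h.update(a_n)                               # step 5: insert new results
    return a_h

# Hypothetical data: one historical group, two new tuples.
a_h = {"g1": {"COUNTA": 2, "SUMA": 30.0, "AVGA": 15.0}}
s_n = [("g1", 30.0), ("g2", 50.0)]
a_h = incremental_update(a_h, s_n)
```

The dictionary lookup in step 2 plays the role of a hash-based merge; only the groups touched by S_N are recomputed, and the historical tuples themselves are never rescanned.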

12 Complexity (Incremental Aggregation)
1. Aggregate $S_N$: $T_1 = O(|S_N|)$
2. Merge groups in $A_H$ to $A_N$: $T_{curr2} = O(|A_H| + |A_N|)$, $T_{hash2} = O(|A_H| + |A_N|)$, $T_{prefetch2} = O(|A_N|)$
3. Compute algebraic aggregates in $A_N$: $T_3 = O(|A_N|)$
4. Drop duplicates: $T_{curr4} = O(|A_N| \cdot |A_H^N|) = O(|A_N|^2)$, $T_{hash4} = O(|A_H| + |A_N|)$, $T_{prefetch4} = O(|A_N|)$
5. Insert new results: $T_5 = O(|A_N|)$

13 Implementation System catalog: –AggreRules –AggreBasics Incremental aggregation instantiation Incremental Aggregation

14 System Catalog (Incremental Aggregation)

AggreRules:
Function   Category   Incremental Aggregation Rule   Vertical Expansion Rule
AVERAGE    A          SUMX/COUNTW
SUM        D          SUMX(H)+SUMX(N)                SUM(SUMX)
MEDIAN     H          NULL
COUNT      D          COUNTW(H)+COUNTW(N)            SUM(COUNTW)

AggreBasics:
Function   Basics     Basic ID
AVERAGE    COUNT(W)   COUNTW
AVERAGE    SUM(X)     SUMX
SUM        SUM(X)     SUMX
COUNT      COUNT(W)   COUNTW

15 Instantiation (Incremental Aggregation)
New Query A: AVERAGE(fee)
parse: AVERAGE, fee
retrieve rules (AggreRules, AggreBasics): AVERAGE: SUM(X): SUMX; AVERAGE: COUNT(W): COUNTW
substitute: SUM(fee): SUMX; COUNT(*): COUNTW
insert columns (GroupColumns): SUM(fee): SUMA; COUNT(*): COUNTA; AVERAGE(fee): AVGA
substitute (Name Mapping): SUM(fee): SUMX: SUMA; COUNT(*): COUNTW: COUNTA; AVERAGE(fee): AVGA
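
The instantiation walk-through can be mimicked with two small lookup tables. The sketch below is an assumption-laden illustration, not the ARGUS implementation: the dictionaries copy a few rows of AggreBasics and AggreRules, and the `instantiate` function and its suffix-based column naming (SUMX becomes SUMA for query A) are hypothetical.

```python
# Rows copied from the AggreBasics and AggreRules slides (MEDIAN omitted).
AGGRE_BASICS = {
    "AVERAGE": [("SUM(X)", "SUMX"), ("COUNT(W)", "COUNTW")],
    "SUM": [("SUM(X)", "SUMX")],
    "COUNT": [("COUNT(W)", "COUNTW")],
}
AGGRE_RULES = {
    "AVERAGE": "SUMX/COUNTW",
    "SUM": "SUMX(H)+SUMX(N)",
    "COUNT": "COUNTW(H)+COUNTW(N)",
}

def instantiate(function, arg, suffix):
    """Return (columns to insert, instantiated rule) for one aggregate."""
    name_map = {}
    columns = []
    for basic, basic_id in AGGRE_BASICS[function]:
        # Substitute the actual argument into the basic aggregate.
        expr = basic.replace("(X)", f"({arg})").replace("(W)", "(*)")
        # Assumed naming scheme: replace the placeholder letter with the query suffix.
        column = basic_id[:-1] + suffix
        name_map[basic_id] = column
        columns.append((expr, column))
    # Substitute the mapped column names into the incremental rule.
    rule = AGGRE_RULES[function]
    for basic_id, column in name_map.items():
        rule = rule.replace(basic_id, column)
    return columns, rule

columns, rule = instantiate("AVERAGE", "fee", "A")
```

For `AVERAGE(fee)` in query A this yields the SUMA and COUNTA columns and the rule `SUMA/COUNTA`, matching the name mapping on the slide.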

16 Roadmap System overview Query examples Incremental Aggregation Incremental sharing Evaluation

17 Incremental Multiple Query Optimization (Incremental Sharing) Index existing query plan information R. Given a new query Q, identify the sharable computations from R. Select the optimal sharing path. Expand R to compute Q. Incremental Sharing

18 Expanding Query Network Limited sharing on holistic aggregates Sharing on distributive/algebraic aggregates through vertical expansion Incremental Sharing

19 Vertical Expansion (Incremental Sharing)
Nodes: A, B
A_H columns: BID, Rest ID, COUNT(*) AS COUNTA, SUM(fee) AS SUMA, AVERAGE(fee) AS AVGA
1: Further Aggregate: COUNTB = SUM(COUNTA), SUMB = SUM(SUMA), GROUP BY BID
2: B_H columns: BID, COUNT(*) AS COUNTB, SUM(fee) AS SUMB, AVERAGE(fee) AS AVGB

20 Vertical Expansion
Nodes: A, B
A_N columns: BID, Rest ID, COUNT(*) AS COUNTA, SUM(fee) AS SUMA, …
A_H columns: BID, Rest ID, …
B_H columns: BID, COUNT(*) AS COUNTB, SUM(fee) AS SUMB, AVERAGE(fee) AS AVGB
B_N columns: BID, COUNT(*) AS COUNTB, SUM(fee) AS SUMB, AVERAGE(fee) AS AVGB
1: Further Aggregate: COUNTB = SUM(COUNTA), SUMB = SUM(SUMA), GROUP BY BID
2: Merge Groups
   t2.COUNTA = t1.COUNTA + t2.COUNTA
   t2.SUMA = t1.SUMA + t2.SUMA
3: Compute Algebraic Aggregate
4: Drop Duplicates
5: Insert New Results
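
The "further aggregate" step of vertical expansion can be sketched as a second-level group-by over A's partial results. The sketch assumes A's rows are keyed by (BID, Rest ID), with BID being B's grouping columns; the function name and the sample rows are hypothetical.

```python
from collections import defaultdict

def vertical_expand(a_rows):
    """Further aggregate A's groups into B's coarser groups, keyed by BID."""
    b = defaultdict(lambda: {"COUNTB": 0, "SUMB": 0.0})
    for (bid, _rest_id), a in a_rows.items():
        b[bid]["COUNTB"] += a["COUNTA"]   # COUNTB = SUM(COUNTA)
        b[bid]["SUMB"] += a["SUMA"]       # SUMB = SUM(SUMA)
    for row in b.values():
        # Algebraic aggregate recomputed from the distributive parts.
        row["AVGB"] = row["SUMB"] / row["COUNTB"]
    return dict(b)

# Hypothetical A rows: BID = (hospital, vdate), Rest ID = dis_cat.
a_rows = {
    (("H1", "d1"), "flu"): {"COUNTA": 2, "SUMA": 30.0},
    (("H1", "d1"), "cold"): {"COUNTA": 1, "SUMA": 30.0},
    (("H2", "d1"), "flu"): {"COUNTA": 4, "SUMA": 100.0},
}
b_rows = vertical_expand(a_rows)
```

Because B is computed from A's already-aggregated rows rather than from the raw stream, the input to B is at most as large as A, which is where the sharing benefit comes from.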

21 Vertical Expansion Complexity (Incremental Sharing)
$T_{Vcurr} = O(|A_N|^2 + |B_H|)$
$T_{Vhash} = O(|A_N| + |B_H|)$
$T_{Vprefetch} = O(|A_N|)$

22 System Catalog (Incremental Sharing)
Catalog tables: GroupTopology, GroupExprSet, GroupExprIndex, GroupColumns
Column headers: OriginalDirectParentNodeNameGroupID OriginalExprCanonicalColumnNameNodeName OriginalGroupExprCanonicalGroupExprID GroupID

23 Select Optimal Sharing Path Select least-size node for sharing Incremental Sharing
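
A minimal sketch of the least-size selection rule, assuming that "size" means the row count of a candidate node and that the candidates have already been identified as sharable. The node records and the `select_sharing_node` name are hypothetical.

```python
def select_sharing_node(candidates):
    """Pick the sharable node with the fewest rows (assumed size measure)."""
    return min(candidates, key=lambda node: node["rows"])

# Hypothetical candidates for computing query B: the raw stream S or node A.
candidates = [
    {"name": "S", "rows": 1_000_000},
    {"name": "A", "rows": 5_000},
]
chosen = select_sharing_node(candidates)
```

With these numbers the selection prefers A over the raw stream S, so B would be attached below A through vertical expansion.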

24 Rerouting (Incremental Sharing)
[Animated diagram: evolution of the query network over nodes S, A, and B]

25 Roadmap System overview Query examples Incremental Aggregation Incremental sharing Evaluation

26 Evaluation
Databases:
– Synthesized FedWire money transfers (Fed)
– Anonymized medical patient admission records (Med)
Queries:
– Seed queries
– Sharable queries generated from the seeds
– A wide range of queries (aggregates in this paper)
Simulation:
– Historical data ( on Fed, and on Med)
– Chunks of new data (4000 per chunk)

27 Incremental Aggregation (Evaluation)
[Figure: total execution time in seconds, incremental vs. non-incremental aggregation, on Fed (350 queries) and Med (450 queries)]

28 Evaluation
[Figure (a) Fed: execution time (s) vs. number of FED queries]

29 Evaluation
[Figure (a) Med: execution time (s) vs. number of MED queries]

30 Conclusion Multiple aggregates over streams Solutions: –Incremental aggregation –Incremental MQO (incremental sharing) –Built atop DBMSs for direct practical utility Big performance improvement Future work: –A broad range of queries –Built atop DSMSs.

31 Acknowledgement Work with Professor Jaime Carbonell. Part of ARGUS by CMU and Dynamix. Team: Phil Hayes, Santosh Ananthraman, Bob Frederking, Eugene Fink, Dwight Dietrich, Ganesh Mani, Johny Mathew. Thanks to Professor Chris Olston for helpful discussion.

32 Evaluation
[Figure (a) Pair 1, FED Query Pair 1: execution time vs. incremental size $|S_N|$; series: Non-VE ITT, VE ITT, Non-VE IBT, VE IBT]
IBT: Incremental-Batch Execution Time (s)
ITT: Average Individual-Tuple Execution Time (s)