1 Continuously Adaptive Continuous Queries (CACQ) over Streams Samuel Madden SIGMOD 2002 June 4, 2002 With Mehul Shah, Joseph Hellerstein, and Vijayshankar.

Slides:

Advertisements

Similar presentations

Analysis of : Operator Scheduling in a Data Stream Manager CS561 – Advanced Database Systems By Eric Bloom.

Advertisements

Optimizing Join Enumeration in Transformation-based Query Optimizers ANIL SHANBHAG, S. SUDARSHAN IIT BOMBAY VLDB 2014

C-Store: Self-Organizing Tuple Reconstruction Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY Apr. 17, 2009.

Query Optimization CS634 Lecture 12, Mar 12, 2014 Slides based on “Database Management Systems” 3 rd ed, Ramakrishnan and Gehrke.

1 Hash-Based Indexes Module 4, Lecture 3. 2 Introduction As for any index, 3 alternatives for data entries k* : – Data record with key value k – –Choice.

Static Optimization of Conjunctive Queries with Sliding Windows over Infinite Streams Presented by: Andy Mason and Sheng Zhong Ahmed M.Ayad and Jeffrey.

Stabbing the Sky: Efficient Skyline Computation over Sliding Windows COMP9314 Lecture Notes.

IntroductionAQP FamiliesComparisonNew IdeasConclusions Adaptive Query Processing in the Looking Glass Shivnath Babu (Stanford Univ.) Pedro Bizarro (Univ.

Continuously Adaptive Processing of Data and Query Streams Michael Franklin UC Berkeley April 2002 Joint work w/Joe Hellerstein and the Berkeley DB Group.

Sharing Aggregate Computation for Distributed Queries Ryan Huebsch, UC Berkeley Minos Garofalakis, Yahoo! Research † Joe Hellerstein, UC Berkeley Ion Stoica,

1-1 CMPE 259 Sensor Networks Katia Obraczka Winter 2005 Storage and Querying II.

PSoup Kevin Menard CS 561 4/11/2005. Streaming Queries over Streaming Data Sirish Chandrasekaran UC Berkeley August 20, 2002 with Michael J. Franklin.

Evaluating Window Joins Over Unbounded Streams By Nishant Mehta and Abhishek Kumar.

Dynamic Plan Migration for Continuous Query over Data Streams Yali Zhu, Elke Rundensteiner and George Heineman Database System Research Group Worcester.

Chapter 6: Database Evolution Title: AutoAdmin “What-if” Index Analysis Utility Authors: Surajit Chaudhuri, Vivek Narasayya ACM SIGMOD 1998.

VLDB Revisiting Pipelined Parallelism in Multi-Join Query Processing Bin Liu and Elke A. Rundensteiner Worcester Polytechnic Institute

1 Continuously Adaptive Continuous Queries (CACQ) over Streams Samuel Madden, Mehul Shah, Joseph Hellerstein, and Vijayshankar Raman Presented by: Bhuvan.

1 SINA: Scalable Incremental Processing of Continuous Queries in Spatio-temporal Databases Mohamed F. Mokbel, Xiaopeng Xiong, Walid G. Aref Presented by.

Sensor Database: Querying Sensor Networks Yinghua Wu, Haiyong Xie.

On-The-Fly Verification of Rateless Erasure Codes Max Krohn (MIT CSAIL) Michael Freedman and David Mazières (NYU)

Exploiting Correlated Attributes in Acquisitional Query Processing Amol Deshpande University of Maryland Joint work with Carlos Sam

1 Anna Östlin Pagh and Rasmus Pagh IT University of Copenhagen Advanced Database Technology March 25, 2004 QUERY COMPILATION II Lecture based on [GUW,

Performance Issues in Adaptive Query Processing Fred Reiss U.C. Berkeley Database Group.

1 04/18/2005 Flux Flux: An Adaptive Partitioning Operator for Continuous Query Systems M.A. Shah, J.M. Hellerstein, S. Chandrasekaran, M.J. Franklin UC.

Query Processing Presented by Aung S. Win.

Efficient Scheduling of Heterogeneous Continuous Queries Mohamed A. Sharaf Panos K. Chrysanthis Alexandros Labrinidis Kirk Pruhs Advanced Data Management.

NiagaraCQ A Scalable Continuous Query System for Internet Databases Jianjun Chen, David J DeWitt, Feng Tian, Yuan Wang University of Wisconsin – Madison.

NiagaraCQ : A Scalable Continuous Query System for Internet Databases (modified slides available on course webpage) Jianjun Chen et al Computer Sciences.

施賀傑何承恩 TelegraphCQ. Outline Introduction Data Movement Implies Adaptivity Telegraph - an Ancestor of TelegraphCQ Adaptive Building.

Identifying Reversible Functions From an ROBDD Adam MacDonald.

M i SMob i S Mob i Store - Mobile i nternet File Storage Platform Chetna Kaur.

1 XJoin: Faster Query Results Over Slow And Bursty Networks IEEE Bulletin, 2000 by T. Urhan and M Franklin Based on a talk prepared by Asima Silva & Leena.

Physical Database Design Chapter 6. Physical Design and implementation 1.Translate global logical data model for target DBMS  1.1Design base relations.

Access Path Selection in a Relational Database Management System Selinger et al.

Database Management 9. course. Execution of queries.

Index Tuning for Adaptive Multi-Route Data Stream Systems Karen Works, Elke A. Rundensteiner, and Emmanuel Agu Database Systems Research.

Ripple Joins for Online Aggregation by Peter J. Haas and Joseph M. Hellerstein published in June 1999 presented by Ronda Hilton.

Data Streams and Continuous Query Systems

Int. Workshop on Advanced Computing and Analysis Techniques in Physics Research (ACAT2005), Zeuthen, Germany, May 2005 Bitmap Indices for Fast End-User.

Relational Operator Evaluation. Overview Index Nested Loops Join If there is an index on the join column of one relation (say S), can make it the inner.

1 Fjording The Stream An Architecture for Queries over Streaming Sensor Data Samuel Madden, Michael Franklin UC Berkeley.

PermJoin: An Efficient Algorithm for Producing Early Results in Multi-join Query Plans Justin J. Levandoski Mohamed E. Khalefa Mohamed F. Mokbel University.

Streaming Queries over Streaming Data Sirish Chandrasekaran (UC Berkeley) Michael J. Franklin (UC Berkeley) Presented by Andy Williamson.

1 Virtual Machine Memory Access Tracing With Hypervisor Exclusive Cache USENIX ‘07 Pin Lu & Kai Shen Department of Computer Science University of Rochester.

QED: A Novel Quaternary Encoding to Completely Avoid Re-labeling in XML Updates Changqing Li,Tok Wang Ling.

An Example Data Stream Management System: TelegraphCQ INF5100, Autumn 2009 Jarle Søberg.

PARALLEL RECURSIVE STATE COMPRESSION FOR FREE ALFONS LAARMAN JOINT WORK WITH: MICHAEL WEBER JACO VAN DE POL 12/7/2011 SPIN 2011.

Multi-Query Optimization and Applications Prasan Roy Indian Institute of Technology - Bombay.

Eddies: Continuously Adaptive Query Processing Ross Rosemark.

16.7 Completing the Physical- Query-Plan By Aniket Mulye CS257 Prof: Dr. T. Y. Lin.

CS4432: Database Systems II Query Processing- Part 2.

Finding skyline on the fly HKU CS DB Seminar 21 July 2004 Speaker: Eric Lo.

CS 440 Database Management Systems Lecture 5: Query Processing 1.

Chapter 9: Web Services and Databases Title: NiagaraCQ: A Scalable Continuous Query System for Internet Databases Authors: Jianjun Chen, David J. DeWitt,

CS 540 Database Management Systems

Rate-Based Query Optimization for Streaming Information Sources Stratis D. Viglas Jeffrey F. Naughton.

NiagaraCQ : A Scalable Continuous Query System for Internet Databases Jianjun Chen et al Computer Sciences Dept. University of Wisconsin-Madison SIGMOD.

REED ： Robust, Efficient Filtering and Event Detection in Sensor Network Daniel J. Abadi, Samuel Madden, Wolfgang Lindner Proceedings of the 31st VLDB.

Query Optimization for Stream Databases Presented by: Guillermo Cabrera Fall 2008.

CS 540 Database Management Systems

CS 440 Database Management Systems

Proactive Re-optimization

Cluster Resource Management: A Scalable Approach

Streaming Sensor Data Fjord / Sensor Proxy Multiquery Eddy

Degree-aware Hybrid Graph Traversal on FPGA-HMC Platform

TelegraphCQ: Continuous Dataflow Processing for an Uncertain World

PSoup: A System for streaming queries over streaming data

Eddies for Continuous Queries

Adaptive Query Processing (Background)

Relax and Adapt: Computing Top-k Matches to XPath Queries

Presentation transcript:

1 Continuously Adaptive Continuous Queries (CACQ) over Streams Samuel Madden SIGMOD 2002 June 4, 2002 With Mehul Shah, Joseph Hellerstein, and Vijayshankar Raman

2 CACQ Introduction Proposed continuous query (CQ) systems are based on static plans – But, CQs are long running – Initially valid assumptions less so over time – Static optimizers at their worst! CACQ insight: apply continuous adaptivity of eddies to continuous queries – Dynamic operator ordering avoids static optimizer danger – Process multiple queries simultaneously – Interestingly, enables sharing of work & storage

3 Outline Background – Motivation – Continuous Queries – Eddies CACQ – Contributions - Example driven explanation Results & Experiments

4 Outline Background – Motivation – Continuous Queries – Eddies CACQ – Contributions - Example driven explanation Results & Experiments

5 Motivating Applications “Monitoring” queries look for recent events in data streams – Sensor data processing – Stock analysis – Router, web, or phone events In CACQ, we confine our view to queries over ‘recent- history’ Only tuples currently entering the system Stored in in-memory data tables for time-windowed joins between streams

6 Continuous Queries Long running, “standing queries”, similar to trigger systems Installed; continuously produce results until removed Lots of queries, over the same data sources – Opportunity for work sharing! – Global query optimization problem: hard! – Idea: adaptive heuristics not quite as hard? Bad decisions are not final – Future work: finding an optimal plan (adaptively)

7 CACQ Query Model Monotonic queries, from point when query is registered Terry et al., SIGMOD 1992 Streaming answers Non-blocking operators Windowed Symmetric Joins Windows in tuples or time

8 Eddies & Adaptivity Eddies (Avnur & Hellerstein, SIGMOD 2000): Continuous Adaptivity No static ordering of operators Policy dynamically orders operators on a per tuple basis done and ready bits encode where tuple has been, where it can go

9 Outline Background – Motivation – Continuous Queries – Eddies CACQ – Contributions - Example driven explanation Results & Experiments

10 CACQ Contributions Adaptivity – Policies for continuous queries – Single eddy for multiple queries Tuple Lineage – In addition to ready and done, encode output history in tuple in queriesCompleted bits Enables flexible sharing of operators between queries Grouped Filter – Efficiently compute selections over multiple queries Join Sharing through State Modules (SteMs)

11 Explication By Example First, example with just one query and only selections Then, add multiple queries Then, (briefly) discuss joins

12 Eddies & CACQ : Single Query, Single Source Use ready bits to track what to do next All 1’s in single source Use done bits to track what has been done Tuple can be output when all bits set Routing policy dynamically orders tuples R  (R.a > 10) Eddy  (R.b < 15) R1R1 R1R1 R1R1 R1 a5 b25 R2 a15 b Rea dy Don e aa bb aa bb R  (R.a > 10) Eddy  (R.b < 15) R2R2 R2R2 R2R2 R2R2 R2R2 R2R2 SELECT * FROM R WHERE R.a > 10 AND R.b < 15

13 aa bb R aa bb R aa bb R Q1Q2Q3 Multiple Queries R.a > 10 R.a > 20 R.a = 0 R.b < 15 R.b = 25 R.b <> 50 bb aa R R1R1 R1R1 R1R1 R1R1 R1R1 Grouped Filters R1 a5 b25 SELECT * FROM R WHERE R.a > 10 AND R.b < 15 Q1 SELECT * FROM R WHERE R.a > 20 AND R.b = 25 Q2 SELECT * FROM R WHERE R.a = 0 AND R.b <> 50 Q3 R1R1 R1R1 R1R1 R1R1 R1R1 R1R1 R1R1 R1R aa bb Q1Q2Q3 DoneQueriesComplete d

14 aa bb R aa bb R aa bb R Q1Q2Q3 Multiple Queries R.a > 10 R.a > 20 R.a = 0 R.b < 15 R.b = 25 R.b <> 50 bb aa R R2R2 R2R2 R2R2 R2R2 R2R2 R2R2 Grouped Filters R2 a15 b0 SELECT * FROM R WHERE R.a > 10 AND R.b < 15 Q1 SELECT * FROM R WHERE R.a > 20 AND R.b = 25 Q2 SELECT * FROM R WHERE R.a = 0 AND R.b <> 50 Q bb aa R bb aa R bb aa R Q1Q2Q3 R2R2 R2R2 R2R aa bb Q1Q2Q3 DoneQueriesComplete d R1R1 R1R1 R2R2 Reorder Operators!

15 Outputting Tuples Store a completionMask bitmap for each query – One bit per operator – Set if the operator in the query To determine if a tuple t can be output to query q: – Eddy ANDs q’s completionMask with t’s done bits – Output only if q’s bit not set in t’s queriesCompleted bits Every time a tuple returns from an operator Q2: 0111 Q1: 1100 & Done == 1100 & Done == 0111 completionMasks && QueriesCompleted[0] == 0 SELECT * FROM R WHERE R.a > 10 AND R.b < 15 Query 1 SELECT * FROM R WHERE R.b < 15 AND R.c <> 5 AND R.d = 10 Query 2 completionMas ks  abcd Q11100 Q20111 Done QC aa Q1Q2 bb cc dd Tupl e

16 Grouped Filter Use binary trees to efficiently index range predicates – Two trees (LT & GT) per attribute – Insert constant When tuple arrives – Scan everything to right (for GT) or left (for LT) of the tuple-attribute in the tree – Those are the queries that the tuple does not pass Hash tables to index equality, inequality predicates Greater-than tree over S.a S.a > 1 S.a > 7 S.a > 11 8 S.a > >7>1 Q1 Q2Q3

17 Work Sharing via Tuple Lineage A B C D B A Data Stream S s scsc sBCsBC sDsD s BD Query 1 Query 2 Conventional Queries s s sCsC s CD s CDB CACQ - Adaptivity A C D B Data Stream S A C D B Query 1 Query 2 Shared Subexpressions sBsB s AB Reject ? s CDBA s Q1: SELECT * FROM s WHERE A, B, C Q2: SELECT * FROM s WHERE A, B, D Intersecti on of CD goes through AB an extra time! AB must be applied first! Lineage (Queries Completed) Enables Any Ordering! 0 | 0 QC 0 or 1 | 0 QC 1 | 1 QC 0 or 1 | 0 or 1 QC 0 or 1 | 0 or 1 QC CD 0 or 1 | 0 or 1 QC

18 Tradeoff: Overhead vs. Shared Work Overhead in additional bits per tuple – Experiments studying performance, size in paper – Bit / query / tuple is most significant Trading accounting overhead for work sharing – 100 bits / tuple allows a tuple to be processed once, not 100 times Reduce overhead by not keeping state about operators tuple will never pass through

19 Joins in CACQ Use symmetric hash join to avoid blocking Use State Modules (SteMs) to share storage between joins with a common base relation Detail about effect on implementation & benefit in paper See Raman, UC Berkeley Ph.D. Thesis, 2002.

20 Routing Policies Previous system provides correctness; policy responsible for performance Consult the policy to determine where to route every tuple that: – Enters the system – Returns from an operator Basic Ticket Policy – Give operators tickets for consuming tuples, take away tickets for producing them – To choose the next operator to route, run a lottery – More selective operators scheduled earlier Modification for CACQ – Give more tickets to operators shared by multiple queries (e.g. grouped filters) – When a shared operator outputs a tuple, charge it multiple tickets – Intuition: cardinality reducing shared operators reduce global work more than unshared operators Not optimizing for the throughput of a single query!

21 CACQ Review Efficient mechanism for processing multiple simultaneous monitoring queries over streaming data sources – Share work by processing all queries within a single eddy Continuous adaptivity via eddies & routing policy – Queries come & go, but performance adapts without costly multiquery reoptimization Maximize ability to work share by explicitly encoding lineage Share selections via grouped filter Share join state via SteMs

22 Outline Background – Motivation – Continuous Queries – Eddies CACQ – Contributions - Example driven explanation Results & Experiments

23 Evaluation Real Java implementation on top of Telegraph QP – 4,000 new lines of code in 75,000 line codebase Server Platform – Linux – Pentium III 733, 756 MB RAM Queries posed from separate workstation – Output suppressed Lots of experiments in paper, just a few here

24 Results: Routing Policy From S select index where a > 90 From S select index where a > 90 and b > 70 From S select index where a > 90 and b > 70 and c > 50 From S select index where a > 90 and b > 70 and c > 50 and d > 30 From S select index where a > 90 and b > 70 and c > 50 and d > 30 and e > 10 All attributes uniformly distributed over [0,100] Query

25 CACQ vs. NiagaraCQ * Performance Competitive with Workload from NCQ Paper Different workload where CACQ outperforms NCQ: Stocks.sym = Articles.sym  UDF(stocks.sym) Articles Stocks Query 3 Stocks.sym = Articles.sym  UDF(stocks.sym) Articles Stocks Query 2 Stocks.sym = Articles.sym  UDF(stocks) Articles Stocks Query 1 |result| > |stocks| Expensive SELECT stocks.sym, articles.text FROM stocks,articles WHERE stocks.sym = articles.sym AND UDF(stocks) *See Chen et al., SIGMOD 2000, ICDE 2002

26 CACQ vs. NiagaraCQ #2  UDF 1 Query 1  UDF 2 Query 2  UDF 3 Query 3 Niagara Option #2 S1S1 S1AS1AS2AS2A S2S2 S3S3 S3AS3A  UDF 3  UDF 2  UDF 1 CACQ S1S1 S 1|2 S 1|2|3 Stocks.sym = Articles.sym Articles Stocks |result| > |stocks| Expensive  UDF 1 Query 1Query 2 Query 3 Niagara Option #1  UDF 2  UDF 3 SA SA No shared subexpressions, so no shared work! Lineage Allows Join To Be Applied Just Once

27 CACQ vs. NiagaraCQ Graph

28 Conclusion CACQ: sharing and adaptivity for high performance monitoring queries over data streams Features – Adaptivity Adapt to changing query workload without costly multi-query reoptimization – Work sharing via tuple lineage Without constraining the available plans – Computation sharing via grouped filter – Storage sharing via SteMs Future Work – More sophisticated routing policies – Batching & query grouping – Better integration with historical results (Chandrasekaran, VLDB 2002)

29 Questions? Shameless plug: check out my demo on query processing in sensor networks!

30 S.bR.a RS Joins in CACQ CACQ uses Parallel Pipelined Joins – To avoid blocking Consider Symmetric Hash Join Buil d Probe

31 Processing Joins Via State Modules Idea: Share join indices over base relations State Modules (SteMs) are: – Unary indexes (e.g. hash tables, trees) – Built on the fly (as data arrives) – Scheduled by CACQ as first class operators Based on symmetric hash join S.b = T.c Query 1 R.a = S.b Query 2 R S T S.bR.a T.c Buil d Prob e Buil d Prob e

32 Experiment: Increased Scalability Workload, Per Query: 1-5 randomly selected range predicates of form ‘attr > x’ over 5 attributes. Predicates from the uniform distribution [0,100]. 50% chance of predicate over each attribute.

33 Tuple & Query Data Structures Per tuple bitmaps: – queriesCompleted What queries has this tuple been output to or rejected by? – done What operators have been applied to this tuple? – ready What operators can be applied to this tuple? Per query bitmaps: – completionMask What operators must be applied to output a tuple to this query? Tuple [10, 1100, …] BitValue QueriesCompleted Query 11 Query 20 Done S.a Index1 S.b Index1 R.a – S.b Join0 UDF (R.a)0 Query [0110] BitValue completionMask S.a Index0 S.b Index1 R.a – S.b Join1 UDF (R.a)0

34 n CACQ Query Model SELECT R.a, S.b from R, S where R.c = x and R.d[n] = S.e[m]  m S R Landmark queries Streaming answers Band Joins Windows in tuples or time n Probe into S