Safety Guarantee of Continuous Join Queries over Punctuated Data Streams Hua-Gang Li *, Songting Chen, Junichi Tatemura, Divyakant Agrawal, K. Selcuk Candan.

Presentation transcript:

Safety Guarantee of Continuous Join Queries over Punctuated Data Streams
Hua-Gang Li *, Songting Chen, Junichi Tatemura, Divyakant Agrawal, K. Selcuk Candan and Wang-Pin Hsiung
NEC Laboratories America
* University of California, Santa Barbara

Slide 2: Stream Query Processing
VLDB' Seoul, Korea
[Figure labels: Continuous Queries, Stream Query Engine, Streaming Data]
– Online transaction management
– Network analysis
– Sensor network monitoring
– …

Slide 3: Motivating Example
Window approach
– However, window size may be hard to determine
Exploiting stream constraints
– Uniqueness, sorted input, etc.
– Punctuations

Slide 4: Punctuation
– A predicate that must evaluate to false for every element following the punctuation
Representation [Tucker et al. TKDE 2003]
– A special tuple (*, c, *, *)
– E.g., for Item(sellerid, itemid, name, initialprice), a punctuation “no more item with itemid = 1” is denoted as (*, 1, *, *)
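A minimal Python sketch of how such a punctuation could be matched against stream tuples. It assumes tuples are plain Python tuples and that the wildcard is the string "*"; the names WILDCARD and matches and the sample Item tuples are illustrative and do not come from the slides.

```python
WILDCARD = "*"  # assumption: "*" stands for "any value"


def matches(punctuation, tup):
    """Return True if the tuple falls under the punctuation's pattern."""
    if len(punctuation) != len(tup):
        raise ValueError("punctuation and tuple must have the same arity")
    return all(p == WILDCARD or p == v for p, v in zip(punctuation, tup))


# Item(sellerid, itemid, name, initialprice)
no_more_item_1 = (WILDCARD, 1, WILDCARD, WILDCARD)
item_a = (7, 1, "lamp", 20)  # hypothetical tuple with itemid = 1
item_b = (7, 2, "desk", 90)  # hypothetical tuple with itemid = 2
```

Once no_more_item_1 has been seen, no later tuple for which matches returns True may arrive, so stored state that can only join with such tuples becomes a candidate for purging.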

Slide 5: State of the Art
Semantic modeling of punctuations [Tucker et al. TKDE 2003]
Punctuation-aware query optimization
– Binary join [Ding et al. EDBT 2004]
– Group By [Li et al. SIGMOD 2005]
Generation of useful punctuations, i.e., heartbeats, from the time domain [Srivastava et al. PODS 2004]
However, one fundamental problem is not addressed
– Whether a query can benefit from the available punctuations, referred to as the “safety checking” problem

Slide 6: Outline
Formulate the safety checking problem for continuous join queries
Sound and complete safety condition for simple punctuations
Sound and complete safety condition for complex punctuations
Conclusion and future work

Slide 7: Punctuation Scheme
Punctuation scheme
– Describes the types of punctuation instances that a data stream can have at runtime
– Can be viewed as metadata of punctuation instances
Representation
– Simple punctuation schemes: e.g., Item(sellerid, itemid, name, initialprice), punctuation scheme (–,+,–,–), instance (*, 1, *, *)
– Complex punctuation schemes: e.g., Bid(bidderid, itemid, increase), punctuation scheme (+,+,–), instance (1, 1, *)
Determined by application semantics
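The relationship between a scheme and its instances can be sketched in Python as follows. This is an illustration under stated assumptions: schemes are written with ASCII "+" and "-", an instance is taken to conform when it carries constants exactly at the "+" positions, and a scheme is treated as simple when it has a single "+" (judging only by the two examples above). The function names conforms and is_simple are hypothetical.

```python
WILDCARD = "*"  # assumption: "*" marks an unconstrained attribute


def conforms(scheme, instance):
    """True if the instance has constants exactly where the scheme has '+'."""
    if len(scheme) != len(instance):
        return False
    for mark, value in zip(scheme, instance):
        if mark == "+" and value == WILDCARD:
            return False  # punctuatable attribute left unconstrained
        if mark == "-" and value != WILDCARD:
            return False  # constant on a non-punctuatable attribute
    return True


def is_simple(scheme):
    """Assumption: a simple scheme has exactly one punctuatable attribute."""
    return list(scheme).count("+") == 1


item_scheme = ("-", "+", "-", "-")  # Item(sellerid, itemid, name, initialprice)
bid_scheme = ("+", "+", "-")        # Bid(bidderid, itemid, increase)
```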

Slide 8: Safety Checking Problem
Given a continuous join query (CJQ) Q and a set of punctuation schemes,
– Determine whether Q still requires unbounded memory no matter what punctuation instances (described by the punctuation schemes) may occur
For example:
– Unsafe if we only have the following two punctuation schemes
  Item(sellerid, itemid, name, initialprice) (–, +, –, –)
  Bid(bidderid, itemid, increase) (+, +, –)
Safety vs. runtime memory consumption
– An unsafe query always requires infinite runtime memory
– However, a safe query does not guarantee low runtime memory consumption

Slide 9: Concepts
Join state
– Refers to the space used for storing the inputs of each join operator
Purgeability
– Purgeability of a join state: for every tuple t, there exists a finite set of punctuation instances such that t will not produce any join results with any new tuples
– Purgeability of a join operator
Safe execution plan
– Every join operator involved is purgeable
Safe CJQ
– There exists at least one safe execution plan

Slide 10: Purging for Binary Join Operator
Purging S2 is similar.
Hence we need punctuation schemes S1(–, +), S2(+, –)
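The purge step for one side of a binary join might look like the Python sketch below. The slide's figure is not preserved, so the setup is an assumption inferred from the two schemes: S1(A, B) and S2(B, C) are joined on B, and a punctuation from S2 announces that no further S2 tuples with a given B value will arrive. The helper purge_state and the sample data are hypothetical.

```python
def purge_state(state, join_attr_index, punctuated_value):
    """Drop stored tuples whose join attribute equals the punctuated value.

    Those tuples can no longer match any future tuple of the other stream.
    """
    return [t for t in state if t[join_attr_index] != punctuated_value]


# Join state for S1(A, B); B is at index 1 (assumed layout).
s1_state = [("a1", 10), ("a2", 20), ("a3", 10)]

# S2 punctuation (10, "*") under scheme (+, -): no more S2 tuples with B = 10.
s1_state = purge_state(s1_state, join_attr_index=1, punctuated_value=10)
```

Purging the S2 state is symmetric and relies on S1 punctuations on B, which is why both schemes are needed.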

Slide 11: A CJQ with no Safe Binary Join Plan
Punctuation schemes: S1(A–, B+), S2(B–, C+), S3(C–, A+)
[Figure: the CJQ (only the join predicate S1.A = S3.A survives in the transcript) and an unsafe plan]

Slide 12: Purging for M-Way Join Operator

Slide 13: Chained Purge Strategy
There is a punctuation propagation effect for the M-way join operator!

Slide 14: Punctuation Graph (simple punctuation scheme)
Captures such punctuation propagation effect

Slide 15: Purgeability of a Join Operator
THEOREM 1. The join state S is purgeable iff there exists a path from S to every other node Si in the punctuation graph.
COROLLARY 1. A join operator is purgeable iff its punctuation graph is a strongly connected graph.
[Figure: punctuation graph with nodes S, S1, S2, S3, …, S']
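Theorem 1 and Corollary 1 can be checked with plain graph reachability. The Python sketch below rests on assumptions, since the transcript does not define the edges: an edge Si → Sj is drawn when Sj has a punctuatable join attribute shared with Si (so Sj's punctuations can purge Si's state), and join predicates equate attributes of the same name. The example uses the schemes S1(A–, B+), S2(B–, C+), S3(C–, A+) with assumed joins on the shared attributes. All function names are illustrative.

```python
def punctuation_graph(join_predicates, punctuatable):
    """Build edges Si -> Sj where Sj is punctuatable on the attribute joining it to Si.

    join_predicates: iterable of (Si, Sj, attr), meaning Si.attr = Sj.attr.
    punctuatable: dict mapping a stream to its set of '+' attributes.
    """
    graph = {}
    for left, right, attr in join_predicates:
        graph.setdefault(left, set())
        graph.setdefault(right, set())
        if attr in punctuatable.get(right, set()):
            graph[left].add(right)
        if attr in punctuatable.get(left, set()):
            graph[right].add(left)
    return graph


def reachable(graph, start):
    """Nodes reachable from start (including start), by iterative DFS."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in graph[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def state_purgeable(graph, stream):
    """Theorem 1: there is a path from the stream to every other node."""
    return reachable(graph, stream) == set(graph)


def operator_purgeable(graph):
    """Corollary 1: strongly connected, via forward and reverse reachability."""
    if not graph:
        return True
    root = next(iter(graph))
    reverse = {n: set() for n in graph}
    for src, dsts in graph.items():
        for dst in dsts:
            reverse[dst].add(src)
    everyone = set(graph)
    return reachable(graph, root) == everyone and reachable(reverse, root) == everyone


schemes = {"S1": {"B"}, "S2": {"C"}, "S3": {"A"}}
predicates = [("S1", "S2", "B"), ("S2", "S3", "C"), ("S3", "S1", "A")]
graph = punctuation_graph(predicates, schemes)            # 3-way join
binary = punctuation_graph([("S1", "S2", "B")], schemes)  # S1 joined with S2 only
```

Under these assumptions the three-way graph is a cycle, so the M-way operator is purgeable, while the binary join of S1 and S2 alone is not. The two reachability passes visit each node and edge a constant number of times, so the check is linear in the size of the graph.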

Slide 16: Safety for CJQ
A safe CJQ requires at least one safe execution plan
– However, the number of execution plans is exponential
THEOREM 2. A CJQ is safe iff its M-join plan is safe
→ If the M-join plan is unsafe, no other safe plan exists
→ Linear safety checking for simple punctuation schemes

Slide 17: Handling Complex Punctuation Schemes
S3(A, C) with scheme (+,+) cannot purge either S1 or S2, but can purge S1 ⋈ S2

Slide 18: Generalized Punctuation Graph
[Figure labels: intermediate result; purge of raw data stream; purge of intermediate result]

Slide 19: CJQ Safety under Complex Punctuation Schemes
Intuition: intermediate results have to be purgeable as well
Transformed punctuation graph
– 1. Identify strongly connected sub-graphs and merge each into a single merged node
– 2. Take the generalized punctuation edges of the merged nodes into account, then continue with Step 1
THEOREM 3. A CJQ is safe iff the transformed punctuation graph ends up as a single merged node
– Polynomial safety checking for complex punctuation schemes
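The two-step transformation can be sketched in Python as below. This is one reading of the slide, not the authors' algorithm: a generalized edge is modeled as (set of source streams, target stream) and is assumed to become an ordinary edge once all of its sources sit inside the same merged node. Strongly connected groups are found by mutual reachability, which is polynomial but not optimized. The example streams, edges and all function names are hypothetical.

```python
def reachable(graph, start):
    """Nodes reachable from start (including start), by iterative DFS."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in graph[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def strongly_connected_components(graph):
    """Group nodes that can reach each other."""
    reach = {n: reachable(graph, n) for n in graph}
    components, assigned = [], set()
    for n in graph:
        if n not in assigned:
            comp = {m for m in reach[n] if n in reach[m]}
            components.append(comp)
            assigned |= comp
    return components


def is_safe(nodes, edges, generalized_edges):
    """Merge strongly connected groups until nothing changes (assumed semantics)."""
    group = {n: frozenset([n]) for n in nodes}  # stream -> its merged node
    while True:
        merged_nodes = set(group.values())
        graph = {m: set() for m in merged_nodes}
        for src, dst in edges:
            if group[src] != group[dst]:
                graph[group[src]].add(group[dst])
        # Step 2: a generalized edge counts once its sources share a merged node.
        for sources, dst in generalized_edges:
            owners = {group[s] for s in sources}
            if len(owners) == 1:
                (owner,) = owners
                if owner != group[dst]:
                    graph[owner].add(group[dst])
        # Step 1: merge every strongly connected group.
        components = strongly_connected_components(graph)
        if len(components) == len(merged_nodes):
            break  # nothing left to merge
        for comp in components:
            merged = frozenset().union(*comp)
            for n in merged:
                group[n] = merged
    return len(set(group.values())) == 1


nodes = ["S1", "S2", "S3"]
simple_edges = [("S1", "S2"), ("S2", "S1"), ("S3", "S1")]  # hypothetical
generalized = [(frozenset({"S1", "S2"}), "S3")]  # S3 can purge S1 joined with S2

safe = is_safe(nodes, simple_edges, generalized)
unsafe = is_safe(nodes, [("S1", "S2"), ("S2", "S1")], generalized)
```

In the first call S1 and S2 merge in round one, which activates the generalized edge to S3 and lets all three collapse into one node. In the second call S3 has no outgoing edge, so two merged nodes remain and the query is reported unsafe.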

Slide 20: Conclusion & Future Work
Formulate the safety checking problem for CJQ
Sound and complete safety conditions
– Based on a novel punctuation graph
– Linear for simple punctuation schemes
– Polynomial for complex punctuation schemes
Future work
– Optimization of the chained purge strategy for M-join: M-join purge vs. a tree binary-join purge
– Optimization of CJQ: purge plan vs. join plan, adaptive purge plan
– Generation of punctuations