1 Continuous Query Languages (CQL): Blocking Operators and the Expressive Power Problem. Carlo Zaniolo, UCLA CSD, Spring 2009

2 CQLs for DSMS
- Most DSMS projects use SQL for continuous queries, for good reasons:
  - Many applications span both data streams and DB tables
  - A CQL based on SQL will be easier to learn and use
  - Moreover, the fewer the differences, the better!
- But SQL and DBMSs were designed for persistent data and transient queries, not for persistent queries on transient data
- Adapting SQL and its enabling technology presents difficult research challenges
- These combine with traditional SQL problems, such as the inability to deal with sequences, data-mining (DM) tasks, and other complex query tasks, i.e., a lack of expressive power

3 Language Problems
- Most DSMS use SQL, so queries spanning both data streams and DBs will be easier. But…
- Even for persistent data, SQL is far from perfect. Important application areas that are poorly supported include:
  - Data mining, and we need to mine data streams
  - Sequence queries, and data streams are infinite time series!
- SQL faces major new problems in data stream applications (after all, it was designed for persistent data on secondary storage, not for streaming data):
  - Only nonblocking operators are allowed in a DSMS: blocking operators are forbidden
  - The distinction is not clear-cut in DBMSs, which often use blocking implementations for nonblocking operators
  - The distinction needs to be formally characterized, and so does the loss of query power this causes in CQLs

4 Blocking Operators
- A blocking query operator is 'one that is unable to produce the first tuple of the output until it has seen the entire input' [Babcock et al. PODS02]
- But continuous queries cannot wait for the end of the stream: they must return results while the data is streaming in, so blocking operators cannot be used.
- Only nonblocking (nb) queries and operators can be used on data streams, i.e., those that return their results before they have detected the end of the input.
- Current DBMSs make heavy use of blocking computations:
  1. For operators that are intrinsically blocking
  2. And for those that are not, i.e., operators that are merely implemented that way
- To exclude only case 1, we need a characterization of blocking and nonblocking that is independent of the implementation.
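The difference between an intrinsically blocking operator and a blocking implementation can be sketched in Python (our choice of language; the slides contain only SQL fragments). The function names below are hypothetical: both compute the same selection, but only the second can deliver results on an unbounded stream.

```python
from itertools import count, islice


def select_blocking(pred, stream):
    """Blocking implementation: buffers the entire input before emitting
    anything, so on an unbounded stream it never produces its first tuple."""
    buffered = list(stream)
    for t in buffered:
        if pred(t):
            yield t


def select_nonblocking(pred, stream):
    """Nonblocking implementation of the same operator: each qualifying
    tuple is emitted as soon as it arrives."""
    for t in stream:
        if pred(t):
            yield t


def is_even(x):
    return x % 2 == 0


# On a finite input both implementations return the same result.
finite_blocking = list(select_blocking(is_even, range(10)))
finite_nonblocking = list(select_nonblocking(is_even, range(10)))

# On an unbounded stream (itertools.count) only the nonblocking one delivers.
first_three = list(islice(select_nonblocking(is_even, count()), 3))
```

Selection is therefore not intrinsically blocking: the blocking behavior of the first version is an artifact of its implementation.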

5 Partial Ordering
- Let $S = [t_1, \ldots, t_n]$ be a sequence and $0 \le k \le n$. Then $S_k = [t_1, \ldots, t_k]$ is said to be the presequence of $S$ of length $k$.
- Also, $S_0 = [\,]$ denotes the empty sequence.
- $L \sqsubseteq S$ denotes that $L$ is a presequence of $S$.
- $\sqsubseteq$ defines a partial order: reflexive, antisymmetric, and transitive.
- The presequence order reduces to the standard subset order when order and duplicates are immaterial.
- The empty sequence $[\,]$ is a presequence of every sequence.
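A minimal sketch of the presequence order $\sqsubseteq$, assuming sequences are represented as Python lists or strings; `is_presequence` and `presequences` are hypothetical helpers that simply test for, and enumerate, prefixes. The last lines brute-force the three partial-order laws on a small universe of sequences, which is a sanity check rather than a proof.

```python
from itertools import product


def is_presequence(l, s):
    """L ⊑ S: L is a presequence (prefix) of S."""
    return len(l) <= len(s) and list(s[:len(l)]) == list(l)


def presequences(s):
    """All presequences S_0 = [], S_1, ..., S_n of s."""
    return [list(s[:k]) for k in range(len(s) + 1)]


# All sequences of length <= 3 over the alphabet {a, b} (15 sequences).
universe = [list(p) for n in range(4) for p in product("ab", repeat=n)]

reflexive = all(is_presequence(x, x) for x in universe)
antisymmetric = all(
    x == y
    for x in universe
    for y in universe
    if is_presequence(x, y) and is_presequence(y, x)
)
transitive = all(
    is_presequence(x, z)
    for x in universe
    for y in universe
    for z in universe
    if is_presequence(x, y) and is_presequence(y, z)
)
```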

6 Operators on Sequences: $S \rightarrow G \rightarrow G(S)$
- Operators are viewed as incremental transducers:
  - $G(S)$: the result of applying $G$ to the whole of $S$
  - $G_j(S)$: the cumulative output produced up to and including the $j$-th input tuple
  - $S_j$: the input up to step $j$
- Let $S$ be a sequence of length $n$. Then $G$ is said to be:
  - Blocking when $G_j(S) = [\,]$ for $j < n$, and $G_n(S) = G(S)$
  - Nonblocking when $G_j(S) = G(S_j)$ for every $j \le n$
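The two definitions can be made concrete with a small harness. This is a sketch under our own assumptions: a transducer is modeled as a hypothetical `Transducer` object whose `step()` is called once per tuple and whose `finish()` is called on end-of-input; `blocking_on` and `nonblocking_on` check the definitions on one given nonempty input only, so they can refute, but not prove, that an operator is nonblocking.

```python
class Transducer:
    """Hypothetical incremental transducer: step() is called once per input
    tuple and finish() once when end-of-input is detected; each returns the
    tuples emitted at that point."""

    def step(self, t):
        return []

    def finish(self):
        return []


def cumulative_outputs(make, s):
    """[G_1(S), ..., G_n(S)]. At j = n the operator also sees end-of-input,
    so G_n(S) = G(S)."""
    op, out, history = make(), [], []
    for j, t in enumerate(s, start=1):
        out += op.step(t)
        if j == len(s):
            out += op.finish()
        history.append(list(out))
    return history


def full_output(make, s):
    """G(S): result of applying G to the whole (finite) sequence s."""
    op, out = make(), []
    for t in s:
        out += op.step(t)
    return out + op.finish()


def blocking_on(make, s):
    """G_j(S) = [] for j < n, and G_n(S) = G(S). Assumes s is nonempty."""
    g = cumulative_outputs(make, s)
    return all(gj == [] for gj in g[:-1]) and g[-1] == full_output(make, s)


def nonblocking_on(make, s):
    """G_j(S) = G(S_j) for every j <= n."""
    g = cumulative_outputs(make, s)
    return all(g[j - 1] == full_output(make, s[:j]) for j in range(1, len(s) + 1))


class SelectPositive(Transducer):
    """A selection: emits each positive tuple as it arrives."""

    def step(self, t):
        return [t] if t > 0 else []


class TraditionalMax(Transducer):
    """A traditional aggregate: remembers the maximum, emits it only at the end."""

    def __init__(self):
        self.best = None

    def step(self, t):
        self.best = t if self.best is None else max(self.best, t)
        return []

    def finish(self):
        return [] if self.best is None else [self.best]
```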

7 employees(E#, Sal, ...)
- Traditional SQL-2 aggregates are blocking:
    select count(E#) from employees group by Sal
- The SQL:2003 continuous count is nonblocking:
    select Sal, count(E#) over (order by Sal range unbounded preceding) from employees
- The continuous count returns, for each new tuple, the count so far. On a sequence $S$ of length $n$, at each step $j$ the count up to $j$ is returned: $count_1(S) = [1]$, $count_2(S) = [1, 2]$, ..., $count_j(S) = [1, 2, \ldots, j]$, independently of whether $j = n$ or $j < n$.
- The traditional count returns its result cumulatively at the end: for each $j < n$ nothing is returned, $count_j(S) = [\,]$; at the end, $count_n(S) = [n]$.
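The two counts can be traced step by step. The helpers below are hypothetical and only reproduce the cumulative outputs $count_j(S)$ listed above; the test `j == n` stands in for the operator detecting end-of-input.

```python
def continuous_count_outputs(s):
    """Cumulative outputs [G_1(S), ..., G_n(S)] of the continuous (running)
    count: one new count is emitted for every input tuple."""
    emitted, history = [], []
    for j, _ in enumerate(s, start=1):
        emitted.append(j)  # the count so far
        history.append(list(emitted))
    return history


def traditional_count_outputs(s):
    """Cumulative outputs of the traditional count: nothing is emitted until
    end-of-input is detected at step n, then [n]."""
    n, running, history = len(s), 0, []
    for j, _ in enumerate(s, start=1):
        running += 1
        history.append([running] if j == n else [])
    return history


employees = ["e1", "e2", "e3"]
running = continuous_count_outputs(employees)
traditional = traditional_count_outputs(employees)
```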

8 Examples
- Selection is nonblocking.
- Projection is nonblocking, even if we eliminate the resulting duplicates.
- Traditional SQL-2 aggregates are blocking (for arbitrarily ordered input).
- SQL:2003 OLAP functions are not: e.g., the continuous count, sum, max, etc. (i.e., OLAP functions over an unbounded preceding window) are nonblocking.
- Intermediate cases are also possible.
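Why projection with duplicate elimination stays nonblocking: each distinct projected tuple can be emitted the first time it is seen, and later duplicates are simply dropped. A minimal sketch, assuming rows are Python dicts keyed by attribute name (`project_distinct` is a hypothetical name):

```python
def project_distinct(stream, attrs):
    """Nonblocking projection with duplicate elimination: a projected tuple
    is emitted the first time it appears; later duplicates are dropped."""
    seen = set()
    for row in stream:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            yield key


rows = [
    {"E#": 1, "Sal": 50},
    {"E#": 2, "Sal": 60},
    {"E#": 3, "Sal": 50},
]
salaries = list(project_distinct(rows, ["Sal"]))
```

The price is memory: the `seen` set grows with the number of distinct projected values.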

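9 Characterization of NonBlocking (nb)
Theorem: Queries can be expressed via nonblocking computations iff they are monotonic w.r.t. the presequence ordering.
Proof:
(i) NB G implies monotonic G. Suppose $S_j \sqsubseteq S_n$. Since the cumulative output of a transducer can only grow, $G_j(S_n) \sqsubseteq G_n(S_n) = G(S_n)$. But if G is NB, then $G_j(S_n) = G_j(S_j) = G(S_j)$; hence $G(S_j) \sqsubseteq G(S_n)$.
(ii) Monotonic G implies NB G: at step j+1, the incremental transducer for G adds the difference between $G(S_{j+1})$ and $G(S_j)$. Monotonicity guarantees $G(S_j) \sqsubseteq G(S_{j+1})$, so this difference is just the new suffix and no tuple already output ever needs to be retracted. QED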
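Part (ii) of the proof can be read as a construction. The sketch below (hypothetical `nb_transducer`) recomputes G on each new presequence and emits only the new suffix; it assumes $G([\,])$ is empty and is deliberately naive (quadratic work), since its purpose is to mirror the proof, not to be efficient. It raises an error as soon as a previously emitted output would have to be retracted, which is what happens for a nonmonotonic G such as a traditional sum.

```python
def nb_transducer(g, stream):
    """Incremental transducer for a whole-sequence function g: at step j+1 it
    emits the difference between G(S_{j+1}) and G(S_j), i.e. the new suffix.
    Raises ValueError as soon as g is seen to violate monotonicity."""
    prefix, previous = [], []
    for t in stream:
        prefix.append(t)
        current = g(list(prefix))  # G(S_{j+1}), recomputed from scratch
        if current[:len(previous)] != previous:
            raise ValueError("g is not monotonic w.r.t. the presequence order")
        yield current[len(previous):]  # only the new tail is emitted
        previous = current


def positives(s):
    """A selection: monotonic."""
    return [x for x in s if x > 0]


def total(s):
    """A traditional aggregate: not monotonic ([1] is not a presequence of [3])."""
    return [sum(s)]


batches = list(nb_transducer(positives, [3, -1, 4]))
```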

10 NonBlocking Iff Monotonic
- The theorem generalizes from presequences to sets, i.e., presequences where duplicates are not allowed and order is immaterial.
  - In fact, S1 is a subset of S2 iff S1 is a presequence of S2, after proper reordering and elimination of duplicates.
- NB = monotonic: e.g., selection, projection, and OLAP functions
- Blocking = nonmonotonic: e.g., traditional aggregates
- The results hold for operators of more than one argument:
  - Joins are monotonic (i.e., NB) in both arguments.
  - $R - S$ is monotonic in R and antimonotonic in S: i.e., it blocks on S but not on R (but it can produce output for R only after it has seen the whole of S!)
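A small illustration of the last bullet, with relations as Python sets (helper names are hypothetical): set difference grows with R and shrinks with S, so a streaming implementation must consume all of S before it can emit anything, after which it can stream R.

```python
def difference(r, s):
    return set(r) - set(s)


def stream_difference(r_stream, s_stream):
    """R - S on streams: S must be consumed completely (blocking on S);
    afterwards, tuples of R are emitted as they arrive (nonblocking on R)."""
    s = set(s_stream)  # blocks until the whole of S has been seen
    emitted = set()
    for t in r_stream:
        if t not in s and t not in emitted:
            emitted.add(t)
            yield t


# Monotonic in R: a larger R never loses answers.
r_small, r_large, s = {1, 2}, {1, 2, 3}, {2}
monotonic_in_r = difference(r_small, s) <= difference(r_large, s)

# Antimonotonic in S: a larger S can only remove answers.
r, s_small, s_large = {1, 2, 3}, {2}, {2, 3}
antimonotonic_in_s = difference(r, s_large) <= difference(r, s_small)

streamed = list(stream_difference(iter([1, 2, 3, 1]), iter([2])))
```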

11 NB-Completeness
- A query language L can express a given set of functions on its input (DB, sequences, data streams).
- By the previous theorem, nonmonotonic functions are intrinsically blocking, so they cannot be used on data streams.
- For continuous queries on data streams, we should disallow blocking (i.e., nonmonotonic) operators and constructs, and only allow nonblocking (i.e., monotonic) operators: nb-operators for short.
- But can ALL the monotonic functions expressible in L be expressed using only its nb-operators?
- Or did we also lose some monotonic queries?
Definition: If, using only its nb-operators, L can express all the monotonic queries expressible in L, then L is said to be NB-complete.

12 Expressive Power and NB-Completeness
- Consider a (DB) language L. The expressive power of L is the set of functions F that can be computed on the DB using its operators (or constructs).
- On data streams we are only interested in the monotonic functions $F' \subseteq F$. Also let O be the operators of L, and $O' \subseteq O$ the subset of those operators that are monotonic.
- L is said to be NB-complete if all functions in F' can be expressed using only the operators in O'.
- NB-completeness tests whether L is as suitable for continuous queries on data streams as it is for queries on the database.
- Suppose L is not NB-complete: then there is some monotonic function that L can express on the data stored in the DB but can no longer express on the same data presented as a stream (i.e., delivered by a single read of the DB: the push model vs the pull model of computation).

13 Is SQL NB-complete?
- eBay example. Auctions: a stream of bids on an item:
    bidStream(Item#, BidValue, Time)
- Items for which the sum of bids is > 100K:
    SELECT Item# FROM bidStream GROUP BY Item# HAVING SUM(BidValue) > 100000;
- Since bid values are positive, the sum for an item can only grow: once an item qualifies, it qualifies forever. So this is a monotonic query, and it can be expressed in a language containing suitable query operators, but not in SQL-2, whose aggregates are blocking. Thus SQL-2 is not nb-complete, and it is ill-suited for continuous queries on data streams.
- What about RA without aggregates?
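The monotonic query above admits an incremental implementation: keep a running sum per item and report an item the first time its sum exceeds the threshold. A minimal sketch, assuming nonnegative bid values and bids represented as `(item, bid_value, time)` tuples mirroring `bidStream(Item#, BidValue, Time)`; `items_over_threshold` is a hypothetical name, not part of any DSMS API.

```python
from collections import defaultdict


def items_over_threshold(bids, threshold=100_000):
    """Nonblocking version of the HAVING SUM(BidValue) > threshold query:
    an item is reported as soon as its running sum exceeds the threshold."""
    totals = defaultdict(float)
    reported = set()
    for item, value, _time in bids:
        totals[item] += value
        if totals[item] > threshold and item not in reported:
            reported.add(item)  # with nonnegative bids it can never drop out
            yield item


bids = [
    ("a", 60_000, 1),
    ("b", 50_000, 2),
    ("a", 50_000, 3),  # a reaches 110000: reported now
    ("b", 10_000, 4),
    ("a", 1, 5),       # a already reported
    ("b", 45_000, 6),  # b reaches 105000: reported now
]
answers = list(items_over_threshold(bids))
```

If negative bid values were allowed, an item could drop back below the threshold after being reported, and the query would no longer be monotonic.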

14 Relational Algebra (RA)
- Set difference can produce monotonic queries: are these still expressible without set difference?
- Intersection is monotonic: $R_1 \cap R_2 = R_1 - (R_1 - R_2)$. But intersection can also be expressed as a join (product + select), so it is not lost if we disallow set difference.
- But interval coalescing and Until queries are monotonic queries that can be expressed in RA but not in nb-RA.
- Example: the temporal domain is isomorphic to the nonnegative integers; intervals are closed on the left and open on the right:
    p(0, 3).   % 0, 1, and 2 are in p, but 3 is not
    p(2, 4).   % 3 is not a hole because it is covered by this interval
    p(4, 5).   % 5 is a hole because it is not covered by any other interval
    p(6, 8).
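The intersection identity can be checked directly on small relations (Python sets of tuples; function names are hypothetical). The first version uses the nonmonotonic set difference twice; the second uses only product and selection.

```python
def intersect_via_difference(r1, r2):
    """R1 ∩ R2 = R1 - (R1 - R2): uses set difference twice."""
    return r1 - (r1 - r2)


def intersect_via_join(r1, r2):
    """Same result with product + select only: keep the pairs (a, b) of the
    product whose components are equal, then project one component."""
    return {a for a in r1 for b in r2 if a == b}


r1 = {(1, "x"), (2, "y")}
r2 = {(2, "y"), (3, "z")}
```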

15 Coalesce p (cp) & p Until q
    p(0, 3).  p(2, 4).  p(4, 5).  p(6, 8).
    cp(0, 3). cp(2, 4). cp(4, 5). cp(6, 8). cp(0, 4). cp(2, 5). cp(0, 5).
- cp contains the intervals from the start point of any p interval to the endpoint of any p interval, unless the endpoint of an interval in between is a hole:
    cp(I1, J2) ← p(I1, J1), p(I2, J2), J1 < J2, ¬hole(I1, J2).
    hole(I1, J2) ← p(I1, J1), p(I2, J2), p(_, K), J1 ≤ K, K < I2, ¬cep(K).
    cep(K) ← p(_, K), p(I, J), I ≤ K, K < J.
- Example: if q(5, _) holds, then p Until q holds if cp has an interval that starts at 0 and reaches 5, such as cp(0, 5):
    p Until q(yes) ← q(0, J).
    p Until q(yes) ← cp(0, I), q(J, _), I ≥ J.
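The three rules translate almost literally into set comprehensions. This is a sketch, not the author's implementation: intervals are half-open `(start, end)` pairs, the base case that every p interval is in cp is our assumption (the p facts appear among the cp facts, but no rule for them is shown), and `until` evaluates p Until q at time 0 with q also given as a set of intervals. The negated goals (`not in`) are where the nonmonotonic operators enter.

```python
P = {(0, 3), (2, 4), (4, 5), (6, 8)}


def covered_endpoints(p):
    """cep(K): the endpoint K of some interval lies inside some [I, J)."""
    ends = {k for (_, k) in p}
    return {k for k in ends if any(i <= k < j for (i, j) in p)}


def holes(p):
    """hole(I1, J2): some uncovered endpoint K with J1 <= K < I2."""
    cep = covered_endpoints(p)
    ends = {k for (_, k) in p}
    return {
        (i1, j2)
        for (i1, j1) in p
        for (i2, j2) in p
        for k in ends
        if j1 <= k < i2 and k not in cep
    }


def coalesce(p):
    """cp(I1, J2) <- p(I1, J1), p(I2, J2), J1 < J2, not hole(I1, J2)."""
    h = holes(p)
    cp = set(p)  # assumed base case: every p interval is in cp
    cp |= {
        (i1, j2)
        for (i1, j1) in p
        for (i2, j2) in p
        if j1 < j2 and (i1, j2) not in h
    }
    return cp


def until(p, q):
    """p Until q at time 0, following the two rules above."""
    if any(i == 0 for (i, _) in q):
        return True
    return any(s == 0 and i >= j for (s, i) in coalesce(p) for (j, _) in q)
```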

16 Relational Algebra
- Nonmonotonic (i.e., blocking) RA operators: set difference and division
- We are left with select, project, join, and union. Can these express all first-order (FO) monotonic queries?
- Some interesting temporal queries: coalesce and until
  - They are expressible in RA (by double negation)
  - They are monotonic
  - But they cannot be expressed in NB-RA
Theorem: RA and SQL are not NB-complete. SQL faces two problems: (i) the exclusion of EXCEPT/NOT EXISTS, and (ii) the exclusion of aggregates.

17 Real Applications Require REAL Power
- SQL's lack of expressive power is a major problem for database-centric applications.
- These problems are significantly more serious for data streams, since:
  - Only monotonic queries can be used
  - Actually, not even all the monotonic ones, since SQL is not NB-complete
  - These problems cannot be solved by embedding SQL statements in a PL program (next slide!)

18 Embedding SQL Queries in a PL
- In DB applications, SQL can be embedded in a PL (Java, C++, …), where the PL accesses the tuples returned by SQL using a 'get next' statement on a cursor.
- Operations that cannot be expressed in SQL can then be expressed in the PL:
  - an effective remedy for the lack of expressive power of SQL
- But cursors are a 'pull-based' mechanism and cannot be used on data streams: the DSMS cannot hold tuples until the PL requests them!
- The DSMS can only deliver its output to the PL as a stream:
  - This might be OK for simple situations
  - But if the core of the work has not been done yet, the PL system must do the actual DSMS work!
- Conclusion: to support applications of any complexity we must have a DSMS with real expressive power,
  - as opposed to DBMSs, which are useful even with a weak QL.
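The pull/push contrast can be sketched with plain Python objects; everything below is hypothetical and models no particular DBMS or DSMS. With a cursor, the consumer decides when the next row is fetched; with a push source, each arriving tuple is handed to the subscriber immediately, so any aggregation the query language could not express (here, a running average) has to be maintained in the PL program itself.

```python
def cursor(rows):
    """Pull model: the consumer decides when to fetch the next row
    (the role played by 'get next' on a DBMS cursor)."""
    for row in rows:
        yield row


class PushSource:
    """Push model: every arriving tuple is delivered to the subscribers
    immediately; the source never waits for the consumer to ask."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def arrive(self, t):
        for callback in self.subscribers:
            callback(t)


class RunningAverage:
    """PL-side consumer: since the results are pushed, it must keep its own
    state, i.e., do work that a more powerful CQL would have done."""

    def __init__(self):
        self.total, self.count, self.history = 0, 0, []

    def __call__(self, t):
        self.total += t
        self.count += 1
        self.history.append(self.total / self.count)


rows = cursor([10, 20])
first_row = next(rows)  # the consumer pulls when it is ready

source = PushSource()
average = RunningAverage()
source.subscribe(average)
for t in [2, 4, 6]:
    source.arrive(t)  # the source pushes; the consumer must keep up
```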

19 Real Applications Require Real Power
Embedding CQL in PL programs does not work well... BUT embedding PL programs in CQL works:
- User-defined functions with BLOBs:
  - Good for DBMSs, but DSMSs require incremental computation
- User-defined aggregates (UDAs):
  - Incremental computation model
  - Can be defined using a PL or SQL itself
  - With natively defined UDAs, SQL becomes Turing complete
  - And NB-complete: it can express all monotonic functions
  - Simple syntactic characterization for NB aggregates
  - Effective on a broad range of data-intensive applications: KDD in particular
  - A few extensions are still needed (more later)

20 Why UDAs Are Important
- We have seen how new aggregates can be defined by the initialize, iterate, terminate scheme, using SQL itself (native UDAs) or an external language (C++, Java, etc.).
- SQL with natively defined UDAs is Turing complete.
- With nonblocking UDAs, SQL becomes NB-complete: it can express all monotonic computable functions on a single stream, and also on multiple streams if we introduce a sort-merge operator.
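The initialize/iterate/terminate scheme can be mimicked in Python to show why some UDAs block and others do not. This is a sketch, not ATLaS syntax: the `UDA` base class and `apply_uda` are hypothetical, and `has_empty_terminate` encodes our assumption that the simple syntactic test for NB aggregates is that the UDA produces no output in its terminate step.

```python
class UDA:
    """Hypothetical rendering of the initialize / iterate / terminate scheme;
    each method returns the values the aggregate emits at that point."""

    def initialize(self, value):
        return []

    def iterate(self, value):
        return []

    def terminate(self):
        return []


def apply_uda(uda, stream):
    """Call initialize on the first tuple, iterate on the rest, and terminate
    at end-of-input; return everything emitted, in order."""
    out, started = [], False
    for value in stream:
        out += uda.iterate(value) if started else uda.initialize(value)
        started = True
    if started:
        out += uda.terminate()
    return out


def has_empty_terminate(cls):
    """Assumed syntactic test: the class does not override terminate()."""
    return cls.terminate is UDA.terminate


class Avg(UDA):
    """Traditional average: emits only in terminate, hence blocking."""

    def initialize(self, value):
        self.total, self.count = value, 1
        return []

    def iterate(self, value):
        self.total += value
        self.count += 1
        return []

    def terminate(self):
        return [self.total / self.count]


class OnlineAvg(UDA):
    """Running average: emits in initialize/iterate; terminate stays empty."""

    def initialize(self, value):
        self.total, self.count = value, 1
        return [self.total / self.count]

    def iterate(self, value):
        self.total += value
        self.count += 1
        return [self.total / self.count]
```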

21 References
- D. Carney, U. Cetintemel, M. Cherniack, C. Convey, S. Lee, G. Seidman, M. Stonebraker, N. Tatbul, and S. Zdonik: Monitoring Streams: A New Class of Data Management Applications. VLDB, Hong Kong, China.
- Yijian Bai, Hetal Thakkar, Chang Luo, Haixun Wang, and Carlo Zaniolo: A Data Stream Language and System Designed for Power and Extensibility. Proc. of the ACM 15th Conference on Information and Knowledge Management (CIKM'06), 2006.
- Yan-Nei Law, Haixun Wang, and Carlo Zaniolo: Query Languages and Data Models for Database Sequences and Data Streams. VLDB 2004.
- Haixun Wang and Carlo Zaniolo: ATLaS: A Native Extension of SQL for Data Mining. Proc. of the Third SIAM International Conference on Data Mining, 2003.

22 'La femme fatale' in Disney's cartoons: "I am not really bad... Just drawn that way!"