PSoup Kevin Menard CS 561 4/11/2005
Slides are modified versions of the following original presentation: Streaming Queries over Streaming Data, Sirish Chandrasekaran with Michael J. Franklin, UC Berkeley, VLDB 2002, August 20, 2002.
PSoup Insight #1: Queries and data are duals
  - Store new queries, apply to data that arrived earlier
  - Store new data, apply to queries that arrived earlier
  - Multiquery processing = “join” of query and data
  - Supports all three types of queries: queries over the past (landmark and sliding window), continuous, and hybrid
  [Figure: Data, Index, Queries, Query Index, Result]
Motivation
  - Why another model for continuous queries?
  - What is wrong with how Aurora and STREAM supply responses?
Motivation: Disconnected Operation
  - Previous solutions stream out answers immediately
  - Not feasible or suitable for all applications
  - Intermittent connectivity: e.g., applications on hand-held devices (as in this morning’s keynote address)
  - Even if connected: not always interested in streaming answers
PSoup Insight #2: Separate computation from delivery
  - Query answers continuously generated in the background
  - Apply windows on demand to transmit “current” results
  - Efficient support for disconnected operation
  - Low response time; shared computation and storage across invocations
  [Figure: Data (ID, R.a, R.b) and Query (ID, Predicate) tables feed a Results Structure of T/F entries indexed by Queries and Data; Register and Invoke operate on it]
PSoup Query Model
  SELECT select_list
  FROM from_list
  WHERE where_clause
  BEGIN begin_time
  END end_time
  - WHERE clause: conjunction of boolean factors
  - BEGIN-END clause: system clock or sequence numbers
  - (begin_time, end_time):
      (constant, constant): snapshot query
      (constant, variable): landmark window query
      (variable, variable): sliding window query
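
The three window types follow from whether each endpoint is a constant or moves with the clock. A minimal sketch of that classification, assuming a hypothetical `Bound` type in which a variable endpoint is stored as an offset from the current time; the names `Bound`, `classify`, and `current_window` are illustrative and do not come from PSoup.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Bound:
    """One endpoint of a BEGIN-END clause (hypothetical representation)."""
    value: int       # absolute time if constant, offset from "now" if variable
    variable: bool   # True when the endpoint moves with the clock

    def resolve(self, now: int) -> int:
        return now + self.value if self.variable else self.value


def classify(begin: Bound, end: Bound) -> str:
    """Map (begin, end) to one of the three window types on the slide."""
    if not begin.variable and not end.variable:
        return "snapshot"
    if not begin.variable and end.variable:
        return "landmark"
    if begin.variable and end.variable:
        return "sliding"
    raise ValueError("(variable, constant) is not one of the listed window types")


def current_window(begin: Bound, end: Bound, now: int) -> Tuple[int, int]:
    """Resolve both endpoints against the current clock or sequence number."""
    return (begin.resolve(now), end.resolve(now))
```

For example, `Bound(-5, True)` paired with `Bound(0, True)` describes a sliding window over the last five time units, while a constant begin with a variable end keeps growing as a landmark window.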
Query Registration
  SELECT select_list FROM from_list WHERE where_clause BEGIN begin_time END end_time
  - SELECT-FROM-WHERE: the Standing Query Clause (SQC), sent to the symmetric join
  - BEGIN-END clause: stored in the Windows_Table
  - QueryID: handle for future query invocations
Selections over Single Stream: Arrival of New Query Specification
  (a) Initial state: the Data Store holds tuples (ID, R.a, R.b); the Query Store holds predicates (ID, Predicate):
        0<R.a<=5
        R.a>4 and R.b=3
        0>R.b>4
        R.a=4 and R.b=3
  (b) Arrival of new query: Select * From R Where R.a = 3
  (c) Building Query Store: the new query is inserted with ID 24 and predicate R.a = 3
  (d) Probing Data Store: the new predicate is matched against the stored tuples
  (e) Inserting results: the T/F outcome for each stored tuple is written into the Results Structure (indexed by Queries and Data)
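
The build, probe, and insert steps for a newly registered selection query can be sketched as follows. This is a minimal illustration, not PSoup's implementation: predicates are plain Python callables, the probe is a linear scan rather than an index lookup, and the stored tuples are made-up values because the slide's Data Store contents did not survive. `register_query` is a hypothetical name.

```python
from typing import Callable, Dict, Tuple

Row = Dict[str, int]
Predicate = Callable[[Row], bool]

# Illustrative tuples only; these are not the values from the slide.
data_store: Dict[int, Row] = {
    48: {"a": 3, "b": 1},
    49: {"a": 5, "b": 3},
    50: {"a": 3, "b": 7},
}
query_store: Dict[int, Predicate] = {}
results: Dict[Tuple[int, int], bool] = {}   # (data_id, query_id) -> T/F


def register_query(query_id: int, predicate: Predicate) -> None:
    # BUILD: remember the query so that later data can probe it.
    query_store[query_id] = predicate
    # PROBE: apply the new query to data that arrived earlier.
    for data_id, row in data_store.items():
        # INSERT: record the outcome in the results structure.
        results[(data_id, query_id)] = predicate(row)


# New query 24: Select * From R Where R.a = 3
register_query(24, lambda r: r["a"] == 3)
```

After registration the results structure already holds an answer for every earlier tuple, so a later invocation only has to read it.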
Selections over Single Stream: Arrival of New Data
  (a) Initial state: the Query Store now also holds query 24 (R.a = 3)
  (b) Arrival of new data: tuple with ID 53, R.a = 3, R.b = 6
  (c) Building Data Store: the new tuple is inserted
  (d) Probing Query Store: the new tuple is matched against the stored predicates; it matches query 24 (R.a = 3) and query 20 (0<R.a<=5)
  (e) Inserting results: the tuple's entries in the Results Structure are T F F F T
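
The dual case, where a new tuple probes the stored queries, can be sketched the same way. The predicates are transcribed from the slide, including `0>R.b>4` exactly as it appears (it can never be true as written). The query IDs 21 to 23 are an assumption inferred from the listing order; only IDs 20 and 24 are legible. The linear scan stands in for the query index and `insert_tuple` is a hypothetical name.

```python
from typing import Callable, Dict, List

Row = Dict[str, int]

query_store: Dict[int, Callable[[Row], bool]] = {
    20: lambda r: 0 < r["a"] <= 5,
    21: lambda r: r["a"] > 4 and r["b"] == 3,   # ID assumed
    22: lambda r: 0 > r["b"] > 4,               # as transcribed; ID assumed
    23: lambda r: r["a"] == 4 and r["b"] == 3,  # ID assumed
    24: lambda r: r["a"] == 3,
}
data_store: Dict[int, Row] = {}
results: Dict[int, Dict[int, bool]] = {}   # data_id -> {query_id: T/F}


def insert_tuple(data_id: int, row: Row) -> List[int]:
    # BUILD: remember the tuple so that later queries can probe it.
    data_store[data_id] = row
    # PROBE + INSERT: apply every stored query and record each outcome.
    results[data_id] = {qid: pred(row) for qid, pred in query_store.items()}
    return [qid for qid, ok in results[data_id].items() if ok]


# New data: ID 53, R.a = 3, R.b = 6
matches = insert_tuple(53, {"a": 3, "b": 6})
```

With these inputs the matching queries are 20 and 24, and the row for tuple 53 reads T F F F T, as on the slide.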
Query Invocation
  - Current window: given by BEGIN begin_time END end_time
  - System returns the results corresponding to the current value of the BEGIN-END clause
  [Figure: Results Structure of T/F entries indexed by Queries and Data, with a brace marking the current window]
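
Invocation then reduces to reading the precomputed entries that fall inside the current window. A minimal sketch, assuming data IDs double as sequence numbers and that the window is inclusive at both ends; the stored entries are illustrative and `invoke` is a hypothetical name.

```python
from typing import Dict, List, Tuple

# results[data_seq][query_id] -> T/F, filled in the background as data
# and queries arrive. Illustrative entries only.
results: Dict[int, Dict[int, bool]] = {
    50: {24: True},
    51: {24: False},
    52: {24: False},
    53: {24: True},
}


def invoke(query_id: int, window: Tuple[int, int]) -> List[int]:
    """Return the sequence numbers of matching tuples inside the window."""
    begin, end = window
    # No predicate is evaluated here: the window is imposed on stored T/F entries.
    return [
        seq
        for seq in sorted(results)
        if begin <= seq <= end and results[seq].get(query_id, False)
    ]
```

Calling `invoke` twice with different windows reuses the same stored entries, which is what makes repeated invocations cheap.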
Joins over R and S: Arrival of New Query Specification
  (a) Initial state: R-Data Store (ID, R.a, R.b), S-Data Store (ID, S.a, S.b), and a Query Store holding:
        R.a=5 and R.b<S.b
        R.a>4 and R.b<S.b and S.a<10
        R.b=4 and R.a+5>S.a and S.b>2
  (b) Arrival of new query: ID 23, R.a>S.a and S.b>1
  (c) Building Query Store: the new query is inserted
  (d) Probing R-Data Store: the new query is matched against the stored R tuples
  (e) Constructing hybrid structs (R.ID, Q.ID, Q.Predicate): each matching R tuple's values are substituted into the predicate, leaving a predicate over S, e.g. (Q.ID 23, 3>S.a and S.b>1) and (Q.ID 23, 4>S.a and S.b>1)
  (f) Probing S-Data Store: each hybrid struct is matched against the stored S tuples, giving (R, S, Q) results 14,21,23; 31,21,23; 31,25,23
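
The hybrid-struct step can be sketched as partial evaluation: binding one R tuple's values into the query leaves a predicate over S alone. The sketch below assumes the new query is `R.a > S.a and S.b > 1`, as the hybrid predicates on the slide suggest. The stored R and S tuples are made-up values chosen so that the output matches the slide's result triples; the slide's actual store contents are not legible. `Hybrid`, `bind_r`, and `register_join_query` are hypothetical names.

```python
from typing import Callable, Dict, List, NamedTuple, Optional, Tuple

Row = Dict[str, int]
SPredicate = Callable[[Row], bool]


class Hybrid(NamedTuple):
    r_id: int
    q_id: int
    residual: SPredicate   # query predicate with R's values substituted


def bind_r(r: Row) -> Optional[SPredicate]:
    """Query 23 (R.a > S.a and S.b > 1), partially evaluated on one R tuple."""
    # No factor mentions R alone, so every R tuple survives this probe.
    ra = r["a"]
    return lambda s: ra > s["a"] and s["b"] > 1


# Illustrative store contents (assumed, not from the slide).
r_store: Dict[int, Row] = {14: {"a": 3, "b": 0}, 31: {"a": 4, "b": 0}}
s_store: Dict[int, Row] = {21: {"a": 2, "b": 2}, 25: {"a": 3, "b": 5}}


def register_join_query(
    q_id: int, bind: Callable[[Row], Optional[SPredicate]]
) -> List[Tuple[int, int, int]]:
    # Probe the R-Data Store and build one hybrid struct per surviving R tuple.
    hybrids = []
    for r_id, r in r_store.items():
        residual = bind(r)
        if residual is not None:
            hybrids.append(Hybrid(r_id, q_id, residual))
    # Probe the S-Data Store with each hybrid struct.
    return [
        (h.r_id, s_id, h.q_id)
        for h in hybrids
        for s_id, s in s_store.items()
        if h.residual(s)
    ]


triples = register_join_query(23, bind_r)
```

Each hybrid struct is the only intermediate state: it carries the R tuple's ID and the residual predicate, so no intermediate join table is materialized.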
Joins over R and S: Arrival of New Data
  (a) Initial state: R-Data Store, S-Data Store, and a Query Store holding:
        R.a=5 and R.b<S.b
        R.a>4 and R.b<S.b and S.a<10
        R.b=4 and R.a+5>S.a and S.b>2
        R.a<4 and R.b<S.b (ID 23)
  (b) Arrival of new data: R tuple with ID 53, R.a = 5, R.b = 4
  (c) Building R-Data Store: the new tuple is inserted
  (d) Probing Query Store: the new tuple is matched against the stored queries
  (e) Constructing hybrid structs (R.ID, Q.ID, Q.Predicate): for each matching query the tuple's values are substituted, leaving predicates over S: 4<S.b; 4<S.b and S.a<10; 10>S.a and S.b>2
  (f) Probing S-Data Store: each hybrid struct is matched against the stored S tuples, giving (R, S, Q) results 53,48,22 and 53,49,22
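
For a new R tuple the same idea runs in the other direction: the tuple first filters the stored queries on their R-only factors, then the surviving residual predicates probe the S-Data Store. In this sketch each query is split by hand into an R-only factor and a residual builder, which is an assumption about representation. Query IDs 20 to 22 are inferred from the listing order, and the S tuples are made-up values chosen to be consistent with the slide's result triples. `insert_r` is a hypothetical name.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, int]
RFactor = Callable[[Row], bool]
ResidualBuilder = Callable[[Row], Callable[[Row], bool]]

# Each query: (factor over R alone, builder of the residual predicate over S).
queries: Dict[int, Tuple[RFactor, ResidualBuilder]] = {
    20: (lambda r: r["a"] == 5,
         lambda r: lambda s: r["b"] < s["b"]),
    21: (lambda r: r["a"] > 4,
         lambda r: lambda s: r["b"] < s["b"] and s["a"] < 10),
    22: (lambda r: r["b"] == 4,
         lambda r: lambda s: r["a"] + 5 > s["a"] and s["b"] > 2),
    23: (lambda r: r["a"] < 4,
         lambda r: lambda s: r["b"] < s["b"]),
}
r_store: Dict[int, Row] = {}
s_store: Dict[int, Row] = {48: {"a": 1, "b": 3}, 49: {"a": 2, "b": 4}}  # illustrative


def insert_r(r_id: int, r: Row) -> Tuple[List[int], List[Tuple[int, int, int]]]:
    r_store[r_id] = r                                   # BUILD
    hybrids = [(q_id, residual(r))                      # PROBE the Query Store
               for q_id, (r_only, residual) in queries.items()
               if r_only(r)]
    triples = [(r_id, s_id, q_id)                       # PROBE the S-Data Store
               for q_id, pred in hybrids
               for s_id, s in s_store.items()
               if pred(s)]
    return [q_id for q_id, _ in hybrids], triples


surviving, triples = insert_r(53, {"a": 5, "b": 4})
```

Query 23 is dropped at the first probe because `R.a < 4` fails for the new tuple, so it never reaches the S-Data Store.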
Other Queries
  - N-way joins
      Similar to 2-way joins
      Probe, generate hybrid structs, repeat
      Can be executed without intermediate tables
  - Aggregations
      Performed at query invocation
      Uses n-ary ranked tree, clustered on time
Telegraph Background: CACQ
  - CACQ [MSHR02]
  - Shared execution of multiple queries with one Eddy
  - Tuple lineage
  - Query indices
  - Queries and data treated very differently
  - Only landmark continuous queries
  - No support for disconnected operation
PSoup in Telegraph
  - Leverage SteMs to store and index queries
  - Changes to Eddies
  - Encode queries as tuples
      Break WHERE clause into individual boolean factors (BF)
      Encode each BF as: R.a relop [R.b|S.b] [+|-] constant
  - Stream prefix consistency: a new query or data tuple is completely processed before any other tuple, so there are no holes in the Results Structure
  - Results Structure: buffers the results
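
The boolean-factor encoding `R.a relop [R.b|S.b] [+|-] constant` can be sketched as a small record type. This is an illustrative representation, not the Telegraph data structure: `BooleanFactor`, its field names, and `where_holds` are hypothetical, and attributes are looked up by qualified name in a flat dictionary.

```python
import operator
from dataclasses import dataclass
from typing import Dict, List, Optional

RELOPS = {
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
    ">=": operator.ge,
    ">": operator.gt,
}


@dataclass(frozen=True)
class BooleanFactor:
    lhs: str                  # left attribute, e.g. "R.a"
    relop: str                # one of the keys of RELOPS
    rhs_attr: Optional[str]   # right attribute, e.g. "S.b", or None
    const: int = 0            # signed constant added to the right side

    def holds(self, row: Dict[str, int]) -> bool:
        rhs = (row[self.rhs_attr] if self.rhs_attr else 0) + self.const
        return RELOPS[self.relop](row[self.lhs], rhs)


def where_holds(factors: List[BooleanFactor], row: Dict[str, int]) -> bool:
    """A WHERE clause is the conjunction of its boolean factors."""
    return all(f.holds(row) for f in factors)


# R.b = 4 and R.a + 5 > S.a and S.b > 2, with R.a + 5 > S.a rewritten
# into the encoded form R.a > S.a - 5.
clause = [
    BooleanFactor("R.b", "=", None, 4),
    BooleanFactor("R.a", ">", "S.a", -5),
    BooleanFactor("S.b", ">", None, 2),
]
```

Storing each factor as a fixed-shape record is what lets queries be handled as tuples: factors can be indexed on their attribute, operator, and constant like any other data.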
Experiments and Results
  - Alternatives
      NoMat: no background processing
      PSoup-Partial: background processing, apply current window on invocation
      PSoup-Complete: current windows are also continuously applied in the background
  - Experimental parameters
      Unloaded server with two Intel Pentium III 666 MHz processors and 768 MB RAM
      Data arrives as fast as possible, in domain [0,255]
      Queries of form R.a relop C, where C in [0,255]
      Join queries of form R.a relop S.b +/- C
Experiments: Response Time vs. Window Size (interval predicates, selection queries) [Figure]
Experiments: Response Time vs. Window Size (equality predicates, selection queries) [Figure]
Experiments: Max data arrival rate vs. #SQCs (window size = 1000 tuples) [Figure]
PSoup in traditional query processor
  - PSoup = SQL query over data and client query streams?
  - Joins = expression evaluators
  - Notes
      Conventional QPs do not have tuple lineage
      Conventional QPs always use intermediate tables
Conclusions
  - Treating queries and data the same
      Combines approaches for previously studied queries: queries over the past and continuous queries
      Allows new functionality: hybrid queries
  - Separating result generation and delivery
      Makes disconnected operation feasible
      Efficient support for repeated query invocations