PSoup Kevin Menard CS 561 4/11/2005. Streaming Queries over Streaming Data Sirish Chandrasekaran UC Berkeley August 20, 2002 with Michael J. Franklin.

Presentation transcript:

PSoup Kevin Menard CS 561 4/11/2005

Slides are modified versions of the following original presentation: Streaming Queries over Streaming Data, Sirish Chandrasekaran with Michael J. Franklin, UC Berkeley, VLDB 2002, August 20, 2002.

PSoup Insight #1
Queries and data are duals.
Store new queries, apply them to data that arrived earlier.
Store new data, apply it to queries that arrived earlier.
Multiquery processing = “join” of queries and data.
Supports all three types of queries: queries over the past, continuous queries (landmark and sliding window), and hybrid queries.
[Diagram: Data and Queries each feed an index (Data Index, Query Index); joining the two produces the Result.]


Motivation
Why another model for continuous queries?
What is wrong with how Aurora and STREAM supply responses?

Motivation: Disconnected Operation
Previous solutions stream out answers immediately.
This is not feasible or suitable for all applications.
Intermittent connectivity: e.g., applications on hand-held devices (as in this morning’s keynote address).
Even when connected, clients are not always interested in streaming answers.

PSoup Insight #2
Separate computation from delivery.
Query answers are continuously generated in the background.
Windows are applied on demand to transmit the “current” results.
Efficient support for disconnected operation: low response time, and shared computation and storage across invocations.
[Diagram: a Data table (ID, R.a, R.b) and a Query table (ID, Predicate) feed a Results Structure of T/F entries indexed by query and data. Clients Register a query once and later Invoke it.]

PSoup Query Model
SELECT select_list FROM from_list WHERE where_clause BEGIN begin_time END end_time
WHERE clause: conjunction of boolean factors.
BEGIN-END clause: system clock or sequence numbers.
(begin_time, end_time):
(constant, constant): snapshot query
(constant, variable): landmark window query
(variable, variable): sliding window query
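As a rough illustration of the three window types on this slide, the sketch below classifies a BEGIN-END pair by whether each bound is a constant or moves with the clock. The `Bound` class, the `relative_to_now` flag, and the "unsupported" label for a (variable, constant) pair are assumptions made for this example; the slides do not show how PSoup represents windows internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bound:
    """One end of a BEGIN-END clause (hypothetical representation).

    A constant bound is an absolute time or sequence number.
    A variable bound is an offset from the current time, so it moves.
    """
    value: int
    relative_to_now: bool = False

    def resolve(self, now: int) -> int:
        # Variable bounds are evaluated against the clock at invocation time.
        return now + self.value if self.relative_to_now else self.value


def classify_window(begin: Bound, end: Bound) -> str:
    """Map (begin, end) to the query type named on the slide."""
    if not begin.relative_to_now and not end.relative_to_now:
        return "snapshot"      # (constant, constant)
    if not begin.relative_to_now and end.relative_to_now:
        return "landmark"      # (constant, variable)
    if begin.relative_to_now and end.relative_to_now:
        return "sliding"       # (variable, variable)
    return "unsupported"       # (variable, constant) is not listed on the slide


print(classify_window(Bound(10), Bound(20)))
print(classify_window(Bound(10), Bound(0, relative_to_now=True)))
print(classify_window(Bound(-5, relative_to_now=True), Bound(0, relative_to_now=True)))
```

A sliding window such as "the last 5 time units" would then be `Bound(-5, True)` to `Bound(0, True)`, resolved again at every invocation.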

Query Registration
SELECT select_list FROM from_list WHERE where_clause: the Standing Query Clause (SQC), sent to the Symmetric Join.
BEGIN begin_time END end_time: stored in the Windows_Table.
QueryID: handle for future query invocations.

Selections over Single Stream: Arrival of New Query Specification
(a) Initial state. The Data Store holds tuples (ID, R.a, R.b); the tuple values are not recoverable from the transcript. The Query Store holds (ID, Predicate) entries with the predicates: 0<R.a<=5; R.a>4 and R.b=3; 0>R.b>4; R.a=4 and R.b=3.
(b) Arrival of new query: SELECT * FROM R WHERE R.a = 3.
(c) BUILD: the new query is inserted into the Query Store as ID 24 with predicate R.a = 3.
(d) PROBE: the new query probes the Data Store and finds the matching tuples.
(e) Inserting results: the entries for the new query in the Results Structure (indexed by query and data) go from unknown (?) to T or F.

Selections over Single Stream: Arrival of New Data
(a) Initial state. The Query Store holds the predicates 0<R.a<=5; R.a>4 and R.b=3; 0>R.b>4; R.a=4 and R.b=3; and query 24, R.a = 3. The Data Store holds tuples (ID, R.a, R.b).
(b) Arrival of new data: tuple ID 53 with R.a = 3, R.b = 6.
(c) BUILD: the new tuple is inserted into the Data Store.
(d) PROBE: the new tuple probes the Query Store and finds the matching queries.
(e) Inserting results: the entries for tuple 53 in the Results Structure go from ? ? ? ? ? to T F F F T. The matching queries are 20 (0<R.a<=5) and 24 (R.a = 3).
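The two walk-throughs above are the same BUILD-then-PROBE step run in opposite directions. The sketch below is a minimal, assumption-laden model of that symmetry: `SelectionSoup`, its method names, and the dictionary-based stores are invented for illustration and are not PSoup's implementation. The query IDs 20 to 24 and the reading of the new tuple as ID 53 with R.a = 3, R.b = 6 are inferred from the slide residue.

```python
import operator
from typing import Dict, List, Tuple

OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
       ">=": operator.ge, "=": operator.eq}

Factor = Tuple[str, str, int]   # (attribute, relop, constant)
Row = Dict[str, int]


class SelectionSoup:
    """Toy query store + data store + results structure (hypothetical)."""

    def __init__(self) -> None:
        self.data: Dict[int, Row] = {}
        self.queries: Dict[int, List[Factor]] = {}
        self.results: Dict[Tuple[int, int], bool] = {}  # (query id, data id) -> T/F

    @staticmethod
    def _satisfies(row: Row, factors: List[Factor]) -> bool:
        # A predicate is a conjunction of boolean factors.
        return all(OPS[op](row[attr], const) for attr, op, const in factors)

    def add_query(self, qid: int, factors: List[Factor]) -> None:
        self.queries[qid] = factors                  # BUILD into the query store
        for did, row in self.data.items():           # PROBE the data store
            self.results[(qid, did)] = self._satisfies(row, factors)

    def add_data(self, did: int, row: Row) -> None:
        self.data[did] = row                         # BUILD into the data store
        for qid, factors in self.queries.items():    # PROBE the query store
            self.results[(qid, did)] = self._satisfies(row, factors)

    def row_for(self, did: int) -> str:
        # One row of the results structure, ordered by query id.
        return "".join("T" if self.results[(qid, did)] else "F"
                       for qid in sorted(self.queries))


soup = SelectionSoup()
soup.add_query(20, [("a", ">", 0), ("a", "<=", 5)])
soup.add_query(21, [("a", ">", 4), ("b", "=", 3)])
soup.add_query(22, [("b", "<", 0), ("b", ">", 4)])   # the slide shows 0>R.b>4
soup.add_query(23, [("a", "=", 4), ("b", "=", 3)])
soup.add_data(53, {"a": 3, "b": 6})
soup.add_query(24, [("a", "=", 3)])   # registered after the data, still sees tuple 53
print(soup.row_for(53))
```

Here query 24 is deliberately registered after tuple 53 (the slides do it the other way round) to show that the order of arrival does not change the row that ends up in the results structure.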

Query Invocation
BEGIN begin_time END end_time
The system returns the results corresponding to the current value of the BEGIN-END clause.
[Diagram: the Results Structure, indexed by query and data and including the T F F F T row for tuple 53, with the current window bracketed.]
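One way to picture invocation is as a range lookup over matches that were already computed in the background. The sketch below assumes that matches for a single query are buffered in timestamp order; `ResultBuffer`, `record_match`, and the sample timestamps are hypothetical and only illustrate applying a window on demand.

```python
from bisect import bisect_left, bisect_right
from typing import List


class ResultBuffer:
    """Matches for one standing query, kept in arrival (timestamp) order."""

    def __init__(self) -> None:
        self._times: List[int] = []
        self._ids: List[int] = []

    def record_match(self, timestamp: int, data_id: int) -> None:
        # Called in the background whenever a data tuple satisfies the query.
        if self._times and timestamp < self._times[-1]:
            raise ValueError("matches must be recorded in timestamp order")
        self._times.append(timestamp)
        self._ids.append(data_id)

    def invoke(self, begin: int, end: int) -> List[int]:
        # Apply the current BEGIN-END window: two binary searches, no re-evaluation.
        lo = bisect_left(self._times, begin)
        hi = bisect_right(self._times, end)
        return self._ids[lo:hi]


buf = ResultBuffer()
for t, d in [(1, 50), (2, 51), (5, 52), (9, 53)]:
    buf.record_match(t, d)

now = 9
print(buf.invoke(now - 5, now))   # sliding window over the last 5 time units
print(buf.invoke(0, now))         # landmark window starting at time 0
```

Because the predicate work is done at arrival time, the cost of an invocation in this sketch depends only on locating the window boundaries and copying out the answer.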

Joins over R and S: Arrival of New Query Specification
(a) Initial state. The Query Store holds (ID, Predicate) entries with the predicates: R.a=5 and R.b<S.b; R.a>4 and R.b<S.b and S.a<10; R.b=4 and R.a+5>S.a and S.b>2. The R-Data Store holds tuples (ID, R.a, R.b) and the S-Data Store holds tuples (ID, S.a, S.b); the tuple values are not recoverable from the transcript.
(b) Arrival of new query, ID 23. Its predicate survives only in part: "R.a ... S.a and S.b>1".
(c) BUILD: the new query is inserted into the Query Store.
(d) PROBE: the new query probes the R-Data Store and finds the matching R tuples.
(e) Constructing hybrid structs (R.ID, Q.ID, Q.Predicate): each matching R tuple is paired with query 23 and the part of the predicate that still depends on S, e.g. 3>S.a and S.b>1, and 4>S.a and S.b>1.
(f) PROBE: the hybrid structs probe the S-Data Store. (R, S, Q) results: (14, 21, 23), (31, 21, 23), (31, 25, 23).

Joins over R and S: Arrival of New Data
(a) Initial state. The Query Store holds the predicates R.a=5 and R.b<S.b; R.a>4 and R.b<S.b and S.a<10; R.b=4 and R.a+5>S.a and S.b>2; and query 23, R.a<4 and R.b<S.b. The R-Data Store holds tuples (ID, R.a, R.b) and the S-Data Store holds tuples (ID, S.a, S.b).
(b) Arrival of new data: R tuple ID 53 with R.a = 5, R.b = 4.
(c) BUILD: the new tuple is inserted into the R-Data Store.
(d) PROBE: the new tuple probes the Query Store. It matches the first three queries; query 23 does not match, since R.a<4 fails.
(e) Constructing hybrid structs (R.ID, Q.ID, Q.Predicate) for R tuple 53: R.a=5 and R.b<S.b leaves 4<S.b; query 21 leaves 4<S.b and S.a<10; query 22 leaves 10>S.a and S.b>2.
(f) PROBE: the hybrid structs probe the S-Data Store. (R, S, Q) results: (53, 48, 22), (53, 49, 22).
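The hybrid-struct step can be read as partial evaluation: once an R tuple is known, the R-only factors are checked and the rest of the predicate is kept as a residual test over S. The sketch below follows that reading. `JoinQuery`, `HybridStruct`, and `on_new_r_tuple` are hypothetical names, the query IDs 20 to 23 are an assumed numbering, and the S tuples are invented (their values are lost in the transcript) so that the output reproduces the (53, 48, 22) and (53, 49, 22) results shown on the slide.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Dict[str, int]


@dataclass
class JoinQuery:
    r_factors: List[Callable[[Row], bool]]        # factors that mention only R
    rs_factors: List[Callable[[Row, Row], bool]]  # factors that mention S


@dataclass
class HybridStruct:
    r_id: int
    q_id: int
    residual: Callable[[Row], bool]  # what is left of the predicate, over S only


def on_new_r_tuple(r_id: int, r: Row, queries: Dict[int, JoinQuery],
                   s_store: Dict[int, Row]):
    # PROBE the query store: keep queries whose R-only factors accept the tuple,
    # and bind r into the remaining factors to form a hybrid struct.
    hybrids: List[HybridStruct] = []
    for q_id, q in queries.items():
        if all(f(r) for f in q.r_factors):
            residual = lambda s, q=q: all(f(r, s) for f in q.rs_factors)
            hybrids.append(HybridStruct(r_id, q_id, residual))
    # PROBE the S-data store with each hybrid struct.
    results: List[Tuple[int, int, int]] = [
        (h.r_id, s_id, h.q_id)
        for h in hybrids
        for s_id, s in s_store.items()
        if h.residual(s)
    ]
    return hybrids, results


queries = {
    20: JoinQuery([lambda r: r["a"] == 5], [lambda r, s: r["b"] < s["b"]]),
    21: JoinQuery([lambda r: r["a"] > 4],
                  [lambda r, s: r["b"] < s["b"], lambda r, s: s["a"] < 10]),
    22: JoinQuery([lambda r: r["b"] == 4],
                  [lambda r, s: r["a"] + 5 > s["a"], lambda r, s: s["b"] > 2]),
    23: JoinQuery([lambda r: r["a"] < 4], [lambda r, s: r["b"] < s["b"]]),
}
# Invented S tuples (assumption): the slide's S values did not survive.
s_store = {47: {"a": 12, "b": 1}, 48: {"a": 3, "b": 3}, 49: {"a": 7, "b": 4}}

hybrids, results = on_new_r_tuple(53, {"a": 5, "b": 4}, queries, s_store)
print([h.q_id for h in hybrids])
print(results)
```

The arrival of a new join query is the mirror image: the query probes the R-Data Store first, and the resulting hybrid structs then probe the S-Data Store in the same way.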

Other Queries
N-way joins: similar to 2-way joins. Probe, generate hybrid structs, repeat. Can be executed without intermediate tables.
Aggregations: performed at query invocation. Uses an n-ary ranked tree, clustered on time.

Telegraph Background: CACQ
CACQ [MSHR02]
Shared execution of multiple queries with one Eddy.
Tuple lineage.
Query indices.
Queries and data treated very differently.
Only landmark continuous queries.
No support for disconnected operation.

PSoup in Telegraph
Leverage SteMs to store and index queries.
Changes to Eddies.
Encode queries as tuples: break the WHERE clause into individual boolean factors (BF), and encode each BF as R.a relop [R.b|S.b] [+|-] constant.
Stream prefix consistency: a new query or data tuple is completely processed before any other tuple, so there are no holes in the Results Structure.
Results Structure: buffers the results.
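To make the "queries as tuples" idea concrete, the sketch below stores each boolean factor as a small fixed-shape record following the slide's pattern `R.a relop [R.b|S.b] [+|-] constant`. The `BooleanFactor` field names are assumptions, and allowing an S attribute on the left-hand side is a generalization made here so that `R.a + 5 > S.a` can be written as `S.a < R.a + 5`; the slides do not say how PSoup normalizes such factors.

```python
import operator
from dataclasses import dataclass
from typing import Dict, List, Optional

RELOPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt,
          ">=": operator.ge, "=": operator.eq}


@dataclass(frozen=True)
class BooleanFactor:
    left: str                     # attribute on the left, e.g. "R.a"
    relop: str                    # one of the keys of RELOPS
    right: Optional[str] = None   # optional attribute on the right, e.g. "S.b"
    constant: int = 0             # signed, so it covers both "+ c" and "- c"

    def holds(self, env: Dict[str, int]) -> bool:
        rhs = (env[self.right] if self.right is not None else 0) + self.constant
        return RELOPS[self.relop](env[self.left], rhs)


def query_holds(factors: List[BooleanFactor], env: Dict[str, int]) -> bool:
    # A WHERE clause is a conjunction of its boolean factors.
    return all(f.holds(env) for f in factors)


# R.b = 4 and R.a + 5 > S.a and S.b > 2, one record per boolean factor.
query = [
    BooleanFactor("R.b", "=", None, 4),
    BooleanFactor("S.a", "<", "R.a", 5),   # R.a + 5 > S.a  <=>  S.a < R.a + 5
    BooleanFactor("S.b", ">", None, 2),
]

print(query_holds(query, {"R.a": 5, "R.b": 4, "S.a": 7, "S.b": 3}))
print(query_holds(query, {"R.a": 5, "R.b": 4, "S.a": 12, "S.b": 3}))
```

Because every factor has the same shape, a collection of them can be stored and indexed like ordinary data tuples, which is the property the slide relies on.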

Experiments and Results
Alternatives:
NoMat: no background processing.
PSoup-Partial: background processing; the current window is applied on invocation.
PSoup-Complete: current windows are also continuously applied in the background.
Experimental parameters:
Unloaded server with two 666 MHz Intel Pentium III processors and 768 MB RAM.
Data arrives as fast as possible, with values in the domain [0, 255].
Queries of the form R.a relop C, where C is in [0, 255].
Join queries of the form R.a relop S.b +/- C.

Experiments: Response Time vs. Window Size (interval predicates, selection queries). [Plot]

Experiments: Response Time vs. Window Size (equality predicates, selection queries). [Plot]

Experiments: Max Data Arrival Rate vs. Number of SQCs (window size = 1000 tuples). [Plot]

PSoup in a Traditional Query Processor
PSoup = SQL query over data and client query streams?
Joins = expression evaluators.
Notes: conventional QPs do not have tuple lineage; conventional QPs always use intermediate tables.

Conclusions
Treating queries and data the same:
Combines approaches for previously studied queries (queries over the past and continuous queries).
Allows new functionality: hybrid queries.
Separating result generation and delivery:
Makes disconnected operation feasible.
Efficient support for repeated query invocations.