Semantics and Evaluation Techniques for Window Aggregates in Data Streams Jin Li, David Maier, Kristin Tufte, Vassilis Papadimos, Peter A. Tucker SIGMOD.

Presentation transcript:

Semantics and Evaluation Techniques for Window Aggregates in Data Streams Jin Li, David Maier, Kristin Tufte, Vassilis Papadimos, Peter A. Tucker SIGMOD 2005

Introduction: Window aggregation is an important query capability. Evaluating window aggregate queries over streams is non-trivial because of overlapping windows, confusion of the window definition with the physical stream, and out-of-order data arrival. …

Techniques: Window-ID (WID) addresses overlapping windows and the confusion of the window definition with the physical stream. Punctuation addresses out-of-order data arrival.

Example 1: Q1: SELECT seg-id, max(speed), min(speed) FROM Traffic [RANGE 300 seconds SLIDE 60 seconds WATTR ts] GROUP BY seg-id

Example 1 tuple

Window Semantics: Window semantics has often been described operationally. Example: some window query operators process window extents sequentially, but data does not necessarily arrive in window-extent order.

Window Specification: A window specification is a window type and a set of parameters that define a window to be used by a query (ex: RANGE, SLIDE and WATTR in Q1). Different window aggregate queries have different window specifications. Sliding window aggregate query. Stream query: data-driven, domain-driven.

Window Specification: Similar to CQL (Continuous Query Language). Difference: user-specified WATTR and SLIDE parameters.

Sliding Window Aggregate: Time-based: Q1. Row-based: RANGE and SLIDE are different attributes:

Sliding Window Aggregate: Partitioned Window Aggregate: Using function: a variation of Q3

Window Semantic Framework: Three functions map between window-ids and tuples in both directions: windows, extent and wids. T: a set of tuples. S: a window specification. windows(T, S): the set of window-ids that identify window extents to which tuples in T may belong. extent(w, T, S): the set of tuples in T belonging to the window extent identified by w.
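The two functions above can be sketched for a time-based sliding window such as the one in Q1. This is a minimal illustration, not the definitions from the slides: it assumes integer WATTR values, non-negative integer window-ids, and that window w covers the half-open interval [(w+1)*SLIDE - RANGE, (w+1)*SLIDE). The names Spec, range_, slide and the "ts" key are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    """Hypothetical window specification S (RANGE and SLIDE on the same WATTR)."""
    range_: int  # RANGE, in WATTR units (e.g. seconds)
    slide: int   # SLIDE, in WATTR units


def extent(w, tuples, spec, wattr=lambda t: t["ts"]):
    """extent(w, T, S): tuples of T that fall in the window extent identified by w.

    Assumed convention: window w covers [(w+1)*SLIDE - RANGE, (w+1)*SLIDE).
    """
    hi = (w + 1) * spec.slide
    lo = hi - spec.range_
    return [t for t in tuples if lo <= wattr(t) < hi]


def windows(tuples, spec, wattr=lambda t: t["ts"]):
    """windows(T, S): window-ids whose extents may contain some tuple of T."""
    ids = set()
    for t in tuples:
        v = wattr(t)
        first = max(v // spec.slide, 0)               # first window ending after v
        last = (v + spec.range_) // spec.slide - 1    # last window starting at or before v
        ids.update(range(first, last + 1))
    return ids


if __name__ == "__main__":
    spec = Spec(range_=300, slide=60)   # RANGE 300 seconds, SLIDE 60 seconds
    T = [{"ts": 10}, {"ts": 130}]
    print(sorted(windows(T, spec)))     # window-ids touched by T
    print(extent(3, T, spec))           # tuples in window 3
```

With RANGE 300 and SLIDE 60, each tuple falls into RANGE / SLIDE = 5 overlapping extents under this convention, which is the overlap the slides refer to.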

windows, extent: Queries in which RANGE and SLIDE are specified on the WATTR attribute: slide-by-tuple:

slide-by-n_tuples: slide-by-n_tuples over logical order: partitioned tuple-based:

Mapping Tuples to Window-ids: wids: function for identifying the window extents to which a tuple t belongs. Queries in which RANGE and SLIDE are specified on the WATTR attribute: slide-by-tuple (and variations):
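For the case where RANGE and SLIDE are both specified on the WATTR attribute, a wids function might look like the sketch below. It is an assumption-laden illustration rather than the formula from the slides: WATTR values are taken to be non-negative integers, and window w is assumed to cover [(w+1)*SLIDE - RANGE, (w+1)*SLIDE). The helper in_extent is hypothetical and only serves to check that wids is the inverse of the extent membership test.

```python
def wids(ts, range_, slide):
    """Window-ids of all extents that contain a tuple whose WATTR value is ts.

    Assumed convention: window w covers [(w+1)*slide - range_, (w+1)*slide).
    """
    first = max(ts // slide, 0)            # smallest w with (w+1)*slide > ts
    last = (ts + range_) // slide - 1      # largest w with (w+1)*slide - range_ <= ts
    return range(first, last + 1)


def in_extent(w, ts, range_, slide):
    """Hypothetical helper: does a WATTR value ts fall in the extent of window w?"""
    hi = (w + 1) * slide
    return hi - range_ <= ts < hi


if __name__ == "__main__":
    # Sliding window as in Q1: RANGE 300, SLIDE 60.
    print(list(wids(130, 300, 60)))
    # Tumbling window (RANGE == SLIDE): exactly one window-id per tuple.
    print(list(wids(130, 60, 60)))
```

Because wids depends only on the tuple's own WATTR value and the window specification, it can be applied to each tuple on arrival, regardless of arrival order.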

Partitioned tuple-based: r = rank(t, row-num, PATTR, T)

Towards Window Query Evaluation: Backward-context: given a tuple t, its backward-context is information about tuples that have arrived before t (ex: partitioned tuple-based window). Forward-context: given a tuple t, its forward-context is information about tuples that arrive after t (ex: slide-by-tuple). FCF (forward-context free). FCA (forward-context aware).

Disorder: Causes: merging unsynchronized streams, network delays. ex: network flows sometimes use the start time as the timestamp. Methods: slack, BSort, heartbeats.
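One way to picture a buffering method for disorder is a bounded reorder buffer. The sketch below is a generic illustration in the spirit of slack, not the definition used by any particular system: it assumes the stream is a sequence of (ts, payload) pairs, holds back at most `slack` tuples, and drops any tuple that arrives with a timestamp older than one already emitted. The function name and parameters are hypothetical.

```python
import heapq


def reorder_with_slack(stream, slack):
    """Yield (ts, payload) pairs in timestamp order using a bounded buffer.

    At most `slack` tuples are held back. A tuple whose timestamp is older
    than the last emitted one is considered too late and is dropped.
    """
    heap = []            # min-heap ordered by (ts, arrival sequence number)
    last_emitted = None
    seq = 0
    for ts, payload in stream:
        if last_emitted is not None and ts < last_emitted:
            continue     # arrived too late to be placed in order
        heapq.heappush(heap, (ts, seq, payload))
        seq += 1
        if len(heap) > slack:
            ts_out, _, out = heapq.heappop(heap)
            last_emitted = ts_out
            yield ts_out, out
    while heap:          # end of stream: flush whatever is still buffered
        ts_out, _, out = heapq.heappop(heap)
        yield ts_out, out


if __name__ == "__main__":
    arrivals = [(1, "a"), (3, "b"), (2, "c"), (5, "d"), (4, "e"), (0, "late")]
    print(list(reorder_with_slack(arrivals, slack=2)))
```

The trade-off is visible in the parameter: a larger buffer tolerates more disorder but delays every result and uses more memory, which is the cost a buffering baseline pays compared with an approach that never sorts tuples.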

FCF Window with WID Approach: Punctuation: a message embedded in a data stream indicating that a certain subset of data is complete. WID uses punctuations to signal the end of window extents. Diagram labels: wids function, punctuation.
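The idea of tagging tuples with window-ids and closing extents on punctuation can be sketched for Q1 as follows. This is a simplified illustration under stated assumptions, not the operator described in the slides: the stream is a list of ("tuple", seg_id, ts, speed) and ("punct", ts) items, a punctuation ("punct", p) is taken to mean that every tuple with timestamp below p has arrived, and window w is assumed to cover [(w+1)*SLIDE - RANGE, (w+1)*SLIDE). All names are hypothetical.

```python
RANGE, SLIDE = 300, 60   # as in Q1: RANGE 300 seconds, SLIDE 60 seconds


def wids(ts):
    """Window-ids of the extents containing timestamp ts (assumed convention)."""
    first = max(ts // SLIDE, 0)
    last = (ts + RANGE) // SLIDE - 1
    return range(first, last + 1)


def wid_aggregate(stream):
    """Compute max(speed), min(speed) per (window-id, seg-id) without sorting.

    Tuples update one partial aggregate per window-id they belong to.
    A punctuation at time p closes every window that ends at or before p.
    Returns a list of (window_id, seg_id, max_speed, min_speed).
    """
    state = {}      # (window_id, seg_id) -> [max_speed, min_speed]
    results = []
    for item in stream:
        if item[0] == "tuple":
            _, seg, ts, speed = item
            for w in wids(ts):
                acc = state.get((w, seg))
                if acc is None:
                    state[(w, seg)] = [speed, speed]
                else:
                    acc[0] = max(acc[0], speed)
                    acc[1] = min(acc[1], speed)
        else:
            _, p = item
            closed = sorted(k for k in state if (k[0] + 1) * SLIDE <= p)
            for k in closed:
                mx, mn = state.pop(k)
                results.append((k[0], k[1], mx, mn))
    return results


if __name__ == "__main__":
    stream = [
        ("tuple", "A", 130, 50),
        ("tuple", "A", 70, 40),    # arrives out of order
        ("tuple", "B", 100, 60),
        ("punct", 180),            # all tuples with ts < 180 have arrived
    ]
    for row in wid_aggregate(stream):
        print(row)
```

Note that the out-of-order tuple needs no special handling: it updates the partial aggregates of its own window-ids, and results are released only when a punctuation says the extent is complete.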

FCA Windows with WID Approach: FCB (forward-context bounded). FCU (forward-context unbounded).

Performance: Environment: Data generator: XMark data generator, and network analysis tool. 1. Data in generated order. 2. Data in bounded disorder. 3. Data in block-sorted disorder. Comparison: buffering mechanism.

Parameters: R: RANGE. S: SLIDE.

Result: WID vs. Buffering

Result

Conclusion