Semantics and Evaluation Techniques for Window Aggregates in Data Streams Jin Li, David Maier, Kristin Tufte, Vassilis Papadimos, Peter A. Tucker SIGMOD.


1 Semantics and Evaluation Techniques for Window Aggregates in Data Streams Jin Li, David Maier, Kristin Tufte, Vassilis Papadimos, Peter A. Tucker SIGMOD 2005

2 Introduction Window aggregation is an important query capability. Evaluating window aggregate queries over streams is non-trivial: overlapping window extents; confusion of the window definition with the physical stream; out-of-order data arrival; …

3 Techniques Window-ID (WID): handles overlapping windows and the confusion of the window definition with the physical stream. Punctuation: handles out-of-order data arrival.

4 Example 1 Q1: SELECT seg-id, max(speed), min(speed) FROM Traffic [RANGE 300 seconds SLIDE 60 seconds WATTR ts] GROUP BY seg-id

5 Example 1 tuple

6 Window Semantics Window semantics has often been described operationally. Example: some window query operators process window extents sequentially, but data does not always arrive in window-extent order.

7 Window Specification Window specification: a window type and a set of parameters that define the window to be used by a query, e.g. RANGE, SLIDE and WATTR in Q1. Different window aggregate queries have different window specifications. Sliding window aggregate query. Stream query: data-driven; domain-driven.

8 Window Specification Similar to CQL (Continuous Query Language). Difference: user-specified WATTR and SLIDE parameters.

9 Sliding Window Aggregate Time-based: Q1 Row-based: RANGE and SLIDE are different attributes:

10 Sliding Window Aggregate Partitioned Window Aggregate: Using function: a variation of Q3

11 Window Semantic Framework Three functions for mapping between window-ids and tuples in both directions: windows, extent and wids. T: a set of tuples. S: a window specification. windows(T,S): the set of window-ids that identify the window extents to which tuples in T may belong. extent(w,T,S): the set of tuples in T belonging to the window extent identified by w.
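The formulas for these functions did not survive in the transcript, so the following is only a minimal sketch for a time-based sliding window. It assumes integer WATTR values and assumes that window w covers the half-open interval [(w+1)*SLIDE - RANGE, (w+1)*SLIDE); that convention, the `WindowSpec` type and the `wattr` parameter are illustrative choices, not definitions taken from the slides.

```python
from typing import NamedTuple


class WindowSpec(NamedTuple):
    """Hypothetical container for a time-based window specification."""
    range_: int  # RANGE, in units of the windowing attribute (e.g. seconds)
    slide: int   # SLIDE, in the same units


def extent(w, tuples, spec, wattr="ts"):
    """Tuples of T that fall in the extent of window-id w.

    Assumed convention: window w covers [(w+1)*SLIDE - RANGE, (w+1)*SLIDE).
    """
    hi = (w + 1) * spec.slide
    lo = hi - spec.range_
    return [t for t in tuples if lo <= t[wattr] < hi]


def windows(tuples, spec, wattr="ts"):
    """Window-ids whose extents may contain at least one tuple of T."""
    ids = set()
    for t in tuples:
        v = t[wattr]
        # Under the assumed convention a tuple with WATTR value v belongs to
        # windows v // SLIDE through (v + RANGE) // SLIDE - 1, inclusive.
        ids.update(range(v // spec.slide, (v + spec.range_) // spec.slide))
    return ids
```

With RANGE 300 and SLIDE 60 as in Q1, each tuple falls into five overlapping extents, which is the overlap the window-id mapping is meant to make explicit.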

12 windows, extent Queries in which RANGE and SLIDE are specified on the WATTR attribute: slide-by-tuple:

13 slide-by-n_tuples: slide-by-n_tuples over logical order: partitioned tuple-based:

14 Mapping Tuples to Window-ids wids: function identifying the window extents to which tuple t belongs. Queries in which RANGE and SLIDE are specified on the WATTR attribute: slide-by-tuple (and variations):
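As an illustration of the tuple-to-window-id direction, here is a small sketch of a wids-style function for the case where RANGE and SLIDE are both specified on WATTR. It assumes integer attribute values and the convention that window w covers [(w+1)*SLIDE - RANGE, (w+1)*SLIDE); the function name and signature are hypothetical, since the slide's own formula is not present in the transcript.

```python
def wids(wattr_value, range_, slide):
    """Window-ids whose extent contains a tuple with the given WATTR value.

    Assumed convention: window w covers [(w+1)*slide - range_, (w+1)*slide),
    so the tuple belongs to w exactly when
    wattr_value // slide <= w <= (wattr_value + range_) // slide - 1.
    """
    return list(range(wattr_value // slide, (wattr_value + range_) // slide))
```

Note that the result depends only on the tuple itself and the window specification, not on any other tuple in the stream, which is why this case needs no knowledge of arrival order.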

15 Partitioned tuple-based: r=rank(t,row-num,PATTR,T)
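The slide gives only the signature of rank. One plausible reading, sketched below, is that r is the 1-based position of t among the tuples of T that share t's partitioning attribute (PATTR) value, ordered by row-num. The dictionary field names `row_num` and `seg_id` are hypothetical stand-ins for row-num and PATTR.

```python
def rank(t, tuples, order_attr="row_num", pattr="seg_id"):
    """1-based rank of t within its partition, ordered by order_attr.

    Counts the tuples of the same partition whose order_attr value is
    less than or equal to t's (assumes order_attr values are distinct).
    """
    return sum(
        1
        for u in tuples
        if u[pattr] == t[pattr] and u[order_attr] <= t[order_attr]
    )
```

Computing this rank requires knowing which tuples of the same partition precede t, which is information about earlier tuples rather than something derivable from t alone.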

16 Towards Window Query Evaluation Backward-context: given a tuple t, its backward-context is information about tuples that have arrived before t. ex: partitioned tuple-based window. Forward-context: given a tuple t, its forward-context is information about tuples that arrive after t. ex: slide-by-tuple. FCF (forward-context free). FCA (forward-context aware).

17 Disorder Causes: merging unsynchronized streams, network delays. ex: network flows sometimes use the start time as the timestamp. Methods: slack, BSort, heartbeats.
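To make the buffering style of disorder handling concrete, the sketch below shows a generic bounded reorder buffer in the spirit of slack-based approaches. It is an assumption-laden illustration, not the exact mechanism of any particular system: it holds up to `slack` tuples in a min-heap, releases the smallest timestamp once the buffer overflows, and drops tuples that arrive after a later timestamp has already been released. The name `reorder_with_slack` and the `ts` field are hypothetical.

```python
import heapq
from itertools import count


def reorder_with_slack(stream, slack, key=lambda t: t["ts"]):
    """Yield tuples in timestamp order, tolerating bounded disorder."""
    heap = []
    tie = count()          # tie-breaker so tuples themselves are never compared
    released = None        # largest timestamp released so far
    for t in stream:
        k = key(t)
        if released is not None and k < released:
            continue       # arrived too late to be placed in order: dropped
        heapq.heappush(heap, (k, next(tie), t))
        if len(heap) > slack:
            released, _, out = heapq.heappop(heap)
            yield out
    while heap:            # end of input: flush what is still buffered
        _, _, out = heapq.heappop(heap)
        yield out
```

The cost of this style is that every tuple is held back until the buffer overflows, whether or not the stream is actually disordered at that point.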

18 FCF Window with WID Approach Punctuation: a message embedded in a data stream indicating that a certain subset of data is complete. WID uses punctuations to signal the end of window extents. [Figure: wids function, punctuation]
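The idea on this slide can be sketched roughly as follows for a query shaped like Q1 (max and min of speed per seg-id). Each arriving tuple is tagged with its window-ids and folded into per-(window-id, group) partial aggregates, and a punctuation stating that all tuples with ts below some bound have arrived lets the operator emit and purge every window that ends at or before that bound. The stream encoding (`"tuple"` and `"punct"` records), the function name, and the window convention [(w+1)*SLIDE - RANGE, (w+1)*SLIDE) are assumptions made for illustration, not details confirmed by the slides.

```python
def wid_aggregate(stream, range_, slide):
    """Window-id based max/min aggregation driven by punctuations.

    stream yields ("tuple", seg_id, speed, ts) or ("punct", bound), where a
    punctuation promises that no later tuple has ts < bound.
    Yields (window_id, seg_id, max_speed, min_speed) for completed windows.
    """
    state = {}  # (window_id, seg_id) -> (max_speed, min_speed)
    for item in stream:
        if item[0] == "tuple":
            _, seg, speed, ts = item
            # Tag the tuple with every window-id it belongs to and update
            # that window's partial aggregate; arrival order is irrelevant.
            for w in range(ts // slide, (ts + range_) // slide):
                hi, lo = state.get((w, seg), (speed, speed))
                state[(w, seg)] = (max(hi, speed), min(lo, speed))
        else:
            _, bound = item
            # Window w ends at (w+1)*slide; it is complete once bound >= end.
            done = sorted(k for k in state if (k[0] + 1) * slide <= bound)
            for k in done:
                hi, lo = state.pop(k)
                yield (k[0], k[1], hi, lo)


if __name__ == "__main__":
    demo = [
        ("tuple", "A", 50, 70),
        ("tuple", "A", 40, 10),   # arrives out of order
        ("tuple", "B", 30, 65),
        ("punct", 120),
        ("tuple", "A", 60, 130),
        ("punct", 240),
    ]
    for row in wid_aggregate(demo, range_=120, slide=60):
        print(row)
```

In this sketch no raw tuples are buffered and no reordering is done: out-of-order tuples simply update the partial aggregate of whichever window-ids they map to, and state is released only when a punctuation closes a window.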

19 FCA Windows with WID Approach FCB (forward-context bounded). FCU (forward-context unbounded).

20 Performance Environment: Data generator: XMark data generator, and network analysis tool. 1. data in generated order. 2. data in bounded disorder. 3. data in block-sorted disorder. Comparison: buffering mechanism.

21 Parameters R: RANGE; S: SLIDE

22 Result WID vs. Buffering

23 Result

24 Conclusion

