
1 Out of Order Processing for Stream Query Evaluation
Jin Li (Portland State University)
Joint work with Theodore Johnson, Vladislav Shkapenyuk, David Maier, and Kristin Tufte

2 Stream (monitoring) applications
Network packets, transportation data, sensor data, stock quotes …
- Process data online
- Often require (near) real-time response
- Often involve data from multiple sources
- Have no control over the physical properties of the data
Challenges for database-style support of stream applications
- Continuous query vs. one-time query, e.g. “count the number of packets in the past minute”
- Adapt traditional query operators to data streams; requires a time attribute
[Figure: sources A, B, and C feed a Stream Query System; tuple schema (srcIP, dstIP, len, ts)]

3 Stream Query – Windowed Aggregation
Query 1:
  SELECT tb, srcIP, destIP, count(*)
  FROM A union B union C
  GROUP BY ts/60 as tb, srcIP, destIP
“Count the number of packets in each minute; update the result every minute”
[Figure: query plan with sources A, B, C feeding union, then count(*) group by tb, srcIP, destIP; a ts axis from 0 to 10 is divided into windows]
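The grouping in Query 1 can be illustrated with a minimal Python sketch. It only shows the bucket arithmetic (tb = ts // 60) over a finite list, not a continuous query. The function name `windowed_count`, the integer-second timestamps, and the sample packets are assumptions made for illustration.

```python
from collections import Counter

def windowed_count(tuples, width=60):
    """Count tuples per (tb, srcIP, destIP), where tb = ts // width."""
    counts = Counter()
    for src_ip, dest_ip, _length, ts in tuples:
        counts[(ts // width, src_ip, dest_ip)] += 1
    return counts

# Hypothetical packets in the slide's (srcIP, dstIP, len, ts) layout.
packets = [
    ("10.0.0.1", "10.0.0.2", 64, 5),
    ("10.0.0.1", "10.0.0.2", 32, 59),
    ("10.0.0.1", "10.0.0.2", 64, 60),
    ("10.0.0.3", "10.0.0.2", 48, 61),
]
result = windowed_count(packets)
```

Timestamps 5 and 59 fall in window 0, while 60 and 61 fall in window 1, so the first group is counted twice in window 0 and once in window 1.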

4 IOP Evaluation – Current Approach
- Merge is an order-preserving implementation of UnionAll
- How to determine end of windows?
Query 1:
  SELECT tb, srcIP, destIP, count(*)
  FROM A union B union C
  GROUP BY ts/60 as tb, srcIP, destIP
[Figure: query plan with sources A, B, C; a sort operator and two merge operators feed count(*) group by tb, srcIP, destIP; a ts axis from 0 to 10 is divided into windows]
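One way to picture in-order evaluation is the sketch below: an order-preserving merge feeds the aggregate, and the aggregate treats the first tuple of a later window as the signal that the current window has ended. This is an illustrative Python sketch under stated assumptions, not the Gigascope implementation; `iop_count`, the integer-second timestamps, and the sample inputs are hypothetical, and each input is assumed to be already sorted by ts.

```python
import heapq
from collections import Counter

def iop_count(sources, width=60):
    """Yield (tb, srcIP, destIP, count) rows, one window at a time.

    Each source must already be sorted by ts (index 3 of each tuple).
    """
    merged = heapq.merge(*sources, key=lambda t: t[3])  # order-preserving union
    current_tb, counts = None, Counter()
    for src_ip, dest_ip, _length, ts in merged:
        tb = ts // width
        if current_tb is not None and tb > current_tb:
            # A tuple from a later window proves the current window is complete.
            for (s, d), n in sorted(counts.items()):
                yield (current_tb, s, d, n)
            counts.clear()
        current_tb = tb
        counts[(src_ip, dest_ip)] += 1
    for (s, d), n in sorted(counts.items()):  # flush at end of finite input
        yield (current_tb, s, d, n)

# Hypothetical inputs, each sorted by ts.
a = [("s1", "d1", 64, 10), ("s1", "d1", 64, 70)]
b = [("s1", "d1", 32, 20), ("s2", "d1", 32, 65)]
rows = list(iop_count([a, b]))
```

On live streams the merge cannot release a tuple until every input has offered its next one, so a slow or skewed input holds back the others.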

5 Problems
Performance penalty
- Burst
  - Caused by maintaining stream order
  - May overload the stream system
- Memory overhead
  - Sort
  - Order-preserving merge: time skew, lulls
- Latency
  - Maintaining stream order delays tuple processing
[Figure: query plan with sources A, B, C; sort and merge feed count(*) group by tb, srcIP, destIP]

6 Do We Have to Maintain Stream Order?

7 Stream Query – Windowed Aggregation
Query 1:
  SELECT tb, srcIP, destIP, count(*)
  FROM A union B union C
  GROUP BY ts/60 as tb, srcIP, destIP
Stream query evaluation essentially requires information on stream progress
[Figure: query plan with sources A, B, C feeding union, then count(*) group by tb, srcIP, destIP; a ts axis from 0 to 10 is divided into windows]

8 Outline
- Stream query and existing evaluation approach (IOP)
- The out-of-order processing (OOP) alternative
- The OOP implementation in Gigascope
- Initial performance results

9 Disorder
External sources
- Merging multiple data sources
- Different transmission routes (e.g., sensor networks)
- Multiple possible windowing attributes, e.g., start time and end time of netflows
Internal sources
- Data prioritization [Urhan and Franklin, 2001]
- Query processing algorithms, e.g., shared window joins [Hammad, et al., 2003]

10 OOP Stream Query Evaluation – Leveraging Punctuation
- Punctuation is a special tuple embedded in a data stream that indicates the progress of the stream; e.g. (*, *, *, 10:02:03am)
[Figure: animation over the Query 1 plan (sources A, B, C; union; count(*) group by tb, srcIP, destIP). Data tuples such as (202.2.5.3, 217.4.1.9, 64, 10:01:30am), (202.2.5.3, 217.4.1.9, 64, 10:01:45am), and (130.3.7.9, 121.7.0.3, 32, 10:02:05am) flow through the plan together with punctuations (*, *, *, 10:02:00am) and (*, *, *, 10:03:00am). The aggregate's state table has columns srcIP, destIP, tb, cnt, with rows (202.2.5.3, 217.4.1.9, 118, 57), later updated to cnt 58, and (130.3.7.9, 121.7.0.3, 119, 81). The output tuple shown is (118, 202.2.5.3, 217.4.1.9, 58).]
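The idea can be sketched in Python as follows. The sketch assumes a punctuation with time p on an input means that input will deliver no more tuples with ts earlier than p, and it uses integer-second timestamps instead of wall-clock times. The names `oop_union` and `oop_count`, the event encoding, and the sample events are hypothetical. The union forwards data tuples immediately and forwards a punctuation only when the minimum progress across all inputs advances; the aggregate accepts tuples in any order and emits a window once a punctuation covers its end.

```python
from collections import Counter

def oop_union(events, num_inputs):
    """events: iterable of (input_id, kind, payload) in arrival order."""
    progress = [None] * num_inputs   # latest punctuation seen per input
    emitted = None                   # last punctuation forwarded downstream
    for input_id, kind, payload in events:
        if kind == "data":
            yield ("data", payload)  # forward at once, no sorting or buffering
            continue
        progress[input_id] = payload
        if None not in progress:
            low = min(progress)      # every input has progressed past this time
            if emitted is None or low > emitted:
                emitted = low
                yield ("punct", low)

def oop_count(stream, width=60):
    counts = Counter()               # open windows, filled in any arrival order
    for kind, payload in stream:
        if kind == "data":
            src_ip, dest_ip, _length, ts = payload
            counts[(ts // width, src_ip, dest_ip)] += 1
        else:
            # Close every window that ends at or before the punctuation time.
            closed = sorted(k for k in counts if (k[0] + 1) * width <= payload)
            for key in closed:
                yield key + (counts.pop(key),)

# Hypothetical arrival sequence from two inputs (0 and 1).
events = [
    (0, "data", ("s1", "d1", 64, 90)),   # window 1
    (1, "data", ("s1", "d1", 64, 45)),   # window 0, arrives after a later tuple
    (0, "punct", 120),
    (1, "data", ("s1", "d1", 32, 50)),   # window 0
    (1, "punct", 60),
    (1, "punct", 120),
]
rows = list(oop_count(oop_union(events, 2)))
```

Window 0 is emitted only after input 1 punctuates at 60, even though input 0 had already punctuated at 120; no tuple is reordered or held back along the way.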

11 Outline
- Stream query and existing evaluation approach (IOP)
- The out-of-order processing (OOP) alternative
- The OOP implementation in Gigascope
- Initial performance results

12 Gigascope Architecture
Bulk of the processing is performed at the RTS.
Low-level queries read directly from the packet buffer.
- Avoid copying the packet data to multiple queries.
Low-level queries are small and lightweight.
- Selection, projection, partial aggregation.
- Ensure timely processing, small cache footprint.
[Figure: the NIC fills a circular buffer; the RTS runs low-level queries q1, q2, q3, …, which feed high-level queries Q1 and Q2 and the App]

13 Aggregation in Gigascope (IOP)
Low-level aggregation
- Maintains a small, fixed-size hash table; outputs on collisions
- Slow flush to smooth output traffic: flushes the results of window n-1 gradually while processing input tuples in window n
- However, can still create bursts in order to maintain output order
Query 2:
  SELECT tb, srcIP, destIP, count(*)
  FROM TCP
  GROUP BY ts/60 as tb, srcIP, destIP
Low-level query q1:
  select tb, srcIP, destIP, count(*)
  from TCP
  group by ts/60 as tb, srcIP, destIP
High-level query Q1:
  SELECT tb, srcIP, destIP, sum(Cnt)
  FROM q1
  GROUP BY tb, srcIP, destIP
[Figure: animation of the low-level hash table with columns tb, destIP, srcIP, cnt, starting with rows (80, b, a, 26), (80, c, a, 78), (80, b, c, 99), (80, a, d, 64). Input packets such as (a, c, 128, 4870) and (x, y, 32, 4880) arrive, a window-81 row (81, c, a, 1) appears, and partial results such as (80, a, b, 26) and (80, a, c, 78) are output.]
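The split between q1 and Q1 can be sketched as below: a low-level aggregate with a fixed number of slots evicts a partial count whenever a new group collides with an occupied slot, and a high-level aggregate sums the partial counts per group. This is a simplified illustration, not Gigascope code. The class `LowLevelAgg`, the direct-mapped slot layout, and `high_level_sum` are assumptions, and the slow flush and window boundaries are not modeled.

```python
from collections import Counter

class LowLevelAgg:
    """Partial aggregation in a fixed number of slots (direct-mapped)."""

    def __init__(self, num_slots=4):
        self.slots = [None] * num_slots      # each slot holds [group_key, count]

    def add(self, key):
        """Count one tuple; return an evicted (key, count) pair or None."""
        i = hash(key) % len(self.slots)
        slot = self.slots[i]
        if slot is not None and slot[0] == key:
            slot[1] += 1
            return None
        self.slots[i] = [key, 1]             # collision: the new group takes the slot
        return tuple(slot) if slot is not None else None

    def flush(self):
        """Return all remaining partial counts and empty the table."""
        out = [tuple(s) for s in self.slots if s is not None]
        self.slots = [None] * len(self.slots)
        return out

def high_level_sum(partials):
    """Combine partial counts per group, as sum(Cnt) does in Q1."""
    totals = Counter()
    for key, cnt in partials:
        totals[key] += cnt
    return totals

# One slot forces a collision whenever the group changes.
keys = [(80, "a", "b"), (80, "a", "c"), (80, "a", "b"), (80, "a", "b")]
agg = LowLevelAgg(num_slots=1)
partials = []
for k in keys:
    evicted = agg.add(k)
    if evicted is not None:
        partials.append(evicted)
partials.extend(agg.flush())
totals = high_level_sum(partials)
```

The same group can leave the low-level table several times with different partial counts, which is why the high-level query uses sum(Cnt) rather than count(*).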

14 Aggregation in Gigascope (OOP)
Low-level aggregation
- Does not need to maintain stream order
- Allows a delay of k windows
- Smooths output traffic better
Heartbeat carries punctuation
- Initially generated by the callback function of a timer in low-level queries
- In high-level queries, each operator propagates heartbeat/punctuation
Query 2:
  SELECT tb, srcIP, destIP, count(*)
  FROM TCP
  GROUP BY ts/60 as tb, srcIP, destIP
Low-level query q1:
  select tb, srcIP, destIP, count(*)
  from TCP
  group by ts/60 as tb, srcIP, destIP
High-level query Q1:
  SELECT tb, srcIP, destIP, sum(Cnt)
  FROM q1
  GROUP BY tb, srcIP, destIP
[Figure: animation of the low-level hash table with columns tb, destIP, srcIP, cnt, holding rows from several windows at once: (82, n, m, 16), (81, c, a, 78), (83, b, c, 99), (80, a, d, 64). An input packet (a, b, 32, 5050) arrives, a row (84, b, a, 1) appears, and partial results (80, d, a, 64) and (83, c, b, 99) are output.]

15 Outline
- Stream query and existing evaluation approach (IOP)
- The out-of-order processing (OOP) alternative
- The OOP implementation in Gigascope
- Initial performance results

16 Performance Study – Traffic Shaping
Data skew: 90% of data goes to 10% of groups
Query 2:
  SELECT tb, srcIP, destIP, count(*)
  FROM TCP
  GROUP BY time/60 as tb, srcIP, destIP
[Figure: max data rate (kilo pkts/sec) plotted against number of groups]

17 Performance Study – Memory
Data rate: 110,000 pkts/sec
Number of groups: 65536
Query 3:
  SELECT tb, srcIP, destIP, count(*)
  FROM A union B
  GROUP BY time/10 as tb, srcIP, destIP
[Figure: memory usage (MB) plotted against time skew (sec)]

18 Conclusion and Future Work
- Verifies the benefits of OOP with high-volume data
- Other operators such as join
- More performance numbers

19 Questions?

