1
Panel on Stream Query Languages
The Aurora View
Stan Zdonik
Brown University
2
Aurora Queries
We do not have an SQL-like language.
We have a GUI for dataflow diagrams.
–Boxes = operators
–Arrows = streams
Rationale:
–Common subexpression elimination (CSE) is tough for thousands of queries.
–Workflow is more natural.
–Easier for users to extend what’s been done.
–Best to understand implementation first.
3
Aurora Operators
Very relational in spirit.
–Filter, Map, Union, Join, Aggregate
Adds Windows (everyone seems to agree).
–… with some wrinkles that we will get to.
Adds a few operators.
–Wsort
–Resample
4
Simple Aggregation
Aggregate Agg(init, incr, final)
  Window(on C, size = 2, offset = 1)
  GroupBy A, B
[Figure: input and output streams with attributes A, B, C; the example tuple values are not recoverable from the extraction.]
Generalized aggregate:
–init: called when window opens
–incr: called for each new value
–final: called when window closes
One or more open windows per group.
Size and offset given in: #tuples, attribute interval, or time interval
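The init / incr / final pattern on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Aurora's implementation: the function name `windowed_aggregate`, its parameters, and the sample rows are hypothetical, and windows are measured in #tuples only.

```python
from collections import defaultdict

def windowed_aggregate(stream, key, init, incr, final, size=2, offset=1):
    """Apply (init, incr, final) over tuple-count windows, one set per group.

    Hypothetical helper: a new window opens every `offset` tuples of a group
    and closes after it has absorbed `size` tuples of that group.
    """
    open_windows = defaultdict(list)  # group -> list of [state, tuples_seen]
    seen = defaultdict(int)           # group -> tuples seen so far
    for t in stream:
        g = key(t)
        if seen[g] % offset == 0:
            open_windows[g].append([init(), 0])   # init: window opens
        seen[g] += 1
        still_open = []
        for w in open_windows[g]:
            w[0] = incr(w[0], t)                  # incr: each new value
            w[1] += 1
            if w[1] == size:
                yield final(g, w[0])              # final: window closes
            else:
                still_open.append(w)
        open_windows[g] = still_open

# Made-up (A, B, C) tuples: sum C over windows of 2 tuples, grouped by (A, B).
rows = [(1, 1, 1), (1, 1, 2), (1, 2, 1), (1, 1, 3), (1, 2, 2)]
sums = list(windowed_aggregate(
    rows,
    key=lambda t: (t[0], t[1]),
    init=lambda: 0,
    incr=lambda state, t: state + t[2],
    final=lambda group, state: (group, state),
))
print(sums)  # [((1, 1), 3), ((1, 1), 5), ((1, 2), 3)]
```

With size = 2 and offset = 1, each group has up to two windows open at once, which matches the "one or more open windows per group" remark.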
5
Query 1
Generate the stream of packets whose length is greater than twice the average packet length over the last 1 hour.
Input: (pID, length, time)
Map f(t): (t.ID, t.length, t.time)
Aggregate agg(init, incr, final)
  Window(on time, size = 1 hr, offset = 1 tuple)
  State = (sum int, num int, endtime int)
  init = {sum := 0; num := 0}
  incr(p) = {sum := sum + p.length; num := num + 1; endtime := p.time}
  final = emit (time2 = endtime, avgLen = sum/num)
Join Match (length > 2 * avgLen and time = time2)
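As a rough cross-check of Query 1, the Aggregate and Join boxes can be collapsed into a single pass over the packets. The sketch below assumes packets arrive in time order, that time is in seconds, and that the window covers the interval (time - 1 hr, time] including the current packet; the function name `long_packets` and the sample data are made up for illustration.

```python
from collections import deque

def long_packets(packets, window=3600):
    """Yield packets longer than twice the average length over the last hour.

    Assumes (pID, length, time) tuples in non-decreasing time order, with
    time in seconds. The running sum and count play the role of the
    aggregate state (sum, num) on the slide.
    """
    recent = deque()   # (time, length) pairs still inside the window
    total = 0
    for pid, length, time in packets:
        recent.append((time, length))
        total += length
        # Expire packets that fell out of the one-hour window.
        while recent[0][0] <= time - window:
            _, old_length = recent.popleft()
            total -= old_length
        avg_len = total / len(recent)
        if length > 2 * avg_len:   # the Join's Match predicate
            yield (pid, length, time)

packets = [(1, 100, 0), (2, 100, 10), (3, 100, 20), (4, 1000, 30), (5, 100, 4000)]
print(list(long_packets(packets)))  # [(4, 1000, 30)]
```

Because the current packet is part of its own window, a packet that is alone in the window never qualifies.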
6
Query 2
Create an alert when more than 20 type 'A' squirrels are in Jennifer's backyard.
Assume squirrels report every p sec.
Input streams: S, T
Stream schemas: (sID1, region, time), (sID2, type)
Join Match (sID1 = sID2)
Filter region = JWY and type = “A”
Aggregate agg(count)
  Window(on time, size = p sec, offset = p sec)
Filter count > 20
7
Query 3
Stream an event each time 3 different squirrels within a pairwise distance of 5 meters from each other chirp within 10 seconds of each other.
Stream schema: (sID, loc, time)
Join Match (1.sID not= 2.sID and dist(1.loc, 2.loc) < 5 m)
  Window (on time, size = 5 sec, offset = 1 tuple)
Join Match (dist(1.1.loc, 2.loc) < 5 m and dist(1.2.loc, 2.loc) < 5 m and 1.1.sID not= 2.sID and 1.2.sID not= 2.sID)
  Window (on time, size = 5 sec, offset = 1 tuple)
[Diagram: the two inputs of each Join box are numbered 1 and 2.]
8
Super-bonus Query
Create a log of flow information from a stream of packets.
A flow (simple definition) from a source S to a destination D ends when no packet from S to D is seen for at least 2 minutes after the last packet from S to D. The next packet from S to D starts a new flow.
The flow log contains the source, destination, count of packets, and total length of packets for each flow.
Are you kidding!!!!
9
Actually, it’s Pretty Easy
Input: (pID, src, dest, length, time)
Aggregate Aggr = (init 1, incr 1, final 1)
  Window (size = 2 tuples, offset = 1)
  GroupBy (src, dest)
  State 1 = (flow#: int, first packet, second packet)
  init 1 = {flow# := 0; first := null; second := null}
  incr 1 (p) = {first := second; second := p; if second.time - first.time > 2 then flow# := flow# + 1}
  final 1 = emit (second.src, second.dest, second.length, second.time, flow#)
  (gap threshold: 2 min)
Aggregate Aggr = (init 2, incr 2, final 2)
  Window (on flow#, size = 1, offset = 1)
  GroupBy (src, dest)
  State 2 = (count int, len int)
  init 2 = {count := 0; len := 0}
  incr 2 (p) = {count := count + 1; len := len + p.length}
  final 2 = emit (src, dest, len, count)
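The two-stage idea (tag each packet with a flow number, then aggregate per flow) can be imitated in plain Python. This is a sketch under assumptions rather than Aurora's operators: `tag_flows` and `flow_log` are hypothetical names, time is assumed to be in seconds, packets are assumed to arrive in time order, and flows still open at the end of a finite input are flushed, which a real stream would handle by other means.

```python
def tag_flows(packets, gap=120):
    """Stage 1: attach a per-(src, dest) flow number to every packet."""
    last_time = {}   # (src, dest) -> time of the previous packet
    flow_no = {}     # (src, dest) -> current flow number
    for pid, src, dest, length, time in packets:
        k = (src, dest)
        prev = last_time.get(k)
        if prev is None:
            flow_no[k] = 0
        elif time - prev > gap:   # strict '>' as in incr 1 on the slide
            flow_no[k] += 1
        last_time[k] = time
        yield (src, dest, length, time, flow_no[k])

def flow_log(tagged):
    """Stage 2: emit (src, dest, len, count) each time a flow number changes."""
    current = {}     # (src, dest) -> [flow#, count, total length]
    for src, dest, length, time, flow in tagged:
        k = (src, dest)
        st = current.get(k)
        if st is not None and st[0] != flow:
            yield (src, dest, st[2], st[1])   # previous flow is complete
            st = None
        if st is None:
            st = current[k] = [flow, 0, 0]
        st[1] += 1
        st[2] += length
    # Assumption for finite input only: flush flows that are still open.
    for (src, dest), st in current.items():
        yield (src, dest, st[2], st[1])

packets = [
    (1, "S", "D", 10, 0),
    (2, "S", "D", 20, 60),
    (3, "X", "Y", 5, 70),
    (4, "S", "D", 30, 300),   # 240 s after packet 2: starts a new flow
    (5, "S", "D", 40, 310),
]
log = list(flow_log(tag_flows(packets)))
print(log)  # [('S', 'D', 30, 2), ('S', 'D', 70, 2), ('X', 'Y', 5, 1)]
```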
10
… but this is not enough!
What if it was really important that I know about the squirrels within 1 minute of the intrusion?
=> Queries need Quality-of-Service support.
In fact, QoS is an integral part of the declarative spec. of the query.
11
… but it gets worse!
Networks (e.g., mobile) can arbitrarily delay or lose tuples.
=> Operators can’t block indefinitely while waiting.
A corollary of latency-based QoS.
12
… and worse!
Tuples may not arrive at an operator in sort order.
–The network can reorder them.
–Operators themselves can shuffle them.
–Priority scheduling might force them out of order.
This complicates things.
–windows
–aggregates
13
Our Solution
Problem has to do with when to close windows.
Tradeoff: Latency (QoS) vs. Accuracy
Define additional parameters on windows that determine termination.
–might result in lost data.
14
Our Solution (cont.)
For disorder (early tuples) => Slack
For blocking (late tuples) => Timeout
[Figure: two timelines of arriving tuples (1 1 1 1 1 1 1 and 1 1 2 1 1 1), annotated with timeout, slack, and intervals measured in time and in #tuples.]
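One possible reading of slack and timeout is sketched below. The exact semantics are an assumption, since the slide's figure did not survive extraction: here a window stays open until more than `slack` tuples belonging to later windows have arrived, or until it has been open longer than `timeout`, and tuples for an already closed window are dropped. The function name `count_windows` and all data are hypothetical.

```python
def count_windows(arrivals, size, slack, timeout):
    """Count values per window [k*size, (k+1)*size) with slack and timeout.

    arrivals: (arrival_time, value) pairs in arrival order. Timeouts are
    only checked when a tuple arrives; a real system would use timers.
    """
    open_w = {}      # window index -> [count, later_tuples_seen, opened_at]
    closed = set()   # windows already emitted
    out = []
    for now, value in arrivals:
        w = value // size
        # Timeout: stop waiting on windows that have been open too long.
        for k in sorted(open_w):
            if now - open_w[k][2] > timeout:
                out.append((k, open_w[k][0], "timeout"))
                closed.add(k)
                del open_w[k]
        if w in closed:
            continue   # late tuple for a closed window: data is lost
        if w not in open_w:
            open_w[w] = [0, 0, now]
        open_w[w][0] += 1
        # Slack: an earlier window tolerates up to `slack` early tuples.
        for k in sorted(open_w):
            if k < w:
                open_w[k][1] += 1
                if open_w[k][1] > slack:
                    out.append((k, open_w[k][0], "slack"))
                    closed.add(k)
                    del open_w[k]
    # Assumption for finite input only: flush whatever is still open.
    for k in sorted(open_w):
        out.append((k, open_w[k][0], "end"))
    return out

disordered = [(0, 1), (1, 1), (2, 2), (3, 1), (4, 1), (5, 1)]  # values 1 1 2 1 1 1
print(count_windows(disordered, size=1, slack=1, timeout=100))
# [(1, 5, 'end'), (2, 1, 'end')]
print(count_windows(disordered, size=1, slack=0, timeout=100))
# [(1, 2, 'slack'), (2, 1, 'end')]
```

With slack = 0 the early tuple closes window 1 immediately and the three later 1s are lost, which illustrates the latency versus accuracy tradeoff.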
15
Status
Now:
–Users supply values for timeout and slack.
–As in examples, not always needed.
Goal:
–Automatically insert / adjust these values based on QoS specs.