UCLA, Fall 2018. CS240B Midterm Your Name: and your ID: Problem Max score Score Problem A 40% Problem B 20% Problem C Problem D Total 100% Extra Credit: 5% Final score:
Problem A [20%+20%=40%]. Luggage items at airports are often delivered by conveyor belts, where sensors are used to monitor the presence/passage of each item through a checkpoint. The resulting data stream has the following form: sensed(Time, CPS, LID). This denotes that at a certain Time (in seconds) the sensor at checkpoint CPS had detected a luggage item with id LID (assume that we have a reading every second). The airport is interested in detecting the occurrences of the following two problems, along with the CPS of the checkpoint where they were detected: A1. A luggage item that seems to be stuck: i.e. has spent more than 5 minutes spent at the same CPS. Return the LID, the CPS and the current time A2. A luggage item has been on the conveyor belt for at least 6 minutes, during which the item has gone through the same CPS at least 2 times (an indication that the item is wondering in aimless cycles—but the cycles do not need to be consecutive or identical). Write an SQL-TS query to detect situation A1 and an SQL-TS query for situation A2 above. In both cases return the luggage LID, the CPS for the problem checkpoint, and the current time at the expiration of the 6 minutes. (PS: you can write SQL-MR if you prefer) Select D.Time, D.CPS, D.LID From sensed Partitioned by LID ordered by Time AS (D+) Where D.CPS=previous(D.CPS) AND D.Time> first(D.Time)+300 Select D.Time, D.CPS, D.LID From sensed Partitioned by LID ordered by Time AS (S1 O1+ E1 X* S2 O2+ E2 W* Z) Where O1.CPS <>S1.CPS AND E1.CPS=S1.CPS and O2.CPS <>S2.CPS AND E2.CPS=S2.CPS and Z.Time > S.Time+360
Problem B [20 %]: Define a UDA that detects the occurrence of condition 1 from Problem A. Show the definition of the UDA using the notation used in the homeworks. The UDA will be called with a “partition by LID, unlimited preceding” clause. Window stuck(TimeIn timestamp, CPSin char) : (Stime timestamp, Sensor char) { Table previous(STime timestamp, SCPS char); initialize: {insert into previous value(Time, CPS)} Iterate: { update previous set STime= Timein, SCPS= CPSin) where CPSin<>SCPS; insert into return (STime, SCPS) select from previous where CPSin=SCPS and Timein = STime+301 % i.e., 5 minutes +1sec }
We will break to optimize memory and the response time will suffer. Problem C. 20% Source2 B Source3 C Sink A Source1 The processing speeds of these operators are: A: 100 tuple/sec B: 100 tuples/sec C: 100 tuples/sec U: 200 tuples/sec Layout: ||| U ||| We have timestamped tuples. Also we assume that the buffers feeding A, B, and C contain respectively 1000 tuples, 400 tuples and 600 tuples. Also B and C deliver one output tuple for each input tuple, while A is a selection that eliminates about 50% of its input tuples. All tuples have the same size. C1: What is the time T required to process all tuples? In which order should the operators be scheduled to minimize the memory usage? (1) Illustrate your answer with a memory diagram and (2) estimate the resulting average number of tuples kept in memory during time T. You can also assume that the idle waiting caused by U is negligible. Time A B+C+U 10s 15s 2000 1000 25s C1: B and C have the same memory release rate and they must operate together because of the union. Thus B and C process their 600+400 =1000 tuples in 10 sec. along with U that takes 5 for a total of 15 s. So A+B+U releases 1000 tuple of memory in 15 secds. 100/15= 66.6 tples/sec. A releases 100 tuples/sec and should go first. It will take 10 secs. The total area of 1,000x10+ 1000x10/2 + 1000x15/2= 22,500. The average memory usage over the 25 seconds is 22,500/25= 900 tuples Answer: If there is no idle-waiting the time is all spent in processing tuples. Thus if N1=N1=0.5 N , these N tuples will be processed in time 0.5*N/1000+ 0.5*N/1000+ 0.5*N/500+ 0.5*N/1000= 2.5*N/1000. Then say that C also takes N/400 seconds to process N tuples. Now say that N2 =0 , so the union group process the N tuples in source1 in N/1000+N/1000= N/500 much faster. We will break to optimize memory and the response time will suffer. Now say that N1 =0 , so the union group process the N tuples in source2 in N/500+N/1000= 3*N/1000 1/333 . Then C
We will break to optimize memory and the response time will suffer. I i Problem D. 20% The processing speeds of these operators are: A: 100 tuples/sec, B: 100 tuples/sec, C: 25 tuple/sec Sink A Source B C B1 B2 Layout II: Problem D: Consider Layout II and assume that the buffers feeding A contains 1000 tuples. A is a selection operator that discards 50% of its input tuples. However, for each tuple in its input, B delivers one tuple to its output, and so does C. D1. We wan to minimize the average latency measured in the number of missing query answers. Should we then partition this network and, if so, how? Illustrate your answer with a diagram and estimate the resulting average latency. Time A+B C 15s 20s 1000 500 35s D1: to process 1000 tuples, A takes 10 sec. Then to output 500 tuples B takes 5 sec and C takes 20. So the total output a is is 1000 answers in 35 sec. Unbroken it delivers at the rate of 1000/35 tuples per sec. If we cut A+B delivers 500 answers in 10+5 sec. i.e. 1000/30 faster we should break it Then to deliver 500 tuples, C takes 500/25= 20s. Avg Latency, i.e. the time taken by a tuple on the average. Area/1000 Area= 500x15+500x15/2+ 500x20/2= 500x32.5 Avg. Latency= 500X32.5/1000= 16.25 s Answer: If there is no idle-waiting the time is all spent in processing tuples. Thus if N1=N1=0.5 N , these N tuples will be processed in time 0.5*N/1000+ 0.5*N/1000+ 0.5*N/500+ 0.5*N/1000= 2.5*N/1000. Then say that C also takes N/400 seconds to process N tuples. Now say that N2 =0 , so the union group process the N tuples in source1 in N/1000+N/1000= N/500 much faster. We will break to optimize memory and the response time will suffer. Now say that N1 =0 , so the union group process the N tuples in source2 in N/500+N/1000= 3*N/1000 1/333 . Then C
range 300 preceding) as SNs ) where SNs=1. I i Extra Credit. 5% Rather than defining a new aggregate, express problem A1 using standard SQL OLAP functions %Idea: a window of size 300 seconds, in which there is only one value of CPS select, LID, Time from ( select LID, Time, count(DISTINCT CPS) over (partition by LID order by Time, range 300 preceding) as SNs ) where SNs=1. Answer: If there is no idle-waiting the time is all spent in processing tuples. Thus if N1=N1=0.5 N , these N tuples will be processed in time 0.5*N/1000+ 0.5*N/1000+ 0.5*N/500+ 0.5*N/1000= 2.5*N/1000. Then say that C also takes N/400 seconds to process N tuples. Now say that N2 =0 , so the union group process the N tuples in source1 in N/1000+N/1000= N/500 much faster. We will break to optimize memory and the response time will suffer. Now say that N1 =0 , so the union group process the N tuples in source2 in N/500+N/1000= 3*N/1000 1/333 . Then C
Alternative serial Schedules: Memory Used B Time A B+C+U 10s 15s 2000 C 1000 10s 25s