CS240B, Fall 2018: Task 4.1. Express the Flajolet-Martin distinct_count sketch as a user-defined aggregate named dcount_sketch


1 CS240B, Fall 2018 Task 4.1. Express the Flajolet-Martin distinct_count sketch as a user-defined aggregate named dcount_sketch, to be called in the same way as d_count. You can assume that you have available a function LmostbitH(X) that returns K, where position K contains a 1 and all the positions to its right are zeros, in the value returned by a randomizing hash function H(X). We will design a window aggregate that could, for example, be called as follows: SELECT col_name1, dcount_sketch(col_name2) OVER (ROWS PRECEDING) FROM my_stream;
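As a rough illustration of what LmostbitH(X) computes, here is a minimal Python sketch. The names `hash64`, `lowest_set_bit_pos` and `lmostbit_h` are hypothetical, the choice of BLAKE2b as the randomizing hash H is an assumption, and so is the convention of mapping an all-zero hash to the top position; the slides do not specify any of these.

```python
import hashlib


def hash64(x) -> int:
    """Hypothetical stand-in for the randomizing hash H(X): 64 pseudo-random bits."""
    digest = hashlib.blake2b(repr(x).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def lowest_set_bit_pos(h: int, width: int = 64) -> int:
    """Return the 1-based position K of the rightmost 1 bit of h.

    All bits to the right of position K are zero by construction.
    """
    if h == 0:
        # Assumed convention: an all-zero hash maps to the highest position.
        return width
    # h & -h isolates the lowest set bit; bit_length gives its 1-based position.
    return (h & -h).bit_length()


def lmostbit_h(x) -> int:
    """Position of the rightmost 1 bit in the hash of x (range 1..64)."""
    return lowest_set_bit_pos(hash64(x))
```

With a well-mixed hash, about half of all values land on position 1, a quarter on position 2, and so on, which is what lets the highest position seen act as a rough logarithmic counter.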

2 FM dcount_sketch
WINDOW AGGREGATE dcount_sketch(next Real) : Real
{ TABLE bitarray (bitpos int, bitvalue int);
  TABLE inwindow(wnext Real);
  INITIALIZE : {
    insert into bitarray VALUES (1,0), …, (64, 0);
    update bitarray SET bitvalue=1 WHERE bitpos=LmostbitH(next)
  }
  ITERATE : {
    /*the system inserts the new tuple in inwindow at the end of iterate*/
    update bitarray SET bitvalue=1 WHERE bitpos=LmostbitH(next);
    /*older tuples that map to the same bit position are no longer needed*/
    DELETE FROM inwindow WHERE LmostbitH(wnext)=LmostbitH(next);
    INSERT INTO RETURN SELECT 2**MAX(bitpos) /*the estimated count*/ FROM bitarray WHERE bitvalue=1
    %we could also delete weak bits---e.g. those that are less than max-8
    %DELETE FROM inwindow WHERE bitpos< MAX(bitpos)-8
  }
  EXPIRE : {
    /*Expire is processed before iterate*/
    UPDATE bitarray SET bitvalue=0 WHERE bitpos=(SELECT LmostbitH(wnext) FROM inwindow WHERE oldest(inwindow))
  }
}
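To make the interplay of ITERATE and EXPIRE concrete, the following is a minimal Python sketch of the same idea over a count-based window of the last `window_size` tuples. The class name `WindowedFMSketch`, the `bit_of` parameter (standing in for LmostbitH) and the explicit arrival counter are assumptions made for illustration; the estimate mirrors the slide's `2**MAX(bitpos)` rule rather than the bias-corrected Flajolet-Martin estimator.

```python
from collections import deque


class WindowedFMSketch:
    """Illustrative sliding-window variant of the slide's dcount_sketch."""

    def __init__(self, window_size, bit_of, width=64):
        self.window_size = window_size
        self.bit_of = bit_of              # plays the role of LmostbitH
        self.bits = [0] * (width + 1)     # indices 1..width, like bitarray
        self.inwindow = deque()           # (arrival_no, bitpos), oldest first
        self.arrivals = 0

    def add(self, value):
        self.arrivals += 1
        # EXPIRE (runs before ITERATE): entries that slid out of the window
        # clear their bit. This is safe because at most one entry per bit
        # position is kept, and it is always the most recent one.
        cutoff = self.arrivals - self.window_size
        while self.inwindow and self.inwindow[0][0] <= cutoff:
            _, old_pos = self.inwindow.popleft()
            self.bits[old_pos] = 0
        # ITERATE: set the bit for the new tuple.
        pos = self.bit_of(value)
        self.bits[pos] = 1
        # Older tuples with the same bit position are dominated by the new one.
        self.inwindow = deque(e for e in self.inwindow if e[1] != pos)
        self.inwindow.append((self.arrivals, pos))
        return self.estimate()

    def estimate(self):
        """Return 2**(highest set position), or 0 for an empty sketch."""
        set_positions = [p for p in range(1, len(self.bits)) if self.bits[p]]
        return 2 ** max(set_positions) if set_positions else 0
```

Because only the newest tuple per bit position is retained, the bookkeeping holds at most 64 entries regardless of the window size.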

3 Task 4.2: Assume that you have a stream of temperature readings temperature(Celsius Integer) that starts every day at 00:01 and ends at 23:59. At the end of each day, we want to have 10,000 temperature samples stored in a table tenKsamples(Rowno integer, Celsius Integer). We do not know how many temperature readings are going to arrive each day, except that their number is significantly larger than 10,000. Please write a UDA that uses the reservoir algorithm to populate tenKsamples(Rowno, Celsius) with 10,000 random samples taken from temperature(Celsius Integer); the table is then processed and reset to empty at midnight. You can assume that the system supports a function random(K) which, given a positive integer K, returns a random integer between 1 and K.
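As a short check on why the reservoir algorithm yields a uniform sample, consider the standard variant in which the first k readings fill the reservoir and each later reading m is accepted with probability k/m, replacing a uniformly chosen slot. The symbols k (reservoir size, 10,000 here), n (number of readings in the day) and i (index of a reading) are introduced here for illustration and do not appear in the slides.

```latex
\[
\Pr[\text{reading } i \text{ is in the final sample}]
  = \frac{k}{i}\prod_{m=i+1}^{n}\left(1-\frac{k}{m}\cdot\frac{1}{k}\right)
  = \frac{k}{i}\prod_{m=i+1}^{n}\frac{m-1}{m}
  = \frac{k}{i}\cdot\frac{i}{n}
  = \frac{k}{n},
  \qquad k < i \le n .
\]
```

For a reading with i ≤ k the first factor is 1 and the product starts at m = k+1, which again gives k/n, so every reading of the day is equally likely to be kept.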

4 AGGREGATE reservoir(next integer) : integer
{ TABLE tenKsamples(Rowno integer, Celsius Integer) external;
  TABLE cntuples (cnt Integer);
  INITIALIZE : {
    insert into cntuples values (1);
    insert into tenKsamples values (1, next)
  }
  ITERATE : {
    update cntuples set cnt=cnt+1;
    %fill the reservoir with the first 10,000 readings
    insert into tenKsamples select cnt, next from cntuples where cnt<=10000;
    %afterwards draw a position in 1..cnt; the reading replaces that row only if the position is at most 10,000
    update tenKsamples set Celsius=next
      where Rowno = (select random(cnt) from cntuples where cnt>10000)
  }
  TERMINATE : {%we might want to return the count}
}
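For comparison, the reservoir algorithm can be sketched in a few lines of Python. This is an illustrative sketch, not the course's reference solution: the function name `reservoir_sample` and the `rng` parameter are hypothetical, and `rng.randint(1, count)` is assumed to play the role of the slide's random(K).

```python
import random


def reservoir_sample(stream, k, rng=random):
    """Keep a uniform random sample of k items from a stream of unknown length."""
    reservoir = []
    for count, item in enumerate(stream, start=1):
        if count <= k:
            # The first k items fill the reservoir directly.
            reservoir.append(item)
        else:
            # Draw a position in 1..count, like random(count) in the UDA.
            j = rng.randint(1, count)
            if j <= k:
                # Happens with probability k/count; slot j is uniform in 1..k.
                reservoir[j - 1] = item
    return reservoir
```

A call such as `reservoir_sample(readings, 10000)` would correspond to one day's pass over the temperature stream; passing a seeded `random.Random` instance as `rng` makes a run reproducible.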

