Published by Vera Gunardi. Modified over 5 years ago.
1
CS240B, Winter 2017
Task 2.1: Using a syntax based on that of the notes and the two references above, express a user-defined aggregate d_count that computes the exact count of distinct values in a window on a data stream. Your window aggregate could, e.g., be called as follows:
    SELECT col_name1, d_count(col_name2) OVER (ROWS PRECEDING) FROM my_stream;
2
d_count exact count: optimize inwindow
Increase the count only if the new value is not a duplicate. To avoid duplicates in inwindow, delete the old occurrences.

WINDOW AGGREGATE d_count(next Real) : Real {
    TABLE state(cnt Int);
    TABLE inwindow(wnext Real);
    INITIALIZE : {
        INSERT INTO state VALUES (1)
    }
    ITERATE : {
        /* the system inserts the new tuple in inwindow at the end of ITERATE */
        UPDATE state SET cnt = cnt+1
            WHERE next NOT IN (SELECT wnext FROM inwindow);
        DELETE FROM inwindow WHERE wnext = next;
        INSERT INTO RETURN SELECT cnt FROM state
    }
    EXPIRE : {
        /* this is processed before ITERATE for each expired tuple */
        UPDATE state SET cnt = cnt-1
    }
}
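The same idea can be simulated outside the stream engine. The following Python sketch is only an illustration, not the ESL semantics themselves: the function name d_count_stream, the parameter size, and the assumption that the window holds the last size tuples including the current one are all hypothetical. The ordered dictionary plays the role of inwindow after old occurrences have been deleted, so every entry that expires is the last copy of its value.

```python
from collections import OrderedDict


def d_count_stream(values, size):
    """Yield the exact distinct count over the last `size` tuples (assumed window shape)."""
    # value -> position of its latest occurrence; stands in for inwindow
    # once older duplicates have been deleted.
    last_seen = OrderedDict()
    cnt = 0
    for pos, v in enumerate(values):
        # EXPIRE step: runs before ITERATE for every tuple that left the window.
        while last_seen:
            _, oldest_pos = next(iter(last_seen.items()))
            if oldest_pos > pos - size:
                break
            last_seen.popitem(last=False)
            cnt -= 1  # the expired tuple was the only copy of its value
        # ITERATE step: count the value only if it is not already in the window.
        if v not in last_seen:
            cnt += 1
        else:
            del last_seen[v]  # delete the old occurrence
        last_seen[v] = pos  # the new tuple enters the window at the end
        yield cnt


print(list(d_count_stream([1, 2, 1, 3, 3, 2], 3)))  # prints [1, 2, 2, 3, 2, 2]
```

Because only the latest occurrence of each value is kept, the expire step never needs to check whether another copy is still inside the window.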
3
d_count exact count of distinct values in a window
If the value is added to inwindow at the beginning of ITERATE, then we have a problem: the new value is always found in inwindow, so the duplicate test no longer works, and we must have an additional table to memorize the count of each value.

WINDOW AGGREGATE d_count(next Real) : Real {
    TABLE dc(cnt Int);
    TABLE freq(Val Real, Cfreq Int);
    INITIALIZE : {
        INSERT INTO dc VALUES (1);
        INSERT INTO freq VALUES (next, 1)
    }
    ITERATE : {
        UPDATE dc SET cnt = cnt+1
            WHERE next NOT IN (SELECT Val FROM freq WHERE Cfreq > 0);
        /* increment an existing entry first, so that a newly inserted entry is not counted twice */
        UPDATE freq SET Cfreq = Cfreq+1 WHERE Val = next;
        INSERT INTO freq VALUES (next, 1)
            WHERE next NOT IN (SELECT Val FROM freq);
        INSERT INTO RETURN SELECT cnt FROM dc
    }
    EXPIRE : {
        /* this is processed before ITERATE for each expired tuple */
        UPDATE dc SET cnt = cnt-1 /* decrease the distinct count if freq = 1 */
            WHERE EXISTS (SELECT Cfreq FROM freq WHERE Val = oldest() AND Cfreq = 1);
        UPDATE freq SET Cfreq = Cfreq-1 WHERE Val = oldest()
    }
}
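The frequency-table variant can be sketched in Python as well. This is a simulation under assumptions, not the course's reference solution: the name d_count_freq, the parameter size, and the choice of a window holding the last size tuples including the current one are hypothetical. The deque stands in for the system-managed window, and the Counter stands in for the freq table.

```python
from collections import Counter, deque


def d_count_freq(values, size):
    """Yield the distinct count over the last `size` tuples using per-value frequencies."""
    window = deque()  # system-managed window (assumed: last `size` tuples)
    freq = Counter()  # value -> number of copies currently in the window
    cnt = 0           # number of values with frequency > 0
    for v in values:
        window.append(v)  # the new tuple is already in the window here
        if len(window) > size:
            old = window.popleft()
            # EXPIRE step: the distinct count drops only when the last copy leaves.
            if freq[old] == 1:
                cnt -= 1
            freq[old] -= 1
        # ITERATE step: consult freq, not the window, to detect duplicates.
        if freq[v] == 0:
            cnt += 1
        freq[v] += 1
        yield cnt


print(list(d_count_freq([1, 2, 1, 3, 3, 2], 3)))  # prints [1, 2, 2, 3, 2, 2]
```

The duplicate test reads only the frequency table, so it does not matter whether the system adds the new tuple to the window before or after the iterate step.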
4
Ranking
Ranking is done in conjunction with an order by specification. Suppose we are given a relation student-marks(Name, marks) which stores the marks obtained by each student. The following query gives the rank and the dense rank of each student:

    select Name,
           rank() over (order by marks desc) as s-rank,
           dense_rank() over (order by marks desc) as d-rank
    from student-marks
    order by s-rank

Name   Marks   Rank   DenseRank
Tom    8       1      1
Jeff   7       2      2
Mary   7       2      2
Alex
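To make the difference between the two rankings concrete, here is a small Python sketch that computes both for a list of (name, marks) pairs. The function name rank_and_dense_rank is hypothetical, and the fourth row (Zoe, 5) is an invented example row added only to show where rank and dense rank diverge; it is not taken from the slide.

```python
def rank_and_dense_rank(rows):
    """Return (name, marks, rank, dense_rank) tuples, ordered by marks descending."""
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    result = []
    rank = 0
    dense = 0
    prev = None
    for position, (name, marks) in enumerate(ordered, start=1):
        if marks != prev:
            rank = position  # rank skips over the positions taken by ties
            dense += 1       # dense rank advances by one per distinct value
            prev = marks
        result.append((name, marks, rank, dense))
    return result


# Tom, Jeff and Mary come from the slide; Zoe is a hypothetical extra row.
rows = [("Tom", 8), ("Jeff", 7), ("Mary", 7), ("Zoe", 5)]
for row in rank_and_dense_rank(rows):
    print(row)
# prints:
# ('Tom', 8, 1, 1)
# ('Jeff', 7, 2, 2)
# ('Mary', 7, 2, 2)
# ('Zoe', 5, 4, 3)
```

After the tie at 7, rank jumps to 4 because three tuples precede the next value, while dense rank moves to 3 because only two distinct values precede it.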
5
Rank: unlimited preceding and Window
If the value remains the same as the old one, so does the rank. Otherwise the rank of the new value is the current count.

WINDOW AGGREGATE rank(marks Real) : Real {
    TABLE state(lastrank Int, lastval Real);
    TABLE rcount(tcount Int);
    TABLE inwindow(wnext Real);
    INITIALIZE : {
        INSERT INTO state VALUES (1, marks);
        INSERT INTO rcount VALUES (1);
        INSERT INTO RETURN VALUES (marks, 1)
    }
    ITERATE : {
        UPDATE rcount SET tcount = tcount+1;
        UPDATE state SET lastrank = (SELECT tcount FROM rcount), lastval = marks
            WHERE lastval <> marks;
        INSERT INTO RETURN SELECT lastval, lastrank FROM state
    }
    EXPIRE : {
        /* this will produce the correct ranking for new tuples by revising the last count */
        UPDATE rcount SET tcount = tcount-1;
        UPDATE state SET lastrank = lastrank-1
    }
}
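The rule stated on this slide can be sketched as a Python generator for the unlimited-preceding case only; the window case with EXPIRE is not modeled. The name stream_rank is hypothetical, and the sketch assumes the tuples arrive already sorted by marks in descending order, as the order by specification implies.

```python
def stream_rank(sorted_marks):
    """Yield (marks, rank) for marks arriving in sorted order (assumed input)."""
    count = 0        # tuples seen so far, like tcount
    last_val = None  # like lastval
    last_rank = 0    # like lastrank
    for m in sorted_marks:
        count += 1
        if m != last_val:
            # A new value takes the current count as its rank.
            last_rank = count
            last_val = m
        # A repeated value keeps the rank of the previous tuple.
        yield (m, last_rank)


print(list(stream_rank([8, 7, 7, 5])))  # prints [(8, 1), (7, 2), (7, 2), (5, 4)]
```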
6
Dense Rank
If the value remains the same as the old one, so does the rank. Otherwise the rank of the new value is the current d-count.

WINDOW AGGREGATE d_rank(marks Real) : Real {
    TABLE state(lastrank Int, lastval Real);
    TABLE rcnt(tcnt Int);
    TABLE inwindow(wnext Real);
    INITIALIZE : {
        INSERT INTO state VALUES (1, marks);
        INSERT INTO rcnt VALUES (1);
        INSERT INTO RETURN VALUES (marks, 1)
    }
    ITERATE : {
        UPDATE rcnt SET tcnt = tcnt+1
            WHERE marks NOT IN (SELECT wnext FROM inwindow);
        DELETE FROM inwindow WHERE wnext = marks;
        UPDATE state SET lastrank = (SELECT tcnt FROM rcnt), lastval = marks
            WHERE lastval <> marks;
        INSERT INTO RETURN SELECT lastval, lastrank FROM state
    }
    EXPIRE : {
        UPDATE rcnt SET tcnt = tcnt-1;
        UPDATE state SET lastrank = lastrank-1
    }
}
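For comparison, a Python sketch of the dense-rank rule, again for the unlimited-preceding case and under the assumption that the input is already sorted by marks in descending order. The name stream_dense_rank is hypothetical. The only change from plain rank is that the counter advances per distinct value rather than per tuple.

```python
def stream_dense_rank(sorted_marks):
    """Yield (marks, dense_rank) for marks arriving in sorted order (assumed input)."""
    distinct = 0     # distinct values seen so far, the d-count
    last_val = None
    for m in sorted_marks:
        if m != last_val:
            # Only a new value increases the distinct count.
            distinct += 1
            last_val = m
        yield (m, distinct)


print(list(stream_dense_rank([8, 7, 7, 5])))  # prints [(8, 1), (7, 2), (7, 2), (5, 3)]
```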