CS240B, Fall 2018. Task 4.1: Express the Flajolet-Martin distinct_count sketch as a user-defined aggregate named dcount_sketch.

CS240B, Fall 2018. Task 4.1: Express the Flajolet-Martin distinct_count sketch as a user-defined aggregate named dcount_sketch, to be called in the same way as d_count. You can assume that you have available a function LmostbitH(X) that returns K, where position K contains a 1 and all the positions to its right are zeros, in the value returned by a randomizing hash function H(X). We will design a window aggregate that could, for example, be called as follows:

    SELECT col_name1, dcount_sketch(col_name2) OVER (ROWS 99999 PRECEDING)
    FROM my_stream;
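As an illustration of what LmostbitH(X) computes, here is a minimal Python sketch. The names `rightmost_one_position` and `lmostbit_h`, the use of SHA-256 as the randomizing hash H, the 64-bit width, and the convention for an all-zero hash are assumptions made for this example; the task only states that such a function is available.

```python
import hashlib

def rightmost_one_position(v):
    """1-based position of the rightmost 1 bit of a positive integer v."""
    # v & -v isolates the lowest set bit; bit_length gives its 1-based position.
    return (v & -v).bit_length()

def lmostbit_h(x, bits=64):
    """Hypothetical stand-in for LmostbitH(X): hash x, then locate the rightmost 1."""
    digest = hashlib.sha256(repr(x).encode()).digest()
    v = int.from_bytes(digest[:bits // 8], "big")
    # Assumed convention: an all-zero hash value maps to the highest position.
    return bits if v == 0 else rightmost_one_position(v)

print(rightmost_one_position(0b1000))  # position 4: a 1 followed by three zeros
print(lmostbit_h(21.5))                # some position between 1 and 64
```

With a well-mixed hash, position K occurs with probability about 2^-K, which is why the largest position seen grows roughly like the base-2 logarithm of the number of distinct values.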

FM dcount_sketch

WINDOW AGGREGATE dcount_sketch(next Real) : Real
{   TABLE bitarray(bitpos Int, bitvalue Int);
    TABLE inwindow(wnext Real);
    INITIALIZE : {
        INSERT INTO bitarray VALUES (1, 0), …, (64, 0);
        UPDATE bitarray SET bitvalue = 1 WHERE bitpos = LmostbitH(next)
    }
    ITERATE : {
        /* the system inserts the new tuple in inwindow at the end of ITERATE */
        UPDATE bitarray SET bitvalue = 1 WHERE bitpos = LmostbitH(next);
        /* older tuples that set the same bit as next no longer need to be kept */
        DELETE FROM inwindow WHERE LmostbitH(wnext) = LmostbitH(next);
        INSERT INTO RETURN
            SELECT 2 ** MAX(bitpos)   /* the estimated count */
            FROM bitarray WHERE bitvalue = 1
        /* we could also delete weak bits, e.g. those that are less than max-8:
           DELETE FROM inwindow WHERE bitpos < MAX(bitpos) - 8 */
    }
    EXPIRE : {
        /* EXPIRE is processed before ITERATE */
        UPDATE bitarray SET bitvalue = 0
        WHERE bitpos = (SELECT LmostbitH(wnext) FROM inwindow WHERE oldest(inwindow))
    }
}
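The behavior of this window aggregate can be simulated outside the database. The following Python sketch is one possible reading of the UDA, not the course's reference solution: the class name `DCountSketchWindow`, the sequence-number bookkeeping, and the SHA-256-based default for LmostbitH are all assumptions made for illustration.

```python
from collections import deque
import hashlib

def lmostbit_h(x, bits=64):
    """Assumed stand-in for LmostbitH: rightmost 1 position of a 64-bit hash."""
    v = int.from_bytes(hashlib.sha256(repr(x).encode()).digest()[:8], "big")
    return bits if v == 0 else (v & -v).bit_length()

class DCountSketchWindow:
    def __init__(self, preceding, position=lmostbit_h):
        self.size = preceding + 1   # ROWS n PRECEDING covers n + 1 rows
        self.position = position
        self.bits = set()           # bit positions whose bitvalue is 1
        self.inwindow = deque()     # (sequence number, bit position), oldest first
        self.seq = 0

    def add(self, value):
        self.seq += 1
        # EXPIRE (before ITERATE): if the tuple leaving the window is still
        # tracked, no later tuple set the same bit, so the bit is cleared.
        if self.inwindow and self.inwindow[0][0] <= self.seq - self.size:
            _, old_pos = self.inwindow.popleft()
            self.bits.discard(old_pos)
        # ITERATE: set the bit, drop older tuples that set the same bit,
        # then record the new tuple.
        pos = self.position(value)
        self.bits.add(pos)
        self.inwindow = deque(e for e in self.inwindow if e[1] != pos)
        self.inwindow.append((self.seq, pos))
        return 2 ** max(self.bits)  # the estimated count

sketch = DCountSketchWindow(preceding=99999)
for reading in [20.5, 21.0, 20.5, 19.75]:
    estimate = sketch.add(reading)
```

Because at most one tuple per bit position is kept, the auxiliary state never exceeds 64 entries regardless of the window length. The estimate is a power of two, so it is coarse; averaging several sketches built with independent hash functions is the usual way to tighten it.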

Task 4.2: Assume that you have a stream of temperature readings, temperature(Celsius Integer), that starts every day at 00:01 and ends at 23:59. At the end of each day, we want to have 10,000 temperature samples stored in a table tenKsamples(Rowno Integer, Celsius Integer). We do not know how many temperature readings will arrive each day, except that their number is significantly larger than 10,000. Please write a UDA that uses the reservoir algorithm to populate tenKsamples(Rowno, Celsius) with 10,000 random samples taken from temperature(Celsius Integer); the table is then processed and reset to empty at midnight. You can assume that the system supports a function random(K) which, given a positive integer K, returns a random integer between 1 and K.

AGGREGATE reservoir(next Integer) : Integer
{   TABLE tenKsamples(Rowno Integer, Celsius Integer) external;
    TABLE cntuples(cnt Integer);
    INITIALIZE : {
        INSERT INTO cntuples VALUES (1);
        INSERT INTO tenKsamples VALUES (1, next)
    }
    ITERATE : {
        UPDATE cntuples SET cnt = cnt + 1;
        /* the first 10,000 readings fill the reservoir */
        INSERT INTO tenKsamples SELECT cnt, next FROM cntuples WHERE cnt <= 10000;
        /* afterwards, reading number cnt replaces a sample with probability 10000/cnt:
           draw random(cnt) once and overwrite that row only if the draw is at most 10000 */
        UPDATE tenKsamples SET Celsius = next
        WHERE Rowno = (SELECT random(cnt) FROM cntuples WHERE cnt > 10000)
    }
    TERMINATE : { /* we might want to return the count */ }
}
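For comparison, the reservoir algorithm that the task refers to can be sketched in plain Python. This is a minimal sketch of the standard technique (often called Algorithm R); the function name `reservoir_sample` and its parameters are chosen for this example, and `rng.randint(1, n)` plays the role of the random(K) function assumed in the task.

```python
import random

def reservoir_sample(stream, k, rng=None):
    """Return k items drawn uniformly at random from a stream of unknown length."""
    rng = rng or random.Random()
    samples = []
    for n, item in enumerate(stream, start=1):
        if n <= k:
            # The first k items fill the reservoir.
            samples.append(item)
        else:
            # Item number n is kept with probability k/n: draw j in 1..n
            # and overwrite slot j only when j falls inside the reservoir.
            j = rng.randint(1, n)
            if j <= k:
                samples[j - 1] = item
    return samples

readings = (15 + (i * 7) % 20 for i in range(100_000))  # made-up temperature stream
ten_k = reservoir_sample(readings, 10_000)
```

A single draw decides both whether the new item is kept and which slot it replaces, which is what makes every reading equally likely to end up in the final sample.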