Ariel Rosenfeld. Input: a stream of m integers i_1, i_2, ..., i_m (over 1, …, n). Output: the number of distinct elements in the stream. Example: count.

Presentation transcript:

Ariel Rosenfeld

• Input: a stream of m integers i_1, i_2, ..., i_m (over 1, …, n).
• Output: the number of distinct elements in the stream.
• Example: count the number of distinct IP addresses you encounter.

• Bit vector of size n (mark 1 when encountered)
• Keep all m integers and answer naively.
◦ Sort and count: O(min{n, m log m})
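The two naive exact approaches above can be sketched as follows. This is a minimal illustration, not code from the slides: the function names are hypothetical, and the "bit vector" uses one byte per flag for simplicity.

```python
def count_distinct_bitvector(stream, n):
    """Exact count using one flag per possible value in 1..n."""
    seen = bytearray(n + 1)  # index 0 is unused; one byte per flag for simplicity
    count = 0
    for x in stream:
        if not seen[x]:
            seen[x] = 1      # mark 1 when first encountered
            count += 1
    return count


def count_distinct_sorted(stream):
    """Exact count by storing all m integers, sorting, and counting value changes."""
    items = sorted(stream)
    count = 0
    for i, x in enumerate(items):
        if i == 0 or x != items[i - 1]:
            count += 1
    return count
```

Both functions return the same answer on any stream over 1..n; they differ only in what they store.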

• A deterministic exact algorithm is impossible using o(n) bits.
• A deterministic approximation algorithm for this problem providing a (1 ± 1/1000)-approximation using o(n) bits is impossible.

• Pick a random hash function h : [n] → [0, 1]
• Calculate z = min_{i ∈ stream} h(i)
• Output 1/z − 1
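A minimal sketch of these three steps, under the idealized assumption of a truly random hash function. The class `IdealHash` and the function `fm_estimate` are hypothetical names introduced for illustration; the hash is simulated by memoizing one random value per element, which is exactly the "n bits of randomness" cost the idealized version ignores.

```python
import random


class IdealHash:
    """Simulates a random function h : [n] -> (0, 1] by memoizing random draws.

    Assumption for illustration only: this table can grow to n entries,
    so it is not a small-space hash function.
    """

    def __init__(self, seed=None):
        self._rng = random.Random(seed)
        self._table = {}

    def __call__(self, x):
        if x not in self._table:
            # 1 - random() lies in (0, 1], which avoids a zero minimum.
            self._table[x] = 1.0 - self._rng.random()
        return self._table[x]


def fm_estimate(stream, h):
    """Return 1/z - 1, where z is the minimum hash value seen in the stream."""
    z = 1.0
    for x in stream:
        z = min(z, h(x))
    return 1.0 / z - 1.0
```

Because repeated elements hash to the same value, `fm_estimate([1, 2, 3], h)` and `fm_estimate([3, 3, 2, 1, 1, 2], h)` return the same number for a fixed `h`.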

• Equal integers get the same hash value.
• We will show that the output is a good approximation.

• This is idealized for 2 reasons:
1. We don't have perfect precision.
2. We need at least n bits to remember the randomness associated with every i.
Let's ignore this for now…

• S = {j_1, …, j_t} (the distinct elements in the stream)
• h(j_1), ..., h(j_t) = X_1, ..., X_t are independent random variables from Unif[0, 1]
• Z = min_i X_i

[Figure: plot of P = F(x), with both axes marked at 1]
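The choice of output 1/z − 1 can be motivated by the expectation of Z. A short derivation, assuming the idealized setting in which X_1, ..., X_t are independent and uniform on [0, 1]:

```latex
\Pr[Z > x] = \prod_{i=1}^{t} \Pr[X_i > x] = (1 - x)^t, \qquad 0 \le x \le 1,
\\
\mathbb{E}[Z] = \int_0^1 \Pr[Z > x]\,dx = \int_0^1 (1 - x)^t\,dx = \frac{1}{t + 1}.
```

So if z is close to E[Z], then 1/z − 1 is close to t.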

(HW) The variance of Z is bounded.
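One way the bound can be worked out, again assuming Z is the minimum of t independent Unif[0, 1] variables, and using E[Z] = 1/(t + 1), which holds in that setting:

```latex
\mathbb{E}[Z^2] = \int_0^1 2x \Pr[Z > x]\,dx = \int_0^1 2x (1 - x)^t\,dx = \frac{2}{(t+1)(t+2)},
\\
\operatorname{Var}[Z] = \frac{2}{(t+1)(t+2)} - \frac{1}{(t+1)^2}
                      = \frac{t}{(t+1)^2 (t+2)} < \frac{1}{(t+1)^2}.
```

In other words, the standard deviation of Z is smaller than its mean.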

• As q increases, the approximation gets better (by Chebyshev's inequality).
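The transcript does not define q at this point. A common reading, assumed here, is that q is the number of independent copies of the basic estimator: the q minima are averaged, which divides the variance by q, and Chebyshev's inequality then bounds the probability of a large deviation. A minimal sketch under that assumption (the name `fm_plus` and the memoized random tables are illustrative, not from the slides):

```python
import random


def fm_plus(stream, q, seed=0):
    """Average the minima of q independent idealized hash functions.

    Assumption: each hash function is simulated by a memoized table of
    random values in (0, 1], so this sketch is not space efficient.
    """
    rng = random.Random(seed)
    tables = [{} for _ in range(q)]  # one simulated hash function per copy
    minima = [1.0] * q               # running minimum of each copy
    for x in stream:
        for j in range(q):
            table = tables[j]
            if x not in table:
                table[x] = 1.0 - rng.random()
            if table[x] < minima[j]:
                minima[j] = table[x]
    z_bar = sum(minima) / q          # averaged minimum
    return 1.0 / z_bar - 1.0
```

For example, `fm_plus(list(range(200)) * 2, q=200)` estimates the 200 distinct values in a stream of length 400; larger q tightens the estimate at the cost of more space and time.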

• We want a hash function that does not need n or more bits to represent.
• So we will use k-wise independent hash families (H); each function in H can be represented using a small number of bits (log|H|).
◦ In lecture.

• An example: set q > k a prime power, and define H_{poly,k} to be the set of all polynomials of degree ≤ k − 1 in F_q[x].
• H_{poly,k} is a k-wise independent family.
• Size: q^k.
• Needs: k log q bits.
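A minimal sketch of the polynomial family, restricted to the special case where q is a prime p so that arithmetic in F_q is just arithmetic modulo p. The function names are hypothetical. The brute-force checker confirms k-wise independence on small parameters: for any k distinct inputs and any k target outputs, exactly one of the p^k polynomials hits all targets.

```python
import random
from itertools import product


def poly_hash(coeffs, x, p):
    """Evaluate a_0 + a_1 x + ... + a_{k-1} x^{k-1} over F_p using Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc


def sample_hash(k, p, rng=random):
    """Draw a random member of the family: k coefficients, about k log p bits."""
    coeffs = [rng.randrange(p) for _ in range(k)]
    return lambda x: poly_hash(coeffs, x, p)


def count_matching(k, p, xs, ys):
    """Count polynomials of degree <= k-1 over F_p mapping xs[i] to ys[i] for all i."""
    return sum(
        all(poly_hash(coeffs, x, p) == y for x, y in zip(xs, ys))
        for coeffs in product(range(p), repeat=k)
    )
```

For instance, `count_matching(2, 5, (0, 1), (3, 4))` counts the degree ≤ 1 polynomials over F_5 with h(0) = 3 and h(1) = 4; interpolation guarantees there is exactly one.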