Published by Yuliana Irawan. Modified over 6 years ago.
1
COMS E F15 Lecture 2: Median trick + Chernoff, Distinct Count, Impossibility Results
2
Administrivia, Plan
Website moved: sublinear.wikischolars.columbia.edu/main
Piazza: sign-up!
Plan:
  Median trick, Chernoff bound (from Tue)
  Distinct Elements Count
  Impossibility Results
3
Last Lecture
Counting frequency
Morris Algorithm:
  Initialize $X = 0$
  On increment, $X = X + 1$ with prob. $1/2^X$
  Estimator: $2^X - 1$
Morris: $\mathrm{var} = O(n^2)$
Morris+: average of $k = O(1/\epsilon^2)$ copies, $\mathrm{var} = O(n^2/k)$; use Chebyshev for $1+\epsilon$ approx.; failure prob.: 0.1
Morris++: median of $m = O(\log\frac{1}{\delta})$ copies; use Chernoff; failure prob.: $\delta$
[Figure: table of IP addresses and their frequencies]
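The three variants recapped above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names, the seed, and the parameter values `k=200` and `m=9` are chosen for the demo and are not from the lecture.

```python
import random
import statistics

def morris(n, rng):
    """Run one Morris counter over n increments; return the estimate 2^X - 1."""
    x = 0
    for _ in range(n):
        # Increment X with probability 1 / 2^X.
        if rng.random() < 2.0 ** -x:
            x += 1
    return 2 ** x - 1

def morris_plus(n, k, rng):
    """Morris+: average of k independent Morris estimates."""
    return sum(morris(n, rng) for _ in range(k)) / k

def morris_plus_plus(n, k, m, rng):
    """Morris++: median of m independent Morris+ estimates."""
    return statistics.median(morris_plus(n, k, rng) for _ in range(m))

rng = random.Random(0)  # fixed seed, only to make the demo repeatable
estimate = morris_plus_plus(n=1000, k=200, m=9, rng=rng)
print(estimate)  # should land near the true count of 1000
```

Each counter stores only $X$, which is about $\log_2 n$ in expectation, so the state takes roughly $\log\log n$ bits per copy.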
4
"Median trick"
Chernoff/Hoeffding bounds:
  $X_1, X_2, \ldots, X_m$ are independent r.v. in $\{0,1\}$
  $\mu = E\left[\sum_i X_i\right]$
  $\epsilon \in [0,1]$
  $\Pr\left[\left|\sum_i X_i - \mu\right| > \epsilon\mu\right] \le 2e^{-\epsilon^2\mu/3}$
Algorithm $A$: output in the correct range with 90% probability
Algorithm $A^*$: output in the correct range with $1-\delta$ probability
Median trick:
  Repeat $A$ for $m = O(\log\frac{1}{\delta})$ times
  Take the median of the answers
5
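Using Chernoff for Median trick
Chernoff: $\Pr\left[\left|\sum_i X_i - \mu\right| > \epsilon\mu\right] \le 2e^{-\epsilon^2\mu/3}$
Define $X_i = 1$ iff the $i$-th copy of $A$ is correct
$E[X_i] = 0.9$ ($A$ is correct with 90% prob.)
$\mu = 0.9m$
New alg. $A^*$ is correct when $\sum_i X_i > 0.5m$
Use Chernoff to bound, with $\epsilon = 0.4/0.9$:
  $\Pr\left[\left|\sum_i X_i - \mu\right| > 0.4m\right] = \Pr\left[\left|\sum_i X_i - \mu\right| > \epsilon\mu\right] \le 2e^{-\epsilon^2 \cdot 0.9m/3} < \delta$ for $m = O(\log\frac{1}{\delta})$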
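To see how loose or tight the Chernoff argument is, one can compare it with the exact binomial tail. The sketch below assumes each copy is correct independently with probability exactly 0.9; the function names are hypothetical.

```python
from math import comb, exp

def median_failure_prob(m, p=0.9):
    """Exact probability that at most m/2 of m independent copies are correct."""
    return sum(
        comb(m, j) * p**j * (1 - p) ** (m - j)
        for j in range(m + 1)
        if 2 * j <= m
    )

def chernoff_bound(m, p=0.9):
    """Bound 2*exp(-eps^2 * mu / 3) with mu = p*m and eps*mu = (p - 0.5)*m."""
    mu = p * m
    eps = (p - 0.5) / p
    return 2 * exp(-eps**2 * mu / 3)

for m in (1, 11, 51, 101):
    print(m, median_failure_prob(m), chernoff_bound(m))
```

Both quantities decay exponentially in $m$, which is why $m = O(\log\frac{1}{\delta})$ repetitions suffice; the exact tail is much smaller than the bound for the same $m$.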
6
Problem: Distinct Elements
Streaming elements from $[n]$
Approximate the number of elements with non-zero frequency
Length of stream = $m$
Space required?
  $O(n)$ bits (one bit per element of $[n]$)
  $O(m \cdot \log n)$ bits (store the whole stream)
[Figure: table of IP addresses and their frequencies]
7
Algorithm for approximating DE
Main tool: hash function $h: [n] \to [0,1]$
  $h(i)$ random in $[0,1]$
Algorithm [Flajolet-Martin 1985]:
  Init $z = 1$
  When see element $i$: $z = \min\{z, h(i)\}$
  Estimator: $\frac{1}{z} - 1$
Where from? Will return later…
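A minimal sketch of this min-hash estimator follows. The idealized random hash $h$ is replaced here by 64 bits of BLAKE2b scaled into $[0,1]$; that stand-in, the `seed` parameter, and the function names are assumptions made for illustration, not part of the lecture.

```python
import hashlib

def h(i, seed=0):
    """Stand-in for a random hash into [0,1]: 64 bits of BLAKE2b, scaled."""
    digest = hashlib.blake2b(f"{seed}:{i}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") / 2**64

def fm_estimate(stream, seed=0):
    """Track the minimum hash z over the stream and return 1/z - 1."""
    z = 1.0
    for i in stream:
        z = min(z, h(i, seed))
    return 1.0 / z - 1.0

print(fm_estimate([5, 7, 2, 5, 5, 7]))  # noisy estimate of 3 distinct elements
```

Because repeated elements hash to the same value, the state $z$ depends only on the set of distinct elements, not on their frequencies. A single copy is very noisy, so the output should not be expected to be close to 3.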
8
Analysis
Let $d$ = count of distinct elements
Claim 1: $E[z] = \frac{1}{d+1}$
Proof:
  $z$ = minimum of $d$ random numbers in $[0,1]$
  Pick another random number $a \in [0,1]$
  What's the probability that $a < z$?
    1) Given $z$, it is exactly $z$
    2) It is the probability that $a$ is the smallest among $d+1$ reals: $\frac{1}{d+1}$
  Hence $E[z] = \Pr[a < z] = \frac{1}{d+1}$
[Figure: elements 5, 7, 2 mapped to $h(5)$, $h(7)$, $h(2)$ on $[0,1]$; the minimum is about $1/(d+1)$]
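Claim 1 is easy to check empirically. The sketch below draws the minimum of $d$ uniform numbers many times and compares the average with $\frac{1}{d+1}$; the choice $d = 9$, the number of trials, and the seed are arbitrary assumptions for the demo.

```python
import random

def min_of_uniforms(d, rng):
    """Minimum of d independent uniform numbers in [0,1)."""
    return min(rng.random() for _ in range(d))

def mean_min(d, trials, seed=0):
    """Average of the minimum over many independent trials."""
    rng = random.Random(seed)
    return sum(min_of_uniforms(d, rng) for _ in range(trials)) / trials

d = 9
empirical = mean_min(d, trials=20000)
predicted = 1 / (d + 1)
print(empirical, predicted)  # the two values should be close
```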
9
Analysis 2
Need variance too…
  Can prove $\mathrm{var}[z] \le 2/d^2$
How do we get a $1+\epsilon$ approximation though?
  We can take $Z = \frac{1}{k}(z_1 + z_2 + \ldots + z_k)$ for independent $z_1, \ldots, z_k$
10
Alternative: Bottom-k
Bottom-k alg. [BJKS'02]:
  Init $(z_1, z_2, \ldots, z_k) = 1$
  Keep the $k$ smallest hashes seen, $z_1 \le z_2 \le \ldots \le z_k$
  Estimator: $\hat{d} = \frac{k}{z_k}$
Proof: will prove
  Probability that $\hat{d} > (1+\epsilon)d$ is at most 0.05
  Probability that $\hat{d} < (1-\epsilon)d$ is at most 0.05
  Overall, only 0.1 probability that $\hat{d}$ is outside the correct range
11
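Analysis for Bottom-k
Compute: $\Pr[\hat{d} > (1+\epsilon)d]$
Suppose we see $\{1, \ldots, d\}$
Define $X_i = 1$ iff $h(i) < \frac{k}{(1+\epsilon)d}$
Then: $\hat{d} > (1+\epsilon)d$ iff $z_k < \frac{k}{(1+\epsilon)d}$ iff $\sum_i X_i \ge k$
We have:
  $E[X_i] = \frac{k}{(1+\epsilon)d}$ (requires $d > k$, so that the threshold is below 1)
  $E\left[\sum_i X_i\right] = d \cdot E[X_i] = \frac{k}{1+\epsilon}$
  $\mathrm{var}\left[\sum_i X_i\right] = d \cdot \mathrm{var}[X_i] \le d \cdot E[X_1^2] \le \frac{k}{1+\epsilon} \le k$
By Chebyshev:
  $\Pr\left[\left|\sum_i X_i - \frac{k}{1+\epsilon}\right| > \sqrt{20k}\right] \le 0.05$
  or: $\Pr\left[\sum_i X_i > \frac{k}{1+\epsilon} + \sqrt{20k}\right] \le 0.05$
The event above is implied by $\sum_i X_i \ge k$ for $k = \Omega(1/\epsilon^2)$, since then $k > \frac{k}{1+\epsilon} + \sqrt{20k}$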
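The Bottom-k estimator analyzed above can be sketched with a heap. This is an illustration under assumptions: the random hash is replaced by BLAKE2b, the names are hypothetical, and when fewer than $k$ distinct hashes were seen the sketch returns their count instead of $k / z_k$ with $z_k = 1$.

```python
import hashlib
import heapq

def h(i, seed=0):
    """Stand-in for a random hash into [0,1]: 64 bits of BLAKE2b, scaled."""
    digest = hashlib.blake2b(f"{seed}:{i}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") / 2**64

def bottom_k_estimate(stream, k, seed=0):
    """Keep the k smallest distinct hashes; estimate d as k / z_k."""
    heap = []     # max-heap via negation: -heap[0] is the largest kept hash
    kept = set()  # hashes currently in the heap, so repeats are skipped
    for i in stream:
        v = h(i, seed)
        if v in kept:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -v)
            kept.add(v)
        elif v < -heap[0]:
            # Insert v and evict the current largest of the kept hashes.
            evicted = -heapq.heappushpop(heap, -v)
            kept.discard(evicted)
            kept.add(v)
    if len(heap) < k:
        # Deviation from the slide: fewer than k distinct hashes were seen,
        # so their count is returned directly (assuming no hash collisions).
        return len(heap)
    return k / -heap[0]

stream = [i % 5000 for i in range(15000)]  # 5000 distinct elements, each seen 3 times
print(bottom_k_estimate(stream, k=400))
```

The space used is $k$ hash values, and the relative error shrinks roughly like $1/\sqrt{k}$, matching the $k = \Omega(1/\epsilon^2)$ requirement.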
12
Hash functions in Streaming
We used $h: [n] \to [0,1]$
  Issue 1: reals?
  Issue 2: how do we store it?
Issue 1:
  OK with $h: [n] \to \left\{0, \frac{1}{M}, \frac{2}{M}, \frac{3}{M}, \ldots, 1\right\}$ for $M \gg n^3$
  Probability that $d \le n$ random numbers collide: at most $1/n$
13
Issue 2: bounded randomness
Pairwise independent hash functions
Definition: $h: [n] \to \{1, 2, \ldots, M\}$ s.t. for all $i \ne j$ and $a, b \in [M]$:
  $\Pr[h(i) = a \wedge h(j) = b] = 1/M^2$
  (i.e., like random on pairs)
Such a hash function is enough:
  Variance cares only about pairs!
  We defined $X_i = 1$ iff $h(i) < \ldots$
  And computed $\mathrm{var}\left[\sum_i X_i\right] = E\left[\left(\sum_i X_i\right)^2\right] - \left(E\left[\sum_i X_i\right]\right)^2 = E[X_1 X_1 + X_1 X_2 + \ldots] - \left(E\left[\sum_i X_i\right]\right)^2$
  This is the same for fully random $h$ and pairwise independent $h$
14
Pairwise-Independent: example
Definition: $h: [n] \to \{0, 1, \ldots, M-1\}$ s.t. for all $i \ne j$ and $a, b \in \{0, 1, \ldots, M-1\}$:
  $\Pr[h(i) = a \wedge h(j) = b] = 1/M^2$
(A) construction:
  Suppose $M$ is prime
  Pick $p, q \in \{0, 1, \ldots, M-1\}$
  $h(i) = pi + q \pmod{M}$
  Space: only $O(\log M) = O(\log n)$ bits
Proof of correctness:
  $h(i) = a$ and $h(j) = b$: system of 2 equations in 2 unknowns $(p, q)$
  Exactly one pair $(p, q)$ satisfies it (for $i \not\equiv j \pmod{M}$, which holds when $M \ge n$)
  Probability it is chosen: exactly $1/M^2$
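The correctness argument can be checked exhaustively for a small prime. The sketch below assumes $M = 7$ purely for illustration and counts, over all choices of $(p, q)$, how often each pair of hash values $(h(i), h(j))$ occurs.

```python
from collections import Counter
from itertools import product

M = 7  # small prime, chosen only so the check can enumerate everything

def h(p, q, i):
    """Hash of i under the function indexed by (p, q): (p*i + q) mod M."""
    return (p * i + q) % M

def pair_counts(i, j):
    """Count each value pair (h(i), h(j)) over all M^2 choices of (p, q)."""
    return Counter(
        (h(p, q, i), h(p, q, j)) for p, q in product(range(M), repeat=2)
    )

counts = pair_counts(2, 5)
print(len(counts), set(counts.values()))
```

Every one of the $M^2$ value pairs appears for exactly one $(p, q)$, so each pair has probability exactly $1/M^2$ under a uniform choice of $(p, q)$.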
15
Impossibility Results
Relaxations:
  Approximation
  Randomization
Need both for space $\ll \min\{n, m\}$
16
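Deterministic Exact Won't Work
Suppose algorithm $A$, with estimator $R$, uses space $s \ll n, m$
We build the following stream:
  Let vector $x \in \{0,1\}^n$
  $i$ is in the stream iff $x_i = 1$
Run $A$ on it and let $\sigma$ be the memory content
Recover $x$ from $\sigma$ by appending one more element and checking the estimate $\hat{d}$:
  Append 1: $\hat{d}$ didn't change $\Rightarrow x_1 = 1$
  Append 2: $\hat{d}$ increased $\Rightarrow x_2 = 0$
[Figure: example stream 1 3 5 6 7 8 9 10 with memory content $\sigma$]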
17
Deterministic Exact Won't Work
Using $\sigma$, can recover the entire $x$!
  "$\sigma$ = encoding of a string $x$ of length $n$"
But $\sigma$ has only $s \ll n$ bits!
Can think of $A: \{0,1\}^n \to \{0,1\}^s$
18
Deterministic Exact Won't Work
Using $\sigma$, can recover the entire $x$
  "$\sigma$ = encoding of a string $x$ of length $n$"
But $\sigma$ has only $s \ll n$ bits!
Can think of $A: \{0,1\}^n \to \{0,1\}^s$
  Must be injective
  Otherwise, suppose $A(x) = A(x') = \sigma$
  The recovery implies $x = x'$
Hence $s \ge n$
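The recovery step can be made concrete with a toy exact counter. In the sketch below, `ExactCounter` is a hypothetical deterministic exact algorithm that simply stores a set (so it uses about $n$ bits, consistent with the lower bound); `recover` reads $x$ back from its memory content by probing one element at a time.

```python
import copy

class ExactCounter:
    """Toy deterministic exact distinct counter; its state is a set."""
    def __init__(self):
        self.seen = set()

    def update(self, i):
        self.seen.add(i)

    def estimate(self):
        return len(self.seen)

def encode(x):
    """Feed the stream {i : x_i = 1} to the counter; return its state sigma."""
    a = ExactCounter()
    for i, bit in enumerate(x, start=1):
        if bit:
            a.update(i)
    return a

def recover(sigma, n):
    """Recover x from sigma: appending i leaves the count unchanged iff x_i = 1."""
    x = []
    for i in range(1, n + 1):
        probe = copy.deepcopy(sigma)  # probe a copy so sigma itself is untouched
        before = probe.estimate()
        probe.update(i)
        x.append(1 if probe.estimate() == before else 0)
    return x

x = [1, 0, 1, 0, 1, 1, 1, 1, 1, 1]  # the stream 1 3 5 6 7 8 9 10
print(recover(encode(x), len(x)) == x)
```

The same probing works for any deterministic exact algorithm, which is why its memory content must distinguish all $2^n$ vectors $x$.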
19
Deterministic Approx Won't Work Either
Similar: use $A$ to compress $x$ from a code
Code: set $T \subset \{0,1\}^n$ s.t.
  $\|y - x\| \ge n/6$ (Hamming distance) for all distinct $x, y \in T$
  $|T| \ge 2^{\Omega(n)}$
Use $A$ to encode an input $x$ into $\sigma = A(x)$, with $\hat{d} = R(\sigma)$
For each $y \in T$, check whether $x = y$:
  Append $y$ to get $\sigma' = A(x + y)$ and $\hat{d}' = R(\sigma')$
  If $\hat{d}' > 1.01\,\hat{d}$, then $x \ne y$
By injectivity of $A$ on $T$: $2^s \ge |T|$, or $s = \Omega(n)$
[Figure: example with stream $x$ followed by the appended stream $y$]
20
Concluding Remarks
Median trick + Chernoff
Distinct Elements
  Can also store hashes $h(i)$ approximately (store the number of leading zeros)
    $O(\log\log n)$ bits per hash value
  Plus other bells and whistles: HyperLogLog
Impossibility results
  Can also prove that randomized, exact won't work