Download presentation
Presentation is loading. Please wait.
1
Lecture 7: Dynamic sampling Dimension Reduction
Left to the title, a presenter can insert his/her own image pertinent to the presentation.
2
Plan Admin: Plan: Scriber? PSet 2 released later today, due next Wed
Alex office hours: Tue 2:30-4:30 Plan: Dynamic streaming graph algorithms S2: Dimension Reduction & Sketching Scriber?
3
Sub-Problem: dynamic sampling
Stream: general updates to a vector π₯β β1,0,1 π Goal: Output π with probability π₯ π π | π₯ π |
4
Dynamic Sampling Goal: output π with probability π₯ π π | π₯ π |
Let π·={π s.t. π₯ π β 0} Intuition: Suppose |π·|=10 How can we sample π with π₯ π β 0? Each π₯ π β 0 is a 1/10-heavy hitter Use CountSketch β recover all of them π log π space total Suppose π· =10 π Downsample: pick a random set πΌβ[π] s.t. Pr πβπΌ = 1 π Focus on substream on πβπΌ only (ignore the rest) Whatβs |π·β©πΌ| ? In expectation = 10 Use CountSketch on the downsampled stream πΌβ¦ In general: prepare for all levels
5
Basic Sketch Hash function π: π β[π] Let β π = #tail zeros in π(π)
Pr β π =π = 2 βπβ1 for π=0..πΏβ1 and πΏ= log 2 π Partition stream into substreams πΌ 0 , πΌ 1 ,β¦ πΌ πΏ Substream πΌ π focuses on elements with β π =π πΈ |π·β© πΌ π | =|π·|β
2 βπβ1 Sketch: for each π=0,β¦πΏ, Store πΆ π π : CountSketch for π=0.01 Store π·πΆ π : distinct count sketch for approx=1.1 πΉ 2 would be sufficient here! Both for success probability 1β1/π
6
Estimation Find a substream πΌ π s.t. π· πΆ π output β[1,20]
Algorithm DynSampleBasic: Initialize: hash function π: π β[π] β(π)= # tail zeros in π(π) CountSketch sketches πΆ π π , πβ[πΏ] DistinctCount sketches π·πΆ π , πβ[πΏ] Process(int π, real πΏ π ): Let π=β(π) Add (π, πΏ π ) to πΆ π π and π· πΆ π Estimator: Let j be s.t. π· πΆ π β[1,20] If no such j, FAIL π= random heavy hitter from πΆ π π Return π Find a substream πΌ π s.t. π· πΆ π output β[1,20] If no such stream, then FAIL Recover all πβ πΌ π with π₯ π β 0 (using πΆ π π ) Pick any of them at random
7
Analysis If π· <10 Suppose π·β₯10 πΈ π·β© πΌ π = π· β
2 βπβ1 β 5,10
Algorithm DynSampleBasic: Initialize: hash function π: π β[π] β(π)= # tail zeros in π(π) CountSketch sketches πΆ π π , πβ[πΏ] DistinctCount sketches π·πΆ π , πβ[πΏ] Process(int π, real πΏ π ): Let π=β(π) Add (π, πΏ π ) to πΆ π π and π· πΆ π Estimator: Let j be s.t. π· πΆ π β[1,20] If no such j, FAIL π= random heavy hitter from πΆ π π Return π If π· <10 then π·β© πΌ π β[1,10] for some π Suppose π·β₯10 Let π be such that π· β 10β
2 π ,10β
2 π+1 πΈ π·β© πΌ π = π· β
2 βπβ β 5,10 πππ π·β© πΌ π β€ π· β
2 βπβ1 β€10 Chebyshev: π·β© πΌ π deviates from expectation by 4> 1.5πππ with probability at most <0.7 Ie., probability of FAIL is at most 0.7
8
Analysis (cont) Algorithm DynSampleBasic: Initialize: hash function π: π β[π] β(π)= # tail zeros in π(π) CountSketch sketches πΆ π π , πβ[πΏ] DistinctCount sketches π·πΆ π , πβ[πΏ] Process(int π, real πΏ π ): Let π=β(π) Add (π, πΏ π ) to πΆ π π and π· πΆ π Estimator: Let j be s.t. π· πΆ π β[1,20] If no such j, FAIL π= random heavy hitter from πΆ π π Return π Let π with π· πΆ π β[1,20] All heavy hitters = π·β© πΌ π πΆ π π will recover a heavy hitter, i.e., πβπ·β© πΌ π By symmetry, once we output some π, it is random over π· Randomness? We just used Chebyshev β pairwise π is OK !
9
Dynamic Sampling: overall
DynSampleBasic guarantee: FAIL: with probability β€0.7 Otherwise, output a random πβπ· Modulo a negligible probability of πΆπ/π·πΆ failing Reduce FAIL probability? DynSample-Full: Take π=π( log π ) independent DynSampleBasic Will not FAIL in at least one with probability at least 1β 0.7 π β₯1β1/π Space: π( log 4 π ) words for: π=π( log π ) repetitions π( log π ) substreams π( log 2 π ) for each πΆ π π ,π· πΆ π
10
Back to Dynamic Graphs Graph πΊ with edges inserted/deleted
Define node-edge incidence vectors: For node π£, we have vector: π₯ π£ β π
π where π= π 2 For π>π£: π₯ π£ (π£,π)=+1 if edge (π£,π) exists For π<π£: π₯ π£ (π,π£)=β1 if edge (π,π£) exists Idea: Use Dynamic-Sample-Full to sample an edge from each vertex π£ Collapse edges How to iterate? Property: For a set π of nodes Consider: π£βπ π₯ π£ Claim: has non-zero in coordinate (π,π) iff edge (π,π) crosses from π to outside (i.e., πβ© π,π =1) Sketch enough for: for any set π, can sample an edge from π ! (π,π) π +1 π -1
11
Dynamic Connectivity Sketching algorithm: Check connectivity:
π₯ π£ β π
π where π= π 2 for π>π£: π₯ π£ (π£,π)=+1 if β(π£,π) for π<π£: π₯ π£ (π,π£)=β1 if β(π,π£) Sketching algorithm: Dynamic-Sample-Full for each π₯ π£ Check connectivity: Sample an edge from each node π£ Contract all sampled edges β partitioned the graph into a bunch of components π 1 ,β¦ π π (each is connected) Iterate on the components π 1 ,β¦ π π How many iterations? π( log π ) β each time we reduce the number of components by a factor β₯2 Issue: iterations not independent! Can use a fresh Dynamic-Sampling-Full for each of the π( log π ) iterations
12
A little history [Ahn-Guha-McGregorβ12]: the above streaming algorithm
Overall π(πβ
log 4 π) space [Kapron-King-Mountjoyβ13]: Data structure for maintaining graph connectivity under edge inserts/deletes First algorithm with log π π 1 time for update/connectivity ! Open since β80s
13
Section 2: Dimension Reduction & Sketching
14
Why? Application: Nearest Neighbor Search in high dimensions
Preprocess: a set π· of points Query: given a query point π, report a point πβπ· with the smallest distance to π π π
15
Motivation Generic setup: Application areas: Distance can be:
Points model objects (e.g. images) Distance models (dis)similarity measure Application areas: machine learning: k-NN rule speech/image/video/music recognition, vector quantization, bioinformatics, etcβ¦ Distance can be: Euclidean, Hamming 000000 011100 010100 000100 011111 000000 001100 000100 110100 111111 π π
16
Low-dimensional: easy
Compute Voronoi diagram Given query π, perform point location Performance: Space: π(π) Query time: π( log π )
17
High-dimensional case
All exact algorithms degrade rapidly with the dimension π Algorithm Query time Space Full indexing π(log πβ
π) π π(π) (Voronoi diagram size) No indexing β linear scan π(πβ
π)
18
Dimension Reduction Reduce high dimension?!
βflattenβ dimension π into dimension πβͺπ Not possible in general: packing bound But can if: for a fixed subset of β π
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.