Lecture 6: Counting triangles Dynamic graphs & sampling

Slides:



Advertisements
Similar presentations
Lower Bounds for Additive Spanners, Emulators, and More David P. Woodruff MIT and Tsinghua University To appear in FOCS, 2006.
Advertisements

Efficient Algorithms via Precision Sampling Robert Krauthgamer (Weizmann Institute) joint work with: Alexandr Andoni (Microsoft Research) Krzysztof Onak.
Size-estimation framework with applications to transitive closure and reachability Presented by Maxim Kalaev Edith Cohen AT&T Bell Labs 1996.
Efficient access to TIN Regular square grid TIN Efficient access to TIN Let q := (x, y) be a point. We want to estimate an elevation at a point q: 1. should.
COMP 553: Algorithmic Game Theory Fall 2014 Yang Cai Lecture 21.
2/14/13CMPS 3120 Computational Geometry1 CMPS 3120: Computational Geometry Spring 2013 Planar Subdivisions and Point Location Carola Wenk Based on: Computational.
Approximation, Chance and Networks Lecture Notes BISS 2005, Bertinoro March Alessandro Panconesi University La Sapienza of Rome.
Approximating Average Parameters of Graphs Oded Goldreich, Weizmann Institute Dana Ron, Tel Aviv University.
1 Algorithms for Large Data Sets Ziv Bar-Yossef Lecture 12 June 18, 2006
CPSC 689: Discrete Algorithms for Mobile and Wireless Systems Spring 2009 Prof. Jennifer Welch.
1 On Compressing Web Graphs Michael Mitzenmacher, Harvard Micah Adler, Univ. of Massachusetts.
CSE 421 Algorithms Richard Anderson Lecture 4. What does it mean for an algorithm to be efficient?
Computability and Complexity 24-1 Computability and Complexity Andrei Bulatov Approximation.
Job Scheduling Lecture 19: March 19. Job Scheduling: Unrelated Multiple Machines There are n jobs, each job has: a processing time p(i,j) (the time to.
Foundations of Privacy Lecture 11 Lecturer: Moni Naor.
Sketching, Sampling and other Sublinear Algorithms: Streaming Alex Andoni (MSR SVC)
Approximating the MST Weight in Sublinear Time Bernard Chazelle (Princeton) Ronitt Rubinfeld (NEC) Luca Trevisan (U.C. Berkeley)
Theory of Computing Lecture 10 MAS 714 Hartmut Klauck.
Streaming Algorithms Piotr Indyk MIT. Data Streams A data stream is a sequence of data that is too large to be stored in available memory Examples: –Network.
Sublinear Algorithms via Precision Sampling Alexandr Andoni (Microsoft Research) joint work with: Robert Krauthgamer (Weizmann Inst.) Krzysztof Onak (CMU)
CSE 421 Algorithms Richard Anderson Winter 2009 Lecture 5.
NOTE: To change the image on this slide, select the picture and delete it. Then click the Pictures icon in the placeholder to insert your own image. Fast.
Sketching complexity of graph cuts Alexandr Andoni joint work with: Robi Krauthgamer, David Woodruff.
Theory of Computational Complexity Probability and Computing Chapter Hikaru Inada Iwama and Ito lab M1.
Algorithms for Big Data: Streaming and Sublinear Time Algorithms
5-2 Polynomials, Linear Factors, & Zeros
Stochastic Streams: Sample Complexity vs. Space Complexity
New Characterizations in Turnstile Streams with Applications
Approximating the MST Weight in Sublinear Time
A paper on Join Synopses for Approximate Query Answering
Lecture 22: Linearity Testing Sparse Fourier Transform
Algorithms and Networks
Approximate Matchings in Dynamic Graph Streams
Lecture 18: Uniformity Testing Monotonicity Testing
Sublinear Algorithmic Tools 3
Distributed Submodular Maximization in Massive Datasets
Lecture 11: Nearest Neighbor Search
COMS E F15 Lecture 2: Median trick + Chernoff, Distinct Count, Impossibility Results Left to the title, a presenter can insert his/her own image.
Lecture 10: Sketching S3: Nearest Neighbor Search
Lecture 4: CountSketch High Frequencies
Data Integration with Dependent Sources
Lecture 7: Dynamic sampling Dimension Reduction
Turnstile Streaming Algorithms Might as Well Be Linear Sketches
Lecture 16: Earth-Mover Distance
Intro to NP Completeness
Computational Geometry Capter:1-2.1
CIS 700: “algorithms for Big Data”
Linear sketching with parities
Richard Anderson Lecture 25 NP-Completeness
Matrix Martingales in Randomized Numerical Linear Algebra
Locally Decodable Codes from Lifting
The Curve Merger (Dvir & Widgerson, 2008)
CSE 589 Applied Algorithms Spring 1999
Sublinear Algorihms for Big Data
Linear sketching over
CSCI B609: “Foundations of Data Science”
Neuro-RAM Unit in Spiking Neural Networks with Applications
Linear sketching with parities
CSCI B609: “Foundations of Data Science”
Chapter 11 Limitations of Algorithm Power
Range-Efficient Computation of F0 over Massive Data Streams
Richard Anderson Winter 2009 Lecture 6
Algorithms and Networks
Lecture 15: Least Square Regression Metric Embeddings
Computer Vision Lecture 19: Object Recognition III
Trevor Brown DC 2338, Office hour M3-4pm
Metropolis Light Transit
Richard Anderson Lecture 5 Graph Theory
Switching Lemmas and Proof Complexity
Locality In Distributed Graph Algorithms
Presentation transcript:

Lecture 6: Counting triangles Dynamic graphs & sampling Left to the title, a presenter can insert his/her own image pertinent to the presentation.

Plan Problem 3: Counting triangles Streaming for dynamic graphs Scriber?

Streaming for Graphs Graph 𝐺 Stream: list of edges 𝑛 vertices 𝑚 edges Stream: list of edges (e.g., on a hard drive) ( , ) ( , ) ( , ) ( , ) ( , ) ( , )

Streaming for Graphs Small (work)space: Aim: to use ~𝑛 space E.g., for web can have 𝑛=1⋅ 10 9 nodes 𝑚=100⋅ 10 9 edges Small (work)space: Aim: to use ~𝑛 space or O(𝑛⋅ log 𝑛 ) Still much less than the size of the graph, 𝑚 (that could be up to 𝑛2) ≪𝑛 is usually not achievable ( , ) ( , ) ( , ) ( , ) ( , ) ( , )

Problems 1. connectivity 2. distances 3. Count # triangles… Exact in 𝑂(𝑛) space 2. distances 𝛼 (odd) approximation in 𝑂 𝑛 1+ 2 𝛼+1 3. Count # triangles…

Problem 3: triangle counting 𝑇= number of triangles Motivation: How often 2 friends of a person know each other? 𝐹= 𝑇 3 𝑣 deg 𝑣 2 𝐹∈[0,1] Can we compute 𝑣 deg 𝑣 2 in 𝑂 𝑛 space? Yes: just count the degrees, and compute 𝑇? Hard to distinguish 𝑇=0 vs 𝑇=1 in ≪𝑚 space Suppose we have a lower bound 𝑡≤𝑇 𝐹= 1 1+1+3+1+3

Triangle counting: Approach Define a vector 𝑥 with coordinate for each subset 𝑆 of 3 nodes 𝑥 𝑆 = how many edges among 𝑆 𝑇= # of coordinates which are 3 Remember frequencies: 𝐹 𝑝 = 𝑆 𝑥 𝑆 𝑝 Claim: 𝑇= 𝐹 0 −1.5 𝐹 1 +0.5 𝐹 2 2 1 3 …

Triangle counting: Approach Claim: 𝑇= 𝐹 0 −1.5 𝐹 1 +0.5 𝐹 2 if 𝑥 𝑆 =0, contributes 0 on the right if 𝑥 𝑆 =1, contributes 0 If 𝑥 𝑆 =2, contributes 0 if 𝑥 𝑆 =3, contributes 1 ! Why such a formula exist? Want a polynomial 𝑓 𝑥 𝑆 which is =0 on {0,1,2} and =1 on {3} Polynomial interpolation! Need a polynomial of degree 3, but 𝐹 0 provides a degree of freedom, hence degree=2 is sufficient

Triangle Counting: Algorithm 𝑇= 𝐹 0 −1.5 𝐹 1 +0.5 𝐹 2 Algorithm: Let 𝐹 0 , 𝐹 1 , 𝐹 2 be 1+𝛾 estimates General streaming! Edge (𝑖,𝑗) increases count for all 𝑥 𝑆 s.t. 𝑖,𝑗 ⊂𝑆 𝑇 = 𝐹 0 −1.5 𝐹 1 +0.5 𝐹 2 How do we set 𝛾? Not a 1+𝛾 multiplicative! Additive error: 𝑂 𝛾 𝐹 0 =𝑂(𝛾𝑚𝑛) Set 𝛾= 𝑂 𝑡 𝜖𝑚𝑛 for ±𝜖𝑡 additive error Total space: 𝑂 𝛾 −2 log 𝑛 =𝑂 𝑚𝑛 𝜖𝑡 2 log 𝑛

Triangle Counting: Algorithm 2 Let’s consider an even simpler algorithm: Pick a few random 𝑆 1 ,… 𝑆 𝑘 of 3 nodes (at start) Compute 𝑥 𝑆 𝑖 , for 𝑖∈[𝑘] while stream goes by Let 𝑐 be the number of 𝑖 s.t. 𝑥 𝑆 𝑖 =3 Estimate: 𝑅= 𝑀 𝑘 ⋅𝑐, where 𝑀= 𝑛 3 We can see: 𝐸 𝑅 =𝑇 𝑉𝑎𝑟 𝑅 ≤ 𝑀 𝑘 𝑇 Chebyshev: |𝑅−𝑇|≤𝑂 𝑀𝑇 𝑘

Triangle Counting: Algorithm 2+ |𝑅−𝑇|≤𝑂 𝑀𝑇 𝑘 Need 𝑘= 𝑂(1) 𝜖 2 ⋅ 𝑀 𝑡 where 𝑀=𝑂 𝑛 3 Better? Sample 𝑆 from set of smaller size 𝑀 ′ ≪𝑀! for which 𝑥 𝑆 ≥1 Then obtain: Chebyshev: |𝑅−𝑇|≤𝑂 𝑀 ′ 𝑇 𝑘 Need 𝑘= 𝑂(1) 𝜖 2 ⋅ 𝑀 ′ 𝑡 =𝑂 1 𝜖 2 ⋅ 𝑚𝑛 𝑡 since 𝑀′=𝑂 𝑚𝑛

Sampling in Graphs Setting 1: Setting 2: Suppose we just have positive updates Not linear Setting 2: General steaming: also negative updates… Why? Dynamic graphs

Dynamic Graphs Stream contains both insertions and deletions of edges Use 1: a log of updates to the graph [Ahn-Guha-McGregor’12]: “hyperlinks can be removed and tiresome friends can be de-friended” Use 2: graph is distributed over a number of computers Then want linear sketches Generally: dynamic streams  linear sketches Use 3: if time-efficient, then it’s a data structure!

Revisit Problem 1: Connectivity Can we do connectivity in dynamic graphs? Algorithm from the previous lecture? No… Theorem [AGM12]: can check s-t connectivity in dynamic graphs with 𝑂 𝑛⋅ log 5 𝑛 space (with 90% success probability) Approach: sampling in (dynamic) graphs

Sub-Problem: dynamic sampling Stream: general updates to a vector 𝑥∈ −1,0,1 𝑛 (will work for general 𝑥 too) Goal: Output 𝑖 with probability 𝑥 𝑖 𝑗 | 𝑥 𝑗 | Does “standard” sampling work? No: After putting 𝑥 𝑖 =1 for 𝑛/2 coordinates, add 1 more and delete the first 𝑛/2…

Dynamic Sampling Goal: output 𝑖 with probability 𝑥 𝑖 𝑗 | 𝑥 𝑗 | Let 𝐴={𝑖 s.t. 𝑥 𝑖 ≠0} Intuition: Suppose |𝐴|=10 How can we sample 𝑖 with non-zero 𝑥 𝑖 ? Use CountSketch! Each 𝑥 𝑖 ≠0 (𝑖∈𝐴) is a 1/10-heavy hitter Can recover all of them 𝑂 log 𝑛 space total Suppose 𝐴=𝑛/10 Downsample first: pick a random set 𝐷⊂[𝑛], of size 𝐷 ≈100 Focus on substream on 𝑖∈𝐷 only (ignore the rest) What’s |𝐴∩𝐷| ? In expectation =10 Use CountSketch on the downsampled stream… In general: prepare for all levels