Approximating Average Parameters of Graphs
Oded Goldreich, Weizmann Institute
Dana Ron, Tel Aviv University

The Type of Problems We Consider
Let f be a “natural” function defined on graphs. The domain of f may be vertices, pairs of vertices, etc.
The goal: estimate the average value of f.
The means:
(1) Queries to f;
(2) Queries to the graph:
  a. Neighbor queries (who is the j-th neighbor of v?)
  b. Vertex-pair queries (are u and v neighbors?)
Questions of interest:
(1) Can we do this much more efficiently than for general functions?
(2) How do the different query types influence the complexity?

The Particular Problems We Study
Problem 1: Estimating the average degree in a graph, $f = \deg(v)$ (first considered by Feige (STOC04)).
Problem 2: Estimating the average distance to a given vertex v in a graph, $f = \mathrm{dist}_v(u)$.
Problem 3: Estimating the average distance between pairs of vertices in a graph, $f = \mathrm{dist}(u,v)$.

Our Results
Problem 1: Estimating the average degree in a (simple) graph, $f = \deg(v)$.
G=(V,E), |V|=n, |E|=m, $d_G = 2m/n$.
UB: Can obtain a $(1+\epsilon)$-approximation in time $n^{1/2} \cdot \mathrm{poly}((\log n)/\epsilon)$ by using neighbor queries (only).
LB: A $(1+\epsilon)$-approximation requires $\Omega((n/\epsilon)^{1/2})$ queries (even when all types of queries are allowed).
Compare to Feige: a $(2+\epsilon)$-approximation in similar complexity using degree queries only (queries to f); a $(2-o(1))$-approximation requires $\Omega(n)$ degree queries.
Note: Can improve when the average degree is high: $(n/d_G)^{1/2}$ instead of $n^{1/2}$, with a matching lower bound.

Our Results (cont.)
Problem 2: Estimating the average distance to a vertex.
Problem 3: Estimating the average distance between vertices.
G=(V,E), |V|=n, |E|=m, $d_G$ = average distance.
UB: Can obtain a $(1+\epsilon)$-approximation in time $(n/d_G)^{1/2} \cdot \mathrm{poly}((\log n)/\epsilon)$ by using distance queries (only).
LB: A $(1+\epsilon)$-approximation requires $\Omega((n/(\epsilon d_G))^{1/2})$ queries when all types of queries are allowed. If only neighbor queries are allowed: $\Omega(m)$ queries are necessary for a constant-factor approximation.
Note (not sublinear): Can obtain a $(1+\epsilon)$-approximation using $O(m \cdot n^{1/2} \cdot \mathrm{poly}(1/\epsilon))$ neighbor queries only. When the graph is not too dense ($m = o(n^{3/2})$), this saves compared to computing exact or approximate distances for all pairs.

Our Results: Summary
Common to the problems:
Can obtain a $(1+\epsilon)$-approximation in sublinear time.
Dependence on n: $n^{1/2}$; dependence on $1/\epsilon$: polynomial.
Running time improves as the average increases.
Matching lower bounds in terms of the dependence on n/avg.
Differences between the problems:
In the case of average degree: neighbor queries help (to improve the quality of the estimate), and we can do without degree queries (queries to f).
In the case of average distance: neighbor queries do not help, and we cannot do without distance queries (queries to f).

Estimating Average Degree
G=(V,E), |V|=n, |E|=m, $d_G = 2m/n \ge 1$; $f = \deg(v)$.
The idea for the algorithm is inspired by the procedure of [Kaufman, Krivelevich, Ron] for selecting an edge “almost uniformly”.
Ingredient 1: Partition the graph vertices into buckets: bucket $B_i$ contains the vertices v s.t. $(1+\beta)^{i-1} < \deg(v) \le (1+\beta)^i$ ($\beta = \epsilon/8$, giving $O((\log n)/\epsilon)$ buckets).
Suppose that for every $B_i$ we had an estimate $b_i$ s.t. $b_i = (1 \pm \epsilon/8)\,|B_i|$. Then
$(1/n) \sum_i b_i (1+\beta)^i = (1 \pm \epsilon)\, d_G$   (*)
How to get $b_i$? By sampling. Difficulty: $|B_i|$ may be small (roughly, below $n^{1/2}$), in which case a sample of about $n^{1/2}$ vertices is likely to miss it.
Ingredient 2: Ignore the small $B_i$'s and take the sum in (*) only over the sufficiently large $B_i$'s. (For large $B_i$'s, get $b_i$ by sampling about $n^{1/2}$ vertices.)

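The bucketing in Ingredient 1 can be checked on a toy input. The sketch below is only an illustration: the degree sequence is made up, the helper names `bucket_index` and `bucketed_average` are hypothetical, and exact bucket sizes $|B_i|$ are used in place of sampled estimates $b_i$. It assumes every degree is at least 1. Rounding each degree up to $(1+\beta)^i$ can only inflate the average, and by a factor below $1+\beta$.

```python
def bucket_index(deg, beta):
    """Return the i >= 0 with (1+beta)**(i-1) < deg <= (1+beta)**i (deg >= 1)."""
    i = 0
    while (1 + beta) ** i < deg:
        i += 1
    return i

def bucketed_average(degrees, beta):
    """(1/n) * sum_i |B_i| * (1+beta)**i, using exact bucket sizes."""
    sizes = {}
    for deg in degrees:
        i = bucket_index(deg, beta)
        sizes[i] = sizes.get(i, 0) + 1
    return sum(cnt * (1 + beta) ** i for i, cnt in sizes.items()) / len(degrees)

# Hypothetical degree sequence (not from the slides), all degrees >= 1.
degrees = [1, 1, 2, 3, 3, 4, 7, 10, 10, 19]
eps = 0.4
beta = eps / 8
true_avg = sum(degrees) / len(degrees)
approx = bucketed_average(degrees, beta)
print(true_avg, approx)
```

The printed pair should satisfy `true_avg <= approx <= (1 + beta) * true_avg`, which is the exact-count version of (*).
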
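Estimating Average Degree (cont.)
Reminders: $B_i = \{ v : (1+\beta)^{i-1} < \deg(v) \le (1+\beta)^i \}$; $B_i$ is large if $|B_i| > (\epsilon n)^{1/2} / (4(\log n)/\beta)$.
Consider the sum
$(1/n) \sum_{i:\, B_i \text{ large}} b_i (1+\beta)^i$   (**)
Clearly, (**) $\le (1+\epsilon)\, d_G$, i.e., there is only a small overestimate. By how much can we underestimate?
Sum of degrees = 2 × (number of edges).
[Figure: vertices split into large buckets and small buckets. Edges with both endpoints in large buckets are counted twice, edges between large and small buckets are counted once, and edges with both endpoints in small buckets are not counted.]
Total not counted $\le (\epsilon n)/2$. All other edges are counted at least once. Hence, the sum in (**) is $\ge d_G/(2+\epsilon)$.
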
Estimating Average Degree (cont.)
Recap: Can get a factor-$(2+\epsilon)$ approximation using $n^{1/2} \cdot \mathrm{poly}((\log n)/\epsilon)$ degree queries only (an alternative to [Feige]).
Recall [Feige]: cannot do (much) better using degree queries only.
[Figure: large buckets and small buckets; edges counted twice, counted once, and not counted (few).]
Ingredient 3: Estimate the number of edges counted once and compensate for them.

Estimating Average Degree (cont.)
Ingredient 3: Estimate the number of edges counted once and compensate.
For each large $B_i$, estimate the number of edges between $B_i$ and the small buckets.
Implementation: Let $S_i$ be the vertices sampled in $B_i$. For each $u \in S_i$, select a random neighbor v. Let $\alpha_i$ be the fraction of these v's that lie in small buckets.
The estimate is: $(1/n) \sum_{i:\, B_i \text{ large}} b_i (1+\alpha_i)(1+\beta)^i$.
W.h.p. the estimate is $(1 \pm \epsilon)\, d_G$.
[Figure: a large bucket $B_i$ with edges going to the small buckets.]

Estimating Average Degree: Summary
G=(V,E), |V|=n, |E|=m, $d_G = 2m/n$ ($\ge 1$).
Average Degree Approximation Algorithm:
1. Uniformly and independently select $K = O(n^{1/2} \cdot \mathrm{poly}((\log n)/\epsilon))$ vertices. Let S be the (multi-)set of selected vertices.
2. For $i = 0, \dots, \log_{1+\beta} n$, let $S_i = S \cap B_i$.
3. Let $L = \{ i : |S_i|/K \ge (\epsilon/(8n))^{1/2} / (\log_{1+\beta} n + 1) \}$.
4. For each $i \in L$ and every $u \in S_i$, select a random neighbor v of u. Let $\alpha_i$ be the fraction of these random neighbors that lie in small buckets.
5. Output $(1/K) \sum_{i \in L} |S_i| (1+\alpha_i)(1+\beta)^i$.
Note 1: If we have a lower bound $\ell$ with $d_G \ge \ell$, then the complexity is $O((n/\ell)^{1/2} \cdot \mathrm{poly}((\log n)/\epsilon))$ (it is sufficient to get a good $b_i$ for $|B_i| > (\epsilon \ell n)^{1/2} / (4(\log n)/\beta)$).
Note 2: Can get the same complexity without knowing a lower bound. (Can search for the lower bound because the algorithm never overestimates.)

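The five steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the graph is given as adjacency lists `adj` with no isolated vertices, the degree of a vertex is read as `len(adj[u])`, "small buckets" are taken to be the buckets outside L, and the sample size `K` is passed in by the caller instead of being derived from the poly((log n)/ε) bound. The function names are hypothetical.

```python
import math
import random

def bucket_index(deg, beta):
    """Return the i >= 0 with (1+beta)**(i-1) < deg <= (1+beta)**i (deg >= 1)."""
    i = 0
    while (1 + beta) ** i < deg:
        i += 1
    return i

def estimate_average_degree(adj, eps, K, rng):
    """Sketch of steps 1-5; adj[u] is the (non-empty) neighbor list of u."""
    n = len(adj)
    beta = eps / 8
    num_buckets = math.ceil(math.log(n, 1 + beta)) + 1

    # Steps 1-2: sample K vertices uniformly and independently (with
    # repetitions) and split the sample by degree bucket.
    S = {}
    for _ in range(K):
        u = rng.randrange(n)
        S.setdefault(bucket_index(len(adj[u]), beta), []).append(u)

    # Step 3: keep the buckets that hold a large enough share of the sample.
    threshold = math.sqrt(eps / (8 * n)) / num_buckets
    L = {i for i, S_i in S.items() if len(S_i) / K >= threshold}

    # Steps 4-5: alpha_i is the fraction of sampled random neighbors that
    # fall outside the buckets in L.
    total = 0.0
    for i in L:
        outside = 0
        for u in S[i]:
            v = rng.choice(adj[u])  # one random-neighbor query
            if bucket_index(len(adj[v]), beta) not in L:
                outside += 1
        alpha_i = outside / len(S[i])
        total += len(S[i]) * (1 + alpha_i) * (1 + beta) ** i
    return total / K

# Hypothetical input: a cycle on 400 vertices, so every degree is 2 and d_G = 2.
n = 400
cycle = [[(u - 1) % n, (u + 1) % n] for u in range(n)]
estimate = estimate_average_degree(cycle, eps=0.5, K=200, rng=random.Random(1))
print(estimate)
```

On the cycle all sampled vertices land in one bucket and every $\alpha_i$ is 0, so the output is just the degree 2 rounded up to the next power of $1+\beta$, independent of the random choices. Graphs with a few very high-degree vertices are where the $\alpha_i$ correction in step 5 starts to matter.
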
Estimating Average Degree: Lower Bound
Thm. For any n, $d \in [2, o(n)]$, and $\epsilon \in (\omega(1/(dn)),\, o(n/d))$, distinguishing between average degree d and average degree $(1+\epsilon)d$ requires $\Omega((n/(\epsilon d))^{1/2})$ queries (all types allowed).
For $k \approx (\epsilon d n)^{1/2}$, consider (random labelings of) the graphs:
$G_1$: n-k vertices d-regular, plus k vertices d-regular; $d_{G_1} = d$.
$G_2$: n-k vertices d-regular, plus a clique on k vertices; $d_{G_2} = (1+\epsilon)d$.
To distinguish, one must select a vertex in the small component.

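A quick numeric check of this construction may help. The sketch below works with degree counts only (no edges are built), picks hypothetical values for n, d and ε, and ignores feasibility details such as parity of d-regular graphs. The function names are illustrative. It also shows why about n/k uniform vertex samples are needed just to hit the small component; the theorem itself covers all query types, which this calculation does not.

```python
import math

def lower_bound_instance(n, d, eps):
    """Average degrees of the two graph families (degree counts only)."""
    k = round(math.sqrt(eps * d * n))          # size of the small component
    avg_g1 = d                                 # both components are d-regular
    avg_g2 = ((n - k) * d + k * (k - 1)) / n   # clique vertices have degree k-1
    return k, avg_g1, avg_g2

def miss_probability(n, k, q):
    """Chance that q uniform vertex samples all avoid the k special vertices."""
    return (1 - k / n) ** q

# Hypothetical parameters (not from the slides).
n, d, eps = 1_000_000, 10, 0.1
k, avg_g1, avg_g2 = lower_bound_instance(n, d, eps)
print(k, avg_g1, avg_g2, miss_probability(n, k, 100))
```

With these values k = 1000 and n/k equals $(n/(\epsilon d))^{1/2}$, matching the form of the bound; the average degree of $G_2$ comes out slightly below $(1+\epsilon)d$ because the k clique vertices replace k vertices of degree d.
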
Estimating Average Distance
Problem 2: Estimating the average distance to a vertex.
Problem 3: Estimating the average distance between vertices.
G=(V,E), |V|=n, |E|=m, $d_G$ = average distance.
UB: Can obtain a $(1+\epsilon)$-approximation in time $(n/d_G)^{1/2} \cdot \mathrm{poly}((\log n)/\epsilon)$ by using distance queries (only).
LB: A $(1+\epsilon)$-approximation requires $\Omega((n/(\epsilon d_G))^{1/2})$ queries when all types of queries are allowed. If only neighbor queries are allowed: $\Omega(m)$ queries are necessary for a constant-factor approximation.
Note (not sublinear): Can obtain a $(1+\epsilon)$-approximation using $O(m \cdot n^{1/2} \cdot \mathrm{poly}(1/\epsilon))$ neighbor queries only. When the graph is not too dense ($m = o(n^{3/2})$), this saves compared to computing exact or approximate distances for all pairs.

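Estimating Average Distance
Algorithms: For both problems, simply take a sample of vertices / pairs of vertices and compute the average over the sample.
Analysis: Uses Chebyshev’s inequality: prove that the variance is small.
Sketch for Problem 3 (avg. distance between pairs):
Let $d_{\max}$ be the maximum distance between pairs.
For $i = 0, \dots, d_{\max}$, let $p_i$ be the fraction of pairs at distance i.
Let X be distributed according to the $p_i$'s: $E[X] = d_G$.
We show that $E[X^2] = O(n^{1/2}\, E[X]^2)$.
Core of the proof: showing that $E[X] = \Omega(d_{\max}^2 / n)$ (this suffices because $X \le d_{\max}$ implies $E[X^2] \le d_{\max} \cdot E[X]$).
Reason: if some pair is far apart, then many pairs are far apart.
[Figure: a shortest path $v_0, v_1, \dots, v_d$ with $d = d_{\max}$, the vertices $v_i$ and $v_{d-i}$ marked on it, and an arbitrary vertex w.]
For $i = 1, \dots, d/3$ and every vertex w: $\mathrm{dist}(w,v_i) + \mathrm{dist}(w,v_{d-i}) \ge d/3$.
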
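The sampling algorithm for Problem 3 is short enough to sketch directly. In the sketch below a distance query is simulated by breadth-first search, which is an assumption made only so the example is self-contained (in the model on the slides a distance query is a single oracle call). Pairs are sampled as ordered pairs with repetition allowed, so distance 0 can occur. The sample size is fixed by hand, not derived from the analysis, and the function names are hypothetical.

```python
import random
from collections import deque

def bfs_distance(adj, s, t):
    """Distance from s to t by breadth-first search (stand-in for a distance query)."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return dist[u]
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    raise ValueError("graph is not connected")

def estimate_average_distance(adj, num_samples, rng):
    """Average of dist(u, v) over uniformly sampled ordered pairs (u, v)."""
    n = len(adj)
    total = 0
    for _ in range(num_samples):
        u, v = rng.randrange(n), rng.randrange(n)
        total += bfs_distance(adj, u, v)
    return total / num_samples

# Hypothetical input: a cycle on 200 vertices; the exact average over all
# ordered pairs (including u = v) is 50.
n = 200
cycle = [[(u - 1) % n, (u + 1) % n] for u in range(n)]
estimate = estimate_average_distance(cycle, num_samples=2000, rng=random.Random(0))
print(estimate)
```

On the cycle the distances are spread almost uniformly over 0..100, so 2000 samples already give a standard error well below 1; the variance bound on the slide is what guarantees comparable behavior on arbitrary connected graphs.
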
Estimating Average Distance: Lower Bound (for Problem 2: avg. distance to vertex s)
G=(V,E), |V|=n, |E|=m, $d_G$ = average distance to s.
Thm. For any n, $d \in [2, o(n)]$, and $\epsilon \in (\omega(1/(dn)),\, o(n/d))$, distinguishing between average distance d and average distance $(1+\epsilon)d$ requires $\Omega((n/(\epsilon d))^{1/2})$ queries (all types allowed).
Construction for $\epsilon > 1/4$: For $k \approx (2(1+\epsilon)dn)^{1/2}$ and $t \approx ((1+\epsilon)^{1/2} - \epsilon^{1/2})(2dn)^{1/2}$, consider (random labelings of) the graphs:
$G_1$ (broom graph): $d_{G_1} = (1+\epsilon)d$.
$G_2$ (two-sided broom graph): $d_{G_2} < d$.
To distinguish, one must select or reach a right-side edge.
[Figure: $G_1$ on the vertices $s, v_1, \dots, v_k, w_1, \dots, w_{n-k-1}$; $G_2$ on the vertices $s, v_1, \dots, v_t, v_{t+1}, \dots, v_k, w_1, \dots, w_{n-k-1}$.]

Summary
We study the estimation of the average value of “natural” functions on graphs: average degree and average distance.
We give sublinear algorithms (the dependence on n is $n^{1/2}$) and roughly matching lower bounds.
Different problems exhibit different behavior in terms of the “power” of the queries: queries to the estimated function vs. queries to the structure of the graph.