Published by Jeremy Stone, modified over 9 years ago
1
Complexity and Efficient Algorithms Group / Department of Computer Science
Approximating Structural Properties of Graphs by Random Walks
Christian Sohler
2
Very Large Networks
Examples: social networks, the human brain, crystals, chip design
Size: 10⁹ to 10²³ vertices
Possibly petabytes of additional information
3
Very Large Networks
Classical graph problems: connectivity, MinCut, MaxCut, graph clustering, graph isomorphism
Difficulties: the graph does not fit into main memory
4
Classification of Very Large Networks – A Vision
Example questions:
- Is a country a democracy or a totalitarian state?
- Is a patient schizophrenic?
- Is a piece of software malicious?
Formalization: Given a set of graphs with class labels (training set), find a classifier for new graphs.
5
Classification of Very Large Networks – A Vision
A typical scenario: hundreds or thousands of graphs; each graph is extremely large; the graphs are sparse.
A possible approach: describe graphs by features (graph properties), e.g. a feature vector (12,3,-5,10,0,0,…,20,3), and apply classical learning algorithms.
The challenge: computing tens of thousands of features for graphs with billions of vertices.
6
Classification of Very Large Networks – A Sampling Approach
Random sampling: Compute a graph property approximately by random sampling.
Informal question: What can we learn from the local structure of a sparse graph about its global properties?
Sampling from graphs: How can we sample a graph?
7
Classification of Very Large Networks – A Sampling Approach
Examples of different sampling strategies:
1. Sample a set S of s vertices and look at all edges within S (the subgraph G[S] induced by S)
2. Sample a set S of s edges and look at the graph they form
3. Sample a set S of s vertices and perform a BFS from each of them
4. Sample a set S of s vertices and perform a random walk from each of them
Many more possibilities…
Question: Which is the right sampling strategy for my learning problem?
Answer: It depends on the problem…
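To make the four strategies concrete, here is a minimal sketch on a toy graph stored as a Python dict that maps each vertex to its list of neighbours (an assumption made only for illustration; the function names are hypothetical). Each function returns exactly what the corresponding strategy gets to see.

```python
import random
from collections import deque

# Toy representation (an assumption): dict vertex -> list of neighbours.

def induced_subgraph_sample(g, s, rng):
    """Strategy 1: sample s vertices and keep all edges inside the sample, i.e. G[S]."""
    S = set(rng.sample(sorted(g), s))
    return {v: [u for u in g[v] if u in S] for v in S}

def edge_sample(g, s, rng):
    """Strategy 2: sample s distinct edges uniformly and return them as an edge list."""
    edges = sorted({(min(u, v), max(u, v)) for u in g for v in g[u]})
    return rng.sample(edges, s)

def bfs_sample(g, start, limit):
    """Strategy 3: BFS from `start` until `limit` vertices have been discovered."""
    seen, queue = {start}, deque([start])
    while queue and len(seen) < limit:
        v = queue.popleft()
        for u in g[v]:
            if u not in seen and len(seen) < limit:
                seen.add(u)
                queue.append(u)
    return seen

def random_walk_sample(g, start, length, rng):
    """Strategy 4: simple random walk of `length` steps; returns the visited sequence."""
    walk = [start]
    for _ in range(length):
        nbrs = g[walk[-1]]
        if not nbrs:          # isolated vertex: the walk cannot move
            break
        walk.append(rng.choice(nbrs))
    return walk
```

Strategies 3 and 4 only ever touch neighbours of vertices they have already seen, which is what makes them implementable with purely local access to the graph.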
9
Classification of Very Large Networks – A Sampling Approach
Question 1: Assume you have a classification task that involves city maps. Which of our four sampling methods is your method of choice?
Possible answers:
1. Sample a set S of s vertices and look at all edges within S
2. Sample a set S of s edges and look at the graph they form
3. Sample a set S of s vertices and perform a BFS from each of them
4. Sample a set S of s vertices and perform a random walk from each of them
10
Classification of Very Large Networks – A Sampling Approach
Question 2: Assume you have a classification task that involves social networks. Which of our four sampling methods is your method of choice?
Possible answers:
1. Sample a set S of s vertices and look at all edges within S
2. Sample a set S of s edges and look at the graph they form
3. Sample a set S of s vertices and perform a BFS from each of them
4. Sample a set S of s vertices and perform a random walk from each of them
11
First Wrap-Up
Motivation: Some classification problems involve sets of huge graphs, and no efficient algorithms are known for some fundamental graph problems.
Sampling approach: We would like to pick small samples from the graph(s) and use them for graph classification.
Challenge: There are many different sampling procedures, and we need to understand which one is right for which problem.
12
Sampling from Very Large Networks
Property testing [Rubinfeld, Sudan, 1996; Goldreich, Goldwasser, Ron, 1998]: a formal framework to study sampling algorithms for very large networks
Relaxation of "standard decision problems":
- We want to distinguish whether the input graph G has a property or is far from it
- If G neither has the property nor is far from it, the algorithm may give an arbitrary answer
Randomized algorithms with bounded (worst-case) error probability
Only a small part of the graph is inspected
Different graph models: dense graphs, bounded degree graphs, directed graphs
13
Property Testing in Bounded Degree Graphs
Bounded degree graphs [Goldreich, Ron, 2002]:
- Undirected graph G=(V,E)
- Maximum degree bounded by D, where D is a constant
Oracle access:
- V={1,…,n}, and n is known to the algorithm
- Query(i,j) returns the j-th neighbor of vertex i, or a special symbol indicating that this neighbor does not exist
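A minimal sketch of the oracle model, assuming the graph is stored as a dict of adjacency lists and that the "neighbor does not exist" symbol is represented by `None`. The class name and the query counter are illustrative additions, not part of the model's definition; the counter simply makes the query complexity of a tester observable.

```python
class BoundedDegreeOracle:
    """Query access to a graph on V = {1, ..., n} with maximum degree at most D.

    `adjacency` maps each vertex to its neighbour list (an assumption of this sketch);
    None plays the role of the "neighbour does not exist" symbol.
    """

    def __init__(self, adjacency, D):
        if any(len(nbrs) > D for nbrs in adjacency.values()):
            raise ValueError("graph violates the degree bound D")
        self.adj = adjacency
        self.n = len(adjacency)   # n is known to the algorithm
        self.D = D
        self.queries = 0          # query complexity counter

    def query(self, i, j):
        """Return the j-th neighbour (1-indexed) of vertex i, or None."""
        self.queries += 1
        nbrs = self.adj[i]
        return nbrs[j - 1] if 1 <= j <= len(nbrs) else None

    def neighbours(self, i):
        """All neighbours of i, at a cost of exactly D queries."""
        answers = (self.query(i, j) for j in range(1, self.D + 1))
        return [u for u in answers if u is not None]
```

For the path 1–2–3 with D = 2, `query(2, 2)` returns 3 and `query(1, 2)` returns `None`, since vertex 1 has only one neighbor.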
14
Property Testing in Bounded Degree Graphs
Graph properties: A graph property is a set of graphs that is closed under isomorphism.
Definition [Goldreich, Ron, 2002]: G=(V,E) is ε-far from P if one has to modify more than εDn edges to obtain a bounded degree graph with property P.
[Figure: a connected graph and a graph that is ε-far from connected]
15
Property Testing in Bounded Degree Graphs
Property tester for property P [Goldreich, Ron, 2002]:
- Has oracle access to the input graph G
- Accepts with probability at least 2/3 if G has property P
- Rejects with probability at least 2/3 if G is ε-far from P
Quality measures: query complexity (maximum number of oracle queries) and running time
16
A First Example: Connectivity
ConnectivityTester(G, ε, D) [Goldreich, Ron, 2002]
(1) Sample a set S of s = 8/(εD) vertices uniformly at random from V
(2) For every vertex in S:
(3)     Perform a BFS until (a) 4/(εD) vertices have been discovered or (b) all vertices of a small connected component have been discovered
(4)     If (b), then reject
(5) Accept
Observation: ConnectivityTester accepts every connected graph.
Claim: If G is ε-far from connected, then G has more than εDn/2 connected components.
Claim: At least εDn/4 of the connected components have size at most 4/(εD), since fewer than εDn/4 components can have more than 4/(εD) vertices.
Theorem: ConnectivityTester is a property tester with query complexity O(1/(ε²D)). A sampled vertex lies in a small component with probability at least εD/4, so all s samples miss the small components with probability at most (1 − εD/4)^(8/(εD)) ≤ e⁻² < 1/3; each BFS makes at most D · 4/(εD) queries, for 32/(ε²D) queries in total.
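The tester above can be sketched as follows. This is a transcription under stated assumptions rather than the original authors' code: the graph is given as a dict of adjacency lists instead of an oracle, start vertices are sampled with replacement, and a BFS that exhausts the whole graph is treated as "connected" rather than as a small component (so that tiny connected graphs are still accepted).

```python
import math
import random
from collections import deque

def connectivity_tester(adj, eps, D, rng=random):
    """Sketch of ConnectivityTester(G, eps, D) on a dict of adjacency lists."""
    vertices = list(adj)
    n = len(vertices)
    s = math.ceil(8 / (eps * D))        # number of sampled start vertices
    limit = math.ceil(4 / (eps * D))    # BFS budget per start vertex
    for _ in range(s):
        start = rng.choice(vertices)    # uniform sample, with replacement
        seen, queue = {start}, deque([start])
        while queue and len(seen) < limit:          # case (a) stops the loop
            v = queue.popleft()
            for u in adj[v]:
                if u not in seen:
                    seen.add(u)
                    queue.append(u)
        # case (b): the BFS ran out of vertices, so it saw a whole component;
        # reject only if that component is small and not the entire graph
        if not queue and len(seen) < limit and len(seen) < n:
            return False
    return True
```

With ε = 0.1 and D = 2 the sketch samples 40 vertices and explores at most 20 vertices from each, so a cycle on 100 vertices is accepted, while a graph of 100 isolated vertices is rejected by the very first BFS. The rejection is one-sided: a connected graph can never trigger case (b).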
21
Second Wrap-Up – Introduction to Property Testing
Property testing: Approximately decide, based on random sampling, whether a graph has a property or is far from it. Quality measure: query complexity.
Connectivity: Sampling + BFS; check whether the sample violates the property.
22
Second Wrap-Up – Introduction to Property Testing
Question 3: Is the following algorithm a property tester for planarity (for the right choice of f)?
PlanarityTester(G, ε, D)
(1) Sample a set S of s = f(ε, D) vertices uniformly at random from V
(2) For every vertex in S:
(3)     Perform a BFS until (a) f(ε, D) vertices have been discovered or (b) the discovered graph is not planar
(4)     If (b), then reject
(5) Accept
23
Second Wrap-Up – Introduction to Property Testing
Bad news: There is a class of graphs that are ε-far from planar and in which every cycle has length Ω(log n). For large n, every subgraph of f(ε, D) vertices discovered by a BFS is then acyclic and hence planar, so PlanarityTester accepts these graphs: the answer to Question 3 is no.
Good news: The sampling is fine; we just need to modify our acceptance condition.
24
Random Walks, Stationary Distributions & Convergence
Random walk: In each step, move from the current vertex v to a neighbor chosen uniformly at random.
Convergence: If G is connected and not bipartite, a random walk converges to a unique stationary distribution with Pr[random walk is at vertex v] ∝ deg(v), i.e., equal to deg(v)/(2|E|).
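The convergence statement can be checked numerically on a small graph by pushing probability mass through the walk step by step (exact power iteration rather than simulation). The helper names are hypothetical, and the sketch assumes a graph without isolated vertices.

```python
def walk_distribution(adj, start, steps):
    """Exact distribution of a simple (non-lazy) random walk after `steps` steps."""
    p = {v: 0.0 for v in adj}
    p[start] = 1.0
    for _ in range(steps):
        q = {v: 0.0 for v in adj}
        for v, mass in p.items():
            for u in adj[v]:
                q[u] += mass / len(adj[v])   # each neighbour with probability 1/deg(v)
        p = q
    return p

def degree_proportional(adj):
    """The stationary distribution claimed on the slide: deg(v) / (2|E|)."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    return {v: len(nbrs) / two_m for v, nbrs in adj.items()}
```

For a triangle with one pendant vertex (non-bipartite), 200 steps bring the distribution to within 10⁻⁹ of (3/8, 2/8, 2/8, 1/8). On a single edge, which is bipartite, the mass just alternates between the two endpoints and never converges.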
25
Random Walks, Stationary Distributions & Convergence
Random walks on maps: A random walk on a planar graph tends to stay local, so it takes a long time to reach the stationary distribution. Reason: the network has sparse cuts.
Random walks on social networks: A random walk quickly moves to a "random place", i.e., it converges fast. Reason: the network does not have sparse cuts.
26
Random Walks, Stationary Distributions & Convergence
Lazy random walk: In each step,
- move from the current vertex v to each neighbor u with probability 1/(2D),
- stay at v with the remaining probability.
Convergence of lazy random walks: The stationary distribution is uniform (the transition matrix is symmetric and therefore doubly stochastic).
Rate of convergence: It can be expressed in terms of the conductance of G or the second largest eigenvalue of the transition matrix; O(log n) steps suffice if G is an expander graph.
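The same exact computation for the lazy walk, assuming D is at least the maximum degree: each neighbor receives probability 1/(2D) and the remaining mass stays put. Because every edge is traversed with the same probability 1/(2D) in both directions, the uniform distribution is stationary even for irregular or bipartite graphs such as a path. The function name is hypothetical.

```python
def lazy_walk_distribution(adj, D, start, steps):
    """Exact distribution of the lazy walk after `steps` steps (D >= max degree)."""
    p = {v: 0.0 for v in adj}
    p[start] = 1.0
    move = 1.0 / (2 * D)                # probability of each individual neighbour
    for _ in range(steps):
        q = {v: 0.0 for v in adj}
        for v, mass in p.items():
            for u in adj[v]:
                q[u] += mass * move
            q[v] += mass * (1.0 - move * len(adj[v]))   # stay put
        p = q
    return p
```

On the path 0–1–2–3 with D = 2, 300 lazy steps from vertex 0 give all four vertices probability 1/4 up to 10⁻⁹, although the plain walk on this bipartite graph would never converge.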
27
Conductance, Expanders & Small Worlds
Definition: The expansion Φ(U) of a set U is defined as Φ(U) = e(U, V∖U)/(D·|U|), where e(U, V∖U) is the number of edges between U and V∖U. The conductance Φ_G of G is min_{U: 1 ≤ |U| ≤ |V|/2} Φ(U).
Definition: A graph G=(V,E) is called a Φ-expander if Φ_G ≥ Φ for some constant Φ > 0.
Interpretations: Expander graphs satisfy the "small-world phenomenon". Conductance can be viewed as a measure of the social connectivity of a network.
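A brute-force sketch of the conductance Φ_G, assuming the bounded-degree normalisation Φ(U) = e(U, V∖U)/(D·|U|) (an assumption: other normalisations, e.g. by the volume of U, are also common). It enumerates all sets U with 1 ≤ |U| ≤ |V|/2, so it is exponential and only meant for tiny examples.

```python
from itertools import combinations

def set_conductance(adj, D, U):
    """Assumed normalisation: e(U, V - U) / (D * |U|)."""
    U = set(U)
    cut = sum(1 for v in U for u in adj[v] if u not in U)
    return cut / (D * len(U))

def graph_conductance(adj, D):
    """Phi_G: minimum over all U with 1 <= |U| <= |V|/2 (exponential, tiny graphs only)."""
    V = list(adj)
    return min(set_conductance(adj, D, U)
               for size in range(1, len(V) // 2 + 1)
               for U in combinations(V, size))
```

Two triangles joined by a single edge (D = 3) have conductance 1/9, attained by one of the triangles: exactly the kind of sparse cut that keeps a random walk local.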
28
Testing Expanders
Facts: A lazy random walk converges to the uniform distribution, and it converges quickly in expander graphs.
Hope: A lazy random walk converges much more slowly if the graph is ε-far from an expander graph. In particular, we hope that the distribution of the endpoints of a Θ(log n)-step lazy random walk differs significantly from the uniform distribution.
Question: If so, how could we exploit this to design a property testing algorithm?
29
The Birthday Problem & Testing Uniform Distributions
Birthday problem: There are n possible birthdays, and k persons have birthdays chosen uniformly at random. How large must k be so that, with constant probability, two persons have the same birthday?
Analysis: Let p = (1/n, …, 1/n)^T. Then ||p||² = 1/n is the collision probability of two birthdays. With k persons, the expected number of collisions is C(k,2) · ||p||² = k(k-1)/(2n). So for k = Θ(√n) we expect to see a collision.
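The formula C(k,2) · ||p||² can be checked directly and against a quick simulation. The helpers below are illustrative; `expected_collisions` accepts any distribution p given as a list of probabilities, not only the uniform one.

```python
import random
from collections import Counter

def expected_collisions(p, k):
    """Expected number of colliding pairs among k i.i.d. draws from p: C(k, 2) * ||p||^2."""
    return k * (k - 1) / 2 * sum(x * x for x in p)

def count_collisions(samples):
    """Observed number of unordered pairs of samples with equal values."""
    return sum(c * (c - 1) // 2 for c in Counter(samples).values())

def simulated_collisions(n, k, trials, rng):
    """Average observed collisions when k persons pick uniform birthdays among n days."""
    total = sum(count_collisions([rng.randrange(n) for _ in range(k)])
                for _ in range(trials))
    return total / trials
```

For n = 365 and k = 23 the expected number of collisions is 253/365 ≈ 0.69, which illustrates why k = Θ(√n) persons suffice.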
30
Testing Uniform Distributions
Observation: The uniform distribution p minimizes the expected number of pairwise collisions, since ||q||² = ||p||² + ||q − p||² for every distribution q on n elements. If a distribution q differs significantly from the uniform distribution, then ||q||² ≫ ||p||².
TestUniformDistribution(distribution q)
1. Sample Θ(√n) elements according to q
2. If the number of pairwise collisions is too large, then reject
3. Else accept
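A sketch of TestUniformDistribution via collision counting. The sample size c·√n and the rule "reject if the collision count exceeds (1 + slack) times its expectation under the uniform distribution" are hypothetical choices; the slide only says "too large". The function is named `uniformity_tester` here to keep it distinct from the slide's pseudocode.

```python
import math
import random
from collections import Counter

def count_collisions(samples):
    """Number of unordered pairs of samples that take the same value."""
    return sum(c * (c - 1) // 2 for c in Counter(samples).values())

def uniformity_tester(sample_q, n, c=20.0, slack=0.5, rng=random):
    """Collision-based version of TestUniformDistribution on {0, ..., n-1}.

    sample_q(rng) draws one element from the unknown distribution q. The constants
    c (sample size c * sqrt(n)) and slack (tolerated excess) are hypothetical.
    """
    m = math.ceil(c * math.sqrt(n))
    samples = [sample_q(rng) for _ in range(m)]
    expected_if_uniform = m * (m - 1) / 2 / n      # C(m, 2) * ||p||^2 with ||p||^2 = 1/n
    return count_collisions(samples) <= (1 + slack) * expected_if_uniform
```

With n = 10 000 the sketch draws 2 000 samples. A uniform source produces about 200 collisions, while a source that is uniform on only half of the elements produces about twice as many and is rejected.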
31
Testing Expanders
TestingExpanders(G)
1. Sample a set S of s vertices uniformly at random
2. For each v ∈ S do
3.     Let q be the distribution of endpoints of a Θ(log n)-step lazy random walk starting at v
4.     If TestUniformDistribution(q) rejects, then reject
5. Accept
History: The algorithm was proposed by [Goldreich and Ron, 2000], who conjectured it to be a property tester. The first complete analysis was given by [Czumaj and Sohler, 2010] (but with a weaker guarantee than conjectured). It was later improved by [Nachmias and Shapira, 2010] and [Kale and Seshadhri, 2011].
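Putting the pieces together, TestingExpanders can be sketched by running m independent lazy walks from each sampled vertex and counting collisions among their endpoints. The walk length c·ln n, the number of walks m, the number s of start vertices and the slack are hypothetical constants chosen for a small demo; they are not the parameters from the analyses cited above.

```python
import math
import random
from collections import Counter

def lazy_walk_endpoint(adj, D, start, length, rng):
    """Endpoint of one lazy walk: each neighbour w.p. 1/(2D), otherwise stay."""
    v = start
    for _ in range(length):
        j = rng.randrange(2 * D)
        if j < len(adj[v]):          # j indexes one of the deg(v) neighbours
            v = adj[v][j]
    return v

def collisions(samples):
    return sum(c * (c - 1) // 2 for c in Counter(samples).values())

def expansion_tester(adj, D, s=5, walk_factor=10.0, sample_factor=15.0,
                     slack=0.5, rng=random):
    n = len(adj)
    length = math.ceil(walk_factor * math.log(n))     # Theta(log n) steps
    m = math.ceil(sample_factor * math.sqrt(n))        # Theta(sqrt n) walks per vertex
    uniform_collisions = m * (m - 1) / 2 / n           # C(m, 2) / n
    vertices = list(adj)
    for _ in range(s):
        v = rng.choice(vertices)
        ends = [lazy_walk_endpoint(adj, D, v, length, rng) for _ in range(m)]
        if collisions(ends) > (1 + slack) * uniform_collisions:
            return False
    return True
```

In a quick check with n = 400, the complete graph (not bounded-degree, but an easy stand-in for a graph that mixes almost immediately) passes, while a graph split into two disjoint halves is rejected, because every endpoint stays in the starting half and the collision rate roughly doubles.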
32
Final Result
Theorem [Nachmias and Shapira, 2010; Kale and Seshadhri, 2011]: For any constant μ > 0, algorithm TestingExpanders accepts every Φ-expander and rejects every graph that is ε-far from an Ω(μΦ²)-expander. The algorithm has a running time of O(n^(1/2+μ)).
Key structural property of "ε-far" graphs: If G is ε-far from an Ω(μΦ²)-expander, then there exists a set U of Ω(εn) vertices with Φ(U) = O(μΦ²). This implies that for many vertices, the distribution of endpoints of a random walk of length O(log n) is significantly different from the uniform distribution.
33
Third Wrap-Up – Testing Expansion
(Lazy) random walks: move from a vertex to a random neighbor; converge to the uniform distribution; the speed of convergence depends on the graph structure.
Testing expansion: Random walks converge quickly in expander graphs and more slowly in graphs that are far from expanders. The number of collisions among endpoints of random walks is minimized in expander graphs, so we can test expansion by counting collisions.
34
Graph Clustering & Web Communities
Web graph communities: a set of vertices that induces an expander graph and has a sparse cut to the rest of the graph.
Question: Is the web graph composed of at most k communities?
Definition: A subset C ⊆ V is called a (Φ_in, Φ_out)-cluster if Φ_{G[C]} ≥ Φ_in and Φ(C) ≤ Φ_out.
Definition: A partition of V into at most k (Φ_in, Φ_out)-clusters is called a (k, Φ_in, Φ_out)-clustering.
35
Testing k-Clusterings
A simple case? Distinguish between a union of at most k expander graphs with no edges in between and a union of more than k (large) expander graphs with no edges in between.
Can we use our previous algorithm to test for a k-clustering?
[Figure: disjoint expander graphs]
36
Testing k-Clusterings
A simple case? No! We do not know the sizes of the clusters (expander graphs), and estimating the support size of a distribution is hard [Raskhodnikova et al., 2009].
37
Testing k-Clusterings
New idea: If two vertices come from the same cluster, their random walks quickly converge to the same distribution. So we could sample a set of vertices and look for groups of sampled vertices whose random walks induce the same distribution.
38
Testing Closeness of Distributions
Main idea [Batu et al. 2013; Chan et al. 2014]: If p ≈ q, then the following experiments should give roughly the same number of collisions between elements from S and T:
1. Draw two sets S and T of m elements from p
2. Draw two sets S and T of m elements from q
3. Draw a set S of m elements from p and a set T of m elements from q
The expected collision counts are m²||p||², m²||q||² and m²⟨p,q⟩, and ||p − q||² = ||p||² + ||q||² − 2⟨p,q⟩. So if p and q differ significantly, at least one of the three values is different.
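The three experiments translate directly into an estimate of ||p − q||². This is a sketch of the idea, not the exact tester of the cited papers: the sample size m and the acceptance threshold are hypothetical parameters, and the samplers `draw_p` and `draw_q` are assumed to be callables that take a random generator.

```python
import random
from collections import Counter

def cross_collisions(S, T):
    """Number of pairs (x, y) with x in S, y in T and x == y."""
    ct = Counter(T)
    return sum(c * ct[x] for x, c in Counter(S).items())

def l2_distance_sq_estimate(draw_p, draw_q, m, rng):
    """Unbiased estimate of ||p - q||^2 from the three experiments on the slide."""
    def draw(sampler):
        return [sampler(rng) for _ in range(m)]
    pp = cross_collisions(draw(draw_p), draw(draw_p))   # expectation m^2 ||p||^2
    qq = cross_collisions(draw(draw_q), draw(draw_q))   # expectation m^2 ||q||^2
    pq = cross_collisions(draw(draw_p), draw(draw_q))   # expectation m^2 <p, q>
    return (pp + qq - 2 * pq) / (m * m)

def closeness_tester(draw_p, draw_q, m, threshold, rng=random):
    """Accept iff the estimated squared l2 distance is at most `threshold`."""
    return l2_distance_sq_estimate(draw_p, draw_q, m, rng) <= threshold
```

For p uniform on 1 000 elements and q uniform on the first 500 of them, ||p − q||² = 10⁻³; with m = 2 000 the estimate lands close to that value, whereas comparing p with itself gives an estimate near 0.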
39
Testing Closeness of Distributions
Theorem [Batu et al. 2013; Chan et al. 2014]: There is a tester that, with probability at least 2/3, accepts if ||p − q|| ≤ ε/2 and rejects if ||p − q|| ≥ ε. The query complexity of the algorithm is O(√b/ε²), where b is an upper bound on ||p||² and ||q||².
We will need b to be O(1/n).
41
The Algorithm
ClusteringTest
1. Sample a set S of s vertices uniformly at random
2. For each v ∈ S, let D(v) be the distribution of endpoints of a random walk of length Θ(log n) starting at v
3. For each pair u, v ∈ S do
4.     If D(u) and D(v) are close, then add an edge (u, v) to the "cluster graph" on vertex set S
5. Accept if and only if the cluster graph is a collection of at most k cliques
42
Testing k-Clusterings
Observation: Algorithm ClusteringTest distinguishes between at most k expanders and more than k (large) expanders.
Can we generalize it to testing (k, Φ_in, Φ_out)-clusterings?
44
Testing k-Clusterings – Soundness
Challenge: Since the clusters may be connected in a (k, Φ_in, Φ_out)-clustering, the stationary distribution may be uniform over G (and not over the cluster).
We need to show that, for a proper length of the random walk, there is an "intermediate" distribution that is "reasonably stable" w.r.t. ℓ2-error.
47
The Algorithm
ClusteringTest
1. Sample a set S of s vertices uniformly at random
2. For each v ∈ S, let D(v) be the distribution of endpoints of a random walk of length Θ(log n) starting at v
3. If ||D(v)||² > O(1/n) for some v ∈ S (i.e., it exceeds c/n for a suitable constant c), then reject
4. For each pair u, v ∈ S do
5.     If D(u) and D(v) are close, then add an edge (u, v) to the "cluster graph" on vertex set S
6. Accept if and only if the cluster graph has at most k connected components
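A compact sketch of the revised ClusteringTest, using empirical endpoint samples in place of the exact distributions D(v). Assumptions, none of which come from the talk: lazy walks; m walks per vertex in two independent batches, so that ||D(v)||² and ||D(u) − D(v)||² can be estimated from collision counts without bias; the cut-off norm_factor·k/n standing in for "||D(v)||² > O(1/n)"; and close_factor/n as the hypothetical meaning of "close".

```python
import math
import random
from collections import Counter

def lazy_walk_endpoint(adj, D, start, length, rng):
    """Endpoint of one lazy walk: each neighbour w.p. 1/(2D), otherwise stay."""
    v = start
    for _ in range(length):
        j = rng.randrange(2 * D)
        if j < len(adj[v]):
            v = adj[v][j]
    return v

def cross_collisions(S, T):
    """Number of pairs (x, y) with x in S, y in T and x == y."""
    ct = Counter(T)
    return sum(c * ct[x] for x, c in Counter(S).items())

def count_components(vertices, edges):
    """Connected components of the cluster graph, via union-find."""
    parent = {v: v for v in vertices}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v
    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(v) for v in vertices})

def clustering_tester(adj, D, k, s=10, m=400, walk_factor=10.0,
                      norm_factor=4.0, close_factor=1.0, rng=random):
    n = len(adj)
    length = math.ceil(walk_factor * math.log(n))
    S = rng.sample(list(adj), s)
    # Two independent batches of endpoints per sampled vertex.
    A = {v: [lazy_walk_endpoint(adj, D, v, length, rng) for _ in range(m)] for v in S}
    B = {v: [lazy_walk_endpoint(adj, D, v, length, rng) for _ in range(m)] for v in S}
    # Step 3: estimate ||D(v)||^2 and reject if it is much larger than k/n.
    norm = {v: cross_collisions(A[v], B[v]) / m ** 2 for v in S}
    if any(norm[v] > norm_factor * k / n for v in S):
        return False
    # Steps 4-5: connect u and v if the estimated ||D(u) - D(v)||^2 is small.
    edges = [(u, v) for i, u in enumerate(S) for v in S[i + 1:]
             if norm[u] + norm[v] - 2 * cross_collisions(A[u], A[v]) / m ** 2
             <= close_factor / n]
    # Step 6: accept iff the cluster graph has at most k connected components.
    return count_components(S, edges) <= k
```

On two disjoint cliques of 100 vertices each, walks from one clique never leave it, so the cluster graph splits into two components: the sketch accepts for k = 2 and rejects for k = 1.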
48
Testing k-Clusterings – Completeness
Required properties of a (k, Φ_in, Φ_out)-clustering:
- For most vertices v, the distribution D(v) of endpoints of a lazy random walk of proper length has ||D(v)||² = O(1/n).
- For most pairs u, v from the same cluster, ||D(v) − D(u)||² is very small.
Useful tool – higher-order Cheeger inequality [Lee et al. 2014]: relates (k, Φ_in, Φ_out)-clusterings to the k+1 largest eigenvalues.
49
Testing k-Clusterings – Soundness
Structural property of "ε-far" graphs (similar to expanders): If G is ε-far from every (k, Φ*_in, Φ*_out)-clustering, then there exists a partition into k+1 sets C_1, …, C_{k+1}, each of Ω(ε²n/k) vertices and with Φ(C_i) = O(Φ*_in/ε²).
50
Testing k-Clusterings
Theorem [Czumaj, Peng, Sohler, 2015]: Algorithm ClusteringTest accepts every (k, Φ_in, Φ_out)-clustering with probability at least 2/3 and rejects every graph that is ε-far from every (k, Φ*_in, Φ*_out)-clustering with probability at least 2/3, where Φ_out = O(ε⁴Φ_in²) and Φ*_in = Θ(ε⁴Φ_in²/log n) for constant k and D. The running time of the algorithm is O*(√n).
51
Fourth Wrap-Up
Testing clusterings:
- Endpoints of a random walk of proper length should be nearly uniform on its cluster, with little probability mass "outside".
- If random walks start from two different vertices of the same cluster, their endpoint distributions are similar.
- Collision statistics can be used to test the pairwise similarity of distributions.
- This can be used to approximate the cut structure.
Take-away message: The distribution of endpoints of random walks (possibly comparing different starting vertices) contains a lot of information about the cut structure of a graph.
52
Summary
Vision: learning from very large sets of massive graphs
Approach: feature computation by random sampling; analysis in the framework of property testing
Two examples: expanders (connectivity measure in social networks); clustering (structure of social networks)
53
Thank you!
Sources: Slide 2: Allan Ajifo and cobalt123, Creative Commons license. Slide 3: GustavoG and Jasper Nance, Creative Commons license. Slide 4: Wikipedia; Jason Brown, Creative Commons license. Slide 5: GustavoG, Creative Commons license. Slide 6: GoldenRibbon, Creative Commons license.