Published by Justin Tate. Modified over 9 years ago.
1
FAST COUNTING OF TRIANGLES IN LARGE NETWORKS: ALGORITHMS AND LAWS RPI Theory Seminar, 24 November 2008 Charalampos (Babis) Tsourakakis School of Computer Science Carnegie Mellon University http://www.cs.cmu.edu/~ctsourak
2
Counting Triangles Given an undirected, simple graph G(V,E), a triangle is a set of 3 vertices such that any two of them are connected by an edge of the graph. Related Problems a) Decide if a graph is triangle-free. b) Count the total number of triangles δ(G). c) Count the number of triangles δ(v) that each vertex v participates in. d) List the triangles that each vertex v participates in. Our focus
3
Why is triangle counting important*? Social Network Analysis: “Friends of friends are friends” [WF94]; Web Spam Detection [BPCG08]; Hidden Thematic Structure of the Web [EM02]; Motif Detection, e.g. in biological networks [YPSB05]. *A few indicative reasons, from the graph mining perspective.
4
Why is triangle counting important? Furthermore, two often used metrics are the Clustering Coefficient and the Transitivity Ratio. [Formulas and figure not recovered; figure labels: “Triple at node v”, “Triangle”]
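For reference, the textbook definitions of these two metrics are sketched below. They are the standard forms, not necessarily the exact normalization shown in the talk; $d_v$ (degree of v) and $n$ (number of vertices) are symbols assumed here. A triple at node v is a path of length two centered at v.

```latex
C(v) = \frac{\delta(v)}{\binom{d_v}{2}}, \qquad
C(G) = \frac{1}{n} \sum_{v \in V} C(v), \qquad
T(G) = \frac{3\,\delta(G)}{\#\{\text{connected triples in } G\}}
```

Both quantities need triangle counts in the numerator, which is why fast counting of δ(v) and δ(G) matters for them.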
5
Outline: Related Work; Proposed Method; Experiments; Triangle-related Laws; Triangles in Kronecker Graphs; Future Work & Open Problems
6
Counting methods

Dense graphs
                    Fast          Low space
Time complexity     O(n^2.37)     O(n^3)
Space complexity    O(n^2)        O(m)

Sparse graphs
                    Fast                             Low space
Time complexity     O(m^0.7 n^1.2 + n^(2+o(1)))      e.g. O( n )
Space complexity    Θ(n^2) (eventually)              Θ(m)
8
Outline of the Proposed Method: EigenTriangle theorem; EigenTriangleLocal theorem; EigenTriangle algorithm; EigenTriangleLocal algorithm; Efficiency & Complexity; Power law degree distributions; Gershgorin discs; Real world network spectra
9
Theorem [EigenTriangle] The number of triangles δ(G) in an undirected, simple graph G(V,E) is given by: $\delta(G) = \frac{1}{6} \sum_{i=1}^{n} \lambda_i^3$, where $\lambda_1, \dots, \lambda_n$ are the eigenvalues of the adjacency matrix of graph G.
10
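Proof Call A the adjacency matrix of the graph. Consider the i-th diagonal element of $A^3$, $\alpha_{ii}$. This element counts the closed walks of length 3 that start and end at vertex i, which is twice the number of triangles vertex i participates in (each triangle is walked both as i-j-k and as i-k-j). So the trace of $A^3$ is 6δ(G), because each triangle is counted 6 times: in both directions at each of its 3 participating vertices. Furthermore, if Ax = λx, then $\lambda^3$ is an eigenvalue of $A^3$ (*), and vice versa: if λ is an eigenvalue of $A^3$, then $\sqrt[3]{\lambda}$ is an eigenvalue of A. Since the trace equals the sum of the eigenvalues, $\sum_i \lambda_i^3 = 6\,\delta(G)$.
* $A^3 x = AAAx = AA\lambda x = \lambda AAx = \lambda A \lambda x = \lambda^2 Ax = \lambda^3 x$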
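The counting argument in the proof can be checked numerically on a small graph. The following is a minimal pure-Python sketch; the helper names (`mat_mul`, `triangles_brute_force`) and the 5-vertex example graph are illustrative assumptions, not part of the talk.

```python
from itertools import combinations


def mat_mul(X, Y):
    """Product of two square matrices stored as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def triangles_brute_force(A):
    """Count triangles by testing every unordered vertex triple."""
    n = len(A)
    return sum(1 for i, j, k in combinations(range(n), 3)
               if A[i][j] and A[j][k] and A[i][k])


# Hypothetical example: triangles {0,1,2} and {1,2,3}, plus a pendant edge 3-4.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
n = 5
A = [[0] * n for _ in range(n)]
for u, v in edges:
    A[u][v] = A[v][u] = 1

A3 = mat_mul(mat_mul(A, A), A)
diagonal = [A3[i][i] for i in range(n)]   # twice the per-vertex triangle counts
trace = sum(diagonal)                     # six times the total triangle count
total = triangles_brute_force(A)
```

Here `diagonal` holds twice the number of triangles at each vertex and `trace` equals `6 * total`, matching the factor of 6 used in the proof.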
11
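Theorem [EigenTriangleLocal] The number of triangles δ(i) that vertex i participates in is equal to: $\delta(i) = \frac{1}{2} \sum_{j=1}^{n} \lambda_j^3 u_{i,j}^2$, where $u_{i,j}$ is the i-th entry of the j-th eigenvector. Proof [Sketch] Follows from the previous theorem and the fact that A is symmetric, therefore diagonalizable as $A = U \Lambda U^T$ with orthonormal eigenvectors as the columns of U, and also $A^3 = U \Lambda^3 U^T$.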
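The sketch can be spelled out in one line. Assuming the eigendecomposition is written with orthonormal eigenvectors as the columns of a matrix $U$ and eigenvalues on the diagonal of $\Lambda$ (names chosen here for illustration), the diagonal of $A^3$ expands as follows, and summing over i recovers the EigenTriangle theorem:

```latex
A = U \Lambda U^{T}
\;\Longrightarrow\;
(A^{3})_{ii} = \sum_{j=1}^{n} \lambda_j^{3}\, u_{i,j}^{2} = 2\,\delta(i),
\qquad
\sum_{i=1}^{n} (A^{3})_{ii}
  = \sum_{j=1}^{n} \lambda_j^{3} \sum_{i=1}^{n} u_{i,j}^{2}
  = \sum_{j=1}^{n} \lambda_j^{3}
  = 6\,\delta(G).
```

The last step uses only that each eigenvector has unit norm.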
12
EigenTriangle Algorithm [algorithm listing not recovered]
13
EigenTriangleLocal Algorithm [algorithm listing not recovered] Why are these two algorithms efficient?
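Since the two listings did not survive, here is a hedged sketch of what the theorems suggest: keep only the k eigenpairs of largest magnitude and plug them into the two formulas. This is not the talk's pseudocode. It uses a fixed rank k rather than any adaptive stopping rule, and it replaces Lanczos with plain power iteration plus deflation to stay dependency-free. All function names are hypothetical.

```python
import math
import random


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def matvec(A, x):
    return [dot(row, x) for row in A]


def deflate(x, pairs):
    """Remove from x its components along already-found eigenvectors."""
    for _, u in pairs:
        c = dot(x, u)
        x = [a - c * b for a, b in zip(x, u)]
    return x


def top_eigenpairs(A, k, iters=300, seed=0):
    """k <= n eigenpairs of a symmetric matrix, largest |eigenvalue| first.

    Power iteration with deflation: a simple stand-in for Lanczos. It can
    stall when two eigenvalues share the same magnitude with opposite signs
    (e.g. bipartite graphs), which Lanczos handles properly.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        x = deflate([rng.uniform(-1.0, 1.0) for _ in A], pairs)
        norm = math.sqrt(dot(x, x))
        x = [v / norm for v in x]
        for _ in range(iters):
            y = deflate(matvec(A, x), pairs)
            norm = math.sqrt(dot(y, y))
            if norm < 1e-12:          # only zero eigenvalues remain
                break
            x = [v / norm for v in y]
        pairs.append((dot(x, matvec(A, x)), x))   # Rayleigh quotient
    return pairs


def eigen_triangle(A, k):
    """Rank-k estimate of the total number of triangles."""
    return sum(lam ** 3 for lam, _ in top_eigenpairs(A, k)) / 6.0


def eigen_triangle_local(A, k):
    """Rank-k estimate of the number of triangles at each vertex."""
    pairs = top_eigenpairs(A, k)
    return [sum(lam ** 3 * u[i] ** 2 for lam, u in pairs) / 2.0
            for i in range(len(A))]


# Complete graph on 4 vertices: 4 triangles, 3 at each vertex.
K4 = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
```

On `K4` (eigenvalues 3, -1, -1, -1) the rank-1 estimate is about 27/6 = 4.5, and using all four eigenpairs gives about 4, the exact count. In practice a sparse Lanczos solver would replace `top_eigenpairs`.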
14
Skewed Degree Distributions Skewed degree distributions are ubiquitous in nature! They have been termed “the signature of human activity” [FKP02], but they also appear in all other kinds of networks, e.g. biological ones. See [N05][M04] for generative models of power law distributions. They are typically referred to as power laws, even if this sometimes abuses the strict definition of a power law, i.e. $p(x) \propto x^{-\alpha}$.
15
Examples of power laws Newman [N05] demonstrated how often power laws appear using many different types of networks, ranging from word frequencies to population of cities. Figure annotations: many cities have a small population; few cities have a huge population.
16
Gershgorin’s Discs Theorem Let B be an arbitrary $n \times n$ matrix. Then the eigenvalues λ of B are located in the union of the n discs $\{ z : |z - b_{ii}| \le \sum_{j \ne i} |b_{ij}| \}$, $i = 1, \dots, n$. For a proof see Demmel [D97], p.82.
17
Gershgorin Discs Bounds on the airports network (observe how loose they are)
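To see why the bound is loose for graphs: an adjacency matrix has a zero diagonal, so every disc is centered at 0 with radius equal to the vertex degree, and the theorem only says that every eigenvalue has magnitude at most the maximum degree. The sketch below illustrates the gap on a star graph, an example chosen here for illustration (not the airports network), whose top eigenvalue is known in closed form. Function names are hypothetical.

```python
import math


def gershgorin_discs(B):
    """(center, radius) of the Gershgorin disc of each row of a real matrix."""
    n = len(B)
    return [(B[i][i], sum(abs(B[i][j]) for j in range(n) if j != i))
            for i in range(n)]


def star_adjacency(leaves):
    """Adjacency matrix of a star: vertex 0 joined to `leaves` other vertices."""
    n = leaves + 1
    A = [[0] * n for _ in range(n)]
    for v in range(1, n):
        A[0][v] = A[v][0] = 1
    return A


leaves = 100
A = star_adjacency(leaves)

# Gershgorin upper bound on any eigenvalue: max(center + radius) = max degree.
bound = max(c + r for c, r in gershgorin_discs(A))

# The largest eigenvalue of a star is sqrt(leaves), with the positive
# eigenvector (sqrt(leaves), 1, ..., 1). Verify A x = lam x directly.
lam = math.sqrt(leaves)
x = [lam] + [1.0] * leaves
Ax = [sum(a * b for a, b in zip(row, x)) for row in A]
residual = max(abs(p - lam * q) for p, q in zip(Ax, x))
```

For 100 leaves the Gershgorin bound is 100 while the true top eigenvalue is 10, a factor of ten apart.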
18
Typical real world spectra [Figure panels: Airports; Political blogs]
19
Top Eigenvalues Zooming in on the top eigenvalues and plotting the rank vs. the eigenvalue in log-log scale reveals that the top eigenvalues follow a power law [FFF99]. Some years later, Mihail & Papadimitriou [MP02] and Chung, Lu and Vu [CLV03] proved this fact.
20
Our idea Simple & clear: use a low-rank approximation of $A^3$ to estimate the diagonal elements and the trace. This also suggests a way of thinking: take advantage of special properties (e.g. power laws) to reduce the complexity of certain computational tasks in real-world networks.
21
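Summing up: Why does it work? The first main reason is that the bulk of the eigenvalues, all except the top ones, is almost symmetric around 0, so the cubes of the bulk eigenvalues nearly cancel in the sum. Cubing strongly amplifies this phenomenon, since it widens the gap between the top eigenvalues and the bulk!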
22
Complexity Analysis The main computational bottleneck, which determines the complexity, is the Lanczos method. Lanczos runs in time linear in the number of non-zero entries of the matrix, i.e. the edges, assuming that we compute a small constant number of eigenvalues. Convergence of Lanczos is fast due to the eigenvalue power law (see Kaniel-Paige theory [GL89]).
24
Datasets
25
Competitor: Node Iterator The Node Iterator algorithm considers one node at a time, looks at its neighbors and counts how many pairs of them are connected to each other. Complexity: $O(n\,d_{\max}^2)$, where $d_{\max}$ is the maximum degree. We report the results as the speedup of the EigenTriangle algorithm over the running time of the Node Iterator.
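The baseline can be sketched in a few lines. This is a minimal illustration of the idea described above, assuming the graph is stored as a dictionary from each vertex to the set of its neighbours; the function name and the example graph are hypothetical, and this is not the implementation benchmarked in the talk.

```python
from itertools import combinations


def node_iterator(adj):
    """Exact triangle counts for an undirected simple graph.

    adj maps each vertex to the set of its neighbours.
    Returns (total number of triangles, {vertex: triangles at that vertex}).
    """
    local = {}
    for v, nbrs in adj.items():
        # Every connected pair of neighbours closes one triangle through v.
        local[v] = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    # Each triangle was counted once at each of its three vertices.
    return sum(local.values()) // 3, local


# Hypothetical example: triangles {0,1,2} and {1,2,3}, plus a pendant edge 3-4.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4)]
adj = {v: set() for v in range(5)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

total, local = node_iterator(adj)
```

The inner loop inspects every pair of neighbours of each vertex, which is where the quadratic dependence on the degree comes from.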
26
Results: #Eigenvalues vs. Speedup [scatterplot]
27
Results: #Edges vs. Speedup [scatterplot]
28
Main points Some interesting facts about the two scatterplots: The mean approximation rank required for at least 95% accuracy is 6.2. Speedups are between 33.7x and 1159x. The mean speedup is 250x. Notice the increasing speedup as the size of the network grows.
29
Zooming in [Figure: zoom on one marked point of the scatterplot]
30
Evaluating the Local Counting Method Metrics: Pearson’s correlation coefficient ρ and Relative Reconstruction Error (RRE). Political Blogs: RRE $7 \times 10^{-4}$, ρ = 99.97%.
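Both metrics compare the vector of exact per-vertex triangle counts with the vector of estimates. Pearson's ρ is standard. The slide's definition of the Relative Reconstruction Error did not survive, so the version below (Euclidean norm of the difference divided by the norm of the exact vector) is only an assumption, labeled as such in the code. The sample vectors are made up for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def relative_error(exact, approx):
    """ASSUMED definition of RRE: ||exact - approx||_2 / ||exact||_2.

    The talk's own formula was not recoverable; treat this as one
    plausible choice, not the author's definition.
    """
    diff = math.sqrt(sum((a - b) ** 2 for a, b in zip(exact, approx)))
    return diff / math.sqrt(sum(a * a for a in exact))


# Made-up per-vertex counts and estimates, for illustration only.
exact = [1, 2, 2, 1, 0]
approx = [1.1, 1.9, 2.1, 0.9, 0.1]
rho = pearson(exact, approx)
rre = relative_error(exact, approx)
```

A ρ close to 1 together with a small relative error indicates that the low-rank estimates track the exact local counts closely.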
31
#Eigenvalues vs. ρ for three networks Observe how a low rank gives almost optimal results. This holds for surprisingly many real world networks.
33
Triangle Participation Law Plots the number of triangles δ (x-axis) vs. the count of vertices that participate in δ triangles. Panels: (a) EPINIONS, who-trusts-whom network; (b) ASN, social network; (c) HEP_TH, collaboration network.
34
Degree Triangle Law Plots the degree $d_i$ (x-axis) vs. the mean number of triangles that nodes with degree $d_i$ participate in. Panels: Epinions; ASN.
36
Kronecker Graphs This model was introduced in [LCKF05]. It is based on the simple operation of the Kronecker product to generate graphs that mimic real world networks. Deterministic Kronecker Graphs: Kronecker product of the adjacency matrix at the current step k with the initiator adjacency matrix (typically small). Stochastic Kronecker Graphs: Kronecker product of the matrix at the current step k with the initiator matrix. The initiator matrix contains probabilities. For more details see [LF07].
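Both variants can be sketched with a few lines of pure Python. The helper names, the initiator matrices and the choice to sample only the upper triangle (to keep the sampled graph undirected and simple) are assumptions made here for illustration; see [LCKF05] and [LF07] for the actual model.

```python
import random


def kron(X, Y):
    """Kronecker product of two square matrices stored as lists of lists."""
    m = len(Y)
    size = len(X) * m
    return [[X[i // m][j // m] * Y[i % m][j % m] for j in range(size)]
            for i in range(size)]


def kron_power(M, k):
    """Kronecker product of k copies of M (so k = 1 returns M itself)."""
    result = M
    for _ in range(k - 1):
        result = kron(result, M)
    return result


def sample_stochastic(P, seed=0):
    """Draw an undirected simple graph: edge {i, j}, i < j, with prob. P[i][j].

    Sampling only the upper triangle is an assumption made here.
    """
    rng = random.Random(seed)
    n = len(P)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < P[i][j]:
                A[i][j] = A[j][i] = 1
    return A


# Deterministic: a triangle as initiator, two steps -> 9-vertex graph.
K3 = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
G2 = kron_power(K3, 2)

# Stochastic: a hypothetical 2x2 probability initiator, three steps -> 8x8.
P = [[0.9, 0.5], [0.5, 0.1]]
P3 = kron_power(P, 3)
G = sample_stochastic(P3, seed=42)
```

Each Kronecker step multiplies the number of vertices by the size of the initiator, so a small initiator reaches large graphs in a few steps.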
37
Triangles in Kronecker Graphs Some notation first: A: the n×n initiator adjacency matrix of the undirected, simple graph $G_A$; $B = A^{[k]}$: the k-th Kronecker product; $\lambda = (\lambda_1, \dots, \lambda_n)$: the eigenvalues of A; $\Delta(G_A)$, $\Delta(G_B)$: the number of triangles of $G_A$, $G_B$. Theorem [KroneckerTRC]
38
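Proof We use induction on the number of recursion steps k. For k=0 the theorem trivially holds. Assume now that KroneckerTRC holds for some r. Call $C = A^{[r]}$, $D = A^{[r+1]}$, and let $[\mu_i]_{i=1..s}$ be the eigenvalues of C. By the assumption, $\Delta(G_C)$ is given by the theorem, and by the EigenTriangle theorem $\sum_{i=1}^{s} \mu_i^3 = 6\,\Delta(G_C)$. The eigenvalues of D are given by the Kronecker product of the eigenvalues of C and A, i.e. all products $\mu_i \lambda_j$. By the EigenTriangle theorem, the number of triangles in D is given by: $\Delta(G_D) = \frac{1}{6} \sum_{i=1}^{s} \sum_{j=1}^{n} (\mu_i \lambda_j)^3 = \frac{1}{6} \Big( \sum_{i=1}^{s} \mu_i^3 \Big) \Big( \sum_{j=1}^{n} \lambda_j^3 \Big) = 6\,\Delta(G_C)\,\Delta(G_A)$.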
39
Proof Therefore KroneckerTRC holds for all k. Q.E.D.
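The inductive step amounts to the recurrence Δ(C ⊗ A) = 6 · Δ(C) · Δ(A), which follows from the EigenTriangle theorem because the eigenvalues of a Kronecker product are the pairwise products of the factors' eigenvalues. Unrolled under the indexing assumption that the first Kronecker power is A itself, it gives $6^{k-1} \Delta(G_A)^k$ triangles after k steps. The sketch below checks the recurrence by brute force on two small initiators; all names are illustrative.

```python
from itertools import combinations


def kron(X, Y):
    """Kronecker product of two square matrices stored as lists of lists."""
    m = len(Y)
    size = len(X) * m
    return [[X[i // m][j // m] * Y[i % m][j % m] for j in range(size)]
            for i in range(size)]


def triangles(A):
    """Brute-force triangle count, independent of any spectral argument."""
    n = len(A)
    return sum(1 for i, j, k in combinations(range(n), 3)
               if A[i][j] and A[j][k] and A[i][k])


def complete_graph(n):
    return [[0 if i == j else 1 for j in range(n)] for i in range(n)]


K3 = complete_graph(3)          # 1 triangle
K4 = complete_graph(4)          # 4 triangles

K3_2 = kron(K3, K3)             # 9 vertices
K3_3 = kron(K3_2, K3)           # 27 vertices
K4_2 = kron(K4, K4)             # 16 vertices

checks = [
    (triangles(K3_2), 6 * triangles(K3) * triangles(K3)),
    (triangles(K3_3), 6 * triangles(K3_2) * triangles(K3)),
    (triangles(K4_2), 6 * triangles(K4) * triangles(K4)),
]
```

Each pair in `checks` holds the brute-force count and the value predicted by the recurrence; for these initiators the counts are 6, 36 and 96.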
41
Theoretical Challenge I: Spectra of real world networks Can we prove things about the distribution of the eigenvalues, adopting a random graph model such as the expected degree model G(w) [CLV03]? An analog to Wigner’s semicircle law for random Erdos-Renyi graphs (see Furedi-Komlos [FK81]). Figure caption: Spectrum of over 100000 Iterations [S07]
42
Theoretical Challenge I: Spectra of real world networks Empirically, the rest of the spectrum follows a triangular-like distribution [FDBV01]. Can we prove something about this empirical observation?
43
Theoretical Challenge II: Eigenvectors of real world networks Things are even “worse” than in the case of spectra. Very little is known about the eigenvectors. Related work: see [P08] for random graphs.
44
Theoretical Challenge III: Degree Triangle Law Prove the pattern we saw using the expected degree random graph model G(w) (see [S04]). Conjecture: the relationship we observed probably appears only for some values of the slope of the degree distribution. Further recent experiments showed that for some graphs this pattern does not hold.
45
Experimental Challenge I: Compare with Streaming Methods Streaming or semi-streaming methods perform one or O(1) passes over the graph [YKS02] [BFLSS06] [BPCG08]. Common underlying idea: sophisticated sampling methods. Implement and compare.
46
Practical Challenge I: Triangles in Large Scale Graph Mining Many gigabyte- and petabyte-sized graphs. How to handle these graphs? HADOOP. EigenTriangle algorithms are based just on simple matrix-vector multiplications, which are easy to parallelize on all sorts of architectures (distributed memory, shared memory). See [DHV93] for the details.
47
PEGASUS: Peta-Graph Mining from the Triangle perspective Ongoing work with U Kang and Christos Faloutsos in collaboration with Yahoo! Research. Among others: implement EigenTriangle algorithms in HADOOP and compare to other methods; find outliers with respect to triangles in graphs with many billions of edges. Soon… Stay tuned!
48
Curious about:
49
Acknowledgements Christos Faloutsos and Yiannis Koutis, for the helpful discussions
50
Acknowledgements Maria Tsiarli, for the PEGASUS logo
52
References
[WF94] Wasserman, Faust: “Social Network Analysis: Methods and Applications (Structural Analysis in the Social Sciences)”
[EM02] Eckmann, Moses: “Curvature of co-links uncovers hidden thematic layers in the World Wide Web”
[BPCG08] Becchetti, Boldi, Castillo, Gionis: “Efficient Semi-Streaming Algorithms for Local Triangle Counting in Massive Graphs”
[FKP02] Fabrikant, Koutsoupias, Papadimitriou: “Heuristically Optimized Trade-offs: A New Paradigm for Power Laws in the Internet”
[N05] Newman: “Power laws, Pareto distributions and Zipf's law”
[M04] Mitzenmacher: “A brief history of generative models for power law and lognormal distributions”
[FK81] Furedi, Komlos: “Eigenvalues of random symmetric matrices”
53
References
[S04] Sergi: “Random graph model with power-law distributed triangle subgraphs”
[D97] Demmel: “Applied Numerical Linear Algebra”
[LCKF05] Leskovec, Chakrabarti, Kleinberg, Faloutsos: “Realistic, Mathematically Tractable Graph Generation and Evolution using Kronecker Multiplication”
[LF07] Leskovec, Faloutsos: “Scalable Modeling of Real Graphs using Kronecker Multiplication”
[FFF99] Faloutsos, Faloutsos, Faloutsos: “On power-law relationships of the Internet topology”
[MP02] Mihail, Papadimitriou: “On the Eigenvalue Power Law”
[CLV03] Chung, Lu, Vu: “Spectra of Random Graphs with given expected degrees”
54
References
[YKS02] Bar-Yossef, Kumar, Sivakumar: “Reductions in streaming algorithms, with an application to counting triangles in graphs”
[GL89] Golub, Van Loan: “Matrix Computations”
[BFLSS06] Buriol, Frahling, Leonardi, Spaccamela, Sohler: “Counting triangles in data streams”
[DHV93] Demmel, Heath, van der Vorst: “Parallel Numerical Linear Algebra”
[YPSB05] Ye, Peyser, Spencer, Bader: “Commensurate distances and similar motifs in genetic congruence and protein interaction networks in yeast”
[P08] Mitra: “Entrywise Bounds for Eigenvectors of Random Graphs”
[FDBV01] Farkas, Derenyi, Barabasi, Vicsek: “Spectra of "real-world" graphs: Beyond the semi-circle law”
[S07] Spielman’s “Spectral Graph Theory and its Applications” class (YALE): http://www.cs.yale.edu/homes/spielman/eigs/
55
References
[F08] Faloutsos’ “Multimedia Databases and Data Mining” class (CMU): http://www.cs.cmu.edu/~christos/courses/826.S08
For more references, see also the paper: http://www.cs.cmu.edu/~ctsourak/tsourICDM08.pdf