Fast counting of triangles in large networks without counting: Algorithms and laws (Charalampos E. Tsourakakis, School of Computer Science, Carnegie Mellon University)


Charalampos E. Tsourakakis, School of Computer Science, Carnegie Mellon University
Fast counting of triangles in large networks without counting: Algorithms and laws
ICDM, Dec. '08

Triangle-related problems
Given an undirected, simple graph G(V,E), a triangle is a set of three vertices such that every two of them are connected by an edge of the graph. Related problems, in order of increasing generality:
- Decide whether a graph is triangle-free.
- Count the total number of triangles Δ(G).
- Count the number of triangles Δ(v) that each vertex v participates in.
- List the triangles that each vertex v participates in.
Our focus: the two counting problems, Δ(G) and Δ(v).
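As a reference point for the problems above, the definitions can be checked directly by testing every triple of vertices. The sketch below is a minimal illustration, assuming vertices are numbered 0..n-1 and edges are given as pairs; the function names are hypothetical, and the O(n^3) loop only pins down the definitions, it is not meant to be efficient.

```python
from itertools import combinations


def triangles_by_definition(n, edges):
    """Illustrative helper (hypothetical name): brute-force triangle counting.

    Returns (Δ(G), [Δ(v) for each v], list of triangles) for an undirected
    simple graph on vertices 0..n-1. Checks all C(n, 3) triples: O(n^3).
    """
    adj = [set() for _ in range(n)]
    for u, v in edges:
        if u != v:  # a simple graph has no self-loops
            adj[u].add(v)
            adj[v].add(u)
    total = 0
    per_vertex = [0] * n
    listed = []
    for a, b, c in combinations(range(n), 3):
        # A triangle: every two of the three vertices are adjacent.
        if b in adj[a] and c in adj[a] and c in adj[b]:
            total += 1
            listed.append((a, b, c))
            for x in (a, b, c):
                per_vertex[x] += 1
    return total, per_vertex, listed


def is_triangle_free(n, edges):
    """Decision version: True if the graph contains no triangle."""
    return triangles_by_definition(n, edges)[0] == 0
```

For a 4-cycle with one diagonal, `triangles_by_definition(4, [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)])` finds the two triangles (0, 1, 2) and (0, 2, 3). Note that Σ_v Δ(v) = 3Δ(G), since each triangle is counted once at each of its corners.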

Why is Triangle Counting important?
From the graph mining perspective: the clustering coefficient and the transitivity ratio are both defined through triangles.
Social network analysis fact: "Friends of friends are friends" [WF94].
Other applications include:
- Hidden thematic structure of the Web [EM02]
- Motif detection, e.g. in biological networks [YPSB05]
- Web spam detection [BPCG08]
[Figure: a triangle on vertices A, B and C.]
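The two graph-mining measures above are ratios built on triangle counts. The slides do not spell out their formulas, so the sketch below assumes the standard textbook definitions: the local clustering coefficient of v is Δ(v) divided by the number of neighbour pairs d(v)(d(v) - 1)/2, and the transitivity ratio is 3Δ(G) divided by the number of connected triples. Function names are hypothetical.

```python
def clustering_coefficient(delta_v, degree_v):
    """Illustrative helper (hypothetical name): fraction of v's neighbour pairs that are linked."""
    if degree_v < 2:
        return 0.0  # undefined for degree < 2; reported as 0 by convention here
    return delta_v / (degree_v * (degree_v - 1) / 2)


def transitivity_ratio(delta_per_vertex, degrees):
    """Illustrative helper (hypothetical name): 3 * Δ(G) / (number of connected triples).

    A connected triple centred at v is a pair of v's neighbours, so there are
    d(v)(d(v)-1)/2 of them. Since Σ_v Δ(v) = 3 Δ(G), the numerator is Σ_v Δ(v).
    """
    triples = sum(d * (d - 1) // 2 for d in degrees)
    return sum(delta_per_vertex) / triples if triples else 0.0
```

On a 4-cycle with one diagonal (degrees 3, 2, 3, 2 and Δ(v) = 2, 1, 2, 1) there are 8 connected triples and 2 triangles, so the transitivity ratio is 6/8 = 0.75.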

Outline
- Related Work
- Proposed Method
  - Theorems
  - Algorithms
  - Explaining efficiency
- Experiments
- Triangle-related Laws
- Triangles in Kronecker Graphs
- Conclusions

Related Work
Dense graphs:
- Fast: time $O(n^{2.37})$, space $O(n^2)$
- Low space: time $O(n^3)$, space $O(m) = O(n^2)$
Sparse graphs:
- Fast: time $O(m^{0.7} n^{1.2} + n^{2+o(1)})$, space $O(n^2)$ (eventually)
- Low space: time e.g. $O(\ldots)$, space $O(m)$
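The fast dense-graph methods rest on the identity Δ(G) = tr(A^3)/6: every triangle yields six closed walks of length 3 (three starting vertices, two directions), and the diagonal entry (A^3)_{ii} equals 2Δ(i). The sketch below assumes NumPy and a 0/1 symmetric adjacency matrix with zero diagonal; it uses ordinary matrix multiplication, so it costs O(n^3) time and O(n^2) memory. Reaching the O(n^2.37) bound would need a fast matrix-multiplication algorithm, which this illustration does not implement. Function names are hypothetical.

```python
import numpy as np


def triangles_via_trace(A):
    """Illustrative helper (hypothetical name): Δ(G) = trace(A^3) / 6."""
    A = np.asarray(A, dtype=np.int64)
    A3 = A @ A @ A  # plain (cubic-time) matrix products
    return int(np.trace(A3)) // 6


def local_triangles_via_diag(A):
    """Illustrative helper (hypothetical name): Δ(i) = (A^3)_{ii} / 2 for every vertex i."""
    A = np.asarray(A, dtype=np.int64)
    return (np.diag(A @ A @ A) // 2).tolist()
```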

Theorem [EigenTriangle]
Theorem 1. Let Δ(G) be the number of triangles in graph G(V,E) and let $\lambda_1, \ldots, \lambda_n$ be the eigenvalues of the adjacency matrix $A_G$. Then
$$\Delta(G) = \frac{1}{6}\sum_{i=1}^{n} \lambda_i^3.$$
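Theorem 1 holds because $A_G$ is symmetric: $A_G^3$ has eigenvalues $\lambda_i^3$, so $\operatorname{tr}(A_G^3) = \sum_i \lambda_i^3$, and the trace counts each triangle six times. A quick numerical check with NumPy's symmetric eigensolver (illustrative helper; the name is hypothetical):

```python
import numpy as np


def triangles_from_spectrum(A):
    """Illustrative helper (hypothetical name): Δ(G) = (1/6) Σ_i λ_i^3.

    Uses the full spectrum, so it matches the exact count up to
    floating-point error; it is only practical for small graphs.
    """
    eigenvalues = np.linalg.eigvalsh(np.asarray(A, dtype=float))
    return float(np.sum(eigenvalues ** 3) / 6.0)
```

For the complete graph K4 the eigenvalues are 3, -1, -1, -1, so the sum of cubes is 27 - 3 = 24 and the count is 24/6 = 4.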

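Theorem [EigenTriangleLocal]
Theorem 2. Let Δ(i) be the number of triangles vertex i participates in, let $u_j$ be the $j$-th (orthonormal) eigenvector of $A_G$, with eigenvalue $\lambda_j$, and let $u_{i,j}$ be the $i$-th entry of $u_j$. Then
$$\Delta(i) = \frac{1}{2}\sum_{j=1}^{n} \lambda_j^3 u_{i,j}^2.$$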
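Theorem 2 is the diagonal version of the same argument: writing $A_G = U \operatorname{diag}(\lambda) U^T$ gives $(A_G^3)_{ii} = \sum_j \lambda_j^3 u_{i,j}^2$, and each triangle through $i$ contributes two closed walks from $i$. A minimal NumPy check, assuming the full eigendecomposition is affordable (for large graphs it is not, which is what the algorithms that follow address); the helper name is hypothetical:

```python
import numpy as np


def local_triangles_from_spectrum(A):
    """Illustrative helper (hypothetical name): Δ(i) = (1/2) Σ_j λ_j^3 u_{i,j}^2 for all i."""
    eigenvalues, U = np.linalg.eigh(np.asarray(A, dtype=float))
    # Column j of U is the unit eigenvector u_j, so U[i, j] is u_{i,j}.
    return (U ** 2) @ (eigenvalues ** 3) / 2.0
```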

EigenTriangle Algorithm (interactively)
I want to compute the number of triangles!
Use Lanczos to compute the first two eigenvalues, please!
Is the cube of the second one significantly smaller than the cube of the first? NO. Iterate then!
After some iterations (hopefully few!), compute the k-th eigenvalue $\lambda_k$. Is $|\lambda_k^3|$ much smaller than the sum of the cubes so far, $\sum_{j=1}^{k} \lambda_j^3$? YES! The algorithm terminates.
The estimated number of triangles is the sum of the cubes of the $\lambda_i$'s divided by 6.

EigenTriangle Algorithm
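A sketch of the iterative procedure, assuming SciPy: `scipy.sparse.linalg.eigsh` wraps ARPACK's implicitly restarted Lanczos method for symmetric matrices, and `which="LM"` requests the eigenvalues of largest magnitude. The stopping rule used here (stop when the newest $|\lambda_k^3|$ is at most `tol` times the running sum of cubes), the default `tol=0.05` and the helper name are illustrative assumptions. For simplicity the sketch recomputes all k eigenvalues in every round instead of extending a single Lanczos run.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import eigsh


def eigen_triangle(A, tol=0.05):
    """Illustrative helper (hypothetical name): estimate Δ(G) from the top eigenvalues.

    A: symmetric 0/1 adjacency matrix of a simple graph (dense array or sparse).
    tol: assumed threshold for "much smaller" (not a value from the slides).
    Returns (estimate, number_of_eigenvalues_used).
    """
    A = csr_matrix(A, dtype=float)
    n = A.shape[0]
    if n < 3:
        return 0.0, 0  # fewer than three vertices: no triangles
    k = 2
    while True:
        # Lanczos-based solver for the k eigenvalues of largest magnitude.
        vals = eigsh(A, k=k, which="LM", return_eigenvectors=False)
        vals = vals[np.argsort(-np.abs(vals))]  # largest magnitude first
        cubes = vals ** 3
        running = cubes.sum()
        newest = abs(cubes[-1])
        # Stop when the newest cube is a small fraction of the running sum,
        # or when eigsh cannot be asked for more (it needs k < n).
        if (running > 0 and newest <= tol * running) or k >= n - 1:
            return running / 6.0, k
        k += 1
```

The design choice of ordering by magnitude matters: large negative eigenvalues also contribute (negatively) to the sum of cubes, so they must be picked up in the same order as the positive ones.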

EigenTriangleLocal Algorithm
Why are these two algorithms efficient on power law networks?
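The local variant can be sketched in the same way by truncating Theorem 2 to the k eigenpairs of largest magnitude. This minimal version, assuming SciPy's `eigsh`, takes a fixed k chosen by the caller instead of an adaptive stopping rule; the helper name is hypothetical.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import eigsh


def eigen_triangle_local(A, k):
    """Illustrative helper (hypothetical name): approximate Δ(i) for every vertex i.

    Truncates Theorem 2 to the k eigenpairs of largest magnitude:
    Δ(i) ≈ (1/2) Σ_{j<=k} λ_j^3 u_{i,j}^2.  Requires k < number of vertices.
    """
    A = csr_matrix(A, dtype=float)
    vals, vecs = eigsh(A, k=k, which="LM")
    # vecs[:, j] is the unit eigenvector for vals[j], so vecs[i, j] plays the role of u_{i,j}.
    return (vecs ** 2) @ (vals ** 3) / 2.0
```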

Typical Spectra of Power Law Networks
[Figure: eigenvalue spectra of the Airports and Political blogs networks.]

1st Reason: Top Eigenvalues of Power-Law Graphs
Very important for us because:
- Few eigenvalues contribute a lot!
- Cubes amplify this even more!
- Lanczos converges fast due to large spectral gaps [GL89]!

1st Reason: Top Eigenvalues of Power-Law Graphs
Faloutsos, Faloutsos and Faloutsos [FFF99] were among the first to observe that the top eigenvalues follow a power law. A few years later, Mihail & Papadimitriou [MP02] and Chung, Lu and Vu [CLV03] gave an explanation of this fact.

2nd Reason: Bulk of eigenvalues
The bulk of the spectrum is almost symmetric around 0, so the sum of its cubes almost cancels out!
[Figure: spectrum of Political Blogs; omit the bulk and keep only the top 3 eigenvalues.]
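The two reasons can be made concrete by splitting $\sum_i \lambda_i^3$ into the contribution of the k largest-magnitude eigenvalues and that of the bulk. The minimal NumPy sketch below (hypothetical helper name) computes the full spectrum, which is only practical for small graphs, purely to show the split; when the bulk is nearly symmetric around 0, its part stays small relative to the top-k part.

```python
import numpy as np


def spectrum_split(A, k):
    """Illustrative helper (hypothetical name).

    Returns (sum of cubes of the k largest-magnitude eigenvalues,
             sum of cubes of all remaining eigenvalues).
    The two parts add up to trace(A^3) = 6 * Δ(G).
    """
    vals = np.linalg.eigvalsh(np.asarray(A, dtype=float))
    vals = vals[np.argsort(-np.abs(vals))]  # largest magnitude first
    cubes = vals ** 3
    return float(cubes[:k].sum()), float(cubes[k:].sum())
```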

Datasets
Nodes    Edges    Description                                          Type
~75K     ~405K    Epinions network                                     Social network
~404K    ~2.1M    Flickr                                               Social network
~27K     ~341K    Arxiv Hep-Th                                         Co-authorship network
~1K      ~17K     Political blogs                                      Information network
~13K     ~148K    Reuters news                                         Information network
~3M      35M      Wikipedia 2006-Sep-05                                Web graph
~3.15M   ~37M     Wikipedia 2006-Nov-04                                Web graph
~13.5K   ~37.5K   AS Oregon                                            Internet graph
~23.5K   ~47.5K   CAIDA AS 2004 to 2008 (means over 151 timestamps)    Internet graph
The largest graph is Wikipedia 2006-Nov-04, with ~3.15M nodes and ~37M edges.

Competitor: Node Iterator
Node Iterator algorithm: for each node, look at its neighbors, then check how many edges there are among them. Complexity: $O(\sum_{v \in V} d(v)^2)$, where $d(v)$ is the degree of $v$.
We report the results as the speedup vs. Node Iterator.
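A minimal Python sketch of the Node Iterator baseline as described above, assuming hash-set adjacency so that each edge test is expected O(1); the helper name is hypothetical. Each triangle is found once from each of its three vertices, so the per-node counts are exactly Δ(v) and their sum divided by 3 is Δ(G).

```python
def node_iterator(n, edges):
    """Illustrative helper (hypothetical name): exact Δ(G) and Δ(v) by Node Iterator.

    For every node v, test each pair of v's neighbours for an edge:
    about Σ_v d(v)^2 / 2 expected-O(1) set lookups.
    """
    adj = [set() for _ in range(n)]
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    per_vertex = [0] * n
    for v in range(n):
        nbrs = list(adj[v])
        for i in range(len(nbrs)):
            for j in range(i + 1, len(nbrs)):
                if nbrs[j] in adj[nbrs[i]]:  # the two neighbours are linked
                    per_vertex[v] += 1
    # Each triangle is seen once from each of its three corners.
    return sum(per_vertex) // 3, per_vertex
```

For example, `node_iterator(4, [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)])` returns `(2, [2, 1, 2, 1])`.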

Results: #Eigenvalues vs. Speedup

Results: #Edges vs. Speedup
Observe the trend.

Some interesting observations
- Typical rank (number of eigenvalues) for at least 95% accuracy: …
- Speedups range from 33.7x to 1159x; the mean speedup is 250x.
- The speedup increases as the size of the network grows.

Evaluating the Local Counting Method
[Figure: number of triangles node i participates in, according to our estimation.]

#Eigenvalues vs. ρ for three networks
[Figure: ρ as a function of the number of eigenvalues, for three networks.] With … eigenvalues: almost ideal results!

Triangle Participation Power Law (TPPL)
[Figure: Epinions; x-axis: δ = number of triangles; y-axis: count of nodes participating in δ triangles.]
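The TPPL plot is a histogram of the per-node triangle counts Δ(v). A minimal sketch, assuming integer counts are already available (for instance from an exact method); the helper name is hypothetical. Nodes with δ = 0 have to be dropped before plotting on log-log axes.

```python
from collections import Counter


def triangle_participation_histogram(per_vertex_triangles):
    """Illustrative helper (hypothetical name): {δ: number of nodes in exactly δ triangles}."""
    return dict(sorted(Counter(per_vertex_triangles).items()))
```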

Triangle Participation Power Law (TPPL)
[Figure: HEP_TH (co-authorship) and Flickr.]

Degree Triangle Power Law (DTPL)
[Figure: Epinions; x-axis: d, all degrees appearing in the graph; y-axis: mean number of triangles over all nodes with degree d.]
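The DTPL plot groups nodes by degree and averages their triangle counts. A minimal sketch, assuming per-node degrees and triangle counts are given as parallel lists; the helper name is hypothetical.

```python
from collections import defaultdict


def degree_triangle_means(degrees, per_vertex_triangles):
    """Illustrative helper (hypothetical name): {d: mean Δ(v) over nodes v with degree d}.

    Only degrees that actually appear in the graph get an entry.
    """
    sums = defaultdict(int)
    counts = defaultdict(int)
    for d, t in zip(degrees, per_vertex_triangles):
        sums[d] += t
        counts[d] += 1
    return {d: sums[d] / counts[d] for d in sorted(counts)}
```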

Degree Triangle Power Law (DTPL)
[Figure: Flickr and Reuters.]

Observations on TPPL & DTPL
TPPL: many nodes participate in few triangles, and few nodes participate in many triangles.

Observations on TPPL & DTPL
DTPL:
- A power law fits the degree-triangle plot well.
- Its slope is the opposite of the slope of the degree distribution (slope complementarity).
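The slides do not say how the power-law slopes were fitted. One simple option, assumed here for illustration, is a least-squares line through the points in log-log space; comparing the slope obtained for the degree-triangle data with that of the degree distribution is one way to examine slope complementarity. The helper name is hypothetical.

```python
import numpy as np


def loglog_slope(x, y):
    """Illustrative helper (hypothetical name): least-squares slope of log10(y) vs log10(x).

    Points with x <= 0 or y <= 0 cannot be placed on log-log axes and are dropped.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    keep = (x > 0) & (y > 0)
    slope, _intercept = np.polyfit(np.log10(x[keep]), np.log10(y[keep]), 1)
    return float(slope)
```

Least squares on log-log data is easy but sensitive to the sparse tail of the plot, so it should be read as a rough summary rather than a rigorous power-law estimate.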

Kronecker graphs
Kronecker graphs are a model for generating graphs that mimic properties of real-world networks. The basic operation is the Kronecker product [LCKF05].
[Figure: an initiator graph and its adjacency matrix; the Kronecker product is applied repeatedly (k times) to obtain ever larger adjacency matrices, up to A^[k].]
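A minimal sketch of the deterministic construction, assuming NumPy's `np.kron` and reading $A^{[k]}$ as $A \otimes A \otimes \cdots \otimes A$ with k factors; the helper name is hypothetical. The product of two zero-diagonal matrices again has a zero diagonal, so a simple initiator graph yields a simple graph.

```python
import numpy as np


def kronecker_power(A, k):
    """Illustrative helper (hypothetical name): A ⊗ A ⊗ ... ⊗ A with k factors (k >= 1)."""
    A = np.asarray(A)
    B = A
    for _ in range(k - 1):
        B = np.kron(B, A)  # size grows from n^i to n^(i+1) per step
    return B
```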

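Triangles in Kronecker Graphs
Theorem [KroneckerTRC]. Let $B = A^{[k]}$ be the $k$-th Kronecker product of $A$ (that is, $A \otimes A \otimes \cdots \otimes A$ with $k$ factors), and let $\Delta(G_A)$, $\Delta(G_B)$ be the total number of triangles in $G_A$ and $G_B$. Then the following equality holds:
$$\Delta(G_B) = 6^{k-1}\,\Delta(G_A)^k.$$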
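The closed formula follows from two Kronecker-product identities, $(X \otimes Y)(Z \otimes W) = XZ \otimes YW$ and $\operatorname{tr}(X \otimes Y) = \operatorname{tr}X \cdot \operatorname{tr}Y$: together they give $\operatorname{tr}(B^3) = \operatorname{tr}(A^3)^k = (6\Delta(G_A))^k$, and dividing by 6 yields $6^{k-1}\Delta(G_A)^k$. The sketch below checks this numerically, assuming a simple initiator graph (0/1 entries, zero diagonal); helper names are hypothetical.

```python
import numpy as np


def triangle_count(A):
    """Illustrative helper (hypothetical name): Δ = trace(A^3) / 6 for a simple graph."""
    A = np.asarray(A, dtype=np.int64)
    return int(np.trace(A @ A @ A)) // 6


def kronecker_triangle_check(A, k):
    """Illustrative helper (hypothetical name).

    Returns (Δ(G_B) counted directly, 6^(k-1) * Δ(G_A)^k) for B = A ⊗ ... ⊗ A (k factors).
    """
    A = np.asarray(A, dtype=np.int64)
    B = A
    for _ in range(k - 1):
        B = np.kron(B, A)
    return triangle_count(B), 6 ** (k - 1) * triangle_count(A) ** k
```

For the triangle K3 as initiator and k = 2, both values are 6; for K4 and k = 2, both are 96.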

Conclusions
- Triangles can be approximated with high accuracy in power-law networks using a small, constant number of eigenvalues.
- The method is easily parallelizable (it needs only matrix-vector multiplications) and converges fast due to large spectral gaps.
- New triangle-related power laws.
- A closed formula for the number of triangles in Kronecker graphs.
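To illustrate the "matrix-vector multiplications only" point, the sketch below never forms the adjacency matrix: it hands SciPy's `eigsh` a `scipy.sparse.linalg.LinearOperator` whose only access to the graph is a matvec computed from an edge list. A distributed version would split that edge loop across workers; this single-machine sketch, and its helper name, are illustrative assumptions.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh


def top_eigenvalues_matvec_only(n, edges, k):
    """Illustrative helper (hypothetical name): k largest-magnitude adjacency eigenvalues.

    The graph is touched only through y = A x, computed from the edge list.
    """
    edges = np.asarray(edges, dtype=np.int64)
    src, dst = edges[:, 0], edges[:, 1]

    def matvec(x):
        x = np.asarray(x, dtype=float).ravel()
        y = np.zeros(n, dtype=float)
        # Each undirected edge (u, v) adds x[v] to y[u] and x[u] to y[v].
        np.add.at(y, src, x[dst])
        np.add.at(y, dst, x[src])
        return y

    op = LinearOperator((n, n), matvec=matvec, dtype=float)
    return eigsh(op, k=k, which="LM", return_eigenvectors=False)
```

For a path on 10 nodes the spectrum is $2\cos(j\pi/11)$, $j = 1, \ldots, 10$, so with k = 2 the call returns approximately ±1.919.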

Future Work
Implement the method in Hadoop as part of PEGASUS (Peta-Graph Mining): ongoing work with U Kang and Christos Faloutsos, in collaboration with Yahoo! Research.

Acknowledgements
Christos Faloutsos and Ioannis Koutis, for the helpful discussions.

Acknowledgements
Maria Tsiarli, for the PEGASUS logo.

References
[WF94] Wasserman, Faust: “Social Network Analysis: Methods and Applications (Structural Analysis in the Social Sciences)”
[EM02] Eckmann, Moses: “Curvature of co-links uncovers hidden thematic layers in the World Wide Web”
[YPSB05] Ye, Peyser, Spencer, Bader: “Commensurate distances and similar motifs in genetic congruence and protein interaction networks in yeast”
[BPCG08] Becchetti, Boldi, Castillo, Gionis: “Efficient Semi-Streaming Algorithms for Local Triangle Counting in Massive Graphs”
[LCKF05] Leskovec, Chakrabarti, Kleinberg, Faloutsos: “Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication”
[FFF99] Faloutsos, Faloutsos, Faloutsos: “On power-law relationships of the Internet topology”
[MP02] Mihail, Papadimitriou: “On the Eigenvalue Power Law”
[CLV03] Chung, Lu, Vu: “Spectra of random graphs with given expected degrees”
[GL89] Golub, Van Loan: “Matrix Computations”

For more references, the paper and the slides:

Questions?