1 CMU SCS, C. Faloutsos (CMU), OpenCirrus'10. Large Graph Algorithms. Christos Faloutsos, CMU. McGlohon, Mary; Prakash, Aditya; Tong, Hanghang; Tsourakakis, Babis; Akoglu, Leman; Chau, Polo; Kang, U.

2 ICDM-LDMTA 2009. Graphs - why should we care? Internet Map [lumeta.com]; Food Web [Martinez ’91]; Protein Interactions [genomebiology.com]; Friendship Network [Moody ’01].

3 Graphs - why should we care? IR: bi-partite graphs (doc-terms: D1 ... DN, T1 ... TM). Web: hyper-text graph. Social networking sites (Facebook, Twitter). Users posing and answering questions. Click-streams (user-page bipartite graph). ... and more: any M:N db relationship.

4 Our goal: One-stop solution for mining huge graphs: PEGASUS project (PEta GrAph mining System). www.cs.cmu.edu/~pegasus. Open-source code and papers.

5 Outline – Algorithms & results. Columns: Centralized, Hadoop/PEGASUS. Degree Distr.: old. Pagerank: old. Diameter/ANF: old, DONE. Conn. Comp: old, DONE. Triangles: DONE. Visualization: STARTED.

6 HADI for diameter estimation. Radius Plots for Mining Tera-byte Scale Graphs. U Kang, Charalampos Tsourakakis, Ana Paula Appel, Christos Faloutsos, Jure Leskovec, SDM’10. Naively: diameter needs O(N**2) space and up to O(N**3) time, which is prohibitive (N ~ 1B). Our HADI: linear in E (~10B). Near-linear scalability wrt # machines. Several optimizations -> 5x faster.
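To make the naive cost on this slide concrete, here is a minimal sketch of the exact baseline: one breadth-first search per node. It is not HADI itself, only the brute-force approach HADI is meant to avoid. The toy graph, the function name `eccentricities`, and the choice to treat a node's radius as its largest BFS distance are assumptions made for illustration.

```python
from collections import Counter, deque


def eccentricities(adj):
    """Largest BFS distance from each node to any node it can reach.

    Runs one full BFS per node, which is why this baseline does not
    scale to graphs with billions of nodes.
    """
    ecc = {}
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        ecc[source] = max(dist.values())
    return ecc


# Hypothetical toy graph: the undirected path 0-1-2-3-4.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

ecc = eccentricities(adj)
radius_histogram = Counter(ecc.values())  # radius -> number of nodes
diameter = max(ecc.values())

print(ecc)
print(dict(radius_histogram))
print(diameter)
```

The histogram built here (count of nodes per radius) is the same kind of quantity a radius plot displays, computed exactly on a graph small enough to allow it.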

7 YahooWeb graph (120Gb, 1.4B nodes, 6.6B edges). Largest publicly available graph ever studied. [Plot: Count vs. Radius] ?? 19+? [Barabasi+]

8 YahooWeb graph (120Gb, 1.4B nodes, 6.6B edges). Effective diameter: surprisingly small. Multi-modality: probably a mixture of cores.

10 Radius Plot of GCC of YahooWeb.

11 Running time - Kronecker and Erdos-Renyi graphs with billions of edges.

13 Generalized Iterated Matrix Vector Multiplication (GIMV). PEGASUS: A Peta-Scale Graph Mining System - Implementation and Observations. U Kang, Charalampos E. Tsourakakis, and Christos Faloutsos. ICDM 2009, Miami, Florida, USA. Best Application Paper (runner-up).

14 Generalized Iterated Matrix Vector Multiplication (GIMV). PageRank; proximity (RWR); Diameter; Connected components; (eigenvectors, Belief Prop. …) -> Matrix-vector Multiplication (iterated).
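As a rough illustration of how one of the listed tasks reduces to iterated matrix-vector multiplication, the sketch below runs connected components as repeated "multiply, then merge" steps over an edge list, on a single machine. The three customizable operations are named `combine2`, `combine_all`, and `assign` here; those names, the function `gimv`, and the toy graph are assumptions for this example and are not taken from the slides or from the PEGASUS code.

```python
def gimv(edges, vector, combine2, combine_all, assign):
    """One generalized matrix-vector step over edges [(i, j, m_ij), ...].

    combine2    : combines a matrix entry m_ij with a vector entry v_j
    combine_all : merges all partial results collected for row i
    assign      : decides the new v_i from the old v_i and the merged result
    """
    partial = {i: [] for i in vector}
    for i, j, m_ij in edges:
        partial[i].append(combine2(m_ij, vector[j]))
    return {i: assign(vector[i], combine_all(partial[i])) for i in vector}


def connected_components(num_nodes, undirected_edges):
    """Label each node with the smallest node id in its component."""
    edges = []
    for u, v in undirected_edges:
        edges.append((u, v, 1))  # store both directions of each edge
        edges.append((v, u, 1))
    labels = {i: i for i in range(num_nodes)}
    while True:
        new_labels = gimv(
            edges,
            labels,
            combine2=lambda m_ij, v_j: v_j,  # pass the neighbor's label along
            combine_all=lambda xs: min(xs) if xs else None,
            assign=lambda old, new: old if new is None else min(old, new),
        )
        if new_labels == labels:  # stop once no label changes
            return labels
        labels = new_labels


# Hypothetical toy graph: components {0, 1, 2}, {3, 4}, and isolated node 5.
components = connected_components(6, [(0, 1), (1, 2), (3, 4)])
print(components)
```

With ordinary multiplication for `combine2`, summation for `combine_all`, and plain replacement for `assign`, the same `gimv` step is a standard matrix-vector product, which is the sense in which the other tasks on this slide fit the same pattern.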

15 Example: GIM-V At Work. Connected Components. [Plot: Count vs. Size]

16 Example: GIM-V At Work. Connected Components. [Plot: Count vs. Size] 300-size cmpt X 500. Why? 1100-size cmpt X 65. Why?

17 Example: GIM-V At Work. Connected Components. [Plot: Count vs. Size] Suspicious financial-advice sites (not existing now).

19 ASONAM 2009. Triangles. Real social networks have a lot of triangles.

20 Triangles. Real social networks have a lot of triangles: friends of friends are friends. Q1: how to compute quickly? Q2: Any patterns?

21 Triangles: Computations [Tsourakakis ICDM 2008]. Q: Can we do that quickly? Triangles are expensive to compute (3-way join; several approx. algos).

22 Triangles: Computations [Tsourakakis ICDM 2008]. But: triangles are expensive to compute (3-way join; several approx. algos). Q: Can we do that quickly? A: Yes! $\#\text{triangles} = \frac{1}{6} \sum_i \lambda_i^3$ (and, because of skewness, we only need the top few eigenvalues!)
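The identity on this slide can be checked by hand on a small graph. The sum of the cubed eigenvalues of a symmetric adjacency matrix A equals trace(A^3), which counts closed walks of length 3, and each triangle contributes 6 such walks (3 starting nodes times 2 directions). The sketch below is only a sanity check on the complete graph K4, whose eigenvalues are known in closed form (n-1 once and -1 with multiplicity n-1); it does not reproduce the approximation method of the cited paper, and the function name is an assumption.

```python
def count_triangles_trace(adj_matrix):
    """Count triangles as trace(A^3) / 6 for a symmetric 0/1 adjacency matrix."""
    n = len(adj_matrix)
    trace = 0
    for i in range(n):
        for j in range(n):
            if adj_matrix[i][j]:
                for k in range(n):
                    # Adds 1 for every closed walk i -> j -> k -> i.
                    trace += adj_matrix[j][k] * adj_matrix[k][i]
    return trace // 6


# Complete graph K4: every pair of distinct nodes is connected.
n = 4
k4 = [[0 if i == j else 1 for j in range(n)] for i in range(n)]

# Known spectrum of K_n: eigenvalue n-1 once, eigenvalue -1 repeated n-1 times.
eigenvalues_k4 = [n - 1] + [-1] * (n - 1)

triangles_from_trace = count_triangles_trace(k4)
triangles_from_eigs = sum(lam ** 3 for lam in eigenvalues_k4) // 6

print(triangles_from_trace, triangles_from_eigs)
```

Both routes give 4, the number of triangles in K4. Here the single largest eigenvalue contributes 27 of the total 24 (the three -1 eigenvalues subtract 3), which hints at why a few top eigenvalues can dominate the sum on skewed real graphs.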

23 Triangles: Computations [Tsourakakis ICDM 2008]. 1000x+ speed-up, high accuracy.

24 Triangles. Easy to implement on Hadoop: it only needs eigenvalues (working on it, using Lanczos).

25 Triangles. Real social networks have a lot of triangles: friends of friends are friends. Q1: how to compute quickly? Q2: Any patterns?

26 Triangle Law: #1 [Tsourakakis ICDM 2008]. Panels: ASN, HEP-TH, Epinions. X-axis: # of triangles a node participates in. Y-axis: count of such nodes.

27 Triangle Law: #2 [Tsourakakis ICDM 2008]. Panels: SN, Reuters, Epinions. X-axis: degree. Y-axis: mean # triangles. Notice: slope ~ degree exponent (insets).

29 Visualization: ShiftR. Supporting Ad Hoc Sensemaking: Integrating Cognitive, HCI, and Data Mining Approaches. Aniket Kittur, Duen Horng (‘Polo’) Chau, Christos Faloutsos, Jason I. Hong. Sensemaking Workshop at CHI 2009, April 4-5, Boston, MA, USA.

31 Conclusions. One-stop shopping for large graph mining: www.cs.cmu.edu/~pegasus. Akoglu, Leman; Chau, Polo; Kang, U; McGlohon, Mary; Tsourakakis, Babis. THANKS: NSF, Yahoo (M45), LLNL.

