Published by Christian Hines. Modified over 9 years ago.
1
CMU SCS Yahoo/Hadoop, 2008
Peta-Graph Mining
Christos Faloutsos
Prakash, Aditya; Shringarpure, Suyash; Tsourakakis, Charalampos; Appel, Ana; Chau, Polo; Leskovec, Jure; Kang, U
2
Our goal: one-stop solution for mining huge graphs
3
Outline
                      Centralized   Hadoop
Degree Distribution   old
Pagerank              old
Diameter/ANF          old           X
Communities           old           X
Triangles             X             todo
Visualization         X             todo
Datasets:
(a) Synthetic (‘Kronecker’, ~300M nodes, 1B edges)
(b) NetFlix (20K movies, ~500K users, 100M edges)
4
Degree Distributions - NetFlix (100 machines, 8 min)
[Plot: count vs. movie in-degree]
5
Degree Distributions - NetFlix (100 machines, 8 min)
[Plot: count vs. movie in-degree. Annotation: ‘Theoretically expected’]
6
Degree Distributions - NetFlix (100 machines, 8 min)
[Plot: count vs. user out-degree]
7
Degree Distributions - NetFlix (100 machines, 8 min)
[Plot: count vs. user out-degree. Annotations: ‘Theoretically expected’; ‘Sharp drop below 100 ratings’]
8
Degree Distributions - Kronecker (100 machines, 6 h)
Nodes: 259M - Edges: 1B
[Plot: count vs. degree]
9
Degree Distributions - timings
[Plot: time (sec) vs. edge file size (MB), for 1 task, 24 tasks, and 48 tasks]
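The degree-distribution computation reported on the slides above can be pictured as two counting passes over the edge file: first count edges per node, then count nodes per degree value. The following is a minimal single-machine sketch of that idea, not the authors' Hadoop job; the function name `degree_distribution` and the toy (user, movie) edges are hypothetical.

```python
from collections import Counter

def degree_distribution(edges):
    """Return (out_dist, in_dist); each maps a degree to the number of nodes with that degree."""
    edges = list(edges)  # the edges are scanned twice, so materialize any iterator first
    # Pass 1: count edges per node (comparable to a "group by node id, sum" step).
    out_degree = Counter(src for src, _ in edges)
    in_degree = Counter(dst for _, dst in edges)
    # Pass 2: count nodes per degree value (group by degree, sum).
    # Nodes that never appear in an edge are not counted at all.
    out_dist = Counter(out_degree.values())
    in_dist = Counter(in_degree.values())
    return out_dist, in_dist

# Hypothetical toy data: each pair is (user, movie), i.e. one rating.
edges = [("u1", "m1"), ("u1", "m2"), ("u2", "m1"), ("u3", "m1")]
out_dist, in_dist = degree_distribution(edges)
print(dict(out_dist))  # user out-degree -> number of users
print(dict(in_dist))   # movie in-degree -> number of movies
```

Both passes are simple group-and-sum operations, which is presumably why this task maps naturally onto many parallel tasks.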
10
Outline (repeated from slide 3)
11
Diameter
Diameter of a graph: the maximum shortest-path length over all pairs of nodes
Normally, computing it costs > O(N**2)
ANF: ‘Approximate Neighborhood Function’ [Palmer+02]: O(E)
Goal: calculate the neighborhood function
Neighborhood N(h): number of pairs of nodes within distance h
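To make the definition of N(h) concrete, here is a small sketch that computes the neighborhood function exactly with one breadth-first search per node. This is deliberately not ANF: it is the expensive exact baseline that ANF approximates. The function names, the toy path graph, and the convention of counting ordered pairs (u, v) with u != v are assumptions made for illustration.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distance from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def neighborhood_function(adj):
    """Return (n_of_h, diameter), where n_of_h[h - 1] is N(h) for h = 1..diameter."""
    # exact_at[d] = number of ordered pairs (u, v), u != v, at distance exactly d
    exact_at = {}
    for source in adj:
        for d in bfs_distances(adj, source).values():
            if d > 0:
                exact_at[d] = exact_at.get(d, 0) + 1
    diameter = max(exact_at, default=0)
    # N(h) is the running total of pairs at distance 1, 2, ..., h.
    n_of_h, total = [], 0
    for h in range(1, diameter + 1):
        total += exact_at.get(h, 0)
        n_of_h.append(total)
    return n_of_h, diameter

# Hypothetical toy graph: the undirected path a - b - c - d.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
n_of_h, diameter = neighborhood_function(adj)
print(n_of_h, diameter)
```

The diameter is the smallest h at which N(h) stops growing, which is how a value can be read off a hop plot. Running one BFS per node costs on the order of N * (N + E) steps, which is impractical for graphs with hundreds of millions of nodes.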
12
Diameter - timings
For large jobs, parallelization helps
Unstable results due to shared machines
[Plot: time (min) vs. edge file size (MB), for 1 node, 28 nodes, and 48 nodes]
13
Diameter / Hop Plot (Netflix)
[Plot: # of reachable pairs within <= h hops vs. h (# of hops)]
14
Diameter / Hop Plot (Netflix)
[Plot: # of reachable pairs within <= h hops vs. h (# of hops)]
Diameter: 3
15
Outline (repeated from slide 3)
16
Community detection
Cross associations [Chakrabarti+ ’04]
17
Community detection
18
Outline (repeated from slide 3)
19
Triangles: ‘friends of friends are friends’
20
Triangles: ‘friends of friends are friends’
21
Triangles: ‘friends of friends are friends’
Naïve algo: 3-way join (slow)
[Tsourakakis’08]: # triangles ~ sum of cubes of eigenvalues
Thus, super-fast computation of # triangles (100x - 25,000x faster than naïve; >95% accuracy)
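The eigenvalue claim rests on a standard identity for an undirected simple graph with adjacency matrix A: the sum of the cubes of the eigenvalues of A equals trace(A^3), and each triangle contributes exactly 6 closed walks of length 3, so # triangles = trace(A^3) / 6. The sketch below does not compute eigenvalues and is not the [Tsourakakis’08] method; it only checks the equivalent trace form against a brute-force count on a hypothetical 4-node graph. All function names are illustrative.

```python
from itertools import combinations

def count_triangles_naive(a):
    """Brute force: test every unordered triple of nodes."""
    n = len(a)
    return sum(
        1
        for i, j, k in combinations(range(n), 3)
        if a[i][j] and a[j][k] and a[i][k]
    )

def matmul(x, y):
    """Dense product of two square matrices given as lists of lists."""
    n = len(x)
    return [
        [sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]

def count_triangles_trace(a):
    """trace(A^3) / 6, which equals (sum of cubed eigenvalues of A) / 6."""
    a3 = matmul(matmul(a, a), a)
    trace = sum(a3[i][i] for i in range(len(a3)))
    return trace // 6

# Hypothetical toy graph with edges 0-1, 0-2, 1-2, 1-3, 2-3 (symmetric 0/1 matrix).
a = [
    [0, 1, 1, 0],
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [0, 1, 1, 0],
]
naive = count_triangles_naive(a)
via_trace = count_triangles_trace(a)
print(naive, via_trace)
```

Because the cubes of small eigenvalues contribute little to the sum, keeping only a few of the largest-magnitude eigenvalues can give a close estimate, which is consistent with the speed and accuracy figures quoted on the slide.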
22
Triangles
Easy to implement on Hadoop: it only needs eigenvalues (to do, with Lanczos)
23
Outline (repeated from slide 3)
24
Visualization
Principled visualization of large graphs (show only the few most ‘important’ edges)
25
Summary
Status table repeated from the outline (slide 3)
Goal: one-stop solution for mining huge graphs