Published by Erin Lambert. Modified over 9 years ago.
1
CMU SCS
Mining Large Social Networks: Patterns and Anomalies
Christos Faloutsos
CMU
2
Thank you
The Department of Informatics: Happy 20th!
Prof. Yannis Manolopoulos
Prof. Kostas Tsichlas
Mrs. Nina Daltsidou
AUTH, May 30, 2012. C. Faloutsos (CMU)
3
International-caliber friends among AUTH alumni
Prof. Evimaria Terzi (U. Boston)
Prof. Kyriakos Mouratidis (SMU)
Dr. Michalis Vlachos (IBM)
…
4
Outline
Introduction – Motivation
Problem#1: Patterns in graphs
Problem#2: Tools
Problem#3: Scalability
Conclusions
5
Graphs - why should we care?
[Figures: Internet Map [lumeta.com]; Food Web [Martinez ’91]]
$10s of BILLIONS revenue
>500M users
6
Graphs - why should we care?
IR: bi-partite graphs (doc-terms)
[Figure: bipartite graph linking documents D1 ... DN to terms T1 ... TM]
web: hyper-text graph
... and more:
7
Graphs - why should we care?
web-log (‘blog’) news propagation
computer network security: email/IP traffic and anomaly detection
....
[subject-verb-object: graph]
Graph == relational table with 2 columns (src, dst)
BIG DATA – big graphs
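The "graph == relational table with 2 columns" remark can be made concrete with a minimal sketch. The edge rows and the helper names `out_degrees` and `in_degrees` are hypothetical illustrations, not code from the talk; they only show that degree queries reduce to a group-by over the (src, dst) table.

```python
from collections import Counter

# A graph stored as a two-column table: one (src, dst) row per edge.
# The rows below are made-up illustration data.
edges = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "carol"),
    ("carol", "alice"),
]


def out_degrees(rows):
    """Count rows per src value (like GROUP BY src, COUNT(*))."""
    return Counter(src for src, _dst in rows)


def in_degrees(rows):
    """Count rows per dst value (like GROUP BY dst, COUNT(*))."""
    return Counter(dst for _src, dst in rows)
```

Because every row is independent, this layout is also what makes the later map/reduce treatment natural: each edge can be processed on its own.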
8
Outline
Introduction – Motivation
Problem#1: Patterns in graphs
–Static graphs
–Weighted graphs
–Time evolving graphs
Problem#2: Tools
Problem#3: Scalability
Conclusions
9
Problem #1 - network and graph mining
What does the Internet look like?
What does FaceBook look like?
What is ‘normal’/‘abnormal’?
Which patterns/laws hold?
10
Graph mining
Are real graphs random?
11
Laws and patterns
Are real graphs random?
A: NO!!
–Diameter
–in- and out- degree distributions
–other (surprising) patterns
So, let’s look at the data
12
Solution# S.1
Power law in the degree distribution [SIGCOMM99]
[Figure: internet domains; axes: log(rank), log(degree); att.com and ibm.com labeled]
13
Solution# S.1
Power law in the degree distribution [SIGCOMM99]
[Figure: internet domains; axes: log(rank), log(degree); att.com and ibm.com labeled; fitted slope -0.82]
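A rank-degree power law shows up as a straight line on log-log axes, so its exponent can be estimated as the slope of a least-squares line through (log rank, log degree). The sketch below is an illustration under that assumption only; the function name `rank_exponent` is hypothetical, and the talk does not say which fitting procedure produced the -0.82 figure.

```python
import math


def rank_exponent(degrees):
    """Least-squares slope of log(degree) against log(rank).

    Nodes are ranked by decreasing degree, rank 1 being the largest.
    Zero-degree nodes are skipped because log(0) is undefined.
    """
    ranked = sorted((d for d in degrees if d > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two nodes with positive degree")
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(d) for d in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

On real data a plain least-squares fit is sensitive to the noisy tail, so the result should be read as a rough estimate rather than a definitive exponent.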
14
But: How about graphs from other domains?
15
More power laws: web hit counts [w/ A. Montgomery]
[Figure: Web Site Traffic; axes: in-degree (log scale), Count (log scale); labels: Zipf, users, sites, ‘ebay’]
16
And numerous more
Who-trusts-whom (epinions.com)
Income [Pareto] – ‘80-20 distribution’
Duration of downloads [Bestavros+]
Duration of UNIX jobs (‘mice and elephants’)
Size of files of a user
…
‘Black swans’
17
Outline
Introduction – Motivation
Problem#1: Patterns in graphs
–Static graphs: degree, diameter, eigen, Triangles
–Time evolving graphs
Problem#2: Tools
18
Solution# S.3: Triangle ‘Laws’
Real social networks have a lot of triangles
19
Solution# S.3: Triangle ‘Laws’
Real social networks have a lot of triangles
–Friends of friends are friends
Any patterns?
20
Triangle Law: #S.3 [Tsourakakis ICDM 2008]
[Figure panels: SN, Reuters, Epinions; X-axis: degree; Y-axis: mean # triangles]
n friends -> ~n^1.6 triangles
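The degree-versus-triangles plot can be reproduced on a small graph with a direct count. The sketch below is a brute-force illustration, assuming an undirected simple graph given as an edge list; the function names are hypothetical, and this is not the counting method of the cited paper, which targets much larger graphs.

```python
from collections import defaultdict
from itertools import combinations


def degree_and_triangles(edges):
    """Return {node: (degree, number of triangles through that node)}."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    stats = {}
    for node, nbrs in adj.items():
        # A triangle through `node` is a pair of its neighbours
        # that are themselves connected.
        closed = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        stats[node] = (len(nbrs), closed)
    return stats


def mean_triangles_by_degree(stats):
    """Average triangle count over all nodes sharing the same degree."""
    buckets = defaultdict(list)
    for degree, triangles in stats.values():
        buckets[degree].append(triangles)
    return {d: sum(t) / len(t) for d, t in buckets.items()}
```

Plotting the output of `mean_triangles_by_degree` on log-log axes gives the kind of curve the slide describes; the per-node cost grows with the square of the degree, so this is only practical for small inputs.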
21
Triangle counting for large graphs?
Anomalous nodes in Twitter (~3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11]
24
Outline
Introduction – Motivation
Problem#1: Patterns in graphs
–Static graphs
–Time evolving graphs
Problem#2: Tools
…
25
Problem: Time evolution
with Jure Leskovec (CMU -> Stanford) and Jon Kleinberg (Cornell – sabb. @ CMU)
26
T.1 Evolution of the Diameter
Prior work on Power Law graphs hints at slowly growing diameter:
–diameter ~ O(log N)
–diameter ~ O(log log N)
What is happening in real data?
27
T.1 Evolution of the Diameter
Prior work on Power Law graphs hints at slowly growing diameter:
–diameter ~ O(log N)
–diameter ~ O(log log N)
What is happening in real data?
Diameter shrinks over time
28
T.1 Diameter – “Patents”
Patent citation network
25 years of data
@1999
–2.9 M nodes
–16.5 M edges
[Figure: diameter vs. time [years]]
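To check a claim like "diameter shrinks over time" on a small graph, one can compute an effective diameter on successive snapshots. The sketch below assumes the common definition (the smallest hop count within which a given fraction, here 90%, of connected node pairs can reach each other), treats edges as undirected, and skips the interpolation between hop counts that some formulations use. The function name and the 0.9 default are assumptions for illustration.

```python
from collections import defaultdict, deque


def effective_diameter(edges, fraction=0.9):
    """Smallest h such that at least `fraction` of connected ordered
    pairs are within h hops (exact BFS from every node)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    pairs_at = defaultdict(int)  # pairs_at[d] = ordered pairs at distance d
    for source in list(adj):
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    pairs_at[dist[w]] += 1
                    queue.append(w)

    total = sum(pairs_at.values())
    if total == 0:
        return 0
    covered = 0
    for d in sorted(pairs_at):
        covered += pairs_at[d]
        if covered >= fraction * total:
            return d
```

Running this on growing prefixes of a time-stamped edge list yields one value per snapshot. All-pairs BFS costs O(N·E), which is exactly why graphs at the scale discussed later in the talk need a different approach.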
29
Outline
Introduction – Motivation
Problem#1: Patterns in graphs
Problem#2: Tools
–Belief Propagation
Problem#3: Scalability
Conclusions
30
E-bay Fraud detection
w/ Polo Chau & Shashank Pandit, CMU [WWW’07]
31
E-bay Fraud detection
33
E-bay Fraud detection - NetProbe
34
Popular press
And less desirable attention:
E-mail from ‘Belgium police’ (‘copy of your code?’)
35
Outline
Introduction – Motivation
Problem#1: Patterns in graphs
Problem#2: Tools
Problem#3: Scalability
–PEGASUS
Conclusions
36
Scalability
Google: > 450,000 processors in clusters of ~2000 processors each [Barroso, Dean, Hölzle, “Web Search for a Planet: The Google Cluster Architecture” IEEE Micro 2003]
Yahoo: 5Pb of data [Fayyad, KDD’07]
Problem: machine failures, on a daily basis
How to parallelize data mining tasks, then?
A: map/reduce – hadoop (open-source clone) http://hadoop.apache.org/
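The map/reduce idea can be sketched in a few lines: a mapper emits (key, value) pairs per input record, the framework groups values by key, and a reducer combines each group. The code below is a single-process toy that only mimics that contract; `run_map_reduce`, `map_edge` and `reduce_counts` are hypothetical names, and none of this is the Hadoop API. The example job counts node degrees from an edge list.

```python
from collections import defaultdict


def map_edge(edge):
    """Mapper: each edge contributes 1 to the degree of both endpoints."""
    src, dst = edge
    yield (src, 1)
    yield (dst, 1)


def reduce_counts(key, values):
    """Reducer: sum all partial counts for one node."""
    return key, sum(values)


def run_map_reduce(records, mapper, reducer):
    """Toy driver: map every record, group by key (the 'shuffle'), reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())
```

Because the mapper looks at one record at a time and the reducer at one key at a time, a real framework can rerun either on another machine after a failure, which is the property the slide is after.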
37
Outline
Introduction – Motivation
Problem#1: Patterns in graphs
Problem#2: Tools
Problem#3: Scalability
–PEGASUS
–Radius plot
Conclusions
38
HADI for diameter estimation
Radius Plots for Mining Tera-byte Scale Graphs. U Kang, Charalampos Tsourakakis, Ana Paula Appel, Christos Faloutsos, Jure Leskovec, SDM’10
Naively: diameter needs O(N**2) space and up to O(N**3) time – prohibitive (N~1B)
Our HADI: linear on E (~10B)
–Near-linear scalability wrt # machines
–Several optimizations -> 5x faster
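One way to see where the O(N**2) space comes from is to compute radii by growing each node's reachable set one hop at a time: a node's radius is the last hop at which its set still grew. The sketch below does exactly that with exact Python sets on an undirected graph. It is a simplification for illustration, not HADI itself; as far as the cited paper describes it, HADI replaces the exact sets with small probabilistic sketches and runs the per-hop update as a map/reduce job. The function name is hypothetical.

```python
from collections import Counter


def radii_by_neighborhood_growth(edges):
    """Return {node: radius}, where radius is the largest hop count at
    which the set of nodes reachable from `node` was still growing."""
    nodes = {n for edge in edges for n in edge}
    # Each node counts as its own neighbour so its reach set carries over.
    adj = {n: {n} for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    reach = {n: {n} for n in nodes}  # exact sets: O(N**2) space overall
    radius = {n: 0 for n in nodes}
    hop = 0
    while True:
        hop += 1
        # One iteration: merge the reach sets of all neighbours.
        new_reach = {n: set().union(*(reach[m] for m in adj[n])) for n in nodes}
        grew = False
        for n in nodes:
            if len(new_reach[n]) > len(reach[n]):
                radius[n] = hop
                grew = True
        reach = new_reach
        if not grew:
            return radius


def radius_plot(radius):
    """Histogram behind a radius plot: radius value -> number of nodes."""
    return Counter(radius.values())
```

The diameter is the maximum radius, and `radius_plot` gives the Radius-versus-Count data shown on the following slides. Each iteration touches every edge once, which is the part that stays linear in E when the sets are swapped for fixed-size sketches.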
39
[Figure: radius plot; axes: Radius, Count]
????
19+ [Barabasi+]
~1999, ~1M nodes
40
YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges)
Largest publicly available graph ever studied.
[Figure: radius plot; axes: Radius, Count]
????
19+ [Barabasi+]
~1999, ~1M nodes
??
41
YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges)
Largest publicly available graph ever studied.
[Figure: radius plot; axes: Radius, Count]
????
19+? [Barabasi+]
14 (dir.)
~7 (undir.)
42
YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges)
7 degrees of separation (!)
Diameter: shrunk
[Figure: radius plot; axes: Radius, Count]
????
19+? [Barabasi+]
14 (dir.)
~7 (undir.)
43
YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges)
Q: Shape?
[Figure: radius plot; axes: Radius, Count]
????
~7 (undir.)
44
YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges)
effective diameter: surprisingly small.
Multi-modality (?!)
45
Radius Plot of GCC of YahooWeb.
46
YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges)
effective diameter: surprisingly small.
Multi-modality: probably mixture of cores.
47
YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges)
effective diameter: surprisingly small.
Multi-modality: probably mixture of cores.
~7
Conjecture: [figure labels: EN, DE, BR]
48
YahooWeb graph (120Gb, 1.4B nodes, 6.6 B edges)
effective diameter: surprisingly small.
Multi-modality: probably mixture of cores.
~7
Conjecture:
49
Outline
Introduction – Motivation
Problem#1: Patterns in graphs
Problem#2: Tools
Problem#3: Scalability
Conclusions
50
OVERALL CONCLUSIONS – low level:
Several new patterns (shrinking diameters, triangle-laws, etc)
New tools:
–Fraud detection (belief propagation)
Scalability: PEGASUS / hadoop
51
OVERALL CONCLUSIONS – medium-level
BIG DATA: Large datasets reveal patterns/outliers that are invisible otherwise
52
Project info
www.cs.cmu.edu/~pegasus
Akoglu, Leman
Chau, Polo
Kang, U
McGlohon, Mary
Tong, Hanghang
Prakash, Aditya
Koutra, Danae
Thanks to: NSF IIS-0705359, IIS-0534205, CTA-INARC; Yahoo (M45), LLNL, IBM, SPRINT, Google, INTEL, HP, iLab
53
Thank you for the honor!
Congratulations on the 20th anniversary and…
54
High-level conclusion: Collaborations
Sociology + CS (triangles)
Civil engineering + CS (sensor placement)
fMRI/medical + graphs (medical db’s)
…
55
Never stop learning
[Figure labels: Socrates, Plato, Aristotle]