Published by Herbert Day. Modified over 9 years ago.
1
Social Networks and Graph Mining
Christos Faloutsos
CMU - MLD
2
MLD-AB '07
Outline
- Problem definition / Motivation
- Graphs and power laws
- [Virus propagation]
- [e-bay fraud detection]
- Conclusions
3
Motivation
- Data mining: ~ find patterns (rules, outliers)
- Problem#1: What do real graphs look like?
- Problem#2: How do viruses propagate?
- Problem#3: How to spot fraudsters on e-bay?
4
Problem#1
- Joint work with Dr. Deepayan Chakrabarti (CMU/Yahoo R.L.)
5
Graphs - why should we care?
- Internet Map [lumeta.com]
- Food Web [Martinez ’91]
- Protein Interactions [genomebiology.com]
- Friendship Network [Moody ’01]
6
Graphs - why should we care?
- network of companies & board-of-directors members
- ‘viral’ marketing
- web-log (‘blog’) news propagation
- computer network security: email/IP traffic and anomaly detection
- ....
7
Problem #1 - network and graph mining
- What does the Internet look like?
- What does the web look like?
- What constitutes a ‘normal’ social network?
- What is ‘normal’/‘abnormal’?
- Which patterns/laws hold?
8
Graph mining
- Are real graphs random?
9
Laws and patterns
- NO!!
- Diameter
- in- and out-degree distributions
- other (surprising) patterns
10
Solution
- Power law in the degree distribution [SIGCOMM99]
[Figure: log(degree) versus log(rank) for internet domains; fitted slope -0.82; labeled points: att.com, ibm.com]
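A minimal sketch of how a slope such as the -0.82 on this slide could be estimated: sort nodes by decreasing degree, then fit a least-squares line to log(degree) against log(rank). The function names and the synthetic degree list are assumptions for illustration only; they are not the data or the fitting procedure behind [SIGCOMM99].

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    var = sum((a - mean_x) ** 2 for a in lx)
    return cov / var

def rank_exponent(degrees):
    """Slope of log(degree) versus log(rank), nodes sorted by decreasing degree."""
    ranked = sorted((d for d in degrees if d > 0), reverse=True)
    ranks = range(1, len(ranked) + 1)
    return loglog_slope(ranks, ranked)

# Made-up degrees that follow degree = 1000 * rank^(-0.82) exactly.
synthetic_degrees = [1000.0 * r ** -0.82 for r in range(1, 201)]
slope = rank_exponent(synthetic_degrees)
print(round(slope, 2))
```

On real data the points scatter around the line, so a plain least-squares fit on the log-log plot is only a rough first estimate.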
11
But:
- Q1: How about graphs from other domains?
- Q2: How about temporal evolution?
12
The Peer-to-Peer Topology
- Number of adjacent peers follows a power-law [Jovanovic+]
[Figure: frequency versus degree]
13
More power laws: citation counts (citeseer.nj.nec.com 6/2001)
[Figure: log(count) versus log(#citations); labeled point: Ullman]
14
Swedish sex-web
- Nodes: people (Females; Males)
- Links: sexual relationships
- Liljeros et al. Nature 2001
- 4781 Swedes; 18-74; 59% response rate.
Albert Laszlo Barabasi
http://www.nd.edu/~networks/Publication%20Categories/04%20Talks/2005-norway-3hours.ppt
15
More power laws: web hit counts [w/ A. Montgomery]
- Web Site Traffic
[Figure: log(count) versus log(in-degree); Zipf; users, sites; labeled point: ‘ebay’]
16
epinions.com
- who-trusts-whom [Richardson + Domingos, KDD 2001]
[Figure: count versus (out) degree; labeled point: trusts-2000-people user]
17
But:
- Q1: How about graphs from other domains?
- Q2: How about temporal evolution?
18
Time evolution
- with Jure Leskovec (CMU/MLD) and Jon Kleinberg (Cornell, sabb. @ CMU)
20
Evolution of the Diameter
- Prior work on Power Law graphs hints at slowly growing diameter:
  - diameter ~ O(log N)
  - diameter ~ O(log log N)
- What is happening in real data?
- Diameter shrinks over time
  - As the network grows, the distances between nodes slowly decrease
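To make "diameter" concrete, here is a small sketch that computes the exact diameter (the longest shortest-path distance) of an undirected graph by breadth-first search from every node. The two toy graphs are made up to show that adding nodes and edges can shorten distances; the measurements on the following slides come from much larger graphs and may use a different (for example, sampled or smoothed) diameter definition.

```python
from collections import deque

def build_undirected(edges):
    """Adjacency sets for an undirected graph given as (u, v) pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def bfs_distances(adj, source):
    """Hop distance from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def diameter(adj):
    """Longest shortest-path distance over all reachable pairs."""
    return max(max(bfs_distances(adj, s).values()) for s in adj)

# Toy snapshot at an early time: a path on 5 nodes.
early = build_undirected([(0, 1), (1, 2), (2, 3), (3, 4)])
# Toy later snapshot: one more node and several extra edges.
later = build_undirected([(0, 1), (1, 2), (2, 3), (3, 4),
                          (0, 3), (1, 4), (5, 0), (5, 4)])
print(diameter(early), diameter(later))
```

All-pairs BFS costs O(N * (N + E)), which is fine for a sketch but too slow for graphs with millions of nodes.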
21
Diameter - ArXiv citation graph
- Citations among physics papers
- 1992 - 2003
- One graph per year
[Figure: diameter versus time [years]]
22
Diameter - “Autonomous Systems”
- Graph of Internet
- One graph per day
- 1997 - 2000
[Figure: diameter versus number of nodes]
23
Diameter - “Affiliation Network”
- Graph of collaborations in physics: authors linked to papers
- 10 years of data
[Figure: diameter versus time [years]]
24
Diameter - “Patents”
- Patent citation network
- 25 years of data
[Figure: diameter versus time [years]]
26
Temporal Evolution of the Graphs
- N(t) … nodes at time t
- E(t) … edges at time t
- Suppose that N(t+1) = 2 * N(t)
- Q: what is your guess for E(t+1) =? 2 * E(t)
- A: over-doubled!
  - But obeying the ‘Densification Power Law’
27
Densification - Physics Citations
- Citations among physics papers
- 2003:
  - 29,555 papers, 352,807 citations
[Figure: E(t) versus N(t); fitted slope 1.69; reference slopes: 1 for a tree, 2 for a clique]
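A short worked example of what an exponent of 1.69 means, assuming the densification power law has the form E(t) ∝ N(t)^a: doubling the nodes multiplies the edges by 2^a, which is 2 for a = 1 (tree-like growth) and 4 for a = 2 (clique-like growth). The snapshot numbers below are synthetic, and the helper names are hypothetical.

```python
import math

def fit_densification_exponent(nodes, edges):
    """Least-squares slope a of log E against log N, assuming E ~ N^a."""
    lx = [math.log(n) for n in nodes]
    ly = [math.log(e) for e in edges]
    count = len(lx)
    mean_x = sum(lx) / count
    mean_y = sum(ly) / count
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(lx, ly))
    var = sum((x - mean_x) ** 2 for x in lx)
    return cov / var

def edge_growth_factor(exponent, node_growth=2.0):
    """Factor by which E grows when N grows by node_growth, under E ~ N^a."""
    return node_growth ** exponent

# Synthetic snapshots that follow E = N^1.69 exactly (not real data).
snapshot_nodes = [1000, 2000, 4000, 8000]
snapshot_edges = [n ** 1.69 for n in snapshot_nodes]

a = fit_densification_exponent(snapshot_nodes, snapshot_edges)
factor = edge_growth_factor(1.69)
print(round(a, 2), round(factor, 2))
```

With a = 1.69 the factor is a little above 3.2, which is the sense in which the edge count "over-doubles" when the node count doubles.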
31
Densification - Patent Citations
- Citations among patents granted
- 1999
  - 2.9 million nodes
  - 16.5 million edges
- Each year is a datapoint
[Figure: E(t) versus N(t); fitted slope 1.66]
32
Densification - Autonomous Systems
- Graph of Internet
- 2000
  - 6,000 nodes
  - 26,000 edges
- One graph per day
[Figure: E(t) versus N(t); fitted slope 1.18]
33
Densification - Affiliation Network
- Authors linked to their publications
- 2002
  - 60,000 nodes
    - 20,000 authors
    - 38,000 papers
  - 133,000 edges
[Figure: E(t) versus N(t); fitted slope 1.15]
34
Outline
- Problem definition / Motivation
- Graphs and power laws
- [Virus propagation]
- [e-bay fraud detection]
- Conclusions
35
Virus propagation
- How do viruses/rumors propagate?
- Will a flu-like virus linger, or will it become extinct soon?
36
The model: SIS
- ‘Flu’ like: Susceptible-Infected-Susceptible
- Virus ‘strength’ s = β/δ
[Figure: node N with neighbors N1, N2, N3; states: Infected, Healthy; attack prob. β, recovery prob. δ]
37
Epidemic threshold τ
- of a graph: the value of τ such that, if strength s = β/δ < τ, an epidemic cannot happen
- Thus, given a graph, compute its epidemic threshold
38
Epidemic threshold
- What should τ depend on?
- avg. degree? and/or highest degree?
- and/or variance of degree?
- and/or third moment of degree?
- and/or diameter?
40
Epidemic threshold
- [Theorem] We have no epidemic, if β/δ < τ = 1/λ_{1,A}
  - λ_{1,A}: largest eigenvalue of adj. matrix A
  - β: attack prob.
  - δ: recovery prob.
  - τ: epidemic threshold
- Proof: [Wang+03]
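A minimal sketch of evaluating the threshold τ = 1/λ_{1,A} on a small graph. It assumes an undirected, connected graph with at least one edge, given as a dense 0/1 adjacency matrix, and it estimates the largest eigenvalue by power iteration on A + I (the shift is a choice made here so the iteration also converges on bipartite graphs such as stars). The function names are hypothetical; a real analysis would normally use a sparse eigensolver.

```python
def largest_eigenvalue(adj_matrix, iterations=1000):
    """Estimate the largest eigenvalue of a symmetric 0/1 adjacency matrix.

    Power iteration is run on (A + I); its dominant eigenvalue is
    lambda_1 + 1, so 1 is subtracted at the end.
    """
    n = len(adj_matrix)
    vec = [1.0] * n
    shifted = 0.0
    for _ in range(iterations):
        # Multiply the current vector by (A + I).
        nxt = [vec[i] + sum(adj_matrix[i][j] * vec[j] for j in range(n))
               for i in range(n)]
        # Normalise by the largest entry; this converges to lambda_1 + 1.
        shifted = max(abs(x) for x in nxt)
        vec = [x / shifted for x in nxt]
    return shifted - 1.0

def epidemic_threshold(adj_matrix):
    """tau = 1 / lambda_1 (assumes the graph has at least one edge)."""
    return 1.0 / largest_eigenvalue(adj_matrix)

def below_threshold(beta, delta, adj_matrix):
    """True when the strength beta/delta is under the threshold tau."""
    return beta / delta < epidemic_threshold(adj_matrix)

# Complete graph on 4 nodes: largest eigenvalue 3, so tau = 1/3.
clique4 = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
# Star with one hub and 4 leaves: largest eigenvalue 2, so tau = 1/2.
star4 = [[0, 1, 1, 1, 1]] + [[1, 0, 0, 0, 0] for _ in range(4)]

print(epidemic_threshold(clique4), epidemic_threshold(star4))
```

The two toy graphs illustrate why no single degree statistic is enough: the star has the higher maximum degree but the larger threshold, so it is harder for an epidemic to take hold there than in the clique.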
41
Experiments (Oregon)
- β/δ > τ (above threshold)
- β/δ = τ (at the threshold)
- β/δ < τ (below threshold)
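For readers who want to try the three regimes themselves, here is a rough sketch of a discrete-time SIS simulation. It is an illustrative assumption, not the setup used for the Oregon experiments: each step, every infected node recovers with probability δ, and every healthy node is infected with probability β independently by each infected neighbor. A node cured in a step is not reinfected in that same step. The ring graph and function names are made up.

```python
import random

def sis_step(adj, infected, beta, delta, rng):
    """One synchronous SIS step; returns the new set of infected nodes."""
    nxt = set()
    for node in adj:
        if node in infected:
            # Infected node stays infected unless it recovers (prob. delta).
            if rng.random() >= delta:
                nxt.add(node)
        else:
            # Healthy node: each infected neighbor attacks with prob. beta.
            for neighbor in adj[node]:
                if neighbor in infected and rng.random() < beta:
                    nxt.add(node)
                    break
    return nxt

def simulate_sis(adj, initially_infected, beta, delta, steps, seed=0):
    """Number of infected nodes after each step, starting from the given set."""
    rng = random.Random(seed)
    infected = set(initially_infected)
    counts = []
    for _ in range(steps):
        infected = sis_step(adj, infected, beta, delta, rng)
        counts.append(len(infected))
    return counts

# Toy graph: a ring of 10 nodes.
ring = {i: {(i - 1) % 10, (i + 1) % 10} for i in range(10)}

always_spreads = simulate_sis(ring, {0}, beta=1.0, delta=0.0, steps=5)
dies_at_once = simulate_sis(ring, {0}, beta=0.0, delta=1.0, steps=3)
print(always_spreads, dies_at_once)
```

The two extreme parameter settings are deterministic; intermediate values of β and δ give random trajectories, so several seeds should be averaged before comparing against a threshold.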
42
Outline
- Problem definition / Motivation
- Graphs and power laws
- [Virus propagation]
- [e-bay fraud detection]
- Conclusions
43
E-bay Fraud detection
- w/ Polo Chau, CMU
44
E-bay Fraud detection - NetProbe
45
Conclusions
- Graphs pose fascinating problems
- self-similarity/fractals and power laws work, when textbook methods fail!
- Need: ML/AI, Stat, NA, DB (Gb/Tb), Systems (Networks+), sociology, ++…
46
Contact info
- christos@cs.cmu.edu
- www.cs.cmu.edu/~christos