Published by Branden Hill. Modified over 9 years ago.
1
CMU SCS
Graph Mining: patterns and tools for static and time-evolving graphs
Christos Faloutsos, CMU
2
CMU SCS, IRDS'07, C. Faloutsos
Outline
Problem definition / Motivation
Static & dynamic laws; generators
Tools: CenterPiece graphs; Tensors
Other projects (virus propagation, eBay fraud detection)
Conclusions
3
Motivation
Data mining: ~ find patterns (rules, outliers)
Problem#1: What do real graphs look like?
Problem#2: How do they evolve?
Problem#3: How to generate realistic graphs?
TOOLS
Problem#4: Who is the ‘master-mind’?
Problem#5: Track communities over time
4
Problem#1: Joint work with Dr. Deepayan Chakrabarti (CMU/Yahoo R.L.)
5
Graphs – why should we care?
[Figures: Internet Map [lumeta.com]; Food Web [Martinez ’91]; Protein Interactions [genomebiology.com]; Friendship Network [Moody ’01]]
6
Graphs – why should we care?
IR: bi-partite graphs (doc-terms)
web: hyper-text graph
... and more:
[Figure: bipartite graph linking documents D1 ... DN to terms T1 ... TM]
7
Graphs – why should we care?
network of companies & board-of-directors members
‘viral’ marketing
web-log (‘blog’) news propagation
computer network security: email/IP traffic and anomaly detection
....
8
Problem #1 – network and graph mining
What does the Internet look like?
What does the web look like?
What is ‘normal’/‘abnormal’?
Which patterns/laws hold?
9
Graph mining
Are real graphs random?
10
Laws and patterns
Are real graphs random?
A: NO!!
  –Diameter
  –in- and out-degree distributions
  –other (surprising) patterns
11
Solution#1
Power law in the degree distribution [SIGCOMM99]
[Figure: log(degree) vs. log(rank) for internet domains; fitted slope -0.82; labeled points: att.com, ibm.com]
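The slope in such a rank plot can be estimated with an ordinary least-squares fit in log-log space. The following is a minimal sketch, not the method used in the cited paper: the function names and the synthetic degree sequence are assumptions made for illustration, and the data are generated to follow a slope of -0.82 exactly.

```python
import math

def loglog_slope(xs, ys):
    """Least-squares slope of log(y) against log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    sxx = sum((a - mx) ** 2 for a in lx)
    return sxy / sxx

def rank_exponent(degrees):
    """Slope of log(degree) vs. log(rank), with nodes sorted by decreasing degree."""
    ordered = sorted(degrees, reverse=True)
    ranks = range(1, len(ordered) + 1)
    return loglog_slope(ranks, ordered)

# Synthetic degrees following degree = 1000 * rank^(-0.82) (illustrative, not real data).
synthetic = [1000.0 * r ** -0.82 for r in range(1, 201)]
slope = rank_exponent(synthetic)
```

On real degree sequences the points scatter around the line, so the fitted value depends on which ranks are included in the fit.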
12
Solution#1’: Eigen Exponent E
A2: power law in the eigenvalues of the adjacency matrix
[Figure: eigenvalue vs. rank of decreasing eigenvalue, May 2001; exponent = slope; E = -0.48]
13
But: How about graphs from other domains?
14
The Peer-to-Peer Topology
[Figure: frequency versus degree]
Number of adjacent peers follows a power law [Jovanovic+]
15
More power laws: citation counts (citeseer.nj.nec.com, 6/2001)
[Figure: log(count) vs. log(#citations); labeled point: Ullman]
16
Swedish sex-web
Nodes: people (females; males)
Links: sexual relationships
Liljeros et al., Nature 2001
4781 Swedes; ages 18-74; 59% response rate.
Source: Albert Laszlo Barabasi, http://www.nd.edu/~networks/Publication%20Categories/04%20Talks/2005-norway-3hours.ppt
17
More power laws: web hit counts [w/ A. Montgomery]
[Figure: Web Site Traffic; log(count) vs. log(in-degree); Zipf; users and sites; labeled point: ‘ebay’]
18
epinions.com
who-trusts-whom [Richardson + Domingos, KDD 2001]
[Figure: count vs. (out) degree; labeled point: trusts-2000-people user]
21
Problem#2: Time evolution
with Jure Leskovec (CMU/MLD) and Jon Kleinberg (Cornell, on sabbatical at CMU)
23
Evolution of the Diameter
Prior work on power-law graphs hints at slowly growing diameter:
  –diameter ~ O(log N)
  –diameter ~ O(log log N)
What is happening in real data?
Diameter shrinks over time
24
Diameter – ArXiv citation graph
Citations among physics papers, 1992–2003
One graph per year
[Figure: diameter vs. time [years]]
25
Diameter – “Autonomous Systems”
Graph of Internet
One graph per day, 1997–2000
[Figure: diameter vs. number of nodes]
26
Diameter – “Affiliation Network”
Graph of collaborations in physics: authors linked to papers
10 years of data
[Figure: diameter vs. time [years]]
27
Diameter – “Patents”
Patent citation network
25 years of data
[Figure: diameter vs. time [years]]
29
Temporal Evolution of the Graphs
N(t) … nodes at time t
E(t) … edges at time t
Suppose that N(t+1) = 2 * N(t)
Q: What is your guess for E(t+1)? Is it 2 * E(t)?
A: More than doubled!
  –But obeying the ‘Densification Power Law’
30
Densification – Physics Citations
Citations among physics papers
2003:
  –29,555 papers, 352,807 citations
[Figure: E(t) vs. N(t); fitted slope 1.69. Reference slopes: 1 for a tree, 2 for a clique]
34
Densification – Patent Citations
Citations among patents granted
1999:
  –2.9 million nodes
  –16.5 million edges
Each year is a datapoint
[Figure: E(t) vs. N(t); fitted slope 1.66]
35
Densification – Autonomous Systems
Graph of Internet
2000:
  –6,000 nodes
  –26,000 edges
One graph per day
[Figure: E(t) vs. N(t); fitted slope 1.18]
36
Densification – Affiliation Network
Authors linked to their publications
2002:
  –60,000 nodes (20,000 authors, 38,000 papers)
  –133,000 edges
[Figure: E(t) vs. N(t); fitted slope 1.15]
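The slopes reported on these slides (between 1 for a tree and 2 for a clique) can be read as the exponent a in E(t) ~ N(t)^a. The sketch below shows, under stated assumptions, how such an exponent could be fitted from a series of snapshots and what it implies when the node count doubles. The function names and the snapshot values are hypothetical; the synthetic data are generated with a = 1.69 and are not the datasets from the talk.

```python
import math

def densification_exponent(nodes, edges):
    """Least-squares slope a of log E(t) vs. log N(t), i.e. E(t) ~ N(t)^a."""
    lx = [math.log(n) for n in nodes]
    ly = [math.log(e) for e in edges]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    sxy = sum((x - mx) * (y - my) for x, y in zip(lx, ly))
    sxx = sum((x - mx) ** 2 for x in lx)
    return sxy / sxx

def edge_growth_factor(a, node_factor=2.0):
    """Factor by which E grows when N grows by node_factor, given exponent a."""
    return node_factor ** a

# Hypothetical snapshots: node counts double each step; edges follow N^1.69.
snapshot_nodes = [1000, 2000, 4000, 8000, 16000]
snapshot_edges = [n ** 1.69 for n in snapshot_nodes]

a = densification_exponent(snapshot_nodes, snapshot_edges)
factor = edge_growth_factor(a)
```

With any exponent above 1, `factor` exceeds 2, which is the "more than doubled" answer to the question on the earlier slide.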
39
Problem#3: Generation
Given a growing graph with node counts N_1, N_2, …
Generate a realistic sequence of graphs that will obey all the patterns
40
Problem Definition
Given a growing graph with node counts N_1, N_2, …
Generate a realistic sequence of graphs that will obey all the patterns
  –Static Patterns
     Power Law Degree Distribution
     Power Law eigenvalue and eigenvector distribution
     Small Diameter
  –Dynamic Patterns
     Growth Power Law
     Shrinking/Stabilizing Diameters
41
Problem Definition
Given a growing graph with node counts N_1, N_2, …
Generate a realistic sequence of graphs that will obey all the patterns
Idea: Self-similarity
  –Leads to power laws
  –Communities within communities
  –…
42
Recursive Graph Generation
There are many obvious (but wrong) ways:
  –Does not obey Densification Power Law
  –Has increasing diameter
Kronecker Product is exactly what we need
[Figure: initial graph; recursive expansion]
43
Kronecker Product – a Graph
[Figure: adjacency matrix; intermediate stage; adjacency matrix]
44
Kronecker Product – a Graph
Continuing to multiply with G_1, we obtain G_4, and so on …
[Figure: G_4 adjacency matrix]
45
Kronecker Graphs – Formally:
We create the self-similar graphs recursively:
  –Start with an initiator graph G_1 on N_1 nodes and E_1 edges
  –The recursion will then produce larger graphs G_2, G_3, …, G_k on N_1^k nodes
  –Since we want to obey the Densification Power Law, graph G_k has to have E_1^k edges
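These node and edge counts already fix the densification exponent of the sequence. A short derivation, using only N_1, E_1 and k from the slide (the symbols N_k, E_k and a are introduced here for illustration):

```latex
N_k = N_1^{k}, \qquad E_k = E_1^{k}
\;\Longrightarrow\;
E_k = \left(N_1^{k}\right)^{\log E_1 / \log N_1} = N_k^{\,a},
\qquad a = \frac{\log E_1}{\log N_1}.
```

For an initiator with N_1 <= E_1 <= N_1^2, this gives 1 <= a <= 2, matching the tree and clique reference slopes.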
46
Kronecker Product – Definition
The Kronecker product of an N x M matrix A and a K x L matrix B is the N*K x M*L block matrix whose (i, j) block is a_ij * B
We define the Kronecker product of two graphs as the Kronecker product of their adjacency matrices
47
Kronecker Graphs
We propose a growing sequence of graphs by iterating the Kronecker product
Each Kronecker multiplication exponentially increases the size of the graph
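The iteration can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the authors' code: `kron` and `kron_power` are names chosen here, matrices are lists of rows, and the 3-node initiator (a path with self-loops) is a hypothetical example.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows.

    Row (i, k) and column (j, l) of the result hold a[i][j] * b[k][l].
    """
    return [
        [a_ij * b_kl for a_ij in row_a for b_kl in row_b]
        for row_a in a
        for row_b in b
    ]

def kron_power(g1, k):
    """k-th Kronecker power of the initiator adjacency matrix g1 (k >= 1)."""
    result = g1
    for _ in range(k - 1):
        result = kron(result, g1)
    return result

# Hypothetical initiator: path 0-1-2 with self-loops, so N_1 = 3 and E_1 = 7 nonzero entries.
g1 = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]
g2 = kron_power(g1, 2)
nodes_g2 = len(g2)
edges_g2 = sum(sum(row) for row in g2)
```

Here `g2` has 3^2 nodes and 7^2 nonzero entries, in line with the N_1^k and E_1^k counts stated for Kronecker graphs.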
48
Kronecker Graphs – Intuition
Intuition:
  –Recursive growth of graph communities
  –Nodes get expanded to micro communities
  –Nodes in sub-community link among themselves and to nodes from different communities
49
Properties:
We can PROVE that
  –Degree distribution is multinomial ~ power law
  –Diameter: constant
  –Eigenvalue distribution: multinomial
  –First eigenvector: multinomial
See [Leskovec+, PKDD’05] for proofs
50
Problem Definition
Given a growing graph with node counts N_1, N_2, …
Generate a realistic sequence of graphs that will obey all the patterns
  –Static Patterns
     Power Law Degree Distribution
     Power Law eigenvalue and eigenvector distribution
     Small Diameter
  –Dynamic Patterns
     Growth Power Law
     Shrinking/Stabilizing Diameters
First and only generator for which we can prove all these properties
51
Stochastic Kronecker Graphs
Create an N_1 x N_1 probability matrix P_1
Compute the k-th Kronecker power P_k
For each entry p_uv of P_k, include an edge (u,v) with probability p_uv
P_1 =
  0.4  0.2
  0.1  0.3
P_k (Kronecker multiplication, shown for k = 2) =
  0.16  0.08  0.08  0.04
  0.04  0.12  0.02  0.06
  0.04  0.02  0.12  0.06
  0.01  0.03  0.03  0.09
Flip biased coins to obtain the instance matrix G_2
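The three steps on this slide translate fairly directly into code. The sketch below is a naive illustration under stated assumptions: it materializes the full matrix P_k and flips one coin per entry, which is only practical for small k. The function names and the `seed` parameter are hypothetical; P_1 is the matrix from the slide.

```python
import random

def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a_ij * b_kl for a_ij in row_a for b_kl in row_b]
        for row_a in a
        for row_b in b
    ]

def stochastic_kronecker(p1, k, seed=None):
    """Return (P_k, edges): the k-th Kronecker power of p1 and one sampled edge list."""
    rng = random.Random(seed)
    pk = p1
    for _ in range(k - 1):
        pk = kron(pk, p1)
    # Flip one biased coin per entry: keep edge (u, v) with probability p_uv.
    edges = [
        (u, v)
        for u, row in enumerate(pk)
        for v, p_uv in enumerate(row)
        if rng.random() < p_uv
    ]
    return pk, edges

p1 = [[0.4, 0.2], [0.1, 0.3]]
p2, sampled_edges = stochastic_kronecker(p1, 2, seed=0)
```

The entries of `p2` reproduce the 4 x 4 matrix on the slide (0.16 in the top-left corner, 0.09 in the bottom-right); the sampled edge list varies with the seed.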
52
Experiments
How well can we match real graphs?
  –Arxiv: physics citations: 30,000 papers, 350,000 citations; 10 years of data
  –U.S. Patent citation network: 4 million patents, 16 million citations; 37 years of data
  –Autonomous systems (graph of internet): single snapshot from January 2002; 6,400 nodes, 26,000 edges
We show both static and temporal patterns
53
Arxiv – Degree Distribution
[Figure: count vs. degree; panels: real graph, deterministic Kronecker, stochastic Kronecker]
54
Arxiv – Scree Plot
[Figure: eigenvalue vs. rank; panels: real graph, deterministic Kronecker, stochastic Kronecker]
55
Arxiv – Densification
[Figure: edges vs. nodes(t); panels: real graph, deterministic Kronecker, stochastic Kronecker]
56
Arxiv – Effective Diameter
[Figure: diameter vs. nodes(t); panels: real graph, deterministic Kronecker, stochastic Kronecker]
57
Arxiv citation network
58
U.S. Patent citations
[Figure panels: static patterns; temporal patterns]
59
Autonomous Systems
[Figure panels: static patterns]
60
(Q: How to fit the parameters?)
A: Stochastic version of Kronecker graphs + max likelihood + Metropolis sampling
[Leskovec+, ’07, under review]
61
Experiments on real AS graph
[Figure panels: degree distribution; hop plot; network value; adjacency matrix eigenvalues]
62
Conclusions
Kronecker graphs have:
  –All the static properties
     Heavy-tailed degree distributions
     Small diameter
     Multinomial eigenvalues and eigenvectors
  –All the temporal properties
     Densification Power Law
     Shrinking/Stabilizing Diameters
  –We can formally prove these results
65
Problem#4: MasterMind – ‘CePS’
w/ Hanghang Tong, KDD 2006
htong cs.cmu.edu
66
Center-Piece Subgraph (CePS)
Given Q query nodes
Find Center-piece
Applications:
  –Social Networks
  –Law Enforcement, …
Idea:
  –Proximity -> random walk with restarts
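Random walk with restart (RWR) scores each node by the steady-state probability that a walker who keeps jumping back to a query node is found there. The sketch below computes these scores by simple iteration on a small dense adjacency matrix. Several choices are assumptions made for illustration and are not taken from the slides: the restart probability of 0.15, the column normalization, multiplying the per-query scores to approximate an AND query, and the toy graph itself.

```python
def rwr_scores(adj, query, restart=0.15, iters=100):
    """Steady-state RWR scores for one query node, by repeated iteration."""
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    e = [1.0 if i == query else 0.0 for i in range(n)]
    r = e[:]
    for _ in range(iters):
        nxt = []
        for i in range(n):
            # Probability mass arriving at i from neighbors (column-normalized walk).
            walk = sum(
                adj[i][j] / col_sums[j] * r[j]
                for j in range(n)
                if col_sums[j] > 0
            )
            nxt.append((1 - restart) * walk + restart * e[i])
        r = nxt
    return r

def center_piece(adj, queries, restart=0.15):
    """Non-query node with the highest product of per-query RWR scores."""
    per_query = [rwr_scores(adj, q, restart) for q in queries]
    combined = []
    for j in range(len(adj)):
        score = 1.0
        for scores in per_query:
            score *= scores[j]
        combined.append(score)
    candidates = [j for j in range(len(adj)) if j not in queries]
    return max(candidates, key=lambda j: combined[j])

# Toy graph: query nodes 0 and 1 share neighbor 2; node 3 touches only 0, node 4 only 1.
adj = [
    [0, 0, 1, 1, 0],
    [0, 0, 1, 0, 1],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
]
center = center_piece(adj, [0, 1])
```

In this toy graph the shared neighbor (node 2) is close to both query nodes, so it wins over nodes 3 and 4, which are each close to only one.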
67
Case Study: AND query
[Figure: nodes R. Agrawal, Jiawei Han, V. Vapnik, M. Jordan]
68
Case Study: AND query
69
2_SoftAnd query
[Figure: two groups, ML/Statistics and databases]
70
Conclusions
Q1: How to measure the importance?
A1: RWR + K_SoftAnd
Q2: How to find the connection subgraph?
A2: ‘Extract’ algorithm
Q3: How to do it efficiently?
A3: Graph partition (Fast CePS)
  –~90% quality
  –6:1 speedup; 150x speedup (ICDM’06, b.p. award)
73
Tensors for time evolving graphs
[Jimeng Sun+, KDD’06]
[Jimeng Sun+, SDM’07]
[CF, Kolda, Sun, SDM’07 tutorial]
74
Social network analysis
Static: find community structures
[Figure: authors x keywords matrix for 1990; DB community]
75
Social network analysis
Static: find community structures
Dynamic: monitor community structure evolution; spot abnormal individuals; abnormal time-stamps
76
Application 1: Multiway latent semantic indexing (LSI)
[Figure: authors x keyword data from 1990 to 2004 with DB and DM groups; projection matrices U_authors and U_keyword; query and pattern; example authors: Michael Stonebraker, Philip Yu]
Projection matrices specify the clusters
Core tensors give cluster activation level
77
Bibliographic data (DBLP)
Papers from VLDB and KDD conferences
Construct 2nd-order tensors with yearly windows
Each tensor: 4584 x 3741
11 timestamps (years)
78
Multiway LSI
Authors | Keywords | Year
michael carey, michael stonebraker, h. jagadish, hector garcia-molina | queri, parallel, optimization, concurr, objectorient | 1995
surajit chaudhuri, mitch cherniack, michael stonebraker, ugur cetintemel | distribut, systems, view, storage, servic, process, cache | 2004
jiawei han, jian pei, philip s. yu, jianyong wang, charu c. aggarwal | streams, pattern, support, cluster, index, gener, queri | 2004
Two groups are correctly identified: Databases (DB) and Data mining (DM)
People and concepts are drifting over time
79
Application 2: Network Anomaly Detection
Anomaly detection
  –Reconstruction error driven
  –Multiple resolution
Data
  –TCP flows collected at CMU backbone
  –Raw data: 500GB with compression
  –Construct 3rd-order tensors with hourly windows
  –1200 timestamps (hours)
80
with Hui Zhang, Yinglian Xie (Vyas Sekar)
81
Network anomaly detection
[Figure: source x destination matrices for a normal and an abnormal hour, with scanners marked; error vs. time (hour)]
Identify when and where anomalies occurred.
The prominent difference between normal and abnormal ones is mainly due to unusual scanning activity (confirmed by the campus admin).
82
Conclusions
Tensor-based methods (WTA/DTA/STA): spot patterns and anomalies on time-evolving graphs, and on streams (monitoring)
84
Virus propagation
How do viruses/rumors propagate?
Will a flu-like virus linger, or will it become extinct soon?
85
The model: SIS
‘Flu’-like: Susceptible-Infected-Susceptible
Virus ‘strength’ s = β/δ
[Figure: node N with neighbors N1, N2, N3, each infected or healthy; attack with prob. β; recovery with prob. δ]
86
Epidemic threshold
of a graph: the value of τ such that, if strength s = β/δ < τ, an epidemic cannot happen
Thus, given a graph, compute its epidemic threshold
87
Epidemic threshold
What should τ depend on?
avg. degree? and/or highest degree? and/or variance of degree? and/or third moment of degree? and/or diameter?
89
Epidemic threshold
[Theorem] We have no epidemic if β/δ < τ = 1/λ_{1,A}
  –β: attack prob.
  –δ: recovery prob.
  –τ: epidemic threshold
  –λ_{1,A}: largest eigenvalue of adj. matrix A
Proof: [Wang+03]
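Given the theorem, the threshold of a small graph can be computed by estimating the largest adjacency eigenvalue with power iteration. The sketch below assumes an undirected graph (symmetric adjacency matrix) with at least one edge; the function names and the star graph are hypothetical examples, and the shift by the identity matrix is a choice made here so that the iteration also converges on bipartite graphs.

```python
import math

def largest_eigenvalue(adj, iters=200):
    """Largest eigenvalue of a symmetric adjacency matrix via shifted power iteration."""
    n = len(adj)
    x = [1.0] * n
    for _ in range(iters):
        # Multiply by (A + I); the shift avoids oscillation on bipartite graphs.
        y = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    # Rayleigh quotient x^T A x for the unit vector x.
    ax = [sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
    return sum(a * b for a, b in zip(ax, x))

def epidemic_threshold(adj):
    """tau = 1 / lambda_1 of the adjacency matrix."""
    return 1.0 / largest_eigenvalue(adj)

def below_threshold(adj, beta, delta):
    """True if the strength beta / delta is below the epidemic threshold."""
    return beta / delta < epidemic_threshold(adj)

# Star graph: center 0 connected to 4 leaves; its largest eigenvalue is sqrt(4) = 2.
star = [
    [0, 1, 1, 1, 1],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0],
]
tau = epidemic_threshold(star)
```

For this star, `tau` is approximately 0.5, so a virus with β = 0.1 and δ = 0.5 (strength 0.2) is below the threshold, while β = 0.4 and δ = 0.5 (strength 0.8) is above it.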
90
Experiments (Oregon)
[Figure: three curves: β/δ > τ (above threshold); β/δ = τ (at the threshold); β/δ < τ (below threshold)]
92
E-bay Fraud detection
w/ Polo Chau & Shashank Pandit, CMU
93
E-bay Fraud detection - NetProbe
94
OVERALL CONCLUSIONS
Graphs pose a wealth of fascinating problems
Self-similarity and power laws work when textbook methods fail!
New patterns (shrinking diameter!)
New generator: Kronecker
95
‘Philosophical’ observation
Graph mining brings together:
ML/AI / IR
Stat, Num. analysis,
DB (Gb/Tb),
Systems (Networks+),
sociology, ++…
96
References
Hanghang Tong, Christos Faloutsos, and Jia-Yu Pan. Fast Random Walk with Restart and Its Applications. ICDM 2006, Hong Kong.
Hanghang Tong, Christos Faloutsos. Center-Piece Subgraphs: Problem Definition and Fast Solutions. KDD 2006, Philadelphia, PA.
97
References
Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations. KDD 2005, Chicago, IL. ("Best Research Paper" award).
Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos. Realistic, Mathematically Tractable Graph Generation and Evolution, Using Kronecker Multiplication. ECML/PKDD 2005, Porto, Portugal.
98
References
Jimeng Sun, Dacheng Tao, Christos Faloutsos. Beyond Streams and Graphs: Dynamic Tensor Analysis. KDD 2006, Philadelphia, PA.
Jimeng Sun, Yinglian Xie, Hui Zhang, Christos Faloutsos. Less is More: Compact Matrix Decomposition for Large Sparse Graphs. SDM, Minneapolis, Minnesota, Apr 2007.
99
Contact info:
WeH 7107
{christos, htong, jimeng, jure} cs.cmu.edu
www.cs.cmu.edu/~christos (w/ papers, datasets, code, etc.)