Presentation is loading. Please wait.

Presentation is loading. Please wait.

Part 1: Graph Mining – patterns

Similar presentations


Presentation on theme: "Part 1: Graph Mining – patterns"— Presentation transcript:

1 Part 1: Graph Mining – patterns
(C) C. Faloutsos, 2017 Part 1: Graph Mining – patterns Christos Faloutsos CMU

2 Our goal: Open source system for mining huge graphs:
(C) C. Faloutsos, 2017 11/19/2018 Our goal: Open source system for mining huge graphs: PEGASUS project (PEta GrAph mining System) code and papers Tepper, CMU, April 4 (c) C. Faloutsos, 2017

3 (C) C. Faloutsos, 2017 References D. Chakrabarti, C. Faloutsos: Graph Mining – Laws, Tools and Case Studies, Morgan Claypool 2012 Tepper, CMU, April 4 (c) C. Faloutsos, 2017

4 Outline Introduction – Motivation Part#1: Patterns in graphs
Part#2: Tools (Ranking, proximity) Conclusions Tepper, CMU, April 4 (c) C. Faloutsos, 2017

5 Graphs - why should we care?
(C) C. Faloutsos, 2017 Graphs - why should we care? Internet Map [lumeta.com] Food Web [Martinez ’91] Friendship Network [Moody ’01] Protein Interactions [genomebiology.com] Tepper, CMU, April 4 (c) C. Faloutsos, 2017

6 Graphs - why should we care?
IR: bi-partite graphs (doc-terms) web: hyper-text graph ... and more: D1 DN T1 TM ... Tepper, CMU, April 4 (c) C. Faloutsos, 2017

7 Graphs - why should we care?
(C) C. Faloutsos, 2017 Graphs - why should we care? network of companies & board-of-directors members ‘viral’ marketing web-log (‘blog’) news propagation computer network security: /IP traffic and anomaly detection .... Tepper, CMU, April 4 (c) C. Faloutsos, 2017

8 Outline Introduction – Motivation Patterns in graphs
Patterns in Static graphs Patterns in Weighted graphs Patterns in Time evolving graphs Tepper, CMU, April 4 (c) C. Faloutsos, 2017

9 Network and graph mining
(C) C. Faloutsos, 2017 Network and graph mining How does the Internet look like? How does FaceBook look like? What is ‘normal’/‘abnormal’? which patterns/laws hold? Tepper, CMU, April 4 (c) C. Faloutsos, 2017

10 Network and graph mining
(C) C. Faloutsos, 2017 Network and graph mining How does the Internet look like? How does FaceBook look like? What is ‘normal’/‘abnormal’? which patterns/laws hold? To spot anomalies (rarities), we have to discover patterns Tepper, CMU, April 4 (c) C. Faloutsos, 2017

11 Network and graph mining
(C) C. Faloutsos, 2017 Network and graph mining How does the Internet look like? How does FaceBook look like? What is ‘normal’/‘abnormal’? which patterns/laws hold? To spot anomalies (rarities), we have to discover patterns Large datasets reveal patterns/anomalies that may be invisible otherwise… Tepper, CMU, April 4 (c) C. Faloutsos, 2017

12 Topology How does the Internet look like? Any rules?
(Looks random – right?) Tepper, CMU, April 4 (c) C. Faloutsos, 2017

13 Graph mining Are real graphs random? Tepper, CMU, April 4
(C) C. Faloutsos, 2017 Graph mining Are real graphs random? Tepper, CMU, April 4 (c) C. Faloutsos, 2017

14 Laws and patterns Are real graphs random? A: NO!!
(C) C. Faloutsos, 2017 Laws and patterns Are real graphs random? A: NO!! Diameter in- and out- degree distributions other (surprising) patterns So, let’s look at the data Tepper, CMU, April 4 (c) C. Faloutsos, 2017

15 Laws – degree distributions
Q: avg degree is ~2 - what is the most probable degree? count ?? 2 degree Tepper, CMU, April 4 (c) C. Faloutsos, 2017

16 Laws – degree distributions
Q: avg degree is ~2 - what is the most probable degree? degree count ?? WRONG ! 2 2 degree Tepper, CMU, April 4 (c) C. Faloutsos, 2017

17 Solution S1 .Power-law: outdegree O
Frequency Exponent = slope O = -2.15 -2.15 Nov’97 Outdegree The plot is linear in log-log scale [FFF’99] freq = degree (-2.15) Tepper, CMU, April 4 (c) C. Faloutsos, 2017

18 Solution# S.1’ Power law in the degree distribution [SIGCOMM99]
(C) C. Faloutsos, 2017 Solution# S.1’ Power law in the degree distribution [SIGCOMM99] internet domains att.com log(degree) -0.82 ibm.com log(rank) Tepper, CMU, April 4 (c) C. Faloutsos, 2017

19 Solution# S.2: Eigen Exponent E
Eigenvalue Exponent = slope E = -0.48 May 2001 Rank of decreasing eigenvalue A2: power law in the eigenvalues of the adjacency matrix Tepper, CMU, April 4 (c) C. Faloutsos, 2017

20 Solution# S.2: Eigen Exponent E
Eigenvalue Exponent = slope E = -0.48 May 2001 Rank of decreasing eigenvalue [Mihail, Papadimitriou ’02]: slope is ½ of rank exponent Tepper, CMU, April 4 (c) C. Faloutsos, 2017

21 But: How about graphs from other domains? Tepper, CMU, April 4
(C) C. Faloutsos, 2017 But: How about graphs from other domains? Tepper, CMU, April 4 (c) C. Faloutsos, 2017

22 More power laws: web hit counts [w/ A. Montgomery] Web Site Traffic
(C) C. Faloutsos, 2017 More power laws: web hit counts [w/ A. Montgomery] users sites Web Site Traffic Count (log scale) Zipf ``ebay’’ in-degree (log scale) Tepper, CMU, April 4 (c) C. Faloutsos, 2017

23 epinions.com who-trusts-whom [Richardson + Domingos, KDD 2001] count
(C) C. Faloutsos, 2017 epinions.com who-trusts-whom [Richardson + Domingos, KDD 2001] count trusts-2000-people user (out) degree Tepper, CMU, April 4 (c) C. Faloutsos, 2017

24 And numerous more # of sexual contacts
Income [Pareto] –’80-20 distribution’ Duration of downloads [Bestavros+] Duration of UNIX jobs (‘mice and elephants’) Size of files of a user ‘Black swans’ Tepper, CMU, April 4 (c) C. Faloutsos, 2017

25 Outline Introduction – Motivation Patterns in graphs Generators
Patterns in Static graphs Degree Triangles Patterns in Weighted graphs Patterns in Time evolving graphs Generators Tepper, CMU, April 4 (c) C. Faloutsos, 2017

26 Solution# S.3: Triangle ‘Laws’
Real social networks have a lot of triangles Tepper, CMU, April 4 (c) C. Faloutsos, 2017

27 Solution# S.3: Triangle ‘Laws’
Real social networks have a lot of triangles Friends of friends are friends Any patterns? Tepper, CMU, April 4 (c) C. Faloutsos, 2017

28 Triangle Law: #S.3 [Tsourakakis ICDM 2008]
HEP-TH ASN X-axis: # of Triangles a node participates in Y-axis: count of such nodes Epinions Tepper, CMU, April 4 (c) C. Faloutsos, 2017

29 Triangle Law: #S.3 [Tsourakakis ICDM 2008]
HEP-TH ASN X-axis: # of Triangles a node participates in Y-axis: count of such nodes Epinions Tepper, CMU, April 4 (c) C. Faloutsos, 2017

30 Triangle Law: #S.4 [Tsourakakis ICDM 2008]
Reuters SN X-axis: degree Y-axis: mean # triangles n friends -> ~n1.6 triangles Epinions Tepper, CMU, April 4 (c) C. Faloutsos, 2017

31 Outline Introduction – Motivation Patterns in graphs Generators
Patterns in Static graphs Patterns in Weighted graphs Patterns in Time evolving graphs Generators Tepper, CMU, April 4 (c) C. Faloutsos, 2017

32 Observations on weighted graphs?
(C) C. Faloutsos, 2017 Observations on weighted graphs? A: yes - even more ‘laws’! M. McGlohon, L. Akoglu, and C. Faloutsos Weighted Graphs and Disconnected Components: Patterns and a Generator. SIG-KDD 2008 Tepper, CMU, April 4 (c) C. Faloutsos, 2017

33 Observation W.1: Fortification
(C) C. Faloutsos, 2017 Observation W.1: Fortification Q: How do the weights of nodes relate to degree? Tepper, CMU, April 4 (c) C. Faloutsos, 2017

34 Observation W.1: Fortification
(C) C. Faloutsos, 2017 Observation W.1: Fortification More donors, more $ ? ‘Reagan’ $10 $5 ‘Clinton’ $7 Tepper, CMU, April 4 (c) C. Faloutsos, 2017

35 Observation W.1: fortification: Snapshot Power Law
(C) C. Faloutsos, 2017 Observation W.1: fortification: Snapshot Power Law Weight: super-linear on in-degree exponent ‘iw’: 1.01 < iw < 1.26 Orgs-Candidates More donors, even more $ e.g. John Kerry, $10M received, from 1K donors In-weights ($) $10 $5 Edges (# donors) Tepper, CMU, April 4 (c) C. Faloutsos, 2017

36 Outline Introduction – Motivation Patterns in graphs Generators
Patterns in Static graphs Patterns in Weighted graphs Patterns in Time evolving graphs Generators Tepper, CMU, April 4 (c) C. Faloutsos, 2017

37 Problem: Time evolution
(C) C. Faloutsos, 2017 Problem: Time evolution with Jure Leskovec (CMU -> Stanford) and Jon Kleinberg (Cornell – CMU) Tepper, CMU, April 4 (c) C. Faloutsos, 2017

38 T.1 Evolution of the Diameter
(C) C. Faloutsos, 2017 T.1 Evolution of the Diameter Prior work on Power Law graphs hints at slowly growing diameter: diameter ~ O(log N) diameter ~ O(log log N) What is happening in real data? Diameter first, DPL second Check diameter formulas As the network grows the distances between nodes slowly grow Tepper, CMU, April 4 (c) C. Faloutsos, 2017

39 T.1 Evolution of the Diameter
(C) C. Faloutsos, 2017 T.1 Evolution of the Diameter Prior work on Power Law graphs hints at slowly growing diameter: diameter ~ O(log N) diameter ~ O(log log N) What is happening in real data? Diameter shrinks over time Diameter first, DPL second Check diameter formulas As the network grows the distances between nodes slowly grow Tepper, CMU, April 4 (c) C. Faloutsos, 2017

40 T.1 Diameter – “Patents” Patent citation network 25 years of data
(C) C. Faloutsos, 2017 T.1 Diameter – “Patents” diameter Patent citation network 25 years of data @1999 2.9 M nodes 16.5 M edges time [years] Tepper, CMU, April 4 (c) C. Faloutsos, 2017

41 T.2 Temporal Evolution of the Graphs
(C) C. Faloutsos, 2017 T.2 Temporal Evolution of the Graphs N(t) … nodes at time t E(t) … edges at time t Suppose that N(t+1) = 2 * N(t) Q: what is your guess for E(t+1) =? 2 * E(t) Tepper, CMU, April 4 (c) C. Faloutsos, 2017

42 T.2 Temporal Evolution of the Graphs
(C) C. Faloutsos, 2017 T.2 Temporal Evolution of the Graphs N(t) … nodes at time t E(t) … edges at time t Suppose that N(t+1) = 2 * N(t) Q: what is your guess for E(t+1) =? 2 * E(t) A: over-doubled! But obeying the ``Densification Power Law’’ Tepper, CMU, April 4 (c) C. Faloutsos, 2017

43 T.2 Densification – Patent Citations
(C) C. Faloutsos, 2017 T.2 Densification – Patent Citations Citations among patents granted @1999 2.9 M nodes 16.5 M edges Each year is a datapoint E(t) 1.66 N(t) Tepper, CMU, April 4 (c) C. Faloutsos, 2017

44 Outline Introduction – Motivation Patterns in graphs Generators
Patterns in Static graphs Patterns in Weighted graphs Patterns in Time evolving graphs Generators Tepper, CMU, April 4 (c) C. Faloutsos, 2017

45 More on Time-evolving graphs
M. McGlohon, L. Akoglu, and C. Faloutsos Weighted Graphs and Disconnected Components: Patterns and a Generator. SIG-KDD 2008 Tepper, CMU, April 4 (c) C. Faloutsos, 2017

46 Observation T.3: NLCC behavior
Q: How do NLCC’s emerge and join with the GCC? (``NLCC’’ = non-largest conn. components) Do they continue to grow in size? or do they shrink? or stabilize? Tepper, CMU, April 4 (c) C. Faloutsos, 2017

47 Observation T.3: NLCC behavior
After the gelling point, the GCC takes off, but NLCC’s remain ~constant (actually, oscillate). IMDB CC size Time-stamp Tepper, CMU, April 4 (c) C. Faloutsos, 2017

48 Generalized Iterated Matrix Vector Multiplication (GIMV)
(C) C. Faloutsos, 2017 Generalized Iterated Matrix Vector Multiplication (GIMV) PEGASUS: A Peta-Scale Graph Mining System - Implementation and Observations. U Kang, Charalampos E. Tsourakakis, and Christos Faloutsos. (ICDM) 2009, Miami, Florida, USA. Best Application Paper (runner-up). Tepper, CMU, April 4 (c) C. Faloutsos, 2017

49 Example: GIM-V At Work Connected Components Count Size
Tepper, CMU, April 4 (c) C. Faloutsos, 2017

50 Example: GIM-V At Work Connected Components ~0.7B singleton nodes
Count ~0.7B singleton nodes Size Tepper, CMU, April 4 (c) C. Faloutsos, 2017

51 Example: GIM-V At Work Connected Components Count Size
Tepper, CMU, April 4 (c) C. Faloutsos, 2017

52 Example: GIM-V At Work Connected Components Count Size 300-size cmpt
Why? 1100-size cmpt X 65. Why? Size Tepper, CMU, April 4 (c) C. Faloutsos, 2017

53 financial-advice sites
Example: GIM-V At Work Connected Components Count suspicious financial-advice sites (not existing now) Size Tepper, CMU, April 4 (c) C. Faloutsos, 2017

54 after the gelling point
GIM-V At Work Connected Components over Time LinkedIn: 7.5M nodes and 58M edges Stable tail slope after the gelling point Tepper, CMU, April 4 (c) C. Faloutsos, 2017

55 Timing for Blogs with Mary McGlohon (CMU)
Jure Leskovec (CMU->Stanford) Natalie Glance (now at Google) Mat Hurst (now at MSR) [SDM’07] Tepper, CMU, April 4 (c) C. Faloutsos, 2017

56 T.4 : popularity over time
(C) C. Faloutsos, 2017 T.4 : popularity over time # in links lag: days after post 1 2 3 @t Post popularity drops-off – exponentially? @t + lag Tepper, CMU, April 4 (c) C. Faloutsos, 2017 56

57 T.4 : popularity over time
(C) C. Faloutsos, 2017 T.4 : popularity over time # in links (log) days after post (log) 1 2 3 Post popularity drops-off – exponentially? POWER LAW! Exponent? Tepper, CMU, April 4 (c) C. Faloutsos, 2017 57

58 T.4 : popularity over time
(C) C. Faloutsos, 2017 T.4 : popularity over time # in links (log) -1.6 days after post (log) 1 2 3 Post popularity drops-off – exponentially? POWER LAW! Exponent? -1.6 close to -1.5: Barabasi’s stack model and like the zero-crossings of a random walk Tepper, CMU, April 4 (c) C. Faloutsos, 2017 58

59 Conclusions (part1) MANY patterns in real graphs
Skewed degree distributions Small (and shrinking) diameter Power-laws wrt triangles Oscillating size of connected components … and more Tepper, CMU, April 4 (c) C. Faloutsos, 2017

60 (C) C. Faloutsos, 2017 References D. Chakrabarti, C. Faloutsos: Graph Mining – Laws, Tools and Case Studies, Morgan Claypool 2012 Tepper, CMU, April 4 (c) C. Faloutsos, 2017

61 References Jure Leskovec, Jon Kleinberg and Christos Faloutsos Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations, KDD 2005 (Best Research paper award). Tepper, CMU, April 4 (c) C. Faloutsos, 2017

62 Project info www.cs.cmu.edu/~pegasus
(C) C. Faloutsos, 2017 11/19/2018 Project info Chau, Polo McGlohon, Mary Tsourakakis, Babis Akoglu, Leman Prakash, Aditya Tong, Hanghang Kang, U Thanks to: NSF IIS , IIS , CTA-INARC; Yahoo (M45), LLNL, IBM, SPRINT, INTEL, HP Tepper, CMU, April 4 (c) C. Faloutsos, 2017

63 Part1 END Tepper, CMU, April 4 (c) C. Faloutsos, 2017


Download ppt "Part 1: Graph Mining – patterns"

Similar presentations


Ads by Google