Part 1: Graph Mining – patterns

Part 1: Graph Mining – patterns. Christos Faloutsos, CMU. (C) C. Faloutsos, 2017

Our goal: Open source system for mining huge graphs: PEGASUS project (PEta GrAph mining System), www.cs.cmu.edu/~pegasus, code and papers. 11/19/2018 Tepper, CMU, April 4 (c) C. Faloutsos, 2017

References: D. Chakrabarti, C. Faloutsos: Graph Mining – Laws, Tools and Case Studies, Morgan Claypool 2012. http://www.morganclaypool.com/doi/abs/10.2200/S00449ED1V01Y201209DMK006

Outline: Introduction – Motivation; Part 1: Patterns in graphs; Part 2: Tools (ranking, proximity); Conclusions

Graphs - why should we care? Internet Map [lumeta.com]; Food Web [Martinez ’91]; Friendship Network [Moody ’01]; Protein Interactions [genomebiology.com]

Graphs - why should we care? IR: bipartite graphs (documents D1 ... DN vs. terms T1 ... TM); web: hypertext graph; ... and more.

Graphs - why should we care? Network of companies & board-of-directors members; ‘viral’ marketing; web-log (‘blog’) news propagation; computer network security: email/IP traffic and anomaly detection; ...

Outline: Introduction – Motivation; Patterns in graphs: patterns in static graphs, patterns in weighted graphs, patterns in time-evolving graphs

Network and graph mining: What does the Internet look like? What does Facebook look like? What is ‘normal’/‘abnormal’? Which patterns/laws hold? To spot anomalies (rarities), we have to discover patterns. Large datasets reveal patterns/anomalies that may be invisible otherwise…

Topology: What does the Internet look like? Any rules? (Looks random – right?)

Graph mining: Are real graphs random?

Laws and patterns: Are real graphs random? A: NO!! Diameter; in- and out-degree distributions; other (surprising) patterns. So, let’s look at the data.

Laws – degree distributions. Q: avg degree is ~2; what is the most probable degree? Guessing 2 (a count-vs-degree histogram peaking at the average) is WRONG!

Solution S.1: Power law in the outdegree. Internet, Nov’97: the plot of frequency vs. outdegree is linear in log-log scale [FFF’99], \(\text{freq} \propto \text{degree}^{-2.15}\), i.e., the outdegree exponent (slope) is O = -2.15.
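A minimal sketch of how the S.1 plot can be produced, assuming the graph is given as a directed edge list of (source, destination) pairs; the helper names `outdegree_histogram` and `loglog_slope` are hypothetical. The exponent is estimated as the ordinary least-squares slope in log-log scale, which is the simplest choice and not necessarily how [FFF’99] fitted the line.

```python
import math
from collections import Counter


def outdegree_histogram(edges):
    """Hypothetical helper: map outdegree -> number of nodes with that outdegree.

    Assumes a directed edge list of (src, dst) pairs. Only nodes that appear
    as a source are counted, so outdegree-0 nodes are left out (they cannot
    appear on a log-log plot anyway).
    """
    outdeg = Counter(src for src, _dst in edges)
    return dict(Counter(outdeg.values()))


def loglog_slope(xs, ys):
    """Hypothetical helper: ordinary least-squares slope of log(y) vs. log(x)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


edges = [(1, 2), (1, 3), (2, 3), (4, 1)]
hist = outdegree_histogram(edges)  # {2: 1, 1: 2}
degrees = sorted(hist)
print(loglog_slope(degrees, [hist[d] for d in degrees]))
```

On real data the histogram would have many more distinct degrees; the fitted slope is the exponent read off the slide.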

Solution S.1’: Power law in the degree distribution [SIGCOMM99]. Internet domains: log(degree) vs. log(rank) is a line with slope -0.82, i.e., \(\text{degree} \propto \text{rank}^{-0.82}\) (labeled points include att.com and ibm.com).
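The rank-degree form of S.1’ is equally easy to compute: sort the degrees in decreasing order, pair each with its rank, and fit a line in log-log scale. A sketch under the same assumptions as before (hypothetical helper names, plain least squares); unlike the frequency plot, the rank plot needs no histogram.

```python
import math


def rank_degree_points(degrees):
    """Hypothetical helper: sort degrees decreasingly, pair each with its rank (1 = highest)."""
    ranked = sorted(degrees, reverse=True)
    return list(enumerate(ranked, start=1))


def rank_exponent(degrees):
    """Hypothetical helper: least-squares slope of log(degree) vs. log(rank).

    Degree-0 nodes sit at the end of the ranking and are skipped,
    since log(0) is undefined.
    """
    pts = [(r, d) for r, d in rank_degree_points(degrees) if d > 0]
    lx = [math.log(r) for r, _d in pts]
    ly = [math.log(d) for _r, d in pts]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


# Hypothetical degrees following degree = 500 * rank**-0.82 exactly, listed out of order
degs = [500 * r ** -0.82 for r in range(1, 51)][::-1]
print(rank_exponent(degs))
```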

Solution S.2: Eigen exponent E. A2: power law in the eigenvalues of the adjacency matrix: plotting each eigenvalue against its rank (eigenvalues in decreasing order) in log-log scale gives a line with slope E = -0.48 (May 2001 data).

Solution S.2 (cont.): [Mihail, Papadimitriou ’02]: the eigen exponent (slope) is ½ of the rank exponent.
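One way to see why the eigen exponent is half the rank exponent: when a graph has a few dominant hubs, its largest eigenvalues are roughly the square roots of the largest degrees, so degree ∝ rank^R gives λ ∝ rank^(R/2). For a disjoint union of stars this is exact (a star with k leaves has largest eigenvalue √k), which the sketch below checks with NumPy; real graphs only approximate it, and the helper name is hypothetical.

```python
import numpy as np


def disjoint_stars_adjacency(leaf_counts):
    """Hypothetical helper: adjacency matrix of a disjoint union of stars,
    where star i has leaf_counts[i] leaves (so its center has that degree)."""
    n = sum(k + 1 for k in leaf_counts)
    A = np.zeros((n, n))
    offset = 0
    for k in leaf_counts:
        center = offset
        for leaf in range(offset + 1, offset + k + 1):
            A[center, leaf] = A[leaf, center] = 1.0
        offset += k + 1
    return A


leaf_counts = [100, 49, 25, 9]  # hub degrees, in decreasing order
A = disjoint_stars_adjacency(leaf_counts)
# eigvalsh returns eigenvalues of a symmetric matrix in ascending order
top = np.linalg.eigvalsh(A)[::-1][: len(leaf_counts)]
print(top)  # approximately [10, 7, 5, 3], the square roots of the hub degrees
```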

But: how about graphs from other domains?

More power laws: web hit counts [w/ A. Montgomery]. Web site traffic, viewed as a bipartite users–sites graph: count (log scale) vs. in-degree (log scale) follows a Zipf law; ‘ebay’ is a labeled point.

epinions.com who-trusts-whom [Richardson + Domingos, KDD 2001]: count vs. user (out-)degree, with a labeled point for a user who trusts 2000 people.

And numerous more: # of sexual contacts; income [Pareto], the ’80-20 distribution’; duration of downloads [Bestavros+]; duration of UNIX jobs (‘mice and elephants’); size of files of a user; … ‘Black swans’
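The ’80-20’ label has a closed form in the Pareto case. As a worked sketch (a standard Lorenz-curve result for the Pareto type I distribution, not something stated on the slide): with tail index α > 1, the top fraction p of the population holds p^(1 - 1/α) of the total, so α = log 5 / log 4 ≈ 1.16 gives exactly 80-20. The function name is hypothetical.

```python
import math


def top_share(p, alpha):
    """Hypothetical helper: share of the total held by the top fraction p of a
    Pareto (type I) population with tail index alpha > 1, i.e. p ** (1 - 1/alpha)."""
    if not 0 < p <= 1 or alpha <= 1:
        raise ValueError("need 0 < p <= 1 and alpha > 1")
    return p ** (1 - 1 / alpha)


alpha_80_20 = math.log(5) / math.log(4)  # about 1.161
print(round(top_share(0.2, alpha_80_20), 6))  # 0.8
```

A heavier tail (α closer to 1) concentrates even more of the total in the top 20%.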

Outline: Introduction – Motivation; Patterns in graphs (patterns in static graphs: degree, triangles, …; patterns in weighted graphs; patterns in time-evolving graphs); Generators

Solution S.3: Triangle ‘Laws’. Real social networks have a lot of triangles (friends of friends are friends). Any patterns?

Triangle Law S.3 [Tsourakakis ICDM 2008]: X-axis: # of triangles a node participates in; Y-axis: count of such nodes (datasets shown: HEP-TH, ASN, Epinions).

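Triangle Law S.4 [Tsourakakis ICDM 2008]: X-axis: degree; Y-axis: mean # of triangles (datasets shown: Reuters, SN, Epinions). A node with n friends participates in ~n^1.6 triangles: \(\bar{t}(n) \propto n^{1.6}\), where \(\bar{t}(n)\) is the mean number of triangles over nodes of degree n.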
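Both triangle laws start from the number of triangles each node participates in. A small exact sketch, assuming an undirected edge list; the helper names are hypothetical, and exact neighbor-set intersection like this is only practical for small graphs (the ICDM 2008 paper is about fast counting on much larger ones).

```python
from collections import Counter, defaultdict


def triangles_and_degrees(edges):
    """Hypothetical helper: ({node: #triangles containing it}, {node: degree})
    for an undirected simple graph given as an edge list (self-loops ignored)."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    tri = {}
    for u, nbrs in adj.items():
        # Each triangle {u, v, w} is found twice from u: via v and via w.
        tri[u] = sum(len(nbrs & adj[v]) for v in nbrs) // 2
    return tri, {u: len(nbrs) for u, nbrs in adj.items()}


def mean_triangles_by_degree(edges):
    """Hypothetical helper for law S.4: degree -> mean #triangles of nodes with that degree."""
    tri, deg = triangles_and_degrees(edges)
    total, count = defaultdict(int), defaultdict(int)
    for u, d in deg.items():
        total[d] += tri[u]
        count[d] += 1
    return {d: total[d] / count[d] for d in sorted(total)}


# K4 (nodes 0-3) plus a pendant node 4 attached to node 0
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (0, 4)]
tri, _deg = triangles_and_degrees(edges)
print(Counter(tri.values()))            # law S.3 data: #triangles -> #nodes
print(mean_triangles_by_degree(edges))  # {1: 0.0, 3: 3.0, 4: 3.0}
```

On a real graph one would then fit the slope of the mean-triangles curve in log-log scale; S.4 reports about 1.6.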

Outline: Introduction – Motivation; Patterns in graphs (patterns in static graphs; patterns in weighted graphs; patterns in time-evolving graphs); Generators

Observations on weighted graphs? A: yes, even more ‘laws’! M. McGlohon, L. Akoglu, and C. Faloutsos: Weighted Graphs and Disconnected Components: Patterns and a Generator. SIGKDD 2008

Observation W.1: Fortification. Q: How do the weights of nodes relate to their degree?

Observation W.1: Fortification. More donors, more $? (Example: donation edges of $10, $5, and $7 to candidates ‘Reagan’ and ‘Clinton’.)

Observation W.1: Fortification: Snapshot Power Law. The total in-weight of a node is super-linear in its in-degree, \(W \propto d_{in}^{\,iw}\), with exponent 1.01 < iw < 1.26 (Orgs–Candidates network). More donors, even more $: e.g., John Kerry received $10M from 1K donors. (Plot: in-weights ($) vs. edges (# donors).)
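A sketch of how the fortification exponent iw could be estimated from one snapshot, assuming a weighted edge list of (donor, recipient, amount) triples with at most one edge per donor-recipient pair, so that in-degree counts donors; the helper name and that edge convention are assumptions. The fit is a least-squares line through log(total in-weight) vs. log(in-degree) over all recipients.

```python
import math
from collections import defaultdict


def fortification_exponent(weighted_edges):
    """Hypothetical helper: least-squares slope of log(total in-weight) vs. log(in-degree).

    weighted_edges: iterable of (src, dst, weight) with positive weights,
    assumed to hold at most one edge per (src, dst) pair.
    """
    indeg = defaultdict(int)
    inweight = defaultdict(float)
    for _src, dst, w in weighted_edges:
        indeg[dst] += 1
        inweight[dst] += w
    lx = [math.log(indeg[v]) for v in indeg]
    ly = [math.log(inweight[v]) for v in indeg]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


# Hypothetical data: a recipient with d donors gets d ** 0.15 from each,
# so its total in-weight is d ** 1.15 (super-linear, iw = 1.15).
edges = [(f"donor{d}_{i}", f"cand{d}", d ** 0.15)
         for d in range(1, 21) for i in range(d)]
print(round(fortification_exponent(edges), 6))  # 1.15
```

An exponent of exactly 1 would mean every extra donor brings the same average amount; values above 1 are the ‘more donors, even more $’ effect.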

Problem: Time evolution. With Jure Leskovec (CMU -> Stanford) and Jon Kleinberg (Cornell, sabbatical @ CMU)

T.1 Evolution of the Diameter. Prior work on power-law graphs hints at a slowly growing diameter, \(O(\log N)\) or \(O(\log \log N)\), i.e., as the network grows, the distances between nodes slowly grow. What is happening in real data? The diameter shrinks over time.

T.1 Diameter – “Patents”: patent citation network, 25 years of data; @1999: 2.9 M nodes, 16.5 M edges. (Plot: diameter vs. time [years].)
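For small graphs, both the exact diameter and the ‘effective’ diameter used in the KDD 2005 paper listed in the references (roughly, the distance within which 90% of connected pairs lie) can be computed with a breadth-first search from every node, at O(N·E) cost. A simplified sketch with hypothetical helper names; the paper's definition interpolates between integer distances, which this version does not, and graphs the size of the patent network would call for approximate methods.

```python
import math
from collections import defaultdict, deque


def bfs_distances(adj, source):
    """Hypothetical helper: hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter_and_effective(edges, q=0.9):
    """Hypothetical helper: exact diameter and a simple (non-interpolated)
    q-effective diameter, the smallest d such that at least a fraction q of
    connected ordered pairs are within d hops. Unreachable pairs are ignored."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    lengths = sorted(d for s in adj for t, d in bfs_distances(adj, s).items() if t != s)
    k = max(1, math.ceil(q * len(lengths)))
    return lengths[-1], lengths[k - 1]


path = [(0, 1), (1, 2), (2, 3)]
print(diameter_and_effective(path))         # (3, 3)
print(diameter_and_effective(path, q=0.5))  # (3, 1)
```

Running this on one snapshot per year and plotting the result against time is what the shrinking-diameter plot shows.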

T.2 Temporal Evolution of the Graphs. N(t): nodes at time t; E(t): edges at time t. Suppose that N(t+1) = 2 * N(t). Q: what is your guess for E(t+1)? 2 * E(t)? A: over-doubled! But obeying the ‘Densification Power Law’.

T.2 Densification – Patent Citations: citations among patents granted; @1999: 2.9 M nodes, 16.5 M edges. Each year is a datapoint; \(E(t) \propto N(t)^{1.66}\).
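The Densification Power Law states \(E(t) \propto N(t)^{a}\) with a > 1 (a ≈ 1.66 for patents). A sketch, assuming one (N, E) pair per year and plain least squares in log-log scale (the helper name is hypothetical); the last line works out the ‘over-doubled’ answer to the doubling question.

```python
import math


def densification_exponent(snapshots):
    """Hypothetical helper: least-squares slope a of log E(t) vs. log N(t)
    over (N(t), E(t)) snapshots, one per time step."""
    lx = [math.log(n) for n, _e in snapshots]
    ly = [math.log(e) for _n, e in snapshots]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


# Hypothetical yearly snapshots that follow E = 3 * N ** 1.66 exactly
snapshots = [(n, 3 * n ** 1.66) for n in (1_000, 2_000, 4_000, 8_000)]
print(round(densification_exponent(snapshots), 6))  # 1.66

# If N doubles, E is multiplied by 2 ** a, i.e. 'over-doubled'
print(round(2 ** 1.66, 2))  # 3.16
```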

More on time-evolving graphs: M. McGlohon, L. Akoglu, and C. Faloutsos: Weighted Graphs and Disconnected Components: Patterns and a Generator. SIGKDD 2008

Observation T.3: NLCC behavior. Q: How do NLCCs emerge and join with the GCC? (NLCC = non-largest connected component; GCC = giant connected component.) Do they continue to grow in size? Or do they shrink? Or stabilize?

Observation T.3: NLCC behavior. After the gelling point, the GCC takes off, but NLCCs remain ~constant (actually, they oscillate). (Plot, IMDB: CC size vs. time-stamp.)
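Tracking the GCC and the NLCCs as edges arrive only needs a union-find structure, because components merge but never split. A sketch assuming time-stamped (t, u, v) edges; summarizing the NLCCs by the largest one is a simplification of ours (the helper names are hypothetical, and the paper's plots may track several NLCCs). Nodes enter the graph with their first edge, so isolated nodes are not counted.

```python
from itertools import groupby


class UnionFind:
    """Disjoint sets with union by size and path halving."""

    def __init__(self):
        self.parent = {}
        self.size = {}

    def add(self, x):
        if x not in self.parent:
            self.parent[x] = x
            self.size[x] = 1

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def gcc_and_largest_nlcc(timed_edges):
    """Hypothetical helper: after each time-stamp, report
    (t, GCC size, largest NLCC size) for time-stamped (t, u, v) edges."""
    uf = UnionFind()
    history = []
    for t, group in groupby(sorted(timed_edges), key=lambda e: e[0]):
        for _t, u, v in group:
            uf.add(u)
            uf.add(v)
            uf.union(u, v)
        sizes = sorted((uf.size[x] for x in uf.parent if uf.parent[x] == x), reverse=True)
        history.append((t, sizes[0], sizes[1] if len(sizes) > 1 else 0))
    return history


timed = [(1, 0, 1), (1, 2, 3), (2, 1, 2), (2, 4, 5), (3, 6, 7)]
print(gcc_and_largest_nlcc(timed))  # [(1, 2, 2), (2, 4, 2), (3, 4, 2)]
```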

Generalized Iterated Matrix-Vector Multiplication (GIM-V). PEGASUS: A Peta-Scale Graph Mining System - Implementation and Observations. U Kang, Charalampos E. Tsourakakis, and Christos Faloutsos. ICDM 2009, Miami, Florida, USA. Best Application Paper (runner-up).
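GIM-V generalizes the matrix-vector product v' = M × v into three user-defined operations: combine2 (combine m_ij with v_j), combineAll (merge the partial results of row i), and assign (decide the new v_i). A single-machine sketch (PEGASUS itself runs on Hadoop); the connected-components instantiation below, with combine2 = multiply, combineAll = min, and assign = min, follows our reading of the PEGASUS paper, and every function name here is hypothetical. Each node repeatedly takes the smallest id in its neighborhood until nothing changes, and the final labels give the count-vs-size distribution plotted in the examples that follow.

```python
from collections import Counter, defaultdict


def gimv_step(matrix, v, combine2, combine_all, assign):
    """Hypothetical helper, one GIM-V iteration:
    v'_i = assign(v_i, combineAll_i({combine2(m_ij, v_j)})).

    matrix: sparse rows, i -> {j: m_ij}; rows with no entries keep v_i.
    """
    new_v = {}
    for i, vi in v.items():
        partial = [combine2(m_ij, v[j]) for j, m_ij in matrix[i].items()]
        new_v[i] = assign(vi, combine_all(partial)) if partial else vi
    return new_v


def connected_components(nodes, edges):
    """Hypothetical helper: label every node with the smallest node id in its component."""
    matrix = defaultdict(dict)
    for u, w in edges:  # undirected: store both directions
        matrix[u][w] = 1
        matrix[w][u] = 1
    v = {n: n for n in nodes}  # initial label = own id
    while True:
        new_v = gimv_step(matrix, v, lambda m, x: m * x, min, min)
        if new_v == v:  # converged: no label changed
            return v
        v = new_v


def count_vs_size(labels):
    """Hypothetical helper: size -> number of components of that size."""
    sizes = Counter(labels.values())
    return dict(Counter(sizes.values()))


labels = connected_components(range(7), [(0, 1), (1, 2), (3, 4)])
print(labels)
print(count_vs_size(labels))  # nodes 5 and 6 are singletons
```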

Example: GIM-V at work – connected components (count vs. size): ~0.7B singleton nodes.

Example: GIM-V at work – connected components (count vs. size): 300-size cmpt: why? 1100-size cmpt ×65: why?

Example: GIM-V at work – connected components (count vs. size): the explanation is suspicious financial-advice sites (no longer existing).

GIM-V at work: connected components over time. LinkedIn: 7.5M nodes and 58M edges. Stable tail slope after the gelling point.

Timing for Blogs, with Mary McGlohon (CMU), Jure Leskovec (CMU->Stanford), Natalie Glance (now at Google), Mat Hurst (now at MSR) [SDM’07]

T.4: Popularity over time. A post appears @t and receives in-links @t + lag; plot # in-links vs. lag (days after post). Does post popularity drop off exponentially?

T.4: Popularity over time. Does post popularity drop off exponentially? No: POWER LAW! In log-log scale (# in-links vs. days after post) the slope is -1.6, close to -1.5, the exponent of Barabási’s stack model and of the zero-crossings of a random walk.
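Why the distinction matters, as a short worked illustration (the symbol n(Δt), the number of in-links a post receives Δt days after publication, is our notation): under a power law the ratio between two lags depends only on their ratio, so popularity decays by the same factor at every time scale, while under an exponential decay it would depend on their difference.

$$
n(\Delta t) \propto \Delta t^{-1.6}
\;\Rightarrow\;
\frac{n(2\,\Delta t)}{n(\Delta t)} = 2^{-1.6} \approx 0.33 \ \text{for every } \Delta t,
\qquad
\frac{n(10)}{n(1)} = 10^{-1.6} \approx 0.025,
$$

$$
\text{whereas } n(\Delta t) \propto e^{-\lambda \Delta t}
\;\Rightarrow\;
\frac{n(2\,\Delta t)}{n(\Delta t)} = e^{-\lambda \Delta t} \to 0 \ \text{as } \Delta t \text{ grows.}
$$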

Conclusions (part 1): MANY patterns in real graphs: skewed degree distributions; small (and shrinking) diameter; power laws w.r.t. triangles; oscillating size of connected components; … and more

References: Jure Leskovec, Jon Kleinberg, and Christos Faloutsos: Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations, KDD 2005 (Best Research Paper award).

Project info: www.cs.cmu.edu/~pegasus. Chau, Polo; McGlohon, Mary; Tsourakakis, Babis; Akoglu, Leman; Prakash, Aditya; Tong, Hanghang; Kang, U. Thanks to: NSF IIS-0705359, IIS-0534205, CTA-INARC; Yahoo (M45), LLNL, IBM, SPRINT, INTEL, HP

Part 1 END