Graph and Tensor Mining for fun and profit

Presentation transcript:

Graph and Tensor Mining for fun and profit. Luna Dong, Christos Faloutsos, Andrey Kan, Jun Ma, Subho Mukherjee.

Roadmap: Introduction – Motivation; Part#1: Graphs; Part#2: Tensors and Knowledge Bases; Conclusions. KDD 2018 Dong+

Roadmap: Introduction – Motivation; Part#1: Graphs. P1.1: properties/patterns in graphs; P1.2: node importance; P1.3: community detection; P1.4: fraud/anomaly detection; P1.5: belief propagation.

Why care about patterns? Anomalies (a pattern defines what counts as an anomaly); faster algorithms; graph generators for ‘what if’ scenarios (e.g., Graph500.org).

‘Recipe’ structure: Problem definition; Short answer/solution; LONG answer – details; Conclusion/short-answer.

Problem definition: Are real graphs random? S*: what do static graphs look like? T*: how do graphs evolve over time?

Short answer(s): Are real graphs random? S*: what do static graphs look like? S.0: ‘six degrees’; S.1: skewed degree distribution; S.2: skewed eigenvalues; S.3: triangle power-laws; S.4: GCC, and skewed distribution of connected components. T*: how do graphs evolve over time? T.1: diameters; T.2: densification.

Short answer(s), in one line: power laws, y ~ x^a, NOT Gaussians. Take logarithms: on log-log axes, the plot of y against x is a straight line with slope a.
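
A minimal sketch of the "take logarithms" step, assuming the data are positive (x, y) pairs: fit a straight line to log(y) against log(x) and read the exponent a off the slope. The function name `fit_power_law` and the sample data are hypothetical, and a least-squares fit on log-log axes is only a quick diagnostic, not a rigorous estimator.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares line through (log x, log y); returns (exponent a, constant c) for y ~ c * x^a."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, math.exp(intercept)

# Hypothetical, noise-free data generated from y = 3 * x^1.6.
xs = [1, 2, 4, 8, 16]
ys = [3 * x ** 1.6 for x in xs]
a, c = fit_power_law(xs, ys)
```

On real data the points scatter around the line, so the fitted slope is an estimate rather than an exact recovery.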

Graph mining: Are real graphs random? (C) C. Faloutsos, 2017

Laws and patterns. Q: Are real graphs random? A: NO!! S.0: diameter (‘6 degrees’; ‘Kevin Bacon’); in- and out-degree distributions; other (surprising) patterns. So, let’s look at the data.

S.1 - rank-degree plot: Any pattern?

S.1 - rank-degree plot: power law in the degree distribution [SIGCOMM99]. Internet domains; axes: log(rank) vs. log(degree); slope -0.82; labeled points: att.com, ibm.com.

S.1 - Skewed distributions: Zipf; 80-20; Pareto; rich-get-richer; preferential attachment; Matthew effect; CRP; …
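
A small sketch of how the data behind a rank-degree plot can be prepared, assuming the graph is given as an undirected edge list. The function name `rank_degree` and the toy edges are hypothetical; plotting log(degree) against log(rank) is left to the reader.

```python
from collections import Counter

def rank_degree(edges):
    """Return (rank, node, degree) rows, highest degree first (rank 1)."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Sort by decreasing degree; break ties by node name for a stable order.
    ranked = sorted(degree.items(), key=lambda item: (-item[1], str(item[0])))
    return [(rank, node, deg) for rank, (node, deg) in enumerate(ranked, start=1)]

# Hypothetical toy graph.
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
table = rank_degree(edges)
```

A roughly straight line on log-log axes for such a table is what the slide calls a power law in the degree distribution.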

S.3: Triangle ‘Law’. Real social networks have a lot of triangles: friends of friends are friends. Any patterns? 2x friends -> 2x triangles? 3x?

Triangle Law: S.3 [Tsourakakis ICDM 2008]. Plots for Reuters, SN, and Epinions; X-axis: degree; Y-axis: mean # triangles. n friends -> ~n^1.6 triangles.
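
A sketch of how the two axes of such a plot could be computed on a small graph, assuming an undirected edge list that fits in memory. The function names and the toy edges are hypothetical, and this brute-force count is not the method of the cited paper.

```python
from collections import defaultdict
from itertools import combinations

def triangles_per_node(edges):
    """Build adjacency sets and count, for each node, the triangles it participates in."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)
    tri = {}
    for node, nbrs in adj.items():
        # A triangle at `node` is a pair of its neighbors that are themselves connected.
        tri[node] = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return adj, tri

def mean_triangles_by_degree(adj, tri):
    """Map each degree to the mean triangle count of nodes with that degree."""
    buckets = defaultdict(list)
    for node, nbrs in adj.items():
        buckets[len(nbrs)].append(tri[node])
    return {d: sum(ts) / len(ts) for d, ts in sorted(buckets.items())}

# Hypothetical toy graph: one triangle (1, 2, 3) plus a pendant node 4.
adj, tri = triangles_per_node([(1, 2), (2, 3), (1, 3), (3, 4)])
curve = mean_triangles_by_degree(adj, tri)
```

Fitting a line to log(mean triangles) against log(degree) on real data would give the exponent that the slide reports as about 1.6.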

Anomalies? Once the pattern is known, deviations from it are candidate anomalies.

Triangle counting for large graphs? Anomalous nodes in Twitter (~3 billion edges) [U Kang, Brendan Meeder, +, PAKDD’11].

Generalized Iterated Matrix Vector Multiplication (GIMV). PEGASUS: A Peta-Scale Graph Mining System - Implementation and Observations. U Kang, Charalampos E. Tsourakakis, and Christos Faloutsos. ICDM 2009, Miami, Florida, USA. Best Application Paper (runner-up).
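
The slide only names the technique, so the following is a single-machine sketch under stated assumptions: a matrix-vector product is generalized by letting the caller supply the "multiply", "sum", and "assign" steps, and connected components then fall out by propagating the minimum node label until nothing changes. The names `gimv_step`, `combine2`, `combine_all`, and `assign` are illustrative here and should be checked against the PEGASUS paper; this is not its distributed implementation.

```python
def gimv_step(edges, v, combine2, combine_all, assign):
    """One generalized matrix-vector multiplication over a sparse matrix given as (i, j, m_ij) triples."""
    partial = {i: [] for i in v}
    for i, j, m_ij in edges:
        partial[i].append(combine2(m_ij, v[j]))  # generalized "multiply"
    # Generalized "sum" over each row, then "assign" against the old value.
    return {i: (assign(v[i], combine_all(partial[i])) if partial[i] else v[i]) for i in v}

def connected_components(nodes, undirected_edges):
    """Label every node with the smallest node id in its component."""
    # Symmetric adjacency matrix with unit entries.
    edges = [(u, w, 1) for u, w in undirected_edges] + [(w, u, 1) for u, w in undirected_edges]
    v = {n: n for n in nodes}  # start: every node is its own component
    while True:
        new_v = gimv_step(edges, v,
                          combine2=lambda m_ij, v_j: v_j,  # pass the neighbor's label along
                          combine_all=min,                 # smallest label among neighbors
                          assign=min)                      # keep the smaller of old and new
        if new_v == v:
            return v
        v = new_v

# Hypothetical toy graph: components {0, 1, 2}, {3, 4}, and the singleton {5}.
labels = connected_components(range(6), [(0, 1), (1, 2), (3, 4)])
```

Swapping in ordinary multiplication, summation, and overwrite for the three functions would turn the same loop into a plain iterated matrix-vector product.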

S.4: Conn. components. Plot of connected components: count vs. size. ~0.7B singleton nodes. About 500 components of size 300: why? About 65 components of size 1100: why? Answer: suspicious financial-advice sites (not existing now).
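
A sketch of how the count-vs-size data could be tabulated, assuming each node already carries a component label (for example, the smallest node id in its component). The function name and the toy labels are hypothetical.

```python
from collections import Counter

def component_size_histogram(labels):
    """Return sorted (size, number of components of that size) pairs."""
    sizes = Counter(labels.values())     # component label -> number of nodes
    histogram = Counter(sizes.values())  # component size -> number of components
    return sorted(histogram.items())

# Hypothetical labels: one component of size 3, one of size 2, two singletons.
labels = {0: 0, 1: 0, 2: 0, 3: 3, 4: 3, 5: 5, 6: 6}
hist = component_size_histogram(labels)
```

Sizes whose count sits far above the overall trend, like the spikes on the slide, are the ones worth inspecting by hand.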

Problem: Time evolution. With Jure Leskovec (CMU -> Stanford) and Jon Kleinberg (Cornell – sabb. @ CMU). Jure Leskovec, Jon Kleinberg and Christos Faloutsos, Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations, KDD 2005 (Best Research paper award; test-of-time award).

T.1 Evolution of the Diameter. Prior work on power-law graphs hints at a slowly growing diameter: diameter ~ O(N^(1/3)); diameter ~ O(log N); diameter ~ O(log log N). That is, as the network grows, the distances between nodes were expected to grow slowly. What is happening in real data? Diameter shrinks over time.

T.1 Diameter – “Patents”. Patent citation network; 25 years of data; @1999: 2.9 M nodes, 16.5 M edges. Plot: diameter vs. time [years].
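
A sketch of how a diameter could be tracked across snapshots of a small graph, assuming each snapshot is an undirected edge list. It computes the exact longest shortest path with a breadth-first search from every node, which is only feasible for small graphs; the slides do not say which diameter definition was used for the patent data, so treat this choice as an assumption. Function names and the toy snapshots are hypothetical.

```python
from collections import defaultdict, deque

def bfs_eccentricity(adj, source):
    """Largest hop distance from `source` to any node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return max(dist.values())

def diameter(edges):
    """Longest shortest path over all nodes (within components if the graph is disconnected)."""
    adj = defaultdict(set)
    for u, w in edges:
        adj[u].add(w)
        adj[w].add(u)
    return max(bfs_eccentricity(adj, s) for s in list(adj))

# Hypothetical snapshots: a 5-node path, then the same graph with one extra edge closing a cycle.
snapshot_early = [(0, 1), (1, 2), (2, 3), (3, 4)]
snapshot_late = snapshot_early + [(0, 4)]
d_early = diameter(snapshot_early)
d_late = diameter(snapshot_late)
```

In this toy case the added edge shortens the longest path, a miniature version of a diameter that shrinks as the graph gains edges.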

T.2 Temporal Evolution of the Graphs. N(t) … nodes at time t; E(t) … edges at time t. Suppose that N(t+1) = 2 * N(t). Q: what is your guess for E(t+1)? Is it 2 * E(t)? A: over-doubled! But obeying the “Densification Power Law”.

T.2 Densification – Patent Citations. Citations among patents granted; @1999: 2.9 M nodes, 16.5 M edges. Each year is a datapoint. Plot: E(t) vs. N(t); slope 1.66.
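
A worked example of what the slope means, assuming the densification power law has the form E(t) ~ N(t)^a with a = 1.66 as on the patent plot. The function names and the snapshot numbers are hypothetical.

```python
import math

def edge_growth_factor(node_growth, a):
    """If nodes grow by `node_growth`, edges grow by node_growth ** a under E ~ N^a."""
    return node_growth ** a

def densification_exponent(n1, e1, n2, e2):
    """Exponent implied by two snapshots (n1, e1) and (n2, e2)."""
    return math.log(e2 / e1) / math.log(n2 / n1)

# Doubling the nodes with a = 1.66 multiplies the edges by about 3.16, not 2.
factor = edge_growth_factor(2, 1.66)

# Hypothetical snapshots constructed to obey the law exactly.
a_hat = densification_exponent(1000, 5000, 2000, 5000 * factor)
```

This is the sense in which the number of edges "over-doubles": any exponent above 1 makes the average degree grow over time.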

MORE Graph Patterns. RTG: A Recursive Realistic Graph Generator using Random Typing. Leman Akoglu and Christos Faloutsos. PKDD’09.

MORE Graph Patterns. Mary McGlohon, Leman Akoglu, Christos Faloutsos. Statistical Properties of Social Networks. In “Social Network Data Analytics” (Ed.: Charu Aggarwal). Deepayan Chakrabarti and Christos Faloutsos, Graph Mining: Laws, Tools, and Case Studies. Oct. 2012, Morgan Claypool.

Conclusion, short answer(s): Are real graphs random? No. S*, static graphs: S.0: ‘six degrees’; S.1: skewed degree distribution; S.2: skewed eigenvalues; S.3: triangle power-laws; S.4: GCC, and skewed distribution of connected components. T*, evolution over time: T.1: diameters; T.2: densification. Power laws, y ~ x^a, NOT Gaussians: take logarithms, and y against x becomes a straight line with slope a.
