NetMine: Mining Tools for Large Graphs

Slides:



Advertisements
Similar presentations
1 Generating Network Topologies That Obey Power LawsPalmer/Steffan Carnegie Mellon Generating Network Topologies That Obey Power Laws Christopher R. Palmer.
Advertisements

1 Dynamics of Real-world Networks Jure Leskovec Machine Learning Department Carnegie Mellon University
CMU SCS Large Graph Mining - Patterns, Explanations and Cascade Analysis Christos Faloutsos CMU.
Power Laws By Cameron Megaw 3/11/2013. What is a Power Law?
Analysis and Modeling of Social Networks Foudalis Ilias.
CMU SCS Copyright: C. Faloutsos (2012)# : Multimedia Databases and Data Mining Lecture #27: Graph mining - Communities and a paradox Christos.
The Connectivity and Fault-Tolerance of the Internet Topology
Kronecker Graphs: An Approach to Modeling Networks Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos, Zoubin Ghahramani Presented.
Information Networks Graph Clustering Lecture 14.
Online Social Networks and Media. Graph partitioning The general problem – Input: a graph G=(V,E) edge (u,v) denotes similarity between u and v weighted.
On Power-Law Relationships of the Internet Topology Michalis Faloutsos Petros Faloutsos Christos Faloutsos.
Mining and Searching Massive Graphs (Networks)
CMU SCS C. Faloutsos (CMU)#1 Large Graph Algorithms Christos Faloutsos CMU McGlohon, Mary Prakash, Aditya Tong, Hanghang Tsourakakis, Babis Akoglu, Leman.
NetMine: Mining Tools for Large Graphs Deepayan Chakrabarti Yiping Zhan Daniel Blandford Christos Faloutsos Guy Blelloch.
Lecture 21: Spectral Clustering
Spectral Clustering 指導教授 : 王聖智 S. J. Wang 學生 : 羅介暐 Jie-Wei Luo.
Social Networks and Graph Mining Christos Faloutsos CMU - MLD.
Neighborhood Formation and Anomaly Detection in Bipartite Graphs Jimeng Sun Huiming Qu Deepayan Chakrabarti Christos Faloutsos Speaker: Jimeng Sun.
DISC-Finder: A distributed algorithm for identifying galaxy clusters.
Common Properties of Real Networks. Erdős-Rényi Random Graphs.
Sampling from Large Graphs. Motivation Our purpose is to analyze and model social networks –An online social network graph is composed of millions of.
On Power-Law Relationships of the Internet Topology CSCI 780, Fall 2005.
Analysis of the Internet Topology Michalis Faloutsos, U.C. Riverside (PI) Christos Faloutsos, CMU (sub- contract, co-PI) DARPA NMS, no
Presented by Ozgur D. Sahin. Outline Introduction Neighborhood Functions ANF Algorithm Modifications Experimental Results Data Mining using ANF Conclusions.
CMU SCS Graph and stream mining Christos Faloutsos CMU.
A scalable multilevel algorithm for community structure detection
1 AutoPart: Parameter-Free Graph Partitioning and Outlier Detection Deepayan Chakrabarti
1 Algorithms for Large Data Sets Ziv Bar-Yossef Lecture 7 May 14, 2006
Application of Graph Theory to OO Software Engineering Alexander Chatzigeorgiou, Nikolaos Tsantalis, George Stephanides Department of Applied Informatics.
CS8803-NS Network Science Fall 2013
Network Measures Social Media Mining. 2 Measures and Metrics 2 Social Media Mining Network Measures Klout.
Topic 13 Network Models Credits: C. Faloutsos and J. Leskovec Tutorial
CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P0-1 Large Graph Mining: Power Tools and a Practitioner’s guide Christos Faloutsos Gary Miller Charalampos.
Distributed Computing Rik Sarkar. Distributed Computing Old style: Use a computer for computation.
Data Analysis in YouTube. Introduction Social network + a video sharing media – Potential environment to propagate an influence. Friendship network and.
COM1721: Freshman Honors Seminar A Random Walk Through Computing Lecture 2: Structure of the Web October 1, 2002.
Jure Leskovec Computer Science Department Cornell University / Stanford University Joint work with: Jon Kleinberg (Cornell), Christos.
Most of contents are provided by the website Graph Essentials TJTSD66: Advanced Topics in Social Media.
A deterministic near-linear time algorithm for finding minimum cuts in planar graphs Thank you, Steve, for presenting it for us!!! Parinya Chalermsook.
R-MAT: A Recursive Model for Graph Mining Deepayan Chakrabarti Yiping Zhan Christos Faloutsos.
Data Structures and Algorithms in Parallel Computing Lecture 7.
RTM: Laws and a Recursive Generator for Weighted Time-Evolving Graphs Leman Akoglu, Mary McGlohon, Christos Faloutsos Carnegie Mellon University School.
Speaker : Yu-Hui Chen Authors : Dinuka A. Soysa, Denis Guangyin Chen, Oscar C. Au, and Amine Bermak From : 2013 IEEE Symposium on Computational Intelligence.
1 Patterns of Cascading Behavior in Large Blog Graphs Jure Leskoves, Mary McGlohon, Christos Faloutsos, Natalie Glance, Matthew Hurst SDM 2007 Date:2008/8/21.
CMU SCS KDD'09Faloutsos, Miller, Tsourakakis P9-1 Large Graph Mining: Power Tools and a Practitioner’s guide Christos Faloutsos Gary Miller Charalampos.
A Place-based Model for the Internet Topology Xiaotao Cai Victor T.-S. Shi William Perrizo NDSU {Xiaotao.cai, Victor.shi,
Cmpe 588- Modeling of Internet Emergence of Scale-Free Network with Chaotic Units Pulin Gong, Cees van Leeuwen by Oya Ünlü Instructor: Haluk Bingöl.
EE327 Final Representation Qianyang Peng F
Auburn University
Cohesive Subgraph Computation over Large Graphs
Applications of graph theory in complex systems research
PEGASUS: A PETA-SCALE GRAPH MINING SYSTEM
Community detection in graphs
Graph and Tensor Mining for fun and profit
Generative Model To Construct Blog and Post Networks In Blogosphere
CS120 Graphs.
Large Graph Mining: Power Tools and a Practitioner’s guide
15-826: Multimedia Databases and Data Mining
Part 1: Graph Mining – patterns
Lecture 13 Network evolution
R-MAT: A Recursive Model for Graph Mining
The likelihood of linking to a popular website is higher
Graph and Tensor Mining for fun and profit
Graph and Tensor Mining for fun and profit
3.3 Network-Centric Community Detection
Lecture 21 Network evolution
Modelling and Searching Networks Lecture 2 – Complex Networks
GRAPHS Lecture 17 CS 2110 — Spring 2019.
Network Models Michael Goodrich Some slides adapted from:
Advanced Topics in Data Mining Special focus: Social Networks
Presentation transcript:

NetMine: Mining Tools for Large Graphs Deepayan Chakrabarti Yiping Zhan Daniel Blandford Christos Faloutsos Guy Blelloch

Introduction ► Graphs are ubiquitous Protein Interactions [genomebiology.com] Internet Map [lumeta.com] Food Web [Martinez ’91] ► Graphs are ubiquitous Friendship Network [Moody ’01]

Graph “Patterns” How does a real-world graph look like? Patterns/“Laws” Degree distributions/power-laws Count vs Outdegree Power Laws

Graph “Patterns” How does a real-world graph look like? Patterns/“Laws” Degree distributions/power-laws “Small-world” “Scree” plots and others… Effective Diameter Hop-plot

Graph “Patterns” Why do we like them? They capture interesting properties of graphs. They provide “condensed information” about the graph. They are needed to build/test realistic graph generators (useful for simulation studies extrapolations/sampling ) They help detect abnormalities and outliers.

Graph Patterns Degree-distributions  “power laws” “Small-world”  small diameter “Scree” plots … What else?

Our Work The NetMine toolkit  contains all the patterns mentioned before, and adds: The “min-cut” plot a novel pattern which carries interesting information about the graph. A-plots a tool to quickly find suspicious subgraphs/nodes.

Outline Problem definition “Min-cut” plots ( +experiments) A-plots ( +experiments) Conclusions

Two partitions of almost equal size “Min-cut” plot What is a min-cut? Minimizes the number of edges cut Size of mincut = 2 Two partitions of almost equal size

“Min-cut” plot Do min-cuts recursively. N nodes Mincut size = sqrt(N) log (mincut-size / #edges) Mincut size = sqrt(N) log (# edges) N nodes

“Min-cut” plot Do min-cuts recursively. N nodes New min-cut log (mincut-size / #edges) log (# edges) N nodes

For a d-dimensional grid, the slope is -1/d “Min-cut” plot Do min-cuts recursively. New min-cut log (mincut-size / #edges) Slope = -0.5 log (# edges) For a d-dimensional grid, the slope is -1/d N nodes

“Min-cut” plot For a d-dimensional grid, the slope is -1/d log (mincut-size / #edges) log (mincut-size / #edges) Slope = -1/d log (# edges) log (# edges) For a d-dimensional grid, the slope is -1/d For a random graph, the slope is 0

“Min-cut” plot Min-cut sizes have important effects on graph properties, such as efficiency of divide-and-conquer algorithms compact graph representation difference of the graph from well-known graph types for example, slope = 0 for a random graph

? “Min-cut” plot What does it look like for a real-world graph? log (mincut-size / #edges) ? log (# edges)

Experiments Datasets: Google Web Graph: 916,428 nodes and 5,105,039 edges Lucent Router Graph: Undirected graph of network routers from www.isi.edu/scan/mercator/maps.html; 112,969 nodes and 181,639 edges User  Website Clickstream Graph: 222,704 nodes and 952,580 edges

Experiments Used the METIS algorithm [Karypis, Kumar, 1995] Google Web graph Values along the y-axis are averaged We observe a “lip” for large edges Slope of -0.4, corresponds to a 2.5-dimensional grid! Slope~ -0.4 log (mincut-size / #edges) log (# edges)

Experiments Same results for other graphs too… Slope~ -0.57 log (mincut-size / #edges) log (mincut-size / #edges) log (# edges) log (# edges) Lucent Router graph Clickstream graph

Observations Linear slope for some range of values “Lip” for high #edges Far from random graphs (because slope ≠ 0)

Outline Problem definition “Min-cut” plots ( +experiments) A-plots ( +experiments) Conclusions

A-plots How can we find abnormal nodes or subgraphs? Visualization but most graph visualization techniques do not scale to large graphs!

A-plots However, humans are pretty good at “eyeballing” data  Our idea: Sort the adjacency matrix in novel ways and plot the matrix so that patterns become visible to the user We will demonstrate this on the SCAN+Lucent Router graph (284,805 nodes and 898,492 edges)

A-plots Three types of such plots for undirected graphs… RV-RV (RankValue vs RankValue)  Sort nodes based on their “network value” (~first eigenvector) Rank of Network Value Rank of Network Value

A-plots Three types of such plots for undirected graphs… RD-RD (RankDegree vs RankDegree)  Sort nodes based on their degree Rank of Degree of node Rank of Degree of node

A-plots Three types of such plots for undirected graphs… D-RV (Degree vs RankValue)  Sort nodes according to “network value”, and show their corresponding degree

RV-RV plot (RankValue vs RankValue) We can see a “teardrop” shape and also some blank “stripes” and a strong diagonal (even though there are no self-loops)! Stripes Rank of Network Value Rank of Network Value

RV-RV plot (RankValue vs RankValue) The “teardrop” structure can be explained by degree-1 and degree-2 nodes Rank of Network Value 1 2 NV1 = 1/λ * NV2 Rank of Network Value

RV-RV plot (RankValue vs RankValue) Strong diagonal  nodes are more likely to connect to “similar” nodes Rank of Network Value Rank of Network Value

RD-RD (RankDegree vs RankDegree) Isolated dots  due to 2-node isolated components Rank of Degree of node Rank of Degree of node

D-RV (Degree vs RankValue) Rank of Network Value

D-RV (Degree vs RankValue) Why? Degree Rank of Network Value

Explanation of “Spikes” and “Stripes” RV-RV plot had stripes; D-RV plot shows spikes. Why? “Stripe” nodes degree-2 nodes connecting only to the “Spike” nodes “Spike” nodes  high degree, but all edges to “Stripe” nodes Stripe

A-plots They helped us detect a buried abnormal subgraph in a large real-world dataset which can then be taken to the domain experts.

Outline Problem definition “Min-cut” plots ( +experiments) A-plots ( +experiments) Conclusions

Conclusions We presented “Min-cut” plot A-plots A novel graph pattern with relevance for many algorithms and applications A-plots which help us find interesting abnormalities All the methods are scalable Their usage was demonstrated on large real-world graph datasets

RV-RV plot (RankValue vs RankValue) We can see a “teardrop” shape and also some blank “stripes” and a strong diagonal. Rank of Network Value Rank of Network Value

RD-RD (RankDegree vs RankDegree) Isolated dots  due to 2-node isolated components Rank of Degree of node Rank of Degree of node