Introduction to Graph Cluster Analysis

Outline
– Introduction to Cluster Analysis
– Types of Graph Cluster Analysis
– Algorithms for Graph Clustering
    – k-Spanning Tree
    – Shared Nearest Neighbor
    – Betweenness Centrality Based
    – Highly Connected Components
    – Maximal Clique Enumeration
    – Kernel k-means
– Application

What is Cluster Analysis?
The process of dividing a set of input data into (possibly overlapping) subsets, where the elements in each subset are considered related by some similarity measure.
[Figure panels: Clusters by Shape; Clusters by Color; 2 Clusters; 3 Clusters]

Applications of Cluster Analysis
– Summarization: provides a macro-level view of the data set
[Figure: Clustering precipitation in Australia. From Tan, Steinbach, Kumar, Introduction to Data Mining, Addison-Wesley, Edition 1]

What is Graph Clustering?
Types
– Between-graph: clustering a set of graphs
– Within-graph: clustering the nodes/edges of a single graph

Between-graph Clustering
Between-graph clustering methods divide a set of graphs into different clusters.
E.g., a set of graphs representing chemical compounds can be grouped into clusters based on their structural similarity.

Within-graph Clustering
Within-graph clustering methods divide the nodes of a graph into clusters.
E.g., in a social networking graph, these clusters could represent people with the same or similar hobbies.
Note: In this chapter we will look at different algorithms to perform within-graph clustering.

k-Spanning Tree
Input: graph G and the number of clusters k. Output: k groups of non-overlapping vertices.
STEPS:
1. Obtain the Minimum Spanning Tree (MST) of the input graph G
2. Remove k-1 edges from the MST
3. The result is k clusters

What is a Spanning Tree?
A connected subgraph with no cycles that includes all vertices in the graph.
Note: An edge weight can represent either the distance or the similarity between its two vertices.
[Figure: graph G and an example spanning tree, Weight = 17]

What is a Minimum Spanning Tree (MST)?
The spanning tree of a graph with the minimum possible sum of edge weights, if the edge weights represent distance.
Note: If the edge weights represent similarity, use the spanning tree with the maximum possible sum of edge weights instead.
[Figure: graph G and several of its spanning trees, each labeled with its weight]

Algorithm to Obtain MST: Prim’s Algorithm
Given: input graph G
1. Select a vertex at random (e.g., vertex 5)
2. Initialize an empty graph T with that vertex
3. Select the list of edges L from G such that exactly ONE vertex of each edge is in T
4. From L, select the edge X with minimum weight
5. Add X to T
6. Repeat steps 3 to 5 until all vertices are added to T
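
The steps of Prim's algorithm can be sketched in a few lines. This is a minimal Python illustration rather than the deck's R code; the adjacency-dict format, the function name prim_mst and the example graph are all made up for this sketch, and the graph is assumed to be connected and undirected.

```python
import heapq

def prim_mst(graph, start):
    """Return the MST edges of a connected, undirected weighted graph.

    graph: dict mapping vertex -> {neighbor: weight} (hypothetical format).
    """
    in_tree = {start}
    # Candidate edges leaving the tree built so far.
    frontier = [(w, start, v) for v, w in graph[start].items()]
    heapq.heapify(frontier)
    mst = []
    while frontier and len(in_tree) < len(graph):
        w, u, v = heapq.heappop(frontier)  # lightest candidate edge
        if v in in_tree:
            continue  # both endpoints already in T, so skip (would form a cycle)
        in_tree.add(v)
        mst.append((u, v, w))
        for nxt, w2 in graph[v].items():
            if nxt not in in_tree:
                heapq.heappush(frontier, (w2, v, nxt))
    return mst

# Made-up example graph (not the one drawn on the slide).
G = {
    "a": {"b": 2, "c": 3},
    "b": {"a": 2, "c": 1, "d": 4},
    "c": {"a": 3, "b": 1, "d": 5},
    "d": {"b": 4, "c": 5},
}
mst = prim_mst(G, "a")
total_weight = sum(w for _, _, w in mst)
print(mst, total_weight)
```

The heap replaces the explicit list L: stale entries whose far endpoint has already joined T are simply skipped when popped.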

k-Spanning Tree
Remove the k-1 edges with the highest weight from the Minimum Spanning Tree.
Note: k is the number of clusters.
[Figure: the minimum spanning tree and the clusters that remain after the edges are removed]
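
Putting the pieces together, k-spanning-tree clustering might look like the following Python sketch. It is not the GraphClusterAnalysis implementation: the function name, the edge-list format and the example graph are assumptions, Kruskal's algorithm is swapped in for Prim's to keep the block short, and weights are read as distances.

```python
def k_spanning_tree_clusters(vertices, edges, k):
    """Cluster by cutting the k-1 heaviest edges of the MST.

    edges: list of (u, v, weight); weights are treated as distances.
    """
    parent = {v: v for v in vertices}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Kruskal's algorithm: scan edges by increasing weight and keep
    # those that join two different components.
    mst = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            mst.append((u, v, w))

    # mst is sorted by weight, so the k-1 heaviest edges are at the end.
    kept = mst[: max(0, len(mst) - (k - 1))]

    # The connected components of the remaining forest are the clusters.
    parent = {v: v for v in vertices}
    for u, v, _ in kept:
        parent[find(u)] = find(v)
    clusters = {}
    for v in vertices:
        clusters.setdefault(find(v), set()).add(v)
    return sorted(sorted(c) for c in clusters.values())

# Made-up example: two tight groups joined by one long edge.
V = [1, 2, 3, 4, 5, 6]
E = [(1, 2, 1), (2, 3, 2), (1, 3, 3), (3, 4, 8),
     (4, 5, 1), (5, 6, 2), (4, 6, 3), (2, 5, 9)]
print(k_spanning_tree_clusters(V, E, 2))
```

When several MST edges share the same weight, which of them is cut depends on the sort order, so the clusters for larger k are not always unique.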

k-Spanning Tree R-code
    library(GraphClusterAnalysis)
    library(RBGL)
    library(igraph)
    library(graph)
    data(MST_Example)
    # Build an undirected graph and use the V3 column as the edge weight
    G = graph.data.frame(MST_Example,directed=FALSE)
    E(G)$weight=E(G)$V3
    # Weights are an edge attribute, so pass E(G)$weight rather than G$weight
    MST_PRIM = minimum.spanning.tree(G,weights=E(G)$weight, algorithm = "prim")
    OutputList = k_clusterSpanningTree(MST_PRIM,3)
    Clusters = OutputList[[1]]
    outputGraph = OutputList[[2]]
    Clusters

Shared Nearest Neighbor Clustering
Input: graph G and a threshold τ. Output: groups of non-overlapping vertices.
STEPS:
1. Obtain the Shared Nearest Neighbor (SNN) graph of the input graph G
2. Remove edges from the SNN graph with weight less than τ

What is Shared Nearest Neighbor? (Refresher from Proximity Chapter)
Shared Nearest Neighbor is a proximity measure that denotes the number of neighbor nodes common to a given pair of nodes u and v.

Shared Nearest Neighbor (SNN) Graph
Given input graph G, weight each edge (u,v) with the number of shared nearest neighbors between u and v.
[Figure: input graph G and its SNN graph. E.g., Node 0 and Node 1 have 2 neighbors in common: Node 2 and Node 3]
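
The edge weighting is a set intersection per edge. The Python sketch below assumes a hypothetical adjacency-set format and a made-up graph chosen so that nodes 0 and 1 share nodes 2 and 3, as in the slide's example.

```python
def snn_graph(adj):
    """Weight each edge (u, v) by the number of neighbors u and v share.

    adj: dict mapping vertex -> set of neighbors (undirected graph).
    Returns {(u, v): shared_count} with u < v.
    """
    weights = {}
    for u in adj:
        for v in adj[u]:
            if u < v:  # visit each undirected edge once
                weights[(u, v)] = len(adj[u] & adj[v])
    return weights

# Made-up graph: nodes 0..3 are fully connected, node 4 hangs off node 3.
adj = {
    0: {1, 2, 3},
    1: {0, 2, 3},
    2: {0, 1, 3},
    3: {0, 1, 2, 4},
    4: {3},
}
weights = snn_graph(adj)
print(weights)
```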

Shared Nearest Neighbor Clustering: Jarvis-Patrick Algorithm
Given: the SNN graph of input graph G
If u and v share more than τ neighbors, place them in the same cluster (e.g., τ = 3).
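
A compact Python sketch of the whole procedure follows. It is an illustration, not the SNN_Clustering function from the R package. It follows the STEPS slide and drops edges whose SNN weight is below τ, so a pair sharing exactly τ neighbors stays connected; that boundary choice, the function name and the example graph are assumptions.

```python
from itertools import combinations

def snn_clusters(adj, tau):
    """Jarvis-Patrick style clustering on an undirected graph.

    adj: dict mapping vertex -> set of neighbors.
    """
    # Keep an edge only if its endpoints share at least tau neighbors.
    kept = {u: set() for u in adj}
    for u in adj:
        for v in adj[u]:
            if len(adj[u] & adj[v]) >= tau:
                kept[u].add(v)

    # Clusters are the connected components of the thresholded graph.
    seen, clusters = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            x = stack.pop()
            if x not in comp:
                comp.add(x)
                stack.extend(kept[x] - comp)
        seen |= comp
        clusters.append(sorted(comp))
    return sorted(clusters)

# Made-up example: two 4-cliques joined by the single edge (3, 4).
edges = (list(combinations(range(4), 2))
         + list(combinations(range(4, 8), 2))
         + [(3, 4)])
adj = {v: set() for v in range(8)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)

print(snn_clusters(adj, 2))
```

The bridge endpoints share no neighbors, so the bridge is the first edge to go; raising τ to 3 removes every edge in this example and leaves only singletons.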

SNN-Clustering R code
    library(GraphClusterAnalysis)
    library(RBGL)
    library(igraph)
    library(graph)
    data(SNN_Example)
    # Build an undirected graph from the example edge list and plot it
    G = graph.data.frame(SNN_Example,directed=FALSE)
    tkplot(G)
    Output = SNN_Clustering(G,3)
    Output

What is Betweenness Centrality? (Refresher from Proximity Chapter)
Betweenness centrality quantifies the degree to which a vertex (or edge) lies on the shortest paths between all other pairs of nodes.
Two types:
– Vertex Betweenness
– Edge Betweenness

Vertex Betweenness
The number of shortest paths in the graph G that pass through a given node S.
E.g., Sharon is likely a liaison between NCSU and DUKE, and hence many connections between DUKE and NCSU pass through Sharon.
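
Counting these shortest paths is usually done with Brandes' algorithm. The Python sketch below handles unweighted, undirected graphs and returns raw path counts without normalization; the function name and the example graph (two triangles sharing a liaison vertex "s") are made up for illustration.

```python
from collections import deque

def vertex_betweenness(adj):
    """Brandes' algorithm for unweighted, undirected graphs.

    adj: dict mapping vertex -> set of neighbors.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and predecessors.
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        dist = {s: 0}
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest vertices, accumulating dependencies.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair of endpoints was counted from both ends.
    return {v: b / 2 for v, b in bc.items()}

# Made-up example: "s" is the only link between {a, b} and {c, d}.
adj = {
    "a": {"b", "s"},
    "b": {"a", "s"},
    "s": {"a", "b", "c", "d"},
    "c": {"s", "d"},
    "d": {"s", "c"},
}
bc = vertex_betweenness(adj)
print(bc)
```

All four cross-group pairs (a,c), (a,d), (b,c) and (b,d) route through "s", so it gets a raw score of 4 while every other vertex scores 0.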

Edge Betweenness
The number of shortest paths in the graph G that pass through a given edge (S, B).
E.g., Sharon and Bob both study at NCSU, and they are the only link between the NY DANCE and CISCO groups.
Vertices and edges with high betweenness form good starting points to identify clusters.

Vertex Betweenness Clustering
Given: input graph G
Repeat until the highest vertex betweenness ≤ μ:
1. Compute the betweenness of each vertex
2. Select the vertex v with the highest betweenness (e.g., vertex 3 with value 0.67)
3. Disconnect the graph at the selected vertex (e.g., vertex 3)
4. Copy the vertex to both components

Vertex Betweenness Clustering R code
    library(GraphClusterAnalysis)
    library(RBGL)
    library(igraph)
    library(graph)
    data(Betweenness_Vertex_Example)
    # Build an undirected graph from the example edge list
    G = graph.data.frame(Betweenness_Vertex_Example,directed=FALSE)
    betweennessBasedClustering(G,mode="vertex",threshold=0.2)

Edge-Betweenness Clustering: Girvan and Newman Algorithm
Given: input graph G
Repeat until the highest edge betweenness ≤ μ:
1. Compute the betweenness of each edge
2. Select the edge with the highest betweenness (e.g., edge (3,4))
3. Disconnect the graph at the selected edge (e.g., (3,4))
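
The loop can be sketched in Python as follows. This is not the betweennessBasedClustering function from the R package: the function names and the example graph are made up, and the threshold mu is compared against raw (unnormalized) edge betweenness, which is an assumption about scale.

```python
from collections import deque

def edge_betweenness(adj):
    """Raw edge betweenness of an unweighted, undirected graph."""
    eb = {}
    for s in adj:
        # BFS from s: shortest-path counts (sigma) and predecessor lists.
        dist, sigma, preds, order = {s: 0}, {s: 1}, {s: []}, []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w], sigma[w], preds[w] = dist[v] + 1, 0, []
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Push path counts back from the farthest vertices toward s.
        delta = {v: 0.0 for v in order}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1 + delta[w])
                edge = frozenset((v, w))
                eb[edge] = eb.get(edge, 0.0) + share
                delta[v] += share
    return {e: b / 2 for e, b in eb.items()}  # each pair seen from both ends

def components(adj):
    """Connected components as a sorted list of sorted vertex lists."""
    seen, out = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            x = stack.pop()
            if x not in comp:
                comp.add(x)
                stack.extend(adj[x] - comp)
        seen |= comp
        out.append(sorted(comp))
    return sorted(out)

def girvan_newman(adj, mu):
    """Remove the highest-betweenness edge until no edge exceeds mu."""
    adj = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    while True:
        eb = edge_betweenness(adj)
        if not eb:
            break
        edge, score = max(eb.items(), key=lambda kv: kv[1])
        if score <= mu:
            break
        u, v = edge
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)

# Made-up example: triangles {0,1,2} and {3,4,5} joined by edge (2, 3).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(girvan_newman(adj, mu=5))
```

Betweenness is recomputed after every removal because deleting one edge reroutes the shortest paths that used it.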

Edge Betweenness Clustering R code
    library(GraphClusterAnalysis)
    library(RBGL)
    library(igraph)
    library(graph)
    data(Betweenness_Edge_Example)
    # Build an undirected graph from the example edge list
    G = graph.data.frame(Betweenness_Edge_Example,directed=FALSE)
    betweennessBasedClustering(G,mode="edge",threshold=0.2)

What is a Highly Connected Subgraph?
Requires the following definitions:
– Cut
– Minimum Edge Cut (MinCut)
– Edge Connectivity (EC)

Cut
A set of edges whose removal disconnects a graph.
E.g., Cut = {(0,1),(1,2),(1,3)} and Cut = {(3,5),(4,2)}

Minimum Cut
A smallest set of edges whose removal disconnects a graph.
E.g., MinCut = {(3,5),(4,2)}

Edge Connectivity (EC)
The minimum NUMBER of edges whose removal will disconnect a graph.
E.g., MinCut = {(3,5),(4,2)}, so EC = |MinCut| = |{(3,5),(4,2)}| = 2

Highly Connected Subgraph (HCS)
A graph G = (V,E) is highly connected if EC(G) > |V|/2.
E.g., for the graph G with EC(G) = 2 and |V| = 9, the test EC(G) > |V|/2 becomes 2 > 9/2, which is false, so G is NOT a highly connected subgraph.

HCS Clustering
Given: input graph G
1. Find the minimum cut MinCut(G) (e.g., {(3,5),(4,2)})
2. Is EC(G) > |V|/2?
    – YES: return G
    – NO: divide G into G1 and G2 using MinCut, then process G1 and G2 in the same way
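
The recursion in the flowchart can be sketched in Python as below. This is not the HCSClustering function from the R package. To stay short it finds the minimum cut by brute force over all vertex bipartitions, which is exponential and only suitable for tiny graphs; treating a single vertex as its own cluster is a further assumption, as are the function names and the example graph.

```python
from itertools import combinations

def min_cut(vertices, edges):
    """Brute-force minimum edge cut; returns (cut_edges, one_side)."""
    vertices = sorted(vertices)
    first, rest = vertices[0], vertices[1:]
    best = None
    # Try every proper subset of vertices that contains `first`.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            side = {first, *extra}
            cut = [(u, v) for u, v in edges if (u in side) != (v in side)]
            if best is None or len(cut) < len(best[0]):
                best = (cut, side)
    return best

def hcs(vertices, edges):
    """Split along minimum cuts until every part is highly connected."""
    vertices = set(vertices)
    if len(vertices) == 1:
        return [sorted(vertices)]  # assumption: singletons are returned as-is
    cut, side = min_cut(vertices, edges)
    if len(cut) > len(vertices) / 2:  # EC(G) > |V|/2
        return [sorted(vertices)]
    other = vertices - side

    def induced(part):
        return [(u, v) for u, v in edges if u in part and v in part]

    return hcs(side, induced(side)) + hcs(other, induced(other))

# Made-up example: two 4-cliques joined by the single edge (3, 4).
edges = (list(combinations(range(4), 2))
         + list(combinations(range(4, 8), 2))
         + [(3, 4)])
print(hcs(range(8), edges))
```

In the example the whole graph has EC = 1, which is not greater than 8/2, so it is split at the bridge; each 4-clique then has EC = 3 > 4/2 and is returned as a cluster.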

HCS Clustering R code
    library(GraphClusterAnalysis)
    library(RBGL)
    library(igraph)
    library(graph)
    data(HCS_Example)
    # Build an undirected graph from the example edge list
    G = graph.data.frame(HCS_Example,directed=FALSE)
    HCSClustering(G,kappa=2)

What is a Clique?
A subgraph C of graph G with edges between all pairs of nodes.

What is a Maximal Clique?
A maximal clique is a clique that is not part of a larger clique.

Maximal Clique Enumeration: Bron and Kerbosch Algorithm
Input: graph G
BK(C,P,N)
– C: vertices in the current clique
– P: vertices that can be added to C
– N: vertices that cannot be added to C
Condition: if both P and N are empty, output C as a maximal clique
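
A minimal Python sketch of the basic BK(C,P,N) recursion, without pivoting, is shown below. The generator form, the adjacency-set format and the example graph are choices made for this illustration, not taken from the maximalCliqueEnumerator function in the R package.

```python
def bron_kerbosch(adj, C=frozenset(), P=None, N=frozenset()):
    """Yield every maximal clique of an undirected graph.

    adj: dict mapping vertex -> set of neighbors.
    C: current clique, P: candidates that can extend C,
    N: vertices already tried, which must not be added again.
    """
    if P is None:
        P = frozenset(adj)
    if not P and not N:
        yield set(C)  # nothing can extend C, so it is maximal
        return
    for v in list(P):
        # Extend C with v; only common neighbors of v stay relevant.
        yield from bron_kerbosch(adj, C | {v}, P & adj[v], N & adj[v])
        P = P - {v}  # v has been fully explored
        N = N | {v}

# Made-up example: triangle {0, 1, 2} plus the pendant edge (2, 3).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
cliques = sorted(sorted(c) for c in bron_kerbosch(adj))
print(cliques)
```

The set N is what prevents a clique such as {0, 1} from being reported: when P runs out but N still holds vertex 2, the current set is known to be contained in a larger clique.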

Maximal Clique R code
    library(GraphClusterAnalysis)
    library(RBGL)
    library(igraph)
    library(graph)
    data(CliqueData)
    # Build an undirected graph from the example edge list and plot it
    G = graph.data.frame(CliqueData,directed=FALSE)
    tkplot(G)
    maximalCliqueEnumerator (G)

What is k-means?
k-means is a clustering algorithm applied to vector data points.
k-means recap:
– Select k data points from the input as centroids
1. Assign the other data points to the nearest centroid
2. Recompute the centroid of each cluster
3. Repeat Steps 1 and 2 until the centroids don’t change
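
The recap translates almost line for line into code. In the Python sketch below, taking the first k points as the initial centroids is an assumption made to keep the run deterministic (the slide only says to select k data points), and the function name and sample points are made up.

```python
def kmeans(points, k, max_iter=100):
    """Plain k-means on tuples of floats with squared Euclidean distance."""
    centroids = [points[i] for i in range(k)]  # assumption: first k points
    clusters = []
    for _ in range(max_iter):
        # Step 1: assign every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Step 2: recompute each centroid as the mean of its cluster
        # (an empty cluster keeps its old centroid).
        new_centroids = [
            tuple(sum(coord) / len(pts) for coord in zip(*pts)) if pts else centroids[i]
            for i, pts in enumerate(clusters)
        ]
        # Step 3: stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, clusters

# Made-up data: one group near (1, 1), another near (9, 9).
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```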

k-means on Graphs: Kernel k-means
The basic algorithm is the same as k-means on vector data.
We utilize the “kernel trick” (recall Kernel Chapter).
“Kernel trick” recap:
– We know that we can use within-graph kernel functions to calculate the inner product of a pair of vertices in a user-defined feature space.
– We replace the standard distance/proximity measures used in k-means with this within-graph kernel function.
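
With only inner products available, the squared distance from vertex i to the centroid of cluster c can still be expanded as K[i][i] - (2/|c|) * sum_j K[i][j] + (1/|c|^2) * sum_{j,l} K[j][l], with j and l ranging over the members of c. The Python sketch below applies that identity. The kernel matrix is a made-up positive semidefinite example rather than the output of any particular within-graph kernel, and the function name and the initial labels are likewise assumptions.

```python
def kernel_kmeans(K, labels, k, max_iter=100):
    """Kernel k-means on a precomputed kernel matrix K (list of lists).

    labels: initial cluster index for each vertex.
    """
    n = len(K)
    labels = list(labels)
    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c] for c in range(k)]
        # Third term of the expansion: depends only on the cluster.
        within = [
            sum(K[j][l] for j in m for l in m) / len(m) ** 2 if m else 0.0
            for m in members
        ]
        new_labels = []
        for i in range(n):
            dists = []
            for c, m in enumerate(members):
                if not m:
                    dists.append(float("inf"))  # never pick an empty cluster
                    continue
                cross = sum(K[i][j] for j in m) / len(m)
                dists.append(K[i][i] - 2 * cross + within[c])
            new_labels.append(dists.index(min(dists)))
        if new_labels == labels:  # assignments stable, so centroids are too
            break
        labels = new_labels
    return labels

# Made-up kernel: vertices 0 and 1 are similar, as are vertices 2 and 3.
K = [
    [2.0, 1.8, 0.0, 0.0],
    [1.8, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 1.8],
    [0.0, 0.0, 1.8, 2.0],
]
print(kernel_kmeans(K, labels=[0, 0, 0, 1], k=2))
```

As with ordinary k-means, the result depends on the starting assignment; a poor start can leave the algorithm in a local optimum.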

Application
Functional modules in protein-protein interaction networks
Subgraphs with pair-wise interacting nodes => Maximal cliques
R-code
    library(GraphClusterAnalysis)
    library(RBGL)
    library(igraph)
    library(graph)
    data(YeasPPI)
    # Build an undirected graph from the interaction edge list
    G = graph.data.frame(YeasPPI,directed=FALSE)
    Potential_Protein_Complexes = maximalCliqueEnumerator (G)
    Potential_Protein_Complexes