Online Social Networks and Media. Graph partitioning The general problem – Input: a graph G=(V,E) edge (u,v) denotes similarity between u and v weighted.

Slides:



Advertisements
Similar presentations
Class 12: Communities Network Science: Communities Dr. Baruch Barzel.
Advertisements

Partitional Algorithms to Detect Complex Clusters
Covers, Dominations, Independent Sets and Matchings AmirHossein Bayegan Amirkabir University of Technology.
C&O 355 Lecture 23 N. Harvey TexPoint fonts used in EMF. Read the TexPoint manual before you delete this box.: A A A A A A A A A A.
Max Cut Problem Daniel Natapov.
1 EE5900 Advanced Embedded System For Smart Infrastructure Static Scheduling.
Modularity and community structure in networks
1 NP-Complete Problems. 2 We discuss some hard problems:  how hard? (computational complexity)  what makes them hard?  any solutions? Definitions 
Distance Metric Learning with Spectral Clustering By Sheil Kumar.
Information Networks Graph Clustering Lecture 14.
Normalized Cuts and Image Segmentation
Combinatorial Algorithms
Semi-Definite Algorithm for Max-CUT Ran Berenfeld May 10,2005.
1 Modularity and Community Structure in Networks* Final project *Based on a paper by M.E.J Newman in PNAS 2006.
Optimization of Pearl’s Method of Conditioning and Greedy-Like Approximation Algorithm for the Vertex Feedback Set Problem Authors: Ann Becker and Dan.
Chapter 10: Iterative Improvement The Maximum Flow Problem The Design and Analysis of Algorithms.
Author: Jie chen and Yousef Saad IEEE transactions of knowledge and data engineering.
Computability and Complexity 23-1 Computability and Complexity Andrei Bulatov Search and Optimization.
Introduction to Approximation Algorithms Lecture 12: Mar 1.
Graph Clustering. Why graph clustering is useful? Distance matrices are graphs  as useful as any other clustering Identification of communities in social.
Graph Clustering. Why graph clustering is useful? Distance matrices are graphs  as useful as any other clustering Identification of communities in social.
Lecture 21: Spectral Clustering
HCS Clustering Algorithm
1 Optimization problems such as MAXSAT, MIN NODE COVER, MAX INDEPENDENT SET, MAX CLIQUE, MIN SET COVER, TSP, KNAPSACK, BINPACKING do not have a polynomial.
A Unified View of Kernel k-means, Spectral Clustering and Graph Cuts
Fast algorithm for detecting community structure in networks.
Approximation Algorithms
Segmentation Graph-Theoretic Clustering.
EDA (CS286.5b) Day 6 Partitioning: Spectral + MinCut.
A scalable multilevel algorithm for community structure detection
2-Layer Crossing Minimisation Johan van Rooij. Overview Problem definitions NP-Hardness proof Heuristics & Performance Practical Computation One layer:
Computability and Complexity 24-1 Computability and Complexity Andrei Bulatov Approximation.
COMS Network Theory Week 6: February 28, 2008 Dragomir R. Radev Thursdays, 6-8 PM 233 Mudd Spring 2008.
Approximation Algorithms Motivation and Definitions TSP Vertex Cover Scheduling.
Segmentation via Graph Cuts
Domain decomposition in parallel computing Ashok Srinivasan Florida State University COT 5410 – Spring 2004.
Finding dense components in weighted graphs Paul Horn
Edge Covering problems with budget constrains By R. Gandhi and G. Kortsarz Presented by: Alantha Newman.
Segmentation Course web page: vision.cis.udel.edu/~cv May 7, 2003  Lecture 31.
CS774. Markov Random Field : Theory and Application Lecture 13 Kyomin Jung KAIST Oct
1 Introduction to Approximation Algorithms. 2 NP-completeness Do your best then.
Models and Algorithms for Complex Networks Graph Clustering and Network Communities.
A Clustering Algorithm based on Graph Connectivity Balakrishna Thiagarajan Computer Science and Engineering State University of New York at Buffalo.
Online Social Networks and Media
DATA MINING LECTURE 13 Pagerank, Absorbing Random Walks Coverage Problems.
Spectral Analysis based on the Adjacency Matrix of Network Data Leting Wu Fall 2009.
Data Structures & Algorithms Graphs
Domain decomposition in parallel computing Ashok Srinivasan Florida State University.
New algorithms for Disjoint Paths and Routing Problems
Strings Basic data type in computational biology A string is an ordered succession of characters or symbols from a finite set called an alphabet Sequence.
1 EE5900 Advanced Embedded System For Smart Infrastructure Static Scheduling.
NP-completeness NP-complete problems. Homework Vertex Cover Instance. A graph G and an integer k. Question. Is there a vertex cover of cardinality k?
 In the previews parts we have seen some kind of segmentation method.  In this lecture we will see graph cut, which is a another segmentation method.
Community structure in graphs Santo Fortunato. More links “inside” than “outside” Graphs are “sparse” “Communities”
Network Theory: Community Detection Dr. Henry Hexmoor Department of Computer Science Southern Illinois University Carbondale.
Network Partition –Finding modules of the network. Graph Clustering –Partition graphs according to the connectivity. –Nodes within a cluster is highly.
Spectral Clustering Shannon Quinn (with thanks to William Cohen of Carnegie Mellon University, and J. Leskovec, A. Rajaraman, and J. Ullman of Stanford.
Normalized Cuts and Image Segmentation Patrick Denis COSC 6121 York University Jianbo Shi and Jitendra Malik.
Cohesive Subgraph Computation over Large Graphs
Spectral partitioning works: Planar graphs and finite element meshes
Approximation algorithms
Minimum Spanning Tree 8/7/2018 4:26 AM
Greedy Algorithm for Community Detection
June 2017 High Density Clusters.
Segmentation Graph-Theoretic Clustering.
Grouping.
Coverage Approximation Algorithms
Spectral Clustering Eric Xing Lecture 8, August 13, 2010
3.3 Network-Centric Community Detection
Instructor: Aaron Roth
Presentation transcript:

Online Social Networks and Media

Graph partitioning The general problem – Input: a graph G=(V,E) edge (u,v) denotes similarity between u and v weighted graphs: weight of edge captures the degree of similarity – Partitioning as an optimization problem: Partition the nodes in the graph such that nodes within clusters are well interconnected (high edge weights), and nodes across clusters are sparsely interconnected (low edge weights) most graph partitioning problems are NP hard

Measuring connectivity What does it mean that a set of nodes are well or sparsely interconnected? min-cut: the min number of edges such that when removed cause the graph to become disconnected – small min-cut implies sparse connectivity – U V-U This problem can be solved in polynomial time Min-cut/Max-flow algorithm

Measuring connectivity What does it mean that a set of nodes are well interconnected? min-cut: the min number of edges such that when removed cause the graph to become disconnected – not always a good idea! U U V-U

A bad example

Graph Bisection Since the minimum cut does always yield good results we need an extra constraints to make the problem meaningful. Graph Bisection refers to the problem of partitioning the nodes of the graph into two equal sets. Kernighan-Lin algorithm: Start with random equal partitions and then swap nodes to improve some quality metric (e.g., cut, modularity, etc).

Graph expansion Normalize the cut by the size of the smallest component Cut ratio: Graph expansion: Other Normalized Cut Ratio: Vol(U) = number of edges with one endpoint in U = total degree of nodes in U

Spectral analysis The Laplacian matrix L = D – A where – A = the adjacency matrix – D = diag(d 1,d 2,…,d n ) d i = degree of node i Therefore – L(i,i) = d i – L(i,j) = -1, if there is an edge (i,j)

Laplacian Matrix properties The matrix L is symmetric and positive semi- definite – all eigenvalues of L are positive The matrix L has 0 as an eigenvalue, and corresponding eigenvector w 1 = (1,1,…,1) – λ 1 = 0 is the smallest eigenvalue

The second smallest eigenvalue The second smallest eigenvalue (also known as Fielder value) λ 2 satisfies The eigenvector for eigenvalue λ 2 is called the Fielder vector. It minimizes where

Spectral ordering The values of x minimize For weighted matrices The ordering according to the x i values will group similar (connected) nodes together Physical interpretation: The stable state of springs placed on the edges of the graph

Spectral partition Partition the nodes according to the ordering induced by the Fielder vector If u = (u 1,u 2,…,u n ) is the Fielder vector, then split nodes according to a threshold value s – bisection: s is the median value in u – ratio cut: s is the value that minimizes α – sign: separate positive and negative values (s=0) – gap: separate according to the largest gap in the values of u This works well (provably for special cases)

Fielder Value The value λ 2 is a good approximation of the graph expansion For the minimum ratio cut of the Fielder vector we have that If the max degree d is bounded we obtain a good approximation of the minimum expansion cut d = maximum degree

MAXIMUM DENSEST SUBGRAPH Thanks to Aris Gionis

Finding dense subgraphs Dense subgraph: A collection of vertices such that there are a lot of edges between them – E.g., find the subset of users that talk the most between them – Or, find the subset of genes that are most commonly expressed together Similar to community identification but we do not require that the dense subgraph is sparsely connected with the rest of the graph.

Definitions

Min-Cut Problem * The graph may be weighted Min-Cut = Max-Flow: the minimum cut maximizes the flow that can be sent from s to t. There is a polynomial time solution.

Decision problem

Transform to min-cut

Transformation to min-cut

Algorithm (Goldberg) Given the input graph G, and value c 1.Create the min-cut instance graph 2.Compute the min-cut 3.If the set S is not empty, return YES 4.Else return NO How do we find the set with maximum density?

Min-cut algorithm The min-cut algorithm finds the optimal solution in polynomial time O(nm), but this is too expensive for real networks. We will now describe a simpler approximation algorithm that is very fast – Approximation algorithm: the ratio of the density of the set produced by our algorithm and that of the optimal is bounded. We will show that the ratio is at most ½ The optimal set is at most twice as dense as that of the approximation algorithm. Any ideas for the algorithm?

Greedy Algorithm

Example

Analysis

Upper bound This is true for any assignment of the edges!

Lower bound

The k-densest subgraph