Cohesive Subgraph Computation over Large Graphs

Slides:



Advertisements
Similar presentations
Community Detection and Graph-based Clustering
Advertisements

Approximation, Chance and Networks Lecture Notes BISS 2005, Bertinoro March Alessandro Panconesi University La Sapienza of Rome.
Mauro Sozio and Aristides Gionis Presented By:
Graph Isomorphism Algorithms and networks. Graph Isomorphism 2 Today Graph isomorphism: definition Complexity: isomorphism completeness The refinement.
Community Detection Algorithm and Community Quality Metric Mingming Chen & Boleslaw K. Szymanski Department of Computer Science Rensselaer Polytechnic.
Clustering Prof. Navneet Goyal BITS, Pilani
Information Networks Graph Clustering Lecture 14.
Online Social Networks and Media. Graph partitioning The general problem – Input: a graph G=(V,E) edge (u,v) denotes similarity between u and v weighted.
Advanced Topics in Data Mining Special focus: Social Networks.
Lecture 21: Spectral Clustering
Using Structure Indices for Efficient Approximation of Network Properties Matthew J. Rattigan, Marc Maier, and David Jensen University of Massachusetts.
Balanced Graph Partitioning Konstantin Andreev Harald Räcke.
HCS Clustering Algorithm
Fast algorithm for detecting community structure in networks.
SCAN: A Structural Clustering Algorithm for Networks
SubSea: An Efficient Heuristic Algorithm for Subgraph Isomorphism Vladimir Lipets Ben-Gurion University of the Negev Joint work with Prof. Ehud Gudes.
The community-search problem and how to plan a successful cocktail party Mauro SozioAris Gionis Max Planck Institute, Germany Yahoo! Research, Barcelona.
Systems Biology, April 25 th 2007Thomas Skøt Jensen Technical University of Denmark Networks and Network Topology Thomas Skøt Jensen Center for Biological.
Domain decomposition in parallel computing Ashok Srinivasan Florida State University COT 5410 – Spring 2004.
Neighbourhood Sampling for Local Properties on a Graph Stream A. Pavan, Iowa State University Kanat Tangwongsan, IBM Research Srikanta Tirthapura, Iowa.
Biological Networks Lectures 6-7 : February 02, 2010 Graph Algorithms Review Global Network Properties Local Network Properties 1.
Finding dense components in weighted graphs Paul Horn
Greedy Approximation Algorithms for finding Dense Components in a Graph Paper by Moses Charikar Presentation by Paul Horn.
1 CSE 980: Data Mining Lecture 17: Density-based and Other Clustering Algorithms.
Clustering of protein networks: Graph theory and terminology Scale-free architecture Modularity Robustness Reading: Barabasi and Oltvai 2004, Milo et al.
A Clustering Algorithm based on Graph Connectivity Balakrishna Thiagarajan Computer Science and Engineering State University of New York at Buffalo.
A Graph-based Friend Recommendation System Using Genetic Algorithm
An Efficient Algorithm for Enumerating Pseudo Cliques Dec/18/2007 ISAAC, Sendai Takeaki Uno National Institute of Informatics & The Graduate University.
1/52 Overlapping Community Search Graph Data Management Lab, School of Computer Science
CSE 589 Part VI. Reading Skiena, Sections 5.5 and 6.8 CLR, chapter 37.
Most of contents are provided by the website Graph Essentials TJTSD66: Advanced Topics in Social Media.
Computer Science and Engineering TreeSpan Efficiently Computing Similarity All-Matching Gaoping Zhu #, Xuemin Lin #, Ke Zhu #, Wenjie Zhang #, Jeffrey.
Clusters Recognition from Large Small World Graph Igor Kanovsky, Lilach Prego Emek Yezreel College, Israel University of Haifa, Israel.
Data Structures and Algorithms in Parallel Computing Lecture 3.
Data Structures and Algorithms in Parallel Computing Lecture 7.
Overlapping Community Detection in Networks
Complexity and Efficient Algorithms Group / Department of Computer Science Testing the Cluster Structure of Graphs Christian Sohler joint work with Artur.
A randomized linear time algorithm for graph spanners Surender Baswana Postdoctoral Researcher Max Planck Institute for Computer Science Saarbruecken,
Community structure in graphs Santo Fortunato. More links “inside” than “outside” Graphs are “sparse” “Communities”
Network Partition –Finding modules of the network. Graph Clustering –Partition graphs according to the connectivity. –Nodes within a cluster is highly.
::Network Optimization:: Minimum Spanning Trees and Clustering Taufik Djatna, Dr.Eng. 1.
Outline Introduction State-of-the-art solutions Equi-Truss Experiments
Graph clustering to detect network modules
Yu Wang1, Gao Cong2, Guojie Song1, Kunqing Xie1
Finding Dense and Connected Subgraphs in Dual Networks
Heuristic & Approximation
Groups of vertices and Core-periphery structure
Lecture 11 Graph Algorithms
Power Law Structure Weili Wu Ding-Zhu Du Univ of Texas at Dallas
Minimum Spanning Tree 8/7/2018 4:26 AM
June 2017 High Density Clusters.
Algorithms and networks
Community detection in graphs
CS120 Graphs.
CSE 373 Data Structures and Algorithms
Finding Subgraphs with Maximum Total Density and Limited Overlap
Algorithms and networks
Diversified Top-k Subgraph Querying in a Large Graph
Efficient Subgraph Similarity All-Matching
NP-Complete Problems.
3.3 Network-Centric Community Detection
Solving the Minimum Labeling Spanning Tree Problem
CSE 373: Data Structures and Algorithms
Alan Kuhnle*, Victoria G. Crawford, and My T. Thai
Modelling and Searching Networks Lecture 2 – Complex Networks
CSE572: Data Mining by H. Liu
Lecture 10 Graph Algorithms
Advanced Topics in Data Mining Special focus: Social Networks
Approximate Graph Mining with Label Costs
Complexity Theory: Foundations
Presentation transcript:

Cohesive Subgraph Computation over Large Graphs Lijun Chang DECRA Fellow @ Database Group CSE, UNSW 2017-05-08

Outline Introduction Core Decomposition Truss Decomposition Edge-connectivity based Decomposition Maximal Clique Enumeration Graph Structural Clustering

Graphs Graphs are everywhere Social Network Facebook, Twitter Road Network As you may know, nowadays, graphs are everywhere. For example, social networks such as Facebook and Twitter can be modeled as graphs, where each node in the graph represents a user, and an edge between two users denotes a friend relationship between the two users. Other graph examples are road networks, web graphs, and etc. Web Graph The Internet of Things

Big Graphs Graphs are Big (Volume) 302 million monthly active users 208 followers on average 1.4 billion users in 2014 0.4 trillion relationships in 2014

Properties of Real-world Graphs Real graphs are not random graphs (e.g., the Erdos-Renyi random graph model) have fascinating patterns and properties. The degree distribution is skewed, following a power- law The average distance is short (the small-world phenomenon) Edge density is inhomogeneous - groups of vertices with high concentration of edges within them and low concentration between different groups Globally sparse but locally dense

Cohesive Subgraph Computation Aims to identify the modules and, possibly, their hierarchical organization, by only using the information encoded in the graph topology. First attempt dates back to 1955 by Weiss and Jacobson searching for work groups within a government agency. Applications in various fields Any context that information is encoded as a graph

Application domains Communities in social networks Groups of web pages dealing with the same or related topics in World Wide Web Groups of proteins having the same specific function within the cell in biology related to functional modules such as cycles and pathways in metabolic networks Identify compartments in food webs

Cohesiveness Measures Minimum degree: core decomposition Minimum number of triangles each participates in: truss decomposition Edge connectivity: edge connectivity-based decomposition Clique: maximal clique enumeration Structural graph clustering …

Outline Introduction Core Decomposition Truss Decomposition Edge-connectivity based Decomposition Maximal Clique Enumeration Graph Structural Clustering

Core Decomposition k-core The largest subgraph in which every vertex has degree at least k within the subgraph Picture taken from Malliaros et al. 2016

Core Decomposition Another Example Picture taken from Malliaros et al. 2016

Peeling Algorithm Basic idea Iteratively remove the weakest vertex (i.e., the one with the smallest degree) Picture taken from Malliaros et al. 2016

Time complexity Linear to the number of edges (i.e., O(m)) use a bin-sort like data structure to dynamically maintain the vertex with the smallest degree

Linear Time Algorithm Picture taken from Malliaros et al. 2016

Linear Time Algorithm Picture taken from Malliaros et al. 2016

Linear Time Algorithm Picture taken from Malliaros et al. 2016

Linear Time Algorithm Picture taken from Malliaros et al. 2016

Linear Time Algorithm Picture taken from Malliaros et al. 2016

For Speeding Up Algorithms Densest subgraph detection 2-approximation of the densest subgraph. Density is the average degree prune vertices that are not in the densest subgraph, thus speed up the algorithm K-edge connected component computation. A k-edge connected component is a subgraph of a k- core

Influential Spreaders

Outline Introduction Core Decomposition Truss Decomposition Edge-connectivity based Decomposition Maximal Clique Enumeration Graph Structural Clustering

Truss Decomposition K-core decomposition often returns a relatively large number of candidate influential spreaders only a small fraction corresponds to truly highly influential vertices. k-truss decomposition further refines the set of the most influential vertices.

Truss Decomposition K-truss The maximal subgraph in which every edge participates in k-2 triangles Picture taken from Rossi et al. 2015

K-truss vs k-core k-truss is a subgraph of (k-1)-core k-truss represents the nucleus of a k-core filtering out less important information 3-core vs 4-truss Picture taken from Rossi et al. 2015

Truss Decomposition Algorithm Extend the peeling algorithm, which is designed for core decomposition, to truss decomposition That is, virtually transform the graph G to a new graph G’ Each edge (u,v) in G corresponds to a vertex in G’ Two vertices in G’ are connected by an edge if 1) their corresponding edges in G have one common vertex, and 2) the three vertices of the two corresponding edges form a triangle in G The algorithms runs in O(α(G)×m) time α(G) is the arboricity of G, and is small for real graphs

Outline Introduction Core Decomposition Truss Decomposition Edge-connectivity based Decomposition Maximal Clique Enumeration Graph Structural Clustering

Edge Connectivity A graph is k-edge connected if it is still connected after removing any set of (k-1) edges from it 2 1 6 8 7 3 4 5 11 9 10 2-edge connected

k-edge Connected Components Given a graph G and an integer k, computing all maximal subgraphs of G that are k-edge connected 2 1 6 8 7 3 4 5 11 9 10 2-edge connected components

Observation

Algorithm Given a graph G and an integer k If G is not k-edge connected, find a cut C with cardinality < k Remove edges in C from G, then the result will be two connected subgraphs Find k-edge connected components for each connected subgraph Otherwise, report that this subgraph is a k-edge connected component

Optimizations Optimizations [Chang et al. SIGMOD’13] k-core pruning Find multiple cuts at the same time Fast compute a cut The time complexity of computing k-edge connected components is O(h*l*|E|) h and l are bounded small constants

Edge-connectivity based decomposition [Chang et al. SIGMOD’15] Compute sc(u,v) for all edges (u,v) in the graph. sc(u,v): the maximum k such that a k-edge connected component containing edge (u,v) Use the existing techniques for computing k-edge connected component with two optimizations. Batch processing: assign sc(u,v) values for all edges in k-edge connected components Computation sharing: all edges removed during computing k-edge connected components are also removed when computing (k+1)- edge connected components Time complexity: O(α(G)×h×l×|E|), where α(G) is the arboricity of a graph G.

Outline Introduction Core Decomposition Truss Decomposition Edge-connectivity based Decomposition Maximal Clique Enumeration Graph Structural Clustering

Maximal Clique Enumeration … Graph Maximal Cliques Algorithmica’13: Fast Maximal Cliques Enumeration in Sparse Graphs

Backtracking Algorithm Compute the Maximum Clique is NP-hard Backtracking starting with a vertex C = {u} and all its neighbors as the candidate P Recursively add a vertex from P to C, and reduce P to the subset that are adjacent to all vertices of C A maximal clique is found if P is empty A clique is contained in the neighborhood- subgraph of a vertex In practice, the maximum clique can also be found efficiently for real graphs

Outline Introduction Core Decomposition Truss Decomposition Edge-connectivity based Decomposition Maximal Clique Enumeration Graph Structural Clustering

Structural Graph Clustering Structural graph clustering: SCAN [Xu+, KDD’07] Identifies clusters, hubs, and outliers at the same time Mimics DBSCAN [Ester+, KDD’96] for clustering spatial data

A Cluster = Cores + Borders Structural Similarity: Two vertices u and v are structure-similar if Connected Structural similarity ≥ ε (a given similarity threshold) Many: ≥ μ (a given size threshold)

Example (ε=0.0001, μ=3)

pSCAN (Chang et al. ICDE’17) A two-step framework + optimization techniques Step-I: Cluster core vertices Transitivity: if core u and core v are in the same cluster, core v and core w are in the same cluster, then core u and core w are in the same cluster Reduces the number of structural similarity computations Step-II: Cluster non-core vertices A non-core vertex belongs to the same cluster of a set of core vertices if it is structure-similar to one of the core vertices Time complexity is O(α(G)×m) pSCAN is worst-case optimal

Wrap up Introduction Core Decomposition Truss Decomposition Edge-connectivity based Decomposition Maximal Clique Enumeration Graph Structural Clustering

Thank you! Questions? ljchang@cse.unsw.edu.au