Yuzhou Zhang ﹡, Jianyong Wang #, Yi Wang §, Lizhu Zhou ¶ Presented by Nam Nguyen Parallel Community Detection on Large Networks with Propinquity Dynamics.


Parallel Community Detection on Large Networks with Propinquity Dynamics. Yuzhou Zhang ﹡, Jianyong Wang #, Yi Wang §, Lizhu Zhou ¶. Presented by Nam Nguyen. § Google Beijing Research, Beijing, China. ﹡ # ¶ Department of Computer Science and Technology, Tsinghua University, Beijing, China.

Introduction to the Community Detection Problem. Graphs (networks) can model many kinds of real-life data. Community structure: highly intra-connected subgraphs with relatively sparse connections to the remaining parts of the graph. Coherent. A ubiquitous, distinguishing property of real-life graphs. Applications include: online social networks, Web linkage graphs, biological networks, and the Wikipedia graph.

How to Define the Community Structure. Clique (completely connected graph): too strict to be realistic. Quasi-clique: a relaxed version. Edge density: intra-community edges outnumber inter-community edges. Implicit definition: optimize a quality function; modularity is a popular one. No quality function at all: the definition depends on the heuristic.

Related Work. Graph partitioning: several parameters must be specified. Edge cutting: the key is defining edge centrality. Betweenness: edge betweenness; current-flow betweenness; random-walk betweenness; information centrality. Edge clustering coefficient. Modularity optimization: greedy algorithm; simulated annealing; extremal optimization; spectral optimization. Spectral methods (Laplacian matrix, normal matrix). Others…

Motivation. Many algorithms exist, so why propose another? We need a more efficient one for web-scale data. Existing algorithms take at least O(|V|^2) time on sparse graphs; ours takes about O(k·|V|). We emphasize scalability: the algorithm is parallelized and incremental.

Brainstorm. The community structures are right there, so why mine them? Can the communities reveal themselves if we increase the graph contrast? How? Observation: in social networks, communities are formed progressively and spontaneously by the collaborative local decisions of each individual. Our solution: continue this process with a more aggressive criterion, so that community structures emerge naturally. Simulation of this process: propinquity dynamics.

Overview. Input network (graph with low contrast) → propinquity dynamics → refined network (graph with higher contrast) → finding connected components → network communities. Parallelization.

Example. Propinquity dynamics. Finding network communities via connected components.
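
The last step of the pipeline, reading communities off the refined network as connected components, can be sketched in a few lines of Python. This is only an illustration, assuming the refined graph is available as a symmetric adjacency dictionary; the function name `connected_components` and the toy graph are hypothetical and not taken from the presentation.

```python
from collections import deque

def connected_components(adj):
    """Return the connected components of an undirected graph.

    adj maps each vertex to the set of its neighbors (assumed symmetric).
    """
    seen = set()
    components = []
    for start in adj:
        if start in seen:
            continue
        # A breadth-first search from an unvisited vertex collects one component.
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:
            v = queue.popleft()
            component.add(v)
            for u in adj[v]:
                if u not in seen:
                    seen.add(u)
                    queue.append(u)
        components.append(component)
    return components

# Toy refined network (illustrative): two triangles with no edge left between them.
refined = {
    1: {2, 3}, 2: {1, 3}, 3: {1, 2},
    4: {5, 6}, 5: {4, 6}, 6: {4, 5},
}
communities = connected_components(refined)
```

Each returned set is one community; on the toy graph the two triangles come out as separate components.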

Propinquity Dynamics. Propinquity: evaluates the probability that a pair of vertices is involved in a coherent community. Denoted by P(v_1, v_2). Global view: where propinquity and topology contradict each other, update the topology to keep it consistent with the propinquity. Local view: edge deletion and insertion. Local topology update criteria: α: cutting threshold; β: emerging threshold.
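
A rough Python sketch of how the two thresholds could drive the topology update. The exact criteria did not survive in the transcript, so the rule used here is an assumption: an existing edge is cut when its propinquity is at most α, and a missing edge emerges when its propinquity is at least β. The names `dynamics_step`, `run_dynamics`, and `toy_propinquity` are hypothetical, the propinquity measure is passed in as a parameter, and the all-pairs loop is for illustration only (it is quadratic in |V|).

```python
from itertools import combinations

def dynamics_step(adj, propinquity, alpha, beta):
    """One synchronous topology update; returns (new_adj, changed).

    ASSUMED rule: cut an existing edge if P <= alpha,
    insert a missing edge if P >= beta.
    """
    new_adj = {v: set(nbrs) for v, nbrs in adj.items()}
    changed = False
    for u, v in combinations(sorted(adj), 2):
        p = propinquity(adj, u, v)  # always measured on the old topology
        if v in adj[u] and p <= alpha:
            new_adj[u].discard(v)
            new_adj[v].discard(u)
            changed = True
        elif v not in adj[u] and p >= beta:
            new_adj[u].add(v)
            new_adj[v].add(u)
            changed = True
    return new_adj, changed

def run_dynamics(adj, propinquity, alpha, beta, max_iters=100):
    """Iterate until the topology stops changing (capped by max_iters)."""
    for _ in range(max_iters):
        adj, changed = dynamics_step(adj, propinquity, alpha, beta)
        if not changed:
            break
    return adj

def toy_propinquity(adj, u, v):
    """Stand-in measure for the demo: direct edge plus common-neighbor count."""
    return (1 if v in adj[u] else 0) + len(adj[u] & adj[v])

# Two triangles joined by the single bridge edge 3-4.
graph = {
    1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4},
    4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5},
}
result = run_dynamics(graph, toy_propinquity, alpha=1, beta=3)
```

With these illustrative thresholds the bridge edge is cut in the first iteration and the second iteration changes nothing, leaving the two triangles as separate components. The iteration cap is there because the run is not guaranteed to stop on its own.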

Propinquity Dynamics (high-level description). Init: Termination condition: The incremental version: Init: Convergence condition: Not yet proven.

Overlapping Community Extraction. The connected components are simply the communities we want. We can also extract the community overlap: micro-cluster the neighbors of each vertex, using breadth-first search.

Coherent Neighborhood Propinquity. A concrete propinquity definition. It reflects the connectivity of the maximum coherent subgraph involving a vertex pair. Locality. Three parts: direct connection; common neighbors; conjugates.
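
To make the three parts concrete, here is a small Python sketch that evaluates the propinquity of one vertex pair straight from the definition. The counting is an assumption, since the formula itself is missing from the transcript: 1 for a direct edge, plus the number of common neighbors, plus the number of edges running between common neighbors (taken here to be the "conjugates"). The function name `propinquity` and the toy graph are illustrative.

```python
def propinquity(adj, v1, v2):
    """Coherent neighborhood propinquity of (v1, v2), computed by definition.

    ASSUMED counting: direct edge (0 or 1) + common neighbors
    + edges whose two endpoints are both common neighbors.
    Vertices must be mutually comparable (e.g. ints) for the a < b test.
    """
    direct = 1 if v2 in adj[v1] else 0
    common = adj[v1] & adj[v2]
    # Count each edge between common neighbors once (a < b avoids doubles).
    conjugates = sum(
        1 for a in common for b in adj[a] if b in common and a < b
    )
    return direct + len(common) + conjugates

# 4-clique on {1, 2, 3, 4} with a pendant vertex 5 attached to 4.
adj = {
    1: {2, 3, 4}, 2: {1, 3, 4}, 3: {1, 2, 4},
    4: {1, 2, 3, 5}, 5: {4},
}
inside_clique = propinquity(adj, 1, 2)   # 1 direct + 2 common + 1 conjugate
to_pendant = propinquity(adj, 1, 5)      # only the common neighbor 4
```

Pairs inside the dense clique score much higher than pairs that reach out to the pendant vertex, which is the contrast the dynamics relies on.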

Propinquity Calculation. Complexity by definition: Efficient way: angle propinquity, computed for each vertex; conjugate propinquity, computed for each edge. The overall complexity can be reduced. Can the propinquity be updated via an adaptive algorithm?
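
The for-each-vertex and for-each-edge passes can be sketched as follows. This is a sketch under assumptions rather than the presenters' implementation: "angle propinquity" is read as the common-neighbor contribution (each vertex adds one unit to every pair of its neighbors), and "conjugate propinquity" as the contribution of an edge to every pair of its endpoints' common neighbors. The function name `all_propinquities` is hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def all_propinquities(adj):
    """Accumulate propinquity for all vertex pairs with nonzero value.

    Keys are pairs (a, b) with a < b. The readings of "angle" and
    "conjugate" are ASSUMPTIONS stated in the text above the code.
    """
    prop = defaultdict(int)
    edges = {(a, b) for a in adj for b in adj[a] if a < b}

    # Direct connection: one unit per existing edge.
    for a, b in edges:
        prop[(a, b)] += 1

    # Angle propinquity, for each vertex: v is a common neighbor
    # of every pair of its own neighbors.
    for v in adj:
        for a, b in combinations(sorted(adj[v]), 2):
            prop[(a, b)] += 1

    # Conjugate propinquity, for each edge: the edge (a, b) lies between
    # two common neighbors of every pair drawn from N(a) ∩ N(b).
    for a, b in edges:
        for c, d in combinations(sorted(adj[a] & adj[b]), 2):
            prop[(c, d)] += 1

    return dict(prop)

# 4-clique on {1, 2, 3, 4} with a pendant vertex 5 attached to 4.
adj = {
    1: {2, 3, 4}, 2: {1, 3, 4}, 3: {1, 2, 4},
    4: {1, 2, 3, 5}, 5: {4},
}
prop = all_propinquities(adj)
```

Only pairs that actually receive a contribution appear in the result, so the work follows the local neighborhoods instead of ranging over all |V|^2 pairs.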

Incremental Propinquity Update. A single edge deletion or insertion is easy to handle. What if an overall topology update is considered? Let N_n(v) be the neighboring vertex set of v in topology T_n. All the following formulations will be mapped to operations on the three sets. Two operators are used. Given two disjoint vertex sets S_1, S_2 (S_1 ∩ S_2 = ∅), the first increases P(v_i, v_j) by a unit propinquity for each v_i ∈ S_1 and each v_j ∈ S_2. Given a vertex set S, the second adds a unit propinquity to P(v_i, v_j) for each v_i, v_j ∈ S (v_i ≠ v_j).
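
The two set operators can be illustrated with a short Python sketch. The operator symbols are lost in the transcript, so the names `add_cross` and `add_pairs` are hypothetical, as is the `unit` parameter (a negative value is assumed to express a decrease, for example when an edge is deleted).

```python
from collections import defaultdict
from itertools import combinations, product

def add_cross(prop, s1, s2, unit=1):
    """Add `unit` to P(vi, vj) for every vi in s1 and vj in s2 (disjoint sets)."""
    if s1 & s2:
        raise ValueError("s1 and s2 must be disjoint")
    for vi, vj in product(s1, s2):
        prop[frozenset((vi, vj))] += unit

def add_pairs(prop, s, unit=1):
    """Add `unit` to P(vi, vj) for every unordered pair of distinct vertices in s."""
    for vi, vj in combinations(s, 2):
        prop[frozenset((vi, vj))] += unit

# Propinquity map keyed by unordered vertex pairs.
prop = defaultdict(int)
add_cross(prop, {1, 2}, {3})      # pairs {1,3} and {2,3} gain one unit
add_pairs(prop, {1, 2, 3})        # all three pairs gain one unit
add_pairs(prop, {1, 2}, unit=-1)  # assumed way to undo a unit for pair {1,2}
```

Keying the map by `frozenset` keeps the pair unordered, so P(v_i, v_j) and P(v_j, v_i) always refer to the same entry.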

Update overview

Angle Propinquity Update. Angle propinquity w.r.t. a specific vertex v (omitted):

Conjugate Propinquity Update (D, I). It seems that we have covered all the cases… Really? Conjugate propinquity update brought by a… Deleted edge: Inserted edge:

Conjugate Propinquity Update (R). The answer is no! The conjugate propinquity contributed by a remaining edge may partly change because its local neighborhood has been updated. Consider the remaining common neighbor set between v_1 and v_2; the other sets can be defined similarly. This part of the propinquity update can be calculated by: Which can be further calculated by…

Conjugate Propinquity Update(R)

Incremental propinquity update(all)

Parallelization. Time complexity without the sparse-graph assumption: Real datasets are large and dense, and the degree distribution is highly skewed. So we need HPC's help…

Parallel Model (Vertex-oriented BSP). Virtual processor (vertex); physical machines execute the virtual processors; message passing. Bulk synchronous parallel (BSP) model: computation proceeds in consecutive supersteps. In each superstep a processor (1) accesses the messages sent to it in the previous superstep; (2) carries out local computation and accesses local memory; (3) sends messages to other processors, which become available to the destination processor in the next superstep. Barrier: dump both memory and messages for fault recovery.
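
The superstep structure can be imitated on a single machine with a short Python sketch. It is a toy simulation under stated assumptions, not the system used in the presentation: `run_bsp` and `min_label` are hypothetical names, the fault-recovery dump at the barrier is left out, and the demo vertex program (propagating the smallest vertex id) is chosen only because it is easy to check.

```python
def run_bsp(vertices, compute, max_supersteps=50):
    """Simulate vertex-oriented BSP supersteps sequentially.

    vertices: dict mapping vertex id -> local state.
    compute(vid, state, inbox, send) -> new state, where inbox holds the
    messages sent to vid in the previous superstep and send(dst, msg)
    queues a message for the next superstep.
    """
    inbox = {v: [] for v in vertices}
    for _ in range(max_supersteps):
        outbox = {v: [] for v in vertices}

        def send(dst, msg):
            outbox[dst].append(msg)

        # Every virtual processor runs once per superstep.
        for vid in vertices:
            vertices[vid] = compute(vid, vertices[vid], inbox[vid], send)

        # Barrier: messages become visible only in the next superstep.
        if not any(outbox.values()):
            break  # nothing left to deliver
        inbox = outbox
    return vertices

def min_label(vid, state, inbox, send):
    """Demo vertex program: adopt and forward the smallest label seen."""
    label, nbrs, started = state
    best = min([label] + inbox)
    if best < label or not started:
        for n in nbrs:
            send(n, best)
    return (best, nbrs, True)

# Path 1-2-3 and a separate edge 4-5; state = (label, neighbors, started).
edges = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
states = {v: (v, nbrs, False) for v, nbrs in edges.items()}
final = run_bsp(states, min_label)
labels = {v: s[0] for v, s in final.items()}
```

After the run every vertex carries the smallest id in its connected component, and the loop stops at the first superstep in which no messages are sent.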

Parallel Implementation. Message types: propinquity update; donate neighbors.

Parallel Implementation. We use C_XY to refer to the resulting set calculated from the operation at column X and row Y:

Performance issues. Message flow control: divide each macro superstep into micro supersteps; message size estimation and flow control strategy; propinquity map buffering.

Experiments. Dataset statistics: Different α and β for each network. Wikipedia: α = 400, β = 1000. eatRS: α = 5, β = 180. Erdos02: α = 2, β = 20. Hep-th-new: α = 20, β = 300.

Experiments Overlapping community structures mined from word association network eatRS.

Experiments Overlapping community structures mined from the Erdos02 co-authorship network.

Experiments Selected community structures mined from Wikipedia linkage graph.

Experiments. Speedups while running on the medium-sized Hep-th-new paper citation network. Speedups while running on the large-scale Wikipedia linkage graph.

Experiments. The effectiveness of the incremental propinquity update (Wikipedia linkage graph). Topology and propinquity evolve with each iteration.

Thank you for your attention