Efficient Identification of Overlapping Communities
Jeffrey Baumes, Mark Goldberg, Malik Magdon-Ismail
Rensselaer Polytechnic Institute, Troy, NY

Outline
– Communities as clusters
– What is a cluster?
– Cluster seed procedure (LA)
– Cluster refinement procedure (IS²)
– Experimental results
– Conclusions and future work

Communities as clusters
– Malicious groups use large communication networks for planning and coordination
– Their goal: remain undetected
– Our goal: sift through communications for suspicious patterns, using structure only, not content

Communities as clusters
– Detecting all social groups (malicious or not) will aid in searching for “hidden” groups
– Social groups tend to communicate densely
– Approach: find social groups by finding clusters in the graph of the communication network
[Figure: actors A and B joined by an edge (A communicates with B); example groups labeled “likely a social group” and “likely not a social group”; step label “Add external edges”.]

What is a cluster?
– Many partitioning algorithms exist
– Social groups often overlap
– Instead, define clusters as locally optimal with respect to density
[Figure: partitioning vs. overlapping clustering.]
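The slide does not state which density metric is used, so the following is only a minimal sketch under an assumed one: internal edges divided by internal plus boundary edges. The function names `density` and `is_locally_optimal`, the adjacency-dict representation, and the example graph are all illustrative assumptions. "Locally optimal" is read here as: no single-node addition or removal raises the density.

```python
def density(adj, cluster):
    """Assumed metric: internal edges / (internal + boundary edges)."""
    cluster = set(cluster)
    internal2 = 0  # internal edges, each seen once from both endpoints
    boundary = 0   # edges with exactly one endpoint in the cluster
    for u in cluster:
        for v in adj[u]:
            if v in cluster:
                internal2 += 1
            else:
                boundary += 1
    internal = internal2 // 2
    total = internal + boundary
    return internal / total if total else 0.0


def is_locally_optimal(adj, cluster):
    """True if no single-node addition or removal raises the density."""
    cluster = set(cluster)
    base = density(adj, cluster)
    for v in adj:
        candidate = cluster ^ {v}  # toggle membership of v
        if candidate and density(adj, candidate) > base:
            return False
    return True


# Hypothetical example: two triangles joined by the bridge c-d.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
print(density(adj, {"a", "b", "c"}))
print(is_locally_optimal(adj, {"a", "b", "c"}))
print(is_locally_optimal(adj, {"a", "b"}))
```

Because the definition is local, several different node sets can each be locally optimal, which is what allows clusters to overlap instead of forming a partition.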

Two-stage process
communication network → seed procedure → seed clusters → refinement procedure → final clusters

Original procedures
communication network → Rank Removal (RaRe) → seed clusters → Iterative Scan (IS) → final clusters
Reference: Jeffrey Baumes, Mark Goldberg, Mukkai Krishnamoorthy, Malik Magdon-Ismail, Nathan Preston. "Finding Communities by Clustering a Graph into Overlapping Subgraphs", International Conference on Applied Computing (IADIS 2005), Feb 22-25, Algarve, Portugal.

Proposed new procedures
communication network → Link Aggregate (LA) → seed clusters → Iterative Scan 2 (IS²) → final clusters

Link Aggregate (LA)
– Order the nodes (two routines are used)
– Pass through the nodes
  – For each node, add it to the clusters it improves, or start a new cluster
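The pass described above can be sketched as follows. This is an illustration, not the authors' implementation: the density metric (internal edges over internal plus boundary edges) is an assumption, and the node ordering is passed in by the caller because the slide does not say which two ordering routines are used. The names `density` and `link_aggregate` are hypothetical.

```python
def density(adj, cluster):
    """Assumed metric: internal edges / (internal + boundary edges)."""
    internal2 = sum(1 for u in cluster for v in adj[u] if v in cluster)
    boundary = sum(1 for u in cluster for v in adj[u] if v not in cluster)
    internal = internal2 // 2  # each internal edge was counted twice
    total = internal + boundary
    return internal / total if total else 0.0


def link_aggregate(adj, order):
    """One pass over `order`, building overlapping seed clusters."""
    clusters = []
    for v in order:
        added = False
        for c in clusters:
            # "Improves" is read as: adding v strictly raises the density.
            if density(adj, c | {v}) > density(adj, c):
                c.add(v)
                added = True
        if not added:
            clusters.append({v})  # v improves nothing: start a new cluster
    return clusters


# Hypothetical example: two triangles joined by the bridge c-d.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
seeds = link_aggregate(adj, list("abcdef"))
print(seeds)
```

A node may join several clusters in the same pass, so the seeds can overlap. The result depends on the ordering, which is presumably why the experiments compare orderings.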

LA procedure

Iterative Scan (IS)
– Old refinement procedure
  – Traverses the entire node list, adding or removing each node whose change increases the density
  – Repeats the process until no improvements are possible
– May be inefficient in sparse networks
– Guaranteed to be locally optimal
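A minimal sketch of this refinement loop, assuming the density metric is internal edges over internal plus boundary edges (the slide does not define it). The names `density` and `iterative_scan` and the example graph are illustrative only.

```python
def density(adj, cluster):
    """Assumed metric: internal edges / (internal + boundary edges)."""
    internal2 = sum(1 for u in cluster for v in adj[u] if v in cluster)
    boundary = sum(1 for u in cluster for v in adj[u] if v not in cluster)
    internal = internal2 // 2  # each internal edge was counted twice
    total = internal + boundary
    return internal / total if total else 0.0


def iterative_scan(adj, seed):
    """Refine `seed` by scanning the whole node list until nothing improves."""
    cluster = set(seed)
    best = density(adj, cluster)
    improved = True
    while improved:
        improved = False
        for v in adj:                  # entire node list, every pass
            candidate = cluster ^ {v}  # add v if absent, remove it if present
            if not candidate:
                continue               # never shrink to the empty cluster
            d = density(adj, candidate)
            if d > best:
                cluster, best = candidate, d
                improved = True
    return cluster


# Hypothetical example: two triangles joined by the bridge c-d.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
refined = iterative_scan(adj, {"a", "b"})
print(refined)
```

The loop stops only after a full pass in which no single-node change raises the density, which is the sense in which the result is locally optimal. Every pass touches all nodes, including those far from the cluster, and that is the cost the slide flags for sparse networks.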

Iterative Scan 2 (IS²)
– New refinement procedure
  – Traverses only the neighborhood of the cluster, adding or removing each node whose change increases the density
  – Repeats the process until no improvements are possible
– More efficient in sparse networks in spite of overhead; less efficient in dense networks
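The difference from the full scan can be sketched as below. This is a sketch under assumptions: the density metric (internal edges over internal plus boundary edges) is not given on the slide, and rebuilding the neighborhood once per pass is a simplification chosen for brevity; how the neighborhood is actually maintained is not stated here. The names `density` and `iterative_scan_2` are hypothetical.

```python
def density(adj, cluster):
    """Assumed metric: internal edges / (internal + boundary edges)."""
    internal2 = sum(1 for u in cluster for v in adj[u] if v in cluster)
    boundary = sum(1 for u in cluster for v in adj[u] if v not in cluster)
    internal = internal2 // 2  # each internal edge was counted twice
    total = internal + boundary
    return internal / total if total else 0.0


def iterative_scan_2(adj, seed):
    """Refine `seed`, testing only the cluster and its direct neighbors."""
    cluster = set(seed)
    best = density(adj, cluster)
    checks = 0  # number of candidate nodes examined, for comparison
    improved = True
    while improved:
        improved = False
        # Candidates: current members plus nodes adjacent to a member.
        neighborhood = set(cluster).union(*(adj[u] for u in cluster))
        for v in sorted(neighborhood):
            checks += 1
            candidate = cluster ^ {v}
            if not candidate:
                continue
            d = density(adj, candidate)
            if d > best:
                cluster, best, improved = candidate, d, True
    return cluster, checks


# Hypothetical example: two triangles joined by the bridge c-d.
adj = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
refined, checks = iterative_scan_2(adj, {"a", "b"})
print(refined, checks)
```

Building the neighborhood is the overhead the slide mentions. In a sparse graph the neighborhood is much smaller than the node list, so fewer candidates are examined per pass; in a dense graph it approaches the whole node list and the overhead is not repaid.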

IS² procedure

Experimental results
– Compare run time of new vs. old
– Compare cluster quality of new vs. old
– Compare on different network types
  – Random
  – Preferential attachment
  – Real-world
– Compare possible actor orderings for LA

RaRe vs. LA run time
[Figure: run-time plots; series: Original RaRe, New RaRe, LA.]

IS vs. IS² run time
– Define IS* = IS for dense graphs, IS² for sparse graphs

Old vs. new quality
[Figure: quality plots; series: New RaRe → IS, LA → IS².]

Preferential attachment
[Figure: plots; series: New RaRe → IS, LA → IS².]

Real-World Networks
– Ratio = new/old = (LA → IS*)/(RaRe → IS)
[Table not recovered; surviving labels: IS², IS, IS*.]

LA ordering

Conclusions and future work
– Overlapping clustering may be used to discover social groups in communication networks
– The new algorithm is more efficient in many cases, while keeping the same or better quality
– A unified algorithm should choose strategies and parameters based on network properties

Questions

Rank Removal
– Existing seed procedure
  – Removes highly connected nodes until the network is broken into small clusters
  – Adds each removed node back into the clusters it is well connected to
– Two main inefficiencies
  – Computed PageRank at each iteration
  – Computed connected components at each iteration
– PageRank could be computed once, but reprocessing connected components is crucial
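The remove-then-reattach structure can be sketched as below. This is a simplified illustration under several stated assumptions: degree in the remaining graph stands in for PageRank, one node is removed per iteration, and "well connected" is read as having at least `min_links` neighbors in a cluster. The names `components`, `rank_removal`, `max_size`, and `min_links` are hypothetical.

```python
def components(adj, alive):
    """Connected components of the subgraph induced by `alive` (DFS)."""
    seen, comps = set(), []
    for start in sorted(alive):
        if start in seen:
            continue
        comp, stack = set(), [start]
        seen.add(start)
        while stack:
            u = stack.pop()
            comp.add(u)
            for w in adj[u]:
                if w in alive and w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def rank_removal(adj, max_size, min_links):
    """Remove hubs until all components are small, then add hubs back."""
    alive, removed = set(adj), []
    while True:
        comps = components(adj, alive)  # recomputed at each iteration
        large = [c for c in comps if len(c) > max_size]
        if not large:
            break
        candidates = sorted(set().union(*large))
        # Assumption: degree in the remaining graph stands in for PageRank.
        hub = max(candidates, key=lambda v: len(adj[v] & alive))
        alive.remove(hub)
        removed.append(hub)
    cores = [frozenset(c) for c in comps]
    clusters = [set(c) for c in comps]
    for hub in removed:
        for core, cluster in zip(cores, clusters):
            # Assumed "well connected" test: enough neighbors in the core.
            if len(adj[hub] & core) >= min_links:
                cluster.add(hub)
    return clusters


# Hypothetical example: hub h links two triangles.
adj = {
    "h": {"a", "b", "d", "e"},
    "a": {"b", "c", "h"}, "b": {"a", "c", "h"}, "c": {"a", "b"},
    "d": {"e", "f", "h"}, "e": {"d", "f", "h"}, "f": {"d", "e"},
}
seeds = rank_removal(adj, max_size=3, min_links=2)
print(seeds)
```

A removed hub can be added back to more than one cluster, which is how the seeds come to overlap. The component computation inside the loop is the step the slide calls crucial, since each removal can split the graph differently.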

LA procedure detail

IS² procedure detail

RaRe vs. LA

IS vs. IS²

Run time RaRe vs. LA

Run time IS vs. IS²

Cluster quality

Preferential attachment run time

Preferential attachment quality

LA ordering run time

LA ordering quality