An Evaluation of Community Detection Algorithms on Large-Scale Traffic
Farnaz Moradi, Tomas Olovsson, Philippas Tsigas
Distributed Computing and Systems

Community
A community is a group of related nodes that
– are densely interconnected
– have fewer connections with the rest of the network
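
The definition above can be checked on a small graph by counting the edges that stay inside a candidate community and the edges that cross its boundary. The following is a minimal sketch; the function name `internal_external_edges` and the toy graph are illustrative assumptions, not part of the talk.

```python
def internal_external_edges(edges, community):
    """Count edges inside `community` and edges crossing its boundary."""
    members = set(community)
    # An edge is internal when both endpoints are members.
    internal = sum(1 for u, v in edges if u in members and v in members)
    # An edge is external when exactly one endpoint is a member.
    external = sum(1 for u, v in edges if (u in members) != (v in members))
    return internal, external


# Toy graph (hypothetical): two triangles joined by a single bridge edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("c", "d"),
         ("d", "e"), ("e", "f"), ("d", "f")]

internal, external = internal_external_edges(edges, {"a", "b", "c"})
print(internal, external)
```

A group with many internal edges and few external ones matches the informal definition; how large the gap must be is left open by the definition itself.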

Community Structure
Many real networks have community structure
– Social networks
– Web graph
– P2P networks
– Biological networks
– Email networks
Community detection aims at unfolding the logical communities using only the structural properties of the networks.
Figure: Zachary's Karate Club

– Separating legitimate (ham) and unsolicited (spam) email in a large-scale network generated from real email traffic.
– Assessing the quality of community detection algorithms in creating structural and logical communities.

Outline
Community detection algorithms
Quality functions
– Structural quality
– Logical quality
Experimental evaluation
– Real email traffic

Community Detection
– Hierarchical
– Overlapping
– Flat

Motivation: Experimental Evaluation
There is no consensus on which algorithm is most suitable for which type of network.
Experimental evaluation on synthetic graphs is not completely realistic [Delling et al. 2006]:
– Implicit dependencies exist between:
  – community detection algorithms
  – synthetic graph generators
  – quality functions used to assess the performance of the algorithms
Empirical studies on real-world networks are crucial.

Community Detection Algorithms
Blondel (Louvain method), [Blondel et al. 2008]
– Fast Modularity Optimization
– Hierarchical clustering
– Blondel L1: the first level of the clustering hierarchy
Infomap, [Rosvall & Bergstrom 2008]
– Maps of Random Walks
– Flow-based and information theoretic
InfoH (InfoHiermap), [Rosvall & Bergstrom 2011]
– Multilevel Compression of Random Walks
– Hierarchical version of Infomap

Community Detection Algorithms
RN, [Ronhovde & Nussinov 2009]
– Potts Model Community Detection
– Minimization of the Hamiltonian of a Potts model spin system
MCL, [Dongen 2000]
– Markov Clustering
– Random walks stay longer in dense clusters
LC, [Ahn et al. 2010]
– Link Community Detection
– A community is redefined as a set of closely interrelated edges
– Overlapping and hierarchical clustering
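
The MCL idea that random walks stay longer in dense clusters can be illustrated with a small, unoptimized sketch. It alternates expansion (squaring the random-walk matrix) and inflation (element-wise power followed by column normalization), which is the textbook form of Markov Clustering. The function name `mcl`, the parameter defaults, and the toy graph are assumptions for illustration; this is not the implementation evaluated in the talk.

```python
def mcl(n, edges, inflation=2.0, iterations=50, eps=1e-6):
    """Plain-Python Markov Clustering on an undirected graph with nodes 0..n-1."""
    # Adjacency matrix with self-loops, as is customary for MCL.
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, v in edges:
        m[u][v] = m[v][u] = 1.0

    def normalize(mat):
        # Rescale every column so that it sums to 1 (column-stochastic).
        for j in range(n):
            total = sum(mat[i][j] for i in range(n))
            for i in range(n):
                mat[i][j] /= total
        return mat

    m = normalize(m)
    for _ in range(iterations):
        # Expansion: two steps of the random walk (matrix squaring).
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: strengthen strong flows, weaken weak ones.
        m = normalize([[x ** inflation for x in row] for row in m])

    # Each row with surviving mass lists the nodes attracted to that row's node.
    clusters = {frozenset(j for j in range(n) if m[i][j] > eps) for i in range(n)}
    return sorted(sorted(c) for c in clusters if c)


# Toy graph (hypothetical): two triangles joined by one bridge edge.
bridged = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
print(mcl(6, bridged))
```

The dense matrix representation costs cubic time per iteration, so this form is only suitable for tiny examples; practical implementations use sparse matrices and pruning.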

Quality Functions
Quality functions are used to assess the quality of the algorithms when the true community structure of the network is not known.
There is no single perfect quality function [Almeida et al. 2011].
– Structural quality
– Logical quality

Structural Quality
– Coverage
– Modularity
– Conductance
  – Inter-cluster conductance
  – Average conductance
Overlapping clusterings:
– Community coverage
– Overlap coverage
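
The slide lists the structural measures without their formulas. The sketch below computes three of them for a flat, non-overlapping partition using commonly used definitions, which are assumptions here and may differ in detail from those used in the talk: coverage as the fraction of edges that fall inside communities, modularity as Q = Σ_c [m_c/m − (d_c/2m)²], and the conductance of a community as its cut size divided by the smaller of its volume and the volume of the rest of the graph. The function name `structural_quality` and the toy graph are illustrative.

```python
from collections import defaultdict


def structural_quality(edges, communities):
    """Return (coverage, modularity, per-community conductance)."""
    label = {node: i for i, comm in enumerate(communities) for node in comm}
    m = len(edges)
    intra = defaultdict(int)    # edges inside each community
    cut = defaultdict(int)      # edges leaving each community
    volume = defaultdict(int)   # sum of node degrees in each community
    for u, v in edges:
        cu, cv = label[u], label[v]
        volume[cu] += 1
        volume[cv] += 1
        if cu == cv:
            intra[cu] += 1
        else:
            cut[cu] += 1
            cut[cv] += 1

    ids = range(len(communities))
    coverage = sum(intra[c] for c in ids) / m
    modularity = sum(intra[c] / m - (volume[c] / (2 * m)) ** 2 for c in ids)
    conductance = []
    for c in ids:
        denom = min(volume[c], 2 * m - volume[c])
        conductance.append(cut[c] / denom if denom else 0.0)
    return coverage, modularity, conductance


# Toy graph (hypothetical): two triangles joined by one bridge edge.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("c", "d"),
         ("d", "e"), ("e", "f"), ("d", "f")]
communities = [{"a", "b", "c"}, {"d", "e", "f"}]

coverage, modularity, conductance = structural_quality(edges, communities)
print(coverage, modularity, conductance)
```

Averaging the per-community values gives one plausible reading of "average conductance". The overlapping measures (community coverage and overlap coverage) are not covered by this sketch.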

Logical Quality
We define the logical quality based on the type of the edges inside the communities.
– Homogeneous communities have perfect logical quality
– The percentage of homogeneous communities in a network can be used to assess the logical quality of the network.
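
The logical quality described above can be sketched as follows: each edge carries a type, a community is homogeneous when all of its internal edges share one type, and the score is the share of homogeneous communities. The function name `logical_quality`, the `"ham"`/`"spam"` labels as strings, and the choice to leave communities without internal edges out of the percentage are assumptions made for this example.

```python
def logical_quality(labelled_edges, communities):
    """Classify communities by internal edge type.

    Returns (classes, fraction), where classes[i] is 'ham', 'spam',
    'mixed' or 'empty', and fraction is the share of homogeneous
    communities among those with at least one internal edge.
    """
    label = {node: i for i, comm in enumerate(communities) for node in comm}
    kinds = [set() for _ in communities]
    for u, v, kind in labelled_edges:
        if label[u] == label[v]:          # only internal edges count
            kinds[label[u]].add(kind)

    classes = []
    for seen in kinds:
        if not seen:
            classes.append("empty")
        elif len(seen) == 1:
            classes.append(next(iter(seen)))
        else:
            classes.append("mixed")

    non_empty = [c for c in classes if c != "empty"]
    homogeneous = sum(1 for c in non_empty if c != "mixed")
    fraction = homogeneous / len(non_empty) if non_empty else 0.0
    return classes, fraction


# Toy data (hypothetical): one pure ham community, one mixed community.
labelled_edges = [("a", "b", "ham"), ("b", "c", "ham"), ("a", "c", "ham"),
                  ("c", "d", "spam"),
                  ("d", "e", "spam"), ("e", "f", "ham"), ("d", "f", "spam")]
communities = [{"a", "b", "c"}, {"d", "e", "f"}]

classes, fraction = logical_quality(labelled_edges, communities)
print(classes, fraction)
```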

Experimental Evaluation
Email traffic was collected on a 10 Gbps backbone link during 14 days.
Emails were classified as:
– Legitimate (ham)
– Unsolicited (spam)
Implicit social networks were created:
– Nodes: email addresses
– Edges: transmitted emails
Daily and weekly networks were studied:
– 14 daily networks
– 2 weekly networks
– 1 complete network
  – 1.6 million nodes and 2.8 million edges
Figure: OptoSUNET core network (SUNET customers, access routers, 2 core routers, 40 Gb/s and 10 Gb/s (x2) links, NORDUnet, main Internet)
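
The construction of daily, weekly, and complete networks can be sketched from a list of classified email records. The record layout `(day, sender, receiver, kind)`, the function names, and the decision to keep edges directed from sender to receiver are assumptions made for this example; the slides do not specify them.

```python
from collections import defaultdict


def build_networks(records, days_per_network=1):
    """Group email records into networks covering `days_per_network` days each.

    Each network is a set of (sender, receiver, kind) edges, so repeated
    emails between the same pair with the same label collapse to one edge.
    """
    networks = defaultdict(set)
    for day, sender, receiver, kind in records:
        bucket = (day - 1) // days_per_network   # days are numbered from 1
        networks[bucket].add((sender, receiver, kind))
    return dict(networks)


def nodes_of(network):
    """Return the set of email addresses appearing in a network."""
    return {addr for sender, receiver, _ in network for addr in (sender, receiver)}


# Toy records (hypothetical addresses and labels).
records = [
    (1, "a@x.example", "b@y.example", "ham"),
    (1, "s@z.example", "b@y.example", "spam"),
    (2, "a@x.example", "b@y.example", "ham"),
    (2, "s@z.example", "c@y.example", "spam"),
]

daily = build_networks(records)                 # one network per day
weekly = build_networks(records, 7)             # one network per week
complete = set().union(*daily.values())         # all days together
print(len(daily), len(weekly), len(complete), len(nodes_of(complete)))
```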

Experimental Results: Structural Quality
Community and overlap coverage are used for assessing the quality of LC.
Figure panels: Modularity, Average conductance, Inter-cluster conductance, Coverage

Experimental Results: Logical Quality
Figure: Comparison of the percentage of spam, ham, and mixed communities

Experimental Results: Logical Quality
Figure: The amount of spam and ham emails that have been separated by community detection algorithms

Summary
The algorithms that create coarse-grained communities achieve the best structural quality, but the worst logical quality.
– Blondel and InfoH
The algorithms that create communities with similar granularity achieve similar structural and logical quality.
– Blondel L1, MCL, and RN
The algorithm that creates communities based on the edges of the network achieves the best logical quality.
– LC

Conclusions
High structural quality achieved by community detection algorithms is not enough to unfold the true logical communities of the networks.
Link community detection is the most suitable approach for separating spam and ham emails into distinct communities.
It is necessary to deploy more realistic measures for clustering real-world networks.