An Evaluation of Community Detection Algorithms on Large-Scale Email Traffic

Presentation transcript:

An Evaluation of Community Detection Algorithms on Large-Scale Email Traffic
Farnaz Moradi, Tomas Olovsson, Philippas Tsigas
Distributed Computing and Systems

Community
A community is a group of related nodes that
– are densely interconnected
– have fewer connections with the rest of the network

Community Structure
Many real networks have community structure:
– Social networks
– Web graph
– P2P networks
– Biological networks
– Email networks
Community detection aims at unfolding the logical communities by using only the structural properties of the networks.
Figure: Zachary's Karate Club

Separating legitimate (ham) and unsolicited (spam) email in a large-scale email network generated from real email traffic.
Assessing the quality of community detection algorithms in creating structural and logical communities.

Outline
Community detection algorithms
Quality functions
– Structural quality
– Logical quality
Experimental evaluation
– Real email traffic

Community Detection
– Hierarchical
– Overlapping
– Flat

Motivation: Experimental Evaluation
No consensus on which algorithm is more suitable for which type of network.
Experimental evaluation on synthetic graphs is not completely realistic [Delling et al. 2006]:
– Implicit dependencies between community detection algorithms, synthetic graph generators, and the quality functions used to assess the performance of the algorithms
Empirical studies on real-world networks are crucial.

Community Detection Algorithms
Blondel (Louvain method) [Blondel et al. 2008]
– Fast modularity optimization
– Hierarchical clustering
– Blondel L1: the first level of the clustering hierarchy
Infomap [Rosvall & Bergstrom 2008]
– Maps of random walks
– Flow-based and information-theoretic
InfoH (InfoHiermap) [Rosvall & Bergstrom 2011]
– Multilevel compression of random walks
– Hierarchical version of Infomap

Community Detection Algorithms
RN [Ronhovde & Nussinov 2009]
– Potts model community detection
– Minimization of the Hamiltonian of a Potts model spin system
MCL [Dongen 2000]
– Markov clustering
– Random walks stay longer in dense clusters
LC [Ahn et al. 2010]
– Link community detection
– A community is redefined as a set of closely interrelated edges
– Overlapping and hierarchical clustering
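The link-community idea (LC) can be illustrated with a small sketch. It assumes the edge similarity commonly attributed to Ahn et al.: for two edges that share a node, the Jaccard overlap of the inclusive neighbourhoods of their two other endpoints. Edges are then merged by single linkage. The function name `link_communities` and the fixed `threshold` are hypothetical simplifications for illustration (the published method chooses the cut level of the hierarchy differently), and the input is assumed to have no self-loops or duplicate edges.

```python
from itertools import combinations

def link_communities(edges, threshold):
    """Group edges whose similarity is >= threshold; return node sets per edge group."""
    edges = [tuple(sorted(e)) for e in edges]

    # Inclusive neighbourhood: the node itself plus its neighbours.
    inclusive = {}
    for u, v in edges:
        inclusive.setdefault(u, {u}).add(v)
        inclusive.setdefault(v, {v}).add(u)

    # Union-find over edges implements single-linkage merging.
    parent = {e: e for e in edges}

    def find(e):
        while parent[e] != e:
            parent[e] = parent[parent[e]]
            e = parent[e]
        return e

    for e1, e2 in combinations(edges, 2):
        shared = set(e1) & set(e2)
        if len(shared) != 1:
            continue  # only edges sharing exactly one node are compared
        (k,) = shared
        i = e1[0] if e1[1] == k else e1[1]
        j = e2[0] if e2[1] == k else e2[1]
        sim = len(inclusive[i] & inclusive[j]) / len(inclusive[i] | inclusive[j])
        if sim >= threshold:
            parent[find(e1)] = find(e2)

    # A node belongs to every community that contains one of its edges.
    groups = {}
    for e in edges:
        groups.setdefault(find(e), set()).update(e)
    return sorted(sorted(nodes) for nodes in groups.values())

# Two triangles sharing node 2: node 2 ends up in both communities.
bowtie = [(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (3, 4)]
communities = link_communities(bowtie, threshold=0.5)
print(communities)
```

Because communities are sets of edges, a node such as 2 in the example can appear in several communities, which is how this approach produces overlapping clusterings.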

Quality Functions
Used to assess the quality of the algorithms when the true community structure of the network is not known.
There is no single perfect quality function [Almeida et al. 2011].
– Structural quality
– Logical quality

Structural Quality
Coverage
Modularity
Conductance
– Inter-cluster conductance
– Average conductance
Overlapping clusterings
– Community coverage
– Overlap coverage
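The slides name these measures without giving formulas. The sketch below uses common textbook definitions for a flat (non-overlapping) partition of an undirected, unweighted graph: coverage as the fraction of edges inside communities, Newman-Girvan modularity, and per-community conductance as cut size over the smaller of the two volumes. The exact variants used in the talk (for example inter-cluster and average conductance) may be defined differently, and the function name `structural_quality` is hypothetical.

```python
def structural_quality(edges, community_of):
    """edges: list of undirected (u, v) pairs; community_of: node -> community id."""
    m = len(edges)
    intra = {}   # community -> number of internal edges
    volume = {}  # community -> sum of degrees of its nodes
    cut = {}     # community -> number of edges leaving it
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        volume[cu] = volume.get(cu, 0) + 1
        volume[cv] = volume.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
        else:
            cut[cu] = cut.get(cu, 0) + 1
            cut[cv] = cut.get(cv, 0) + 1

    # Coverage: share of all edges that fall inside some community.
    coverage = sum(intra.values()) / m

    # Modularity: internal edge fraction minus its expectation under random wiring.
    modularity = sum(
        intra.get(c, 0) / m - (vol / (2 * m)) ** 2 for c, vol in volume.items()
    )

    # Conductance of each community: cut edges over the smaller side's volume.
    conductance = {}
    for c, vol in volume.items():
        denom = min(vol, 2 * m - vol)
        conductance[c] = cut.get(c, 0) / denom if denom else 0.0
    return coverage, modularity, conductance

# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
community_of = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
coverage, modularity, conductance = structural_quality(edges, community_of)
print(coverage, modularity, conductance)
```

In this toy graph six of the seven edges are internal, so coverage is 6/7, and each community has a single cut edge against a volume of 7.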

Logical Quality
We define the logical quality based on the type of the edges inside the communities.
– Homogeneous communities have perfect logical quality.
– The percentage of homogeneous communities in a network can be used to assess the logical quality of the network.
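As a minimal sketch of this measure, assume every edge carries a type ("ham" or "spam") and that a community is homogeneous when all edges with both endpoints inside it share one type. The function name `logical_quality`, the data layout, and the choice to ignore edges that cross communities are assumptions made for illustration.

```python
def logical_quality(edge_types, community_of):
    """edge_types: dict (u, v) -> "ham" or "spam"; community_of: node -> community id."""
    kinds_in = {}  # community -> set of edge types observed inside it
    for (u, v), kind in edge_types.items():
        if community_of[u] == community_of[v]:
            kinds_in.setdefault(community_of[u], set()).add(kind)

    # Label each community "ham", "spam", or "mix".
    labels = {
        c: (next(iter(kinds)) if len(kinds) == 1 else "mix")
        for c, kinds in kinds_in.items()
    }
    if not labels:
        return labels, 0.0
    homogeneous = sum(1 for label in labels.values() if label != "mix")
    return labels, homogeneous / len(labels)

edge_types = {
    ("a", "b"): "ham", ("b", "c"): "ham",    # community 1: ham only
    ("s", "t"): "spam",                      # community 2: spam only
    ("x", "y"): "ham", ("y", "z"): "spam",   # community 3: mixed
    ("c", "s"): "spam",                      # crosses communities, ignored here
}
community_of = {"a": 1, "b": 1, "c": 1, "s": 2, "t": 2, "x": 3, "y": 3, "z": 3}
labels, share = logical_quality(edge_types, community_of)
print(labels, share)
```

Here two of the three communities are homogeneous, so the share of homogeneous communities is 2/3.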

Experimental Evaluation
Email traffic was collected on a 10 Gbps backbone link during 14 days.
Emails were classified as:
– Legitimate (ham)
– Unsolicited (spam)
Implicit social networks were created:
– Nodes: email addresses
– Edges: transmitted emails
Daily and weekly networks were studied:
– 14 daily networks
– 2 weekly networks
– 1 complete network: 1.6 million nodes and 2.8 million edges
Figure: OptoSUNET core network (SUNET customers, access routers, 2 core routers, 40 Gb/s and 10 Gb/s (x2) links, NORDUnet, main Internet)
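The construction of the implicit social network can be sketched as follows, assuming each observed email is reduced to a (sender, receiver, label) record. Treating the graph as undirected, dropping self-addressed emails, and the function name `build_email_network` are assumptions for illustration; the talk does not state these details.

```python
from collections import defaultdict

def build_email_network(emails):
    """Build an undirected graph from (sender, receiver, label) records.

    Returns (adjacency, edge_labels): adjacency maps each address to its
    neighbours; edge_labels maps each undirected edge to the labels seen on it.
    """
    adjacency = defaultdict(set)
    edge_labels = defaultdict(set)
    for sender, receiver, label in emails:
        if sender == receiver:
            continue  # assumption: self-addressed emails carry no social link
        edge = frozenset((sender, receiver))
        adjacency[sender].add(receiver)
        adjacency[receiver].add(sender)
        edge_labels[edge].add(label)
    return dict(adjacency), dict(edge_labels)

# Hypothetical records; the addresses are made up.
emails = [
    ("a@x.org", "b@x.org", "ham"),
    ("b@x.org", "a@x.org", "ham"),
    ("s@spam.biz", "a@x.org", "spam"),
    ("s@spam.biz", "c@y.org", "spam"),
]
adjacency, edge_labels = build_email_network(emails)
print(len(adjacency), "nodes,", len(edge_labels), "edges")
```

Keeping the labels per edge is what later allows communities to be judged as ham, spam, or mixed.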

Experimental Results: Structural Quality
Community and overlap coverage are used for assessing the quality of LC.
Figure panels: modularity, average conductance, inter-cluster conductance, coverage

Experimental Results: Logical Quality
Comparison of the percentage of spam, ham, and mix communities

Experimental Results: Logical Quality
The amount of spam and ham emails that have been separated by community detection algorithms

Summary
The algorithms that create coarse-grained communities achieve the best structural quality but the worst logical quality.
– Blondel and InfoH
The algorithms that create communities with similar granularity achieve similar structural and logical quality.
– Blondel L1, MCL, and RN
The algorithm that creates communities based on the edges of the network achieves the best logical quality.
– LC

Conclusions
High structural quality alone is not enough for community detection algorithms to unfold the true logical communities of the networks.
Link community detection is the most suitable approach for separating spam and ham emails into distinct communities.
It is necessary to deploy more realistic measures for clustering real-world networks.