Information Flow using Edge Stress Factor Communities Extraction from Graphs Implied by an Instant Messages Corpus Franco Salvetti University of Colorado.

Slides:



Advertisements
Similar presentations
CSE 211 Discrete Mathematics
Advertisements

BURSTY SUBGRAPHS IN SOCIAL NETWORKS. Introduction 2.
Community Detection and Graph-based Clustering
Social network partition Presenter: Xiaofei Cao Partick Berg.
Online Social Networks and Media. Graph partitioning The general problem – Input: a graph G=(V,E) edge (u,v) denotes similarity between u and v weighted.
Social Networks 101 P ROF. J ASON H ARTLINE AND P ROF. N ICOLE I MMORLICA.
Lectures on Network Flows
Author: Jie chen and Yousef Saad IEEE transactions of knowledge and data engineering.
Graph & BFS.
Chapter 9 Graph algorithms. Sample Graph Problems Path problems. Connectedness problems. Spanning tree problems.
Chapter 9 Graph algorithms Lec 21 Dec 1, Sample Graph Problems Path problems. Connectedness problems. Spanning tree problems.
A scalable multilevel algorithm for community structure detection
DIDS part II The Return of dIDS 2/12 CIS GrIDS Graph based intrusion detection system for large networks. Analyzes network activity on networks.
Chapter 9: Graphs Basic Concepts
The Shortest Path Problem
Clustering Software Artifacts Based on Frequent common changes Presented by: Ashgan Fararooy Prepared by: Haroon Malik (Modified)
GDG DevFest Central Italy Joint work with J. Feldman, S. Lattanzi, V. Mirrokni (Google Research), S. Leonardi (Sapienza U. Rome), H. Lynch (Google)
Guillaume Erétéo, Michel Buffa, Fabien Gandon, Olivier Corby.
Approximating the MST Weight in Sublinear Time Bernard Chazelle (Princeton) Ronitt Rubinfeld (NEC) Luca Trevisan (U.C. Berkeley)
CS492: Special Topics on Distributed Algorithms and Systems Fall 2008 Lab 3: Final Term Project.
CSV: Visualizing and Mining Cohesive Subgraphs Nan Wang Srinivasan Parthasarathy Kian-Lee Tan Anthony K. H. Tung School of Computing National University.
Homework 2. Problem 1 Families 1…..N go out for dinner together. To increase their social interaction, no two members of the same family use the same.
Outlier Detection Using k-Nearest Neighbour Graph Ville Hautamäki, Ismo Kärkkäinen and Pasi Fränti Department of Computer Science University of Joensuu,
Computer Science 112 Fundamentals of Programming II Introduction to Graphs.
Finding dense components in weighted graphs Paul Horn
Efficient Identification of Overlapping Communities Jeffrey Baumes Mark Goldberg Malik Magdon-Ismail Rensselaer Polytechnic Institute, Troy, NY.
Terminodes and Sybil: Public-key management in MANET Dave MacCallum (Brendon Stanton) Apr. 9, 2004.
Automated Social Hierarchy Detection through Network Analysis (SNAKDD07) Ryan Rowe, Germ´an Creamer, Shlomo Hershkop, Salvatore J Stolfo 1 Advisor:
Greedy Approximation Algorithms for finding Dense Components in a Graph Paper by Moses Charikar Presentation by Paul Horn.
Boundary Recognition in Sensor Networks by Topology Methods Yue Wang, Jie Gao Dept. of Computer Science Stony Brook University Stony Brook, NY Joseph S.B.
A Graph-based Friend Recommendation System Using Genetic Algorithm
Based on slides by Y. Peng University of Maryland
Minimum Spanning Trees Prof. Sin-Min Lee Dept. of Computer Science, San Jose State University.
Chapter 3. Community Detection and Evaluation May 2013 Youn-Hee Han
Lecture 19 Greedy Algorithms Minimum Spanning Tree Problem.
Random Graph Generator University of CS 8910 – Final Research Project Presentation Professor: Dr. Zhu Presented: December 8, 2010 By: Hanh Tran.
Exchange Deployment Planning Services Exchange 2010 Complementary Products.
Topics Paths and Circuits (11.2) A B C D E F G.
Social Network Analysis. Outline l Background of social networks –Definition, examples and properties l Data in social networks –Data creation, flow and.
Measuring Behavioral Trust in Social Networks
University at BuffaloThe State University of New York Detecting Community Structure in Networks.
Data Structures and Algorithms in Parallel Computing
Trees Dr. Yasir Ali. A graph is called a tree if, and only if, it is circuit-free and connected. A graph is called a forest if, and only if, it is circuit-free.
Selected Topics in Data Networking Explore Social Networks: Center and Periphery.
Network Theory: Community Detection Dr. Henry Hexmoor Department of Computer Science Southern Illinois University Carbondale.
Network Partition –Finding modules of the network. Graph Clustering –Partition graphs according to the connectivity. –Nodes within a cluster is highly.
Selected Topics in Data Networking Explore Social Networks:
James Hipp Senior, Clemson University.  Graph Representation G = (V, E) V = Set of Vertices E = Set of Edges  Adjacency Matrix  No Self-Inclusion (i.
Importance Measures on Nodes Lecture 2 Srinivasan Parthasarathy 1.
Community Detection based on Distance Dynamics Reporter: Yi Liu Student ID: Department of Computer Science and Engineering Shanghai Jiao Tong.
Presented by: Siddhant Kulkarni Spring Authors: Publication:  ICDE 2015 Type:  Research Paper 2.
GUILLOU Frederic. Outline Introduction Motivations The basic recommendation system First phase : semantic similarities Second phase : communities Application.
Mining Coherent Dense Subgraphs across Multiple Biological Networks Vahid Mirjalili CSE 891.
Groups of vertices and Core-periphery structure
Distributed voting application for handheld devices
Approximating the MST Weight in Sublinear Time
Greedy Algorithm for Community Detection
June 2017 High Density Clusters.
Lectures on Network Flows
Special Graphs: Modeling and Algorithms
Community detection in graphs
Dieudo Mulamba November 2017
CS120 Graphs.
Michael L. Nelson CS 495/595 Old Dominion University
Chapter 9: Graphs Basic Concepts
Connectivity Section 10.4.
University of Washington, Autumn 2018
Chapter 9: Graphs Basic Concepts
GRAPHS.
Maximum Flow Problems in 2005.
Presentation transcript:

Information Flow using Edge Stress Factor Communities Extraction from Graphs Implied by an Instant Messages Corpus Franco Salvetti University of Colorado at Boulder 430 UCB Boulder, CO USA Savitha Srinivasan IBM Almaden Research Center 650 Harry Road San Jose, CA USA

This research shows how a corpus of instant messages can be employed to detect de facto communities of practice automatically. A novel algorithm based on the concept of Edge Stress Factor is proposed and validated. Results show that this approach is fast and effective in studying collaborative behavior. Abstract

Instant Messages & Communities  People frequently collaborate with others who are not part of their “official” team, and organize in spontaneous communities.  Corporate users have discovered that instant messaging helps them exchange small but often critical details; hence, a corpus of instant message logs (MLog) contains enough information to become a useful asset that can be exploited for community detection.  Community detection solutions based on “official” information are in general incomplete and not cost effective.

The graph implicitly defined by an MLog, where a node is a person and an edge represents an instant message exchange, is expected to have community structure; sets of nodes densely connected internally, but with lower density of external links. Community Graph

 k-core: a subgraph Q with n vertices is a k-core if any internal node is adjacent to at least k vertices in Q.  k-core-factor: the k-core-factor of a subgraph Q is the maximum k for which Q is a k-core.  community: A subgraph Q with n vertices and a certain k ‑ core ‑ factor is accepted as a community if k is greater or equal to  ∙(n-1) with  in [0,1]. What is a Community?

The Idea  The goal is to remove edges iteratively in the graph G implied by an MLog and to detect communities arising from this process. Consider the edge (a, b) in G and all the edges (k i, a) and (h j, b) such that k i is connected with a but not with b, and h j is connected with b but not with a.  Metaphorically a source nodes h j is trying to communicate a certain amount of information w h j,b to the destination a, and to do so it “stresses” the edge (b, a) with “capacity” w a,b, by passing through the mediator b.

Edge Stress Factor in Information Flow kmkm ab hnhn w a,b w k 1,a w k m,a w h 1,b w h n,b k1k1 h1h1 c The Edge Stress Factor (ESF) is a local version of the Betweenness Centrality measure which captures the degree of influence a node has over the flow of information in a network.

 The first contains 350 corporate instant messages contributed by 100 users.  The second ~220,000 of IBM’s global intranet instant messages produced in about two hours by ~100,000 employees. Datasets

 The IM corpus becomes an undirected graph.  The goal is to “cut” the edges between communities to isolate them.  Each edge carries a weight; the number of messages exchanged between the two persons.  The edge with the larger ESF is cut.  Extract any connected component which matches the definition of community.  The ESFs on the “new” graph is recomputed. Clustering with the ESF

 Subject matter experts organized the users in the first dataset into 5 distinct communities, then compared them with the algorithm’s results.  For the second dataset after the aggregation of the data by country, division and work location, the experts judged the appropriateness of the extracted communities.  The communities detected in the first dataset and in the different aggregations of the second were meaningful. Validation

 Users’ connectivity graph aggregated at the division level.  The algorithm with  =0.7 identifies the four communities (A, B, C, D) which correctly represent Sales, Integrated Supply Chain, Server Brand Management, and Storage Technology. Results

A novel algorithm based on computing an Edge Stress Factor has been proposed to extract communities from graphs implied by an instant messages corpus. The algorithm leverages an idea of local information flow, which is formally modeled as the Edge Stress Factor. The locality of the algorithm allows the use of weights in the graph and reduces computational cost. Results have been successfully validated against real data. Conclusion