Discovering Important Nodes through Graph Entropy Jitesh Shetty, Jafar Adibi [KDD ’05] Advisor: Dr. Koh Jia-Ling Reporter: Che-Wei, Liang Date: 2008/09/18
Outline Introduction Order In Networks Graph Entropy Experimental Result Conclusions
Introduction A new challenge in the area of Link Discovery and Social Network Analysis: exploiting communication-pattern information and text information within knowledge discovery processes, such as the discovery of hidden organizational structure and the selection of interesting, prominent members.
Introduction Email logs: of prime importance and relevance in the study of information flow in an organization; an evidence database that law enforcement and intelligence organizations can use to detect hidden groups in an organization that are engaged in illegal activities. Graph entropy: used to determine the most prominent, interesting people.
Order In Networks A graph model might not be the best representation of organizations such as drug dealers, terrorist organizations, and threat groups. Graph models usually ignore hierarchy, yet these organizations are composed of leaders and followers.
Order In Networks Example
Graph Entropy (1/6) To find prominent people in a network, we need to aggregate the links between them and discover which node has the most effect on the network. An entropy model can identify the entity that has the most effect on the graph entropy. Transform the problem space into a multigraph: each node represents an entity, and each link represents an action between entities.
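The multigraph transformation can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `emails` list, the `build_multigraph` helper, and the choice to store parallel links as counts are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical email log: one (sender, recipient) pair per message.
emails = [
    ("A", "B"),
    ("A", "B"),
    ("B", "C"),
    ("C", "A"),
]

def build_multigraph(events):
    """Return (nodes, edge_counts).

    edge_counts[(u, v)] is the number of parallel u -> v links,
    i.e. how many times entity u acted on entity v.
    """
    edge_counts = Counter(events)
    nodes = {n for pair in events for n in pair}
    return nodes, edge_counts

nodes, edge_counts = build_multigraph(emails)
```

Storing multiplicities instead of individual edges keeps the sketch small; a fuller version would likely keep per-message attributes such as timestamps and text.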
Graph Entropy (2/6)
Graph Entropy (3/6) Let G = (V, E) be a graph, and let P be a probability distribution on the vertex set V(G). P(AemailB) =
Graph Entropy (4/6) A great concern in the LD domain is that elements of data are not independent. Ex: the link A-sends-email-to-B and the link B-sends-email-to-C depend on each other, because B may forward A's email to C. Three approaches to discover dependency: Examine the similarity of emails check
Graph Entropy (5/6) 3. Exploitation of a Markov Blanket type of model: assume an event (link) between two nodes depends only on the events of those two nodes.
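The Markov Blanket style assumption can be illustrated with a small sketch: for a given link, only the events that touch one of its two endpoints are treated as relevant. The `dependent_events` helper and the sample `events` are hypothetical, and this is one simple reading of the assumption rather than the model used in the paper.

```python
def dependent_events(events, link):
    """Return the events assumed relevant to `link` = (u, v).

    Under the locality assumption, these are all other events that
    involve u or v; copies of the link itself are excluded.
    """
    u, v = link
    return [e for e in events if e != link and (u in e or v in e)]

# Hypothetical events: (sender, recipient) pairs.
events = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]
neighbors = dependent_events(events, ("A", "B"))
```

Here the link between C and D is ignored for the link between A and B, because it involves neither endpoint.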
Graph Entropy (6/6)
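The idea of ranking nodes by their effect on graph entropy can be sketched as below. The slides do not give the exact formula, so this sketch makes two assumptions: the probability of a link is its relative frequency among all links, and the entropy is the Shannon entropy of that distribution. The functions `graph_entropy`, `remove_node`, and `rank_by_entropy_effect` are illustrative names, not taken from the paper.

```python
import math
from collections import Counter

def graph_entropy(edge_counts):
    """Shannon entropy (in bits) of the link-frequency distribution (assumed model)."""
    total = sum(edge_counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in edge_counts.values())

def remove_node(edge_counts, node):
    """Drop every link that has `node` as sender or recipient."""
    return Counter({(u, v): c for (u, v), c in edge_counts.items() if node not in (u, v)})

def rank_by_entropy_effect(edge_counts):
    """Sort nodes by how much removing them changes the graph entropy."""
    base = graph_entropy(edge_counts)
    nodes = {n for edge in edge_counts for n in edge}
    effect = {n: abs(base - graph_entropy(remove_node(edge_counts, n))) for n in nodes}
    return sorted(effect.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical link counts: (sender, recipient) -> number of emails.
edge_counts = Counter({("A", "B"): 2, ("B", "C"): 1, ("C", "A"): 1, ("D", "A"): 1})
ranking = rank_by_entropy_effect(edge_counts)
```

In this toy graph, removing A changes the entropy the most, so A is ranked first. The absolute difference is used so that both increases and decreases in entropy count as an effect.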
Experiment Enron Email Dataset: 151 users, mostly senior management of Enron. Contains 252,759 email messages. Almost all users use folders to organize their emails.
Experiment
Experiment Created an Enron dictionary. Normalized all emails using the Porter stemming algorithm. Compared the email vectors using the Jaccard similarity coefficient. Ordered emails by timestamp.
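The comparison and ordering steps can be sketched as follows. This is a simplified illustration under stated assumptions: emails are reduced to lowercase word sets rather than dictionary-based vectors, Porter stemming is omitted to keep the example dependency-free, and the sample `emails` and the helpers `tokenize` and `jaccard` are hypothetical.

```python
import re
from datetime import datetime

def tokenize(text):
    """Lowercase word set; a fuller pipeline would also apply Porter stemming."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Jaccard coefficient |A & B| / |A | B|, defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical emails: (timestamp, sender, recipient, body).
emails = [
    (datetime(2001, 5, 2, 9, 30), "B", "C", "FW: quarterly report attached"),
    (datetime(2001, 5, 2, 9, 0), "A", "B", "quarterly report attached"),
]

# Order by timestamp so that an earlier email can be matched to a later forward.
emails.sort(key=lambda e: e[0])
first, second = emails
similarity = jaccard(tokenize(first[3]), tokenize(second[3]))
```

A high similarity between an earlier email from A to B and a later email from B to C is the kind of signal that could suggest the second link depends on the first, for example through forwarding.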
Experiment
Conclusions Defined and addressed the problem of finding important nodes and the closed groups around them. Used event-based entropy to find influential nodes in a graph and showed that the entropy model can act as a good means of detecting influential nodes.