Automated Social Hierarchy Detection through Email Network Analysis (SNAKDD07) Ryan Rowe, Germán Creamer, Shlomo Hershkop, Salvatore J Stolfo


Automated Social Hierarchy Detection through Email Network Analysis (SNAKDD07)
Ryan Rowe, Germán Creamer, Shlomo Hershkop, Salvatore J Stolfo
Advisor: Dr. Koh Jia-Ling
Reporter: Che-Wei, Liang
Date: 2008/12/11

Outline
Introduction
SNA algorithm
Results and Discussion
Conclusions and Future Work

Introduction
The recent bankruptcy scandals in US companies such as Enron and WorldCom have increased the need to analyze electronic information
  – In order to define risk and identify any conflict of interest among the entities of a corporate household
Identifying the relationships between entities, or the corporate hierarchy, is not a straightforward task
  – It can be extracted by analyzing the communication data

SNA Algorithm
For each mail user
  – Analyze and calculate several statistics for each feature of each user
Construct an email network graph
  – Vertices represent accounts, edges represent communication between two accounts
  – Analyze cliques and other graph-theoretical qualities
  – Combine the results into a social score

SNA Algorithm
Two sets of statistics about a user's “importance”
  – Average response time
      The average time elapsed between a user sending an email to a correspondent and later receiving an email from that same correspondent
      A received mail is considered a “response” if it follows a sent mail within three days
  – Cliques (maximal complete subgraphs)
      Find all cliques in the graph
      Assumption: users associated with a larger set and frequency of cliques will be ranked higher
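The average response time described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `(peer, timestamp)` tuple layout and the choice to pair each sent mail with the earliest later mail from the same peer are assumptions; only the three-day window comes from the slide.

```python
from datetime import datetime, timedelta

RESPONSE_WINDOW = timedelta(days=3)  # three-day window stated on the slide


def average_response_time(sent, received, window=RESPONSE_WINDOW):
    """Mean delay between a sent mail and the first later mail from the same peer.

    sent, received: lists of (peer, timestamp) tuples for one user
    (hypothetical layout chosen for this sketch).
    Returns a timedelta, or None when no reply falls inside the window.
    """
    delays = []
    for peer, t_sent in sent:
        # Candidate replies: mails from the same peer that arrive after t_sent.
        replies = [t for p, t in received if p == peer and t > t_sent]
        if replies:
            delay = min(replies) - t_sent
            if delay <= window:  # only count it as a "response" within the window
                delays.append(delay)
    if not delays:
        return None
    return sum(delays, timedelta()) / len(delays)
```

A reply that arrives after the window is simply ignored here, so a user with only late replies has no response time at all rather than a very large one.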

Cliques [figure]

Communication Networks
Number of cliques
  – The number of cliques that contain the account
Raw clique score
  – A score computed using the size of the clique set
Weighted clique score
  – A score computed using the “importance” of the people in each clique
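The three clique statistics above could be computed along these lines. The slide does not give the scoring formulas, so the ones below are assumptions made for illustration: the raw score sums 2^(n-1) over the cliques of size n that contain the user, and the weighted score multiplies each term by the mean importance of the clique's members. The `min_size` cutoff and the `importance` mapping are likewise hypothetical. Clique enumeration uses the basic Bron-Kerbosch recursion.

```python
def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion.

    adj: dict mapping each vertex to the set of its neighbours (undirected).
    """
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)  # r cannot be extended, so it is maximal
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def clique_scores(adj, importance, min_size=2):
    """Per-user clique count, raw score and weighted score (assumed formulas)."""
    cliques = [c for c in maximal_cliques(adj) if len(c) >= min_size]
    stats = {}
    for v in adj:
        own = [c for c in cliques if v in c]
        # Assumed raw score: larger cliques count exponentially more.
        raw = sum(2 ** (len(c) - 1) for c in own)
        # Assumed weighted score: scale each clique by its members' mean importance.
        weighted = sum(
            2 ** (len(c) - 1) * sum(importance[u] for u in c) / len(c) for c in own
        )
        stats[v] = {"count": len(own), "raw": raw, "weighted": weighted}
    return stats
```

For a triangle a-b-c with an extra edge c-d, vertex c sits in two maximal cliques, so it collects the highest count and raw score.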

Communication Networks
Degree centrality
  – $\deg(v_i) = \sum_j a_{ij}$, where $a_{ij}$ is an entry of the adjacency matrix A of G
Clustering coefficient
  – How close the vertex and its neighbors are to being a clique
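Both measures can be read directly off the adjacency matrix. The sketch below assumes an undirected, unweighted graph stored as a list of 0/1 rows, and uses the usual local clustering coefficient (the fraction of neighbour pairs that are themselves connected), which is one common reading of "how close the neighbors are to being a clique".

```python
from itertools import combinations


def degree(A, i):
    """Degree of vertex i: the row sum of the 0/1 adjacency matrix A."""
    return sum(A[i])


def clustering_coefficient(A, i):
    """Fraction of pairs of i's neighbours that are directly connected."""
    nbrs = [j for j, a in enumerate(A[i]) if a and j != i]
    k = len(nbrs)
    if k < 2:
        return 0.0  # fewer than two neighbours: no pairs to connect
    links = sum(A[u][w] for u, w in combinations(nbrs, 2))
    return 2.0 * links / (k * (k - 1))


# Example graph: triangle 0-1-2 plus a pendant vertex 3 attached to 2.
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
```

In this example vertex 2 has the highest degree but a lower clustering coefficient than vertices 0 and 1, because its neighbour 3 is not connected to the others.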

Communication Networks
Mean shortest path length from a specific vertex to all vertices in the graph G
  – The average of $d_{ij}$ over all vertices $v_j$, where $d_{ij} \in D$ and D is the geodesic distance matrix of G
Betweenness centrality
  – Proportion of all geodesic paths between the other vertices that include vertex $v_i$
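A small sketch of both measures for an unweighted, undirected graph. The slide's exact normalisation is not recoverable, so two assumptions are made here: the mean shortest path averages over the other reachable vertices, and betweenness is divided by the number of vertex pairs that exclude the vertex in question. Betweenness uses Brandes' accumulation scheme, which is a standard technique rather than something the slides name.

```python
from collections import deque


def bfs_distances(adj, s):
    """Geodesic (hop-count) distances from s to every reachable vertex."""
    dist = {s: 0}
    queue = deque([s])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def mean_shortest_path(adj, s):
    """Average distance from s to the other reachable vertices (assumed normalisation)."""
    others = [d for v, d in bfs_distances(adj, s).items() if v != s]
    return sum(others) / len(others) if others else 0.0


def betweenness(adj):
    """Share of geodesics between other vertex pairs that pass through each vertex."""
    n = len(adj)
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v precedes w on a shortest path
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):  # accumulate dependencies, farthest first
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    pairs = (n - 1) * (n - 2)  # ordered pairs of other vertices
    return {v: (b / pairs if pairs else 0.0) for v, b in bc.items()}
```

On a triangle a-b-c with a pendant vertex d attached to c, every path that reaches d goes through c, so c is the only vertex with non-zero betweenness.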

Communication Networks
“Hubs-and-authorities” importance
  – Calculates the “hubs-and-authorities” importance of each vertex
J. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46,
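The hubs-and-authorities computation is an iterative update on a directed graph. The sketch below is a plain power iteration under assumed conventions: an edge u -> w means that u sent mail to w, scores are rescaled to unit Euclidean length after each step, and the iteration count is fixed. How EMT or JUNG configure this is not stated in the slides.

```python
import math


def _normalize(scores):
    """Rescale a score dict to unit Euclidean length (no-op if all zero)."""
    norm = math.sqrt(sum(s * s for s in scores.values())) or 1.0
    return {v: s / norm for v, s in scores.items()}


def hits(out_links, iterations=50):
    """Return (hub, authority) scores for a directed graph.

    out_links: dict mapping each sender to an iterable of recipients
    (hypothetical representation chosen for this sketch).
    """
    nodes = set(out_links) | {w for ws in out_links.values() for w in ws}
    hub = dict.fromkeys(nodes, 1.0)
    auth = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        # Authority of w: sum of hub scores of the vertices pointing at w.
        auth = dict.fromkeys(nodes, 0.0)
        for u, ws in out_links.items():
            for w in ws:
                auth[w] += hub[u]
        auth = _normalize(auth)
        # Hub of v: sum of authority scores of the vertices v points at.
        hub = _normalize(
            {v: sum(auth[w] for w in out_links.get(v, ())) for v in nodes}
        )
    return hub, auth
```

With two senders that both write to a single recipient, the recipient ends up as the only authority and the two senders share the hub score equally.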

Social Score
Social score
  – Rank users from most important to least important
  – Group users who have similar social scores and clique connectivity
  – Determine n different levels of social hierarchy within which to place all the users
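The ranking and level assignment could look like the sketch below. It is deliberately simplified: the slides do not describe the grouping rule, so equal-width score bands are assumed here, and clique connectivity is not taken into account. Level 1 is the top of the hierarchy.

```python
def rank_users(scores):
    """Users ordered from most to least important by social score."""
    return sorted(scores, key=scores.get, reverse=True)


def assign_levels(scores, n_levels):
    """Place each user in one of n_levels equal-width score bands (assumed rule)."""
    lo, hi = min(scores.values()), max(scores.values())
    width = (hi - lo) / n_levels or 1.0  # avoid a zero width when all scores match
    levels = {}
    for user, score in scores.items():
        band = min(int((hi - score) / width), n_levels - 1)
        levels[user] = band + 1  # level 1 holds the highest scores
    return levels
```

Equal-width bands can leave some levels empty when scores cluster, which is one reason a real grouping step would likely also look at the graph structure.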

Compute Social Score
Scale and normalize each statistic
Social score
  – A score between 0 and
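One way to read "scale and normalize each statistic" is min-max scaling followed by a weighted average. The sketch below takes that reading; the upper bound of 100, the statistic names and the weights are assumptions for illustration, since the slide text is cut off before the upper bound.

```python
def min_max_scale(values, top=100.0):
    """Map raw values linearly onto [0, top]; `top` is an assumed upper bound."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {k: 0.0 for k in values}  # no spread: every user gets 0
    return {k: top * (v - lo) / (hi - lo) for k, v in values.items()}


def social_scores(stats, weights):
    """Weighted average of the scaled statistics for each user.

    stats:   {statistic name: {user: raw value}}
    weights: {statistic name: weight}  (hypothetical feature weights)
    """
    scaled = {name: min_max_scale(vals) for name, vals in stats.items()}
    total = sum(weights[name] for name in stats)
    users = next(iter(stats.values())).keys()
    return {
        u: sum(weights[name] * scaled[name][u] for name in stats) / total
        for u in users
    }


stats = {
    "degree": {"a": 2, "b": 2, "c": 3, "d": 1},
    "cliques": {"a": 1, "b": 1, "c": 2, "d": 1},
}
scores = social_scores(stats, {"degree": 1.0, "cliques": 1.0})
```

Statistics where a smaller value signals more importance, such as average response time, would need to be inverted before scaling; that step is left out of this sketch.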

Results and Discussion
Using EMT
  – Java-based analysis engine built on a database back-end
  – The JUNG library is used for the degree and centrality measures
Present the analysis of the North American West Power Traders division of Enron Corporation

Conclusions and Future Work
The Enron dataset provides an excellent starting point of real-world data
By varying the feature weights, it is possible to
  – Pick out the most important individual
  – Group individuals with similar social qualities
  – Graphically draw an organization chart which approximately simulates the real social hierarchy

Conclusions and Future Work
The concept of average response time can be reworked by considering the order of responses
Consider common usage times for each user and adjust the received time of email
New grouping and division algorithms are being considered
Graph edges should be taken into account when arranging users into different levels