Published by Gwendolyn Sharp. Modified over 9 years ago.
1
Email Alias Detection Using Social Network Analysis
Ralf Holzer, Bradley Malin, Latanya Sweeney
LinkKDD 2005
Advisor: Dr. Koh Jia-Ling
Reporter: Che-Wei, Liang
Date: 2008/08/14
2
Outline
Introduction
Alias Detection Method
– Data Representation
– Ranking Algorithms
Experiment
3
Introduction
Individuals use aliases for various communication purposes
Alias detection
– Useful to both legitimate and illegitimate applications
– Important to understand the extent to which the process can be automated
4
Introduction
Aliases listed on the same webpage can indicate that some form of relationship exists between them
Many people use several email addresses
– This paper attempts to determine which email addresses correspond to the same entity
5
Introduction
Email addresses, a type of alias, can be distilled from a large number of web pages
– Such as class rosters, research papers, discussion boards
Email addresses provide a unique mapping from address to a specific entity
6
Data Representation
Let S represent the set of sources
Modeled as an undirected graph G = (I, E)
– I is the set of unique email addresses
– E is the set of edges; an edge e_ab connects addresses a and b that appear together on at least one source
– C_ab = |e_ab| denotes the number of sources associated with the edge connecting a and b
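The representation above can be sketched in a few lines of Python. This is a minimal illustration, assuming each source is simply the set of email addresses found on one page; the helper name `build_graph` and the toy addresses are hypothetical and do not come from the paper.

```python
from collections import defaultdict
from itertools import combinations

def build_graph(sources):
    """Build the undirected collocation graph G = (I, E).

    sources: iterable of sets of email addresses, one set per source.
    Returns (I, C), where I is the set of unique addresses and C maps
    each edge frozenset({a, b}) to C_ab, the number of sources on which
    a and b appear together.
    """
    I = set()
    C = defaultdict(int)
    for addresses in sources:
        I.update(addresses)
        # Every pair of addresses on the same source shares an edge.
        for a, b in combinations(sorted(addresses), 2):
            C[frozenset((a, b))] += 1
    return I, dict(C)

# Hypothetical toy data: three pages listing made-up addresses.
sources = [
    {"ann@cs.example", "ann@mail.example", "bob@cs.example"},
    {"ann@cs.example", "ann@mail.example"},
    {"bob@cs.example", "cat@cs.example"},
]
I, C = build_graph(sources)
```

Keying edges by `frozenset` keeps the graph undirected: the pair (a, b) and the pair (b, a) map to the same entry.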
7
Ranking Algorithms
Ranking method
– Top-k list of possible aliases
– Shortest path algorithm: geodesic distance is used to rank the nodes closest to a given originating node
Relationship strength is augmented with
– Number of aliases on a source
– Number of collocations of aliases
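A top-k ranking by shortest path on an unweighted graph can be sketched with breadth-first search. This is an illustrative sketch only: the adjacency dictionary `adj`, the function name `rank_by_geodesic`, and the alphabetical tie-breaking are assumptions, not details taken from the paper.

```python
from collections import deque

def rank_by_geodesic(adj, start, k):
    """Return up to k nodes closest to start as (node, distance) pairs."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in sorted(adj.get(u, ())):
            if v not in dist:
                # First visit in BFS gives the shortest hop count.
                dist[v] = dist[u] + 1
                queue.append(v)
    # Sort by distance, then by name to break ties deterministically.
    ranked = sorted((d, n) for n, d in dist.items() if n != start)
    return [(n, d) for d, n in ranked[:k]]

# Hypothetical toy graph: node -> set of neighbours.
adj = {
    "a": {"b", "c"},
    "b": {"a", "d"},
    "c": {"a"},
    "d": {"b", "e"},
    "e": {"d"},
}
top3 = rank_by_geodesic(adj, "a", 3)
```

Nodes unreachable from the originating node never enter `dist`, so they are left out of the ranking rather than given an infinite distance.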
8
Ranking Algorithms
Geodesic distance
– Length of the shortest path from a to b
– Potential aliases are ranked from lowest to highest geodesic distance
Multiple Collocation
– Two aliases that collocate on more than one webpage have a stronger relationship
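One way to let multiple collocation influence the ranking is to shorten an edge as its collocation count grows and then run a weighted shortest-path search. The sketch below assumes an edge weight of 1 / C_ab; that formula, the function name `rank_weighted`, and the toy counts are illustrative assumptions, since the slides do not state the exact weighting.

```python
import heapq

def rank_weighted(C, start, k):
    """Rank nodes by weighted shortest-path distance from start.

    C maps frozenset({a, b}) to the collocation count C_ab.
    Assumed weight: 1 / C_ab, so repeated collocation shortens an edge.
    """
    adj = {}
    for edge, count in C.items():
        a, b = tuple(edge)
        w = 1.0 / count
        adj.setdefault(a, []).append((b, w))
        adj.setdefault(b, []).append((a, w))

    # Dijkstra's algorithm with a binary heap.
    dist = {start: 0.0}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, ()):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    ranked = sorted((d, n) for n, d in dist.items() if n != start)
    return [(n, d) for d, n in ranked[:k]]

# Hypothetical collocation counts.
C = {
    frozenset(("a", "b")): 1,
    frozenset(("a", "c")): 4,
    frozenset(("c", "d")): 2,
}
ranking = rank_weighted(C, "a", 3)
```

With these counts, node d (two hops away over frequently collocated edges) ranks ahead of node b (one hop away over a single collocation), which is the effect the multiple-collocation assumption is meant to capture.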
9
Ranking Algorithms
Source Size
– Strength between two aliases is inversely correlated with the number of aliases in a source
Combined
– Integrates both of the previous assumptions
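The source-size and combined assumptions can be illustrated as edge weights computed directly from the sources. The formulas here are assumptions chosen for the sketch, not the paper's definitions: the source-size weight is the size of the smallest source two addresses share, and the combined weight is the reciprocal of the sum of 1 / |s| over all shared sources s. The function name `edge_weights` and the toy data are likewise hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def edge_weights(sources):
    """Compute two illustrative edge weights per pair of addresses.

    Returns (size_w, combined_w), dicts keyed by frozenset({a, b}).
    A lower weight means a stronger assumed relationship.
    """
    min_size = {}
    strength = defaultdict(float)
    for addresses in sources:
        n = len(addresses)
        for a, b in combinations(sorted(addresses), 2):
            e = frozenset((a, b))
            # Source size: the smallest shared source sets the weight.
            min_size[e] = min(n, min_size.get(e, n))
            # Combined: each shared source adds strength; small ones add more.
            strength[e] += 1.0 / n
    combined = {e: 1.0 / s for e, s in strength.items()}
    return min_size, combined

# Hypothetical sources: a two-address page, a four-address page, another pair.
sources = [
    {"a", "b"},
    {"a", "b", "c", "d"},
    {"c", "d"},
]
size_w, combined_w = edge_weights(sources)
```

Either weight dictionary could feed a weighted shortest-path ranking in place of the plain hop count.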
10
Experiment
Data set derived from CMU web pages
– 1978 distinct email aliases
Data Set Statistics
11
Experiment
12
Experiment
Geodesic Alias Distances
13
Experiment
14
Experiment
15
Experiment