Download presentation
Presentation is loading. Please wait.
Published byAndrea Skinner Modified over 9 years ago
1
Identifying Multi-ID Users in Open Forums Hung-Ching Chen Mark Goldberg Malik Magdon-Ismail
2
Who is Using Multiple IDs? Consider two chatrooms, “letters” and “numbers” lettersnumbers dogg: hey anna, Z rocks! anna: hey dogg whats new mack: what’s that dogg? dogg: not much dogg: Z is the best mack: i love 5 catt: i don’t joop: 27 rules! catt: i agree joop :)
3
Multi-ID Users lettersnumbers dogg/cattjoopmackanna dogg: hey anna, Z rocks!
4
Multi-ID Users lettersnumbers dogg/cattjoopmackanna dogg: hey anna, Z rocks!mack: i love 5
5
Multi-ID Users lettersnumbers dogg/cattjoopmackanna dogg: hey anna, Z rocks! anna: hey dogg whats new mack: i love 5
6
Multi-ID Users lettersnumbers dogg/cattjoopmackanna dogg: hey anna, Z rocks! anna: hey dogg whats new mack: what’s that dogg? mack: i love 5
7
Multi-ID Users lettersnumbers dogg/cattjoopmackanna dogg: hey anna, Z rocks! anna: hey dogg whats new mack: what’s that dogg? mack: i love 5 catt: i don’t
8
Multi-ID Users lettersnumbers dogg/cattjoopmackanna dogg: hey anna, Z rocks! anna: hey dogg whats new mack: what’s that dogg? dogg: not much mack: i love 5 catt: i don’t
9
Multi-ID Users lettersnumbers dogg/cattjoopmackanna dogg: hey anna, Z rocks! anna: hey dogg whats new mack: what’s that dogg? dogg: not much mack: i love 5 catt: i don’t joop: 27 rules!
10
Multi-ID Users lettersnumbers dogg: hey anna, Z rocks! anna: hey dogg whats new mack: what’s that dogg? dogg: not much dogg: Z is the best mack: i love 5 catt: i don’t joop: 27 rules! dogg/cattjoopmackanna
11
Multi-ID Users lettersnumbers dogg: hey anna, Z rocks! anna: hey dogg whats new mack: what’s that dogg? dogg: not much dogg: Z is the best mack: i love 5 catt: i don’t joop: 27 rules! catt: i agree joop :) dogg/cattjoopmackanna
12
Overview A model of a public forum Two efficient statistics-based algorithms for identification of multi- ID users Simulation results
13
Overview A model of a public forum Two efficient statistics-based algorithms for identification of multi- ID users Simulation results
14
A Model of a Public Forum Every actor has one response queue Common average and variance of response delay The server has a queue to process messages with very short delay
15
Model Friendship Graph anna catt mack doggjoop
16
Model Alias Graph anna catt mack doggjoop
17
Multi-ID Users One response queue for each actor dogg/cattjoopmackanna catt dogg
18
Multi-ID Users lettersnumbers dogg/cattjoopmackanna dogg: hey anna, Z rocks! catt dogg anna dogg
19
Multi-ID Users lettersnumbers dogg/cattjoopmackanna dogg: hey anna, Z rocks!mack: i love 5 cattdogg mack catt dogg
20
Multi-ID Users lettersnumbers dogg/cattjoopmackanna dogg: hey anna, Z rocks! anna: hey dogg whats new mack: i love 5 cattmack anna dogg
21
Multi-ID Users lettersnumbers dogg/cattjoopmackanna dogg: hey anna, Z rocks! anna: hey dogg whats new mack: what’s that dogg? mack: i love 5 cattmack anna mack dogg
22
Multi-ID Users lettersnumbers dogg/cattjoopmackanna dogg: hey anna, Z rocks! anna: hey dogg whats new mack: what’s that dogg? mack: i love 5 catt: i don’t catt anna mack catt mack
23
Multi-ID Users lettersnumbers dogg/cattjoopmackanna dogg: hey anna, Z rocks! anna: hey dogg whats new mack: what’s that dogg? dogg: not much mack: i love 5 catt: i don’t catt mack catt dogg anna
24
Multi-ID Users lettersnumbers dogg/cattjoopmackanna dogg: hey anna, Z rocks! anna: hey dogg whats new mack: what’s that dogg? dogg: not much mack: i love 5 catt: i don’t joop: 27 rules! mackcattdogg joop catt
25
Multi-ID Users lettersnumbers dogg: hey anna, Z rocks! anna: hey dogg whats new mack: what’s that dogg? dogg: not much dogg: Z is the best mack: i love 5 catt: i don’t joop: 27 rules! dogg/cattjoopmackanna cattdogg joop dogg mack
26
Multi-ID Users lettersnumbers dogg: hey anna, Z rocks! anna: hey dogg whats new mack: what’s that dogg? dogg: not much dogg: Z is the best mack: i love 5 catt: i don’t joop: 27 rules! catt: i agree joop :) dogg/cattjoopmackanna cattdogg catt joop
27
Overview A model of a public forum Two efficient statistics-based algorithms for identification of multi-ID users Simulation results
28
First Algorithm 1.Collect information {‹time i, ID i ›} 2.Compute minimum time delay minD for every pair of IDs 3.Cluster the delays using k-means into two groups. Call pairs with larger center suspected to be the same actor (red) Call pairs with smaller center suspected to different (blue) 4.Connected components using red edges are ID groups representing one actor
29
1. Collect information anna catt mack dogg joop mack catt dogg time
30
2. minD(dogg, mack) mack dogg mack dogg
31
2. minD(dogg, catt) catt dogg catt dogg
32
3. Cluster minD values minD(dogg, mack) minD(dogg, catt) Different actors Same actor
33
3. Resulting Alias Graph anna catt mack doggjoop Contradictions
34
4. One Red Component anna catt mack doggjoop 9 false positives 10% accuracy
35
Refinement Algorithm 1. 2. 3. 4.Find groups connected with red edges 5.Color IDs into smaller ID groups using blue edges Follow original algorithm
36
4. One Red Component anna catt mack doggjoop
37
5. Color Blue Graph anna catt mack doggjoop
38
5. Color Blue Graph anna catt mack doggjoop 1 false positive 90% accuracy
39
Overview A model of a public forum Two efficient statistics-based algorithms for identification of multi- ID users Simulation results
40
The average number of messages per pair Accuracy without coloring (%) Accuracy with coloring (%) 0.51370.279770.9765 8.27497.314097.3473 87.6899.776299.7826 880.499.999599.9998 The dependence of Accuracy on the number of messages
41
Future Work Real-life testing More sophisticated model –Different types of users –Short and overlapping online appearances –Length of posts Algorithm improvements –Statistical evaluation of the new models –Lexical and semantic analysis
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.