Presentation is loading. Please wait.

Presentation is loading. Please wait.

Identifying Multi-ID Users in Open Forums Hung-Ching Chen Mark Goldberg Malik Magdon-Ismail.

Similar presentations


Presentation on theme: "Identifying Multi-ID Users in Open Forums Hung-Ching Chen Mark Goldberg Malik Magdon-Ismail."— Presentation transcript:

1 Identifying Multi-ID Users in Open Forums Hung-Ching Chen Mark Goldberg Malik Magdon-Ismail

2 Who is Using Multiple IDs? Consider two chatrooms, “letters” and “numbers” lettersnumbers dogg: hey anna, Z rocks! anna: hey dogg whats new mack: what’s that dogg? dogg: not much dogg: Z is the best mack: i love 5 catt: i don’t joop: 27 rules! catt: i agree joop :)

3 Multi-ID Users lettersnumbers dogg/cattjoopmackanna dogg: hey anna, Z rocks!

4 Multi-ID Users lettersnumbers dogg/cattjoopmackanna dogg: hey anna, Z rocks!mack: i love 5

5 Multi-ID Users lettersnumbers dogg/cattjoopmackanna dogg: hey anna, Z rocks! anna: hey dogg whats new mack: i love 5

6 Multi-ID Users lettersnumbers dogg/cattjoopmackanna dogg: hey anna, Z rocks! anna: hey dogg whats new mack: what’s that dogg? mack: i love 5

7 Multi-ID Users lettersnumbers dogg/cattjoopmackanna dogg: hey anna, Z rocks! anna: hey dogg whats new mack: what’s that dogg? mack: i love 5 catt: i don’t

8 Multi-ID Users lettersnumbers dogg/cattjoopmackanna dogg: hey anna, Z rocks! anna: hey dogg whats new mack: what’s that dogg? dogg: not much mack: i love 5 catt: i don’t

9 Multi-ID Users lettersnumbers dogg/cattjoopmackanna dogg: hey anna, Z rocks! anna: hey dogg whats new mack: what’s that dogg? dogg: not much mack: i love 5 catt: i don’t joop: 27 rules!

10 Multi-ID Users lettersnumbers dogg: hey anna, Z rocks! anna: hey dogg whats new mack: what’s that dogg? dogg: not much dogg: Z is the best mack: i love 5 catt: i don’t joop: 27 rules! dogg/cattjoopmackanna

11 Multi-ID Users lettersnumbers dogg: hey anna, Z rocks! anna: hey dogg whats new mack: what’s that dogg? dogg: not much dogg: Z is the best mack: i love 5 catt: i don’t joop: 27 rules! catt: i agree joop :) dogg/cattjoopmackanna

12 Overview A model of a public forum Two efficient statistics-based algorithms for identification of multi- ID users Simulation results

13 Overview A model of a public forum Two efficient statistics-based algorithms for identification of multi- ID users Simulation results

14 A Model of a Public Forum Every actor has one response queue Common average and variance of response delay The server has a queue to process messages with very short delay

15 Model Friendship Graph anna catt mack doggjoop

16 Model Alias Graph anna catt mack doggjoop

17 Multi-ID Users One response queue for each actor dogg/cattjoopmackanna catt dogg

18 Multi-ID Users lettersnumbers dogg/cattjoopmackanna dogg: hey anna, Z rocks! catt dogg anna dogg

19 Multi-ID Users lettersnumbers dogg/cattjoopmackanna dogg: hey anna, Z rocks!mack: i love 5 cattdogg mack catt dogg

20 Multi-ID Users lettersnumbers dogg/cattjoopmackanna dogg: hey anna, Z rocks! anna: hey dogg whats new mack: i love 5 cattmack anna dogg

21 Multi-ID Users lettersnumbers dogg/cattjoopmackanna dogg: hey anna, Z rocks! anna: hey dogg whats new mack: what’s that dogg? mack: i love 5 cattmack anna mack dogg

22 Multi-ID Users lettersnumbers dogg/cattjoopmackanna dogg: hey anna, Z rocks! anna: hey dogg whats new mack: what’s that dogg? mack: i love 5 catt: i don’t catt anna mack catt mack

23 Multi-ID Users lettersnumbers dogg/cattjoopmackanna dogg: hey anna, Z rocks! anna: hey dogg whats new mack: what’s that dogg? dogg: not much mack: i love 5 catt: i don’t catt mack catt dogg anna

24 Multi-ID Users lettersnumbers dogg/cattjoopmackanna dogg: hey anna, Z rocks! anna: hey dogg whats new mack: what’s that dogg? dogg: not much mack: i love 5 catt: i don’t joop: 27 rules! mackcattdogg joop catt

25 Multi-ID Users lettersnumbers dogg: hey anna, Z rocks! anna: hey dogg whats new mack: what’s that dogg? dogg: not much dogg: Z is the best mack: i love 5 catt: i don’t joop: 27 rules! dogg/cattjoopmackanna cattdogg joop dogg mack

26 Multi-ID Users lettersnumbers dogg: hey anna, Z rocks! anna: hey dogg whats new mack: what’s that dogg? dogg: not much dogg: Z is the best mack: i love 5 catt: i don’t joop: 27 rules! catt: i agree joop :) dogg/cattjoopmackanna cattdogg catt joop

27 Overview A model of a public forum Two efficient statistics-based algorithms for identification of multi-ID users Simulation results

28 First Algorithm 1.Collect information {‹time i, ID i ›} 2.Compute minimum time delay minD for every pair of IDs 3.Cluster the delays using k-means into two groups. Call pairs with larger center suspected to be the same actor (red) Call pairs with smaller center suspected to different (blue) 4.Connected components using red edges are ID groups representing one actor

29 1. Collect information anna catt mack dogg joop mack catt dogg time

30 2. minD(dogg, mack) mack dogg mack dogg

31 2. minD(dogg, catt) catt dogg catt dogg

32 3. Cluster minD values minD(dogg, mack) minD(dogg, catt) Different actors Same actor

33 3. Resulting Alias Graph anna catt mack doggjoop Contradictions

34 4. One Red Component anna catt mack doggjoop 9 false positives 10% accuracy

35 Refinement Algorithm 1. 2. 3. 4.Find groups connected with red edges 5.Color IDs into smaller ID groups using blue edges Follow original algorithm

36 4. One Red Component anna catt mack doggjoop

37 5. Color Blue Graph anna catt mack doggjoop

38 5. Color Blue Graph anna catt mack doggjoop 1 false positive 90% accuracy

39 Overview A model of a public forum Two efficient statistics-based algorithms for identification of multi- ID users Simulation results

40 The average number of messages per pair Accuracy without coloring (%) Accuracy with coloring (%) 0.51370.279770.9765 8.27497.314097.3473 87.6899.776299.7826 880.499.999599.9998 The dependence of Accuracy on the number of messages

41 Future Work Real-life testing More sophisticated model –Different types of users –Short and overlapping online appearances –Length of posts Algorithm improvements –Statistical evaluation of the new models –Lexical and semantic analysis


Download ppt "Identifying Multi-ID Users in Open Forums Hung-Ching Chen Mark Goldberg Malik Magdon-Ismail."

Similar presentations


Ads by Google