Exploring Linkability of User Reviews Mishari Almishari and Gene Tsudik Computer Science Department University of California, Irvine


1 Exploring Linkability of User Reviews. Mishari Almishari and Gene Tsudik, Computer Science Department, University of California, Irvine. {malmisha,gts}@ics.uci.edu

2 Increasing Popularity of Reviewing Sites: Yelp had more than 39M visitors and 15M reviews in 2010

3 category Rating

4 Rising Awareness of Privacy

5 How Does Privacy Apply to Reviews? Traceability; Linkability of Ad Hoc Reviews; Linkability of Several Accounts

6 Contribution: an extensive study to measure privacy/linkability in user reviews; proposed models that adequately identify authors

7 Settings & Problem Formulation

8

9

10 IR: Identified Record. AR: Anonymous Record. [Diagram: several IRs and one AR]

11 Settings: Anonymous Record (AR) size: 1, 5, 10, 20, …, 60; Identified Record (IR) size; matching model; Top-X linkability, with X: 1 and 10
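
The slides do not spell out how Top-X linkability is computed. A minimal sketch, assuming it is the fraction of anonymous records whose true author appears among the first X candidates of the ranked list returned by the matching model. The function name, the identifiers, and the toy data below are hypothetical.

```python
def top_x_linkability(rankings, true_authors, x):
    """Fraction of anonymous records whose true author is in the top x candidates.

    rankings:     dict mapping an AR id to a list of user ids, best match first
    true_authors: dict mapping an AR id to the user id that really wrote it
    (This definition is an assumption; the slides only name the metric.)
    """
    hits = sum(
        1 for ar_id, ranked in rankings.items()
        if true_authors[ar_id] in ranked[:x]
    )
    return hits / len(rankings)


# Hypothetical toy data: four anonymous records ranked against three users.
rankings = {
    "ar1": ["u1", "u2", "u3"],
    "ar2": ["u3", "u2", "u1"],
    "ar3": ["u2", "u3", "u1"],
    "ar4": ["u1", "u3", "u2"],
}
true_authors = {"ar1": "u1", "ar2": "u2", "ar3": "u1", "ar4": "u3"}

top1 = top_x_linkability(rankings, true_authors, 1)
top2 = top_x_linkability(rankings, true_authors, 2)
print(top1, top2)
```

With X = 1 only an exact first-place match counts, so Top-10 figures are always at least as high as Top-1 figures for the same setting.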

12 Dataset: 1 million reviews; 2,000 users, each with more than 300 reviews

13 Methodology: Naïve Bayesian Model; Kullback-Leibler Model (Symmetric Version)

14 Methodology: match an Anonymous Record (AR) to an Identified Record (IR). Naïve Bayesian model (NB): return the IR_i that maximizes P(AR | IR_i). Kullback-Leibler divergence (KLD): compute Distance(AR, IR_i) and return the IR_i with the minimum distance

15 Naïve Bayesian (NB) Identified Record (IR) Anonymous Record (AR) Decreasing Sorted List of IRs

16 Naïve Bayesian Identified Record Anonymous Record Sorted List of IRs
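
As an illustration of the NB ranking shown on these slides, here is a minimal sketch that scores each identified record by the log-likelihood of the anonymous record's letter unigrams and sorts the candidates in decreasing order. The add-one smoothing, the function names, and the toy texts are assumptions made for the example, not details taken from the slides.

```python
import math
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def unigram_counts(text):
    """Count letter unigrams, ignoring case and non-letter characters."""
    return Counter(ch for ch in text.lower() if ch in ALPHABET)


def nb_log_likelihood(ar_text, ir_text, smoothing=1.0):
    """log P(AR | IR) under a naive Bayes model over letter unigrams.

    Add-one smoothing is an assumption here; it keeps the logarithm defined
    when a letter in the AR never occurs in the IR.
    """
    ir_counts = unigram_counts(ir_text)
    total = sum(ir_counts.values()) + smoothing * len(ALPHABET)
    score = 0.0
    for token, n in unigram_counts(ar_text).items():
        score += n * math.log((ir_counts[token] + smoothing) / total)
    return score


def rank_by_nb(ar_text, identified):
    """Return user ids sorted by decreasing log-likelihood of the AR."""
    return sorted(
        identified,
        key=lambda user: nb_log_likelihood(ar_text, identified[user]),
        reverse=True,
    )


# Hypothetical identified records (one text blob per user).
identified = {
    "alice": "zzzz pizza jazz buzz",
    "bob": "hello there level letter",
}
ranking = rank_by_nb("fizzy pizza", identified)
print(ranking)
```

Working in log space avoids numeric underflow when the anonymous record contains many tokens.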

17 Kullback-Leibler Divergence (KLD) Identified Record (IR) Anonymous Record (AR) Increasing Sorted List of IRs
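
The KLD ranking can be sketched in the same way: build a token distribution for the anonymous record and for each identified record, then sort the candidates by increasing distance. The slides mention a symmetric version without defining it; the sketch below assumes the sum of the two directed divergences, and the smoothing, names, and toy texts are likewise assumptions.

```python
import math
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def unigram_distribution(text, smoothing=1.0):
    """Smoothed letter-unigram distribution (smoothing is an assumption)."""
    counts = Counter(ch for ch in text.lower() if ch in ALPHABET)
    total = sum(counts.values()) + smoothing * len(ALPHABET)
    return {t: (counts[t] + smoothing) / total for t in ALPHABET}


def kld(p, q):
    """Directed Kullback-Leibler divergence D(p || q) over a shared token set."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p)


def symmetric_kld(p, q):
    """Assumed symmetric form: D(p || q) + D(q || p)."""
    return kld(p, q) + kld(q, p)


def rank_by_kld(ar_text, identified):
    """Return user ids sorted by increasing symmetric KLD from the AR."""
    ar = unigram_distribution(ar_text)
    return sorted(
        identified,
        key=lambda user: symmetric_kld(ar, unigram_distribution(identified[user])),
    )


# Hypothetical identified records (one text blob per user).
identified = {
    "alice": "zzzz pizza jazz buzz",
    "bob": "hello there level letter",
}
ranking = rank_by_kld("fizzy pizza", identified)
print(ranking)
```

Smoothing matters more here than for NB, because a zero probability in either distribution would make one of the two directed divergences undefined.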

18 Maximum Likelihood Estimation

19 Tokens: Unigram: a, …, z; Digram: aa, ab, …, zz; Rating: 1, 2, 3, 4, 5; Category: Restaurant, Beauty and Spa, Education
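
A minimal sketch of how the lexical tokens could be extracted and turned into maximum likelihood estimates (relative frequencies). How the original study handles case, punctuation, and word boundaries is not stated on the slides; this example assumes lowercase letters only and digrams formed from adjacent letters within a word. All names are hypothetical.

```python
from collections import Counter
from itertools import product

LETTERS = "abcdefghijklmnopqrstuvwxyz"
UNIGRAMS = list(LETTERS)                                          # 26 tokens
DIGRAMS = ["".join(pair) for pair in product(LETTERS, repeat=2)]  # 676 tokens


def unigram_tokens(text):
    """Letters of the text, lowercased; everything else is dropped."""
    return [ch for ch in text.lower() if ch in LETTERS]


def digram_tokens(text):
    """Pairs of adjacent letters; pairs spanning a non-letter are skipped."""
    s = text.lower()
    return [
        s[i:i + 2]
        for i in range(len(s) - 1)
        if s[i] in LETTERS and s[i + 1] in LETTERS
    ]


def mle_distribution(tokens):
    """Maximum likelihood estimate: count of each token over the total count."""
    counts = Counter(tokens)
    total = len(tokens)
    if total == 0:
        return {}
    return {tok: n / total for tok, n in counts.items()}


review = "Good food"
print(digram_tokens(review))
print(mle_distribution(unigram_tokens(review)))
```

Rating and category tokens can be estimated the same way by passing a user's list of star ratings or business categories to `mle_distribution`.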

20 Lexical Token Results

21 NB - Unigram: Size 60, LR 83% (Top-1), LR 96% (Top-10)

22 KLD - Unigram: Size 60, LR 83% (Top-1), LR 96% (Top-10)

23 NB - Digram: Size 20, LR 97% (Top-1); Size 10, LR 88% (Top-1)

24 KLD - Digram: Size 60, LR 99% (Top-1); Size 30, LR 75% (Top-1)

25 Improvement (1): Combining Lexical and Non-Lexical Tokens

26 Combining in the NB model is straightforward: P(Rating|IR), P(Category|IR). But for KLD? Weighted average

27 First, combine rating and category (beta = 0.5). Second, combine non-lexical and lexical (alpha = 0.997/0.97 for unigram/digram)

28 Rating and Category Beta Value of 0.5

29 Non-lexical and Unigram Alpha Value of 0.997

30 Non-Lexical and Digram Alpha Value of 0.97
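
The two-step weighted average for KLD can be sketched as follows. The slides give the weight values but not the exact formula, so the orientation of the weights is an assumption: here beta multiplies the rating distance and alpha multiplies the lexical distance. Function and parameter names are hypothetical.

```python
def weighted_average(first, second, weight):
    """weight * first + (1 - weight) * second."""
    return weight * first + (1.0 - weight) * second


def combined_distance(d_lexical, d_rating, d_category, alpha, beta=0.5):
    """Combine three per-token-type KLD distances into a single distance.

    Step 1: rating and category are merged with weight beta.
    Step 2: the lexical distance and the non-lexical result are merged with
            weight alpha.
    Which term each weight multiplies is an assumption, not from the slides.
    """
    d_non_lexical = weighted_average(d_rating, d_category, beta)
    return weighted_average(d_lexical, d_non_lexical, alpha)


# Hypothetical distances for one (AR, IR) pair, using the unigram alpha.
d = combined_distance(d_lexical=0.02, d_rating=0.4, d_category=1.2, alpha=0.997)
print(d)
```

For the NB model no weighting is needed, since the rating and category log-probabilities can simply be added to the lexical log-likelihood.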

31 Token Combining Results

32 Rating, Category, and Unigram - NB: gain up to 20%; Size 30, 60% to 80%; Size 60, 83% to 96%

33 Rating, Category, and Unigram - KLD: gain up to 12%; Size 40, 68% to 80%; Size 60, 83% to 92%

34 Rating, Category, and Digram - NB

35 Rating, Category, and Digram - KLD

36 What about Restricting Identified Record (IR) Size?

37 Anonymous Record Size (AR) Identified Record Size (IR) Matching Model TOP-X Linkability X: 1 and 10

38 Anonymous Record Size (AR) Identified Record Size (IR) Matching Model TOP-X Linkability X: 1 and 10

39 Restricted IR - NB: affected by IR size

40 Restricted IR - KLD: performed better for smaller IR; Size 20 or less: improved; the rest: comparable

41 What about Matching All ARs at once?

42 Anonymous Record Size (AR) Identified Record Size (IR) Matching Model TOP-X Linkability X: 1 and 10

43 Anonymous Records (ARs) Identified Records (IRs) Matching Model

44 Improvement (2): Matching All ARs At Once

45

46

47 MatchAll - Restricted: gain up to 16%; Size 30, from 74% to 90%

48 MatchAll - Full: gain up to 23%; Size 20, from 35% to 55%

49 Improvement (3): For Small IR Size

50 Changing it to: 0.5 + Review Length

51 Results – Improvement (3): Size 10, 89% to 92%; Size 7, 79% to 84%; gain up to 5%

52 Discussion. Implications: cross-referencing; review spam. Non-prolific users: gradually become prolific; with an IR of 20, around 70% are linked. Anonymous record size: linkability is high even for small ARs (92% for an AR of 10); 60 is only 20% of the minimum user contribution

53 Discussion (cont.). Unigram tokens: very comparable for larger ARs; entail fewer resources in the attack (26 vs. 676)

54 Future Directions: improving more for small ARs; other probabilistic models; using stylometry; exploring linkability in other preference databases; more than one AR for different users (exploring it more)

55 Conclusion: an extensive study to assess linkability of user reviews, for a large set of users, using very simple features. Users are very exposed even with simple features and a large number of authors

56 Thank you all!

