Published by Chris Libby. Modified over 10 years ago.
1
Exploring Linkability of User Reviews. Mishari Almishari and Gene Tsudik, Computer Science Department, University of California, Irvine. malmisha,gts@ics.uci.edu
2
Increasing Popularity of Reviewing Sites: Yelp had more than 39M visitors and 15M reviews in 2010
3
category Rating
4
Rising Awareness of Privacy
5
How Does Privacy Apply to Reviews? Traceability; Linkability of Ad Hoc Reviews; Linkability of Several Accounts
6
Contribution: an extensive study to measure privacy/linkability in user reviews; proposed models that adequately identify authors
7
Settings & Problem Formulation
10
IR: Identified Record. AR: Anonymous Record. [Diagram: a set of IRs and a single AR]
11
Anonymous Record (AR) Size: 1, 5, 10, 20, …, 60; Identified Record (IR) Size; Matching Model; Top-X Linkability, X: 1 and 10
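Top-X linkability can be read as the fraction of anonymous records whose true author appears among the first X entries of the ranked list returned by the matching model. The sketch below illustrates that reading only; the function name `top_x_linkability` and the toy rankings are hypothetical and do not come from the slides.

```python
def top_x_linkability(rankings, true_authors, x):
    """Fraction of ARs whose true author is among the first x ranked IRs.

    rankings[i] is the ranked list of IR owners for the i-th AR (best first);
    true_authors[i] is the actual author of the i-th AR.
    """
    hits = sum(
        1
        for ranked, author in zip(rankings, true_authors)
        if author in ranked[:x]
    )
    return hits / len(rankings)


# Hypothetical rankings for three anonymous records (illustrative data only).
rankings = [
    ["alice", "bob", "carol"],
    ["carol", "bob", "alice"],
    ["bob", "carol", "alice"],
]
true_authors = ["alice", "bob", "alice"]

top1 = top_x_linkability(rankings, true_authors, 1)
top2 = top_x_linkability(rankings, true_authors, 2)
```

With X = 1 only the best match counts; a larger X also credits near misses, which is why Top-10 figures are never lower than Top-1 figures.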
12
Dataset: 1 million reviews; 2,000 users, each with more than 300 reviews
13
Methodology: Naïve Bayesian Model; Kullback-Leibler Model (Symmetric Version)
14
Methodology: Anonymous Record AR -> Identified Record IR. Naïve Bayesian Model (NB): return the IR_i that maximizes P(AR | IR_i). Kullback-Leibler Divergence (KLD): compute Distance(AR, IR_i) and return the IR_i with the minimum distance.
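The two matching rules can be sketched on letter-unigram distributions as follows. This is a minimal illustration, not the authors' implementation: the add-one smoothing, the helper names, and the toy review texts are all assumptions introduced here so that logarithms stay finite.

```python
import math
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def letter_counts(text):
    """Count the letters a..z in a text, ignoring case and other characters."""
    return Counter(ch for ch in text.lower() if ch in ALPHABET)


def distribution(counts, smoothing=1.0):
    """Turn counts into probabilities (add-one smoothing is an assumption)."""
    total = sum(counts.values()) + smoothing * len(ALPHABET)
    return {ch: (counts[ch] + smoothing) / total for ch in ALPHABET}


def nb_log_likelihood(ar_counts, ir_dist):
    """log P(AR | IR) under the naive independence assumption."""
    return sum(n * math.log(ir_dist[ch]) for ch, n in ar_counts.items())


def symmetric_kld(p, q):
    """KL(p || q) + KL(q || p), which simplifies to sum (p - q) * log(p / q)."""
    return sum((p[ch] - q[ch]) * math.log(p[ch] / q[ch]) for ch in ALPHABET)


def rank_nb(ar_text, irs):
    """IR owners sorted by decreasing P(AR | IR)."""
    ar = letter_counts(ar_text)
    scores = {
        name: nb_log_likelihood(ar, distribution(letter_counts(text)))
        for name, text in irs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


def rank_kld(ar_text, irs):
    """IR owners sorted by increasing symmetric KL distance to the AR."""
    ar = distribution(letter_counts(ar_text))
    dists = {
        name: symmetric_kld(ar, distribution(letter_counts(text)))
        for name, text in irs.items()
    }
    return sorted(dists, key=dists.get)


# Hypothetical identified records and one anonymous record.
irs = {
    "u1": "zzzz zzzz zap zip zoo buzz",
    "u2": "eeee tree see feet keep",
}
ar_text = "zebra zzz zone"

nb_ranking = rank_nb(ar_text, irs)
kld_ranking = rank_kld(ar_text, irs)
```

NB sorts candidates from most to least likely, while KLD sorts them from nearest to farthest, so the first element of either list is the proposed author.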
15
Naïve Bayesian (NB): Identified Record (IR) + Anonymous Record (AR) -> Decreasing Sorted List of IRs
16
Naïve Bayesian: Identified Record + Anonymous Record -> Sorted List of IRs
17
Kullback-Leibler Divergence (KLD): Identified Record (IR) + Anonymous Record (AR) -> Increasing Sorted List of IRs
18
Maximum Likelihood Estimation
19
Tokens. Unigram: a, …, z. Digram: aa, ab, …, zz. Rating: 1, 2, 3, 4, 5. Category: Restaurant, Beauty and Spa, Education
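The lexical tokens above can be extracted with a few lines of code. The sketch below assumes that case is ignored and that a digram is a pair of adjacent letters with no intervening space or punctuation; the slides do not state how word boundaries are handled, so that choice is an assumption, as are the function names.

```python
import string
from collections import Counter

LETTERS = string.ascii_lowercase


def unigram_counts(text):
    """Count single letters a..z, ignoring case and non-letters."""
    return Counter(ch for ch in text.lower() if ch in LETTERS)


def digram_counts(text):
    """Count pairs of adjacent letters; pairs containing a non-letter are skipped."""
    lowered = text.lower()
    return Counter(
        a + b
        for a, b in zip(lowered, lowered[1:])
        if a in LETTERS and b in LETTERS
    )


# Size of each token space: 26 unigrams versus 26 * 26 = 676 digrams.
UNIGRAM_SPACE = len(LETTERS)
DIGRAM_SPACE = len(LETTERS) ** 2

sample_unigrams = unigram_counts("Good food!")
sample_digrams = digram_counts("Good food!")
```

For the sample text, the unigram counts cover the letters g, o, d, and f, and the digram counts cover go, oo, od, and fo.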
20
Lexical Token Results
21
NB - Unigram: Size 60, LR 83% (Top-1), LR 96% (Top-10)
22
KLD - Unigram: Size 60, LR 83% (Top-1), LR 96% (Top-10)
23
NB - Digram: Size 20, LR 97% (Top-1); Size 10, LR 88% (Top-1)
24
KLD - Digram: Size 60, LR 99% (Top-1); Size 30, LR 75% (Top-1)
25
Improvement (1): Combining Lexical and Non-Lexical Tokens
26
Combining in the NB model: straightforward, using P(Rating|IR) and P(Category|IR). But for KLD? Weighted average.
27
First, combine Rating and Category (weight 0.5). Second, combine non-lexical and lexical (weight 0.997/0.97 for Unigram/Digram).
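The two-step weighted average for KLD can be sketched as below. The slides give the weight values but not which term each weight multiplies, so this sketch assumes that the second-step weight (called `alpha` here) applies to the lexical distance and that the first-step weight (`beta`) applies to the rating distance; the function names and the example distances are hypothetical.

```python
def weighted_average(first, second, weight):
    """weight * first + (1 - weight) * second."""
    return weight * first + (1.0 - weight) * second


def combined_distance(d_rating, d_category, d_lexical, alpha, beta=0.5):
    """Two-step combination of per-token-type KLD distances.

    Step 1: merge rating and category distances using beta.
    Step 2: merge the lexical distance with the step-1 result using alpha.
    Which side each weight applies to is an assumption of this sketch.
    """
    d_non_lexical = weighted_average(d_rating, d_category, beta)
    return weighted_average(d_lexical, d_non_lexical, alpha)


# Hypothetical distances between one AR and one IR.
d_unigram = combined_distance(0.2, 0.4, 1.0, alpha=0.997)
d_digram = combined_distance(0.2, 0.4, 1.0, alpha=0.97)
```

Under this reading, a weight close to 1 keeps the lexical distance dominant while still letting rating and category break near ties between candidate authors.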
28
Rating and Category Beta Value of 0.5
29
Non-lexical and Unigram Alpha Value of 0.997
30
Non-Lexical and Digram Alpha Value of 0.97
31
Token Combining Results
32
Rating, Category, and Unigram - NB: gain of up to 20%. Size 30: 60% to 80%. Size 60: 83% to 96%.
33
Rating, Category, and Unigram - KLD: gain of up to 12%. Size 40: 68% to 80%. Size 60: 83% to 92%.
34
Rating, Category, and Digram - NB
35
Rating, Category, and Digram - KLD
36
What about Restricting Identified Record (IR) Size?
37
Anonymous Record Size (AR) Identified Record Size (IR) Matching Model TOP-X Linkability X: 1 and 10
39
Restricted IR - NB: affected by IR size
40
Restricted IR - KLD: performed better for smaller IRs. Size 20 or less: improved. The rest: comparable.
41
What about Matching All ARs at once?
42
Anonymous Record Size (AR) Identified Record Size (IR) Matching Model TOP-X Linkability X: 1 and 10
43
Anonymous Records (ARs) Identified Records (IRs) Matching Model
44
Improvement (2): Matching All ARs at Once
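The transcript does not preserve how the joint matching is carried out. One simple way to match a whole set of records at once, shown purely as an illustration and not as the authors' algorithm, is a greedy one-to-one assignment: take AR/IR pairs in order of increasing distance and accept a pair only if neither side is already matched. This assumes each AR belongs to a different user; the function name and the distances are hypothetical.

```python
def match_all(distances):
    """Greedy one-to-one assignment of ARs to IRs.

    distances[ar][ir] is the distance between an AR and an IR.
    Returns a dict mapping each matched AR to a distinct IR.
    """
    pairs = sorted(
        (dist, ar, ir)
        for ar, row in distances.items()
        for ir, dist in row.items()
    )
    assigned = {}
    used_irs = set()
    for dist, ar, ir in pairs:
        if ar not in assigned and ir not in used_irs:
            assigned[ar] = ir
            used_irs.add(ir)
    return assigned


# Hypothetical distances: matched independently, both ARs would pick "u1".
distances = {
    "ar1": {"u1": 0.10, "u2": 0.30},
    "ar2": {"u1": 0.20, "u2": 0.25},
}
joint = match_all(distances)
independent = {ar: min(row, key=row.get) for ar, row in distances.items()}
```

In the toy data, independent matching sends both records to the same user, while the joint assignment resolves the conflict by giving the second record its next-best candidate.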
47
MatchAll - Restricted: gain of up to 16%. Size 30: from 74% to 90%.
48
MatchAll - Full: gain of up to 23%. Size 20: from 35% to 55%.
49
Improvement (3): For Small IR Size
50
Changing it to: 0.5 + Review Length
51
Results, Improvement (3): gain of up to 5%. Size 10: 89% to 92%. Size 7: 79% to 84%.
52
Discussion. Implications: cross-referencing; review spam. Non-prolific users: a user gradually becomes prolific; with an IR of 20, linkability is around 70%. Anonymous Record size: linkability is high even for small ARs (92% for an AR of 10); 60 is only 20% of the minimum user contribution.
53
Discussion (cont.): Unigram tokens are very comparable for larger ARs and entail fewer resources in the attack: 26 vs. 676 tokens
54
Future Directions: improving further for small ARs; other probabilistic models; using stylometry; exploring linkability in other preference databases; exploring further the case of more than one AR for different users
55
Conclusion: an extensive study to assess linkability of user reviews, for a large set of users, using very simple features. Users are very exposed even with simple features and a large number of authors.
56
Thank you all!