Published by Amanda Copeland. Modified over 9 years ago.
1
Exploring Linkability of User Reviews Mishari Almishari and Gene Tsudik University of California, Irvine
2
Roadmap 1.Introduction 2.Data Set & Problem Settings 3.Linkability Results & Improvements 4.Discussion 5.Future Work & Conclusion
3
Motivation Increasing Popularity of Reviewing Sites Yelp, more than 39M visitors and 15M reviews in 2010
4
Example [Screenshot labels: category, rating]
5
Motivation Rising awareness of privacy
6
Motivation: How is it applied? Traceability/Linkability: linkability of ad hoc reviews; linkability of several accounts
7
Goal Assess the linkability in user reviews
8
Roadmap 1.Introduction 2.Data Set & Problem Settings 3.Linkability Results & Improvements 4.Discussion 5.Future Work & Conclusion
9
Data Set: 1 million reviews; 2,000 users, each with more than 300 reviews
10
Problem Settings
12
Problem Formulation. IR: Identified Record. AR: Anonymous Record. [Figure: one AR shown alongside a set of IRs]
13
Problem Settings: a matching model compares the Anonymous Record (AR) with the Identified Records (IRs). Top-X linkability, X: 1 and 10. AR sizes: 1, 5, 10, 20, …, 60
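The Top-X measure above can be sketched as follows. This is a minimal illustration, assuming each anonymous record comes with a ranked list of candidate authors; the function name `top_x_linkability`, the dictionary layout, and the toy author ids are hypothetical and not taken from the slides.

```python
from typing import Dict, List


def top_x_linkability(rankings: Dict[str, List[str]], x: int) -> float:
    """Fraction of ARs whose true author is among the first x ranked IRs.

    rankings maps the true author of each AR to the candidate author ids,
    ordered from most to least likely (assumed layout, for illustration).
    """
    if not rankings:
        return 0.0
    hits = sum(1 for author, ranked in rankings.items() if author in ranked[:x])
    return hits / len(rankings)


# Toy rankings for three anonymous records (made-up ids, not study data).
toy_rankings = {
    "alice": ["alice", "bob", "carol"],  # true author at rank 1
    "bob": ["carol", "bob", "alice"],    # true author at rank 2
    "carol": ["alice", "bob", "carol"],  # true author at rank 3
}
top1 = top_x_linkability(toy_rankings, 1)
top2 = top_x_linkability(toy_rankings, 2)
```

With X = 1 only exact top matches count; with X = 10 the true author only needs to appear among the ten best candidates.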
14
Methodologies (1) Naïve Bayesian Model: decreasing sorted list of IRs (2) Kullback-Leibler Divergence (KLD): increasing sorted list of IRs. Maximum-Likelihood Estimation
15
Tokens
o Unigram, 26 values. “privacy”: “p”, “r”, “i”, “v”, “a”, “c”, “y”
o Digram, 676 values. “privacy”: “pr”, “ri”, “iv”, “va”, “ac”, “cy”
o Rating, 5 values
o Category, 28 values
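The two lexical token types can be sketched as below. This is a minimal sketch, assuming text is lower-cased, non-letters are dropped, and digrams do not span word boundaries; none of these preprocessing choices is stated on the slide, and the function names are hypothetical.

```python
from collections import Counter
from typing import List


def unigrams(text: str) -> List[str]:
    """Lower-cased letters a-z of text (26 possible values)."""
    return [ch for ch in text.lower() if "a" <= ch <= "z"]


def digrams(text: str) -> List[str]:
    """Adjacent letter pairs inside each word (26 * 26 = 676 possible values)."""
    pairs: List[str] = []
    for word in text.lower().split():
        letters = [ch for ch in word if "a" <= ch <= "z"]
        pairs.extend(a + b for a, b in zip(letters, letters[1:]))
    return pairs


print(unigrams("privacy"))  # ['p', 'r', 'i', 'v', 'a', 'c', 'y']
print(digrams("privacy"))   # ['pr', 'ri', 'iv', 'va', 'ac', 'cy']

# Token counts like these are the raw material for the matching models.
unigram_counts = Counter(unigrams("privacy"))
```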
16
Naïve Bayesian. [Figure: the Anonymous Record is compared with each Identified Record; output: decreasing sorted list of IRs]
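The formula on this slide did not survive extraction, so the following is only a generic Naïve Bayes ranking sketch, not necessarily the authors' exact model. It assumes a uniform prior over authors and add-one smoothing (both assumptions, added to avoid taking the logarithm of zero); the names `log_likelihood` and `rank_by_naive_bayes` and the toy records are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def log_likelihood(ar_tokens: Sequence[str], ir_tokens: Sequence[str],
                   vocab_size: int) -> float:
    """Sum of log P(token | IR) over the AR tokens, with add-one smoothing."""
    counts = Counter(ir_tokens)
    total = len(ir_tokens)
    return sum(math.log((counts[t] + 1) / (total + vocab_size))
               for t in ar_tokens)


def rank_by_naive_bayes(ar_tokens: Sequence[str],
                        irs: Dict[str, Sequence[str]],
                        vocab_size: int = 26) -> List[str]:
    """Author ids sorted by decreasing log-likelihood of the AR."""
    scores = {author: log_likelihood(ar_tokens, tokens, vocab_size)
              for author, tokens in irs.items()}
    return sorted(scores, key=lambda author: scores[author], reverse=True)


# Toy identified records built from letters (made-up data).
toy_irs = {"alice": list("aaaabbc"), "bob": list("cccbbba")}
ranking = rank_by_naive_bayes(list("aaab"), toy_irs)
```

The anonymous record "aaab" is dominated by the letter "a", as is the first toy record, so that author is ranked first.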
17
Kullback-Leibler Divergence (KLD). [Figure: the Anonymous Record (AR) is compared with each Identified Record (IR); output: increasing sorted list of IRs]
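A KLD-based ranking can be sketched as below. The slide's formula is missing, so the direction of the divergence (AR relative to IR) and the add-one smoothing are assumptions; the helper names and toy records are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

ALPHABET = list("abcdefghijklmnopqrstuvwxyz")


def smoothed_distribution(tokens: Sequence[str],
                          vocab: Sequence[str]) -> Dict[str, float]:
    """Add-one smoothed token distribution over vocab (smoothing is assumed)."""
    counts = Counter(t for t in tokens if t in vocab)
    total = sum(counts.values()) + len(vocab)
    return {t: (counts[t] + 1) / total for t in vocab}


def kl_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """D(p || q) = sum over t of p(t) * log(p(t) / q(t))."""
    return sum(p[t] * math.log(p[t] / q[t]) for t in p)


def rank_by_kld(ar_tokens: Sequence[str], irs: Dict[str, Sequence[str]],
                vocab: Sequence[str] = ALPHABET) -> List[str]:
    """Author ids sorted by increasing divergence from the AR distribution."""
    p_ar = smoothed_distribution(ar_tokens, vocab)
    dist = {author: kl_divergence(p_ar, smoothed_distribution(tokens, vocab))
            for author, tokens in irs.items()}
    return sorted(dist, key=lambda author: dist[author])


# Toy identified records built from letters (made-up data).
toy_irs = {"alice": list("aaaabbc"), "bob": list("cccbbba")}
kld_ranking = rank_by_kld(list("aaab"), toy_irs)
```

A smaller divergence means a closer match, which is why the list is sorted in increasing order here and in decreasing order for the Naïve Bayesian likelihood.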
18
Maximum Likelihood Estimation
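For a categorical token distribution, the maximum-likelihood estimate of each token probability is its relative frequency in the record. A minimal sketch follows; the function name `mle_distribution` is hypothetical, and the slide does not say whether the authors add any smoothing on top of this estimate.

```python
from collections import Counter
from typing import Dict, Sequence


def mle_distribution(tokens: Sequence[str]) -> Dict[str, float]:
    """Relative frequency of each token: count(token) / total tokens."""
    total = len(tokens)
    if total == 0:
        return {}
    return {token: count / total for token, count in Counter(tokens).items()}


# "aab" has two a's and one b out of three tokens.
estimate = mle_distribution(list("aab"))
```

Tokens that never occur in a record get probability zero under this estimate, which is why the ranking sketches for likelihood or divergence usually pair it with some form of smoothing.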
19
Roadmap 1.Introduction 2.Data Set & Problem Settings 3.Linkability Results & Improvements 4.Discussion 5.Future Work & Conclusion
20
Unigram Results (NB, unigram). [Plot: linkability ratio (LR) vs. anonymous record size.] Size 60: LR 83% Top-1, 96% Top-10
21
Digram Results (NB, digram). [Plot: linkability ratio (LR) vs. anonymous record size.] Size 20: LR 97% Top-1. Size 10: LR 88% Top-1
22
Improvement (1): Combining Lexical and Non-Lexical Tokens (NB model). [Plot: linkability ratio vs. anonymous record size.] Gain up to 20%. Size 60: 83% to 96%. Size 30: 60% to 80%
23
KLD Weighted Average. First, combine rating and category: 0.5. Second, combine non-lexical and lexical: 0.997/0.97 for unigram/digram
24
Rating and Category Beta Value of 0.5
25
Non-lexical and Unigram Alpha Value of 0.997
26
Non-Lexical and Digram Alpha Value of 0.97
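The two-stage weighted average can be sketched as below. The beta and alpha values come from the slides, but which term each weight multiplies is not recoverable from the transcript. The code assumes beta weights the rating divergence and alpha weights the lexical divergence; that assignment, like the function names and the toy divergence values, is an assumption.

```python
def combine_non_lexical(kld_rating: float, kld_category: float,
                        beta: float = 0.5) -> float:
    """Stage 1: weighted average of rating and category divergences.

    Assumption: beta multiplies the rating term.
    """
    return beta * kld_rating + (1 - beta) * kld_category


def combine_all(kld_lexical: float, kld_non_lexical: float,
                alpha: float = 0.997) -> float:
    """Stage 2: weighted average of lexical and non-lexical divergences.

    Assumption: alpha multiplies the lexical term
    (0.997 for unigrams, 0.97 for digrams).
    """
    return alpha * kld_lexical + (1 - alpha) * kld_non_lexical


# Toy divergence values (made up for illustration).
non_lexical = combine_non_lexical(kld_rating=0.2, kld_category=0.4)
combined_digram = combine_all(kld_lexical=0.1, kld_non_lexical=non_lexical,
                              alpha=0.97)
```

With beta at 0.5 the first stage is a plain average, so the assignment assumption only matters for the second stage.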
27
What about Restricting Identified Record (IR) Size? [Plots: linkability ratio vs. anonymous record size, for the NB model and the KLD model.] Affected by IR size. Performed better for smaller IR. Size 20 or less, improved
28
Improvement (2): Matching All IRs At Once [Figure: diagram with nodes v1 to v16 and ✔/✖ marks]
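The slide's diagram is lost, so the matching procedure itself is not recoverable here. As an illustration only, one simple way to match a whole set of ARs jointly is a greedy one-to-one assignment over pairwise distances, so that no IR is claimed by two ARs. This greedy strategy is an assumption, not the authors' confirmed method, and the function name and toy distances are hypothetical.

```python
from typing import Dict, Set, Tuple


def greedy_one_to_one(distances: Dict[Tuple[str, str], float]) -> Dict[str, str]:
    """Assign each AR to a distinct IR, taking the smallest distances first."""
    assignment: Dict[str, str] = {}
    used_irs: Set[str] = set()
    for (ar, ir), _ in sorted(distances.items(), key=lambda item: item[1]):
        if ar not in assignment and ir not in used_irs:
            assignment[ar] = ir
            used_irs.add(ir)
    return assignment


# Toy distances (made up): matched independently, both ARs would pick ir1.
toy_distances = {
    ("ar1", "ir1"): 0.10, ("ar1", "ir2"): 0.30,
    ("ar2", "ir1"): 0.20, ("ar2", "ir2"): 0.25,
}
matching = greedy_one_to_one(toy_distances)
```

Matched independently, ar2 would also go to ir1; the joint constraint pushes it to ir2 instead, which is the kind of correction a match-all approach can provide.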
29
Matching All Results. [Plots: linkability ratio vs. anonymous record size, for restricted IR and full IR.] Gain up to 16%; size 30: from 74% to 90%. Gain up to 23%; size 20: from 35% to 55%
30
Improvement (3): For Small IR Size. Changing it to: 0.5 + Review Length. [Plot: linkability ratio vs. anonymous record size.] Size 10: 89% to 92%. Size 7: 79% to 84%. Gain up to 5%
31
Roadmap 1.Introduction 2.Data Set & Problem Settings 3.Linkability Results & Improvements 4.Discussion 5.Future Work & Conclusion
32
Discussion
o Unigram and Scalability
  o 26 vs. 676
  o 59 vs. 676
  o Less than 10%
o Prolific Users
  o In the long run, will be prolific
o Anonymous Record Size
  o A set of 60 reviews, less than 20% of minimum contribution
o Detecting Spam Reviews
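The scalability figures can be checked with a few lines of arithmetic. One reading consistent with the Tokens slide is that 59 is the unigram, rating, and category value counts added together (26 + 5 + 28); that interpretation is an assumption, since the slide only gives the bare numbers.

```python
unigram_values = 26        # letters a-z
digram_values = 26 * 26    # letter pairs
rating_values = 5
category_values = 28

# Assumed reading of "59": unigram + rating + category token values.
combined_values = unigram_values + rating_values + category_values
ratio = combined_values / digram_values

print(combined_values, digram_values, f"{ratio:.1%}")  # 59 676 8.7%
```

Under this reading, the combined model tracks fewer than 10% as many token values as the digram model.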
33
Roadmap 1.Introduction 2.Data Set & Problem Settings 3.Linkability Results & Improvements 4.Discussion 5.Future Work & Conclusion
34
Future Work
o Further improvement for small ARs
o Other probabilistic models
o Using stylometry
o Review anonymization
o Exploring linkability in other preference databases
35
Conclusion
o Extensive study to assess linkability of user reviews
  o For a large set of users
  o Using very simple features
o Users are very exposed even with simple features and a large number of authors
Takeaway point: reviews can be accurately de-anonymized using alphabetical letter distributions
36
Questions?