Exploring Linkability of User Reviews
Mishari Almishari and Gene Tsudik
Computer Science Department, University of California, Irvine
malmisha,gts@ics.uci.edu

Increasing Popularity of Review Sites: Yelp had more than 39M visitors and 15M reviews in 2010.

[Figure: sample review with its category and rating labeled]

Rising Awareness of Privacy

How Does Privacy Apply to Reviews? Traceability; linkability of ad hoc reviews; linkability of several accounts.

Contributions: an extensive study measuring privacy (linkability) in user reviews; models that adequately identify review authors.

Settings & Problem Formulation

[Diagram: several Identified Records (IRs), one per user, and an Anonymous Record (AR) to be linked to one of them]

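Parameters: Anonymous Record (AR) size (1, 5, 10, 20, …, 60 reviews); Identified Record (IR) size; matching model; Top-X linkability with X = 1 and 10 (an AR counts as linked when its author's IR is among the X highest-ranked IRs).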
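A minimal Python sketch of how a Top-X linkability ratio could be computed, assuming the matching model has already produced, for each AR, the IR author ids ranked best-first. The function name `top_x_linkability` and the `(true_author, ranked_ids)` data layout are hypothetical, not taken from the slides.

```python
from typing import List, Sequence, Tuple


def top_x_linkability(results: List[Tuple[str, Sequence[str]]], x: int) -> float:
    """Share of ARs whose true author is among the x best-ranked IRs.

    results: one (true_author_id, ranked_ir_author_ids) pair per AR, with the
    ranked list ordered from best to worst match (hypothetical layout).
    """
    if x < 1:
        raise ValueError("x must be at least 1")
    if not results:
        return 0.0
    hits = sum(1 for author, ranked in results if author in ranked[:x])
    return hits / len(results)


# Toy example: three ARs, each ranked against three IRs.
example = [
    ("alice", ["alice", "bob", "carol"]),
    ("bob", ["carol", "bob", "alice"]),
    ("carol", ["alice", "bob", "carol"]),
]
print(top_x_linkability(example, 1))  # 1 of 3 ARs linked at Top-1
print(top_x_linkability(example, 2))  # 2 of 3 ARs linked at Top-2
```

With X = 10 the same function gives the Top-10 figures; only the cutoff changes, not the ranking.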

Dataset: 1 million reviews from 2,000 users, each with more than 300 reviews.
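One way to build an AR and an IR from a single user's reviews is sketched below. It assumes the AR is a uniformly random sample of the requested size and the IR is everything left over; the slides give only the sizes, so the sampling scheme and the name `split_user_reviews` are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_user_reviews(reviews: Sequence[T], ar_size: int,
                       seed: int = 0) -> Tuple[List[T], List[T]]:
    """Split one user's reviews into an Anonymous Record and an Identified Record.

    Assumption: the AR is a uniformly random sample of ar_size reviews and the
    IR is all remaining reviews.
    """
    if not 0 < ar_size < len(reviews):
        raise ValueError("ar_size must be between 1 and len(reviews) - 1")
    rng = random.Random(seed)  # seeded so that runs are reproducible
    ar_idx = set(rng.sample(range(len(reviews)), ar_size))
    ar = [r for i, r in enumerate(reviews) if i in ar_idx]
    ir = [r for i, r in enumerate(reviews) if i not in ar_idx]
    return ar, ir
```

For a user with 300 reviews and an AR of 60, this leaves an IR of 240 reviews.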

Methodology: Naïve Bayesian model; Kullback-Leibler model (symmetric version).

Methodology: match an Anonymous Record (AR) to an Identified Record (IR).
Naïve Bayesian model (NB): return $\arg\max_{IR_i} P(AR \mid IR_i)$.
Kullback-Leibler divergence (KLD): compute $D(AR, IR_i)$ for every $IR_i$ and return $\arg\min_{IR_i} D(AR, IR_i)$.

Naïve Bayesian (NB): takes the Identified Records (IRs) and an Anonymous Record (AR); outputs the IRs sorted in decreasing order of P(AR|IR).
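The NB ranking can be sketched in Python as a log-likelihood score per IR, sorted in decreasing order. This is an illustration, not the authors' code: the additive smoothing constant is an assumption added so that a token never seen in an IR does not produce log(0), and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def nb_log_likelihood(ar_counts: Counter, ir_counts: Counter,
                      vocab_size: int, smoothing: float = 1.0) -> float:
    """log P(AR | IR) with tokens treated as independent (naive Bayes).

    Token probabilities are estimated from the IR's counts; additive smoothing
    (an assumption) keeps unseen tokens from giving log(0). The multinomial
    coefficient is dropped because it is identical for every IR and therefore
    does not change the ranking.
    """
    total = sum(ir_counts.values()) + smoothing * vocab_size
    return sum(c * math.log((ir_counts[tok] + smoothing) / total)
               for tok, c in ar_counts.items())


def rank_by_nb(ar_counts: Counter, irs: Dict[str, Counter],
               vocab_size: int) -> List[Tuple[str, float]]:
    """Return (ir_id, log-likelihood) pairs sorted in decreasing order."""
    scored = [(ir_id, nb_log_likelihood(ar_counts, ir, vocab_size))
              for ir_id, ir in irs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Working in log space turns the product of token probabilities into a sum, which avoids floating-point underflow for long records.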

Kullback-Leibler Divergence (KLD): takes the Identified Records (IRs) and an Anonymous Record (AR); outputs the IRs sorted in increasing order of divergence from the AR.
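A sketch of KLD ranking under stated assumptions: the symmetric form used here is D_KL(p || q) + D_KL(q || p), one common symmetric version (the slides do not give the exact formula), and both distributions are mixed with a small uniform component so that no probability is zero. The helper names and the mixing weight `lam` are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Tuple

Dist = Dict[str, float]


def smooth(dist: Dist, vocab: Iterable[str], lam: float = 0.01) -> Dist:
    """Mix with a uniform distribution so every token has nonzero probability
    (an assumption; zero handling is not specified in the slides)."""
    vocab = list(vocab)
    u = 1.0 / len(vocab)
    return {t: (1 - lam) * dist.get(t, 0.0) + lam * u for t in vocab}


def kl(p: Dist, q: Dist) -> float:
    """D_KL(p || q) for distributions defined over the same vocabulary."""
    return sum(pv * math.log(pv / q[t]) for t, pv in p.items() if pv > 0)


def symmetric_kld(p: Dist, q: Dist, vocab: Iterable[str]) -> float:
    """Symmetric divergence: D_KL(p || q) + D_KL(q || p) after smoothing."""
    vocab = list(vocab)
    ps, qs = smooth(p, vocab), smooth(q, vocab)
    return kl(ps, qs) + kl(qs, ps)


def rank_by_kld(ar: Dist, irs: Dict[str, Dist],
                vocab: Iterable[str]) -> List[Tuple[str, float]]:
    """Return (ir_id, distance) pairs sorted in increasing order."""
    vocab = list(vocab)
    scored = [(ir_id, symmetric_kld(ar, ir, vocab)) for ir_id, ir in irs.items()]
    return sorted(scored, key=lambda pair: pair[1])
```

Symmetrizing makes the distance independent of argument order, so it does not matter whether the AR or the IR is treated as the reference distribution.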

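Maximum Likelihood Estimation: a token's probability in a record is its count divided by the total number of tokens in that record.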

Tokens: unigrams (a, …, z); digrams (aa, ab, …, zz); ratings (1, 2, 3, 4, 5); categories (e.g., Restaurant, Beauty and Spa, Education).
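The token counting and maximum likelihood estimate can be sketched in Python. Tokenization details are assumptions here: text is lowercased, anything outside a..z acts as a separator, and digrams are taken within words only. The function names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, Hashable, Iterable


def letter_ngrams(text: str, n: int) -> Counter:
    """Count letter n-grams (n=1: a..z, n=2: aa..zz) within each word.

    Assumption: lowercase the text, treat non a-z characters as separators,
    and never let a digram span two words.
    """
    counts: Counter = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        for i in range(len(word) - n + 1):
            counts[word[i:i + n]] += 1
    return counts


def mle_distribution(counts: Counter) -> Dict[Hashable, float]:
    """Maximum likelihood estimate: P(t) = count(t) / total number of tokens."""
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()} if total else {}


def record_distribution(reviews: Iterable[str], n: int) -> Dict[Hashable, float]:
    """Pool the n-gram counts of all reviews in a record, then apply MLE."""
    pooled: Counter = Counter()
    for review in reviews:
        pooled.update(letter_ngrams(review, n))
    return mle_distribution(pooled)
```

`mle_distribution` works unchanged for non-lexical tokens, e.g. `mle_distribution(Counter([5, 4, 5]))` for a record's star ratings.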

Lexical Token Results

NB, Unigram: AR size 60: linkability ratio (LR, the percentage of ARs linked to the correct IR) 83% for Top-1, 96% for Top-10.

KLD, Unigram: AR size 60: LR 83% (Top-1), 96% (Top-10).

NB, Digram: AR size 20: LR 97% (Top-1); AR size 10: LR 88% (Top-1).

KLD, Digram: AR size 60: LR 99% (Top-1); AR size 30: LR 75% (Top-1).

Improvement (1): Combining Lexical and Non-Lexical Tokens

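Combining tokens in the NB model is straightforward: P(Rating|IR) and P(Category|IR) simply join the product of token probabilities. But what about KLD? Use a weighted average of the per-token-type divergences.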
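A Python sketch of why the NB combination is straightforward: if all tokens are treated as independent, the joint probability is a product, so the log-likelihoods of the token types simply add. The token-type keys, the additive smoothing, and the category vocabulary size in the usage example are assumptions for illustration.

```python
import math
from collections import Counter
from typing import Dict


def combined_nb_score(ar: Dict[str, Counter], ir: Dict[str, Counter],
                      vocab_sizes: Dict[str, int], smoothing: float = 1.0) -> float:
    """log P(AR | IR) over several token types, e.g. "unigram", "rating", "category".

    Independence across all tokens turns the product of probabilities into a
    sum of log-probabilities. Additive smoothing is an assumption.
    """
    score = 0.0
    for kind, ar_counts in ar.items():
        ir_counts = ir[kind]
        total = sum(ir_counts.values()) + smoothing * vocab_sizes[kind]
        for tok, c in ar_counts.items():
            score += c * math.log((ir_counts[tok] + smoothing) / total)
    return score
```

In a toy case where two IRs have identical letter and category statistics, the rating tokens alone decide which IR scores higher, which is exactly the extra signal the combination is meant to add.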

Two steps: first, combine rating and category (weight 0.5); second, combine non-lexical and lexical (weight 0.997 for unigrams, 0.97 for digrams).

Rating and Category: β = 0.5

Non-Lexical and Unigram: α = 0.997

Non-Lexical and Digram: α = 0.97
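The two-step weighted average for KLD can be sketched in Python as below. The slides give the weight values but not which distance each weight multiplies; this sketch assumes β weights the rating distance and α weights the lexical distance, so both assignments and the function name are assumptions.

```python
def combined_kld(d_lexical: float, d_rating: float, d_category: float,
                 alpha: float, beta: float = 0.5) -> float:
    """Two-step weighted average of per-token-type distances.

    Step 1: combine rating and category distances with weight beta.
    Step 2: combine the lexical distance with the non-lexical result using alpha.
    Assumption: alpha multiplies the lexical distance and (1 - alpha) the
    non-lexical one; the slides state only the values.
    """
    if not (0.0 <= alpha <= 1.0 and 0.0 <= beta <= 1.0):
        raise ValueError("alpha and beta must lie in [0, 1]")
    d_non_lexical = beta * d_rating + (1 - beta) * d_category
    return alpha * d_lexical + (1 - alpha) * d_non_lexical


# Values from the slides: beta = 0.5; alpha = 0.997 (unigrams) or 0.97 (digrams).
ALPHA_UNIGRAM, ALPHA_DIGRAM, BETA = 0.997, 0.97, 0.5
```

Each IR's combined distance is then ranked exactly like a single KLD value, smallest first.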

Token Combining Results

Rating, Category, and Unigram (NB): gain of up to 20 percentage points; AR size 30: 60% to 80%; AR size 60: 83% to 96%.

Rating, Category, and Unigram (KLD): gain of up to 12 percentage points; AR size 40: 68% to 80%; AR size 60: 83% to 92%.

Rating, Category, and Digram - NB

Rating, Category, and Digram - KLD

What about Restricting Identified Record (IR) Size?

Restricted IR (NB): linkability is affected by IR size.

Restricted IR (KLD): performs better for smaller IRs; results improved for IR size 20 or less and are comparable for the rest.

What about Matching All ARs at once?

[Diagram: the matching model takes all Anonymous Records (ARs) and all Identified Records (IRs) at once]

Improvement (2): Matching All ARs at Once
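Matching all ARs at once can be pictured as an assignment problem. The Python sketch below assumes MatchAll means a one-to-one assignment of ARs to IRs that minimizes total distance; the slides do not name the algorithm, and the brute-force search here only suits toy sizes (a realistic number of users would call for a polynomial-time method such as the Hungarian algorithm). The function name is hypothetical.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def match_all(dist: Sequence[Sequence[float]]) -> Tuple[List[int], float]:
    """Assign every AR (row) to a distinct IR (column) with minimum total distance.

    Assumption: MatchAll is modeled as a one-to-one assignment. Brute force over
    all arrangements, so only for toy inputs. For NB, pass negated
    log-likelihoods so that minimizing cost maximizes likelihood.
    """
    n_ars, n_irs = len(dist), (len(dist[0]) if dist else 0)
    if n_ars > n_irs:
        raise ValueError("need at least as many IRs as ARs")
    best: List[int] = []
    best_cost = float("inf")
    for cols in permutations(range(n_irs), n_ars):
        cost = sum(dist[i][j] for i, j in enumerate(cols))
        if cost < best_cost:
            best, best_cost = list(cols), cost
    return best, best_cost
```

With distances `[[1.0, 2.0], [1.5, 5.0]]`, matching each AR independently would send both ARs to IR 0; matching them jointly forces distinct IRs and sends AR 0 to IR 1 and AR 1 to IR 0.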

MatchAll, restricted IR: gain of up to 16 percentage points; size 30: from 74% to 90%.

MatchAll, full IR: gain of up to 23 percentage points; size 20: from 35% to 55%.

Improvement (3): For Small AR Size

Changing it to: 0.5 + Review Length

Results, Improvement (3): gain of up to 5 percentage points; size 10: 89% to 92%; size 7: 79% to 84%.

Discussion
Implications: cross-referencing; review spam.
Non-prolific users: a user gradually becomes prolific; with an IR of 20 reviews, around 70% of ARs are linked.
Anonymous record size: linkability is high even for small ARs (92% for an AR of 10), and an AR of 60 is only 20% of the minimum user contribution.

Discussion (cont.): Unigram tokens give very comparable results for larger ARs while requiring fewer resources in the attack (26 vs. 676 tokens).
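The 26 vs. 676 comparison is just the size of the two token spaces: each record's unigram distribution has one entry per letter, while its digram distribution has one entry per ordered letter pair (26 × 26), so a digram model stores and compares 26 times as many probabilities per record. A short Python check:

```python
from itertools import product
from string import ascii_lowercase

# Unigram tokens: the 26 letters a..z.
UNIGRAMS = list(ascii_lowercase)
# Digram tokens: every ordered pair of letters, aa, ab, ..., zz.
DIGRAMS = ["".join(pair) for pair in product(ascii_lowercase, repeat=2)]

print(len(UNIGRAMS), len(DIGRAMS))  # 26 676
```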

Future Directions: further improvement for small ARs; other probabilistic models; using stylometry; exploring linkability in other preference databases; further exploring the case of more than one AR for different users.

Conclusion: an extensive study assessing the linkability of user reviews, for a large set of users, using very simple features. Users are highly exposed even with simple features and a large number of authors.

Thank you all!
