Published by Nathaniel Mason. Modified over 9 years ago.
1
Anonymity and Privacy Issues: Re-identification
Yimeng Zhang 12/4/07
2
Index
- Views on Privacy of Social Media
- Overview of Re-identification
- You Are What You Say: Privacy Risks of Public Mentions, Frankowski et al., SIGIR 2006
3
Improper Use of Personal Information Online
4
Top Privacy Concerns
5
Remaining Anonymous
6
Providing True Information While Registering
7
Ability to Remain Anonymous
8
Importance of Controlling Personal Information
9
Specifying Who Can View Personal Information
10
Conclusion
- Around 40% of people would like to remain anonymous on social media or social networking sites
- Most people provide their true personal information while registering
- Most people think it is important to have control over their personal information online
- Re-identification techniques can identify the users of an anonymous dataset
11
Privacy Loss through Re-identification
Re-identification: linking datasets that contain explicit identifiers with datasets that lack them, through the attributes the two have in common
Datasets without explicit identifiers:
- Public data that users have made anonymous
- Public data released by research groups (after suitable anonymization)
- Public data from government agencies (e.g., the census)
- Data that people wish to keep private
12
Example of Re-identification
- Voter registration list of Massachusetts, purchased for only $20
- Medical data made public by the Group Insurance Commission (GIC) of Massachusetts
- 87% of the US population is likely to be uniquely identified based only on ZIP code, date of birth, and sex (Sweeney, 2002)
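The linkage above amounts to a join on the shared attributes. The sketch below is illustrative only: the field names (`zip`, `birth_date`, `sex`, `name`, `diagnosis`) and all records are made up, not taken from the actual voter list or GIC release. It links a de-identified table to an identified one and counts how many records a (ZIP, birth date, sex) combination singles out.

```python
from collections import Counter

# Quasi-identifiers shared by both tables (hypothetical field names).
QUASI_IDS = ("zip", "birth_date", "sex")


def qid(record):
    """Project a record onto its quasi-identifier attributes."""
    return tuple(record[a] for a in QUASI_IDS)


def link(identified, deidentified):
    """Join an identified table with a de-identified one on the quasi-identifiers.

    Returns (name, diagnosis) pairs for de-identified records whose
    quasi-identifier combination matches exactly one identified record.
    """
    names_by_qid = {}
    for rec in identified:
        names_by_qid.setdefault(qid(rec), []).append(rec["name"])
    matches = []
    for rec in deidentified:
        names = names_by_qid.get(qid(rec), [])
        if len(names) == 1:  # a unique match recovers the identity
            matches.append((names[0], rec["diagnosis"]))
    return matches


def unique_fraction(records):
    """Fraction of records whose quasi-identifier combination is unique."""
    counts = Counter(qid(r) for r in records)
    return sum(counts[qid(r)] == 1 for r in records) / len(records)


# Toy, made-up data.
voters = [
    {"name": "Alice", "zip": "02138", "birth_date": "1950-01-01", "sex": "F"},
    {"name": "Bob", "zip": "02138", "birth_date": "1960-05-05", "sex": "M"},
    {"name": "Carol", "zip": "02139", "birth_date": "1960-05-05", "sex": "F"},
    {"name": "Erin", "zip": "02139", "birth_date": "1960-05-05", "sex": "F"},
]
medical = [
    {"zip": "02138", "birth_date": "1960-05-05", "sex": "M", "diagnosis": "flu"},
    {"zip": "02139", "birth_date": "1960-05-05", "sex": "F", "diagnosis": "asthma"},
]

print(link(voters, medical))    # [('Bob', 'flu')]
print(unique_fraction(voters))  # 0.5
```

Only Bob is re-identified: the second medical record matches two voters (Carol and Erin), so it stays ambiguous. The more records a combination singles out, the larger this leak.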
13
Governor’s medical records!
The rebus form: voter list + GIC medical data = Governor's medical records! (From Frankowski, SIGIR 2006)
14
Example of face identification
Figure: a face recognizer matches profiles without explicit identifiers (Friendster) against profiles with explicit identifiers (Facebook), resulting in an identity violation! (Gross and Acquisti, WPES 2005)
15
You Are What You Say: Privacy Risks of Public Mentions
Dan Frankowski, Dan Cosley, Shilad Sen, Loren Terveen, John Riedl
University of Minnesota
SIGIR 2006
16
Main Idea
- People can be identified by their preferences and by what they talk about:
  - Reviews of books, movies, songs
  - Mentions on forums or blogs
  - Friend lists on Facebook
  - Wish lists or purchase lists on Amazon
- Method for re-identification:
  - Datasets are represented as sparse relation spaces
  - Re-identification is done by matching two sparse relation spaces
17
Sparse Relation Space
- Relates people to items
- Sparse: few relationships are recorded per person
- Any dataset that can be represented as a sparse relation space is vulnerable

Example grid (people p1, p2, p3 × items i1, i2, i3, …; X marks a recorded relationship):
      i1   i2   i3   …
p1    X
p2
p3
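One minimal way to hold a sparse relation space in memory is a map from each person to the set of items they relate to, so only the filled cells of the grid are stored. The helper names `build_space` and `density` are hypothetical, and the toy pairs are made up.

```python
def build_space(pairs):
    """Build a sparse relation space: person -> set of related items."""
    space = {}
    for person, item in pairs:
        space.setdefault(person, set()).add(item)
    return space


def density(space):
    """Fraction of the person x item grid that holds a recorded relationship."""
    items = set().union(*space.values()) if space else set()
    cells = len(space) * len(items)
    filled = sum(len(rel) for rel in space.values())
    return filled / cells if cells else 0.0


space = build_space([("p1", "i1"), ("p2", "i2"), ("p2", "i3"), ("p3", "i1")])
print(density(space))  # 4 of the 9 cells are filled
```

In real rating or mention data the density is far lower than in this toy, which is exactly what makes each person's small set of items so distinctive.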
18
Research Questions
- Risks of dataset release (RQ1): What are the risks to user privacy when releasing a dataset?
- Altering the dataset (RQ2): How can dataset owners alter the dataset to preserve user privacy?
- Self defense (RQ3): How can users protect their own privacy?
19
Experiment Dataset: MovieLens
- Dataset 1: movie ratings
  - Users have not agreed to have their ratings revealed
  - Released for research use (the "anonymous dataset")
- Dataset 2: movie reviews
  - Public
20
Feature of the Dataset
- Both ratings and mentions follow a power law
- This is an important feature of real-world sparse relation spaces
(Frankowski, SIGIR 2006)
21
Evaluation Measure
Diagram: the mentions by user t and the ratings dataset go into a re-identification algorithm, which returns the top k users from the ratings dataset, ranked by the likelihood that they are user t.
- k-identified: t is among the k users returned by the algorithm
- k-identification rate: the fraction of users who are k-identified
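The k-identification rate can be computed directly from each algorithm's ranked output. This sketch assumes, as an evaluation would, that the true identity of each target is known, modeled here by the target using the same (hypothetical) user ID in both datasets.

```python
def k_identification_rate(rankings, k):
    """rankings: target user t -> list of candidate users returned by a
    re-identification algorithm, most likely first.
    A target is k-identified if it appears among the first k candidates."""
    if not rankings:
        return 0.0
    hits = sum(1 for t, ranked in rankings.items() if t in ranked[:k])
    return hits / len(rankings)


rankings = {
    "u1": ["u1", "u5"],        # correct user ranked first
    "u2": ["u7", "u2", "u3"],  # correct user ranked second
    "u3": ["u4", "u6"],        # correct user not returned
}
print(k_identification_rate(rankings, 1))  # 1 of 3 targets
print(k_identification_rate(rankings, 2))  # 2 of 3 targets
```

With k = 1 this gives the 1-identification rate reported later in the deck.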
22
Set Intersection Algorithm for Re-identification
- Likely list: users in the rating database who have rated every movie mentioned by user t
- Problem: users mention movies that they have not rated
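A minimal sketch of the likely list, using made-up toy data: each candidate's rated movies are a set, and a candidate stays on the list only if the target's mentions are a subset of it. The function name `likely_list` is hypothetical.

```python
def likely_list(mentions, ratings):
    """Users who have rated every movie mentioned by the target user.

    mentions: set of movies mentioned by the target user t.
    ratings: user -> set of movies that user rated.
    """
    return sorted(u for u, rated in ratings.items() if mentions <= rated)


ratings = {"a": {"m1", "m2", "m3"}, "b": {"m1"}, "c": {"m1", "m2"}}
print(likely_list({"m1", "m2"}, ratings))        # ['a', 'c']
print(likely_list({"m1", "m2", "m9"}, ratings))  # []: one unrated mention empties the list
```

The second call shows the problem from the slide: a single mention of a movie the user never rated removes the true user from the list entirely.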
23
TF-IDF Algorithm
- Mentions of a user: a vector of the movies the user mentioned
- Ratings of a user: a vector of the movies the user rated
- Likelihood: TF-IDF cosine similarity between the two vectors
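A sketch of TF-IDF ranking under stated assumptions: each user's set of rated movies is treated as a "document" and each movie as a "term", term frequency is binary, and IDF is log(N / df) over the N users in the rating database. The slides do not specify the exact weighting, so these choices are illustrative.

```python
import math


def idf_weights(ratings):
    """IDF per movie: log(number of users / number of users who rated it)."""
    n = len(ratings)
    df = {}
    for rated in ratings.values():
        for m in rated:
            df[m] = df.get(m, 0) + 1
    return {m: math.log(n / c) for m, c in df.items()}


def tfidf_vector(items, idf):
    # Binary term frequency; movies nobody rated have no IDF and are skipped.
    return {m: idf[m] for m in items if m in idf}


def cosine(a, b):
    dot = sum(w * b[m] for m, w in a.items() if m in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def tfidf_rank(mentions, ratings):
    """Rank candidate users by cosine similarity to the target's mentions."""
    idf = idf_weights(ratings)
    q = tfidf_vector(mentions, idf)
    scores = {u: cosine(q, tfidf_vector(r, idf)) for u, r in ratings.items()}
    return sorted(scores, key=lambda u: (-scores[u], u))


ratings = {"a": {"m1", "m2"}, "b": {"m1", "m3"}, "c": {"m1"}}
print(tfidf_rank({"m1", "m2"}, ratings))  # ['a', 'b', 'c']
```

Here m1 is rated by everyone, so its IDF is 0 and it carries no evidence; the rarely rated m2 alone puts user a first.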
24
Scoring Algorithm
Scoring:
- Emphasizes mentions of rarely rated movies
- De-emphasizes the number of ratings a user has
- Score for one mention m: the fraction of users who have not rated m
- Score for a user: the product of the scores for all of that user's mentions
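A sketch of the scoring rule as the slide describes it: each mention a candidate has rated contributes the fraction of users who have not rated that movie, and a candidate's score is the product over all mentions. The slide does not say what happens when a candidate never rated a mentioned movie; the `miss_penalty` factor below is a hypothetical choice so that one such mention lowers, rather than zeroes, the score.

```python
from collections import Counter


def scoring_rank(mentions, ratings, miss_penalty=0.05):
    """Rank candidate users for the target's mention set.

    ratings: user -> set of rated movies.
    miss_penalty: HYPOTHETICAL factor applied when a candidate never rated
    a mentioned movie.
    """
    n = len(ratings)
    raters = Counter(m for rated in ratings.values() for m in rated)

    def score(user):
        s = 1.0
        for m in mentions:
            if m in ratings[user]:
                # Fraction of users who have NOT rated m: rare movies score high.
                s *= 1 - raters[m] / n
            else:
                s *= miss_penalty
        return s

    scores = {u: score(u) for u in ratings}
    return sorted(scores, key=lambda u: (-scores[u], u))


ratings = {"a": {"m1", "m2"}, "b": {"m1"}, "c": {"m1", "m3"}, "d": {"m3"}}
print(scoring_rank({"m1", "m2"}, ratings))  # ['a', 'b', 'c', 'd']
```

Unlike set intersection, users b and c stay in the ranking even though they never rated m2; they are simply ranked below a.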
25
Scoring Algorithm with Ratings
Suppose we have a magic analyzer that can guess the rating of a movie from a mention, e.g., using the context of that mention.
Algorithms:
- ExactRating: the analyzer determines the rating perfectly
- FuzzyRating: the analyzer guesses the rating value within ±1
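One way to fold guessed ratings into scoring, as a sketch under assumptions: a candidate counts as "matching" a mention only if their actual rating is within `tolerance` of the analyzer's guess (0 for ExactRating, 1 for FuzzyRating), and the per-mention score becomes the fraction of users who do not match. The guessed ratings stand in for the hypothetical analyzer, and `miss_penalty` is again a hypothetical constant.

```python
def rating_scores(guessed, ratings, tolerance, miss_penalty=0.05):
    """Score candidates using guessed ratings.

    guessed: movie -> rating the (hypothetical) analyzer inferred from a mention.
    ratings: user -> {movie: rating}.
    tolerance=0 mimics ExactRating; tolerance=1 mimics FuzzyRating.
    """
    n = len(ratings)

    def matches(user, m):
        r = ratings[user].get(m)
        return r is not None and abs(r - guessed[m]) <= tolerance

    # Number of users whose rating of m agrees with the guessed value.
    agree = {m: sum(matches(u, m) for u in ratings) for m in guessed}

    def score(user):
        s = 1.0
        for m in guessed:
            s *= (1 - agree[m] / n) if matches(user, m) else miss_penalty
        return s

    return {u: score(u) for u in ratings}


ratings = {"a": {"m1": 5, "m2": 3}, "b": {"m1": 4, "m2": 3}, "c": {"m1": 1}}
guessed = {"m1": 5, "m2": 3}
exact = rating_scores(guessed, ratings, tolerance=0)
fuzzy = rating_scores(guessed, ratings, tolerance=1)
print(exact["a"] > exact["b"] > exact["c"])  # True
print(fuzzy["a"] == fuzzy["b"])              # True: +/-1 cannot separate a and b here
```

In this toy example, exact ratings single out user a, while a ±1 guess lets user b (who rated m1 a 4) match as well.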
26
Percent of users identified by different algorithms
27
1-identification rate
28
RQ2: Altering the dataset
How can dataset owners alter the dataset they release to preserve user privacy?
- Data suppression algorithm: drop rarely rated movies
- Not a big problem for industry, but harmful for research
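Dataset-level suppression can be sketched as a filter applied before release: every movie rated by fewer than a threshold number of users is removed from all users' rating sets. The threshold name `min_raters` and the toy data are hypothetical; the returned fraction of dropped ratings shows what the released data loses for research.

```python
from collections import Counter


def suppress_rare_items(ratings, min_raters):
    """Dataset-level suppression: drop every movie rated by fewer than
    min_raters users. Returns (released_ratings, fraction_of_ratings_dropped)."""
    counts = Counter(m for rated in ratings.values() for m in rated)
    keep = {m for m, c in counts.items() if c >= min_raters}
    released = {u: rated & keep for u, rated in ratings.items()}
    total = sum(len(r) for r in ratings.values())
    kept = sum(len(r) for r in released.values())
    dropped = (total - kept) / total if total else 0.0
    return released, dropped


ratings = {"a": {"m1", "m2"}, "b": {"m1"}, "c": {"m1", "m3"}}
released, dropped = suppress_rare_items(ratings, min_raters=2)
print(dropped)  # 0.4: two of the five ratings are gone
```

After suppression every user in this toy looks identical, but 40% of the ratings are lost; the deck reports that on the real data this approach does not work.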
29
Dataset-Level Suppression
Does not work!
30
RQ3: Self Defense
How can users protect their own privacy?
- Suppression: do not mention rarely rated movies
- Misdirection: mention items they have not rated
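Both self-defense strategies can be sketched as transformations of a user's own mention set. This assumes the user can estimate how many people rated each movie (`rater_counts`, hypothetical). For misdirection, the decoys chosen here are the most popular movies the user has not rated, an assumption motivated by the later finding that misdirection works when users mention popular items.

```python
def suppress_mentions(mentions, rater_counts, min_raters):
    """User-level suppression: do not mention movies that fewer than
    min_raters users have rated."""
    return {m for m in mentions if rater_counts.get(m, 0) >= min_raters}


def misdirect(mentions, my_ratings, rater_counts, n_decoys):
    """Misdirection: also mention movies the user has NOT rated.
    Decoys here are the most popular unrated movies (an assumption)."""
    candidates = [m for m in rater_counts
                  if m not in my_ratings and m not in mentions]
    candidates.sort(key=lambda m: (-rater_counts[m], m))
    return set(mentions) | set(candidates[:n_decoys])


rater_counts = {"m1": 50, "m2": 2, "m3": 40, "m4": 30}
my_ratings = {"m1", "m2"}
my_mentions = {"m1", "m2"}
print(suppress_mentions(my_mentions, rater_counts, 10))             # {'m1'}
print(sorted(misdirect(my_mentions, my_ratings, rater_counts, 1)))  # ['m1', 'm2', 'm3']
```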
31
User-Level Suppression
Does not work!
32
Misdirection
Works when users mention popular items
33
Conclusion
- Simple data mining algorithms can identify users who make mentions in a sparse relation space, even when those users believe they are anonymous
- Other uses of the algorithms: e.g., identifying paper reviewers (future work of Frankowski et al.)
- Privacy risks for users of social media sites:
  - Privacy is hard to preserve
  - Don't reveal private information, even if it seems to be anonymous