Privacy Enhancing Technologies, Elaine Shi, Lecture 2. Attack slides partially borrowed from Narayanan, Golle and Partridge.

Similar presentations
Challenges in Making Tomography Practical

Measurement and Analysis of Online Social Networks 1 A. Mislove, M. Marcon, K Gummadi, P. Druschel, B. Bhattacharjee Presentation by Shahan Khatchadourian.
Correctness of Gossip-Based Membership under Message Loss. Maxim Gurevich, Idit Keidar, Technion.
Vote Elicitation with Probabilistic Preference Models: Empirical Estimation and Cost Tradeoffs Tyler Lu and Craig Boutilier University of Toronto.
- A Powerful Computing Technology Department of Computer Science Wayne State University 1.
CS525: Special Topics in DBs Large-Scale Data Management
Campaign Overview Mailers Mailing Lists
Chapter 1 Object Oriented Programming 1. OOP revolves around the concept of an objects. Objects are created using the class definition. Programming techniques.
Generating Network Topologies That Obey Power Laws. Christopher R. Palmer and J. Gregory Steffan, Carnegie Mellon.
On the Optimal Placement of Mix Zones Julien Freudiger, Reza Shokri and Jean-Pierre Hubaux PETS, 2009.
Purnamrita Sarkar (Carnegie Mellon) Deepayan Chakrabarti (Yahoo! Research) Andrew W. Moore (Google, Inc.)
1 Developing a Predictive Model for Internet Video Quality-of-Experience Athula Balachandran, Vyas Sekar, Aditya Akella, Srinivasan Seshan, Ion Stoica,
RecMax: Exploiting Recommender Systems for Fun and Profit (can we combine the power of Social Networks and Recommender Systems?). Amit Goyal and Laks V. S. Lakshmanan.
Website Success It isn’t Creative, if it Doesn’t Sell.
CSE 473/573 Computer Vision and Image Processing (CVIP) Ifeoma Nwogu Lecture 27 – Overview of probability concepts 1.
Local Search Jim Little UBC CS 322 – CSP October 3, 2014 Textbook §4.8
Choosing an Order for Joins
Self-training with Products of Latent Variable Grammars Zhongqiang Huang, Mary Harper, and Slav Petrov.
Security in Banking. Emmanuel van de Geer, Senior Architect Governance, Risk,
Amit Goyal Laks V. S. Lakshmanan RecMax: Exploiting Recommender Systems for Fun and Profit University of British Columbia
Adaptive Segmentation Based on a Learned Quality Metric
Differentially Private Recommendation Systems Jeremiah Blocki Fall A: Foundations of Security and Privacy.
Ragib Hasan Johns Hopkins University en Spring 2010 Lecture 3 02/15/2010 Security and Privacy in Cloud Computing.
De-anonymizing social networks Arvind Narayanan, Vitaly Shmatikov.
Minimizing Seed Set for Viral Marketing Cheng Long & Raymond Chi-Wing Wong Presented by: Cheng Long 20-August-2011.
Ragib Hasan Johns Hopkins University en Spring 2011 Lecture 8 04/04/2011 Security and Privacy in Cloud Computing.
The End of Anonymity Vitaly Shmatikov. Tastes and Purchases slide 2.
Oct 14, 2014 Lirong Xia Recommender systems acknowledgment: Li Zhang, UCSC.
Sean Blong Presents: 1. What are they…?  “[…] specific type of information filtering (IF) technique that attempts to present information items (movies,
Privacy in Statistical Databases. Dr. Luc Longpré, Computer Science Department, University of Texas at El Paso, Spring 2006.
Guarantee that EK is safe? Yes, because it is stored in and used by hardware only. No, because it can be obtained if someone has physical access, but this can.
1 Collaborative Filtering and Pagerank in a Network Qiang Yang HKUST Thanks: Sonny Chee.
Malicious parties may employ (a) structure-based or (b) label-based attacks to re-identify users and thus learn sensitive information about their rating.
April 13, 2010 Towards Publishing Recommendation Data With Predictive Anonymization Chih-Cheng Chang †, Brian Thompson †, Hui Wang ‡, Danfeng Yao † †‡
Privacy-Aware Computing Introduction. Outline  Brief introduction Motivating applications Major research issues  Tentative schedule  Reading assignments.
Structure based Data De-anonymization of Social Networks and Mobility Traces Shouling Ji, Weiqing Li, and Raheem Beyah Georgia Institute of Technology.
Overview of Privacy Preserving Techniques.  This is a high-level summary of the state-of-the-art privacy preserving techniques and research areas  Focus.
Ragib Hasan University of Alabama at Birmingham CS 491/691/791 Fall 2011 Lecture 16 10/11/2011 Security and Privacy in Cloud Computing.
Preserving Link Privacy in Social Network Based Systems Prateek Mittal University of California, Berkeley Charalampos Papamanthou.
FaceTrust: Assessing the Credibility of Online Personas via Social Networks Michael Sirivianos, Kyungbaek Kim and Xiaowei Yang in collaboration with J.W.
Wherefore Art Thou R3579X? Anonymized Social Networks, Hidden Patterns, and Structural Steganography.
Privacy risks of collaborative filtering Yuval Madar, June 2012 Based on a paper by J.A. Calandrino, A. Kilzer, A. Narayanan, E. W. Felten & V. Shmatikov.
Ragib Hasan University of Alabama at Birmingham CS 491/691/791 Fall 2012 Lecture 4 09/10/2013 Security and Privacy in Cloud Computing.
Thwarting Passive Privacy Attacks in Collaborative Filtering Rui Chen Min Xie Laks V.S. Lakshmanan HKBU, Hong Kong UBC, Canada UBC, Canada Introduction.
Personalized Social Recommendations – Accurate or Private? A. Machanavajjhala (Yahoo!), with A. Korolova (Stanford), A. Das Sarma (Google) 1.
Privacy-preserving rule mining. Outline  A brief introduction to association rule mining  Privacy preserving rule mining Single party  Perturbation.
Community-enhanced De-anonymization of Online Social Networks Shirin Nilizadeh, Apu Kapadia, Yong-Yeol Ahn Indiana University Bloomington CCS 2014.
kNN CF: A Temporal Social Network. Neal Lathia, Stephen Hailes, Licia Capra, University College London. RecSys '08. Advisor:
Anonymity and Privacy Issues --- re-identification
Bloom Cookies: Web Search Personalization without User Tracking Authors: Nitesh Mor, Oriana Riva, Suman Nath, and John Kubiatowicz Presented by Ben Summers.
Probabilistic km-anonymity (Efficient Anonymization of Large Set-valued Datasets) Gergely Acs (INRIA) Jagdish Achara (INRIA)
Graph Data Management Lab, School of Computer Science Personalized Privacy Protection in Social Networks (VLDB2011)
Managing your Private and Public Data: Bringing down Inference Attacks against your Privacy. Group Meeting in 2015.
Privacy-safe Data Sharing. Why Share Data? Hospitals share data with researchers – Learn about disease causes, promising treatments, correlations between.
1 Discovering Web Communities in the Blogspace Ying Zhou, Joseph Davis (HICSS 2007)
1 Link Privacy in Social Networks Aleksandra Korolova, Rajeev Motwani, Shubha U. Nabar CIKM’08 Advisor: Dr. Koh, JiaLing Speaker: Li, HueiJyun Date: 2009/3/30.
Privacy Issues in Graph Data Publishing Summer intern: Qing Zhang (from NC State University) Mentors: Graham Cormode and Divesh Srivastava.
Announcements Paper presentation Project meet with me ASAP
University of Texas at El Paso
Semi-Supervised Clustering
ACHIEVING k-ANONYMITY PRIVACY PROTECTION USING GENERALIZATION AND SUPPRESSION International Journal on Uncertainty, Fuzziness and Knowledge-based Systems,
Statistical Identification of Encrypted Web-Browsing Traffic
SocialMix: Supporting Privacy-aware Trusted Social Networking Services
Differential Privacy in Practice
Q4 : How does Netflix recommend movies?
Personalized Privacy Protection in Social Networks
Lecture 27: Privacy CS /7/2018.
Published in: IEEE Transactions on Industrial Informatics
TELE3119: Trusted Networks Week 4
Exploiting Unintended Feature Leakage in Collaborative Learning
Presentation transcript:

1 Privacy Enhancing Technologies. Elaine Shi. Lecture 2. Attack slides partially borrowed from Narayanan, Golle and Partridge.

2 The uniqueness of high-dimensional data. In this class: How many are male? How many are 1st-years? How many work in PL? How many satisfy all of the above?

How many bits of information are needed to identify an individual? World population: ~7 billion. log2(7 billion) ≈ 33 bits!
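As a quick check of this arithmetic, here is a tiny Python sketch (the population figure is of course approximate):

```python
import math

# Back-of-the-envelope from the slide: to single out one person among
# roughly 7 billion, an identifier needs about log2(7 billion) bits of entropy.
world_population = 7_000_000_000
bits_needed = math.log2(world_population)
print(f"{bits_needed:.1f} bits")  # ~32.7, i.e. about 33 bits
```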

Attack, or “privacy != removing PII”

Gender   Year   Area   Sensitive attribute
...      ...    ...    ...
Male     1st    PL     (some value)
...      ...    ...    ...

Adversary's auxiliary information: Gender = Male, Year = 1st, Area = PL. Even with all PII removed, this background knowledge singles out one row and reveals the target's sensitive attribute.
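To make the linkage concrete, here is a toy Python sketch with entirely hypothetical records: the published table carries no names, yet joining on the quasi-identifiers with the adversary's auxiliary knowledge isolates the target's row.

```python
# Hypothetical "anonymized" release: no names, but quasi-identifiers remain.
published = [
    {"gender": "Male",   "year": "1st", "area": "PL",      "sensitive": "value-1"},
    {"gender": "Female", "year": "2nd", "area": "Systems", "sensitive": "value-2"},
    {"gender": "Male",   "year": "3rd", "area": "Theory",  "sensitive": "value-3"},
]

# Adversary's auxiliary information about the target (e.g. from a Facebook page).
aux = {"gender": "Male", "year": "1st", "area": "PL"}

# Linkage: keep only rows consistent with everything the adversary knows.
matches = [row for row in published
           if all(row[attr] == val for attr, val in aux.items())]
if len(matches) == 1:
    print("Re-identified! Sensitive attribute:", matches[0]["sensitive"])
```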

5 “Straddler attack” on a recommender system. [Amazon screenshot: “People who bought ... also bought ...”]

Where to get “auxiliary information” Personal knowledge/communication Your Facebook page!! Public datasets –(Online) white pages –Scraping webpages Stealthy –Web trackers, history sniffing –Phishing attacks or social engineering attacks in general

Linkage attack! 87% of the US population are uniquely identified by the combination of date of birth, gender, and postal code! [Golle and Partridge 09]

Uniqueness of live/work locations [Golle and Partridge 09]

Attackers: global surveillance, phishing, nosy friends, advertising/marketing.

11 Case Study: Netflix dataset

Linkage attack on the Netflix dataset. Netflix: online movie rental service. In October 2006, Netflix released real movie ratings of 500,000 subscribers –10% of all Netflix users as of late 2005 –Names removed; data possibly perturbed

The Netflix dataset

          Movie 1            Movie 2            Movie 3            ...
Alice     rating/timestamp   rating/timestamp   rating/timestamp   ...
Bob       ...
Charles   ...
David     ...
Evelyn    ...
...

500K users, 17K movies – high-dimensional! The average subscriber has 214 dated ratings.
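One simple way to model a row of this matrix is a sparse mapping from movie ID to a (rating, date) pair; the values below are made up, and the point is only that each subscriber touches a tiny fraction of the ~17K columns.

```python
import datetime

# Hypothetical record: a sparse dict {movie_id: (rating, date)}.
alice = {
    1:  (4, datetime.date(2005, 3, 14)),
    7:  (1, datetime.date(2005, 6, 2)),
    42: (5, datetime.date(2005, 11, 30)),
}
print(f"Alice rated {len(alice)} of ~17000 movies")
```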

Netflix Dataset: Nearest Neighbors. Considering just movie names, for 90% of records there isn't a single other record that is more than 30% similar. [Figure: histogram of nearest-neighbor similarity.] The curse of dimensionality.
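A toy illustration of this "no near neighbor" effect, using Jaccard similarity over made-up movie sets; the 90%/30% figures come from the real 17K-dimensional data, not from this toy.

```python
from itertools import combinations

# Treat each record as its set of movie IDs and measure Jaccard similarity.
records = {
    "r1": {1, 2, 3, 4},
    "r2": {5, 6, 7},
    "r3": {2, 8, 9, 10},
    "r4": {11, 12},
}

def jaccard(a, b):
    return len(a & b) / len(a | b)

# For each record, find the similarity of its nearest neighbor.
best = {rid: 0.0 for rid in records}
for (ra, sa), (rb, sb) in combinations(records.items(), 2):
    s = jaccard(sa, sb)
    best[ra] = max(best[ra], s)
    best[rb] = max(best[rb], s)

isolated = sum(1 for s in best.values() if s <= 0.3)
print(f"{isolated}/{len(records)} records have no neighbor more than 30% similar")
```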

15 Deanonymizing the Netflix Dataset. How many ratings does the attacker need to know to identify his target's record in the dataset? –Two are enough to reduce to 8 candidate records –Four are enough to identify uniquely (on average) –Works even better with relatively rare ratings: “The Astro-Zombies” rather than “Star Wars”. The fat-tail effect helps here: most people watch obscure crap (really!)
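A toy sketch of this uniqueness experiment with a hypothetical three-record dataset: each additional known (movie, rating) pair shrinks the candidate set, and a rare title shrinks it fastest.

```python
# Hypothetical anonymized dataset: user id -> {movie: rating}.
dataset = {
    "u1": {"Star Wars": 5, "The Astro-Zombies": 2, "Amelie": 4},
    "u2": {"Star Wars": 5, "Titanic": 3, "Amelie": 4},
    "u3": {"Star Wars": 4, "The Astro-Zombies": 2, "Shrek": 3},
}

def candidates(known):
    # Records consistent with every (movie, rating) pair the attacker knows.
    return [uid for uid, ratings in dataset.items()
            if all(ratings.get(m) == r for m, r in known.items())]

print(candidates({"Star Wars": 5}))                          # two candidates left
print(candidates({"Star Wars": 5, "The Astro-Zombies": 2}))  # unique match
```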

16 Challenge: Noise. Noise: data omission, data perturbation. Can't simply do a join between the 2 DBs. Lack of ground truth –No oracle to tell us that deanonymization succeeded! –Need a metric of confidence?

Scoring and Record Selection
Score(aux, r') = min_{i ∈ supp(aux)} Sim(aux_i, r'_i)
–Determined by the least-similar attribute among those known to the adversary as part of aux
–Heuristic: Σ_{i ∈ supp(aux)} Sim(aux_i, r'_i) / log|supp(i)|, which gives higher weight to rare attributes
Selection: pick at random from all records whose scores are above a threshold
–Heuristic: pick each matching record r' with probability c · e^{score(aux, r')/σ}, which selects statistically unlikely high scores
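A minimal sketch of this scoring step (not the authors' code): Sim is reduced to an exact-match indicator on ratings (the paper also scores rating dates), support[i] is assumed to be a precomputed table of |supp(i)|, the number of records containing attribute i, and the exponential selection is normalized into a probability distribution for simplicity.

```python
import math
import random

def sim(a, b):
    # Simplified per-attribute similarity: exact agreement on the rating.
    return 1.0 if a == b else 0.0

def score_min(aux, record):
    # Score(aux, r') = min over i in supp(aux) of Sim(aux_i, r'_i):
    # the least-similar attribute known to the adversary decides the score.
    return min(sim(v, record.get(i)) for i, v in aux.items())

def score_weighted(aux, record, support):
    # Heuristic: sum of Sim(aux_i, r'_i) / log|supp(i)|, so rare attributes
    # (movies present in few records) carry more weight.
    return sum(sim(v, record.get(i)) / math.log(max(support[i], 2))
               for i, v in aux.items())

def select(aux, records, support, sigma=1.0):
    # Pick each candidate with probability proportional to exp(score / sigma),
    # concentrating on statistically unlikely high scores. (The simpler variant
    # on the slide just picks at random among records above a threshold.)
    scores = {rid: score_weighted(aux, rec, support) for rid, rec in records.items()}
    ids = list(scores)
    weights = [math.exp(scores[rid] / sigma) for rid in ids]
    return random.choices(ids, weights=weights)[0], scores
```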

18 How Good Is the Match? It's important to eliminate false matches –We have no deanonymization oracle, and thus no “ground truth” “Self-test” heuristic: the difference between the best and second-best score has to be large relative to the standard deviation –Eccentricity: (max − max₂) / σ
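A small sketch of the eccentricity heuristic, using the population standard deviation of the candidate scores (the score values here are illustrative only):

```python
import statistics

def eccentricity(scores):
    # (max - max2) / sigma: how far the best score stands out from the
    # second best, measured in standard deviations of all candidate scores.
    ordered = sorted(scores, reverse=True)
    sigma = statistics.pstdev(scores)
    if sigma == 0:
        return 0.0  # all candidates scored identically: nothing stands out
    return (ordered[0] - ordered[1]) / sigma

print(eccentricity([9.7, 2.1, 1.8, 2.0]))  # large: accept the match
print(eccentricity([2.2, 2.1, 1.8, 2.0]))  # small: declare "no match"
```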

19 Eccentricity in the Netflix Dataset. [Figure: eccentricity (max − max₂) versus aux score when the algorithm is given the aux of a record in the dataset, compared with the aux of a record not in the dataset.]

Avoiding False Matches. Experiment: after the algorithm finds a match, remove the found record and re-run. With very high probability, the algorithm now declares that there is no match.
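A sketch of this remove-and-re-run check on hypothetical scores: after deleting the best-matching record, the gap between the best and second-best scores (relative to their spread) shrinks sharply, so the algorithm would decline to output a match.

```python
import statistics

def best_with_margin(scores):
    # Return the top-scoring record and how far it stands out from the runner-up.
    ordered = sorted(scores.values(), reverse=True)
    sigma = statistics.pstdev(ordered) or 1.0
    margin = (ordered[0] - ordered[1]) / sigma
    return max(scores, key=scores.get), margin

scores = {"r1": 9.7, "r2": 2.1, "r3": 1.8, "r4": 2.0}
best, margin = best_with_margin(scores)
print(best, round(margin, 2))        # clear standout

del scores[best]                     # remove the found record and re-run
print(best_with_margin(scores))      # margin drops: would declare "no match"
```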

Case study: Social network deanonymization Where “high-dimensionality” comes from graph structure and attributes

Motivating scenario: Overlapping networks Social networks A and B have overlapping memberships Owner of A releases anonymized, sanitized graph –say, to enable targeted advertising Can owner of B learn sensitive information from released graph A’?

Releasing social network data: What needs protecting? [Figure: a small social graph.] Node attributes: SSN, sexual orientation. Edge attributes: date of creation, strength. Edge existence.

24 IJCNN/Kaggle Social Network Challenge

IJCNN/Kaggle Social Network Challenge. [Figure: a training graph over nodes A–F, and a test set of node pairs (J1, K1), (J2, K2), (J3, K3) whose edges must be predicted.]

Deanonymization: Seed Identification. [Figure: the anonymized competition graph alongside a crawled Flickr graph.]

Propagation of Mappings. [Figure: Graph 1 and Graph 2 with an initial set of “seed” node mappings between them.]
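A minimal sketch of the propagation step (a simplification, not the paper's full algorithm, which also uses degree normalization, an eccentricity test, and reverse-match checking): repeatedly map an unmapped node of Graph 1 to the Graph 2 node that shares the most already-mapped neighbors.

```python
from collections import defaultdict

def propagate(g1, g2, seeds, min_overlap=1):
    # g1, g2: adjacency dicts {node: set of neighbors}; seeds: {g1 node: g2 node}.
    mapping = dict(seeds)
    changed = True
    while changed:
        changed = False
        for u in g1:
            if u in mapping:
                continue
            # For each candidate v in g2, count how many of u's already-mapped
            # neighbors map to a neighbor of v.
            counts = defaultdict(int)
            for n in g1[u]:
                if n in mapping:
                    for v in g2[mapping[n]]:
                        if v not in mapping.values():
                            counts[v] += 1
            if counts:
                v, c = max(counts.items(), key=lambda kv: kv[1])
                if c >= min_overlap:
                    mapping[u] = v
                    changed = True
    return mapping

# Tiny example with hypothetical graphs and two seed nodes.
g1 = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
g2 = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(propagate(g1, g2, {"a": 1, "b": 2}))  # extends the seeds to c and d
```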

29 Challenges: Noise and missing info. Loss of information: both graphs are subgraphs of Flickr (not even induced subgraphs); some nodes have very little information. Graph evolution: a small constant fraction of nodes/edges have changed.

Similarity measure

Combining De-anonymization with Link Prediction

Case study: Amazon attack Where “high-dimensionality” comes from temporal dimension

Item-to-item recommendations

34 Modern Collaborative Filtering Recommender Systems: Item-Based and Dynamic. Selecting an item makes it and past choices more similar; thus, the output changes in response to transactions.

35 Inferring Alice's Transactions. Today, Alice watches a new show (we don't know this). We can see the recommendation lists for auxiliary items... and we can see changes in those lists. Based on those changes, we infer her transactions.
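A toy sketch of this passive inference (not the paper's algorithm, which scores relative movements within the lists over time): snapshot the public related-items lists of auxiliary items already linked to Alice, then attribute an item that newly appears across several of those lists to a hidden transaction of hers.

```python
from collections import Counter

def infer_new_items(lists_before, lists_after, min_lists=2):
    # lists_before/after: {aux_item: publicly visible related-items list}.
    appeared = Counter()
    for aux_item, before in lists_before.items():
        for item in lists_after.get(aux_item, []):
            if item not in before:
                appeared[item] += 1
    # An item that newly shows up in many monitored lists is best explained
    # by a recent transaction involving it.
    return [item for item, n in appeared.items() if n >= min_lists]

before = {"aux1": ["x", "y"], "aux2": ["y", "z"], "aux3": ["x", "z"]}
after  = {"aux1": ["x", "new_show"], "aux2": ["new_show", "z"], "aux3": ["x", "z"]}
print(infer_new_items(before, after))  # ['new_show']
```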

Summary for today. High-dimensional data is likely unique –making linkage attacks easy. What this means for privacy –Attacker background knowledge is important in formally defining privacy notions –We will cover formal privacy definitions in later lectures, e.g., differential privacy

37 Homework. The Netflix attack is a linkage attack that correlates multiple data sources. Can you think of another application or other datasets where such a linkage attack might be exploited to compromise privacy? The Memento paper and the web-application paper are examples of side-channel attacks. Can you think of other potential side channels that can be exploited to leak information in unintended ways?

38 Reading list
[Suman and Vitaly 12] Memento: Learning Secrets from Process Footprints
[Arvind and Vitaly 09] De-anonymizing Social Networks
[Arvind and Vitaly 07] How to Break Anonymity of the Netflix Prize Dataset
[Shuo et al. 10] Side-Channel Leaks in Web Applications: A Reality Today, a Challenge Tomorrow
[Joseph et al. 11] “You Might Also Like:” Privacy Risks of Collaborative Filtering
[Tom et al. 09] Hey, You, Get Off of My Cloud: Exploring Information Leakage in Third-Party Compute Clouds
[Zhenyu et al. 12] Whispers in the Hyper-space: High-speed Covert Channel Attacks in the Cloud