
Similar presentations
Online Recommendations

Jeremiah Blocki CMU Ryan Williams IBM Almaden ICALP 2010.
Random Forest Predrag Radenković 3237/10
Differentially Private Recommendation Systems Jeremiah Blocki Fall A: Foundations of Security and Privacy.
Ragib Hasan Johns Hopkins University en Spring 2011 Lecture 8 04/04/2011 Security and Privacy in Cloud Computing.
Iowa State University Department of Computer Science Center for Computational Intelligence, Learning, and Discovery Harris T. Lin and Vasant Honavar. BigData2013.
Private Analysis of Graph Structure With Vishesh Karwa, Sofya Raskhodnikova and Adam Smith Pennsylvania State University Grigory Yaroslavtsev
The End of Anonymity Vitaly Shmatikov. Tastes and Purchases slide 2.
Autocorrelation and Linkage Cause Bias in Evaluation of Relational Learners David Jensen and Jennifer Neville.
An Approach to Evaluate Data Trustworthiness Based on Data Provenance Department of Computer Science Purdue University.
Intro to RecSys and CCF Brian Ackerman 1. Roadmap Introduction to Recommender Systems & Collaborative Filtering Collaborative Competitive Filtering 2.
UTEPComputer Science Dept.1 University of Texas at El Paso Privacy in Statistical Databases Dr. Luc Longpré Computer Science Department Spring 2006.
Leting Wu Xiaowei Ying, Xintao Wu Dept. Software and Information Systems Univ. of N.C. – Charlotte Reconstruction from Randomized Graph via Low Rank Approximation.
Communities in Heterogeneous Networks Chapter 4 1 Chapter 4, Community Detection and Mining in Social Media. Lei Tang and Huan Liu, Morgan & Claypool,
Memory-Based Recommender Systems : A Comparative Study Aaron John Mani Srinivasan Ramani CSCI 572 PROJECT RECOMPARATOR.
Probability based Recommendation System Course : ECE541 Chetan Tonde Vrajesh Vyas Ashwin Revo Under the guidance of Prof. R. D. Yates.
1 Jun Wang, 2 Sanjiv Kumar, and 1 Shih-Fu Chang 1 Columbia University, New York, USA 2 Google Research, New York, USA Sequential Projection Learning for.
1 Preserving Privacy in Collaborative Filtering through Distributed Aggregation of Offline Profiles The 3rd ACM Conference on Recommender Systems, New.
Anatomy: Simple and Effective Privacy Preservation Israel Chernyak DB Seminar (winter 2009)
Suppose I learn that Garth has 3 friends. Then I know he must be one of {v 1,v 2,v 3 } in Figure 1 above. If I also learn the degrees of his neighbors,
Privacy Policy, Law and Technology Carnegie Mellon University Fall 2007 Lorrie Cranor 1 Data Privacy.
April 13, 2010 Towards Publishing Recommendation Data With Predictive Anonymization Chih-Cheng Chang †, Brian Thompson †, Hui Wang ‡, Danfeng Yao † †‡
Computing Sketches of Matrices Efficiently & (Privacy Preserving) Data Mining Petros Drineas Rensselaer Polytechnic Institute (joint.
Sparsity, Scalability and Distribution in Recommender Systems
The Union-Split Algorithm and Cluster-Based Anonymization of Social Networks Brian Thompson Danfeng Yao Rutgers University Dept. of Computer Science Piscataway,
Cloaking and Modeling Techniques for Location Privacy Protection Ying Cai Department of Computer Science Iowa State University Ames, IA
Database Laboratory Regular Seminar TaeHoon Kim.
Baik Hoh Marco Gruteser Hui Xiong Ansaf Alrabady All images are credited to “ACM” Hoh et al (2007), pp
Private Analysis of Graphs
Liang Xiang, Quan Yuan, Shiwan Zhao, Li Chen, Xiatian Zhang, Qing Yang and Jimeng Sun Institute of Automation Chinese Academy of Sciences, IBM Research.
Overview of Privacy Preserving Techniques.  This is a high-level summary of the state-of-the-art privacy preserving techniques and research areas  Focus.
Ragib Hasan University of Alabama at Birmingham CS 491/691/791 Fall 2011 Lecture 16 10/11/2011 Security and Privacy in Cloud Computing.
By Rachsuda Jiamthapthaksin 10/09/ Edited by Christoph F. Eick.
WEMAREC: Accurate and Scalable Recommendation through Weighted and Ensemble Matrix Approximation Chao Chen ⨳ , Dongsheng Li
Publishing Microdata with a Robust Privacy Guarantee
Differentially Private Data Release for Data Mining Noman Mohammed*, Rui Chen*, Benjamin C. M. Fung*, Philip S. Yu + *Concordia University, Montreal, Canada.
Protecting Sensitive Labels in Social Network Data Anonymization.
Chengjie Sun, Lei Lin, Yuan Chen, Bingquan Liu Harbin Institute of Technology School of Computer Science and Technology
Badrul M. Sarwar, George Karypis, Joseph A. Konstan, and John T. Riedl
Matching Users and Items Across Domains to Improve the Recommendation Quality Created by: Chung-Yi Li, Shou-De Lin Presented by: I Gde Dharma Nugraha 1.
Mingyang Zhu, Huaijiang Sun, Zhigang Deng Quaternion Space Sparse Decomposition for Motion Compression and Retrieval SCA 2012.
Temporal Diversity in Recommender Systems Neal Lathia, Stephen Hailes, Licia Capra, and Xavier Amatriain SIGIR 2010 April 6, 2011 Hyunwoo Kim.
A Content-Based Approach to Collaborative Filtering Brandon Douthit-Wood CS 470 – Final Presentation.
On Your Social Network De-anonymizablity: Quantification and Large Scale Evaluation with Seed Knowledge NDSS 2015, Shouling Ji, Georgia Institute of Technology.
Figure 6. Parameter Calculation. Parameters R, T, f, and c are found from m ij. Patient f : camera focal vector along optical axis c : camera center offset.
Privacy-preserving data publishing
Amanda Lambert Jimmy Bobowski Shi Hui Lim Mentors: Brent Castle, Huijun Wang.
Department of Automation Xiamen University
Community-enhanced De-anonymization of Online Social Networks Shirin Nilizadeh, Apu Kapadia, Yong-Yeol Ahn Indiana University Bloomington CCS 2014.
m-Privacy for Collaborative Data Publishing
Model Fusion and its Use in Earth Sciences R. Romero, O. Ochoa, A. A. Velasco, and V. Kreinovich Joint Annual Meeting NSF Division of Human Resource Development.
Yue Xu Shu Zhang.  A person has already rated some movies, which movies he/she may be interested, too?  If we have huge data of user and movies, this.
Location Privacy Protection for Location-based Services CS587x Lecture Department of Computer Science Iowa State University.
Ensemble Methods Construct a set of classifiers from the training data Predict class label of previously unseen records by aggregating predictions made.
Probabilistic km-anonymity (Efficient Anonymization of Large Set-valued Datasets) Gergely Acs (INRIA) Jagdish Achara (INRIA)
Jinfang Jiang, Guangjie Han, Lei Shu, Han-Chieh Chao, Shojiro Nishio
Company LOGO MovieMiner A collaborative filtering system for predicting Netflix user’s movie ratings [ECS289G Data Mining] Team Spelunker: Justin Becker,
Item-Based Collaborative Filtering Recommendation Algorithms Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl GroupLens Research Group/ Army.
Sergey Yekhanin Institute for Advanced Study Lower Bounds on Noise.
The Wisdom of the Few Xavier Amatrian, Neal Lathis, Josep M. Pujol SIGIR’09 Advisor: Jia Ling, Koh Speaker: Yu Cheng, Hsieh.
DECISION TREE INDUCTION CLASSIFICATION AND PREDICTION What is classification? what is prediction? Issues for classification and prediction. What is decision.
Privacy Issues in Graph Data Publishing Summer intern: Qing Zhang (from NC State University) Mentors: Graham Cormode and Divesh Srivastava.
Xiaowei Ying, Kai Pan, Xintao Wu, Ling Guo Univ. of North Carolina at Charlotte SNA-KDD June 28, 2009, Paris, France Comparisons of Randomization and K-degree.
Homework 1 Tutorial Instructor: Weidong Shi (Larry), PhD
Discovering Functional Communities in Social Media
Approximating the Community Structure of the Long Tail
ITEM BASED COLLABORATIVE FILTERING RECOMMENDATION ALGORITHEMS
Binghui Wang, Le Zhang, Neil Zhenqiang Gong
Recommender Systems Group 6 Javier Velasco Anusha Sama
SHUFFLING-SLICING IN DATA MINING
Presentation transcript:

Privacy-Aware Publishing of Netflix Data

Brian Thompson¹, Chih-Cheng Chang¹, Hui (Wendy) Wang², Danfeng Yao¹

Motivation

Recommender systems like Netflix store a database of users, items, and the corresponding ratings. Being able to predict which items a user will enjoy can lead to happier customers and better business. The Netflix Challenge: given the Netflix dataset, devise an algorithm with high prediction accuracy.

Problem: Netflix replaces names with random ID numbers before publishing to protect privacy, but users can be re-identified by comparing the published data with external sources. Malicious parties may employ (a) structure-based or (b) label-based attacks to re-identify users and thus learn sensitive information about their rating history. How can Netflix ensure users' privacy without degrading the accuracy of prediction results?

Challenges: The sparsity of recommendation data presents a two-fold problem: existing anonymization techniques do not perform well on sparse data, and at the same time sparse records make users more easily identifiable [Narayanan et al.]. Algorithms must also be scalable: the Netflix Challenge dataset is a matrix with over 8.5 billion cells.

We develop a novel approach, Predictive Anonymization, which effectively anonymizes sparse recommendation datasets by introducing a padding phase that reduces sparsity while maintaining the integrity of the data.

Our experiments show that Padded Anonymization yields high prediction accuracy while providing theoretical privacy guarantees. However, sparsity is reduced in the released dataset, which affects the integrity of the data. Future work: maintain sparsity without loss in prediction accuracy, and extend our methods to prevent link disclosure attacks.

Chart: Distribution of ratings before and after padding.
Table: Prediction accuracy before and after anonymization.
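The linkage risk above can be illustrated with a minimal sketch. The records, IDs, and auxiliary knowledge here are hypothetical toy data, not from the poster: an attacker who learns a few of a target's ratings from an external source (e.g., public reviews) matches them against the "anonymized" sparse records.

```python
# Illustrative sketch of re-identification via external sources.
# All names, IDs, and ratings below are invented for the example.
anonymized = {
    "id_1071": {"Alien": 3, "Batman": 5, "Rocky V": 2},
    "id_2344": {"Alien": 3, "Wall-E": 4},
    "id_9058": {"Batman": 5, "Rocky V": 2, "Wall-E": 1},
}

def candidates(aux_ratings, published):
    """Return anonymized IDs consistent with the attacker's auxiliary
    (movie, rating) observations."""
    return [uid for uid, ratings in published.items()
            if all(ratings.get(m) == r for m, r in aux_ratings.items())]

# Knowing only two of the target's ratings already narrows the target
# to a single record in this toy dataset.
print(candidates({"Alien": 3, "Batman": 5}, anonymized))  # -> ['id_1071']
```

Because the ratings matrix is sparse, even a handful of (movie, rating) pairs is often unique, which is exactly the two-fold sparsity problem the poster describes.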
¹ Department of Computer Science, Rutgers University
² Department of Computer Science, Stevens Institute of Technology

Model

We model recommender systems as labeled bipartite review graphs. We say a user is k-anonymous if there are at least k−1 other users with identical item ratings.

Our Approach

To eliminate the sparsity problem, we perform a pre-processing step that replaces null entries with predicted values before anonymizing. The algorithm has three phases: padding, clustering, and homogenization.

Figure (a), (b): An example ratings table with null values. The recoverable fragments name the movies Alien, Batman, Rocky V, Wall-E, Godfather, English Patient, and Star Wars, and the users John Smith (ratings 3, 5, 2, ?), Ben, and Tim.

Analysis

The Netflix dataset has 480,189 users and 17,770 movies. We use singular value decomposition (SVD) for the padding step, and also for prediction when necessary. We measure prediction accuracy by comparing the root mean squared error (RMSE) of the published anonymized datasets with that of the original data.

Table: RMSE by experiment series: Original Data; Padded Anonymization (k=5); Padded Anonymization (k=50); Simple Anonymization (k=5); Simple Anonymization (k=50).

Conclusions and Future Work

Our Predictive Anonymization algorithm guarantees node re-identification privacy in the form of k-anonymity against the stronger label-based attack. Our methods also provide privacy against link disclosure attacks. Details and further discussion can be found in our technical report.

References

A. Narayanan, V. Shmatikov. "Robust De-anonymization of Large Sparse Datasets." IEEE S&P.
C. Chang, B. Thompson, W. Wang, D. Yao. "Predictive Anonymization: Utility-Preserving Publishing of Sparse Recommendation Data." Technical Report DCS-tr-647, April.

Anonymization Algorithm

To increase efficiency, we apply a sampling procedure:
1. Choose a random sample from the padded dataset.
2. Apply the Bounded t-Means Algorithm to the sample to generate t same-size clusters, and put each user in the dataset into one of the t bins corresponding to its nearest cluster.
3. Once again, apply the Bounded t-Means Algorithm to partition each bin into smaller clusters, which correspond to the final anonymization groups.

To defend against structure-based and label-based attacks, we homogenize the k users in each anonymization group so that they have identical review graphs, including the ratings. In Simple Anonymization (left), we average the original movie ratings over the users in a group; in Padded Anonymization (right), the average is taken over the padded data. Predictive padding helps uncover latent trends in the original data, leading to more accurate clustering results.
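The padding and homogenization steps can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the iterative SVD fill-in, the `rank` and `iters` parameters, and the toy matrix are all assumptions, and the Bounded t-Means clustering is omitted (anonymization groups are passed in precomputed).

```python
import numpy as np

def pad_with_svd(R, mask, rank=2, iters=20):
    """Padding (sketch): iteratively fill null cells with values predicted
    by a low-rank SVD approximation; observed ratings stay fixed."""
    X = np.where(mask, R, R[mask].mean())            # init nulls with global mean
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        approx = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        X = np.where(mask, R, approx)                # overwrite only the null cells
    return X

def homogenize(R, mask, groups, padded=None):
    """Homogenization (sketch): every user in an anonymization group gets the
    group's average ratings. With `padded` given, averages are taken over the
    padded data (Padded Anonymization); otherwise only over observed ratings
    (Simple Anonymization)."""
    src = padded if padded is not None else np.where(mask, R, np.nan)
    out = np.empty_like(R, dtype=float)
    for g in groups:                                 # g: list of user row indices
        out[g] = np.nanmean(src[g], axis=0)
    return out

def rmse(pred, truth, mask):
    """Root mean squared error over the observed cells."""
    return float(np.sqrt(np.mean((pred[mask] - truth[mask]) ** 2)))

# Toy run: 4 users x 3 movies, 0 = null rating.
R = np.array([[5., 0., 1.], [4., 0., 1.], [1., 5., 0.], [2., 4., 0.]])
mask = R > 0
P = pad_with_svd(R, mask)
published = homogenize(R, mask, groups=[[0, 1], [2, 3]], padded=P)
```

After homogenization, the two users in each group have identical rows, so each is 2-anonymous under the definition above; the RMSE of `published` against the original observed ratings is the utility measure the poster reports.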