Differentially Private Data Release for Data Mining. Noman Mohammed*, Rui Chen*, Benjamin C. M. Fung*, Philip S. Yu+. *Concordia University, Montreal, Canada; +University of Illinois at Chicago, IL, USA.


Introduction

Privacy-preserving data publishing (PPDP) studies how to transform raw data into a version that is immunized against privacy attacks but still supports effective data mining tasks.

Figure 1: Overview of privacy-preserving data publishing

Differential privacy is one of the strongest privacy definitions. A differentially private mechanism ensures that the probability of any output (released data) is nearly the same for all input data sets that differ in a single record, and thus guarantees that every output is insensitive to any individual's data.

Current Techniques

Table 1: Raw data
Table 2: Contingency table

The standard non-interactive approach publishes a contingency table in which each cell count is perturbed with Laplace noise; an empty cell, for example, is released as 0 + Lap(1/ε). For high-dimensional data, most cells have small or zero true counts, so the added noise overwhelms the signal.

Our Approach

Table 3: Generalized contingency table
Figure 2: Taxonomy tree for Job

The proposed method (DiffGen) is the first generalization-based algorithm for differentially private data release that preserves information for classification analysis.

Figure 3: Tree for partitioning records

Experimental Evaluation

We employ the publicly available Adult data set. Adult has 45,222 census records with 6 numerical attributes, 8 categorical attributes, and a binary class attribute representing two income levels, ≤50K or >50K.

Figure 4: Classification accuracy for Max
Figure 5: Comparison

References

[1] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In TCC, 2006.
[2] A. Friedman and A. Schuster. Data mining with differential privacy. In ACM SIGKDD, 2010.
[3] B. C. M. Fung, K. Wang, R. Chen, and P. S. Yu. Privacy-preserving data publishing: A survey of recent developments. ACM Computing Surveys, 42(4):1–53, June 2010.
[4] B. C. M. Fung, K. Wang, and P. S. Yu. Anonymizing classification data for privacy preservation. IEEE TKDE, 19(5):711–725, May 2007.
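The noisy-count release described above (an empty cell is published as 0 + Lap(1/ε)) can be sketched as follows. This is a minimal illustration, not the authors' implementation; the names `laplace_noise` and `noisy_counts` are my own.

```python
import math
import random


def laplace_noise(scale: float) -> float:
    """Sample Laplace(0, scale) via inverse-transform sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))


def noisy_counts(counts, epsilon):
    """Publish each cell count plus Lap(1/epsilon) noise.

    Disjoint histogram cells have sensitivity 1: adding or removing one
    record changes exactly one count by 1, so scale 1/epsilon suffices
    for epsilon-differential privacy.
    """
    return [c + laplace_noise(1.0 / epsilon) for c in counts]
```

When the table is high-dimensional, most cells have a true count of 0 and each published value is essentially pure Lap(1/ε) noise, which is why the coarser generalized table of the proposed approach retains more usable signal.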
Anonymization Algorithm

Consider the raw data set of Table 1.

Step 1: The algorithm creates a generalized root partition that covers all records.
Step 2: At each iteration, the algorithm uses the exponential mechanism to select a candidate for specialization.
Step 3: The algorithm outputs each leaf partition along with its noisy count, computed with the Laplace mechanism.

We observe two general trends from the experiments. First, the privacy budget has a direct impact on the classification accuracy. Second, the classification accuracy initially increases with the number of specializations; beyond a certain threshold, however, it decreases as the number of specializations grows. We compare DiffGen with DiffP-C4.5 [2], a differentially private interactive algorithm for building a classifier, and with the top-down specialization (TDS) approach [4], which publishes k-anonymous data for classification analysis.

Conclusions

The proposed solution connects the classical generalization technique with output perturbation to effectively anonymize raw data. Experimental results suggest that it provides better utility than both the interactive approach and the k-anonymous data, and that it is more effective than publishing a noisy contingency table.

Acknowledgments: The research is supported in part by Discovery Grants, Strategic Project Grants, and Canada Graduate Scholarships from the Natural Sciences and Engineering Research Council of Canada (NSERC), and by an NSF IIS grant.
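Step 2 of the algorithm relies on the exponential mechanism. A minimal sketch of that primitive follows; the utility function (DiffGen scores candidates by a quality measure such as information gain) is left as a caller-supplied parameter, and all names here are illustrative rather than taken from the authors' code.

```python
import math
import random


def exponential_mechanism(candidates, utility, epsilon, sensitivity=1.0):
    """Select one candidate with probability proportional to
    exp(epsilon * utility(c) / (2 * sensitivity))."""
    scores = [utility(c) for c in candidates]
    top = max(scores)  # subtract the max score so exp() cannot overflow
    weights = [math.exp(epsilon * (s - top) / (2.0 * sensitivity))
               for s in scores]
    total = sum(weights)
    r = random.uniform(0.0, total)
    acc = 0.0
    for c, w in zip(candidates, weights):
        acc += w
        if r <= acc:
            return c
    return candidates[-1]  # guard against floating-point round-off
```

Higher-utility specializations are exponentially more likely to be chosen, yet every candidate retains nonzero probability, which is what makes the selection step differentially private; each invocation consumes a portion of the overall privacy budget ε.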