Differentially Private Data Release for Data Mining. Benjamin C.M. Fung, Concordia University, Montreal, QC, Canada; Noman Mohammed, Concordia University, Montreal, QC, Canada


Differentially Private Data Release for Data Mining. Benjamin C.M. Fung, Concordia University, Montreal, QC, Canada; Noman Mohammed, Concordia University, Montreal, QC, Canada; Rui Chen, Concordia University, Montreal, QC, Canada; Philip S. Yu, University of Illinois at Chicago, IL, USA

Outline • Overview • Differential privacy • Related work • Our algorithm • Experimental results • Conclusion

Overview • Privacy model • Anonymization algorithm • Data utility

Contributions • Proposed an anonymization algorithm that provides a differential privacy guarantee • A generalization-based algorithm for differentially private data release • The proposed algorithm can handle both categorical and numerical attributes • Preserves information for classification analysis

Outline • Overview • Differential privacy • Related work • Our algorithm • Experimental results • Conclusion

Differential Privacy [DMNS06] • A non-interactive privacy mechanism A gives ε-differential privacy if, for all neighbouring databases D and D′ and for any possible sanitized database D*, Pr A [A(D) = D*] ≤ exp(ε) × Pr A [A(D′) = D*] • D and D′ are neighbours if they differ on at most one record

Laplace Mechanism [DMNS06] • The sensitivity of a function f is ∆f = max D,D′ ||f(D) − f(D′)||_1 • For a counting query f, ∆f = 1 • For example, for a single counting query Q over a dataset D, returning Q(D) + Laplace(1/ε) maintains ε-differential privacy
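A minimal Python sketch of this mechanism (the function names and the inverse-CDF sampler are our own illustration, not the paper's code):

```python
import math
import random

def laplace_noise(scale, rng=random):
    # Inverse-CDF sampling from Laplace(0, scale):
    # X = -scale * sgn(u) * ln(1 - 2|u|), with u uniform on (-0.5, 0.5)
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_count(records, predicate, epsilon, rng=random):
    # A counting query has sensitivity ∆f = 1, so the noise scale is 1/ε
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)
```

With ε = 1 the released count is typically within a few units of the true count, which is harmless for large counts but drowns small ones.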

8  Given a utility function u : ( D × T ) → R for a database instance D, the mechanism A,  A(D, u) = return t with probability proportional to exp(ε×u(D, t)/2 ∆u) gives ε -differential privacy. Exponential Mechanism [MT07] 8

Composition properties • Sequential composition: running mechanisms with budgets ε i on the same data gives (∑ i ε i)-differential privacy • Parallel composition: running mechanisms with budgets ε i on disjoint subsets of the data gives max(ε i)-differential privacy
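These two rules reduce to trivial budget accounting (our own sketch):

```python
def sequential_budget(epsilons):
    # Mechanisms applied to the same data: the budgets add up
    return sum(epsilons)

def parallel_budget(epsilons):
    # Mechanisms applied to disjoint partitions: the worst case dominates
    return max(epsilons)
```

This is why partition-based algorithms are attractive: a specialization step that only touches disjoint partitions is charged max(ε i), not the sum.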

Outline • Overview • Differential privacy • Related work • Our algorithm • Experimental results • Conclusion

Two Frameworks • Interactive: multiple questions are asked and answered adaptively through an anonymizer • Non-interactive: the data is anonymized and then released

Related Work • A. Blum, C. Dwork, F. McSherry, and K. Nissim. Practical privacy: the SuLQ framework. In PODS. • A. Friedman and A. Schuster. Data mining with differential privacy. In SIGKDD. • Open question: is it possible to release data for classification analysis?

Why a Non-interactive Framework? • Disadvantages of the interactive approach: • The database can answer only a limited number of queries • This is a big problem if there are many data miners • It provides less flexibility to perform data analysis

Non-interactive Framework • Noisy counts are released: each true count is perturbed with Laplace noise, e.g., a cell with true count 0 is released as 0 + Lap(1/ε)

Non-interactive Framework • For high-dimensional data, the noise is too big: the number of count cells grows exponentially with the number of attributes, so each cell's Lap(1/ε) noise overwhelms its small true count
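A back-of-envelope check makes this concrete (`noise_dominates` is our own hypothetical helper, and the numbers below are illustrative, not from the paper): it compares the average cell count of a full contingency table with the standard deviation of Lap(1/ε) noise.

```python
import math

def noise_dominates(num_records, domain_sizes, epsilon):
    """True if the average cell count in a full contingency table is
    smaller than the standard deviation of the injected Laplace noise."""
    cells = math.prod(domain_sizes)        # one cell per attribute combination
    avg_count = num_records / cells        # expected true count per cell
    noise_std = math.sqrt(2.0) / epsilon   # std of Laplace(1/epsilon)
    return avg_count < noise_std
```

With 45,222 records and only a handful of attributes the counts dominate, but once the cross-product of domains reaches tens of thousands of cells the noise wins.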

Outline • Overview • Differential privacy • Related work • Our algorithm • Experimental results • Conclusion

Anonymization Algorithm (example) • The data is generalized top-down, one specialization at a time, guided by the taxonomy trees:
Job: Any_Job → {Professional → {Engineer, Lawyer}, Artist → {Dancer, Writer}}
Age: [18-65) → {[18-40) → {[18-30), [30-40)}, [40-65)}
• Initial partition (Job, Age, Class, Count): Any_Job, [18-65), 4Y4N, 8
• After specializing Any_Job into {Artist, Professional}: Artist, [18-65), 2Y2N, 4; Professional, [18-65), 2Y2N, 4
• After specializing [18-65) into {[18-40), [40-65)}: Artist, [18-40), 2Y2N, 4; Artist, [40-65), 0Y0N, 0; Professional, [18-40), 2Y1N, 3; Professional, [40-65), 0Y1N, 1

Candidate Selection • We favor the specialization with the maximum Score value • First utility function: ∆u = • Second utility function: ∆u = 1
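A sketch of how such a selection step might look, using the exponential mechanism with a Max-style utility and ∆u = 1 (the data layout and names are our own illustration, not the paper's implementation):

```python
import math
import random

def max_score(child_partitions):
    # Max-style utility: for each child partition a specialization would
    # create, count its majority class; sum over children (∆u = 1)
    score = 0
    for child in child_partitions:
        counts = {}
        for label in child:
            counts[label] = counts.get(label, 0) + 1
        score += max(counts.values()) if counts else 0
    return score

def choose_specialization(candidates, epsilon, rng=random):
    # candidates: {name: list of child class-label lists}; sample via the
    # exponential mechanism, Pr ∝ exp(ε · score / (2 · ∆u)) with ∆u = 1
    names = list(candidates)
    weights = [math.exp(epsilon * max_score(candidates[n]) / 2.0)
               for n in names]
    r = rng.random() * sum(weights)
    acc = 0.0
    for n, w in zip(names, weights):
        acc += w
        if r < acc:
            return n
    return names[-1]
```

A specialization that separates the classes cleanly gets an exponentially larger selection probability than an uninformative one.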

Split Value • The split value of a categorical attribute is determined according to the taxonomy tree of the attribute • How do we determine the split value for a numerical attribute? • Example (Age, Class) records: (60, Y), (30, N), (25, Y), (40, N), (25, Y), (40, N), (45, N), (25, Y)
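One standard answer, consistent with the exponential mechanism introduced earlier (the details here are our hedged reconstruction, not the paper's code): every interval between adjacent data values induces the same partition, so each interval is a single candidate weighted by its length, and the final split point is drawn uniformly from the chosen interval. A Max-style utility with ∆u = 1 is assumed.

```python
import math
import random

def choose_numeric_split(values, labels, lo, hi, epsilon, rng=random):
    pts = sorted(set(values))
    bounds = [lo] + [p for p in pts if lo < p < hi] + [hi]
    intervals = [(a, b) for a, b in zip(bounds, bounds[1:]) if b > a]

    def score(split):
        # Max-style utility: majority-class count on each side of the split
        total = 0
        for side in ([l for v, l in zip(values, labels) if v < split],
                     [l for v, l in zip(values, labels) if v >= split]):
            counts = {}
            for label in side:
                counts[label] = counts.get(label, 0) + 1
            total += max(counts.values()) if counts else 0
        return total

    # Utility is constant inside an interval, so weight it by its
    # length times exp(ε · u / 2), evaluating u at the midpoint
    weights = [(b - a) * math.exp(epsilon * score((a + b) / 2.0) / 2.0)
               for a, b in intervals]
    r = rng.random() * sum(weights)
    acc = 0.0
    for (a, b), w in zip(intervals, weights):
        acc += w
        if r < acc:
            return a + rng.random() * (b - a)
    a, b = intervals[-1]
    return a + rng.random() * (b - a)
```

On the slide's example, a large ε concentrates the choice on the interval that best separates Y from N.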

Anonymization Algorithm: complexity • Per-step costs: O(A pr × |D| log |D|), O(|candidates|), O(|D|), O(|D| log |D|), O(1) • Total: O((A pr + h) × |D| log |D|)

Outline • Overview • Differential privacy • Related work • Our algorithm • Experimental results • Conclusion

Experimental Evaluation • Adult: a census dataset from the UCI repository • 6 continuous attributes • 8 categorical attributes • 45,222 census records

Data Utility for Max

Data Utility for InfoGain

Comparison

Scalability

Outline • Overview • Differential privacy • Related work • Our algorithm • Experimental results • Conclusion

Conclusions • Differentially private data release for data mining • A generalization-based, differentially private anonymization algorithm • Provides better utility than existing techniques

Q&A • Thank you very much