1 Privacy Preserving Data Mining: Introduction. August 2nd, 2013. Shaibal Chakrabarty



2 Motivation
Inherent tension in mining sensitive databases: we want to release aggregate information about the data without leaking individual information about participants.
− Aggregate info: the number of A students in a school district.
− Individual info: whether a particular student is an A student.
Problem: exact aggregate info may leak individual info. E.g., releasing both the number of A students in the district and the number of A students in the district not named Dan Waymel reveals, by subtraction, whether Dan Waymel is an A student.
Goal: a method that protects individual info while still releasing aggregate info.
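The leak described on this slide can be reproduced with two exact count queries. The sketch below is only an illustration: the roster, the names other than Dan Waymel, and the `count_a_students` helper are hypothetical and not part of the original slides.

```python
# Hypothetical roster: True means the student is an A student.
roster = {"Alice": True, "Bob": False, "Dan Waymel": True, "Eve": False}


def count_a_students(records, exclude=None):
    """Exact aggregate query: number of A students, optionally skipping one name."""
    return sum(is_a for name, is_a in records.items() if name != exclude)


# Two aggregate answers that each look harmless on their own.
total = count_a_students(roster)
total_without_dan = count_a_students(roster, exclude="Dan Waymel")

# Their difference is exactly the individual's private value.
dan_is_a_student = (total - total_without_dan) == 1
print(total, total_without_dan, dan_is_a_student)
```

Neither query names a grade for any individual, yet the pair of answers determines Dan Waymel's status, which is why exact aggregates alone are not a sufficient protection.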

3 A growing number of data mining applications must deal with data sources that are distributed, possibly proprietary, and privacy-sensitive. Financial transactions, health-care records, and network communication traffic are a few examples. Privacy is also becoming an increasingly important issue in data mining applications for counter-terrorism and homeland defense, which may require creating profiles, constructing social network models, and detecting terrorist communications from distributed, privacy-sensitive, multi-party data. Combining such diverse data sets belonging to different parties may violate privacy laws. We therefore need algorithms that can mine the data while guaranteeing that its privacy is not compromised. This need has led to the development of several privacy-preserving data mining techniques. Many of them use randomization to perturb the data, preserving privacy while still guaranteeing the invariance of the underlying patterns.

4 Goal: distort the data while still preserving some of its properties for data mining purposes.
− Additive based
− Multiplicative based
− Condensation based
− Decomposition
− Data swapping
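As one illustration of "distort the data while preserving some properties", here is a minimal sketch of data swapping. It assumes the simplest possible variant, a full random shuffle of one sensitive column across records. The slide does not specify a swapping scheme, and the `swap_column` helper and the sample records are hypothetical.

```python
import random
from collections import Counter


def swap_column(records, column, rng):
    """Return a copy of the records with one column's values shuffled across rows."""
    values = [row[column] for row in records]
    rng.shuffle(values)  # in-place shuffle of the extracted values only
    return [{**row, column: value} for row, value in zip(records, values)]


# Hypothetical records; "salary" is the sensitive attribute being swapped.
records = [
    {"id": 1, "zip": "75001", "salary": 40},
    {"id": 2, "zip": "75002", "salary": 55},
    {"id": 3, "zip": "75001", "salary": 70},
    {"id": 4, "zip": "75003", "salary": 55},
]

swapped = swap_column(records, "salary", random.Random(0))

# Preserved property: the set of salary values with their frequencies.
same_distribution = (
    Counter(r["salary"] for r in records) == Counter(r["salary"] for r in swapped)
)
print(same_distribution)
```

Column-level statistics such as counts, means, and histograms survive the swap, while the link between a particular record and its salary is weakened. Relationships between the swapped column and the other columns are not preserved by this naive version.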

5 Randomization approach
− Hide the original data by randomly modifying the data values with additive noise, while still preserving the patterns of the original data (its underlying probabilistic properties).
− Reconstruct the distribution of the original data values from the perturbed data. The individual original values cannot be reconstructed.
− Build a decision tree classifier from the perturbed data using this reconstructed distribution.
− Privacy breaches
Cryptographic approach
− Party X owns database D1; party Y owns database D2.
− Build a decision tree on D1 and D2 without revealing information about D1 to party Y or about D2 to party X, except what might be revealed by the decision tree itself.
− Horizontally partitioned data: records (entities) are split across parties.
− Vertically partitioned data: attributes are split across parties.
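The randomization approach on this slide can be sketched in a few lines. This is not the distribution-reconstruction procedure the slide refers to. It swaps in a much simpler moment-based estimate that recovers only the mean and variance of the original values, using the fact that the noise distribution is public. The data, the noise range, and all variable names are assumptions made for illustration.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical sensitive attribute (for example, ages); never released as-is.
original = [rng.gauss(40.0, 10.0) for _ in range(20000)]

# Public noise model: uniform on [-a, a], whose variance is a^2 / 3.
NOISE_HALF_WIDTH = 30.0
noise_variance = NOISE_HALF_WIDTH ** 2 / 3

# Each value is released only after independent additive noise is applied.
perturbed = [
    x + rng.uniform(-NOISE_HALF_WIDTH, NOISE_HALF_WIDTH) for x in original
]

# Aggregate properties are recoverable: the noise has zero mean, and the
# variances of independent terms add, so the noise variance can be subtracted.
est_mean = statistics.fmean(perturbed)
est_variance = statistics.pvariance(perturbed) - noise_variance

# Individual values are not recoverable: the per-record error stays large.
mean_abs_error = statistics.fmean(
    abs(p - x) for p, x in zip(perturbed, original)
)

print(round(est_mean, 1), round(est_variance, 1), round(mean_abs_error, 1))
```

The estimated mean and variance track those of the hidden data closely, while each released value is off by about 15 units on average. That is the trade-off the slide describes: the distribution can be estimated, but the original values cannot.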

7 References
− Agrawal R., Srikant R. Privacy-Preserving Data Mining. ACM SIGMOD Conference.
− Hillol Kargupta, Souptik Datta, Qi Wang, Krishnamoorthy Sivakumar. Random Data Perturbation Techniques and Privacy Preserving Data Mining.
− C. Clifton, M. Kantarcioglu, J. Vaidya, X. Lin, and M. Zhu. Tools for Privacy Preserving Distributed Data Mining. ACM SIGKDD Explorations 4(2), January.
− Wenliang Du, Mikhail J. Atallah. Privacy Preserving Cooperative Statistical Analysis.
− Chris Clifton, Murat Kantarcioglu, Jaideep Vaidya. Defining Privacy for Data Mining.
− Jiawei Han, Micheline Kamber. Data Mining: Concepts and Techniques.

8 Privacy is a personal choice, so privacy protection should be individually adaptable (Liu, Kantarcioglu and Thuraisingham, ICDM'06).