Privacy-Preserving Clustering

Outline Introduction. Related Work: Secure Multi-Party Computation, Data Sanitization. Preliminaries: Yao’s Millionaires’ Problem, Homomorphic Encryption. Privacy-Preserving K-Means Clustering. Conclusion.

Introduction Why is privacy preservation needed? Data sharing in today's globally networked systems poses a threat to individual privacy and organizational confidentiality. The privacy problem is not data mining itself, but the way data mining is done, so privacy and data mining can coexist. An important data mining problem: clustering.

Related Work Privacy-preserving clustering: Secure multi-party computation: high computation and communication costs. Data sanitization: loss of accuracy. Dimensionality reduction. Model-based solutions.

Yao’s Millionaires’ Problem Two millionaires wish to know who is richer; however, they do not want to find out any additional information about each other’s wealth.

Solutions Suppose Alice has i millions and Bob has j millions, where 1 < i, j < 10.

Solutions Suppose Alice: i = 5, Bob: j = 3. (B) x = 7, Ea(x) = 4 = k. (B) k - j + 1 = 2. Table of Ea(x): Ea(1) = 7, Ea(2) = 3, Ea(3) = 5, Ea(4) = 8, Ea(5) = 6, Ea(6) = 2, Ea(7) = 4, …

Solutions (A) y1 = Da(k - j + 1) = Da(2) = 6. y2 = Da(3) = 2. y3 = yj = Da(k - j + j) = Da(4) = 7. y4 = Da(5) = 3. y5 = Da(6) = 5. y6 = Da(7) = 1. y7 = Da(8) = 4. …

Solutions (A) z1 = y1 = 6. z2 = y2 = 2. z3 = y3 = Da(k - j + j) = 7. z4 = y4 = 3. z5 = yi = y5 = 5. z6 = y6 + 1 = 2. z7 = y7 + 1 = 5. … (B) Check whether z3 = x. If yes, i ≥ j. If no, i < j.
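
The steps on the slides above can be sketched in a few lines of Python. This is a toy illustration only: a secret random permutation stands in for Alice's key pair Ea / Da, the function name `millionaires` and the parameters `n`, `domain`, and `seed` are made up for the example, and nothing here is secure.

```python
import random

def millionaires(i, j, n=10, domain=1000, seed=0):
    """Toy run of the protocol: return True if i >= j (1 <= i, j <= n)."""
    rng = random.Random(seed)

    # Alice's toy "key pair": a random permutation (Ea) and its inverse (Da).
    # Assumption: this replaces real public-key encryption for illustration.
    perm = list(range(domain))
    rng.shuffle(perm)
    inverse = [0] * domain
    for plain, cipher in enumerate(perm):
        inverse[cipher] = plain
    Ea = lambda v: perm[v]
    Da = lambda c: inverse[c % domain]

    # (B) Bob picks a random x, computes k = Ea(x), and sends k - j + 1.
    x = rng.randrange(domain)
    k = Ea(x)
    m = k - j + 1

    # (A) Alice computes y_u = Da(k - j + u) for u = 1..n.
    ys = [Da(m + u - 1) for u in range(1, n + 1)]

    # (A) Alice sends z_u = y_u for u <= i and z_u = y_u + 1 for u > i.
    zs = [y if u <= i else y + 1 for u, y in enumerate(ys, start=1)]

    # (B) Bob checks whether z_j equals his x; if so, i >= j.
    return zs[j - 1] == x

print(millionaires(5, 3))  # True: Alice (5) is at least as rich as Bob (3)
print(millionaires(3, 5))  # False
```

Because y_j = Da(k) = x by construction, Bob sees his own x back exactly when Alice left position j unchanged, that is, when j ≤ i.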

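Homomorphic Encryption An encryption scheme H is homomorphic if there is an algorithm ⊙ that computes H(x⊕y) from H(x) and H(y) without revealing x or y: H(x⊕y) = H(x) ⊙ H(y). Multiplicative homomorphic: H(x*y) = H(x) * H(y), e.g. RSA, … Additive homomorphic: H(x+y) = H(x) * H(y), e.g. Paillier, …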
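
The additive property H(x+y) = H(x) * H(y) can be checked with a toy Paillier implementation. This is a minimal sketch assuming tiny fixed primes (17 and 19) and the common choice g = n + 1; the function names are illustrative, and the key size is far too small for any real use.

```python
import math
import random

def keygen(p=17, q=19):
    """Toy Paillier key: public n, private (lam, mu). Primes are assumptions."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse, valid because g = n + 1
    return n, (lam, mu)

def encrypt(n, m, rng):
    """H(m) = (n+1)^m * r^n mod n^2 for a random r coprime to n."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    """Recover m from c using L(u) = (u - 1) / n."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

n, priv = keygen()
rng = random.Random(1)
c1 = encrypt(n, 20, rng)
c2 = encrypt(n, 22, rng)
# Multiplying ciphertexts adds the plaintexts (mod n).
print(decrypt(n, priv, (c1 * c2) % (n * n)))  # 42
```

Whoever multiplies c1 and c2 never sees 20 or 22; only the holder of the private key can read the sum.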

Privacy-Preserving K-Means Clustering Over Vertically Partitioned Data SIGKDD, 2003

Problem Definition Goal: Cluster the known set of common entities without revealing any value that the clustering is based on. Input: Each user provides one attribute of all items. Output: Assignment of entities to clusters. Cluster centers themselves.

K-Means Clustering

K-Means Clustering [Figure: iteration loop of distance matrix, cluster decision, and new center computation.]
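
For reference, the ordinary (non-private) k-means loop named on this slide can be sketched as follows. The sketch assumes squared Euclidean distance and caller-supplied initial centers; the function name `kmeans` and its parameters are illustrative.

```python
def kmeans(points, centers, iters=10):
    """Plain k-means: distance matrix, cluster decision, new center computation."""
    assign = []
    for _ in range(iters):
        # Distance matrix: squared distance from every point to every center.
        dist = [[sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
                for p in points]
        # Cluster decision: each point joins its closest center.
        assign = [row.index(min(row)) for row in dist]
        # New center computation: mean of the members of each cluster.
        new_centers = []
        for i, old in enumerate(centers):
            members = [p for p, a in zip(points, assign) if a == i]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(old)  # keep an empty cluster's center
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers, assign

points = [(0, 0), (0, 1), (10, 10), (10, 11)]
centers, assign = kmeans(points, [(0, 0), (10, 10)])
print(assign)   # [0, 0, 1, 1]
print(centers)  # [(0.0, 0.5), (10.0, 10.5)]
```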

Vertically Partitioned Data [Figure: the attributes of the same set of items are split between User 1 and User 2.]

Terminology r: # of users, each having different attributes for the same set of items. n: # of the common items. k: # of clusters required. ui: each cluster mean, i = 1, …, k. uij: projection of the mean of cluster i on user j. Final result for user j: The final value / position of uij, i = 1, …, k. Cluster assignments: clusti for all points i = 1, …, n.
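
To make the notation concrete, the sketch below shows why the projections uij are enough: assuming squared Euclidean distance, the distance from a point to cluster mean ui is the sum of per-user components, each computed from that user's own attributes and its own uij. The helper name and the sample values are invented for the example, and the components are summed in the clear here, which the actual protocol avoids.

```python
def local_distances(values, local_means):
    """User j: distance component of one point to each projected mean u_ij."""
    return [sum((v - m) ** 2 for v, m in zip(values, mean))
            for mean in local_means]

# One point, r = 2 users, k = 2 clusters (all numbers are made up).
user1_values = (1.0,)                    # attribute held by user 1
user2_values = (2.0, 3.0)                # attributes held by user 2
user1_means = [(0.0,), (1.0,)]           # u_11, u_21
user2_means = [(0.0, 0.0), (2.0, 2.0)]   # u_12, u_22

d1 = local_distances(user1_values, user1_means)  # [1.0, 0.0]
d2 = local_distances(user2_values, user2_means)  # [13.0, 1.0]

# Summing the components gives the full distance to each cluster mean.
total = [a + b for a, b in zip(d1, d2)]
closest = total.index(min(total))
print(total, closest)  # [14.0, 1.0] 1
```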

Privacy-Preserving K-Means Clustering

Securely Finding the Closest Cluster

Securely Finding the Closest Cluster The security of the algorithm is based on three key ideas. Disguise the site components of the distance with random values that cancel out when combined. Permute the order of clusters so the real meaning of the comparison results is unknown. Compare distances so only the comparison result is learned; no party knows the distances being compared.
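
The first two ideas can be illustrated with a simplified sketch. It is not the protocol from the paper: the random masks and the permutation are generated in one place, the masked components are summed in the clear, and a plain `min` stands in for the secure comparison. The names `mask_components` and `closest_cluster`, the modulus, and the integer inputs are all assumptions made for the example.

```python
import random

def mask_components(components, rng, modulus=2**32):
    """Add random vectors to each user's components; the vectors sum to zero."""
    r, k = len(components), len(components[0])
    masks = [[rng.randrange(modulus) for _ in range(k)] for _ in range(r - 1)]
    # The last mask cancels all the others, coordinate by coordinate.
    masks.append([(-sum(m[i] for m in masks)) % modulus for i in range(k)])
    return [[(c + m) % modulus for c, m in zip(comp, mask)]
            for comp, mask in zip(components, masks)]

def closest_cluster(components, seed=0, modulus=2**32):
    """components[j][i]: user j's integer distance component to cluster i."""
    rng = random.Random(seed)
    k = len(components[0])
    masked = mask_components(components, rng, modulus)
    # Permute the cluster order so positions carry no meaning on their own.
    perm = list(range(k))
    rng.shuffle(perm)
    permuted = [[row[p] for p in perm] for row in masked]
    # Masks cancel when the components are combined.
    totals = [sum(col) % modulus for col in zip(*permuted)]
    winner = totals.index(min(totals))  # stand-in for a secure comparison
    return perm[winner]                 # undo the permutation

print(closest_cluster([[1, 0], [13, 1]]))  # 1
```

Each masked row on its own looks random; only the combined totals carry the distances, and in the actual protocol even those are compared without any party learning them.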

Check Threshold

Conclusion Horizontally partitioned data: [Figure: the items themselves are split between User 1 and User 2.]