1
Privacy Preserving K-means Clustering on Vertically Partitioned Data
Presented by: Jaideep Vaidya
Joint work with: Prof. Chris Clifton
2
Overview
Global Problem
–Privacy Preserving Distributed Data Mining
Specific Problem
–Clustering (K-Means)
For
–Vertically Partitioned Data
Using
–Cryptographic Tools
3
Clustering
Grouping similar objects/instances into clusters
Issues
–Data is often distributed
–Privacy/Security Concerns: Individual Privacy, Entity Privacy
–Scalability
4
Outline
Vertical Partitioning of Data
Motivation
Brief Introduction to PPDM / SMC
K-Means Algorithm
Privacy Preserving K-Means Algorithm
–Closest Cluster Computation
–When to stop
Communication Cost
Conclusions
Security Proofs (Disclaimer!)
5
Medical Records
RPJ   Brain Tumor   Diabetic
CAC   No Tumor      Non-Diabetic
PTR   No Tumor      Diabetic

Cell Phone Data
RPJ   5210   Li/Ion
CAC   none
PTR   3650   NiCd

Global Database View
TID   Brain Tumor?   Diabetes?   Model   Battery
6
Vertical Partitioning of Data

Medical Records
RPJ   Yes        Diabetic
CAC   No Tumor   No
PTR   No Tumor   Diabetic

Cell Phone Data
RPJ   5210   Li/Ion
CAC   none
PTR   3650   NiCd

Global Database View
TID   Brain Tumor?   Diabetes?   Model   Battery
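The partitioning shown on this slide can be sketched in Python. This is an illustration only: the dictionary layout and the `global_view` helper are hypothetical names, and the CAC battery value, which the slide leaves empty, is represented as `None`. The join is shown only to make the "global database view" concrete; the point of the talk is that the sites should never have to materialize it.

```python
# Each site holds different attributes for the same entities, keyed by TID.
medical_records = {
    "RPJ": {"brain_tumor": "Brain Tumor", "diabetes": "Diabetic"},
    "CAC": {"brain_tumor": "No Tumor", "diabetes": "Non-Diabetic"},
    "PTR": {"brain_tumor": "No Tumor", "diabetes": "Diabetic"},
}
cell_phone_data = {
    "RPJ": {"model": "5210", "battery": "Li/Ion"},
    "CAC": {"model": "none", "battery": None},  # assumption: blank on the slide
    "PTR": {"model": "3650", "battery": "NiCd"},
}


def global_view(*sites):
    """Join the per-site attribute sets on the shared TID (hypothetical helper)."""
    tids = set(sites[0])
    for site in sites[1:]:
        tids &= set(site)
    rows = {}
    for tid in sorted(tids):
        row = {}
        for site in sites:
            row.update(site[tid])
        rows[tid] = row
    return rows
```

Calling `global_view(medical_records, cell_phone_data)` yields one row per TID with all four attributes, which is the view that no single site holds.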
7
Is the problem trivial?
8
Privacy Preserving Data Mining
Perturbation
–Agrawal & Srikant, Agrawal & Aggarwal
–Rizvi & Haritsa, Evfimievski et al.
Cryptographic
–Lindell & Pinkas, Du & Zhan
–Vaidya & Clifton, Kantarcioglu & Clifton
9
Secure Multiparty Computation (SMC)
Given a function f and n inputs distributed across n sites, compute the result while revealing nothing to any site except its own input(s) and the result.
10
Results
Cluster assignment for entities
–Not private
Cluster centers
–Semi-private
[Figure: example row of attribute values; not recoverable from the extracted text]
11
K-means clustering / Secure K-means clustering
Arbitrarily select k starting points
Repeat
–Assign … to … respectively
–(Re)assign each object to the closest cluster based on distance from the mean
–Re-compute the cluster means
Until no change
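For reference, the plain (non-private) loop on this slide can be sketched as follows. This is a minimal single-site version under stated assumptions: the starting points are passed in explicitly so the run is deterministic, squared Euclidean distance is used, and "no change" is interpreted as no change in the cluster assignment. The function name `kmeans` and its parameters are illustrative.

```python
def kmeans(points, starts, max_iter=100):
    """Plain k-means: returns (means, assignment)."""
    means = [list(m) for m in starts]
    assignment = None
    for _ in range(max_iter):
        # (Re)assign each object to the closest cluster mean.
        new_assignment = [
            min(
                range(len(means)),
                key=lambda c: sum((x - m) ** 2 for x, m in zip(p, means[c])),
            )
            for p in points
        ]
        if new_assignment == assignment:  # "until no change"
            break
        assignment = new_assignment
        # Re-compute each cluster mean from its current members.
        for c in range(len(means)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old mean if a cluster is empty
                means[c] = [sum(col) / len(members) for col in zip(*members)]
    return means, assignment
```

In the vertically partitioned setting, each site can compute only its own attributes' share of the distance inside the `min` step, which is why the closest-cluster computation needs a protocol of its own.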
12
Assigning objects to closest cluster
13
Key Idea
Disguise site components with random values
Compare distances while revealing only the comparison result
Permute the order of clusters to conceal the meaning of comparison results
14
Closest Cluster Computation
3 special sites: $P_1$, $P_2$ and $P_r$
$P_1$ generates
–r random vectors such that …
–Permutation π (over 1..K)
15
Permutation Protocol
Du and Atallah ’01
[Figure: protocol diagram between parties A and B]
Homomorphic encryption: $E_k(x) \cdot E_k(y) = E_k(x+y)$
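The homomorphic property on this slide can be demonstrated with a toy example. The slide does not name an encryption scheme, so the sketch below swaps in Paillier, one well-known additively homomorphic scheme, with tiny hard-coded primes that offer no security. All names (`encrypt`, `decrypt`, `add_encrypted`) are illustrative and not taken from the talk.

```python
import math
import random

# Toy Paillier parameters (assumption: tiny primes, for illustration only).
P, Q = 61, 53
N = P * Q
N2 = N * N
PHI = (P - 1) * (Q - 1)
MU = pow(PHI, -1, N)  # modular inverse; requires Python 3.8+


def encrypt(m, rng=random):
    """Encrypt an integer 0 <= m < N using generator g = N + 1."""
    while True:
        r = rng.randrange(1, N)
        if math.gcd(r, N) == 1:
            break
    return ((1 + m * N) * pow(r, N, N2)) % N2


def decrypt(c):
    """Recover m from a ciphertext produced by encrypt()."""
    return ((pow(c, PHI, N2) - 1) // N * MU) % N


def add_encrypted(c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod N)."""
    return (c1 * c2) % N2
```

With these definitions, `decrypt(add_encrypted(encrypt(17), encrypt(25)))` evaluates to 42: the party doing the multiplication never sees 17 or 25 in the clear.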
16
Closest Cluster Computation
[Figure: Stage 1, with sites labeled $P_1$, $P_2$ and $P_r$; Stage 2, with sites labeled $P_1$, $P_3$, $P_{r-1}$ and $P_r$]
17
Closest Cluster Computation
Stage 3
–$P_2$ and $P_r$ determine i, the index of the cluster with minimum distance
Stage 4
–$P_1$ computes and broadcasts …
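The disguise-and-permute idea behind these stages can be simulated in a single process. The sketch below is not the protocol itself: it uses no encryption, performs the final comparison in the clear, and assumes that the elided constraint on the random vectors is that they sum to zero component-wise (modulo a fixed modulus), so that they cancel when the disguised shares are added. Distances are assumed to be non-negative integers whose totals stay below the modulus. The function name and parameters are hypothetical.

```python
import random


def closest_cluster(site_distances, modulus=2**32, rng=None):
    """site_distances[s][c] is site s's share of the distance to cluster c."""
    rng = rng or random.Random()
    r = len(site_distances)
    k = len(site_distances[0])

    # One random vector per site; the last is chosen so that every
    # component sums to zero mod `modulus` (assumed constraint).
    masks = [[rng.randrange(modulus) for _ in range(k)] for _ in range(r - 1)]
    masks.append([(-sum(m[j] for m in masks)) % modulus for j in range(k)])

    # Secret permutation of the cluster order: position j holds cluster perm[j].
    perm = list(range(k))
    rng.shuffle(perm)

    # Each site ends up with its disguised, permuted share.
    disguised = [
        [(site[perm[j]] + mask[perm[j]]) % modulus for j in range(k)]
        for site, mask in zip(site_distances, masks)
    ]

    # Summing the shares cancels the masks, leaving permuted total distances.
    totals = [sum(col) % modulus for col in zip(*disguised)]
    permuted_index = min(range(k), key=totals.__getitem__)

    # Only the holder of the permutation can map the index back.
    return perm[permuted_index]
```

For shares `[[5, 1, 9], [2, 2, 2], [4, 0, 7]]` the total distances are 11, 3 and 18, so the function returns cluster index 1 regardless of the random masks and permutation drawn.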
18
When to stop?
Locally compute difference in means
Globally known threshold
Use simple random-adding technique to disguise actual values
–First party adds a random value to its distance and sends the sum to the next party
–Each party adds its value to the total and sends it on
–Last party compares the total with the first party’s random value + threshold
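The random-adding technique above can be sketched as a single-process simulation. The assumptions are labeled in the code: local differences are integers, iteration stops when the summed difference is at most the threshold (the slide does not say whether the bound is strict), and the final comparison is done in the clear, whereas the slide does not spell out how the last and first parties carry it out. `should_stop` and its parameters are hypothetical names.

```python
import random


def should_stop(local_differences, threshold, rng=None):
    """Return True when the summed local differences are within threshold."""
    rng = rng or random.Random()

    # First party hides its own value behind a random mask.
    mask = rng.randrange(1 << 32)
    running_total = mask + local_differences[0]

    # Each later party sees only a masked running total and adds its value.
    for difference in local_differences[1:]:
        running_total += difference

    # Last party's total is compared with the first party's mask + threshold.
    # Assumption: "<=" is used as the stopping condition.
    return running_total <= mask + threshold
```

For example, `should_stop([1, 2, 3], threshold=10)` is true and `should_stop([5, 6, 7], threshold=10)` is false, whatever mask is drawn.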
19
Communication Cost
r parties, n data elements, m-bit distances
Basic Algorithm – $O(knr)$ bits, $O(r+k)$ rounds
Optimized Version – $O(kmr)$ bits, $O(r)$ rounds
Generic Method – $O(kmr^3)$ bits, 1 round
Non-secure Method – $O(n)$ bits, 1 round
20
Communication Cost
r parties, n data elements, m-bit distances

                      Bits           Rounds
Basic Algorithm       $O(knr)$       $O(r+k)$
Optimized Algorithm   $O(kmr)$       $O(r)$
Generic Method        $O(kmnr^3)$    1
Non-Secure Method     $O(n)$         1
21
Conclusion
Presented a solution for the Privacy Preserving K-Means Clustering problem
How to use clusters?
Will parties share the required information for the possible benefits?
Improve efficiency
Working on EM-Clustering, implementations