1
Privacy-Preserving Clustering
2
Outline
Introduction
Related Work: Secure Multi-Party Computation, Data Sanitization
Preliminaries: Yao’s Millionaires’ Problem, Homomorphic Encryption
Privacy-Preserving K-Means Clustering
Conclusion
3
Introduction
Why is privacy preservation needed?
Data sharing in today's globally networked systems poses a threat to individual privacy and organizational confidentiality. The privacy problem is not data mining, but the way data mining is done. So, privacy and data mining can coexist. An important data mining problem: clustering.
4
Related Work
Privacy-preserving clustering:
Secure multi-party computation: high computation and communication costs.
Data sanitization: loss of accuracy.
Dimensionality reduction.
Model-based solutions.
5
Yao’s Millionaires’ Problem
Two millionaires wish to know who is richer; however, they do not want to find out any additional information about each other’s wealth.
6
Solutions
Suppose Alice has i million and Bob has j million, where 1 < i, j < 10.
7
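Solutions
Suppose Alice has i = 5 and Bob has j = 3. Ea is Alice's public encryption function and Da is her private decryption function.
1. (B) Pick a random x = 7 and compute k = Ea(x) = 4.
2. (B) Send Alice k - j + 1 = 2.
x      1  2  3  4  5  6  7  8  9  10
Ea(x)  7  3  5  8  6  2  4  …  …  …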
8
Solutions
3. (A) Compute yu = Da(k - j + u) for u = 1, 2, …:
y1 = Da(2) = Da(k - j + 1) = 6. y2 = Da(3) = 2. yj = y3 = Da(4) = Da(k - j + j) = 7. y4 = Da(5) = 3. y5 = Da(6) = 5. y6 = Da(7) = 1. y7 = Da(8) = 4. …
9
Solutions
4. (A) Send Bob zu = yu for u ≤ i and zu = yu + 1 for u > i:
z1 = y1 = 6. z2 = y2 = 2. z3 = y3 = Da(k - j + j) = 7. z4 = y4 = 3. z5 = yi = y5 = 5. z6 = y6 + 1 = 2. z7 = y7 + 1 = 5. …
5. (B) Check whether zj = z3 equals x. If yes, i ≥ j. If no, i < j. Here z3 = 7 = x, so i ≥ j.
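The five steps can be traced in a few lines of Python. This is a toy sketch of the simplified walk-through on these slides, not a secure implementation. The lookup table stands in for Alice's public-key encryption. The table entries for x = 8, 9, 10, the `wrap` helper, and all function names are assumptions made only so the example runs. The random-prime reduction used in Yao's full protocol is omitted.

```python
# Toy permutation standing in for Alice's public encryption Ea.
# Entries for x = 1..7 follow the slides; x = 8..10 are ASSUMED for illustration.
EA = {1: 7, 2: 3, 3: 5, 4: 8, 5: 6, 6: 2, 7: 4, 8: 1, 9: 10, 10: 9}
DA = {c: x for x, c in EA.items()}  # Alice's private decryption (inverse table)


def wrap(v):
    """Map any integer into the toy domain 1..10 (assumption of this sketch)."""
    return (v - 1) % 10 + 1


def bob_first_message(x, j):
    k = EA[x]          # step 1: Bob encrypts his random x
    return k - j + 1   # step 2: value sent to Alice


def alice_reply(msg, i):
    # step 3: y_u = Da(k - j + u) for u = 1..10, where msg = k - j + 1
    ys = [DA[wrap(msg + u - 1)] for u in range(1, 11)]
    # step 4: keep the first i values unchanged, add 1 to every later value
    return [y if u <= i else y + 1 for u, y in enumerate(ys, start=1)]


def bob_decides(zs, x, j):
    # step 5: z_j == x means i >= j, otherwise i < j
    return zs[j - 1] == x


def alice_at_least_as_rich(i, j, x=7):
    zs = alice_reply(bob_first_message(x, j), i)
    return bob_decides(zs, x, j)
```

With the slide's values, `alice_at_least_as_rich(5, 3)` evaluates to `True`, matching i ≥ j. Bob only inspects position j, so the values Alice sends at other positions tell him nothing he needs for the decision.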
10
Homomorphic Encryption
An encryption scheme H is homomorphic if there is an algorithm ⊙ that computes H(x ⊕ y) from H(x) and H(y) without revealing x or y:
H(x ⊕ y) = H(x) ⊙ H(y). Examples: RSA, …
Additive homomorphic: H(x + y) = H(x) * H(y). Examples: Paillier, …
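The additive property can be checked with a small Paillier sketch in Python. It is a minimal illustration under stated assumptions: the primes 1009 and 1013 are toy values chosen here and are far too small for real use, the generator is fixed to g = n + 1, and the function names are hypothetical.

```python
import math
import random


def keygen(p=1009, q=1013):
    """Return (public n, private (lam, mu)) for toy primes p and q."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; valid because g = n + 1
    return n, (lam, mu)


def encrypt(n, m, rng=random):
    """H(m) = (n + 1)^m * r^n mod n^2 for a random r coprime to n."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, priv, c):
    """Recover m from a ciphertext c using the private pair (lam, mu)."""
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


n, priv = keygen()
cx, cy = encrypt(n, 20), encrypt(n, 22)
# Multiplying ciphertexts corresponds to adding plaintexts: H(x + y) = H(x) * H(y)
c_sum = (cx * cy) % (n * n)
total = decrypt(n, priv, c_sum)
```

Here `total` equals 20 + 22 even though the party multiplying `cx` and `cy` never sees either plaintext.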
11
Homomorphic Encryption
12
Privacy-Preserving K-Means Clustering Over Vertically Partitioned Data
SIGKDD, 2003
13
Problem Definition
Goal: Cluster the known set of common entities without revealing any value that the clustering is based on.
Input: Each user provides one attribute of all items.
Output: Assignment of entities to clusters. Cluster centers themselves.
14
K-Means Clustering
15
K-Means Clustering
[Figure labels: distance matrix, cluster decision, new center computation]
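The three stages named on this slide map onto an ordinary, non-private k-means loop. The sketch below is a plain Python illustration assuming squared Euclidean distance and caller-supplied initial centers. The function name and the convergence check are choices made here, not taken from the slides.

```python
def kmeans(points, centers, iterations=10):
    """Plain k-means on tuples of numbers; returns (assignments, centers)."""
    assign = []
    for _ in range(iterations):
        # distance matrix: squared distance from every point to every center
        dist = [[sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
                for p in points]
        # cluster decision: each point goes to its closest center
        assign = [row.index(min(row)) for row in dist]
        # new center computation: mean of the members of each cluster
        new_centers = []
        for ci, old in enumerate(centers):
            members = [p for p, a in zip(points, assign) if a == ci]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(old)  # keep an empty cluster's center
        if new_centers == centers:
            break  # centers stopped moving
        centers = new_centers
    return assign, centers


points = [(0, 0), (0, 1), (10, 10), (10, 11)]
assign, centers = kmeans(points, [(0, 0), (10, 10)])
```

In the privacy-preserving setting, the cluster decision is the stage that needs protection, because it combines distance components held by different users.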
16
Vertically Partitioned Data
[Figure labels: User 1, User 2]
17
Terminology
r: # of users, each having different attributes for the same set of items.
n: # of the common items.
k: # of clusters required.
ui: each cluster mean, i = 1, …, k.
uij: projection of the mean of cluster i on user j.
Final result for user j:
The final value / position of uij, i = 1, …, k.
Cluster assignments: clusti for all points i = 1, …, n.
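To make the notation concrete, the sketch below shows how the distance from an item to cluster i splits into one component per user when each user j holds a single attribute and its own projections uij. It assumes squared Euclidean distance and offers no privacy at all: it is the baseline that the secure protocol has to reproduce. All names are hypothetical.

```python
def local_components(values, local_means):
    """One user's view: values[item] is that user's attribute for each of the
    n items, local_means[i] is u_ij for each of the k clusters."""
    return [[(v - m) ** 2 for m in local_means] for v in values]


def closest_clusters(per_user_components):
    """Add the r users' components and pick the nearest cluster per item."""
    n = len(per_user_components[0])
    k = len(per_user_components[0][0])
    assign = []
    for item in range(n):
        totals = [sum(user[item][c] for user in per_user_components)
                  for c in range(k)]
        assign.append(totals.index(min(totals)))
    return assign


# r = 2 users, n = 3 items, k = 2 clusters
user1 = local_components([1.0, 9.0, 2.0], [1.5, 9.0])
user2 = local_components([2.0, 8.0, 1.0], [1.5, 8.0])
clust = closest_clusters([user1, user2])
```

Each call to `local_components` uses only data that one user already holds. Only the summation in `closest_clusters` crosses user boundaries.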
18
Privacy-Preserving K-Means Clustering
19
Securely Finding the Closest Cluster
20
Securely Finding the Closest Cluster
The security of the algorithm is based on three key ideas:
1. Disguise the site components of the distance with random values that cancel out when combined.
2. Permute the order of clusters so the real meaning of the comparison results is unknown.
3. Compare distances so only the comparison result is learned; no party knows the distances being compared.
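The first two ideas can be simulated in a single process. The sketch below is an illustration only and is not the protocol from the paper: it assumes integer distance components, generates the masks in one place, and computes the final minimum in the clear, where a real protocol would use a secure comparison so that no party sees the totals. Function names and the mask bound are assumptions.

```python
import random


def zero_sum_masks(r, k, rng, bound=10**6):
    """r random mask vectors of length k whose column sums are all zero."""
    masks = [[rng.randrange(-bound, bound) for _ in range(k)]
             for _ in range(r - 1)]
    last = [-sum(m[c] for m in masks) for c in range(k)]
    return masks + [last]


def disguised_closest(components, rng=None):
    """components[j][c]: user j's integer distance component to cluster c."""
    rng = rng or random.Random()
    r, k = len(components), len(components[0])
    masks = zero_sum_masks(r, k, rng)      # idea 1: values that cancel out
    perm = list(range(k))
    rng.shuffle(perm)                      # idea 2: hidden cluster order
    # each user publishes only masked components, in permuted order
    published = [[components[j][perm[c]] + masks[j][perm[c]] for c in range(k)]
                 for j in range(r)]
    # idea 3 stand-in: totals are compared in the clear here, whereas a real
    # protocol would reveal only the comparison results
    totals = [sum(col) for col in zip(*published)]
    winner = totals.index(min(totals))
    return perm[winner]                    # undo the permutation


components = [[5, 1, 9], [4, 2, 7]]        # true totals: 9, 3, 16
closest = disguised_closest(components, random.Random(0))
```

Because the masks sum to zero per cluster, the combined totals equal the true distances, so `closest` is the index of the truly nearest cluster even though no single published row reveals a user's component.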
21
Securely Finding the Closest Cluster
22
Securely Finding the Closest Cluster
23
Securely Finding the Closest Cluster
24
Check Threshold m j
25
Conclusion
Horizontally partitioned data: [Figure labels: User 1, User 2]