Privacy-Preserving Clustering


1 Privacy-Preserving Clustering

2 Outline
Introduction
Related Work: Secure Multi-Party Computation, Data Sanitization
Preliminaries: Yao’s Millionaires’ Problem, Homomorphic Encryption
Privacy-Preserving K-Means Clustering
Conclusion

3 Introduction
Why is privacy preservation needed? Data sharing in today's globally networked systems poses a threat to individual privacy and organizational confidentiality. The privacy problem is not data mining itself, but the way data mining is done, so privacy and data mining can coexist. An important data mining problem: clustering.

4 Related Work
Privacy-preserving clustering: Secure multi-party computation: high computation and communication costs. Data sanitization: loss of accuracy. Dimensionality reduction. Model-based solutions.

5 Yao’s Millionaires’ Problem
Two millionaires wish to know who is richer; however, they do not want to reveal any additional information about their own wealth to each other.

6 Solutions
Suppose Alice has i million and Bob has j million, where 1 < i, j < 10.

7 Solutions
Suppose Alice: i = 5, Bob: j = 3.
(B) x = 7, Ea(x) = 4 = k.
(B) k - j + 1 = 2.
x:      1  2  3  4  5  6  7  8  9  10
Ea(x):  7  3  5  8  6  2  4  …  …  …

8 Solutions
(A) y1 = Da(2) = Da(k - j + 1) = 6. y2 = Da(3) = 2. yj = y3 = Da(4) = Da(k - j + j) = 7. y4 = Da(5) = 3. y5 = Da(6) = 5. y6 = Da(7) = 1. y7 = Da(8) = 4.

9 Solutions
(A) Alice keeps the first i values and adds 1 to the rest: z1 = y1 = 6. z2 = y2 = 2. z3 = y3 = Da(k - j + j) = 7. z4 = y4 = 3. z5 = yi = y5 = 5. z6 = y6 + 1 = 2. z7 = y7 + 1 = 5.
(B) Bob checks whether zj = z3 equals x. If yes, then i ≥ j. If no, then i < j.
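The worked example (i = 5, j = 3, x = 7) can be traced with a short Python sketch. It is only an illustration under stated assumptions: the table DA holds just the Da values quoted on the slides, the function names are hypothetical, and the step of the full protocol that reduces the values modulo a random prime is left out, as it is on the slides. Because only Da(2) to Da(8) are known, the sketch supports Bob's values from the example (x = 7, j = 3) while Alice's i may vary from 1 to 7.

```python
# Toy decryption table Da, restricted to the entries quoted on the slides.
DA = {2: 6, 3: 2, 4: 7, 5: 3, 6: 5, 7: 1, 8: 4}
# Ea is the inverse mapping, e.g. EA[7] == 4.
EA = {plain: cipher for cipher, plain in DA.items()}

def bob_first_message(x, j):
    """Bob encrypts his random x with Alice's public Ea and sends k - j + 1."""
    k = EA[x]
    return k - j + 1

def alice_reply(i, m, count=7):
    """Alice decrypts m, m+1, ... to get y_1..y_count, then adds 1 to every y_u with u > i."""
    ys = [DA[m + u - 1] for u in range(1, count + 1)]
    return [y if u <= i else y + 1 for u, y in enumerate(ys, start=1)]

def bob_decides(zs, x, j):
    """Bob looks only at z_j: it still equals x exactly when i >= j."""
    return zs[j - 1] == x

def alice_at_least_bob(i, j=3, x=7):
    m = bob_first_message(x, j)
    zs = alice_reply(i, m)
    return bob_decides(zs, x, j)

if __name__ == "__main__":
    print(alice_at_least_bob(5))  # Alice has 5, Bob has 3
    print(alice_at_least_bob(2))  # Alice has 2, Bob has 3
```

Bob never sees i, and Alice never sees j or x: she only receives k - j + 1, and he only inspects the single entry z_j.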

10 Homomorphic Encryption
An encryption function H is homomorphic if there is an operation ⊙ that computes H(x ⊕ y) from H(x) and H(y) without revealing x or y: H(x ⊕ y) = H(x) ⊙ H(y). Examples: RSA, … Additively homomorphic: H(x + y) = H(x) * H(y). Examples: Paillier, …
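Both properties can be checked on toy numbers. The sketch below is an assumption-laden illustration, not a usable implementation: it uses textbook RSA without padding and a minimal Paillier with g = n + 1, the primes are far too small to be secure, and all names are hypothetical. It needs Python 3.8 or later for the modular inverse via pow.

```python
import math
import random

# Textbook RSA with toy primes: H(x) * H(y) mod n decrypts to x * y.
P, Q, E = 61, 53, 17
N = P * Q
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (modular inverse of E)

def rsa_encrypt(m):
    return pow(m, E, N)

def rsa_decrypt(c):
    return pow(c, D, N)

# Minimal Paillier with toy primes: H(x) * H(y) mod n^2 decrypts to x + y.
PP, PQ = 17, 19
PN = PP * PQ
PN2 = PN * PN
LAM = (PP - 1) * (PQ - 1) // math.gcd(PP - 1, PQ - 1)  # lcm(p - 1, q - 1)
G = PN + 1
MU = pow((pow(G, LAM, PN2) - 1) // PN, -1, PN)

def paillier_encrypt(m, rng=random):
    # Pick a random r that is invertible modulo n.
    while True:
        r = rng.randrange(1, PN)
        if math.gcd(r, PN) == 1:
            break
    return pow(G, m, PN2) * pow(r, PN, PN2) % PN2

def paillier_decrypt(c):
    return (pow(c, LAM, PN2) - 1) // PN * MU % PN

if __name__ == "__main__":
    product = rsa_encrypt(6) * rsa_encrypt(7) % N
    print(rsa_decrypt(product))        # product of the plaintexts
    total = paillier_encrypt(12) * paillier_encrypt(30) % PN2
    print(paillier_decrypt(total))     # sum of the plaintexts
```

In both cases the party combining the ciphertexts never handles a plaintext or a private key.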

11 Homomorphic Encryption

12 Privacy-Preserving K-Means Clustering Over Vertically Partitioned Data
SIGKDD, 2003

13 Problem Definition
Goal: Cluster the known set of common entities without revealing any value that the clustering is based on.
Input: Each user provides one attribute of all items.
Output: Assignment of entities to clusters. Cluster centers themselves.

14 K-Means Clustering

15 K-Means Clustering
distance matrix, cluster decision, new center computation
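The three labels on this slide correspond to the three steps of one ordinary (non-private) k-means iteration. A minimal sketch follows, assuming squared Euclidean distance and keeping an empty cluster's old center; the function name kmeans_step is hypothetical.

```python
def kmeans_step(points, centers):
    """One plain k-means iteration: distance matrix, cluster decision, new centers."""
    # Distance matrix: dist[p][c] is the squared distance from point p to center c.
    dist = [[sum((a - b) ** 2 for a, b in zip(point, center)) for center in centers]
            for point in points]

    # Cluster decision: each point goes to its closest center.
    assignment = [min(range(len(centers)), key=row.__getitem__) for row in dist]

    # New center computation: the mean of the points assigned to each cluster.
    new_centers = []
    for index, old_center in enumerate(centers):
        members = [p for p, a in zip(points, assignment) if a == index]
        if members:
            new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
        else:
            new_centers.append(old_center)  # empty cluster: keep the old center
    return assignment, new_centers

if __name__ == "__main__":
    pts = [(0, 0), (0, 2), (10, 10), (10, 12)]
    print(kmeans_step(pts, [(0, 0), (10, 10)]))
```

The privacy-preserving variant keeps these steps but has to carry out the cluster decision without any party seeing the full distance matrix.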

16 Vertically Partitioned Data
User 1 User 2

17 Terminology
r: number of users, each having different attributes for the same set of items.
n: number of common items.
k: number of clusters required.
ui: mean of cluster i, i = 1, …, k.
uij: projection of the mean of cluster i on user j.
Final result for user j: the final value (position) of uij, i = 1, …, k.
Cluster assignments: clusti for all points i = 1, …, n.
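With vertically partitioned data, the distance from an item to cluster i splits into one component per user, each computable from that user's attribute and uij alone. The sketch below assumes squared Euclidean distance (the slides do not name the metric) and uses made-up data with r = 2, n = 3, k = 2; all names are hypothetical.

```python
# user_values[j][item]: the single attribute user j holds for each item (made-up data).
user_values = [
    [1, 5, 9],   # user 0
    [2, 6, 8],   # user 1
]
# means[i][j] plays the role of uij: the projection of cluster i's mean on user j.
means = [
    [1, 2],      # cluster 0
    [9, 8],      # cluster 1
]

def local_component(j, item, cluster):
    """What user j can compute alone: its share of the squared distance."""
    return (user_values[j][item] - means[cluster][j]) ** 2

def distance_to_cluster(item, cluster):
    """The full distance is the sum of the per-user components."""
    return sum(local_component(j, item, cluster) for j in range(len(user_values)))

if __name__ == "__main__":
    for cluster in range(len(means)):
        print(cluster, distance_to_cluster(1, cluster))
```

Here the sum is taken in the clear; the point of the protocol is to find the closest cluster without revealing the individual components.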

18 Privacy-Preserving K-Means Clustering

19 Securely Finding the Closest Cluster

20 Securely Finding the Closest Cluster
The security of the algorithm is based on three key ideas:
1. Disguise the site components of the distance with random values that cancel out when combined.
2. Permute the order of clusters so the real meaning of the comparison results is unknown.
3. Compare distances so only the comparison result is learned; no party knows the distances being compared.
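The first two ideas can be illustrated in a few lines of Python. This is a simplified sketch and not the protocol from the paper: everything runs in one process, the masks and the permutation are generated in the clear, and the secure comparison of the third idea is replaced by a plain min, which a real protocol must not do. All names are hypothetical.

```python
import random

def zero_sum_masks(r, k, rng, bound=1000):
    """One random vector per user; for every cluster position the masks sum to zero."""
    masks = [[rng.randrange(-bound, bound) for _ in range(k)] for _ in range(r - 1)]
    last = [-sum(mask[c] for mask in masks) for c in range(k)]
    return masks + [last]

def closest_cluster(components, rng):
    """components[j][i]: user j's share of the distance from one item to cluster i."""
    r, k = len(components), len(components[0])
    masks = zero_sum_masks(r, k, rng)

    # Permute the cluster order: position p now stands for cluster perm[p].
    perm = list(range(k))
    rng.shuffle(perm)

    # Each user's disguised, permuted component vector.
    masked = [[components[j][perm[p]] + masks[j][p] for p in range(k)]
              for j in range(r)]

    # Combining the vectors cancels the masks, leaving the permuted true distances.
    totals = [sum(masked[j][p] for j in range(r)) for p in range(k)]

    # Stand-in for the secure comparison: only the winning position should be learned.
    best_position = min(range(k), key=totals.__getitem__)
    return perm[best_position]  # undo the permutation

if __name__ == "__main__":
    shares = [[4, 1, 9], [2, 2, 0], [5, 1, 3]]  # made-up shares: 3 users, 3 clusters
    print(closest_cluster(shares, random.Random(0)))
```

Whoever sees a single masked vector learns nothing useful about that user's component, and whoever sees the winning position cannot map it to a cluster without the permutation.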

21 Securely Finding the Closest Cluster

22 Securely Finding the Closest Cluster

23 Securely Finding the Closest Cluster

24 Check Threshold m j

25 Conclusion Horizontally partitioned data: User 1 User 2

