Privacy-Preserving Clustering

Privacy-Preserving Clustering

Outline Introduction Related Work Preliminaries
Secure Multi-Party Computation Data Sanitization Preliminaries Yao’s Millionaires’ Problem Homomorphic Encryption Privacy-Preserving K-Means Clustering Conclusion

Introduction Why needs privacy-preserving?
Data sharing in today's globally networked systems poses a threat to individual privacy and organizational confidentiality. The privacy problem is not data mining, but the way data mining is done. So, privacy and data mining can coexist. An important data mining problem: clustering.

Related Work Privacy-preserving clustering:
Secure multi-party computation. High computation and communication costs. Data sanitization. Lost of accuracy. Dimensionality reduction. Model-based solutions.

Yao’s Millionaires’ Problem
Two millionaires wish to know who is richer; however, they do not want to find out any additional information about each other’s wealth.

Solutions Suppose Alice has i millions. Bob has j millions.
1 < i, j < 10.

Solutions Suppose (B) x = 7, Ea(x) = 4 = k. (B) k - j + 1 = 2.
Alice: i = 5, Bob: j = 3. (B) x = 7, Ea(x) = 4 = k. (B) k - j + 1 = 2. x Ea(x) 1 7 2 3 5 4 8 6 9 10

Solutions (A) y1 = Da(2) = Da(k - j + 1) = 6. y2 = Da(3) = 2.
yj = y3 = Da(4) = Da(k - j + j) = 7. y4 = Da(5) = 3. y5 = Da(6) = 5. y6 = Da(7) = 1. y7 = Da(8) = 4. … x Ea(x) 1 7 2 3 5 4 8 6 9 10

Solutions (A) 5. (B) Check if z3 = x or not. If yes, means that i ≧ j.
z1 = y1 = 6. z2 = y2 = 2. z3 = y3 = Da(k - j + j) = 7. z4 = y4 = 3. z5 = yi = y5 = 5. z6 = yi + 1 = y5 + 1 = 6. z7 = y6 + 1 = 2. … 5. (B) Check if z3 = x or not. If yes, means that i ≧ j. If no, means that i < j.

Homomorphic Encryption
If there is an algorithm ⊕ to compute H(x⊕y) from H(x) and H(y) that does not reveal x or y. H(x⊕y) = H(x) ⊙ H(y) RSA, … Additive homomorphic: H(x+y) = H(x) * H(y) Paillier, …

Homomorphic Encryption

Privacy-Preserving K-Means Clustering Over Vertically Partitioned Data
SIGKDD, 2003

Problem Definition Goal: Input: Output:
Cluster the known set of common entities without revealing any value that the clustering is based on. Input: Each user provides one attribute of all items. Output: Assignment of entities to clusters. Cluster centers themselves.

K-Means Clustering

new center computation
K-Means Clustering cluster decision new center computation distance matrix

Vertically Partitioned Data
User 1 User 2

Terminology r: # of users, each having different attributes for the same set of items. n: # of the common items. k: # of clusters required. ui: each cluster mean, i = 1, …, k. uij: projection of the mean of cluster i on user j. Final result for user j: The final value / position of uij, i = 1, …, k. Cluster assignments: clusti for all points i = 1, …, n.

Privacy-Preserving K-Means Clustering

Securely Finding the Closest Cluster

The security of the algorithm is based on three key ideas. Disguise the site components of the distance with random values that cancel out when combined. Permute the order of clusters so the real meaning of the comparison results is unknown. Compare distances so only the comparison result is learned; no party knows the distances being compared.

Check Threshold m j

Conclusion Horizontally partitioned data: User 1 User 2

Privacy-Preserving Clustering

Similar presentations

Presentation on theme: "Privacy-Preserving Clustering"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Privacy-Preserving Clustering

Similar presentations

Presentation on theme: "Privacy-Preserving Clustering"— Presentation transcript:

Similar presentations

About project

Feedback