Cryptographic Methods for Privacy-Aware Computing: Applications
Outline
- Review: three basic methods
- Two applications:
  - Distributed decision tree with horizontally partitioned data
  - Distributed k-means with vertically partitioned data
Three basic methods
- 1-out-of-K Oblivious Transfer
- Random shares
- Homomorphic encryption
* Cost is the major concern
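As a toy illustration of the random-share idea, secrets can be split into additive shares so that no proper subset of shares reveals anything. This is a minimal sketch, not a full protocol; the function names and the modulus are illustrative choices, not from the slides:

```python
import random

MOD = 2 ** 32  # illustrative modulus for additive secret sharing

def share(secret, n_parties=2):
    """Split a secret into n additive shares that sum to it mod MOD."""
    shares = [random.randrange(MOD) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % MOD)
    return shares

def reconstruct(shares):
    """Recombine the shares; any strict subset is uniformly random."""
    return sum(shares) % MOD

s1, s2 = share(42)
assert reconstruct([s1, s2]) == 42
```

Each party holds one share and operates on it locally; only when all shares are combined does the true value emerge.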
Two example protocols
The basic idea:
- Do not release the original data
- Exchange only intermediate results
- Apply the three basic methods to combine them securely
Building decision trees over horizontally partitioned data
- Horizontally partitioned data
- Entropy-based information gain
- Major ideas in the protocol
Horizontally Partitioned Data
A table with a key and attributes X1…Xd is split by rows: Site 1 holds records k1…ki, Site 2 holds k(i+1)…kj, …, Site r holds k(m+1)…kn. Every site stores the same attribute schema (key, X1…Xd).
Review: decision tree algorithm (ID3)
Find the cut that maximizes information gain:
- For a given attribute Ai with sorted values v1…vn, each value vi defines a candidate cut
- For categorical data, test Ai = vi
- For numerical data, test Ai < vi
E(): entropy of the label distribution.
Choose the attribute/value pair that gives the highest gain, then recurse (Ai < vi? yes/no, then Aj < vj?, …).
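The gain computation above can be written as a plaintext, single-site sketch; `entropy` and `gain` are illustrative names, and this omits the privacy machinery entirely:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """E(): entropy of a label distribution."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(rows, labels, attr, value):
    """Information gain of the numerical cut attr < value."""
    left = [l for r, l in zip(rows, labels) if r[attr] < value]
    right = [l for r, l in zip(rows, labels) if r[attr] >= value]
    n = len(labels)
    return (entropy(labels)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))

rows = [{'x': 1}, {'x': 2}, {'x': 10}, {'x': 11}]
labels = ['a', 'a', 'b', 'b']
best = gain(rows, labels, 'x', 5)  # a perfect split: gain equals E(S)
```

ID3 evaluates `gain` for every candidate (attribute, value) pair and splits on the maximum; the secure protocol computes the same quantity without either party revealing its rows.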
Key points: calculating entropy
The key is computing x log x, where x is the sum of values held by the two parties P1 and P2, i.e., x = x1 + x2. The computation is:
- decomposed into several steps
- arranged so that at each step each party learns only a random share of the result
Steps
Step 1: compute random shares w1, w2 such that w1 + w2 = (x1 + x2) ln(x1 + x2) (a dedicated sub-protocol computes ln(x1 + x2))
Step 2: for a candidate split (Ai, vi), compute random shares of E(S), E(S1), and E(S2)
Step 3: repeat steps 1-2 for all candidate (Ai, vi) pairs
Step 4: a secure circuit evaluation determines which (Ai, vi) pair yields the maximum gain
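What Step 1's shares must satisfy can be shown with a trusted-dealer toy version. This is not the actual cryptographic ln protocol (which avoids any party ever learning x1 + x2); it only demonstrates the sharing invariant:

```python
import random
from math import log

def share_x_ln_x(x1, x2):
    """Return random real-valued shares w1, w2 with w1 + w2 = (x1+x2) ln(x1+x2).

    Toy simulation only: here a 'dealer' sees both inputs; in the real
    two-party protocol neither party learns x1 + x2.
    """
    x = x1 + x2
    w1 = random.uniform(-1e6, 1e6)   # P1's random share
    w2 = x * log(x) - w1             # P2's complementary share
    return w1, w2

w1, w2 = share_x_ln_x(3, 5)
# w1 + w2 reconstructs 8 * ln(8); individually each share is uniform noise
```

Summing each party's shares of the entropy terms then yields shares of the information gain itself, which feeds into the Step 4 comparison.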
2. K-means over vertically partitioned data
- Vertically partitioned data
- Standard k-means algorithm
- Applying secure sum and secure comparison among multiple sites in the secure distributed algorithm
Vertically Partitioned Data
A table with a key and attributes X1…Xd is split by columns: Site 1 holds X1…Xi, Site 2 holds X(i+1)…Xj, …, Site r holds X(m+1)…Xd. All sites share the key.
Motivation
- Naïve approach: send all data to a trusted site and run k-means there
  - costly
  - is there a trusted third party?
- Preferable: distributed privacy-preserving k-means
Basic k-means algorithm: 4 main steps
Step 1: randomly select k initial cluster centers (the k means)
repeat
  Step 2: assign each point to its closest cluster center
  Step 3: recalculate the k means from the new assignment
until (Step 4) the k means do not change
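The four steps map onto the following centralized sketch (pure Python on lists of tuples; a plaintext baseline, not the secure protocol):

```python
import random

def kmeans(points, k, max_iter=100):
    """Basic k-means on a list of d-dimensional tuples (steps 1-4)."""
    centers = random.sample(points, k)                       # step 1
    for _ in range(max_iter):
        # step 2: assign each point to its closest center
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda j: sum((a - b) ** 2
                                      for a, b in zip(p, centers[j])))
            clusters[j].append(p)
        # step 3: recompute each mean from its new assignment
        new_centers = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:                           # step 4: converged
            break
        centers = new_centers
    return centers, clusters

centers, clusters = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], 2)
```

The secure version distributes exactly these steps: steps 1 and 3 run locally per site, and step 2 is replaced by the secure-sum-and-compare protocol below.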
Distributed k-means
Why k-means works over vertically partitioned data: all 4 steps are decomposable! The most costly parts (steps 2 and 3) can be done locally. We focus on step 2 (assign each point to its closest cluster center).
Step 1
All sites share the indices of k randomly chosen records as the initial centroids; each site stores the partial centroids µ1…µk restricted to its own attributes (Site 1 holds µ1,1…µ1,i up to µk,1…µk,i, and so on).
Step 2: assign each point X to its closest cluster center
1. Calculate the distance of point X = (X1, X2, …, Xd) to each cluster center µk; each distance calculation is decomposable:
   d² = [(X1 − µk1)² + … + (Xi − µki)²] + [(X(i+1) − µk(i+1))² + … + (Xj − µkj)²] + …
        (Site 1's part)                    (Site 2's part)                           …
2. Compare the k full distances to find the minimum: d1 + d2 + …
For each X, each site i holds a k-element vector of its partial distances to the k centroids, denoted Xi.
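The decomposition can be checked directly: each site's contribution is the squared distance over its own attribute slice, and the slices sum to the full squared distance. A minimal sketch (helper name is illustrative):

```python
def partial_sq_dist(x_slice, mu_slice):
    """One site's contribution: squared distance over its own attributes."""
    return sum((x - m) ** 2 for x, m in zip(x_slice, mu_slice))

x  = [1.0, 2.0, 3.0, 4.0]   # full record X
mu = [0.0, 0.0, 1.0, 1.0]   # one centroid
# Site 1 holds attributes 0-1, Site 2 holds attributes 2-3:
d1 = partial_sq_dist(x[:2], mu[:2])
d2 = partial_sq_dist(x[2:], mu[2:])
assert d1 + d2 == partial_sq_dist(x, mu)  # partial distances sum to the full one
```

This is why step 2 only needs a secure *sum* across sites followed by a secure comparison, rather than revealing any raw attribute values.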
Privacy concerns for step 2
Concerns:
- the partial distances d1, d2, … may breach privacy (they reveal the Xi and µki) and must be hidden
- the distance of a point to each cluster may breach privacy and must be hidden
Basic ideas to ensure security:
- disguise the partial distances
- compare distances so that only the comparison result is learned
- permute the order of the clusters so the real meaning of the comparison results stays unknown
Requires 3 non-colluding sites (P1, P2, Pr).
Secure Computing of Step 2
Stage 1: prepare for a secure sum of the partial distances
- P1 generates vectors with V1 + V2 + … + Vr = 0, where each Vi is a random k-element vector used to hide the partial distances of site i
- homomorphic encryption performs the randomization: Ei(Xi)·Ei(Vi) = Ei(Xi + Vi)
Stage 2: compute the secure sum over r − 1 parties
- P1, P3, P4, …, P(r−1) send their perturbed and permuted partial distances to Pr
- Pr sums the r − 1 partial distances (including its own part)
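The homomorphic randomization Ei(Xi)·Ei(Vi) = Ei(Xi + Vi) can be sketched with a textbook Paillier cryptosystem. This uses toy key sizes and the standard g = n + 1 simplification; it illustrates the additive property only and is in no way secure parameters:

```python
import math
import random

# Toy Paillier keypair (tiny primes -- for illustration only, not security)
p, q = 293, 433
n = p * q
n2 = n * n
lam = math.lcm(p - 1, q - 1)
mu = pow(lam, -1, n)              # valid decryption constant when g = n + 1

def encrypt(m):
    """E(m) = (n+1)^m * r^n mod n^2, with fresh randomness r coprime to n."""
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    """m = L(c^lam mod n^2) * mu mod n, where L(u) = (u - 1) // n."""
    return ((pow(c, lam, n2) - 1) // n * mu) % n

# Additive homomorphism: multiplying ciphertexts adds the plaintexts,
# so a site can mask its partial distance Xi with Vi without decrypting.
x_i, v_i = 1234, 777
assert decrypt((encrypt(x_i) * encrypt(v_i)) % n2) == (x_i + v_i) % n
```

Because the Vi sum to zero across sites, the masks cancel when Pr adds up all the perturbed vectors, leaving the true total distances while hiding every individual contribution.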
Notation for Stages 1 and 2:
- Xi: the partial distances to the k partial centroids at site i
- Ei(Xi)·Ei(Vi) = Ei(Xi + Vi): homomorphic encryption, where Ei is site i's public key
- π(Xi): a permutation function that perturbs the order of the elements in Xi
- V1 + V2 + … + Vr = 0: the Vi hide the partial distances
Stage 3: secure_add_and_compare finds the minimum distance (k − 1 comparisons)
- involves only Pr and P2
- uses a standard Secure Multiparty Computation protocol to find the result
Stage 4: the index of the minimum distance (a permuted cluster id) is sent back to P1
- P1 knows the permutation function and thus the original cluster id
- P1 broadcasts the cluster id to all parties
Step 3 can also be done locally
Each site updates its partial means µi locally according to the new cluster assignments: every site holds its own attribute slice of each record X1…Xn, plus the shared cluster labels.
Extra communication cost: O(nrk)
- n: # of records
- r: # of parties
- k: # of means
Also depends on the # of iterations.
Conclusion
- Cryptographic privacy-preserving protocols are appealing
- Their cost is the major concern
- The cost can be reduced with novel algorithms