Download presentation
Presentation is loading. Please wait.
1
Practical Private Computation and Zero- Knowledge Tools for Privacy-Preserving Distributed Data Mining Yitao Duan and John Canny http://www.cs.berkeley.edu/~duan Berkeley Institute of Design Computer Science Division University of California, Berkeley
2
Goal To provide practical solutions with provable privacy and adequate efficiency in a realistic adversary model at reasonably large scale
3
Goal To provide practical solutions with provable privacy and adequate efficiency in a realistic adversary model at reasonably large scale
4
The Scenario Two data miners mine data from n users The data miners are semi-honest: follow the protocol but try to get more info Some fraction of users can be malicious: they may input bogus data to disrupt the computation A more realistic adversary model than most existing privacy-preserving data mining schemes
5
Model …… u1u1 d1d1 u2u2 d2d2 u n-1 d n-1 unun dndn f Challenge: standard cryptographic tools not feasible at large scale Must be obfuscated d i in Z φ m φ: < 32 or 64-bit
6
A Practical Solution Provable privacy: Cryptography Efficiency: VSS over small field. Minimize the number of expensive primitives and rely on probabilistic guarantee Realistic adversary model: An extremely efficient zero-knowledge proof to bound the L2-norm of a user’s vector. An effective way to limit the influence malicious users could have on the computation
7
Basic Approach …… u1u1 d1d1 u2u2 d2d2 u n-1 d n-1 unun dndn Σ f = No leakage beyond final result for many algorithms or differential privacy [Dwork06] Cryptographic privacy
8
The Power of Addition A large number of popular algorithms can be run with addition-only steps Linear algorithms: voting and summation, nonlinear algorithm: regression, SVD, PCA, k-means, ID3, EM etc All algorithms in the statistical query model [Kearns 93] Many other gradient-based numerical algorithms A trick used a lot for parallelization in distributed computing [Chu 06, Das 07] Addition-only framework has very efficient private implementation in cryptography and admits efficient ZKPs
9
Private Addition The computation: secret sharing over small field Malicious users: efficient zero-knowledge proof to bound the L2-norm of the user vector
10
Big Integers vs. Small Ones Most applications work with “regular-sized” integers (e.g. 32- or 64-bit). Arithmetic operations are very fast when each operand fits into a single memory cell (~10 -9 sec) Public-key operations (e.g. used in encryption and verification) must use keys with sufficient length (e.g. 1024-bit) for security. Existing private computation solutions must work with large integers extensively (~10 -3 sec) A 6 orders of magnitude difference!
11
vivi uiui u i + v i = d i Private Addition d i : user i’s private vector. u i,,v i and d i are all in a small integer field
12
μ = Σu i ν = Σv i u i + v i = d i Private Addition
13
u i + v i = d i μ = Σu i ν = Σv i μ ν Private Addition
14
μ + ν Private Addition
15
Provable privacy Computation on each server is over small field: same cost as non-private implementation – O(m) small field operations So the cost for privacy is only due to verification For that we have a solution that involves only O(log m) large field operations Private Addition
16
The Need for Verification Private computation obfuscates user data. A malicious user could input anything. Think of a voting scheme: “Please place your vote 0 or 1 in the envelope” Bush 100,000 Gore -100,000
17
I can prove that I know X without disclosing what X is. I can prove that an encrypted number is a ZERO OR ONE, i.e. a bit. (6 extra numbers needed) I can prove that an encrypted number is a k-bit integer. I need 6k extra numbers to do this (!!!) Zero Knowledge Proofs
18
Bounding the L2-Norm A natural and effective way to restrict a cheating user’s malicious influence You must have a big vector to produce large influence on the sum Perturbation theory bounds system change with norms: |σ i (A) - σ i (B)| ≤ ||A-B|| 2 [ Weyl] Can be the basis for other checks Setting L = 1 forces each user to have only 1 vote
19
An Efficient ZKP of Boundedness Luckily, we don’t need to prove that every number in a user’s vector is small, only that the vector is small. The server asks for some random projections of the user’s vector, and expects the user to prove that the square sum of them is small. O(log m) public key crypto operations (instead of O(m)) to prove that the L-2 norm of an m-dim vector is smaller than L. Running time reduced from hours to seconds.
20
Random Projection-based L2-Norm ZKP Server generates N random m -vectors in {-1, 0, +1} m with i.i.d. probability {¼, ½, ¼} User projects his data to the N directions. provides ZKP that the square sum of the projections < NL 2 /2 Expensive public key operations are only on the projections and the square sum
21
Effectiveness
22
Acceptance/rejection Probabilities (a) Linear and (b) log plots of probability of user input acceptance as a function of |d|/L for N = 50. (b) also includes probability of rejection. In each case, the steepest (jagged curve) is the single-value vector (case 3), the middle curve is Zipf vector (case 2) and the shallow curve is uniform vector (case 1)
23
Performance Evaluation (a)Verifier and (b) prover times in seconds for the validation protocol where (from top to bottom) L (the required bound) has 40, 20, or 10 bits. The x-axis is the vector length. Standard technique takes 6 to 10 hours at m = 10 6
24
Current Status The protocols (the L2-norm ZKP and the private vector addition) have been implemented Adding more mid-tier components In Java using native code for big integer Runs on Linux platform Made an open-source toolkit for building privacy-preserving real-world applications
25
More info duan@cs.berkeley.edu http://www.cs.berkeley.edu/~duan/res earch/p4p.html Thank You!
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.