
Private Analysis of Data Sets Benny Pinkas HP Labs, Princeton.




1 Private Analysis of Data Sets Benny Pinkas HP Labs, Princeton

2 A story. "We're experiencing a lot of fraud lately…" "Here too." "I can't find a pattern to recognize fraud in advance." "Neither can I." "Maybe we should share information." "But what about patients' privacy? Business secrets?" "Have you heard of 'secure function evaluation'?" "This is all 'theory'. It can't be efficient."

3 New Opportunities for Interaction. Between: – Enterprises and government agencies holding sensitive data. – P2P users. – Mobile wireless crowds (PDAs, cell phones). What about privacy? A bidirectional approach: – Finding what is actually needed. – Designing useful and efficient cryptographic tools.

4 Cryptographic Protocols for Privacy Preserving Computation. Input: one party holds x, the other holds y. Output: F(x,y) and nothing else. As if a trusted party received x and y and returned F(x,y).

5 Does the trusted party scenario make sense? [Diagram: the parties hand x and y to a trusted party and receive F(x,y).] We cannot hope for more privacy than this. Are the parties motivated to submit their true inputs? Can they tolerate the disclosure of F(x,y)? If so, we can implement the scenario without a trusted party.

6 Secure Function Evaluation [Yao, GMW, BGW]. F(x,y): a public function, represented as a Boolean circuit C(x,y). Input: x and y. Output: one party learns C(x,y) and nothing else; the other party learns nothing. Implementation: O(|X|) "oblivious transfers" and O(|C|) communication. Pretty efficient for small circuits! (But what about larger circuits?)

7 An equality circuit. [Diagram: each bit pair (x_i, y_i), i = 1, …, n, feeds an "=" gate, and a single AND gate combines the n results.] Output: 1 if x = y, 0 otherwise.
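
The circuit on this slide can be sketched in plain Python as follows. This is only an evaluation of the circuit in the clear, not a secure protocol; the helper names (`to_bits`, `eq_gate`, `and_gate`, `equality_circuit`) are illustrative and do not come from the talk.

```python
from functools import reduce


def to_bits(value, n):
    """Return the n low-order bits of value, least significant first."""
    return [(value >> i) & 1 for i in range(n)]


def eq_gate(a, b):
    """Per-bit "=" gate (XNOR): 1 when the two bits agree, else 0."""
    return 1 - (a ^ b)


def and_gate(a, b):
    """Two-input AND gate on bits."""
    return a & b


def equality_circuit(x_bits, y_bits):
    """AND together the n per-bit equality gates: 1 iff x = y."""
    if len(x_bits) != len(y_bits):
        raise ValueError("inputs must have the same bit length")
    per_bit = (eq_gate(a, b) for a, b in zip(x_bits, y_bits))
    return reduce(and_gate, per_bit, 1)
```

The circuit has n "=" gates and n − 1 two-input AND gates, so its size is linear in the input length, which is why slide 6 can call the generic approach efficient for a function like this.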

8 Cryptographic methods vs. randomization methods. [Diagram with three axes: inaccuracy, overhead, lack of privacy. Plotted: randomization methods [statistical disclosure, AS], cryptographic methods, and "our goal…".]

9 Examples of Simple Privacy Preserving Primitives (with reasonable solutions). Is X = Y? Is X > Y? What is X ∩ Y? What is the median of X ∪ Y? Auctions (negotiations): many parties, private bids; compute the winning bidder and the sale price, but nothing else [NPS]. Voting. Adding privacy to data mining algorithms (ID3, [LP]).

10 Private Set Intersection, with Mike Freedman (NYU) and Kobbi Nissim (MSR)

11 Applications of Set Intersection. Government agency A: people on welfare. Government agency B: expensive car buyers. Compute the intersection and nothing else.

12 Computing the Intersection. Private Equality Test (PET): – Alice: x. Bob: y. – Output: 1 iff x = y. – Privacy preserving solutions: cannot use hash functions alone; Yao, [FNW], [NP]. Generalization: list intersection. – X = x_1, …, x_n; Y = y_1, …, y_n.
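
One standard reason a bare hash is not enough for the equality test: when the inputs come from a small domain, whoever receives H(x) can try every candidate. The sketch below illustrates this under assumptions that are not from the talk: a hypothetical domain of 4-digit strings, SHA-256 as the hash, and illustrative function names.

```python
import hashlib


def h(value):
    """Hash a string with SHA-256 and return the hex digest."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def naive_pet_message(x):
    """Alice's message in the naive (insecure) test: just H(x)."""
    return h(x)


def naive_pet_check(message, y):
    """Bob's check in the naive test: does H(y) match Alice's message?"""
    return message == h(y)


def dictionary_attack(message, domain):
    """Recover x from H(x) by hashing every candidate in a small domain."""
    for candidate in domain:
        if h(candidate) == message:
            return candidate
    return None
```

For example, with `domain = [f"{i:04d}" for i in range(10000)]`, `dictionary_attack(naive_pet_message("4271"), domain)` recovers Alice's input even though Bob's own input was different, so Bob learns more than the single bit "x = y".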

13 The basic tool: Homomorphic Encryption. Semantically secure public key encryption. Given Enc(M_1) and Enc(M_2), one can compute (without knowing the decryption key): – Enc(M_1 + M_2). – Enc(c·M_1) for any constant c. – I.e. Enc(a_0) + Enc(a_1)·x + … + Enc(a_n)·x^n = Enc(P(x)). Examples: El Gamal, Paillier, DJ.
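
As one concrete instance, here is how the slide's "+" and "·" look for Paillier, where ciphertexts live modulo N^2 and plaintexts modulo N. The symbol N for the Paillier modulus is introduced here to avoid a clash with the degree n; each equality means "is a valid encryption of".

```latex
\begin{align*}
\mathrm{Enc}(M_1)\cdot\mathrm{Enc}(M_2) \bmod N^2
  &= \mathrm{Enc}\bigl(M_1 + M_2 \bmod N\bigr),\\
\mathrm{Enc}(M_1)^{c} \bmod N^2
  &= \mathrm{Enc}\bigl(c\cdot M_1 \bmod N\bigr),\\
\prod_{i=0}^{n}\mathrm{Enc}(a_i)^{x^{i}} \bmod N^2
  &= \mathrm{Enc}\Bigl(\sum_{i=0}^{n} a_i x^{i} \bmod N\Bigr)
   = \mathrm{Enc}\bigl(P(x)\bigr).
\end{align*}
```

So adding plaintexts corresponds to multiplying ciphertexts, and multiplying a plaintext by a known constant corresponds to raising the ciphertext to that constant.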

14 The Scenario. Client: X = x_1, …, x_n. Server: Y = y_1, …, y_n. Output: – Client learns X ∩ Y. – Server learns nothing.

15 The Protocol. Client defines a polynomial of degree n whose roots are x_1, …, x_n: – P(y) = (x_1 − y)·(x_2 − y)·…·(x_n − y) = a_n·y^n + … + a_1·y + a_0. Sends to the server homomorphic encryptions of the coefficients: – Enc(a_n), …, Enc(a_0) (only the client can decrypt).

16 …The Protocol. Server uses the homomorphic properties to compute, ∀y ∈ Y, Enc(r·P(y) + y) (r is random). If y ∈ X ∩ Y the result is Enc(r·0 + y) = Enc(y); otherwise the result is Enc(random). Server sends the (permuted) results to C. C decrypts and compares them to its list.
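
The two protocol slides can be sketched end to end as follows. This is a toy under loud assumptions: Paillier is used as the homomorphic scheme, with tiny hard-coded primes that offer no security, client and server run in one process, and all names (`enc`, `dec`, `add`, `scale`, `run_psi`, …) are illustrative rather than taken from the talk. It needs Python 3.8+ for `pow(x, -1, n)`.

```python
import math
import random

# Toy Paillier parameters: tiny demo primes, insecure, illustration only.
P, Q = 1009, 1013
N = P * Q
N2 = N * N
G = N + 1
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)  # valid because G = N + 1


def enc(m):
    """Encrypt m (mod N) with fresh randomness."""
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return (pow(G, m % N, N2) * pow(r, N, N2)) % N2


def dec(c):
    """Decrypt a ciphertext; only the key holder (the client) can do this."""
    return ((pow(c, LAM, N2) - 1) // N * MU) % N


def add(c1, c2):
    """Enc(m1), Enc(m2) -> Enc(m1 + m2)."""
    return (c1 * c2) % N2


def scale(c, k):
    """Enc(m), k -> Enc(k * m)."""
    return pow(c, k % N, N2)


def client_coefficients(xs):
    """Coefficients a_0..a_n (mod N) of P(y) = prod (x_i - y)."""
    coeffs = [1]
    for x in xs:
        nxt = [0] * (len(coeffs) + 1)
        for i, a in enumerate(coeffs):
            nxt[i] = (nxt[i] + a * x) % N      # a * x contributes to y^i
            nxt[i + 1] = (nxt[i + 1] - a) % N  # a * (-y) contributes to y^(i+1)
        coeffs = nxt
    return coeffs


def server_respond(enc_coeffs, ys):
    """For each y, compute Enc(r * P(y) + y) from encrypted coefficients."""
    out = []
    for y in ys:
        acc = enc_coeffs[-1]                  # Enc(a_n)
        for c in reversed(enc_coeffs[:-1]):   # Horner's rule on ciphertexts
            acc = add(scale(acc, y), c)
        r = random.randrange(1, N)
        out.append(add(scale(acc, r), enc(y)))
    random.shuffle(out)                       # hide which y gave which result
    return out


def client_intersect(xs, responses):
    """Decrypt the responses and keep those that appear in the client's list."""
    x_set = set(xs)
    return {m for m in map(dec, responses) if m in x_set}


def run_psi(xs, ys):
    """Run both roles in one process and return the client's output."""
    enc_coeffs = [enc(a) for a in client_coefficients(xs)]
    return client_intersect(xs, server_respond(enc_coeffs, ys))
```

With these toy parameters a non-matching y decrypts to a value in X only with probability on the order of |X|/N, so `run_psi([3, 17, 42, 99], [5, 17, 99, 1000])` returns `{17, 99}` except with negligible chance. Items must be smaller than N for the comparison against the client's list to be meaningful.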

17 Security. Bad server? The server only sees semantically secure encryptions. Learning about C's input = breaking the encryption. Bad client? The client can, given only the output X ∩ Y, simulate her "view" in the protocol. (I.e. she generates encryptions of the items in X ∩ Y and of random items.)

18 Efficiency. Client encrypts and decrypts n values. Communication is O(n). Server: – For each input computes Enc(r·P(y) + y), i.e. n exponentiations. – Total: O(n^2) exponentiations. – Can use hashing to reduce the overhead to O(n ln ln n).

19 Is Approximation easier? Can we approximate the size of the intersection (i.e. a scalar product) with sublinear overhead? Lower bound: – Approximating |X ∩ Y| within a 1 ± ε factor requires Ω(n) communication (∀ constant ε). – True even for randomized algorithms. – Proof: reduction to Razborov's lower bound for Disjointness. Upper bound: protocols with matching overhead.

20 Secure Computation of the Kth-ranked element, with Gagan Aggarwal (Stanford) and Nina Mishra (HPL)

21 Secure Computation of the Kth-ranked element. Inputs: – A: S_A. B: S_B. – Large sets of unique items (∈ D). – There's also the multi-party scenario. Output: x ∈ S_A ∪ S_B s.t. |{y | y < x, y ∈ S_A ∪ S_B}| = k − 1. Median: k = (|S_A| + |S_B|) / 2.

22 Motivation. Basic statistical analysis of distributed data. E.g. a histogram of salaries in competing businesses in the same area. Sometimes the parties might want to hide the size of their inputs.

23 Some information is always revealed. The Kth-ranked element reveals some information. Suppose S_A = x_1, …, x_1000. – Median of S_A ∪ S_B = x_400. Party A now learns that S_B contains at least 200 elements smaller than x_400. But she shouldn't learn more.
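
The figure of 200 can be checked from the rank definition on slide 21. The derivation below assumes that x_1 < … < x_1000 are listed in increasing order and that |S_B| is even; b is a symbol introduced here for the number of elements of S_B smaller than x_400.

```latex
\begin{align*}
\underbrace{399}_{\text{from } S_A} + \; b
  &= k - 1 = \frac{1000 + |S_B|}{2} - 1
  && \text{($x_{400}$ has rank $k$)}\\
b &= \frac{|S_B|}{2} + 100\\
b \le |S_B| \;&\Longrightarrow\; |S_B| \ge 200
  \;\Longrightarrow\; b \ge 200 .
\end{align*}
```

So the bound follows from the output alone together with A's own input, which is exactly the information the slide says cannot be hidden.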

24 Results, and previous work. Previous work: generic constructions, with overhead at least linear in k. New results: – Two-party: log k secure comparisons of (log D)-bit numbers. – Multi-party: log D simple computations with (log D)-bit numbers.

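25 An (insecure) two-party median protocol. [Diagram: S_A is split at its median m_A into a lower part L_A and an upper part R_A; S_B is split at its median m_B into L_B and R_B; the case shown is m_A < m_B.] L_A lies below the median, R_B lies above the median. The new median is the same as the original median. Recursion ⇒ need log n rounds (suppose each set contains 2^i items).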

26 Secure two-party median protocol. A finds the median of S_A, call it m_A. B finds the median of S_B, call it m_B. Secure comparison (e.g. a small circuit): m_A < m_B? YES: A deletes x ∈ S_A s.t. x < m_A; B deletes x ∈ S_B s.t. x > m_B. NO: A deletes x ∈ S_A s.t. x > m_A; B deletes x ∈ S_B s.t. x < m_B.
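
The recursion can be simulated in one process as below. Several assumptions are made here and are not from the slides: the comparison is an ordinary `<` standing in for the secure comparison, the "median" of a set of size 2^i is taken to be its 2^(i−1)-th smallest element, and each party drops exactly half of its set per round (so a party's own median goes with the lower half). The slide writes strict inequalities; the boundary convention used here is just one choice that keeps both sets at equal power-of-two sizes.

```python
def secure_less_than(a, b):
    """Stand-in for the secure comparison; a real run would reveal only this bit."""
    return a < b


def two_party_median(s_a, s_b):
    """Rank-(|S_A|+|S_B|)/2 element of the union, via log n comparisons.

    Assumes |S_A| = |S_B| = 2^i and that all items are distinct.
    """
    a, b = sorted(s_a), sorted(s_b)
    n = len(a)
    if n == 0 or n != len(b) or n & (n - 1):
        raise ValueError("both sets must have the same power-of-two size")
    if len(set(a) | set(b)) != 2 * n:
        raise ValueError("items must be distinct")
    while len(a) > 1:
        half = len(a) // 2
        m_a, m_b = a[half - 1], b[half - 1]   # each party's local median
        if secure_less_than(m_a, m_b):
            a, b = a[half:], b[:half]         # A drops low half, B drops high half
        else:
            a, b = a[:half], b[half:]         # A drops high half, B drops low half
    # One item each: the median of the two-item union is the smaller one.
    return a[0] if secure_less_than(a[0], b[0]) else b[0]
```

For instance, `two_party_median([1, 3, 5, 7], [2, 4, 6, 8])` compares 3 < 4, then 5 < 2, and ends with the items 5 and 4, returning 4, which is the 4th smallest of the eight items. Each round removes the same number of items from below and from above the median, which is why the median of the remaining items is unchanged.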

27 Proof of security. Simulation: given the protocol's output, each party can simulate the execution of the protocol. [Diagram: S_A with the median marked. First comparison: m_A < m_B. Second comparison: m_A > m_B.]

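28 Arbitrary inputs, arbitrary k. [Diagram: S_A and S_B, each brought to size k, with padding labeled "+".] Now, compute the median of two sets of size k. The size should be a power of 2, 2^i. [Diagram: the sets brought to size 2^i, with padding labeled "−" and "+".] Median of the new inputs = kth element of the original inputs.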
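
One way to realize this reduction is sketched below. It assumes that the "+" and "−" padding in the diagram stands for +∞ and −∞ items, that each party keeps only its k smallest items, and that one party takes all the −∞ padding while the other takes the matching +∞ padding; these details and the function names are assumptions for illustration. The check uses a plain, non-private median.

```python
import math


def pad_for_kth(s_a, s_b, k):
    """Turn "kth element of S_A ∪ S_B" into "median of two sets of size 2^i"."""
    if not 1 <= k <= len(s_a) + len(s_b):
        raise ValueError("k must be between 1 and |S_A| + |S_B|")

    def to_size_k(s):
        # Keep the k smallest items; pad with +inf if there are fewer than k.
        kept = sorted(s)[:k]
        return kept + [math.inf] * (k - len(kept))

    size = 1
    while size < k:          # smallest power of two that is >= k
        size *= 2
    extra = size - k
    # Equal amounts of -inf and +inf shift the target rank from k to 2^i.
    new_a = [-math.inf] * extra + to_size_k(s_a)
    new_b = to_size_k(s_b) + [math.inf] * extra
    return new_a, new_b


def lower_median_of_union(a, b):
    """Non-private reference: the element of rank (|a| + |b|) / 2 in the union."""
    merged = sorted(a + b)
    return merged[len(merged) // 2 - 1]
```

With `s_a = [5, 1, 9]`, `s_b = [2, 8, 3, 7, 6]` and `k = 3`, the padded sets are `[-inf, 1, 5, 9]` and `[2, 3, 6, inf]`, and their median is 3, the 3rd smallest of the original items. To feed such sets into a comparison-based protocol that expects distinct items, the repeated infinities would additionally need a tie-breaking rule, which this sketch leaves out.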

29 Conclusions. Efficient privacy preserving primitives for basic tasks. Open problems: – Intersection: approximate matching? – Median: clustering? Theory and applications can and should interact: – Tools from the theory of cryptography (e.g. SFE) can be used in applications. – Applications can benefit from rigorous analysis. There's a lot more to be done…

