
1 Privacy Preserving Data Mining
Haiqin Yang
Extracted from the ppt "Secure Multiparty Computation and Privacy"; "Privacy Preserving SVM" added.

2 Outline
- Motivation
- Privacy
- Secure computation and privacy
- Privacy preserving SVM
- Relation to 3D-LBS
- Challenges

3 Motivation
Huge databases exist in various applications:
- Medical data
- Consumer purchase data
- Census data
- Communication and media-related data
- Data gathered by government agencies
Can these data be utilized?
- For medical research
- For improving customer service
- For homeland security

4 Motivation
Data sharing is necessary for full utilization:
- Pooling medical data can improve the quality of medical research.
- Pooling information from different government agencies can provide a wider picture:
  - What is the health status of citizens who are supported by social welfare?
  - Are there citizens who receive simultaneous support from different agencies?
- Data gathered by the government (e.g., census data) should be publicly available.

5 Motivation
The huge amount of data available means that it is possible to learn a lot about individuals from public data:
- Purchasing patterns
- Family history
- Medical data
- …

6 Privacy
Human definition:
- Privacy and autonomy: information that is personal, confidential, or private should not be unnecessarily distributed or publicly known.
- Privacy and control: personal or private information should not be misused (whatever that means).
Difficulties in formulating this mathematically:
- The same information is classified differently by different people.
- Legitimate use is interpreted differently by different people.

7 Secure computation and privacy
Secure computation:
- Assume that there is a function that all parties wish to compute.
- Secure computation shows how to compute that function in the safest way possible.
- In particular, it guarantees minimal information leakage (the output only).
Privacy:
- Does the function output itself reveal "sensitive information"?
- Should the parties agree to compute this function at all?

8 This talk
Secure multiparty computation:
- Traces back to the "two millionaires problem" (A. Yao, 1982), or earlier.
Privacy preserving data mining:
- Privacy preserving SVM (Vaidya et al., KIS 2007).

9 Secure multiparty computation
- A set of parties with private inputs.
- Objective: jointly compute a function of the inputs so that certain security properties (such as privacy and correctness) are preserved.
- Applications: secure elections, auctions, …
- These properties must be ensured even if some of the parties maliciously attack the protocol.

10 Secure computation tasks
Examples:
- Authentication protocols
- Online payments
- Auctions
- Elections
- Privacy preserving data mining
- Essentially any task…

11 The real model
[Diagram: two parties with private inputs x and y interact directly through the protocol and obtain the protocol output.]

12 The ideal model
[Diagram: the parties send their inputs x and y to a trusted party, which returns f1(x,y) to the first party and f2(x,y) to the second.]

13 The security definition
[Diagram: REAL world (protocol interaction) versus IDEAL world (trusted party).]
For every real adversary A there exists an ideal-model adversary (simulator) S such that the real and ideal executions are indistinguishable (a formal statement follows this slide).
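
In symbols, a standard way to state this simulation-based definition (the notation is assumed here, not taken from the slides):
\[
\forall A \ \exists S : \ \{\mathrm{REAL}_{\pi,A}(x,y)\} \ \stackrel{c}{\equiv} \ \{\mathrm{IDEAL}_{f,S}(x,y)\},
\]
i.e., whatever an adversary can achieve by attacking the real protocol, a simulator can also achieve in the ideal model with the trusted party, so running the protocol is no worse than handing the inputs to a trusted party.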

14 Why this approach?
- General: it captures all applications.
- The specifics of an application are defined by its functionality; security is defined as above.
- The security guarantees achieved are easily understood (because the ideal model is easily understood).
- We can be confident that we did not "miss" any security requirements.

15 The ideal model: more details
- The definition we gave suffices in the case of an honest majority.
- When there is no honest majority:
  - Guaranteed output delivery cannot be achieved.
  - Fairness cannot be achieved.
- Changes to the ideal model:
  - Corrupted parties receive their output first.
  - The adversary decides whether the honest parties receive their outputs as well.
  - This is called security with abort.

16 Defects in the ideal model
- When there is no honest majority, fairness and guaranteed output delivery cannot be achieved.
  - This "defect" is built into the ideal model.
- The same approach can be used to model partial information leakage (see the note after this slide):
  - The parties wish to compute a function f, but the protocol leaks more information.
  - This can be modeled by having the trusted party explicitly leak that information.
  - This helps with efficiency considerations.
- Advantage: the defect is explicit!
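
One way to write this down (notation assumed, not from the slides): instead of computing f, the trusted party computes an augmented functionality
\[
f'(x,y) = \big( f(x,y), \ L(x,y) \big),
\]
where L is the extra information the protocol is allowed to leak; a protocol that securely computes f' is then a protocol for f with explicit, clearly stated leakage L.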

17 Privacy preserving data mining
Setting:
- Data is distributed across different sites.
- These sites may be third parties (e.g., hospitals, government bodies) or individuals.
Aim:
- Run the data mining algorithm on the data so that nothing but the output is learned.
- That is, carry out a secure computation.

18 Privacy vs. security
- Secure computation only deals with the process of computing the function.
- It does not ask whether or not the function should be computed.

19 Privacy and secure computation
- Secure computation can be used to solve any distributed data-mining problem.
- A two-stage process:
  - Decide that the function/algorithm should be computed: an issue of privacy.
  - Apply secure computation techniques to compute it securely: an issue of security.
- But not every privacy problem can be cast as a distributed computation.

20 Privacy preserving SVM classification (Vaidya et al., KIS 2007)

21 SVM introduction
[Slide shows the training data and the SVM objective function; a standard formulation is sketched below.]
Problem: how to pass the kernel matrix among different parties?
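
For reference, a standard soft-margin SVM dual (an assumed reconstruction; the slide's exact "Data" and "Objective" formulas are images and not recoverable from the transcript): given training pairs $(x_i, y_i)$, $i = 1, \dots, n$, with $y_i \in \{-1, +1\}$,
\[
\max_{\alpha} \ \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j K(x_i, x_j)
\quad \text{s.t.} \quad 0 \le \alpha_i \le C, \quad \sum_{i} \alpha_i y_i = 0 .
\]
The training data enter the optimization only through the kernel matrix $K_{ij} = K(x_i, x_j)$, which is why the privacy question reduces to computing that matrix without revealing the parties' raw data.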

22 Case 1: vertically partitioned data
[Slide lists the linear, polynomial, and RBF kernels; common forms are sketched below.]
[Diagram: the data matrix is split column-wise into parts D1 and D2.]
Example: a bank, a health insurance company, and an auto insurance company collect different information about the same people.
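
Common forms of these kernels (standard definitions, assumed to match the slide's images; $c$, $d$, and $\gamma$ are kernel parameters):
\[
K_{\mathrm{lin}}(x_i, x_j) = x_i^{\top} x_j, \qquad
K_{\mathrm{poly}}(x_i, x_j) = \left( x_i^{\top} x_j + c \right)^{d}, \qquad
K_{\mathrm{rbf}}(x_i, x_j) = \exp\!\left( -\gamma \lVert x_i - x_j \rVert^{2} \right).
\]
For vertically partitioned data, both $x_i^{\top} x_j$ and $\lVert x_i - x_j \rVert^{2}$ decompose into sums of per-party terms over each party's attributes, so all three kernels can be built from securely merged local contributions.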

23 Secure merge
Assumption: K parties; party P_i holds value v_i.
Procedure (a code sketch of this protocol follows this slide):
- P_0 chooses a random number R from a uniform distribution over F.
- P_0 sends (R + v_0) mod |F| to P_1.
- P_i receives the running total from P_{i-1}, adds v_i mod |F|, and sends the result to P_{i+1}.
- Finally, P_0 subtracts R from the total it receives back, obtaining the merged sum of all v_i.
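
A minimal sketch of this secure-sum style merge in Python; the function name, the modulus F, and the simulation of the message ring by a loop are illustrative assumptions, not taken from the paper:

```python
import random

F = 2**32  # size of the field/modulus; must exceed any possible sum (assumption)

def secure_merge(values):
    """Simulate the ring-based secure sum over a list of private values.

    values[i] is the private input of party P_i. In a real deployment each
    step would be a message between distinct parties; here the loop just
    models the ring of messages.
    """
    k = len(values)
    # P_0 masks its value with a random number R drawn uniformly from [0, F)
    r = random.randrange(F)
    running = (r + values[0]) % F            # P_0 -> P_1
    # Each subsequent party adds its own value and forwards the running total
    for i in range(1, k):
        running = (running + values[i]) % F  # P_i -> P_{i+1}
    # The masked total returns to P_0, which removes the mask R
    return (running - r) % F

if __name__ == "__main__":
    private_values = [13, 7, 22, 5]       # one value per party
    print(secure_merge(private_values))   # 47; intermediate parties see only masked totals
```

Each intermediate party only ever sees a value masked by R, so (absent collusion) it learns nothing about the other parties' individual inputs.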

24 Case 2: horizontally partitioned data
[Diagram: the data matrix is split row-wise into parts D1 and D2.]
Example: different banks collect the same kind of data for their own customers.

25 Case 3: arbitrarily partitioned data

26 How to do it?
Homomorphic encryption: combining two ciphertexts yields an encryption of the combined plaintexts, i.e. E(a) ∘ E(b) = E(a * b), where * is either addition or multiplication (in some abelian group). A toy example follows this slide.
Existing cryptosystems:
- Goldwasser–Micali (Blum M, Goldwasser S 1984)
- Benaloh cryptosystem (Benaloh JC 1986)
- Naccache–Stern cryptosystem (Naccache D, Stern J 1998)
- Paillier cryptosystem (Paillier P 1999)
- Okamoto–Uchiyama cryptosystem (Okamoto T, Uchiyama S 1998)
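
A toy-sized Python sketch of an additively homomorphic scheme (Paillier), just to illustrate the property above; the tiny hard-coded primes and helper names are illustrative assumptions, not from the paper:

```python
import math
import random

# Toy Paillier cryptosystem: additively homomorphic, i.e.
#   E(m1) * E(m2) mod n^2  decrypts to  (m1 + m2) mod n.
# Demo-sized primes only; real keys use primes of ~1024 bits or more.

p, q = 293, 433                     # small demo primes (assumption)
n = p * q
n2 = n * n
g = n + 1                           # standard simplified generator choice
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
mu = pow(lam, -1, n)                # modular inverse of lambda mod n (Python 3.8+)

def L(x):
    return (x - 1) // n

def encrypt(m):
    r = random.randrange(1, n)
    while math.gcd(r, n) != 1:      # r must be invertible mod n
        r = random.randrange(1, n)
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(c):
    return (L(pow(c, lam, n2)) * mu) % n

if __name__ == "__main__":
    a, b = 123, 456
    ca, cb = encrypt(a), encrypt(b)
    # Multiplying ciphertexts corresponds to adding plaintexts:
    assert decrypt((ca * cb) % n2) == (a + b) % n
    print("E(a) * E(b) decrypts to", decrypt((ca * cb) % n2))
```

Here the ciphertext operation is multiplication modulo n^2 while the plaintext operation is addition modulo n, matching the E(a) ∘ E(b) = E(a * b) pattern with * = addition.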

27 Algorithm
[Slide shows the privacy-preserving kernel computation algorithm as a figure; nothing further is recoverable from the transcript. A sketch of the underlying idea for the linear kernel follows this slide.]
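
As a concrete illustration of the underlying arithmetic (a sketch under assumptions, not the paper's algorithm as published): for vertically partitioned data, the linear-kernel Gram matrix is the element-wise sum of the parties' local Gram matrices, so it can be assembled by applying a secure merge to each entry. The helper names below are hypothetical.

```python
import numpy as np

def local_gram(part):
    """Gram matrix over one party's subset of attributes (rows = examples)."""
    return part @ part.T

def merged_linear_kernel(parts, merge=sum):
    """Combine per-party Gram matrices.

    `merge` stands in for the secure merge protocol; using plain `sum` here
    only demonstrates that the arithmetic decomposes correctly, not that the
    combination step is secure.
    """
    return merge(local_gram(p) for p in parts)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 6))       # full data: 5 examples, 6 attributes
    parts = [X[:, :2], X[:, 2:]]      # vertical split between two parties
    assert np.allclose(merged_linear_kernel(parts), X @ X.T)
```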

28 Other security issues
- Collusion with the third party Q.

29 Relation to 3D-LBS
Issues:
- Some data may need to be encrypted before research tests.
- Data mining may need to run on encrypted data.
Problems:
- Lack of knowledge of security and cryptography.
- Limited understanding of privacy.

30 Future challenges
Understand the real problem and what the real requirements are:
- A very non-trivial task, and one that requires interdisciplinary cooperation.
- Computer scientists should help formalize the notion; lawyers, policy-makers, and social scientists should be involved in understanding the concerns.
- Some challenges:
  - Reconciling cultural and legal differences relating to privacy across countries.
  - Understanding when privacy is "allowed" to be breached.

31 Future challenges
- Appropriate modeling for secure computation.
- Efficient protocols…

32 Conclusion
Privacy-preserving data mining is truly needed:
- Data mining is already being used by security agencies, governmental bodies, and corporations.
- Privacy advocates and citizen outcry often prevent positive uses of data mining.
- Good solutions (still currently out of reach) may be able to resolve the conflict.

