1 Privacy Preserving Data Mining Haiqin Yang Extracted from a ppt “Secure Multiparty Computation and Privacy” Added “Privacy Preserving SVM”

Presentation transcript:

1 Privacy Preserving Data Mining Haiqin Yang Extracted from a ppt “Secure Multiparty Computation and Privacy” Added “Privacy Preserving SVM”

2 Outline: Motivation; Privacy; Secure computation and privacy; Privacy preserving SVM; Related to 3D-LBS; Challenges

3 Motivation Huge databases exist in various applications: medical data; consumer purchase data; census data; communication and media-related data; data gathered by government agencies. Can these data be utilized? For medical research; for improving customer service; for homeland security.

4 Motivation Data sharing is necessary for full utilization: pooling medical data can improve the quality of medical research; pooling of information from different government agencies can provide a wider picture. What is the health status of citizens that are supported by social welfare? Are there citizens that receive simultaneous support from different agencies? Data gathered by the government (e.g., census data) should be publicly available.

5 Motivation The huge amount of data available means that it is possible to learn a lot of information about individuals from public data: purchasing patterns; family history; medical data; …

6 Privacy Human definition: (1) Privacy and autonomy: information that is personal, confidential or private should not be unnecessarily distributed or publicly known. (2) Privacy and control: personal or private information should not be misused (whatever that means). Difficulties in mathematically formulating: the same information is classified differently by different people; legitimate use is interpreted differently by different people.

7 Secure computation and privacy Secure computation: assume that there is a function that all parties wish to compute; secure computation shows how to compute that function in the safest way possible; in particular, it guarantees minimal information leakage (the output only). Privacy: does the function output itself reveal “sensitive information”, or, should the parties agree to compute this function?

8 This talk Secure multiparty computation: traces back to the “millionaires’ problem” (A. Yao, 1982), or earlier. Privacy preserving data mining: privacy preserving SVM (Vaidya et al. KIS 2007)

9 Secure multiparty computation A set of parties with private inputs. Objective: jointly compute a function of the inputs so that certain security properties (like privacy and correctness) are preserved. Applications: secure elections, auctions… Properties must be ensured even if some of the parties maliciously attack the protocol.

10 Secure computation tasks Examples: authentication protocols; online payments; auctions; elections; privacy preserving data mining; essentially any task…

11 The real model [Figure: inputs x and y, protocol interaction, protocol output]

12 The ideal model [Figure: inputs x and y go to a trusted party, which returns f_1(x,y) and f_2(x,y)]

13 The security definition [Figure: IDEAL (trusted party) vs. REAL (protocol interaction)] For every real adversary A there exists an adversary S

14 Why this approach? General – it captures all applications. The specifics of an application are defined by its functionality; security is defined as above. The security guarantees achieved are easily understood (because the ideal model is easily understood). We can be confident that we did not “miss” any security requirements.

15 The ideal model – More details The definition we gave suffices in the case of an honest majority. When there is no honest majority: guaranteed output delivery cannot be achieved; fairness cannot be achieved. Changes to ideal model: corrupted parties receive output first; adversary decides if honest parties receive their outputs as well; this is called security with abort.
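The modified ideal model can be illustrated with a small sketch. This is only an illustration under stated assumptions: two parties, party 1 corrupted, and `None` standing in for the abort symbol. The names `ideal_with_abort`, `adversary_continues` and `richer` are hypothetical and do not come from the slides.

```python
def ideal_with_abort(f1, f2, x, y, adversary_continues):
    """Trusted party for two parties, assuming party 1 is corrupted.

    Returns (output of corrupted party, output of honest party),
    where None models an abort for the honest party.
    """
    out1, out2 = f1(x, y), f2(x, y)
    # The corrupted party receives its output first.
    corrupted_output = out1
    # The adversary then decides whether the honest party gets its output.
    honest_output = out2 if adversary_continues(corrupted_output) else None
    return corrupted_output, honest_output


# Example functionality: the millionaires' comparison, same output for both.
def richer(x, y):
    return x > y


fair_run = ideal_with_abort(richer, richer, 5, 3, lambda out: True)
aborted_run = ideal_with_abort(richer, richer, 5, 3, lambda out: False)
```

In the aborted run the corrupted party still learns the result while the honest party learns nothing, which is exactly the lack of fairness that this weaker ideal model makes explicit.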

16 Defects in the ideal model When there is no honest majority, fairness and guaranteed output delivery cannot be achieved; this “defect” is built into the ideal model. This approach can also be used to model partial information leakage: the parties wish to compute a function f, but more information is leaked by the protocol; this can be modeled by having the trusted party explicitly leak this information; this helps with efficiency considerations. Advantage: explicit defect!

17 Privacy preserving data mining Setting: data is distributed at different sites; these sites may be third parties (e.g., hospitals, government bodies) or individuals. Aim: compute the data mining algorithm on the data so that nothing but the output is learned; that is, carry out a secure computation.

18 Privacy ≠ Security Secure computation only deals with the process of computing the function; it does not ask whether or not the function should be computed.

19 Privacy and secure computation Secure computation can be used to solve any distributed data-mining problem. A two-stage process: (1) decide that the function/algorithm should be computed – an issue of privacy; (2) apply secure computation techniques to compute it securely – security. But not every privacy problem can be cast as a distributed computation.

20 Privacy preserving SVM classification (Vaidya et al. KIS 2007)

21 SVM introduction Data Objective Problem: How to pass the kernel matrix among different parties?

22 Case 1: On vertically partitioned data Kernels: linear kernel; polynomial kernel; RBF kernel [Figure: data sets D_1 and D_2] Example: a bank, a health insurance company and an auto insurance company collect different information about the same people
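For the linear kernel, vertical partitioning has a convenient property: the dot product of two full records equals the sum of the dot products over each party's own attributes, so the global Gram matrix is the sum of the local Gram matrices. A minimal sketch of that identity follows, with made-up toy data; the helper names `gram` and `add_matrices` are hypothetical and not taken from the slides.

```python
def gram(rows):
    """Linear-kernel Gram matrix: K[i][j] = <rows[i], rows[j]>."""
    return [[sum(a * b for a, b in zip(r, s)) for s in rows] for r in rows]


def add_matrices(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


# Hypothetical data: the same three people, attributes split across two parties.
d1 = [[1, 2], [0, 1], [3, 1]]  # attributes held by party 1
d2 = [[5], [2], [4]]           # attributes held by party 2
full = [r1 + r2 for r1, r2 in zip(d1, d2)]  # the joined table nobody should see

k_local_sum = add_matrices(gram(d1), gram(d2))
k_full = gram(full)
```

Here `k_local_sum` and `k_full` coincide, so the parties only need a way to add their local matrices without revealing them.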

23 Secure merge Assumption: K parties, P_i holds v_i. Procedure: P_0 chooses a random number R from a uniform distribution over F; P_0 sends (R + v_0) mod |F| to P_1; P_i receives V = (R + v_0 + … + v_{i-1}) mod |F|; P_i sends (V + v_i) mod |F| to P_{i+1}.
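The procedure on this slide reads like a ring-based secure sum, and the sketch below simulates it in a single process. Two things are assumptions rather than content of the transcript: the running total each party forwards is the value it received plus its own v_i, and the last party returns the total to P_0, who subtracts R. The function name `secure_sum` is hypothetical.

```python
import secrets


def secure_sum(values, field_size):
    """Simulate a ring-based secure sum of values modulo field_size.

    Returns (total, messages), where messages[i] is what P_i sends on.
    """
    r = secrets.randbelow(field_size)        # P_0 draws the random mask R
    running = (r + values[0]) % field_size   # P_0 sends (R + v_0) mod |F| to P_1
    messages = [running]
    for v in values[1:]:                     # each P_i adds v_i and forwards
        running = (running + v) % field_size
        messages.append(running)
    # Assumed final step: the last message returns to P_0, who removes R.
    total = (running - r) % field_size
    return total, messages


total, messages = secure_sum([3, 10, 7], 101)
```

Each intermediate message is uniformly distributed because of the mask R, so a single party learns nothing from what it receives. If the two neighbours of P_i collude, however, they can subtract their messages and recover v_i.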

24 Case 2: On horizontally partitioned data Different banks collect data for their customers [Figure: data sets D_1 and D_2]

25 Case 3: On arbitrarily partitioned data

26 How to do it? Homomorphic encryption: a ciphertext operation ⊗ satisfies E(a) ⊗ E(b) = E(a * b), where * is either addition or multiplication (in some abelian group). Existing cryptosystems: Goldwasser–Micali (Blum M, Goldwasser S 1984); Benaloh cryptosystem (Benaloh JC 1986); Naccache–Stern cryptosystem (Naccache D, Stern J 1998); Paillier cryptosystem (Paillier P 1999); Okamoto–Uchiyama cryptosystem (Okamoto T, Uchiyama S 1998)
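To make the homomorphic property concrete, here is a toy sketch of the Paillier cryptosystem, one of the schemes listed on this slide. Multiplying two ciphertexts yields an encryption of the sum of the plaintexts. The primes 17 and 19 are far too small to be secure and are chosen only so the numbers stay readable; the function names `keygen`, `encrypt` and `decrypt` are hypothetical, and nothing here claims to be the construction used by Vaidya et al.

```python
import math
import secrets


def keygen(p, q):
    """Toy Paillier key generation with generator g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; valid because g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt m (0 <= m < n) as (n+1)^m * r^n mod n^2 with random r."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n, secret_key, c):
    """Recover m via L(c^lam mod n^2) * mu mod n, with L(x) = (x - 1) // n."""
    lam, mu = secret_key
    x = pow(c, lam, n * n)
    return ((x - 1) // n) * mu % n


n, sk = keygen(17, 19)          # insecure toy primes, for illustration only
c1, c2 = encrypt(n, 20), encrypt(n, 22)
c_sum = (c1 * c2) % (n * n)     # multiply ciphertexts to add plaintexts
c_scaled = pow(c1, 3, n * n)    # exponentiate to multiply by a known constant
```

Decrypting `c_sum` gives 20 + 22 and decrypting `c_scaled` gives 3 × 20, both modulo n. The call `pow(lam, -1, n)` requires Python 3.8 or later.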

27 Algorithm

28 Other security issues Collusion with the third party Q

29 Related to 3D-LBS Issues: some data may need to be encrypted before research testing; data miners may need to work on encrypted data. Problems: lack of knowledge of security and cryptography; limited understanding of privacy.

30 Future challenges Understand the real problem and what the real requirement is: a very non-trivial task and one that requires interdisciplinary cooperation; computer scientists should help to formalize the notion, while lawyers, policy-makers and social scientists should be involved in understanding the concerns. Some challenges: reconciling cultural and legal differences relating to privacy in different countries; understanding when privacy is “allowed” to be breached.

31 Future challenges Appropriate modeling for secure computation Efficient protocols…

32 Conclusion Privacy-preserving data mining is truly needed: data mining is being used by security agencies, governmental bodies and corporations; privacy advocates and citizen outcry often prevent positive use of data mining; good solutions (which are still currently out of reach) may be able to resolve the conflict.