Foundations of Privacy Lecture 3 Lecturer: Moni Naor

Recap of last weeks' lectures
– Privacy is hard to capture precisely, but we clarified the direction
– Cryptography, secure function evaluation, and privacy
– Examples of attacks on presumed sanitization
Today: the impossibility of disclosure prevention; differential privacy; Netflix.

Cryptography and Privacy
Extremely relevant, but it does not solve the privacy problem.
Secure function evaluation: how to distributively compute a function f(X_1, X_2, …, X_n), where X_j is known only to party j.
– E.g., the sum of the inputs, sum(a, b, c, …)
– Parties should learn only the final output
Many results, depending on
– the number of players
– the means of communication
– the power and model of the adversary
– how the function is represented
Here we are more worried about what to compute than about how to compute it.

Example: Securely Computing Sums
Parties hold inputs X_1, …, X_n with 0 ≤ X_i ≤ P−1 and want to compute Σ_i X_i mod P.
– Party 1 selects r ∈_R [0..P−1] and sends Y_1 = X_1 + r mod P
– Party i receives Y_{i−1} and sends Y_i = Y_{i−1} + X_i mod P
– Party 1 receives Y_n and announces Σ_i X_i = Y_n − r mod P
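A minimal sketch of the ring protocol in Python, assuming all parties follow it honestly; the single loop stands in for the messages Y_i passed around the ring, and the modulus P and inputs are illustrative:

```python
import secrets

def secure_sum(inputs, P):
    """Ring protocol from the slide: party 1 masks its input with a random r,
    each subsequent party adds its own input mod P, and party 1 removes the
    mask at the end, so only the final sum is revealed."""
    # Party 1 picks r uniformly from [0, P-1] and sends Y_1 = X_1 + r (mod P).
    r = secrets.randbelow(P)
    y = (inputs[0] + r) % P
    # Party i receives Y_{i-1} and sends Y_i = Y_{i-1} + X_i (mod P).
    for x in inputs[1:]:
        y = (y + x) % P
    # Party 1 receives Y_n and announces the sum: Y_n - r (mod P).
    return (y - r) % P

# Example: five parties with inputs in [0, P-1].
P = 2**31 - 1
print(secure_sum([3, 1, 4, 1, 5], P))  # 14
```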

Is this Protocol Secure?
To talk rigorously about cryptographic security:
Specify the power of the adversary
– access to the data/system
– computational power? (in this protocol the adversary may be all-powerful)
– "auxiliary" information?
Define a break of the system
– what is a compromise?
– what is a "win" for the adversary?
Here, an adversary controlling two players makes the protocol insecure: parties i−1 and i+1 together see Y_{i−1} and Y_i, and hence learn X_i = Y_i − Y_{i−1} mod P.

The Simulation Paradigm
A protocol is considered secure if, for every adversary (of a certain type), there exists a simulator that outputs an indistinguishable "transcript".
Examples: encryption, zero-knowledge, secure function evaluation.
The power of analogy.

SFE: Simulating the Ideal Model
A protocol is considered secure if, for every adversary, there exists a simulator operating in the "ideal" (trusted-party) model that outputs an indistinguishable transcript. Breaking = distinguishing! In the ideal model the adversary can obtain the value of the function at a given point.
Major result: any function f that can be evaluated using polynomial resources can be securely evaluated using polynomial resources.

Tables for a Gate
b_i, b_j are the true values; c_i, c_j are the permuted values; b_k = G(b_i, b_j).
Wire i carries keys W_i^0, W_i^1; wire j carries W_j^0, W_j^1; wire k carries W_k^0, W_k^1.
If we know (c_i, W_i^{b_i}) and (c_j, W_j^{b_j}), we want to learn (c_k, W_k^{b_k}).
Typical entry: [(c_k, W_k^{G(b_i,b_j)}) + F_{W_i^{b_i}}(c_j, k) + F_{W_j^{b_j}}(c_i, k)]

Translation Table for an OR Gate
The sender constructs a translation table from input values to output values: for each pair of input values, encrypt (c_k, W_k^{G(b_i,b_j)}) with the keys W_i^{b_i} and W_j^{b_j}.
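A toy sketch of such a gate table in Python. The slides leave the PRF F and the combining operation "+" abstract; here F is instantiated (as an assumption) with HMAC-SHA256 and "+" is taken to be XOR, and the gate id, key length, and helper names are all illustrative:

```python
import hmac, hashlib, secrets

GATE_ID = 7          # label k of this gate (illustrative)
KEY_LEN = 16

def F(key, color, gate):
    """PRF F_key(color, gate); HMAC-SHA256 is only an illustrative instantiation."""
    msg = f"{color}|{gate}".encode()
    return hmac.new(key, msg, hashlib.sha256).digest()[:KEY_LEN + 1]

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def fresh_wire():
    """A wire holds two keys W^0, W^1 and a random permutation bit pi (c = b XOR pi)."""
    return {"keys": (secrets.token_bytes(KEY_LEN), secrets.token_bytes(KEY_LEN)),
            "perm": secrets.randbits(1)}

def garble_gate(wi, wj, wk, G):
    """Build the 4-entry translation table for a gate computing b_k = G(b_i, b_j)."""
    table = {}
    for bi in (0, 1):
        for bj in (0, 1):
            ci, cj = bi ^ wi["perm"], bj ^ wj["perm"]      # permuted (public) colours
            bk = G(bi, bj)
            ck = bk ^ wk["perm"]
            plaintext = bytes([ck]) + wk["keys"][bk]        # (c_k, W_k^{G(b_i,b_j)})
            pad = xor(F(wi["keys"][bi], cj, GATE_ID), F(wj["keys"][bj], ci, GATE_ID))
            table[(ci, cj)] = xor(plaintext, pad)           # entry indexed by colours
    return table

def eval_gate(table, ci, key_i, cj, key_j):
    """Holding (c_i, W_i^{b_i}) and (c_j, W_j^{b_j}), recover (c_k, W_k^{b_k})."""
    pad = xor(F(key_i, cj, GATE_ID), F(key_j, ci, GATE_ID))
    out = xor(table[(ci, cj)], pad)
    return out[0], out[1:]

# Garble an OR gate and evaluate it on b_i = 0, b_j = 1.
wi, wj, wk = fresh_wire(), fresh_wire(), fresh_wire()
table = garble_gate(wi, wj, wk, lambda a, b: a | b)
bi, bj = 0, 1
ck, key_k = eval_gate(table, bi ^ wi["perm"], wi["keys"][bi], bj ^ wj["perm"], wj["keys"][bj])
assert key_k == wk["keys"][1] and ck == 1 ^ wk["perm"]
```

The evaluator only ever learns one key per wire and the (random-looking) colour bits, which is what makes the table reveal nothing beyond the output wire's key.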

Secret Sharing
Splitting a secret s among n parties p_1, p_2, …, p_n so that any "legitimate" subset of participants can reconstruct s, while illegitimate subsets learn nothing about s.
The legitimate subsets are defined by a (monotone) access structure, e.g., a threshold scheme.
Efficient method: polynomial interpolation.
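A minimal sketch of the threshold case via polynomial interpolation (Shamir's scheme); the prime modulus and parameter names are illustrative choices:

```python
import secrets

P = 2**61 - 1  # a prime modulus (illustrative choice)

def share(secret, t, n):
    """Shamir (t, n)-threshold sharing: pick a random degree-(t-1) polynomial f
    with f(0) = secret; party p_i receives the share (i, f(i))."""
    coeffs = [secret] + [secrets.randbelow(P) for _ in range(t - 1)]
    def f(x):
        return sum(c * pow(x, j, P) for j, c in enumerate(coeffs)) % P
    return [(i, f(i)) for i in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 from any t shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

shares = share(123456789, t=3, n=5)
print(reconstruct(shares[:3]))   # 123456789, from any 3 of the 5 shares
print(reconstruct(shares[2:]))   # 123456789
```

Any t−1 shares are consistent with every possible secret, which is why illegitimate subsets learn nothing.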

General Access Structures
For any (monotone) access structure there exists a secret sharing scheme.
Issue: the size of the shares can be proportional to the number of subsets in the access structure.
Which access structures have succinct sharing schemes?

The Problem with SFE
SFE does not imply privacy: the problem is with the ideal model itself.
– E.g., the function is sum(a, b)
– Each player learns only what can be deduced from the output and her own input to f
– If the output together with a determines b, so be it
We need ways of talking about leakage even in the ideal model.

Modern Privacy of Data Analysis
Is public analysis of private data a meaningful/achievable goal?
The holy grail: get the utility of statistical analysis while protecting the privacy of every individual participant.
Ideally, "privacy-preserving" sanitization allows reasonably accurate answers to meaningful questions.

Sanitization: Traditional View
A trusted curator can access a database of sensitive information and should publish a privacy-preserving sanitized version.
[Diagram: Data → Curator/Sanitizer → Output]

Traditional View: Interactive Model
Multiple queries, chosen adaptively.
[Diagram: analyst issues query 1, query 2, … to the Sanitizer, which has access to the Data]

Sanitization: Traditional View
How to sanitize? Anonymization?
[Diagram: Data → Curator/Sanitizer → Output]

Auxiliary Information
Information from any source other than the statistical database:
– other databases, including old releases of this one
– newspapers
– general comments from insiders
– government reports, census website
– inside information from a different organization, e.g., Google's view if the attacker/user is a Google employee
Linkage attacks: malicious use of auxiliary information.

AOL Search History Release (2006)
650,000 users, 20 million queries, 3 months.
AOL's goal: provide real query logs from real users.
Privacy? "Identifying information" was replaced with random identifiers, but different searches by the same user were still linked.

AOL Search History Release (2006)
Name: Thelma Arnold; Age: 62; Widow; Residence: Lilburn, GA.

Other Successful Attacks
– Against anonymized HMO records [Sweeney 98], which proposed k-anonymity
– Against k-anonymity [MGK06], which proposed l-diversity
– Against l-diversity [XT07], which proposed m-invariance
– Against all of the above [GKS08]

Why Settle for Ad Hoc Notions of Privacy?
Dalenius, 1977: anything that can be learned about a respondent from the statistical database can be learned without access to the database.
– Captures the possibility that "I" may be an extrovert
– The database doesn't leak personal information
– The adversary is a user
Analogous to semantic security for encryption [Goldwasser-Micali 1982]: anything that can be learned from the ciphertext can be learned without the ciphertext; the adversary is an eavesdropper.

Computational Security of Encryption: Semantic Security
Whatever an adversary A can compute about an encrypted string X ∈ {0,1}^n, an A' that does not see the encryption of X can also compute, simulating A's knowledge with respect to X.
A selects:
– a distribution D_n on {0,1}^n
– a relation R(X, Y) computable in probabilistic polynomial time
For every ppt machine A there is a ppt machine A' such that, for every ppt relation R and X ∈_R D_n,
| Pr[R(X, A(E(X)))] − Pr[R(X, A'(·))] | is negligible.
The outputs of A and A' are indistinguishable even to a tester who knows X.

[Diagram: on input E(X) for X ∈_R D_n, A outputs Y related to X by R; A', without seeing E(X), outputs Y related to X by R with roughly the same probability.]
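The requirement on the previous slide, written compactly; this is a sketch of the standard formulation, and giving A' the input 1^n (i.e., only the length of X) is an assumption the slide leaves implicit:

```latex
\[
\forall\,\mathrm{ppt}\ A\ \ \exists\,\mathrm{ppt}\ A'\ \ \forall\,\mathrm{ppt}\ R:\quad
\Bigl|\Pr_{X \in_R D_n}\!\bigl[R\bigl(X,\,A(E(X))\bigr)\bigr]
\;-\;\Pr_{X \in_R D_n}\!\bigl[R\bigl(X,\,A'(1^n)\bigr)\bigr]\Bigr|
\;\le\;\mathrm{negl}(n).
\]
```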

Making it Slightly Less Vague: Cryptographic Rigor Applied to Privacy
Define a break of the system
– what is a compromise?
– what is a "win" for the adversary?
Specify the power of the adversary
– access to the data
– computational power?
– "auxiliary" information?
Be conservative/paranoid by nature: protect against all feasible attacks.

In full generality, Dalenius' goal is impossible:
– the database teaches that smoking causes cancer
– I smoke in public
– access to the DB teaches that I am at increased risk for cancer
But what about cases where there is significant knowledge about the database distribution?

Outline
– The framework
– A general impossibility result: Dalenius' goal cannot be achieved in a very general sense
– The proof: simplified, then the general case

Two Models
Non-interactive: data are sanitized and released.
[Diagram: Database → San → Sanitized Database]

Two Models
Interactive: multiple queries, adaptively chosen.
[Diagram: analyst queries San, which has access to the Database]

Auxiliary Information
A common theme in many privacy horror stories: not taking side information into account.
– Netflix challenge: not taking IMDb into account [Narayanan-Shmatikov]. Here the database is the Netflix ratings data, SAN(DB) = "remove names", and IMDb serves as the auxiliary information.

Not Learning from the DB
There is some utility of the DB that legitimate users should learn, and a possible breach of privacy.
Goal: users learn the utility without the breach.
[Diagram: with access to the database, adversary A interacts with San(DB) and holds auxiliary information; without access, A' holds only the auxiliary information.]

Not Learning from the DB
Want: anything that can be learned about an individual from the database can be learned without access to the database.
∀ D ∀ A ∃ A' such that, with high probability over DB ∈_R D, for all auxiliary information z:
| Pr[A(z) ↔ DB wins] − Pr[A'(z) wins] | is small.
(Here A(z) ↔ DB denotes A interacting with San(DB); "wins" means producing a privacy breach.)
[Diagram: with access to the database, A interacts with San(DB) and holds z; without access, A' holds only z.]

Illustrative Example for the Difficulty
Want: anything that can be learned about a respondent from the database can be learned without access to the database. More formally: ∀ D ∀ A ∃ A' such that, with high probability over DB ∈_R D, for all auxiliary information z,
| Pr[A(z) ↔ DB wins] − Pr[A'(z) wins] | is small.
Example: suppose the height of an individual is sensitive information, and the average height in DB is not known a priori.
– Auxiliary information z = "Adam is 5 cm shorter than the average height in DB"
– A learns the average height in DB and hence also Adam's height; A' does not.
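A toy numeric illustration of the height example; the database values and the exact-average query are made up for illustration (a real mechanism would not answer the average exactly, but any reasonably accurate answer gives roughly the same breach):

```python
# Database of heights (cm); sensitive, and the average is not known a priori.
db = [172, 168, 181, 175, 169, 190, 166]

# Auxiliary information z: "Adam is 5 cm shorter than the average height in DB".
shorter_by = 5

# Adversary A: may query the sanitizer, here for a (perfectly accurate) average.
avg = sum(db) / len(db)
adams_height = avg - shorter_by      # A combines z with the answer and learns Adam's height.

# Simulator A': holds z but has no access to DB, so the average -- and hence
# Adam's height -- remains unknown to it.
print(f"A learns Adam's height: {adams_height:.1f} cm")
```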

Defining "Win": The Compromise Function
Notion of a privacy compromise: the adversary outputs some y, and a decider is given DB and y and outputs 0/1 (is y a privacy breach?).
– A privacy compromise should be non-trivial: it should not be possible to find a privacy breach from the auxiliary information alone (otherwise it is hardly a convincing breach)
– A privacy breach should exist: given DB there should be a y that is a privacy breach, and it should be possible to find such a y efficiently
– Fix a parameter bounding the probability of a breach.

Basic Concepts
Distribution D on (finite) databases
– something about the database must be unknown
– captures knowledge about the domain, e.g., rows of the database correspond to owners of two pets
Privacy mechanism San(D, DB)
– can be interactive or non-interactive
– may have access to the distribution D
Auxiliary information generator AuxGen(D, DB)
– has access to the distribution and to DB
– formalizes partial knowledge about DB
Utility vector w
– answers to k questions about the DB
– (most of) the utility vector can be learned by the user
– utility: must inherit sufficient min-entropy from the source D

Impossibility Theorem: Informal
For any* distribution D on databases DB and any* reasonable privacy compromise decider C, fix any useful* privacy mechanism San. Then there is an auxiliary information generator AuxGen and an adversary A such that for all adversary simulators A':
A(z) interacting with San(DB) wins (finds a compromise), but A'(z) does not win, where z = AuxGen(DB) tells us information we did not already know.

Impossibility Theorem
Fix any useful* privacy mechanism San and any reasonable privacy compromise decider C. Then there is an auxiliary information generator AuxGen and an adversary A such that for "all" distributions D and all adversary simulators A':
Pr[A(D, San(D, DB), AuxGen(D, DB)) wins] − Pr[A'(D, AuxGen(D, DB)) wins] ≥ Δ
for a suitable, large Δ. The probability spaces are over the choice of DB ∈_R D and the coin flips of San, AuxGen, A, and A'.
To specify the theorem completely, we need an assumption on the entropy of the utility vector w and on how well San(w) behaves.
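The theorem's conclusion written as a display formula (Δ names the gap; the quantifiers and probability spaces are as stated above):

```latex
\[
\Pr\bigl[\,A\bigl(D,\ \mathrm{San}(D,DB),\ \mathrm{AuxGen}(D,DB)\bigr)\ \text{wins}\,\bigr]
\;-\;
\Pr\bigl[\,A'\bigl(D,\ \mathrm{AuxGen}(D,DB)\bigr)\ \text{wins}\,\bigr]
\;\ge\;\Delta .
\]
```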