Foundations of Privacy, Informal Lecture: Impossibility of Disclosure Prevention, or The Case for Differential Privacy
Lecturer: Moni Naor. Joint work with Cynthia Dwork.

Let's Talk About Sex: Better Privacy Means Better Data

Private Data Analysis
Simple counts and correlations:
– Was there a significant rise in asthma emergency-room cases this month?
– What is the correlation between new HIV infections and crystal meth usage?
Holistic statistics:
– Are the data inherently low-dimensional? (Collaborative filtering for movie recommendations.)
Beyond statistics:
– How far is the (proprietary) network from bipartite?
…all while preserving the privacy of individuals.

Different from SFE
Secure Function Evaluation: participants collaboratively compute a function f of their private inputs, e.g., the sum of a, b, c, …
– Each player learns only what can be deduced from the output and her own input to f.
A miracle of modern science! But SFE does not imply privacy:
– Privacy is only ensured "modulo f": if the output together with a yields b, so be it.

Cryptographic Rigor Applied to Privacy
Define a break of the system:
– What is a "win" for the adversary?
– May settle for partial information.
Specify the power of the adversary:
– Computational power? "Auxiliary" information?
Conservative/paranoid by nature:
– All breaks are forever.
– Protect against all feasible attacks.

Dalenius, 1977
Anything that can be learned about a respondent from the statistical database can be learned without access to the database.
– Captures the possibility that "I" may be an extrovert.
– The database doesn't leak personal information.
– The adversary is a user.
Analogous to semantic security for cryptography [Goldwasser-Micali 1982]:
– Anything that can be learned from the ciphertext can be learned without the ciphertext.
– The adversary is an eavesdropper.

Outline
The framework.
A general impossibility result:
– Dalenius' goal cannot be achieved.
The proof:
– Simplified case.
– General case.

Two Models
Non-interactive: the data are sanitized by the mechanism San and released as a sanitized database.

Two Models
Interactive: San answers multiple queries against the database, adaptively chosen.

Auxiliary Information
A common theme in many privacy horror stories: not taking side information into account.
– Netflix challenge: not taking IMDb into account [Narayanan-Shmatikov].

Not Learning from the DB
[Diagram: with access to the database, adversary A sees San(DB) plus auxiliary information; without access, simulator A' sees only the auxiliary information.]
There is some utility of the DB that a legitimate user should be able to learn, and there is a possible breach of privacy. Goal: users learn the utility without the breach.

Not Learning from the DB
Want: anything that can be learned about an individual from the statistical database can be learned without access to the database.
∀ distributions D, ∀ adversaries A, ∃ a simulator A' such that, with high probability over DB ∈_R D and for all auxiliary information z:
|Pr[A(z) ↔ DB wins] − Pr[A'(z) wins]| is small (↔ denotes interaction with the mechanism).

Illustrative Example for Impossibility
Want: anything that can be learned about a respondent from the statistical database can be learned without access to the database. More formally:
∀ D ∀ A ∃ A' such that, with high probability over DB ∈_R D and for all auxiliary information z, |Pr[A(z) ↔ DB wins] − Pr[A'(z) wins]| is small.
Example: the auxiliary information is z = "Kobi Oz is 10 cm shorter than the average height in DB".
– A learns the average height in DB, and hence also Kobi's height.
– A' does not.
Impossibility requires utility:
– The mechanism must convey information about DB that is not predictable by someone without access.
– The "hint generator" and A share a secret, unknown to A'.

Defining "Win": The Compromise Function
[Diagram: the adversary produces a candidate breach y; the compromise decider C, given DB and y, outputs 0/1.]
A privacy compromise should be non-trivial: it should not be possible to find a privacy breach from the auxiliary information alone.
A privacy breach should exist: given DB there should be a y that is a privacy breach, and it should be possible to find such a y.

Additional Basic Concepts
Distribution D on (finite) databases:
– Something about the database must be unknown.
– Captures knowledge about the domain, e.g., rows of the database correspond to owners of 2 pets.
Privacy mechanism San(D, DB):
– Can be interactive or non-interactive.
– May have access to the distribution D.
Auxiliary information generator AuxGen(D, DB):
– Has access to the distribution and to DB.
– Formalizes partial knowledge about DB.
Utility vector w:
– Answers to k questions about the DB.
– (Most of) the utility vector can be learned by the user.
– Utility: must inherit sufficient min-entropy from the source D.

Impossibility Theorem
Fix any useful* privacy mechanism San and any reasonable privacy compromise decider C. Then there is an auxiliary information generator AuxGen and an adversary A such that for "all" distributions D and all adversary simulators A':
Pr[A(D, San(D, DB), AuxGen(D, DB)) wins] − Pr[A'(D, AuxGen(D, DB)) wins] ≥ Δ
for a suitably large Δ. The probability spaces are over the choice of DB ∈_R D and the coin flips of San, AuxGen, A, and A'.
* Useful: tells us information we did not know. To specify this completely we need an assumption on the entropy of the utility vector.

Strategy
The auxiliary information generator will provide a hint that, together with the utility vector w, yields the privacy breach.
Want AuxGen to work without knowing D, just DB:
– Find a privacy breach y and encode it in z.
– Make sure z alone does not give y; only z together with w does.
Complication: is the utility vector w completely learned by the user, or only approximately?

Entropy of Random Sources
Source: a probability distribution X on {0,1}^n that contains some "randomness".
Measures of "randomness":
– Shannon entropy: H(X) = − Σ_x Pr[X = x] log Pr[X = x]. Represents how much we can compress X on average. But even a high-entropy source may have a point with probability 0.9.
– Min-entropy: H_min(X) = − log max_x Pr[X = x]. Determined by the most likely value of X.

Min-entropy
Definition: X is a k-source if H_min(X) ≥ k, i.e., Pr[X = x] ≤ 2^{−k} for all x.
Examples:
– Bit-fixing source: some k coordinates of X are uniform; the rest are fixed, or even depend arbitrarily on the others.
– Unpredictable source: ∀ i ∈ [n] and b_1, …, b_{i−1} ∈ {0,1}: k/n ≤ Pr[X_i = 1 | X_1, …, X_{i−1} = b_1, …, b_{i−1}] ≤ 1 − k/n.
– Flat k-source: uniform over a set S ⊆ {0,1}^n with |S| = 2^k.
Fact: every k-source is a convex combination of flat k-sources.
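To make the contrast concrete, here is a minimal sketch in Python (my illustration, not part of the slides) comparing Shannon entropy with min-entropy on exactly the kind of source mentioned above, where a single string carries probability 0.9.

```python
# Sketch (assumption: a toy source on {0,1}^10 with one string of probability 0.9,
# the remaining mass spread uniformly over the other 2^10 - 1 strings).
import math

def shannon_entropy(probs: list[float]) -> float:
    return -sum(p * math.log2(p) for p in probs if p > 0)

def min_entropy(probs: list[float]) -> float:
    return -math.log2(max(probs))

n = 10
probs = [0.9] + [0.1 / (2**n - 1)] * (2**n - 1)
print(shannon_entropy(probs))  # ~1.47 bits: the source looks "somewhat random"
print(min_entropy(probs))      # ~0.15 bits: almost no extractable randomness
```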

Extractors
A universal procedure for "purifying" an imperfect source.
Definition: a function Ext : {0,1}^n × {0,1}^d → {0,1}^m is a (k, ε)-extractor if for all k-sources X, Ext(X, U_d) is ε-close to U_m.
[Diagram: Ext takes an n-bit sample x from a k-source (2^k strings) and a d-bit random seed s, and outputs m almost-uniform bits.]

Strong Extractors
The output looks random even after seeing the seed.
Definition: Ext is a (k, ε) strong extractor if Ext'(x, s) = s ∘ Ext(x, s) is a (k, ε)-extractor, i.e., for all k-sources X and for a 1 − ε' fraction of seeds s ∈ {0,1}^d, Ext(X, s) is ε-close to U_m.

Extractors from Hash Functions
Leftover Hash Lemma [ILL89]: universal (pairwise independent) hash functions yield strong extractors.
– Output length: m = k − O(1).
– Seed length: d = O(n).
Example: Ext(x, (a, b)) = the first m bits of a·x + b in GF(2^n).
Almost pairwise independence [SZ94, GW94]:
– Seed length: d = O(log n + k).
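A minimal sketch in Python (my own illustration, not from the lecture) of the hash-based extractor Ext(x, (a, b)) = first m bits of a·x + b over GF(2^n). For concreteness it fixes n = 8 and reduces modulo the irreducible AES polynomial x^8 + x^4 + x^3 + x + 1; any irreducible polynomial of degree n would serve.

```python
# Sketch of Ext(x, (a, b)) = top m bits of a*x + b in GF(2^8) (my assumptions:
# n = 8, reduction by the AES polynomial).
import secrets

N = 8
IRRED = 0x11B  # x^8 + x^4 + x^3 + x + 1

def gf_mult(a: int, b: int) -> int:
    """Carry-less multiplication of a and b, reduced modulo IRRED, i.e. in GF(2^8)."""
    result = 0
    for i in range(N):
        if (b >> i) & 1:
            result ^= a << i
    for i in range(2 * N - 2, N - 1, -1):  # reduce the degree-(2N-2) product
        if (result >> i) & 1:
            result ^= IRRED << (i - N)
    return result

def ext(x: int, seed: tuple[int, int], m: int) -> int:
    """Ext(x, (a, b)) = first (most significant) m bits of a*x + b over GF(2^n)."""
    a, b = seed
    h = gf_mult(a, x) ^ b          # addition in GF(2^n) is XOR
    return h >> (N - m)

# Usage: extract 4 almost-uniform bits from an imperfect 8-bit source value.
seed = (secrets.randbelow(2**N), secrets.randbelow(2**N))
print(ext(0b10110011, seed, m=4))
```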

Suppose w Is Learned Completely
AuxGen and A will share a secret: w.
AuxGen(DB):
– Find a privacy breach y of DB.
– Find w from DB (by simulating A).
– Choose s ∈_R {0,1}^d and compute Ext(w, s).
– Set z = (s, Ext(w, s) ⊕ y).
[Diagram: San(DB) gives w to A; AuxGen(DB) gives z to A; A submits its guess to the compromise decider C, which outputs 0/1.]

Suppose w Is Learned Completely
AuxGen and A share a secret: w, while A' sees only z = (s, Ext(w, s) ⊕ y).
Technical conditions: H_min(W | y) ≥ |y| and |y| is "safe".

Why Is It a Compromise?
Why doesn't A' learn y: for each possible value of y, (s, Ext(w, s)) is ε-close to uniform, hence (s, Ext(w, s) ⊕ y) is ε-close to uniform.
So z reveals essentially nothing to A', while A, who knows w, recomputes Ext(w, s), strips the mask, and recovers the breach y.
Technical conditions: H_min(W | y) ≥ |y| and |y| is "safe".
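A minimal sketch in Python of the hint mechanism (my own code, not the lecture's; a salted SHA-256 hash is used as a stand-in for the strong extractor, and w and y are short byte strings): AuxGen publishes z = (s, Ext(w, s) ⊕ y); A, knowing w, strips the mask and recovers y, while A' sees something statistically close to random.

```python
# Sketch only: a salted hash stands in for Ext; assumes len(y) <= 32 bytes.
import hashlib, secrets

def ext(w: bytes, s: bytes, out_len: int) -> bytes:
    """Stand-in for a strong extractor: a mask derived from w and public seed s."""
    return hashlib.sha256(s + w).digest()[:out_len]

def aux_gen(w: bytes, y: bytes) -> tuple[bytes, bytes]:
    """AuxGen: encode the privacy breach y so it is readable only given w."""
    s = secrets.token_bytes(16)                       # public random seed
    pad = ext(w, s, len(y))
    return s, bytes(a ^ b for a, b in zip(pad, y))    # z = (s, Ext(w, s) XOR y)

def adversary_A(z: tuple[bytes, bytes], w: bytes) -> bytes:
    """A learns w from San(DB), so it can strip the mask and recover y."""
    s, c = z
    pad = ext(w, s, len(c))
    return bytes(a ^ b for a, b in zip(pad, c))

# Usage: w is the utility vector A learns; y is the breach.
w = b"utility-vector-learned-from-San"
y = b"Kobi is 192cm"
z = aux_gen(w, y)
assert adversary_A(z, w) == y   # A wins; A', lacking w, sees z as (near-)random
```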

w Need Not Be Learned Completely
Relaxed utility: something close to w is learned, and AuxGen(D, DB) does not know exactly what A will learn.
We need something close to w to produce the same extracted randomness as w itself; ordinary extractors offer no such guarantee.
Fuzzy extractors (m, ℓ, t, ε) [Dodis, Reyzin and Smith]: a pair (Gen, Rec).
– Gen(w) outputs an extracted string r ∈ {0,1}^ℓ and a public string p. For any distribution W with min-entropy at least m: (R, P) ← Gen(W) implies (R, P) and (U_ℓ, P) are within statistical distance ε.
– Rec reconstructs r given p and any w* sufficiently close to w: if (r, p) ← Gen(w) and ||w − w*||_0 ≤ t, then Rec(w*, p) = r.

Construction Based on ECC
Error-correcting code ECC: any two codewords differ in more than 2t bits, so up to t bit errors can be corrected.
Gen(w): p = w ⊕ ECC(r'), where r' is random; r is extracted from r'.
Given p and w' close to w:
– Compute w' ⊕ p.
– Decode to get ECC(r').
– r is extracted from r'.
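A minimal sketch in Python of this code-offset construction (my own illustration; it assumes ECC is a 5x repetition code and, for simplicity, takes r to be r' itself rather than extracting from it).

```python
# Sketch of Gen(w) = (r', p = w XOR ECC(r')) and Rec(w*, p) = decode(w* XOR p),
# with a 5x repetition code (corrects up to 2 bit flips per block).
import secrets

REP = 5  # each message bit is repeated REP times

def ecc_encode(bits: list[int]) -> list[int]:
    return [b for b in bits for _ in range(REP)]

def ecc_decode(bits: list[int]) -> list[int]:
    # majority vote within each block of REP bits
    return [int(sum(bits[i:i + REP]) > REP // 2) for i in range(0, len(bits), REP)]

def gen(w: list[int]) -> tuple[list[int], list[int]]:
    r_prime = [secrets.randbelow(2) for _ in range(len(w) // REP)]
    p = [wi ^ ci for wi, ci in zip(w, ecc_encode(r_prime))]  # public helper string
    return r_prime, p          # in the full construction, r is extracted from r'

def rec(p: list[int], w_star: list[int]) -> list[int]:
    noisy_codeword = [wi ^ pi for wi, pi in zip(w_star, p)]
    return ecc_decode(noisy_codeword)

# Usage: w* differs from w in a few positions, yet Rec recovers the same r'.
w = [secrets.randbelow(2) for _ in range(20)]
r, p = gen(w)
w_star = w[:]
w_star[3] ^= 1
w_star[11] ^= 1    # flip two bits of the utility vector
assert rec(p, w_star) == r
```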

w Need Not Be Learned Completely
Recall the fuzzy extractor (Gen, Rec): Gen(w) outputs r ∈ {0,1}^ℓ and a public string p such that r is ε-close to uniform even given p, and Rec(w*, p) = r for any w* within distance t of w.
Idea: (r, p) ← Gen(w); set z = (p, r ⊕ y). A reconstructs r from a w* close to w, while r looks almost uniform to A' even given p.
Problem: p leaks information about w and might itself disclose a privacy breach y'.
Solution: AuxGen interacts with San(DB) to learn a "safe" w', and sets (r, p) ← Gen(w'), z = (p, r ⊕ y).
– w'' (learned by A) and w' are both sufficiently close to w, so w' and w'' are close to each other, so A(w'', p) can reconstruct r.
– By assumption, w' alone should not yield a breach; let γ be the bound on the probability of such a breach.

w Need Not Be Learned Completely
AuxGen and A share a secret: r.
[Diagram: AuxGen interacts with San(DB) to learn w', computes (p, r) = Gen(w') and z = (p, r ⊕ y); A interacts with San(DB) to learn w'' and computes r = Rec(p, w''); A' sees only z. r is almost uniform given p, and p should not be disclosive.]
Pr[A'(z) wins] ≤ Pr[A ↔ San(D, DB) wins without the hint z] + ε ≤ γ + ε.
Extra min-entropy is needed: H_min(W | y) ≥ ℓ + |p|.

Two Remarkable Aspects
It works even if Kobi Oz is not in the database!
– Motivates a definition based on the increased risk incurred by joining the database: risk to Kobi if in the database vs. risk to Kobi if not in the DB.
– Cf.: what can be learned about Kobi with vs. without DB access.
Dalenius' goal is impossible, yet semantic security is possible, even though the definitions are similar.
– Resolved by utility: here the adversary is a user.
– Without auxiliary information: the user must learn something from the mechanism, while the simulator learns nothing; an eavesdropper should learn nothing, and the simulator likewise learns nothing.
– What about SFE? There the comparison is to an ideal party.
This leads to differential privacy.

Possible Answer: Differential Privacy
Is there a noticeable relative shift between K(DB − Me) and K(DB + Me)? If not, then no perceptible risk is incurred by joining the DB: anything the adversary can do, it could do without Me.
[Figure: Pr[response] over possible responses for the two mechanisms K(DB − Me) and K(DB + Me); the probability of landing in the set of "bad responses" changes only slightly.]
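To make "no noticeable relative shift" concrete, here is a minimal sketch in Python (my own illustration, not part of this lecture) of the standard Laplace mechanism for a counting query: adding or removing one person changes the true count by at most 1, so the response distributions of K(DB − Me) and K(DB + Me) agree up to a multiplicative e^eps factor.

```python
# Sketch only: a counting query released with Laplace noise of scale 1/eps.
import random

def laplace_count(db: list[int], eps: float) -> float:
    """Release sum(db) plus Laplace(1/eps) noise (a count has sensitivity 1)."""
    # Laplace(scale = 1/eps) sampled as the difference of two exponentials
    noise = random.expovariate(eps) - random.expovariate(eps)
    return sum(db) + noise

db_without_me = [1, 0, 1, 1, 0]          # each entry: one person's bit
db_with_me = db_without_me + [1]         # the same database with "Me" added
print(laplace_count(db_without_me, eps=0.5))
print(laplace_count(db_with_me, eps=0.5))
```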