Computational Complexity & Differential Privacy Salil Vadhan Harvard University Joint works with Cynthia Dwork, Kunal Talwar, Andrew McGregor, Ilya Mironov,

Presentation transcript:

Computational Complexity & Differential Privacy
Salil Vadhan, Harvard University
Joint works with Cynthia Dwork, Kunal Talwar, Andrew McGregor, Ilya Mironov, Moni Naor, Omkant Pandey, Toni Pitassi, Omer Reingold, Guy Rothblum, Jon Ullman

Computational Complexity
When do computational resource constraints change what is possible? Examples:
– Computational Learning Theory [Valiant `84]: small VC dimension $\not\Rightarrow$ learnable with efficient algorithms (bad news)
– Cryptography [Diffie & Hellman `76]: don’t need long shared secrets against a computationally bounded adversary (good news)

Today: Computational Complexity in Differential Privacy
I. Computationally bounded curator
– Makes differential privacy harder
– Differentially private & accurate synthetic data infeasible to construct
– Open: release other types of summaries/models?
II. Computationally bounded adversary
– Makes differential privacy easier
– Provable gain in accuracy for 2-party protocols (e.g. for estimating Hamming distance)

PART I: COMPUTATIONALLY BOUNDED CURATORS

Cynthia’s Dream: Noninteractive Data Release
[Figure: original database $D$ → curator $C$ → sanitization $C(D)$]

Noninteractive Data Release: Desiderata
– $(\varepsilon,\delta)$-differential privacy: for every $D_1, D_2$ that differ in one row and every set $T$, $\Pr[C(D_1)\in T] \le \exp(\varepsilon)\cdot\Pr[C(D_2)\in T] + \delta$, with $\delta$ negligible
– Utility: $C(D)$ allows answering many questions about $D$
– Computational efficiency: $C$ is polynomial-time computable.
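As a concrete instance of the privacy condition, here is a minimal sketch (not from the slides) that checks the inequality by brute force for randomized response applied independently to each row of a tiny bit database. The choice of mechanism and all function names are illustrative assumptions; the check covers the case $\delta = 0$, where bounding the ratio for every single output implies the bound for every set $T$.

```python
import math
from itertools import product

def rr_output_prob(bit, out, eps):
    """Pr[randomized response reports `out` when the true bit is `bit`]."""
    keep = math.exp(eps) / (1.0 + math.exp(eps))  # probability of reporting the true bit
    return keep if out == bit else 1.0 - keep

def mechanism_prob(db, outs, eps):
    """Pr[C(db) = outs] when every row is randomized independently."""
    p = 1.0
    for bit, out in zip(db, outs):
        p *= rr_output_prob(bit, out, eps)
    return p

def worst_ratio(d1, d2, eps):
    """Maximum over all outputs t of Pr[C(d1) = t] / Pr[C(d2) = t]."""
    n = len(d1)
    return max(mechanism_prob(d1, t, eps) / mechanism_prob(d2, t, eps)
               for t in product((0, 1), repeat=n))

eps = 0.5
d1 = (0, 1, 1, 0)
d2 = (0, 1, 0, 0)  # differs from d1 in exactly one row
ratio = worst_ratio(d1, d2, eps)
print(ratio, math.exp(eps))  # the worst-case ratio matches exp(eps)
```

The brute-force enumeration is exponential in the number of rows, so it is only usable as a sanity check on toy inputs.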

Utility: Counting Queries
– $D = (x_1,\ldots,x_n) \in X^n$
– $P = \{\pi : X \to \{0,1\}\}$
– For any $\pi \in P$, want to estimate (from $C(D)$) the counting query $\pi(D) := \big(\sum_i \pi(x_i)\big)/n$ within accuracy error $\pm\alpha$
– Example: $X = \{0,1\}^d$, $P$ = {conjunctions on $k$ variables}; counting query = $k$-way marginal, e.g. "What fraction of people in $D$ smoke and have cancer?"
[Figure: example database with columns ">35", "Smoker?", "Cancer?"]
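A minimal sketch of a counting query, assuming rows are tuples of bits and a conjunction is given by the list of attribute positions that must all equal 1. The toy database and the helper names are hypothetical.

```python
def conjunction(indices):
    """Predicate that is 1 iff every listed attribute of the row equals 1."""
    return lambda row: int(all(row[i] == 1 for i in indices))

def counting_query(pred, db):
    """Fraction of rows satisfying the predicate: (sum_i pred(x_i)) / n."""
    return sum(pred(row) for row in db) / len(db)

# Hypothetical toy database; columns are (>35, smoker, cancer).
D = [
    (1, 1, 1),
    (0, 1, 0),
    (1, 0, 0),
    (1, 1, 0),
]

smoke_and_cancer = conjunction([1, 2])  # a 2-way marginal
frac = counting_query(smoke_and_cancer, D)
print(frac)  # fraction of people in D who smoke and have cancer
```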

Form of Output
Ideal: $C(D)$ is a synthetic dataset
– $\forall \pi \in P$: $|\pi(C(D)) - \pi(D)| \le \alpha$
– Values consistent
– Use existing software
Alternatives?
– Explicit list of $|P|$ answers (e.g. contingency table)
– Median of several synthetic datasets [RR10]
– Program $M$ s.t. $\forall \pi \in P$: $|M(\pi) - \pi(D)| \le \alpha$
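The accuracy requirement on a synthetic dataset can be checked directly on small examples. The sketch below assumes $P$ is the set of $k$-way marginals built from positive literals only (a simplification; conjunctions with negated attributes are not covered), and the function names are illustrative. The check reads the real data, so it is a diagnostic and is not itself differentially private.

```python
from itertools import combinations

def marginal(db, idxs):
    """Fraction of rows in which every attribute in idxs equals 1."""
    return sum(all(row[i] == 1 for i in idxs) for row in db) / len(db)

def max_marginal_error(db, synth, k):
    """Largest gap between db and synth over all k-way (positive) marginals."""
    d = len(db[0])
    return max(abs(marginal(synth, idxs) - marginal(db, idxs))
               for idxs in combinations(range(d), k))

# Hypothetical original database and candidate synthetic database.
D = [(1, 1, 1), (0, 1, 0), (1, 0, 0), (1, 1, 0)]
S = [(1, 1, 0), (1, 1, 1)]
err = max_marginal_error(D, S, 2)
print(err)  # S is alpha-accurate for 2-way marginals only if err <= alpha
```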

Positive Results
Setting: $D = (x_1,\ldots,x_n) \in (\{0,1\}^d)^n$, $P = \{\pi : \{0,1\}^d \to \{0,1\}\}$, $\pi(D) := (1/n)\sum_i \pi(x_i)$; $\alpha$ = accuracy error, $\varepsilon$ = privacy.

reference            minimum database size                  synthetic   computational complexity
                     general P          k-way marginals                 general P           k-way marginals
[DN03,DN04,BDMN05]   O(|P|^{1/2}/…)     O(d^{k/2}/…)        N           poly(n,|P|)         poly(n,d^k)
[BDCKMT07]           -                  Õ((2d)^k/…)         Y           -                   poly(n,2^d)
[BLR08]              O(d·log|P|/…)      Õ(dk/…)             Y           qpoly(n,|P|,2^d)    qpoly(n,2^d)
[DNRRV09,DRV10]      O(d·log^2|P|/…)    Õ(dk^2/…)           Y           poly(n,|P|,2^d)     -

Summary: Can construct synthetic databases accurate on huge families of counting queries, but complexity may be exponential in dimensions of data and query set P.
Question: is this inherent?
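For intuition about the non-synthetic row of the table, here is a minimal sketch of releasing an explicit list of noisy answers with the Laplace mechanism. It is an assumption-laden simplification: it splits the privacy budget evenly across queries (basic composition), which gives noise of scale $|P|/(\varepsilon n)$ per answer rather than the $|P|^{1/2}$ dependence listed in the table, and the function names are hypothetical.

```python
import random

def laplace(scale, rng):
    """Sample Laplace(0, scale) as a difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_answers(db, queries, eps, rng):
    """Answer every counting query with Laplace noise, splitting eps evenly."""
    n = len(db)
    # A counting query changes by at most 1/n when one row changes; giving each
    # of the |P| queries budget eps/|P| yields noise of scale |P| / (eps * n).
    scale = len(queries) / (eps * n)
    return [sum(q(row) for row in db) / n + laplace(scale, rng) for q in queries]

rng = random.Random(0)  # fixed seed only to make the sketch reproducible
db = [(1, 0)] * 500 + [(0, 1)] * 500
queries = [lambda r: r[0], lambda r: r[1]]
answers = noisy_answers(db, queries, eps=1.0, rng=rng)
print(answers)  # two noisy estimates of the true value 0.5
```

The running time is poly(n, |P|), matching the complexity column for this row, but the output is a list of answers rather than a synthetic database.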

Negative Results for Synthetic Data Summary: Producing accurate & differentially private synthetic data is as hard as breaking cryptography (e.g. factoring large integers). Inherently exponential in dimensionality of data (and in dimensionality of queries).

Negative Results for Synthetic Data
Thm [DNRRV09]: Under standard crypto assumptions (OWF), there is no $n = \mathrm{poly}(d)$ and curator that:
– Produces synthetic databases.
– Is differentially private.
– Runs in time poly(n,d).
– Achieves accuracy error $\alpha = .99$ for $P$ = {circuits of size $d^2$} (so $|P| \sim 2^{d^2}$)
Thm [UV10]: Under standard crypto assumptions (OWF), there is no $n = \mathrm{poly}(d)$ and curator that:
– Produces synthetic databases.
– Is differentially private.
– Runs in time poly(n,d).
– Achieves accuracy error $\alpha = .01$ for 2-way marginals.

Tool 1: Digital Signature Schemes
A digital signature scheme consists of algorithms (Gen, Sign, Ver):
– On security parameter $d$, $\mathrm{Gen}(d) = (SK, PK) \in \{0,1\}^d \times \{0,1\}^d$
– On $m \in \{0,1\}^d$, can compute $\sigma = \mathrm{Sign}_{SK}(m) \in \{0,1\}^d$ s.t. $\mathrm{Ver}_{PK}(m,\sigma) = 1$
– Given many $(m,\sigma)$ pairs, infeasible to generate new $(m',\sigma')$ satisfying $\mathrm{Ver}_{PK}$
– Gen, Sign, Ver all computable by circuits of size $d^2$.
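To make the (Gen, Sign, Ver) interface concrete, here is a minimal sketch of a Lamport signature built from SHA-256. This is a swapped-in illustration, not the scheme assumed in the talk: a Lamport key is secure for signing only one message, whereas the construction above needs unforgeability given many signed messages, and the key and signature lengths here do not match the $\{0,1\}^d$ convention of the slide.

```python
import hashlib
import secrets

BITS = 256  # number of message-digest bits that get signed

def H(data):
    """SHA-256 digest, used as the one-way function."""
    return hashlib.sha256(data).digest()

def gen():
    """Gen: the secret key is 256 pairs of random strings, the public key their hashes."""
    sk = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(BITS)]
    pk = [(H(a), H(b)) for a, b in sk]
    return sk, pk

def msg_bits(m):
    """The 256 bits of SHA-256(m), most significant bit of each byte first."""
    digest = H(m)
    return [(digest[i // 8] >> (7 - i % 8)) & 1 for i in range(BITS)]

def sign(sk, m):
    """Sign: reveal one secret string per digest bit."""
    return [sk[i][bit] for i, bit in enumerate(msg_bits(m))]

def ver(pk, m, sig):
    """Ver: each revealed string must hash to the matching public-key entry."""
    if len(sig) != BITS:
        return False
    return all(H(s) == pk[i][bit]
               for (i, bit), s in zip(enumerate(msg_bits(m)), sig))

sk, pk = gen()
sig = sign(sk, b"row 1 of the database")
print(ver(pk, b"row 1 of the database", sig))  # valid signature verifies
print(ver(pk, b"a different message", sig))    # signature does not transfer
```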

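Hard-to-Sanitize Databases
– Generate random $(PK, SK) \leftarrow \mathrm{Gen}(d)$ and $m_1, m_2, \ldots, m_n \leftarrow \{0,1\}^d$
– Database $D$: rows $(m_1, \mathrm{Sign}_{SK}(m_1)), (m_2, \mathrm{Sign}_{SK}(m_2)), \ldots, (m_n, \mathrm{Sign}_{SK}(m_n))$
– $\mathrm{Ver}_{PK} \in$ {circuits of size $d^2$} $= P$, and $\mathrm{Ver}_{PK}(D) = 1$
– Curator output $C(D)$: rows $(m'_1, \sigma_1), (m'_2, \sigma_2), \ldots, (m'_k, \sigma_k)$
– Accuracy: $\mathrm{Ver}_{PK}(C(D)) \ge 1-\alpha > 0 \Rightarrow \exists i$ with $\mathrm{Ver}_{PK}(m'_i, \sigma_i) = 1$
– Case 1: $m'_i \notin D \Rightarrow$ Forgery!
– Case 2: $m'_i \in D \Rightarrow$ Reidentification!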

Negative Results for Synthetic Data
Thm [DNRRV09]: Under standard crypto assumptions (OWF), there is no $n = \mathrm{poly}(d)$ and curator that:
– Produces synthetic databases.
– Is differentially private.
– Runs in time poly(n,d).
– Achieves accuracy error $\alpha = .99$ for $P$ = {circuits of size $d^2$} (so $|P| \sim 2^{d^2}$)
Thm [UV10]: Under standard crypto assumptions (OWF), there is no $n = \mathrm{poly}(d)$ and curator that:
– Produces synthetic databases.
– Is differentially private.
– Runs in time poly(n,d).
– Achieves accuracy error $\alpha = .01$ for 3-way marginals.

Tool 2: Probabilistically Checkable Proofs
The PCP Theorem: $\exists$ efficient algorithms (Red, Enc, Dec) s.t.
– Red: circuit $V$ of size $d^2$ $\mapsto$ set of 3-clauses on $d' = \mathrm{poly}(d)$ vars, $\varphi_V = \{x_1 \vee x_5 \vee \neg x_7,\ \neg x_1 \vee x_5 \vee x_{d'}, \ldots\}$
– Enc: $w$ s.t. $V(w) = 1$ $\mapsto$ $z \in \{0,1\}^{d'}$ satisfying all of $\varphi_V$
– Dec: $z' \in \{0,1\}^{d'}$ satisfying .99 fraction of $\varphi_V$ $\mapsto$ $w'$ s.t. $V(w') = 1$

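Hard-to-Sanitize Databases
– Generate random $(PK, SK) \leftarrow \mathrm{Gen}(d)$ and $m_1, m_2, \ldots, m_n \leftarrow \{0,1\}^d$
– Let $\varphi_{PK} = \mathrm{Red}(\mathrm{Ver}_{PK})$
– Database $D$: rows $z_1, z_2, \ldots, z_n$, where $z_i = \mathrm{Enc}(m_i, \mathrm{Sign}_{SK}(m_i))$ with respect to $\mathrm{Ver}_{PK}$
– Each clause in $\varphi_{PK}$ is satisfied by all $z_i$
– Curator output $C(D)$: rows $z'_1, z'_2, \ldots, z'_k$
– Each clause in $\varphi_{PK}$ is satisfied by $\ge 1-\alpha$ of the $z'_i$ $\Rightarrow$ $\exists i$ s.t. $z'_i$ satisfies $\ge 1-\alpha$ of the clauses $\Rightarrow$ $\mathrm{Dec}(z'_i)$ = valid $(m'_i, \sigma_i)$
– Case 1: $m'_i \notin D \Rightarrow$ Forgery!
– Case 2: $m'_i \in D \Rightarrow$ Reidentification!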

Part I Conclusions
Producing private, synthetic databases that preserve simple statistics requires computation exponential in the dimension of the data.
How to bypass?
– Average-case accuracy: heuristics that don’t give good accuracy on all databases, only those from some class of models.
– Non-synthetic data:
  – Thm [DNRRV09]: For general P (e.g. P = {circuits of size $d^2$}), $\exists$ efficient curators “iff” $\nexists$ efficient “traitor-tracing” schemes
  – But for structured P (e.g. P = {all marginals}), wide open!

PART II: COMPUTATIONALLY BOUNDED ADVERSARIES

Motivation Differential privacy protects even against adversaries with unlimited computational power. Can we gain by restricting to adversaries with bounded (but still huge) computational power? – Better accuracy/utility? – Enormous success in cryptography from considering computationally bounded adversaries.

Definitions [MPRV09]
– $(\varepsilon,\mathrm{neg}(k))$-differential privacy: for all $D_1, D_2$ differing in one row, every set $T$, and security parameter $k$, $\Pr[C_k(D_1)\in T] \le \exp(\varepsilon)\cdot\Pr[C_k(D_2)\in T] + \mathrm{neg}(k)$
– Computational $\varepsilon$-differential privacy v1: for all $D_1, D_2$ differing in one row, every probabilistic poly(k)-time algorithm $T$, and security parameter $k$, $\Pr[T(C_k(D_1))=1] \le \exp(\varepsilon)\cdot\Pr[T(C_k(D_2))=1] + \mathrm{neg}(k)$
– Computational $\varepsilon$-differential privacy v2: $\exists$ $(\varepsilon,\mathrm{neg}(k))$-differentially private $C'_k$ such that for all $D$, $C_k(D)$ and $C'_k(D)$ are computationally indistinguishable.
Relations: v2 $\Rightarrow$ v1 is immediate; v1 $\Rightarrow$ v2 is open (requires generalization of Dense Model Thm [GT04,RTTV08]).

2-Party Privacy
2-party (& multiparty) privacy: each party has a sensitive dataset, want to do a joint computation $f(D_A, D_B)$.
[Figure: party A holds $D_A = (x_1, x_2, \ldots, x_n)$, party B holds $D_B = (y_1, y_2, \ldots, y_m)$; they exchange messages $m_1, m_2, m_3, \ldots, m_{k-1}, m_k$; outputs $Z_A \approx f(D_A, D_B)$ and $Z_B \approx f(D_A, D_B)$]
A’s view should be a (computational) differentially private function of $D_B$ (even if A deviates from protocol), and vice-versa.

Benefit of Computational Differential Privacy
Thm: Under standard cryptographic assumptions (OT), $\exists$ 2-party computational $\varepsilon$-differentially private protocol for estimating Hamming distance of bitvectors, with error $O(1/\varepsilon)$.
Proof: generic paradigm
– Centralized solution: trusted third party could compute diff. private approx. to Hamming distance w/ error $O(1/\varepsilon)$
– Distribute via Secure Function Evaluation [Yao86,GMW86]: centralized solution $\Rightarrow$ distributed protocol s.t. no computationally bounded party can learn anything other than its output.
Remark: More efficient or improved protocols by direct constructions [DKMMN06,BKO08,MPRV09]
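A minimal sketch of the centralized step only: what a trusted third party would compute before the computation is distributed. It assumes the standard Laplace mechanism as the differentially private approximation; the secure function evaluation layer is not shown, and the function names are illustrative.

```python
import random

def hamming(x, y):
    """Number of positions where the two bit vectors differ."""
    return sum(a != b for a, b in zip(x, y))

def private_hamming(x, y, eps, rng):
    """Hamming distance plus Laplace noise of scale 1/eps."""
    # Changing one bit of x or of y changes the distance by at most 1, so
    # Laplace noise of scale 1/eps suffices; the expected error is 1/eps.
    noise = rng.expovariate(eps) - rng.expovariate(eps)  # Laplace(0, 1/eps)
    return hamming(x, y) + noise

rng = random.Random(0)  # fixed seed only to make the sketch reproducible
x = [0, 1, 1, 0, 1, 0, 0, 1]
y = [1, 1, 0, 0, 1, 0, 1, 1]
print(hamming(x, y), private_hamming(x, y, eps=1.0, rng=rng))
```

The error does not grow with the vector length $n$, which is the point of contrast with protocols that must be private against unbounded adversaries.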

Benefit of Computational Differential Privacy
Thm: Under standard cryptographic assumptions (OT), $\exists$ 2-party computational $\varepsilon$-differentially private protocol for estimating Hamming distance of bitvectors, with error $O(1/\varepsilon)$.
Thm [MPRV09,MMPRTV10]: The best 2-party differentially private protocol (vs. unbounded adversaries) for estimating Hamming distance has error $\tilde{\Omega}(n^{1/2})$.
Computational privacy $\Rightarrow$ significant gain in accuracy! And efficiency gains too [BNO06].
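For comparison, a simple information-theoretic baseline can be sketched with randomized response: party A flips each of its bits with probability $1/(1+e^{\varepsilon})$ and sends the result, and party B debiases the mismatch count. This is an illustrative assumption, not a protocol from the cited papers and not a proof of the lower bound; it covers only the A-to-B direction, and its standard deviation is about $n^{1/2}/\varepsilon$ for small $\varepsilon$.

```python
import math
import random

def randomize(bits, eps, rng):
    """Randomized response: flip each bit independently with prob. 1/(1+e^eps)."""
    p = 1.0 / (1.0 + math.exp(eps))
    return [b ^ (rng.random() < p) for b in bits], p

def estimate_hamming(noisy_x, y, p):
    """Unbiased estimate of the distance between x and y from the noisy copy of x."""
    n = len(y)
    mismatches = sum(a != b for a, b in zip(noisy_x, y))
    # E[mismatches] = n*p + (1 - 2p) * true_distance, so invert that relation.
    return (mismatches - n * p) / (1.0 - 2.0 * p)

rng = random.Random(0)  # fixed seed only to make the sketch reproducible
n = 10000
x = [rng.randrange(2) for _ in range(n)]
y = [rng.randrange(2) for _ in range(n)]
true_dist = sum(a != b for a, b in zip(x, y))
noisy_x, p = randomize(x, eps=1.0, rng=rng)
print(true_dist, estimate_hamming(noisy_x, y, p))
```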

Conclusions
– Computational complexity is relevant to differential privacy.
– Bad news: producing synthetic data is intractable.
– Good news: better protocols against bounded adversaries.
– Interaction with differential privacy likely to benefit complexity theory too.