The Complexity of Differential Privacy
Salil Vadhan
Harvard University

Thank you, Shafi & Silvio, for...
– inspiring us with beautiful science
– challenging us to believe in the "impossible"
– guiding us towards our own journeys
And Oded, for...
– organizing this wonderful celebration
– enabling our individual & collective development

Data Privacy: The Problem
Given a dataset with sensitive information, such as:
– Census data
– Health records
– Social network activity
– Telecommunications data
How can we enable others to analyze the data while protecting the privacy of the data subjects?

Data Privacy: The Challenge
Traditional approach: "anonymize" by removing "personally identifying information (PII)". Many supposedly anonymized datasets have been subject to reidentification:
– Gov. Weld's medical record reidentified using voter records [Swe97]
– Netflix Challenge database reidentified using IMDb reviews [NS08]
– AOL search users reidentified by the contents of their queries [BZ06]
– Even aggregate genomic data is dangerous [HSR+08]

Differential Privacy
A strong notion of privacy that:
– Is robust to auxiliary information possessed by an adversary
– Degrades gracefully under repetition/composition
– Allows for many useful computations
Emerged from a series of papers in theoretical CS: [Dinur-Nissim `03 (+Dwork), Dwork-Nissim `04, Blum-Dwork-McSherry-Nissim `05, Dwork-McSherry-Nissim-Smith `06]

Differential Privacy
Def [DMNS06]: A randomized algorithm C is (ε,δ)-differentially private iff for all databases D, D' that differ on one row, all query sequences q_1,…,q_t, and all sets T ⊆ R^t:
Pr[C(D, q_1,…,q_t) ∈ T] ≤ e^ε · Pr[C(D', q_1,…,q_t) ∈ T] + δ
Here ε is a small constant, e.g. ε = .01, and δ is cryptographically small.
[Figure: a curator C sits between the database D ∈ X^n and the data analysts, who issue queries q_1, q_2, q_3 and receive answers a_1, a_2, a_3.]
"My data has little influence on what the analysts see."
cf. indistinguishability [Goldwasser-Micali `82]
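To see the definition in action, here is a minimal Python sketch (my illustration, not from the talk) of randomized response, a textbook mechanism satisfying the definition with δ = 0: each subject reports their true bit with probability e^ε / (1 + e^ε).

```python
import math
import random

def randomized_response(bit: int, epsilon: float) -> int:
    """Report the true bit with probability e^eps / (1 + e^eps), else flip it."""
    p_truth = math.exp(epsilon) / (1 + math.exp(epsilon))
    return bit if random.random() < p_truth else 1 - bit

# The definition in action: for neighboring inputs (bit = 0 vs. bit = 1),
# the probability of any given output differs by a factor of exactly
# e^epsilon, so randomized response is (epsilon, 0)-differentially private.
eps = 0.5
p = math.exp(eps) / (1 + math.exp(eps))
assert abs(p / (1 - p) - math.exp(eps)) < 1e-9
```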

Differential Privacy: Example
D = (x_1,…,x_n) ∈ X^n
Goal: given q : X → {0,1}, estimate the counting query q(D) := Σ_i q(x_i)/n within error ±α.
Example: X = {0,1}^d, q = a conjunction on ≤ k variables; the counting query is then a k-way marginal. E.g.: what fraction of people in D are over 40 and were once fans of Van Halen?
[Table: example database with binary columns such as Male? and VH?]
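A standard way to achieve this goal is the Laplace mechanism of [DMNS06]; the Python below is an illustrative sketch, not code from the slides. A counting query has sensitivity 1/n (changing one row moves the answer by at most 1/n), so adding Laplace noise of scale 1/(εn) yields ε-differential privacy.

```python
import random

def laplace(scale: float) -> float:
    # The difference of two i.i.d. Exp(1/scale) variables is Laplace(0, scale).
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def private_counting_query(db, q, epsilon: float) -> float:
    """Estimate q(D) = sum_i q(x_i)/n with noise Laplace(1/(epsilon*n)).

    Changing one row of D changes the true answer by at most 1/n (the
    query's sensitivity), so the output is epsilon-differentially private.
    """
    n = len(db)
    true_answer = sum(q(x) for x in db) / n
    return true_answer + laplace(1.0 / (epsilon * n))

# Example: a 2-way marginal, "over 40 AND Van Halen fan".
db = [(1, 1), (1, 0), (0, 1), (1, 1)]           # rows: (over_40, vh_fan)
q = lambda x: int(x[0] == 1 and x[1] == 1)
print(private_counting_query(db, q, epsilon=0.1))
```

Note the noise has standard deviation O(1/(εn)), so accuracy improves as the database grows.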

Other Differentially Private Algorithms
– histograms [DMNS06]
– contingency tables [BCDKMT07, GHRU11]
– machine learning [BDMN05, KLNRS08]
– logistic regression & statistical estimation [CMS11, S11, KST11, ST12]
– clustering [BDMN05, NRS07]
– social network analysis [HLMJ09, GRU11, KRSY11, KNRS13, BBDS13]
– approximation algorithms [GLMRT10]
– singular value decomposition [HR13]
– streaming algorithms [DNRY10, DNPR10, MMNW11]
– mechanism design [MT07, NST10, X11, NOS12, CCKMV12, HK12, KPRU12]
– …

Differential Privacy: More Interpretations
– Whatever an adversary learns about me, it could have learned from everyone else's data.
– The mechanism cannot leak "individual-specific" information.
– The above interpretations hold regardless of the adversary's auxiliary information.
– Composes gracefully (k repetitions ⇒ kε-differentially private; see the budget sketch below).
But:
– No protection for information that is not localized to a few rows.
– No guarantee that subjects won't be "harmed" by the results of the analysis.
cf. semantic security [Goldwasser-Micali `82]
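The composition property suggests simple "privacy budget" accounting, sketched here in Python; the PrivacyBudget class is my own illustration, not an API from the talk.

```python
class PrivacyBudget:
    """Basic composition: k mechanisms at eps_i each are (sum_i eps_i)-DP."""

    def __init__(self, total_epsilon: float):
        self.remaining = total_epsilon

    def spend(self, epsilon: float) -> None:
        if epsilon > self.remaining:
            raise RuntimeError("privacy budget exhausted")
        self.remaining -= epsilon

# Ten counting queries at eps = 0.1 each exhaust a total budget of eps = 1.
budget = PrivacyBudget(total_epsilon=1.0)
for _ in range(10):
    budget.spend(0.1)
```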

This Talk: Computational Complexity in Differential Privacy
Q: Do computational resource constraints change what is possible?
Computationally bounded curator:
– Makes differential privacy harder.
– Exponential hardness results for unstructured queries or synthetic data.
– Subexponential algorithms for structured queries with other types of data representations.
Computationally bounded adversary:
– Makes differential privacy easier.
– Provable gain in accuracy for multi-party protocols (e.g. for estimating Hamming distance; a baseline sketch follows below).
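As a flavor of the multi-party setting, here is a baseline sketch (mine, not the talk's protocol) for two-party Hamming distance: one party sends a randomized-response copy of its bits, and the other debiases the observed disagreement count. The estimate's error grows like √n/ε, the kind of information-theoretic barrier that protocols against computationally bounded adversaries can beat.

```python
import math
import random

def rr_bits(bits, epsilon):
    """Randomized response applied bitwise (epsilon-DP for the sender)."""
    p = math.exp(epsilon) / (1 + math.exp(epsilon))
    return [b if random.random() < p else 1 - b for b in bits]

def estimate_hamming(noisy_a, b, epsilon):
    """Debiased Hamming-distance estimate from a noisy copy of a.

    Coordinate i disagrees with probability p*d_i + (1-p)*(1-d_i), where d_i
    indicates a true disagreement, so the expectation can be inverted:
    E[observed] = (2p-1)*D + (1-p)*n.
    """
    p = math.exp(epsilon) / (1 + math.exp(epsilon))
    observed = sum(x != y for x, y in zip(noisy_a, b))
    return (observed - (1 - p) * len(b)) / (2 * p - 1)

n = 1000
a = [random.randint(0, 1) for _ in range(n)]
b = [random.randint(0, 1) for _ in range(n)]
print(estimate_hamming(rr_bits(a, 1.0), b, 1.0),
      sum(x != y for x, y in zip(a, b)))   # estimate vs. true distance
```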

A More Ambitious Goal: Noninteractive Data Release
[Figure: the original database D is mapped by C to a sanitization C(D).]
Goal: from C(D), one can answer many questions about D, e.g. all counting queries associated with a large family of predicates Q = {q : X → {0,1}}.
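For comparison, a naive noninteractive release (a baseline sketch reusing private_counting_query from the earlier example, not a construction from the talk) simply publishes a noisy answer to every query in Q, splitting the budget by basic composition:

```python
def naive_noninteractive_release(db, queries, epsilon: float):
    """Publish noisy answers to every query in Q in one shot.

    Splitting the budget across |Q| queries (basic composition) makes the
    per-query error grow like |Q|/(epsilon*n): fine for small families Q,
    but the point of cleverer summaries such as synthetic data is to handle
    exponentially large Q with much smaller error.
    """
    per_query_eps = epsilon / len(queries)
    return {name: private_counting_query(db, q, per_query_eps)
            for name, q in queries.items()}

summary = naive_noninteractive_release(
    db, {"over_40": lambda x: x[0], "vh_fan": lambda x: x[1]}, epsilon=0.2)
```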

Noninteractive Data Release: Possibility
[Figure: C maps the real database (columns Male?, VH?) to a synthetic database of "fake" people with the same columns.]

Noninteractive Data Release: Complexity
[Goldwasser-Micali-Rivest `84]
Connection to inapproximability [FGLSS `91, ALMSS `92]

Traitor-Tracing Schemes [Chor-Fiat-Naor `94]
A TT scheme consists of (Gen, Enc, Dec, Trace): a broadcaster encrypts content so that each of n users can decrypt it with their own key.
Q: What if some users try to resell the content by building a pirate decoder?
A: Some user in the coalition will be traced: the tracer can interrogate the pirate decoder and accuse user i.
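To illustrate the (Gen, Enc, Dec, Trace) syntax, here is a toy Python sketch; the one-time-pad "encryption" and the linear probing tracer are my own simplifications (they only defeat a stateless, deterministic decoder), not the actual scheme of [CFN94].

```python
import os

def gen(n: int, key_len: int = 16):
    """Gen: an independent random key per user."""
    return [os.urandom(key_len) for _ in range(n)]

def xor(key: bytes, block: bytes) -> bytes:
    """One-time-pad encryption/decryption of a block under a key."""
    return bytes(k ^ b for k, b in zip(key, block))

def enc(keys, message: bytes):
    """Enc: a ciphertext carries one component per user."""
    return [xor(k, message) for k in keys]

def dec(i: int, key: bytes, ciphertext) -> bytes:
    """Dec: user i decrypts their own component."""
    return xor(key, ciphertext[i])

def trace(pirate_decode, keys, message: bytes) -> int:
    """Trace: probe with ciphertexts whose first i components are garbage.

    A decoder built from a coalition S keeps working while some key in S
    has an intact component, and must fail once all of S's components are
    corrupted; the first i at which decoding breaks pins down a member.
    """
    n = len(keys)
    for i in range(n + 1):
        ct = enc(keys, message)
        for j in range(i):
            ct[j] = os.urandom(len(ct[j]))   # corrupt components of users 0..i-1
        if pirate_decode(ct) != message:
            return i - 1                     # user i-1's component was pivotal
    return -1

keys = gen(5)
pirate = lambda ct: dec(2, keys[2], ct)      # a pirate built from user 2's key
assert trace(pirate, keys, b"sixteen byte msg") == 2
```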

Traitor-Tracing vs. Differential Privacy [Dwork-Naor-Reingold-Rothblum-Vadhan `09, Ullman `13]
– Traitor-tracing: given any algorithm P that has the "functionality" of the user keys, the tracer can identify one of its user keys.
– Differential privacy: there exists an algorithm C(D) that has the "functionality" of the database, but no one can identify any of its records.
Opposites!

Differential Privacy vs. Traitor-Tracing
[Table: the correspondence between the two notions: user keys ↔ rows of the database; ciphertexts ↔ queries; pirate decoder ↔ sanitizer C(D); tracing algorithm ↔ privacy adversary.]

Noninteractive Data Release: Complexity

Noninteractive Data Release: Algorithms

How to go beyond synthetic data?
[Figure: the database D is mapped by C to a sanitization of some form other than a synthetic database.]

Conclusions
Differential privacy has many interesting questions & connections for complexity theory.
Computationally bounded curators:
– The complexity of answering many "simple" queries is still unknown.
– We know even less about the complexity of private PAC learning.
Computationally bounded adversaries & multiparty differential privacy:
– Connections to communication complexity, randomness extractors, crypto protocols, dense model theorems.
Also many basic open problems!