1
Differential Privacy: Theoretical & Practical Challenges
Salil Vadhan
Center for Research on Computation & Society
John A. Paulson School of Engineering & Applied Sciences, Harvard University
On sabbatical at the Shing-Tung Yau Center, Department of Applied Mathematics, National Chiao-Tung University
Lecture at the Institute of Information Science, Academia Sinica, November 9, 2015
2
Data Privacy: The Problem
Given a dataset with sensitive information, such as:
- Census data
- Health records
- Social network activity
- Telecommunications data
How can we enable "desirable uses" of the data while protecting the "privacy" of the data subjects?
Desirable uses include:
- Academic research
- Informing policy
- Identifying subjects for drug trials
- Searching for terrorists
- Market analysis
- …
3
Approach 1: Encrypt the Data

Name  | Sex | Blood | HIV?
Chen  | F   | B     | Y
Jones | M   | A     | N
Smith | M   | O     | N
Ross  | M   | O     | Y
Lu    | F   | A     | N
Shah  | M   | B     | Y

Encrypted version:
100101001001110101110111
101010111010111111001001
001010100100011001110101
001110010010110101100001
110101000000111001010010
111110110010000101110101

Problems?
4
Approach 2: Anonymize the Data

Name  | Sex | Blood | HIV?
Chen  | F   | B     | Y
Jones | M   | A     | N
Smith | M   | O     | N
Ross  | M   | O     | Y
Lu    | F   | A     | N
Shah  | M   | B     | Y

Problems? "Re-identification" is often easy [Sweeney '97].
5
Approach 3: Mediate Access

[Diagram: data analysts send queries q1, q2, q3 to a trusted "curator" C, who holds the dataset below and returns answers a1, a2, a3.]

Name  | Sex | Blood | HIV?
Chen  | F   | B     | Y
Jones | M   | A     | N
Smith | M   | O     | N
Ross  | M   | O     | Y
Lu    | F   | A     | N
Shah  | M   | B     | Y

Problems? Even simple "aggregate" statistics can reveal individual info. [Dinur-Nissim '03, Homer et al. '08, Mukatran et al. '11, Dwork et al. '15]
6
Privacy Models from Theoretical CS

Model | Utility | Privacy | Who Holds Data?
Differential Privacy | statistical analysis of dataset | individual-specific info | trusted curator
Secure Function Evaluation | any query desired | everything other than result of query | original users (or semi-trusted delegates)
Fully Homomorphic (or Functional) Encryption | any query desired | everything (except possibly result of query) | untrusted server
7
Differential privacy [Dinur-Nissim '03 + Dwork, Dwork-Nissim '04, Blum-Dwork-McSherry-Nissim '05, Dwork-McSherry-Nissim-Smith '06]

[Diagram: data analysts send queries q1, q2, q3 to a curator C, who holds the dataset below and returns answers a1, a2, a3.]

Sex | Blood | HIV?
F   | B     | Y
M   | A     | N
M   | O     | N
M   | O     | Y
F   | A     | N
M   | B     | Y

Requirement: the effect of each individual should be "hidden".
8
Differential privacy [Dinur-Nissim '03 + Dwork, Dwork-Nissim '04, Blum-Dwork-McSherry-Nissim '05, Dwork-McSherry-Nissim-Smith '06]

[Same diagram and dataset as the previous slide; an adversary now appears among the data analysts.]
9
Differential privacy [Dinur-Nissim '03 + Dwork, Dwork-Nissim '04, Blum-Dwork-McSherry-Nissim '05, Dwork-McSherry-Nissim-Smith '06]

[Same diagram and dataset, with the adversary.]

Requirement: an adversary shouldn't be able to tell if any one person's data were changed arbitrarily.
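The requirement on this and the following slides is stated informally; for reference, the standard formal definition (not written out on the slides) is: a randomized curator $M$ is $\varepsilon$-differentially private if for every pair of datasets $D, D'$ differing in one person's record and every set $S$ of possible outputs,

$$\Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S].$$

For small $\varepsilon$, the two output distributions are nearly indistinguishable, so the adversary cannot tell whether any one record was changed.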
10
Differential privacy [Dinur-Nissim '03 + Dwork, Dwork-Nissim '04, Blum-Dwork-McSherry-Nissim '05, Dwork-McSherry-Nissim-Smith '06]

[Same setup; the second record (M, A, N) has been removed from the dataset:]

Sex | Blood | HIV?
F   | B     | Y
M   | O     | N
M   | O     | Y
F   | A     | N
M   | B     | Y

Requirement: an adversary shouldn't be able to tell if any one person's data were changed arbitrarily.
11
Differential privacy [Dinur-Nissim '03 + Dwork, Dwork-Nissim '04, Blum-Dwork-McSherry-Nissim '05, Dwork-McSherry-Nissim-Smith '06]

[Same setup; the second record has been changed from (M, A, N) to (F, A, Y):]

Sex | Blood | HIV?
F   | B     | Y
F   | A     | Y
M   | O     | N
M   | O     | Y
F   | A     | N
M   | B     | Y

Requirement: an adversary shouldn't be able to tell if any one person's data were changed arbitrarily.
12
Simple approach: random noise

"What fraction of people are type B and HIV positive?"

[Diagram: the query is sent to curator C, who holds the dataset below.]

Sex | Blood | HIV?
F   | B     | Y
M   | A     | N
M   | O     | N
M   | O     | Y
F   | A     | N
M   | B     | Y
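To make the "random noise" approach concrete, here is a minimal Python sketch (not from the slides) of the Laplace mechanism for a fraction query of this kind; the function name dp_fraction and the choice of epsilon are illustrative.

```python
import numpy as np

def dp_fraction(records, predicate, epsilon):
    """Differentially private estimate of the fraction of records
    satisfying `predicate`, via the Laplace mechanism.

    Changing one record moves the true fraction by at most 1/n, so
    Laplace noise with scale 1/(epsilon * n) yields epsilon-DP.
    """
    n = len(records)
    true_answer = sum(predicate(r) for r in records) / n
    noise = np.random.laplace(loc=0.0, scale=1.0 / (epsilon * n))
    return true_answer + noise

# "What fraction of people are type B and HIV positive?"
records = [
    {"sex": "F", "blood": "B", "hiv": "Y"}, {"sex": "M", "blood": "A", "hiv": "N"},
    {"sex": "M", "blood": "O", "hiv": "N"}, {"sex": "M", "blood": "O", "hiv": "Y"},
    {"sex": "F", "blood": "A", "hiv": "N"}, {"sex": "M", "blood": "B", "hiv": "Y"},
]
print(dp_fraction(records, lambda r: r["blood"] == "B" and r["hiv"] == "Y", epsilon=0.5))
```

The added error is on the order of 1/(εn), so for large datasets the noisy answer remains statistically useful.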
13
Differential privacy [Dinur-Nissim '03 + Dwork, Dwork-Nissim '04, Blum-Dwork-McSherry-Nissim '05, Dwork-McSherry-Nissim-Smith '06]

[Same diagram with the adversary, except the curator C is now randomized; the dataset is the one with the second record changed to (F, A, Y).]
17
Answering multiple queries

"What fraction of people are type B and HIV positive?"

[Diagram: several queries of this form are sent to curator C, who holds the dataset below.]

Sex | Blood | HIV?
F   | B     | Y
M   | A     | N
M   | O     | N
M   | O     | Y
F   | A     | N
M   | B     | Y
19
Some Differentially Private Algorithms
- histograms [DMNS06]
- contingency tables [BCDKMT07, GHRU11, TUV12, DNT14]
- machine learning [BDMN05, KLNRS08]
- regression & statistical estimation [CMS11, S11, KST11, ST12, JT13]
- clustering [BDMN05, NRS07]
- social network analysis [HLMJ09, GRU11, KRSY11, KNRS13, BBDS13]
- approximation algorithms [GLMRT10]
- singular value decomposition [HR12, HR13, KT13, DTTZ14]
- streaming algorithms [DNRY10, DNPR10, MMNW11]
- mechanism design [MT07, NST10, X11, NOS12, CCKMV12, HK12, KPRU12]
- …
See the Simons Institute Workshop on Big Data & Differential Privacy (12/13).
20
Differential Privacy: Interpretations
- Whatever an adversary learns about me, it could have learned from everyone else's data.
- The mechanism cannot leak "individual-specific" information.
- The above interpretations hold regardless of the adversary's auxiliary information.
- Composes gracefully (k repetitions of an ε-differentially private mechanism are kε-differentially private; see the note below).
But:
- No protection for information that is not localized to a few rows.
- No guarantee that subjects won't be "harmed" by the results of analysis.
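The composition bound referenced above is a standard fact (not spelled out on the slide): if mechanisms $M_1, \dots, M_k$ are each $\varepsilon$-differentially private, then running all of them on the same dataset and releasing every answer is $k\varepsilon$-differentially private, i.e., for neighboring datasets $D, D'$ and any set $S$ of output tuples,

$$\Pr[(M_1(D),\dots,M_k(D)) \in S] \;\le\; e^{k\varepsilon}\,\Pr[(M_1(D'),\dots,M_k(D')) \in S].$$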
22
Amazing possibility: synthetic data [Blum-Ligett-Roth '08, Hardt-Rothblum '10]

[Diagram: curator C transforms the original dataset (left) into a synthetic dataset of "fake" people (right).]

Original:
Sex | Blood | HIV?
F   | B     | Y
M   | A     | N
M   | O     | N
M   | O     | Y
F   | A     | N
M   | B     | Y

Synthetic ("fake" people):
Sex | Blood | HIV?
M   | B     | N
F   | B     | Y
M   | O     | Y
F   | A     | N
F   | O     | N

Utility: preserves the fraction of people with every set of attributes!
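For intuition only, here is a minimal Python sketch of one simple way to produce differentially private synthetic data: a perturbed histogram over the full attribute domain. This is not the Blum-Ligett-Roth or Hardt-Rothblum algorithm cited on the slide, and it only works for small domains; all names and parameters are illustrative.

```python
import itertools
import numpy as np

def dp_synthetic_data(records, attrs, domains, epsilon, n_fake):
    """Generate "fake people" from a noisy histogram (epsilon-DP).

    Count every cell of the attribute domain, add Laplace noise
    (replacing one record changes two cells by 1 each, so the L1
    sensitivity is 2 and scale 2/epsilon suffices), clip negatives,
    normalize, and sample synthetic records from the result.
    """
    cells = list(itertools.product(*(domains[a] for a in attrs)))
    index = {cell: i for i, cell in enumerate(cells)}
    counts = np.zeros(len(cells))
    for r in records:
        counts[index[tuple(r[a] for a in attrs)]] += 1
    noisy = counts + np.random.laplace(0.0, 2.0 / epsilon, size=len(cells))
    probs = np.clip(noisy, 0.0, None)
    total = probs.sum()
    probs = probs / total if total > 0 else np.full(len(cells), 1.0 / len(cells))
    draws = np.random.choice(len(cells), size=n_fake, p=probs)
    return [dict(zip(attrs, cells[i])) for i in draws]

# Toy dataset from the slides:
records = [
    {"sex": "F", "blood": "B", "hiv": "Y"}, {"sex": "M", "blood": "A", "hiv": "N"},
    {"sex": "M", "blood": "O", "hiv": "N"}, {"sex": "M", "blood": "O", "hiv": "Y"},
    {"sex": "F", "blood": "A", "hiv": "N"}, {"sex": "M", "blood": "B", "hiv": "Y"},
]
domains = {"sex": ["F", "M"], "blood": ["A", "B", "AB", "O"], "hiv": ["Y", "N"]}
print(dp_synthetic_data(records, ["sex", "blood", "hiv"], domains, epsilon=1.0, n_fake=5))
```

Since the noise is added once to the histogram, any statistic computed from the fake records inherits the privacy guarantee; the catch is that the full-domain histogram is exponential in the number of attributes, which is consistent with the hardness result on the next slide.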
23
Our result: synthetic data is hard [Dwork-Naor-Reingold-Rothblum-V. '09, Ullman-V. '11]

Theorem: Any such curator C requires exponential computing time (else we could break all of cryptography).

[Diagram as on the previous slide: curator C mapping the original dataset to synthetic "fake" people.]
24
Our result: alternative summaries [Thaler-Ullman-V. '12, Chandrasekaran-Thaler-Ullman-Wan '13]

[Diagram: curator C maps the dataset to a "rich summary" instead of synthetic data.]

- Contains the same statistical information as synthetic data
- But can be computed in sub-exponential time!

Open: is there a polynomial-time algorithm?
25
Amazing Possibility II: Statistical Inference & Machine Learning

[Diagram: curator C holding the dataset.]
26
DP Theory & Practice

Theory: differential privacy research has
- many intriguing theoretical challenges
- rich connections with other parts of CS theory & mathematics, e.g. cryptography, learning theory, game theory & mechanism design, convex geometry, pseudorandomness, optimization, approximability, communication complexity, statistics, …

Practice: interest from many communities in seeing whether DP can be brought to practice, e.g. statistics, databases, medical informatics, privacy law, social science, computer security, programming languages, …
27
Challenges for DP in Practice
28
Some Efforts to Bring DP to Practice
- CMU-Cornell-PennState: "Integrating Statistical and Computational Approaches to Privacy" (see http://onthemap.ces.census.gov/)
- Google: "RAPPOR"
- UCSD: "Integrating Data for Analysis, Anonymization, and Sharing" (iDASH)
- UT Austin: "Airavat: Security & Privacy for MapReduce"
- UPenn: "Putting Differential Privacy to Work"
- Stanford-Berkeley-Microsoft: "Towards Practicing Privacy"
- Duke-NISS: "Triangle Census Research Network"
- Harvard: "Privacy Tools for Sharing Research Data"
- MIT CSAIL/ALFA: "MoocDB: Privacy tools for Sharing MOOC data"
- …
29
Privacy Tools for Sharing Research Data
http://privacytools.seas.harvard.edu/
A SaTC Frontier project spanning Computer Science, Law, Social Science, and Statistics.
Any opinions, findings, and conclusions or recommendations expressed here are those of the author(s) and do not necessarily reflect the views of the funders of the work.
30
Target: Data Repositories
31
Datasets are restricted due to privacy concerns.
Goal: enable wider sharing while protecting privacy.
32
Challenges for Sharing Sensitive Data
- Complexity of law: thousands of privacy laws in the US alone, at the federal, state, and local levels, usually context-specific (HIPAA, FERPA, CIPSEA, Privacy Act, PPRA, ESRA, …).
- Difficulty of de-identification: stripping "PII" usually provides weak protection and/or poor utility [Sweeney '97].
- Inefficient process for obtaining restricted data: can involve months of negotiation between institutions and the original researchers.
Goal: make sharing easier for researchers without expertise in privacy law, CS, or statistics.
33
Vision: Integrated Privacy Tools
[Diagram of the data-sharing workflow (consent from subjects, IRB proposal & review, deposit in repository) and the tools to be developed during the project: risk assessment and de-identification, differential privacy, customized & machine-actionable terms of use, a data tag generator, and a database of privacy laws & regulations, feeding policy proposals and best practices; access modes range from open access to a sanitized data set, to query access, to restricted access.]
34
Many statistical analyses ("Zelig methods") can already be run through the Dataverse interface, without downloading the data.
35
A new interactive data exploration & analysis tool in Dataverse 4.0. Plan: use differential privacy to enable access to currently restricted datasets.
36
Goals for our DP Tools
- General-purpose: applicable to most datasets uploaded to Dataverse.
- Automated: no differential privacy expert optimizing algorithms for a particular dataset or application.
- Tiered access: a DP interface gives wide access to rough statistical information, helping users decide whether to apply for access to the raw data (cf. Census PUMS vs. RDCs).
(Limited) prototype on the project website: http://privacytools.seas.harvard.edu/
37
Differential Privacy: Summary
38
Privacy Models from Theoretical CS

Model | Utility | Privacy | Who Holds Data?
Differential Privacy | statistical analysis of dataset | individual-specific info | trusted curator
Secure Function Evaluation | any query desired | everything other than result of query | original users (or semi-trusted delegates)
Fully Homomorphic (or Functional) Encryption | any query desired | everything (except possibly result of query) | untrusted server

See Shafi Goldwasser's talk at the White House-MIT Big Data Privacy Workshop, 3/3/14.