Differential Privacy and Statistical Inference: A TCS Perspective
Salil Vadhan
Center for Research on Computation & Society, School of Engineering & Applied Sciences, Harvard University
Simons Institute Data Privacy Planning Workshop, May 23, 2017
Two Views of Data Analysis

- Traditional TCS algorithms (and much of the DP literature): a mechanism $M$ takes a dataset $x$ as input and produces an output $Y$; quality is measured by a "utility" $U(x, Y)$ defined on the dataset itself.
- Statistical inference: the dataset $X = (X_1, \ldots, X_n)$ is a random sample from a population $P$; $M$ produces an output $Y$, and quality is measured by a "utility" $U(P, Y)$ defined on the underlying population.
Statistical Inference with DP

Setting: $X_1, \ldots, X_n$ drawn by random sampling from a population $P$; mechanism $M$ outputs $Y$; "utility" $U(P, Y)$.

Desiderata:
- $M$ differentially private [worst-case].
- Utility maximized [average-case over $X = (X_1, \ldots, X_n)$ and $Y$; worst-case (frequentist) or average-case (Bayesian) over $P$].

Example: differentially private PAC learning [Kasiviswanathan-Lee-Nissim-Raskhodnikova-Smith `08].
Natural Two-Step Approach

1. Start with the "best" non-private inference procedure: e.g., to estimate the mean $\mu$ of population $P$, the empirical mean $M_{np}(X) = \bar{X}$ satisfies $|\bar{X} - \mu| = \Theta(1/\sqrt{n})$.
2. Approximate it as well as possible with a DP procedure: $M_{dp}(x)$ outputs $Y = \bar{x} + \mathrm{Lap}(\cdot)$, with $|Y - \bar{x}| = \Theta(1/n)$.
Privacy for Free?

Combining the two steps: $M_{dp}$ takes the sample $X_1, \ldots, X_n$ from population $P$ (mean $\mu$) and outputs $Y = \bar{X} + \mathrm{Lap}(\cdot)$. Since the $\Theta(1/n)$ noise is lower order than the $\Theta(1/\sqrt{n})$ sampling error,
$$|Y - \mu| = (1 + o(1)) \cdot |\bar{X} - \mu|.$$
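The slide above can be sketched concretely with the standard Laplace mechanism for the mean. This is an illustrative sketch, not code from the talk; the bound $R$ on the data range, the clipping step, and the specific parameter values are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_mean(x, epsilon, R):
    """epsilon-DP release of the mean of a dataset with entries in [-R, R].

    Changing one entry moves the mean by at most 2R/n, so Laplace noise of
    scale 2R/(epsilon * n) suffices (the standard Laplace mechanism).
    """
    n = len(x)
    x = np.clip(x, -R, R)  # enforce the assumed range
    sensitivity = 2 * R / n
    return x.mean() + rng.laplace(scale=sensitivity / epsilon)

# The Theta(1/n) noise is lower order than the Theta(1/sqrt(n)) sampling
# error, so at large n the DP estimate tracks the empirical mean closely.
n, mu, sigma, R, eps = 100_000, 1.0, 2.0, 10.0, 0.5
x = rng.normal(mu, sigma, size=n)
y = dp_mean(x, eps, R)
```

Here the noise scale is $2R/(\epsilon n) = 4 \times 10^{-4}$, far below the sampling error $\sigma/\sqrt{n} \approx 6 \times 10^{-3}$, matching the "privacy for free" asymptotics.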
Limitations of Two-Step Approach

- Asymptotics hide important parameters: $\frac{\sigma}{\sqrt{n}} \gg \frac{R}{\epsilon n}$ only when $n \gg \left(\frac{R}{\sigma\epsilon}\right)^2$, which is often huge!
- Some parameters (e.g. $\sigma = \sigma[P]$) may be unknown.
- Can draw wildly incorrect inferences at finite $n$ ["DP kills"].
- Requiring $M_{dp}(x) \approx M_{np}(x)$ on worst-case inputs may be overkill (and even impossible), e.g. if the range $R$ is unbounded.
- The optimal non-private procedure may not yield the optimal differentially private procedure.
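The first limitation can be made concrete by solving for the crossover sample size at which the sampling error and the DP noise are equal. The specific numbers below are illustrative assumptions, not from the talk:

```python
def crossover_n(R, sigma, eps):
    """Sample size at which sampling error sigma/sqrt(n) equals DP noise R/(eps*n).

    Setting sigma/sqrt(n) = R/(eps*n) and solving gives n = (R/(sigma*eps))**2.
    """
    return (R / (sigma * eps)) ** 2

# With a wide nominal range and small eps the crossover is enormous:
# e.g. R = 1e6, sigma = 1, eps = 0.1 gives n on the order of 1e14.
print(f"{crossover_n(1e6, 1.0, 0.1):.3g}")
```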
A Different Two-Step Approach

First a DP mechanism $T$ produces a "summary" $Y$ of the dataset $X_1, \ldots, X_n$; then a post-processing step $M$ converts $Y$ into the final output $Z$, with "utility" $U(P, Z)$.

Naïve application runs into similar difficulties as before [Fienberg-Rinaldo-Yang `11, Karwa-Slavkovic `12]. Approaches for addressing these problems appear in [Vu-Slavkovic `09, McSherry-Williams `10, Karwa-Slavkovic `12, ...].
Take-Away Messages

1. Study the overall design problem: $X_1, \ldots, X_n$ sampled from population $P$, mechanism $M$ outputs $Y$, "utility" $U(P, Y)$, where
- $M$ is differentially private [worst-case];
- utility is maximized [average-case over $X = (X_1, \ldots, X_n)$ and $Y$; worst-case (frequentist) or average-case (Bayesian) over $P$].

[Kasiviswanathan et al. `08, Dwork-Lei `09, Smith `10, Wasserman-Zhou `10, Hall-Rinaldo-Wasserman `13, Duchi-Jordan-Wainwright `12 & `13, Barber-Duchi `14, ...]
Take-Away Messages

2. Ensure "soundness": prevent incorrect conclusions even at small $n$. It is OK to declare "failure".

[Vu-Slavkovic `09, McSherry-Williams `10, Karwa-Slavkovic `12, ...]
Example 1: Confidence Intervals [Karwa-Vadhan `17]

Setting: $X_1, \ldots, X_n \sim P = N(\mu, \sigma^2)$ with $\mu \in [-R, R]$ and $\sigma \in [\sigma_{min}, \sigma_{max}]$; $M$ outputs an interval $I \subseteq \mathbb{R}$.

Requirements:
- Privacy: $M$ is $\epsilon$-differentially private.
- Coverage ("soundness"): for all $n, \mu, \sigma, \epsilon$: $\Pr[\mu \in I] \geq .95$.

Goal:
- Length ("utility"): minimize $\mathrm{E}[|I|]$.
Example 1: Confidence Intervals [Karwa-Vadhan `17] (continued)

Special case: $\sigma$ known, $X_1, \ldots, X_n \sim P = N(\mu, \sigma^2)$, $\mu \in [-R, R]$.

Upper bound: there is an $\epsilon$-DP algorithm $M$ achieving
$$\mathrm{E}[|I|] \;\leq\; \underbrace{2 z_{.975} \cdot \frac{\sigma}{\sqrt{n}}}_{\text{non-private length}} \;+\; \frac{\sigma}{\epsilon} \cdot O\!\left(\frac{1}{n}\right),$$
provided that $n \gtrsim \frac{c}{\epsilon} \log\frac{R}{\sigma}$.
Example 1: Confidence Intervals [Karwa-Vadhan `17] (continued)

Lower bound: any such $M$ must have either $\mathrm{E}[|I|] \geq R/2$, or both
$$\mathrm{E}[|I|] \;\geq\; \frac{\sigma}{\epsilon n} \quad\text{and}\quad n \gtrsim \frac{c}{\epsilon} \log\frac{R}{\sigma}.$$
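A simplified sketch in the spirit of the two-step known-$\sigma$ algorithm: spend half the privacy budget on a noisy histogram to localize $\mu$ (this is where the $n \gtrsim (c/\epsilon)\log(R/\sigma)$ condition comes from), then release a noisy clipped mean and widen the classical $z$-interval to cover the Laplace noise. The window padding, budget split, and widening constants here are illustrative assumptions, not the constants from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def dp_confidence_interval(x, epsilon, R, sigma):
    """Sketch of an epsilon-DP ~95% CI for the mean of N(mu, sigma^2),
    with sigma known and mu in [-R, R]."""
    n = len(x)
    eps1 = eps2 = epsilon / 2

    # Step 1 (eps1-DP): noisy histogram with bins of width sigma over [-R, R].
    # Changing one record changes two bin counts by 1, so L1 sensitivity is 2.
    edges = np.arange(-R, R + sigma, sigma)
    counts = np.histogram(x, bins=edges)[0].astype(float)
    counts += rng.laplace(scale=2 / eps1, size=len(counts))
    j = int(np.argmax(counts))
    lo, hi = edges[j] - 3 * sigma, edges[j + 1] + 3 * sigma  # padded window

    # Step 2 (eps2-DP): noisy mean of data clipped to the located window,
    # so the sensitivity is (hi - lo)/n rather than 2R/n.
    clipped = np.clip(x, lo, hi)
    sens = (hi - lo) / n
    center = clipped.mean() + rng.laplace(scale=sens / eps2)

    # Widen the classical z-interval to also cover the Laplace noise:
    # |Lap(b)| <= b * ln(100) except with probability 0.01.
    z = 1.959964  # z_{.975}
    half = z * sigma / np.sqrt(n) + (sens / eps2) * np.log(100)
    return center - half, center + half

x = rng.normal(3.2, 1.0, size=10_000)
lo_i, hi_i = dp_confidence_interval(x, epsilon=1.0, R=100.0, sigma=1.0)
```

Because the clipping window has width $O(\sigma)$ instead of $O(R)$, the extra length is $\frac{\sigma}{\epsilon}\cdot O(\frac{1}{n})$, matching the upper-bound slide; $R$ enters only through the histogram step.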
Example 2: Hypothesis Testing [Vu-Slavkovic `09, Uhler-Slavkovic-Fienberg `13, Yu-Fienberg-Slavkovic-Uhler `14, Gaboardi-Lim-Rogers-Vadhan `16, Wang-Lee-Kifer `16, Kifer-Rogers `16]

Setting: $X_1, \ldots, X_n \sim P$, a distribution on $\mathcal{X}$; $M$ outputs 0 or 1.

Requirements:
- Privacy: $M$ is $\epsilon$-differentially private.
- Significance (Type I error): for all $n, \epsilon$: if $P = H_0$ then $\Pr[M(X) = 0] \geq .95$.

Goal:
- Power (Type II error): if $P$ is "far" from $H_0$, then $\Pr[M(X) = 1]$ is "large".
Example 2: Hypothesis Testing [Cai-Daskalakis-Kamath `17]

Same setting, now with a proximity parameter $\gamma$:
- Privacy: $M$ is $\epsilon$-differentially private.
- Significance (Type I error): for all $n, \epsilon, \gamma$: if $P = H_0$ then $\Pr[M(X) = 0] \geq .95$.
- Power (Type II error): if $d_{TV}(P, H_0) \geq \gamma$, then $\Pr[M(X) = 1] \geq .95$.
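The "soundness at every $n$" requirement can be illustrated with a toy $\epsilon$-DP test of $H_0$: the $X_i$ are fair coin flips. This is an illustrative sketch, not an algorithm from the cited papers: the significance budget is split between the sampling tail (Hoeffding) and the Laplace tail, so Type I error stays below $\alpha$ for every $n$ and $\epsilon$; at small $n$ the test simply loses power rather than making false rejections.

```python
import numpy as np

rng = np.random.default_rng(2)

def dp_test_fair_coin(x, epsilon, alpha=0.05):
    """epsilon-DP test of H0: X_i ~ Bernoulli(1/2), returning 1 to reject."""
    n = len(x)
    s = np.sum(x) + rng.laplace(scale=1 / epsilon)  # count has sensitivity 1

    # Under H0, Hoeffding gives P(|sum - n/2| >= sqrt((n/2) ln(4/alpha))) <= alpha/2,
    # and P(|Lap(1/eps)| >= ln(2/alpha)/eps) <= alpha/2; reject only beyond both.
    slack = np.sqrt((n / 2) * np.log(4 / alpha)) + np.log(2 / alpha) / epsilon
    return 1 if abs(s - n / 2) > slack else 0

balanced = np.tile([0, 1], 500)   # exactly fair: never rejected
all_ones = np.ones(1000)          # far from fair: rejected
```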
Challenges for Future Research I

- Study more sophisticated inference problems, e.g. confidence intervals for multivariate Gaussians with unknown covariance matrix (related to least-squares regression [Sheffet `16]).
- What asymptotics are acceptable? Much of statistical inference relies on calculating asymptotic distributions (e.g. via the CLT), often reliable even when $n$ is small. [Kifer-Rogers `17] assume that $\epsilon = \Omega(1/\sqrt{n})$.
Challenges for Future Research II

- Can we rigorously analyze the effect of privacy even when the non-private algorithms don't have rigorous analyses? E.g. in hypothesis testing, privacy needs at most an $O(1/\epsilon)$ blow-up in sample size… but this is suboptimal [Cai-Daskalakis-Kamath `17].
- Lower bounds: most existing techniques prove lower bounds for some kind of inference problem. We should explicitly state these!
- Does privacy have an inherent cost even when $n$ is large? E.g. must DP confidence intervals have length $\mathrm{E}[|I|] \geq 2 z_{.975} \cdot \frac{\sigma}{\sqrt{n}} + \Omega\!\left(\frac{\sigma}{\epsilon n}\right)$?