Differential Privacy and Statistical Inference: A TCS Perspective
Salil Vadhan Center for Research on Computation & Society School of Engineering & Applied Sciences, Harvard University Simons Institute Data Privacy Planning Workshop May 23, 2017
Two views of data analysis

Traditional TCS/Algorithms view (and much of the DP literature): dataset x → M → q, with "utility" u(x, q).
Statistical inference view: random sample X_1, X_2, …, X_n from population P → M → q, with "utility" u(P, q).
Statistical Inference with DP

Random sample X_1, …, X_n from population P → M → q, with "utility" u(P, q).
Desiderata:
- M differentially private [worst-case]
- Utility maximized [average-case over X = (X_1, …, X_n); worst-case (frequentist) or average-case (Bayesian) over P]
Example: differentially private PAC learning [Kasiviswanathan-Lee-Nissim-Raskhodnikova-Smith `08].
Natural Two-Step Approach

1. Start with the "best" non-private inference procedure: for X_1, …, X_n from population P with mean μ, take q_npriv = x̄ (the sample mean), so E[|q_npriv − μ|] = Θ(1/√n).
2. Approximate it as well as possible with a DP procedure: q = x̄ + Lap(·), so E[|q − x̄|] = Θ(1/(εn)).
Privacy for Free?

With q = x̄ + Lap(·): E[|q − μ|] = (1 + o(1)) · E[|x̄ − μ|], i.e. asymptotically the DP estimate is as accurate as the non-private sample mean.
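The "privacy for free" claim can be checked empirically. Below is a minimal sketch (not from the talk) of the Laplace-mechanism mean on data clamped to an assumed range [0, 1]: since the Lap(R/(εn)) noise is O(1/n), well below the Θ(1/√n) sampling error, the DP error should track the non-private error closely.

```python
import math
import random
import statistics

def laplace(scale, rng):
    """Sample from Lap(0, scale) via the inverse CDF."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_mean(x, eps, lo, hi, rng):
    """x̄ + Lap(R/(eps*n)) on data clamped to [lo, hi]; eps-DP by the Laplace mechanism."""
    n = len(x)
    clamped = [min(max(v, lo), hi) for v in x]
    sensitivity = (hi - lo) / n  # changing one record moves the mean by at most R/n
    return statistics.fmean(clamped) + laplace(sensitivity / eps, rng)

rng = random.Random(0)
mu, n, eps, trials = 0.3, 10_000, 0.5, 200
err_np, err_dp = [], []
for _ in range(trials):
    x = [min(max(rng.gauss(mu, 0.1), 0.0), 1.0) for _ in range(n)]
    xbar = statistics.fmean(x)
    err_np.append(abs(xbar - mu))
    err_dp.append(abs(dp_mean(x, eps, 0.0, 1.0, rng) - mu))

# The noise scale 1/(eps*n) = 2e-4 sits below the ~1e-3 sampling error,
# so the two mean absolute errors should be within a few percent of each other.
print(statistics.fmean(err_np), statistics.fmean(err_dp))
```

The simulation illustrates the slide's point at one parameter setting; the next slide shows why the asymptotics can be misleading at others.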
Limitations of Two-Step Approach

- Asymptotics hide important parameters: the noise R/(εn) is below the sampling error σ/√n only when n ≫ (R/(εσ))², often huge!
- Some parameters (e.g. σ = σ[P]) may be unknown.
- Can draw wildly incorrect inferences at finite n ["DP kills"].
- Requiring M_DP(x) ≈ M_npriv(x) on worst-case inputs may be overkill (and even impossible), e.g. if the range R is unbounded.
- The optimal non-private procedure may not yield the optimal differentially private procedure.
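The "often huge" crossover can be made concrete with a back-of-the-envelope helper (the numbers below are hypothetical, not from the talk):

```python
def crossover_n(R, sigma, eps):
    """Order of n at which the Laplace noise drops below the sampling error:
    R/(eps*n) < sigma/sqrt(n)  <=>  n > (R/(eps*sigma))**2."""
    return (R / (eps * sigma)) ** 2

# Hypothetical scale: values bounded by R = 1e6, sigma = 1e4, eps = 0.1:
print(crossover_n(1e6, 1e4, 0.1))  # 1000000.0 — a million samples before the noise is negligible
```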
A Different Two-Step Approach

X_1, …, X_n from population P → DP mechanism M → "summary" of dataset → post-processing T → q, with "utility" u(P, q).
Naïve application runs into similar difficulties as before [Fienberg-Rinaldo-Yang `11, Karwa-Slavkovic `12]. Approaches for addressing these problems appear in [Vu-Slavkovic `09, McSherry-Williams `10, Karwa-Slavkovic `12, …].
Take-Away Messages

1. Study the overall design problem (random sample X_1, …, X_n from P → M → q, "utility" u(P, q)):
- M differentially private [worst-case]
- Utility maximized [average-case over X = (X_1, …, X_n); worst-case (frequentist) or average-case (Bayesian) over P]
[Kasiviswanathan et al. `08, Dwork-Lei `09, Smith `10, Wasserman-Zhou `10, Hall-Rinaldo-Wasserman `13, Duchi-Jordan-Wainwright `12 & `13, Barber-Duchi `14, …]
2. Ensure "soundness": prevent incorrect conclusions even at small n; it is OK to declare "failure". [Vu-Slavkovic `09, McSherry-Williams `10, Karwa-Slavkovic `12, …]
Example 1: Confidence Intervals [Karwa-Vadhan `17]

X_1, …, X_n ~ N(μ, σ²) i.i.d., with μ ∈ [−R, R] and σ ∈ [σ_min, σ_max]; M outputs an interval I ⊆ ℝ.
Requirements:
- Privacy: M is ε-differentially private.
- Coverage ("soundness"): for all μ, σ, n, ε: Pr[μ ∈ I] ≥ .95.
Goal:
- Length ("utility"): minimize E[|I|].
Case σ known:
Upper Bound: there is an ε-DP algorithm M achieving
  E[|I|] ≤ 2·z_.975·σ/√n + (σ/ε)·Õ(1/n),
where the first term is the non-private length, provided that n ≳ (1/ε)·log(R/σ).
Lower Bound: must have either E[|I|] ≥ R/2, or both E[|I|] ≥ Ω(σ/(εn)) and n ≳ (1/ε)·log(R/σ).
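To make the coverage requirement concrete, here is a simplified sketch of a DP confidence interval for the known-σ case. It is not the Karwa-Vadhan algorithm (which also locates the mean privately within [−R, R] to get the log(R/σ) dependence); it simply clamps to a known range, privatizes the mean with Laplace noise, and widens the interval so that sampling error and noise each get half of the failure budget.

```python
import math
import random
import statistics

def laplace(scale, rng):
    """Sample from Lap(0, scale) via the inverse CDF."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_conf_interval(x, sigma, eps, R, rng, alpha=0.05):
    """eps-DP interval for the mean of N(mu, sigma^2) data, sigma and R known.
    Assumes mu lies well inside [-R, R], so clamping bias is negligible."""
    n = len(x)
    clamped = [min(max(v, -R), R) for v in x]
    scale = 2 * R / (eps * n)  # sensitivity of the clamped mean is 2R/n
    center = statistics.fmean(clamped) + laplace(scale, rng)
    z = statistics.NormalDist().inv_cdf(1 - alpha / 4)  # two-sided Gaussian, alpha/2 budget
    t = scale * math.log(2 / alpha)  # P(|Lap| > t) = exp(-t/scale) = alpha/2
    half = z * sigma / math.sqrt(n) + t
    return (center - half, center + half)

# Monte Carlo coverage check: should meet the nominal 95% level.
rng = random.Random(1)
mu, sigma, n, eps, R = 2.0, 1.0, 2000, 1.0, 10.0
trials = 300
cover = sum(
    lo <= mu <= hi
    for lo, hi in (dp_conf_interval([rng.gauss(mu, sigma) for _ in range(n)],
                                    sigma, eps, R, rng) for _ in range(trials))
)
print(cover / trials)
```

The privacy surcharge here has a worse dependence on R than the theorem's bound; the sketch only illustrates the shape of the result: non-private width plus an O(1/(εn)) term, with coverage guaranteed at every finite n.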
Example 2: Hypothesis Testing [Vu-Slavkovic `09, Uhler-Slavkovic-Fienberg `13, Yu-Fienberg-Slavkovic-Uhler `14, Gaboardi-Lim-Rogers-Vadhan `16, Wang-Lee-Kifer `16, Kifer-Rogers `16]

X_1, …, X_n ~ P, a distribution on X; M outputs 0 or 1.
Requirements:
- Privacy: M is ε-differentially private.
- Significance (Type I error): for all P, n, ε: if P ∈ H_0 then Pr[M(X) = 0] ≥ .95.
Goal:
- Power (Type II error): if P is "far" from H_0, then Pr[M(X) = 1] is "large".
Example 2, continued [Cai-Daskalakis-Kamath `17]:
- Privacy: M is ε-differentially private.
- Significance (Type I error): for all n, ε, γ: if P = H_0 then Pr[M(X) = 0] ≥ .95.
- Power (Type II error): if d_TV(P, H_0) ≥ γ, then Pr[M(X) = 1] ≥ .95.
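The finite-n significance requirement can be illustrated with a toy ε-DP test of a simple null, a Bernoulli(p0) coin. This is a hypothetical sketch, much cruder than the optimal tests in the cited papers: the rejection threshold is calibrated from a Hoeffding bound plus the Laplace tail, so the Type I error is at most α at every finite n, with no appeal to asymptotics.

```python
import math
import random

def laplace(scale, rng):
    """Sample from Lap(0, scale) via the inverse CDF."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_bias_test(x, p0, eps, rng, alpha=0.05):
    """Sketch of an eps-DP test of H0: data ~ Bernoulli(p0).
    The statistic |mean - p0| has sensitivity 1/n, so Lap(1/(eps*n)) noise
    gives eps-DP; the threshold budgets alpha/2 to the binomial tail
    (Hoeffding) and alpha/2 to the noise tail. Returns 1 = reject H0."""
    n = len(x)
    stat = abs(sum(x) / n - p0) + laplace(1.0 / (eps * n), rng)
    hoeff = math.sqrt(math.log(4 / alpha) / (2 * n))  # P(|mean - p0| > hoeff) <= alpha/2
    lap = math.log(2 / alpha) / (eps * n)             # P(|noise| > lap) = alpha/2
    return 1 if stat > hoeff + lap else 0

rng = random.Random(2)
n, eps, trials = 5000, 1.0, 200
rejects_null = sum(dp_bias_test([rng.random() < 0.5 for _ in range(n)], 0.5, eps, rng)
                   for _ in range(trials))
rejects_far = sum(dp_bias_test([rng.random() < 0.6 for _ in range(n)], 0.5, eps, rng)
                  for _ in range(trials))
print(rejects_null, rejects_far)
```

Under H0 the test rejects rarely (soundness at this specific n), while at total-variation distance 0.1 from the null it rejects essentially always; the concentration-bound calibration is what makes "failing safely" at small n possible.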
Challenges for Future Research I

- Study more sophisticated inference problems, e.g. confidence intervals for multivariate Gaussians with an unknown covariance matrix (related to least-squares regression [Sheffet `16]).
- What asymptotics are acceptable? Much of statistical inference relies on calculating asymptotic distributions (e.g. via the CLT), often reliable even when n is small. [Kifer-Rogers `17] assume that ε = Ω(1/√n).
Challenges for Future Research II

- Can we rigorously analyze the effect of privacy even when the non-private algorithms don't have rigorous analyses? E.g. in hypothesis testing, privacy needs at most an O(1/ε) blow-up in sample size… but this is suboptimal [Cai-Daskalakis-Kamath `17].
- Lower bounds: most existing techniques prove lower bounds for some kind of inference problem. We should state these explicitly!
- Does privacy have an inherent cost even when n is large? E.g. must DP confidence intervals have length E[|I|] ≥ 2·z_.975·σ/√n + Ω(σ/(εn))?