Differential Privacy and Statistical Inference: A TCS Perspective


1 Differential Privacy and Statistical Inference: A TCS Perspective
Salil Vadhan Center for Research on Computation & Society School of Engineering & Applied Sciences, Harvard University Simons Institute Data Privacy Planning Workshop May 23, 2017

2 Two views of data analysis
Traditional TCS algorithms (and much of the DP literature): input x → M → output Y; "utility" U(x, Y).
Statistical inference: X_1, X_2, …, X_n drawn by random sampling from a population P → M → output Y; "utility" U(P, Y).

3 Statistical Inference with DP
X_1, X_2, …, X_n drawn by random sampling from a population P → M → output Y; "utility" U(P, Y).
Desiderata:
- M differentially private [worst-case]
- Utility maximized [average-case over X = (X_1, …, X_n) and Y; worst-case (frequentist) or average-case (Bayesian) over P]
Example: differentially private PAC learning [Kasiviswanathan-Lee-Nissim-Raskhodnikova-Smith `08].

4 Natural Two-Step Approach
1. Start with the "best" non-private inference procedure: given X_1, …, X_n from a population P with mean μ, M_np outputs the sample mean X̄, with |X̄ − μ| = Θ(1/√n).
2. Approximate it as well as possible with a DP procedure: M_dp outputs Y = x̄ + Lap(·), with |Y − x̄| = Θ(1/n).

5 Privacy for Free?
Given X_1, …, X_n from a population P with mean μ, M_dp outputs Y = X̄ + Lap(·), with |Y − μ| = (1 + o(1)) ⋅ |X̄ − μ|.
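The DP step of the two-step approach can be sketched concretely. Below is a minimal illustration of releasing a sample mean via the Laplace mechanism; the function name, clamping range, and interface are illustrative assumptions, not from the slides. Clamping each record to [lo, hi] bounds the sensitivity of the mean at (hi − lo)/n, so noise of scale (hi − lo)/(nε) suffices for ε-DP, matching the Θ(1/n) noise magnitude on the slide.

```python
import numpy as np

def dp_mean(x, lo, hi, eps, rng=None):
    """Release an eps-DP estimate of the sample mean (Laplace mechanism).

    Each record is clamped to [lo, hi], so changing one record moves the
    mean by at most (hi - lo)/n -- the sensitivity of the query.  Laplace
    noise with scale sensitivity/eps then gives eps-differential privacy.
    """
    rng = rng or np.random.default_rng()
    x = np.clip(np.asarray(x, dtype=float), lo, hi)
    n = len(x)
    sensitivity = (hi - lo) / n
    return x.mean() + rng.laplace(scale=sensitivity / eps)
```

The added noise has standard deviation √2 ⋅ (hi − lo)/(εn) = O(1/n), asymptotically negligible next to the Θ(1/√n) sampling error, which is the "privacy for free" observation.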

6 Limitations of Two-Step Approach
- Asymptotics hide important parameters: σ/√n ≫ R/(εn) only when n ≫ (R/(σε))², often huge! Some parameters (e.g. σ = σ[P]) may be unknown.
- Can draw wildly incorrect inferences at finite n ["DP kills"].
- Requiring M_dp(x) ≈ M_np(x) on worst-case inputs may be overkill (and even impossible), e.g. if the range R is unbounded.
- The optimal non-private procedure may not yield an optimal differentially private procedure.
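The "often huge" threshold above is simple arithmetic and is easy to make tangible. The helper below (an illustrative name, not from the slides) computes the smallest n at which the sampling error σ/√n overtakes the Laplace noise scale R/(εn):

```python
import math

def crossover_n(R, sigma, eps):
    """Smallest n at which sampling error sigma/sqrt(n) exceeds the
    Laplace noise scale R/(eps*n), i.e. n > (R/(sigma*eps))**2."""
    return math.ceil((R / (sigma * eps)) ** 2)

# With a modest range the threshold is benign, but e.g. data ranging
# over [-1e6, 1e6] with sigma = 1 and eps = 0.1 pushes it to ~1e14
# records -- far beyond any realistic dataset.
```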

7 A Different Two-Step Approach
X_1, …, X_n drawn by random sampling from a population P → M (DP mechanism) → Y, a "summary" of the dataset → T (post-processing) → Z; "utility" U(P, Z).
A naïve application runs into similar difficulties as before [Fienberg-Rinaldo-Yang `11, Karwa-Slavkovic `12]. Approaches for addressing these problems appear in [Vu-Slavkovic `09, McSherry-Williams `10, Karwa-Slavkovic `12, …].

8 Take-Away Messages
1. Study the overall design problem: X_1, X_2, …, X_n drawn by random sampling from a population P → M → output Y; "utility" U(P, Y).
- M differentially private [worst-case]
- Utility maximized [average-case over X = (X_1, …, X_n) and Y; worst-case (frequentist) or average-case (Bayesian) over P]
[Kasiviswanathan et al. `08, Dwork-Lei `09, Smith `10, Wasserman-Zhou `10, Hall-Rinaldo-Wasserman `13, Duchi-Jordan-Wainwright `12 & `13, Barber-Duchi `14, …]

9 Take-Away Messages
2. Ensure "soundness": prevent incorrect conclusions even at small n. It is OK to declare "failure".
[Vu-Slavkovic `09, McSherry-Williams `10, Karwa-Slavkovic `12, …]

10 Example 1: Confidence Intervals [Karwa-Vadhan `17]
X_1, …, X_n drawn from P = N(μ, σ²), with μ ∈ [−R, R] and σ ∈ [σ_min, σ_max]; M outputs an interval I ⊆ ℝ.
Requirements:
- Privacy: M is ε-differentially private.
- Coverage ("soundness"): ∀ n, μ, σ, ε: Pr[μ ∈ I] ≥ .95.
Goal:
- Length ("utility"): minimize E[|I|].

11 Example 1: Confidence Intervals [Karwa-Vadhan `17]
X_1, …, X_n drawn from P = N(μ, σ²), with μ ∈ [−R, R] and σ known; M outputs an interval I ⊆ ℝ.
Upper Bound: there is an ε-DP algorithm M achieving
E[|I|] ≤ 2z_.975 ⋅ σ/√n + (σ/ε) ⋅ O(1/n)
(the first term is the non-private length), provided that n ≳ (1/ε) ⋅ log(R/σ).

12 Example 1: Confidence Intervals [Karwa-Vadhan `17]
X_1, …, X_n drawn from P = N(μ, σ²), with μ ∈ [−R, R] and σ known; M outputs an interval I ⊆ ℝ.
Upper Bound: there is an ε-DP algorithm M achieving E[|I|] ≤ 2z_.975 ⋅ σ/√n + (σ/ε) ⋅ O(1/n), provided that n ≳ (1/ε) ⋅ log(R/σ).
Lower Bound: must have either E[|I|] ≥ R/2, or both E[|I|] ≥ σ/(εn) and n ≳ (1/ε) ⋅ log(R/σ).

13 Example 2: Hypothesis Testing [Vu-Slavkovic `09, Uhler-Slavkovic-Fienberg `13, Yu-Fienberg-Slavkovic-Uhler `14, Gaboardi-Lim-Rogers-Vadhan `16, Wang-Lee-Kifer `16, Kifer-Rogers `16]
X_1, …, X_n drawn from a distribution P on X; M outputs 0 or 1.
Requirements:
- Privacy: M is ε-differentially private.
- Significance (Type I error): for all n, ε: if P = H_0, then Pr[M(X) = 0] ≥ .95.
Goal:
- Power (Type II error): if P is "far" from H_0, then Pr[M(X) = 1] is "large".

14 Example 2: Hypothesis Testing [Cai-Daskalakis-Kamath `17]
X_1, …, X_n drawn from a distribution P on X; M outputs 0 or 1.
Requirements:
- Privacy: M is ε-differentially private.
- Significance (Type I error): for all n, ε, γ: if P = H_0, then Pr[M(X) = 0] ≥ .95.
Goal:
- Power (Type II error): if d_TV(P, H_0) ≥ γ, then Pr[M(X) = 1] ≥ .95.

15 Challenges for Future Research I
- Study more sophisticated inference problems, e.g. confidence intervals for multivariate Gaussians with unknown covariance matrix (related to least-squares regression [Sheffet `16]).
- What asymptotics are acceptable? Much of statistical inference relies on calculating asymptotic distributions (e.g. via the CLT), often reliable even when n is small. [Kifer-Rogers `17]: assume that ε = Ω(1/√n).

16 Challenges for Future Research II
- Can we rigorously analyze the effect of privacy even when non-private algorithms don't have rigorous analyses? E.g. in hypothesis testing, privacy needs at most an O(1/ε) blow-up in sample size… but this is suboptimal [Cai-Daskalakis-Kamath `17].
- Lower bounds: most existing techniques prove lower bounds on some kind of inference problem. We should explicitly state these!
- Does privacy have an inherent cost even when n is large? E.g. must DP confidence intervals have length E[|I|] ≥ 2z_.975 ⋅ σ/√n + Ω(σ/(εn))?

