
1 What Can We Learn Privately?
Sofya Raskhodnikova, Penn State University
Joint work with Shiva Kasiviswanathan (Los Alamos), Homin Lee (UT Austin), Kobbi Nissim (Ben Gurion), and Adam Smith (Penn State)
To appear in the SICOMP special issue for FOCS '08

2 Private Learning
Goal: machine learning algorithms that protect the privacy of individual examples (people, organizations, ...)
Desiderata
‒ Privacy: worst-case guarantee (differential privacy)
‒ Learning: distributional guarantee (e.g., PAC learning)
This work
‒ Characterizes which classification problems are learnable privately
‒ Examines the power of popular models for private analysis

3 What Can We Compute Privately?
[Figure: a user asks the curator A "Tell me f(x)" and receives f(x) + noise.]
Prior work: function evaluation [DiNi, DwNi, BDMN, EGS, DMNS, ...]
‒ Statistical Query (SQ) learning [Blum Dwork McSherry Nissim 05]
‒ Learning mixtures of Gaussians [Nissim Raskhodnikova Smith 08]
Mechanism design [McSherry Talwar 07]
This work: PAC learning (in general, not captured by function evaluation)
Some subsequent work:
‒ Learning [Chaudhuri Monteleoni 08, McSherry Williams 09, Beimel Kasiviswanathan Nissim 10, Sarwate Chaudhuri Monteleoni]
‒ Statistical inference [Smith 08, Dwork Lei 09, Wasserman Zhou 09]
‒ Synthetic data [Machanavajjhala Kifer Abowd Gehrke Vilhuber 08, Blum Ligett Roth 08, Dwork Naor Reingold Rothblum Vadhan 09, Roth Roughgarden 10]
‒ Combinatorial optimization [Gupta Ligett McSherry Roth Talwar 10]

4 Our Results 1: What Is Learnable Privately
PAC* = PAC learnable with a polynomial number of samples, not necessarily efficiently.
[Venn diagram: SQ (containing halfplanes, conjunctions, ...) is strictly inside PAC-learnable; parity is PAC-learnable but not SQ; everything PAC* is privately PAC-learnable, i.e., PAC* = Private PAC*.]

5 Basic Privacy Models
[Diagram: three architectures. Centralized: individuals x_1, ..., x_n send raw data to a trusted curator A. Local noninteractive: each individual x_i applies a randomizer R_i and sends only R_i(x_i). Local interactive: the randomized responses may depend on earlier rounds.]
The local model covers most work in data mining: "randomized response", "input perturbation", "Post Randomization Method" (PRAM), "Framework for High-Accuracy Strict-Privacy Preserving Mining" (FRAPP) [W65, AS00, AA01, EGS03, HH02, MS06]
Advantages:
‒ private data never leaves a person's hands
‒ easy distribution of extracted information (e.g., CD, website)
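The "randomized response" technique mentioned above can be sketched in a few lines of Python (an illustrative sketch, not code from the talk): each participant tells the truth with probability e^ε/(1+e^ε), which makes the likelihood ratio between any two inputs exactly e^ε, matching the differential-privacy bound.

```python
import math
import random

def randomized_response(bit, epsilon):
    """Warner-style randomized response: report `bit` truthfully with
    probability e^eps / (1 + e^eps), otherwise report the flipped bit."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return bit if random.random() < p_truth else 1 - bit

def likelihood_ratio(epsilon):
    """For either output, the ratio of probabilities between inputs 0
    and 1 is p_truth / (1 - p_truth) = e^eps: the eps-DP bound holds
    with equality."""
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    return p_truth / (1.0 - p_truth)
```

The participant runs `randomized_response` on their own bit, so the raw data never leaves their hands.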

6 Our Results 2: Power of Private Models
PAC* = PAC learnable with a polynomial number of samples, not necessarily efficiently.
[Diagram: Centralized = Privately PAC-learnable = PAC*; Local = SQ; Local noninteractive = Nonadaptive SQ. Parity separates Centralized from Local; Masked Parity separates Local from Local noninteractive.]

7 Definition: Differential Privacy [DMNS06]
Intuition: users learn roughly the same thing about me whether or not my data is in the database.
[Diagram: databases x = (x_1, x_2, x_3, ..., x_n) and x' = (x_1, x_2, x'_3, ..., x_n), differing in one entry, are fed to algorithm A, producing outputs A(x) and A(x').]
A randomized algorithm A is ε-differentially private if
‒ for all databases x, x' that differ in one element,
‒ for all sets of answers S,
Pr[A(x) ∈ S] ≤ e^ε · Pr[A(x') ∈ S].

8 Properties of Differential Privacy
Composition: if algorithms A_1 and A_2 are each ε-differentially private, then the algorithm that outputs (A_1(x), A_2(x)) is 2ε-differentially private.
Meaningful in the presence of arbitrary external information.

9 Learning: An Example*
A bank needs to decide which applicants are bad credit risks.
Goal: given a sample of past customers (labeled examples y_i with labels z_i), produce a good prediction rule (hypothesis) for future loan applicants.

% down | Recent delinquency? | High debt? | Mmp/inc | Good Risk?
10 | No | No | 0.32 | Yes
10 | No | Yes | 0.25 | Yes
5 | Yes | No | 0.30 | No
20 | No | No | 0.31 | Yes
5 | No | No | 0.32 | No
10 | Yes | Yes | 0.38 | No

Reasonable hypotheses given this data:
‒ Predict YES iff (NOT Recent Delinquency) AND (% down > 5)
‒ Predict YES iff 100·(Mmp/inc) − (% down) < 25
*Example taken from Blum, FOCS '03 tutorial

10 PAC Learning: The Setting
The algorithm draws independent examples from some distribution P, labeled by some target function c.

11 PAC Learning: The Setting
The algorithm outputs a hypothesis h (a function from points to labels).

12 PAC Learning: The Setting
A hypothesis h is good if it mostly agrees with the target c on a new point drawn from P:
Pr_{y~P}[h(y) ≠ c(y)] ≤ α (accuracy).
Require that h is good with probability at least 1 − β (confidence).

13 PAC Learning Definition [Valiant 84]
A concept class C is a set of functions {c : D → {0,1}} together with their representation.
Definition. Algorithm A PAC learns concept class C if, for all c in C, all distributions P, and all α, β in (0, 1/2):
‒ given poly(1/α, 1/β, size(c)) examples drawn from P and labeled by c,
‒ A outputs, in polynomial time, a good hypothesis* (of accuracy α) with probability ≥ 1 − β.
*The hypothesis must be evaluable in polynomial time and have a representation of polynomial length.

14 Private Learning
Input: a database x = (x_1, x_2, ..., x_n) where x_i = (y_i, z_i), y_i ~ P, and z_i = c(y_i) (z_i is the label of example y_i).
Output: a hypothesis, e.g., "Predict YES iff 100·(Mmp/inc) − (% down) < 25".

% down | Recent delinquency? | High debt? | Mmp/inc | Good Risk?
10 | No | No | 0.32 | Yes
10 | No | Yes | 0.25 | Yes
5 | Yes | No | 0.30 | No
20 | No | No | 0.31 | Yes
25 | No | No | 0.30 | Yes (predicted)

Algorithm A privately PAC learns concept class C if:
‒ Utility: A PAC learns C (an average-case guarantee, over the example distribution)
‒ Privacy: A is ε-differentially private (a worst-case guarantee, for every database)

15 How Can We Design Private Learners?
Previous privacy work focused on function approximation.
First attempt: view the non-private learner as a function to be approximated.
‒ Problem: a "close" hypothesis may mislabel many points.

16 PAC* = Private PAC*
Theorem. Every PAC* learnable concept class can be learned privately, using a polynomial number of samples.
Proof: Adapt the exponential mechanism [MT07]:
‒ score(x, h) = number of examples in x correctly classified by hypothesis h
‒ output hypothesis h from C with probability ∝ e^{ε·score(x,h)}
‒ (may take exponential time)
Privacy: changing one example changes each score by at most 1 (e.g., score(x, h) = 4 may become score(x', h) = 3), so for any hypothesis h,
Pr[h is output on input x] / Pr[h is output on input x'] = (e^{ε·score(x,h)} / Σ_h e^{ε·score(x,h)}) · (Σ_h e^{ε·score(x',h)} / e^{ε·score(x',h)}) ≤ e^{2ε}.
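As an illustration, here is a minimal Python sketch of this learner (function names are mine; hypotheses are passed as Python callables). Following the slide, the weight of h is e^{ε·score(x,h)}, which by the calculation above gives 2ε-differential privacy; scores are shifted by their maximum for numerical stability, which does not change the sampling distribution.

```python
import math
import random

def private_learner(examples, hypotheses, eps):
    """Exponential mechanism: sample h with Pr ∝ exp(eps * score(x, h)),
    where score(x, h) = #{(y, z) in x : h(y) == z}.  Changing one
    example shifts each score by at most 1, so the output distribution
    changes by a factor of at most e^(2*eps)."""
    def score(h):
        return sum(1 for y, z in examples if h(y) == z)
    scores = [score(h) for h in hypotheses]
    m = max(scores)  # shift for stability; cancels in the normalization
    weights = [math.exp(eps * (s - m)) for s in scores]
    r = random.uniform(0, sum(weights))
    acc = 0.0
    for h, w in zip(hypotheses, weights):
        acc += w
        if r <= acc:
            return h
    return hypotheses[-1]
```

As the slide notes, enumerating all of C this way may take exponential time; the sketch is only sample-efficient.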

17 PAC* = Private PAC*
Theorem. Every PAC* learnable concept class can be learned privately, using a polynomial number of samples.
Proof: score(x, h) = number of examples in x correctly classified by h; output h from C with probability ∝ e^{ε·score(x,h)}.
Utility (learning): a private version of "Occam's razor".
‒ The best hypothesis correctly labels all examples: Pr[h] ∝ e^{εn}.
‒ Bad hypotheses mislabel > α fraction of the examples: Pr[h] ∝ e^{ε(1−α)n}.
Pr[output h is bad] ≤ (e^{ε(1−α)n} · #bad hypotheses) / e^{εn} ≤ |C| · e^{−εαn} ≤ β.
It suffices to ensure n ≥ log|C| · poly(1/ε, 1/α, 1/β); then with probability ≥ 1 − β, the output h labels ≥ 1 − α fraction of the examples correctly.
"Occam's razor": if n ≥ log|C| · poly(1/α, 1/β), then a hypothesis that does well on the examples also does well on the distribution P.

18 Our Results: What Is Learnable Privately
[Venn diagram, as before: SQ (halfplanes, conjunctions, ...) is strictly inside PAC-learnable, with parity in PAC but not SQ [BDMN05]; PAC* = Private PAC*.]
Note: parity with noise is thought to be hard, so Private PAC ≠ learnable with noise.

19 Efficient Learner for Parity
Parity problems:
‒ Domain: D = {0,1}^d
‒ Concepts: c_r(y) = ⟨r, y⟩ (mod 2)
Input: x = ((y_1, c_r(y_1)), ..., (y_n, c_r(y_n)))
Each example (y_i, c_r(y_i)) is a linear constraint on r:
‒ (1101, 1) translates to r_1 + r_2 + r_4 = 1 (mod 2)
Non-private learning algorithm: find r by solving the set of linear equations over GF(2) imposed by the input x.
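The non-private learner is just Gaussian elimination over GF(2). A runnable sketch (illustrative code, names are mine):

```python
def solve_parity(equations, d):
    """Gaussian elimination over GF(2).  Each equation is (y, z): y is
    a list of d bits encoding the constraint <r, y> = z (mod 2).
    Returns one feasible r (free variables set to 0), or None if the
    system is inconsistent."""
    rows = [y[:] + [z] for y, z in equations]  # augmented matrix
    pivot_cols, rank = [], 0
    for col in range(d):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue  # free variable
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[rank])]
        pivot_cols.append(col)
        rank += 1
    if any(row[-1] for row in rows[rank:]):  # a row reading 0 = 1
        return None
    r = [0] * d
    for i, col in enumerate(pivot_cols):
        r[col] = rows[i][-1]
    return r
```

For example, the constraint (1101, 1) above is passed as `([1, 1, 0, 1], 1)`.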

20 The Effect of a Single Example
Let V_i be the space of feasible solutions for the set of equations imposed by (y_1, c_r(y_1)), ..., (y_i, c_r(y_i)).
Add a fresh example (y_{i+1}, c_r(y_{i+1})) and consider the new solution space V_{i+1}. Then either
‒ |V_{i+1}| ≥ |V_i| / 2, or
‒ |V_{i+1}| = 0 (the system becomes inconsistent).
[Example: V_i = {100, 000, 001, 101}; the new constraint "third coordinate is 0" keeps {100, 000}; the new constraint "second coordinate is 1" makes the system inconsistent.]
The solution space changes drastically only when the non-private learner fails.

21 Private Learner for Parity
Algorithm A:
1. With probability 1/2, output "fail". (This smooths out extreme jumps in V_i.)
2. Construct x^s by picking each example from x independently with probability ε.
3. Solve the system of equations imposed by the examples in x^s; let V be the set of feasible solutions.
4. If V = Ø, output "fail". Otherwise, pick r in V uniformly at random and output c_r.
Lemma [utility]. Algorithm A PAC-learns parity with n = O((non-private sample size)/ε).
Proof idea: conditioned on passing step 1, we get the same utility as with εn examples. By repeating a few times, the algorithm passes step 1 with high probability.
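A runnable sketch of Algorithm A (my own illustrative code, with the GF(2) elimination inlined so the sample is self-contained; a uniformly random element of V is obtained by assigning the free variables random bits and back-substituting):

```python
import random

def private_parity_learner(x, d, eps):
    """Algorithm A from the slide: fail with probability 1/2, subsample
    each example with probability eps, solve over GF(2), and output a
    uniformly random feasible r, or "fail" if the system is inconsistent."""
    if random.random() < 0.5:                           # step 1
        return "fail"
    sample = [ex for ex in x if random.random() < eps]  # step 2
    rows = [y[:] + [z] for y, z in sample]              # step 3
    pivot_cols, rank = [], 0
    for col in range(d):
        p = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if p is None:
            continue
        rows[rank], rows[p] = rows[p], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[rank])]
        pivot_cols.append(col)
        rank += 1
    if any(row[-1] for row in rows[rank:]):             # step 4: V = Ø
        return "fail"
    free_cols = [c for c in range(d) if c not in pivot_cols]
    r = [0] * d
    for c in free_cols:                 # uniform over V: random free bits
        r[c] = random.randint(0, 1)
    for i, c in enumerate(pivot_cols):  # back-substitute the pivots
        r[c] = rows[i][-1]
        for fc in free_cols:
            r[c] ^= rows[i][fc] & r[fc]
    return r
```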

22 Private Learner for Parity
Lemma. Algorithm A is 4ε-differentially private.
Proof: For inputs x and x' that differ in position i, show that the probability of every outcome goes up or down by a factor of at most 1 + 4ε. The changed input x_i enters the sample S with probability ε.
For the outcome "fail": the probability of failing changes by at most ε/2, so
Pr[A(x) fails] / Pr[A(x') fails] ≤ (Pr[A(x') fails] + ε/2) / Pr[A(x') fails] ≤ 1 + ε,
since Pr[A(x') fails] ≥ 1/2 by step 1.
For a hypothesis r:
Pr[A(x) = r] / Pr[A(x') = r] = (ε·Pr[A(x) = r | i ∈ S] + (1−ε)·Pr[A(x) = r | i ∉ S]) / (ε·Pr[A(x') = r | i ∈ S] + (1−ε)·Pr[A(x') = r | i ∉ S]).
When i ∉ S, the runs on x and x' are identical, so Pr[A(x) = r | i ∉ S] = Pr[A(x') = r | i ∉ S]. Since only one example is changed, a single extra constraint at most halves the solution space, so Pr[A(x) = r | i ∈ S] ≤ 2·Pr[A(x) = r | i ∉ S]; and Pr[A(x') = r | i ∈ S] ≥ 0. Hence the ratio is at most 2ε/(1−ε) + 1 ≤ 4ε + 1 for ε ≤ 1/2.

23 Our Results: What Is Learnable Privately
[Venn diagram repeated: SQ (halfplanes, conjunctions, ...) is strictly inside PAC-learnable, with parity in PAC but not SQ [BDMN05]; PAC* = Private PAC*.]
Note: parity with noise is thought to be hard, so Private PAC ≠ learnable with noise.

24 Our Results 2: Power of Private Models
PAC* = PAC learnable ignoring computational efficiency.
[Diagram: Centralized = PAC*; Local = SQ; Local noninteractive = Nonadaptive SQ.]

25 Reminder: Local Privacy-Preserving Protocols
[Diagram: each individual x_i sends R_i(x_i) to the user; in interactive protocols the randomizers may depend on earlier answers, in non-interactive protocols there is a single round.]

26 Statistical Query (SQ) Learning [Kearns 93]
Same guarantees as the PAC model, but the algorithm no longer has access to individual examples. Instead, it queries an SQ oracle:
‒ query: a function g : D × {0,1} → {0,1} that can be evaluated in poly time
‒ answer: Pr_{y~P}[g(y, c(y)) = 1] ± τ, the probability that a random labeled example (~P) satisfies g, up to tolerance τ > 1/poly(...)
The algorithm must run in poly time.
Theorem [BDMN05]. Any SQ algorithm can be simulated by a private algorithm.
Proof [DMNS06]: perturb the query answers using Laplace noise.
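The [DMNS06]-style simulation can be sketched as follows (illustrative Python, names are mine): the empirical average of g over the database has sensitivity 1/n, so adding Laplace noise of scale 1/(εn) makes each query answer ε-differentially private.

```python
import random

def answer_sq_privately(examples, g, eps):
    """Answer the statistical query g with the empirical average over
    the labeled examples plus Laplace(1/(eps * n)) noise.  A Laplace
    variate is the difference of two i.i.d. exponential variates."""
    n = len(examples)
    avg = sum(g(y, z) for y, z in examples) / n
    noise = random.expovariate(eps * n) - random.expovariate(eps * n)
    return avg + noise
```

With n in the thousands and constant ε, the noise is far below any tolerance τ > 1/poly, so the SQ algorithm's guarantees survive.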

27 (Non-interactive) Local = (Non-adaptive) SQ
Theorem. Any (non-adaptive) SQ algorithm can be simulated by a (non-interactive) local algorithm.
Local protocol for an SQ query:
‒ for each i, compute a noisy bit R(x_i)
‒ the sum of the noisy bits allows the user to approximate the answer
Participants can compute the noisy bits on their own, and R (applied by each participant) is differentially private.
If all SQ queries are known in advance (non-adaptive), the protocol is non-interactive.
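A sketch of this protocol for a single, known-in-advance query (my own illustrative code): each participant pushes their labeled example through g and reports the resulting bit via ε-DP randomized response; the user debiases the average of the reports.

```python
import math
import random

def report(example, g, eps):
    """Run locally by each participant: randomized response on g(y, z)."""
    bit = g(*example)
    p = math.exp(eps) / (1.0 + math.exp(eps))  # truth probability
    return bit if random.random() < p else 1 - bit

def estimate(reports, eps):
    """Run by the user: invert E[report] = (1 - p) + (2p - 1) * answer
    to recover an unbiased estimate of the SQ answer."""
    p = math.exp(eps) / (1.0 + math.exp(eps))
    avg = sum(reports) / len(reports)
    return (avg - (1.0 - p)) / (2.0 * p - 1.0)
```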

28 (Non-interactive) Local = (Non-adaptive) SQ
Theorem. Any (non-interactive) local algorithm can be simulated by a (non-adaptive) SQ algorithm.
Technique: rejection sampling.
Proof idea [non-interactive case]: to simulate a randomizer R : D → W, we need to output w ∈ W with probability p(w) = Pr_{z~P}[R(z) = w]. Let q(w) = Pr[R(0) = w] (computable without the data; by ε-differential privacy of R, p(w) ≤ e^ε · q(w)).
1. Sample w from q.
2. With probability p(w) / (q(w) · e^ε), output w.
3. With the remaining probability, repeat from step 1.
Use SQ queries to estimate p(w).
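Steps 1–3 are standard rejection sampling. A sketch (illustrative code; here p and q are given directly as probability functions, whereas the proof estimates p via SQ queries; ε-DP of R guarantees p(w) ≤ e^ε·q(w), so the acceptance probability never exceeds 1):

```python
import math
import random

def simulate_randomizer(p, q, sample_q, eps):
    """Output w with probability p(w), given only samples from q and
    the guarantee p(w) <= e^eps * q(w).  Each round accepts with
    probability p(w) / (e^eps * q(w)); the overall acceptance rate is
    e^(-eps), so e^eps rounds are needed in expectation."""
    while True:
        w = sample_q()                                     # step 1
        if random.random() < p(w) / (math.exp(eps) * q(w)):
            return w                                       # step 2
        # step 3: otherwise repeat
```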

29 Our Results 2: Power of Private Models
PAC* = PAC learnable ignoring computational efficiency.
[Diagram repeated: Centralized = PAC*; Local = SQ; Local noninteractive = Nonadaptive SQ. Parity separates Centralized from Local; Masked Parity separates Local from Local noninteractive.]

30 Non-interactive Local ⊊ Interactive Local
Masked parity problems:
‒ Concepts c_{r,a} : {0,1}^{d + log d + 1} → {+1, −1}, indexed by r ∈ {0,1}^d and a ∈ {0,1}:
c_{r,a}(y, i, b) = (−1)^{⟨r,y⟩ (mod 2) + a} if b = 0, and (−1)^{r_i} if b = 1.
(Adaptive) SQ learner: two rounds of communication.
Non-adaptive SQ learner: needs ≥ 2^{d−1} samples.
‒ The proof uses a Fourier-analytic argument, similar to the proof that parity is not in SQ.

31 Summary
PAC* is privately learnable
‒ non-efficient learners
Known problems in PAC are efficiently privately learnable
‒ parity
‒ SQ [BDMN05]
‒ what else is in PAC?
Equivalence of the local model and SQ:
‒ Local = SQ
‒ Local non-interactive = non-adaptive SQ
Interactivity helps in the local model:
‒ Local non-interactive ⊊ Local
‒ SQ non-adaptive ⊊ SQ
Open questions:
‒ separate efficient learning from efficient private learning
‒ better private algorithms for SQ problems
‒ other learning models

32 Ad for [Beimel Kasiviswanathan Nissim 10]
Sample complexity of learning: private vs. non-private.
‒ Non-private learning: Θ(VCDIM(C)) samples
‒ Private learning: O(log |C|) samples, for all concept classes C
Question: is this gap essential?
Answer: yes, for proper private learning. There is a concept class C with VCDIM(C) = 1 such that:
‒ proper non-private learning of C: O(1) samples
‒ proper private learning of C: Θ(log |C|) samples
‒ improper private learning of C: O(1) samples
Note: this separates proper and improper private learning.

