1
Boosting and Differential Privacy
Cynthia Dwork, Microsoft Research
2
The Power of Small, Private, Miracles
Joint work with Guy Rothblum and Salil Vadhan
3
Boosting [Schapire, 1989]
A general method for improving the accuracy of any given learning algorithm.
Example: learning to recognize spam e-mail. The "base learner" receives labeled examples and outputs a heuristic; labels are {+1, −1}.
Run it many times; combine the resulting heuristics.
4
[Diagram: the boosting loop. S, a set of labeled examples drawn from D, goes to the base learner, which outputs a hypothesis A that does well on ½ + η of D; D is updated; terminate? If not, repeat; at the end, combine A_1, A_2, ….]
5
[Same diagram, with the question "How?" attached to the update-D step.]
6
Boosting for People [variant of AdaBoost, FS95]
The initial distribution D is uniform on database rows.
S is always a set of k elements drawn according to D, i.e., S ∼ D^k.
The combiner is majority vote.
Weight update: if an example is correctly classified by the current A, decrease its weight by a factor of e ("subtract 1 from the exponent"); if incorrectly classified, increase its weight by a factor of e ("add 1 to the exponent"). Re-normalize to obtain the updated D. A minimal sketch follows.
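The loop just described fits in a few lines. This is a non-private sketch under assumptions of my own: `base_learner` is a hypothetical stand-in that maps a sample to a hypothesis A with A(i) ∈ {−1, +1}, and `labels` holds the ±1 labels.

```python
import numpy as np

def boost_for_people(labels, base_learner, rounds, k, rng):
    """Sketch of the boosting-for-people loop (non-private)."""
    m = len(labels)
    D = np.full(m, 1.0 / m)              # initial distribution: uniform on rows
    hypotheses = []
    for _ in range(rounds):
        S = rng.choice(m, size=k, p=D)   # S: k elements drawn from D (S ~ D^k)
        A = base_learner(S)
        hypotheses.append(A)
        preds = np.array([A(i) for i in range(m)], dtype=float)
        # correct (labels * preds = +1): weight shrinks by a factor of e;
        # incorrect (labels * preds = -1): weight grows by a factor of e
        D = D * np.exp(-labels * preds)
        D /= D.sum()                     # re-normalize to obtain the updated D
    # combiner: majority vote over A_1, A_2, ...
    return lambda i: np.sign(sum(A(i) for A in hypotheses))
```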
7
Why Does it Work?
Update rule: multiply the weight by exp(−c_t(i)), where c_t(i) = +1 if A_t(i) is correct and −1 otherwise:
D_{t+1}(i) = D_t(i) · exp(−c_t(i)) / N_t
Multiplying through by the normalizers and unrolling:
N_t · D_{t+1}(i) = D_t(i) · exp(−c_t(i))
N_t · N_{t−1} ⋯ N_1 · D_{t+1}(i) = D_1(i) · exp(−Σ_s c_s(i)) = (1/m) · exp(−Σ_s c_s(i))
Summing over i (the D_{t+1}(i) sum to 1):
Π_s N_s = (1/m) · Σ_i exp(−Σ_s c_s(i))
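A quick numerical sanity check of the unrolled identity, on made-up ±1 values of c_t(i):

```python
import numpy as np

rng = np.random.default_rng(0)
m, T = 8, 5
c = rng.choice([-1.0, 1.0], size=(T, m))  # c_t(i) = +1 if A_t(i) correct, else -1

D = np.full(m, 1.0 / m)                   # D_1: uniform
prod_N = 1.0
for t in range(T):
    w = D * np.exp(-c[t])                 # unnormalized round-t weights
    N = w.sum()                           # normalizer N_t
    D = w / N
    prod_N *= N

# unrolled identity: prod_s N_s = (1/m) * sum_i exp(-sum_s c_s(i))
assert np.isclose(prod_N, np.mean(np.exp(-c.sum(axis=0))))
```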
11
Π_s N_s = (1/m) · Σ_i exp(−Σ_s c_s(i))
Π_s N_s shrinks exponentially (at a rate that depends on η): the normalizers are sums of weights, and at the start of each round the weights sum to 1; because the base learner is good, "more" weight decreases than increases: more weight has its exponent shrink than otherwise.
Moreover, Σ_i exp(−Σ_s c_s(i)) = Σ_i exp(−y_i · Σ_s A_s(i)).
This is an upper bound on the number of incorrectly classified examples: if y_i ≠ sign[Σ_s A_s(i)] (= majority{A_1(i), A_2(i), …}), then y_i · Σ_s A_s(i) ≤ 0, so exp(−y_i · Σ_s A_s(i)) ≥ 1.
Therefore the number of incorrectly classified examples is at most m · Π_s N_s, which is exponentially small in t.
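Checking the counting step numerically: the inequality holds for any ±1 hypothesis outputs, so random ones suffice.

```python
import numpy as np

rng = np.random.default_rng(1)
m, T = 8, 5
y = rng.choice([-1.0, 1.0], size=m)           # true labels
A = rng.choice([-1.0, 1.0], size=(T, m))      # A_t(i): hypothesis outputs

margin = y * A.sum(axis=0)                    # y_i * sum_s A_s(i)
errors = np.sum(np.sign(A.sum(axis=0)) != y)  # majority vote wrong
# every misclassified i has margin <= 0, hence exp(-margin) >= 1
assert errors <= np.exp(-margin).sum()
```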
12
[Diagram: the boosting-for-people loop. Initially D is uniform on DB rows; the base learner receives S, labeled examples drawn from D, and outputs A, which does well on ½ + η of D; weights are updated by −1/+1 in the exponent and re-normalized; terminate? The combiner is majority vote. Annotation: Privacy?]
13
Private Boosting for People
The base learner must be differentially private.
The main concern is rows whose weight grows too large; this affects the termination test, the sampling, and the re-normalization.
The problem is similar to one arising when learning in the presence of noise, and the solution is similar too: smooth boosting.
Remove (give up on) elements that become too heavy. Carefully! Removing one heavy element and re-normalizing may cause another element to become heavy…
Ensure this is rare (else we give up on too many elements and hurt accuracy). A sketch of the capping step follows.
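A sketch of the capping step, assuming a weight vector and a boolean mask of still-active elements; the cap value and function name are hypothetical:

```python
import numpy as np

def cap_heavy_elements(D, active, cap):
    """Retire elements whose weight exceeds `cap`, repeating because
    re-normalizing can push new elements over the threshold."""
    D = D.copy()
    active = active.copy()
    while True:
        heavy = active & (D > cap)
        if not heavy.any():
            return D, active
        active &= ~heavy
        D[~active] = 0.0            # give up on the heavy elements
        s = D.sum()
        if s == 0.0:                # gave up on everything; caller handles this
            return D, active
        D /= s                      # re-normalize over surviving elements
```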
14
Iterative Smoothing Not today.
15
Boosting for Queries?
Goal: given a database DB and a set Q of low-sensitivity queries, produce an object O (e.g., a synthetic database) such that ∀ q ∈ Q one can extract from O an approximation of q(DB).
Assume the existence of an (ε_0, δ_0)-dp base learner producing an object O that does well on more than half of D:
Pr_{q∼D}[ |q(O) − q(DB)| ≤ λ ] ≥ ½ + η
16
[Diagram: the same loop for queries. Initially D is uniform on Q; the base learner receives S, labeled examples drawn from D, and outputs A, which does well on ½ + η of D; D is updated and A_1, A_2, … are combined.]
17
[Diagram: the boosting-for-queries loop. Initially D is uniform on Q; weights are updated by −1/+1 in the exponent and re-normalized; terminate? The combiner is the median. Annotation: Privacy? An individual can affect many queries at once!]
18
Privacy is Problematic
In smooth boosting for people, at each round an individual has only a small effect on the probability distribution.
In boosting for queries, an individual can affect the quality of q(A_t) simultaneously for many q.
As time progresses, the distributions on neighboring databases could evolve completely differently, yielding very different A_t's.
This is slightly ameliorated by sampling (if only a few samples are taken, maybe the q's on the edge can be avoided?).
How can we make the re-weighting less sensitive?
19
Private Boosting for Queries [variant of AdaBoost]
The initial distribution D is uniform on the queries in Q.
S is always a set of k queries drawn according to D, so S ∈ Q^k.
The combiner is the median [viz. Freund92].
Weight update for queries: if q is very well approximated by A_t (error at most λ), decrease its weight by a factor of e ("−1"); if very poorly approximated (error at least λ + μ), increase its weight by a factor of e ("+1"); in between, scale the exponent linearly with the distance from the midpoint (down or up):
2 · ( |q(DB) − q(A_t)| − (λ + μ/2) ) / μ
One row changes |q(DB) − q(A_t)| by at most the query sensitivity ρ, so the exponent has sensitivity 2ρ/μ. See the sketch below.
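A minimal sketch of this update rule; the function name and the use of clipping to express the piecewise rule are my own choices:

```python
import numpy as np

def query_exponent(err, lam, mu):
    """Exponent for reweighting a query with |q(DB) - q(A_t)| = err:
    -1 if err <= lam, +1 if err >= lam + mu, linear in between."""
    u = 2.0 * (err - (lam + mu / 2.0)) / mu
    return np.clip(u, -1.0, 1.0)

# usage: multiply each query's weight by e^{exponent}, then re-normalize:
#   D = D * np.exp(query_exponent(errs, lam, mu)); D /= D.sum()
# one row moves err by at most rho, so the exponent moves by at most 2*rho/mu
```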
20
Theorem (minus some parameters)
Let all q ∈ Q have sensitivity ≤ ρ. Run the query-boosting algorithm for T = log|Q|/η² rounds with
μ = ( (log|Q|/η²)² · ρ · √k ) / ε
The resulting object O is ((ε + T·ε_0), T·δ_0)-dp and, whp, gives (λ + μ)-accurate answers to all the queries in Q.
Better privacy (smaller ε) gives worse utility (larger μ); a better base learner (smaller k, larger η) helps.
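To get a feel for the trade-off, here is a throwaway instantiation of the formulas as reconstructed above; every parameter value is invented for illustration:

```python
import numpy as np

# illustrative only: all parameter values below are made up
Q_size, eta, rho, k, eps = 2**20, 0.25, 1.0, 1000, 1.0
T = np.log(Q_size) / eta**2                              # number of rounds
mu = ((np.log(Q_size) / eta**2) ** 2) * rho * np.sqrt(k) / eps
print(f"T = {T:.0f} rounds; answers accurate to lambda + {mu:.0f}")
```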
21
Proving Privacy
Technique #1: Pay Your Debt and Move On.
Fix A_1, A_2, …, A_t and record the D vs. D' confidence gain ("pay your debt"); then focus on the gain from the selection of S ∈ Q^k in round t+1 ("move on"). That selection is based on the distributions D_{t+1} and D'_{t+1} determined in round t; call them D and D'.
Technique #2: Evolution of Confidence [DiDwN03] ("delay payment until the final reckoning").
Choose q_1, q_2, …, in turn. For each q ∈ Q, bound the log-ratio |ln(D[q]/D'[q])| by A and its expectation |E_{q∼D} ln(D[q]/D'[q])| by B. Then
Pr_{q_1,…,q_k}[ |Σ_i ln(D[q_i]/D'[q_i])| > z·√k·(A + B) + k·B ] < exp(−z²/2)
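The tail bound on the last line is simple to evaluate; a small helper (names mine):

```python
import numpy as np

def confidence_tail(A, B, k, z):
    """Evolution-of-confidence bound: the total log-ratio over k sampled
    queries exceeds z*sqrt(k)*(A+B) + k*B with probability < exp(-z^2/2)."""
    return z * np.sqrt(k) * (A + B) + k * B, np.exp(-z ** 2 / 2.0)

# e.g. A = 0.1 and B = 2*A**2 = 0.02 (next slide), k = 100, z = 3:
# threshold = 3*10*0.12 + 100*0.02 = 5.6, failure prob < exp(-4.5)
```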
22
Bounding E_{q∼D} ln( D[q] / D'[q] )
Assume D and D' are A-dp with respect to one another, for A < 1. Then 0 ≤ E_{q∼D} ln[D(q)/D'(q)] ≤ 2A² (that is, B ≤ 2A²).
KL(D||D') = Σ_q D(q) · ln[D(q)/D'(q)]; this is always ≥ 0. So
KL(D||D') ≤ KL(D||D') + KL(D'||D)
= Σ_q D(q) · ( ln[D(q)/D'(q)] + ln[D'(q)/D(q)] ) + (D'(q) − D(q)) · ln[D'(q)/D(q)]
≤ Σ_q 0 + |D'(q) − D(q)| · A
= A · Σ_q [ max(D(q), D'(q)) − min(D(q), D'(q)) ]
≤ A · Σ_q [ e^A · min(D(q), D'(q)) − min(D(q), D'(q)) ]
= A · (e^A − 1) · Σ_q min(D(q), D'(q))
≤ 2A² when A < 1 (since Σ_q min(D(q), D'(q)) ≤ 1 and e^A − 1 ≤ 2A for A < 1).
Compare DiDwN03.
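A numerical spot-check of the lemma on a pair of randomly perturbed distributions:

```python
import numpy as np

rng = np.random.default_rng(2)
base = rng.random(16)
D = base / base.sum()
Dp = D * np.exp(rng.uniform(-0.3, 0.3, size=16))  # multiplicatively perturb
Dp /= Dp.sum()

A = np.max(np.abs(np.log(D / Dp)))  # D, D' are A-dp wrt one another
assert A < 1
kl = np.sum(D * np.log(D / Dp))     # KL(D||D') = E_{q~D} ln(D[q]/D'[q])
assert 0.0 <= kl <= 2 * A ** 2      # the lemma: B <= 2A^2
```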
23
Motivation and Application
Boosting for people: logistic regression on 3000+-dimensional data. A slight twist on CM did pretty well (ε = 1.5); we thought about alternatives.
Boosting for queries: reducing the dependence on the concept class in the work on synthetic databases in DNRRV09 (Salil's talk). We had over-interpreted the polytime Dinur-Nissim-style attacks (we were spoiled): one can't answer cn queries with error o(√n), but
BLR08: cn queries with error O(n^{2/3})
DNRRV09: O(n^{1/2} · |Q|^{o(1)})
Now: O(n^{1/2} · log²|Q|)
The result is more general, but we only know of a base learner for counting queries.
24
[Diagram: the boosting loop once more, as a recap. The base learner receives S, labeled examples drawn from D, and outputs A, which does well on ½ + η of D; D is updated; terminate? Combine A_1, A_2, ….]