1
Generalization bounds for uniformly stable algorithms
Vitaly Feldman (Google Brain), with Jan Vondrak
2
Uniform stability
Uniform stability [Bousquet, Elisseeff 02]:
Domain $Z$ (e.g. $X \times Y$); dataset $S = (z_1, \dots, z_n) \in Z^n$; learning algorithm $A \colon Z^n \to W$; loss function $\ell \colon W \times Z \to \mathbb{R}^+$.
$A$ has uniform stability $\gamma$ w.r.t. $\ell$ if for all neighboring $S, S'$ and all $z \in Z$:
$|\ell(A(S), z) - \ell(A(S'), z)| \le \gamma$
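As a sanity check, the definition can be tested numerically. A minimal sketch in an assumed toy setup (the empirical-mean "algorithm" and clipped absolute loss are illustrative, not from the slides):

```python
# Numerical check of the uniform-stability definition (illustrative toy setup,
# not from the slides): A(S) = empirical mean, loss clipped to [0, 1].

def A(S):
    """Toy 'learning algorithm': output the empirical mean of the sample."""
    return sum(S) / len(S)

def loss(w, z):
    return min(abs(w - z), 1.0)

def stability_gap(S, S_prime, test_points):
    """max_z |l(A(S), z) - l(A(S'), z)| over the given test points z."""
    w, w_prime = A(S), A(S_prime)
    return max(abs(loss(w, z) - loss(w_prime, z)) for z in test_points)

S = [0.1, 0.4, 0.5, 0.9]
S_prime = [0.1, 0.4, 0.5, 0.0]           # neighboring dataset: one point replaced
gamma = stability_gap(S, S_prime, test_points=[0.0, 0.25, 0.5, 0.75, 1.0])
# Replacing one of n points in [0, 1] moves the mean by at most 1/n, and the
# clipped loss is 1-Lipschitz in w, so gamma <= 1/n = 0.25 here.
print(gamma)
```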
3
Generalization error
Probability distribution $P$ over $Z$; $S \sim P^n$.
Population loss: $\mathcal{E}_P[\ell(w)] = \mathbf{E}_{z \sim P}[\ell(w, z)]$
Empirical loss: $\mathcal{E}_S[\ell(w)] = \frac{1}{n} \sum_{i=1}^{n} \ell(w, z_i)$
Generalization error/gap: $\Delta_S \ell(A) = \mathcal{E}_P[\ell(A(S))] - \mathcal{E}_S[\ell(A(S))]$
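The gap can be estimated by Monte Carlo. A minimal sketch under assumed toy choices (uniform distribution on $[0,1]$, squared loss, empirical-mean learner; none of these come from the slides):

```python
import random

# Monte Carlo estimate of the generalization gap Delta_S l(A).
# Toy setup (illustrative, not from the slides): P = uniform on [0, 1],
# squared loss l(w, z) = (w - z)^2, and A(S) = empirical mean of S.

def loss(w, z):
    return (w - z) ** 2

def generalization_gap(S, pop_sample):
    """Population loss minus empirical loss for w = A(S) = mean(S)."""
    w = sum(S) / len(S)
    pop = sum(loss(w, z) for z in pop_sample) / len(pop_sample)
    emp = sum(loss(w, z) for z in S) / len(S)
    return pop - emp

random.seed(0)
S = [random.random() for _ in range(50)]
pop_sample = [random.random() for _ in range(100_000)]  # stand-in for E_{z~P}
g = generalization_gap(S, pop_sample)
print(g)  # small: the empirical mean is a stable learner, so the gap is near 0
```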
4
Stochastic convex optimization
$W = \mathcal{B}_2^d(1) = \{ w : \|w\|_2 \le 1 \}$; for all $z \in Z$, $\ell(w, z)$ is convex and 1-Lipschitz in $w$.
Minimize $\mathcal{E}_P[\ell(w)] = \mathbf{E}_{z \sim P}[\ell(w, z)]$ over $W$; let $L^* = \min_{w \in W} \mathcal{E}_P[\ell(w)]$.
For all $P$, with $A$ being SGD with rate $\eta = 1/\sqrt{n}$:
$\Pr_{S \sim P^n}\!\left[ \mathcal{E}_P[\ell(A(S))] \ge L^* + O\!\left(\sqrt{\log(1/\delta)/n}\right) \right] \le \delta$
Uniform convergence error: $\gtrsim \sqrt{d/n}$, and ERM might have generalization error $\gtrsim \sqrt{d/n}$ [Shalev-Shwartz, Shamir, Srebro, Sridharan '09; Feldman '16]
5
Stable optimization
Strongly convex ERM [BE 02, SSSS 09]:
$A_\lambda(S) = \mathrm{argmin}_{w \in W}\; \mathcal{E}_S[\ell(w)] + \frac{\lambda}{2}\|w\|_2^2$
$A_\lambda$ is $\frac{1}{\lambda n}$-uniformly stable and minimizes $\mathcal{E}_S[\ell(w)]$ within $\frac{\lambda}{2}$.
Gradient descent on smooth losses [Hardt, Recht, Singer 16]:
$A_T(S)$: $T$ steps of GD on $\mathcal{E}_S[\ell(w)]$ with $\eta = 1/\sqrt{T}$
$A_T$ is $\frac{\sqrt{T}}{n}$-uniformly stable and minimizes $\mathcal{E}_S[\ell(w)]$ within $\frac{2}{\sqrt{T}}$.
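The regularized ERM objective can be minimized directly in one dimension. A hedged sketch, assuming absolute loss (1-Lipschitz) and a hypothetical dataset; it compares the observed loss change across one neighboring dataset with the slide's $1/(\lambda n)$ stability figure:

```python
# Strongly convex regularized ERM in 1-D (illustrative): minimize
#   F_S(w) = (1/n) * sum_i |w - z_i| + (lam/2) * w^2
# and compare the loss change across a neighboring dataset with 1/(lam*n).
# The dataset and lam are hypothetical; 1/(lam*n) is the slide's bound.

def erm_reg(S, lam, lo=-2.0, hi=2.0, iters=200):
    """Ternary search for the minimizer; valid since F_S is strictly convex."""
    def F(w):
        return sum(abs(w - z) for z in S) / len(S) + 0.5 * lam * w * w
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if F(m1) < F(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

lam, n = 0.5, 20
S = [i / n for i in range(n)]            # z_i in [0, 1)
S_prime = S[:-1] + [1.0]                 # neighboring dataset: last point replaced
w, w_prime = erm_reg(S, lam), erm_reg(S_prime, lam)
gamma_obs = max(abs(abs(w - z) - abs(w_prime - z)) for z in (0.0, 0.5, 1.0))
print(gamma_obs, "<=", 1 / (lam * n))    # observed change vs. 1/(lam*n)
```

In this instance the regularized minimizer barely moves at all, so the observed change sits far below the bound.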
6
Generalization bounds
For $A, \ell$ with range $[0,1]$ and uniform stability $\gamma \in \left[\frac{1}{n}, 1\right]$:
$\mathbf{E}_{S \sim P^n}[\Delta_S \ell(A)] \le \gamma$ [Rogers, Wagner 78]
$\Pr_{S \sim P^n}\!\left[ \Delta_S \ell(A) \ge \gamma \sqrt{n \log(1/\delta)} \right] \le \delta$ [Bousquet, Elisseeff 02] — vacuous when $\gamma \ge 1/\sqrt{n}$
NEW: $\Pr_{S \sim P^n}\!\left[ \Delta_S \ell(A) \ge \sqrt{\gamma \log(1/\delta)} \right] \le \delta$
7
Comparison
[Plot: generalization error as a function of the stability parameter $\gamma$ at $n = 100$, comparing [BE 02] with this work; $1/\sqrt{n}$ is marked on the axis.]
8
Second moment
NEW (tight): $\mathbf{E}_{S \sim P^n}\!\left[ (\Delta_S \ell(A))^2 \right] \le \gamma^2 + \frac{1}{n}$
Previously [Devroye, Wagner 79; BE 02]: $\mathbf{E}_{S \sim P^n}\!\left[ (\Delta_S \ell(A))^2 \right] \le \gamma + \frac{1}{n}$
By Chebyshev: $\Pr_{S \sim P^n}\!\left[ \Delta_S \ell(A) \ge \frac{\gamma + 1/\sqrt{n}}{\sqrt{\delta}} \right] \le \delta$
[Plot: generalization error vs. $\gamma$, comparing [BE 02] with this work; $1/\sqrt{n}$ is marked on the axis.]
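The Chebyshev step can be made explicit (a sketch that follows directly from the stated second-moment bound; constants are elided):

```latex
\Pr_{S \sim P^n}\!\left[\, |\Delta_S \ell(A)| \ge t \,\right]
  \;\le\; \frac{\mathbf{E}\!\left[(\Delta_S \ell(A))^2\right]}{t^2}
  \;\le\; \frac{\gamma^2 + 1/n}{t^2}
  \;\le\; \frac{(\gamma + 1/\sqrt{n})^2}{t^2}.
```

Setting $t = (\gamma + 1/\sqrt{n})/\sqrt{\delta}$ makes the right-hand side at most $\delta$.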
9
Implications
Differentially private prediction [Dwork, Feldman 18]: $Z = X \times Y$. A randomized algorithm $A(S, x)$ is an $\epsilon$-DP prediction algorithm if for all neighboring $S, S'$ and all $x \in X$:
$D_\infty\!\left( A(S, x) \,\|\, A(S', x) \right) \le \epsilon$
For any loss $L \colon Y \times Y \to [0,1]$, $\mathbf{E}_A[L(A(S, x), y)]$ has uniform stability $e^\epsilon - 1$.
Stronger generalization bounds for DP prediction algorithms:
$\Pr_{S \sim P^n}\!\left[ \mathcal{E}_P[\ell(A(S))] \ge L^* + O\!\left(\frac{1}{\delta^{1/4} \sqrt{n}}\right) \right] \le \delta$ improves to $\Pr_{S \sim P^n}\!\left[ \mathcal{E}_P[\ell(A(S))] \ge L^* + O\!\left(\frac{\sqrt{\log(1/\delta)}}{n^{1/3}}\right) \right] \le \delta$
10
Data-dependent functions
Consider $f \colon Z^n \times Z \to \mathbb{R}$, e.g. $f(S, z) \equiv \ell(A(S), z)$.
$f$ has uniform stability $\gamma$ if for all neighboring $S, S'$ and all $z \in Z$: $|f(S, z) - f(S', z)| \le \gamma$, i.e. $\|f(S, \cdot) - f(S', \cdot)\|_\infty \le \gamma$.
Generalization error/gap: $\Delta_S f = \mathcal{E}_P[f(S)] - \mathcal{E}_S[f(S)]$, where $\mathcal{E}_P[f(S)] = \mathbf{E}_{z \sim P}[f(S, z)]$ and $\mathcal{E}_S[f(S)] = \frac{1}{n} \sum_{i=1}^n f(S, z_i)$.
11
Generalization in expectation
Goal: $\mathbf{E}_{S \sim P^n}\!\left[ \mathcal{E}_S[f(S)] \right] \le \mathbf{E}_{S \sim P^n}\!\left[ \mathcal{E}_P[f(S)] \right] + \gamma$ (the symmetric direction is analogous).
$\mathbf{E}_{S \sim P^n}\!\left[ \mathcal{E}_S[f(S)] \right] = \mathbf{E}_{S \sim P^n,\, i \sim [n]}\!\left[ f(S, z_i) \right]$
Write $S^{i \leftarrow z}$ for $S$ with $z_i$ replaced by $z$. For all $i$ and $z \in Z$, stability gives $f(S, z_i) \le f(S^{i \leftarrow z}, z_i) + \gamma$, hence $f(S, z_i) \le \mathbf{E}_{z \sim P}\!\left[ f(S^{i \leftarrow z}, z_i) \right] + \gamma$. Therefore
$\mathbf{E}_{S \sim P^n,\, i \sim [n]}\!\left[ f(S, z_i) \right] \le \mathbf{E}_{S \sim P^n,\, z \sim P,\, i \sim [n]}\!\left[ f(S^{i \leftarrow z}, z_i) \right] + \gamma = \mathbf{E}_{S \sim P^n,\, z \sim P}\!\left[ f(S, z) \right] + \gamma = \mathbf{E}_{S \sim P^n}\!\left[ \mathcal{E}_P[f(S)] \right] + \gamma,$
where the middle equality holds because $(S^{i \leftarrow z}, z_i)$ and $(S, z)$ are identically distributed: both consist of $n + 1$ i.i.d. samples from $P$.
12
Concentration via McDiarmid
For all neighboring $S, S'$: $|\Delta_S f - \Delta_{S'} f| \le 2\gamma + \frac{1}{n}$, since
1. $\left| \mathbf{E}_{z \sim P}[f(S, z)] - \mathbf{E}_{z \sim P}[f(S', z)] \right| \le \gamma$
2. $\left| \mathcal{E}_S[f(S)] - \mathcal{E}_{S'}[f(S')] \right| \le \left| \mathcal{E}_S[f(S)] - \mathcal{E}_S[f(S')] \right| + \left| \mathcal{E}_S[f(S')] - \mathcal{E}_{S'}[f(S')] \right| \le \gamma + \frac{1}{n}$
McDiarmid: $\Pr_{S \sim P^n}\!\left[ \Delta_S f \ge \mu + \left(2\gamma + \frac{1}{n}\right)\sqrt{n \log(1/\delta)} \right] \le \delta$, where $\mu = \mathbf{E}_{S \sim P^n}[\Delta_S f] \le \gamma$.
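Plugging numbers into the McDiarmid deviation term illustrates why this route is weak in the regime $\gamma \approx 1/\sqrt{n}$ (the numbers below are illustrative):

```python
import math

# Deviation term from McDiarmid with bounded differences c = 2*gamma + 1/n:
#   (2*gamma + 1/n) * sqrt(n * ln(1/delta)).
# Illustrative numbers; the point is that for gamma ~ 1/sqrt(n) the bound
# exceeds 1 and is therefore vacuous for losses with range [0, 1].

def mcdiarmid_deviation(gamma, n, delta):
    c = 2 * gamma + 1 / n
    return c * math.sqrt(n * math.log(1 / delta))

n = 10_000
gamma = 1 / math.sqrt(n)                 # stability parameter 1/sqrt(n)
dev = mcdiarmid_deviation(gamma, n, delta=0.01)
print(dev)                               # > 1, i.e. vacuous in this regime
```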
13
Proof technique
Based on [Nissim, Stemmer 15; BNSSSU 16]. Let $S_1, \dots, S_N \sim P^n$ for $N = \ln(2)/\delta$.
It suffices to bound $\mathbf{E}_{S_1, \dots, S_N \sim P^n}\!\left[ \max_{i \in [N]} \Delta_{S_i} f \right]$, by the following claim. Let $D$ be a distribution over $\mathbb{R}$:
$\Pr_{v \sim D}\!\left[ v \ge 2\, \mathbf{E}_{v_1, \dots, v_N \sim D}\!\left[ \max\{0, v_1, \dots, v_N\} \right] \right] \le \frac{\ln 2}{N}$
A natural attempt: for $\hat{\ell} = \mathrm{argmax}_i \{ \Delta_{S_i} f \}$, relate $\mathbf{E}\!\left[ \mathcal{E}_P[f(S_{\hat{\ell}})] \right]$ to $\mathbf{E}\!\left[ \mathcal{E}_{S_{\hat{\ell}}}[f(S_{\hat{\ell}})] \right]$ — but the $\mathrm{argmax}$ selection is UNSTABLE!
14
Stable max
Exponential mechanism [McSherry, Talwar 07]. $\mathrm{EM}_\alpha$: sample $\ell \in [N]$ with probability $\propto e^{\alpha \Delta_{S_\ell} f}$.
Stable: $\mathrm{EM}_\alpha$ is $2\alpha\!\left(2\gamma + \frac{1}{n}\right)$-differentially private, so
$\mathbf{E}_{S_1, \dots, S_N,\, \ell = \mathrm{EM}_\alpha}\!\left[ \mathcal{E}_{S_\ell}[f(S_\ell)] \right] \le \mathbf{E}_{S_1, \dots, S_N,\, \ell = \mathrm{EM}_\alpha}\!\left[ \mathcal{E}_P[f(S_\ell)] \right] + \exp\!\left(2\alpha\!\left(2\gamma + \tfrac{1}{n}\right)\right) - 1 + \gamma$
Approximates the max:
$\mathbf{E}_{S_1, \dots, S_N,\, \ell = \mathrm{EM}_\alpha}\!\left[ \Delta_{S_\ell} f \right] \ge \mathbf{E}_{S_1, \dots, S_N}\!\left[ \max_{i \in [N]} \Delta_{S_i} f \right] - \frac{\ln N}{\alpha}$
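The sampling step of the exponential mechanism can be sketched as follows (the scores are hypothetical stand-ins for the gaps $\Delta_{S_i} f$):

```python
import math
import random

# Minimal sketch of the exponential mechanism EM_alpha: sample index i with
# probability proportional to exp(alpha * scores[i]). Scores are hypothetical
# stand-ins for the gaps Delta_{S_i} f.

def exponential_mechanism(scores, alpha, rng):
    m = max(scores)                       # shift by max for numerical stability
    weights = [math.exp(alpha * (s - m)) for s in scores]
    r = rng.random() * sum(weights)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            return i
    return len(scores) - 1                # guard against floating-point slack

rng = random.Random(0)
scores = [0.10, 0.50, 0.45, 0.20]
picks = [exponential_mechanism(scores, alpha=50.0, rng=rng) for _ in range(1000)]
# Large alpha concentrates on near-maximal scores (indices 1 and 2) while
# remaining a randomized -- hence stable -- selection rule.
print(picks.count(1) + picks.count(2))
```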
15
Game over
$\mathbf{E}_{S_1, \dots, S_N}\!\left[ \max_{i \in [N]} \Delta_{S_i} f \right] \le \frac{\ln N}{\alpha} + \exp\!\left(2\alpha\!\left(2\gamma + \tfrac{1}{n}\right)\right) - 1 + \gamma$
Pick $\alpha = \sqrt{\ln N / (\gamma + 1/n)}$ to get $O\!\left(\sqrt{\left(\gamma + \tfrac{1}{n}\right) \ln(1/\delta)}\right)$.
Recall the claim: for a distribution $D$ over $\mathbb{R}$, $\Pr_{v \sim D}\!\left[ v \ge 2\, \mathbf{E}_{v_1, \dots, v_N \sim D}\!\left[ \max\{0, v_1, \dots, v_N\} \right] \right] \le \frac{\ln 2}{N}$; with $N = \ln(2)/\delta$ the failure probability is $\delta$.
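One way to see the choice of $\alpha$ (a sketch with illustrative constants, assuming $2\alpha(2\gamma + 1/n) \le 1$ so that $e^x - 1 \le 2x$ applies):

```latex
\mathbf{E}\!\left[\max_{i \in [N]} \Delta_{S_i} f\right]
  \;\le\; \frac{\ln N}{\alpha} + 4\alpha\!\left(2\gamma + \tfrac{1}{n}\right) + \gamma
  \;=\; O\!\left( \frac{\ln N}{\alpha} + \alpha\!\left(\gamma + \tfrac{1}{n}\right) \right) + \gamma .
```

Choosing $\alpha = \sqrt{\ln N / (\gamma + 1/n)}$ equalizes the two terms, giving $O\big(\sqrt{(\gamma + 1/n)\ln N}\big)$; with $N = \ln(2)/\delta$ this is $O\big(\sqrt{(\gamma + 1/n)\ln(1/\delta)}\big)$.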
16
Conclusions
Better understanding of uniform stability; a new technique for high-probability generalization bounds.
Open problems:
Close the gap between the upper and lower bounds.
High-probability generalization without strong uniformity.