Bayesian regularization of learning
Sergey Shumsky, NeurOK Software LLC
Scientific methods
Induction (F. Bacon): from data to models, via machine learning. Deduction (R. Descartes): from models to data, via mathematical modeling.
Outline
- Learning as ill-posed problem
  - General problem: data generalization
  - General remedy: model regularization
- Bayesian regularization. Theory
  - Hypothesis comparison
  - Model comparison
  - Free Energy & EM algorithm
- Bayesian regularization. Practice
  - Hypothesis testing
  - Function approximation
  - Data clustering
Problem statement
Learning is an inverse, ill-posed problem: from data back to a model. Hence the learning paradoxes: how can finite data justify infinitely many predictions? How can future predictions be optimized? How can the regular be separated from the accidental in the data? The remedy is regularization of learning: choosing the optimal model complexity.
Well-posed problem
A problem is well-posed if its solution is unique and stable. Hadamard (1900s), Tikhonov (1960s).
Learning from examples
Problem: find the hypothesis h generating the observed data D within a model H. The problem is well-defined only if the solution is insensitive to noise in the data (Hadamard) and to the learning procedure (Tikhonov).
Learning is an ill-posed problem
Example: function approximation. The fitted function is sensitive both to noise in the data and to the learning procedure.
Moreover, the solution is non-unique: different hypotheses fit the same data equally well.
Problem regularization
Main idea: restrict the admissible solutions, sacrificing precision for stability. The question is how to choose the restriction.
Statistical learning practice
Split the data into a learning set and a validation set, and tune by cross-validation. Cross-validation leads naturally to ensembles of models; the Bayesian approach treats such ensembles systematically, as in the sketch below.
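To make the cross-validation-to-ensemble step concrete, here is a minimal sketch: one model per fold, predictions averaged. The choice of ridge regression and the parameter names (k, alpha) are illustrative assumptions, not from the slides.

```python
import numpy as np

def kfold_ensemble_predict(X, y, X_new, k=5, alpha=1.0, seed=0):
    """Train one ridge-regression model per cross-validation fold
    and average their predictions over the resulting ensemble."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(X)), k)
    preds = []
    for i in range(k):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        Xt, yt = X[train], y[train]
        # Ridge solution: w = (Xt'Xt + alpha I)^(-1) Xt'y
        w = np.linalg.solve(Xt.T @ Xt + alpha * np.eye(X.shape[1]), Xt.T @ yt)
        preds.append(X_new @ w)
    return np.mean(preds, axis=0)              # ensemble average
```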
Statistical learning theory
Learning is inverse probability. Probability theory (Bernoulli, 1713): given the model H and a hypothesis h, derive the data D. Learning theory (Bayes, ~1750): given the data D, infer the hypothesis h within the model H.
Bayesian learning
Bayes' rule turns the prior P(h|H) into the posterior P(h|D,H) = P(D|h,H) P(h|H) / P(D|H), where the normalizer P(D|H) is the evidence.
Coin-tossing game
Example: in a coin-tossing game the hypotheses h of the model H are the possible head probabilities of the coin.
Monte Carlo simulations of the coin-tossing game (a sketch follows).
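A minimal Monte Carlo sketch of the game, assuming a conjugate Beta(1, 1) prior over the head probability; the true bias 0.7 and the sample sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate the game: unknown head probability theta, Beta(1, 1) prior.
theta_true = 0.7
tosses = rng.random(100) < theta_true            # observed data D
heads = int(tosses.sum())

# Conjugate update: posterior over theta is Beta(1 + heads, 1 + tails).
a, b = 1 + heads, 1 + len(tosses) - heads

# Monte Carlo: sample hypotheses from the posterior.
samples = rng.beta(a, b, size=10_000)
print(f"posterior mean {samples.mean():.3f}, 95% interval "
      f"[{np.quantile(samples, 0.025):.3f}, {np.quantile(samples, 0.975):.3f}]")
```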
Bayesian regularization
The most probable (MAP) hypothesis minimizes -log P(h|D,H) = -log P(D|h,H) - log P(h|H) + const: a learning error term plus a regularization term. Example: in function approximation, the first term is the data misfit, the second a penalty on the hypothesis.
Minimal Description Length
The most probable hypothesis has the shortest total code length for the data and the hypothesis, since code length corresponds to probability via L = -log P. Example: an optimal prefix code assigns shorter codewords to more probable symbols, as in the sketch below. Rissanen (1978).
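To illustrate the code-length/probability correspondence, a standard Huffman construction (a textbook sketch, not taken from the slides): for dyadic probabilities the resulting codeword lengths match -log2 p exactly.

```python
import heapq
from math import log2

def huffman_lengths(probs):
    """Codeword lengths of an optimal prefix (Huffman) code."""
    heap = [(p, [s]) for s, p in probs.items()]
    lengths = dict.fromkeys(probs, 0)
    heapq.heapify(heap)
    while len(heap) > 1:
        p1, s1 = heapq.heappop(heap)
        p2, s2 = heapq.heappop(heap)
        for s in s1 + s2:                 # merged symbols gain one bit
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, s1 + s2))
    return lengths

probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
for s, L in huffman_lengths(probs).items():
    print(s, L, "bits;", f"-log2 p = {-log2(probs[s]):.0f}")
```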
Data complexity
Complexity: K(D|H) = min_h L(h, D|H). Code length: L(h, D) = L(D|h) + L(h), i.e. the coded data plus the decoding program. Kolmogorov (1965).
Complex = unpredictable
Prediction error ~ L(h, D) / L(D). Random data is incompressible; conversely, compression means predictability. Example: block coding. Solomonoff (1978).
Universal prior
All 2^L programs of length L are equiprobable: P(h) = 2^{-L(h)}. Under this prior, Bayes (~1750) meets data complexity: the most probable explanation is the shortest description L(h, D). Solomonoff (1960).
Statistical ensemble
The ensemble achieves a shorter description length. Proof: sum_h P(h, D) >= max_h P(h, D), hence -log sum_h P(h, D) <= min_h L(h, D). Corollary: ensemble predictions are superior to the most probable prediction.
Ensemble prediction
Predictions of all hypotheses are averaged with their posterior weights: P(y|D) = sum_h P(y|h) P(h|D), as in the sketch below.
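A minimal sketch of ensemble versus most-probable prediction for a discrete hypothesis set (three candidate coin biases, assumed for illustration): the posterior-weighted average hedges between hypotheses instead of committing to the MAP pick.

```python
import numpy as np

# Discrete hypotheses: three candidate head probabilities for a coin.
thetas = np.array([0.2, 0.5, 0.8])
prior = np.full(3, 1 / 3)

heads, tails = 6, 4                              # observed data D
lik = thetas**heads * (1 - thetas)**tails
posterior = prior * lik / np.sum(prior * lik)    # P(h|D)

p_ensemble = np.sum(posterior * thetas)          # sum_h P(y|h) P(h|D)
p_map = thetas[np.argmax(posterior)]             # most probable hypothesis
print(f"ensemble: {p_ensemble:.3f}  MAP: {p_map:.3f}")
```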
Model comparison
Bayes' rule one level up: the posterior of a model is P(H|D) ∝ P(D|H) P(H), where the evidence P(D|H) = sum_h P(D|h,H) P(h|H) plays the role of the likelihood of the model.
Statistics: Bayes vs. Fisher
Fisher: maximize the likelihood P(D|h). Bayes: maximize the evidence P(D|H).
Historical outlook
1920s-60s: parametric statistics, asymptotic in N. Fisher (1912).
1960s-80s: non-parametric statistics, Chentsov (1962); regularization of ill-posed problems, Tikhonov (1963); non-asymptotic learning, Vapnik (1968); algorithmic complexity, Kolmogorov (1965); statistical physics of disordered systems, Gardner (1988).
Statistical physics
The probability of a hypothesis corresponds to a microstate; the optimal model to a macrostate.
Free energy
As a log of a sum: F = -log Z. As a sum of logs: F = E - TS, mean energy minus entropy of a distribution P over hypotheses. The two forms are connected by the variational bound written out below.
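A standard identity, written out with the temperature set to 1; reading the energy as E(h) = L(h, D) is my interpretation of the slide, consistent with the MDL slides above.

```latex
F \;=\; -\log Z \;=\; -\log \sum_h e^{-E(h)}
\qquad \text{(log of a sum)}

F[P] \;=\; \underbrace{\sum_h P(h)\,E(h)}_{\text{energy } E}
\;-\; \underbrace{\Big(-\sum_h P(h)\log P(h)\Big)}_{\text{entropy } S}
\;\ge\; -\log Z
\qquad \text{(sum of logs)}
```

Equality holds at the Gibbs distribution P(h) = e^{-E(h)} / Z; with E(h) = L(h, D) this Gibbs distribution is exactly the Bayesian posterior P(h|D).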
EM algorithm: main idea
Introduce an independent distribution P over hypotheses and minimize the free energy F[P] by coordinate descent. Iterations: the E-step minimizes F over P for a fixed model; the M-step minimizes F over the model for a fixed P.
EM algorithm
E-step: estimate the posterior for the given model.
M-step: update the model for the given posterior.
Bayesian regularization: examples
Hypothesis testing; function approximation; data clustering.
Hypothesis testing
Problem: given noisy observations y, is the theoretical value h0 true? Model H: Gaussian noise and a Gaussian prior.
Optimal model: phase transition
The optimal confidence in h0 switches abruptly between finite and infinite: a phase transition.
Threshold effect
When the deviation of the data from h0 is below a Student-coefficient threshold, the hypothesis h0 is accepted as true; above the threshold, corrections to h0 appear. A sketch of the comparison follows.
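A sketch of this test as Bayesian model comparison: H0 fixes h = h0 exactly, H1 puts a Gaussian prior of width tau around h0, and the evidence ratio then depends only on the sample mean. The parameter values (sigma, tau, n) are illustrative assumptions.

```python
import numpy as np

def norm_logpdf(x, mu, s):
    return -0.5 * ((x - mu) / s) ** 2 - np.log(s * np.sqrt(2 * np.pi))

def log_bayes_factor(y, h0, sigma, tau):
    """log P(D|H1) - log P(D|H0) for H0: h = h0 exactly, vs
    H1: h ~ N(h0, tau^2), with observations y_i ~ N(h, sigma^2).
    Only the sample mean matters: ybar ~ N(h0, sigma^2/n (+ tau^2))."""
    n, ybar = len(y), np.mean(y)
    return (norm_logpdf(ybar, h0, np.sqrt(sigma**2 / n + tau**2))
            - norm_logpdf(ybar, h0, sigma / np.sqrt(n)))

rng = np.random.default_rng(0)
h0, sigma, tau, n = 0.0, 1.0, 1.0, 25
for shift in (0.0, 0.2, 0.5, 1.0):           # true deviation from h0
    y = rng.normal(h0 + shift, sigma, n)
    print(f"shift {shift}: log BF = {log_bayes_factor(y, h0, sigma, tau):+.2f}")
```

The log Bayes factor turns positive only once |ybar - h0| exceeds a threshold of order sigma/sqrt(n): the threshold effect of the slide.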
Function approximation
Problem: given noisy data y(x), find an approximation h(x). Model: a noise distribution and a prior over functions.
Optimal model
The optimal model is found by free energy minimization.
Saddle point approximation
The free energy is evaluated by expanding around the best hypothesis, so it becomes a function of that best hypothesis.
EM learning
E-step: the optimal hypothesis (weights). M-step: the optimal regularization. A sketch follows.
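One concrete instance of this E/M alternation is the Gaussian-linear case: weights with a Gaussian prior of precision alpha, noise of precision beta, and EM re-estimation of both hyperparameters by evidence maximization. This is the standard construction written as a sketch; the variable names are mine, not the slides'.

```python
import numpy as np

def bayes_ridge_em(X, y, n_iter=50):
    """EM maximization of the evidence for linear regression:
    noise precision beta, Gaussian prior w ~ N(0, 1/alpha).
    E-step: Gaussian posterior over weights; M-step: update alpha, beta."""
    n, d = X.shape
    alpha, beta = 1.0, 1.0                      # initial hyperparameters
    for _ in range(n_iter):
        # E-step: posterior N(mu, Sigma) of the weights.
        Sigma = np.linalg.inv(alpha * np.eye(d) + beta * X.T @ X)
        mu = beta * Sigma @ X.T @ y
        # M-step: re-estimate hyperparameters (optimal regularization).
        alpha = d / (mu @ mu + np.trace(Sigma))
        resid = y - X @ mu
        beta = n / (resid @ resid + np.trace(X @ Sigma @ X.T))
    return mu, alpha, beta
```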
Laplace prior
A Laplace prior prunes weights: some are driven exactly to zero, while the surviving weights are equisensitive (the error is equally sensitive to each of them).
Laplace regularization
E-step: weight estimation. M-step: regularization update. A sketch follows.
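The pruning effect of the Laplace prior shows up as the soft-thresholding proximal step. A minimal ISTA sketch, under the assumption of a squared-error learning term (lam standing in for the regularization strength):

```python
import numpy as np

def soft_threshold(w, t):
    """Proximal step of the Laplace (L1) penalty: weights within t of
    zero are pruned exactly to zero."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Minimize 0.5 * ||y - Xw||^2 + lam * ||w||_1 by iterative
    shrinkage-thresholding (ISTA)."""
    step = 1.0 / np.linalg.norm(X, 2) ** 2       # 1 / Lipschitz constant
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)                 # gradient of the error term
        w = soft_threshold(w - step * grad, step * lam)
    return w
```

At the minimum every surviving weight satisfies |X_j'(y - Xw)| = lam: the equisensitivity noted on the previous slide.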
Clustering
Problem: given noisy data x, find prototypes (a mixture density approximation). How many clusters? Model: a noise distribution around the prototypes.
Optimal model
Free energy minimization, iterating the E-step and the M-step.
EM algorithm
E-step: compute posterior cluster memberships. M-step: update the mixture parameters. A sketch follows.
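A minimal 1-D sketch of these updates for a Gaussian mixture; the initialization and numerical guards are illustrative choices.

```python
import numpy as np

def gmm_em(x, M, n_iter=100, seed=0):
    """EM for a 1-D Gaussian mixture with M clusters.
    E-step: responsibilities (posterior cluster memberships);
    M-step: update weights pi, prototypes mu and variances var."""
    rng = np.random.default_rng(seed)
    mu = rng.choice(x, M, replace=False)            # prototypes
    var = np.full(M, x.var())
    pi = np.full(M, 1.0 / M)
    for _ in range(n_iter):
        # E-step: r[i, m] = P(cluster m | x_i), computed stably in logs.
        logp = (-0.5 * (x[:, None] - mu) ** 2 / var
                - 0.5 * np.log(2 * np.pi * var) + np.log(pi))
        r = np.exp(logp - logp.max(axis=1, keepdims=True))
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate the mixture parameters.
        Nm = r.sum(axis=0) + 1e-12                  # guard empty clusters
        pi = Nm / len(x)
        mu = (r * x[:, None]).sum(axis=0) / Nm
        var = (r * (x[:, None] - mu) ** 2).sum(axis=0) / Nm + 1e-6
    return pi, mu, var
```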
How many clusters?
The free energy as a function of the number of clusters M has a minimum at the optimal number of clusters, as in the sketch below.
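Continuing the sketch above (it reuses gmm_em), the number of clusters can be chosen by a saddle-point approximation of the free energy: negative log likelihood plus a complexity penalty. Using the BIC form of that penalty is my assumption; the slides minimize the free energy itself.

```python
import numpy as np

def free_energy_bic(x, pi, mu, var):
    """BIC-style saddle-point approximation of the free energy:
    -log likelihood + (k/2) log n, with k free parameters."""
    logp = (-0.5 * (x[:, None] - mu) ** 2 / var
            - 0.5 * np.log(2 * np.pi * var) + np.log(pi))
    loglik = np.logaddexp.reduce(logp, axis=1).sum()
    k = 3 * len(mu) - 1                    # pi (M-1), mu (M), var (M)
    return -loglik + 0.5 * k * np.log(len(x))

# Two well-separated Gaussian clusters; scan candidate model sizes M.
x = np.random.default_rng(1).normal([-3.0, 3.0], 1.0, (200, 2)).ravel()
F = {M: free_energy_bic(x, *gmm_em(x, M)) for M in range(1, 8)}
print("optimal M:", min(F, key=F.get))     # expected: 2
```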
Simulations: uniform data (figure: optimal model M).
Simulations: Gaussian data (figure: optimal model M).
Simulations: Gaussian mixture (figure: optimal model M).
Summary
Learning is an ill-posed problem; the remedy is regularization.
Bayesian learning has built-in regularization (the model assumptions): the optimal model has the minimal description length, equivalently the minimal free energy.
Practical payoff: learning algorithms with built-in optimal regularization, derived from first principles (as opposed to cross-validation).