1
Model Selection
2
Outline
– Motivation
– Overfitting
– Structural Risk Minimization
– Cross Validation
– Minimum Description Length
3
Motivation
– Suppose we have a hypothesis class of infinite VC-dimension and too few examples. How can we find the best hypothesis?
– Alternatively: usually we choose the hypothesis class ourselves. How should we go about doing it?
4
Overfitting
– Concept class: intervals on a line
– Can classify any training set
– Zero training error: the only goal?!
5
Overfitting: Intervals
– Can always get zero training error
– But is that what we are interested in?!
– Recall Occam's Razor!
6
Overfitting: Intervals
7
Overfitting
– A simple concept plus noise
– A very complex concept: insufficient number of examples + noise
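To make the intervals example above concrete, here is a minimal Python sketch (not from the slides; the function names and the 10% noise rate are illustrative). A hypothesis that places a tiny interval around every positive training point always reaches zero training error, yet its true error stays large when the labels are noisy.

```python
import random

def fit_intervals_overfit(points, labels, eps=1e-6):
    """Place a tiny interval around every positive training point.
    Within the class of unions of intervals, this always yields zero training error."""
    return [(x - eps, x + eps) for x, y in zip(points, labels) if y == 1]

def predict(intervals, x):
    """Label x positive iff it falls inside one of the intervals."""
    return 1 if any(a <= x <= b for a, b in intervals) else 0

def target(x):
    """The simple target concept: a single interval [0.3, 0.7]."""
    return int(0.3 <= x <= 0.7)

random.seed(0)
train_x = [random.random() for _ in range(30)]
train_y = [target(x) ^ (random.random() < 0.1) for x in train_x]   # 10% label noise

h = fit_intervals_overfit(train_x, train_y)
train_err = sum(predict(h, x) != y for x, y in zip(train_x, train_y)) / len(train_x)

test_x = [random.random() for _ in range(5000)]                      # fresh sample, clean labels
test_err = sum(predict(h, x) != target(x) for x in test_x) / len(test_x)
print(f"training error = {train_err:.2f}, true error ~ {test_err:.2f}")
```

The fitted hypothesis memorizes the noisy sample (training error 0), but it predicts 0 almost everywhere, so its true error is roughly the mass of the target interval.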
8
Theoretical Model
– Nested hypothesis classes: H_1 ⊆ H_2 ⊆ H_3 ⊆ … ⊆ H_i ⊆ …
– Let VC-dim(H_i) = i
– For simplicity, |H_i| = 2^i
– There is a target function c(x); for some i, c ∈ H_i
– True error: e(h) = Pr[h(x) ≠ c(x)]
– e_i = min_{h ∈ H_i} e(h)
– e* = min_i e_i
9
Theoretical Model
– Training (observed) error: obs(h) = fraction of the training sample with h(x) ≠ c(x)
– obs_i = min_{h ∈ H_i} obs(h)
– Complexity of h: d(h) = min{ i : h ∈ H_i }
– Add a penalty for d(h) and minimize: obs(h) + penalty(h)
10
Structural Risk Minimization
– Penalty based: choose the hypothesis which minimizes obs(h) + penalty(h)
– SRM penalty: see the sketch below
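The penalty formula itself is not in the slide text above. A standard form consistent with this setup (finite classes |H_i| = 2^i, sample size m, confidence 1-δ) and with the guarantee on the next slide is the following; treat it as a hedged reconstruction rather than the slide's exact expression.

```latex
\mathrm{penalty}(h) \;=\; \sqrt{\frac{2\,d(h)\ln 2 \;+\; \ln(2/\delta)}{2m}}
\qquad \text{(grows with the complexity } d(h)\text{, shrinks with the sample size } m\text{)}
```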
11
SRM: Performance
Theorem: with probability 1-δ,
– h*: best hypothesis
– g*: the SRM choice
– e(h*) ≤ e(g*) ≤ e(h*) + 2·penalty(h*)
Claim: the theorem is "tight"
– H_i includes 2^i "coins": hypotheses that predict by independent random coin flips and ignore the input. Among 2^i such coins, some will have a training error far below their true error of 1/2 just by chance, so a penalty of this order is unavoidable.
12
Proof
– Bounding the error within a single class H_i
– Bounding the error across all the classes H_i
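A hedged sketch of the two steps named above, under the finite-class assumption |H_i| = 2^i: first Hoeffding's inequality with a union bound inside one class, then a second union bound across classes with a geometrically decreasing confidence budget.

```latex
% Step 1 (within H_i): Hoeffding + union bound over the |H_i| = 2^i members.
% With probability at least 1 - \delta_i, for every h \in H_i:
\[
  |e(h) - \mathrm{obs}(h)| \;\le\; \sqrt{\frac{\ln|H_i| + \ln(2/\delta_i)}{2m}} .
\]
% Step 2 (across classes): choose \delta_i = \delta \cdot 2^{-i}, so that
% \sum_{i \ge 1} \delta_i \le \delta and the bound holds for all classes at once.
% Substituting |H_i| = 2^i then gives the deviation
\[
  \sqrt{\frac{2\,d(h)\ln 2 + \ln(2/\delta)}{2m}} ,
\]
% which matches the SRM penalty sketched above.
```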
13
Cross Validation
– Separate the sample into a training part and a selection part.
– Using the training part: select from each H_i a candidate g_i.
– Using the selection sample: select among g_1, …, g_m.
– The split sizes: (1-γ)m examples for training, γm examples for selection.
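A minimal Python sketch of this procedure (not from the slides): the nested classes and the fitting routine are illustrative toys, and the split fraction is written as gamma since the symbol on the slide is not in the extracted text.

```python
import random

def split(sample, gamma):
    """Split a sample of size m into (1 - gamma)*m training points and gamma*m selection points."""
    shuffled = sample[:]
    random.shuffle(shuffled)
    cut = int((1 - gamma) * len(sample))
    return shuffled[:cut], shuffled[cut:]

def err(h, data):
    """Empirical error of hypothesis h on a labeled data set."""
    return sum(h(x) != y for x, y in data) / len(data)

def cross_validation_select(sample, fitters, gamma=0.3):
    """From each class H_i fit a candidate g_i on the training part,
    then return the candidate with the smallest error on the selection part."""
    train, select = split(sample, gamma)
    candidates = [fit(train) for fit in fitters]            # g_1, g_2, ...
    return min(candidates, key=lambda g: err(g, select))

def make_fitter(i):
    """Toy nested classes H_i: rules 'label 1 iff x < c' with c on a grid of i cut points."""
    cuts = [(k + 1) / (i + 1) for k in range(i)]
    def fit(train):
        best = min(cuts, key=lambda c: sum((x < c) != y for x, y in train))
        return lambda x, c=best: int(x < c)
    return fit

random.seed(1)
data = [(x, int(x < 0.25)) for x in (random.random() for _ in range(200))]
g = cross_validation_select(data, [make_fitter(i) for i in (1, 2, 4, 8, 16)])
print("error of the selected hypothesis on the full sample:", err(g, data))
```

Here the target is "x < 0.25", so the richer grids typically win the selection step because they contain a cut point close to 0.25.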
14
Cross Validation: Performance
– Errors: e_cv(m), e_A(m)
– Theorem: with probability 1-δ, … (see the sketch below)
– Is CV always near-optimal?!
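Only the prefix of the theorem survives in the slide text. A standard bound for the selection step, offered here as a hedged reconstruction rather than the slide's exact statement, follows from Hoeffding's inequality plus a union bound over the k candidates g_1, …, g_k, each evaluated on the γm selection examples:

```latex
\[
  \Pr\!\left[\, e(g_{\mathrm{cv}}) \;\le\; \min_{1 \le i \le k} e(g_i)
      \;+\; 2\sqrt{\frac{\ln(2k/\delta)}{2\,\gamma m}} \,\right] \;\ge\; 1-\delta .
\]
```

The price of the split is that each candidate g_i was trained on only (1-γ)m examples rather than m, which is the point behind the closing question on the slide.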
15
Minimum Description Length
– Penalty: the size (description length) of h
– Related to MAP:
– size of h: -log(Pr[h])
– errors: -log(Pr[D|h])
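A one-line derivation of the MAP connection stated above: maximizing the posterior over h is the same as minimizing the total description length, the code length of the data (the errors) given the hypothesis plus the code length of the hypothesis itself.

```latex
\[
  \arg\max_h \Pr[h \mid D]
  \;=\; \arg\max_h \Pr[D \mid h]\,\Pr[h]
  \;=\; \arg\min_h \Big( \underbrace{-\log \Pr[D \mid h]}_{\text{errors}}
        \;+\; \underbrace{-\log \Pr[h]}_{\text{size of } h} \Big).
\]
```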