1 Model Selection

2 Outline
Motivation
Overfitting
Structural Risk Minimization
Cross Validation
Minimum Description Length

3 Motivation
Suppose we have a class of infinite VC-dimension
We have too few examples
How can we find the best hypothesis?
Alternatively: usually we choose the hypothesis class ourselves
How should we go about doing it?

4 Overfitting
Concept class: intervals on a line
Can classify any training set
Zero training error: the only goal?!

5 Overfitting: Intervals
Can always get zero error
Are we interested?!
Recall Occam's Razor!

6 Overfitting: Intervals

7 Overfitting
Is it a simple concept plus noise, or a very complex concept?
–with an insufficient number of examples + a noise rate of 1/3, the two look alike
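To make the intervals example concrete, here is a minimal sketch (not from the slides): the target is a single threshold on [0, 1], labels are flipped with probability 1/3, and a hypothesis that places a tiny interval around every positive training point reaches zero training error while its true error stays far above the noise rate. The function names (true_concept, fit_intervals, ...) are invented for illustration.

```python
import random

random.seed(0)

def true_concept(x):
    # simple target: a single threshold on [0, 1]
    return 1 if x >= 0.5 else 0

def noisy_sample(m, noise=1/3):
    # draw m points uniformly on [0, 1] and flip each label with probability `noise`
    data = []
    for _ in range(m):
        x = random.random()
        y = true_concept(x)
        if random.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data

def fit_intervals(data, eps=1e-6):
    # one tiny interval around every positively labelled training point:
    # this hypothesis memorizes the sample and gets zero training error
    return [(x - eps, x + eps) for x, y in data if y == 1]

def predict(intervals, x):
    return 1 if any(a <= x <= b for a, b in intervals) else 0

def error(h, data):
    return sum(predict(h, x) != y for x, y in data) / len(data)

train = noisy_sample(50)
test = noisy_sample(5000)
h = fit_intervals(train)
print("training error:", error(h, train))  # 0.0: the sample is memorized
print("test error:", error(h, test))       # roughly 0.5, far above the 1/3 noise rate
```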

8 Theoretical Model
Nested hypothesis classes
–H_1 ⊆ H_2 ⊆ H_3 ⊆ … ⊆ H_i ⊆ …
–Let VC-dim(H_i) = i
–For simplicity, |H_i| = 2^i
There is a target function c(x)
–For some i, c ∈ H_i
–e(h) = Pr[h ≠ c]
–e_i = min_{h ∈ H_i} e(h)
–e* = min_i e_i

9 Theoretical Model
Training error
–obs(h) = Pr[h ≠ c] measured on the training sample
–obs_i = min_{h ∈ H_i} obs(h)
Complexity of h
–d(h) = min { i : h ∈ H_i }
Add a penalty for d(h)
Minimize: obs(h) + penalty(h)
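The penalized objective can be motivated by the standard finite-class argument (textbook reasoning stated here as a sketch, not taken from the slide): Hoeffding's inequality plus a union bound over H_i, with |H_i| = 2^i, give

```latex
% with probability at least 1 - \delta_i, simultaneously for all h \in H_i:
e(h) \;\le\; \mathrm{obs}(h) + \sqrt{\frac{\ln|H_i| + \ln(1/\delta_i)}{2m}}
     \;=\; \mathrm{obs}(h) + \sqrt{\frac{i\ln 2 + \ln(1/\delta_i)}{2m}}
```

so a penalty that grows with d(h) charges exactly for the looser guarantee available in the richer classes.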

10 Structural Risk Minimization
Penalty based
Choose the hypothesis which minimizes:
–obs(h) + penalty(h)
SRM penalty:
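The slide's exact penalty formula is not in the transcript. Below is a minimal sketch of SRM over nested interval classes, assuming a Hoeffding-style penalty of the form sqrt((d·ln 2 + ln(1/δ)) / (2m)); the learner best_in_class is a simple heuristic, not necessarily the empirical-risk minimizer, and all names are invented for illustration.

```python
import math
import random

random.seed(1)

def noisy_sample(m, noise=1/3):
    # same distribution as in the earlier sketch: threshold target plus label noise
    data = []
    for _ in range(m):
        x = random.random()
        y = 1 if x >= 0.5 else 0
        if random.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data

def positive_runs(data):
    # maximal runs of consecutive positively labelled points, after sorting by x
    runs, cur = [], []
    for x, y in sorted(data):
        if y == 1:
            cur.append(x)
        elif cur:
            runs.append(cur)
            cur = []
    if cur:
        runs.append(cur)
    return runs

def best_in_class(d, data):
    # heuristic learner for H_d = unions of at most d intervals:
    # cover the d positive runs that contain the most training points
    runs = sorted(positive_runs(data), key=len, reverse=True)[:d]
    return [(min(r), max(r)) for r in runs]

def predict(h, x):
    return 1 if any(a <= x <= b for a, b in h) else 0

def obs(h, data):
    # observed (training) error of hypothesis h
    return sum(predict(h, x) != y for x, y in data) / len(data)

def penalty(d, m, delta=0.05):
    # assumed Hoeffding-style penalty for |H_d| = 2^d (not the slide's exact formula)
    return math.sqrt((d * math.log(2) + math.log(1 / delta)) / (2 * m))

train = noisy_sample(200)
m = len(train)
candidates = {d: best_in_class(d, train) for d in range(1, 31)}
srm_d = min(candidates, key=lambda d: obs(candidates[d], train) + penalty(d, m))
print("SRM picks d =", srm_d, "with training error", obs(candidates[srm_d], train))
```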

11 SRM: Performance
Theorem
–With probability 1-δ:
–h*: the best hypothesis
–g*: the SRM choice
–e(h*) ≤ e(g*) ≤ e(h*) + 2·penalty(h*)
Claim: the theorem is "tight"
–H_i includes 2^i coins

12 Proof
Bounding the error within H_i
Bounding the error across the H_i
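A sketch of the two steps named on the slide, following the standard argument (an assumption about the slide body, which is not in the transcript): a uniform bound within each H_i, then a union bound across classes with confidence budget δ_i.

```latex
% Step 1 (within H_i): Hoeffding + union bound over |H_i| = 2^i give, w.p. at least 1 - \delta_i,
\forall h \in H_i:\quad |e(h) - \mathrm{obs}(h)| \;\le\; \sqrt{\tfrac{i\ln 2 + \ln(2/\delta_i)}{2m}} \;=\; \mathrm{penalty}(h)
% Step 2 (across the H_i): set \delta_i = \delta / 2^i, so that \sum_i \delta_i \le \delta.
% Then for the SRM choice g^* and the best hypothesis h^*:
e(g^*) \le \mathrm{obs}(g^*) + \mathrm{penalty}(g^*)
       \le \mathrm{obs}(h^*) + \mathrm{penalty}(h^*)
       \le e(h^*) + 2\,\mathrm{penalty}(h^*)
```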

13 Cross Validation
Separate the sample into a training part and a selection part
Using the training part
–select from each H_i a candidate g_i
Using the selection sample
–select between g_1, …, g_m
The split size
–(1-γ)m training set
–γm selection set
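A minimal sketch of the split-and-select scheme (a sketch under assumptions, not the lecture's code): the nested family used here is a simple grid-threshold family with |H_d| = 2^d, the split fraction is written γ as above, and the helper names are invented.

```python
import random

random.seed(2)

def noisy_sample(m, noise=1/3):
    # threshold target on [0, 1] plus label noise, as in the earlier sketches
    data = []
    for _ in range(m):
        x = random.random()
        y = 1 if x >= 0.5 else 0
        if random.random() < noise:
            y = 1 - y
        data.append((x, y))
    return data

def err(t, data):
    # error of the threshold hypothesis h_t(x) = 1 iff x >= t
    return sum((1 if x >= t else 0) != y for x, y in data) / len(data)

def best_in_class(d, data):
    # ERM over H_d = thresholds on the grid {k / 2^d}: |H_d| = 2^d and H_d is nested in H_{d+1}
    grid = [k / 2 ** d for k in range(2 ** d)]
    return min(grid, key=lambda t: err(t, data))

m, gamma = 300, 0.3
sample = noisy_sample(m)
split = int((1 - gamma) * m)
train, select = sample[:split], sample[split:]

# using the training part: pick one candidate g_d from each class H_d
candidates = {d: best_in_class(d, train) for d in range(1, 11)}
# using the selection part: choose among g_1, ..., g_10
cv_d = min(candidates, key=lambda d: err(candidates[d], select))
print("CV picks d =", cv_d, "with threshold", candidates[cv_d])
```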

14 Cross Validation: Performance
Errors
–e_cv(m), e_A(m)
Theorem: with probability 1-δ …
Is CV always near-optimal?!
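The theorem's formula is not in the transcript; what follows from the setup on slide 13 alone is the cost of the selection step (Hoeffding plus a union bound over the finitely many candidates), sketched here as an assumption about the intended bound.

```latex
% Choosing among candidates g_1, ..., g_k on an independent selection set of size \gamma m:
% with probability at least 1 - \delta, the chosen g_{cv} satisfies
e(g_{cv}) \;\le\; \min_i e(g_i) \;+\; 2\sqrt{\frac{\ln(2k/\delta)}{2\gamma m}}
% The candidates themselves were trained on only (1-\gamma)m examples, which is the
% trade-off behind the question of whether CV is always near-optimal.
```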

15 Minimum Description Length
Penalty: the size of h
Related to MAP
–size of h: -log(Pr[h])
–errors: -log(Pr[D|h])
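Writing out the MAP correspondence stated on the slide (a standard identity, not additional material):

```latex
\arg\max_h \Pr[h \mid D]
  \;=\; \arg\max_h \Pr[h]\,\Pr[D \mid h]
  \;=\; \arg\min_h \bigl(-\log \Pr[h] \;-\; \log \Pr[D \mid h]\bigr)
% i.e. minimize (description length of h) + (description length of the data/errors given h),
% which is the MDL penalty: the size of h plus an encoding of its mistakes.
```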

