MACHINE LEARNING 3. Supervised Learning
Learning a Class from Examples
Based on E. Alpaydın 2004, Introduction to Machine Learning, © The MIT Press (V1.1)
Class C of a "family car"
Prediction: is car x a family car?
Knowledge extraction: what do people expect from a family car?
Output: positive (+) and negative (−) examples
Input representation (expert suggestions): x1: price, x2: engine power; ignore other attributes
Training set X
Class C
Assume a class model: an axis-aligned rectangle
(p1 ≤ price ≤ p2) AND (e1 ≤ engine power ≤ e2)
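A minimal Python sketch of this rectangle hypothesis. The parameter values below are hypothetical, chosen only for illustration; the slides do not give concrete numbers:

```python
def make_rectangle_hypothesis(p1, p2, e1, e2):
    """Axis-aligned rectangle hypothesis: positive iff both attributes lie in range."""
    def h(price, engine_power):
        return p1 <= price <= p2 and e1 <= engine_power <= e2
    return h

# Hypothetical parameter values for illustration only
h = make_rectangle_hypothesis(p1=15_000, p2=30_000, e1=90, e2=150)
print(h(20_000, 120))  # inside the rectangle -> True
print(h(50_000, 200))  # outside -> False
```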
S, G, and the Version Space
Most specific hypothesis S; most general hypothesis G
Any h ∈ H between S and G is consistent with the training set; together these hypotheses make up the version space (Mitchell, 1997)
Hypothesis class H
Empirical error of h on the training set X: E(h | X) = Σ_t 1(h(x^t) ≠ r^t)
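The empirical error just counts misclassified training examples. A short sketch with a hypothetical one-dimensional threshold hypothesis and made-up data:

```python
def empirical_error(h, X, r):
    """E(h | X) = sum over t of 1(h(x^t) != r^t): the number of misclassified examples."""
    return sum(1 for x, label in zip(X, r) if h(x) != label)

# Toy check with a threshold hypothesis (hypothetical data)
h = lambda x: x > 0.5
X = [0.1, 0.4, 0.6, 0.9]
r = [False, True, True, True]
print(empirical_error(h, X, r))  # only x = 0.4 is misclassified -> 1
```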
Generalization
Problem of generalization: how well the hypothesis will classify future examples
In our example a hypothesis is characterized by four numbers: (p1, p2, e1, e2)
Choose the best one: include all positive examples and no negative ones
There are infinitely many such hypotheses for real-valued parameters
Doubt
In some applications a wrong decision is very costly
May reject an instance if it falls between S (most specific) and G (most general)
VC Dimension
We assumed that H (the hypothesis class) includes the true class C
H should be flexible enough, i.e. have enough capacity, to include C
Need some measure of hypothesis-class flexibility (complexity)
Can then try to increase the complexity of the hypothesis class
VC Dimension
N points can be labeled in 2^N ways as +/−
H shatters the N points if, for every such labeling, there exists an h ∈ H consistent with it; VC(H) is the largest N for which some set of N points can be shattered
Axis-aligned rectangles can shatter at most 4 points, so their VC dimension is 4
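As an aside (not from the slides), the shattering condition can be checked by brute force for an even simpler hypothesis class: intervals [a, b] on the real line, whose VC dimension is 2. The data points below are hypothetical:

```python
from itertools import product

def interval_consistent(points, labels):
    """Is there an interval [a, b] with: x labeled positive iff a <= x <= b?"""
    pos = [x for x, y in zip(points, labels) if y]
    if not pos:
        return True  # an empty interval labels everything negative
    a, b = min(pos), max(pos)  # tightest interval covering the positives
    return all((a <= x <= b) == y for x, y in zip(points, labels))

def shatters(points):
    """H shatters the points if every +/- labeling has a consistent hypothesis."""
    return all(interval_consistent(points, labels)
               for labels in product([False, True], repeat=len(points)))

print(shatters([1.0, 2.0]))        # True: intervals shatter any 2 distinct points
print(shatters([1.0, 2.0, 3.0]))   # False: the labeling (+, -, +) is impossible
```

The same brute-force idea works for axis-aligned rectangles and 4 points, just with more cases to enumerate.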
Probably Approximately Correct (PAC) Learning
Fix a target probability of classification error (planned future)
The actual error depends on the training sample (past)
Want the actual error (actual future) to be below the target with high probability
How many training examples N should we have such that, with probability at least 1 − δ, h has error at most ε? (Blumer et al., 1989)
Let's calculate how many samples we need for S:
Each of the 4 strips between S and C has probability mass at most ε/4
Pr that one instance misses a strip: ≤ 1 − ε/4
Pr that all N instances miss a strip: ≤ (1 − ε/4)^N
Pr that the N instances miss any of the 4 strips: ≤ 4(1 − ε/4)^N
We want 4(1 − ε/4)^N ≤ δ; using (1 − x) ≤ exp(−x):
4 exp(−εN/4) ≤ δ, hence N ≥ (4/ε) log(4/δ)
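The final bound is easy to evaluate numerically. A small sketch (the values of ε and δ below are arbitrary examples):

```python
from math import ceil, exp, log

def pac_sample_size(eps, delta):
    """Smallest integer N satisfying N >= (4/eps) * ln(4/delta), the bound above."""
    return ceil(4 / eps * log(4 / delta))

N = pac_sample_size(0.1, 0.05)   # example: error at most 0.1 with probability >= 0.95
print(N)                          # -> 176
print(4 * exp(-0.1 * N / 4))      # the bound 4*exp(-eps*N/4), indeed <= delta = 0.05
```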
Noise
Imprecision in recording the input attributes
Errors in labeling the data points (teacher noise)
Additional attributes not taken into account (hidden or latent), e.g. the same price/engine power but different labels because of color; the effect of such attributes is modeled as noise
The class boundary might then not be simple, requiring a more complicated hypothesis class/model
Noise and Model Complexity
Use the simpler model because it is:
Simpler to use (lower computational complexity)
Easier to train (lower space complexity)
Easier to explain (more interpretable)
Better at generalizing (lower variance; Occam's razor)
Occam's Razor
If the actual class is simple and there is mislabeling or noise, the simpler model will generalize better
A simpler model makes more errors on the training set
But it generalizes better: it does not try to explain the noise in the training sample
Simple explanations are more plausible
Multiple Classes
General case: K classes, e.g. family, sports, and luxury cars
Classes can overlap
Can use a different or the same hypothesis class for each
What if an instance falls into two classes (or none)? Sometimes it is worth rejecting it
Multiple Classes, C_i, i = 1, ..., K
Train K hypotheses h_i(x), i = 1, ..., K, where h_i(x) = 1 if x ∈ C_i and h_i(x) = 0 if x ∈ C_j, j ≠ i
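This one-vs-all relabeling can be sketched in a few lines (the car labels below are illustrative):

```python
def one_vs_all_labels(labels, classes):
    """For each class C_i, build r_i^t = 1 if x^t belongs to C_i, else 0."""
    return {c: [1 if y == c else 0 for y in labels] for c in classes}

y = ["family", "sport", "family", "luxury"]
relabeled = one_vs_all_labels(y, ["family", "sport", "luxury"])
print(relabeled["family"])  # -> [1, 0, 1, 0]
```

Each h_i is then trained as an ordinary two-class problem on its own relabeled targets.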
Regression
The output is not a Boolean (yes/no) or a label but a numeric value
Training set of examples X = {(x^t, r^t)}
Interpolation: fit a function (e.g. a polynomial) through the points
Extrapolation: predict the output for any x
Regression: there is added noise; assumption: the noise comes from hidden variables
Approximate the output by a model g(x)
Regression
Empirical error on the training set: E(g | X) = (1/N) Σ_t (r^t − g(x^t))^2
If the hypothesis class is linear functions, g(x) = w1 x + w0
Calculate the best parameters, minimizing the error, by setting the partial derivatives with respect to w1 and w0 to zero
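Setting those partial derivatives to zero gives the familiar closed form for w1 and w0, sketched below on made-up, noise-free data (so the fit is exact):

```python
def fit_line(xs, rs):
    """Minimize E(w1, w0 | X) = (1/N) * sum (r^t - (w1*x^t + w0))^2.
    Zeroing the partial derivatives yields the closed-form solution."""
    N = len(xs)
    mx = sum(xs) / N                       # mean of inputs
    mr = sum(rs) / N                       # mean of outputs
    w1 = (sum((x - mx) * (r - mr) for x, r in zip(xs, rs))
          / sum((x - mx) ** 2 for x in xs))
    w0 = mr - w1 * mx
    return w1, w0

w1, w0 = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(w1, w0)  # the data lie exactly on r = 2x + 1 -> 2.0 1.0
```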
Higher-order polynomials
Model Selection and Generalization
Learning is an ill-posed problem: the data alone are not sufficient to find a unique solution
Each training example removes hypotheses inconsistent with it
Hence the need for an inductive bias: assumptions about H, e.g. axis-aligned rectangles in our example
Generalization: how well a model performs on new data
Overfitting: H is more complex than C or f
Underfitting: H is less complex than C or f
Triple Trade-Off
There is a trade-off between three factors (Dietterich, 2003):
1. the complexity of H, c(H)
2. the training-set size, N
3. the generalization error, E, on new data
As N increases, E decreases
As c(H) increases, E first decreases and then increases
Cross-Validation
To estimate the generalization error we need data unseen during training, so we split the data:
Training set (50%): to train the models
Validation set (25%): to select a model (e.g. the degree of the polynomial)
Test (publication) set (25%): to estimate the final error
Use resampling when there is little data
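The 50/25/25 split can be sketched as follows (the shuffle seed and dataset are arbitrary illustrations):

```python
import random

def split_data(data, seed=0):
    """Shuffle, then split into 50% training, 25% validation, 25% test."""
    data = list(data)
    random.Random(seed).shuffle(data)   # deterministic shuffle for reproducibility
    n = len(data)
    n_train = n // 2
    n_val = (n - n_train) // 2
    return (data[:n_train],
            data[n_train:n_train + n_val],
            data[n_train + n_val:])

train, val, test = split_data(range(100))
print(len(train), len(val), len(test))  # -> 50 25 25
```

The test set must stay untouched until the very end; using it to pick the model degree would make its error estimate optimistic.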
Dimensions of a Supervised Learner
1. Model: g(x | θ)
2. Loss function: E(θ | X) = Σ_t L(r^t, g(x^t | θ))
3. Optimization procedure: θ* = arg min_θ E(θ | X)