1
Concept Learning and Regression. Adapted from slides from Alpaydın's book and slides by Professor Doina Precup, McGill University.
2
S, G, and the Version Space
The most specific hypothesis, S, and the most general hypothesis, G, bound the consistent hypotheses: every h ∈ H lying between S and G is consistent with the training data, and together these hypotheses make up the version space (Mitchell, 1997).
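A minimal sketch of these ideas, assuming the axis-aligned rectangle hypothesis class used in the book's family-car example: S is the tightest rectangle around the positive examples, and a hypothesis is consistent when it covers every positive and no negative. The data points and helper names below are illustrative, not from the slides.

```python
def most_specific(positives):
    """S: the tightest axis-aligned rectangle enclosing all positive examples."""
    xs = [x for x, y in positives]
    ys = [y for x, y in positives]
    return (min(xs), max(xs), min(ys), max(ys))

def consistent(h, positives, negatives):
    """h = (x_lo, x_hi, y_lo, y_hi) is consistent if it contains every
    positive example and no negative example."""
    x_lo, x_hi, y_lo, y_hi = h
    inside = lambda p: x_lo <= p[0] <= x_hi and y_lo <= p[1] <= y_hi
    return all(inside(p) for p in positives) and not any(inside(p) for p in negatives)

# Illustrative data, e.g. (price, engine power) of cars; positives are "family cars".
positives = [(2.0, 1.5), (3.0, 2.0), (2.5, 1.8)]
negatives = [(1.0, 0.5), (4.5, 3.0)]
S = most_specific(positives)
print(S, consistent(S, positives, negatives))   # S is consistent on this data
```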
3
VC Dimension
N points can be labeled in 2^N ways as +/-. H shatters N points if, for every such labeling, there exists an h ∈ H consistent with it; the VC dimension VC(H) is the largest N that H can shatter. Axis-aligned rectangles can shatter at most 4 points, so their VC dimension is 4.
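The shattering claim can be checked by brute force. The sketch below (illustrative, not from the slides) enumerates all 2^N labelings and tests whether some axis-aligned rectangle realizes each one; four points arranged in a diamond are shattered, but adding a fifth point in the middle breaks shattering.

```python
from itertools import product

def can_realize(points, labels):
    """Axis-aligned rectangles realize a labeling iff the bounding box of the
    positive points contains no negative point."""
    pos = [p for p, l in zip(points, labels) if l]
    neg = [p for p, l in zip(points, labels) if not l]
    if not pos:
        return True  # a degenerate/empty rectangle labels everything negative
    x_lo = min(x for x, _ in pos); x_hi = max(x for x, _ in pos)
    y_lo = min(y for _, y in pos); y_hi = max(y for _, y in pos)
    return not any(x_lo <= x <= x_hi and y_lo <= y <= y_hi for x, y in neg)

def shattered(points):
    """True if every one of the 2^N labelings of the points is realizable."""
    return all(can_realize(points, labels)
               for labels in product([0, 1], repeat=len(points)))

four = [(0, 1), (0, -1), (1, 0), (-1, 0)]
print(shattered(four))              # True: 4 points can be shattered
print(shattered(four + [(0, 0)]))   # False: 5 points cannot
```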
4
Probably Approximately Correct (PAC) Learning
How many training examples N do we need so that, with probability at least 1 - δ, the hypothesis h has error at most ε? (Blumer et al., 1989)
We want the error region of the tightest rectangle S to have probability at most ε; it is covered by four strips, so it suffices that each strip has probability at most ε/4.
Pr that a random instance misses a strip of probability ε/4: 1 - ε/4
Pr that all N instances miss that strip: (1 - ε/4)^N
Pr that all N instances miss at least one of the 4 strips (union bound): at most 4(1 - ε/4)^N
Requiring 4(1 - ε/4)^N ≤ δ and using (1 - x) ≤ exp(-x) gives 4 exp(-εN/4) ≤ δ, hence N ≥ (4/ε) log(4/δ).
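A quick numeric check of the final bound; the ε and δ values below are illustrative choices, not from the slides.

```python
import math

def pac_sample_size(epsilon, delta):
    """N >= (4/epsilon) * ln(4/delta); the log is natural because the
    derivation uses (1 - x) <= exp(-x)."""
    return math.ceil((4.0 / epsilon) * math.log(4.0 / delta))

# With eps = 0.1 and delta = 0.05, 176 examples suffice for this bound.
print(pac_sample_size(0.1, 0.05))   # 176
```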
5
Noise and Model Complexity
Use the simpler model because it is:
Simpler to use (lower computational complexity)
Easier to train (lower space complexity)
Easier to explain (more interpretable)
Better at generalizing (lower variance; Occam's razor)
6
Multiple Classes, C_i, i = 1,...,K
Train one hypothesis per class, h_i(x), i = 1,...,K, where h_i(x) = 1 if x ∈ C_i and h_i(x) = 0 if x ∈ C_j, j ≠ i.
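A minimal one-vs-rest sketch along these lines. The choice of a linear score fit by least squares on 0/1 targets is illustrative (the slide only requires one hypothesis h_i per class); NumPy and the function names are assumptions.

```python
import numpy as np

def fit_one_vs_rest(X, y, K):
    """X: (N, d) inputs; y: (N,) integer labels in {0,...,K-1}."""
    X1 = np.hstack([np.ones((X.shape[0], 1)), X])   # add bias attribute x0 = 1
    W = np.zeros((K, X1.shape[1]))
    for i in range(K):
        r = (y == i).astype(float)                   # r^t = 1 iff x^t is in class C_i
        W[i] = np.linalg.lstsq(X1, r, rcond=None)[0]
    return W

def predict(W, X):
    X1 = np.hstack([np.ones((X.shape[0], 1)), X])
    return np.argmax(X1 @ W.T, axis=1)               # pick the class whose h_i scores highest
```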
7
Regression
8
Model Selection & Generalization
Learning is an ill-posed problem: the data alone are not sufficient to find a unique solution, hence the need for an inductive bias, i.e., assumptions about H.
Generalization: how well a model performs on new data.
Overfitting: H is more complex than C or f.
Underfitting: H is less complex than C or f.
9
Triple Trade-Off
There is a trade-off between three factors (Dietterich, 2003):
1. Complexity of H, c(H)
2. Training set size, N
3. Generalization error, E, on new data
As N increases, E decreases. As c(H) increases, E first decreases and then increases.
10
Cross-Validation
To estimate generalization error, we need data unseen during training. We split the data into:
Training set (50%)
Validation set (25%)
Test (publication) set (25%)
When there is little data, use resampling.
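A minimal sketch of the 50/25/25 split described above; the shuffling, seed, and function name are illustrative choices, not from the slides.

```python
import numpy as np

def split_data(X, y, seed=0):
    """Shuffle, then split into 50% train, 25% validation, 25% test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train = len(X) // 2
    n_val = len(X) // 4
    tr, va, te = np.split(idx, [n_train, n_train + n_val])
    return (X[tr], y[tr]), (X[va], y[va]), (X[te], y[te])
```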
11
Dimensions of a Supervised Learner
1. Model: g(x | θ)
2. Loss function: E(θ | X) = Σ_t L(r^t, g(x^t | θ))
3. Optimization procedure: θ* = arg min_θ E(θ | X)
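A toy illustration mapping the three dimensions to code for a one-dimensional linear model. The grid search only stands in for "optimization procedure" and is purely illustrative; later slides use least squares instead.

```python
import numpy as np

def g(x, theta):                 # 1. model: g(x | theta) = w0 + w1 * x
    w0, w1 = theta
    return w0 + w1 * x

def E(theta, X, r):              # 2. loss: E(theta | X) = sum_t L(r^t, g(x^t | theta))
    return np.sum((r - g(X, theta)) ** 2)

def fit(X, r):                   # 3. optimization: theta* = argmin_theta E(theta | X)
    grid = np.linspace(-3, 3, 61)
    return min(((w0, w1) for w0 in grid for w1 in grid),
               key=lambda th: E(th, X, r))
```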
12
Steps to Solving a Supervised Learning Problem
1. Select the input-output pairs.
2. Decide how to encode the inputs and outputs; this defines the instance space X and the output space Y.
3. Choose a class of hypotheses / representations H.
4. Choose an error function that defines the best hypothesis.
5. Choose an algorithm for searching efficiently through the space of hypotheses.
13
Example: What hypothesis class should we pick?
    x      y
  0.86   2.49
  0.09   0.83
 -0.85  -0.25
  0.87   3.10
 -0.44   0.87
 -0.43   0.02
 -1.10  -0.12
  0.41   0.81
 -0.96  -0.83
  0.17   0.43
14
Linear Hypothesis
Suppose y is a linear function of x: h_w(x) = w_0 + w_1 x_1 (+ ...). The w_i are called parameters or weights. To simplify notation we add an attribute x_0 = 1 (the bias term) to the other n attributes, so that h_w(x) = w^T x, where w and x are vectors of size n+1. How should we pick w?
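A minimal sketch of evaluating h_w(x) = w^T x after prepending the bias attribute x_0 = 1; the weight and input values are illustrative, not from the slides.

```python
import numpy as np

def h(w, x):
    """Linear hypothesis h_w(x) = w^T x with the bias attribute prepended."""
    x = np.concatenate(([1.0], x))   # x0 = 1
    return w @ x                     # w0 + w1*x1 + ... + wn*xn

w = np.array([0.5, 2.0])             # w0 (bias) and w1
print(h(w, np.array([0.86])))        # 0.5 + 2.0 * 0.86 = 2.22
```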
15
Error Minimization
We want the predictions of h_w to be close to the true value y on the data we have. We define an error (or cost) function and pick w so that this function is minimized. How should we choose the error function?
16
Least Mean Squares (LMS)
Try to make h_w(x) close to y on the examples in the training set. We define a sum-of-squares error function
J(w) = (1/2) Σ_i (y_i - h_w(x_i))²
and choose w so as to minimize J(w), i.e., compute w such that the partial derivatives ∂J(w)/∂w_j are all zero.
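A minimal sketch that minimizes the sum-of-squares error directly with a least-squares solver, using the ten (x, y) pairs from the example slide above. NumPy, the variable names, and the closed-form solver are assumptions; the slides leave the minimization method open.

```python
import numpy as np

# The (x, y) data from the example slide.
X = np.array([.86, .09, -.85, .87, -.44, -.43, -1.10, .41, -.96, .17])
y = np.array([2.49, .83, -.25, 3.10, .87, .02, -.12, .81, -.83, .43])

A = np.column_stack([np.ones_like(X), X])   # columns: bias x0 = 1 and x1
w, *_ = np.linalg.lstsq(A, y, rcond=None)   # w minimizing the sum-of-squares error J(w)
print(w)                                    # [w0, w1], roughly [0.94, 1.49] for this data
```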