
1 Lecture Slides for Introduction to Machine Learning 2e
ETHEM ALPAYDIN © The MIT Press, 2010
alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml2e

2 CHAPTER 2: Supervised Learning

3 Learning a Class from Examples
Class C of a "family car"
Prediction: Is car x a family car?
Knowledge extraction: What do people expect from a family car?
Output: Positive (+) and negative (–) examples
Input representation: x1: price, x2: engine power

4 Training set X
X = {x^t, r^t}, t = 1, ..., N, where r^t = 1 if x^t is a positive example and r^t = 0 if it is negative.

5 Class C
The class of family car as defined by the expert: (p1 ≤ price ≤ p2) AND (e1 ≤ engine power ≤ e2), for suitable values of p1, p2, e1, e2. The expert-defined class is thus an axis-aligned rectangle in the (price, engine power) space.
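A minimal Python sketch of such a rectangle hypothesis; the boundary values and query points below are made-up placeholders, not values from the slides:

```python
def h(x, p1, p2, e1, e2):
    """h(x) = 1 (family car) if x = (price, engine power) falls inside
    the axis-aligned rectangle, 0 otherwise."""
    price, engine_power = x
    return 1 if (p1 <= price <= p2) and (e1 <= engine_power <= e2) else 0

# Hypothetical boundaries and instances, for illustration only
print(h((16000, 150), p1=10000, p2=20000, e1=100, e2=200))  # 1: inside
print(h((30000, 150), p1=10000, p2=20000, e1=100, e2=200))  # 0: price too high
```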

6 Hypothesis class H
C is the class of family car defined by the expert; h ∈ H is the class of family car defined by the learning system. The error of h on the training set X counts the instances where the prediction disagrees with the label:
E(h | X) = Σ_t 1(h(x^t) ≠ r^t), t = 1, ..., N
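A sketch of that error count on toy data; the data set and the rectangle below are made up for illustration:

```python
def empirical_error(h, X, r):
    """E(h | X): the number of training examples that h misclassifies."""
    return sum(1 for x, label in zip(X, r) if h(x) != label)

# Toy data and a hypothetical rectangle hypothesis
X = [(16000, 150), (30000, 150), (12000, 120)]
r = [1, 0, 1]
h = lambda x: 1 if (10000 <= x[0] <= 20000) and (100 <= x[1] <= 200) else 0
print(empirical_error(h, X, r))  # 0: this h is consistent with X
```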

7 S, G, and the Version Space
most specific hypothesis, S
most general hypothesis, G
Any h ∈ H between S and G is a valid hypothesis with no error; such hypotheses are said to be consistent with the training set, and together they make up the version space (Mitchell, 1997).
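A sketch of computing S for the rectangle class, i.e. the tightest rectangle around the positive examples (a Find-S-style step on assumed toy data); G would analogously be the largest rectangle covering the positives without admitting any negative:

```python
def most_specific_hypothesis(X, r):
    """S: the tightest axis-aligned rectangle covering all positive
    examples. Any consistent rectangle must contain this one."""
    pos = [x for x, label in zip(X, r) if label == 1]
    p1, p2 = min(x[0] for x in pos), max(x[0] for x in pos)
    e1, e2 = min(x[1] for x in pos), max(x[1] for x in pos)
    return p1, p2, e1, e2

X = [(16000, 150), (30000, 150), (12000, 120)]
r = [1, 0, 1]
print(most_specific_hypothesis(X, r))  # (12000, 16000, 120, 150)
```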

8 Margin
We choose the hypothesis h with the largest margin, for best separation. The shaded instances are those that define the margin; other instances can be removed without affecting h.

9 VC Dimension
N points can be labeled in 2^N ways as +/–. H shatters N points if, for each of these labelings, there exists an h ∈ H consistent with it, separating the positive examples from the negative. The VC dimension VC(H) is the maximum number of points that H can shatter.
Fig. 2.6 An axis-aligned rectangle (h) can shatter four points. Only the rectangles covering two points are shown.
An axis-aligned rectangle shatters at most 4 points, so its VC dimension is 4.
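This claim can be checked by brute force. A sketch, with an assumed "diamond" point configuration; the consistency test relies on the fact that if the tightest rectangle around the positives contains a negative, then so does every rectangle covering the positives:

```python
from itertools import product

def rectangle_consistent(points, labels):
    """True if some axis-aligned rectangle contains exactly the
    positive points: test the tightest rectangle around them."""
    pos = [p for p, y in zip(points, labels) if y == 1]
    if not pos:
        return True  # an empty rectangle realizes the all-negative labeling
    x_lo, x_hi = min(p[0] for p in pos), max(p[0] for p in pos)
    y_lo, y_hi = min(p[1] for p in pos), max(p[1] for p in pos)
    return not any(x_lo <= p[0] <= x_hi and y_lo <= p[1] <= y_hi
                   for p, y in zip(points, labels) if y == 0)

def shatters(points):
    """True if axis-aligned rectangles realize all 2^N labelings."""
    return all(rectangle_consistent(points, labels)
               for labels in product((0, 1), repeat=len(points)))

print(shatters([(0, 1), (1, 0), (0, -1), (-1, 0)]))          # True: 4 points
print(shatters([(0, 1), (1, 0), (0, -1), (-1, 0), (0, 0)]))  # False: 5 points
```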

10 VC Dimension
Summary: The maximum number of points that can be shattered by H is called the VC dimension of H, and measures the capacity of H. A lookup table has infinite VC dimension.
Exercise: Show that the VC dimension of the triangle hypothesis class is 7 in two dimensions. Hint: it is best to place the seven points equidistant on a circle.

11 Probably Approximately Correct (PAC) Learning
How many training examples N should we have, such that with probability at least 1 − δ, h has error at most ε? (Blumer et al., 1989)
The probability of a positive example falling in the region between C and h (and causing an error) is at most ε; each of the four strips of that region carries probability at most ε/4.
Pr that a random instance misses one strip: 1 − ε/4
Pr that N instances all miss one strip: (1 − ε/4)^N
Pr that N instances miss any of the 4 strips: at most 4(1 − ε/4)^N
Require 4(1 − ε/4)^N ≤ δ. Using (1 − x) ≤ exp(−x): 4 exp(−εN/4) ≤ δ, so N ≥ (4/ε) ln(4/δ).
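A sketch of that sample-size bound as a one-line calculator; the example values of ε and δ are arbitrary:

```python
import math

def pac_sample_size(eps, delta):
    """N >= (4/eps) * ln(4/delta): enough examples so that, with
    probability at least 1 - delta, the learned rectangle has
    error at most eps."""
    return math.ceil((4.0 / eps) * math.log(4.0 / delta))

print(pac_sample_size(0.1, 0.05))   # 176 examples for eps=0.1, delta=0.05
print(pac_sample_size(0.01, 0.05))  # a tenfold tighter eps costs ~10x more
```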

12 Noise and Model Complexity
Use the simpler model because it is:
Simpler to use (lower computational complexity)
Easier to train (lower space complexity)
Easier to explain (more interpretable)
Better at generalizing (less variance, less affected by single instances)

13 Learning Multiple Classes, C_i, i = 1, ..., K
Train K hypotheses h_i(x), i = 1, ..., K, where h_i(x^t) = 1 if x^t ∈ C_i and 0 otherwise: each class is learned as a two-class problem, with examples of C_i positive and examples of all other classes negative.
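A sketch of this one-vs-all reduction; `train_two_class` is a hypothetical placeholder for any two-class learner, e.g. the rectangle learner sketched earlier:

```python
def train_one_vs_all(X, r, K, train_two_class):
    """Reduce a K-class problem to K two-class problems: for h_i,
    examples of class i are relabeled positive, all others negative."""
    hypotheses = []
    for i in range(K):
        r_i = [1 if label == i else 0 for label in r]
        hypotheses.append(train_two_class(X, r_i))
    return hypotheses
```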

14 Regression
Linear model: g(x) = w1 x + w0. The parameters w1 (slope) and w0 (intercept) are chosen to minimize the squared error on the training set (see page 36).
Linear, second-order, and sixth-order polynomials are fitted to the same set of points.
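A sketch of fitting those three polynomial orders with NumPy; the data-generating function and noise level are assumptions chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 5.0, 20))
y = 2.0 * np.sin(1.5 * x) + rng.normal(scale=0.3, size=x.shape)  # assumed toy data

# Fit linear, second-order, and sixth-order polynomials to the same points
for order in (1, 2, 6):
    w = np.polyfit(x, y, deg=order)             # least-squares coefficients
    err = np.mean((np.polyval(w, x) - y) ** 2)  # training error
    print(order, err)  # training error shrinks as the order grows
```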

15 Model Selection & Generalization
Learning is an ill-posed problem: the data alone are not sufficient to find a unique solution. Hence the need for an inductive bias, assumptions about H.
The inductive bias of a learning algorithm is the set of assumptions that the learner uses to predict outputs given inputs that it has not encountered (Mitchell, 1980). A classical example of an inductive bias is Occam's razor, assuming that the simplest hypothesis consistent with the target function is actually the best. (http://en.wikipedia.org/wiki/Inductive_bias)
Occam's razor: a way of thinking that eliminates unnecessary elements (assumptions); a principle of simplifying the problem.
For example: assuming the hypothesis is a rectangle is one inductive bias. In linear regression, assuming a linear function is an inductive bias, and choosing the function that minimizes squared error is another.

16 Well-posed problem
The mathematical term well-posed problem stems from a definition given by Hadamard. He believed that mathematical models of physical phenomena should have the properties that
▫ A solution exists.
▫ The solution is unique.
▫ The solution's behavior changes only slightly when there is a slight change in the initial conditions (stability).
Problems that are not well-posed in the sense of Hadamard are termed ill-posed. http://en.wikipedia.org/wiki/Well-posedness

17 Model Selection & Generalization
Generalization: how well a model trained on the training set predicts the right output for new instances.
Underfitting: H is less complex than C or f (the hypothesis is less complex than the underlying function).
Overfitting: H is more complex than C or f (the hypothesis is more complex than the underlying function).

18 Triple Trade-Off
There is a trade-off between three factors (Dietterich, 2003):
1. Complexity of H, c(H): the capacity of the hypothesis class
2. The amount of training data, N
3. Generalization error, E, on new examples
For example: as N increases, E decreases; as c(H) increases, E first decreases and then increases.

19 Cross-Validation
Cross-validation is the statistical practice of partitioning a sample of data into subsets such that the analysis is initially performed on a single subset, while the other subset(s) are retained for subsequent use in confirming and validating the initial analysis. (http://en.wikipedia.org/wiki/Cross-validation)

20 Cross-Validation
To estimate generalization error, we need data unseen during training. We split the data as:
Training set (50%)
Validation set (25%): gives the validation error, used to test generalization ability
Test (publication) set (25%): gives the expected error; contains examples used neither in training nor in validation
For example, to find the right order in polynomial regression: given a number of candidate polynomials of different orders, we fit the coefficients of each order on the training set, calculate their errors on the validation set, and take the order with the least validation error as the best polynomial.
Resampling is used when there is little data.
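A sketch of that procedure with NumPy, using a 50/25/25 split on assumed toy data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 5.0, 60)
y = 2.0 * np.sin(1.5 * x) + rng.normal(scale=0.3, size=x.shape)

# 50% training / 25% validation / 25% test split
idx = rng.permutation(len(x))
tr, va, te = idx[:30], idx[30:45], idx[45:]

best_order, best_err = None, float("inf")
for order in range(1, 9):
    w = np.polyfit(x[tr], y[tr], deg=order)                # fit on training set
    val_err = np.mean((np.polyval(w, x[va]) - y[va]) ** 2)  # validation error
    if val_err < best_err:
        best_order, best_err = order, val_err

# Report expected error on data used neither for fitting nor for selection
w = np.polyfit(x[tr], y[tr], deg=best_order)
test_err = np.mean((np.polyval(w, x[te]) - y[te]) ** 2)
print(best_order, test_err)
```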

21 Dimensions of a Supervised Learner
1. Model: g(x | θ). For example, in linear regression the model is the linear function of the input whose slope and intercept are the parameters θ learned from the data.
2. Loss function (error function): E(θ | X) = Σ_t L(r^t, g(x^t | θ))
3. Optimization procedure: θ* = arg min_θ E(θ | X)
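A minimal sketch of the three dimensions for linear regression on made-up data; the closed-form least-squares step stands in for a generic optimization procedure:

```python
import numpy as np

def g(x, theta):                # 1. model: g(x | theta) = w1*x + w0
    w1, w0 = theta
    return w1 * x + w0

def E(theta, X, r):             # 2. loss: sum of squared errors
    return np.sum((r - g(X, theta)) ** 2)

# 3. optimization: theta* = argmin_theta E(theta | X), here in closed form
X = np.array([1.0, 2.0, 3.0, 4.0])
r = np.array([1.9, 4.1, 6.0, 8.2])
A = np.vstack([X, np.ones_like(X)]).T
theta_star, *_ = np.linalg.lstsq(A, r, rcond=None)
print(theta_star, E(theta_star, X, r))
```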

22 Exercise
The complexity of most learning algorithms is a function of the size of the training set. Can you propose a filtering algorithm that finds redundant instances?
If we have a supervisor who can provide us with the label for any x, where should we choose x so as to learn with fewer queries?

