MACHINE LEARNING 3. Supervised Learning. Learning a Class from Examples Based on E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.1)


MACHINE LEARNING 3. Supervised Learning

Learning a Class from Examples
- Class C of a "family car"
- Prediction: Is car x a family car?
- Knowledge extraction: What do people expect from a family car?
- Output: positive (+) and negative (–) examples
- Input representation (expert suggestions): x1: price, x2: engine power; ignore other attributes
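A minimal sketch of this input representation in Python, assuming a toy training set; the numbers and the tuple layout are illustrative only, not from the slides.

# Each example: (price, engine_power, label), where label 1 = family car, 0 = not.
X = [
    (15000.0, 110.0, 1),
    (32000.0, 220.0, 0),
    (18500.0, 130.0, 1),
    (9000.0,   75.0, 0),
]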

Training set X
(Figure: the labeled training examples plotted in the price / engine power plane.)

Class C
- Assume a class model (axis-aligned rectangle): (p1 ≤ price ≤ p2) & (e1 ≤ engine power ≤ e2)
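A hedged sketch of this rectangle hypothesis as code; the parameter values in the usage line are hypothetical.

def h(price, engine_power, p1, p2, e1, e2):
    """Return 1 if (price, engine_power) lies inside the rectangle, else 0."""
    return 1 if (p1 <= price <= p2) and (e1 <= engine_power <= e2) else 0

# Classify one car with guessed bounds.
print(h(16000, 120, p1=10000, p2=25000, e1=90, e2=180))   # -> 1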

S, G, and the Version Space
- Most specific hypothesis S: the tightest rectangle that contains all positive examples
- Most general hypothesis G: the largest rectangle that contains all positives and no negatives
- Any h ∈ H between S and G is consistent with the training set; together these hypotheses make up the version space (Mitchell, 1997)
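A minimal sketch of finding S for the rectangle class, assuming the (price, engine_power, label) format used above; G would be found analogously, by growing the rectangle until it touches a negative example.

def most_specific_S(examples):
    """Tightest axis-aligned rectangle around the positive examples."""
    pos = [(p, e) for p, e, label in examples if label == 1]
    prices = [p for p, _ in pos]
    powers = [e for _, e in pos]
    return min(prices), max(prices), min(powers), max(powers)   # p1, p2, e1, e2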

Hypothesis class H
- Error of h on the training set X (empirical error): the number of training examples that h misclassifies
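The empirical error counts the training examples the hypothesis gets wrong, E(h | X) = Σ_t 1(h(x^t) ≠ r^t). A minimal sketch, assuming the rectangle hypothesis h and the toy example format from the sketches above:

def empirical_error(rect, examples):
    """Number of misclassified examples for the rectangle rect = (p1, p2, e1, e2)."""
    p1, p2, e1, e2 = rect
    return sum(1 for price, power, label in examples
               if h(price, power, p1, p2, e1, e2) != label)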

Generalization
- The problem of generalization: how well our hypothesis classifies future examples that are not in the training set
- In our example a hypothesis is characterized by 4 numbers: (p1, p2, e1, e2)
- Choose the best one: it should include all positive examples and no negative examples
- There are infinitely many such hypotheses for real-valued parameters

Doubt
- In some applications a wrong decision is very costly
- We may reject an instance if it falls between S (most specific) and G (most general)

VC Dimension
- We assumed that H (the hypothesis class) includes the true class C
- H should be flexible enough, i.e., have enough capacity, to include C
- We need some measure of the flexibility (complexity) of a hypothesis class
- We can then try to increase the complexity of the hypothesis class when needed

VC Dimension
- N points can be labeled in 2^N ways as +/–
- H shatters N points if, for every such labeling, there exists an h ∈ H consistent with it
- VC(H) = the maximum number of points that H can shatter
- An axis-aligned rectangle can shatter at most 4 points, so its VC dimension is 4
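A brute-force check, sketched with an illustrative 4-point configuration, that axis-aligned rectangles can realize all 2^4 labelings of a suitable point set. For each labeling it suffices to test the tightest rectangle around the positives, since any axis-aligned rectangle containing all positives also contains that bounding box.

from itertools import product

points = [(0, 1), (0, -1), (1, 0), (-1, 0)]            # a shatterable 4-point set

def rect_consistent(points, labels):
    """Is there an axis-aligned rectangle containing exactly the +1 points?"""
    pos = [p for p, label in zip(points, labels) if label == 1]
    if not pos:                                         # an "empty" rectangle works
        return True
    x1, x2 = min(x for x, _ in pos), max(x for x, _ in pos)
    y1, y2 = min(y for _, y in pos), max(y for _, y in pos)
    # The tightest rectangle around the positives must exclude every negative.
    return all(not (x1 <= x <= x2 and y1 <= y <= y2)
               for (x, y), label in zip(points, labels) if label != 1)

print(all(rect_consistent(points, labels)
          for labels in product([1, -1], repeat=4)))    # -> True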

Probably Approximately Correct (PAC) Learning
- Fix a target classification error rate (the planned future error)
- The actual error depends on the training sample (the past)
- We want the actual error (the actual future) to be less than the target with high probability

- How many training examples N should we have, such that with probability at least 1 − δ, h has error at most ε? (Blumer et al., 1989)
- Let us calculate how many samples we need for S
- The region of error between C and S consists of 4 strips; if each strip has probability at most ε/4, the total error is at most ε
- Pr that one random example misses a given strip: 1 − ε/4
- Pr that all N examples miss the strip: (1 − ε/4)^N
- Pr that the N examples miss any of the 4 strips: at most 4(1 − ε/4)^N
- Require 4(1 − ε/4)^N ≤ δ; using (1 − x) ≤ exp(−x), this holds if 4 exp(−εN/4) ≤ δ, i.e., N ≥ (4/ε) ln(4/δ)
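A quick numeric check of this bound, with illustrative values ε = 0.1 and δ = 0.05:

import math

def pac_sample_size(eps, delta):
    """N >= (4/eps) * ln(4/delta), the bound derived above for the rectangle S."""
    return math.ceil((4 / eps) * math.log(4 / delta))

print(pac_sample_size(0.1, 0.05))   # -> 176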

Noise
- Imprecision in recording the input attributes
- Errors in labeling the data points (teacher noise)
- Additional attributes not taken into account (hidden or latent attributes); e.g., cars with the same price/engine power may get different labels because of their color
- The effect of such neglected attributes is modeled as noise
- With noise the class boundary may not be simple, so a more complicated hypothesis class/model may be needed

Noise and Model Complexity
Use the simpler model because it is:
- Simpler to use (lower computational complexity)
- Easier to train (lower space complexity)
- Easier to explain (more interpretable)
- Likely to generalize better (lower variance; Occam's razor)

Occam's Razor
- If the actual class is simple and there is mislabeling or noise, the simpler model will generalize better
- A simpler model makes more errors on the training set
- But it will generalize better, because it does not try to explain the noise in the training sample
- Simpler explanations are more plausible

Multiple Classes
- The general case has K classes, e.g., family, sports, and luxury cars
- Classes can overlap
- We can use a different or the same hypothesis class for each class
- If an instance falls into two classes (or into none), it is sometimes worth rejecting it

Multiple Classes, Ci, i = 1, ..., K
Train K hypotheses hi(x), i = 1, ..., K, where hi(x) = 1 if x belongs to Ci and hi(x) = 0 if x belongs to some Cj, j ≠ i (each hypothesis separates its own class from all the others)
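A hedged sketch of this K-class setup as one-vs-all training; fit_binary is a hypothetical helper standing in for any two-class learner (e.g., the rectangle fit used earlier).

def train_one_vs_all(examples, num_classes, fit_binary):
    """examples: list of (x, class_index). Returns one hypothesis per class."""
    hypotheses = []
    for i in range(num_classes):
        relabeled = [(x, 1 if c == i else 0) for x, c in examples]
        hypotheses.append(fit_binary(relabeled))
    return hypotheses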

Regression
- The output is not a Boolean (yes/no) label but a numeric value
- Training set: examples of (input, output) pairs
- Interpolation: fit a function (e.g., a polynomial) through the training points
- Extrapolation: predict the output for an x outside the range of the training data
- Regression: the observed output includes added noise, r = f(x) + ε; the noise stands in for hidden variables we do not observe
- We approximate the output by a model g(x)

Regression
- Empirical error on the training set: E(g | X) = (1/N) Σ_t [r^t − g(x^t)]²
- If the hypothesis class is linear functions: g(x) = w1·x + w0
- Calculate the best parameters w1, w0 (those minimizing the error) by setting the partial derivatives of the error to zero
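A minimal sketch of the resulting closed-form least-squares solution for the line g(x) = w1·x + w0; the data in the usage line are made up.

def fit_line(xs, rs):
    """Return (w1, w0) minimizing the mean squared error on the training set."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_r = sum(rs) / n
    w1 = (sum((x - mean_x) * (r - mean_r) for x, r in zip(xs, rs))
          / sum((x - mean_x) ** 2 for x in xs))
    w0 = mean_r - w1 * mean_x
    return w1, w0

print(fit_line([0.0, 1.0, 2.0, 3.0], [1.1, 2.9, 5.2, 6.8]))   # roughly (1.94, 1.09)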

Higher-order polynomials
(Figure: fits of polynomials of increasing order to the same training data.)
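A hedged illustration of the same idea in code: polynomials of increasing order fitted to one small noisy sample. The data-generating function and noise level are made up for the example.

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 8)
r = np.sin(2 * np.pi * x) + rng.normal(scale=0.15, size=x.shape)   # noisy targets

for degree in (1, 3, 7):
    coeffs = np.polyfit(x, r, degree)
    train_err = np.mean((np.polyval(coeffs, x) - r) ** 2)
    print(degree, round(train_err, 4))   # training error shrinks as the order grows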

Model Selection & Generalization
- Learning is an ill-posed problem; the data alone are not sufficient to find a unique solution
- Each training example removes the hypotheses that are inconsistent with it, but many hypotheses remain
- Hence the need for an inductive bias: assumptions about H (e.g., rectangles in our example)
- Generalization: how well a model performs on new data
- Overfitting: H is more complex than C (or f)
- Underfitting: H is less complex than C (or f)

Triple Trade-Off
- There is a trade-off between three factors (Dietterich, 2003):
1. the complexity of H, c(H),
2. the training set size, N,
3. the generalization error, E, on new data
- As N increases, E decreases
- As c(H) increases, E first decreases and then increases

Cross-Validation
- To estimate generalization error, we need data unseen during training. We split the data into:
- Training set (50%): to train candidate models
- Validation set (25%): to select a model (e.g., the degree of the polynomial)
- Test (publication) set (25%): to estimate the error of the final model
- Use resampling when there is little data
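A minimal sketch of the 50/25/25 split and validation-based model selection, continuing the polynomial example above; the split sizes and candidate degrees are illustrative.

import numpy as np

def split(x, r, seed=0):
    """Shuffle and split into 50% training, 25% validation, 25% test."""
    idx = np.random.default_rng(seed).permutation(len(x))
    n_tr, n_va = len(x) // 2, len(x) // 4
    tr, va, te = idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]
    return (x[tr], r[tr]), (x[va], r[va]), (x[te], r[te])

def select_degree(train, val, degrees=(1, 2, 3, 5, 7)):
    """Pick the polynomial degree with the lowest validation error."""
    (xt, rt), (xv, rv) = train, val
    errs = {d: np.mean((np.polyval(np.polyfit(xt, rt, d), xv) - rv) ** 2)
            for d in degrees}
    return min(errs, key=errs.get)

x = np.linspace(0, 1, 40)
r = np.sin(2 * np.pi * x) + np.random.default_rng(1).normal(scale=0.15, size=40)
train, val, test = split(x, r)
print(select_degree(train, val))   # the test set would then estimate the final error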

Dimensions of a Supervised Learner
1. Model
2. Loss function
3. Optimization procedure
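In the textbook these three components are, respectively: the model g(x | θ) with parameters θ; the loss function E(θ | X) = Σ_t L(r^t, g(x^t | θ)), summed over the training set; and the optimization procedure θ* = arg min over θ of E(θ | X), which finds the parameters minimizing the error.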