Overview of Supervised Learning
Outline
– Linear Regression and the Nearest-Neighbor Method
– Statistical Decision Theory
– Local Methods in High Dimensions
– Statistical Models, Supervised Learning and Function Approximation
– Structured Regression Models
– Classes of Restricted Estimators
– Model Selection and the Bias-Variance Tradeoff
Notation
X: inputs, feature vector, predictors, independent variables. Generally X is a vector of p values; qualitative features are coded in X.
– Sample values of X are written in lower case; $x_i$ is the i-th of N sample values.
Y: output, response, dependent variable. Typically a scalar (it can be a vector) of real values; $y_i$ is a realized value.
G: a qualitative response, taking values in a discrete set G, e.g. G = {survived, died}. We often code G via a binary indicator response vector Y.
Problem
200 points generated in $\mathbb{R}^2$ from an unknown distribution, 100 in each of two classes G = {GREEN, RED}. Can we build a rule to predict the color of future points?
Linear regression
Code Y = 1 if G = RED, else Y = 0. We model Y as a linear function of X:
    $\hat{Y} = X^T \hat{\beta}$  (the intercept is included in X).
We obtain $\hat{\beta}$ by least squares, minimizing the quadratic criterion
    $\mathrm{RSS}(\beta) = \sum_{i=1}^{N} (y_i - x_i^T \beta)^2$.
Given a model matrix X and a response vector y, the minimizer (when $X^T X$ is nonsingular) is
    $\hat{\beta} = (X^T X)^{-1} X^T y$.
Linear regression
The fitted values define a classification rule: predict RED if $\hat{Y} > 0.5$, GREEN otherwise, i.e. $\hat{G} = \mathrm{RED}$ if $x^T \hat{\beta} > 0.5$.
Linear regression
Figure 2.1: A classification example in two dimensions. The classes are coded as a binary variable (GREEN = 0, RED = 1) and then fit by linear regression. The line is the decision boundary defined by $x^T \hat{\beta} = 0.5$. The red shaded region denotes the part of input space classified as RED, while the green region is classified as GREEN.
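A minimal numerical sketch of this fit (assuming numpy, with synthetic two-Gaussian data standing in for the unavailable simulation data):

    import numpy as np

    rng = np.random.default_rng(0)
    # Synthetic stand-in data: 100 GREEN (Y=0) and 100 RED (Y=1) points.
    X = np.vstack([rng.normal([0, 0], 1.0, size=(100, 2)),
                   rng.normal([2, 2], 1.0, size=(100, 2))])
    y = np.r_[np.zeros(100), np.ones(100)]

    Xm = np.column_stack([np.ones(len(X)), X])      # model matrix with intercept
    beta, *_ = np.linalg.lstsq(Xm, y, rcond=None)   # beta_hat = (X^T X)^{-1} X^T y

    y_hat = Xm @ beta
    g_hat = np.where(y_hat > 0.5, "RED", "GREEN")   # classify by thresholding at 0.5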
Possible scenarios
– Scenario 1: the training data in each class were generated from bivariate Gaussians with uncorrelated components and different means. Here a linear decision boundary is nearly optimal, and linear regression is well suited.
– Scenario 2: the training data in each class came from a mixture of ten low-variance Gaussians, with individual means themselves distributed as Gaussian. Here the optimal boundary is nonlinear and disjoint, favoring a more flexible method such as k-nearest neighbors.
K-Nearest Neighbors
The k-nearest-neighbor fit uses the observations closest to x in the training set:
    $\hat{Y}(x) = \frac{1}{k} \sum_{x_i \in N_k(x)} y_i$,
where $N_k(x)$ is the neighborhood of x defined by the k closest points $x_i$ in the training sample, with closeness measured by Euclidean distance.
K-Nearest Neighbors
Figure 2.2: The same classification example in two dimensions as in Figure 2.1. The classes are coded as a binary variable (GREEN = 0, RED = 1) and then fit by 15-nearest-neighbor averaging. The predicted class is chosen by majority vote amongst the 15 nearest neighbors.
K-Nearest Neighbors
Figure 2.3: The same classification example in two dimensions as in Figure 2.1. The classes are coded as a binary variable (GREEN = 0, RED = 1), and then predicted by 1-nearest-neighbor classification.
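A brute-force majority-vote sketch of the k-NN classifier (assuming numpy and the synthetic X, y from the previous sketch; k = 15 matches Figure 2.2):

    import numpy as np

    def knn_predict(X_train, y_train, x0, k=15):
        d = np.linalg.norm(X_train - x0, axis=1)   # Euclidean distances to x0
        nearest = np.argsort(d)[:k]                # indices of the k closest points
        return 1.0 if y_train[nearest].mean() > 0.5 else 0.0  # majority vote

    # e.g. knn_predict(X, y, np.array([1.0, 1.0]), k=15)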
Linear regression vs. k-NN
Linear regression is stable but possibly biased: it assumes a linear decision boundary (low variance, potentially high bias). k-NN makes no structural assumption and can adapt to any boundary, but is unstable (high variance, low bias). The effective number of parameters of k-NN is N/k, which decreases as k grows.
Linear regression vs. k-NN
Figure 2.4: Misclassification curves for the simulation example above; a test sample of size 10,000 was used. The red curves are test error and the green curves are training error for k-NN classification. The results for linear regression are the larger green and red dots at three degrees of freedom. The purple line is the optimal Bayes error rate.
Other Methods
Many popular techniques are variants of these two simple procedures: kernel methods that weight observations smoothly by their distance to the target point; local regression, fitting linear models by locally weighted least squares; linear models fit to basis expansions of the inputs; and projection pursuit and neural network models, built from sums of nonlinearly transformed linear models.
Statistical decision theory
Let $X \in \mathbb{R}^p$ be a random input vector and $Y \in \mathbb{R}$ a random output, with joint distribution $\Pr(X, Y)$. We seek a function f(X) for predicting Y, penalizing errors with a loss function, most commonly squared error $L(Y, f(X)) = (Y - f(X))^2$. The criterion to minimize is the expected prediction error
    $\mathrm{EPE}(f) = \mathrm{E}(Y - f(X))^2 = \int (y - f(x))^2 \Pr(dx, dy)$.
Regression function
Conditioning on X, it suffices to minimize EPE pointwise: $f(x) = \mathrm{argmin}_c\, \mathrm{E}([Y - c]^2 \mid X = x)$. The solution is the regression function
    $f(x) = \mathrm{E}(Y \mid X = x)$,
the conditional mean: the best prediction of Y at any point X = x, when best is measured by average squared error.
Both k-nearest neighbors and least squares approximate this conditional expectation by averages, under different model assumptions: k-NN uses the direct local estimate $\hat{f}(x) = \mathrm{Ave}(y_i \mid x_i \in N_k(x))$, relaxing conditioning at a point to conditioning on a neighborhood, while least squares assumes the global linear model $f(x) \approx x^T \beta$ and pools all the training data to estimate $\beta$.
With the $L_1$ loss $\mathrm{E}|Y - f(X)|$ in place of squared error, the solution is instead the conditional median, $\hat{f}(x) = \mathrm{median}(Y \mid X = x)$: a more robust, but less analytically convenient, criterion.
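A quick Monte Carlo check of these two facts (a sketch assuming numpy; the exponential sample is an arbitrary skewed choice so that mean and median differ):

    import numpy as np

    rng = np.random.default_rng(1)
    y = rng.exponential(scale=1.0, size=50_000)   # skewed: mean 1.0, median log 2

    c = np.linspace(0.0, 3.0, 301)                # candidate constant predictions
    l2 = np.array([np.mean((y - ci) ** 2) for ci in c])   # squared-error risk
    l1 = np.array([np.mean(np.abs(y - ci)) for ci in c])  # absolute-error risk

    print(c[l2.argmin()], y.mean())      # L2 minimizer ~ mean   (~1.00)
    print(c[l1.argmin()], np.median(y))  # L1 minimizer ~ median (~0.69)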
Bayes Classifier
For a categorical output G with zero-one loss, the same pointwise argument gives
    $\hat{G}(x) = \mathrm{argmax}_g \Pr(g \mid X = x)$:
classify to the most probable class given X = x. This solution is known as the Bayes classifier, and its error rate is the Bayes rate, the best achievable.
Bayes Classifier
Figure 2.5: The optimal Bayes decision boundary for the simulation example above. Since the generating density is known for each class, this boundary can be calculated exactly.
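When the class-conditional densities are known, the Bayes rule can be written down directly. A sketch for equal priors and isotropic Gaussian class densities (an illustrative assumption; the book's simulation uses Gaussian mixtures):

    import numpy as np

    def gauss_density(x, mu, sigma2):
        d = x - np.asarray(mu)
        # isotropic bivariate Gaussian density
        return np.exp(-d @ d / (2 * sigma2)) / (2 * np.pi * sigma2)

    def bayes_classify(x, mu_green, mu_red, sigma2=1.0):
        # with equal priors, compare the class-conditional densities directly
        p_red = gauss_density(x, mu_red, sigma2)
        p_green = gauss_density(x, mu_green, sigma2)
        return "RED" if p_red > p_green else "GREEN"

    # e.g. bayes_classify(np.array([1.2, 0.9]), [0, 0], [2, 2])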
Curse of dimensionality
Local methods break down in high dimensions. To capture a fraction r of uniformly distributed observations with a hypercubical neighborhood in the unit cube in $\mathbb{R}^p$, the expected edge length is $e_p(r) = r^{1/p}$. With p = 10, $e_{10}(0.01) \approx 0.63$ and $e_{10}(0.1) \approx 0.80$: to use just 1% of the data we must cover 63% of the range of each input, so such neighborhoods are no longer local.
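The edge-length formula is a one-liner (a sketch in plain Python):

    # expected edge length e_p(r) = r**(1/p) of a cube capturing a fraction r
    for p in (1, 2, 3, 10):
        print(p, round(0.01 ** (1 / p), 3), round(0.1 ** (1 / p), 3))
    # p=10: capturing 1% of the data requires 63% of each coordinate's range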
Two further consequences of high dimensions. Sampling density is proportional to $N^{1/p}$: if $N_1 = 100$ points give a dense sample in one dimension, matching that density with p = 10 inputs requires $N_{10} = 100^{10}$ points, so high-dimensional training samples populate the input space only sparsely. Moreover, most data points lie closer to the boundary of the sample than to any other point: for N points uniformly distributed in the unit ball in $\mathbb{R}^p$, the median distance from the origin to the closest point is
    $d(p, N) = (1 - (1/2)^{1/N})^{1/p}$,
which for N = 500, p = 10 is about 0.52, more than halfway to the boundary. Prediction near the edges is much harder, since it requires extrapolation rather than interpolation.
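A simulation check of the median nearest-neighbor distance formula (a sketch assuming numpy; points are drawn uniformly in the unit p-ball via a random direction and a radius $u^{1/p}$):

    import numpy as np

    rng = np.random.default_rng(2)

    def median_nn_distance(p, N, trials=200):
        closest = []
        for _ in range(trials):
            z = rng.normal(size=(N, p))
            u = z / np.linalg.norm(z, axis=1, keepdims=True)   # random directions
            r = rng.uniform(size=(N, 1)) ** (1 / p)            # radii for uniformity
            pts = u * r
            closest.append(np.linalg.norm(pts, axis=1).min())  # nearest point to origin
        return np.median(closest)

    p, N = 10, 500
    print(median_nn_distance(p, N))           # simulated, ~0.52
    print((1 - 0.5 ** (1 / N)) ** (1 / p))    # closed form d(p, N)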
Linear model: linear regression test error
By contrast, if the true relationship is linear, $Y = X^T \beta + \varepsilon$ with $\varepsilon \sim N(0, \sigma^2)$, least squares exploits this structure: the expected prediction error of the fit is approximately $\sigma^2 (p/N) + \sigma^2$, growing only linearly in p. The curse of dimensionality is avoided by relying on the (assumed) model rather than on local averaging.
Curse of dimensionality
Statistical Models
The additive error model: $Y = f(X) + \varepsilon$, where the random error $\varepsilon$ has $\mathrm{E}(\varepsilon) = 0$ and is independent of X. Then $f(x) = \mathrm{E}(Y \mid X = x)$, and the model asserts that all departures from the deterministic relationship are captured by the error $\varepsilon$.
Supervised Learning
Learning by example: a teacher supplies the training set $T = \{(x_i, y_i),\ i = 1, \dots, N\}$. The learning algorithm modifies its input/output relationship $\hat{f}$ in response to the differences $y_i - \hat{f}(x_i)$ between the original and generated outputs.
Two Types of Supervised Learning
Regression: the output Y is quantitative (a real value). Classification: the output G is qualitative (a discrete class label).
Learning Classification Models
Learning Regression Models
Function Approximation
Treat supervised learning as function approximation: the data pairs $(x_i, y_i)$ are points in $\mathbb{R}^{p+1}$, and we seek a useful approximation $\hat{f}(x)$ to f(x) for all x in some region of $\mathbb{R}^p$. Many approximations have associated parameters $\theta$; a convenient class is the linear basis expansions
    $f_\theta(x) = \sum_{k=1}^{K} h_k(x)\, \theta_k$,
with the $h_k$ chosen e.g. as polynomials or sigmoids, and $\theta$ estimated by least squares on $\mathrm{RSS}(\theta) = \sum_i (y_i - f_\theta(x_i))^2$.
Function Approximation
Figure 2.10: Least squares fitting of a function of two inputs. The parameters of $f_\theta(x)$ are chosen so as to minimize the sum of squared vertical errors.
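A sketch of such a fit in one input dimension (assuming numpy; the polynomial basis and the sine target are illustrative choices, not from the slides):

    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.uniform(-1, 1, size=100)
    y = np.sin(3 * x) + rng.normal(scale=0.2, size=100)   # noisy target

    H = np.column_stack([x ** k for k in range(6)])  # basis h_k(x) = x^k, k = 0..5
    theta, *_ = np.linalg.lstsq(H, y, rcond=None)    # minimize RSS(theta)
    f_hat = H @ theta                                # fitted values at the x_i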
Function Approximation
More generally, maximum likelihood estimation provides a natural basis for estimation: choose $\theta$ to maximize $L(\theta) = \sum_{i=1}^{N} \log \Pr_\theta(y_i)$. E.g. for a multinomial (qualitative) response G with conditional probabilities $\Pr(G = G_k \mid X = x) = p_{k,\theta}(x)$, the log-likelihood of the training data is $L(\theta) = \sum_{i=1}^{N} \log p_{g_i,\theta}(x_i)$, also known as the cross-entropy.
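A sketch of evaluating this multinomial log-likelihood (assuming numpy; in practice the probabilities would come from a model $p_{k,\theta}(x)$, here they are supplied by hand):

    import numpy as np

    def log_likelihood(probs, g):
        # probs: (N, K) class probabilities p_k(x_i); g: (N,) observed class indices
        return np.sum(np.log(probs[np.arange(len(g)), g]))

    probs = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.8, 0.1]])
    g = np.array([0, 1])
    print(log_likelihood(probs, g))   # log 0.7 + log 0.8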
Structured Regression Models
Minimizing $\mathrm{RSS}(f) = \sum_i (y_i - f(x_i))^2$ over an arbitrary class of functions has infinitely many solutions: any function passing through the training points is a minimizer. To obtain useful results for finite N we must restrict the eligible solutions to a smaller class, typically via complexity constraints that enforce regular (e.g. nearly constant, linear, or smooth) behavior in small neighborhoods of input space.
Classes of Restricted Estimators
– Roughness penalties: minimize a penalized criterion $\mathrm{PRSS}(f; \lambda) = \mathrm{RSS}(f) + \lambda J(f)$, where J(f) penalizes complex functions (e.g. the cubic smoothing spline penalty $J(f) = \int [f''(x)]^2 dx$) and $\lambda$ controls the tradeoff; a concrete instance is sketched after this list.
– Kernel methods and local regression: explicitly specify the local neighborhood via a kernel $K_\lambda(x_0, x)$ and fit simple models locally.
– Basis functions and dictionary methods: $f_\theta(x) = \sum_{m=1}^{M} \theta_m h_m(x)$, including splines, radial basis functions, and neural networks.
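As a concrete instance of a penalized criterion, ridge regression shrinks a linear fit by adding $\lambda \|\beta\|^2$ to the RSS (a sketch assuming numpy; ridge is my example here, not one named on the slide):

    import numpy as np

    def ridge(X, y, lam):
        # minimize ||y - X beta||^2 + lam * ||beta||^2
        p = X.shape[1]
        return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)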
Model Selection & the Bias-Variance Tradeoff
All these models have a smoothing or complexity parameter (penalty weight, kernel width, number of basis functions) to be determined. E.g. for a k-nearest-neighbor fit with $Y = f(X) + \varepsilon$,
    $\mathrm{EPE}_k(x_0) = \sigma^2 + \left[f(x_0) - \frac{1}{k}\sum_{\ell=1}^{k} f(x_{(\ell)})\right]^2 + \frac{\sigma^2}{k}$:
the irreducible error, plus squared bias (which rises as k grows) plus variance (which falls as 1/k). Choosing k trades one off against the other.
Model Selection & the Bias-Variance Tradeoff
Test and training error as a function of model complexity: training error decreases steadily with complexity, while test error is U-shaped, rising again when the model overfits (high variance); models with too little complexity underfit (high bias). We choose the complexity that minimizes the expected test error.
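A sketch of these curves for k-NN, where complexity decreases as k grows (assuming numpy and synthetic two-Gaussian data; training error is lowest at k = 1, while test error is minimized at an intermediate k):

    import numpy as np

    rng = np.random.default_rng(4)

    def make_data(n):   # n points per class, two isotropic Gaussians
        X = np.vstack([rng.normal([0, 0], 1.0, size=(n, 2)),
                       rng.normal([2, 2], 1.0, size=(n, 2))])
        return X, np.r_[np.zeros(n), np.ones(n)]

    def knn_error(X_tr, y_tr, X_te, y_te, k):
        errs = 0
        for x0, g0 in zip(X_te, y_te):
            nn = np.argsort(np.linalg.norm(X_tr - x0, axis=1))[:k]
            errs += float(y_tr[nn].mean() > 0.5) != g0   # majority vote vs truth
        return errs / len(y_te)

    X_tr, y_tr = make_data(100)
    X_te, y_te = make_data(1000)
    for k in (1, 5, 15, 51):
        print(k, knn_error(X_tr, y_tr, X_tr, y_tr, k),   # training error
                 knn_error(X_tr, y_tr, X_te, y_te, k))   # test error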
Homework: page 27, Exercises 2.1, 2.2, 2.6.