Overview of Supervised Learning

Outline

- Linear Regression and the Nearest-Neighbors Method
- Statistical Decision Theory
- Local Methods in High Dimensions
- Statistical Models, Supervised Learning and Function Approximation
- Structured Regression Models
- Classes of Restricted Estimators
- Model Selection and the Bias-Variance Tradeoff

Notation

- X: inputs, feature vector, predictors, independent variables. Generally X is a vector of p values; qualitative features are coded in X. Sample values of X are written in lower case: x_i is the i-th of N sample values.
- Y: output, response, dependent variable. Typically a scalar (it can be a vector) of real values; y_i is a realized value.
- G: a qualitative response, taking values in a discrete set G, e.g. G = {survived, died}. We often code G via a binary indicator response vector Y.

Problem

200 points generated in R^2 from an unknown distribution, 100 in each of two classes G = {GREEN, RED}. Can we build a rule to predict the color of future points?

Linear Regression

Code Y = 1 if G = RED, else Y = 0. We model Y as a linear function of X:

    \hat{Y} = \hat\beta_0 + \sum_{j=1}^{p} X_j \hat\beta_j = X^T \hat\beta

(including the constant 1 in X to absorb the intercept). Obtain \hat\beta by least squares, by minimizing the quadratic criterion

    \mathrm{RSS}(\beta) = \sum_{i=1}^{N} (y_i - x_i^T \beta)^2 = (\mathbf{y} - \mathbf{X}\beta)^T (\mathbf{y} - \mathbf{X}\beta).

Given a model matrix \mathbf{X} and a response vector \mathbf{y}, differentiating and setting to zero gives the unique solution (when \mathbf{X}^T\mathbf{X} is nonsingular)

    \hat\beta = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y}.
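A minimal numpy sketch of this fit and of the 0.5-threshold rule used in Figure 2.1. The arrays X (N x p inputs) and y (0/1 responses) are illustrative placeholders, not data from the slides.

import numpy as np

def fit_linear(X, y):
    """Least squares: beta_hat = (X^T X)^{-1} X^T y, with an intercept column."""
    Xb = np.column_stack([np.ones(len(X)), X])      # prepend the constant 1
    beta, *_ = np.linalg.lstsq(Xb, y, rcond=None)   # stable least-squares solve
    return beta

def predict_class(X, beta):
    """Classify RED (1) if Y_hat > 0.5, else GREEN (0)."""
    Xb = np.column_stack([np.ones(len(X)), X])
    return (Xb @ beta > 0.5).astype(int)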

Linear Regression

The fitted values define a classification rule: predict RED where \hat{Y} > 0.5 and GREEN otherwise. The set {x : x^T\hat\beta = 0.5} is a linear decision boundary.

Linear Regression

Figure 2.1: A classification example in two dimensions. The classes are coded as a binary variable (GREEN = 0, RED = 1) and then fit by linear regression. The line is the decision boundary defined by x^T\hat\beta = 0.5. The red shaded region denotes the part of input space classified as RED, while the green region is classified as GREEN.

Possible Scenarios

Where might the training data have come from? Two possibilities:

- Scenario 1: the data in each class are generated from a bivariate Gaussian with uncorrelated components and different means. A linear decision boundary is then close to the best one can do.
- Scenario 2: the data in each class come from a mixture of 10 low-variance Gaussians, with the individual means themselves distributed as Gaussian. A rigid linear boundary is then unlikely to be optimal; see the data-generation sketch below.
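A sketch of Scenario 2's generating mechanism, following the description in The Elements of Statistical Learning (10 means per class, points drawn around a randomly chosen mean with variance 1/5); the seed and variable names are illustrative.

import numpy as np

rng = np.random.default_rng(0)

def make_class(center, n=100, n_means=10, scale=np.sqrt(1 / 5)):
    means = rng.normal(loc=center, scale=1.0, size=(n_means, 2))  # the class's 10 means
    idx = rng.integers(0, n_means, size=n)                        # pick a mean per point
    return rng.normal(loc=means[idx], scale=scale)                # sample around it

X_green = make_class(center=[1.0, 0.0])   # class GREEN (coded 0)
X_red = make_class(center=[0.0, 1.0])     # class RED (coded 1)
X = np.vstack([X_green, X_red])
y = np.r_[np.zeros(100), np.ones(100)]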

K-Nearest Neighbors

The k-nearest-neighbor fit uses the observations closest to x in the training set:

    \hat{Y}(x) = \frac{1}{k} \sum_{x_i \in N_k(x)} y_i,

where N_k(x) is the neighborhood of x defined by the k closest points x_i in the training sample, with closeness measured by (say) Euclidean distance.
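A minimal implementation of this averaging formula for the 0/1 coding; X_train, y_train, and the query point x are placeholders.

import numpy as np

def knn_predict(X_train, y_train, x, k=15):
    d = np.linalg.norm(X_train - x, axis=1)   # Euclidean distances to x
    nbrs = np.argsort(d)[:k]                  # indices of the k closest points
    y_hat = y_train[nbrs].mean()              # average of neighbor responses
    return int(y_hat > 0.5)                   # i.e. majority vote for 0/1 labels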

K-Nearest Neighbors

Figure 2.2: The same classification example in two dimensions as in Figure 2.1. The classes are coded as a binary variable (GREEN = 0, RED = 1) and then fit by 15-nearest-neighbor averaging. The predicted class is hence chosen by majority vote amongst the 15 nearest neighbors.

K-Nearest Neighbors

Figure 2.3: The same classification example in two dimensions. The classes are coded as a binary variable (GREEN = 0, RED = 1) and then predicted by 1-nearest-neighbor classification.

Linear Regression vs. k-NN

The linear model is stable but leans heavily on the assumption of a linear decision boundary: low variance, potentially high bias, and only p + 1 parameters. k-NN makes no structural assumption: each prediction depends on a handful of nearby points, giving low bias but high variance, and although it appears to have a single parameter k, its effective number of parameters is roughly N/k.

Linear Regression vs. k-NN

Figure 2.4: Misclassification curves for the simulation example above. A test sample of size 10,000 was used. The red curves are test error and the green curves are training error for k-NN classification, as a function of k. The results for linear regression are the larger green and red dots at three degrees of freedom. The purple line is the optimal Bayes error rate.
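A sketch reproducing the spirit of Figure 2.4: training and test misclassification error of k-NN as a function of k. It assumes the data (X, y) from the earlier mixture sketch, the knn_predict function above, and a held-out sample X_test, y_test generated the same way (placeholders).

import numpy as np

def error_rate(X_fit, y_fit, X_eval, y_eval, k):
    preds = np.array([knn_predict(X_fit, y_fit, x, k) for x in X_eval])
    return np.mean(preds != y_eval)

for k in (1, 3, 7, 15, 45, 101):
    tr = error_rate(X, y, X, y, k)             # training error (fit set == eval set)
    te = error_rate(X, y, X_test, y_test, k)   # test error on the held-out sample
    print(f"k={k:4d}  train={tr:.3f}  test={te:.3f}")

As in the figure, training error falls to zero at k = 1 while test error is minimized at some intermediate k.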

Other Methods

Statistical Decision Theory

Let X be a random input vector in R^p and Y a random real output with joint distribution Pr(X, Y). We seek a function f(X) for predicting Y and penalize errors with a loss function, most commonly squared error L(Y, f(X)) = (Y - f(X))^2. The criterion for choosing f is the expected prediction error

    \mathrm{EPE}(f) = \mathrm{E}\,[Y - f(X)]^2.
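The slide's formulas were lost, so here is the standard two-line derivation of the result stated on the next slide: conditioning on X lets EPE be minimized pointwise.

\begin{aligned}
\mathrm{EPE}(f) &= \mathrm{E}\,[Y - f(X)]^2
              = \mathrm{E}_X\,\mathrm{E}_{Y\mid X}\!\big([Y - f(X)]^2 \mid X\big),\\
f(x) &= \operatorname*{arg\,min}_{c}\; \mathrm{E}_{Y\mid X}\!\big([Y - c]^2 \mid X = x\big)
      = \mathrm{E}(Y \mid X = x).
\end{aligned}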

The Regression Function

The solution f(x) = E(Y | X = x), the conditional expectation, is known as the regression function: the best prediction of Y at any point X = x, when "best" is measured by average squared error.

Bayes Classifier

For a categorical output G with 0-1 loss, the same pointwise argument gives

    \hat{G}(x) = \operatorname*{arg\,max}_{g \in \mathcal{G}} \Pr(g \mid X = x):

classify to the most probable class given X = x. This reasonable solution is known as the Bayes classifier, and its error rate is the Bayes rate, the best achievable by any rule.

Bayes Classifier

Figure 2.5: The optimal Bayes decision boundary for the simulation example above. Since the generating density is known for each class, this boundary can be calculated exactly.
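Because the simulator knows each class's density, the Bayes rule can be evaluated directly. A sketch for the mixture model above, assuming equal priors and the means arrays from the earlier data-generation sketch (placeholders).

import numpy as np
from scipy.stats import multivariate_normal

def mixture_density(x, means, var=1 / 5):
    # equal-weight mixture of Gaussians with covariance (1/5) * I
    return np.mean([multivariate_normal.pdf(x, mean=m, cov=var) for m in means])

def bayes_classify(x, means_green, means_red):
    # equal priors: classify to the class with the larger density at x
    return int(mixture_density(x, means_red) > mixture_density(x, means_green))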

Curse of Dimensionality

Local methods break down in high dimensions. Consider inputs uniformly distributed in the unit hypercube in p dimensions: to capture a fraction r of the observations with a sub-cube, we need edge length

    e_p(r) = r^{1/p}.

For example e_10(0.01) ≈ 0.63, so covering just 1% of the data requires 63% of the range of each input variable; such neighborhoods are no longer local.
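A two-line check of the edge-length formula, showing how quickly a "local" neighborhood stops being local as p grows.

import numpy as np

for p in (1, 2, 3, 10, 100):
    e = 0.01 ** (1 / p)   # edge length needed to capture 1% of uniform data
    print(f"p={p:4d}  e_p(0.01)={e:.3f}")
# p=10 already gives ~0.63: most of each coordinate's range for 1% of the data.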

Linear Model: Linear Regression Test Error

By imposing structure, the linear model can mitigate the curse. If the true model is linear, Y = X^T β + ε with Var(ε) = σ², and we fit by least squares on N training points, then the expected prediction error at a test point x_0 drawn like the training inputs grows only linearly in the dimension:

    \mathrm{E}_{x_0}\,\mathrm{EPE}(x_0) \approx \sigma^2 (p/N) + \sigma^2,

i.e. the excess error over the irreducible σ² is about σ² p / N (see the simulation sketch below).
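A small Monte Carlo check of this rate: the average excess squared error of least squares at a fresh test point tracks σ² p / N. All constants are illustrative.

import numpy as np

rng = np.random.default_rng(1)
N, sigma = 500, 1.0

for p in (5, 20, 80):
    errs = []
    for _ in range(200):
        X = rng.normal(size=(N, p))
        beta = rng.normal(size=p)                       # true coefficients
        y = X @ beta + sigma * rng.normal(size=N)       # linear model with noise
        beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
        x0 = rng.normal(size=p)                         # test point from the same law
        errs.append((x0 @ (beta_hat - beta)) ** 2)      # excess error beyond sigma^2
    print(f"p={p:3d}  mean excess={np.mean(errs):.3f}  sigma^2*p/N={sigma**2 * p / N:.3f}")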

Statistical Models

In practice we need a model for the joint distribution Pr(X, Y). The most common is the additive error model

    Y = f(X) + \varepsilon,

where the random error ε has E(ε) = 0 and is independent of X. Then f(x) = E(Y | X = x), and all departures from the deterministic relationship are captured by ε.

Supervised Learning

Supervised learning attempts to learn f by example through a "teacher": observe the training set T = {(x_i, y_i), i = 1, ..., N}, produce outputs f̂(x_i) from the inputs, and modify f̂ in response to the differences y_i − f̂(x_i) between observed and predicted outputs.

Two Types of Supervised Learning

- Regression: the output Y is quantitative.
- Classification: the output G is qualitative (categorical).

Learning Classification Models

Learning Regression Models

Function Approximation

The learning task can also be viewed as function approximation: the data pairs {x_i, y_i} are points in a (p + 1)-dimensional space, and we seek an approximation f_θ(x) to the true f, indexed by parameters θ, for example a linear basis expansion

    f_\theta(x) = \sum_{k} h_k(x)\,\theta_k,

fit by minimizing the residual sum of squares RSS(θ) = Σ_i (y_i − f_θ(x_i))².

Function Approximation

Figure 2.10: Least squares fitting of a function of two inputs. The parameters of f_θ(x) are chosen so as to minimize the sum of squared vertical errors.

Function Approximation

More generally, maximum likelihood estimation provides a natural basis for estimation: choose θ to maximize

    L(\theta) = \sum_{i=1}^{N} \log \Pr_\theta(y_i).

E.g., for a qualitative output G with class probabilities Pr(G = G_k | X = x) = p_{k,θ}(x), the multinomial (conditional) log-likelihood is L(θ) = Σ_{i=1}^N log p_{g_i,θ}(x_i).
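A sketch of this multinomial log-likelihood where the class-probability model p_{k,θ}(x) is taken, for illustration only, to be a softmax of linear scores; the slides do not specify a particular model.

import numpy as np

def log_likelihood(Theta, X, g):
    """Theta: p x K score matrix; X: N x p inputs; g: N class labels in 0..K-1."""
    scores = X @ Theta                            # N x K linear scores
    m = scores.max(axis=1, keepdims=True)         # stabilize the log-sum-exp
    logZ = (m + np.log(np.exp(scores - m).sum(axis=1, keepdims=True))).ravel()
    return np.sum(scores[np.arange(len(g)), g] - logZ)   # sum_i log p_{g_i}(x_i)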

Structured Regression Models

Minimizing RSS(f) = Σ_i (y_i − f(x_i))² over all functions f has infinitely many solutions: any function passing through the training points is a minimizer. To obtain useful results for finite N we must restrict f to a smaller class of functions, typically by enforcing some kind of regular (e.g. smooth) behavior in small neighborhoods of the input space.

Classes of Restricted Estimators

Restricted regression techniques fall into several broad classes, depending on the nature of the restriction:

- Roughness penalty and Bayesian methods: penalize RSS, \mathrm{PRSS}(f; \lambda) = \mathrm{RSS}(f) + \lambda J(f), where the functional J(f) is large for functions that vary too rapidly (see the sketch after this list for the simplest instance).
- Kernel methods and local regression: specify the local neighborhood explicitly via a kernel function K_λ(x_0, x), e.g. locally weighted least squares.
- Basis functions and dictionary methods: model f as a linear expansion f_θ(x) = Σ_{m=1}^M θ_m h_m(x) of basis functions.
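Arguably the simplest member of the penalty class is ridge regression, which penalizes the coefficient norm rather than a derivative-based roughness functional: it minimizes RSS(β) + λ ||β||² and has the closed form β̂ = (XᵀX + λI)⁻¹Xᵀy. A minimal sketch:

import numpy as np

def ridge_fit(X, y, lam=1.0):
    p = X.shape[1]
    # solve (X^T X + lambda I) beta = X^T y rather than inverting explicitly
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

Larger lam shrinks the coefficients toward zero, trading a little bias for a reduction in variance, which is the tradeoff the next slide makes explicit.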

Model Selection & the Bias-Variance Tradeoff

All these models have a smoothing or complexity parameter (penalty weight λ, kernel width, number of basis functions) that must be chosen. As complexity increases, variance rises and squared bias falls, and test error follows the decomposition

    EPE = irreducible error + bias² + variance.

For k-NN at a point x_0, for example,

    \mathrm{EPE}_k(x_0) = \sigma^2 + \Big[f(x_0) - \frac{1}{k}\sum_{\ell=1}^{k} f(x_{(\ell)})\Big]^2 + \frac{\sigma^2}{k},

so small k gives low bias but variance σ²/k is large, and vice versa; test error is minimized at an intermediate complexity.

Model Selection & the Bias-Variance Tradeoff

Figure 2.11: Test and training error as a function of model complexity.
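A simulation sketch of the k-NN decomposition above at a single test point x_0 in one dimension, under the additive model y = f(x) + ε: large k drives the variance toward σ²/k while the squared bias grows. The regression function, noise level, and constants are all illustrative assumptions.

import numpy as np

rng = np.random.default_rng(2)
f = lambda x: np.sin(4 * x)     # assumed true regression function
sigma, N, x0 = 0.3, 100, 0.5

for k in (1, 5, 25, 75):
    preds = []
    for _ in range(500):                        # repeat over fresh training sets
        x = rng.uniform(0, 1, N)
        y = f(x) + sigma * rng.normal(size=N)
        nbrs = np.argsort(np.abs(x - x0))[:k]   # k nearest training inputs to x0
        preds.append(y[nbrs].mean())            # k-NN estimate at x0
    preds = np.array(preds)
    bias2 = (preds.mean() - f(x0)) ** 2
    var = preds.var()
    print(f"k={k:3d}  bias^2={bias2:.4f}  variance={var:.4f}  (sigma^2/k={sigma**2 / k:.4f})")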

Homework

Exercises 2.1, 2.2, and 2.6 (page 27 of the text).