Overview of Supervised Learning


Notation
X: inputs, feature vector, predictors, independent variables. Generally X will be a vector of p real values. Qualitative features are coded in X using, for example, dummy variables. Sample values of X are generally written in lower case; x_i is the ith of N sample values.
Y: output, response, dependent variable. Typically a scalar (it can also be a vector) of real values. Again, y_i is a realized value.
G: a qualitative response, taking values in a discrete set G; e.g. G = {survived, died}. We often code G via a binary indicator response vector Y.
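For example, a minimal sketch of this indicator coding in Python/NumPy (the values are illustrative, not from the slides):

```python
import numpy as np

# A hypothetical two-class qualitative response G.
G = np.array(["survived", "died", "survived", "survived"])

# Binary indicator coding: Y = 1 for "died", 0 for "survived".
Y = (G == "died").astype(float)
print(Y)  # [0. 1. 0. 0.]
```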

200 points generated in R^2 from an unknown distribution; 100 in each of two classes, G = {GREEN, RED}. Can we build a rule to predict the color of future points?

Linear Regression
Code Y = 0 for GREEN and Y = 1 for RED, fit Ŷ = x^T β̂ by least squares, and classify a point as RED if Ŷ > 0.5, GREEN otherwise. The resulting decision boundary {x : x^T β̂ = 0.5} is linear.
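A minimal NumPy sketch of this least-squares rule (an illustration under the coding above; the function names are ours, not the book's):

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit of the 0/1-coded response y on X, with intercept."""
    Xa = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    beta, *_ = np.linalg.lstsq(Xa, y, rcond=None)
    return beta

def predict_linear(X, beta):
    """Classify as 1 (RED) where the fitted value exceeds 0.5."""
    Xa = np.column_stack([np.ones(len(X)), X])
    return (Xa @ beta > 0.5).astype(int)
```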

Possible Scenarios
Scenario 1: the data in each class are generated from a Gaussian distribution with uncorrelated components, equal variances, and different means.
Scenario 2: the data in each class are generated from a mixture of 10 Gaussians.
For Scenario 1, the linear regression rule is almost optimal (Chapter 4). For Scenario 2, it is far too rigid.

K-Nearest Neighbors
The k-nearest-neighbor fit averages the responses of the k training points closest to x: Ŷ(x) = (1/k) Σ_{x_i ∈ N_k(x)} y_i, where N_k(x) is the neighborhood containing the k points nearest x. With the 0/1 coding, classify as RED if Ŷ(x) > 0.5, i.e. by majority vote.

15-nearest-neighbor classification. Fewer training data are misclassified than with linear regression, and the decision boundary adapts to the local densities of the classes.

1-nearest-neighbor classification. None of the training data are misclassified.
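A compact from-scratch sketch of the k-NN rule in NumPy (illustrative, not the book's code):

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=15):
    """k-nearest-neighbor classification with Euclidean distance.

    y_train holds 0/1 class labels; a test point is classified as 1
    when the average label of its k nearest neighbors exceeds 0.5.
    """
    # Pairwise squared distances, shape (n_test, n_train).
    d2 = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    # Indices of the k nearest training points for each test point.
    nn = np.argsort(d2, axis=1)[:, :k]
    # Majority vote via the mean of the 0/1 labels.
    return (y_train[nn].mean(axis=1) > 0.5).astype(int)
```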

Discussion
Linear regression uses 3 parameters (an intercept and two slopes) to describe its fit. K-nearest neighbors appears to use only 1, the value of k. More realistically, k-nearest neighbors uses about N/k effective parameters: if the neighborhoods were non-overlapping, there would be N/k of them, each fit with one parameter (a mean). With N = 200 and k = 15, that is roughly 13.
Many modern procedures are variants of linear regression and k-nearest neighbors:
Kernel smoothers
Local linear regression
Linear basis expansions
Projection pursuit and neural networks

Linear regression vs. k-NN
First we expose the oracle. The density for each class was an equal mixture of 10 Gaussians. For the GREEN class, the 10 means were generated from a N((1, 0)^T, I) distribution (and then considered fixed). For the RED class, the 10 means were generated from a N((0, 1)^T, I) distribution. The within-cluster variances were 1/5. See page 17 of the book for more details, or the book website for the actual data.
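A sketch of how data with this structure could be simulated (the seed and helper names are assumptions; the actual dataset is on the book website):

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility

def make_class_means(center, n_means=10):
    """Draw the 10 cluster means for one class from N(center, I)."""
    return rng.normal(loc=center, scale=1.0, size=(n_means, 2))

means_green = make_class_means([1.0, 0.0])
means_red = make_class_means([0.0, 1.0])

def sample_class(means, n):
    """Sample n points: pick a cluster mean at random, add N(0, I/5) noise."""
    idx = rng.integers(0, len(means), size=n)
    return means[idx] + rng.normal(scale=np.sqrt(1 / 5), size=(n, 2))

X = np.vstack([sample_class(means_green, 100), sample_class(means_red, 100)])
y = np.concatenate([np.zeros(100), np.ones(100)])  # 0 = GREEN, 1 = RED
```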

The results of classifying 10,000 test observations generated from this distribution: test error for linear regression and for k-nearest neighbors over a range of k. The Bayes error rate is the best performance possible.

Statistical Decision Theory
Case 1: Quantitative Output Y
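Concretely, under squared-error loss the expected prediction error and its pointwise minimizer, the regression function, are:

```latex
\mathrm{EPE}(f) = \mathbb{E}\big[(Y - f(X))^2\big]
               = \mathbb{E}_X \, \mathbb{E}_{Y \mid X}\big[(Y - f(X))^2 \mid X\big]
% Minimizing pointwise in x gives the regression function:
f(x) = \mathbb{E}[\, Y \mid X = x \,]
```

K-nearest neighbors approximates this conditional expectation by a local average over a neighborhood of x, while linear regression approximates it by a globally linear function f(x) ≈ x^T β.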

Statistical Decision Theory
Case 2: Qualitative Output G
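With a qualitative output, the loss can be written as a K × K matrix L, where L(k, ℓ) is the price paid for classifying an observation from class G_k as G_ℓ; the same pointwise argument gives:

```latex
\mathrm{EPE} = \mathbb{E}\big[ L\big(G, \hat{G}(X)\big) \big]
% Minimizing pointwise in x:
\hat{G}(x) = \arg\min_{g} \sum_{k=1}^{K} L(G_k, g) \, P(G_k \mid X = x)
% Under 0-1 loss this reduces to:
\hat{G}(x) = \arg\max_{g} P(g \mid X = x)
```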

This is known as the Bayes classifier: pick the class having maximum probability at the input X. Question: how do we construct the Bayes classifier for our simulation example?
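Since the simulation's generating densities are known exactly, the Bayes classifier can be evaluated directly. A sketch, reusing the hypothetical means_green and means_red from the sampling sketch above and assuming equal class priors:

```python
import numpy as np

def mixture_density(X, means, var=1 / 5):
    """Density of an equal mixture of Gaussians N(m, var * I) at the rows of X."""
    # Squared distances from each point to each mixture mean: (n, 10).
    d2 = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    # Spherical 2-D Gaussian density, averaged over the 10 components.
    return np.exp(-d2 / (2 * var)).mean(axis=1) / (2 * np.pi * var)

def bayes_classify(X, means_green, means_red):
    """With equal priors, pick the class with the larger class-conditional density."""
    return (mixture_density(X, means_red) > mixture_density(X, means_green)).astype(int)
```

The Bayes error rate can then be estimated by Monte Carlo: draw a large test sample as above and measure how often bayes_classify disagrees with the true generating class.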