MACHINE LEARNING 9. Nonparametric Methods
Lecture Notes for E. Alpaydın 2004, Introduction to Machine Learning, © The MIT Press (V1.1)

Introduction
- Estimating a distribution, for:
  - Classification
  - Regression
  - Clustering
- Parametric: assume a model, find its optimum parameters from the data
  - ML, MAP, least squares
- Semi-parametric: assume a mixture of distributions; use EM/clustering to find the parameters

Non-parametric Estimation
- We may not be able to assume a model for the densities
  - The true density might require a very complicated model with a large number of parameters
  - Assuming the wrong model leads to large error
- Nonparametric estimation principle: "similar inputs have similar outputs"
  - Find similar instances in the training data and interpolate/average their outputs

Nonparametric Estimation
- Parametric estimation: all data instances affect the final, global estimate (global methods)
- Non-parametric estimation:
  - No single global model
  - Local models are created as needed, affected only by nearby instances
  - Also called instance-based or memory-based methods

Memory-based Methods
- Lazy methods
  - Store the training data of size N
  - O(N) memory
  - O(N) search to find similar data
- Eager methods: parametric methods
  - d parameters, d < N
  - O(d) memory and processing

Local Methods and the "Curse of Dimensionality"
- 500 points in the 2D unit square give a reasonably good picture of the density
- A single 1000-dimensional vector carries almost no information about the joint distribution of 1000 random variables
- Many more samples are needed in higher-dimensional spaces

Density Estimator
- Given a training set X = {x^t}_t drawn iid from p(x)
- Cumulative distribution
- For density estimation, select an interval length h
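
The estimator itself appeared only as an image on the original slide; in standard form (a reconstruction, not copied from the slide), the empirical cumulative distribution and the density estimate derived from it are:

\hat{F}(x) = \frac{\#\{x^t \le x\}}{N}

\hat{p}(x) = \frac{1}{h}\left[\hat{F}(x+h) - \hat{F}(x)\right] = \frac{\#\{x^t \le x+h\} - \#\{x^t \le x\}}{N h}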

Histogram Estimator
- Divide the input space into bins of equal size h, starting from an origin x_0
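
The histogram estimate (shown as an image on the slide) has the standard form, with bins [x_0 + mh, x_0 + (m+1)h):

\hat{p}(x) = \frac{\#\{x^t \text{ in the same bin as } x\}}{N h}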

Naïve Estimator
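
The formula on this slide was an image in the original; the standard naive estimator centers a bin of width h at the query point x:

\hat{p}(x) = \frac{\#\{x - h/2 < x^t \le x + h/2\}}{N h}
          = \frac{1}{N h} \sum_{t=1}^{N} w\!\left(\frac{x - x^t}{h}\right),
  \qquad w(u) = \begin{cases} 1 & \text{if } |u| < 1/2 \\ 0 & \text{otherwise} \end{cases}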

Naïve Estimator
- Each training sample has a symmetric region of influence of size h
  - It contributes 1 to any x that falls into its region of influence
  - This region of influence is "hard", not continuous
- Soft influence:
  - Each sample contributes as a function of distance
  - Training samples that are close to the input contribute more

Kernel Estimator
- Kernel function, e.g., the Gaussian kernel
- Kernel estimator (Parzen windows)
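
The kernel and the resulting estimator (images on the original slide) take the standard form:

K(u) = \frac{1}{\sqrt{2\pi}} \exp\!\left(-\frac{u^2}{2}\right),
  \qquad
\hat{p}(x) = \frac{1}{N h} \sum_{t=1}^{N} K\!\left(\frac{x - x^t}{h}\right)

Each training instance now contributes a smooth bump centered at x^t, so the estimate is continuous and differentiable.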

K-Nearest Neighbors
- Histogram/kernel methods use a single, uniform bin size
- We would rather have the bin size be small where there are many samples in the neighborhood, and larger where samples are sparse
- Sort the distances from x to the training samples: d_1(x) ≤ d_2(x) ≤ ... ≤ d_N(x)

k-Nearest Neighbor Estimator
- Instead of fixing the bin width h and counting the number of instances, fix the number of instances (neighbors) k and use the bin width d_k(x), the distance to the kth closest instance to x
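
In one dimension the k-NN density estimate (an image on the original slide) is:

\hat{p}(x) = \frac{k}{2 N d_k(x)}

since the bin [x - d_k(x), x + d_k(x)] of width 2 d_k(x) contains exactly k instances.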

Multivariate Data
- Kernel density estimator with a multivariate Gaussian kernel: spheric (a single bandwidth h) or ellipsoid (a full covariance matrix S)
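
The d-dimensional estimator and the two Gaussian kernel choices (images on the original slide) are, in standard form:

\hat{p}(x) = \frac{1}{N h^d} \sum_{t=1}^{N} K\!\left(\frac{x - x^t}{h}\right)

Spheric: K(u) = \left(\frac{1}{2\pi}\right)^{d/2} \exp\!\left(-\frac{\|u\|^2}{2}\right)

Ellipsoid: K(u) = \frac{1}{(2\pi)^{d/2} |S|^{1/2}} \exp\!\left(-\frac{1}{2} u^\top S^{-1} u\right), with S a sample covariance matrix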

Nonparametric Classification
- The training data "vote" for the label of the input; closer points get more influence
- Kernel: weight the votes according to distance
- k-NN: use the k closest points
- 1-NN: find the closest point and assign its label to the input
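
A minimal sketch of k-NN voting in plain NumPy (the function name knn_predict and the toy data are illustrative, not from the slides):

import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=3):
    """Classify x by majority vote among its k nearest training instances."""
    dists = np.linalg.norm(X_train - x, axis=1)        # distances to all training points
    nearest = np.argsort(dists)[:k]                    # indices of the k closest points
    return Counter(y_train[nearest]).most_common(1)[0][0]   # majority label

# Toy usage: two classes in 2D; a k=3 vote assigns the query to class 1
X_train = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.95, 1.0]), k=3))   # -> 1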

Nonparametric Classification
- Estimate p(x | C_i) and use Bayes' rule
- Kernel estimator
- k-NN estimator
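
With N_i instances of class C_i (and r_i^t = 1 if x^t belongs to C_i, 0 otherwise), the estimators referred to on this slide (images in the original) take the standard form:

Kernel: \hat{p}(x \mid C_i) = \frac{1}{N_i h^d} \sum_{t} K\!\left(\frac{x - x^t}{h}\right) r_i^t,
  \quad \hat{P}(C_i) = \frac{N_i}{N},
  \quad g_i(x) = \hat{p}(x \mid C_i)\, \hat{P}(C_i) = \frac{1}{N h^d} \sum_{t} K\!\left(\frac{x - x^t}{h}\right) r_i^t

k-NN: \hat{P}(C_i \mid x) = \frac{k_i}{k}, where k_i of the k nearest neighbors of x belong to C_i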

Condensed Nearest Neighbor
- The time/space complexity of k-NN is O(N)
- Find a subset Z of X that is small and still classifies X accurately
- Use 1-NN with Z

Condensed Nearest Neighbor
- Incremental algorithm: add an instance to the stored subset Z only if it is needed, i.e., if 1-NN on the current Z misclassifies it (see the sketch below)
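
The pseudocode on the original slide was an image; a minimal sketch of the incremental condensing idea in NumPy (the function name condense is illustrative):

import numpy as np

def condense(X, y, max_passes=5):
    """Keep only the instances that 1-NN on the current subset Z misclassifies."""
    Z = [0]                                   # start with one stored instance
    for _ in range(max_passes):               # repeat until no instance is added
        added = False
        for i in range(len(X)):
            dists = np.linalg.norm(X[Z] - X[i], axis=1)
            if y[Z[int(np.argmin(dists))]] != y[i]:   # misclassified by 1-NN on Z
                Z.append(i)                           # ...so it must be stored
                added = True
        if not added:
            break
    return X[Z], y[Z]

The result depends on the order in which instances are visited; Z is neither unique nor minimal.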

Nonparametric Regression
- Also known as smoothing models
- Take several of the closest points and weight/average their outputs
- (Parametric model: find polynomial coefficients, then evaluate the input on the fitted function)
- Take the average of the outputs in the same bin: regressogram
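
The regressogram (an image on the original slide), for training pairs (x^t, r^t), is the bin-wise average:

\hat{g}(x) = \frac{\sum_{t=1}^{N} b(x, x^t)\, r^t}{\sum_{t=1}^{N} b(x, x^t)},
  \qquad b(x, x^t) = \begin{cases} 1 & \text{if } x^t \text{ is in the same bin as } x \\ 0 & \text{otherwise} \end{cases}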

Kernel Smoother
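
The kernel smoother formula (an image on the original slide) is, in standard (Nadaraya–Watson) form:

\hat{g}(x) = \frac{\sum_{t=1}^{N} K\!\left(\frac{x - x^t}{h}\right) r^t}{\sum_{t=1}^{N} K\!\left(\frac{x - x^t}{h}\right)}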

Running Line Smoother
- Fit a line locally to the points in a neighborhood of x
- Distances can also be taken into account with a kernel
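
A minimal sketch of a running-line smoother: fit a least-squares line to the k nearest neighbors of each query point and evaluate it there (running_line and the choice k=7 are illustrative assumptions):

import numpy as np

def running_line(x_train, r_train, x_query, k=7):
    """1D running-line smoother: local least-squares line through k nearest neighbors."""
    preds = []
    for x in x_query:
        idx = np.argsort(np.abs(x_train - x))[:k]            # k nearest training points
        slope, intercept = np.polyfit(x_train[idx], r_train[idx], deg=1)
        preds.append(slope * x + intercept)                   # evaluate the local line at x
    return np.array(preds)

Weighting the local least-squares fit by a kernel over distance gives locally weighted regression.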

How to Choose k or h?
- When k or h is small, single instances matter; bias is small, variance is large (undersmoothing): high complexity
- As k or h increases, we average over more instances, so variance decreases but bias increases (oversmoothing): low complexity
- Cross-validation is used to fine-tune k or h
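
A minimal sketch of tuning k on a held-out validation set; it reuses the knn_predict function sketched earlier (an assumption of this example), and with a kernel estimator the same loop would scan candidate values of h instead:

import numpy as np

def choose_k(X_train, y_train, X_val, y_val, candidates=(1, 3, 5, 7, 9)):
    """Return the k with the highest accuracy on the validation set."""
    best_k, best_acc = candidates[0], -1.0
    for k in candidates:
        preds = np.array([knn_predict(X_train, y_train, x, k=k) for x in X_val])
        acc = np.mean(preds == y_val)
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k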

Classifications