Lecture Slides for INTRODUCTION TO Machine Learning, 3rd Edition
ETHEM ALPAYDIN © The MIT Press, 2014
alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml3e
Modified by Prof. Carolina Ruiz for CS539 Machine Learning at WPI

CHAPTER 8: Nonparametric Methods

Nonparametric Estimation
Parametric: a single global model; semiparametric: a small number of local models
Nonparametric: similar inputs have similar outputs
Functions (pdf, discriminant, regression) change smoothly
Keep the training data; "let the data speak for itself"
Given x, find a small number of closest training instances and interpolate from these
Also known as lazy / memory-based / case-based / instance-based learning

Density Estimation
Given the training set X = \{x^t\}_{t=1}^N drawn iid from p(x)
Divide the data into bins of size h
Histogram:
  \hat{p}(x) = \#\{x^t \text{ in the same bin as } x\} / (Nh)
Naive estimator:
  \hat{p}(x) = \#\{x - h/2 < x^t \le x + h/2\} / (Nh)
or equivalently,
  \hat{p}(x) = \frac{1}{Nh} \sum_t w\left(\frac{x - x^t}{h}\right), where w(u) = 1 if |u| < 1/2 and 0 otherwise
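
The estimators above are easy to compute directly. A minimal NumPy sketch (not from the slides; function names and the bin origin are illustrative):

import numpy as np

def histogram_estimate(x, X, h, origin=0.0):
    """p_hat(x) = #{x^t in the same bin as x} / (N h)."""
    same_bin = np.floor((X - origin) / h) == np.floor((x - origin) / h)
    return same_bin.sum() / (len(X) * h)

def naive_estimate(x, X, h):
    """p_hat(x) = #{x - h/2 < x^t <= x + h/2} / (N h): a bin centered at x."""
    in_window = (X > x - h / 2) & (X <= x + h / 2)
    return in_window.sum() / (len(X) * h)

X = np.random.randn(8)                        # a small sample, N = 8
print(histogram_estimate(0.0, X, h=1.0), naive_estimate(0.0, X, h=1.0))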

[Figure: histogram and naive density estimates for a sample of N = 8 points]

[Figure: the naive (box-kernel) estimate is a sum of "boxes", one per instance]

Kernel Estimator
To get a smooth estimate, use a smooth kernel function K(·), e.g., the Gaussian kernel:
  K(u) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{u^2}{2}\right)
Kernel estimator (Parzen windows):
  \hat{p}(x) = \frac{1}{Nh} \sum_{t=1}^{N} K\left(\frac{x - x^t}{h}\right)
The kernel function K(·) determines the shape of the "influence" of each instance
h determines the width of the "influence"
The kernel estimate is a sum of "bumps"
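
A minimal sketch of the Parzen-window estimate with a Gaussian kernel (illustrative names; vectorized over a grid of query points):

import numpy as np

def kernel_estimate(x_query, X, h):
    """p_hat(x) = (1 / N h) * sum_t K((x - x^t) / h), Gaussian K."""
    u = (x_query[:, None] - X[None, :]) / h   # pairwise (x - x^t) / h
    K = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    return K.sum(axis=1) / (len(X) * h)

X = np.random.randn(100)
p = kernel_estimate(np.linspace(-3, 3, 61), X, h=0.5)   # a smooth sum of "bumps"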

k-Nearest Neighbor Estimator
Instead of fixing the bin width h and counting the number of instances that fall in it, fix the number of instances (neighbors) k and compute the bin width from d_k(x), the distance from x to its kth closest training instance:
  \hat{p}(x) = \frac{k}{2 N d_k(x)}
This is like a naive estimator with h = 2 d_k(x).

The k-NN estimator is not continuous, and it is not a pdf since its integral is not 1; but it can be smoothed using a kernel function whose effect decreases with distance:
  \hat{p}(x) = \frac{1}{N d_k(x)} \sum_{t=1}^{N} K\left(\frac{x - x^t}{d_k(x)}\right)
Where density is high, bins are small and so \hat{p}(x) is high; where density is low, bins are large and so \hat{p}(x) is low.
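
A sketch of the (unsmoothed) k-NN density estimate in one dimension, again with illustrative names:

import numpy as np

def knn_density(x_query, X, k):
    """p_hat(x) = k / (2 N d_k(x)), d_k(x) = distance to the kth nearest x^t."""
    d = np.abs(x_query[:, None] - X[None, :])   # distances to all instances
    d_k = np.sort(d, axis=1)[:, k - 1]          # kth smallest distance
    return k / (2 * len(X) * d_k)

X = np.random.randn(200)
p = knn_density(np.linspace(-3, 3, 61), X, k=5)   # discontinuous; integral is not 1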

Multivariate Data (d dimensions)
Kernel density estimator:
  \hat{p}(x) = \frac{1}{N h^d} \sum_{t=1}^{N} K\left(\frac{x - x^t}{h}\right)
Multivariate Gaussian kernel:
  spheric (Euclidean distance): K(u) = \left(\frac{1}{\sqrt{2\pi}}\right)^d \exp\left(-\frac{\|u\|^2}{2}\right)
  ellipsoid (Mahalanobis distance, with sample covariance S): K(u) = \frac{1}{(2\pi)^{d/2} |S|^{1/2}} \exp\left(-\frac{1}{2} u^\top S^{-1} u\right)
Inputs should be normalized to have the same variance.

Nonparametric Classification
Estimate p(x|C_i) and use Bayes' rule
Kernel estimator:
  \hat{p}(x|C_i) = \frac{1}{N_i h^d} \sum_t K\left(\frac{x - x^t}{h}\right) r_i^t, where r_i^t = 1 if x^t \in C_i and 0 otherwise
Particular case: k-NN estimator
  \hat{p}(x|C_i) = \frac{k_i}{N_i V^k(x)}
where k_i is the number of the k nearest neighbors of x that belong to class C_i, and V^k(x) is the volume of the hypersphere centered at x with radius \|x - x_{(k)}\|, the distance to the kth nearest neighbor.
With priors \hat{P}(C_i) = N_i / N, Bayes' rule gives \hat{P}(C_i|x) = k_i / k: the k-NN classifier assigns x to the class having the most examples among the k nearest neighbors of x.
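
A minimal k-NN classifier sketch following the rule above (majority class among the k nearest neighbors; names are illustrative):

import numpy as np

def knn_classify(x, X, y, k=3):
    d = np.linalg.norm(X - x, axis=1)        # Euclidean distances to all x^t
    nearest = np.argsort(d)[:k]              # indices of the k closest instances
    classes, counts = np.unique(y[nearest], return_counts=True)
    return classes[np.argmax(counts)]        # class with the largest k_i

X = np.random.randn(50, 2)
y = (X[:, 0] + X[:, 1] > 0).astype(int)      # toy labels
print(knn_classify(np.array([0.5, 0.5]), X, y, k=5))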

Condensed Nearest Neighbor
Time/space complexity of k-NN is O(N)
Find a subset Z of X that is small and is accurate in classifying X (Hart, 1968):
  E'(Z|X) = E(X|Z) + \lambda |Z|
[Figure: Voronoi diagram; the circled data points can be discarded without changing the decision boundary]

Condensed Nearest Neighbor
Incremental algorithm: add an instance to Z only if it is needed, i.e., if the current Z misclassifies it:
  Z ← ∅
  Repeat, passing over the instances x ∈ X in random order:
    Find x' ∈ Z closest to x
    If class(x) ≠ class(x'): add x to Z
  Until Z does not change
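
A sketch of this incremental condensing step (1-NN rule; a plain-Python rendering under the assumption that ordering is handled by a fixed random permutation):

import numpy as np

def condense(X, y, seed=0):
    order = np.random.default_rng(seed).permutation(len(X))
    Z, Zy = [X[order[0]]], [y[order[0]]]        # seed Z with one instance
    changed = True
    while changed:                              # repeat until Z no longer grows
        changed = False
        for i in order:
            d = np.linalg.norm(np.array(Z) - X[i], axis=1)
            if Zy[int(np.argmin(d))] != y[i]:   # current Z misclassifies x^i
                Z.append(X[i]); Zy.append(y[i])
                changed = True
    return np.array(Z), np.array(Zy)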

Distance-Based Classification
Find a distance function D(x^r, x^s) such that if x^r and x^s belong to the same class the distance is small, and if they belong to different classes the distance is large
Assume a parametric model and learn its parameters using data, e.g., the Mahalanobis-like distance
  D(x^r, x^s | M) = (x^r - x^s)^\top M (x^r - x^s)
where M is a positive semidefinite parameter matrix.

Learning a Distance Function
There is a three-way relationship between distances, dimensionality reduction, and feature extraction: writing M = L^\top L, where M is d×d and L is k×d, gives
  D(x^r, x^s | M) = (x^r - x^s)^\top L^\top L (x^r - x^s) = \|L x^r - L x^s\|^2
i.e., the squared Euclidean distance after projecting the data to k dimensions with L
Similarity-based representation using similarity scores
Large-margin nearest neighbor (Chapter 13)

[Figure: Euclidean distance (circle) is not suitable; Mahalanobis distance using an M (ellipse) is. After the data is projected along L, Euclidean distance can be used.]
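
A quick numerical check of the M = L^T L equivalence (an illustrative sketch; L is just a random projection here):

import numpy as np

d, k = 4, 2
L = np.random.randn(k, d)                    # k x d projection matrix
M = L.T @ L                                  # d x d, positive semidefinite

xr, xs = np.random.randn(d), np.random.randn(d)
diff = xr - xs
D_M = diff @ M @ diff                        # (x^r - x^s)^T M (x^r - x^s)
D_L = np.sum((L @ xr - L @ xs) ** 2)         # ||L x^r - L x^s||^2
assert np.isclose(D_M, D_L)                  # the two forms agree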

Outlier Detection
Find outlier/novelty points
Not a two-class problem, because outliers are very few, of many types, and seldom labeled
Instead, a one-class classification problem: find instances that have low probability
In the nonparametric case: find instances far away from other instances

Local Outlier Factor
Compare the distance from x to its kth nearest neighbor with the same distance for the instances in x's neighborhood N(x):
  LOF(x) = \frac{d_k(x)}{\sum_{s \in N(x)} d_k(s) / |N(x)|}
The larger LOF(x) is, the more likely x is an outlier.
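
A sketch of this simplified LOF (illustrative; N(x) is taken to be the k nearest neighbors):

import numpy as np

def lof(X, k=3):
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                                # exclude self-distances
    idx = np.argsort(d, axis=1)[:, :k]                         # N(x): k nearest neighbors
    d_k = np.take_along_axis(d, idx[:, -1:], axis=1).ravel()   # d_k(x) for every x
    return d_k / d_k[idx].mean(axis=1)                         # LOF(x)

X = np.vstack([np.random.randn(30, 2), [[6.0, 6.0]]])          # one planted outlier
print(lof(X, k=3).round(2))                                    # large LOF at the outlier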

Nonparametric Regression
Also known as smoothing models
Given a training set X = \{x^t, r^t\}_t with r^t = g(x^t) + \varepsilon, estimate g by averaging nearby outputs
Regressogram:
  \hat{g}(x) = \frac{\sum_t b(x, x^t)\, r^t}{\sum_t b(x, x^t)}, where b(x, x^t) = 1 if x^t is in the same bin as x and 0 otherwise
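
A regressogram sketch (illustrative names; bins of width h starting at an assumed origin):

import numpy as np

def regressogram(x_query, X, r, h, origin=0.0):
    """g_hat(x) = sum_t b(x, x^t) r^t / sum_t b(x, x^t)."""
    b = np.floor((x_query[:, None] - origin) / h) == np.floor((X[None, :] - origin) / h)
    num, den = (b * r).sum(axis=1), b.sum(axis=1)
    return np.divide(num, den, out=np.full(len(x_query), np.nan), where=den > 0)

X = np.random.rand(100)
r = np.sin(2 * np.pi * X) + 0.1 * np.random.randn(100)
g = regressogram(np.linspace(0, 1, 11), X, r, h=0.1)   # nan where a bin is empty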

Running Mean/Kernel Smoother
Running mean smoother (a bin centered at x):
  \hat{g}(x) = \frac{\sum_t w\left(\frac{x - x^t}{h}\right) r^t}{\sum_t w\left(\frac{x - x^t}{h}\right)}, where w(u) = 1 if |u| < 1 and 0 otherwise
Running line smoother: fit a local regression line over the instances falling in the bin
Kernel smoother:
  \hat{g}(x) = \frac{\sum_t K\left(\frac{x - x^t}{h}\right) r^t}{\sum_t K\left(\frac{x - x^t}{h}\right)}
where K(·) is, e.g., a Gaussian kernel
Additive models (Hastie and Tibshirani, 1990)
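
A kernel smoother sketch with a Gaussian kernel, weighting each r^t by K((x - x^t)/h) (illustrative names):

import numpy as np

def kernel_smoother(x_query, X, r, h):
    """g_hat(x) = sum_t K((x - x^t)/h) r^t / sum_t K((x - x^t)/h)."""
    u = (x_query[:, None] - X[None, :]) / h
    K = np.exp(-0.5 * u ** 2)                   # Gaussian weights
    return (K * r).sum(axis=1) / K.sum(axis=1)

X = np.sort(np.random.rand(50))
r = np.sin(2 * np.pi * X) + 0.1 * np.random.randn(50)
g = kernel_smoother(np.linspace(0, 1, 101), X, r, h=0.05)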

How to Choose k or h?
When k or h is small, single instances matter: bias is small, variance is large (undersmoothing), high complexity
As k or h increases, we average over more instances: variance decreases, but bias increases (oversmoothing), low complexity
Cross-validation is used to fine-tune k or h.
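
A sketch of tuning h on a held-out validation set (a simple stand-in for full cross-validation; the kernel smoother from the previous slide is assumed):

import numpy as np

def validation_error(h, X_tr, r_tr, X_val, r_val):
    u = (X_val[:, None] - X_tr[None, :]) / h
    K = np.exp(-0.5 * u ** 2)
    g = (K * r_tr).sum(axis=1) / K.sum(axis=1)   # kernel smoother predictions
    return np.mean((r_val - g) ** 2)             # held-out squared error

X = np.random.rand(200)
r = np.sin(2 * np.pi * X) + 0.1 * np.random.randn(200)
X_tr, r_tr, X_val, r_val = X[:150], r[:150], X[150:], r[150:]
best_h = min([0.01, 0.02, 0.05, 0.1, 0.2, 0.5],
             key=lambda h: validation_error(h, X_tr, r_tr, X_val, r_val))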