CSC2515 Fall 2007, Introduction to Machine Learning
Lecture 7b: Kernel density estimators and nearest neighbors

All lecture slides will be available as .ppt, .ps, & .htm at www.cs.toronto.edu/~hinton. Many of the figures are provided by Chris Bishop from his textbook "Pattern Recognition and Machine Learning".

Histograms as density models

For low-dimensional data we can use a histogram as a density model.
How wide should the bins be? (The bin width acts as a regularizer.)
Do we want the same bin-width everywhere?
Do we believe the density is zero for empty bins?

[Figure: histograms of the same dataset with different bin widths; the green curve is the true density. Too narrow a bin width gives a noisy estimate; too wide a bin width over-smooths.]

Some good and bad properties of histograms as density estimators

There is no need to fit a model to the data: we just compute some very simple statistics (the number of datapoints in each bin) and store them.
The number of bins is exponential in the dimensionality of the data space, so high-dimensional data is tricky: we must either use big bins or get lots of zero counts (or adapt the local bin-width to the density).
The density has silly discontinuities at the bin boundaries; we should be able to do better with some kind of smoothing.
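For concreteness, here is a minimal sketch of a one-dimensional histogram density estimate (not the lecture's code; numpy and the name histogram_density are assumptions, and the bin width h is a free choice):

    import numpy as np

    def histogram_density(data, x, h):
        """Histogram estimate of p(x): count in x's bin divided by N * h."""
        edges = np.arange(data.min(), data.max() + h, h)   # fixed bin width h everywhere
        counts, edges = np.histogram(data, bins=edges)
        i = np.searchsorted(edges, x, side='right') - 1    # which bin does x fall into?
        if i < 0 or i >= len(counts):
            return 0.0                                     # zero density outside all bins
        return counts[i] / (len(data) * h)                 # n_i / (N * h)

Note how the sketch makes the slide's worries concrete: the estimate really is zero in empty bins, and it jumps discontinuously at every bin edge.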

Local density estimators

Estimate the density in a small region to be

    p(x) ≈ K / (N V)

where K is the number of datapoints in the region, V is the volume of the region, and N is the total number of datapoints.

Problem 1: the estimate has high variance if K is small.
Problem 2: unmodelled variation across the region if V is big compared with the smoothness of the true density.
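As an illustrative sketch of the K/(NV) estimate (not from the lecture; numpy and the name local_density are assumptions), using a hypercube of half-width r as the region:

    import numpy as np

    def local_density(data, x, r):
        """Estimate p(x) as K / (N * V) inside a hypercube of half-width r centred at x."""
        N, D = data.shape
        K = np.sum(np.all(np.abs(data - x) <= r, axis=1))   # datapoints inside the cube
        V = (2.0 * r) ** D                                  # volume of the cube
        return K / (N * V)

A small r makes K small (Problem 1: high variance); a large r makes V big (Problem 2: the true density varies across the region).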

Kernel density estimators

Use regions centered on the datapoints.
Allow the regions to overlap.
Let each individual region contribute a total density of 1/N.
Use regions with soft edges to avoid discontinuities (e.g. isotropic Gaussians).
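With isotropic Gaussian kernels of width h this gives p(x) = (1/N) Σ_n N(x | x_n, h²I). A minimal sketch of that estimator (not the lecture's code; numpy and the name gaussian_kde are assumptions):

    import numpy as np

    def gaussian_kde(data, x, h):
        """p(x) = (1/N) * sum_n N(x | x_n, h^2 I): each datapoint contributes 1/N."""
        N, D = data.shape
        sq_dists = np.sum((data - x) ** 2, axis=1)       # ||x - x_n||^2 for every datapoint
        norm = (2.0 * np.pi * h ** 2) ** (D / 2.0)       # normaliser of one isotropic Gaussian
        return np.mean(np.exp(-sq_dists / (2.0 * h ** 2)) / norm)

The kernel width h now plays the role the bin width played for histograms: it is the regularizer that trades noise against over-smoothing.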

The density modeled by a kernel density estimator

[Figure: kernel density estimates of the same dataset with different kernel widths; the green curve is the true density. Too narrow a kernel gives a spiky, noisy estimate; too wide a kernel over-smooths.]

Nearest neighbor methods

Instead of fixing the region and counting the datapoints that fall inside it, fix the count: vary the size of a hyper-sphere around each test point so that exactly K training datapoints fall inside the hyper-sphere. The density estimate is again

    p(x) ≈ K / (N V)

where V is now the volume of the hyper-sphere and N is the total number of datapoints. Does this give a fair estimate of the density? (Not quite: the resulting estimate is not properly normalized.)

Nearest neighbors is usually used for classification or regression rather than density estimation:
For regression, average the predictions of the K nearest neighbors.
For classification, pick the class with the most votes. How should we break ties?
A minimal sketch of both uses is given below.
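This sketch is illustrative rather than the lecture's code; numpy and the name knn_predict are assumptions, and ties are broken arbitrarily toward the lower class label (one answer to the slide's question):

    import numpy as np

    def knn_predict(train_x, train_y, x, K, classify=True):
        """Vote (classification) or average (regression) over the K nearest neighbours of x."""
        dists = np.sum((train_x - x) ** 2, axis=1)   # squared Euclidean distance to every training point
        nearest = np.argsort(dists)[:K]              # indices of the K closest points
        if classify:
            votes = np.bincount(train_y[nearest].astype(int))
            return votes.argmax()                    # majority vote; argmax breaks ties toward the lower label
        return train_y[nearest].mean()               # average of the neighbours' targets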

The decision boundary implemented by 3NN

Each piece of the boundary is a segment of the perpendicular bisector of the line between two datapoints (as in a Voronoi tessellation).

Regions defined by using various numbers of neighbors