Non-Parametric Learning Prof. A.L. Yuille Stat 231. Fall 2004. Chp 4.1 – 4.3.

Parametric versus Non-Parametric Previous lectures on MLE learning assumed a functional form for the probability distribution. We now consider an alternative non-parametric method based on window functions.

Non-Parametric It is hard to develop probability models for some data. Example: estimate the distribution of annual rainfall in the U.S.A. We want to model p(x,y), the probability that a raindrop hits a position (x,y). Problems: (i) a multi-modal density is difficult for parametric models; (ii) it is difficult or impossible to collect enough data at each point (x,y).

Intuition Assume that the probability density is locally smooth. Goal: estimate the class density model p(x) from data. Method 1: windows centred at points x in space.

Windows For each point x, form a window centred at x with volume V_n. Count the number k_n of the n samples that fall in the window. The probability density is estimated as p_n(x) = (k_n / n) / V_n.
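As a minimal sketch (my own illustration, not from the lecture; the function name and arguments are assumptions), the counting estimate translates directly into Python:

```python
import numpy as np

def window_density(x, samples, h):
    """Estimate p(x) by counting samples in a hypercube of side h centred at x.

    Implements p_n(x) = (k_n / n) / V_n with V_n = h**d.
    Illustrative sketch; names are hypothetical.
    """
    samples = np.atleast_2d(samples)                    # shape (n, d)
    n, d = samples.shape
    inside = np.all(np.abs(samples - x) <= h / 2.0, axis=1)
    k = inside.sum()                                    # k_n: samples in the window
    return (k / n) / h ** d
```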

Non-Parametric Goal: design a sequence of windows so that p_n(x) converges to f(x) at each point x (f(x) is the true density). Conditions for window design: (i) V_n → 0, giving increasing spatial resolution; (ii) k_n → ∞, so there are many samples at each point; (iii) k_n / n → 0, so the count in the window grows more slowly than the total number of samples.

Two Design Methods Parzen window: fix the window size, e.g. V_n = V_1 / √n. K-NN: fix the number of samples in the window, e.g. k_n = √n, and let the window grow until it contains them.
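A minimal 1-D sketch of the k-NN variant (my own illustration; a Parzen sketch appears further below): fix k and let the interval around x grow until it contains k samples.

```python
import numpy as np

def knn_density_1d(x, samples, k):
    """1-D k-NN density estimate: p_n(x) = (k / n) / V_n, where V_n is the
    length of the smallest interval centred at x containing k samples.
    Illustrative sketch; names are hypothetical."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    dists = np.sort(np.abs(samples - x))
    V = 2.0 * dists[k - 1]              # interval reaching the k-th nearest sample
    return (k / n) / V
```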

Parzen Window The Parzen window uses a window function φ(u). Examples: (i) unit hypercube: φ(u) = 1 if |u_j| ≤ 1/2 for every coordinate j = 1, …, d, and 0 otherwise; (ii) Gaussian in d dimensions: φ(u) = (2π)^(−d/2) exp(−|u|²/2).
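The two example windows translate directly into code; this is a sketch under the assumption that the inputs are rows u = (x − x_i)/h:

```python
import numpy as np

def phi_hypercube(u):
    """Unit hypercube window: 1 if |u_j| <= 1/2 for every coordinate, else 0."""
    u = np.atleast_2d(u)
    return np.all(np.abs(u) <= 0.5, axis=1).astype(float)

def phi_gaussian(u):
    """Gaussian window in d dimensions: (2*pi)**(-d/2) * exp(-|u|**2 / 2)."""
    u = np.atleast_2d(u)
    d = u.shape[1]
    return (2.0 * np.pi) ** (-d / 2.0) * np.exp(-0.5 * np.sum(u ** 2, axis=1))
```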

Parzen Windows The number of samples falling in the window centred at x is k_n = Σ_{i=1}^n φ((x − x_i)/h_n), with volume V_n = h_n^d. The estimate of the distribution is p_n(x) = (1/n) Σ_{i=1}^n (1/V_n) φ((x − x_i)/h_n). More generally, the window interpolates the data.
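Putting the pieces together, a hedged sketch of the general Parzen estimator (the function name is my own; `phi` is any window function integrating to 1, such as the two above):

```python
import numpy as np

def parzen_estimate(x, samples, h, phi):
    """Parzen window estimate:

        p_n(x) = (1/n) * sum_i (1 / h**d) * phi((x - x_i) / h)

    `phi` maps rows of (x - x_i)/h to window values. Sketch only."""
    samples = np.atleast_2d(samples)        # shape (n, d)
    n, d = samples.shape
    u = (x - samples) / h
    return phi(u).sum() / (n * h ** d)
```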

Parzen Window Example Estimate a density with five modes using Gaussian windows at scales h = 1, 0.5, 0.2.
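The effect of the scale h can be reproduced with the sketches above on synthetic data; the five-mode mixture below is my own stand-in for the lecture's example, reusing `parzen_estimate` and `phi_gaussian` from the earlier sketches:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical five-mode density: equal mixture of five narrow Gaussians.
centres = np.array([-4.0, -2.0, 0.0, 2.0, 4.0])
data = (rng.choice(centres, size=500) + 0.3 * rng.normal(size=500)).reshape(-1, 1)

grid = np.linspace(-6.0, 6.0, 121)
for h in (1.0, 0.5, 0.2):
    est = [parzen_estimate(np.array([g]), data, h, phi_gaussian) for g in grid]
    # Large h over-smooths the five modes; small h makes the estimate spiky.
    print(f"h={h}: peak estimate {max(est):.3f}")
```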

Convergence Proof. We will show that the Parzen window estimator converges to the true density at each point x as the number of samples increases.

Proof Strategy. The Parzen estimate is a random variable: it depends on the samples used to compute it, so we take expectations with respect to those samples. We show that the expected value of the Parzen estimate is the true distribution, and that its variance tends to 0 as the number of samples gets large.

Convergence of the Mean E[p_n(x)] = ∫ (1/V_n) φ((x − v)/h_n) f(v) dv, a convolution of the true density with the window function. As V_n → 0 the window approaches a delta function, so E[p_n(x)] → f(x). Result follows.

Convergence of Variance Variance: Var[p_n(x)] ≤ sup(φ) · E[p_n(x)] / (n V_n), which tends to 0 provided n V_n → ∞. So shrinking the window (V_n → 0) must be balanced against keeping enough samples inside it.
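A quick Monte Carlo check of both limits (my own sketch, not part of the proof): with h_n = n^(−1/4) we have V_n → 0 while n·V_n → ∞, so the mean of the estimate approaches f(x) and its variance shrinks.

```python
import numpy as np

rng = np.random.default_rng(1)
x0 = 0.0                               # f(x0) = (2*pi)**-0.5 ~ 0.399 for N(0,1)

for n in (100, 1000, 10000):
    h = n ** -0.25                     # V_n = h -> 0, n * V_n = n**0.75 -> inf
    estimates = []
    for _ in range(200):               # 200 independent sample sets
        data = rng.normal(size=n)      # true density: standard Gaussian
        u = (x0 - data) / h
        phi = (2.0 * np.pi) ** -0.5 * np.exp(-0.5 * u ** 2)
        estimates.append(phi.sum() / (n * h))
    print(n, np.mean(estimates), np.var(estimates))  # mean -> f(x0), var -> 0
```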

Example of Parzen Window Underlying density is Gaussian. Window volume decreases as V_n = V_1 / √n.

Example of Parzen Window Underlying density is bi-modal.

Parzen Window and Interpolation. In practice we do not have an infinite number of samples, so the choice of window shape matters: the window effectively interpolates between the data points. If the window shape fits the local structure of the density, Parzen windows are effective.