Introduction and Motivation

Approaches for density estimation (DE):
- Known model → parametric approach: $p(x;\theta)$ (Gaussian, Laplace, …)
- Unknown model → nonparametric approach:
  - assumes only smoothness
  - lets the data speak for itself

Only continuous random variables are considered in this presentation.

Histogram Estimator

Simplest definition: partition the line into uniform intervals (bins) $[x_0 + mh,\ x_0 + (m+1)h)$, $m \in \mathbb{Z}$, given an origin $x_0$ and a width $h$.

Estimator:
$$\hat f(x) = \frac{1}{nh}\,\#\{x_i \text{ in the same bin as } x\}$$
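As a concrete sketch, the bin-counting estimator can be written in a few lines of Python with NumPy (the function name and the toy data below are illustrative, not part of the slides):

```python
import numpy as np

def histogram_estimate(x, samples, x0=0.0, h=1.0):
    """Histogram density estimate at a point x: count the samples falling
    in the bin [x0 + m*h, x0 + (m+1)*h) that contains x, divide by n*h."""
    samples = np.asarray(samples, dtype=float)
    n = len(samples)
    m = np.floor((x - x0) / h)               # index of the bin containing x
    lo, hi = x0 + m * h, x0 + (m + 1) * h
    count = np.sum((samples >= lo) & (samples < hi))
    return count / (n * h)

# Toy data: three of the five samples fall in the bin [0, 1).
data = [0.1, 0.4, 0.9, 1.2, 2.7]
print(histogram_estimate(0.5, data, x0=0.0, h=1.0))  # → 0.6
```

Shifting `x0` moves every bin boundary, which is exactly the origin-sensitivity drawback discussed next.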

Histogram Estimator: Drawbacks
- The choice of the origin greatly affects the estimate.
- Produces a discontinuous estimate.
- Not very accurate. One reason: a point's influence on the estimate is asymmetric (unnatural).
Let's solve this!
[Figure: an observation point and the samples inside its bin]

Naïve Estimator

Uses intervals centered at the observation points:
$$\hat f(x) = \frac{1}{2hn}\,\#\{x_i \in (x-h,\ x+h]\}$$

Is it a density? Yes, since it can be written as
$$\hat f(x) = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{h}\,w\!\left(\frac{x - x_i}{h}\right), \qquad w(u) = \begin{cases} 1/2 & \text{if } |u| < 1 \\ 0 & \text{otherwise,} \end{cases}$$
and $w$ is itself a density.

[Figure: the box weight $w(x)$]
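The box-weight form of the naïve estimator translates directly into Python (again a sketch with made-up data, assuming NumPy):

```python
import numpy as np

def naive_estimate(x, samples, h=1.0):
    """Naive density estimate: the fraction of samples within distance h
    of x, scaled by 2*h. Equivalent to a kernel estimate with the box
    weight w(u) = 1/2 for |u| < 1 and 0 otherwise."""
    samples = np.asarray(samples, dtype=float)
    n = len(samples)
    count = np.sum(np.abs((x - samples) / h) < 1.0)
    return count / (2.0 * n * h)

# Two of the five samples lie within 0.5 of x = 1.0.
data = [0.1, 0.4, 0.9, 1.2, 2.7]
print(naive_estimate(1.0, data, h=0.5))  # → 0.4
```

No origin appears anywhere: only the width `h`, which is the improvement the next slide points out.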

Naïve Estimator
- No need to choose an origin, only a width $h$.
- Better, but still not continuous. Reason: the influence of the points decays abruptly, since the box weight $w$ drops to zero at $|u| = 1$.
- Replacing $w$ with a smooth kernel $K$: this is more natural!
[Figure: box weight $w(x)$ vs. smooth kernel $K(x)$]

Kernel Estimator

Definition:
$$\hat f(x) = \frac{1}{nh}\sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right) \qquad\text{or}\qquad \hat f(x) = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{h}\,K\!\left(\frac{x - x_i}{h}\right)$$

Properties of the kernel $K$ (typically):
- it's a pdf (so the estimate is one too)
- smooth (smoothness is inherited by the estimate)
- centered at the origin and decaying fast
- even (admits an alternative interpretation)

[Figure: kernel $K(x)$; estimates with bandwidths $2h_{opt}$, $h_{opt}$, $h_{opt}/2$]
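A minimal Python sketch of the definition; the slides leave the kernel unspecified, so the Gaussian kernel used here is my own (common) choice:

```python
import numpy as np

def kernel_estimate(x, samples, h=1.0):
    """Kernel density estimate f_hat(x) = (1/(n*h)) * sum_i K((x - x_i)/h),
    here with the Gaussian kernel K(u) = exp(-u**2/2)/sqrt(2*pi):
    a smooth, even pdf centered at the origin that decays fast."""
    samples = np.asarray(samples, dtype=float)
    u = (x - samples) / h
    K = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return K.sum() / (len(samples) * h)

# With a single sample, the estimate at that sample is just K(0)/h:
print(kernel_estimate(0.0, [0.0], h=1.0))  # ≈ 0.3989
```

Because the Gaussian kernel is smooth everywhere, the resulting estimate is too, unlike the histogram and naïve estimators.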

Maximum Likelihood (ML)
[Diagram: Sample, Criterion, Model, Score]

Likelihood is the probability of the sample $S$ given a model $f$:
$$L = p(S \mid f) = \prod_{i=1}^{n} f(x_i) \quad\text{if the samples are independent,}$$
or, taking logs,
$$\log L = \sum_{i=1}^{n} \log f(x_i).$$
The quantity $-\frac{1}{n}\sum_{i=1}^{n}\log f(x_i)$ is the empirical entropy, so maximizing the likelihood is equivalent to minimizing entropy.
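The likelihood–entropy correspondence is a one-liner in code. A sketch (function names and the standard-normal model are illustrative, assuming NumPy):

```python
import numpy as np

def log_likelihood(f, samples):
    """Sum of log f(x_i) over independent samples."""
    return float(np.sum([np.log(f(x)) for x in samples]))

def empirical_entropy(f, samples):
    """-(1/n) * log-likelihood: maximizing one minimizes the other."""
    return -log_likelihood(f, samples) / len(samples)

# Score a standard-normal model on a tiny (made-up) sample:
gauss = lambda x: np.exp(-0.5 * x * x) / np.sqrt(2.0 * np.pi)
score = log_likelihood(gauss, [0.0, 1.0])
```

Whatever model makes `score` largest makes `empirical_entropy` smallest, by construction.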

Kernel Estimator: Choosing h

Maximum likelihood (ML): the criterion degenerates as $h \to 0$, since the estimate collapses into spikes at the samples and the likelihood grows without bound.
Reason: the same sample is used for training and validation.
Solution: use different samples (or split the current sample).
An intelligent way to do it: leave-one-out cross-validation, scoring each $x_i$ with the estimate $\hat f_{-i}$ built from the remaining $n-1$ points.
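The leave-one-out criterion can be sketched as follows, again with a Gaussian kernel and made-up data and bandwidth grid (nothing here is prescribed by the slides):

```python
import numpy as np

def loo_log_likelihood(samples, h):
    """Leave-one-out log-likelihood for a Gaussian-kernel estimate:
    each x_i is scored by the estimate built from the other n-1 points,
    which keeps the criterion from blowing up as h -> 0."""
    x = np.asarray(samples, dtype=float)
    n = len(x)
    total = 0.0
    for i in range(n):
        others = np.delete(x, i)              # drop x_i before scoring it
        u = (x[i] - others) / h
        K = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
        total += np.log(K.sum() / ((n - 1) * h))
    return total

# Pick the h that maximizes the LOO score over a small grid:
data = np.array([-1.2, -0.4, 0.1, 0.3, 0.9, 1.5])
grid = [0.1, 0.2, 0.4, 0.8, 1.6]
best_h = max(grid, key=lambda h: loo_log_likelihood(data, h))
print(best_h)
```

As $h \to 0$ the held-out point $x_i$ falls into the tails of every kernel built on the others, so the LOO score collapses instead of diverging, which is exactly why the split fixes the ML degeneracy.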

Kernel Estimator: one example
[Figure: kernel density estimate on an example data set]