Likelihood-tuned Density Estimator
Yeojin Chung and Bruce G. Lindsay, The Pennsylvania State University


Summary

We consider an improved density estimator which arises from treating the kernel density estimator as an element of the model consisting of all mixtures of the kernel, continuous or discrete. One can then "likelihood tune" the kernel density estimator by using it appropriately as the starting value in an EM algorithm. If we do so, one EM step leads to a fitted density with higher likelihood than the kernel density estimator. The one-step EM estimator can be written explicitly, and its bias is one order of magnitude smaller than that of the kernel estimator, while the variance stays of the same order, so the asymptotic mean squared error can be reduced significantly. Compared with other important adaptive density estimators, we find that their biases are of the same order but our estimator is still superior, particularly where the density is small.

The Fixed Kernel Density Estimator

The simplest form of kernel density estimation is the fixed kernel density estimator

\hat f_h(x) = \frac{1}{n} \sum_{i=1}^{n} K_h(x - x_i), \qquad K_h(u) = \frac{1}{h} K\left(\frac{u}{h}\right),

where the bandwidth h is fixed for all data points and all points of estimation. With the normal kernel it has

Asymptotic bias: \frac{h^2}{2} f''(x)
Asymptotic variance: \frac{f(x)}{2\sqrt{\pi}\, nh}  [1]

Nonparametric Maximum Likelihood Estimator

Consider the nonparametric maximum likelihood estimator (NPMLE) of a latent distribution Q. Let the likelihood function for an individual observation be

L_i(Q) = \int K_h(x_i - t)\, dQ(t)

and the objective function be

\ell(Q) = \sum_{i=1}^{n} \log L_i(Q).

Theorem ([5]). Suppose the L_i's are nonnegative and bounded. Then there exists a maximum likelihood estimator that is a discrete distribution with no more than D distinct points of support, where D is the number of distinct x_i.

Likelihood Tuning Procedure

Idea:
1. Start from the uniform density function.
2. Do several EM steps to update the latent density \pi; each step gives information about where we need more mass.

EM Steps for Tuning the Density Estimator

STEP 1. Update the latent density using the derivative of the log-likelihood at \pi^{old}:

\pi^{new}(t) = \pi^{old}(t) \cdot \frac{1}{n} \sum_{i=1}^{n} \frac{K_h(x_i - t)}{L_i(\pi^{old})}, \qquad L_i(\pi^{old}) = \int K_h(x_i - s)\, \pi^{old}(s)\, ds.

STEP 2. Update the density estimator:

f^{new}(x) = \int K_h(x - t)\, \pi^{new}(t)\, dt.

Note that the fixed kernel density estimator can itself be expressed using the empirical distribution \hat F_n as \hat f_h(x) = \int K_h(x - t)\, d\hat F_n(t); that is, \hat F_n plays the role of the latent distribution of \hat f_h. With the normal kernel, both the first and the second EM step can be written explicitly; we call the one-step estimator \tilde f_h the Likelihood-Tuned Density Estimator.

Asymptotic Properties

With the normal kernel, the asymptotic bias of \tilde f_h is of order h^4, one order of magnitude smaller than the O(h^2) bias of the fixed kernel estimator, while its asymptotic variance remains of order (nh)^{-1}, the same order as that of the fixed kernel estimator.

The Adaptive Density Estimator

There have been many studies on improving kernel density estimators with adaptive bandwidths. Abramson [2] found that the adaptive density estimator with bandwidth h f(x_i)^{-1/2} reduces the bias significantly. With the normal kernel its asymptotic bias is of order h^4 [3], and its asymptotic variance remains of order (nh)^{-1} [4].

Simulation Results

With 100 observations from N(0,1), the MISE of each density estimator is computed from 500 replicates. Compared with the fixed kernel estimator and the adaptive density estimator with Abramson's bandwidth, the likelihood-tuned estimator has smaller MISE, attained at a larger optimal bandwidth. The likelihood-tuned estimator also has smaller MSE than the fixed kernel estimator uniformly in x. Although the likelihood-tuned estimator has a slightly larger MSE than the adaptive one near the points where the adaptive estimator's asymptotic bias is 0, it is still better in sparse regions, which is promising especially in higher-dimensional problems.

[Figure: pointwise MSE of the three estimators, with markers at the points where the asymptotic bias of the kernel and likelihood-tuned estimators is 0 and where the asymptotic bias of the adaptive estimator is 0.]
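As a concrete illustration of the fixed kernel estimator above, here is a minimal NumPy sketch (not from the poster; the grid, seed, and bandwidth h = 0.4 are arbitrary illustrative choices):

```python
import numpy as np

def kde_fixed(x_grid, data, h):
    """Fixed-bandwidth kernel density estimate with the normal kernel:
    f_hat(x) = (1/n) * sum_i K_h(x - x_i), where K_h(u) = phi(u/h)/h."""
    u = (x_grid[:, None] - data[None, :]) / h       # (m, n) standardized differences
    K = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)  # standard normal kernel phi(u)
    return K.mean(axis=1) / h                       # average over the n observations

rng = np.random.default_rng(0)
data = rng.normal(size=100)              # n = 100 observations from N(0, 1)
grid = np.linspace(-4.0, 4.0, 201)       # evaluation grid (illustrative choice)
f_hat = kde_fixed(grid, data, h=0.4)     # h = 0.4 is an arbitrary bandwidth
```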
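The tuning step can likewise be sketched numerically. Continuing the sketch above, the following implements one generic EM step for the latent density on a grid, started at the fixed kernel estimate. It follows the STEP 1 / STEP 2 recursion by quadrature rather than reproducing the poster's closed-form normal-kernel expressions, so treat it as a sketch of the idea, not the authors' estimator:

```python
def em_tune_step(x_grid, data, h, dx):
    """One EM step for the continuous kernel-mixture model, started at the
    fixed kernel estimate (a grid/quadrature sketch of STEP 1 and STEP 2)."""
    def K(u):  # normal kernel K_h evaluated at u
        return np.exp(-0.5 * (u / h)**2) / (h * np.sqrt(2.0 * np.pi))

    pi_old = kde_fixed(x_grid, data, h)          # latent density started at f_hat
    K_obs = K(data[:, None] - x_grid[None, :])   # (n, m): K_h(x_i - t_j)
    L = K_obs @ pi_old * dx                      # per-observation likelihoods L_i(pi_old)
    # STEP 1: reweight the latent density by the averaged likelihood ratios
    pi_new = pi_old * (K_obs / L[:, None]).mean(axis=0)
    # STEP 2: map the tuned latent density back to a density estimate
    K_grid = K(x_grid[:, None] - x_grid[None, :])  # (m, m): K_h(x - t)
    return K_grid @ pi_new * dx

dx = grid[1] - grid[0]
f_tuned = em_tune_step(grid, data, h=0.4, dx=dx)  # one-step "tuned" estimate
```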
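The MISE comparison in the simulation can be mimicked with a small Monte Carlo loop, again continuing the sketch above. SciPy is assumed only for the true N(0,1) density, and the fixed grid makes this a rough Riemann-sum approximation of the integrated squared error:

```python
from scipy.stats import norm

def mise(estimator, n=100, reps=500, h=0.4):
    """Monte Carlo MISE against the true N(0,1) density: average over
    `reps` replicates of the Riemann sum of (f_hat - f)^2 over the grid."""
    truth = norm.pdf(grid)
    total = 0.0
    for _ in range(reps):
        sample = rng.normal(size=n)
        total += np.sum((estimator(grid, sample, h) - truth)**2) * dx
    return total / reps

print("fixed kernel MISE:     ", mise(kde_fixed))
print("likelihood-tuned MISE: ", mise(lambda g, s, h: em_tune_step(g, s, h, dx)))
```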
References

[1] M.P. Wand and M.C. Jones, Kernel Smoothing, Chapman & Hall/CRC, 1995.
[2] I.S. Abramson, On bandwidth variation in kernel estimates - a square root law, The Annals of Statistics 10 (1982), no. 4, 1217–1223.
[3] B.W. Silverman, Density Estimation for Statistics and Data Analysis, Chapman & Hall/CRC, 1986.
[4] G.R. Terrell and D.W. Scott, Variable kernel density estimation, The Annals of Statistics 20 (1992), no. 3, 1236–1265.
[5] B.G. Lindsay, Mixture Models: Theory, Geometry and Applications, Institute of Mathematical Statistics, 1995.