
LECTURE 02: OPTIMAL ESTIMATION PROCEDURES
ECE 8443 – Pattern Recognition / ECE 8423 – Adaptive Signal Processing
Objectives: Deterministic vs. Random; Maximum A Posteriori; Maximum Likelihood; Minimum Mean-Square Error
Resources: ECE 8443: Chapters 2 and 3; Wiki: Bayes Estimator; SP: ML Estimation Tutorial
URL: .../publications/courses/ece_8423/lectures/current/lecture_02.ppt
MP3: .../publications/courses/ece_8423/lectures/current/lecture_01.mp3

ECE 8423: Lecture 02, Slide 1: Introduction

Estimation theory plays a major role in the design of statistical signal processing systems. Straightforward procedures such as minimizing mean-square error have given way to more powerful techniques that exploit all known statistics about the problem (e.g., Bayes). The goal of this lecture is to provide a high-level overview (Pattern Recognition 101) of parameter estimation from a series of measurements for which the exact generating distributions are unknown.

Estimation problems are often delineated into deterministic (or classical) parameter estimation and random (or Bayesian) parameter estimation. In classical estimation, the observations are random and the parameters are regarded as unknown constant (deterministic) values. In Bayesian parameter estimation, the parameters are also viewed as random, and our prior knowledge of their behavior is expressed through an a priori density. Classical estimation operates without prior information; Bayesian estimation exploits this information and, when the prior is not known exactly, may estimate it as part of the process.

Three estimation procedures have been widely used:
- Maximum A Posteriori (MAP) -- Bayesian
- Maximum Likelihood (ML) -- Classical
- Minimum Mean-Squared Error (MMSE) -- actually Bayesian

ECE 8423: Lecture 02, Slide 2: Maximum A Posteriori (MAP) Estimation

Suppose we are given a set of M samples from a discrete random variable, x(n), for n = 0, 1, ..., M-1, and we need to estimate a parameter, θ, from this data. We assume the parameter is characterized by a probability density function, f(θ), that is called the a priori density.

The probability density function f(θ|x) is the probability density of θ conditioned on the observed data, x, and is referred to as the a posteriori density for θ. The Maximum A Posteriori (MAP) estimate of θ is the location of the peak of this density. The maximum can generally be found by solving:

  ∂f(θ|x)/∂θ |θ=θ_MAP = 0.

We refer to the value of θ that maximizes the posterior as θ_MAP. Because the logarithm is monotonic, we often maximize the log of the posterior instead:

  ∂ ln f(θ|x)/∂θ |θ=θ_MAP = 0.

Direct determination of f(θ|x) is difficult. However, we can use Bayes' rule to write:

  f(θ|x) = f(x|θ) f(θ) / f(x),

where f(θ) is the prior density of θ. Applying a logarithm and noting that f(x) does not depend on θ, we can write the MAP solution as:

  ∂/∂θ [ln f(x|θ) + ln f(θ)] |θ=θ_MAP = 0.

ECE 8423: Lecture 02, Slide 3: Maximum Likelihood (ML) Estimation

The ML estimate is obtained by maximizing the density of the observed data, x, given θ. Let us define a likelihood function:

  L(θ) = f(x|θ).

Note that we consider x a fixed set of observations. Also, it is usually more convenient to work with the log-likelihood, ln L(θ) = ln f(x|θ). The ML estimate is found by maximizing the (log-)likelihood:

  ∂ ln f(x|θ)/∂θ |θ=θ_ML = 0.

The MAP and ML estimates are similar, though the MAP estimate takes into account prior information. ML estimates enjoy a number of important properties:
- Sufficiency: If a sufficient statistic for θ exists, then the ML estimate will be a sufficient statistic.
- Efficiency: An estimator that attains the lower bound on the variance achievable by an unbiased estimator is referred to as fully efficient. If such a statistic exists, the ML estimate will be that statistic.
- Gaussianity: ML estimates are asymptotically Gaussian.

ML estimates are often a good starting point because no prior information is required. Once found, the ML estimate can be used as an initial value for a MAP or Bayesian solution.

ECE 8423: Lecture 02, Slide 4: Example – ML Estimate of a Noisy Measurement

Consider a single measurement, x, consisting of the sum of a parameter, θ, and a random variable, w, which is zero-mean Gaussian with variance σ_w²:

  x = θ + w.

To construct the ML estimate, we form the likelihood function:

  f(x|θ) = (1/√(2πσ_w²)) exp(−(x−θ)²/(2σ_w²)).

The log-likelihood function is:

  ln f(x|θ) = −(1/2) ln(2πσ_w²) − (x−θ)²/(2σ_w²).

Differentiating with respect to θ and equating to zero:

  ∂ ln f(x|θ)/∂θ = (x−θ)/σ_w² = 0  ⇒  θ_ML = x.

Thus, the best estimate in the ML sense is the value of the raw data. This is intuitively satisfying because, in the absence of any other information, the best thing we can do is trust the data. The variance of the estimate can be shown to be:

  var(θ_ML) = σ_w².

Again, this makes sense. Why?
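
To make the example concrete, here is a minimal numerical sketch (Python/NumPy; the values of θ and σ_w are illustrative assumptions, not from the slide). For a single measurement the ML estimate is the measurement itself; with M independent measurements it becomes the sample mean, whose variance is σ_w²/M.

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = 2.0      # the unknown constant (chosen only to generate data)
sigma_w = 0.5         # noise standard deviation, assumed known

# Single measurement x = theta + w: the ML estimate is the measurement itself.
x = theta_true + sigma_w * rng.standard_normal()
theta_ml_single = x

# With M independent measurements the ML estimate becomes the sample mean,
# whose variance is sigma_w^2 / M.
M = 100
x_vec = theta_true + sigma_w * rng.standard_normal(M)
theta_ml = x_vec.mean()

print(theta_ml_single, theta_ml)
```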

ECE 8423: Lecture 02, Slide 5: Example – MAP Estimate

The MAP estimate is obtained from:

  ∂/∂θ [ln f(x|θ) + ln f(θ)] |θ=θ_MAP = 0.

The log of the conditional density is the same as in the ML example:

  ln f(x|θ) = −(1/2) ln(2πσ_w²) − (x−θ)²/(2σ_w²).

The a priori density, f(θ), is taken to be Gaussian with mean μ_θ and variance σ_θ²:

  f(θ) = (1/√(2πσ_θ²)) exp(−(θ−μ_θ)²/(2σ_θ²)).

Taking the log:

  ln f(θ) = −(1/2) ln(2πσ_θ²) − (θ−μ_θ)²/(2σ_θ²).

Substituting into the first equation, differentiating, and solving for θ:

  θ_MAP = (σ_θ²/(σ_θ² + σ_w²)) x + (σ_w²/(σ_θ² + σ_w²)) μ_θ.

ECE 8423: Lecture 02, Slide 6: Example – MAP Estimate (Cont.)

The significance of this result is that it gives us insight into the MAP estimate. The estimate is a weighted sum of the ML estimate and the prior mean. The ratio of the variances can be viewed as a measure of confidence in the a priori model of θ. The smaller the prior variance σ_θ², the less weight we give to the ML estimate, x. In the limit as σ_θ² → 0, the MAP estimate is simply μ_θ, the a priori mean. As the a priori variance increases, the MAP estimate converges to x.

Exactly how we estimate the parameters of the a priori distribution, or how we deal with this information when it is unknown, will be discussed later in the course, and is discussed extensively in the Pattern Recognition course.

This derivation can be extended to the important case of one parameter and M measurements, x(n) = θ + w(n):

  θ_MAP = (M σ_θ² x̄ + σ_w² μ_θ) / (M σ_θ² + σ_w²),  where x̄ = (1/M) Σ_{n=0}^{M−1} x(n).
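
A minimal sketch of this weighted-sum MAP estimate, assuming the Gaussian prior discussed above; the function name and the numeric values are illustrative only.

```python
import numpy as np

def map_estimate(x, mu_theta, sigma_theta2, sigma_w2):
    """MAP estimate of theta from a single measurement x = theta + w,
    with prior theta ~ N(mu_theta, sigma_theta2) and noise variance sigma_w2."""
    w = sigma_theta2 / (sigma_theta2 + sigma_w2)   # confidence placed on the data
    return w * x + (1.0 - w) * mu_theta            # weighted sum of x and the prior mean

# As sigma_theta2 -> 0 the estimate collapses to the prior mean mu_theta;
# as sigma_theta2 grows it converges to the ML estimate x.
print(map_estimate(x=1.8, mu_theta=2.0, sigma_theta2=1e-6, sigma_w2=0.25))  # ~2.0
print(map_estimate(x=1.8, mu_theta=2.0, sigma_theta2=1e+6, sigma_w2=0.25))  # ~1.8
```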

ECE 8423: Lecture 02, Slide 7: Cramer-Rao Lower Bound (CRLB)

If θ is an (L x 1) unknown (deterministic) constant parameter vector, and if θ̂ is any unbiased estimator of θ, then:

  cov(θ̂) ≥ J⁻¹.

The inequality means that the difference of the two matrices, cov(θ̂) − J⁻¹, is positive semidefinite (an alternate way of stating that the lower bound is J⁻¹). The matrix, J, is known as the Fisher information matrix:

  J_ij = −E[∂² ln f(x|θ) / (∂θ_i ∂θ_j)].

The CRLB provides a lower limit on the variance attainable with an unbiased estimator of an unknown parameter vector. An estimator that attains this bound is known as the Minimum Variance Unbiased Estimator. For the scalar parameter case, the CRLB can be written as:

  var(θ̂) ≥ 1 / ( −E[∂² ln f(x|θ)/∂θ²] ).

For our single-sample example, the CRLB is σ_w², which coincides with the variance of the ML estimate. The ML estimate achieves the CRLB even for finite M.
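
A quick Monte Carlo sanity check of the scalar CRLB for the Gaussian-mean example, where the bound is σ_w²/M for M independent measurements. The sketch assumes Python/NumPy and illustrative parameter values.

```python
import numpy as np

rng = np.random.default_rng(1)
theta_true, sigma_w, M, trials = 1.0, 0.5, 25, 20000

# Each trial: generate M noisy measurements and compute the ML estimate (sample mean).
estimates = np.array([
    (theta_true + sigma_w * rng.standard_normal(M)).mean()
    for _ in range(trials)
])

empirical_var = estimates.var()
crlb = sigma_w**2 / M          # scalar CRLB for the mean of Gaussian data
print(f"empirical variance {empirical_var:.5f} vs. CRLB {crlb:.5f}")
```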

ECE 8423: Lecture 02, Slide 8: Minimum Mean-Squared Error (MMSE) Estimation

MMSE estimators typically define a quadratic cost function, where the estimate is some function of the data, θ̂ = h(x):

  C(θ, θ̂) = (θ − h(x))².

The cost function can be evaluated via its expected value:

  E[C] = ∫∫ (θ − h(x))² f(x, θ) dθ dx.

The joint density can be expanded as:

  f(x, θ) = f(θ|x) f(x).

We can then rewrite the expected cost as:

  E[C] = ∫ f(x) [ ∫ (θ − h(x))² f(θ|x) dθ ] dx.

Both the inner and outer integrals are nonnegative everywhere, and the weighting f(x) in the outer integral is unaffected by the choice of h(x), so we can minimize E[C] by minimizing the inner integral for each x. This is done by differentiating with respect to h(x) and equating to zero:

  ∂/∂h(x) ∫ (θ − h(x))² f(θ|x) dθ = −2 ∫ (θ − h(x)) f(θ|x) dθ = 0.

ECE 8423: Lecture 02, Slide 9 But the marginal integral is equal to 1: Hence: The MMSE estimator is obtained when h(x) is chosen as the expectation of  conditioned on the data. Note that unlike MAP and ML, this estimator requires the conditional mean value of the posterior density but does not rely on explicit knowledge of the density itself. The MMSE solution is often a nonlinear function of the data, though in certain cases, such as when the posterior is Gaussian, the equations are linear. Later we will invoke this to develop a technique called linear prediction used in adaptive filtering. Note that both MAP and MMSE produce estimates that are a function of the posterior density for . However, for MAP, the estimate is the peak of the posterior, while for MMSE, it is the mean value. The MMSE estimator can be extended to estimating a vector of parameters. It can also minimize something other than a quadratic form (L 2 norm). MMSE Estimation (Cont.)

ECE 8423: Lecture 02, Slide 10: Linear MMSE Estimation

Suppose the estimator is constrained to be a linear function of the data. (Why?) For a single-parameter, M-observation problem:

  θ̂ = Σ_{i=0}^{M−1} h_i x(i),

where the {h_i} are weights chosen to minimize the mean-squared error:

  J = E[(θ − θ̂)²] = E[(θ − Σ_{i=0}^{M−1} h_i x(i))²].

The minimization is performed by differentiating with respect to each h_i and equating to zero:

  ∂J/∂h_i = −2 E[(θ − Σ_j h_j x(j)) x(i)] = 0,  i = 0, 1, ..., M−1.

We can define the estimation error as:

  e = θ − θ̂ = θ − Σ_{i=0}^{M−1} h_i x(i),

and write the minimization as:

  E[e x(i)] = 0,  i = 0, 1, ..., M−1.

This equation states that the error is orthogonal to the data. This is known as the orthogonality principle and is a fundamental property of linear mean-squared error estimation (to be discussed later in greater detail).
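
A sketch of the linear MMSE solution obtained from sample statistics: solve the normal equations R h = p, with R = E[x xᵀ] and p = E[θ x], and verify the orthogonality principle. The data model (Gaussian parameter plus independent noise) is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 50000, 4
theta = rng.standard_normal(N)                          # parameter realizations
X = theta[:, None] + 0.5 * rng.standard_normal((N, M))  # M noisy observations of each

R = X.T @ X / N                   # sample correlation matrix E[x x^T]
p = X.T @ theta / N               # sample cross-correlation E[theta x]
h = np.linalg.solve(R, p)         # linear MMSE weights from the normal equations

e = theta - X @ h                 # estimation error for each realization
print(np.abs(X.T @ e / N).max())  # orthogonality: E[e x(i)] should be ~0 for all i
```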

ECE 8423: Lecture 02, Slide 11: Estimation of Signals

Given some signal, x(n), how do we estimate another signal, d(n), that might represent, for example, a noise-reduced version of x(n)? The estimate of the desired signal can be written as:

  d̂(n) = T{x(n)},

where T{·} represents a functional mapping of the data. We can invoke our four estimators:
- MAP: maximize the posterior density f(d(n)|x(n)) (equivalently, minimize its negative log).
- ML: maximize the likelihood f(x(n)|d(n)).
- MMSE: d̂(n) = E[d(n)|x(n)].
- Linear MMSE: d̂(n) = Σ_i h_i x(n−i), a weighted combination of the observations.

The last one is the least general but the easiest to implement, since it is essentially a linear filter. The others require explicit knowledge of the signal/parameter densities or conditional expectations. In the Pattern Recognition course, we learn about ways to estimate the distributions from the data as part of the overall estimation problem.
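
As a sketch of the linear-filter interpretation, the following estimates a desired signal d(n) from a noisy observation x(n) with an L-tap linear MMSE (FIR) filter whose weights come from the normal equations; the sinusoidal signal model and the filter length are illustrative assumptions, not part of the lecture.

```python
import numpy as np

rng = np.random.default_rng(3)
N, L = 20000, 8
d = np.sin(2 * np.pi * 0.01 * np.arange(N))   # desired signal (assumed for illustration)
x = d + 0.7 * rng.standard_normal(N)          # noisy observation x(n) = d(n) + noise

# Data matrix whose rows hold the L most recent samples x(n), x(n-1), ..., x(n-L+1).
X = np.column_stack([np.roll(x, k) for k in range(L)])[L:]
d_target = d[L:]

# Solve the normal equations for the FIR weights, then filter.
h = np.linalg.solve(X.T @ X / len(d_target), X.T @ d_target / len(d_target))
d_hat = X @ h

print(np.mean((d_target - x[L:]) ** 2), np.mean((d_target - d_hat) ** 2))  # MSE before vs. after
```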

ECE 8423: Lecture 02, Slide 12: Summary

Adaptive signal processing requires estimating parameters from data. There are two classes of approaches to this problem: classical and Bayesian. We explored three approaches to parameter estimation: ML, MAP, and MMSE. We briefly discussed the Cramer-Rao bound as a way to evaluate the quality of an estimator.

We discussed two minimum mean-squared error (MMSE) approaches: generalized and linear. We compared and contrasted these to MAP and ML. We also described a framework in which these estimators could be used to estimate a signal. We will expand on this later when we talk about adaptive filters and noise cancellation.