Lecture 8 The Principle of Maximum Likelihood

Syllabus
Lecture 01 Describing Inverse Problems
Lecture 02 Probability and Measurement Error, Part 1
Lecture 03 Probability and Measurement Error, Part 2
Lecture 04 The L2 Norm and Simple Least Squares
Lecture 05 A Priori Information and Weighted Least Squares
Lecture 06 Resolution and Generalized Inverses
Lecture 07 Backus-Gilbert Inverse and the Trade Off of Resolution and Variance
Lecture 08 The Principle of Maximum Likelihood
Lecture 09 Inexact Theories
Lecture 10 Nonuniqueness and Localized Averages
Lecture 11 Vector Spaces and Singular Value Decomposition
Lecture 12 Equality and Inequality Constraints
Lecture 13 L1, L∞ Norm Problems and Linear Programming
Lecture 14 Nonlinear Problems: Grid and Monte Carlo Searches
Lecture 15 Nonlinear Problems: Newton's Method
Lecture 16 Nonlinear Problems: Simulated Annealing and Bootstrap Confidence Intervals
Lecture 17 Factor Analysis
Lecture 18 Varimax Factors, Empirical Orthogonal Functions
Lecture 19 Backus-Gilbert Theory for Continuous Problems; Radon's Problem
Lecture 20 Linear Operators and Their Adjoints
Lecture 21 Fréchet Derivatives
Lecture 22 Exemplary Inverse Problems, incl. Filter Design
Lecture 23 Exemplary Inverse Problems, incl. Earthquake Location
Lecture 24 Exemplary Inverse Problems, incl. Vibrational Problems

Purpose of the Lecture: introduce the spaces of all possible data and all possible models, and the idea of likelihood; use maximization of likelihood as a guiding principle for solving inverse problems

Part 1 The spaces of all possible data, all possible models and the idea of likelihood

viewpoint: the observed data are one point in the space of all possible observations; in other words, d^obs is a point in S(d)

[Figure: the point d^obs plotted in the space of all possible observations, with axes d1, d2, d3 and origin O]

now suppose … the data are independent and each is drawn from a Gaussian distribution with the same mean m1 and variance σ² (but m1 and σ are unknown)

[Figure: plot of p(d) in the (d1, d2, d3) space with origin O; a cloud centered on the line d1 = d2 = d3, with radius proportional to σ]

now interpret … p(d^obs) as the probability that the observed data were in fact observed; L = log p(d^obs) is called the likelihood

find the parameters in the distribution: maximize p(d^obs) with respect to m1 and σ, i.e. maximize the probability that the observed data were in fact observed. This is the Principle of Maximum Likelihood.

Example
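For N independent Gaussian data with common mean m1 and variance σ², the p.d.f. and the log-likelihood take the standard form

p(\mathbf{d}^{obs}) = (2\pi\sigma^2)^{-N/2} \exp\left[ -\frac{1}{2\sigma^2} \sum_{i=1}^{N} \left(d_i^{obs} - m_1\right)^2 \right]

L(m_1, \sigma) = \log p(\mathbf{d}^{obs}) = -\tfrac{N}{2}\log(2\pi) - N \log\sigma - \frac{1}{2\sigma^2} \sum_{i=1}^{N} \left(d_i^{obs} - m_1\right)^2

Setting ∂L/∂m1 = 0 and ∂L/∂σ = 0 gives two equations for the estimates.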

solving the two equations
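Their solution is the standard maximum likelihood result for the Gaussian case:

\hat{m}_1 = \frac{1}{N} \sum_{i=1}^{N} d_i^{obs} \qquad \hat{\sigma}^2 = \frac{1}{N} \sum_{i=1}^{N} \left(d_i^{obs} - \hat{m}_1\right)^2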

usual formula for the sample mean; almost the usual formula for the sample standard deviation (the factor is 1/N rather than the 1/(N−1) of the unbiased estimate)

these two estimates are linked to the assumption that the data are Gaussian-distributed; a different p.d.f. might give a different formula

[Figure: example of a likelihood surface L(m1, σ) plotted as a function of m1 and σ, with the maximum likelihood point marked]
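A minimal numerical sketch of such a surface, assuming synthetic data and made-up variable names (not part of the lecture):

import numpy as np

# synthetic data: N observations drawn from a Gaussian (true m1 = 5, sigma = 2)
rng = np.random.default_rng(0)
d_obs = rng.normal(loc=5.0, scale=2.0, size=100)
N = d_obs.size

# grid of trial values for the mean m1 and standard deviation sigma
m1_grid = np.linspace(3.0, 7.0, 201)
sigma_grid = np.linspace(0.5, 4.0, 201)
M1, SIGMA = np.meshgrid(m1_grid, sigma_grid, indexing="ij")

# log-likelihood L(m1, sigma) for independent Gaussian data
misfit = ((d_obs[None, None, :] - M1[..., None]) ** 2).sum(axis=-1)
L = -0.5 * N * np.log(2 * np.pi) - N * np.log(SIGMA) - misfit / (2 * SIGMA**2)

# maximum likelihood point on the grid, compared with the analytic formulas
i, j = np.unravel_index(np.argmax(L), L.shape)
print("grid maximum:  m1 =", m1_grid[i], " sigma =", sigma_grid[j])
print("analytic:      m1 =", d_obs.mean(), " sigma =", d_obs.std())  # std() uses the 1/N factor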

[Figure: two p.d.f.'s p(d1, d2), panels (A) and (B)] The likelihood maximization process will fail if the p.d.f. has no well-defined peak.

Part 2 Using the maximization of likelihood as a guiding principle for solving inverse problems

the linear inverse problem Gm = d, with Gaussian-distributed data of known covariance [cov d]; assume Gm = d gives the mean of the data
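The corresponding p.d.f. of the data is then the multivariate Normal distribution

p(\mathbf{d}) = (2\pi)^{-N/2} \, |\mathrm{cov}\,\mathbf{d}|^{-1/2} \exp\left\{ -\tfrac{1}{2} (\mathbf{d} - \mathbf{G}\mathbf{m})^{T} [\mathrm{cov}\,\mathbf{d}]^{-1} (\mathbf{d} - \mathbf{G}\mathbf{m}) \right\}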

principle of maximum likelihood: maximize L = log p(d^obs), that is, minimize (d^obs − Gm)^T [cov d]^−1 (d^obs − Gm) with respect to m

principle of maximum likelihood: maximize L = log p(d^obs), that is, minimize E = (d^obs − Gm)^T [cov d]^−1 (d^obs − Gm). This is just weighted least squares.

principle of maximum likelihood: when the data are Gaussian-distributed, solve Gm = d with weighted least squares, with weighting [cov d]^−1

special case of uncorrelated data, each datum with a different variance, [cov d]_ii = σ_di²: minimize E = Σ_i (d_i^obs − (Gm)_i)² / σ_di²

prediction errors are weighted by their certainty, that is, by the reciprocal of their variance
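A minimal sketch of this special case, fitting a straight line to data with differing per-datum variances (the model, data, and names below are assumptions for illustration only):

import numpy as np

# synthetic example: straight-line model d = m1 + m2*z, with per-datum variances
rng = np.random.default_rng(1)
z = np.linspace(0.0, 10.0, 30)
sigma_d = 0.2 + 0.3 * rng.random(30)          # known standard deviation of each datum
d_obs = 1.0 + 0.5 * z + sigma_d * rng.normal(size=30)

G = np.column_stack([np.ones_like(z), z])     # data kernel G of the linear model Gm = d

# weighted least squares: minimize sum_i (d_i^obs - (Gm)_i)^2 / sigma_di^2
W = np.diag(1.0 / sigma_d**2)                 # weighting matrix [cov d]^-1 for uncorrelated data
m_est = np.linalg.solve(G.T @ W @ G, G.T @ W @ d_obs)   # (G^T W G)^-1 G^T W d_obs
print("estimated model parameters:", m_est)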

but what about a priori information?

probabilistic representation of a priori information: the probability that the model parameters are near m, given by the p.d.f. p_A(m)

probabilistic representation of a priori information: the probability that the model parameters are near m, given by the p.d.f. p_A(m), centered at the a priori value

probabilistic representation of a priori information: the probability that the model parameters are near m, given by the p.d.f. p_A(m), whose variance reflects the uncertainty in the a priori information

[Figure: two p.d.f.'s in the (m1, m2) plane, one labeled certain and one labeled uncertain]

[Figure: a p.d.f. in the (m1, m2) plane]

[Figure: a priori information in the form of a linear relationship between m1 and m2, and its approximation with a Gaussian p.d.f.]

[Figure: the (m1, m2) plane with p = constant along a line and p = 0 elsewhere]

assessing the information content in p_A(m): do we know a little about m or a lot about m?

Information Gain, S, also called the Relative Entropy

Relative Entropy, S, also called Information Gain; it is measured relative to a null p.d.f. that represents the state of no knowledge

Relative Entropy, S, also called Information Gain; a uniform p.d.f. might work for the null p.d.f.
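In its standard form, writing the null p.d.f. as p_N(m) (the symbol p_N is introduced here only for concreteness), the relative entropy is

S = \int p_A(\mathbf{m}) \, \log\!\left[ \frac{p_A(\mathbf{m})}{p_N(\mathbf{m})} \right] d\mathbf{m}

A large S means that p_A(m) is sharply concentrated relative to the null p.d.f., i.e. that the a priori information says a lot about m.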

probabilistic representation of the data: the probability that the data are near d, given by the p.d.f. p(d)

probabilistic representation of the data: the probability that the data are near d, given by the p.d.f. p(d), centered at the observed data d^obs

probabilistic representation of the data: the probability that the data are near d, given by the p.d.f. p(d), whose variance reflects the uncertainty in the measurements

probabilistic representation of both the prior information and the observed data; assume that the observations and the a priori information are uncorrelated
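Under that assumption the combined p.d.f. is simply the product of the two (writing it here as p(d, m)):

p(\mathbf{d}, \mathbf{m}) = p(\mathbf{d}) \, p_A(\mathbf{m})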

[Figure: example of the combined p.d.f. in the (model m, datum d) plane, centered at the a priori value m^ap and the observed datum d^obs]

the theory d = g(m) is a surface in the combined space of data and model parameters on which the estimated model parameters and predicted data must lie

the theory d = g(m) is a surface in the combined space of data and model parameters on which the estimated model parameters and predicted data must lie for a linear theory the surface is planar

the principle of maximum likelihood says: maximize the combined p.d.f. on the surface d = g(m)

[Figure: (A) the combined p.d.f. in the (model m, datum d) plane with the curve d = g(m), marking m^ap, d^obs, and the solution m^est, d^pre; (B) the p.d.f. p(s) as a function of position s along the curve, with its maximum at s_max]

[Figure: the same construction for a case in which m^est ≈ m^ap; p(s) along the curve has its maximum at s_max]

[Figure: (A) the same construction for a case in which d^pre ≈ d^obs; (B) p(s) along the curve has its maximum at s_max]

principle of maximum likelihood, with Gaussian-distributed data and Gaussian-distributed a priori information: minimize E = (d^obs − Gm)^T [cov d]^−1 (d^obs − Gm) + (m − m^ap)^T [cov m]^−1 (m − m^ap)

this is just weighted least squares with Fm = f, so we already know the solution

solve Fm=f with simple least squares
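One standard way to build the combined system, consistent with the error E above (written here as a sketch rather than a definitive form), is to stack the weighted data equations on top of the weighted prior equations:

\mathbf{F} = \begin{bmatrix} [\mathrm{cov}\,\mathbf{d}]^{-1/2} \, \mathbf{G} \\ [\mathrm{cov}\,\mathbf{m}]^{-1/2} \, \mathbf{I} \end{bmatrix} \qquad \mathbf{f} = \begin{bmatrix} [\mathrm{cov}\,\mathbf{d}]^{-1/2} \, \mathbf{d}^{obs} \\ [\mathrm{cov}\,\mathbf{m}]^{-1/2} \, \mathbf{m}^{ap} \end{bmatrix}

so that minimizing ||Fm − f||² reproduces E.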

when [cov d] = σ_d² I and [cov m] = σ_m² I

this provides an answer to the question: what should be the value of ε² in damped least squares? The answer: it should be set to the ratio of the variance of the data to the variance of the a priori model parameters, ε² = σ_d² / σ_m²

if the a priori information is Hm = h with covariance [cov h]_A, then F and f are formed in the same way, with [cov h]_A^−1/2 H in place of [cov m]^−1/2 I and [cov h]_A^−1/2 h in place of [cov m]^−1/2 m^ap

Gm = d^obs with covariance [cov d]; Hm = h with covariance [cov h]_A; m^est = (F^T F)^−1 F^T f, with F and f built as above. This is the most useful formula in inverse theory.
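A minimal numerical sketch of this recipe (the matrices, data, and prior information below are made-up placeholders, assumed only for illustration):

import numpy as np

# toy problem: 4 data, 2 model parameters, 1 prior constraint (all made up)
G = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])   # data kernel, Gm = d
d_obs = np.array([0.9, 1.6, 2.4, 2.9])                           # observed data
cov_d = 0.1**2 * np.eye(4)                                       # data covariance [cov d]

H = np.array([[0.0, 1.0]])                                       # prior information, Hm = h
h = np.array([0.7])                                              # prior value
cov_h = 0.5**2 * np.eye(1)                                       # prior covariance [cov h]_A

# weight each block by an inverse square root of its covariance and stack
Wd = np.linalg.inv(np.linalg.cholesky(cov_d))   # acts as [cov d]^(-1/2)
Wh = np.linalg.inv(np.linalg.cholesky(cov_h))   # acts as [cov h]_A^(-1/2)
F = np.vstack([Wd @ G, Wh @ H])
f = np.concatenate([Wd @ d_obs, Wh @ h])

# m_est = (F^T F)^-1 F^T f, solved without forming the explicit inverse
m_est = np.linalg.solve(F.T @ F, F.T @ f)
print("m_est =", m_est)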