Geology 5670/6670 Inverse Theory 6 Feb 2015 © A.R. Lowry 2015 Read for Mon 9 Feb: Menke Ch 5 (89-114)

Last time: The Generalized Inverse; Damped LS

The Generalized Inverse uses Singular Value Decomposition to recast the problem in terms of the p non-zero eigenvalues and eigenvectors of $\mathbf{G}$. The resulting singular value decomposition of $\mathbf{G}$ is $\mathbf{G} = \mathbf{U}_p \boldsymbol{\Lambda}_p \mathbf{V}_p^T$, $p \le \min(N, M)$, with pseudoinverse $\mathbf{G}^{-g} = \mathbf{V}_p \boldsymbol{\Lambda}_p^{-1} \mathbf{U}_p^T$, which minimizes both $\mathbf{e}^T\mathbf{e}$ and $\mathbf{m}^T\mathbf{m}$. Solution variance can be reduced by setting small $\lambda_i = 0$ (especially if $\lambda_i < \sigma$!). This leads to a fundamental trade-off between solution variance and model resolution…
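A minimal numerical sketch of the truncated-SVD generalized inverse (the operator G, data d, singular values, and truncation levels below are all made up for illustration; they are not from the lecture):

import numpy as np

# Synthetic, ill-conditioned 20 x 5 forward operator with prescribed singular values.
rng = np.random.default_rng(0)
U0, _ = np.linalg.qr(rng.standard_normal((20, 5)))
V0, _ = np.linalg.qr(rng.standard_normal((5, 5)))
s_true = np.array([10.0, 5.0, 1.0, 1e-3, 1e-6])
G = U0 @ np.diag(s_true) @ V0.T

m_true = np.ones(5)
d = G @ m_true + 0.01 * rng.standard_normal(20)

# SVD of G: G = U_p Lam_p V_p^T ; keep only the p largest singular values.
U, lam, Vt = np.linalg.svd(G, full_matrices=False)

def generalized_inverse_solution(p):
    """m = V_p Lam_p^{-1} U_p^T d, the truncated-SVD (generalized inverse) estimate."""
    return Vt[:p, :].T @ np.diag(1.0 / lam[:p]) @ U[:, :p].T @ d

for p in (5, 3):
    m_est = generalized_inverse_solution(p)
    print(f"p = {p}: ||m_est|| = {np.linalg.norm(m_est):.3g}")

Keeping all five singular values lets the tiny ones amplify the noise and the solution norm blows up; truncating to p = 3 reduces the variance at the cost of resolution, which is exactly the trade-off described next.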

So we have a trade-off between resolution and variance:

[Figure: trade-off curve of solution variance (vertical axis) versus model irresolution (horizontal axis), traced out by decreasing p]

This trade-off (degraded model resolution is required to get reduced solution variance) is an inherent limitation of all inverse problems…

Damped Least Squares (Menke § )

Suppose we have an over-determined problem that is ill-conditioned (i.e., $\lambda_M \ll \lambda_1$), so the determinant of $\mathbf{G}^T\mathbf{G}$ is close to zero. Can we reduce solution variance without throwing away parameters?

Idea: Combine a minimization of $\mathbf{e}^T\mathbf{e}$ and $\mathbf{m}^T\mathbf{m}$ for the over-determined (least-squares) problem! Define a new objective function that combines residual length and solution length:
$\Phi(\mathbf{m}) = \mathbf{e}^T\mathbf{e} + \varepsilon^2\,\mathbf{m}^T\mathbf{m} = (\mathbf{d}-\mathbf{G}\mathbf{m})^T(\mathbf{d}-\mathbf{G}\mathbf{m}) + \varepsilon^2\,\mathbf{m}^T\mathbf{m}$.
To minimize, set $\partial\Phi/\partial\mathbf{m} = \mathbf{0}$.
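Spelling out that minimization step (standard matrix calculus; the slide itself only states the condition, and the result is restated on the next slide):

$$
\begin{aligned}
\Phi(\mathbf{m}) &= (\mathbf{d}-\mathbf{G}\mathbf{m})^T(\mathbf{d}-\mathbf{G}\mathbf{m}) + \varepsilon^2\,\mathbf{m}^T\mathbf{m},\\
\frac{\partial\Phi}{\partial\mathbf{m}} &= -2\,\mathbf{G}^T\mathbf{d} + 2\,\mathbf{G}^T\mathbf{G}\,\mathbf{m} + 2\,\varepsilon^2\,\mathbf{m} = \mathbf{0}
\;\;\Longrightarrow\;\;
(\mathbf{G}^T\mathbf{G} + \varepsilon^2\mathbf{I})\,\mathbf{m} = \mathbf{G}^T\mathbf{d}.
\end{aligned}
$$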

Recall $\partial\Phi/\partial\mathbf{m} = -2\,\mathbf{G}^T\mathbf{d} + 2\,(\mathbf{G}^T\mathbf{G} + \varepsilon^2\mathbf{I})\,\mathbf{m} = \mathbf{0}$, so: $(\mathbf{G}^T\mathbf{G} + \varepsilon^2\mathbf{I})\,\mathbf{m} = \mathbf{G}^T\mathbf{d}$, or: $\tilde{\mathbf{m}} = (\mathbf{G}^T\mathbf{G} + \varepsilon^2\mathbf{I})^{-1}\mathbf{G}^T\mathbf{d}$. Thus, the pseudoinverse for damped least squares (DLS) is: $\mathbf{G}^{-g} = (\mathbf{G}^T\mathbf{G} + \varepsilon^2\mathbf{I})^{-1}\mathbf{G}^T$.

The condition number for OLS is $\lambda_1^2/\lambda_M^2$ (the ratio of the largest to smallest eigenvalue of $\mathbf{G}^T\mathbf{G}$). Identity: If the eigenvalues of $\mathbf{A}$ are $\lambda_i$, the eigenvalues of $\mathbf{A} + k\mathbf{I}$ are $\lambda_i + k$. So the condition number for DLS is $(\lambda_1^2 + \varepsilon^2)/(\lambda_M^2 + \varepsilon^2)$.
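A small numerical sketch of that conditioning improvement (the operator G, data d, and the damping value eps2 are synthetic, chosen only for illustration):

import numpy as np

rng = np.random.default_rng(1)
# Synthetic, nearly rank-deficient forward operator (N = 30 observations, M = 4 parameters).
G = rng.standard_normal((30, 4))
G[:, 3] = G[:, 2] + 1e-6 * rng.standard_normal(30)   # two almost-identical columns
d = G @ np.array([1.0, -2.0, 0.5, 0.5]) + 0.05 * rng.standard_normal(30)

GtG = G.T @ G
eps2 = 1e-2   # damping parameter epsilon^2 (illustrative value)

# Eigenvalues of GtG shift by eps2 under damping (the A + kI identity).
w = np.linalg.eigvalsh(GtG)
print("OLS condition number:", w.max() / w.min())
print("DLS condition number:", (w.max() + eps2) / (w.min() + eps2))

# OLS vs DLS estimates from the corresponding normal equations.
m_ols = np.linalg.solve(GtG, G.T @ d)
m_dls = np.linalg.solve(GtG + eps2 * np.eye(4), G.T @ d)
print("||m_OLS|| =", np.linalg.norm(m_ols), " ||m_DLS|| =", np.linalg.norm(m_dls))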

The covariance matrix for DLS (assuming $\mathbf{C}_d = \sigma^2\mathbf{I}$) gives: $\mathbf{C}_m = \sigma^2(\mathbf{G}^T\mathbf{G} + \varepsilon^2\mathbf{I})^{-1}\mathbf{G}^T\mathbf{G}\,(\mathbf{G}^T\mathbf{G} + \varepsilon^2\mathbf{I})^{-1}$, as compared to OLS: $\mathbf{C}_m = \sigma^2(\mathbf{G}^T\mathbf{G})^{-1}$. The resolution matrix is now $\mathbf{R} = (\mathbf{G}^T\mathbf{G} + \varepsilon^2\mathbf{I})^{-1}\mathbf{G}^T\mathbf{G} = \mathbf{V}_p\mathbf{F}\mathbf{V}_p^T$, where: $\mathbf{F} = \mathrm{diag}\!\left(\lambda_i^2/(\lambda_i^2 + \varepsilon^2)\right)$.
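A sketch of those quantities for the same synthetic example (sigma2 and eps2 are illustrative assumptions, not lecture values):

import numpy as np

rng = np.random.default_rng(1)
G = rng.standard_normal((30, 4))
G[:, 3] = G[:, 2] + 1e-6 * rng.standard_normal(30)

sigma2, eps2 = 0.05**2, 1e-2           # assumed data variance and damping
GtG = G.T @ G
A_inv = np.linalg.inv(GtG + eps2 * np.eye(4))

Cm_dls = sigma2 * A_inv @ GtG @ A_inv  # DLS model covariance
R_dls = A_inv @ GtG                    # DLS resolution matrix (approaches I as eps2 -> 0)

lam = np.linalg.svd(G, compute_uv=False)
filter_factors = lam**2 / (lam**2 + eps2)   # tapered weights lambda_i^2 / (lambda_i^2 + eps2)
print("diag(R) =", np.round(np.diag(R_dls), 3))
print("filter factors =", np.round(filter_factors, 3))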

So the resolution/variance curve is similar to that for the generalized inverse:

[Figure: trade-off curve of solution variance (vertical axis) versus model irresolution (horizontal axis), traced out by increasing ε²]

with the important difference that the dependence on parameters for which the solution is ill-conditioned is tapered instead of sharply cut off.

How do we choose an "optimal" ε²?

[Figure: trade-off curve of solution variance (vertical axis) versus model irresolution (horizontal axis), traced out by increasing ε²; "minimum length" labeled on the plot]

• Can minimize the length of …
• ε² should be … m.
• Can use a Bayesian statistical criterion (we'll get to this later).
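The slide does not spell out the selection rule beyond the options listed above; as one way to see the choice, here is a sketch that scans candidate ε² values for the synthetic example used earlier and tabulates the residual-length/solution-length trade-off, from which an "optimal" value can be read off by whatever criterion one adopts:

import numpy as np

rng = np.random.default_rng(1)
G = rng.standard_normal((30, 4))
G[:, 3] = G[:, 2] + 1e-6 * rng.standard_normal(30)
d = G @ np.array([1.0, -2.0, 0.5, 0.5]) + 0.05 * rng.standard_normal(30)
GtG, Gtd = G.T @ G, G.T @ d

# Scan damping values and record the residual / solution-length trade-off.
for eps2 in np.logspace(-6, 1, 8):
    m = np.linalg.solve(GtG + eps2 * np.eye(4), Gtd)
    e = d - G @ m
    print(f"eps2 = {eps2:8.1e}   ||e|| = {np.linalg.norm(e):7.4f}   ||m|| = {np.linalg.norm(m):7.4f}")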

Maximum Likelihood (Menke § )

Suppose we have data d with known probability density function (pdf) $f(\mathbf{d}|\mathbf{m})$. Here we are assuming the data d depend much more strongly on the parameters m than on any other possible parameters…

The probability P that a random variable X lies on $x_1 \le X \le x_2$ is $P = \int_{x_1}^{x_2} f(x)\,dx$, so the probability of making observations within ±Δ of those we actually measured is $P = \int_{\mathbf{d}-\Delta}^{\mathbf{d}+\Delta} f(\mathbf{d}'|\mathbf{m})\,d\mathbf{d}'$. We assume Δ very small, so that $P \approx f(\mathbf{d}|\mathbf{m})\,(2\Delta)^N$.
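A quick numerical check of that small-Δ approximation for a single Gaussian datum (the observed value, mean, standard deviation, and Δ below are arbitrary illustrative numbers):

from scipy.stats import norm

d_obs, mu, sigma, delta = 1.3, 1.0, 0.5, 1e-3

# Exact probability that X falls within +/- delta of the observed value...
p_exact = norm.cdf(d_obs + delta, mu, sigma) - norm.cdf(d_obs - delta, mu, sigma)
# ...versus the small-delta approximation  P ~ f(d_obs) * 2*delta.
p_approx = norm.pdf(d_obs, mu, sigma) * 2 * delta

print(p_exact, p_approx)   # nearly identical for small delta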

Method: Find the member of a family of distributions f(d | m) which maximizes the probability of "getting the d that we got" from among all possible m. The likelihood function L of m given d is $L(\mathbf{m}|\mathbf{d}) = f(\mathbf{d}|\mathbf{m})$, and we want to maximize L as a function of m: $\max_{\mathbf{m}} L(\mathbf{m}|\mathbf{d})$.

Case 1: Assume jointly normal, zero-mean errors. Recall that for errors $\mathbf{e}$ with covariance $\mathbf{C}$, $f(\mathbf{e}) = (2\pi)^{-N/2}\,|\mathbf{C}|^{-1/2}\exp\!\left(-\tfrac{1}{2}\mathbf{e}^T\mathbf{C}^{-1}\mathbf{e}\right)$. Then: $L(\mathbf{m}|\mathbf{d}) = (2\pi)^{-N/2}\,|\mathbf{C}|^{-1/2}\exp\!\left[-\tfrac{1}{2}(\mathbf{d}-\mathbf{G}\mathbf{m})^T\mathbf{C}^{-1}(\mathbf{d}-\mathbf{G}\mathbf{m})\right]$.
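A sketch (synthetic linear problem with an assumed covariance $\mathbf{C} = \sigma^2\mathbf{I}$) showing that maximizing this Gaussian likelihood, i.e. minimizing the negative log-likelihood, recovers the ordinary least-squares estimate:

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
G = rng.standard_normal((50, 3))
m_true = np.array([2.0, -1.0, 0.5])
sigma = 0.1
d = G @ m_true + sigma * rng.standard_normal(50)

def neg_log_likelihood(m):
    """-log L(m|d) for jointly normal, zero-mean errors with C = sigma^2 I (constants dropped)."""
    r = d - G @ m
    return 0.5 * (r @ r) / sigma**2

m_ml = minimize(neg_log_likelihood, x0=np.zeros(3)).x
m_ls = np.linalg.solve(G.T @ G, G.T @ d)   # ordinary least-squares solution
print("m_ML =", np.round(m_ml, 4))
print("m_LS =", np.round(m_ls, 4))         # the two agree: ML with Gaussian errors reduces to LS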