Lecture 3 Probability and Measurement Error, Part 2.


Syllabus
Lecture 01  Describing Inverse Problems
Lecture 02  Probability and Measurement Error, Part 1
Lecture 03  Probability and Measurement Error, Part 2
Lecture 04  The L2 Norm and Simple Least Squares
Lecture 05  A Priori Information and Weighted Least Squares
Lecture 06  Resolution and Generalized Inverses
Lecture 07  Backus-Gilbert Inverse and the Trade Off of Resolution and Variance
Lecture 08  The Principle of Maximum Likelihood
Lecture 09  Inexact Theories
Lecture 10  Nonuniqueness and Localized Averages
Lecture 11  Vector Spaces and Singular Value Decomposition
Lecture 12  Equality and Inequality Constraints
Lecture 13  L1, L∞ Norm Problems and Linear Programming
Lecture 14  Nonlinear Problems: Grid and Monte Carlo Searches
Lecture 15  Nonlinear Problems: Newton's Method
Lecture 16  Nonlinear Problems: Simulated Annealing and Bootstrap Confidence Intervals
Lecture 17  Factor Analysis
Lecture 18  Varimax Factors, Empirical Orthogonal Functions
Lecture 19  Backus-Gilbert Theory for Continuous Problems; Radon's Problem
Lecture 20  Linear Operators and Their Adjoints
Lecture 21  Fréchet Derivatives
Lecture 22  Exemplary Inverse Problems, incl. Filter Design
Lecture 23  Exemplary Inverse Problems, incl. Earthquake Location
Lecture 24  Exemplary Inverse Problems, incl. Vibrational Problems

Purpose of the Lecture
review key points from the last lecture
introduce conditional p.d.f.'s and Bayes' theorem
discuss confidence intervals
explore ways to compute realizations of random variables

Part 1 review of the last lecture

Joint probability density functions
p(d) = p(d1, d2, d3, d4, …, dN): probability that the data are near d
p(m) = p(m1, m2, m3, m4, …, mM): probability that the model parameters are near m

[Figure: joint p.d.f. of two data, p(d1,d2), plotted against d1 and d2]

[Figure: the same joint p.d.f., annotated with the means of d1 and d2]

[Figure: the same joint p.d.f., annotated with the variances σ1² and σ2², drawn as widths 2σ1 and 2σ2]

[Figure: the same joint p.d.f., annotated with the angle of its long axis; covariance expresses the degree of correlation]

summarizing a joint p.d.f.:
the mean is a vector; the covariance is a symmetric matrix
diagonal elements: variances
off-diagonal elements: covariances
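
As a quick MatLab illustration (the synthetic data below are illustrative choices), the mean vector and covariance matrix can be estimated from a table of realizations with the built-in functions mean() and cov():

% sketch: estimate the mean vector and covariance matrix from realizations
% D is an (Nr x 2) table of synthetic realizations of (d1, d2)
Nr = 10000;
d1 = 5 + 1.0*randn(Nr,1);            % d1 with mean 5 and sigma 1 (illustrative)
d2 = 3 + 0.5*randn(Nr,1) + 0.3*d1;   % d2 correlated with d1 (illustrative)
D  = [d1, d2];

dbar = mean(D)    % estimated mean vector (1 x 2)
Cd   = cov(D)     % estimated covariance matrix (2 x 2, symmetric)
% diagonal elements of Cd estimate the variances; the off-diagonal element the covariance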

data with measurement error → data analysis process → inferences with uncertainty
error in measurement implies uncertainty in inferences

functions of random variables
given p(d), with m = f(d), what is p(m)?

given p(d) and m(d), then

p(m) = p(d(m)) |det(∂d/∂m)|

where ∂d/∂m is the matrix with elements ∂di/∂mj,

∂d1/∂m1  ∂d1/∂m2  …
∂d2/∂m1  ∂d2/∂m2  …
   ⋮        ⋮     ⋱

and det(∂d/∂m) is the Jacobian determinant
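
A minimal one-variable sketch of this rule in MatLab (the uniform p.d.f. and the mapping m = d² are illustrative choices): for d uniform on (0,1) and m = d², the rule gives p(m) = p(d(m)) |dd/dm| = 1/(2√m), which a histogram of transformed realizations should reproduce.

% 1-D sketch of the transformation rule p(m) = p(d(m)) |dd/dm|
Nr = 100000;
d  = rand(Nr,1);                           % realizations of d, uniform on (0,1)
m  = d.^2;                                 % transformed variable

centers = 0.01:0.02:0.99;                  % bin centers of width 0.02
counts  = hist(m, centers);                % histogram counts
pest    = counts / (Nr*0.02);              % empirical p.d.f. (normalized to unit area)
ppred   = 1 ./ (2*sqrt(centers));          % predicted p.d.f., 1/(2*sqrt(m))

plot(centers, pest, 'ko', centers, ppred, 'r-');
xlabel('m'); ylabel('p(m)');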

multivariate Gaussian example
N data d with a Gaussian p.d.f., and M = N model parameters m related to the data by the linear relationship m = Md + v

given p(d), Gaussian with mean d̄ and covariance [cov d], and the linear relation m = Md + v, what's p(m)?

answer: p(m) is also Gaussian, with mean
m̄ = M d̄ + v
and covariance
[cov m] = M [cov d] Mᵀ
(the rule for error propagation)

holds even when M≠N and for non-Gaussian distributions

rule for error propagation: [cov m] = M [cov d] Mᵀ
holds even when M ≠ N and for non-Gaussian distributions (memorize!)
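
A quick numerical check of this rule in MatLab (the matrix M, offset v, and covariance [cov d] below are arbitrary illustrative choices; note M is 2×3, so M ≠ N): map many realizations of d through m = Md + v and compare the sample covariance of m with M [cov d] Mᵀ.

% sketch: numerical check of the error-propagation rule [cov m] = M [cov d] M'
N  = 3;  M = [1 2 0; 0 1 -1];  v = [1; 2];     % illustrative choices (M is 2 x 3)
Cd = [2.0 0.5 0.0; 0.5 1.0 0.3; 0.0 0.3 1.5];  % assumed covariance of d

Nr = 100000;
L  = chol(Cd, 'lower');                % used to impose the covariance Cd on the data
d  = L * randn(N, Nr);                 % zero-mean realizations of d, one per column
m  = M*d + repmat(v, 1, Nr);           % propagate through the linear relation

Cm_sample = cov(m')                    % sample covariance of m
Cm_rule   = M * Cd * M'                % covariance predicted by the rule (should match)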

example: given N uncorrelated Gaussian data with uniform variance σd², and the formula for the sample mean, m1 = (1/N) Σi di (so M is the 1×N row vector with every element equal to 1/N)

[cov d] = σd² I and [cov m] = σd² M Mᵀ = σd² N/N² = (σd²/N) I = σm² I, or σm² = σd²/N

σm = σd/√N
so the error of the sample mean decreases with the number of data
the decrease is rather slow, though, because of the square root
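
A short MatLab sketch checking σm = σd/√N by simulation (σd, N, and the number of experiments are illustrative values):

% sketch: the standard deviation of the sample mean shrinks as 1/sqrt(N)
sigma_d = 2.0;  N = 25;  Nr = 100000;      % illustrative values
d = sigma_d * randn(N, Nr);                % Nr independent experiments, N data each
m = mean(d, 1);                            % sample mean of each experiment
sigma_m_observed  = std(m)                 % empirical standard deviation of the sample means
sigma_m_predicted = sigma_d / sqrt(N)      % prediction from the error-propagation rule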

Part 2 conditional p.d.f.'s and Bayes' theorem

joint p.d.f. p(d1,d2): probability that d1 is near a given value and probability that d2 is near a given value
conditional p.d.f. p(d1|d2): probability that d1 is near a given value, given that we know that d2 is near a given value

[Figure: the joint p.d.f. p(d1,d2)]

[Figure: the joint p.d.f. sliced at a particular value of d2; the slice is centered on a corresponding value of d1]

[Figure: the joint p.d.f. sliced at a different value of d2; the slice is centered on a different value of d1]

so, to convert a joint p.d.f. p(d1,d2) to a conditional p.d.f. p(d1|d2), evaluate the joint p.d.f. at d2 and normalize the result to unit area

p(d1|d2) = p(d1,d2) / ∫ p(d1,d2) dd1 = p(d1,d2) / p(d2)
where the integral is the area under the p.d.f. for fixed d2
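
A minimal grid-based sketch of this in MatLab (the correlated Gaussian below is an illustrative stand-in for the joint p.d.f. in the figures): evaluate p(d1,d2) on a grid, extract the column at the observed d2, and normalize it to unit area.

% sketch: conditional p.d.f. p(d1|d2) by slicing and normalizing a gridded joint p.d.f.
d1 = (0:0.1:10)';  d2 = (0:0.1:10)';       % grid (illustrative)
[D2, D1] = meshgrid(d2, d1);               % D1 varies down rows, D2 across columns

% illustrative correlated Gaussian joint p.d.f. (means 5 and 5, sigmas 1.0 and 1.5, correlation 0.7)
s1 = 1.0;  s2 = 1.5;  r = 0.7;
Q  = ((D1-5).^2/s1^2 - 2*r*(D1-5).*(D2-5)/(s1*s2) + (D2-5).^2/s2^2) / (1-r^2);
P  = exp(-0.5*Q) / (2*pi*s1*s2*sqrt(1-r^2));

i2     = find(abs(d2-7) < 0.05, 1);        % column nearest the observed value d2 = 7
slice  = P(:, i2);                         % joint p.d.f. evaluated at fixed d2
p_cond = slice / (0.1*sum(slice));         % normalize to unit area: p(d1 | d2 = 7)
plot(d1, p_cond); xlabel('d1'); ylabel('p(d1 | d2 = 7)');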

similarly, the conditional p.d.f. p(d2|d1): probability that d2 is near a given value, given that we know that d1 is near a given value

putting both together:
p(d1,d2) = p(d1|d2) p(d2) = p(d2|d1) p(d1)

rearranging yields a result called Bayes' theorem:
p(d1|d2) = p(d2|d1) p(d1) / p(d2)

the denominator p(d2) can be written three alternate ways: p(d2) itself, ∫ p(d1,d2) dd1, or ∫ p(d2|d1) p(d1) dd1; similarly, there are three alternate ways to write p(d1)
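
A minimal discrete check of Bayes' theorem in MatLab (the 2×2 joint-probability table below is hypothetical, not the lecture's sand table): compute p(d1|d2) directly from the joint table and again via p(d2|d1) p(d1) / p(d2), and confirm the two agree.

% sketch: Bayes' theorem checked on a hypothetical 2x2 discrete joint p.d.f.
% rows index the two possible values of d1, columns the two possible values of d2
Pjoint = [0.60 0.10; 0.05 0.25];    % hypothetical probabilities, summing to 1

p_d1 = sum(Pjoint, 2);              % marginal p(d1), 2 x 1
p_d2 = sum(Pjoint, 1);              % marginal p(d2), 1 x 2

p_d1_given_d2 = Pjoint ./ repmat(p_d2, 2, 1);   % p(d1|d2) = p(d1,d2)/p(d2)
p_d2_given_d1 = Pjoint ./ repmat(p_d1, 1, 2);   % p(d2|d1) = p(d1,d2)/p(d1)

% Bayes' theorem: p(d1|d2) = p(d2|d1) p(d1) / p(d2)
bayes = (p_d2_given_d1 .* repmat(p_d1, 1, 2)) ./ repmat(p_d2, 2, 1);
max(abs(bayes(:) - p_d1_given_d2(:)))           % should be zero (to rounding)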

Important: p(d1|d2) ≠ p(d2|d1)
example: the probability that you will die given that you have pancreatic cancer is 90% (the fatality rate of pancreatic cancer is very high), but the probability that a dead person died of pancreatic cancer is 1.3% (most people die of something else)

Example using sand, with discrete values
d1: grain size (S = small, B = big)
d2: weight (L = light, H = heavy)
and their joint p.d.f. p(d1,d2), given as a 2×2 table

[Table: the joint p.d.f. together with the univariate (marginal) p.d.f.'s p(d1) and p(d2)]

from the joint and univariate p.d.f.'s: most grains are small, most grains are light, and most grains are small and light

conditional p.d.f.'s: if a grain is light, it's probably small

conditional p.d.f.'s: if a grain is heavy, it's probably big

conditional p.d.f.'s: if a grain is small, it's probably light

conditional p.d.f.'s: if a grain is big, the chance is about even that it's light or heavy

If a grain is big, the chance is about even that it's light or heavy? What's going on?

Bayes' theorem provides the answer

the probability that a grain is heavy given that it's big:
P(H|B) = P(B|H) P(H) / P(B)
where the numerator involves the probability of a big grain given it's heavy, and the denominator, the probability of a big grain, is
P(B) = P(B|L) P(L) + P(B|H) P(H)
(the probability of a big grain given it's light and the probability of a big grain given it's heavy, each weighted by how common light and heavy grains are)

Bayes' theorem provides the answer: only a few percent of light grains are big, but there are a lot of light grains, so the P(B|L) P(L) term dominates the denominator

before the observation: the probability that the grain is heavy is 10%, because heavy grains make up 10% of the total
observation: the grain is big
after the observation: the probability that the grain is heavy has risen to 49.74%
Bayesian inference: use observations to update probabilities

Part 3 Confidence Intervals

suppose that we encounter in the literature the result m1 = 50 ± 2 (95%) and m2 = 30 ± 1 (95%). What does it mean?

[Figure: the joint p.d.f. p(m1,m2) for m1 = 50 ± 2 (95%) and m2 = 30 ± 1 (95%), together with the univariate p.d.f.'s p(m1) and p(m2), from which the means and variances σ1² and σ2² are computed; the widths 2σ1 and 2σ2 are marked]

m1 = 50 ± 2 (95%) and m2 = 30 ± 1 (95%) means: irrespective of the value of m2, there is a 95% chance that m1 is between 48 and 52, and irrespective of the value of m1, there is a 95% chance that m2 is between 29 and 31

So what's the probability that both m1 and m2 are within 2σ of their means? That will depend upon the degree of correlation. For uncorrelated model parameters, it's (0.95)² ≈ 0.90.
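
A MatLab sketch of this dependence (requires the Statistics Toolbox function mvnrnd(); the correlation value 0.8 is an illustrative choice), using the means and σ's from the example above:

% sketch: probability that m1 and m2 both fall within 2 sigma of their means,
% for an uncorrelated case and a correlated case
Nr   = 1e6;
mbar = [50, 30];        % means from the example above
s    = [1.0, 0.5];      % sigma1 and sigma2 (so 2*sigma gives the +/-2 and +/-1 bounds)
for rho = [0.0, 0.8]    % 0.8 is an illustrative degree of correlation
    C = [s(1)^2, rho*s(1)*s(2); rho*s(1)*s(2), s(2)^2];
    m = mvnrnd(mbar, C, Nr);
    inside = abs(m(:,1)-mbar(1)) < 2*s(1) & abs(m(:,2)-mbar(2)) < 2*s(2);
    fprintf('rho = %.1f:  P(both within 2 sigma) = %.4f\n', rho, mean(inside));
end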

[Figure: three panels in the (m1,m2) plane showing the band m1 within ±2σ1, the band m2 within ±2σ2, and their intersection, m1 within ±2σ1 and m2 within ±2σ2]

Suppose that you read a paper which states values and confidence limits for 100 model parameters. What's the probability that they all fall within their 2σ bounds?
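
Assuming the parameters are uncorrelated, it's (0.95)^100 ≈ 0.006, i.e. less than 1%: with that many parameters, it's nearly certain that at least a few lie outside their quoted 2σ bounds.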

Part 4 computing realizations of random variables

Why? To create noisy "synthetic" or "test" data, or to generate a suite of hypothetical models, all different from one another

The MatLab function random() can do many different p.d.f.'s
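
For example (the calls below assume the Statistics Toolbox is installed; the parameter values are illustrative):

% realizations drawn with the Statistics Toolbox function random()
N = 1000;
dNormal  = random('Normal',      5.0, 2.0, N, 1);  % Gaussian, mean 5, sigma 2
dUniform = random('Uniform',     0.0, 1.0, N, 1);  % uniform on (0,1)
dExpon   = random('Exponential', 3.0,      N, 1);  % one-sided exponential, mean 3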

But what do you do if MatLab doesn’t have the one you need?

One possibility is to use the Metropolis-Hastings algorithm. It requires that you: 1) can evaluate the formula for p(d), and 2) already have a way to generate realizations of Gaussian and uniform p.d.f.'s

goal: generate a length N vector d that contains realizations of p(d)

steps:
set d1 (di with i = 1) to some reasonable value
then, for each subsequent di+1:
generate a proposed successor d′ from a conditional p.d.f. q(d′|di) that returns a value near di
generate a number α from a uniform p.d.f. on the interval (0,1)
accept d′ as di+1 if α < p(d′)/p(di); else set di+1 = di
repeat

A commonly used choice for the conditional p.d.f. is a Gaussian centered on di,
q(d′|di) ∝ exp( −(d′ − di)² / (2σ²) )
where σ is chosen to represent the size of the neighborhood, the typical distance of di+1 from di

example: the exponential p.d.f., p(d) = (1/(2c)) exp(−|d|/c)
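
A minimal MatLab sketch of the recipe above applied to this p.d.f. (the values of c, σ, N, and the starting value are illustrative choices); its output is the kind of realization vector whose histogram is shown next:

% sketch: Metropolis-Hastings realizations of p(d) = (1/(2c)) exp(-|d|/c)
c     = 2.0;                         % illustrative scale parameter
sigma = 1.0;                         % size of the neighborhood for the proposal
N     = 5000;                        % number of realizations

p = @(d) (1/(2*c)) * exp(-abs(d)/c); % target p.d.f.

d = zeros(N,1);
d(1) = 1.0;                          % reasonable starting value
for i = 1:N-1
    dprime = d(i) + sigma*randn();   % proposed successor, Gaussian about d(i)
    alpha  = rand();                 % uniform on (0,1)
    if alpha < p(dprime)/p(d(i))
        d(i+1) = dprime;             % accept the proposal
    else
        d(i+1) = d(i);               % keep the previous value
    end
end

hist(d, 50);                         % histogram of the N realizations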

Histogram of 5000 realizations