Advanced Preconditioning for Generalized Least Squares

Recall: To stabilize the inversions, we minimize the objective function

J(m) = ||d - Gm||^2 + ε^2 ||m||^2

where ε is the damping or regularization parameter. Drawback: This method is often labeled "norm damping." It makes the sum of squared model coefficients small, but does not specifically reduce the gradients, i.e., the changes between neighboring elements of the solution vector. Obvious improvement: minimize the gradient or smoothness (equivalent to the 1st and 2nd derivatives, computed from up to 3 neighboring elements) by minimizing the following cost (objective) function:

J(m) = ||d - Gm||^2 + η^2 ||Lm||^2

where η is the gradient/smoothness damping coefficient and L is a constant matrix, with elements close to the diagonal, that applies a finite-difference (derivative) operator to the model coefficients. (Tikhonov regularization)
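As an illustration, both objective functions above can be minimized in closed form through their normal equations. This is a minimal sketch with a made-up kernel G, data d, and damping values (none of them from the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.normal(size=(20, 10))               # hypothetical forward operator
m_true = np.sin(np.linspace(0.0, np.pi, 10))
d = G @ m_true + 0.01 * rng.normal(size=20)  # synthetic noisy data

eps2 = 0.1                                   # damping (regularization) parameter

# Norm damping: minimize ||d - Gm||^2 + eps2 ||m||^2
m_norm = np.linalg.solve(G.T @ G + eps2 * np.eye(10), G.T @ d)

# Tikhonov gradient damping: minimize ||d - Gm||^2 + eps2 ||Lm||^2,
# where L is a first-difference operator penalizing jumps between neighbors.
L = np.diff(np.eye(10), axis=0)              # (9 x 10) first-difference matrix
m_smooth = np.linalg.solve(G.T @ G + eps2 * (L.T @ L), G.T @ d)
```

The first solve shrinks the model norm; the second penalizes differences between adjacent model coefficients, which is the "smoothing" behavior the slide describes.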

Calculation of the L matrix (a matrix that acts on the model vector and returns a vector containing second derivatives of the model coefficients). To find the elements of this matrix, let's start with the 1st derivative of a function f.

(1) Newton forward: a crude way to estimate the derivative at a point x_i is the forward difference operator (h is the x spacing):

f'(x_i) ≈ [ f(x_i + h) - f(x_i) ] / h

(2) Newton backward: estimate the derivative at a point x_i by the backward difference operator:

f'(x_i) ≈ [ f(x_i) - f(x_i - h) ] / h

(3) Realizing that the derivative approximations above are really derivatives at (x_i + h/2) and (x_i - h/2), we can rewrite them as centered estimates at those midpoints.

This is what is often called the "1st-order finite difference" form, where h is the x-spacing or index spacing, depending on the application (it is assumed constant). If h = 1 (an index change from one element of the filter to the next), the forward Newton form gives

(Lm)_i = m_(i+1) - m_i,  i.e., each row of L looks like ( ... 0, -1, 1, 0 ... )

Note: This is not a square matrix! We minimize the first derivative of the model coefficients by minimizing ||Lm||^2. After forming L^T L, however, it becomes a square matrix with the same dimension as C^T C.

Now, the second derivative at index i can be estimated from these two first-order differences. If h = 1 (index change from one element of the filter to the next), we have

f''(x_i) ≈ f(x_(i+1)) - 2 f(x_i) + f(x_(i-1))

So the L matrix for smoothness (the 2nd derivative of the model parameters) has rows of the form ( ... 0, 1, -2, 1, 0 ... ).
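The first- and second-difference operators above can be sketched as explicit matrices. This assumes a hypothetical model vector of length n = 5 and h = 1:

```python
import numpy as np

n = 5
L1 = np.zeros((n - 1, n))                # first derivative (forward difference)
for i in range(n - 1):
    L1[i, i], L1[i, i + 1] = -1.0, 1.0   # (L1 m)_i = m_(i+1) - m_i

L2 = np.zeros((n - 2, n))                # second derivative
for i in range(n - 2):
    L2[i, i:i + 3] = [1.0, -2.0, 1.0]    # row i spans m_i, m_(i+1), m_(i+2)

# Neither L1 nor L2 is square, but L^T L is n x n,
# the same dimension as the normal-equations matrix.
LtL = L1.T @ L1
```

A quick sanity check: applied to a linear ramp, L2 returns zeros (the second derivative of a linear function vanishes), while L1 returns the constant slope.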

Resolution of an Inverse Problem

The resolution of a given inverse problem is about how well a given model (assume we already know it, even though we don't for most problems) can be recovered by the inversion procedure (with the damping: norm, gradient, or smoothness constraints). Two ways to assess model resolution for generalized least squares (the example below uses norm damping):

1. Analysis of the resolution matrix.
2. A simulated inversion of the predictions of a known model.

Resolution matrix: Suppose the estimated model is obtained from the data through some generalized inverse G^-g,

m_est = G^-g d.

Then, since d = G m_true,

m_est = G^-g G m_true = R m_true,  where R = G^-g G.

Now, of course, to get stable solutions to the inverse problem we regularize and seek

m_est = (G^T G + ε^2 I)^-1 G^T d.

If things work well, we are hoping that the following approximate relationship holds (R is the resolution matrix):

m_est = R m_true ≈ m_true,  i.e., R ≈ I.

Reality: If there is damping, R is not the identity matrix. Even if there is no damping, the generalized inverse gives only a least-squares estimate, and therefore the relationship above is not exact. That said, R gives us a way to assess how close we are to the exact solution. Key here: we must have a way of obtaining the generalized-inverse matrix G^-g, which is often not provided by inversion procedures. SVD comes to the rescue!
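A minimal sketch of computing R for norm-damped least squares, using a made-up kernel. It illustrates that light damping drives R toward the identity, while heavy damping pulls it away:

```python
import numpy as np

rng = np.random.default_rng(1)
G = rng.normal(size=(30, 8))     # hypothetical, well-conditioned kernel

def resolution_matrix(G, eps2):
    """R = (G^T G + eps2 I)^-1 G^T G for norm-damped least squares."""
    n = G.shape[1]
    Ginv = np.linalg.solve(G.T @ G + eps2 * np.eye(n), G.T)  # generalized inverse
    return Ginv @ G

R_light = resolution_matrix(G, eps2=1e-8)   # light damping: R is close to I
R_heavy = resolution_matrix(G, eps2=10.0)   # heavy damping: R is far from I
```

The trace of R (at most the number of model parameters) is a common single-number summary of how much of the model is resolved.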

2nd way to assess resolution: checkerboard tests

Basic procedure:
1. Make the solution vector look like a known, simple pattern ('checkerboards' are used when testing the Earth's structure in 2D or 3D, while square-wave signals, alternating on and off, are used for 1D problems).
2. Use this known, hypothetical solution to make predictions of the data, and treat these computed values as the data vector (the right-hand side). In other words, for any A X = B problem, assume a patterned solution X' (a square wave in 1D, a checkerboard in 2D or 3D), then make predictions B' based on this geometrical solution.

Checkerboard test procedure, continued:
3. One can use B' directly as the data vector, or add some random noise to B' to simulate imperfect data (the noise level should reflect that of the real data vector).
4. Solve the problem A X'' = B', where A is the same as before (with all the same operations, such as norm and roughness regularization). For fairness, make sure the procedure and matrix are the same as those used to find the real solution (which does not look like a checkerboard, and whose accuracy and resolution we hope to assess). Find X''.
5. The similarity (amplitude, phase) between X' (the input checkerboard) and X'' (the checkerboard recovered under all the conditions of the real data inversion) provides an effective measure of the resolution of the inversion problem with the damping/regularization (or singular-value removal) used for the real data.
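The five steps can be sketched for a 1D square-wave pattern. The kernel A, noise level, and damping below are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 12
A = rng.normal(size=(40, n))       # stand-in for the real inversion's kernel
# Step 1: on/off square-wave pattern (the 1D analog of a checkerboard)
x_in = np.where(np.arange(n) % 4 < 2, 1.0, -1.0)

b = A @ x_in                                 # Step 2: predicted data B'
b_noisy = b + 0.05 * rng.normal(size=40)     # Step 3: add noise

eps2 = 0.1                                   # Step 4: same regularization as the real inversion
x_out = np.linalg.solve(A.T @ A + eps2 * np.eye(n), A.T @ b_noisy)

# Step 5: compare input and recovered patterns
correlation = np.corrcoef(x_in, x_out)[0, 1]
```

A correlation near 1 with comparable amplitudes indicates the pattern is well resolved by this kernel and regularization.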

Resolution test idea:
1. Introduce a spherical-harmonic pattern.
2. Shoot rays through that model (they must be the same rays as in the real data set) to get a set of synthetic observations.
3. Based on the ray geometries and this hypothetical data set, invert for the structure. See if the recovered checkerboard agrees with the input model.

Issue: This is a minimum requirement, not a sufficient one. An inversion procedure that does not produce a good checkerboard has no merit, but one that does may still be incorrect.

Surface wave velocity structure of Alberta: Shen & Gu, 2012


Hypothesis Test

Based on the tomographic map of the final model, generate a simple representation of it, then try to recover this simple representation with the same inversion/damping criteria and the same data set. In principle, this is not too different from a checkerboard test. Advantage over the checkerboard: the anomalies to resolve have more relevant scales and sizes. It is sometimes used in combination with a standard checkerboard test.

Atlas & Eaton (2006), on structure beneath the Great Lakes. Standard checkerboard test.

Hypothesis testing for the same region

Another example of hypothesis testing. Rawlinson et al., 2010

Jackknifing/Bootstrap Test

Throw in randomly selected subsets of the data, recover a model, and compare it with the one that uses the entire data set. The consistency is a good indication of the resolution afforded by the data and method. To a great extent, the models produced by various researchers for the same structure can be considered an in-situ test, since they represent results from different (overlapping) subsets of the data and infer similar, but also somewhat different, structures. The stability of the results (setting aside differences in modeling strategy) is an indication of resolution.

Rawlinson et al., 2010
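A minimal jackknife-style sketch with synthetic data (the kernel, noise level, number of trials, and subset size are all made up):

```python
import numpy as np

rng = np.random.default_rng(3)
G = rng.normal(size=(60, 6))
m_true = rng.normal(size=6)
d = G @ m_true + 0.02 * rng.normal(size=60)

def solve(G, d):
    return np.linalg.lstsq(G, d, rcond=None)[0]

m_full = solve(G, d)                  # model from the entire data set
spreads = []
for _ in range(20):
    keep = rng.choice(60, size=45, replace=False)   # random 75% subset of the data
    m_sub = solve(G[keep], d[keep])
    spreads.append(np.linalg.norm(m_sub - m_full))

# A small spread across subsets suggests the data constrain the model well.
mean_spread = float(np.mean(spreads))
```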

Alternative Solution: Monte Carlo Methods and Genetic Algorithms

A random walk, but with probabilistic models: a more probable solution (say, one that decreases the error) has a greater chance of surviving to the next step. To speed up the solution, one can use derivatives of the models and residuals as a guide, but nothing is absolute, so one does not easily get trapped in a local minimum. These are forward approaches to solving an inverse problem. The Monte Carlo approach is somewhat "blind" in finding a final solution with a good fit; a genetic algorithm is more systematic, using the law of "natural selection" to find models with better "genes" iteratively.

Randomness: an example is finding the area of a region by randomly throwing darts at a dartboard.
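The dartboard idea can be sketched directly; here is a Monte Carlo estimate of the area of the unit disk from random points thrown at the enclosing 2x2 square:

```python
import numpy as np

rng = np.random.default_rng(4)
n_darts = 100_000
pts = rng.uniform(-1.0, 1.0, size=(n_darts, 2))      # darts in the square
hits = np.sum(pts[:, 0] ** 2 + pts[:, 1] ** 2 <= 1)  # darts inside the disk
area = 4.0 * hits / n_darts       # hit fraction times the square's area
```

The estimate converges to π as the number of darts grows, with error shrinking like 1/sqrt(n).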

Another example of Monte Carlo: an observed seismogram (top) near Italy. The bottom traces are the results of a Monte Carlo fit (obtained by varying the seismic structure randomly, within certain limits, at depth). In a strict sense, the Monte Carlo approach is really forward fitting rather than inversion!

Okeler, Gu et al., 2009, GJI.

The resulting seismic (shear velocity) structure with depth. Some "training" must be done to impose reasonable bounds on the randomly varying seismic speeds. Statistics help quantify the amount of variation and obtain the "average" structure.

Earthquake Location Inversion (highly nonlinear)

Geometry: an epicenter, station i at (x_i, y_i, 0), and the earthquake focus at (x, y, z). Suppose station i detects the earthquake at arrival time d'_i, which depends on the origin time t_0 and the travel time T(x, x_i) between source and station:

d'_i = t_0 + T(x, x_i)

The model vector is m = (m_1, m_2, m_3, m_4) = (t_0, x, y, z).

Solutions to nonlinear problems:
(1) use Monte Carlo methods to explore the answer;
(2) use iterative procedures to linearize.

Forward approach:
(1) Make an educated guess of where the earthquake may have occurred, based on the travel-time curve.
(2) Assume an Earth structure model and compute the predicted travel times d_i^pred = F_i(m).
(3) Compute the residuals between actual and predicted times: r_i = d'_i - F_i(m).
(4) Grid-search for m such that SUM_i |r_i| < tolerance (L1 norm), or SUM_i (r_i)^2 < tolerance (L2 norm).

Inversion approach:
(1) Again assume a good "educated" guess of location and time, giving a starting model m_0.
(2) Set up the problem as m = m_0 + Δm.
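The forward grid-search steps can be sketched for a simplified 2D epicenter problem (fixed depth and known origin time, so travel times equal arrival times; the constant wave speed, station layout, and source location are all hypothetical):

```python
import numpy as np

v = 5.0                                       # assumed wave speed, km/s
stations = np.array([[0.0, 0.0], [40.0, 0.0], [0.0, 40.0], [40.0, 40.0]])
src = np.array([12.0, 25.0])                  # "true" epicenter (unknown in practice)
d_obs = np.linalg.norm(stations - src, axis=1) / v   # observed travel times

best, best_misfit = None, np.inf
for x in np.arange(0.0, 40.0, 0.5):           # step (4): search the grid
    for y in np.arange(0.0, 40.0, 0.5):
        t_pred = np.linalg.norm(stations - np.array([x, y]), axis=1) / v
        misfit = np.sum((d_obs - t_pred) ** 2)        # L2 misfit
        if misfit < best_misfit:
            best, best_misfit = (x, y), misfit
```

The grid search needs no derivatives, but its cost grows quickly with grid resolution and the number of unknowns, which motivates the linearized approach below it on the slide.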

To minimize the residuals, we seek Δm such that

F_i(m_0 + Δm) ≈ F_i(m_0) + SUM_j (∂F_i/∂m_j)|_(m_0) Δm_j = d'_i,

i.e.,

SUM_j (∂F_i/∂m_j) Δm_j = d'_i - F_i(m_0) = r_i.

Now this becomes a linear A X = b problem. In other words, through iterations, we have obtained a linear problem involving PERTURBATIONS of the model parameters.
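The linearized iteration can be sketched for the same simplified 2D epicenter problem (a Gauss-Newton loop; the stations, wave speed, and starting guess are hypothetical):

```python
import numpy as np

v = 5.0
stations = np.array([[0.0, 0.0], [40.0, 0.0], [0.0, 40.0], [40.0, 40.0]])
src = np.array([12.0, 25.0])
d_obs = np.linalg.norm(stations - src, axis=1) / v   # travel times to recover

m = np.array([20.0, 20.0])                    # educated initial guess m_0
for _ in range(10):
    dist = np.linalg.norm(stations - m, axis=1)
    r = d_obs - dist / v                      # residual r = d' - F(m_0)
    # Jacobian of travel time T = |m - s_i| / v with respect to (x, y)
    A = (m - stations) / (v * dist[:, None])  # 4 x 2 matrix of partials
    dm = np.linalg.lstsq(A, r, rcond=None)[0] # solve A dm = r (least squares)
    m = m + dm                                # update with the perturbation
```

Each pass solves a small linear system in the model perturbation dm, exactly the A X = b structure noted above; iterating re-linearizes about the updated location.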