Steepest Descent Optimization

Outline
- Regularized Newton Method
- Trust Region Method for Line Search
- Solving Linear Systems of Equations
- Traveltime Tomography
- Conjugate Gradient Method

Problem: Ill-conditioned functional f(x)

[Figure: contours of an ill-conditioned f(x) over (x1, x2).]

Examples:
1. Many models fit the same data.
2. Seismic data with short source-receiver offsets are insensitive to the deep part of the model.
3. Seismic data are loud for the shallow part of the model but soft for the deep part.
4. More unknowns than equations -> non-unique solution.
5. Traveltime tomography across a low-velocity zone (LVZ).

Tomography Results

[Figure: tomographic image (m/s) and ray-path image, X (m) up to 2500 and Z (m) up to 400; limited offset limits the resolution depth. Final smoothing grid size is 10x5, i.e., 50 m in X and 25 m in Z.]

Seismic Refraction Data

[Figure: Line 1 intermediate models 1 & 2 and final model, from inversion schedules 1-3.]

Regularized Newton

Given: f(x) ~ f(x0) + dx^T g + (1/2) dx^T H dx + (λ/2) dx^T G dx, with damping parameter λ > 0. The first terms form the misfit function; the λ term is the penalty function. For traveltime tomography the misfit is ε = (Ls - t)^T (Ls - t) = t^T t - 2 s^T L^T t + s^T L^T L s.

Choices of penalty operator: G dx = I dx (damping), G dx ~ ∇dx (first-derivative smoothing), or G dx ~ ∇²dx (second-derivative smoothing).

Find: stationary point x* s.t. ∇f(x*) = 0.

Soln (Newton's method): x^(k+1) = x^(k) - α [H + λG]^(-1) g^(k)

[Figure: contours of f(x) over (x1, x2); λ starts near 0.02 max(H_ij) and decreases with iteration number.]
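A minimal sketch of that update in Python, assuming a quadratic model with known gradient and Hessian; the λ schedule starts at the slide's 0.02 max(H_ij) heuristic, while the decay factor, α, and test problem are illustrative assumptions:

```python
import numpy as np

def regularized_newton(grad, hess, x0, G=None, decay=0.5, alpha=1.0, nit=20):
    """x <- x - alpha * [H + lam*G]^(-1) g, with lam shrinking each iteration."""
    x = np.asarray(x0, dtype=float)
    G = np.eye(len(x)) if G is None else G
    lam = 0.02 * np.max(np.abs(hess(x)))    # slide's heuristic starting value
    for _ in range(nit):
        g = grad(x)
        H = hess(x)
        x = x - alpha * np.linalg.solve(H + lam * G, g)
        lam *= decay                        # damping decreases with iteration number
    return x

# Illustrative ill-conditioned quadratic: f(x) = 1/2 x^T A x - b^T x
A = np.array([[1.0, 0.0], [0.0, 1e-4]])
b = np.array([1.0, 1.0])
print(regularized_newton(lambda x: A @ x - b, lambda x: A, np.zeros(2)))
```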

Choosing λ

With f(x) ~ f(x0) + dx^T g + (1/2) dx^T H dx + (λ/2) dx^T G dx and the Newton update x^(k+1) = x^(k) - α [H + λG]^(-1) g^(k), the choice of λ spans a family of methods: as λ decreases, the update moves from steepest descent through Levenberg-Marquardt (G = I) to Newton's method.

[Figure: contours of f(x) over (x1, x2); λ starts near 0.02 max(H_ij) and decreases with iteration number.]

Regularized SD

With the same quadratic model and update x^(k+1) = x^(k) - α [H + λG]^(-1) g^(k): if H_ij = H_ii δ_ij (diagonal Hessian), the inverse is trivial and the method reduces to regularized SD, applied element by element:

x_i^(k+1) = x_i^(k) - α g_i^(k) / [H + λG]_ii

[Figure: contours of f(x) over (x1, x2); λ vs. iteration number.]
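A one-function sketch of that element-by-element step; the diagonals, λ, and α are assumed given:

```python
import numpy as np

def diag_reg_sd_step(x, g, diag_H, diag_G, lam, alpha=1.0):
    """x_i <- x_i - alpha * g_i / (H_ii + lam * G_ii): no matrix solve needed."""
    return x - alpha * g / (diag_H + lam * diag_G)
```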

Outline
- Regularized Newton Method
- Trust Region Method for Line Search
- Solving Linear Systems of Equations
- Traveltime Tomography
- Conjugate Gradient Method

Solving Square Linear Systems by SD

Given: a square SPD matrix H s.t. Hx = g.
Find: x by S.D.
Soln: Let f(x) = -x^T g + (1/2) x^T H x.
Step 1: Set ∇f(x) = 0, which recovers Hx = g, so the minimizer of f solves the system.
Step 2: Iterative steepest descent: x^(k+1) = x^(k) - α [H x^(k) - g]
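A minimal sketch of that iteration; the slides leave α unspecified, so the exact-line-search value α = r^T r / (r^T H r), standard for quadratics, is assumed here:

```python
import numpy as np

def sd_spd(H, g, nit=100):
    """Steepest descent for H x = g with SPD H: x <- x - alpha * (H x - g)."""
    x = np.zeros_like(g, dtype=float)
    for _ in range(nit):
        r = H @ x - g                       # gradient of f(x) = -x'g + 1/2 x'Hx
        rr = r @ r
        if rr == 0.0:
            break                           # exact solution reached
        alpha = rr / (r @ (H @ r))          # exact line search for a quadratic
        x = x - alpha * r
    return x

H = np.array([[4.0, 1.0], [1.0, 3.0]])      # small SPD example
g = np.array([1.0, 2.0])
print(sd_spd(H, g), np.linalg.solve(H, g))  # the two should agree
```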

Outline
- Regularized Newton Method
- Trust Region Method for Line Search
- Solving Linear Systems of Equations (Rectangular & Regularization)
- Traveltime Tomography
- Conjugate Gradient Method

Solving Rectangular Linear Systems by SD

Given: a rectangular matrix H s.t. Hx = g.
Find: x by S.D.
The previous strategy, f(x) = -x^T g + (1/2) x^T H x, won't work when H is rectangular. Instead let f(x) = (1/2) (Hx - g)^T (Hx - g).
Step 1: Set ∇f(x) = 0, giving the normal equations H^T H x = H^T g, where H^T H is square and SPD.
Step 2: Iterative steepest descent: x^(k+1) = x^(k) - α H^T (H x^(k) - g), where H x^(k) - g is the residual.
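A sketch of the rectangular case; the fixed step α = 1/||H||², which keeps the iteration stable, is an assumption:

```python
import numpy as np

def sd_rect(H, g, nit=500):
    """Least-squares SD: x <- x - alpha * H^T (H x - g)."""
    x = np.zeros(H.shape[1])
    alpha = 1.0 / np.linalg.norm(H, 2) ** 2    # assumed safe fixed step
    for _ in range(nit):
        r = H @ x - g                          # residual
        x = x - alpha * (H.T @ r)              # adjoint applied to the residual
    return x

H = np.array([[1.0, 1.0], [4.0, 5.0], [1.0, 0.0]])   # 3x2 rectangular example
g = np.array([2.0, 9.0, 1.0])
print(sd_rect(H, g), np.linalg.lstsq(H, g, rcond=None)[0])
```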

Solving Rectangular Linear Systems by Regularized SD

Given: a rectangular matrix H s.t. Hx = g.
Find: x by Regularized S.D.
Soln: Let f(x) = (1/2) (Hx - g)^T (Hx - g) + (λ/2) x^T G x.
Step 1: Set ∇f(x) = 0.
Step 2: Iterative steepest descent: x^(k+1) = x^(k) - α [H^T (H x^(k) - g) + λ G x^(k)].
Here H x^(k) - g is the residual (the difference between predicted and observed data), and applying H^T to it is the adjoint; in seismic terms, the gradient is a migration of the residual.
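The regularized variant only adds the λGx term to the gradient; a one-step sketch, with G = I by default and λ, α assumed:

```python
import numpy as np

def reg_sd_step(H, g, x, lam=0.1, alpha=0.01, G=None):
    """One step of x <- x - alpha * [H^T (H x - g) + lam * G x]."""
    G = np.eye(len(x)) if G is None else G
    grad = H.T @ (H @ x - g) + lam * (G @ x)   # migration of residual + penalty
    return x - alpha * grad
```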

Solving Rectangular Linear Systems by Regularized SD

Apply x^(k+1) = x^(k) - α [H^T (H x^(k) - g) + λ G x^(k)] to a small example:

[ 1  1 ] [ x1 ]   [ 2 ]
[ 4  5 ] [ x2 ] = [ · ]
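Running the step above on this matrix, reusing reg_sd_step from the previous sketch; the second data entry is not legible in the transcript, so g = [2, 9] is an assumption for illustration:

```python
import numpy as np

H = np.array([[1.0, 1.0], [4.0, 5.0]])   # matrix from the slide
g = np.array([2.0, 9.0])                 # second entry assumed for illustration
x = np.zeros(2)
for _ in range(2000):
    x = reg_sd_step(H, g, x, lam=1e-3, alpha=0.01)
print(x)  # converges slowly: H^T H is ill-conditioned (motivates scaling, below)
```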

Outline
- Regularized Newton Method
- Trust Region Method for Line Search
- Solving Linear Systems of Equations (Rectangular, Regularization & Scaling)
- Traveltime Tomography
- Conjugate Gradient Method

Solving Rectangular Linear Systems by Regularized SD with Scaling

Given: an ill-conditioned rectangular matrix H s.t. Hx = g.
Let C H^T H x = C H^T g, where C approximates the inverse of H^T H.
Soln: x^(k+1) = x^(k) - α [C H^T (H x^(k) - g) + λ G x^(k)], applied, for example, to the same small system as above.

Solving Rectangular Linear Systems by Regularized SD with Scaling: MATLAB Code

x^(k+1) = x^(k) - α [C H^T (H x^(k) - g) + λ G x^(k)]

[Figure: contours of f(x) over (x1, x2); the slide's MATLAB listing did not survive transcription.]
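Since the MATLAB listing is lost, here is a minimal Python sketch of the scaled update; the diagonal preconditioner C ~ diag(H^T H)^(-1) and all parameter values are assumptions:

```python
import numpy as np

def scaled_reg_sd(H, g, lam=1e-3, alpha=0.5, nit=200):
    """Preconditioned regularized SD: x <- x - alpha*[C H^T (H x - g) + lam*x]."""
    x = np.zeros(H.shape[1])
    C = 1.0 / np.diag(H.T @ H)               # diagonal approximate inverse of H^T H
    for _ in range(nit):                     # more nit may be needed if H is bad
        grad = H.T @ (H @ x - g) + lam * x   # G = I assumed
        x = x - alpha * C * grad             # scaling compensates ill-conditioning
    return x

H = np.array([[1.0, 1.0], [4.0, 5.0]])
g = np.array([2.0, 9.0])                     # same assumed data as before
print(scaled_reg_sd(H, g))
```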

Outline
- Regularized Newton Method
- Trust Region Method for Line Search
- Solving Linear Systems of Equations
- Traveltime Tomography
- Conjugate Gradient Method

Ray Based Tomography

1. Modeling: t = Ls, i.e., t_i = Σ_j L_ij s_j, where L_ij is the length of the ith ray in the jth cell.
Problem: L is a function of s, so this is a non-linear set of equations!
2. Linearize: subtract t' = L's' from t = Ls: t - t' = Ls - L's' ~ L(s - s'), i.e., dt = L ds.
3. Find the s that minimizes ε = ||t - Ls||² + λ penalty.
4. Solve: ds = [L^T L]^(-1) L^T dt.
5. Iterate: s^(k+1) = s^(k) - [L^T L]^(-1) L^T dt ~ s^(k) - α L^T dt (steepest descent, with step length α).
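A sketch of one such iteration with a dense, frozen L; a real implementation would retrace rays to update L, and the sign convention dt = predicted minus observed is an assumption:

```python
import numpy as np

def tomo_sd_step(L, t_obs, s, alpha=0.1):
    """One SD update s <- s - alpha * L^T dt for linearized traveltime tomography."""
    dt = L @ s - t_obs             # traveltime residual: predicted minus observed
    return s - alpha * (L.T @ dt)  # back-project (smear) residuals along the rays
```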

Ray Based Tomography

t_i = Σ_j L_ij s_j; the jth component of the update is ds_j ~ Σ_i L_ij dt_i, smearing the residuals of the rays that visit the jth cell.

4. Iterate: s^(k+1) = s^(k) - [L^T L]^(-1) L^T dt ~ s^(k) - α L^T dt.

Note: We never store the matrix. We simply compute one row of segment lengths at a time (i.e., trace a ray), take the dot product of that ith row with the slowness vector to get the predicted time, and scale the row by the ith residual dt_i to accumulate its contribution to ds. The cost of each iteration is that of a matrix-vector multiply, O(N²), rather than the O(N³) of a direct solve.
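A matrix-free sketch of that note; trace_ray is a hypothetical routine returning the cell indices and segment lengths of one ray (one row of L):

```python
import numpy as np

def sd_update_matrix_free(trace_ray, n_rays, t_obs, s, alpha=0.1):
    """s <- s - alpha * L^T dt without ever storing L: one ray (row) at a time."""
    ds = np.zeros_like(s)
    for i in range(n_rays):
        cells, lengths = trace_ray(i, s)       # hypothetical ray tracer: row i of L
        dt_i = lengths @ s[cells] - t_obs[i]   # predicted minus observed time
        np.add.at(ds, cells, lengths * dt_i)   # smear residual into visited cells
    return s - alpha * ds
```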

Ray Based Tomography

ds_j ~ Σ_i L_ij dt_i: smearing the residuals of the rays that visit the jth cell. Because L^T L is diagonally dominant, a cheap approximate inverse of it serves as a preconditioner, so the update smears weighted residuals into the jth cell: regularized steepest descent with preconditioning. This iterative regularized SD solution needs only small memory and no matrix inverse.

Multiscale Traveltime Inversion

Invert coarse to fine: a coarse-grain model (schedule 1), then an intermediate-grain model (schedule 2), then a fine-grain model (schedule 3). At each stage keep the number of picks above the number of unknowns (e.g., M = 3N > N) and the grid spacing dx < λ/4.

[Figure: Line 1 intermediate models 1 & 2 and final model for schedules 1-3.]

Best Resolved Features are Perpendicular to the Ray

Note: The anomaly can be moved laterally between the wells along the ray and still explain the data, but it is restricted vertically if it is to explain the data.

[Figure: "Where is the anomaly?" crosswell rays, with traveltime curves.]

Transmission Fresnel Zone

[Figure: Fresnel volume for a source-receiver path of length L; scatterers whose detour time is within half a period (T/2) lie inside the zone.]

Best Resolved Features are Perpendicular to the Ray

[Figure: diffraction ray and Snell ray bounding a wavepath, with traveltime curves.]

Best Resolved Features are Perpendicular to the Ray

[Figure: wavepath with traveltime curves.]

Summary

Modeling: t = Ls, i.e., t_i = Σ_j L_ij s_j. Note: we sum over the model-space variable j, a weighted sum of slownesses along the ith ray.

Adjoint modeling: s = L^T t, i.e., ds_j ~ Σ_i L_ij dt_i. Note: we sum over the data-space variable i, a weighted sum of residuals over the rays that visit the jth cell.
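A sketch making the symmetry explicit: the same L_ij weights appear in both operations, and only the summation index changes (dense L for clarity):

```python
import numpy as np

def model(L, s):
    """Forward modeling t_i = sum_j L_ij s_j: sum over model space j."""
    return L @ s

def adjoint(L, dt):
    """Adjoint modeling ds_j = sum_i L_ij dt_i: sum over data space i."""
    return L.T @ dt
```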

Outline
- Regularized Newton Method
- Trust Region Method for Line Search
- Solving Linear Systems of Equations
- Traveltime Tomography
- Conjugate Gradient Method

Conjugate Gradient g’ = f(x*)=0 ? D dx Quasi-Newton Condition: g’ – g = Hdx’ (1) dx’ g’ = f(x*)=0 ? D x* g dx’ dx’ dx’ Kiss point dx For dx’ at the bullseye x*, g’=0 so eqn. 1 becomes, after multiplying by dx Conjugacy Condition: 0 = dxHdx’ (2) x = x – a p (where p is conjugate to previous direction) (3)

Conjugate Gradient

Quasi-Newton condition: g' - g = H dx'.   (1)
Conjugacy condition: 0 = dx^T H dx'.   (2)
Update: x^(k+1) = x^(k) - α p, where p is conjugate to the previous direction and a linear combination of dx and g.   (3)

for k = 1:nit
    solve 0 = dx^T H (dx + β g) for β
    p = dx + β g
    find α by line search
    dx' = α p
    x = x + dx'
    dx = dx'
end

[Figure: contours with dx, dx', and the kiss point x*.]
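A minimal sketch of that loop for the quadratic f(x) = (1/2) x^T H x - b^T x with SPD H; β comes directly from the conjugacy condition and α from an exact line search, closed forms that hold only for quadratics, and the plain SD first step is an assumption the slide leaves implicit:

```python
import numpy as np

def cg_slide_style(H, b, x, nit=10):
    """CG following the slide's loop: p = dx + beta*g, conjugate to the last step."""
    g = H @ x - b
    dx = -((g @ g) / (g @ (H @ g))) * g            # first step: exact-line-search SD
    x = x + dx
    for _ in range(nit):
        g = H @ x - b                              # gradient at the new point
        if np.linalg.norm(g) < 1e-12:
            break                                  # converged
        beta = -(dx @ (H @ dx)) / (dx @ (H @ g))   # from 0 = dx^T H (dx + beta*g)
        p = dx + beta * g                          # conjugate to previous step dx
        alpha = -(g @ p) / (p @ (H @ p))           # exact line search along p
        dx = alpha * p                             # dx' = alpha * p
        x = x + dx                                 # x = x + dx'
    return x

H = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(cg_slide_style(H, b, np.zeros(2)), np.linalg.solve(H, b))  # should agree
```

For a 2x2 SPD system this reaches the solution in two steps, as conjugacy predicts.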