Understanding the 3DVAR in a Practical Way

Lecture One: Understanding the 3DVAR in a Practical Way
Jidong Gao (jdgao@ou.edu)
Center for Analysis and Prediction of Storms, University of Oklahoma

A simple example
A variational data assimilation algorithm can involve variational calculus, inverse problem theory, estimation theory, optimal control theory, and various computational techniques. Can we present a relatively simple but comprehensive outline of variational data assimilation without getting bogged down in all those complicated theories? Let's begin with a very simple example that involves all the major aspects of variational data assimilation.

Consider the temperature in our classroom
Consider the temperature in our classroom. The room is air-conditioned. From the room's characteristics, the power of the AC, and so on, we estimate (forecast) that the room temperature should be Tb = 18°C. We call this the background. The estimate cannot be perfect, since there are various random disturbances: the door can be opened at random times, the air conditioner does not work exactly to spec, and so on. Let the standard deviation of the background error be σb = 1°C. On the other hand, suppose there is a thermometer installed in this room whose reading is To = 20°C. We call this the observation. The reading is not perfect either; let the standard deviation of the measurement error be σo = 0.5°C. The question is: what is the best estimate of the room temperature?

The simple way is to take a weighted average,

Ta = (σo² Tb + σb² To) / (σb² + σo²) = (0.25 × 18 + 1 × 20) / 1.25 = 19.6°C.

We know that these weights are the optimal weights given by minimum variance estimation theory, as you have learned from Drs. Carr and Xue's lectures.

Based on Bayesian estimation, or maximum likelihood estimation theory, we can show that the best estimate is obtained by minimizing the following cost function,

J(T) = (T − Tb)² / (2σb²) + (T − To)² / (2σo²).

The minimizer of J is the solution of the equation

dJ/dT = (T − Tb)/σb² + (T − To)/σo² = 0,

which gives Ta = (σo² Tb + σb² To) / (σb² + σo²). It is easy to verify that this is the maximum likelihood estimate, and the answer is the same as the weighted average derived above.

Posterior error
You may ask how good this estimate is. This is actually a crucial question: in the world of data assimilation, estimating the accuracy of the result is just as important as the result itself! Define the error variance of the estimate as σa² = E[(Ta − Tt)²], where Tt is the true temperature. Some algebraic manipulation yields

1/σa² = 1/σb² + 1/σo²,  so  σa² = σb² σo² / (σb² + σo²) = 0.2  and  σa ≈ 0.45°C.

Obviously σa < σb and σa < σo: the analysis is more accurate than either the background or the observation alone. Why is the analysis closer to the observation? Because the observation error (0.5°C) is smaller than the background error (1°C), so the observation gets the larger weight. (Background: 18°C, observation: 20°C, analysis: 19.6°C.)
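Here is a minimal sketch of this example in Fortran (mine, not from the original slides; the program and variable names are made up). It computes the weighted-average analysis, recovers the same value by descending the gradient of J(T), and evaluates the posterior error standard deviation:

! Illustrative only: the one-variable room-temperature analysis.
program room_temperature
  implicit none
  real :: tb, tobs, sb2, so2, ta, t, djdt, sa
  integer :: it

  tb   = 18.0        ! background temperature (deg C)
  tobs = 20.0        ! observed temperature (deg C)
  sb2  = 1.0**2      ! background error variance
  so2  = 0.5**2      ! observation error variance

  ! Minimum variance (optimal) weighted average
  ta = (so2*tb + sb2*tobs) / (sb2 + so2)
  print *, 'weighted average analysis =', ta        ! 19.6

  ! Same answer from minimizing J(T) by simple gradient descent
  t = tb                                            ! first guess
  do it = 1, 50
     djdt = (t - tb)/sb2 + (t - tobs)/so2           ! gradient of J
     t = t - 0.1*djdt                               ! small fixed step
  end do
  print *, 'variational analysis      =', t         ! ~19.6

  ! Posterior (analysis) error standard deviation
  sa = sqrt(sb2*so2 / (sb2 + so2))
  print *, 'analysis error std dev    =', sa        ! ~0.45
end program room_temperature

Running it gives 19.6°C for both estimates and about 0.45°C for the analysis error, matching the numbers above.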

Comments: This simple example shows how to solve the data assimilation problem the variational way: we minimize the cost function J. To do this, we calculate the gradient of the cost function with respect to the analysis variable T, dJ/dT, and set it to zero. For real 3DVAR/4DVAR, the scalar T is replaced by a vector of model variables and the two scalar error variances are replaced by error covariance matrices, but the procedure is quite similar.

The 3DVAR Formulation
A general cost function is defined as

J(x) = ½ (x − xb)ᵀ B⁻¹ (x − xb) + ½ (H(x) − yo)ᵀ R⁻¹ (H(x) − yo) + Jc.

Goal: find the analysis state x that minimizes J. Here x is the NWP model state; xb is the background state; B is the background error covariance matrix; R is the observation error covariance matrix; yo is the observation vector; y = H(x) is the observation operator that maps the model state x to the observed variables (for 4DVAR, H includes the full prediction model); and Jc represents the dynamic constraints.
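To make the notation concrete, here is a toy sketch (mine, not from the lecture; the three-point state, two observations, and all values are made up) that evaluates J and its gradient, ∇J = B⁻¹(x − xb) + Hᵀ R⁻¹ (H x − yo), for a linear H with diagonal B and R:

! Toy 3DVAR cost function and gradient for a linear H and diagonal B, R.
! Illustrative only: n = 3 grid points, m = 2 observations, made-up values.
program toy_3dvar
  implicit none
  integer, parameter :: n = 3, m = 2
  real :: x(n), xb(n), bdiag(n)      ! state, background, diagonal of B
  real :: yo(m), rdiag(m), h(m,n)    ! obs, diagonal of R, observation operator
  real :: d(m), j, grad(n)

  xb    = (/ 18.0, 18.5, 19.0 /)
  bdiag = (/ 1.0,  1.0,  1.0  /)
  yo    = (/ 20.0, 19.5 /)
  rdiag = (/ 0.25, 0.25 /)
  h     = 0.0
  h(1,1) = 1.0                       ! obs 1 measures grid point 1
  h(2,3) = 1.0                       ! obs 2 measures grid point 3
  x = xb                             ! evaluate J at the background

  d    = matmul(h, x) - yo                                     ! H x - yo
  j    = 0.5*sum((x - xb)**2 / bdiag) + 0.5*sum(d**2 / rdiag)  ! Jb + Jo
  grad = (x - xb)/bdiag + matmul(transpose(h), d/rdiag)        ! grad of J

  print *, 'J       =', j
  print *, 'grad(J) =', grad
end program toy_3dvar

For a full NWP state this is of course never done with explicit matrices; the point is only to show what the two terms of J and its gradient look like in code.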

What should we know before we can solve the 3DVAR problem?
The cost function J measures the fit of the analysis x to the background xb and the observations yo, while the dynamic constraints (Jc) are also satisfied in some way. What really is J: a vector, a single number, or a matrix? It is a scalar. What should we know before the minimization?
B: unknown, but can be estimated. It is specified a priori, which is why 3DVAR is also called an a priori estimate. B is vitally important: it decides how observations spread to nearby grid points. However, B is also the most difficult piece to obtain. Its dimension is huge (on the order of 10^10 to 10^14 elements), so its inverse cannot be computed directly. Simplification is necessary, and this has been a very active research area in data assimilation over the past 30 years. In the data assimilation community there are two basic approaches: (1) assume B is diagonal; this can be done only in spectral space (Parrish and Derber 1992), and the approximation is not acceptable for grid-point models; (2) model B with a parameterized formulation; this reduces the dimension, and the inversion of B can be avoided through judicious choices of control variables (Huang 2000; Purser et al. 2003a, b).
R: the observation error covariance matrix, which also includes representativeness error. It is usually diagonal and can be determined "off-line" for each type of observation used.
xb: the background state, usually taken from a previous forecast.
yo: the observations. Every new type of observation may have a positive or negative impact on the whole 3DVAR system. This is an active research area: OU works on radar data, Wisconsin on satellite data.
Jc: one or more equation constraints; also a good research topic.
y = H(x): the forward observation operator (including the interpolation operator). There is also a lot of research in this area.
With all of the above taken care of and coded, we can begin to think about the minimization.
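To illustrate the statement that B decides how observations spread to nearby grid points, here is a small 1-D sketch (mine, not from the slides; grid size, correlation length, and all values are invented). It builds a Gaussian-correlation model of B and computes the analysis increment B Hᵀ (H B Hᵀ + R)⁻¹ (yo − H xb) for a single observation; the increment is a smooth bump centred on the observed point, with a width set by the assumed correlation length:

! How B spreads a single observation's influence: 1-D grid, Gaussian
! correlation model B(i,j) = sb**2 * exp(-(i-j)**2 / (2*el**2)).
! Illustrative values only; a real 3DVAR never forms B explicitly.
program spread_by_b
  implicit none
  integer, parameter :: n = 11, k = 6           ! grid size; observed point
  real, parameter :: sb = 1.0, so = 0.5, el = 2.0
  real :: b(n,n), innov, incr(n)
  integer :: i, j

  do i = 1, n
     do j = 1, n
        b(i,j) = sb**2 * exp(-real(i-j)**2 / (2.0*el**2))
     end do
  end do

  innov = 2.0                                   ! yo - H(xb), a 2-degree innovation
  ! For one obs at point k, H B H^T + R reduces to the scalar b(k,k) + so**2,
  ! and the increment at point i is b(i,k) * innov / (b(k,k) + so**2).
  do i = 1, n
     incr(i) = b(i,k) * innov / (b(k,k) + so**2)
  end do
  print *, 'analysis increment:', incr          ! largest at i = k, decaying away
end program spread_by_b

Doubling el makes the bump wider, which is exactly the sense in which B controls how far an observation's influence reaches.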

The procedure for minimizing J by iteration (flow chart):
1. Start from a first guess x = (u, v, p, t, ...).
2. Calculate the cost function J (a scalar).
3. Calculate the gradient of the cost function, dJ/dx.
4. Feed J and dJ/dx to the minimization algorithm, which finds a new control variable x = (u, v, p, t, ...).
5. Check the convergence criterion: if not converged, return to step 2 (iteration loop); if converged, output x = (u, v, p, t, ...) and pass it to the model forecast.

From the flow chart, there are three important tasks in the minimization procedure: (1) calculate the cost function; (2) calculate the gradient of the cost function; (3) select a minimization algorithm. The first task was already discussed; the second usually requires the adjoint technique; and the third is also crucial. Developing efficient minimization algorithms has been an active research topic in applied mathematics for the past 40 years; for us, we just need to pick a good one and know how to use it. You may find one in the book "Numerical Recipes", available online at www.nr.com.

A simple example of how to use a minimization algorithm
Suppose we want to solve the linear system

x + y + z = 6,  2x − y − z = 3,  x + 2y + z = 8.

To solve this in a variational way, first define a cost function; let it be

J(x, y, z) = ½ [ (x + y + z − 6)² + (2x − y − z − 3)² + (x + 2y + z − 8)² ].

Then we need the gradient of the cost function; it is a vector with 3 components,

∂J/∂x = 6x + y − 20,  ∂J/∂y = x + 6y + 4z − 19,  ∂J/∂z = 4y + 3z − 11.

In Fortran, the cost function and gradient routines are:

Subroutine FCN(N, X, F)            ! evaluates the cost function J
  integer :: N
  real :: X(N), F
  F = 0.5*( (x(1)+x(2)+x(3)-6)**2 + (2*x(1)-x(2)-x(3)-3)**2 &
          + (x(1)+2*x(2)+x(3)-8)**2 )   ! the 1/2 keeps F consistent with GRAD
  return; end

Subroutine GRAD(N, X, G)           ! evaluates the gradient dJ/dx
  integer :: N
  real :: X(N), G(N)
  G(1) = 6*x(1) +   x(2)          - 20
  G(2) =   x(1) + 6*x(2) + 4*x(3) - 19
  G(3) =          4*x(2) + 3*x(3) - 11
  return; end
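A practical habit worth adding here (my suggestion, not part of the lecture): before handing FCN and GRAD to any minimizer, check the coded gradient against finite differences of the cost function. A minimal check for the routines above could be:

! Finite-difference check of GRAD against FCN (illustrative test program).
program check_gradient
  implicit none
  integer, parameter :: n = 3
  real :: x(n), g(n), xp(n), fp, fm, eps
  integer :: i
  external :: fcn, grad

  x   = (/ 1.0, 1.0, 1.0 /)        ! any test point
  eps = 1.0e-3
  call grad(n, x, g)
  do i = 1, n
     xp = x; xp(i) = x(i) + eps; call fcn(n, xp, fp)
     xp = x; xp(i) = x(i) - eps; call fcn(n, xp, fm)
     print *, i, ' analytic:', g(i), ' finite diff:', (fp - fm)/(2.0*eps)
  end do
end program check_gradient

Since J is quadratic, the central differences should agree with GRAD to roundoff; a mismatch almost always means a coding error in GRAD.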

Program main
  Integer :: n
  parameter (n=3)
  Integer :: i, maxfn
  Real :: Fvalue, X(n), G(n), Xguess(n)
  Real :: dfred, gradtl
  External :: fcn, grad
  Do i = 1, n; Xguess(i) = 0.0; end do   ! provide the first guess
  dfred  = 0.002       ! accuracy criterion for the cost function
  gradtl = 1.0E-7      ! accuracy criterion for the norm of the gradient
  maxfn  = 50
  Call umcgg(fcn, grad, n, xguess, gradtl, maxfn, dfred, x, g, fvalue)   ! minimization algorithm
  print *, 'x=', x(1), x(2), x(3)
end

The minimization algorithm, here IMSL's umcgg (one of the conjugate gradient methods), requires you to provide the subroutines FCN and GRAD for calculating J and its gradient. We quickly get the answer (x, y, z) = (3, 2, 1) after only 2 or 3 iterations; because the problem is simple, few iterations are needed.
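If the IMSL routine umcgg is not available, the same FCN and GRAD can still be minimized with a much cruder driver. The sketch below (mine, not from the lecture) uses plain steepest descent with a small fixed step; it needs a few hundred iterations instead of two or three, which is exactly why a good conjugate-gradient or quasi-Newton library routine is worth using:

! Minimal driver that minimizes the same cost function without IMSL:
! steepest descent with a small fixed step (illustrative only).
program main_no_imsl
  implicit none
  integer, parameter :: n = 3
  integer :: iter
  real :: x(n), g(n), f
  external :: fcn, grad

  x = 0.0                            ! first guess
  do iter = 1, 1000
     call grad(n, x, g)
     if (sqrt(sum(g**2)) < 1.0e-5) exit
     x = x - 0.2*g                   ! step along the negative gradient
  end do
  call fcn(n, x, f)
  print *, 'x =', x, ' J =', f       ! approaches (3, 2, 1) with J near 0
end program main_no_imsl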

Comments: All variational data assimilation algorithms work in a similar way: you define a cost function, derive its gradient, and feed both into a minimization algorithm along with a first guess of the solution. But large, real problems are not that easy to implement. One outstanding problem is how to calculate the gradient of the cost function efficiently. This brings in the adjoint technique, which allows us to efficiently apply the transposes of the large matrices that appear in the gradient calculation. What is the adjoint, and how do we use it?

The adjoint is only a mathematical tool that helps you obtain the gradient of the cost function. In R. M. Errico's paper "What is an adjoint model?", it is said that "…the adjoint is used as a tool for efficiently determining the optimal solutions. Without this tool, the optimization problem (including minimization and maximization) could not be solved in a reasonable time for application to real-time forecasting". This is a good statement. But what does it mean exactly?

A simple maximization example
Suppose we have 200 m of fence and want to use it to enclose a rectangular yard with the maximum possible area. How do we do it? Let x and y be the length and width of the yard. We can then define a cost function that combines the area with the fence-length constraint through a multiplier, and take the gradient of that cost function (see the reconstruction below). The 'multiplier' is equivalent to the 'adjoint variable'; the role of this parameter is to help calculate the gradient of the cost function!
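The slide's equations are not reproduced in this transcript; the standard Lagrange-multiplier form of the problem (my reconstruction, so the exact notation may differ from the original) is

J(x, y, λ) = x y + λ (2x + 2y − 200),

∇J = ( ∂J/∂x, ∂J/∂y, ∂J/∂λ ) = ( y + 2λ, x + 2λ, 2x + 2y − 200 ).

Setting the gradient to zero gives x = y = −2λ together with 2x + 2y = 200, so x = y = 50 m, λ = −25, and the maximum area is 2500 m². The multiplier λ plays exactly the role of the adjoint variable: it is the auxiliary quantity introduced so that the gradient of the constrained problem can be written down and computed.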