What is Data Assimilation? A Tutorial Andrew S. Jones Lots of help also from: Steven Fletcher, Laura Fowler, Tarendra Lakhankar, Scott Longmore, Manajit Sengupta, Tom Vonder Haar, Dusanka Zupanski, and Milija Zupanski
Data Assimilation Outline Why Do Data Assimilation? Who and What Important Concepts Definitions Brief History Common System Issues / Challenges
The Purpose of Data Assimilation Why do data assimilation?
The Purpose of Data Assimilation Why do data assimilation? (Answer: Common Sense)
The Purpose of Data Assimilation Why do data assimilation? (Answer: Common Sense) MYTH: “It’s just an engineering tool”
The Purpose of Data Assimilation Why do data assimilation? (Answer: Common Sense) MYTH: “It’s just an engineering tool” If Truth matters, “It’s our most important science tool”
The Purpose of Data Assimilation Why do data assimilation? I want better model initial conditions for better model forecasts
The Purpose of Data Assimilation Why do data assimilation? I want better model initial conditions for better model forecasts I want better calibration and validation (cal/val)
The Purpose of Data Assimilation Why do data assimilation? I want better model initial conditions for better model forecasts I want better calibration and validation (cal/val) I want better acquisition guidance
The Purpose of Data Assimilation Why do data assimilation? I want better model initial conditions for better model forecasts I want better calibration and validation (cal/val) I want better acquisition guidance I want better scientific understanding of Model errors (and their probability distributions) Data errors (and their probability distributions) Combined Model/Data correlations DA methodologies (minimization, computational optimizations, representation methods, various method approximations) Physical process interactions
The Purpose of Data Assimilation Why do data assimilation? I want better model initial conditions for better model forecasts I want better calibration and validation (cal/val) I want better acquisition guidance I want better scientific understanding of Model errors (and their probability distributions) Data errors (and their probability distributions) Combined Model/Data correlations DA methodologies (minimization, computational optimizations, representation methods, various method approximations) Physical process interactions (i.e., sensitivities and feedbacks) Leads toward better future models
The Purpose of Data Assimilation Why do data assimilation? I want better model initial conditions for better model forecasts I want better calibration and validation (cal/val) I want better acquisition guidance I want better scientific understanding of Model errors (and their probability distributions) Data errors (and their probability distributions) Combined Model/Data correlations DA methodologies (minimization, computational optimizations, representation methods, various method approximations) Physical process interactions (i.e., sensitivities and feedbacks) Leads toward better future models VIRTUOUS CYCLE
The Data Assimilation Community Who is involved in data assimilation? NWP Data Assimilation Experts NWP Modelers Application and Observation Specialists Cloud Physicists / PBL Experts / NWP Parameterization Specialists Physical Scientists (Physical Algorithm Specialists) Radiative Transfer Specialists Applied Mathematicians / Control Theory Experts Computer Scientists Science Program Management (NWP and Science Disciplines) Forecasters Users and Customers
The Data Assimilation Community What skills are needed by each involved group? NWP Data Assimilation Experts (DA system methodology) NWP Modelers (Model + Physics + DA system) Application and Observation Specialists (Instrument capabilities) Physical Scientists (Instrument + Physics + DA system) Radiative Transfer Specialists (Instrument config. specifications) Applied Mathematicians (Control theory methodology) Computer Scientists (DA system + OPS time requirements) Science Program Management (Everything + $$ + Good People) Forecasters (Everything + OPS time reqs. + Easy/fast access) Users and Customers (Could be a wide variety of responses) e.g., NWS / Army / USAF / Navy / NASA / NSF / DOE / ECMWF
The Data Assimilation Community Are you part of this community?
The Data Assimilation Community Are you part of this community? Yes, you just may not know it yet.
The Data Assimilation Community Are you part of this community? Yes, you just may not know it yet. Who knows all about data assimilation?
The Data Assimilation Community Are you part of this community? Yes, you just may not know it yet. Who knows all about data assimilation? No one knows it all, it takes many experts
The Data Assimilation Community Are you part of this community? Yes, you just may not know it yet. Who knows all about data assimilation? No one knows it all, it takes many experts How large are these systems?
The Data Assimilation Community Are you part of this community? Yes, you just may not know it yet. Who knows all about data assimilation? No one knows it all, it takes many experts How large are these systems? Typically, the DA systems are “medium”-sized projects using software industry standards Medium = multi-year coding effort by several individuals (e.g., RAMDAS is ~230K lines of code, ~3500 pages of code) Satellite “processing systems” tend to be larger still Our CIRA Mesoscale 4DVAR system was built over ~7-8 years with heritage from the ETA 4DVAR system
The Building Blocks of Data Assimilation Control Variables are the initial model state variables that are optimized using the new data information as a guide NWP Model Observation Model Minimization Observations They can also include boundary condition information, model parameters for “tuning”, etc. NWP Adjoint Observation Model Adjoint
What Are We Minimizing? Minimize discrepancy between model and observation data over time The Cost Function, J, is the link between the observational data and the model variables Observations are either assumed unbiased, or are “debiased” by some adjustment method
Bayes Theorem P (x | y) ~ P (y | x) P (x) Maximum Conditional Probability is given by: P (x | y) ~ P (y | x) P (x) Assuming Gaussian distributions… P (y | x) ~ exp {-1/2 [y – H (x)]T R-1 [y – H (x)]} P (x) ~ exp {-1/2 [x –xb]T B-1 [x – xb]} e.g., 3DVAR Lorenc (1986)
What Do We Trust for “Truth”? Minimize discrepancy between model and observation data over time Model Background or Observations?
What Do We Trust for “Truth”? Minimize discrepancy between model and observation data over time Model Background or Observations? Trust = Weightings Just like your financial credit score!
Who are the Candidates for “Truth”? Minimize discrepancy between model and observation data over time Candidate 1: Background Term “x0” is the model state vector at the initial time t0 this is also the “control variable”, the object of the minimization process “xb” is the model background state vector “B” is the background error covariance of the forecast and model errors
Who are the Candidates for “Truth”? Minimize discrepancy between model and observation data over time Candidate 2: Observational Term “y” is the observational vector, e.g., the satellite input data (typically radiances), salinity, sounding profiles “M0,i(x0)” is the model state at the observation time “i” “h” is the observational operator, for example the “forward radiative transfer model” “R” is the observational error covariance matrix that specifies the instrumental noise and data representation errors (currently assumed to be diagonal…)
What Do We Trust for “Truth”? Minimize discrepancy between model and observation data over time Candidate 1: Background Term The default condition for the assimilation when data are not available or the available data have no significant sensitivity to the model state or the available data are inaccurate
Model Error Impacts our “Trust” Minimize discrepancy between model and observation data over time Candidate 1: Background Term Model error issues are important Model error varies as a function of the model time Model error “grows” with time Therefore the background term should be trusted more at the initial stages of the model run and trusted less at the end of the model run
How to Adjust for Model Error? Minimize discrepancy between model and observation data over time Candidate 1: Background Term Add a model error term to the cost function so that the weight at that specific model step is appropriately weighted or Use other possible adjustments in the methodology, i.e., “make an assumption” about the model error impacts If model error adjustments or controls are used the DA system is said to be “weakly constrained”
What About Model Error Errors? Minimize discrepancy between model and observation data over time Candidate 1: Background Term Model error adjustments to the weighting can be “wrong” In particular, most assume some type of linearity Non-linear physical processes may break these assumptions and be more complexly interrelated A data assimilation system with no model error control is said to be “strongly constrained” (perfect model assumption)
What About other DA Errors? Overlooked Issues? Data debiasing relative to the DA system “reference”. It is not the “Truth”, however it is self-consistent. DA Methodology Errors? Assumptions: Linearization, Gaussianity, Model errors Representation errors (space and time) Poorly known background error covariances Imperfect observational operators Overly aggressive data “quality control” Historical emphasis on dynamical impact vs. physical Synoptic vs. Mesoscale?
DA Theory is Still Maturing The Future: Lognormal DA (Fletcher and Zupanski, 2006, 2007) Gaussian systems typically force lognormal variables to become Gaussian introducing an avoidable data assimilation system bias Lognormal Variables Clouds Precipitation Water vapor Emissivities Many other hydrologic fields Mode Mean Add DA Bias Here! Many important variables are lognormally distributed Gaussian data assimilation system variables are “Gaussian”
What Do We Trust for “Truth”? Minimize discrepancy between model and observation data over time Candidate 2: Observational Term The non-default condition for the assimilation when data are available and data are sensitive to the model state and data are precise (not necessarily “accurate”) and data are not thrown away by DA “quality control” methods
What “Truth” Do We Have? DATA MODEL CENTRIC CENTRIC TRUTH Minimize discrepancy between model and observation data over time DATA MODEL CENTRIC CENTRIC TRUTH
DA Theory is Still Maturing A Brief History of DA Hand Interpolation Local polynomial interpolation schemes (e.g., Cressman) Use of “first guess”, i.e., a background Use of an “analysis cycle” to regenerate a new first guess Empirical schemes, e.g., nudging Least squares methods Variational DA (VAR) Sequential DA (KF) Monte Carlo Approx. to Seq. DA (EnsKF)
Variational Techniques Major Flavors: 1DVAR (Z), 3DVAR (X,Y,Z), 4DVAR (X,Y,Z,T) Lorenc (1986) and others… Became the operational scheme in early 1990s to the present day Finds the maximum likelihood (if Gaussian, etc.) (actually it is a minimum variance method) Comes from setting the gradient of the cost function equal to zero Control variable is xa
Sequential Techniques Kalman (1960) and many others… These techniques can evolve the forecast error covariance fields similar in concept to OI B is no longer static, B => Pf = forecast error covariance Pa (ti) is estimated at future times using the model K = “Kalman Gain” (in blue boxes) Extended KF, Pa is found by linearizing the model about the nonlinear trajectory of the model between ti-1 and ti
Sequential Techniques Ensembles can be used in KF-based sequential DA systems Ensembles are used to estimate Pf through Gaussian “sampling” theory f is a particular forecast instance l is the reference state forecast Pf is estimated at future times using the model K number model runs are required (Q: How to populate the seed perturbations?) Sampling allows for use of approximate solutions Eliminates the need to linearize the model (as in Extended KF) No tangent linear or adjoint models are needed
Sequential Techniques Notes on EnsKF-based sequential DA systems EnsKFs are an approximation Underlying theory is the KF Assumes Gaussian distributions Many ensemble samples are required Can significantly improve Pf Where does H fit in? Is it fully “resolved”? What about the “Filter” aspects? Future Directions Research using Hybrid EnsKF-Var techniques
Sequential Techniques Zupanski (2005): Maximum Likelihood Ensemble Filter (MLEF) Structure function version of Ensemble-based DA (Note: Does not use sampling theory, and is more similar to a variational DA scheme using principle component analysis (PCA) NE is the number of ensembles S is the state-space dimension Each ensemble is carefully selected to represent the degrees of freedom of the system Square-root filter is built-in to the algorithm assumptions
Where is “M” in all of this? No M used 3DDA Techniques have no explicit model time tendency information, it is all done implicitly with cycling techniques, typically focusing only on the Pf term 4DDA uses M explicitly via the model sensitivities, L, and model adjoints, LT, as a function of time Kalman Smoothers (e.g., also 4DEnsKS) would likewise also need to estimate L and LT M used
4DVAR Revisited (for an example see Poster NPOESS P1.16 by Jones et al.) Automatically propagates the Pf within the cycle, however can not save the result for the next analysis cycle (memory of “B” info becomes lost in the next cycle) (Thepaut et al., 1993) LT is the adjoint which is integrated from ti to t0 Adjoints are NOT the “model running in reverse”, but merely the model sensitivities being integrated in reverse order, thus all adjoints appear to function backwards. Think of it as accumulating the “impacts” back toward the initial control variables.
Minimization Process J x TRUTH Minima is at J/ x = 0 Issues: Jacobian of the Cost Function is used in the minimization procedure Minima is at J/ x = 0 Issues: Is it a global minima? Are we converging rapid or slow? J TRUTH x
Ensembles: Flow-dependent forecast error covariance and spread of information from observations grid-point time t2 x From M. Zupanski t1 Isotropic correlations Outline slide obs2 obs1 Geographically distant observations can bring more information than close-by observations, if in a dynamically significant region t0
Preconditioning the Space From M. Zupanski Result: faster convergence “Preconditioners” transform the variable space so that fewer iterations are required while minimizing the cost function x ->
Incremental VAR Courtier et al. (1994) Most Common 4D framework in operational use Incremental form performs Linear minimization within a lower dimensional space (the inner loop minimization) Outer loop minimization is at the full model resolution (non-linear physics are added back in this stage) Benefits: Smoothes the cost function and assures better minimization behaviors Reduces the need for explicit preconditioning Issues: Extra linearizations occur It is an approximate form of VAR DA
Types of DA Solution Spaces Model Space (x) Physical Space (y) Ensemble Sub-space e.g., Maximum Likelihood Ensemble Filter (MLEF) Types of Ensemble Kalman Filters Perturbed observations (or stochastic) Square root filters (i.e., analysis perturbations are obtained from the Square root of the Kalman Filter analysis covariance)
How are Data used in Time? Observation model Cloud resolving model forecast time observations Assimilation time window
Assimilation time window A “Smoother” Uses All Data Available in the Assimilation Window (a “Simultaneous” Solution) Observation model Cloud resolving model forecast time observations Assimilation time window
Assimilation time window A “Filter” Sequentially Assimilates Data as it Becomes Available in each Cycle Observation model Cloud resolving model forecast time observations Assimilation time window
Assimilation time window A “Filter” Sequentially Assimilates Data as it Becomes Available in each Cycle Observation model Cloud resolving model Cycle Previous Information forecast time observations Assimilation time window
Assimilation time window A “Filter” Sequentially Assimilates Data as it Becomes Available in each Cycle Observation model Cloud resolving model Cycle Previous Information forecast time observations Assimilation time window
A “Filter” Sequentially Assimilates Data as it Becomes Available in each Cycle Observation model Cloud resolving model forecast time Cycle Physics “Barriers” What Can Overcome the Barrier? Linear Physics Processes and Propagated Forecast Error Covariances
Data Assimilation Conclusions Thanks! (jones@cira.colostate.edu) Broad, Dynamic, Evolving, Foundational Science Field! Flexible unified frameworks, standards, and funding will improve training and education Continued need for advanced DA systems for research purposes (non-OPS) Can share OPS framework components, e.g., JCSDA CRTM Data Assimilation For more information… Great NWP DA Review Paper (By Mike Navon) ECMWF DA training materials JCSDA DA workshop http://people.scs.fsu.edu/~navon/pubs/JCP1229.pdf http://www.ecmwf.int/newsevents/training/rcourse_notes/ http://www.weatherchaos.umd.edu/workshop/ Thanks! (jones@cira.colostate.edu)
Backup Slides Or: Why Observationalists and Modelers see things differently… No offense meant for either side
What Approach Should We Use? DATA MODEL CENTRIC CENTRIC TRUTH
What Approach Should We Use? DATA MODEL CENTRIC CENTRIC TRUTH
What Approach Should We Use? My Precious… We Trust the Model! Data hurts us!, Yes… DATA MODEL CENTRIC CENTRIC TRUTH
MODEL CENTRIC FOCUS FOCUS ON DATA MODEL CENTRIC CENTRIC “B” Background Error Improvements are Needed “xb” Associated background states and “Cycling” are more heavily emphasized DA method selection tends toward sequential estimators, “filters”, and improved representation of the forecast model error covariances DATA MODEL CENTRIC CENTRIC E.g., Ensemble Kalman Filters, other Ensemble Filter systems
What Approach Should We Use? This is not to say that all model-centric improvements are bad… DATA MODEL CENTRIC CENTRIC TRUTH
What Approach Should We Use? DATA MODEL CENTRIC CENTRIC TRUTH
What Approach Should We Use? My Precious… We Trust the Data! Models unfair and hurts us!, Yes… DATA MODEL CENTRIC CENTRIC TRUTH
DATA CENTRIC FOCUS FOCUS ON DATA DATA CENTRIC CENTRIC “h” Observational Operator Improvements are Needed “M0,i(x0)” Model state capabilities and independent experimental validation is more heavily emphasized DA method selection tends toward “smoothers” (less focus on model cycling), more emphasis on data quantity and improvements in the data operator and understanding of data representation errors DATA DATA CENTRIC CENTRIC e.g., 4DVAR systems
DUAL-CENTRIC FOCUS Best of both worlds? Solution: Ensemble based forecast covariance estimates combined with 4DVAR smoother for research and 4DVAR filter for operations? Several frameworks to combine the two approaches are in various stages of development now…
What Have We Learned? DATA MODEL CENTRIC CENTRIC TRUTH Your Research Objective is CRITICAL to making the right choices… Operational choices may supercede good research objectives Computational speed is always critical for operational purposes Accuracy is critical for research purposes DATA MODEL CENTRIC CENTRIC TRUTH
What About Model Error Errors? “I just can’t run like I used to.” A Strongly Constrained System? Can Data Over Constrain? Model “Little Data People”
What About Model Error Errors? “We’ll… no one’s perfect.” A Strongly Constrained System? Can Data Over Constrain? DA expert
Optimal Interpolation (OI) Eliassen (1954), Bengtsson et al. (1981), Gandin (1963) Became the operational scheme in early 1980s and early 1990s OI merely means finding the “optimal” Weights, W A better name would have been “statistical interpolation”
What is a Hessian? G(f)ij(x) = DiDj f(x) A Rank-2 Square Matrix Containing the Partial Derivatives of the Jacobian G(f)ij(x) = DiDj f(x) The Hessian is used in some minimization methods, e.g., quasi-Newton…
The Role of the Adjoint, etc. Adjoints are used in the cost function minimization procedure But first… Tangent Linear Models are used to approximate the non-linear model behaviors L x’ = [M(x1) – M(x2)] / L is the linear operator of the perturbation model M is the non-linear forward model is the perturbation scaling-factor x2 = x1 + x’
Useful Properties of the Adjoint <Lx’, Lx’> <LTLx’, x’> LT is the adjoint operator of the perturbation model Typically the adjoint and the tangent linear operator can be automatically created using automated compilers y = (x1, …, xn, y) *xi = *xi + *y /xi *y = *y /y where *xi and *y are the “adjoint” variables
Useful Properties of the Adjoint <Lx’, Lx’> <LTLx’, x’> LT is the adjoint operator of the perturbation model Typically the adjoint and the tangent linear operator can be automatically created using automated compilers Of course, automated methods fail for complex variable types (See Jones et al., 2004) E.g., how can the compiler know when the variable is complex, when codes are decomposed into real and imaginary parts as common practice? (It can’t.)