Ensemble Kalman Filter
Dusanka Zupanski, CIRA/Colorado State University (CIRA/CSU), Fort Collins, Colorado
Guest Lecture at AT 753: Atmospheric Water Cycle, 21 April 2006, CSU/ATS Dept., Fort Collins, CO
Acknowledgements: M. Zupanski, C. Kummerow, S. Denning, and M. Uliasz, CSU; A. Hou and S. Zhang, NASA/GMAO
OUTLINE
- Why Ensemble Data Assimilation?
- Kalman filter and Ensemble Kalman filter
- Maximum likelihood ensemble filter (MLEF)
- Examples of MLEF applications
- Future research directions
Why Ensemble Data Assimilation?
Three main reasons:
1. Need for an optimal estimate of the atmospheric state, together with a verifiable uncertainty of this estimate;
2. Need for a flow-dependent forecast error covariance matrix; and
3. The approach should be applicable to most complex atmospheric models (e.g., non-hydrostatic, cloud-resolving, LES).
Benefits of Flow-Dependent Background Errors
[Figure: Example 1: Fronts; Example 2: Hurricanes (from Whitaker et al., THORPEX web page).]
Are there alternatives?
Two good candidates:
- 4d-var method: it employs a flow-dependent forecast error covariance, but it does not propagate it in time (from one analysis cycle to the next).
- Kalman filter (KF): it does propagate the flow-dependent forecast error covariance in time, but it is too expensive for applications to complex atmospheric models.
EnKF is a practical alternative to the KF, applicable to most complex atmospheric models. A bonus benefit: EnKF does not use adjoint models!
Typical EnKF
[Diagram labels: ENSEMBLE FORECASTING; forecast error covariance P_f (ensemble subspace); DATA ASSIMILATION; Observations; First guess; optimal solution for model state x = (T, u, v, f, ...); analysis error covariance P_a (ensemble subspace). Additional labels: INFORMATION CONTENT ANALYSIS; T_b, u_b, v_b, f_b, ...; Hessian preconditioning; Non-Gaussian PDFs; Maximum Likelihood Ensemble Filter.]
Data Assimilation Equations
GOAL: Combine model and data to obtain an optimal estimate of the dynamical state x.
Equations in model space:
- x: model state vector of dim Nstate
- M: dynamical model for model state evolution (e.g., NWP model)
- w: model error vector of dim Nstate
- dynamical model for state-dependent model error
- time step index
- P_f: prior (forecast) error covariance of x (assumed known)
- model error covariance (assumed known)
- E: mathematical expectation
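Written out, a generic form consistent with the definitions above might look as follows. This is a sketch, not the slide's original notation: the symbols G (operator for state-dependent model error), Q (model error covariance), k (time step index), x^f (forecast) and x^t (true state) are illustrative assumptions, as is additive, zero-mean model error.

```latex
x_k = M_{k,k-1}(x_{k-1}) + G_{k,k-1}(x_{k-1})\, w_{k-1}
P_f = E\left[(x^f - x^t)(x^f - x^t)^T\right], \qquad Q = E\left[w_k\, w_k^T\right]
```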
Data Assimilation Equations
Equations in data space:
- y: observations vector of dim Nobs
- H: observation operator
- observation error
- R: observation error covariance, which also includes representativeness error (assumed known)
- time step index (denoting observation times)
Data assimilation should combine model and data in an optimal way. The optimal solution z can be defined in terms of the optimal initial conditions x_a (analysis), the model error w, and empirical parameters.
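In the same generic notation, the data-space relations can be sketched as below, assuming additive, zero-mean observation error ε and using k as the observation-time index (both illustrative choices):

```latex
y_k = H_k(x_k) + \varepsilon_k, \qquad R_k = E\left[\varepsilon_k\, \varepsilon_k^T\right]
```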
How can we obtain the optimal solution? Two approaches are used most often.
Approach 1: Optimal solution (e.g., analysis x_a) = minimum variance estimate, or conditional mean of the Bayesian posterior probability density function (PDF) p (e.g., Kalman filter; Extended Kalman filter; EnKF).
Kalman filter: assuming linear M and H and independent Gaussian PDFs, the Kalman filter solution (e.g., Jazwinski 1970) x_a is defined as the mathematical expectation (i.e., mean) of the conditional posterior p(x|y), given observations y and prior p(x): $x_a = E[x \mid y] = \int x\, p(x \mid y)\, dx$.
For non-linear M or H, the solution can be obtained employing the Extended Kalman filter or the Ensemble Kalman filter.
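To make "conditional mean of the posterior" concrete, the sketch below computes the mean of p(x|y) by brute-force quadrature for a hypothetical scalar problem with linear H and Gaussian PDFs, and compares it with the Kalman filter analysis. All numbers are illustrative assumptions.

```python
import numpy as np

# Hypothetical scalar problem: prior x ~ N(xb, pf); observation y = h*x + eps, eps ~ N(0, r).
xb, pf = 1.0, 4.0
h, r = 2.0, 1.0
y = 5.0

# Conditional mean of the posterior p(x|y), computed by brute-force quadrature on a fine grid.
x = np.linspace(-20.0, 20.0, 200001)
prior = np.exp(-0.5 * (x - xb) ** 2 / pf)
likelihood = np.exp(-0.5 * (y - h * x) ** 2 / r)
posterior = prior * likelihood                 # unnormalized; the grid spacing cancels below
post_mean = np.sum(x * posterior) / np.sum(posterior)

# Kalman filter analysis for the same linear, Gaussian problem.
k = pf * h / (h * pf * h + r)
xa_kf = xb + k * (y - h * xb)

print(post_mean, xa_kf)   # both approximately 2.4118
```

In this linear, Gaussian setting the two numbers agree to quadrature accuracy, which is the property the Kalman filter relies on.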
Approach 2: Optimal solution (e.g., analysis x_a) = maximum likelihood estimate, or conditional mode of the Bayesian posterior p(x|y) (e.g., variational methods; MLEF): x_a = maximum of the posterior p(x|y), given observations y and prior p(x).
For independent Gaussian PDFs, this is equivalent to minimizing the cost function J:
$J(x) = \frac{1}{2}(x - x_b)^T P_f^{-1} (x - x_b) + \frac{1}{2}[y - H(x)]^T R^{-1} [y - H(x)]$
Iterative solution for non-linear H and M: $x^{n+1} = x^n - \alpha^n E\, \nabla J(x^n)$, where $\alpha^n$ is the step length and the preconditioning matrix E is the inverse Hessian of J.
With ideal preconditioning, the solution can be obtained in one iteration for linear H and M.
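The one-iteration claim can be checked on a small linear problem. The sketch below (random, hypothetical sizes and covariances) takes a single step preconditioned with the inverse Hessian of J, starting from the background, and compares the result with the Kalman filter (minimum variance) analysis.

```python
import numpy as np

# Hypothetical small problem: sizes, covariances, and data are random, not from the lecture.
rng = np.random.default_rng(0)
n, m = 5, 3                                   # state and observation dimensions
xb = rng.normal(size=n)                       # background (prior) state x_b
a = rng.normal(size=(n, n))
pf = a @ a.T + n * np.eye(n)                  # symmetric positive definite P_f
h = rng.normal(size=(m, n))                   # linear observation operator H
r = np.diag(rng.uniform(0.5, 1.5, size=m))    # diagonal observation error covariance R
y = rng.normal(size=m)                        # observations

pf_inv = np.linalg.inv(pf)
r_inv = np.linalg.inv(r)

def cost(x):
    """J(x) = 1/2 (x-x_b)^T P_f^-1 (x-x_b) + 1/2 (y-Hx)^T R^-1 (y-Hx)."""
    dxb = x - xb
    dy = y - h @ x
    return 0.5 * dxb @ pf_inv @ dxb + 0.5 * dy @ r_inv @ dy

def grad(x):
    """Gradient of J for linear H."""
    return pf_inv @ (x - xb) - h.T @ r_inv @ (y - h @ x)

hessian = pf_inv + h.T @ r_inv @ h            # constant Hessian when H is linear
precond = np.linalg.inv(hessian)              # ideal preconditioner: inverse Hessian

x0 = xb.copy()                                # start the minimization from the background
x1 = x0 - precond @ grad(x0)                  # one preconditioned step with unit step length

# Minimum variance (Kalman filter) analysis for comparison.
k = pf @ h.T @ np.linalg.inv(h @ pf @ h.T + r)
xa_kf = xb + k @ (y - h @ xb)

print(np.linalg.norm(grad(x1)), np.allclose(x1, xa_kf))
```

For non-linear H the Hessian changes with x, so the same update has to be repeated until the gradient vanishes.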
Ideal Hessian Preconditioning
[Figure: VARIATIONAL vs. MLEF (Milija Zupanski, CIRA/CSU).]
MEAN vs. MODE
[Figure: p(x) vs. x. Non-Gaussian PDF: x_mode and x_mean differ. Gaussian PDF: x_mode = x_mean.]
For Gaussian PDFs and linear H and M, the results of all methods [KF, EnKF (with enough ensemble members), and variational] should be identical, assuming the same P_f, R, and y are used in all methods: minimum variance estimate = maximum likelihood estimate!
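A lognormal PDF is a simple example where the two estimates differ. The sketch below uses hypothetical parameters μ and σ for ln(x) ~ N(μ, σ²) and compares the closed-form mean, exp(μ + σ²/2), and mode, exp(μ − σ²), with values read off a discretized PDF.

```python
import numpy as np

mu, sigma = 0.0, 0.8                           # hypothetical parameters of ln(x) ~ N(mu, sigma^2)
analytic_mean = np.exp(mu + 0.5 * sigma**2)    # minimum variance estimate
analytic_mode = np.exp(mu - sigma**2)          # maximum likelihood estimate

# Discretized lognormal PDF on a fine grid (x > 0).
x = np.linspace(1e-6, 50.0, 300001)
pdf = np.exp(-0.5 * ((np.log(x) - mu) / sigma) ** 2) / (x * sigma * np.sqrt(2.0 * np.pi))
numeric_mode = x[np.argmax(pdf)]
numeric_mean = np.sum(x * pdf) / np.sum(pdf)

print(analytic_mean, analytic_mode)            # mean ~ 1.377, mode ~ 0.527
```

For a Gaussian the same computation would return equal mean and mode; here the right-skewed tail pulls the mean well above the mode.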
KF, EnKF, 4d-var, all created equal? Does this really happen?!?
TEST RESULTS EMPLOYING A LINEAR MODEL AND GAUSSIAN PDFs (M. Uliasz) (D. Zupanski)
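Kalman filter solution
Analysis step:
- $x_a = x_b + K\,[y - H x_b]$: optimal estimate of x (analysis), where $x_b$ is the background (prior) estimate of x
- $K = P_f H^T (H P_f H^T + R)^{-1}$: Kalman gain matrix (Nstate x Nobs)
- $P_a = (I - K H)\, P_f$: analysis (posterior) error covariance matrix (Nstate x Nstate)
Forecast step:
- $x_b = M x_a$: forecast of the state to the next analysis time
- $P_f = M P_a M^T + Q$: update of the forecast error covariance, where the model error covariance Q is often neglected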
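The analysis and forecast steps translate directly into a few lines of NumPy. This is a minimal sketch for a linear model matrix `m` and linear observation operator `h`; the function names and the two-variable example are illustrative assumptions.

```python
import numpy as np

def kf_analysis(xb, pf, y, h, r):
    """Kalman filter analysis step for a linear observation operator h."""
    s = h @ pf @ h.T + r                          # innovation covariance (Nobs x Nobs)
    k = pf @ h.T @ np.linalg.inv(s)               # Kalman gain (Nstate x Nobs)
    xa = xb + k @ (y - h @ xb)                    # analysis
    pa = (np.eye(len(xb)) - k @ h) @ pf           # analysis error covariance
    return xa, pa

def kf_forecast(xa, pa, m, q=None):
    """Kalman filter forecast step for a linear model m; q is the model error covariance."""
    xf = m @ xa
    pf = m @ pa @ m.T
    if q is not None:                             # model error term, often neglected
        pf = pf + q
    return xf, pf

# Hypothetical two-variable example in which only the first component is observed.
xb = np.array([0.0, 1.0])
pf = np.array([[2.0, 0.5], [0.5, 1.0]])
h = np.array([[1.0, 0.0]])
r = np.array([[1.0]])
y = np.array([1.5])
xa, pa = kf_analysis(xb, pf, y, h, r)             # xa = [1.0, 1.25]
m = np.array([[1.0, 0.1], [0.0, 1.0]])
xf, pf_next = kf_forecast(xa, pa, m, q=0.01 * np.eye(2))
```

The unobserved second component is still corrected (from 1.0 to 1.25) through the off-diagonal term of P_f; storing and propagating the full Nstate x Nstate P_f is what makes this impractical for large models.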
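Ensemble Kalman Filter (EnKF) solution
EnKF was first introduced by Evensen (1994) as a Monte Carlo filter. Equations given here follow Evensen (2003).
Analysis step:
- Analysis solution defined for each ensemble member i = 1, ..., Nens: $x_a^i = x_f^i + K\,(y^i - H x_f^i)$, with perturbed observations $y^i = y + \epsilon^i$, $\epsilon^i \sim N(0, R)$, and $K = P_f H^T (H P_f H^T + R)^{-1}$
- Mean analysis solution: $\bar{x}_a = \frac{1}{N_{ens}} \sum_i x_a^i$
- Analysis ensemble perturbations: $x_a^i - \bar{x}_a$
- Analysis error covariance in ensemble subspace (sample analysis covariance): $P_a = \frac{1}{N_{ens}-1} \sum_i (x_a^i - \bar{x}_a)(x_a^i - \bar{x}_a)^T$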
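A minimal sketch of the perturbed-observation analysis step, assuming a linear observation operator `h` and forming the sample P_f explicitly (affordable only for tiny state vectors). The function name and the test problem are illustrative; with a large ensemble the results should approach the Kalman filter values for the same problem.

```python
import numpy as np

def enkf_analysis(ens_f, y, h, r, rng):
    """Stochastic (perturbed-observation) EnKF analysis step.

    ens_f: forecast ensemble, shape (Nstate, Nens); h: linear operator (Nobs x Nstate).
    """
    n_ens = ens_f.shape[1]
    af = ens_f - ens_f.mean(axis=1, keepdims=True)         # forecast perturbations
    pf = af @ af.T / (n_ens - 1)                           # sample forecast covariance
    k = pf @ h.T @ np.linalg.inv(h @ pf @ h.T + r)         # gain from the sample covariance
    # Perturbed observations y_i = y + eps_i, eps_i ~ N(0, R).
    eps = rng.multivariate_normal(np.zeros(len(y)), r, size=n_ens).T
    ens_a = ens_f + k @ (y[:, None] + eps - h @ ens_f)     # update every member i
    xa_mean = ens_a.mean(axis=1)                           # mean analysis
    aa = ens_a - xa_mean[:, None]                          # analysis perturbations
    pa = aa @ aa.T / (n_ens - 1)                           # sample analysis covariance
    return ens_a, xa_mean, pa

# Hypothetical test problem with a large ensemble drawn from N(xb, pf_true).
rng = np.random.default_rng(3)
xb = np.array([0.0, 1.0])
pf_true = np.array([[2.0, 0.5], [0.5, 1.0]])
ens_f = rng.multivariate_normal(xb, pf_true, size=5000).T
h = np.array([[1.0, 0.0]])
r = np.array([[1.0]])
y = np.array([1.5])
ens_a, xa_mean, pa = enkf_analysis(ens_f, y, h, r, rng)
```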
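Ensemble Kalman Filter (EnKF)
Forecast step:
- Ensemble forecasts employing a non-linear model M: $x_f^i = M(x_a^i)$
- Non-linear forecast perturbations: $x_f^i - \bar{x}_f$, with $\bar{x}_f = \frac{1}{N_{ens}} \sum_i x_f^i$
- Forecast error covariance calculated using the ensemble perturbations (sample forecast covariance): $P_f = \frac{1}{N_{ens}-1} \sum_i (x_f^i - \bar{x}_f)(x_f^i - \bar{x}_f)^T$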
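The forecast step needs only forward runs of the non-linear model. In the sketch below, the Lorenz (1963) system integrated with forward Euler stands in for M; the model choice, time step, ensemble size, and initial ensemble are all illustrative assumptions.

```python
import numpy as np

def lorenz63_step(x, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One forward-Euler step of the Lorenz (1963) model for states of shape (3, Nens)."""
    dx = np.empty_like(x)
    dx[0] = sigma * (x[1] - x[0])
    dx[1] = x[0] * (rho - x[2]) - x[1]
    dx[2] = x[0] * x[1] - beta * x[2]
    return x + dt * dx

def ensemble_forecast(ens_a, n_steps=50):
    """Propagate every analysis member with the non-linear model; return members and sample P_f."""
    ens_f = ens_a.copy()
    for _ in range(n_steps):
        ens_f = lorenz63_step(ens_f)
    af = ens_f - ens_f.mean(axis=1, keepdims=True)   # non-linear forecast perturbations
    pf = af @ af.T / (ens_f.shape[1] - 1)            # sample forecast covariance
    return ens_f, pf

rng = np.random.default_rng(1)
ens_a = np.array([[1.0], [1.0], [20.0]]) + 0.5 * rng.standard_normal((3, 20))
ens_f, pf = ensemble_forecast(ens_a)
```

No tangent-linear or adjoint model appears anywhere, which is the practical advantage mentioned earlier; the price is that P_f has rank at most Nens − 1.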
There are many different versions of EnKF:
Minimum variance solution:
- Monte Carlo EnKF (Evensen 1994; 2003)
- EnKF (Houtekamer et al. 1995; 2005; first operational version)
- Hybrid EnKF (Hamill and Snyder 2000)
- EAKF (Anderson 2001)
- ETKF (Bishop et al. 2001)
- EnSRF (Whitaker and Hamill 2002)
- LEKF (Ott et al. 2004)
Maximum likelihood solution:
- MLEF (Zupanski 2005; Zupanski and Zupanski 2006)
Why the maximum likelihood solution? It is more adequate for employing non-Gaussian PDFs (e.g., Fletcher and Zupanski 2006).
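As an illustration of one minimum variance variant from this list, the sketch below implements a serial ensemble square root update for a single scalar observation without perturbed observations, using the reduced-gain factor α = [1 + sqrt(R / (HP_fH^T + R))]^{-1} of Whitaker and Hamill (2002). The function name and the linear `h_row` operator are illustrative assumptions.

```python
import numpy as np

def ensrf_update_one_obs(ens, y, h_row, r):
    """Serial EnSRF update for one scalar observation (no perturbed observations).

    ens: (Nstate, Nens) ensemble; h_row: (Nstate,) linear operator; r: scalar error variance.
    """
    n_ens = ens.shape[1]
    x_mean = ens.mean(axis=1)
    a = ens - x_mean[:, None]                       # forecast perturbations
    hx_pert = h_row @ a                             # H x'_i, shape (Nens,)
    p_ht = a @ hx_pert / (n_ens - 1)                # P H^T, shape (Nstate,)
    hpht = hx_pert @ hx_pert / (n_ens - 1)          # H P H^T, scalar
    k = p_ht / (hpht + r)                           # Kalman gain
    alpha = 1.0 / (1.0 + np.sqrt(r / (hpht + r)))   # square-root reduction factor
    x_mean_a = x_mean + k * (y - h_row @ x_mean)    # mean update with the full gain
    a_a = a - alpha * np.outer(k, hx_pert)          # perturbation update with reduced gain
    return x_mean_a[:, None] + a_a

rng = np.random.default_rng(4)
ens = rng.normal(size=(3, 10))
h_row = np.array([1.0, 0.0, 0.0])                   # observe the first state variable
ens_a = ensrf_update_one_obs(ens, y=0.5, h_row=h_row, r=0.25)
```

The reduced gain makes the sample analysis covariance equal (I − KH)P_f exactly, without the sampling noise that perturbed observations add; several observations are assimilated one after another.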
Current status of EnKF applications
- EnKF has been operational in Canada since January 2005 (Houtekamer et al.); results are comparable to 4d-var.
- EnKF is better than 3d-var in experiments with the NCEP T62 GFS (Whitaker et al., THORPEX presentation).
- Very encouraging results of EnKF in applications to non-hydrostatic, cloud-resolving models (Zhang et al., Xue et al.).
- Very encouraging results of EnKF for ocean (Evensen et al.), climate (Anderson et al.), and soil hydrology models (Reichle et al.).
Theoretical advantages of ensemble-based DA methods are being confirmed in an increasing number of practical applications.
Examples of MLEF applications
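Maximum Likelihood Ensemble Filter (Zupanski 2005; Zupanski and Zupanski 2006)
- Dynamical model for the standard model state x
- Dynamical model for the model error (bias) b
- Dynamical model for empirical parameters
Define the augmented state vector z (standard state x, bias b, and empirical parameters) and the augmented dynamical model F.
Find the optimal solution (augmented analysis) z_a by minimizing J (MLEF method):
$J(z) = \frac{1}{2}(z - z_b)^T P_f^{-1} (z - z_b) + \frac{1}{2}[y - H(z)]^T R^{-1} [y - H(z)]$, where $z_b$ is the augmented first guess and $P_f$ its forecast error covariance.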
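A toy sketch of the augmented-state idea together with a minimization of J in ensemble space. Assumptions, all illustrative: the augmented model F (a hypothetical scalar map plus a persistent bias), the ensemble mean as first guess, a square-root P_f built from sample perturbations, and a linear H. The full MLEF (Zupanski 2005) handles non-linear H iteratively, which this does not reproduce. Writing z = z_b + S w with P_f = S S^T, J becomes quadratic in w, and its minimizer is w = (I + C)^{-1} Z^T d with C = Z^T Z.

```python
import numpy as np

def augmented_model(z, n):
    """Hypothetical augmented dynamics F for z = [x; b]: x <- M(x) + b, b <- b (persistence)."""
    x, b = z[:n], z[n:]
    x_next = 0.9 * x + 0.1 * np.sin(x) + b         # hypothetical non-linear M plus bias
    return np.concatenate([x_next, b], axis=0)

def ensemble_subspace_analysis(zb, s, y, h, r):
    """Minimize J over z = zb + s w (w of dim Nens) for linear h; P_f = s s^T."""
    r_inv_sqrt = np.linalg.inv(np.linalg.cholesky(r))      # L^-1 with R = L L^T
    zmat = r_inv_sqrt @ h @ s                              # columns z_i
    c = zmat.T @ zmat                                      # information matrix (Nens x Nens)
    d = r_inv_sqrt @ (y - h @ zb)                          # normalized innovation
    w = np.linalg.solve(np.eye(s.shape[1]) + c, zmat.T @ d)  # exact minimizer for linear h
    return zb + s @ w

rng = np.random.default_rng(2)
n, n_ens = 2, 10
ens_a = np.concatenate([rng.normal(1.0, 0.5, (n, n_ens)), rng.normal(0.2, 0.1, (n, n_ens))])
ens_f = augmented_model(ens_a, n)                  # forecast every augmented member
zb = ens_f.mean(axis=1)
s = (ens_f - zb[:, None]) / np.sqrt(n_ens - 1)     # square-root forecast covariance
h = np.hstack([np.eye(n), np.zeros((n, n))])       # observe x only, never b
r = 0.1 * np.eye(n)
y = np.array([1.5, 0.8])
za = ensemble_subspace_analysis(zb, s, y, h, r)
```

Although only x is observed, the bias part of za moves away from zb through the forecast cross-covariances between x and b; this is how model error can be estimated from observations of the state.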
Bias estimation: respiration bias R, using the LPDM carbon transport model (Nstate = 1800, Nobs = 1200, DA interval = 10 days)
[Figure panels: 40 Ens, 100 Ens, True R; Cycle 1, Cycle 3, Cycle 7; domain with larger bias (typically land) and domain with smaller bias (typically ocean).]
Both the magnitude and the spatial patterns of the true bias are successfully captured by the MLEF.
Information measures in ensemble subspace (Bishop et al. 2001; Wei et al. 2005; Zupanski et al. 2006, subm. to MWR)
- Shannon information content, or entropy reduction: $h = \frac{1}{2} \sum_i \ln(1 + \lambda_i^2)$
- Degrees of freedom (DOF) for signal (Rodgers 2000): $d_s = \sum_i \frac{\lambda_i^2}{1 + \lambda_i^2}$
where:
- $C = Z^T Z$: information matrix in ensemble subspace, of dim Nens x Nens
- $z_i$: columns of Z
- $\lambda_i^2$: eigenvalues of C (for linear H and M)
- control vector in ensemble space, of dim Nens
- x: model state vector of dim Nstate >> Nens
Errors are assumed Gaussian in these measures.
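Both measures need only the eigenvalues of the small Nens x Nens matrix C. A minimal sketch, assuming the columns z_i of Z are already available (here a hypothetical 3 x 2 matrix) and using natural logarithms, so the entropy reduction is in nats:

```python
import numpy as np

def information_measures(zmat):
    """DOF for signal and Shannon entropy reduction from Z (Nobs x Nens), with C = Z^T Z."""
    c = zmat.T @ zmat                              # information matrix in ensemble subspace
    lam2 = np.linalg.eigvalsh(c)                   # eigenvalues lambda_i^2 of C
    lam2 = np.clip(lam2, 0.0, None)                # guard against tiny negative round-off
    dof = np.sum(lam2 / (1.0 + lam2))              # d_s = sum lambda^2 / (1 + lambda^2)
    entropy_reduction = 0.5 * np.sum(np.log1p(lam2))   # h = 1/2 sum ln(1 + lambda^2)
    return dof, entropy_reduction

# Hypothetical Z whose C has eigenvalues 1 and 9.
zmat = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 3.0]])
dof, h_info = information_measures(zmat)          # dof = 0.5 + 0.9 = 1.4
```

Each eigen-direction contributes between 0 and 1 to d_s, so d_s can never exceed Nens; this is one way the measure can hint at the ensemble size needed.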
GEOS-5 Single Column Model: DOF for signal (Nstate = 80; Nobs = 80; seventy 6-h DA cycles; assimilation of simulated T and q observations)
[Figure: DOF for signal for T obs (K) and q obs (g kg^-1); axes: data assimilation cycles and vertical levels.]
DOF for signal varies from one analysis cycle to another due to changes in atmospheric conditions. The 3d-var approach does not capture this variability. A small ensemble (10 ens), even though not perfect, captures the main data signals.
RMS analysis errors for T and q (approximate):
Experiment    T (K)    q (g kg^-1)
ens           0.45     0.377
20 ens        0.28     0.265
40 ens        0.23     0.226
80 ens        0.21     0.204
No_obs        0.82     0.656
Non-Gaussian (lognormal) MLEF framework: CSU SWM (Randall et al.)
Beneficial impact of the correct PDF assumption: practical advantages.
The cost function is derived from the posterior PDF (x Gaussian, y lognormal). Compared with the normal (Gaussian) cost function, the lognormal case contains an additional non-linear term.
(Courtesy of M. Zupanski)
Future Research Directions
- Covariance inflation and localization need further investigation: are these techniques necessary?
- Model error and parameter estimation need further attention: do we have sufficient information in the observations to estimate complex model errors?
- Information content analysis might shed some light on the DOF of model error and also on the necessary ensemble size.
- Non-Gaussian PDFs have to be included in DA (especially for cloud variables).
- Characterize error covariances for cloud variables.
- Account for representativeness error.
References for further reading
Anderson, J. L., 2001: An ensemble adjustment Kalman filter for data assimilation. Mon. Wea. Rev., 129, 2884–2903.
Evensen, G., 1994: Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res., 99 (C5),
Evensen, G., 2003: The ensemble Kalman filter: theoretical formulation and practical implementation. Ocean Dynamics, 53,
Fletcher, S. J., and M. Zupanski, 2006: A data assimilation method for lognormally distributed observational errors. Q. J. Roy. Meteor. Soc. (in press).
Hamill, T. M., and C. Snyder, 2000: A hybrid ensemble Kalman filter/3D-variational analysis scheme. Mon. Wea. Rev., 128, 2905–2919.
Houtekamer, P. L., and H. L. Mitchell, 1998: Data assimilation using an ensemble Kalman filter technique. Mon. Wea. Rev., 126,
Houtekamer, P. L., H. L. Mitchell, G. Pellerin, M. Buehner, M. Charron, L. Spacek, and B. Hansen, 2005: Atmospheric data assimilation with an ensemble Kalman filter: Results with real observations. Mon. Wea. Rev., 133,
Ott, E., and Coauthors, 2004: A local ensemble Kalman filter for atmospheric data assimilation. Tellus, 56A, 415–428.
Tippett, M. K., J. L. Anderson, C. H. Bishop, T. M. Hamill, and J. S. Whitaker, 2003: Ensemble square root filters. Mon. Wea. Rev., 131, 1485–1490.
Whitaker, J. S., and T. M. Hamill, 2002: Ensemble data assimilation without perturbed observations. Mon. Wea. Rev., 130, 1913–1924.
Zupanski, D., and M. Zupanski, 2006: Model error estimation employing an ensemble data assimilation approach. Mon. Wea. Rev., 134,
Zupanski, M., 2005: Maximum likelihood ensemble filter: Theoretical aspects. Mon. Wea. Rev., 133, 1710–1726.
Thank you.