EnKF Overview and Theory
Jeff Whitaker. This talk will give a brief overview of the role of data assimilation (DA) in NWP, then get into the specifics of the algorithms we use in NWP and how ensembles fit in. In the latter half of the talk I'll describe where operational NWP centers, particularly our own, are heading and what the main outstanding issues are.
The Numerical Weather Prediction Process
[Diagram: 00 UTC forecast -> 06 UTC obs -> data assimilation -> analysis -> forecast model, repeating.]
To understand what data assimilation is, let's look at the big picture of numerical weather prediction. Every 6 h we collect all the obs (including satellites, aircraft, radiosondes). These observations are blended with a forecast from a weather prediction model in the process we call DA, producing new initial conditions for the model (what we call the analysis). That analysis is used to start a new forecast, which is blended with obs 6 h later, and the process continues. The analyses, and hence the forecasts, become more accurate when we get better observations, or when the forecast model or data assimilation algorithm improves. The model is very important in this process, since it carries the information from past observations forward in time. As Sandy showed, most of the improvement we have seen in 5-day forecasts over the last decade has come from improvements to the model and the assimilation process. Analyses and forecasts become more accurate when the observations, forecast model, and/or data assimilation components improve; the forecast model carries information from past observations.
The Data Assimilation Process
Bayes' theorem + Gaussian assumption (Kalman filter):
x^a = x^b + K(y^o - Hx^b), where K = P^b H^T (H P^b H^T + R)^{-1}
P^a = (I - KH) P^b
[Diagram: background forecast x^b with uncertainty P^b is combined with obs y^o with uncertainty R to give x^a, P^a; the forecast model propagates these to the next x^b, P^b.]
Cycling x^a and x^b is easy, but P^b and P^a are huge matrices!
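For concreteness, here is a minimal NumPy sketch of the Kalman filter update written on this slide; the function name and toy dimensions are illustrative, not from the talk:

```python
import numpy as np

def kf_update(xb, Pb, yo, H, R):
    """Kalman filter analysis step: return analysis state xa and covariance Pa."""
    S = H @ Pb @ H.T + R                 # innovation covariance (N x N)
    K = Pb @ H.T @ np.linalg.inv(S)      # Kalman gain (M x N)
    xa = xb + K @ (yo - H @ xb)          # update toward the observations
    Pa = (np.eye(len(xb)) - K @ H) @ Pb  # reduced analysis uncertainty
    return xa, Pa
```

For realistic NWP state sizes (M ~ 10^8), Pb and Pa cannot be stored or propagated explicitly, which is the problem the EnKF addresses next.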
The EnKF approach: instead of evolving the full error covariance matrix (P^a), evolve a random sample of it (via an ensemble forecast). In the DA step, P^b is a sample estimate from the forecast ensemble, and each ensemble member is updated individually.
The EnKF DA Process
EnKF: estimate P^b from the ensemble, and update every member with x^a = x^b + K(y^o - Hx^b), where K = P^b H^T (H P^b H^T + R)^{-1}.
[Diagram: ensemble of background forecasts x^b, plus obs y^o with uncertainty R, yield an analysis x^a for each member; the forecast model carries each x^a forward to the next x^b.]
Cycle every ensemble member instead of propagating P^b.
Two categories of EnKF
'Stochastic' EnKF (original formulation by Houtekamer and Mitchell, 1998 MWR): treats the obs as an ensemble by adding N(0, R) noise. Every member is updated with the same familiar KF equation (simple!). Used by Environment Canada.
'Deterministic' EnKF (ETKF, Bishop et al. 2001 MWR; EnSRF, Whitaker and Hamill 2002 MWR): avoids perturbing the obs by constructing analysis perturbations so that the P^a consistent with the KF is obtained. The mean is updated with the Kalman update equation; the perturbations are updated differently. More accurate than the stochastic approach for small ensembles.
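A hedged sketch of the 'stochastic' (perturbed-observation) flavor, assuming a linear observation operator H given as a matrix; all names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_enkf_update(Xb, yo, H, R):
    """Xb: (M, Ne) background ensemble; yo: (N,) obs; H: (N, M); R: (N, N)."""
    M, Ne = Xb.shape
    # normalized perturbations, so that Xp @ Xp.T estimates Pb
    Xp = (Xb - Xb.mean(axis=1, keepdims=True)) / np.sqrt(Ne - 1)
    HXp = H @ Xp                                        # observed perturbations
    K = (Xp @ HXp.T) @ np.linalg.inv(HXp @ HXp.T + R)   # gain; Pb never formed
    Xa = np.empty_like(Xb)
    for k in range(Ne):
        # each member sees its own noisy copy of the obs, so that the
        # analysis ensemble spread is consistent with the Kalman filter Pa
        yo_k = yo + rng.multivariate_normal(np.zeros(len(yo)), R)
        Xa[:, k] = Xb[:, k] + K @ (yo_k - H @ Xb[:, k])
    return Xa
```

The deterministic variants replace the perturbed-obs loop with a separate update of the ensemble mean and the perturbations (see the EnSRF sketch further below).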
Computational shortcuts in EnKF: (1) Simplifying Kalman gain calculation
The Kalman gain is computed without explicitly forming P^b: only matrices of size (M x N) and (N x N) are needed, where N is the number of obs and M is the number of state variables (usually N << M). The key is that the huge (M x M) matrix P^b is never explicitly formed; everything is built from the ensemble perturbations.
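A sketch of this shortcut in isolation: with Xp the (M x Ne) matrix of normalized ensemble perturbations, P^b H^T and H P^b H^T are built directly from Xp and H applied to Xp, so nothing larger than (M x N) ever appears. (Assumes linear H; names are mine.)

```python
import numpy as np

def ensemble_gain(Xp, HXp, R):
    """Xp: (M, Ne) normalized perturbations; HXp: (N, Ne); R: (N, N)."""
    PbHT = Xp @ HXp.T                       # (M, N): plays the role of Pb H^T
    HPbHT = HXp @ HXp.T                     # (N, N): plays the role of H Pb H^T
    return PbHT @ np.linalg.inv(HPbHT + R)  # (M, N) Kalman gain
```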
Computational shortcuts in EnKF: (2) serial processing of observations (requires the observation error covariance R to be diagonal).
We can make these matrices even smaller with a trick that takes advantage of the fact that R is often diagonal, i.e. observation errors are uncorrelated. In that case, you can assimilate obs in batches, or even one by one, and get the same answer as when they are all assimilated at once. For example, with two observations we can either calculate the full Kalman gain and update the model state in one go (Method 1), or assimilate just the first ob, update the model state, use that state as the background for the second ob, and then assimilate that one (Method 2). If R is diagonal, the result is the same.
[Diagram. Method 1: observations 1 and 2 + background forecasts -> EnKF -> analyses. Method 2: observation 1 + background forecasts -> EnKF -> analyses after ob 1; observation 2 + those analyses -> EnKF -> final analyses.]
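A sketch of Method 2 in its deterministic form, looping over scalar observations; this follows the square-root update of Whitaker and Hamill (2002), with illustrative names. For a single scalar ob, the gain is just a vector divided by a scalar.

```python
import numpy as np

def serial_ensrf(Xb, yo, H, r_diag):
    """Xb: (M, Ne) ensemble; yo: (N,) obs; H: (N, M); r_diag: (N,) obs variances."""
    Xa = Xb.copy()
    Ne = Xa.shape[1]
    for j in range(len(yo)):                 # one observation at a time
        xm = Xa.mean(axis=1)
        Xp = Xa - xm[:, None]                # perturbations about the mean
        hxp = H[j] @ Xp                      # (Ne,) observed perturbations
        hpht = hxp @ hxp / (Ne - 1)          # scalar H Pb H^T for this ob
        K = (Xp @ hxp) / (Ne - 1) / (hpht + r_diag[j])   # (M,) gain vector
        # reduced gain for the perturbations (square-root filter)
        alpha = 1.0 / (1.0 + np.sqrt(r_diag[j] / (hpht + r_diag[j])))
        xm = xm + K * (yo[j] - H[j] @ xm)    # mean update with full gain
        Xp = Xp - np.outer(alpha * K, hxp)   # perturbation update, reduced gain
        Xa = xm[:, None] + Xp                # becomes the background for ob j+1
    return Xa
```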
NOAA EnKF implements two algorithms
Serial EnKF (observations processed one at a time): both stochastic and deterministic options available.
Local Ensemble Transform Kalman Filter (LETKF): the update is computed with all obs at once, but the matrices are kept small by updating each grid point independently using only nearby observations.
LETKF Algorithm
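The equations on this slide are not reproduced in the transcript. As a stand-in, here is a hedged sketch of the LETKF update for a single grid point, following the standard formulation of Hunt et al. (2007); the selection of nearby obs and localization of R are assumed to have happened already, and names are illustrative:

```python
import numpy as np

def letkf_point(xb_pert, xb_mean, Yb, yb_mean, yo, Rinv):
    """xb_pert: (M, Ne) local state perturbations; Yb: (N, Ne) observed
    perturbations; yb_mean: (N,) observed mean; Rinv: (N, N) inverse obs error."""
    Ne = Yb.shape[1]
    C = Yb.T @ Rinv                                   # (Ne, N)
    # analysis error covariance in the Ne-dimensional ensemble space
    Pa_tilde = np.linalg.inv((Ne - 1) * np.eye(Ne) + C @ Yb)
    w_mean = Pa_tilde @ C @ (yo - yb_mean)            # mean weight vector
    # symmetric square root gives the perturbation weights
    evals, evecs = np.linalg.eigh((Ne - 1) * Pa_tilde)
    W = evecs @ np.diag(np.sqrt(evals)) @ evecs.T
    Wa = W + w_mean[:, None]                          # combined weights (Ne, Ne)
    return xb_mean[:, None] + xb_pert @ Wa            # (M, Ne) analysis ensemble
```

All matrix operations are in ensemble space (Ne x Ne), so each grid point's update is cheap and the loop over grid points parallelizes trivially.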
Consequences of sampling error
Ensemble sizes we can afford are O(10^2). The rank of the full P^b is at least O(10^6) for current global ensemble resolution. P^b is estimated from X^b X^{bT} (the outer product of the ensemble perturbations), so all but N_e - 1 of its eigenvalues are zero. Errors in individual elements of P^b are O(N_e^{-1/2}), and correlated across elements. The EnKF fails miserably if the raw sample covariance is used: P^a is grossly underestimated, the spread collapses, and the data assimilation ends up ignoring all observations.
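A toy illustration of the O(N_e^{-1/2}) sampling noise: even for two truly uncorrelated variables, a small ensemble produces spurious sample correlations of roughly that magnitude.

```python
import numpy as np

rng = np.random.default_rng(1)
for Ne in (40, 80, 320, 1280):
    x = rng.standard_normal((2, Ne))   # two independent variables, Ne samples
    r = np.corrcoef(x)[0, 1]           # spurious sample correlation
    print(f"Ne={Ne:5d}  sample corr={r:+.3f}  expected noise ~ {1/np.sqrt(Ne):.3f}")
```

In a real P^b there are O(10^12) such elements, so these spurious correlations, left untreated, swamp the signal.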
Covariance localization – the secret sauce that makes it all work.
Basic idea: small values in P^b cannot be estimated accurately; simply setting them to zero increases the rank of the sample estimate. Hypothesis: covariances decrease in significance with distance. Method: taper the covariance estimate to zero away from the (block) diagonal using a pre-specified function (sketched below). This also makes the algorithms faster, since each observation can no longer impact the entire state.
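A sketch of distance-based localization using the compactly supported Gaspari-Cohn (1999) taper, applied to the sample covariance as an elementwise (Schur) product; the piecewise polynomial below is the standard form, and the application line is illustrative:

```python
import numpy as np

def gaspari_cohn(dist, c):
    """Compactly supported, Gaussian-like taper: 1 at dist=0, exactly 0 beyond 2c."""
    z = np.abs(np.asarray(dist, dtype=float)) / c
    taper = np.zeros_like(z)
    m1 = z <= 1.0
    m2 = (z > 1.0) & (z <= 2.0)
    taper[m1] = (-0.25 * z[m1]**5 + 0.5 * z[m1]**4 + 0.625 * z[m1]**3
                 - (5.0 / 3.0) * z[m1]**2 + 1.0)
    taper[m2] = ((1.0 / 12.0) * z[m2]**5 - 0.5 * z[m2]**4 + 0.625 * z[m2]**3
                 + (5.0 / 3.0) * z[m2]**2 - 5.0 * z[m2] + 4.0
                 - (2.0 / 3.0) / z[m2])
    return taper

# localized covariance: elementwise product with the sample estimate
# Pb_loc = gaspari_cohn(distance_matrix, c) * Pb_sample
```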
Localization: a simple example
Estimates of covariances from a small ensemble will be noisy, with a small signal-to-noise ratio especially where the true covariance is small.
Localization: a real-world example
AMSUA_n15 channel 6 radiance at (150E, 50S). Increment to level-30 (~310 mb) temperature for a 1 K O-F, for 40, 80, 160, 320, and 640 ensemble members with no localization.
Localization: a real-world example (cont.)
What about uncertainty in the model itself?
Not included in P^b if every member is run with the same model. We must account for the background error arising from any difference between the simulated and true environment. Methods used so far (a sketch of 1 and 2 follows below):
1) Multiplicative inflation (multiply the ensemble perturbations by a factor > 1).
2) Additive inflation (random perturbations added to each member, e.g. differences between 24- and 48-h forecasts valid at the same time).
3) Model-based schemes (e.g. stochastic kinetic energy backscatter for representing unresolved processes, multi-model/multi-parameterization).
The operational NCEP system used to use a combination of 1) and 2), and now uses a combination of 1) and 3). Only 1) is handled within the EnKF itself; 2) is a separate step (run after the EnKF update and before the forecast step); 3) happens inside the forecast model.
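Minimal sketches of methods 1) and 2); the perturbation bank for additive inflation (e.g. a library of lagged forecast differences) and the scale factor are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)

def multiplicative_inflation(X, factor=1.1):
    """Inflate ensemble spread about the mean by a constant factor > 1."""
    xm = X.mean(axis=1, keepdims=True)
    return xm + factor * (X - xm)

def additive_inflation(X, pert_bank, scale=0.3):
    """Add random, centered perturbations drawn from a precomputed bank,
    e.g. 48h-minus-24h forecast differences valid at the same time."""
    idx = rng.integers(pert_bank.shape[1], size=X.shape[1])
    perts = pert_bank[:, idx]
    return X + scale * (perts - perts.mean(axis=1, keepdims=True))
```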
Relaxation To Prior Spread (RTPS) Inflation
Described in Whitaker and Hamill (2012, MWR, DOI 10.1175/MWR-D-11-00276.1). Inflate the posterior spread (std. dev.) s^a back toward the prior spread s^b; equivalent to multiplicative inflation with factor 1 + alpha (s^b - s^a)/s^a.
Fuqing Zhang and his collaborators first proposed this type of inflation in a 2005 paper. Their version, which I call 'relaxation to prior perturbation' inflation, relaxes the posterior perturbations back toward the prior perturbations with a coefficient alpha, so that if alpha = 1 the posterior perturbations are completely replaced by the prior perturbations, and if alpha = 0 nothing is done. Since the posterior perturbations are much smaller than the prior perturbations in regions of dense obs, a given value of alpha applies more inflation where the obs have had a larger impact on the analysis. This accounts for assimilation-related errors, which occur where observations have a larger impact; model error is also a bigger fraction of the total background error where observations are dense. Here we propose an alternative approach, 'relaxation to prior stdev', which relaxes the ensemble spread (rather than the perturbations) back toward the prior spread. This can be expressed as multiplicative inflation in which the inflation coefficient depends on the fractional reduction of ensemble spread that results from assimilating observations.
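A sketch of the RTPS update as the multiplicative inflation described above, with the factor 1 + alpha (s^b - s^a)/s^a applied pointwise to the posterior perturbations; names and the epsilon guard are illustrative:

```python
import numpy as np

def rtps_inflation(Xa, Xb, alpha=0.95, eps=1e-12):
    """Xa, Xb: (M, Ne) posterior/prior ensembles. Relax posterior spread
    back toward prior spread with coefficient alpha in [0, 1]."""
    xm_a = Xa.mean(axis=1, keepdims=True)
    sa = Xa.std(axis=1, ddof=1, keepdims=True)     # posterior spread
    sb = Xb.std(axis=1, ddof=1, keepdims=True)     # prior spread
    # larger inflation where the obs reduced the spread the most
    factor = 1.0 + alpha * (sb - sa) / (sa + eps)
    return xm_a + factor * (Xa - xm_a)
```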
Why does Pb matter? Flow dependence and unobserved variables
[Figure: first-guess precipitable water (shaded), first-guess SLP (contours), and the TCWV increment from a single surface pressure ob (yellow dot).]
A simple example of the benefits. This case involves a northward flow of warm, moist air ahead of a maritime cyclone (an 'atmospheric river'). White contours are background MSLP, the colored field is background total column water vapor (TCWV), and red contours are the TCWV increment for a single surface pressure observation at the location denoted by the yellow dot. The covariance between surface pressure at the observation location and the TCWV field, estimated from the EnKF ensemble, allows the pressure ob to adjust the position of the atmospheric river consistently with the surface cyclone. The DA system 'knows' there is a cyclone/atmospheric river in the background and, when given observations, makes adjustments that take the dynamics into account. In the old 3DVar system, the climatological weights were set such that surface pressure obs could not change the humidity field at all, because on average there is very little correlation between errors in the surface pressure field and errors in the humidity field. In certain dynamical situations, however, these correlations can be large, and ensemble-based DA methods allow much more information to be extracted from the observations in those cases.
A surface pressure observation can improve the analysis of integrated water vapor (through flow-dependent cross-variable relationships). If a climatological P^b were used (3DVar), there would be no vapor increment.
Why combine EnKF and Var?
Features from EnKF: can propagate P^b across assimilation windows; the treatment of sampling error in the ensemble P^b estimate does not depend on H; more flexible treatment of model error (can be treated in the ensemble); automatic initialization of ensemble forecasts.
Features from Var: dual-resolution capability (can produce a high-res "control" analysis); ease of adding extra constraints to the cost function, including a static P^b component.
Why not just a stand-alone EnKF? Operational centers have used Var systems for many years, so the infrastructure already exists, and the Var solver makes some aspects easier (localization, extra constraints, dual resolution, blending in a specified static covariance; see the sketch below).
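A schematic of the covariance blending mentioned above; the weights beta_s, beta_e and the explicit matrix form are illustrative only (operational hybrid EnVar applies the blend implicitly through the control variable, not by forming these matrices):

```python
import numpy as np

def hybrid_covariance(B_static, Pb_ens, loc, beta_s=0.25, beta_e=0.75):
    """Blend a static climatological B with the localized ensemble covariance.
    loc: localization matrix applied elementwise to the ensemble term;
    beta_s + beta_e is typically constrained to 1."""
    return beta_s * B_static + beta_e * (loc * Pb_ens)
```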
What is limiting performance now?
Sampling error: run larger ensembles; develop better (flow- and scale-dependent) localization methods; better treatment of non-local observations (model-space localization).
Model error: run higher-resolution ensembles; better parameterizations of model uncertainty.
Non-Gaussian error statistics: some phenomena are not observed often or well enough, so dynamical error growth becomes nonlinear; non-Gaussianity can also arise from displacement errors in coherent features, and from variables that are physically bounded (humidity, wind speed).