Craig H. Bishop Elizabeth A Satterfield Kevin T. Shanley, David Kuhl, Tom Rosmond, Justin McLay and Nancy Baker Naval Research Laboratory Monterey CA November.


Hidden Error Variances and the optimal combination of static and flow-dependent variances
Craig H. Bishop, Elizabeth A. Satterfield, Kevin T. Shanley, David Kuhl, Tom Rosmond, Justin McLay and Nancy Baker
Naval Research Laboratory, Monterey, CA
November 2, 2012

Introduction: Definitions
Error variance: the mean of a large number of squared forecast errors.
Flow-dependent error variance: the mean of a large number of squared forecast errors given a particular flow. (To obtain a large number of errors, the "flow" or "condition" must repeat itself.)
Hidden error variance: a flow-dependent error variance that is formally unobservable because the particular flow does not repeat itself.
“A conundrum of predictability research is that while the prediction of flow dependent error distributions is one of its main foci, chaos hides flow dependent forecast error distributions from empirical observation.” (Bishop and Satterfield 2012a,b, MWR, in press; Satterfield and Bishop 2012a,b, to be submitted)
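The definitions can be made concrete with a minimal Python sketch. It assumes a hypothetical system in which the flow alternates between two repeating regimes, "calm" and "stormy", each with a made-up true error standard deviation. Because each regime repeats many times, averaging squared errors within it recovers its flow-dependent error variance. For a flow that never repeats, this average cannot be formed, and that is what makes the variance hidden. The names `error_variances` and `TRUE_STD` are illustrative only.

```python
import random
from collections import defaultdict


def error_variances(samples):
    """Return (overall, per_flow) error variances from (flow, error) pairs.

    The error variance is the mean squared error. The flow-dependent error
    variance is the same mean restricted to one flow label. Errors are
    assumed unbiased, so no mean is subtracted.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for flow, err in samples:
        sums[flow] += err * err
        counts[flow] += 1
    overall = sum(sums.values()) / sum(counts.values())
    per_flow = {flow: sums[flow] / counts[flow] for flow in sums}
    return overall, per_flow


# Hypothetical repeating regimes and their true error standard deviations.
TRUE_STD = {"calm": 0.5, "stormy": 2.0}

rng = random.Random(42)
flows = list(TRUE_STD)
samples = []
for _ in range(100_000):
    flow = rng.choice(flows)
    samples.append((flow, rng.gauss(0.0, TRUE_STD[flow])))

overall, per_flow = error_variances(samples)
print(f"overall error variance: {overall:.3f}")  # close to (0.25 + 4.0) / 2
for flow, var in sorted(per_flow.items()):
    print(f"{flow}: {var:.3f}")  # close to 0.25 (calm) and 4.0 (stormy)
```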

Previous work: spread-skill diagrams
Figure: spread-skill plots for COAMPS simulations of 48-h total accumulated precipitation, verified relative to the control (solid line) and the ensemble mean (dashed line); legend entries ET-ctl and ET-ens. Abscissa: binned ensemble variance (mm^2); ordinate: binned squared error (mm^2). Significant spread-skill relationships were found for all variables, including precipitation.
Collect innovations (ob − fcst) corresponding to similar ensemble variances into a bin, then compute the bin-averaged squared innovation. It should increase with ensemble variance.
These diagrams do not reveal (a) the climatological range of true error variances, nor (b) the degree of variation of ensemble variance given a true error variance.
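The binning procedure can be sketched in Python as follows. Pairs are sorted by ensemble variance and split into equal-count bins, and each bin reports its mean ensemble variance and mean squared innovation. The synthetic data are assumptions made for illustration: uniform true variances, gamma-distributed ensemble variances that are unbiased on average, and a made-up observation error variance `R`. The bin count `n_bins` is likewise an arbitrary choice.

```python
import random


def spread_skill(ens_vars, innovations, n_bins=10):
    """Bin (ensemble variance, innovation) pairs by ensemble variance.

    Pairs are sorted by ensemble variance and split into n_bins bins of
    (nearly) equal count. Returns one tuple per bin:
    (bin-mean ensemble variance, bin-mean squared innovation).
    """
    pairs = sorted(zip(ens_vars, innovations))
    n = len(pairs)
    rows = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        mean_var = sum(v for v, _ in chunk) / len(chunk)
        mean_sq = sum(d * d for _, d in chunk) / len(chunk)
        rows.append((mean_var, mean_sq))
    return rows


rng = random.Random(0)
R = 0.5  # hypothetical observation error variance
ens_vars, innovations = [], []
for _ in range(50_000):
    true_var = rng.uniform(0.5, 3.0)
    # Ensemble variance: unbiased but noisy (gamma noise with mean 1).
    ens_vars.append(true_var * rng.gammavariate(3.5, 1 / 3.5))
    innovations.append(rng.gauss(0.0, (true_var + R) ** 0.5))

for mean_var, mean_sq in spread_skill(ens_vars, innovations):
    print(f"{mean_var:6.3f}  {mean_sq:6.3f}")
```

Each bin mixes many different true error variances, so the output shows that the bin averages rise with ensemble variance. It says nothing about the spread of true variances inside a bin, which is limitation (a) and (b) above.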

Overview
Observations of hidden error variances using replicate systems
Empirical determination of key pdfs
Analytic model of the statistical relationships between ensemble variances and true error variances
Use of (innovation, ensemble-variance) pairs to estimate parameters of the analytic model
Estimation of optimal weights for Hybrid
Comparison of the performance of Hybrid DA with weights from brute-force tuning and weights from hidden error variance theory
Other Hybrid results
Conclusions

What is the true flow-dependent error variance?
Imagine an unimaginably large number of quasi-identical Earths. (Slartibartfast, Magrathean designer of planets, D. Adams, Hitchhiker's …)

25,000 Lorenz Model Replicates Reveal Hidden Error Variance
Using a 10-variable Lorenz ’96 model with additive model error and a 20-member Ensemble Transform Kalman Filter (ETKF) data assimilation scheme, we created 25,000 independent time series of analyses and forecasts. Each has the same true state but different random draws of observation error. True error variances were then obtained for each spatio-temporal point by averaging the squared forecast error for that point across the 25,000 replicates.
This is the first demonstration of an ETKF accurately predicting the true flow-dependent error variance in a nonlinear system.
Figure: scatter plot of ETKF ensemble variance from a single replicate system as a function of the true error variance estimated from all 25,000 replicate systems, with a linear fit to the points.
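The replicate idea replaces averaging over time with averaging over replicates. The Python sketch below assumes an array of forecast errors indexed by replicate and point. Here the errors are generated synthetically from made-up true variances rather than from a Lorenz ’96/ETKF system. The true error variance at each point is the replicate average of the squared error. A least-squares line then relates hypothetical single-replicate ensemble variances to that estimate. All function names and sizes are illustrative assumptions.

```python
import random


def replicate_error_variance(errors):
    """errors[r][p] is the forecast error at point p in replicate r.

    Returns, for each point p, the mean over replicates of the squared error.
    """
    n_rep = len(errors)
    n_pts = len(errors[0])
    return [sum(errors[r][p] ** 2 for r in range(n_rep)) / n_rep
            for p in range(n_pts)]


def linear_fit(x, y):
    """Ordinary least-squares fit y ~ slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


rng = random.Random(1)
n_pts, n_rep = 400, 500
true_var = [rng.uniform(0.2, 2.0) for _ in range(n_pts)]  # hypothetical
errors = [[rng.gauss(0.0, v ** 0.5) for v in true_var] for _ in range(n_rep)]
est_var = replicate_error_variance(errors)

# Hypothetical single-replicate ensemble variances: unbiased but noisy.
ens_var = [v * rng.gammavariate(3.5, 1 / 3.5) for v in true_var]
slope, intercept = linear_fit(est_var, ens_var)
print(f"ensemble variance ~ {slope:.2f} * true variance + {intercept:.2f}")
```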

Controlled accuracy of ensemble variances by degrading ETKF variances
A primary objective is to show how the pdf of true error variances given an imperfect ensemble variance changes as the accuracy of the ensemble variance changes. To do this, we created degraded ensemble variances by sampling a gamma distribution with mean equal to the ETKF variance and relative variance determined by an "effective ensemble size" M. The relative variance is 2/(M − 1), the relative variance of the sample variance of an M-member random normal ensemble.
Figure: examples of assumed likelihood gamma pdfs of ensemble variances with a mean of unity. Panel (a) is for an effective ensemble size of M=8, or equivalently, a relative variance of 2/7. Panel (b) is for an effective ensemble size of M=4, or equivalently, a relative variance of 2/3.
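The degradation step can be sketched with Python's `random.gammavariate(shape, scale)`. A gamma distribution with shape k and scale θ has mean kθ and relative variance 1/k. Matching a relative variance of 2/(M − 1) therefore gives k = (M − 1)/2 and θ = (ETKF variance)/k. The helper names are hypothetical, and the ETKF variance is replaced by a fixed value of 1.0.

```python
import random


def relative_variance(M):
    """Relative variance (var / mean**2) of the sample variance of an
    M-member random normal ensemble: 2 / (M - 1)."""
    return 2.0 / (M - 1)


def degrade_variance(etkf_var, M, rng):
    """Draw a degraded ensemble variance: gamma with mean etkf_var and
    relative variance 2 / (M - 1), i.e. shape k = (M - 1) / 2 and
    scale etkf_var / k."""
    k = 1.0 / relative_variance(M)
    return rng.gammavariate(k, etkf_var / k)


rng = random.Random(7)
draws = [degrade_variance(1.0, 8, rng) for _ in range(200_000)]
mean = sum(draws) / len(draws)
rel_var = sum((d - mean) ** 2 for d in draws) / len(draws) / mean ** 2
print(f"mean {mean:.3f}, relative variance {rel_var:.3f} (2/7 = {2 / 7:.3f})")
```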

Histograms of true error variance given an imperfect ensemble variance
Figure (M=8): the histograms give an empirical estimate of the pdf of true error variances given a constrained range of sample variances for an 8-member ensemble. The ranges are given on each panel; they correspond to the 2nd and 34th bins, respectively, of 35 bins of true error variance. The solid lines give the fit of an inverse-gamma function to the distribution of true error variances in each bin.
The inverse-gamma distribution is a very good fit to the empirically derived histogram of true error variances given an ensemble variance, for all ensemble-variance categories.

Climatological pdf of true error variances
Figure (M=8): prior climatological distribution of true error variances. Bars show the probability density histogram of forecast error variances. The solid line shows the fit of the pdf (eq. 4) to the data. The thick dashed line marks the mean of both the pdf and the data.
The inverse-gamma distribution gives a reasonable fit to the empirically derived prior climatological pdf of true error variances.
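One simple way to fit an inverse-gamma pdf to a sample of true error variances is the method of moments, sketched below in Python. It uses mean = β/(α − 1) and variance = mean²/(α − 2) for shape α > 2 and scale β. This is an assumption for illustration; the slide's fit (eq. 4) may have been obtained differently. The sample is synthetic: the reciprocal of a gamma draw with shape 6 and scale 1/5 is inverse-gamma with α = 6 and β = 5.

```python
import math
import random


def fit_inverse_gamma(samples):
    """Method-of-moments fit of an inverse-gamma pdf (shape alpha, scale beta).

    Uses mean = beta / (alpha - 1) and var = mean**2 / (alpha - 2),
    valid for alpha > 2.
    """
    n = len(samples)
    m = sum(samples) / n
    v = sum((x - m) ** 2 for x in samples) / (n - 1)
    alpha = 2.0 + m * m / v
    beta = m * (alpha - 1.0)
    return alpha, beta


def inverse_gamma_pdf(x, alpha, beta):
    """Inverse-gamma density, evaluated in log space for stability."""
    log_p = (alpha * math.log(beta) - math.lgamma(alpha)
             - (alpha + 1.0) * math.log(x) - beta / x)
    return math.exp(log_p)


# Hypothetical true error variances: inverse-gamma with alpha = 6, beta = 5.
rng = random.Random(3)
true_vars = [1.0 / rng.gammavariate(6.0, 1.0 / 5.0) for _ in range(100_000)]
alpha, beta = fit_inverse_gamma(true_vars)
print(f"alpha ~ {alpha:.2f}, beta ~ {beta:.2f}, mean ~ {beta / (alpha - 1):.3f}")
```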

Empirical estimation of the pdf of true error variance given ensemble variance from trials
Figure: (a) M=8, empirical; (b) M=2, empirical. Red lines depict the empirical estimate of the pdf of true error variance (ordinate axis) given fixed values of ensemble variance (abscissa axis). Thin green and blue lines give the mode and mean, respectively, of these empirical estimates. Panels (a) and (b) show the empirical estimates for random sample ensembles of sizes M=8 and M=2, respectively. The grey shading gives an inverse-gamma pdf fit to the climatological pdf of true error variances.

An analytic model of hidden error variance
Assumption 1: The error of the deterministic forecast is a random draw from a Gaussian distribution whose true variance $\sigma_i^2$ is a random draw from a prior climatological inverse-gamma pdf of error variances.
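Assumption 1 is a two-stage draw, sketched below in Python with hypothetical prior parameters α = 6 and β = 5. First a true variance $\sigma_i^2$ is drawn from the inverse-gamma prior, then an error is drawn from a Gaussian with that variance. The marginal error variance equals the prior mean β/(α − 1). The marginal kurtosis is 3E[σ⁴]/E[σ²]² = 3(α − 1)/(α − 2), which exceeds the Gaussian value of 3. The function name is illustrative.

```python
import random


def sample_forecast_errors(n, alpha, beta, rng):
    """Two-stage draw: sigma_i^2 ~ InvGamma(alpha, beta), then
    error_i ~ N(0, sigma_i^2). Returns (variances, errors)."""
    variances, errors = [], []
    for _ in range(n):
        var = 1.0 / rng.gammavariate(alpha, 1.0 / beta)  # inverse-gamma draw
        variances.append(var)
        errors.append(rng.gauss(0.0, var ** 0.5))
    return variances, errors


rng = random.Random(11)
alpha, beta = 6.0, 5.0  # hypothetical prior; mean variance beta/(alpha-1) = 1
variances, errors = sample_forecast_errors(200_000, alpha, beta, rng)
n = len(errors)
m2 = sum(e ** 2 for e in errors) / n
m4 = sum(e ** 4 for e in errors) / n
print(f"error variance {m2:.3f} (theory 1.0)")
print(f"kurtosis {m4 / m2 ** 2:.2f} (theory 3*(alpha-1)/(alpha-2) = 3.75)")
```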

An analytic model of hidden error variance
Assumption 2: Ensemble variances are stochastic draws from a likelihood gamma pdf of ensemble variances with mean $a(\sigma_i^2 - \sigma_{\min}^2) + s_{\min}^2$.
Figure panels: (a) M=8, (b) M=4.
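Assumption 2 can be sketched in Python as one gamma draw per case, with mean $a(\sigma_i^2 - \sigma_{\min}^2) + s_{\min}^2$. The slide does not give the gamma shape. This sketch assumes k = (M − 1)/2, so that the relative variance matches that of an M-member random sample. The parameter values are made up for illustration.

```python
import random


def sample_ensemble_variance(true_var, a, var_min, s2_min, M, rng):
    """Draw one ensemble variance given a true error variance.

    The gamma pdf has mean a * (true_var - var_min) + s2_min. Its shape
    k = (M - 1) / 2 is an assumption, chosen so that the relative
    variance 1 / k equals 2 / (M - 1), as for an M-member random sample.
    """
    mean = a * (true_var - var_min) + s2_min
    if mean <= 0.0:
        raise ValueError("gamma mean must be positive")
    k = (M - 1) / 2.0
    return rng.gammavariate(k, mean / k)


# Hypothetical parameter values, for illustration only.
rng = random.Random(9)
draws = [sample_ensemble_variance(1.5, a=0.8, var_min=0.2, s2_min=0.1, M=8, rng=rng)
         for _ in range(100_000)]
mean = sum(draws) / len(draws)
print(f"sample mean {mean:.3f}; model mean {0.8 * (1.5 - 0.2) + 0.1:.3f}")
```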

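An analytic model of hidden error variance
Bayes’ theorem defines the posterior inverse-gamma pdf of error variances given an imperfect ensemble variance $s_i^2$:
$$p(\sigma^2 \mid s_i^2) = \frac{p(s_i^2 \mid \sigma^2)\, p(\sigma^2)}{p(s_i^2)},$$
where $p(\sigma^2)$ is the climatological prior distribution and $p(s_i^2 \mid \sigma^2)$ is the likelihood distribution of $s^2$ given a particular $\sigma^2$.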
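The conjugate update can be sketched in Python under a simplified version of the model. The simplification is an assumption: it drops the offsets, setting $\sigma_{\min}^2 = s_{\min}^2 = 0$. The prior is then inverse-gamma with shape α and scale β, and the likelihood of s² is gamma with shape k and mean aσ². The sketch computes the posterior parameters, mean and mode for one ensemble variance. It then checks the mean by Monte Carlo, drawing (σ², s²) pairs and keeping those with s² close to the given value. All parameter values and function names are hypothetical.

```python
import random


def posterior_inverse_gamma(s2, alpha, beta, a, k):
    """Posterior (shape, scale) of sigma^2 given one ensemble variance s2.

    Simplified model (assumes sigma2_min = s2_min = 0):
      prior       sigma^2 ~ InvGamma(shape alpha, scale beta)
      likelihood  s^2 | sigma^2 ~ Gamma(shape k, scale a * sigma^2 / k)
    As a function of sigma^2 the likelihood is proportional to
    (sigma^2)**(-k) * exp(-(k * s2 / a) / sigma^2), so the posterior is
    InvGamma(alpha + k, beta + k * s2 / a).
    """
    return alpha + k, beta + k * s2 / a


def inverse_gamma_mean_mode(shape, scale):
    """Mean (requires shape > 1) and mode of an inverse-gamma pdf."""
    return scale / (shape - 1.0), scale / (shape + 1.0)


alpha, beta, a, k = 6.0, 5.0, 0.8, 3.5  # hypothetical parameters
shape, scale = posterior_inverse_gamma(1.2, alpha, beta, a, k)
post_mean, post_mode = inverse_gamma_mean_mode(shape, scale)

# Monte Carlo check: keep sigma^2 draws whose s^2 lands near 1.2.
rng = random.Random(5)
kept = []
for _ in range(200_000):
    var = 1.0 / rng.gammavariate(alpha, 1.0 / beta)
    s2 = rng.gammavariate(k, a * var / k)
    if abs(s2 - 1.2) < 0.02:
        kept.append(var)
print(f"analytic posterior mean {post_mean:.3f}, mode {post_mode:.3f}")
print(f"Monte Carlo conditional mean {sum(kept) / len(kept):.3f} ({len(kept)} pairs)")
```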

Figure: (a) M=8, empirical; (b) M=2, empirical; (c) M=8, analytic; (d) M=2, analytic. Green lines give the mode; blue lines give the mean.
Given an ensemble variance, there is a broad range of possible true error variances, yet current DA schemes require a single value. For the minimum error variance estimate, use the posterior mean. For the maximal likelihood estimate, … For QC, …

Posterior mean error variance is a Hybrid combination of static and ensemble variances
Posterior mean error variance = (weight) × flow-dependent ensemble variance + (weight) × static climatological mean error variance.
i. As the stochastic variation of ensemble variance about the true variance goes to zero, the weight on the ensemble variance goes to 1.
ii. If there is any imperfection in the flow-dependent ensemble variance, the optimal error variance estimate gives weight to the static climatological variance.
iii. If there is no variance of the true error variance, the weight on the static variance goes to 1.
Purely flow-dependent error variance models are sub-optimal.
Implications for Ensemble DA? Implications for 4DVAR?
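The origin of this form can be sketched with a simplified version of the model. The simplification is an assumption and not the full equations, which keep the $\sigma_{\min}^2$ and $s_{\min}^2$ offsets. Take an inverse-gamma prior with shape $\alpha$ and scale $\beta$, and a gamma likelihood with shape $k$ and mean $a\sigma^2$. Conjugacy then gives an inverse-gamma posterior with shape $\alpha + k$ and scale $\beta + k s^2/a$. Its mean splits into two weights that sum to one:

```latex
\mathbb{E}[\sigma^2 \mid s^2]
  = \frac{\beta + k s^2 / a}{\alpha + k - 1}
  = \underbrace{\frac{k}{\alpha + k - 1}}_{w_{\mathrm{ens}}} \, \frac{s^2}{a}
  + \underbrace{\frac{\alpha - 1}{\alpha + k - 1}}_{w_{\mathrm{static}}} \, \overline{\sigma^2},
  \qquad \overline{\sigma^2} = \frac{\beta}{\alpha - 1}.
```

In this form, limits i to iii follow directly:
- $k \to \infty$, which means no stochastic variation of $s^2$ about $a\sigma^2$, sends $w_{\mathrm{ens}} \to 1$.
- Any finite $k$ leaves $w_{\mathrm{static}} > 0$.
- With the prior mean $\overline{\sigma^2}$ held fixed, the prior variance $\overline{\sigma^2}^{\,2}/(\alpha - 2)$ vanishes as $\alpha \to \infty$, which sends $w_{\mathrm{static}} \to 1$.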

Problem

Solution: Equations that define hidden parameters from data assimilation output
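The slide's equations are not reproduced here. As an illustration of the general idea, the following Python sketch retrieves hidden parameters from (innovation, ensemble-variance) pairs by matching moments. It is not the authors' method and rests on several assumptions:
- $\sigma_{\min}^2 = s_{\min}^2 = 0$.
- E[s² | σ²] = aσ² and var(s² | σ²) = (aσ²)²/k.
- The innovation is the forecast error plus an observation error with known variance R.
- Errors are independent of s² given σ².

Under these assumptions, E[v²] = E[σ²] + R, E[s²] = aE[σ²], cov(v², s²) = a var(σ²) and var(s²) = a²(E[σ⁴]/k + var(σ²)). Those relations fix a, k, the implied M = 2k + 1 and the inverse-gamma prior. All names and the synthetic "specified" values are hypothetical.

```python
import random


def _mean(xs):
    return sum(xs) / len(xs)


def _cov(xs, ys):
    mx, my = _mean(xs), _mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def retrieve_hidden_parameters(innovations, ens_vars, obs_var):
    """Moment-matching sketch under the simplified model described above."""
    v2 = [d * d for d in innovations]
    mean_true = _mean(v2) - obs_var            # E[sigma^2]
    a = _mean(ens_vars) / mean_true            # E[s^2] = a E[sigma^2]
    var_true = _cov(v2, ens_vars) / a          # cov(v^2, s^2) = a var(sigma^2)
    second_moment = var_true + mean_true ** 2  # E[sigma^4]
    k = second_moment / (_cov(ens_vars, ens_vars) / a ** 2 - var_true)
    alpha = 2.0 + mean_true ** 2 / var_true    # inverse-gamma prior shape
    beta = mean_true * (alpha - 1.0)           # inverse-gamma prior scale
    return {"a": a, "k": k, "M": 2.0 * k + 1.0, "alpha": alpha, "beta": beta}


# Synthetic check with hypothetical "specified" parameters.
rng = random.Random(2)
alpha, beta, a, k, R = 10.0, 9.0, 0.8, 3.5, 0.5
innovations, ens_vars = [], []
for _ in range(200_000):
    var = 1.0 / rng.gammavariate(alpha, 1.0 / beta)
    ens_vars.append(rng.gammavariate(k, a * var / k))
    innovations.append(rng.gauss(0.0, (var + R) ** 0.5))
print(retrieve_hidden_parameters(innovations, ens_vars, R))
```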

Equations recover hidden parameters "observed" by replicate systems
Figure: retrieved hidden parameters var($s^2$), $s^2$, $a$ and $M$ are shown in plots (a), (b), (c) and (d), respectively. Light grey bars: M=2; dark grey bars: M=8. The "given" ensemble sizes in (d) are the random sample ensemble sizes used to degrade the quality of the ETKF ensemble variance.
"Observed" values are obtained from 175 DA cycles of the 25,000 replicate systems. Also shown are the minimum, mean and maximum of the values retrieved from 21 independent single-system time series, with n = 2,000,000.

These tests pertain to synthetic data generated using the analytical model of hidden error variance
Each retrieval is from 2,000,000 (innovation, ensemble-variance) pairs synthetically generated from specified distributions. The "specified" parameter values were set equal to values previously retrieved from Lorenz model experiments with a "given" M=8 and differing values of the model error q.
Each plot summarizes information from 60 independent retrievals. The values marked as min, mean, max and std are the minimum, mean, maximum and standard deviation of the values retrieved from 60 completely independent synthetically generated data sets.
Recovery of $\sigma_{\min}^2$ is inaccurate when $\sigma_{\min}^2$ is small.

Variation of optimal weights with model error and ensemble size, M
Figure: variation of weights for the mean of the posterior distribution of true error variances with model error q and given effective ensemble size M. Black (dark grey) bars give the weights for the de-biased flow-dependent ensemble variance, while grey (light grey) bars give the corresponding weights for the static mean of the climatological error variances. q is the model error variance parameter; M is an "effective ensemble size" corresponding to the relative variance of a random normal ensemble of size M.
The weight on the ensemble variance increases with ensemble size.
The weight on the static variance increases as model error variance increases.
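The ensemble-size trend can be reproduced in a simplified conjugate model, sketched below in Python. The model is an assumption and not the full equations. The gamma likelihood shape is k = (M − 1)/2. The inverse-gamma prior has shape α, set from its relative variance as α = 2 + 1/(relative variance). The weight on the de-biased ensemble variance is k/(α + k − 1). The sketch does not model the model-error parameter q, so it says nothing about the q dependence shown in the figure.

```python
def hidden_variance_weights(M, prior_rel_var):
    """Posterior-mean weights (w_ens, w_static) in a simplified conjugate model.

    Assumptions: gamma likelihood shape k = (M - 1) / 2; inverse-gamma prior
    with relative variance prior_rel_var = 1 / (alpha - 2). w_ens multiplies
    the de-biased ensemble variance; w_static multiplies the climatological
    mean error variance.
    """
    k = (M - 1) / 2.0
    alpha = 2.0 + 1.0 / prior_rel_var
    w_ens = k / (alpha + k - 1.0)
    return w_ens, 1.0 - w_ens


for M in (2, 8, 32):
    w_ens, w_static = hidden_variance_weights(M, prior_rel_var=0.125)
    print(f"M={M:2d}: w_ens={w_ens:.3f}, w_static={w_static:.3f}")
```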

Use of recoveries in Hybrid DA
Terms of the hybrid covariance model: flow-dependent ensemble prediction; covariance matrix of unavoidable errors (?); static covariance.
As a start, let's guess that the optimal weights for error variance prediction are "useful" weights for error covariance prediction.
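The guess can be sketched in Python by applying variance weights element by element to covariance matrices: P_hybrid = w_ens · (P_ens / a) + w_static · P_clim. The sketch leaves out the "unavoidable errors" term listed above, because its form is not given. The members, the static covariance, the weights and the de-biasing factor `a` are hypothetical. Plain lists keep the example dependency-free.

```python
def sample_covariance(members):
    """Ensemble sample covariance (divides by M - 1) of M state vectors."""
    M, n = len(members), len(members[0])
    mean = [sum(m[j] for m in members) / M for j in range(n)]
    pert = [[m[j] - mean[j] for j in range(n)] for m in members]
    return [[sum(p[i] * p[j] for p in pert) / (M - 1) for j in range(n)]
            for i in range(n)]


def hybrid_covariance(p_ens, p_clim, w_ens, w_static, a=1.0):
    """Element-wise w_ens * (p_ens / a) + w_static * p_clim."""
    n = len(p_ens)
    return [[w_ens * p_ens[i][j] / a + w_static * p_clim[i][j] for j in range(n)]
            for i in range(n)]


# Hypothetical 4-member ensemble of 3-variable states.
members = [[1.0, 2.0, 0.5], [0.0, 1.5, 1.0], [2.0, 2.5, 0.0], [1.0, 2.0, 0.5]]
P_ens = sample_covariance(members)
P_clim = [[0.8 if i == j else 0.0 for j in range(3)] for i in range(3)]  # hypothetical
H = hybrid_covariance(P_ens, P_clim, w_ens=0.6, w_static=0.4)
for row in H:
    print(" ".join(f"{x:7.3f}" for x in row))
```

Because both input matrices are symmetric and the weights are scalars, the hybrid matrix stays symmetric. With non-negative weights it is also positive semi-definite.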

Application to Hybrid DA: Lorenz model 1, perturbed observations
A suboptimal M=32-member ensemble is generated using a perturbed-observations update. A climatological error covariance matrix ($P^f_{\mathrm{climatology}}$) is formed by collecting forecast errors for 100,000 time steps (using a 100% ensemble-based error covariance matrix). $P^f_{\mathrm{hybrid}}$ is computed at each time step and used in the ETKF DA scheme to obtain an analysis, which is cycled. We compute the "best practice" hybrid and the "standard" hybrid for all alpha values for comparison.
"Best practice" hybrid: the ensemble-based $P^f$ is corrected by a factor of
The Hybrid based on weights from theory performs as well as that obtained from brute-force tuning of the weights.

Possible approaches to concerns in application of theory to Hybrid DA
The equations include a kurtosis term, which is likely to be sensitive to data QC decisions based on the size of innovations. Fortunately, it may be shown that the weight for the ensemble variances is entirely independent of this term. The weight for the static term can then be obtained by requiring the average Hybrid variance to be consistent with the innovation variance.
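The two-step idea can be sketched in Python. The first step obtains the ensemble weight from the least-squares slope of squared innovation on ensemble variance. This choice is an assumption and may differ from the authors' equation. In a simplified conjugate model, E[v² | s²] is linear in s², so the slope equals the posterior-mean weight on the raw ensemble variance. The slope uses only the covariance of (v², s²) and the variance of s², and never uses the fourth moment of the innovations. The second step sets the static weight so that the average hybrid variance equals the average squared innovation minus the observation error variance. The function name, the synthetic model, the known climatological mean `static_var` and all parameter values are hypothetical.

```python
import random


def hybrid_weights(innovations, ens_vars, obs_var, static_var):
    """Weights for a hybrid variance w_ens * s^2 + w_static * static_var.

    w_ens: least-squares slope of squared innovation on ensemble variance.
    w_static: chosen so that mean(hybrid variance) = mean(v^2) - obs_var.
    """
    v2 = [d * d for d in innovations]
    n = len(v2)
    ms, mv = sum(ens_vars) / n, sum(v2) / n
    sxy = sum((s - ms) * (v - mv) for s, v in zip(ens_vars, v2))
    sxx = sum((s - ms) ** 2 for s in ens_vars)
    w_ens = sxy / sxx
    w_static = (mv - obs_var - w_ens * ms) / static_var
    return w_ens, w_static


# Synthetic data: inverse-gamma true variances (mean 1), gamma ensemble
# variances with mean a * sigma^2, Gaussian innovations with obs variance R.
rng = random.Random(4)
alpha, beta, a, k, R = 10.0, 9.0, 0.8, 3.5, 0.5
innovations, ens_vars = [], []
for _ in range(200_000):
    var = 1.0 / rng.gammavariate(alpha, 1.0 / beta)
    ens_vars.append(rng.gammavariate(k, a * var / k))
    innovations.append(rng.gauss(0.0, (var + R) ** 0.5))

w_ens, w_static = hybrid_weights(innovations, ens_vars, R, static_var=1.0)
print(f"w_ens ~ {w_ens:.3f}, w_static ~ {w_static:.3f}")
```

For these synthetic settings the conjugate model predicts a weight of k/(α + k − 1)/a = 0.35 on the raw ensemble variance and (α − 1)/(α + k − 1) = 0.72 on the static variance. The retrieved values should be close to these.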

NAVDAS-AR-Hybrid Results: low-resolution results
Legend: Hybrid weights computed for 6 distinct regions using the new theory; alpha = 0.5.
The Hybrid based on weights from theory performs as well as that obtained from brute-force tuning of the weights.

NAVDAS-AR-Hybrid Results: high-resolution results, alpha = 0.5 vs. alpha = 0
RMS wind error, radiosonde verification results. Red means Hybrid outperformed non-Hybrid. Period: 1 February 2011 to 0000 UTC 1 April 2011 (from Kuhl et al. 2012, in review).

NAVDAS-AR-Hybrid Results: high-resolution results, alpha = 0.5 vs. alpha = 0
RMS wind error, self-analysis verification results. Red means Hybrid outperformed non-Hybrid. Period: 1 February 2011 to 0000 UTC 1 April 2011 (from Kuhl et al. 2012, in review).

NAVDAS-AR-Hybrid Results: high-resolution results, alpha = 0.5 vs. alpha = 0
RMS error, global radiosonde verification results. Red means Hybrid outperformed non-Hybrid. Period: 1 February 2011 to 0000 UTC 1 April 2011 (from Kuhl et al. 2012, in review).

NAVDAS-AR-Hybrid Results: high-resolution results, alpha = 0.5 vs. alpha = 0
Geopotential height anomaly correlation (verification against self-analysis). Red means Hybrid outperformed non-Hybrid. Period: 1 February 2011 to 0000 UTC 1 April 2011 (from Kuhl et al. 2012, in review).

Conclusions
1. A simple theory of the relationships between ensemble variances and true error variances has been developed.
2. This theory provides a new method for estimating, from an archive of (innovation, ensemble-variance) pairs,
   a. the prior climatological pdf of true error variances,
   b. the likelihood pdf of ensemble variances given a true error variance,
   c. the posterior pdf of true error variances given an ensemble variance, and
   d. the mean of (c) as a weighted sum of a static and an ensemble variance.
3. The pdfs 2a and 2c are well approximated by inverse-gamma distributions for the Lorenz ’96 system.
4. Result 2d provides a theoretical justification for Hybrid error covariance models that linearly combine static and flow-dependent covariances.
5. The ansatz that the optimal weights (2d) for error variances are useful for error covariances holds for the Lorenz ’96 system with perturbed-obs Hybrid DA and for a low-resolution Hybrid-4DVAR version of the Navy's operational DA scheme.
6. The theory enables Hybrid weights to be defined regionally at a fraction of the cost of weights obtained via trial and error.
7. QC and ensemble post-processing applications are also possible.