3
May 30, 2003 Tony Eckel, Eric Grimit, and Cliff Mass UW Atmospheric Sciences This research was supported by the DoD Multidisciplinary University Research Initiative (MURI) program administered by the Office of Naval Research under Grant N00014-01-10745.
4
Overview
- Review ensemble forecasting theory and introduce UW’s SREFs
- Discuss results of model deficiencies on SREF
  - Need for bias correction
  - Impact on ensemble spread
  - Impact on probabilistic forecast skill
- Conclusions
5
The true state of the atmosphere (T) exists as a single point in phase space that we never know exactly. A point in phase space completely describes an instantaneous state of the atmosphere. For a model, a point is the vector of values for all parameters (pressure, temperature, etc.) at all grid points at one time.
An analysis produced to run a model like the Eta (labeled e in the diagram) is in the neighborhood of truth. The complete error vector is unknown, but we have some idea of its structure and magnitude.
Chaos drives apart the forecast and true trajectories (predictability error growth). Ensemble forecasting (EF) can predict the error magnitude and give a “probabilistic cloud” of forecasts.
[Figure: phase-space diagram showing the forecast trajectory from the analysis e through the 12h, 24h, 36h, and 48h forecasts, and the true trajectory from T to the 48h verification.]
6
[Figure: phase-space diagram for PME, ACME core, or ACME core+. The analyses e, a, u, c, j, t, g, and n surround truth (T) in the analysis region; their 12h, 24h, 36h, and 48h forecasts spread out to the 48h forecast region, which contains T and M.]
Plug each IC into the MM5 to create an ensemble of mesoscale forecasts (a cloud of future states encompassing truth). The ensemble will:
1) Reveal uncertainty in the forecast
2) Reduce error by averaging (M)
3) Yield probabilistic information
7
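ACME’s Centroid
[Figure: the same phase-space diagram with the centroid (c) added to the analysis region among the analyses e, a, u, c, j, t, g, and n; T and M are shown in the 48h forecast region.]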
8
ACME’s Mirrored Members
[Figure: the same phase-space diagram with each analysis (e, a, u, c, j, t, g, n) mirrored about the centroid (c) in the analysis region; T and M are shown in the 48h forecast region.]
9
Forecast Probability from an Ensemble
EF provides an estimate (histogram) of truth’s Probability Density Function (red curve).
In a large, well-tuned EF, Forecast Probability (FP) = Observed Relative Frequency (ORF).
In practice, things get wacky from:
- Under-sampling of the PDF (too few ensemble members)
- Poor representation of initial uncertainty
- Model deficiencies
  -- Model bias causes a shift in the estimated mean
  -- Sharing of model errors between EF members leads to reduced variance
As a result, the EF’s estimated PDF does not match truth’s PDF, and Fcst Prob ≠ Obs Rel Freq.
[Figure: frequency histograms against a parameter threshold (EX: precip > 0.5”) for the initial state, the 24hr forecast state, and the 48hr forecast state; annotations: FP = ORF = 72%, FP = 93%.]
10
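UW’s Ensemble of Ensembles
Homegrown:
- ACME: 17 EF members; type SMMA; initial conditions: 8 independent analyses, 1 centroid, 8 mirrors; forecast model: “Standard” MM5; forecast cycle: 00Z; domain: 36km, 12km
- ACME core: 8 EF members; type SMMA; initial conditions: independent analyses; forecast model: “Standard” MM5; forecast cycle: 00Z; domain: 36km, 12km
- ACME core+: 8 EF members; type PMMA; initial conditions: independent analyses; forecast models: 8 MM5 variations; forecast cycle: 00Z; domain: 36km, 12km
Imported:
- PME: 8 EF members; type MMMA; initial conditions: independent analyses; forecast models: 8 “native” large-scale models; forecast cycle: 00Z, 12Z; domain: 36km
Abbreviations:
ACME: Analysis-Centroid Mirroring Ensemble
PME: Poor Man’s Ensemble
MM5: PSU/NCAR Mesoscale Modeling System Version 5
SMMA: Single Model Multi-Analysis
PMMA: Perturbed-model Multi-Analysis
MMMA: Multi-model Multi-Analysis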
11
“Native” Models/Analyses of the PME
Each entry lists: abbreviation, model, source; type; computational resolution; distributed resolution; objective analysis. Approximate grid spacings are for 45°N.
- gfs, Global Forecast System, National Centers for Environmental Prediction; spectral; T254 / L64 (~55km); 1.0° / L14 (~80km); SSI 3D Var
- cmcg, Global Environmental Multi-scale (GEM), Canadian Meteorological Centre; spectral; T199 / L28 (~70km); 1.25° / L11 (~100km); 3D Var
- eta, Eta limited-area mesoscale model, National Centers for Environmental Prediction; finite difference; 12km / L60; 90km / L37; SSI 3D Var
- gasp, Global AnalysiS and Prediction model, Australian Bureau of Meteorology; spectral; T239 / L29 (~60km); 1.0° / L11 (~80km); 3D Var
- jma, Global Spectral Model (GSM), Japan Meteorological Agency; spectral; T106 / L21 (~135km); 1.25° / L13 (~100km); OI
- ngps, Navy Operational Global Atmos. Pred. System, Fleet Numerical Meteorological & Oceanographic Cntr.; spectral; T239 / L30 (~60km); 1.0° / L14 (~80km); OI
- tcwb, Global Forecast System, Taiwan Central Weather Bureau; spectral; T79 / L18 (~180km); 1.0° / L11 (~80km); OI
- ukmo, Unified Model, United Kingdom Meteorological Office; finite difference; 5/6° × 5/9° / L30 (~60km); same / L12; 3D Var
12
Design of ACME core+
Total possible combinations: 8 × 5 × 3 × 2 × 5 × 3 × 2 × 2 × 8 × 8 = 921,600
13
Research Dataset
Total of 129 48-h forecasts (Oct 31, 2002 – Mar 28, 2003), all initialized at 00Z.
[Figure: calendar for November, December, January, February, and March; missing forecast case days are shaded.]
Parameters:
- 36 km Domain (151 × 127): Mean Sea Level Pressure (MSLP), 500mb Geopotential Height (Z500)
- 12 km Domain (101 × 103): Wind Speed at 10m (WS10), Temperature at 2m (T2)
Verification:
- 36 km Domain: centroid analysis (mean of 8 independent analyses, available at 12h increments)
- 12 km Domain: ruc20 analysis (NCEP 20 km mesoscale analysis, available at 3h increments)
15
The ACME Process
STEP 1: Calculate the best guess for truth (the centroid) by averaging all analyses.
STEP 2: Find the error vector in model phase space between one analysis and the centroid by differencing all state variables over all grid points.
STEP 3: Make a new IC by mirroring that error about the centroid.
[Figure: sea level pressure (mb, 994 to 1006) along a ~1000 km section from 170°W to 135°W for the eta, ngps, tcwb, gasp, avn, ukmo, and cmcg analyses and the centroid (cent, C); cmcg is mirrored about the centroid to give cmcg*.]
16
MSLP analysis south of the Aleutians at 00Z on Jan 16, 2003
[Figure panels: tcwb; centroid; centroid + (centroid − tcwb).]
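The three ACME steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the UW implementation: each analysis is assumed to be a flat list of state values on a shared grid, and the function names `centroid` and `mirror` and the two-point sample fields are hypothetical.

```python
from typing import Dict, List


def centroid(analyses: Dict[str, List[float]]) -> List[float]:
    """STEP 1: average all analyses point by point (best guess for truth)."""
    fields = list(analyses.values())
    n_points = len(fields[0])
    return [sum(f[k] for f in fields) / len(fields) for k in range(n_points)]


def mirror(analysis: List[float], cent: List[float]) -> List[float]:
    """STEPS 2-3: take the error (analysis - centroid) and reflect it.

    The mirrored IC is centroid - error, i.e. centroid + (centroid - analysis).
    """
    return [c + (c - a) for a, c in zip(analysis, cent)]


# Hypothetical two-point MSLP fields (mb), for illustration only.
analyses = {"cmcg": [1000.0, 1004.0], "tcwb": [1002.0, 1000.0]}
cent = centroid(analyses)
mirrored = {name + "*": mirror(field, cent) for name, field in analyses.items()}
```

By construction the mirrored members average back to the centroid, so adding them does not move the mean of the initial conditions.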
17
bias correction…
18
Overview
The two flavors of model deficiencies play a big role in SREF:
1) Systematic: Model bias is a significant fraction of forecast error and must be removed.
2) Stochastic: Random model errors significantly increase uncertainty and must be accounted for.
Bias Correction: A simple method gives good results
Model Error*: Impact on ensemble spread
Final Results: Impact of both on probabilistic forecast skill
* bias-corrected
19
Need for Bias Removal
It is often difficult to completely remove bias within a model’s code:
- Bias is systematic but complex, involving numerics, parameterizations, resolution, etc.
- It depends upon weather regime (time of day, surface characteristics, stability, moisture, etc.)
It is cheaper and easier to remove bias through post-processing:
- Sophisticated routines such as MOS require long training periods (years)
- The bulk of bias can be removed with the short-term mean error
[Figure panels: NGPS Forecast vs Analysis; GASP Forecast vs Analysis; GFS Forecast vs Analysis; GFS-MM5 Forecast vs Analysis.]
Data Info:
- Single model grid point in eastern WA
- Verification: centroid analysis
- 70 forecasts (Nov 25, 2002 – Feb 7, 2003)
- Lead time = 24h
20
Gridded Bias Removal
For the current forecast cycle:
1) Calculate bias at every grid point and lead time using the previous 2 weeks’ forecasts, where
   N: number of forecast cases (14)
   f_{i,j,t}: forecast at grid point (i, j) and lead time (t)
   o_{i,j}: verifying observation
2) Post-process the current forecast to correct for bias, where
   f*_{i,j,t}: bias-corrected forecast at grid point (i, j) and lead time (t)
[Figure: calendar for November, December, January, February, and March marking the training period and the bias-corrected forecast period.]
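The two steps can be sketched for a single grid point and lead time as follows. The slide's own formulas did not survive in this transcript, so the sketch assumes an additive bias (the mean of forecast minus observation over the previous N = 14 cases) that is subtracted from the current forecast; that form, and the names `mean_error_bias`, `bias_correct`, and `running_correction`, are assumptions made for illustration.

```python
from typing import List, Sequence


def mean_error_bias(forecasts: Sequence[float], observations: Sequence[float]) -> float:
    """Step 1: mean of (forecast - observation) over the training cases.

    Assumption: an additive mean error, not necessarily the slide's formula.
    """
    if len(forecasts) != len(observations) or not forecasts:
        raise ValueError("need matching, non-empty training samples")
    return sum(f - o for f, o in zip(forecasts, observations)) / len(forecasts)


def bias_correct(current_forecast: float, bias: float) -> float:
    """Step 2: remove the estimated bias from the current forecast."""
    return current_forecast - bias


def running_correction(fcsts: Sequence[float], obs: Sequence[float],
                       n_train: int = 14) -> List[float]:
    """Correct each case using only the n_train cases that precede it."""
    corrected = []
    for day in range(n_train, len(fcsts)):
        bias = mean_error_bias(fcsts[day - n_train:day], obs[day - n_train:day])
        corrected.append(bias_correct(fcsts[day], bias))
    return corrected


# Hypothetical series: forecasts that are always 2 mb too high.
sample_obs = [1000.0 + d for d in range(16)]
sample_fcsts = [o + 2.0 for o in sample_obs]
sample_corrected = running_correction(sample_fcsts, sample_obs)
```

On a grid the same calculation would be repeated independently at every (i, j) and lead time t. In real-time use the training window can only contain forecasts whose verifying analyses are already available.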
21
Spatial and Temporal Dependence of Bias
[Figure: GFS-MM5 MSLP bias at f24. Labels: Common Bias; Forecast Error. Legend: > 1 too low, < 1 too high.]
22
Spatial and Temporal Dependence of Bias
[Figure: GFS-MM5 MSLP bias at f36. Labels: Common Bias; Forecast Error. Legend: > 1 too low, < 1 too high.]
23
PME Bias Correction Results
[Figure panels: biased; bias-corrected.]
24
ACME core Bias Correction Results
[Figure panels: biased; bias-corrected.]
25
ACME core+ Bias Correction Results
[Figure panels: biased; bias-corrected.]
26
Verification Rank Histogram
Record of where verification fell (i.e., its rank) among the ordered ensemble members:
- Flat: well-calibrated EF (truth’s PDF matches EF PDF)
- U-shaped: under-dispersive EF (truth “gets away” quite often)
- Humped: over-dispersive EF
[Figure axes: Verification Rank; Probability; Lead Time (hours).]
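A verification rank histogram can be built with a short routine like the one below. It is a minimal sketch: the function names and sample values are hypothetical, and ties between the verification and a member are ignored here, although a real data set may need an explicit tie rule.

```python
from typing import List, Sequence


def verification_ranks(ensembles: Sequence[Sequence[float]],
                       verifications: Sequence[float]) -> List[int]:
    """Rank of each verification = number of members below it (0..n)."""
    return [sum(1 for m in members if m < truth)
            for members, truth in zip(ensembles, verifications)]


def rank_histogram(ensembles: Sequence[Sequence[float]],
                   verifications: Sequence[float]) -> List[float]:
    """Relative frequency of each of the n + 1 possible ranks."""
    n_members = len(ensembles[0])
    counts = [0] * (n_members + 1)
    for rank in verification_ranks(ensembles, verifications):
        counts[rank] += 1
    total = sum(counts)
    return [c / total for c in counts]


# Hypothetical 3-member ensemble verified on four cases.
sample_ens = [[1.0, 2.0, 3.0]] * 4
sample_truth = [0.5, 3.5, 0.2, 2.5]
sample_hist = rank_histogram(sample_ens, sample_truth)
```

In this toy sample the verification lands in the outermost ranks three times out of four, the kind of U shape that signals an under-dispersive ensemble.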
28
Model Error Impact on ensemble spread …
29
Ensemble Dispersion (MSLP)
[Figure: Ensemble Variance (mb²) compared with the MSE of the EF mean; annotations: Analysis Error; Error Growth due to Analysis Error; Error Growth due to Model Error.]
The EF mean’s MSE is adjusted by n / (n+1) to account for small sample size.
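The n / (n+1) adjustment can be illustrated with a small sketch. For a statistically consistent n-member ensemble, the expected squared error of the ensemble mean is (n+1)/n times the ensemble variance, so scaling the MSE by n / (n+1) puts the two on the same footing. The use of the sample (n − 1) variance, the function names, and the sample values are assumptions for illustration only.

```python
from statistics import fmean, variance
from typing import Sequence


def mean_ensemble_variance(ensembles: Sequence[Sequence[float]]) -> float:
    """Average over cases of the sample variance across members (assumed n - 1 form)."""
    return fmean([variance(members) for members in ensembles])


def adjusted_mse_of_mean(ensembles: Sequence[Sequence[float]],
                         verifications: Sequence[float]) -> float:
    """MSE of the ensemble mean, scaled by n / (n + 1)."""
    n = len(ensembles[0])
    sq_errors = [(fmean(m) - v) ** 2 for m, v in zip(ensembles, verifications)]
    return fmean(sq_errors) * n / (n + 1)


# Hypothetical 3-member ensemble on two cases (values in mb).
sample_ens = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
sample_truth = [2.0, 5.0]
spread = mean_ensemble_variance(sample_ens)
skill = adjusted_mse_of_mean(sample_ens, sample_truth)
```

If the ensemble variance stays well below the adjusted MSE, the ensemble is under-dispersive at that lead time.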
33
Impact of both on probabilistic forecast skill
34
Explain probabilistic forecast verification
35
Reliability Diagram Comparison
P(MSLP < 1001mb) by uniform ranks method, 36h lead time, Sub-domain A
[Figure: reliability diagrams for PME and ACME core, with Sample Climatology marked.]
36
36km Verification Sub-domain A
37
Skill vs. Lead Time (Sub-domain A)
- ~6hr improvement by bias correction
- ~11hr improvement by multi-model diversity and “global” error growth
38
36km Verification Sub-domain B
39
Skill vs. Lead Time (all bias-corrected)
- 36km Sub-domain A (2/3 ocean): P(MSLP < 1001mb), Sample Climatology 23%
- 36km Sub-domain B (mostly land): P(MSLP < 1011mb), Sample Climatology 20%
Figure annotations: ~11hr improvement by PME; ~22hr improvement by PME; ~3hr improvement by ACME core+
40
Conclusions
An ensemble’s skill is dramatically improved by:
1) Correcting model bias
2) Accounting for model uncertainty
Caveats:
- Consider non-optimal ICs and small EF size? (still fair comparison between PME and ACME core)
- What about higher skill of PME members? (not so dramatic after bias correction)
- Does higher resolution of MM5 make comparison unfair? (fitting to lower res. would decrease 2 )
Why bother with ACME core? PME is certainly more skilled at the synoptic level, but has little to no mesoscale info.
Should these conclusions hold true for mesoscale? YES! Model deficiencies for surface variables (precip, winds, temperature) can be even stronger, so the effect on SREF may be even greater. Demonstrating that is now the focus of my research… P(precip > 0.25” in 6hr)
41
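UW’s Ensemble of Ensembles
- ACME: 17 EF members; type SMMA; initial conditions: 8 independent analyses, 1 centroid, 8 mirrors; forecast model: “Standard” MM5; forecast cycle: 00Z; domain: 36km, 12km
- ACME core: 8 EF members; type SMMA; initial conditions: independent analyses; forecast model: “Standard” MM5; forecast cycle: 00Z; domain: 36km, 12km
- ACME core+: 8 EF members; type PMMA; initial conditions: independent analyses; forecast models: 8 MM5 variations; forecast cycle: 00Z; domain: 36km, 12km
- PME: 8 EF members; type MMMA; initial conditions: independent analyses; forecast models: 8 “native” large-scale models; forecast cycle: 00Z, 12Z; domain: 36km
Proposed:
- ACNE: 9 EF members; type hybrid? (MMMA, PMMA); initial conditions: 8 independent analyses, 1 centroid; forecast models: 9 MM5 variations; forecast cycle: 00Z, 12Z; domain: 36km, 12km
Abbreviations:
ACNE: Analysis-Centroid Nudged Ensemble
SMMA: Single Model Multi-Analysis
PMMA: Perturbed-model Multi-Analysis
MMMA: Multi-model Multi-Analysis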
42
?
43
Skill Score (SS) Details
Brier Score
  n: number of data pairs
  FP_i: forecast probability {0.0…1.0}
  ORF_i: observation {1.0 = yes, 0.0 = no}
Decomposed Brier Score (uses binned FP as in rel. diag.), with terms for (reliability), (resolution), and (uncertainty)
  M: number of probability bins (normally 11)
  N: number of data pairs in the bin
  FP*_i: binned forecast probability {0.0, 0.1,…1.0}
  ORF*_i: observation for the bin {1.0 = yes, 0.0 = no}
  SC: sample climatology (total occurrences / total forecasts)
Brier Skill Score
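The scores named on this slide can be sketched with the standard textbook definitions, since the slide's own equations are not preserved in the transcript. The sketch assumes observations are coded 1.0 when the event occurred and 0.0 otherwise, uses sample climatology as the reference forecast for the skill score, and bins forecast probabilities to the nearest of 11 values. All function names and sample values are hypothetical.

```python
from collections import defaultdict
from typing import Sequence, Tuple


def brier_score(fp: Sequence[float], obs: Sequence[float]) -> float:
    """Mean squared difference between forecast probability and outcome."""
    return sum((p - o) ** 2 for p, o in zip(fp, obs)) / len(fp)


def brier_decomposition(fp: Sequence[float], obs: Sequence[float],
                        n_bins: int = 11) -> Tuple[float, float, float]:
    """Return (reliability, resolution, uncertainty) using binned FP."""
    n = len(fp)
    sc = sum(obs) / n  # sample climatology
    bins = defaultdict(list)
    for p, o in zip(fp, obs):
        bins[round(p * (n_bins - 1))].append(o)
    reliability = 0.0
    resolution = 0.0
    for k, outcomes in bins.items():
        fp_bin = k / (n_bins - 1)                # binned forecast probability
        orf_bin = sum(outcomes) / len(outcomes)  # observed relative frequency
        reliability += len(outcomes) * (fp_bin - orf_bin) ** 2
        resolution += len(outcomes) * (orf_bin - sc) ** 2
    return reliability / n, resolution / n, sc * (1.0 - sc)


def brier_skill_score(fp: Sequence[float], obs: Sequence[float]) -> float:
    """Skill relative to always forecasting the sample climatology."""
    reliability, resolution, uncertainty = brier_decomposition(fp, obs)
    if uncertainty == 0.0:
        raise ValueError("event always or never occurred; skill is undefined")
    return (resolution - reliability) / uncertainty


# Hypothetical forecasts already on the 0.0, 0.1, ..., 1.0 bins.
sample_fp = [0.1, 0.1, 0.9, 0.9]
sample_obs = [0.0, 1.0, 1.0, 1.0]
rel, res, unc = brier_decomposition(sample_fp, sample_obs)
```

When the forecast probabilities already sit on the bin values, reliability − resolution + uncertainty reproduces the Brier score exactly; otherwise binning introduces a small difference.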
44
Ideal Calculation of Forecast Probability (FP)
Given a very large ensemble, a PDF could be found at a grid point for any parameter (e.g., wind speed, Ws). For a certain threshold, say Ws > 20kt, the FP is then simply the area under the PDF to the right of the threshold (1 − p value).
[Figure: PDF of Wind Speed (kt) against Frequency, with FP = 77.1%.]
Unfortunately, we work with very small ensembles, so we can’t make a good estimate of the PDF. Plus, we often do not even know what PDF shape to fit. So we are forced to estimate FP by other means, for a set of Ws forecasts at a point such as:
Ws = {16.5 21.1 27.3 29.3 33.4 37.4 40.2 47.8}
Note: These are random draws from the PDF above.
45
Democratic Voting FP
FP = 7/8 = 87.5%
“Pushes” FP towards the extreme values, so high FP is normally over-forecast and low FP is normally under-forecast.
Uniform Ranks FP
FP = 7/9 + [ (21.1 – 20.0) / (21.1 – 16.5) ] * 1/9 = 80.4%
A continuous, more appropriate approximation.
[Figure: the sorted members 16.5, 21.1, 27.3, 29.3, 33.4, 37.4, 40.2, 47.8 with the threshold at 20.0; Democratic Voting FP steps from 8/8 down to 0/8, Uniform Ranks FP from 9/9 down to 0/9.]
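Both estimates can be reproduced with a short sketch using the eight wind-speed members and the 20 kt threshold from the slides. The function names are hypothetical, and the uniform ranks routine handles only thresholds that fall between the lowest and highest members; a threshold in an extreme rank needs a tail fit instead.

```python
from typing import Sequence


def democratic_voting_fp(members: Sequence[float], threshold: float) -> float:
    """Fraction of members that exceed the threshold."""
    return sum(1 for m in members if m > threshold) / len(members)


def uniform_ranks_fp(members: Sequence[float], threshold: float) -> float:
    """P(value > threshold), giving each of the n + 1 ranks probability 1/(n + 1).

    Covers interior thresholds only (between the lowest and highest members).
    """
    x = sorted(members)
    n = len(x)
    if n < 2 or not x[0] <= threshold <= x[-1]:
        raise ValueError("threshold falls in an extreme rank; a tail fit is needed")
    # The threshold lies in rank k, bounded by x[k - 1] and x[k].
    k = next(i for i in range(1, n) if threshold <= x[i])
    width = x[k] - x[k - 1]
    fraction = (x[k] - threshold) / width if width > 0 else 0.0
    ranks_above = n - k  # ranks lying entirely above the threshold
    return (ranks_above + fraction) / (n + 1)


ws = [16.5, 21.1, 27.3, 29.3, 33.4, 37.4, 40.2, 47.8]
dv = democratic_voting_fp(ws, 20.0)  # 7/8
ur = uniform_ranks_fp(ws, 20.0)      # 7/9 + [(21.1 - 20.0) / (21.1 - 16.5)] * 1/9
```

Democratic voting can only return multiples of 1/8 for this ensemble, while uniform ranks varies continuously as the threshold moves between members.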
46
FP When Threshold Falls in an Extreme Rank
Use the tail of a Gumbel PDF to approximate the fraction for the last rank.
FP = [ (1 – G_CDF(50.0)) / (1 – G_CDF(47.8)) ] * 1/9 = 8.5%
The bracketed ratio is the fraction a / b marked in the figure.
[Figure: Uniform Ranks FP from 9/9 down to 0/9 over the sorted members 16.5, 21.1, 27.3, 29.3, 33.4, 37.4, 40.2, 47.8, with the threshold at 50.0 in the last rank.]
47
Calibration by Weighted Ranks
Use the verification rank histogram from past cases to define non-uniform, “weighted ranks”. The ranks to sum up and the fraction of the rank where the threshold falls are found the same way as with uniform ranks, but now the probability within each rank is the chance that truth will occur there.
FP = [ (1 – CDF(50.0)) / (1 – CDF(47.8)) ] * 0.17 = 13.0%
[Figure: Weighted Ranks FP levels 1.0, 0.83, 0.72, 0.62, 0.54, 0.45, 0.36, 0.27, 0.17, 0.0 over the sorted members 16.5, 21.1, 27.3, 29.3, 33.4, 37.4, 40.2, 47.8, with the threshold at 50.0 in the last rank.]
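A sketch of weighted ranks for a threshold between the lowest and highest members follows. It reads the figure's levels (1.0, 0.83, ..., 0.17, 0.0) as exceedance probabilities at each rank boundary, so that each rank's weight is the difference between neighboring levels; that reading, the function name, and the choice of a 20 kt threshold are assumptions for illustration. The slide's 50 kt example needs a fitted tail CDF whose parameters are not given, so it is not reproduced here.

```python
from typing import Sequence


def weighted_ranks_fp(members: Sequence[float], rank_probs: Sequence[float],
                      threshold: float) -> float:
    """P(value > threshold) with a separate probability for each rank.

    rank_probs[j] is the chance that truth falls in rank j (0 = below all
    members, n = above all members). Interior thresholds only.
    """
    x = sorted(members)
    n = len(x)
    if len(rank_probs) != n + 1:
        raise ValueError("need one probability per rank (n + 1 values)")
    if n < 2 or not x[0] <= threshold <= x[-1]:
        raise ValueError("threshold falls in an extreme rank; a tail fit is needed")
    # The threshold lies in rank k, bounded by x[k - 1] and x[k].
    k = next(i for i in range(1, n) if threshold <= x[i])
    width = x[k] - x[k - 1]
    fraction = (x[k] - threshold) / width if width > 0 else 0.0
    return sum(rank_probs[k + 1:]) + fraction * rank_probs[k]


ws = [16.5, 21.1, 27.3, 29.3, 33.4, 37.4, 40.2, 47.8]
# Exceedance levels as read from the figure; rank weights are their differences.
levels = [1.0, 0.83, 0.72, 0.62, 0.54, 0.45, 0.36, 0.27, 0.17, 0.0]
weights = [levels[j] - levels[j + 1] for j in range(len(levels) - 1)]
fp_weighted = weighted_ranks_fp(ws, weights, 20.0)
fp_uniform = weighted_ranks_fp(ws, [1 / 9] * 9, 20.0)
```

With equal weights of 1/9 the routine reduces to the uniform ranks method, which makes the two calibrations easy to compare on the same members.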
48
Uniform Ranks vs. Democratic Voting
[Figure: curves for UR and DV, with Sample Climatology and Skill Zone marked.]
Data Info:
- P(MSLP < 1002mb)
- Verification: centroid analysis
- 70 forecasts (Nov 25, 2002 – Feb 7, 2003)
- Applied 2-week, running bias correction
- 36km, Outer Domain
- Lead time = 48h