Presentation transcript:

Verifying cloud and boundary-layer forecasts
Robin Hogan, Ewan O'Connor, Natalie Harvey, Thorwald Stein, Anthony Illingworth, Julien Delanoë, Helen Dacre, Helene Garcon (University of Reading, UK); Chris Ferro, Ian Jolliffe, David Stephenson (University of Exeter, UK)

How skilful is a forecast?
Most model evaluations of clouds test the cloud climatology, but what about individual forecasts? The standard measure, the anomaly correlation of 500-hPa geopotential, shows an ECMWF forecast half-life of ~6 days in 1980 and ~9 days in 2000. But this measure is virtually insensitive to clouds!
[Figure: ECMWF 500-hPa geopotential anomaly correlation versus forecast lead time]

Cloud has smaller-scale variations than geopotential height because it is separated from it by around two orders of differentiation:
–Cloud ~ vertical wind ~ relative vorticity ~ ∇²(streamfunction) ~ ∇²(pressure)
–This suggests cloud observations would be a more stringent test of models
[Figure: geopotential height anomaly versus vertical velocity]

Overview
–Desirable properties of verification measures (skill scores): usefulness for rare events; equitability: is the Equitable Threat Score equitable?
–Testing the skill of cloud forecasts from seven models: what is the half-life of a cloud forecast?
–Testing the skill of cloud forecasts from space: which cloud types are best forecast and which types worst?
–Testing the skill of boundary-layer type forecasts: new diagnosis from Doppler lidar

[Figure: cloud-fraction time-height sections from Chilbolton observations and from the Met Office mesoscale model, ECMWF global model, Météo-France ARPEGE model, KNMI RACMO model and Swedish RCA model]

Joint PDFs of cloud fraction
–Raw (1-hr) resolution: 1 year from Murgtal, DWD COSMO model; also shown with 6-hr averaging
–…or use a simple contingency table

Contingency tables
DWD model, Murgtal:

                   Observed cloud           Observed clear-sky
Model cloud        a (cloud hit) = 7194     b (false alarm) = 4098
Model clear-sky    c (miss) = 4502          d (clear-sky hit) = …

For a given set of observed events, there are only 2 degrees of freedom in all possible forecasts (e.g. a and b), because 2 quantities are fixed:
–Number of events that occurred: n = a + b + c + d
–Base rate (observed frequency of occurrence): p = (a + c)/n
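As a concrete reference, here is a minimal Python sketch (the function name is my own) of how these counts are computed from paired boolean series of observed and forecast cloud occurrence:

```python
import numpy as np

def contingency_table(obs, fcst):
    """Return (a, b, c, d) = (hits, false alarms, misses, clear-sky hits)."""
    obs = np.asarray(obs, dtype=bool)
    fcst = np.asarray(fcst, dtype=bool)
    a = int(np.sum(fcst & obs))      # cloud hit
    b = int(np.sum(fcst & ~obs))     # false alarm
    c = int(np.sum(~fcst & obs))     # miss
    d = int(np.sum(~fcst & ~obs))    # clear-sky hit
    return a, b, c, d

a, b, c, d = contingency_table([1, 1, 0, 0], [1, 0, 1, 0])
n = a + b + c + d            # number of events
p = (a + c) / n              # base rate
```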

Desirable properties of verification measures (skill scores):
1. Equitable: all random forecasts receive an expected score of zero; constant forecasts of occurrence or non-occurrence also score zero
2. Difficult to hedge: some measures reward under- or over-prediction
3. Useful for rare events: almost all widely used measures are degenerate in that they asymptote to 0 or 1 for vanishingly rare events
4. Linear: so that an inverse exponential can be fitted to estimate half-life
5. Useful for overwhelmingly common events…
6. Base-rate independent…
7. Bounded…
For a full discussion see Hogan and Mason, Chapter 3 of Forecast Verification, 2nd edition.

Skill versus cloud-fraction threshold
Consider 7 models evaluated over 3 European sites, using two equitable measures: the Heidke Skill Score (HSS) and the Log of Odds Ratio (LOR).
–LOR implies skill increases for larger cloud-fraction threshold
–HSS implies skill decreases significantly for larger cloud-fraction threshold
[Figure: LOR and HSS versus cloud-fraction threshold for each model]

Extreme dependency scores
Stephenson et al. (2008) explained this behaviour: almost all scores have a meaningless limit as base rate p → 0; HSS tends to zero and LOR tends to infinity. They solved this with their Extreme Dependency Score (EDS), but EDS is inequitable and easy to hedge: just forecast clouds all the time. Hogan et al. (2009) proposed the symmetric version SEDS, and Ferro and Stephenson (2011) proposed the Symmetric Extremal Dependence Index (SEDI), written in terms of the hit rate H = a/(a+c) and false-alarm rate F = b/(b+d); SEDI is robust for rare and overwhelmingly common events. All three are written out in the sketch below.
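A sketch of the three scores as defined in the cited papers, written directly in terms of the contingency-table counts (the formulas are stated in the comments):

```python
import numpy as np

def eds(a, b, c, d):
    # Extreme Dependency Score (Stephenson et al. 2008):
    # EDS = 2 ln[(a+c)/n] / ln[a/n] - 1
    n = a + b + c + d
    return 2.0 * np.log((a + c) / n) / np.log(a / n) - 1.0

def seds(a, b, c, d):
    # Symmetric Extreme Dependency Score (Hogan et al. 2009):
    # SEDS = ln[(a+b)(a+c)/n^2] / ln[a/n] - 1
    n = a + b + c + d
    return np.log((a + b) * (a + c) / n**2) / np.log(a / n) - 1.0

def sedi(a, b, c, d):
    # Symmetric Extremal Dependence Index (Ferro and Stephenson 2011):
    # SEDI = [ln F - ln H + ln(1-H) - ln(1-F)] / [ln F + ln H + ln(1-F) + ln(1-H)]
    H = a / (a + c)                  # hit rate
    F = b / (b + d)                  # false-alarm rate
    num = np.log(F) - np.log(H) + np.log(1 - H) - np.log(1 - F)
    den = np.log(F) + np.log(H) + np.log(1 - F) + np.log(1 - H)
    return num / den

# Perfect forecasts score 1; random forecasts have expected score near 0.
# Counts from the earlier slide; d was not recoverable, so it is invented here.
print(seds(7194, 4098, 4502, 50000))
```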

Skill versus cloud-fraction threshold
SEDS has much flatter behaviour for all models (except the Met Office, which significantly underestimates high-cloud occurrence).
[Figure: LOR, HSS and SEDS versus cloud-fraction threshold]

Skill versus height
Verification using SEDS reveals:
–Skill tends to decrease slowly at the tropopause
–Mid-level clouds (4-5 km) are the most skilfully predicted, particularly by the Met Office
–Boundary-layer clouds are the least skilfully predicted

Asymptotic equitability
For some measures, the expected score of a random forecast only tends to zero for a large number of samples: these are asymptotically equitable.
–The Equitable Threat Score is slightly inequitable for n < 30, so we should call it the Gilbert Skill Score
–ORSS, EDS, SEDS and SEDI approach zero much more slowly with n: for events that occur 2% of the time, n > 25,000 is needed before the magnitude of the expected score is less than 0.01
–Hogan et al. (2010) showed that inequitable measures can be scaled to make them equitable, but this is a tricky numerical operation
–Alternatively, be sure the sample size is large enough, and report confidence intervals on verification measures

League table

Measure                                              Truly      Asymptotically  Linear or      Useful for   Useful for overwhelmingly
                                                     equitable  equitable       nearly linear  rare events  common events
Equitably transformed SEDI (tricky to implement)     Y          Y               Y              Y            Y
Symmetric Extremal Dependence Index (SEDI)           N          Y               Y              Y            Y
Symmetric Extreme Dependency Score (SEDS)            N          Y               Y              Y            N
Peirce Skill Score (PSS) / Heidke Skill Score (HSS)  Y          Y               Y              N            N
Log of Odds Ratio (LOR)                              N          Y               Y              N            N
Odds Ratio Skill Score (ORSS) / Yule's Q             N          Y               N              N            N
Gilbert Skill Score (GSS, formerly ETS)              N          Y               N              N            N
Extreme Dependency Score (EDS)                       N          N               Y              Y            N
Hit Rate (H) / False Alarm Rate (FAR)                N          N               Y              N            N
Critical Success Index (CSI)                         N          N               N              N            N

Forecast half-life
Fit an inverse exponential, S(t) = S0 · 2^(-t/τ1/2), where S0 is the initial score and τ1/2 is the half-life. A noticeably longer half-life is fitted after 36 hours; the same was found for Met Office rainfall forecasts (Roberts 2008):
–The first timescale is due to data assimilation and convective events
–The second is due to more predictable large-scale weather systems
[Figure: skill versus lead time, with fitted half-lives of 2.4-4.3 days across the models, including the Met Office and DWD]
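A sketch of this fit, assuming the inverse-exponential form stated above; the lead times and scores below are invented purely to illustrate the procedure:

```python
import numpy as np
from scipy.optimize import curve_fit

def inverse_exponential(t, s0, half_life):
    # S(t) = S0 * 2^(-t/half_life): the score halves every half_life days
    return s0 * 2.0 ** (-t / half_life)

t_days = np.array([0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 5.0])        # forecast lead time
score = np.array([0.58, 0.52, 0.47, 0.42, 0.33, 0.27, 0.21])  # e.g. SEDS

(s0, half_life), _ = curve_fit(inverse_exponential, t_days, score, p0=(0.6, 3.0))
print(f"S0 = {s0:.2f}, half-life = {half_life:.1f} days")
```

In practice one would fit separately before and after 36 hours to capture the two timescales noted above.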

A-train verification: July 2006
–Both models underestimate mid- and low-level clouds (partly a snow issue at ECMWF)
–GSS and LOR are misleading: skill increases or decreases with cloud fraction
–SEDS and SEDI are much more robust
–Highest skill: winter upper-troposphere mid-latitudes; lowest skill: tropical and sub-tropical boundary-layer clouds; tropical deep convection somewhere in between!

How is the boundary layer modelled? Met Office model has explicit boundary-layer types (Lock et al. 2000)

Doppler-lidar retrieval of boundary-layer type
Usually the most probable type has a probability greater than 0.9. Now apply this to two years of data and evaluate the type in the Met Office model. Harvey, Hogan and Dacre (2012)
[Figure: time series of most probable boundary-layer type, e.g. Ib: stratus; II: stratocumulus over stable surface layer; IIIb: stratocumulus-topped mixed layer]

Forecast skill
[Figure: skill of the boundary-layer type forecasts, with the expected score of a random forecast marked]

Forecast skill: stability
Surface layer stable?
–Model very skilful (but basically predicting day versus night)
–Better than persistence (predicting yesterday's observations)
[Figure: contingency-table counts and skill relative to a random forecast]

Forecast skill: cumulus
Cumulus present (given the surface layer is unstable)?
–Much less skilful than in predicting stability
–Significantly better than persistence
[Figure: contingency-table counts and skill relative to a random forecast]

Forecast skill: decoupled
Decoupled (as opposed to well-mixed)?
–Not significantly more skilful than a persistence forecast
[Figure: contingency-table counts and skill relative to a random forecast]

Forecast skill: multiple cloud layers
Cumulus under stratocumulus (as opposed to cumulus alone)?
–Not significantly more skilful than a random forecast
–Much poorer than cloud-occurrence skill (SEDI)
[Figure: contingency-table counts and skill relative to a random forecast]

Take-home messages
–Pressure is too easy to forecast; verify with clouds instead!
–The half-life of cloud forecasts is around 2.5-4 days rather than 9-10 days
–ETS is not strictly equitable: call it the Gilbert Skill Score instead
–But GSS and most others are misleading for rare events; I recommend the Symmetric Extremal Dependence Index
–Global verification shows mid-latitude winter ice clouds have the most skill; tropical boundary-layer clouds have no skill at all!
Relevant publications:
–Cloud-forecast half-life: Hogan, O'Connor & Illingworth (QJ 2009)
–Asymptotic equitability: Hogan, Ferro, Jolliffe & Stephenson (WAF 2010)
–SEDI: Ferro and Stephenson (WAF 2011)
–Comparison of verification measures and calculation of confidence intervals: Hogan and Mason (2nd Ed. of Forecast Verification, 2011)
–Doppler-lidar boundary-layer type: Harvey, Hogan & Dacre (submitted to QJRMS)
–Global verification: Hogan, Stein, Garcon & Delanoë (ERL, in prep)

Cloud fraction in 7 models
Mean and PDF of cloud fraction (0-7 km) for 2004 at Chilbolton, Paris and Cabauw; Illingworth et al. (BAMS 2007)
–All models except DWD underestimate mid-level cloud
–Some have separate radiatively inactive snow (ECMWF, DWD); the Met Office has combined ice and snow but still underestimates cloud fraction
–Wide range of low cloud amounts in the models
–Not enough overcast boxes, particularly in the Met Office model

Skill-bias diagrams (Hogan and Mason 2011)
[Figure: skill-bias diagram for a reality with n = 16, p = 1/4, showing the best possible forecast, regions of positive and negative skill, the random unbiased forecast, and the worst possible forecast; under-prediction and over-prediction lie either side of the no-bias line, with constant forecasts of occurrence and non-occurrence at the extremes]

Skill-bias diagram

Hedging
Issuing a forecast that differs from your true belief in order to improve your score (e.g. Jolliffe 2008). Example: the hit rate H = a/(a+c), the fraction of events correctly forecast, is easily hedged by randomly changing some forecasts of non-occurrence to occurrence, as the sketch below shows.
[Figure: example forecasts with H = 0.5, H = 0.75 and H = 1]
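A small simulation of this effect; the "honest" forecast here is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
obs = rng.random(10000) < 0.25           # observed occurrence, base rate 0.25
fcst = obs ^ (rng.random(10000) < 0.2)   # honest forecast, right 80% of the time

def hit_rate(obs, fcst):
    return np.sum(fcst & obs) / np.sum(obs)

# Hedge: flip a random half of the non-occurrence forecasts to occurrence.
hedged = fcst | (rng.random(10000) < 0.5)
print(hit_rate(obs, fcst), hit_rate(obs, hedged))   # H rises with no added skill
```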

Some reportedly equitable measures
HSS = [x - E(x)] / [n - E(x)], where x = a + d
ETS = [a - E(a)] / [a + b + c - E(a)]
LOR = ln[ad/bc]
ORSS = [ad/bc - 1] / [ad/bc + 1]
where E(a) = (a+b)(a+c)/n is the expected value of a for an unbiased random forecasting system. Random and constant forecasts all score zero, so these measures are all equitable, right? Simple attempts to hedge will fail for all these measures.
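The same four measures in code, following the definitions above; for HSS, E(x) is the sum of the expected values of a and d for an unbiased random forecast:

```python
import numpy as np

def hss(a, b, c, d):
    # HSS = [x - E(x)] / [n - E(x)], with x = a + d
    n = a + b + c + d
    x = a + d
    ex = ((a + b) * (a + c) + (c + d) * (b + d)) / n   # E(x) = E(a) + E(d)
    return (x - ex) / (n - ex)

def ets(a, b, c, d):
    # ETS = [a - E(a)] / [a + b + c - E(a)], with E(a) = (a+b)(a+c)/n
    ea = (a + b) * (a + c) / (a + b + c + d)
    return (a - ea) / (a + b + c - ea)

def lor(a, b, c, d):
    # LOR = ln[ad/bc]
    return np.log(a * d / (b * c))

def orss(a, b, c, d):
    # ORSS (Yule's Q) = [ad/bc - 1] / [ad/bc + 1]
    theta = a * d / (b * c)
    return (theta - 1.0) / (theta + 1.0)
```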

Extreme dependency score
Stephenson et al. (2008) explained this behaviour: almost all scores have a meaningless limit as base rate p → 0 (HSS tends to zero and LOR tends to infinity). They proposed the Extreme Dependency Score,
EDS = 2 ln[(a+c)/n] / ln[a/n] - 1,
where n = a + b + c + d. It can be shown that this score tends to a meaningful limit:
–Rewrite in terms of hit rate H = a/(a+c) and base rate p = (a+c)/n: EDS = 2 ln(p) / ln(pH) - 1
–Then assume a power-law dependence of H on p as p → 0: H ∝ p^β
–In the limit p → 0 we find EDS → (1 - β)/(1 + β)
–This is useful because random forecasts have a hit rate converging to zero at the same rate as the base rate: β = 1, so EDS = 0
–Perfect forecasts have a hit rate that is constant with base rate: β = 0, so EDS = 1
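A quick numerical check of this limit, with arbitrary values of β and of the power-law prefactor:

```python
import numpy as np

beta, lam = 0.5, 0.8                      # assume H = lam * p**beta as p -> 0
for p in [1e-3, 1e-6, 1e-9]:
    H = lam * p**beta
    eds = 2 * np.log(p) / np.log(p * H) - 1
    print(f"p={p:.0e}  EDS={eds:.3f}  limit={(1 - beta) / (1 + beta):.3f}")
```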

Extreme dependence scores
–Extreme Dependency Score (EDS): Stephenson et al. (2008); inequitable and easy to hedge
–Symmetric Extreme Dependency Score (SEDS): Hogan et al. (2009); asymptotically equitable and difficult to hedge
–Symmetric Extremal Dependence Index (SEDI): Ferro and Stephenson (2011); base-rate independent and robust for both rare and overwhelmingly common events

Which measures are equitable?
A random forecasting system may score zero at the expected values of a-d, i.e. S[E(a), E(b), E(c), E(d)] = 0, but the expected score may still not be zero: E[S(a,b,c,d)] = Σ P(a,b,c,d) S(a,b,c,d), summed over all possible contingency tables. The width of the random-forecast probability distribution decreases for larger sample size n, and a measure is only equitable if positive and negative scores cancel; a Monte Carlo sketch follows below.
[Figure: distributions of scores for random forecasts with n = 16 and n = 80; ETS and ORSS are asymmetric]
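A Monte Carlo sketch of this expectation, using the Gilbert Skill Score as the example; the base rate and number of trials are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(1)

def gss(a, b, c, d):
    # Gilbert Skill Score (widely mislabelled "ETS")
    ea = (a + b) * (a + c) / (a + b + c + d)
    denom = a + b + c - ea
    return np.nan if denom == 0 else (a - ea) / denom

def expected_random_score(score, n, p=0.25, trials=20000):
    """Monte Carlo estimate of E[S] for random forecasts of base-rate-p events."""
    vals = []
    for _ in range(trials):
        obs = rng.random(n) < p
        fcst = rng.random(n) < p          # random forecast, independent of obs
        a = np.sum(fcst & obs); b = np.sum(fcst & ~obs)
        c = np.sum(~fcst & obs); d = np.sum(~fcst & ~obs)
        s = score(a, b, c, d)
        if np.isfinite(s):
            vals.append(s)
    return np.mean(vals)

print(expected_random_score(gss, 16))   # noticeably non-zero at small n
print(expected_random_score(gss, 80))   # closer to zero: asymptotic equitability
```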

Possible solutions
1. Ensure n is large enough that E(a) > 10
2. Inequitable scores can be scaled to make them equitable; this opens the way to a new class of non-linear equitable measures
3. Report confidence intervals and p-values (the probability of a score being achieved by chance), e.g. via the bootstrap sketched below
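A sketch of point 3, assuming a percentile bootstrap over paired observation/forecast series and a score function such as the SEDI defined earlier:

```python
import numpy as np

rng = np.random.default_rng(2)

def bootstrap_ci(obs, fcst, score, n_boot=1000, alpha=0.05):
    """Percentile-bootstrap confidence interval on a verification measure."""
    obs = np.asarray(obs, dtype=bool)
    fcst = np.asarray(fcst, dtype=bool)
    n = len(obs)
    samples = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)           # resample times with replacement
        o, f = obs[idx], fcst[idx]
        a = np.sum(f & o); b = np.sum(f & ~o)
        c = np.sum(~f & o); d = np.sum(~f & ~o)
        if min(a, b, c, d) > 0:               # skip tables where score undefined
            samples.append(score(a, b, c, d))
    return np.percentile(samples, [100 * alpha / 2, 100 * (1 - alpha / 2)])

# Usage: lo, hi = bootstrap_ci(obs, fcst, sedi)
```

For autocorrelated time series a block bootstrap would be more appropriate than resampling individual times.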

Key properties for estimating half-life
We wish to model the score S versus forecast lead time t as S(t) = S0 · 2^(-t/τ1/2), where τ1/2 is the forecast half-life. We need:
–Linearity: some measures saturate at the high-skill end (e.g. Yule's Q / ORSS), leading to a misleadingly long half-life
–…and equitability: the formula above assumes the score tends to zero for very long forecasts, which only occurs if the measure is equitable

Why is half-life less for clouds than for pressure? Different spatial scales? Convection?
–Average temporally before calculating skill scores: absolute score and half-life both increase with the number of hours averaged

Forecast skill: nocturnal stratocumulus
Stratocumulus present (given a stable surface layer)?
–Marginally more skilful than a persistence forecast
–Much poorer than cloud-occurrence skill (SEDI)
[Figure: contingency-table counts and skill relative to a random forecast]