Presentation transcript: User-Focused Verification, Barbara Brown, NCAR, July 2006

1 User-Focused Verification
Barbara Brown, NCAR, July 2006
bgb@ucar.edu

2 Concepts of user-focused verification
Barbara Brown, NCAR; bgb@ucar.edu
Purposes of verification (Brier and Allen, 1951):
– Administrative
– Scientific
– Economic

3 Concepts of user-focused verification
Purposes of verification (Brier and Allen, 1951):
– Administrative
– Scientific
– Economic
Postulate:
– Most verification to date serves only the first purpose (administrative)
– This is especially true for verification of operational systems

4 Historical perspective: The Finley example
U.S. Army (Signal Service/Corps), 1877 to ~1920
Tornado predictions, 1884-1885:
– Two 8-h outlooks per day
– Spotter reports (~1,000 reporters)
– 18 districts in the eastern U.S.; 4 parts in each
John Park Finley, 1854-1943 (from Galway, 1985; BAMS)
[Photo: tornado at Lebanon, KS, 1902]

5 The Finley example
Finley's forecasts were 96.6% accurate.
"Accuracy" if no tornado forecasts had been issued at all: 98.2%

              Obs. Yes   Obs. No    Sum
Fcst. Yes          28        72     100
Fcst. No           23      2680    2703
Sum                51      2752    2803
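
The percent-correct figures above are easy to reproduce from the 2x2 table. The short Python sketch below is my own illustration (names and layout are not from the presentation); it shows how the "never forecast a tornado" strategy ends up looking more accurate than Finley's forecasts.

```python
# Minimal sketch: percent-correct "accuracy" from Finley's 2x2 tornado table.
hits, false_alarms = 28, 72           # forecast "yes": observed yes / observed no
misses, correct_negatives = 23, 2680  # forecast "no":  observed yes / observed no
n = hits + false_alarms + misses + correct_negatives   # 2803 forecasts

finley_accuracy = (hits + correct_negatives) / n        # ~0.966
# If no tornado is ever forecast, every observed "no" becomes a correct negative.
never_accuracy = (false_alarms + correct_negatives) / n # ~0.982

print(f"Finley: {finley_accuracy:.1%}, never forecast: {never_accuracy:.1%}")
```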

6 The Finley example: Outcomes
The first (?) scientific discussion of verification
Numerous verification measures developed (e.g., Equitable Threat Score, Heidke skill score)
Many issues raised:
– Definition of the forecast "event"
– Quality of observations
– Baselines of no skill
– "Dimensionality" of the verification problem
– Specifying the purpose of verification
– Use and value of forecasts
– Asymmetric costs of misclassification
Source: Murphy, 1996 (WAF, 11, 3-20)
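
As a rough illustration of how the chance-corrected measures named above address the Finley paradox, here is a hedged sketch (my own code, applied to the contingency-table counts from the previous slide) of the Heidke skill score and the Equitable Threat Score; both come out far below the 96.6% percent-correct figure.

```python
# Sketch: chance-corrected skill scores for the Finley table
# (a = hits, b = false alarms, c = misses, d = correct negatives).
a, b, c, d = 28, 72, 23, 2680
n = a + b + c + d

# Heidke skill score: improvement in fraction correct over random forecasts
# (0 = no skill, 1 = perfect).
hss = 2 * (a * d - b * c) / ((a + c) * (c + d) + (a + b) * (b + d))  # roughly 0.36

# Equitable Threat Score (Gilbert skill score): threat score with the hits
# expected by chance removed.
a_random = (a + b) * (a + c) / n
ets = (a - a_random) / (a + b + c - a_random)                        # roughly 0.22

print(f"HSS = {hss:.2f}, ETS = {ets:.2f}")
```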

7 Current situation
Not much has changed…
– Measures-based approaches are applied in practice
– Operational verification focuses on "management" needs and model-centric applications of verification (i.e., not on diagnostic or user-focused approaches)
Focus is on:
– A few traditional measures
– Aggregated statistics
– A few parameters (e.g., 500 mb height, T, PoP)

8 Current situation (cont.)
Model verification "drives" choices in model parameterizations, development, etc.
– Example: verifying models with RMSE or anomaly correlation applied to 500 mb heights leads to particular choices in model development and evolution (which may, or may not, be intended)
Uncertainty in verification measures is rarely estimated
Forecast use and value are rarely considered

9 Uncertainty in verification measures
Model precipitation example: Equitable Threat Score (ETS)
Confidence intervals take into account various sources of error, including sampling and observational error
Computing confidence intervals for verification statistics is not always straightforward
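
One common way to attach a confidence interval to a score such as ETS is to bootstrap over forecast-observation pairs. The sketch below is illustrative only: the data structure and resampling scheme are my assumptions, and it captures sampling uncertainty alone; observational error, as the slide notes, needs separate treatment.

```python
# Hedged sketch: percentile-bootstrap confidence interval for ETS,
# assuming a list of per-case (forecast_yes, observed_yes) boolean pairs.
import random

def ets(pairs):
    a = sum(f and o for f, o in pairs)         # hits
    b = sum(f and not o for f, o in pairs)     # false alarms
    c = sum((not f) and o for f, o in pairs)   # misses
    a_random = (a + b) * (a + c) / len(pairs)  # hits expected by chance
    denom = a + b + c - a_random
    return (a - a_random) / denom if denom else 0.0

def bootstrap_ci(pairs, n_boot=1000, alpha=0.05):
    scores = sorted(
        ets([random.choice(pairs) for _ in range(len(pairs))])
        for _ in range(n_boot)
    )
    return scores[int(alpha / 2 * n_boot)], scores[int((1 - alpha / 2) * n_boot) - 1]

# Example use: pairs = [(True, True), (True, False), (False, False), ...]
# low, high = bootstrap_ci(pairs)
```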

10 Forecast Evaluation: Quality vs. Value
[Diagram: Forecast EVALUATION encompasses forecast QUALITY (verification) and forecast VALUE (user studies; impacts)]

11 What's wrong with the traditional, measures-oriented approach?
Traditional verification measures (e.g., RMSE, CSI, ETS) provide overall monitoring of forecast performance, but they:
– Measure only limited attributes of forecast quality
– Tend to reward "smooth" forecasts
– Do not provide information about what went wrong with a forecast (they only say that it was wrong)
– Cannot diagnose how the forecast can be "fixed" or feed into the forecast development process
– Are not "informative" to users

12 Challenges and issues: Traditional verification approaches
[Schematic: five pairs of forecast (F) and observed (O) areas]
The first four forecasts have POD = 0, FAR = 1, CSI = 0
– i.e., all are equally "bad"
The fifth forecast has POD > 0 and FAR < 1
The traditional verification approach identifies the "worst" forecast as the "best"
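
The "double penalty" behind this slide is easy to reproduce on toy grids. In the hedged sketch below (the fields and their placement are invented for illustration), a forecast of the right size in the wrong place scores POD = 0, FAR = 1, CSI = 0, while a forecast of rain everywhere scores "better" on all three.

```python
# Illustrative sketch of the double penalty with categorical scores on toy grids.
import numpy as np

obs = np.zeros((10, 10), dtype=bool)
obs[2:5, 2:5] = True                    # observed rain area

displaced = np.zeros_like(obs)
displaced[2:5, 6:9] = True              # right size and shape, wrong location

everywhere = np.ones_like(obs)          # forecast rain over the whole domain

def pod_far_csi(fcst, obs):
    hits = np.sum(fcst & obs)
    misses = np.sum(~fcst & obs)
    false_alarms = np.sum(fcst & ~obs)
    return (hits / (hits + misses),
            false_alarms / (hits + false_alarms),
            hits / (hits + misses + false_alarms))

print(pod_far_csi(displaced, obs))    # (0.0, 1.0, 0.0)   -> judged "worst"
print(pod_far_csi(everywhere, obs))   # (1.0, 0.91, 0.09) -> judged "best"
```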

13 High vs. low resolution
Which rain forecast would you rather use?
[Maps: mesoscale model (5 km) and global model (100 km) forecasts vs. observed 24-h rain, Sydney, 21 Mar 2004; RMS = 13.0 and 4.6 respectively. From E. Ebert]
"Smooth" forecasts generally "win" according to traditional verification approaches.
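
The same effect shows up with continuous scores. In the toy sketch below (my own one-dimensional example, not the Sydney case), a sharp rain band that is misplaced gets a worse RMSE than a bland, low-amplitude forecast.

```python
# Sketch: why smooth forecasts "win" on RMSE.
import numpy as np

obs = np.zeros(100)
obs[40:50] = 10.0                 # observed 10 mm rain band

sharp = np.zeros(100)
sharp[52:62] = 10.0               # same band, displaced

smooth = np.full(100, 1.0)        # featureless 1 mm everywhere

rmse = lambda f, o: np.sqrt(np.mean((f - o) ** 2))
print(rmse(sharp, obs), rmse(smooth, obs))   # ~4.5 vs ~3.0: smooth scores better
```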

14 Why do users need verification information?
– To improve forecasts
– To determine whether to use a forecast or forecasting system
– For heeding or ignoring warnings
– For interpreting forecasts ("What does a forecast of 32 really mean?")
– As input to decisions and/or decision-support systems (with economic and forecast-value implications)

15 User-focused verification: Good forecast or bad forecast?
[Map: forecast (F) and observed (O) precipitation areas]

16 User-focused verification: Good forecast or bad forecast?
[Same map: forecast (F) and observed (O) areas]
If I'm a water manager for this watershed, it's a pretty bad forecast…

17 User-focused verification: Good forecast or bad forecast?
[Same map, with a flight route from A to B crossing the forecast and observed areas]
If I'm an aviation traffic strategic planner… it might be a pretty good forecast.
Different users have different ideas about what makes a good forecast.

18 An initial goal: Diagnostic evaluation approaches
Identify and evaluate meaningful attributes of the forecasts
– Example questions: What is the typical location error? Size error? Intensity error?
Provide detailed information about forecast quality
– What went wrong? What went right?
– How can the forecast be improved?
– How do two forecasts differ from each other, and in what ways is one better than the other?

19 Examples of alternative (more user-focused) diagnostic approaches for spatial forecasts
– Scale-separation approaches: how does performance change as the resolution changes?
– Entity-based verification: what are the major contributors to forecast error?
– "Fuzzy" approaches: take observational error and the impact of displacement errors into account (a minimal sketch follows this list)
– Composite approach: evaluate systematic errors
– Object-based verification: examine the forecasts' ability to reproduce particular attributes (e.g., location, shape, intensity)
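
As one concrete flavour of the "fuzzy" idea referenced above, the sketch below (window size, grid, and data layout are my assumptions for illustration) credits the forecast with a hit whenever an event is predicted anywhere within a small neighbourhood of each observed event, so small displacement errors are not counted as total misses.

```python
# Minimal neighbourhood ("fuzzy") hit rate on a 1-D grid of yes/no events.
import numpy as np

def neighbourhood_pod(fcst, obs, window=2):
    hits = 0
    event_points = np.flatnonzero(obs)
    for i in event_points:
        lo, hi = max(0, i - window), i + window + 1
        if fcst[lo:hi].any():          # a forecast event anywhere nearby counts
            hits += 1
    return hits / len(event_points)

obs  = np.array([0, 0, 1, 1, 0, 0, 0, 0, 0, 0], dtype=bool)
fcst = np.array([0, 0, 0, 0, 1, 1, 0, 0, 0, 0], dtype=bool)
print(neighbourhood_pod(fcst, obs))    # 1.0 here, vs. POD = 0 for exact matching
```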

20 Object-based verification example
[Maps: forecast objects Af-Df and observed objects Ao-Do]
Locations: forecast objects are
– Too far north (except B)
– Too far west (except C)
Precipitation intensity:
– Median intensity is too large
– Extreme (0.90 quantile) intensity is too small
Size:
– Forecasts C and D are too small
– Forecast B is somewhat too large
Matching: two small observed objects were not matched
POD = 0.27, FAR = 0.75, CSI = 0.34
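
A minimal sketch of the object-based idea follows; the fields, the 5 mm threshold, and the naive pairing of objects in label order are all assumptions for illustration (this is not the presentation's actual method), but it shows how attributes such as centroid displacement and size can be extracted and compared.

```python
# Sketch: label contiguous "rain objects" and compare simple attributes.
import numpy as np
from scipy import ndimage

def rain_objects(field, threshold=5.0):
    mask = field >= threshold
    labels, n = ndimage.label(mask)
    centroids = ndimage.center_of_mass(mask, labels, list(range(1, n + 1)))
    sizes = ndimage.sum(mask, labels, list(range(1, n + 1)))
    return list(zip(centroids, sizes))

fcst = np.zeros((50, 50)); fcst[10:20, 5:15] = 8.0   # one forecast object
obs  = np.zeros((50, 50)); obs[14:24, 8:18] = 8.0    # observed object, offset from the forecast

for (f_cen, f_size), (o_cen, o_size) in zip(rain_objects(fcst), rain_objects(obs)):
    dy, dx = f_cen[0] - o_cen[0], f_cen[1] - o_cen[1]
    print(f"centroid offset: {dy:+.0f} rows, {dx:+.0f} cols; "
          f"sizes: fcst {f_size:.0f}, obs {o_size:.0f} grid points")
```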

21 Composite verification example
[Maps: average rain (mm) given an event was predicted, and average rain (mm) given an event was observed; forecasts shaded, observations contoured]
From J. Nachamkin. Accepted for public release: 7530-03-70

22 A new paradigm for verification: Levels of user focus
Level 0
– Measures-oriented, aggregated summaries of performance
– One or two traditional statistics (e.g., RMSE, CSI)
– Uncertainty in verification measures not considered
– Uses: Administrative

23 A new paradigm for verification: Levels of user focus
Level 1
– Broad diagnostic approaches applied
– A more complete view of forecast performance
– Distributions of errors presented for meaningful subsets (temporal, spatial)
– Results stratified into relevant categories
– Some uncertainty estimates
– Uses/Users: Administrative, forecast developers, some users

24 A new paradigm for verification: Levels of user focus
Level 2
– Features-based verification applied
– Detailed information about forecast attributes
– Attribute information can be tailored to meet specific types of information needs
– Results stratified into relevant categories
– Uncertainty information provided
– Uses/Users: Administrative, forecast developers, broad range of users

25 A new paradigm for verification: Levels of user focus
Level 3
– Users identify the type of forecast "quality" or "performance" information needed for particular decisions, or as input to a decision-support system
– Verification is tailored to meet the needs of specific users
Level 4
– Economic or cost-loss models, or survey methods, are used to assess the value or benefits of particular forecasts for specific users and applications
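
For Level 4, the simple static cost-loss model is a common starting point. The sketch below is generic and illustrative (the hit rate, false-detection rate, costs, and event frequency are invented, and this is not a result from the presentation); it expresses a forecast system's value as where its expected expense falls between relying on climatology and having a perfect forecast.

```python
# Sketch: static cost-loss model of forecast value for a single user.
def expected_expense(pod, pofd, base_rate, cost, loss):
    """Expected expense per occasion if the user protects whenever an event is forecast."""
    p_protect = pod * base_rate + pofd * (1 - base_rate)   # occasions incurring the protection cost
    p_missed  = (1 - pod) * base_rate                      # unprotected events -> full loss
    return p_protect * cost + p_missed * loss

cost, loss, base_rate = 1.0, 10.0, 0.05     # protection cost, loss if unprotected, event frequency

never_protect  = base_rate * loss
always_protect = cost
climatology    = min(never_protect, always_protect)        # best action without forecasts
perfect        = base_rate * cost                          # protect only when an event occurs
with_forecast  = expected_expense(pod=0.7, pofd=0.1, base_rate=base_rate,
                                  cost=cost, loss=loss)

value = (climatology - with_forecast) / (climatology - perfect)
print(f"relative value ~ {value:.2f}")      # roughly 0.5 for these made-up numbers
```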

26 To summarize…
– What makes a good forecast depends on the user and the decision to be made (corollary: different users need different types of verification information)
– Forecast verification measures are uncertain, and that uncertainty should be estimated and communicated
– Approaches are available (or could be developed) that more appropriately represent potential forecast value and would be useful in the process of estimating value

