An evaluation of observationally based, high-resolution gridded data sets over the continental United States
Ruben Behnke – UMT Missoula, UW Madison
Andrew Allstadt – UW Madison
Jared Oyler – UMT Missoula
Steve Vavrus – UW Madison
Example days: hot day in Madison (7/13/1995, Tmax), cold day in Madison (2/2/1996, Tmax), very wet day in Miami (10/3/2000, precipitation).
Table: observed (TMax_Obs, °C; Precip_Obs, mm) vs. gridded values (°C; mm) for DayMet, JohnA, Livneh, Maurer, NLDAS, TopoWx (CPC).
1st Motivation: Different data sets can provide very different data for the same query. Here are a couple of specific examples of where and when data sets differ from each other and from observations…
2nd Motivation: After being asked to provide climate data for a myriad of research topics, I decided that a formal analysis of which data set to recommend or provide to the end user was needed.
3rd Motivation: A formal analysis is needed to determine what needs to be improved, possible new research paths, etc.
Project Focus and Path
1) There is an increasing number of gridded, daily data sets out there, but which one should be used?
2) Focus on END USERS’ needs (how well gridded data match station data only!): the analysis is not designed to take into account interpolation algorithms, stations included, etc.
3) Consider ‘Observations’ to be actual GHCN/COOP/etc. station data.
4) Focus on ‘extreme’ indices (CLIMDEX), as they are the hardest to model, but BIOCLIM indices and several other measures are also being calculated.
Large scale: ~7000 precipitation and ~5000 temperature stations.
Some results… (from a first study using 119 stations from around the country)
Introductory CLIMDEX figures: these are meant to show the variation among data sets. All examples are for Madison, WI.
CLIMDEX, continued…
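To make the CLIMDEX comparison concrete, here is a minimal Python sketch of four standard CLIMDEX/ETCCDI indices (SU, R10mm, Rx1day, CDD), computed for one year of daily data at a single location. The function names are hypothetical, the thresholds follow the standard index definitions, and the missing-data rules applied by real CLIMDEX software are not reproduced here.

```python
import numpy as np

def summer_days(tmax_c):
    """SU: count of days with daily maximum temperature > 25 °C."""
    tmax = np.asarray(tmax_c, dtype=float)
    return int(np.sum(tmax > 25.0))  # NaN compares False, so missing days are not counted

def wet_days_10mm(precip_mm):
    """R10mm: count of days with daily precipitation >= 10 mm."""
    pr = np.asarray(precip_mm, dtype=float)
    return int(np.sum(pr >= 10.0))

def max_1day_precip(precip_mm):
    """Rx1day: maximum 1-day precipitation amount (mm)."""
    return float(np.max(np.asarray(precip_mm, dtype=float)))

def consecutive_dry_days(precip_mm, wet_threshold=1.0):
    """CDD: longest run of days with precipitation < 1 mm."""
    longest = current = 0
    for p in precip_mm:
        if p < wet_threshold:
            current += 1          # extend the current dry spell
            longest = max(longest, current)
        else:
            current = 0           # a wet day ends the spell
    return longest
```

Running the same functions on a station series and on the matching grid-cell series from each data set gives a side-by-side comparison of the kind shown for Madison.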
How well does each data set model daily precipitation from 1981–2010? (based on ‘average daily U.S. precipitation’ derived from 119 stations)
Histograms: BLUE = station, GREY = model.
Conditional quantiles: for each bin in the histogram, the median, 25th/75th, and 10th/90th percentiles are calculated.
Why is there a spike in 4 of these data sets for this bin?
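The conditional-quantile idea can be sketched in a few lines of Python. The slide does not say which series is binned; this sketch assumes the convention of the openair R package's conditional quantile plot, where modeled values are split into evenly spaced bins and the percentiles of the corresponding observed values are computed within each bin. The function name and bin count are illustrative assumptions.

```python
import numpy as np

def conditional_quantiles(obs, mod, n_bins=10, probs=(10, 25, 50, 75, 90)):
    """Bin modeled values evenly; return observed percentiles in each bin.

    Returns a list of (bin_low, bin_high, {percentile: value}) tuples, with
    None in place of the dict for bins that contain no data.
    """
    obs = np.asarray(obs, dtype=float)
    mod = np.asarray(mod, dtype=float)
    ok = ~(np.isnan(obs) | np.isnan(mod))  # keep only paired, non-missing days
    obs, mod = obs[ok], mod[ok]
    edges = np.linspace(mod.min(), mod.max(), n_bins + 1)
    # np.digitize returns 1..n_bins+1; shift to 0-based and clip so the
    # maximum modeled value falls into the last bin
    idx = np.clip(np.digitize(mod, edges) - 1, 0, n_bins - 1)
    results = []
    for b in range(n_bins):
        in_bin = obs[idx == b]
        if in_bin.size == 0:
            results.append((edges[b], edges[b + 1], None))
            continue
        q = np.percentile(in_bin, probs)
        results.append((edges[b], edges[b + 1], dict(zip(probs, q))))
    return results
```

A perfect data set would put every bin's median on the 1:1 line; a spike like the one noted above would appear as a bin whose count or median departs sharply from its neighbors.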
How well does each data set model daily precipitation from 1981–2010? (based on ‘average daily U.S. precipitation’ derived from 119 stations)
Taylor diagram: shows the RMSE, correlation, and standard deviation of a modeled data series relative to an observed data series.
Example: the observed series, compared with itself, has an RMSE of 0, a correlation of 1, and a normalized standard deviation of 1 (by definition). Relative to the observed data, DayMet’s correlation is 0.79, its RMSE is 0.57, and its normalized standard deviation is 0.96.
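The three Taylor-diagram quantities can be computed as below. The slides do not state exactly how RMSE was computed; this sketch assumes the usual Taylor-diagram convention of a centered (bias-removed) RMS difference, with both it and the modeled standard deviation normalized by the observed standard deviation. Under that convention the quantities satisfy E'^2 = σ_m^2 + σ_o^2 − 2σ_mσ_o r, which is what lets all three be drawn on one diagram. The function name is hypothetical.

```python
import numpy as np

def taylor_stats(obs, mod):
    """Correlation, normalized std. dev., and normalized centered RMS difference."""
    obs = np.asarray(obs, dtype=float)
    mod = np.asarray(mod, dtype=float)
    sd_obs = obs.std()   # population std (ddof=0) keeps the Taylor relation exact
    sd_mod = mod.std()
    r = np.corrcoef(obs, mod)[0, 1]
    # centered RMS difference: remove each series' mean before differencing
    crmsd = np.sqrt(np.mean(((mod - mod.mean()) - (obs - obs.mean())) ** 2))
    return {
        "r": float(r),
        "norm_sd": float(sd_mod / sd_obs),
        "norm_crmsd": float(crmsd / sd_obs),
    }
```

Calling `taylor_stats(obs, obs)` returns the reference point (r = 1, normalized standard deviation = 1, RMSE = 0) described in the example above.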
How well does each data set model daily Tmax from 1981–2010? (based on ‘average daily U.S. Tmax’ derived from 119 stations)
How well does each data set model daily Tmax from 1981–2010? (based on ‘average daily U.S. Tmax’ derived from 119 stations)
So, when averaging across many stations, the data sets do a good job for temperature. Precipitation is more difficult, as the Taylor and quantile diagrams showed.
What happens when we look at individual stations?
Precipitation (observed daily data vs. daily data from the grid cell containing the station)
Maximum Temperature (observed daily data vs. daily data from the grid cell containing the station)
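Pairing a station with "the grid cell where the station is located" can be sketched as follows, assuming a regular latitude/longitude grid described by its cell-center coordinates and stored as a (time, lat, lon) array. On such a grid the nearest cell center is the cell that contains the station. Data sets on projected grids would first need the station coordinates transformed into the grid's projection. Both function names are hypothetical.

```python
import numpy as np

def cell_index(lat, lon, lat_centers, lon_centers):
    """Return (i, j) of the grid cell whose center is nearest the station.

    Assumes a regular lat/lon grid, where the nearest center is also the
    cell containing the station.
    """
    lat_centers = np.asarray(lat_centers, dtype=float)
    lon_centers = np.asarray(lon_centers, dtype=float)
    i = int(np.argmin(np.abs(lat_centers - lat)))
    j = int(np.argmin(np.abs(lon_centers - lon)))
    return i, j

def station_vs_cell(station_series, grid, lat, lon, lat_centers, lon_centers):
    """Pair a station's daily series with its grid cell's daily series.

    grid is assumed to be shaped (time, lat, lon) and to cover the same days
    as station_series.
    """
    i, j = cell_index(lat, lon, lat_centers, lon_centers)
    return np.asarray(station_series, dtype=float), np.asarray(grid)[:, i, j]
```

The paired series returned here are what a per-station scatter plot, quantile plot, or statistics table would be built from.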
Let’s take a look at Madison, WI in more detail.
Panels: Precipitation; Maximum Temperature
Individual grid cells, even those containing a first-order station used in the interpolation, are much harder to model.
There’s that spike in the downscaled data again!
May be stating the obvious, but…
1) It is fairly easy to get good temporal and spatial averages, totals, etc. It is much more difficult to model daily values at individual grid cells (even those that contain a station).
2) Precipitation is much more difficult to model than temperature.
3) Higher resolution does not necessarily mean better data.
4) Choosing a graphic or statistic to a) analyze data and/or b) communicate results isn’t straightforward, as users’ needs vary (extremes vs. means, station vs. region, etc.).
Some First Results
5) The “best” data set tends to vary by location and variable.
6) An “overall best” data set… Ben Livneh (??), which is also the newest.
7) Much more specific results are coming…
Future work…
Expand analysis to 7000 precipitation and 5000 temperature stations
Aggregate results spatially by topography, ecoregion, etc.
Add new/upcoming data sets (daily PRISM, Dan McKenney, etc.)
Include regional data sets (??)
Maps, portrait diagrams, time series, etc.
Comments, questions, ideas are all welcome! Thank you!
Seasonal Maximum Temperature (119-station mean)
Seasonal Precipitation (119-station mean)
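A "119-station mean" seasonal value can be computed in two steps: average across stations for each day, then average those daily network means within each meteorological season (DJF, MAM, JJA, SON). The sketch below assumes that order of operations and seasonal means rather than seasonal totals, neither of which the slide specifies; the function name is hypothetical.

```python
import numpy as np
from datetime import date

# Meteorological seasons keyed by calendar month
SEASON_OF_MONTH = {12: "DJF", 1: "DJF", 2: "DJF", 3: "MAM", 4: "MAM", 5: "MAM",
                   6: "JJA", 7: "JJA", 8: "JJA", 9: "SON", 10: "SON", 11: "SON"}

def seasonal_network_mean(values, dates):
    """values: array (n_days, n_stations); dates: list of datetime.date.

    Averages across stations for each day (ignoring missing values), then
    averages the daily network means within each season present in dates.
    """
    daily_mean = np.nanmean(np.asarray(values, dtype=float), axis=1)
    seasons = np.array([SEASON_OF_MONTH[d.month] for d in dates])
    return {s: float(np.nanmean(daily_mean[seasons == s]))
            for s in ("DJF", "MAM", "JJA", "SON") if np.any(seasons == s)}
```

Applying the same function to the station values and to each data set's matching grid-cell values gives one seasonal curve per data set.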
Table (Tmax): n, FAC2, MB, MGE, NMB, NMGE, RMSE, r, COE for DayMet, JohnA, Livneh, Maurer, NLDAS, TopoWx
A few of the many other statistics that can be used…
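The statistics in this table match those reported by the openair R package's model-evaluation function, so the sketch below assumes those definitions: FAC2 is the fraction of days where model/observed lies between 0.5 and 2; MB and MGE are the mean bias and mean gross error; NMB and NMGE normalize them by the observed total; r is the Pearson correlation; and COE is the coefficient of efficiency, 1 − Σ|M − O| / Σ|O − Ō|. The function name is hypothetical. For temperatures in °C, the ratio-based and normalized measures can behave oddly near zero, and in this sketch days with an observed value of 0 always count as outside the factor of 2.

```python
import numpy as np

def model_stats(obs, mod):
    """n, FAC2, MB, MGE, NMB, NMGE, RMSE, r, COE for paired daily series."""
    obs = np.asarray(obs, dtype=float)
    mod = np.asarray(mod, dtype=float)
    ok = ~(np.isnan(obs) | np.isnan(mod))  # use only days present in both series
    o, m = obs[ok], mod[ok]
    diff = m - o
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = m / o  # inf/NaN where o == 0; those fail both comparisons below
    return {
        "n": int(o.size),
        "FAC2": float(np.mean((ratio >= 0.5) & (ratio <= 2.0))),
        "MB": float(np.mean(diff)),
        "MGE": float(np.mean(np.abs(diff))),
        "NMB": float(np.sum(diff) / np.sum(o)),
        "NMGE": float(np.sum(np.abs(diff)) / np.sum(o)),
        "RMSE": float(np.sqrt(np.mean(diff ** 2))),
        "r": float(np.corrcoef(o, m)[0, 1]),
        "COE": float(1.0 - np.sum(np.abs(diff)) / np.sum(np.abs(o - o.mean()))),
    }
```

A perfect data set scores FAC2 = 1, MB = MGE = NMB = NMGE = RMSE = 0, and r = COE = 1; COE = 0 means the data set does no better than the observed mean.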