Dataset Development within the Surface Processes Group
David I. Berry and Elizabeth C. Kent
Outline
– Aims of talk
– Global datasets developed within the subgroup
– How these datasets are used
– Steps in developing the datasets
– Future work and related activities
Overview of dataset(s)
NOCS v2 mean meteorology and surface flux dataset (builds on the previous versions of the dataset, NOCS1 and 1a)
– Estimates of meteorological variables and surface fluxes, together with their uncertainties
– Monthly fields on a 1° grid; daily fields available on request
– 1970 to present (from 1950 for some variables)
Dataset Overview: Air temperature
Dataset Overview: Specific humidity
Dataset Overview: Sea surface temperature
Dataset Overview: Wind speed
Dataset Overview: Cloud cover
Dataset Overview: Sensible heat flux
Dataset Overview: Latent heat flux
Dataset Overview: Shortwave radiation
Dataset Overview: Longwave radiation
Dataset Overview: Net heat flux
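The flux fields above combine into the net heat flux. As an illustration only, assuming a convention (not stated in the slides) in which the net flux is positive into the ocean, the shortwave term is the net absorbed shortwave, the longwave term is the net upward longwave loss, and the sensible (Q_H) and latent (Q_E) fluxes are positive when the ocean loses heat, the balance reads:

```latex
Q_{\mathrm{net}} = Q_{\mathrm{SW}} - Q_{\mathrm{LW}} - Q_{H} - Q_{E}
```

Other datasets adopt different sign conventions, which flip the signs of individual terms.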
Usage summary
– NOCS2 dataset released in 2009
– Since release, ~400 GB of data downloaded by 180+ users (17 MB ≈ one variable for one year)
– Users from a wide variety of institutes (> 100 unique institutes), ranging from universities to government departments, meteorological agencies and the military
– Users located in 23 different countries
– 26 papers citing the dataset published since 2009
Usage summary
Example uses
– Global and regional climate change assessments, e.g. BAMS State of the Climate, MCCIP ARC, DEFRA Charting Progress
– Validation of satellite fields, e.g. satellite humidity estimates
– Estimation of, and comparison with, ocean transports
Dataset Development Steps
1. Decide on goals / end use for dataset
  – Independence from other estimates
  – Realistic uncertainty estimates
  – Accurate estimates of the fluxes
  – Example uses given above
2. Choice of data and characterisation
3. Decide on optimal averaging / gridding method
4. Determine a priori information, e.g. length scales
5. Perform gridding and validate results
Dataset Development
(Figure after NCEO Theme 7 (Data assimilation) proposal)
Choice of data and characterisation
– Each source of data / observation used in dataset construction needs to be characterised in terms of bias and uncertainty
– For NOCS v2 this has meant characterising the uncertainties in observations of 6 different meteorological variables
  – As an example, the uncertainty (i.e. random error) in individual observations has been estimated using a semi-variogram analysis
Semi-variogram analysis
– Variance (i.e. mean squared difference, MSE) between paired observations plotted as a function of separation distance (variogram; see example to right)
– Contribution from spatial variations removed by extrapolating to zero separation distance
– Differences at zero separation are due to small-scale variability (assumed to be small) and random errors
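The extrapolation step can be written out under simple assumptions (illustrative, not taken from the slides): each observation x_i = t(s_i) + ε_i is the true value at location s_i plus a random error with zero mean and variance σ_ε², and errors are uncorrelated with each other and with the true field. For a pair separated by distance d, the mean squared difference is then

```latex
\mathrm{E}\!\left[(x_i - x_j)^2\right]
  = \mathrm{E}\!\left[\bigl(t(s_i) - t(s_j)\bigr)^2\right] + 2\sigma_\varepsilon^2,
\qquad \lVert s_i - s_j \rVert = d
```

As d → 0 the first term reduces to the small-scale variability, assumed small, so the intercept of the fitted curve is approximately 2σ_ε² and the random error of one observation is σ_ε ≈ √(intercept / 2). Equivalently, the semi-variogram γ(d), half the mean squared difference, has an intercept (nugget) of approximately σ_ε².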
Semi-variogram analysis
– Analysis performed globally for each variable and month
– Variety of different models tested
– The appropriate model depends on the variable of interest and the controlling physics
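A minimal sketch of testing several models against a binned variogram and keeping the best one. The exponential and Gaussian forms, selection by residual sum of squares, and the names fit_variogram and random_error_sd are illustrative assumptions, not the models or procedure used for NOCS v2. The curve is fitted to the mean squared difference of pairs, so its intercept is about twice the random-error variance.

```python
import numpy as np
from scipy.optimize import curve_fit


# Candidate models for the mean squared difference (MSD) of observation pairs
# at separation d: "nugget" is the intercept at d = 0, "sill" the extra
# variance reached at large d, "length" a length scale (hypothetical forms).
def exponential(d, nugget, sill, length):
    return nugget + sill * (1.0 - np.exp(-d / length))


def gaussian(d, nugget, sill, length):
    return nugget + sill * (1.0 - np.exp(-((d / length) ** 2)))


MODELS = {"exponential": exponential, "gaussian": gaussian}


def fit_variogram(dist, msd):
    """Fit each candidate model to binned (distance, MSD) values.

    Returns (model_name, params, rss) for the model with the smallest
    residual sum of squares; all parameters are constrained to be >= 0.
    """
    dist = np.asarray(dist, dtype=float)
    msd = np.asarray(msd, dtype=float)
    p0 = [msd[0], max(msd.max() - msd[0], 1e-6), dist.mean()]
    best = None
    for name, model in MODELS.items():
        try:
            params, _ = curve_fit(model, dist, msd, p0=p0, bounds=(0.0, np.inf))
        except RuntimeError:  # fit did not converge: skip this model
            continue
        rss = float(np.sum((model(dist, *params) - msd) ** 2))
        if best is None or rss < best[2]:
            best = (name, params, rss)
    return best


def random_error_sd(nugget):
    """Random error SD of one observation, using MSD(0) ~ 2 * sigma**2."""
    return float(np.sqrt(nugget / 2.0))
```

For example, with synthetic, noise-free bins `dist = np.linspace(10.0, 500.0, 25)` and `msd = exponential(dist, 0.5, 2.0, 150.0)`, `fit_variogram(dist, msd)` should select the exponential model with a nugget close to 0.5, i.e. a random error of about 0.5 in the units of the variable.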
Choice of data and characterisation
– Each source of data / observation used in dataset construction needs to be characterised in terms of bias and uncertainty
– For NOCS v2 this has meant characterising the uncertainties in observations of 6 different meteorological variables
– Bias adjustments have also been developed for different variables
– Background references can be found at
Optimal averaging / gridding method
For the NOCS v2 dataset we want:
– Independent (and accurate) estimates of each meteorological parameter
– Realistic uncertainty estimates
– Spatially complete fields
– Accurate estimates of the fluxes
This has led to the choice of optimal interpolation (OI, i.e. simple kriging)
– Length scales are based on a literature review, but variogram analysis could (and should) be used
– Gridding is performed daily, and fluxes are calculated from the daily fields
For further information see Berry and Kent (2009, 2011), also
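The core of OI (simple kriging) at a single grid point can be sketched as follows. Assumptions made for illustration only: positions are in km on a local flat plane, the background (first guess) has a known mean and variance, background correlation is a Gaussian function of distance with one isotropic length scale, and observation errors are uncorrelated. The function name oi_point and the Gaussian correlation are hypothetical, not the NOCS v2 implementation.

```python
import numpy as np


def oi_point(target, obs_pos, obs, err_var, bg_mean, bg_var, length):
    """Simple-kriging / OI estimate and error variance at one target point.

    target  : (2,) grid-point position (km, local plane)
    obs_pos : (n, 2) observation positions (km)
    obs     : (n,) observed values
    err_var : scalar or (n,) random-error variance of each observation
    bg_mean, bg_var : background mean and variance (assumed known)
    length  : correlation length scale (km), Gaussian correlation assumed
    """
    obs_pos = np.atleast_2d(np.asarray(obs_pos, dtype=float))
    obs = np.asarray(obs, dtype=float)
    target = np.asarray(target, dtype=float)

    def cov(a, b):
        # Background covariance between two sets of points.
        d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
        return bg_var * np.exp(-d2 / (2.0 * length**2))

    B = cov(obs_pos, obs_pos)                                       # obs-obs
    R = np.diag(np.broadcast_to(err_var, obs.shape).astype(float))  # obs errors
    b = cov(obs_pos, target[None, :])[:, 0]                         # obs-target
    w = np.linalg.solve(B + R, b)                                   # OI weights
    estimate = bg_mean + w @ (obs - bg_mean)
    variance = bg_var - w @ b                     # analysis error variance
    return float(estimate), float(variance)
```

With a single observation of 2.0 located at the grid point, background mean 0, and background and error variances both 1, the weight is 0.5: the estimate is 1.0 and the error variance halves to 0.5. An observation far beyond the length scale receives almost no weight, so the estimate and variance revert to the background values; this is how OI yields spatially complete fields with uncertainty estimates.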
Optimal averaging / gridding method
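Computing fluxes from daily fields, rather than from monthly means, matters because wind speed and air-sea differences covary. This can be illustrated with a bulk formula for the sensible heat flux, Q_H = ρ c_p C_H U (T_s − T_a). The sketch assumes constant air density, heat capacity and transfer coefficient (real bulk algorithms vary C_H with stability and wind speed) and uses made-up daily values; it is not the NOCS v2 flux algorithm.

```python
import numpy as np

RHO_AIR = 1.2     # air density, kg m-3 (assumed constant)
CP_AIR = 1005.0   # specific heat of air, J kg-1 K-1
C_H = 1.2e-3      # sensible heat transfer coefficient (assumed constant)


def sensible_heat_flux(wind, sst, air_temp):
    """Bulk sensible heat flux (W m-2), positive for ocean heat loss."""
    return RHO_AIR * CP_AIR * C_H * wind * (sst - air_temp)


# Hypothetical daily values in which strong winds coincide with large
# air-sea temperature differences (e.g. cold-air outbreaks).
wind = np.array([4.0, 5.0, 15.0, 16.0])    # m s-1
sst = np.array([10.0, 10.0, 10.0, 10.0])   # deg C
air = np.array([9.5, 9.0, 4.0, 3.0])       # deg C

# Flux computed daily, then averaged, versus flux of the averaged fields.
mean_of_daily = sensible_heat_flux(wind, sst, air).mean()
from_monthly = sensible_heat_flux(wind.mean(), sst.mean(), air.mean())
print(round(mean_of_daily, 1), round(from_monthly, 1))
```

With these invented inputs the mean of the daily fluxes is about 76 W m-2, whereas the flux computed from the averaged fields is about 52 W m-2.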
Validation
– Comparison with input data
– Comparison with other (independent) estimates
– Cross validation
Comparison to VOS observations (January 1993)
Differences between observations and interpolated values in NOCS2.0
Comparison to WHOI UOP Moorings
Data from the Woods Hole Upper Ocean Mooring Data Archive at
Comparison to WHOI UOP Moorings
North Atlantic Latent Heat Flux (W m⁻²)
Cross validation: bias (SST, 1974)
Cross validation: uncertainty (air temperature, 1974)
Values > 1: uncertainty too small (note: results unreliable in poorly sampled regions)
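One common way to produce cross-validation diagnostics like these is a leave-one-out test: each observation is withheld, re-estimated from the others, and the residual is normalised by its predicted standard deviation. A mean residual near zero indicates unbiased fields; an RMS normalised residual above 1 indicates that the uncertainty estimates are too small. The leave-one-out scheme, the Gaussian correlation, flat-plane positions in km and all names here are illustrative assumptions, not the exact NOCS v2 procedure.

```python
import numpy as np


def gaussian_cov(a, b, var, length):
    """Background covariance between point sets a (n, 2) and b (m, 2)."""
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return var * np.exp(-d2 / (2.0 * length**2))


def loo_cross_validation(pos, obs, err_var, bg_mean, bg_var, length):
    """Leave-one-out OI cross-validation.

    Returns (bias, rms_z): the mean residual (obs - prediction) and the RMS
    of residuals normalised by their predicted standard deviation.
    rms_z > 1 suggests the uncertainty estimates are too small.
    """
    pos = np.asarray(pos, dtype=float)
    obs = np.asarray(obs, dtype=float)
    resid, z = [], []
    for k in range(len(obs)):
        keep = np.arange(len(obs)) != k          # withhold observation k
        B = gaussian_cov(pos[keep], pos[keep], bg_var, length)
        R = err_var * np.eye(keep.sum())
        b = gaussian_cov(pos[keep], pos[k : k + 1], bg_var, length)[:, 0]
        w = np.linalg.solve(B + R, b)
        pred = bg_mean + w @ (obs[keep] - bg_mean)
        # Predicted variance of (obs - pred): analysis error + obs error.
        var = bg_var - w @ b + err_var
        resid.append(obs[k] - pred)
        z.append((obs[k] - pred) / np.sqrt(var))
    return float(np.mean(resid)), float(np.sqrt(np.mean(np.square(z))))
```

If the data vary much more than the assumed variances allow (for example, noise with variance 100 when the background and error variances are set to 1 and 0.1), rms_z comes out well above 1, the same symptom as the "uncertainty too small" regions in the map.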
Validation: summary
Comparison with input data
  – Mean meteorological fields unbiased compared to input data
  – RMS differences suggest uncertainty estimates for individual observations are accurate
Comparison with other (independent) estimates
  – Results generally compare favourably with similar datasets
  – Observed differences within error bars
Cross validation
  – Fields unbiased
  – Uncertainty estimates too small in some regions (high-variability regions)
  – This is likely to be due to the choice of length scales; varying the scales (spatially and directionally) should improve this
Future and related work
Improvement of length scales and uncertainty estimates
  – 2D variogram analysis to estimate length scales
  – 3D variogram analysis to estimate length scales
Increased resolution and inclusion of other data sources
  – Higher resolution on Cartesian grid (0.25°, 6-hourly)
  – Higher resolution on equal-area grid (e.g. quaternary triangular mesh)
  – Inclusion of satellite and/or buoy data. Each new source requires characterisation (related work ongoing under NCEO)
Other parameters
  – Precipitation
  – Wind stress
  – CO₂
Improved length scales
– Semi-variograms were previously used to estimate random errors
– However, they can also be used to estimate the length scales required by the OI
– An isotropic model could be used, such as the one shown earlier (and to the right)
– But correlation length scales are known to vary directionally
Improved length scales
– Right-hand plot shows variance as a function of distance and bearing between pairs of observations
– Axes of anisotropy roughly N-S and W-E, as we'd expect
– We can also expand to 3 dimensions
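Binning pair differences by both separation and bearing gives a directional (2D) variogram of the kind described above. The sketch assumes positions on a local flat plane in km (x east, y north), bearings folded into 0 to 180° because a pair has no preferred direction, and arbitrary bin choices; the name directional_variogram is hypothetical.

```python
import numpy as np


def directional_variogram(pos, values, dist_edges, n_bearing_bins=4):
    """Mean squared difference of observation pairs, binned by separation
    distance and bearing.

    pos        : (n, 2) positions (x = east, y = north), km on a local plane
    values     : (n,) observed values
    dist_edges : increasing distance bin edges (km)
    Returns an array of shape (len(dist_edges) - 1, n_bearing_bins) with
    NaN for empty bins. Bearing bins span 0 to 180 degrees clockwise from N.
    """
    pos = np.asarray(pos, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)          # each pair once
    dx = pos[j, 0] - pos[i, 0]
    dy = pos[j, 1] - pos[i, 1]
    dist = np.hypot(dx, dy)
    bearing = np.degrees(np.arctan2(dx, dy)) % 180.0  # fold opposite directions
    sq_diff = (values[j] - values[i]) ** 2

    bearing_edges = np.linspace(0.0, 180.0, n_bearing_bins + 1)
    d_idx = np.digitize(dist, dist_edges) - 1
    b_idx = np.digitize(bearing, bearing_edges) - 1
    ok = (d_idx >= 0) & (d_idx < len(dist_edges) - 1) & (b_idx < n_bearing_bins)

    shape = (len(dist_edges) - 1, n_bearing_bins)
    total = np.zeros(shape)
    count = np.zeros(shape)
    np.add.at(total, (d_idx[ok], b_idx[ok]), sq_diff[ok])  # sum per bin
    np.add.at(count, (d_idx[ok], b_idx[ok]), 1)            # pairs per bin
    with np.errstate(invalid="ignore"):                    # 0/0 -> NaN
        return total / count
```

For a field that varies only east-west, pairs aligned N-S (first bearing bin) show zero variance at short range, while pairs aligned W-E (bearing bin starting at 90°) do not; fitting separate length scales per bearing bin is one way to capture such anisotropy.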
Improved length scales
Increased resolution
– Variety of options available
– Higher resolution Cartesian grid (left)
– Equal-area grid (right)
– "Best" option depends on use / data
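One reason to consider an equal-area grid is that cells of a regular latitude-longitude grid shrink towards the poles. On a sphere of radius R (a spherical Earth is assumed here), a cell spanning longitude width Δλ and latitudes φ1 to φ2 has area R² Δλ (sin φ2 − sin φ1). The sketch below computes this for a 1° grid; the function name is hypothetical.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def latlon_cell_areas(res_deg=1.0, radius=EARTH_RADIUS_KM):
    """Areas (km^2) of cells in a global regular lat-lon grid.

    Returns one value per latitude band (shape (n_lat,)), since all cells
    in a band share the same area.
    """
    lat_edges = np.radians(np.arange(-90.0, 90.0 + res_deg / 2, res_deg))
    dlon = np.radians(res_deg)
    return radius**2 * dlon * np.diff(np.sin(lat_edges))


areas = latlon_cell_areas(1.0)
equator = areas[90]    # band from 0 to 1 degree N
high_lat = areas[150]  # band from 60 to 61 degrees N
print(high_lat / equator)
```

A 1° cell between 60° N and 61° N has roughly half the area of an equatorial cell (close to cos 60° = 0.5), so observations and gridded values represent very different areas; a mesh such as the quaternary triangular mesh keeps cell areas closer to uniform.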
Questions (and what should our priorities be)?
For further information see: