Quality control and homogenization of the COST benchmark dataset Petr Štěpánek Pavel Zahradníček Czech Hydrometeorological Institute, regional office Brno.


Processing before any data analysis
Software: AnClim, ProClimDB

Data Quality Control: Finding Outliers
Two main approaches:
- using limits derived from interquartile ranges (time series)
- comparing values with those of neighbouring stations (spatial analysis)
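The interquartile-range approach can be sketched as follows. This is a minimal illustration, not the AnClim/ProClimDB implementation: the multiplier `k = 1.5`, the quartile method and the function names are assumptions, since the slides do not say which multiple of the interquartile range is used.

```python
from statistics import quantiles


def iqr_limits(values, k=1.5):
    """Return (lower, upper) limits as quartiles -/+ k * interquartile range.

    k = 1.5 is an assumed, commonly used multiplier; the slides give no value.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    return q1 - k * spread, q3 + k * spread


def flag_outliers(values, k=1.5):
    """Return the indices of values lying outside the IQR-based limits."""
    low, high = iqr_limits(values, k)
    return [i for i, v in enumerate(values) if v < low or v > high]


# Hypothetical monthly values with one suspicious entry at the end.
sample = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 30.0]
suspect_indices = flag_outliers(sample)
```

For monthly data the limits would normally be derived separately for each calendar month, so that the annual cycle does not inflate the interquartile range.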

Creating Reference Series (for monthly data)
- weighted/unweighted mean from neighbouring stations
- power of weight (IDW) is 1 for temperature (1/d) and 3 for precipitation (1/d^3)
- criteria used for station selection (or a combination of them):
  - best correlated / nearest neighbours (correlations from the first-differenced series)
  - limit correlation, limit distance
  - limit difference in altitudes
- neighbouring stations' series should be standardized to the test series AVG and/or STD / altitude
- comparison with the „expected“ value (calculated as the weighted mean of the standardized neighbours' values)
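The „expected“ value described on this slide can be sketched as an inverse-distance-weighted mean of standardized neighbour values. The function names, the distance unit and the exact standardization formula are assumptions made for illustration; only the powers (1 for temperature, 3 for precipitation) come from the slide.

```python
def standardize_to_test(value, nb_mean, nb_std, test_mean, test_std):
    """Rescale a neighbour value to the mean (AVG) and spread (STD) of the test series."""
    return (value - nb_mean) / nb_std * test_std + test_mean


def idw_expected(values, distances_km, power):
    """Inverse-distance-weighted mean: each weight is 1 / d**power."""
    weights = [1.0 / d ** power for d in distances_km]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical neighbours at 1 km and 2 km, already standardized.
neighbour_values = [10.0, 20.0]
distances = [1.0, 2.0]
expected_temperature = idw_expected(neighbour_values, distances, power=1)    # 1/d
expected_precipitation = idw_expected(neighbour_values, distances, power=3)  # 1/d^3
```

The higher power for precipitation makes the nearest station dominate, which fits the shorter spatial correlation length of precipitation compared with temperature.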

Example: Proposed list of stations used for creating reference series

„Outliers“: temperature, sur1, network 1
- detected 12 „outliers“
- 10 errors for station 150 (5 in year 1909)
- mean difference between measured outliers and expected values is about 6 °C

„Outliers“: precipitation, sur1, network 1
- detected 8 „outliers“
- mean difference between measured outliers and expected values is about 180 mm
- max difference is 313 mm (station , 8/1971)

Months, seasons, year

Creating Reference Series (for monthly data)
- weighted/unweighted mean from neighbouring stations
- criteria used for station selection (or a combination of them):
  - best correlated / nearest neighbours (correlations from the first-differenced series)
  - limit correlation, limit distance
  - limit difference in altitudes
- neighbouring stations' series should be standardized to the test series AVG and/or STD (temperature: elevation, precipitation: variance)
- missing data are then less of a problem
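Selecting the best correlated neighbours from first-differenced series can be sketched as below. Differencing removes slow trends, and a single break affects only one difference, so the correlation is less distorted by inhomogeneities. The helper names and sample numbers are illustrative assumptions.

```python
from math import sqrt


def first_differences(series):
    """Return x[t] - x[t-1] for consecutive values."""
    return [b - a for a, b in zip(series, series[1:])]


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def differenced_correlation(test_series, candidate_series):
    """Correlation between the first differences of two series."""
    return pearson(first_differences(test_series),
                   first_differences(candidate_series))


# Hypothetical series: the candidate equals the test series plus a linear trend.
test_series = [1.0, 3.0, 2.0, 5.0, 4.0]
candidate = [1.0, 4.0, 4.0, 8.0, 8.0]
r = differenced_correlation(test_series, candidate)
```

Candidates would then be ranked by this correlation and filtered by the distance, correlation and altitude limits listed on the slide.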

Relative homogeneity testing
- test series: 40 years
- longer series are divided into several sections with a 10-year overlap
- tests: SNHT, bivariate test, t-test
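A rough sketch of two steps from this slide: splitting a long series into 40-year sections with a 10-year overlap, and computing the single-shift SNHT statistic on a difference series (test minus reference). It assumes the textbook form T(k) = k·z1² + (n−k)·z2² on a standardized series, uses the population standard deviation, and lets the last section be shorter than 40 years; the software may differ on all of these points.

```python
from statistics import mean, pstdev


def split_windows(first_year, last_year, length=40, overlap=10):
    """Return (start, end) year pairs of overlapping sections."""
    windows = []
    start = first_year
    while True:
        end = min(start + length - 1, last_year)
        windows.append((start, end))
        if end >= last_year:
            return windows
        start += length - overlap


def snht(diff_series):
    """Return (k, T0): k values lie before the most likely break, T0 = max T(k)."""
    n = len(diff_series)
    mu, sigma = mean(diff_series), pstdev(diff_series)
    if sigma == 0:
        raise ValueError("constant series: SNHT statistic is undefined")
    z = [(v - mu) / sigma for v in diff_series]
    best_k, best_t = None, float("-inf")
    for k in range(1, n):
        z1, z2 = mean(z[:k]), mean(z[k:])
        t = k * z1 ** 2 + (n - k) * z2 ** 2
        if t > best_t:
            best_k, best_t = k, t
    return best_k, best_t


# Hypothetical difference series with a 1-unit shift after 10 values.
k, t0 = snht([0.0] * 10 + [1.0] * 10)
sections = split_windows(1900, 1999)
```

To decide whether a break is significant, T0 has to be compared with a critical value that depends on the series length; those tables are not reproduced here.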

Example of the detected breaks – temperature, sur1, network 1
- detected 63 breaks
- Station no. 50, break 1928
- Station no. 50, break 1975
[Figure panels: difference between test and reference series; test and reference series; test statistics]

Station no. 100, break 1983

Example of the detected breaks – precipitation, sur1, network 1
- detected 10 breaks
- Station no , break 1909
- Station no , break 1991

Adjusting monthly data
- using reference series based on distance
- power of weight is 0.5 for temperature and 1 for precipitation
- adjustment: from monthly differences/ratios 20 years before and after a change
- smoothing of monthly adjustments (low-pass filter for adjacent values)
[Figure panels: Station no. 50, break 1928; Station no. 100, break 1983]
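The adjustment step for temperature can be sketched as follows: for each calendar month, take the mean test-minus-reference difference in the years after the break minus the same mean before it, then smooth the twelve values across adjacent months. The 1/4, 1/2, 1/4 filter weights, the wrap-around between December and January, and the sign convention are assumptions; the slide only says that a low-pass filter is applied to adjacent values. For precipitation, ratios would take the place of differences.

```python
from statistics import mean


def monthly_adjustments(diff_before, diff_after):
    """diff_before / diff_after: 12 lists of (test - reference) differences,
    one list per calendar month, covering up to 20 years on each side of the break.
    Returns the shift that, subtracted from the values after the break,
    aligns them with the earlier segment."""
    return [mean(after) - mean(before)
            for before, after in zip(diff_before, diff_after)]


def smooth_circular(adjustments, weights=(0.25, 0.5, 0.25)):
    """Three-point low-pass filter over adjacent months, wrapping Dec <-> Jan."""
    n = len(adjustments)
    return [weights[0] * adjustments[(i - 1) % n]
            + weights[1] * adjustments[i]
            + weights[2] * adjustments[(i + 1) % n]
            for i in range(n)]


# Hypothetical differences: the same two years before and after for every month.
before = [[1.0, 1.0] for _ in range(12)]
after = [[2.0, 3.0] for _ in range(12)]
raw = monthly_adjustments(before, after)
smoothed_spike = smooth_circular([4.0] + [0.0] * 11)
```

Smoothing keeps the annual total of the adjustments unchanged when the weights sum to 1, while damping month-to-month noise in the estimates.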

Adjusting values – evaluation
After adjustment, the correlation must increase; if it does not, the series is not adjusted.
[Figure panels: Temperature; Precipitation]
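The acceptance rule can be sketched as a simple comparison of correlations before and after adjustment. It is assumed here that the correlation is measured against the reference series and computed on the raw values; the slide does not state either detail, and the function names are illustrative.

```python
from math import sqrt


def pearson(x, y):
    """Plain Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def accept_adjustment(original, adjusted, reference):
    """Keep the adjusted series only if it correlates better with the reference."""
    return pearson(adjusted, reference) > pearson(original, reference)


# Hypothetical series: the original has a 2-unit jump after the third value.
reference = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
original = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
adjusted = [3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
keep = accept_adjustment(original, adjusted, reference)
```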

Absolute values of adjustment for temperature, surg1, network 1

Iterative homogeneity testing
- several iterations of testing and evaluation of results
- several iterations of homogeneity testing and series adjusting (3 iterations should be sufficient)
- the question of homogeneity of the reference series is thus solved:
  - possible inhomogeneities should be eliminated by using averages of several neighbouring stations
  - if this is not true: in the next iteration the neighbours should already be homogenized
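The iterative scheme can be sketched as a small driver loop. The callables `detect_breaks` and `adjust` are hypothetical placeholders for the testing and adjusting steps, and the toy versions below exist only to make the example runnable. Within one pass every station is tested against the network as it stood at the start of that pass, so neighbours are already homogenized in the next iteration.

```python
def homogenize_iteratively(network, detect_breaks, adjust, max_iterations=3):
    """network: dict mapping station id -> series.
    Returns (homogenized network, number of adjusting passes performed)."""
    for passes_done in range(max_iterations):
        breaks = {station: detect_breaks(station, network) for station in network}
        if not any(breaks.values()):
            return network, passes_done
        network = {station: adjust(series, breaks[station])
                   for station, series in network.items()}
    return network, max_iterations


def toy_detect(station, network):
    """Toy detector: report the first index where the series jumps by more than 1."""
    s = network[station]
    for i in range(1, len(s)):
        if abs(s[i] - s[i - 1]) > 1:
            return [i]
    return []


def toy_adjust(series, breaks):
    """Toy adjustment: shift the values before the break by the size of the jump."""
    if not breaks:
        return series
    b = breaks[0]
    jump = series[b] - series[b - 1]
    return [v + jump for v in series[:b]] + series[b:]


stations = {"a": [0, 0, 5, 5], "b": [0, 0, 0, 0]}
result, passes = homogenize_iteratively(stations, toy_detect, toy_adjust)
```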

Example – homogenized temperature series Station no. 50 Station no. 100

Example – homogenized precipitation series Station no , break 1909 Station no , break 1991