Benchmark database: inhomogeneous data, surrogate data and synthetic data
Victor Venema

Victor Venema, COST HOME, May 2008, Budapest, Hungary

Creation of the benchmark: outline of the talk
1) Start with (in)homogeneous rain and temperature data
2) Generate multiple surrogate and synthetic realisations
3) Mask the surrogate records
4) Add a global trend
5) Insert inhomogeneities in the station time series
6) Publish on the web
7) Homogenisation by COST participants and third parties
8) Analyse the results and publish

Benchmark dataset
1) Real (inhomogeneous) climate records
• Most realistic case
• Investigate whether various homogenisation algorithms (HA) find the same breaks
• Good metadata
2) Surrogate data
• Empirical distribution and correlations
• Insert known inhomogeneities
3) Synthetic data
• Gaussian white noise
• Insert the same types of known inhomogeneities
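The synthetic data are described only as Gaussian white noise. Here is a minimal sketch of one way to generate a whole network of such series. The way the stations are coupled is an assumption and does not come from the slides: every station is taken as a shared white-noise signal plus independent station noise, so each pair of stations has correlation `rho`. The name `synthetic_network` and its parameters are hypothetical.

```python
import numpy as np

def synthetic_network(n_stations, n_time, rho=0.8, sigma=1.0, seed=None):
    """Gaussian white noise for a network of stations, shape (n_stations, n_time).

    Assumption: a common white-noise signal is shared by all stations so that
    every station pair has correlation rho and every station has variance sigma**2.
    """
    rng = np.random.default_rng(seed)
    common = rng.standard_normal(n_time)               # shared regional signal
    local = rng.standard_normal((n_stations, n_time))  # independent station noise
    # Weights give unit variance per station and covariance rho between stations
    return sigma * (np.sqrt(rho) * common + np.sqrt(1.0 - rho) * local)
```

Known inhomogeneities can then be inserted into these series in the same way as into the surrogates.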

Available data (variables; number of stations)
• Austria: tm, rr; 41
• Catalonia: monthly tn, tx, tm; 21. Daily tn, tx, rr; 17
• France: tx, rr; 24 (27)
• Holland: monthly tn, tx, tm, rr; 9 (11). Daily tn, tx, tm, rr; 9 (11)
• Norway: monthly tm, rr; 100 (189). Daily tn, tx, rr
• Romania: tn, tx, rr; 23
(tm: mean temperature, tn: minimum temperature, tx: maximum temperature, rr: precipitation)

1) Start with homogeneous data
• Monthly mean temperature and precipitation
• Homogeneous
• No missing data
• Generated networks are 100 years long
• Longer surrogates are based on multiple copies

1) Start with inhomogeneous data
• Distribution
  – Years with breaks are removed
  – The mean of each section between breaks is adjusted to the global mean
• Spectrum
  – Longest period without any breaks in the stations
• Worst breaks known, little missing data
  – The surrogate is divided into overlapping sections
  – Fourier coefficients and phases are adjusted for every small section
  – No adjustments on large scales!
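The distribution step can be sketched for a single station as follows. The sketch assumes monthly values that each carry a year label, known break years, and that the "global mean" is the mean of the whole station series once the break years are removed. The slides may instead mean a network-wide mean. The function name is hypothetical.

```python
import numpy as np

def adjust_section_means(values, years, break_years):
    """Remove the years containing a break and shift every section between
    breaks so that its mean equals the overall (global) mean of the series.

    values, years : 1-D arrays of equal length (e.g. one entry per month)
    break_years   : years in which a known break occurs
    """
    values = np.asarray(values, dtype=float).copy()
    years = np.asarray(years)
    values[np.isin(years, break_years)] = np.nan          # drop break years
    global_mean = np.nanmean(values)
    # Sections are bounded by the break years: [start, stop)
    edges = [years.min()] + sorted(break_years) + [years.max() + 1]
    for start, stop in zip(edges[:-1], edges[1:]):
        section = (years >= start) & (years < stop)
        if np.any(~np.isnan(values[section])):             # skip empty sections
            values[section] += global_mean - np.nanmean(values[section])
    return values
```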

Surrogates from inhomogeneous data

Victor Venema, COST HOME, Mai 2008, Budapest, Hungary 5) Insert inhomogeneities in stations  Independent breaks  Determined at random for every station and time  5 breaks per 100 a  Monthly slightly different perturbations  Temperature –Additive –Size: Gaussian distribution, σ=0.8°C  Rain –Multiplicative –Size: Gaussian distribution, =1, σ=10%

Example break perturbations: station

Example break perturbations: network

Victor Venema, COST HOME, Mai 2008, Budapest, Hungary 5) Insert inhomogeneities in stations  Correlated break in network  One break in 50 % of networks  In 30 % of the station simultaneously  Position random –At least 10 % of data points on either side

Example correlated break

Victor Venema, COST HOME, Mai 2008, Budapest, Hungary 5) Insert inhomogeneities in stations  Outliers  Size –Temperature: 99 percentile –Rain: 99.9 percentile  Frequency –50 % of networks: 1 % –50 % of networks: 3 %

Example outlier perturbations: station

Example outliers: network

Victor Venema, COST HOME, Mai 2008, Budapest, Hungary 5) Insert inhomogeneities in stations  Local trends (only temperature)  Linear increase or decrease in one station  Duration: D  [30, 60] a  Size Gaussian distributed:  T =0.8°C, or  rr =10%  Frequency: once in 10 % of the stations  Also for rain?

Example local trends

6) Published on the web
• The inhomogeneous data will be published on the COST-HOME homepage
• Everyone is welcome to download and homogenize the data
• mitarbeiter/venema/themes/homogenisation

Victor Venema, COST HOME, Mai 2008, Budapest, Hungary 7) Homogenize by participants  COST-HOME file format: venema/themes/homogenisation/  For benchmark & COST homogenisation software  Format description changes since Vienna: –Stations files include height –Many clarifications

Work in progress
• Preliminary benchmark: venema/themes/homogenisation/
• Write a report on the benchmark dataset
• More input data
• Set a deadline for the availability of the benchmark
• Set a deadline for the return of the homogenised data
• Agree on the details of the benchmark
• Advertise the benchmark outside of COST

7) Homogenize by participants  COST-HOME file format: venema/themes/homogenisation/costhome_fileformat.pdf  For benchmark & COST homogenisation software  Regular ASCII matrix (columns)  One data and one quality-flag file per station  Yearly, daily, subdaily data: columns for time, one for data  Monthly data: year column, 12 columns for data  Filename: variable, resolution, quality, station  ASCII network-file with station names  ASCII break-file with dates and station names

One station – with annual cycle

One station – anomalies

Multiple stations – 10-year zoom

Multiple stations – 10-year zoom

The IAAFT (iterative amplitude adjusted Fourier transform) algorithm smooths jumps

Victor Venema, COST HOME, Mai 2008, Budapest, Hungary 3) Mask surrogate records  Beginning of records jagged (rough)  Linear increase in number of stations  Last station after 25% of full time  End of record all stations are measuring  Influence of jagged edge on detection and correction  But trend is also increasing in time (i.e. different)!  Is this a problem?

3) Mask surrogate records

4) Add global trend
• NASA GISS Surface Temperature Analysis (GISTEMP) by J. Hansen
• Global mean surface temperature
• The last year of every surrogate network is 1999
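One way to add the global trend to a monthly network whose last year is 1999 is sketched below. The annual global-mean anomalies have to be supplied by the user, for example from a GISTEMP file; no values are included here. The sketch assumes that each year's anomaly is added to all 12 months of that year. The function name is hypothetical.

```python
import numpy as np

def add_global_trend(network, trend_years, trend_values, last_year=1999,
                     steps_per_year=12):
    """Add annual global-mean anomalies to a monthly network ending in last_year.

    network: array (n_stations, n_time) with n_time a multiple of steps_per_year.
    Returns the network with the trend added and the calendar years used.
    """
    net = np.asarray(network, dtype=float)
    n_t = net.shape[1]
    if n_t % steps_per_year != 0:
        raise ValueError("series length must be a whole number of years")
    n_years = n_t // steps_per_year
    years = np.arange(last_year - n_years + 1, last_year + 1)
    lookup = dict(zip(map(int, trend_years), map(float, trend_values)))
    try:
        annual = np.array([lookup[int(y)] for y in years])
    except KeyError as err:
        raise ValueError(f"no trend value for year {err.args[0]}") from None
    # Assumption: the same offset for all months of a year
    return net + np.repeat(annual, steps_per_year), years
```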

Detected break distribution, tn, tx

Detected break distribution, Gaussian fit

Victor Venema, COST HOME, Mai 2008, Budapest, Hungary 7) Homogenize by participants  COST-HOME file format: venema/themes/homogenisation/costhome_fileformat.pdf  For benchmark & COST homogenisation software  One data and one quality-flag file per station  Filename: variable, resolution, quality, station  ASCII network-file with station names  ASCII break-file with dates and station names

COST-HOME file format – monthly data

COST-HOME file format – network file

8) Analyse the results
• A detailed analysis will be performed in the working groups
  – Detection
  – Correction
  – Daily data homogenisation
• Synthetic and surrogate data
  – RMS error
  – Number of breaks detected (as a function of their size)
  – Application: reduction in the scatter of the trends
• Performance difference between synthetic (Gaussian white noise) and surrogate data
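Two of the listed measures can be sketched directly: the RMS error between a homogenised series and the true homogeneous series, and the number of true breaks that were detected. The matching rule is a hypothetical choice and not taken from the slides. A true break counts as detected if an unused detection lies within `tolerance` time steps of it. Grouping the true breaks by size before counting would give the number detected as a function of break size.

```python
import numpy as np

def rmse(homogenised, truth):
    """Root-mean-square error, ignoring missing (NaN) values."""
    diff = np.asarray(homogenised, dtype=float) - np.asarray(truth, dtype=float)
    return float(np.sqrt(np.nanmean(diff ** 2)))

def count_hits(true_breaks, detected, tolerance=12):
    """Count true breaks matched by a detection within +/- tolerance steps.

    Hypothetical scoring rule: greedy matching, each detection used at most once.
    """
    detected = sorted(detected)
    used, hits = set(), 0
    for b in sorted(true_breaks):
        for i, d in enumerate(detected):
            if i not in used and abs(d - b) <= tolerance:
                used.add(i)
                hits += 1
                break
    return hits
```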

Questions
• Ideas for a better benchmark
• For example, other inhomogeneities, constants
• Types of inhomogeneities for daily data
• Automatic processing
  – In the order of 100 networks

Scatterplot of two rain stations

2) Multiple surrogate realisations  Multiple surrogate realisations –Temporal correlations –Station cross-correlations –Empirical distribution function  Annual cycle removed before, added at the end  Number of stations between 5 and 20  Cross correlation (  ) varies as much as possible:  [0.5, 1.0]

Victor Venema, COST HOME, Mai 2008, Budapest, Hungary 7) Homogenize by participants  Return homogenised data  Should be in COST-HOME file format (next slide)  Return break detections –BREAK –OUTLI –BEGTR –ENDTR  Multiple breaks at one data possible