Benchmark database inhomogeneous data, surrogate data and synthetic data Victor Venema.

Benchmark database inhomogeneous data, surrogate data and synthetic data Victor Venema

Victor Venema, Victor.Venema@uni-bonn.de, COST HOME, Mai 2008, Budapest, Hungary Creation benchmark – Outline talk 1)Start with (in)homogeneous rain and temp. data 2)Multiple surrogate and synthetic realisations 3)Mask surrogate records 4)Add global trend 5)Insert inhomogeneities in station time series 6)Published on the web 7)Homogenize by COST participants and third parties 8)Analyse the results and publish

Victor Venema, Victor.Venema@uni-bonn.de, COST HOME, Mai 2008, Budapest, Hungary Benchmark dataset 1)Real (inhomogeneous) climate records  Most realistic case  Investigate if various HA find the same breaks  Good meta-data 2)Surrogate data  Empirical distribution and correlations  Insert known inhomogeneities 3)Synthetic data  Gaussian white noise  Insert same types of known inhomogeneities

Victor Venema, Victor.Venema@uni-bonn.de, COST HOME, Mai 2008, Budapest, Hungary Available data Monthly Variables# stations Daily Variables# stations Austriatm, rr 41 Cataloniatn, tx, tm21tn, tx, rr17 Francetx, rr24 (27) Hollandtn, tx, tm, rr 9 (11)tn, tx, tm, rr 9 (11) Norwaytm, rr100 (189)tn, tx, rr Romaniatn, tx, rr23

Victor Venema, Victor.Venema@uni-bonn.de, COST HOME, Mai 2008, Budapest, Hungary 1) Start with homogeneous data  Monthly mean temperature and precipitation  Homogeneous  No missing data  Generated networks are 100 a  Longer surrogates are based on multiple copies

Victor Venema, Victor.Venema@uni-bonn.de, COST HOME, Mai 2008, Budapest, Hungary 1) Start with inhomogeneous data  Distribution –Years with breaks are removed –Mean of section between breaks is adjusted to global mean  Spectrum –Longest period without any breaks in the stations  Worst breaks known, little missing data –Surrogate is divided in overlapping sections –Fourier coefficients and phases are adjusted for every small section –No adjustments on large scales!

Victor Venema, Victor.Venema@uni-bonn.de, COST HOME, Mai 2008, Budapest, Hungary Surrogates from inhomogeneous data

Victor Venema, Victor.Venema@uni-bonn.de, COST HOME, Mai 2008, Budapest, Hungary 5) Insert inhomogeneities in stations  Independent breaks  Determined at random for every station and time  5 breaks per 100 a  Monthly slightly different perturbations  Temperature –Additive –Size: Gaussian distribution, σ=0.8°C  Rain –Multiplicative –Size: Gaussian distribution, =1, σ=10%

Victor Venema, Victor.Venema@uni-bonn.de, COST HOME, Mai 2008, Budapest, Hungary Example break perturbations station

Victor Venema, Victor.Venema@uni-bonn.de, COST HOME, Mai 2008, Budapest, Hungary Example break perturbations network

Victor Venema, Victor.Venema@uni-bonn.de, COST HOME, Mai 2008, Budapest, Hungary 5) Insert inhomogeneities in stations  Correlated break in network  One break in 50 % of networks  In 30 % of the station simultaneously  Position random –At least 10 % of data points on either side

Victor Venema, Victor.Venema@uni-bonn.de, COST HOME, Mai 2008, Budapest, Hungary Example correlated break

Victor Venema, Victor.Venema@uni-bonn.de, COST HOME, Mai 2008, Budapest, Hungary 5) Insert inhomogeneities in stations  Outliers  Size –Temperature: 99 percentile –Rain: 99.9 percentile  Frequency –50 % of networks: 1 % –50 % of networks: 3 %

Victor Venema, Victor.Venema@uni-bonn.de, COST HOME, Mai 2008, Budapest, Hungary Example outlier perturbations station

Victor Venema, Victor.Venema@uni-bonn.de, COST HOME, Mai 2008, Budapest, Hungary Example outliers network

Victor Venema, Victor.Venema@uni-bonn.de, COST HOME, Mai 2008, Budapest, Hungary 5) Insert inhomogeneities in stations  Local trends (only temperature)  Linear increase or decrease in one station  Duration: D  [30, 60] a  Size Gaussian distributed:  T =0.8°C, or  rr =10%  Frequency: once in 10 % of the stations  Also for rain?

Victor Venema, Victor.Venema@uni-bonn.de, COST HOME, Mai 2008, Budapest, Hungary Example local trends

Victor Venema, Victor.Venema@uni-bonn.de, COST HOME, Mai 2008, Budapest, Hungary 6) Published on the web  Inhomogeneous data will be published on the COST-HOME homepage  Everyone is welcome to download and homogenize the data  http://www.meteo.uni-bonn.de/ mitarbeiter/venema/themes/homogenisation

Victor Venema, Victor.Venema@uni-bonn.de, COST HOME, Mai 2008, Budapest, Hungary 7) Homogenize by participants  COST-HOME file format: http://www.meteo.uni-bonn.de/ venema/themes/homogenisation/  For benchmark & COST homogenisation software  Format description changes since Vienna: –Stations files include height –Many clarifications

Victor Venema, Victor.Venema@uni-bonn.de, COST HOME, Mai 2008, Budapest, Hungary Work in progress  Preliminary benchmark: http://www.meteo.uni-bonn.de/ venema/themes/homogenisation/  Write report on the benchmark dataset  More input data  Set deadline for the availability benchmark  Deadline for the return of the homogeneous data  Agree on the details of the benchmark  Advertise benchmark outside of COST

Victor Venema, Victor.Venema@uni-bonn.de, COST HOME, Mai 2008, Budapest, Hungary

7) Homogenize by participants  COST-HOME file format: http://www.meteo.uni-bonn.de/ venema/themes/homogenisation/costhome_fileformat.pdf  For benchmark & COST homogenisation software  Regular ASCII matrix (columns)  One data and one quality-flag file per station  Yearly, daily, subdaily data: columns for time, one for data  Monthly data: year column, 12 columns for data  Filename: variable, resolution, quality, station  ASCII network-file with station names  ASCII break-file with dates and station names

Victor Venema, Victor.Venema@uni-bonn.de, COST HOME, Mai 2008, Budapest, Hungary One station – with annual cycle

Victor Venema, Victor.Venema@uni-bonn.de, COST HOME, Mai 2008, Budapest, Hungary One station – anomalies

Victor Venema, Victor.Venema@uni-bonn.de, COST HOME, Mai 2008, Budapest, Hungary Multiple stations – 10 year zoom

Victor Venema, Victor.Venema@uni-bonn.de, COST HOME, Mai 2008, Budapest, Hungary IAAFT algorithm smoothes jumps

Victor Venema, Victor.Venema@uni-bonn.de, COST HOME, Mai 2008, Budapest, Hungary 3) Mask surrogate records  Beginning of records jagged (rough)  Linear increase in number of stations  Last station after 25% of full time  End of record all stations are measuring  Influence of jagged edge on detection and correction  But trend is also increasing in time (i.e. different)!  Is this a problem?

Victor Venema, Victor.Venema@uni-bonn.de, COST HOME, Mai 2008, Budapest, Hungary 3) Mask surrogate records

Victor Venema, Victor.Venema@uni-bonn.de, COST HOME, Mai 2008, Budapest, Hungary 4) Add global trend  NASA GISS GISS Surface Temperature Analysis (GISTEMP) by J. Hansen  Global mean surface temperature  Last year of any surrogate network is 1999

Victor Venema, Victor.Venema@uni-bonn.de, COST HOME, Mai 2008, Budapest, Hungary Detected break distribution, tn, tx

Victor Venema, Victor.Venema@uni-bonn.de, COST HOME, Mai 2008, Budapest, Hungary Detected break distribution, Gaussian fit

Victor Venema, Victor.Venema@uni-bonn.de, COST HOME, Mai 2008, Budapest, Hungary 7) Homogenize by participants  COST-HOME file format: http://www.meteo.uni-bonn.de/ venema/themes/homogenisation/costhome_fileformat.pdf  For benchmark & COST homogenisation software  One data and one quality-flag file per station  Filename: variable, resolution, quality, station  ASCII network-file with station names  ASCII break-file with dates and station names

Victor Venema, Victor.Venema@uni-bonn.de, COST HOME, Mai 2008, Budapest, Hungary COST-HOME file format – monthly data

Victor Venema, Victor.Venema@uni-bonn.de, COST HOME, Mai 2008, Budapest, Hungary COST-HOME file format – network file

Victor Venema, Victor.Venema@uni-bonn.de, COST HOME, Mai 2008, Budapest, Hungary 8) Analyse the results  Detailed analysis will be performed in the working groups –Detection –Correction –Daily data homogenisation  Synthetic and surrogate data –RMS Error –No. breaks detected (function of size) –Application: reduction in the scatter in the trends  Performance difference between synthetic (Gaussian, white noise) and surrogate data

Victor Venema, Victor.Venema@uni-bonn.de, COST HOME, Mai 2008, Budapest, Hungary Questions  Ideas for a better benchmark  For example, for other inhomogeneities, constants  Types of inhomogeneities for daily data  Automatic processing –In the order of 100 networks

Victor Venema, Victor.Venema@uni-bonn.de, COST HOME, Mai 2008, Budapest, Hungary Scatterplot 2 rain stations

Victor Venema, Victor.Venema@uni-bonn.de, COST HOME, Mai 2008, Budapest, Hungary

2) Multiple surrogate realisations  Multiple surrogate realisations –Temporal correlations –Station cross-correlations –Empirical distribution function  Annual cycle removed before, added at the end  Number of stations between 5 and 20  Cross correlation (  ) varies as much as possible:  [0.5, 1.0]

Victor Venema, Victor.Venema@uni-bonn.de, COST HOME, Mai 2008, Budapest, Hungary 7) Homogenize by participants  Return homogenised data  Should be in COST-HOME file format (next slide)  Return break detections –BREAK –OUTLI –BEGTR –ENDTR  Multiple breaks at one data possible

Benchmark database inhomogeneous data, surrogate data and synthetic data Victor Venema.

Similar presentations

Presentation on theme: "Benchmark database inhomogeneous data, surrogate data and synthetic data Victor Venema."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Benchmark database inhomogeneous data, surrogate data and synthetic data Victor Venema.

Similar presentations

Presentation on theme: "Benchmark database inhomogeneous data, surrogate data and synthetic data Victor Venema."— Presentation transcript:

Similar presentations

About project

Feedback