Correction of spurious trends in climate series caused by inhomogeneities Ralf Lindau.



Dipdoc Seminar – 12. November 2015

Break detection
Consider the difference between one station and a neighbor reference taken from the surrounding network of stations. The dominating natural variance cancels out, because it is very similar at both stations. Breaks become visible as abrupt changes in the station-minus-reference time series.
- Internal variance (noise): within the subperiods
- External variance (signal): between the means of different subperiods
Break criterion: maximum external (explained) variance
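The split into internal and external variance can be sketched in a few lines of Python. This is a minimal illustration, not the detection code behind the slides: the function names are hypothetical, only a single break is searched by brute force, and the test series is made up.

```python
import numpy as np

def _edges(n, breaks):
    # A break at index k separates x[:k] from x[k:].
    return [0, *sorted(breaks), n]

def external_variance(x, breaks):
    """Variance between the subperiod means (signal)."""
    e = _edges(len(x), breaks)
    total_mean = x.mean()
    ss = sum((hi - lo) * (x[lo:hi].mean() - total_mean) ** 2
             for lo, hi in zip(e[:-1], e[1:]))
    return ss / len(x)

def internal_variance(x, breaks):
    """Variance within the subperiods (noise)."""
    e = _edges(len(x), breaks)
    ss = sum(((x[lo:hi] - x[lo:hi].mean()) ** 2).sum()
             for lo, hi in zip(e[:-1], e[1:]))
    return ss / len(x)

def best_single_break(x):
    """Break position that maximizes the external (explained) variance."""
    return max(range(1, len(x)), key=lambda k: external_variance(x, [k]))

# Made-up station-minus-reference series with one jump after four values.
diff = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
k = best_single_break(diff)
```

For any segmentation the two parts add up to the total variance of the series, so maximizing the external variance is the same as minimizing the internal one.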

Break-aware idea
Breaks are only detected, but not corrected. Calculate the mean trend over all homogeneous subperiods (omitting the known breakpoints). This trend should reflect the true trend.
This is not promising, because:

Correction is important

Station or network trend?
From my point of view, the network-mean (regionally averaged) trend is per se more interesting. Moreover, trend corrections for individual stations are easy to derive once the network-mean trend is known. Therefore, we concentrate on corrections of the network-mean trend.

Network-mean trend error
Observed trend = True trend + Spurious station trend
Observed trend anomaly against the network mean = Spurious station trend – Network-mean trend error
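A small numerical check of this decomposition, with made-up trend values (all numbers below are hypothetical and carry no units from the talk):

```python
import numpy as np

# Hypothetical spurious trends of five stations and a common true trend.
spurious = np.array([0.30, -0.10, 0.05, 0.25, 0.00])
true_trend = 1.0

observed = true_trend + spurious           # observed station trends
network_mean_error = spurious.mean()       # network-mean trend error
anomaly = observed - observed.mean()       # trend anomaly against the network mean

# The anomaly only shows the spurious trend relative to the network-mean error.
difference = anomaly - (spurious - network_mean_error)
```

The anomalies sum to zero by construction, so the network-mean trend error itself is not visible in them.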

Two ways of correction
1. Break-by-break method: Consider an individual break of one candidate station. Compare the two homogeneous subperiods before and after this break with homogeneous subperiods of suitable neighbor stations, which preferably have long overlap periods with the candidate break.
2. ANOVA method: Minimize the variance of the entire network. This method is discussed in the following, because it is better defined.

ANOVA correction scheme (1/3)
- Observation: b(station, year)
- Climate signal: c(year)
- Inhomogeneity: a(station, year)
- Noise: ε(station, year)
The observation is modeled as the sum of the three components: b = c + a + ε.
Minimize the squared difference between theory and observations. Differentiating with respect to c(j) leads to m equations, one for each c(j).

ANOVA correction scheme (2/3)
Analogously, we get h equations, one for each of the homogeneous subperiods. Altogether we have m+h equations for m+h unknowns. Thus an (m+h) x (m+h) system has to be solved, e.g. a 115 x 115 matrix for 100 years and 15 subperiods.
We can insert the m climate equations for each year into the h equations for each subperiod, so that only an h x h matrix has to be solved.
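The two sets of equations and their combination can be written out as follows. This is a reconstruction under stated assumptions, not a copy of the slide: every one of the n stations is assumed to cover all m years, and the symbols k(i,j) (subperiod of station i in year j), P_k (years of subperiod k), s(k) (station owning subperiod k), \ell_k (length of subperiod k) and o_{kk'} (number of years shared by subperiods k and k') are introduced here for illustration.

```latex
\begin{align}
S &= \sum_{i=1}^{n}\sum_{j=1}^{m}\bigl(b_{ij} - c_j - a_{k(i,j)}\bigr)^2 \\
\frac{\partial S}{\partial c_j} = 0 \;&\Rightarrow\;
  n\,c_j + \sum_{i=1}^{n} a_{k(i,j)} = \sum_{i=1}^{n} b_{ij},
  \qquad j = 1,\dots,m \\
\frac{\partial S}{\partial a_k} = 0 \;&\Rightarrow\;
  \ell_k\,a_k + \sum_{j\in P_k} c_j = \sum_{j\in P_k} b_{s(k)j},
  \qquad k = 1,\dots,h \\
(n-1)\,\ell_k\,a_k - \sum_{k'\neq k} o_{kk'}\,a_{k'}
  &= n\sum_{j\in P_k} b_{s(k)j} - \sum_{j\in P_k}\sum_{i=1}^{n} b_{ij}
\end{align}
```

The last line follows from multiplying the third by n and replacing n c_j with the second; it involves only the h unknowns a_k.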

ANOVA correction scheme (3/3)
n = 5 stations with data from 100 years. h = 15 homogeneous subperiods in total.
Consider subperiod no. 8: It has an overlap of 32 years with a_2, of 13 years with a_4, of 19 years with a_5, etc.
The overlap periods constitute the off-diagonal matrix elements. The diagonal is given by (n-1) · length(a_i).
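The structure of this h x h matrix can be sketched as follows. The sketch assumes that every station covers all years and that the overlaps enter with a negative sign, which is what the least-squares normal equations give; the function name and the tiny three-station layout are hypothetical.

```python
import numpy as np

def reduced_matrix(seg_id):
    """Build the h x h matrix from subperiod ids.

    seg_id[i, j] is the id (0..h-1) of the homogeneous subperiod that
    station i belongs to in year j.
    """
    n, m = seg_id.shape
    h = seg_id.max() + 1
    covers = np.zeros((h, m))                 # covers[k, j] = 1 if subperiod k contains year j
    for i in range(n):
        covers[seg_id[i], np.arange(m)] = 1.0
    overlap = covers @ covers.T               # shared years; the diagonal holds the lengths
    lengths = np.diag(overlap).copy()
    matrix = -overlap                         # off-diagonal: minus the overlap
    matrix[np.diag_indices(h)] = (n - 1) * lengths
    return matrix

# Hypothetical layout: 3 stations, 10 years, breaks after year 4 and year 7.
seg_id = np.array([[0] * 4 + [1] * 6,
                   [2] * 7 + [3] * 3,
                   [4] * 10])
M = reduced_matrix(seg_id)
```

Each row of this matrix sums to zero, so it is singular: a constant can be moved freely between the climate signal and the inhomogeneities, and one extra constraint (or a minimum-norm solution) is needed to solve it.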

Simulated data
Test the performance of the ANOVA correction scheme with simulated data. The simulated data consist of three superimposed signals:
1. The climate signal, which is identical for all stations of a network.
2. Noise, which mimics the difference between the stations, e.g. due to weather.
3. Inhomogeneities inserted at random times and with random strengths.
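A generator for such data might look like the sketch below. The Gaussian distributions, the default parameter values and the function name are assumptions made for illustration; the slides do not state how the three signals were drawn.

```python
import numpy as np

def simulate_network(rng, n=5, m=100, breaks_per_station=2,
                     break_sd=1.0, noise_sd=1.0):
    """Return observations b, subperiod ids, inserted inhomogeneities and climate."""
    climate = rng.normal(0.0, 1.0, m)                  # 1. identical for all stations
    seg_id = np.empty((n, m), dtype=int)
    inhom = np.empty((n, m))
    next_id = 0
    for i in range(n):
        # 3. breaks at random years, each subperiod with a random offset
        pos = np.sort(rng.choice(np.arange(1, m), size=breaks_per_station,
                                 replace=False))
        edges = np.concatenate(([0], pos, [m]))
        for lo, hi in zip(edges[:-1], edges[1:]):
            seg_id[i, lo:hi] = next_id
            inhom[i, lo:hi] = rng.normal(0.0, break_sd)
            next_id += 1
    noise = rng.normal(0.0, noise_sd, (n, m))          # 2. station-specific noise
    b = climate + inhom + noise
    return b, seg_id, inhom, climate

b, seg_id, inhom, climate = simulate_network(np.random.default_rng(0))
```

With equal `break_sd` and `noise_sd` the break and noise variances are equal, which corresponds to SNR = 1 in this parameterization.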

Test under perfect conditions
- 1000 networks
- 10 stations with 5 breaks each
- No mean trend error
- No noise
- Known break positions
Result: perfect skill.
- We made no programming errors.
- The method works perfectly.
Figure: detected versus inserted yearly station inhomogeneity.
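The noise-free test can be reproduced for a single network with a plain least-squares fit. The sketch below is one possible implementation, not the code used for the talk: it solves the full (m+h)-parameter problem with `numpy.linalg.lstsq` instead of the reduced h x h system, relies on the minimum-norm solution to fix the free constant, and draws climate and offsets from assumed Gaussian distributions.

```python
import numpy as np

def anova_offsets(b, seg_id):
    """Least-squares fit of b[i, j] = c[j] + a[seg_id[i, j]]."""
    n, m = b.shape
    h = seg_id.max() + 1
    design = np.zeros((n * m, m + h))
    rows = np.arange(n * m)
    design[rows, np.tile(np.arange(m), n)] = 1.0      # one column per year
    design[rows, m + seg_id.ravel()] = 1.0            # one column per subperiod
    sol, *_ = np.linalg.lstsq(design, b.ravel(), rcond=None)
    return sol[:m], sol[m:]

rng = np.random.default_rng(42)
n, m, n_breaks = 10, 100, 5
seg_id = np.empty((n, m), dtype=int)
a_true = np.empty((n, m))
next_id = 0
for i in range(n):
    pos = np.sort(rng.choice(np.arange(1, m), size=n_breaks, replace=False))
    edges = np.concatenate(([0], pos, [m]))
    for lo, hi in zip(edges[:-1], edges[1:]):
        seg_id[i, lo:hi] = next_id
        a_true[i, lo:hi] = rng.normal()
        next_id += 1
climate = rng.normal(size=m)
b = climate + a_true                                  # no noise, known break positions

c_est, a_est = anova_offsets(b, seg_id)
a_est_full = a_est[seg_id]                            # expand to (station, year)
# Compare up to the undetermined constant by centering both fields.
detected = a_est_full - a_est_full.mean()
inserted = a_true - a_true.mean()
max_abs_error = np.abs(detected - inserted).max()
```

Inserted and detected inhomogeneities agree up to rounding error, provided no year carries a break at every station at once, which is very unlikely for random break positions.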

Signal / noise = 1
Still:
- No mean trend error
- Perfectly known break positions
Equal break and noise variance: SNR = 1
Result: as expected no longer perfect, but r =
Figure: detected versus inserted yearly station inhomogeneity.

Network-mean trends
From individual inhomogeneities to network-mean trends.
Both regression lines are calculated. The one that takes the x-axis data as the independent variable coincides with the 1-to-1 line in all three cases. What does this mean?
Figure: three scatter panels (inhomogeneities, station trends, network trends), each annotated with its correlation r.

What does it mean?

Remaining trend error
It is convenient to display not the detected (y), but the remaining (y-x) trend error. As shown, the inserted and the remaining quantities are uncorrelated. The remaining errors are smaller, but comparable in size. This is valid for SNR = 1.
Figure: remaining versus inserted network-mean trend error, annotated with σ_x² = … and σ_y² = 0.141.
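The connection between a regression line on the 1-to-1 line and an uncorrelated remaining error can be checked numerically: the slope of y on x minus 1 equals cov(x, y-x) / var(x). The data below are synthetic and the standard deviations are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
size = 100_000
x = rng.normal(0.0, 1.0, size)          # inserted quantity (synthetic)
e = rng.normal(0.0, 0.7, size)          # remaining error, drawn independently of x
y = x + e                               # detected quantity

cov = np.cov(x, y)
slope_y_on_x = cov[0, 1] / cov[0, 0]    # x taken as independent variable
slope_x_on_y = cov[0, 1] / cov[1, 1]    # y taken as independent variable

# Identity: slope_y_on_x - 1 == cov(x, y - x) / var(x)
identity_gap = (slope_y_on_x - 1.0) - np.cov(x, y - x)[0, 1] / cov[0, 0]
```

Only the regression with x as independent variable falls on the 1-to-1 line; the other one is flatter because y carries the extra variance of the remaining error.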

Preliminary conclusion I

Improvement ratio (1/2)
The ratio q (remaining relative to inserted trend error) depends on the SNR.
Upper panel, SNR = ½: Doubled noise leads to a doubled remaining trend error. The inserted trend error is unchanged. Doubled q: q = 1.46 (0.73).
Lower panel, SNR = 2: Doubled signal leads to a doubled inserted trend error. The remaining trend error is unchanged. Halved q: q = … (0.73).
Inserted and remaining errors are independent. The inserted error is determined by the break (signal) variance. The remaining error is determined by the noise variance.
Figure: remaining versus inserted network-mean trend error for SNR = ½ and SNR = 2, each annotated with σ_x² and σ_y².
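Why the remaining error follows the noise and not the breaks can be illustrated with a least-squares version of the correction, which is linear in the observations. The sketch below is an illustration under assumptions (known break positions, full least-squares fit via `numpy.linalg.lstsq`, Gaussian signals, hypothetical function names), not the code behind the figures.

```python
import numpy as np

def fit_offsets(b, seg_id):
    """Least-squares offsets of b[i, j] = c[j] + a[seg_id[i, j]], expanded to (n, m)."""
    n, m = b.shape
    h = seg_id.max() + 1
    design = np.zeros((n * m, m + h))
    rows = np.arange(n * m)
    design[rows, np.tile(np.arange(m), n)] = 1.0
    design[rows, m + seg_id.ravel()] = 1.0
    sol, *_ = np.linalg.lstsq(design, b.ravel(), rcond=None)
    return sol[m:][seg_id]

def network_trend(field):
    """Linear trend of the network-mean time series."""
    years = np.arange(field.shape[1])
    return np.polyfit(years, field.mean(axis=0), 1)[0]

rng = np.random.default_rng(2)
n, m = 5, 100
seg_id = np.zeros((n, m), dtype=int)
a_true = np.zeros((n, m))
next_id = 0
for i in range(n):
    pos = np.sort(rng.choice(np.arange(1, m), size=3, replace=False))
    edges = np.concatenate(([0], pos, [m]))
    for lo, hi in zip(edges[:-1], edges[1:]):
        seg_id[i, lo:hi] = next_id
        a_true[i, lo:hi] = rng.normal()
        next_id += 1
climate = rng.normal(size=m)
noise = rng.normal(size=(n, m))

def remaining_trend(noise_scale, break_scale):
    """Network-mean trend of detected minus inserted inhomogeneities."""
    inserted = break_scale * a_true
    b = climate + inserted + noise_scale * noise
    return network_trend(fit_offsets(b, seg_id) - inserted)

r_base = remaining_trend(1.0, 1.0)
r_double_noise = remaining_trend(2.0, 1.0)
r_double_breaks = remaining_trend(1.0, 2.0)
```

With the same noise realization, doubling the noise doubles the remaining trend error, while doubling the break sizes leaves it unchanged.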

Improvement ratio (2/2)
Further parameters besides the SNR that may affect q are the number of breaks and the number of stations.
Result: The ratio depends mainly on the break number and not on the station number. For 6 breaks the ratio is about 1.
Figure: q as a function of break number and station number.

Preliminary conclusion II
Does the correction act neutrally if no correction is necessary?
Yes, depending on the SNR: for SNR = 1, 6 breaks and a length of 100 years, the data are neither improved nor degraded.
However, the homogenized data are mutually dependent. Standard statistical techniques that assume independent data cannot be applied. All variances are underestimated.

And IF there is a trend error?
Figure: mean inserted inhomogeneity against year.

Non-zero mean trend error
The scatter is conserved compared to the case of zero mean trend error. The data cloud is shifted as a whole to the right. The uncertainty remains, but the mean trend error is well corrected.
Figure: remaining versus inserted network-mean trend error, annotated with x mean = … and y mean = 0.010.

Including break position errors
Question: What happens if the break positions have errors and are not perfectly known (as in reality)?
Simulation: Scatter the correct positions by adding noise with a standard deviation of 2 years.
Result: Only 80% of the trend error is corrected, 20% remains.
Figure: remaining versus inserted network-mean trend error, annotated with x mean = … and y mean = 0.161.
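Scattering the break positions could be done as in the following sketch. Gaussian errors rounded to whole years, clipping to the valid range and merging of coinciding breaks are assumptions of this illustration; the slides only give the standard deviation of 2 years.

```python
import numpy as np

def perturb_breaks(positions, m, sd_years=2.0, rng=None):
    """Shift break positions by rounded Gaussian errors.

    positions: break indices in 1..m-1 (a break at k separates year k-1 from k).
    Returns the sorted perturbed positions; breaks that end up in the same
    year are merged into one.
    """
    rng = np.random.default_rng() if rng is None else rng
    positions = np.asarray(positions)
    errors = rng.normal(0.0, sd_years, len(positions))
    shifted = np.rint(positions + errors).astype(int)
    shifted = np.clip(shifted, 1, m - 1)      # keep at least one year on each side
    return np.unique(shifted)

rng = np.random.default_rng(7)
true_breaks = [10, 40, 70]
assumed_breaks = perturb_breaks(true_breaks, m=100, sd_years=2.0, rng=rng)
```

The perturbed positions would then replace the true ones when the subperiods for the correction are defined, while the data still contain the breaks at their true positions.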

Conclusion
If the original data contain no trend error (if the inhomogeneities have by chance no overall effect), the trend error for individual networks is (under realistic conditions) not improved.
However, homogenization produces mutually dependent data. This is a disadvantage.
Mean trend errors (due to inhomogeneities) are corrected perfectly if the break positions are known perfectly.
For realistic position errors (σ = 2 years) the trend error is only partly corrected, 20% remains.