Detection of inhomogeneities in Daily climate records to Study Trends in Extreme Weather
Detection of Breaks in Random Data, in Data Containing True Breaks, and in Real Data

Presentation transcript:

Detection of inhomogeneities in Daily climate records to Study Trends in Extreme Weather
Detection of Breaks in Random Data, in Data Containing True Breaks, and in Real Data
Ralf Lindau

Daily Stew Meeting, Bonn – 14. June 2012

Internal and External Variance
Consider the differences between one station and a neighbour or a reference. Breaks are defined as abrupt changes in the station-minus-reference time series.
Internal variance: the variance within the subperiods.
External variance: the variance between the means of different subperiods.
Criterion: maximum external variance attained with a minimum number of breaks.
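
The criterion above can be illustrated with a small search routine. This is a minimal sketch, assuming that the best segmentation for a given number of breaks k is found by dynamic programming over the internal sum of squares. The function name `best_segmentation`, the break-index convention and the example series are all hypothetical and not taken from the talk.

```python
def best_segmentation(series, k):
    """Return (external variance fraction, break indices) for the best k breaks.

    A break index b means that a new subperiod starts at series[b].
    """
    n = len(series)
    if not 0 <= k < n:
        raise ValueError("need 0 <= k < len(series)")

    # Prefix sums give the internal sum of squares of any segment in O(1).
    s1, s2 = [0.0], [0.0]
    for x in series:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def internal_ss(i, j):
        """Sum of squared deviations of series[i:j] from its own mean."""
        seg_sum = s1[j] - s1[i]
        return (s2[j] - s2[i]) - seg_sum * seg_sum / (j - i)

    inf = float("inf")
    # cost[s][j]: minimal internal sum of squares of series[:j] with s breaks.
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    last = [[0] * (n + 1) for _ in range(k + 1)]
    for j in range(1, n + 1):
        cost[0][j] = internal_ss(0, j)
    for s in range(1, k + 1):
        for j in range(s + 1, n + 1):
            for i in range(s, j):  # i = start of the final segment
                c = cost[s - 1][i] + internal_ss(i, j)
                if c < cost[s][j]:
                    cost[s][j], last[s][j] = c, i

    # Walk back through the table to recover the break positions.
    breaks, j = [], n
    for s in range(k, 0, -1):
        j = last[s][j]
        breaks.append(j)
    breaks.reverse()

    total_ss = internal_ss(0, n)
    return 1.0 - cost[k][n] / total_ss, breaks


# Hypothetical difference series with one jump of 5 units after 10 years.
series = [0.1, -0.1] * 5 + [5.1, 4.9] * 5
for k in range(3):
    fraction, breaks = best_segmentation(series, k)
    print(k, round(fraction, 4), breaks)
```

Minimising the internal sum of squares is equivalent to maximising the external variance, because their sum is fixed for a given series. Deciding when to stop adding breaks is a separate question that the rest of the talk addresses.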

Decomposition of Variance
n: total number of years
N: number of subperiods
n_i: number of years within subperiod i
The sum of external and internal variance is constant.
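
The constancy of the sum can be checked numerically. The sketch below assumes the usual definitions: internal variance as the mean squared deviation of each value from its subperiod mean, and external variance as the n_i-weighted mean squared deviation of the subperiod means from the overall mean. The function name and the break-index convention are hypothetical.

```python
import random


def variance_decomposition(series, breaks):
    """Return (internal, external, total) variance for a given segmentation.

    `breaks` lists the indices at which a new subperiod starts
    (a convention chosen for this sketch).
    """
    n = len(series)
    overall_mean = sum(series) / n
    edges = [0] + sorted(breaks) + [n]
    internal = 0.0
    external = 0.0
    for start, stop in zip(edges[:-1], edges[1:]):
        segment = series[start:stop]
        n_i = len(segment)
        segment_mean = sum(segment) / n_i
        # Scatter of the values around their own subperiod mean.
        internal += sum((x - segment_mean) ** 2 for x in segment) / n
        # Weighted scatter of the subperiod mean around the overall mean.
        external += n_i * (segment_mean - overall_mean) ** 2 / n
    total = sum((x - overall_mean) ** 2 for x in series) / n
    return internal, external, total


random.seed(1)
series = [random.gauss(0.0, 1.0) for _ in range(21)]
for breaks in ([], [7], [5, 12, 17]):
    internal, external, total = variance_decomposition(series, breaks)
    print(breaks, round(internal + external, 6), round(total, 6))
```

Whatever break positions are chosen, the two printed numbers agree; only the split between internal and external variance changes.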

Three Questions
1. How do random data behave? This is needed as a stop criterion for the number of significant breaks.
2. How do real breaks behave theoretically?
3. How do real data behave?

Segment averages with stddev = 1
For data with standard deviation 1, the segment averages x_i scatter randomly with
mean: 0
stddev: 1/√n_i
because any deviation from zero can be seen as an inaccuracy due to the limited number of members.
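
A quick simulation illustrates the 1/√n_i scaling. It is a sketch assuming independent standard normal data; the function name, the segment length of 25 and the trial count are arbitrary choices for illustration.

```python
import math
import random
import statistics


def segment_mean_std(n_i, trials, seed):
    """Empirical standard deviation of the mean of n_i standard normal values."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0.0, 1.0) for _ in range(n_i))
        for _ in range(trials)
    ]
    return statistics.pstdev(means)


n_i = 25
observed = segment_mean_std(n_i, trials=10000, seed=42)
expected = 1.0 / math.sqrt(n_i)
print(f"observed {observed:.4f}, expected {expected:.4f}")
```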

χ²-distribution
The external variance is equal to the mean square sum of a random, standard normally distributed variable. It is a weighted measure of the variability of the subperiods' means.
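
The step from segment means to a χ² variable can be written out as follows. This is a sketch, assuming independent standard normal data whose true mean of zero is known; the symbols z_i and var_ext are introduced here for illustration only.

```latex
\bar{x}_i \sim \mathcal{N}\!\left(0, \tfrac{1}{n_i}\right)
\quad\Rightarrow\quad
z_i = \sqrt{n_i}\,\bar{x}_i \sim \mathcal{N}(0, 1)

n \cdot \mathrm{var}_{\mathrm{ext}}
  = \sum_{i=1}^{N} n_i\,\bar{x}_i^{\,2}
  = \sum_{i=1}^{N} z_i^{2}
  \sim \chi^2(N)
```

If the overall mean is estimated from the same data rather than known, one degree of freedom is lost.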

From χ² to β distribution
Example data: n = 21 years, k = 7 breaks.
X ~ χ²(a) and Y ~ χ²(b) ⇒ X / (X + Y) ~ β(a/2, b/2)
If we normalize a χ²-distributed variable by the sum of itself and another χ²-distributed variable, the result is β-distributed.
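
The relation between the χ² and β distributions can be checked by simulation. In the sketch below, the degrees of freedom a = k = 7 and b = n − 1 − k = 13 are an assumption made for the n = 21, k = 7 example; the slide's own parameter definitions are not preserved in this transcript. X and Y are drawn independently.

```python
import random
import statistics


def chi2_sample(dof, rng):
    """One draw from chi-square(dof) as a sum of squared standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(dof))


def ratio_samples(a, b, trials, rng):
    """Draws of X / (X + Y) with independent X ~ chi2(a) and Y ~ chi2(b)."""
    samples = []
    for _ in range(trials):
        x = chi2_sample(a, rng)
        y = chi2_sample(b, rng)
        samples.append(x / (x + y))
    return samples


rng = random.Random(0)
a, b = 7, 13  # assumed degrees of freedom: k and n - 1 - k for n = 21, k = 7
ratios = ratio_samples(a, b, 5000, rng)
beta_draws = [rng.betavariate(a / 2, b / 2) for _ in range(5000)]

print("mean of ratio  :", round(statistics.fmean(ratios), 3))
print("mean of beta   :", round(statistics.fmean(beta_draws), 3))
print("theoretical    :", a / (a + b))
print("stdev of ratio :", round(statistics.pstdev(ratios), 3))
print("stdev of beta  :", round(statistics.pstdev(beta_draws), 3))
```

Both sample sets should show similar means and spreads, close to the theoretical mean a / (a + b) of the β(a/2, b/2) distribution.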

Incomplete Beta Function
The external variance v is β-distributed and depends on n (years) and k (breaks). The distribution is solvable in closed form for even k and odd n.
We are interested in the best solution, the one with the highest external variance, so we need the exceeding probability for high external variance. This exceeding probability P, which gives the best (maximum) solution for v, is given by the incomplete beta function.
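
For integer shape parameters the incomplete beta function reduces to a finite binomial sum, which may be what "solvable for even k and odd n" refers to. The sketch below assumes v ~ β(k/2, (n − 1 − k)/2); this parametrisation is an assumption, since the slide's formula is not preserved here. The function name is hypothetical.

```python
from math import comb


def exceedance_probability(v, n, k):
    """P(V > v) for V ~ Beta(k/2, (n-1-k)/2), for even k and odd n only.

    Assumed parametrisation (not taken from the transcript). With integer
    shape parameters a and b, the regularised incomplete beta function equals
    a binomial tail sum: I_v(a, b) = P(Binomial(a + b - 1, v) >= a).
    """
    if k % 2 != 0 or n % 2 != 1:
        raise ValueError("closed form requires even k and odd n")
    a = k // 2
    b = (n - 1 - k) // 2
    if a < 1 or b < 1:
        raise ValueError("need k >= 2 and n - 1 - k >= 2")
    m = a + b - 1
    # 1 - I_v(a, b) = P(Binomial(m, v) <= a - 1)
    return sum(comb(m, j) * v**j * (1.0 - v) ** (m - j) for j in range(a))


for k in (2, 4, 6, 8):
    print(k, round(exceedance_probability(0.5, 21, k), 5))
```

The loop shows how, for a fixed external variance of 0.5 and n = 21, the exceeding probability grows with the number of breaks.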

P(v) for different k
Can we give a formula for dv/dk in order to derive v(k)?
Increasing the break number from k to k+1 has two consequences:
1. The probability function changes.
2. The number of combinations increases.
[Figure: P(v) curves; labels: 2 breaks, 20 breaks]

dv/dk sketch
P(v) is a complicated function and hard to invert into v(P). Thus, dv is derived from dP divided by the slope of P(v).
[Figure: sketch of P(v) for k breaks and k+1 breaks]

Solution

Constancy of the Solution
The solution for the exponent is constant for different lengths of time series (21 and 101 years).
[Figure panels: 101 years, 21 years]

The Existing Algorithm Prodige
Original formulation of Caussinus and Mestre for the penalty term in Prodige.
Translation into the terms used by us.
Normalisation by k* = k / (n − 1).
Differentiation to find the minimum.
In Prodige it is postulated that the relative gain of external variance is a constant for given n.
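
The equations for these steps are not preserved in the transcript. The following is a reconstruction, assuming the Caussinus and Mestre criterion in its commonly cited form and writing v for the external variance as a fraction of the total variance. The notation (Y for the series, the segment means, C_k) is an assumption, not copied from the slide.

```latex
% Original formulation (Caussinus and Mestre), to be minimised over k:
C_k(Y) = \ln\!\left(1 - \frac{\sum_{j=1}^{k+1} n_j\,(\bar{Y}_j - \bar{Y})^2}
                             {\sum_{i=1}^{n} (Y_i - \bar{Y})^2}\right)
         + \frac{2k}{n-1}\,\ln(n)

% Translation into external variance v:
C_k = \ln(1 - v) + \frac{2k}{n-1}\,\ln(n)

% Normalisation by k^* = k / (n - 1):
C(k^*) = \ln(1 - v) + 2\,k^*\,\ln(n)

% Derivative set to zero at the minimum:
\frac{dC}{dk^*} = -\frac{1}{1 - v}\,\frac{dv}{dk^*} + 2\ln(n) = 0
\quad\Rightarrow\quad
\frac{1}{1 - v}\,\frac{dv}{dk^*} = 2\ln(n)
```

The last line expresses the postulate: the gain of external variance relative to the remaining internal variance 1 − v is the constant 2 ln(n) for given n.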

Our Results vs Prodige
We know the function for the relative gain of external variance. Its uncertainty, given by isolines of the exceeding probabilities 2^-i, is characterised by constant distances between the isolines.
Prodige proposes a constant of 2 ln(n) ≈ 9.
[Figure: isolines for exceeding probabilities 1/128, 1/64, 1/32, 1/16, 1/8, 1/4]

Wrong Direction
[Figure panels: n = 101 years, n = 21 years]

True Breaks

Only true for constant lengths
True breaks with fixed distances behave identically to random data. For realistic random lengths the exponent is slightly increased.
[Figure panels: sub-periods with random lengths; sub-periods with constant lengths. Curves: data, theory]

Distribution of Lengths
The distribution of the sub-periods' lengths obtained from randomly inserted breaks is known. If necessary, it could be taken into account.
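
The slide does not state the distribution itself. The sketch below supplies a standard combinatorial result as an assumption: if k breaks are placed uniformly at random among the n − 1 gaps between n years, each sub-period has length l with probability C(n − 1 − l, k − 1) / C(n − 1, k). The function name is hypothetical.

```python
from math import comb


def length_distribution(n, k):
    """Probability of each sub-period length for k random breaks in n years.

    Assumes the k break positions are drawn uniformly, without replacement,
    from the n - 1 gaps between consecutive years.
    """
    if not 1 <= k <= n - 1:
        raise ValueError("need 1 <= k <= n - 1")
    total = comb(n - 1, k)
    return {
        length: comb(n - 1 - length, k - 1) / total
        for length in range(1, n - k + 1)
    }


dist = length_distribution(21, 6)
mean_length = sum(length * p for length, p in dist.items())
print("probabilities sum to", round(sum(dist.values()), 12))
print("mean length:", round(mean_length, 6), "expected n/(k+1) =", 21 / 7)
```

Short sub-periods are the most probable, and the mean length equals n / (k + 1), because the k + 1 sub-periods share the n years.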

Break vs Scatter Regime
The two governing parameters are:
1) The relative amount of break variance compared to the scatter variance
2) The quotient of time series length and number of true breaks
The latter defines how much faster the internal variance decreases in the “true break regime” compared to the “scatter regime”.
If the relative scatter is low (10%), the transition between the regimes is clearly visible at 15 of 19 breaks.

Real Data
1050 climate stations exist in Germany. For each station, the nearest eastward neighbour (eastward to avoid identical pairs) at a distance between 10 km and 30 km is searched. 443 station pairs remain.
[Figure panels: all stations; neighbouring pairs]
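
The pair selection can be sketched as follows, assuming that the eastward neighbour means the closest station with a larger longitude, and that distances are great-circle distances. The station names and coordinates are invented for illustration, and the function names are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a sphere of radius 6371 km."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def eastward_pairs(stations, min_km=10.0, max_km=30.0):
    """For each station, pick the closest station to its east within the range.

    Searching only eastward means a pair (A, B) cannot also appear as (B, A).
    """
    pairs = []
    for name, lat, lon in stations:
        best = None
        for other, other_lat, other_lon in stations:
            if other_lon <= lon:
                continue  # not east of the current station
            distance = haversine_km(lat, lon, other_lat, other_lon)
            if min_km <= distance <= max_km and (best is None or distance < best[1]):
                best = (other, distance)
        if best is not None:
            pairs.append((name, best[0], round(best[1], 1)))
    return pairs


# Hypothetical stations: (name, latitude, longitude)
stations = [
    ("A", 50.0, 7.00),
    ("B", 50.0, 7.20),
    ("C", 50.0, 7.05),
    ("D", 50.0, 9.00),
]
for pair in eastward_pairs(stations):
    print(pair)
```

Stations without any eastward neighbour in the distance range drop out, which is consistent with fewer pairs than stations remaining.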

Data Focus
This project deals with daily climate data. The focus is on findings about their extremes. At the least, statements should be possible about:
the distribution (moments)
percentiles
indices (number of wet days per month)

Parameters
Problem parameters PP (interesting for break detection)
Expected physical problems:
  Temperature at high sunshine duration
  Temperature at high pressure
  Temperature at high diurnal cycle
  Temperature during snow cover
  Temperature depending on general weather situation
  Temperature during rain
  Rain at high wind speed
Expected technical problems:
  Frequency of rainy days below 1 mm
  Tenth of precipitation report
  Difference between T_mean and (T_max − T_min)
Per se interesting parameters P
Monthly means:
  Temperature
  Precipitation, etc.
Distribution and extremes (project focus; more sensitive?):
  Standard deviation
  Skewness
  Kurtosis
  Maximum
  Minimum
  90th percentile
Breaks are more sensitive to problem parameters. Breaks in PP may help to find breaks in P.

Two Parameter Pairs
1a. Monthly mean temperature
1b. Monthly maximum temperature (project focus)
2a. Monthly precipitation sum
2b. Frequency of rainy days below 1 mm (problem parameter)
Can the sensitive parameter help to find breaks in the mean?
“Drizzle days” are often excluded from rainy days when calculating the interesting indices: Monthly Rain Frequency, Consecutive Dry Days. “Drizzle frequency” is therefore not only a technical problem parameter, but also a per se interesting one.

Monthly Mean Temperature
The temperature difference between Ellwangen-Rindelbach and Crailsheim-Alexandersreut shows 1 strong and 3 further significant breaks. The statistical signature confirms this: the first break contains much variance, while breaks 2, 3 and 4 are only slightly larger than the Mestre penalty.

Break Statistics
[Figure panels: individual pair; all pairs. r = 0.937]

Monthly Maximum
For the monthly temperature maximum, only the largest breaks are detectable, probably due to the reduced correlation.
[Figure label: r = 0.865]

Additional Breaks?
In maximum temperature there are fewer breaks. Are they nevertheless new compared to those in mean temperature?
To check, the penalty is raised from about 12 (i.e. 2 ln(n)) to 60. With n = 600, this means that 10% of the remaining internal variance has to be explained by each additional break; otherwise the search is stopped.
Under these stricter requirements, 297 breaks are found in the mean and 67 in the maximum. Nearly all breaks in T_max also exist in T_mean. The “stddev” of the temporal distance is 1.75 years.
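
The numbers on this slide can be reproduced with a few lines. The sketch assumes that a penalty constant c on the relative gain per unit of k* = k / (n − 1) translates into a required gain of c / (n − 1) per additional break. Both function names are hypothetical.

```python
import math


def prodige_constant(n):
    """Penalty constant 2 ln(n) proposed in Prodige for a series of length n."""
    return 2.0 * math.log(n)


def required_gain_per_break(penalty, n):
    """Fraction of the remaining internal variance one extra break must explain.

    One additional break corresponds to a step of 1 / (n - 1) in k*.
    """
    return penalty / (n - 1)


for n in (21, 101, 600):
    print(n, round(prodige_constant(n), 2))

print(round(required_gain_per_break(60.0, 600), 4))
print(round(required_gain_per_break(prodige_constant(600), 600), 4))
```

The last two lines compare the raised penalty of 60 with the standard 2 ln(n) penalty at n = 600, which corresponds to roughly a fifth of the raised requirement.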

Answer: No
Almost no new breaks are found by the sensitive parameter, monthly maximum temperature. The lower correlation (0.865 vs 0.937 ⇒ doubled rms) obviously hampers the break-finding capability of the sensitive parameter.
However, given the high correlation of break positions, the opposite direction may become possible: finding break positions in the maximum temperature by considering the mean temperature.

“Drizzle Days”
Monthly frequency of rainy days below 1 mm. This parameter is highly inhomogeneous. Even for individual stations the break is evident.

Drizzle vs. Mean Precip.
In the drizzle parameter more significant breaks are found (index 43.3 compared to 28.8), although the correlation is low (0.339 compared to 0.855). Are the break positions again correlated?

Correlation of break positions
Many new breaks are found. Only 12 breaks of the drizzle parameter are found anywhere in the corresponding time series of mean precipitation, and mostly far away. In 93 time series pairs, one or more breaks are found for drizzle but not a single one in mean precipitation.
Are these new breaks also included, but hidden, in mean precipitation?
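
One simple way to compare break positions between two parameters is to look up, for each break in one series, the nearest break in the other. This is only an illustrative sketch: the function name, the tolerance and the positions (in months) are hypothetical, and the talk does not specify how the comparison was made.

```python
def match_breaks(breaks_a, breaks_b, tolerance):
    """Pair each break in breaks_a with the nearest break in breaks_b.

    Returns (matched, unmatched): matched holds (a, nearest_b, distance) for
    breaks whose nearest counterpart lies within the tolerance; unmatched
    holds the breaks of breaks_a without such a counterpart.
    """
    matched, unmatched = [], []
    for a in breaks_a:
        if breaks_b:
            nearest = min(breaks_b, key=lambda b: abs(b - a))
            distance = abs(nearest - a)
            if distance <= tolerance:
                matched.append((a, nearest, distance))
                continue
        unmatched.append(a)
    return matched, unmatched


# Hypothetical break positions in months since the start of the series.
drizzle_breaks = [120, 305, 480]
precip_breaks = [118, 410]
matched, unmatched = match_breaks(drizzle_breaks, precip_breaks, tolerance=12)
print("matched:", matched)
print("unmatched:", unmatched)
```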

Forced Breaks (1)

Forced Breaks (2)
On average, too, the internal variance decreases only by about 1% if “drizzle breaks” are inserted into the time series of mean precipitation.
1% is the mean decrease for a random time series with n = 100, and it is beta-distributed. However, here n is equal to 600. Is the result then a bit better than random?

Simulated Data
1. Blind try of 3 breaks in a 21-year random time series.
2. Blind try of 3 breaks in a 21-year piecewise constant time series with 6 true breaks.
3. Blind try of 3 breaks in a 21-year time series with 6 true breaks plus random scatter.
[Figure panels: 1. Purely random; 2. Pure true breaks; 3. Realistic mix]
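
The three experiments can be imitated with a short simulation. The sketch rests on several assumptions that are not stated on the slide: a "blind try" is taken to mean break positions drawn at random without looking at the data, break levels are standard normal, and the scatter in the mixed case has a standard deviation of 0.5. All function names are hypothetical.

```python
import random


def external_fraction(series, breaks):
    """External variance as a fraction of the total variance."""
    n = len(series)
    mean = sum(series) / n
    total = sum((x - mean) ** 2 for x in series)
    edges = [0] + sorted(breaks) + [n]
    external = 0.0
    for start, stop in zip(edges[:-1], edges[1:]):
        segment = series[start:stop]
        external += len(segment) * (sum(segment) / len(segment) - mean) ** 2
    return external / total


def step_signal(n, n_true, rng):
    """Piecewise constant series with n_true randomly placed breaks."""
    edges = [0] + sorted(rng.sample(range(1, n), n_true)) + [n]
    signal = []
    for start, stop in zip(edges[:-1], edges[1:]):
        signal.extend([rng.gauss(0.0, 1.0)] * (stop - start))
    return signal


def simulate(kind, n, n_true, rng, noise_sd=0.5):
    """kind is 'random', 'breaks' or 'mix' (assumed set-ups, see lead-in)."""
    if kind == "random":
        return [rng.gauss(0.0, 1.0) for _ in range(n)]
    signal = step_signal(n, n_true, rng)
    if kind == "breaks":
        return signal
    return [x + rng.gauss(0.0, noise_sd) for x in signal]


def blind_try(kind, trials=2000, n=21, n_true=6, k=3, seed=0):
    """Mean external variance fraction for k randomly placed trial breaks."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        series = simulate(kind, n, n_true, rng)
        trial_breaks = rng.sample(range(1, n), k)
        values.append(external_fraction(series, trial_breaks))
    return sum(values) / len(values)


for kind in ("random", "breaks", "mix"):
    print(kind, round(blind_try(kind), 3))
```

For purely random data the mean fraction should lie near k / (n − 1) = 0.15, while series that contain true breaks yield larger values even for blindly placed trial breaks.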

Realistic Mixed Data
Real data are expected to be similar to a realistic mix rather than to random scatter. As the mix also includes real breaks, the null hypothesis is not random scatter but a realistic mix.
Here the blindly found external variance is again β-distributed, but generally larger. How much larger is difficult to quantify in advance; it depends on the signal-to-noise ratio.

Conclusions
The analysis of random data shows that the external variance is β-distributed, which leads to a new formulation for the penalty term.
True breaks are also β-distributed. Their external variance increases faster by a factor of n/n_k compared to random scatter.
Are sensitive parameters helpful for finding additional breaks?
Monthly maximum temperature: due to the reduced spatial correlation, T_max “finds” fewer breaks. Those identified are even better visible in T_mean.
Drizzle parameter: highly inhomogeneous ⇒ many breaks found. But they do not coincide with breaks in mean precipitation.
Vice versa, we expect that T_mean breaks are helpful for finding breaks in T_max. But the proof of significance will be difficult.