1
"EXTREME-VALUE ANALYSIS: FOCUSING ON THE FIT AND THE CONDITIONS, WITH HYDROLOGICAL APPLICATIONS"
Dávid Bozsó, Pál Rakonczai, András Zempléni
Eötvös Loránd University, Budapest
4th Conference on Extreme Value Analysis: Probabilistic and Statistical Models and their Applications
2
Table of contents
Goodness of fit procedures
Checking the conditions D, D(u_n) and D'(u_n)
Multivariate problems:
  Copulas
  Simulations
  Goodness of fit tests for copulas
  Time dependence
Hydrological applications
3
Generalized Pareto distribution
Peaks over a sufficiently high threshold u can be modeled by the generalized Pareto distribution (under mild conditions).
Appropriate threshold selection is very important.
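For reference, the generalized Pareto distribution function for the excess Y = X - u over the threshold can be written as below. This is the usual shape/scale parametrization; the symbols \xi (shape) and \sigma (scale) are assumptions here and need not match the notation used on the slide.

```latex
% Distribution of the excess Y = X - u, given X > u
G_{\xi,\sigma}(y) = 1 - \left(1 + \frac{\xi y}{\sigma}\right)^{-1/\xi},
\qquad y \ge 0, \quad 1 + \frac{\xi y}{\sigma} > 0, \quad \sigma > 0,
% with the exponential distribution as the limit \xi \to 0
G_{0,\sigma}(y) = 1 - e^{-y/\sigma}.
```

For a negative shape parameter, as in the threshold table later in the talk, the distribution has a finite upper endpoint at y = -\sigma/\xi.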
4
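Goodness of fit in univariate threshold models
The usual goodness-of-fit tests (chi-squared, Kolmogorov-Smirnov) are not sensitive to discrepancies in the tails.
A better alternative is the Anderson-Darling test, in which the discrepancies near the tails get larger weights.
Its computation: $A^2 = -n - \frac{1}{n}\sum_{i=1}^{n}(2i-1)\left[\ln z_i + \ln(1 - z_{n+1-i})\right]$, where $z_i = F(x_{(i)})$ and $x_{(1)} \le \dots \le x_{(n)}$ is the ordered sample.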
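A minimal sketch of the standard Anderson-Darling computation, assuming the values z_i = F(x_i) of the fitted distribution function have already been calculated. The function name and the sample values are hypothetical.

```python
import math

def anderson_darling(z):
    """A^2 = -n - (1/n) * sum_{i=1..n} (2i-1) * [ln z_(i) + ln(1 - z_(n+1-i))].

    z holds the fitted distribution function evaluated at the sample,
    each value strictly between 0 and 1.
    """
    z = sorted(z)
    n = len(z)
    total = 0.0
    for i in range(1, n + 1):
        # z[i - 1] is z_(i); z[n - i] is z_(n+1-i) in 1-based notation
        total += (2 * i - 1) * (math.log(z[i - 1]) + math.log(1.0 - z[n - i]))
    return -n - total / n

# Hypothetical probability-transformed sample
z_values = [0.12, 0.35, 0.51, 0.77, 0.93]
print(anderson_darling(z_values))
```

Values of z close to 0 or 1 dominate the sum through the logarithms, which is how the tails get their larger weight.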
5
Goodness of fit - continued
Modification: often the focus is on one tail only.
For maximum: (Zempléni, 2004)
Computation:
Critical values can be simulated (as in Choulakian and Stephens, 2001).
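One way to focus on the upper tail is to keep only the upper-tail factor of the Anderson-Darling weight, that is, to use B^2 = n * integral of (F_n - F)^2 / (1 - F) dF. That this is the exact form used in Zempléni (2004) is an assumption of this sketch. Integrating piecewise over the ordered sample gives the closed form used in the code; the function name is hypothetical.

```python
import math

def upper_tail_ad(z):
    """B^2 = n/2 - 2*sum z_(i) - sum (2 - (2i-1)/n) * ln(1 - z_(i)).

    Closed form of n * integral (F_n - F)^2 / (1 - F) dF, an assumed
    one-sided variant of the Anderson-Darling statistic.
    z holds F(x_i) for the fitted F, each value strictly between 0 and 1.
    """
    z = sorted(z)
    n = len(z)
    sum_z = sum(z)
    sum_log = sum((2.0 - (2 * i - 1) / n) * math.log(1.0 - z[i - 1])
                  for i in range(1, n + 1))
    return n / 2.0 - 2.0 * sum_z - sum_log

# Hypothetical probability-transformed sample
z_values = [0.12, 0.35, 0.51, 0.77, 0.93]
print(upper_tail_ad(z_values))
```

Only ln(1 - z) appears, so observations near the upper end of the fitted distribution drive the statistic, while the lower tail contributes little.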
6
Finding thresholds
Theoretical results related to the GPD are doubly asymptotic: not only the sample size but also the threshold has to tend to infinity.
How can we find suitable thresholds?
Suggestion:
  Increase the threshold level step by step.
  Fit the GPD (by the ML method, for example) and perform AD-type tests in each case.
  Select some levels for which the fit is acceptable.
For more details, see Bozsó et al., 2005.
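The stepwise procedure can be sketched as follows. To keep the example dependency-free, the GPD is fitted by the method of moments instead of maximum likelihood (a swapped-in, simpler estimator that needs a shape parameter below 1/2), and the data are simulated exponential values rather than water levels. All function names, the threshold grid and the minimum number of excesses are hypothetical.

```python
import math
import random

def fit_gpd_moments(excesses):
    """Method-of-moments GPD fit (stand-in for ML); valid for shape < 1/2."""
    n = len(excesses)
    mean = sum(excesses) / n
    var = sum((y - mean) ** 2 for y in excesses) / (n - 1)
    ratio = mean * mean / var
    shape = 0.5 * (1.0 - ratio)
    scale = 0.5 * mean * (1.0 + ratio)
    return shape, scale

def gpd_cdf(y, shape, scale):
    """GPD distribution function of an excess y >= 0."""
    if abs(shape) < 1e-9:
        return 1.0 - math.exp(-y / scale)
    base = 1.0 + shape * y / scale
    if base <= 0.0:  # beyond the fitted upper endpoint (negative shape)
        return 1.0
    return 1.0 - base ** (-1.0 / shape)

def anderson_darling(z):
    """Standard Anderson-Darling statistic for values z in (0, 1)."""
    z = sorted(z)
    n = len(z)
    total = sum((2 * i - 1) * (math.log(z[i - 1]) + math.log(1.0 - z[n - i]))
                for i in range(1, n + 1))
    return -n - total / n

def scan_thresholds(data, thresholds, min_excesses=30):
    """For increasing thresholds, return (threshold, shape, AD statistic)."""
    rows = []
    eps = 1e-10  # keep the logarithms finite
    for u in thresholds:
        excesses = [x - u for x in data if x > u]
        if len(excesses) < min_excesses:
            break
        shape, scale = fit_gpd_moments(excesses)
        z = [min(max(gpd_cdf(y, shape, scale), eps), 1.0 - eps) for y in excesses]
        rows.append((u, shape, anderson_darling(z)))
    return rows

random.seed(1)
data = [random.expovariate(1.0) for _ in range(2000)]  # simulated, not real levels
rows = scan_thresholds(data, [0.5, 1.0, 1.5, 2.0])
for u, shape, ad in rows:
    print(f"threshold={u:.1f}  shape={shape:+.3f}  AD={ad:.3f}")
```

Each AD value would then be compared with a simulated critical value for the corresponding sample size and shape, and the thresholds with an acceptable fit retained.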
7
Hydrological applications
Daily water level data from several stations along the river Tisza were available (time span: more than 100 years).
As an illustration we have chosen the Szeged station, but in fact we repeated the suggested procedures (almost) automatically for all the stations.
In later parts of the talk we shall also use data from Csenger (river Szamos).
8
Finding thresholds
Threshold (cm)   Shape parameter   AD statistic
.....
330              -0.5717           1.1599
340              -0.5601           0.9015
350              -0.5473           0.6296
360              -0.5344           0.4048
370              -0.5312           0.4198
380              -0.5339           0.5456
390              -0.5191           0.3566
400              -0.5033           0.2188
410              -0.499            0.2414
420              -0.5152           0.437
430              -0.491            0.2163
440              -0.4866           0.239
450              -0.4896           0.2599
460              -0.4775           0.33
.....
9
Focusing on the conditions
So far:
  Threshold selection
  Fit a GPD model to the data over the selected threshold (for iid data)
Dependence is present.
Possible long-range dependence?
Are the return levels affected by it?
10
Condition D and D(u_n)
11
How to check condition D?
Set p=1 and r=1 in the definition of condition D and choose the threshold u as the level of interest, e.g. 400 or 430 cm in our example.
Calculate d(l) for each lag l = 1, …, 1000.
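With p = 1 and r = 1, condition D compares the joint probability that two observations l steps apart both stay below u with the product of the two marginal probabilities. A minimal empirical sketch, assuming d(l) is estimated as the difference of these two relative frequencies; the function name, the AR(1) coefficient 0.8 and the simulated series are hypothetical.

```python
import random

def d_lag(x, u, lag):
    """Empirical P(X_i <= u, X_{i+lag} <= u) - P(X_i <= u)^2."""
    n = len(x)
    below = [xi <= u for xi in x]
    p = sum(below) / n
    pairs = n - lag
    joint = sum(1 for i in range(pairs) if below[i] and below[i + lag]) / pairs
    return joint - p * p

random.seed(0)
n = 5000
iid = [random.gauss(0.0, 1.0) for _ in range(n)]

# AR(1) series with a hypothetical coefficient of 0.8
ar1 = [0.0]
for _ in range(n - 1):
    ar1.append(0.8 * ar1[-1] + random.gauss(0.0, 1.0))

# Use the empirical 80% quantile as the level, as for 400 cm in the example
u_iid = sorted(iid)[int(0.8 * n)]
u_ar1 = sorted(ar1)[int(0.8 * n)]

for lag in (1, 10, 100):
    print(lag, abs(d_lag(iid, u_iid, lag)), abs(d_lag(ar1, u_ar1, lag)))
```

For the iid sequence |d(l)| fluctuates around zero at every lag, while for the AR(1) sequence it is clearly positive at small lags and decays as the lag grows.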
12
Applications: daily water level data
400 cm – 80% quantile
430 cm – 83% quantile
Compare with |d(l)| for well-known sequences:
  iid, normally distributed sequence
  AR(1) series
13
Applications: daily water level data
[Figure panels: hydrological data (level: 430 cm); normal iid sequences (sample mean, 95% quantile); AR(1) sequences (sample mean, 95% quantile)]
The simulation study confirms our hypothesis: the empirical data lie within the 95% confidence interval.
14
Condition D'(u_n)
Practical procedure: select a sequence (u_n), calculate the quantity in the condition and plot it as a function of k.
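A possible empirical version of this procedure, assuming the quantity in question is the usual sum from condition D'(u_n), n * sum_{j=2}^{[n/k]} P(X_1 > u_n, X_j > u_n), with the joint probabilities replaced by relative frequencies. The function names, the 98% quantile used as the level, and the simulated sequences are hypothetical.

```python
import random

def d_prime_sum(x, u, k):
    """Empirical n * sum_{j=2}^{[n/k]} P(X_1 > u, X_j > u)."""
    n = len(x)
    above = [xi > u for xi in x]
    total = 0.0
    for lag in range(1, n // k):  # lag = j - 1 for j = 2, ..., [n/k]
        pairs = n - lag
        joint = sum(1 for i in range(pairs) if above[i] and above[i + lag]) / pairs
        total += joint
    return n * total

def high_quantile(seq, q=0.98):
    """Empirical q-quantile, used here as the level u_n."""
    return sorted(seq)[int(q * len(seq))]

random.seed(2)
n = 2000
x = [random.gauss(0.0, 1.0) for _ in range(n + 1)]
iid = x[:n]
# 1-dependent sequence Y_n = max(X_n, X_{n+1}) built from iid normal values
y = [max(x[i], x[i + 1]) for i in range(n)]

for k in (5, 10, 20, 50):
    print(k, d_prime_sum(iid, high_quantile(iid), k), d_prime_sum(y, high_quantile(y), k))
```

Plotting these values against k shows how quickly the sum shrinks; the sequence of pairwise maxima carries an extra contribution from clustering at lag 1 that does not vanish as k increases.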
15
Applications: daily water level data
[Figure panels: hydrological data; normal iid sequence (sample mean); Y_n = max(X_n, X_{n+1}), where the X_n have a standard normal distribution (sample mean)]
16
Multivariate models
Copulas are very useful tools for investigating dependence among the coordinates of multivariate observations.
The marginal distributions and the dependence structure can be modeled separately!
Which parametric models should be used for the hydrological applications (in two dimensions)?
17
Hydrological applications
Water level peaks measured at two different stations are shown (peaks were paired if they occurred within one month of each other).
With the help of the earlier algorithm we can choose threshold levels (blue lines) and fit the GPD to the marginals.
Only those peaks are used that are extreme in both coordinates!
18
QQ-Plot for marginals
19
Empirical copula
After transforming the data to uniform marginals, the empirical copula is obtained.
Which parametric copula is the most adequate for the given application?
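A minimal sketch of the transformation and of the empirical copula. Here the uniform marginals are obtained from scaled ranks, which is an assumption of this example (the fitted marginal distributions could be used instead); the function names and the small data set are hypothetical.

```python
def pseudo_observations(values):
    """Scaled ranks rank / (n + 1), lying strictly inside (0, 1). Ties are not handled."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u

def empirical_copula(u, v, s, t):
    """C_n(s, t) = (1/n) * #{i : u_i <= s and v_i <= t}."""
    n = len(u)
    return sum(1 for a, b in zip(u, v) if a <= s and b <= t) / n

# Hypothetical paired peaks at two stations
x = [3.1, 4.7, 2.2, 5.9, 4.0]
y = [2.9, 5.1, 2.5, 4.8, 3.3]
u = pseudo_observations(x)
v = pseudo_observations(y)
print(empirical_copula(u, v, 0.5, 0.5))
```

Dividing the ranks by n + 1 rather than n keeps every transformed value away from 1, which matters when a copula density or a logarithm is evaluated later.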
20
Conceivable copulas in 2D
Elliptical copulas:
  Gauss
  Student-t
Archimedean copulas:
  Gumbel
  Clayton
Other copulas:
  Fréchet
…
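For orientation, the standard bivariate forms of four of these families are given below. The parameter names (\rho, \nu, \theta) follow common usage and are assumptions here; the Fréchet family is left out because its parametrization cannot be identified from the text.

```latex
% Gauss copula with correlation \rho; \Phi_\rho is the bivariate normal cdf
C^{Ga}_{\rho}(u,v) = \Phi_{\rho}\bigl(\Phi^{-1}(u), \Phi^{-1}(v)\bigr)

% Student-t copula with \nu degrees of freedom and correlation \rho
C^{t}_{\nu,\rho}(u,v) = t_{\nu,\rho}\bigl(t_{\nu}^{-1}(u), t_{\nu}^{-1}(v)\bigr)

% Gumbel copula, \theta \ge 1 (upper tail dependence)
C^{Gu}_{\theta}(u,v) = \exp\Bigl\{-\bigl[(-\ln u)^{\theta} + (-\ln v)^{\theta}\bigr]^{1/\theta}\Bigr\}

% Clayton copula, \theta > 0 (lower tail dependence)
C^{Cl}_{\theta}(u,v) = \bigl(u^{-\theta} + v^{-\theta} - 1\bigr)^{-1/\theta}
```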
21
Simulation - Gauss
22
Simulation – Student-t
23
Simulation – Clayton I.
24
Simulation – Clayton II.
25
Simulation – Gumbel
26
Goodness of fit for copulas
Cramér-von Mises and Kolmogorov-Smirnov type functionals might be used to test the null hypothesis that the copula belongs to a given parametric family.
A simple approach, based on the multivariate probability integral transformation, uses the function K(t) = P(C(U_1, ..., U_d) ≤ t), where (U_1, ..., U_d) is a vector of uniform variables having C as their joint distribution.
27
Visual comparison
Genest et al. (2003) proposed a graphical procedure for model selection through the visual comparison of the non-parametric estimate K_n(·) of K with the parametric estimate K(θ_n, ·).
The better the fit is, the closer the graphs of these functions are.
Question: how to define the distance between the graphs?
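The comparison can be sketched for the Gumbel copula, for which K has the closed form K(θ, t) = t - t ln(t) / θ. The empirical version below uses the pseudo-values e_i = #{j : x_j ≤ x_i, y_j ≤ y_i} / n; this normalization is an assumption, since slightly different ones appear in the literature. The function names, the data and the parameter value are hypothetical.

```python
import math

def pseudo_values(x, y):
    """e_i = (1/n) * #{j : x_j <= x_i and y_j <= y_i} for each observation i."""
    n = len(x)
    return [sum(1 for j in range(n) if x[j] <= x[i] and y[j] <= y[i]) / n
            for i in range(n)]

def K_empirical(e, t):
    """K_n(t): proportion of pseudo-values not exceeding t."""
    return sum(1 for value in e if value <= t) / len(e)

def K_gumbel(theta, t):
    """K(theta, t) = t - t * ln(t) / theta for the Gumbel copula, 0 < t <= 1."""
    return t - t * math.log(t) / theta

# Hypothetical paired observations and parameter estimate
x = [3.1, 4.7, 2.2, 5.9, 4.0, 3.6, 5.2, 2.8]
y = [2.9, 5.1, 2.5, 4.8, 3.3, 4.1, 4.4, 2.7]
theta_hat = 1.74
e = pseudo_values(x, y)
for t in (0.2, 0.4, 0.6, 0.8, 1.0):
    print(t, K_empirical(e, t), K_gumbel(theta_hat, t))
```

Plotting both columns against t gives the visual comparison; a distance between the two curves turns it into a test statistic.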
28
Weighted quadratic differences:
29
Which weights to use?
In order to compare which test statistic performs better at detecting discrepancies in the upper tail, we applied the following algorithm:
1. Simulate a sample from a parametric copula.
2. Randomly choose two discordant points (x, y) near the right tail and permute their coordinates so that the new points x*, y* are concordant (the marginals do not change, but the copula does).
3. Perform the three versions of the test for the modified data set.
4. Repeat steps 2 and 3, and investigate which statistic detects the changes fastest.
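Step 2 of the algorithm can be sketched as follows. "Near the right tail" is taken to mean that both coordinates exceed a chosen level, which is an assumption of this example, as are the function names and the independent uniform sample used in place of a simulated parametric copula.

```python
import random

def is_discordant(p, q):
    """Two points are discordant if their coordinate differences have opposite signs."""
    return (p[0] - q[0]) * (p[1] - q[1]) < 0

def make_concordant_step(points, tail_level, rng):
    """Swap the second coordinates of one random discordant pair in the upper tail.

    Both marginal samples are unchanged; only the pairing (the copula) changes.
    Returns False when no discordant pair is left in the tail region.
    """
    tail = [i for i, (a, b) in enumerate(points) if a > tail_level and b > tail_level]
    candidates = [(i, j) for pos, i in enumerate(tail) for j in tail[pos + 1:]
                  if is_discordant(points[i], points[j])]
    if not candidates:
        return False
    i, j = rng.choice(candidates)
    (xi, yi), (xj, yj) = points[i], points[j]
    points[i], points[j] = (xi, yj), (xj, yi)
    return True

rng = random.Random(3)
sample = [(rng.random(), rng.random()) for _ in range(200)]
changed = 0
while changed < 5 and make_concordant_step(sample, 0.7, rng):
    changed += 1
print("changed pairs:", changed)
```

After each swap the chosen pair is concordant, so repeated steps gradually strengthen the dependence in the upper tail while leaving both marginal samples untouched.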
30
The data and its permutations
The number in the title gives the number of changed pairs.
31
Detecting changes
In general, the tests based on weighted squared deviations perform better than the original one.
Of the two weighted tests, the modified version is the more sensitive!
32
Simulation results
We recorded how many steps the different tests needed to detect the changes during the replications.
As expected, the modified weights were the best!
                          mean     st. dev.
Sum of squares (SS)       15.78    8.57
Weighted SS               11.58    7.7
Modified weighted SS      10.15    7.13
33
Time dependence
Has the dependence structure of the observations changed over the last century?
Windows of 80 years with a step size of 5 years were used to detect possible changes.
First, we have to decide which copula to use.
34
Time dependence
In all three cases the Gumbel copula seems to fit better than the Fréchet copula!
35
Simulated critical values

n \ tau   0.3      0.4      0.5      0.6
50        0.4612   0.4625   0.4104   0.3685
100       0.2105   0.2036   0.1872   0.1723
150       0.1345   0.1295   0.1312   0.117

n \ tau   0.3      0.4      0.5      0.6
50        4.0558   3.3538   2.9626   2.4796
100       1.6783   1.507    1.2564   1.1039
150       1.1004   0.9405   0.8423   0.733

n \ tau   0.3      0.4      0.5      0.6
50        2.8795   2.3359   1.9354   1.5223
100       1.2066   1.0019   0.8153   0.6618
150       0.7914   0.6278   0.5349   0.4314
36
Applications for the hydrological data set: time dependence
Window     N (sample size)   Kendall tau   theta    Sum of squares (SS)   Weighted SS   Modified weighted SS
1          88                0.4306        1.7563   0.1208                0.8148        0.5313
2          90                0.4559        1.8379   0.16                  1.1805        0.7844
3          87                0.4503        1.819    0.1376                1.1665        0.8706
4          88                0.3872        1.6319   0.1419                1.5023        1.1526*
5          94                0.3856        1.6277   0.088                 0.906         0.6534
All obs.   119               0.425         1.7391   0.075                 0.5987        0.3716
The only (marginally) significant value is marked with *.
A simulation study may be used for detecting changes in the dependence structure.
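The theta column of the table agrees with the relation theta = 1 / (1 - tau) between Kendall's tau and the parameter of the Gumbel copula, for example 1 / (1 - 0.425) ≈ 1.7391 for all observations. A small sketch of both steps; the function names and the toy data are hypothetical, and ties are ignored.

```python
def kendall_tau(x, y):
    """Kendall's tau: (concordant pairs - discordant pairs) / total pairs, no tie correction."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            score += (prod > 0) - (prod < 0)
    return 2.0 * score / (n * (n - 1))

def gumbel_theta(tau):
    """Gumbel copula parameter implied by Kendall's tau (requires 0 <= tau < 1)."""
    return 1.0 / (1.0 - tau)

# Kendall's tau values taken from the table
for tau in (0.4306, 0.4559, 0.4503, 0.3872, 0.3856, 0.425):
    print(tau, round(gumbel_theta(tau), 3))

# Toy paired sample, only to show the estimator in use
x = [3.1, 4.7, 2.2, 5.9, 4.0]
y = [2.9, 5.1, 2.5, 4.8, 3.3]
print(kendall_tau(x, y), gumbel_theta(kendall_tau(x, y)))
```

Small differences in the last digit compared with the table are to be expected, because the tabulated tau values are themselves rounded.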
37
References
Bozsó, D., Rakonczai, P. and Zempléni, A. (2005). Floods on river Tisza and some of its affluents. Extreme-value modelling in practice. Statisztikai Szemle, accepted for publication. (In Hungarian.)
Choulakian, V. and Stephens, M.A. (2001). Goodness-of-fit tests for the generalized Pareto distribution. Technometrics 43, 478-484.
D'Agostino, R.B. and Stephens, M.A. (1986). Goodness-of-fit Techniques. Marcel Dekker.
Genest, C., Quessy, J.-F. and Rémillard, B. (2003). Goodness-of-fit Procedures for Copula Models Based on the Integral Probability Transformation. GERAD.
Leadbetter, M.R., Lindgren, G. and Rootzén, H. (1983). Extremes and Related Properties of Random Sequences and Processes. Springer.
Zempléni, A. (2004). Goodness-of-fit test in extreme value applications. Discussion paper No. 383, SFB 386, Statistische Analyse Diskreter Strukturen, TU München.