Quantities of interest in the analysis of time series extremes
Some are determined by the marginal distribution:
– probability of exceeding a high threshold
– distribution of exceedances of a high threshold
Others are determined by the clustering dynamics of extreme values:
– average length of an extremal event (e.g. length of a flood)
– distribution of the length of an extremal event
– distribution of aggregate excesses (e.g. distribution of the flood volume)
Extremal index
Conditions $D(u_n)$ or $\Delta(u_n)$ are always assumed.
A stationary series has extremal index $\theta$ if there exists a real sequence $u_n$ for which
$n(1 - F(u_n)) \to \tau$ and $P(M_{1,n} \le u_n) \to \exp(-\theta\tau)$,
where $M_{1,n} = \max(X_1, X_2, \dots, X_n)$.
Under $D(u_n)$ the extremal index can be computed as
$\theta = \lim_{n\to\infty} P(M_{1,p(n)} \le u_n \mid X_0 > u_n)$,
where $p(n)$ is an appropriately increasing sequence; $p(n)$ is regarded as the cluster size.
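The conditional probability above has a direct empirical counterpart at a fixed threshold $u$ and cluster size $p$. The following is a minimal sketch, not an estimator taken from the slides: the function name `extremal_index_estimate` and the choice of a simple relative frequency are assumptions made for illustration.

```python
def extremal_index_estimate(x, u, p):
    """Empirical version of P(max(X_1..X_p) <= u | X_0 > u).

    x : sequence of observations, u : threshold, p : cluster size.
    Hypothetical helper: counts exceedances that are followed by
    p consecutive values at or below the threshold.
    """
    exceedances = 0
    followed_by_calm = 0
    # Stop p steps before the end so every window x[i+1..i+p] is complete.
    for i in range(len(x) - p):
        if x[i] > u:
            exceedances += 1
            if max(x[i + 1:i + p + 1]) <= u:
                followed_by_calm += 1
    if exceedances == 0:
        raise ValueError("no exceedances of the threshold u")
    return followed_by_calm / exceedances
```

For a series with exceedances at three time points, two of which are followed by two calm steps, `extremal_index_estimate(x, u, 2)` returns 2/3. The result depends on both `u` and `p`, which is why the subasymptotic behaviour matters in practice.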
Cluster size distribution and point process convergence
Distribution of the number of exceedances in $[1, p(n)]$:
$\pi_n(j) = P\big(\mathbf{1}\{X_1 > u_n\} + \dots + \mathbf{1}\{X_{p(n)} > u_n\} = j \mid M_{1,p(n)} > u_n\big)$
The point process of exceedances:
$N_n(\cdot) = \sum_i \delta_{i/n}(\cdot)\,\mathbf{1}\{X_i > u_n\}$
Under appropriate conditions:
– $\pi_n$ converges to some limiting distribution $\pi$
– $N_n(\cdot)$ converges weakly to a compound Poisson process whose underlying Poisson process has intensity $\theta\tau$ and whose i.i.d. clusters are distributed as $\pi$
High-level exceedances occur in clusters, with cluster size distribution $\pi$. Moreover, $E(\pi) = 1/\theta$.
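A rough empirical analogue of $\pi_n$ can be obtained by cutting the series into blocks of length $p$ and counting exceedances in each block that contains at least one. The sketch below assumes non-overlapping blocks, which is one common choice and not necessarily the one used by the authors; the function name is hypothetical.

```python
from collections import Counter


def cluster_size_distribution(x, u, p):
    """Relative frequencies of the number of exceedances of u per block.

    Only blocks of length p with at least one exceedance are counted,
    mirroring the conditioning on {M_{1,p} > u}. Non-overlapping blocks
    are an assumption of this sketch.
    """
    counts = Counter()
    for start in range(0, len(x) - p + 1, p):
        j = sum(1 for v in x[start:start + p] if v > u)
        if j > 0:
            counts[j] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no block contains an exceedance of u")
    return {j: c / total for j, c in sorted(counts.items())}
```

The mean of the returned distribution can be compared with the reciprocal of an extremal index estimate, since the limit satisfies $E(\pi) = 1/\theta$.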
Distribution of aggregate excess
Aggregate excess above $u$ in the time interval $[k, l]$:
$W_{k,l}(u) = (X_k - u)_+ + (X_{k+1} - u)_+ + \dots + (X_l - u)_+$
This value (called flood volume in hydrology) is a good indicator of the severity of extreme events.
Under appropriate conditions (Smith et al., 1997):
$W_{1,n}(u_n) \xrightarrow{d} W_1 + W_2 + \dots + W_K$,
where $K \sim \mathrm{Poisson}(\theta\tau)$ and the variables $W_i$ are i.i.d., independent of $K$.
The distribution of $W_i$ can be regarded as the limiting aggregate excess distribution during an extremal event.
Problems
– Estimation of the limiting quantities ($\theta$, $\pi$, $W$) is difficult.
– The subasymptotic behaviour is often of interest too, since the convergence to the limit is very slow.
To overcome these problems, one can restrict attention to certain families of models.
A large class of Markov chains behaves like a random walk at extreme levels, which can be used to simulate extremal clusters in a Markov chain; see e.g. Smith et al. (1997).
Water discharge series are non-Markovian, even above high thresholds
If the series were Markovian, then
$(X_t - X_{t-1} \mid X_{t-1},\ X_{t-1} - X_{t-2} > 0) \sim (X_t - X_{t-1} \mid X_{t-1},\ X_{t-1} - X_{t-2} < 0)$
would hold.
The following plots show $X_t - X_{t-1}$ as a function of $X_{t-1}$ (if $X_{t-1}$ is above the 98% quantile), conditionally on the sign of $X_{t-1} - X_{t-2}$.
The two plots are not similar!
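The comparison behind the two plots can be sketched as follows: collect the increments $X_t - X_{t-1}$ for which $X_{t-1}$ exceeds a high threshold and split them by the sign of the previous increment. This is only an illustration; the function name is hypothetical, and pooling the increments ignores the dependence on the level $X_{t-1}$ that the plots display.

```python
def split_increments(x, threshold):
    """Split increments x[t] - x[t-1] by the sign of x[t-1] - x[t-2].

    Only time points with x[t-1] above `threshold` are used (the slides
    use the 98% quantile). Returns (after_rise, after_fall); under the
    Markov property the two samples should look alike at a given level.
    """
    after_rise, after_fall = [], []
    for t in range(2, len(x)):
        if x[t - 1] <= threshold:
            continue
        previous_step = x[t - 1] - x[t - 2]
        step = x[t] - x[t - 1]
        if previous_step > 0:
            after_rise.append(step)
        elif previous_step < 0:
            after_fall.append(step)
        # Ties (previous_step == 0) are left out of both samples.
    return after_rise, after_fall
```

The two returned samples can then be compared graphically or with a two-sample test of one's choice.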
A light-tailed conditionally heteroscedastic model
$X_t - c_t = \sum_i a_i (X_{t-i} - c_{t-i}) + \varepsilon_t + \sum_j b_j \varepsilon_{t-j}$
$\varepsilon_t = \sigma_t Z_t$
$\sigma_t = [d_0 + d_1 (X_{t-1} - m)_+]^{1/2}$
– $Z_t$ is an i.i.d. sequence with zero mean and unit variance
– $c_t$ describes the deterministic seasonal behaviour in the mean
If all moments of $Z_t$ are finite, then all moments of $X_t$ are finite.
However, the exact tail behaviour is unknown (a special case of a similar model has Weibull-like tails, see Robert, 2000).
The model approximates the extremal properties of water discharge series well (see Elek and Márkus, 2005).
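To make the recursion concrete, here is a minimal simulation sketch of a heavily simplified special case. The simplifications are assumptions of this example and not part of the slides: a single autoregressive term, no moving-average terms, $c_t = 0$, and standard normal $Z_t$. The function name and the starting value 0 are also hypothetical.

```python
import math
import random


def simulate_ch_ar1(n, a1, d0, d1, m, seed=0):
    """Simulate X_t = a1 * X_{t-1} + sigma_t * Z_t with
    sigma_t = sqrt(d0 + d1 * max(X_{t-1} - m, 0)).

    Simplified special case (assumed): one AR term, no MA terms,
    no seasonal component, Gaussian Z_t, X_0 = 0.
    """
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        prev = x[-1]
        # The conditional variance grows linearly with the excess over m.
        sigma_t = math.sqrt(d0 + d1 * max(prev - m, 0.0))
        x.append(a1 * prev + sigma_t * rng.gauss(0.0, 1.0))
    return x
```

Because the conditional variance grows only linearly in the level, large values are followed by larger shocks without producing heavy tails when $Z_t$ is light-tailed.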
A regime switching (RS) autoregressive model
$X_t = X_{t-1} + \varepsilon_{1t}$ if $I_t = 1$ (rising regime)
$X_t = a X_{t-1} + \varepsilon_{0t}$ if $I_t = 0$ (falling regime)
– $\varepsilon_{1t}$ is an i.i.d. noise, distributed as Gamma($\alpha$, $\lambda$)
– $\varepsilon_{0t}$ is an i.i.d. noise, distributed as Normal(0, $\sigma$)
– $0 < a < 1$
Successive regime durations are independent and distributed as
– NegBinom($\beta_1$, $p_1$) in the rising regime
– NegBinom($\beta_0$, $p_0$) in the falling regime
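The model definition translates into a short simulation. The sketch below rests on several assumptions that the slides do not spell out: $\lambda$ is treated as the rate of the Gamma noise, $\sigma$ as the standard deviation of the normal noise, $\beta_0$ and $\beta_1$ are integers, and a NegBinom($\beta$, $p$) duration is generated as the sum of $\beta$ geometric variables on {1, 2, ...}. All function and argument names are hypothetical.

```python
import random


def _geometric(rng, p):
    """Number of Bernoulli(p) trials up to and including the first success."""
    n = 1
    while rng.random() >= p:
        n += 1
    return n


def simulate_rs(n, a, alpha, lam, sigma, beta1, p1, beta0, p0, x0=0.0, seed=0):
    """Simulate n steps of the regime switching autoregression.

    Returns (x, regimes), where regimes[t] is 1 (rising) or 0 (falling).
    The path starts in the rising regime from level x0 (an assumption).
    """
    rng = random.Random(seed)
    x, regimes = [], []
    level, regime = x0, 1
    while len(x) < n:
        beta, p = (beta1, p1) if regime == 1 else (beta0, p0)
        duration = sum(_geometric(rng, p) for _ in range(beta))
        for _ in range(duration):
            if len(x) == n:
                break
            if regime == 1:
                # random.gammavariate takes (shape, scale); scale = 1 / rate.
                level = level + rng.gammavariate(alpha, 1.0 / lam)
            else:
                level = a * level + rng.gauss(0.0, sigma)
            x.append(level)
            regimes.append(regime)
        regime = 1 - regime
    return x, regimes
```

With `beta1 = beta0 = 1` the durations are geometric, so `p1` and `p0` act as the per-step probabilities of leaving the rising and the falling regime.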
Properties of the RS model
Heuristic explanation:
– $X_t$ gets independent positive shocks in the rising regime
– it develops as a mean-reverting autoregression in the falling regime
If $\beta_1 = \beta_0 = 1$, then $I_t$ is a Markov chain and $X_t$ is a Markov-switching autoregression.
Stationarity of the model follows from the result of Brandt (1986) for stochastic difference equations.
Regime switching models have deep roots in hydrology (see e.g. Bálint and Szilágyi, 2005).
The model reproduces the asymmetric shape of the hydrograph
Tail behaviour of the stationary distribution
Theorem: The process has a Gamma-like upper tail:
$P(X_t > u \mid I_t = 1) \sim K_1 u^{\beta_1 - 1} \exp\{-\lambda u [1 - (1-p_1)^{1/\alpha}]\}$
$P(X_t > u \mid I_t = 0) \sim K_0 u^{\beta_1 - 1} \exp\{-\lambda u [1 - (1-p_1)^{1/\alpha}]/a\}$
thus:
$P(X_t > u) \sim K_1 u^{\beta_1 - 1} \exp\{-\lambda u [1 - (1-p_1)^{1/\alpha}]\}$.
The proof is based on the observation that the aggregate increment during a rising regime has a Gamma-like tail, which becomes “negligible” during the falling regime.
Corollary: Exceedances above high thresholds are asymptotically exponentially distributed:
$\lim_{u\to\infty} P(X_t > x + u \mid X_t > u) = \exp\{-\lambda x [1 - (1-p_1)^{1/\alpha}]\}$
Limiting cluster quantities in the model I.
Even when the regime lengths are negative binomial, the extremal index is $\theta = p_1$, and the limiting cluster size distribution is geometric with parameter $p_1$.
Limiting cluster quantities in the model II.
If $\alpha = 1$, the limiting aggregate excess distribution is
$W = E_1 + 2E_2 + \dots + N E_N$
– where $N$ is geometric with parameter $p_1$
– the variables $E_i$ are exponential with parameter $\lambda$, independent of each other and of $N$
The exponential moments are infinite, but all polynomial moments are finite.
Anderson and Dancy (1992) suggested modelling the aggregate excesses of a hydrological data set with a Weibull distribution.
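The representation of $W$ is easy to sample from, which gives a quick check on its moments. Taking $N$ geometric on {1, 2, ...} with success probability $p_1$ and $\lambda$ as the rate of the exponentials (both conventions are assumptions of this sketch), one gets $E(W) = E[N(N+1)/2]/\lambda = 1/(\lambda p_1^2)$. The function names below are hypothetical.

```python
import random


def sample_aggregate_excess(rng, lam, p1):
    """Draw one value of W = E_1 + 2*E_2 + ... + N*E_N.

    N is geometric on {1, 2, ...} with success probability p1 and the
    E_i are independent exponentials with rate lam (assumed conventions).
    """
    n = 1
    while rng.random() >= p1:
        n += 1
    return sum(k * rng.expovariate(lam) for k in range(1, n + 1))


def mean_aggregate_excess(n_samples, lam, p1, seed=0):
    """Monte Carlo estimate of E(W); compare with 1 / (lam * p1**2)."""
    rng = random.Random(seed)
    total = sum(sample_aggregate_excess(rng, lam, p1) for _ in range(n_samples))
    return total / n_samples
```

For `lam = 1.0` and `p1 = 0.5` the simulated mean should be close to 4, up to Monte Carlo error.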
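Slow convergence to the limiting quantities
The plot gives $\theta(u,p) = P(M_{1,p} \le u \mid X_0 > u)$ if $\theta = p_1 = 0.5$, $p_0 = 0.1$, $a = 0.5$ and $\alpha = \beta_0 = \beta_1 = 1$
– for $p = 100$ and $200$ and
– for $u$ ranging from the 99% to the 99.99% quantile
$\theta = \lim_{p\to\infty} \lim_{u\to\infty} P(M_{1,p} \le u \mid X_0 > u) = \lim_{p\to\infty} \lim_{u\to\infty} \theta(u,p)$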
Parameter estimation
Estimation of the whole model with hidden regimes:
– (reversible jump) MCMC
– maximum likelihood if $\beta_1 = \beta_0 = 1$ (i.e. in the Markov-switching case), but it is computationally infeasible
However, if we focus only on the extremal dynamics and assume that the regime durations (at least above a high level) are geometrically distributed, we can write down the likelihood based solely on data during floods (i.e. above a high threshold).
$\alpha = 1$ is also assumed (in accordance with the empirical data).
Exponential QQ-plot for the positive increments above the threshold 900 m³/s
Likelihood computations
The likelihood can be determined recursively:
– $q_t = P(I_t = 1 \mid X_t, X_{t-1}, \dots)$
– $q_{1,\mathrm{cond}} = P(I_t = 1 \mid X_{t-1}, \dots) = (1-p_1) q_{t-1} + p_0 (1 - q_{t-1})$
– $q_{0,\mathrm{cond}} = P(I_t = 0 \mid X_{t-1}, \dots) = p_1 q_{t-1} + (1-p_0)(1 - q_{t-1})$
– $f_1 = f(X_t, I_t = 1 \mid X_{t-1}, \dots) = q_{1,\mathrm{cond}}\, f_{\mathrm{Exp}(\lambda)}(X_t - X_{t-1})$
– $f_0 = f(X_t, I_t = 0 \mid X_{t-1}, \dots) = q_{0,\mathrm{cond}}\, f_{N(0,\sigma)}(X_t - a X_{t-1})$
– $f(X_t \mid X_{t-1}, \dots) = f_0 + f_1$
– $q_t = f_1 / (f_0 + f_1)$
Some care is needed:
– at the beginning of the floods $q_t$ is determined from the tail behaviour of the model
– at the end of the floods the observation is censored
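The recursion can be sketched in a few lines for a single flood. This is an illustration under stated assumptions, not the authors' implementation: the initial probability `q_init` is passed in by the caller instead of being derived from the tail behaviour, the censoring at the end of the flood is ignored, $\lambda$ is taken as the rate of the exponential increments, and $\sigma$ as the standard deviation of the normal noise. All names are hypothetical.

```python
import math


def exp_pdf(x, lam):
    """Density of the exponential distribution with rate lam."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


def normal_pdf(x, sd):
    """Density of the normal distribution with mean 0 and std. deviation sd."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def flood_log_likelihood(x, p1, p0, a, lam, sigma, q_init):
    """Log-likelihood of one flood x[0], x[1], ... given x[0].

    q_init stands in for P(I_0 = 1 | X_0, ...), which the slides obtain
    from the tail behaviour of the model. Returns (log-likelihood, final q).
    """
    q = q_init
    loglik = 0.0
    for t in range(1, len(x)):
        q1_cond = (1.0 - p1) * q + p0 * (1.0 - q)   # P(I_t = 1 | past)
        q0_cond = p1 * q + (1.0 - p0) * (1.0 - q)   # P(I_t = 0 | past)
        f1 = q1_cond * exp_pdf(x[t] - x[t - 1], lam)
        f0 = q0_cond * normal_pdf(x[t] - a * x[t - 1], sigma)
        loglik += math.log(f0 + f1)
        q = f1 / (f0 + f1)                          # filtered P(I_t = 1)
    return loglik, q
```

Summing the returned log-likelihoods over all floods and maximising over the five parameters with a numerical optimiser would give the estimates; a decreasing step forces `q` to 0 because the exponential density vanishes for negative increments.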
Advantages of using only the data over a threshold
Model dynamics may be different at lower levels
– For physical reasons, the rate of decay in the falling regime (characterised by $a$) varies over the course of the decay
Fast maximum likelihood estimation
– Smaller sample size
– Regimes separate very well at high levels
Application to flood analysis
Data: 50 years of daily water discharge series at Tivadar (river Tisza), about … observations
We assume $\alpha = \beta_0 = \beta_1 = 1$
Threshold: 900 m³/s (about the 98% quantile)
Parameter estimates and asymptotic standard errors:
– $p_1 = 0.598$ (0.037): on average 1.7 days of further increase, in accordance with the empirical value
– $p_0 = 0.027$ (0.011): has a negligible effect on the dynamics over the threshold
– $a = 0.823$ (0.007): high persistence even in the falling regime
– $\lambda$ = … (0.0003)
– $\sigma = 137.1$ (8.0)
Empirical and simulated flood dynamics
The shapes of the empirical and simulated floods are very similar.
Subasymptotic behaviour is important:
– Simulated water discharge remains over the threshold for 1.4 days on average after the peak
Exceedances over a threshold
The maximal exceedance over a threshold is approximately exponential with parameter $\lambda p_1 = 1/392$ in the model, in good accordance with the empirical distribution.
The plot shows the exceedance over the threshold 1250 m³/s.
Aggregate excess (flood volume)
Threshold = 1250 m³/s
Operational definition: two floods are separated when the water discharge goes below a lower threshold (900 m³/s) between them.
There are only 48 such floods in 50 years.
– Empirical mean: 72.1 million m³
– Simulated mean: 76.9 million m³
The QQ-plot shows the fit of the distribution, too.
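The operational definition can be written as a single pass over the discharge series. In the sketch below the function name is hypothetical, and the conversion from m³/s to m³ via 86400 seconds per daily observation is an assumption about how the reported volumes were obtained.

```python
def flood_volumes(x, upper, lower, seconds_per_step=86400):
    """Aggregate excess above `upper` for each flood in the series x.

    Two floods are separated when the discharge drops below `lower`
    between them. Excesses (in m^3/s) are multiplied by the assumed
    length of one time step in seconds to give volumes in m^3.
    """
    volumes = []
    current = 0.0
    for value in x:
        if value < lower:
            # The event is over; record it if it ever exceeded `upper`.
            if current > 0.0:
                volumes.append(current)
            current = 0.0
        elif value > upper:
            current += (value - upper) * seconds_per_step
    # A flood still in progress at the end of the series is recorded
    # as is, although it is really a censored observation.
    if current > 0.0:
        volumes.append(current)
    return volumes
```

A temporary dip between the two thresholds does not split an event, so `flood_volumes([800, 1300, 1000, 1350, 850, 1260, 700], 1250, 900, seconds_per_step=1)` gives two floods with aggregate excesses 150 and 10.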
Dependence of $p_1$ on the threshold
Conclusions
– The limiting cluster quantities can be determined in our physically motivated regime switching model.
– Simulations are still needed, since the subasymptotic behaviour is important at the relevant thresholds.
– To determine return levels of, e.g., flood volume, the occurrence of extreme events should also be modelled, by a Poisson process.
Further work: what parametric multivariate extreme value distribution does a reasonable multivariate regime switching model suggest?
