Extremal cluster characteristics of a regime switching model, with hydrological applications Péter Elek, Krisztina Vasas and András Zempléni Eötvös Loránd.

Presentation transcript:

Extremal cluster characteristics of a regime switching model, with hydrological applications Péter Elek, Krisztina Vasas and András Zempléni Eötvös Loránd University, Budapest 4th Conference on Extreme Value Analysis Gothenburg, 2005

Contents
Outline of EVT for stationary series
– extremal index
– limiting cluster size distribution (e.g. distribution of flood length)
– distribution of aggregate excesses (e.g. distribution of flood volume)
Two models:
– a light-tailed conditionally heteroscedastic model
– a regime switching autoregressive model
Extremal behaviour of the regime switching model
Application to the study of flood dynamics

Quantities of interest in the analysis of time series extremes
Some are determined by the marginal distribution:
– probability of exceeding a high threshold
– distribution of exceedances of a high threshold
Others are determined by the clustering dynamics of extreme values:
– average length of an extremal event (e.g. length of a flood)
– distribution of the length of an extremal event
– distribution of aggregate excesses (e.g. distribution of the flood volume)

Extremal index
Conditions $D(u_n)$ or $\Delta(u_n)$ are always assumed.
A stationary series has extremal index $\theta$ if there exists a real sequence $u_n$ for which $n(1-F(u_n)) \to \tau$ and $P(M_{1,n} \le u_n) \to \exp(-\theta\tau)$, where $M_{1,n} = \max(X_1, X_2, \dots, X_n)$.
Under $D(u_n)$ the extremal index can be estimated as $\theta = \lim P(M_{1,p(n)} \le u_n \mid X_0 > u_n)$, where $p(n)$ is an appropriately increasing sequence; $p(n)$ is regarded as the cluster size.
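The conditional probability on this slide has a direct empirical counterpart, often called a runs estimator. The following is a minimal sketch, assuming a fixed threshold `u` and run length `p` chosen by the user; the function name `runs_extremal_index` is hypothetical and not taken from the presentation.

```python
import numpy as np

def runs_extremal_index(x, u, p):
    """Empirical version of P(M_{1,p} <= u | X_0 > u).

    Among all time points t with X_t > u (and p later observations
    available), return the fraction that are followed by p consecutive
    values not exceeding u.
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Exceedance times that still have p observations after them.
    exceed = np.flatnonzero(x[: n - p] > u)
    if exceed.size == 0:
        return float("nan")
    # Count exceedances followed by a "quiet" run of length p.
    quiet = sum(bool(np.all(x[t + 1 : t + 1 + p] <= u)) for t in exceed)
    return quiet / exceed.size
```

For a strongly clustered series the value is well below 1, while for independent data it should be close to 1 at high thresholds.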

Cluster size distribution and point process convergence
Distribution of the number of exceedances in $[1, p(n)]$: $\pi_n(j) = P(1\{X_1 > u_n\} + \dots + 1\{X_{p(n)} > u_n\} = j \mid M_{1,p(n)} > u_n)$
The point process of exceedances: $N_n(\cdot) = \sum_i \delta_{i/n}(\cdot)\, 1\{X_i > u_n\}$
Under appropriate conditions:
– $\pi_n$ converges to some limiting distribution $\pi$
– $N_n(\cdot)$ converges weakly to a compound Poisson process whose underlying Poisson process has intensity $\theta\tau$ and whose i.i.d. clusters are distributed as $\pi$
High-level exceedances occur in clusters, with cluster size distribution $\pi$. Moreover, $E(\pi) = 1/\theta$.

Distribution of aggregate excess
Aggregate excess above $u$ in time interval $[k,l]$: $W_{k,l}(u) = (X_k - u)_+ + (X_{k+1} - u)_+ + \dots + (X_l - u)_+$
This value (called flood volume in hydrology) is a good indicator of the severity of extreme events.
Under appropriate conditions (Smith et al., 1997): $W_{1,n}(u_n) \to_d W_1 + W_2 + \dots + W_K$, where $K \sim \mathrm{Poisson}(\theta\tau)$ and the variables $W_i$ are i.i.d., independent of $K$.
The distribution of $W_i$ can be regarded as the limiting aggregate excess distribution during an extremal event.
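The aggregate excess is simply the sum of the positive parts of the exceedances over the chosen interval. A minimal sketch, assuming 0-based inclusive indices `k` and `l`; the function name is hypothetical.

```python
import numpy as np

def aggregate_excess(x, u, k, l):
    """W_{k,l}(u): sum of (X_t - u)_+ for t = k, ..., l (0-based, inclusive)."""
    segment = np.asarray(x[k : l + 1], dtype=float)
    # (X_t - u)_+ is the excess above u, or 0 if X_t is below u.
    return float(np.clip(segment - u, 0.0, None).sum())
```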

Problems
Estimation of the limiting quantities ($\theta$, $\pi$, $W$) is difficult.
Often the subasymptotic behaviour is of interest too, since the convergence to the limit is very slow.
To overcome these problems, one can restrict attention to certain families of models.
A large class of Markov chains behaves like a random walk at extreme levels, which can be used to simulate extremal clusters in a Markov chain, see e.g. Smith et al. (1997).

Water discharge series are non-Markovian – even above high thresholds
If the series were Markovian, $(X_t - X_{t-1} \mid X_{t-1},\, X_{t-1} - X_{t-2} > 0) \sim (X_t - X_{t-1} \mid X_{t-1},\, X_{t-1} - X_{t-2} < 0)$ would hold.
The following plots show $X_t - X_{t-1}$ as a function of $X_{t-1}$ (if $X_{t-1}$ is above the 98% quantile), conditionally on the sign of $X_{t-1} - X_{t-2}$.
The two plots are not similar!

A light-tailed conditionally heteroscedastic model
$X_t - c_t = \sum_i a_i (X_{t-i} - c_{t-i}) + \varepsilon_t + \sum_j b_j \varepsilon_{t-j}$
$\varepsilon_t = \sigma_t Z_t$
$\sigma_t = [d_0 + d_1 (X_{t-1} - m)_+]^{1/2}$
$Z_t$ is an i.i.d. sequence with zero mean and unit variance.
$c_t$ describes the deterministic seasonal behaviour in mean.
If all moments of $Z_t$ are finite, then all moments of $X_t$ are finite.
However, the exact tail behaviour is unknown (a special case of a similar model has Weibull-like tails, see Robert, 2000).
The model approximates the extremal properties of water discharge series well (see Elek and Márkus, 2005).

A regime switching (RS) autoregressive model
$X_t = X_{t-1} + \varepsilon_{1t}$ if $I_t = 1$ (rising regime)
$X_t = a X_{t-1} + \varepsilon_{0t}$ if $I_t = 0$ (falling regime)
$\varepsilon_{1t}$ is an i.i.d. noise, distributed as $\mathrm{Gamma}(\alpha, \lambda)$
$\varepsilon_{0t}$ is an i.i.d. noise, distributed as $\mathrm{Normal}(0, \sigma)$
$0 < a < 1$
Successive regime durations are independent and distributed as
– $\mathrm{NegBinom}(\beta_1, p_1)$ in the rising regime
– $\mathrm{NegBinom}(\beta_0, p_0)$ in the falling regime
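The model definition translates into a short simulation loop. The sketch below rests on several assumptions that the slide does not settle: the Gamma noise is parametrised by shape and rate, the second Normal parameter is a standard deviation, the negative binomial shape parameters are integers, and a regime duration is the number of trials up to the corresponding success (so a shape of 1 gives a geometric duration on 1, 2, ...). The function and argument names are hypothetical.

```python
import numpy as np

def simulate_rs(n, a, alpha, lam, sigma, p1, p0, beta1=1, beta0=1, x0=0.0, seed=0):
    """Simulate n steps of the regime switching autoregression.

    Returns (x, regime), where regime[t] is 1 in the rising regime
    and 0 in the falling regime.
    """
    rng = np.random.default_rng(seed)
    x = np.empty(n)
    regime = np.empty(n, dtype=int)
    prev, state, t = x0, 1, 0          # assumption: start in the rising regime
    while t < n:
        beta, p = (beta1, p1) if state == 1 else (beta0, p0)
        # Assumed convention: duration = trials needed for beta successes.
        duration = int(beta + rng.negative_binomial(beta, p))
        for _ in range(duration):
            if t == n:
                break
            if state == 1:
                # Rising regime: independent positive Gamma shock.
                prev = prev + rng.gamma(shape=alpha, scale=1.0 / lam)
            else:
                # Falling regime: mean-reverting AR(1) step.
                prev = a * prev + rng.normal(0.0, sigma)
            x[t] = prev
            regime[t] = state
            t += 1
        state = 1 - state              # regimes alternate
    return x, regime
```

A call such as `simulate_rs(2000, a=0.5, alpha=1.0, lam=0.5, sigma=1.0, p1=0.5, p0=0.1)` produces a path with short, steep rises and long, slow recessions.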

Properties of the RS-model
Heuristic explanation:
– $X_t$ gets independent positive shocks in the rising regime
– it develops as a mean-reverting autoregression in the falling regime
If $\beta_1 = \beta_0 = 1$, then $I_t$ is a Markov chain and $X_t$ is a Markov-switching autoregression.
Stationarity of the model follows from the result of Brandt (1986) for stochastic difference equations.
Regime switching models have deep roots in hydrology (see e.g. Bálint and Szilágyi, 2005).

The model gives back the asymmetric shape of the hydrograph

Tail behaviour of the stationary distribution
Theorem: The process has a Gamma-like upper tail:
$P(X_t > u \mid I_t = 1) \sim K_1 u^{\beta_1 - 1} \exp\{-\lambda u [1 - (1-p_1)^{1/\alpha}]\}$
$P(X_t > u \mid I_t = 0) \sim K_0 u^{\beta_1 - 1} \exp\{-\lambda u [1 - (1-p_1)^{1/\alpha}]/a\}$
thus $P(X_t > u) \sim K_1 u^{\beta_1 - 1} \exp\{-\lambda u [1 - (1-p_1)^{1/\alpha}]\}$.
The proof is based on the observation that the aggregate increment during a rising regime has a Gamma-like tail, which becomes "negligible" during the falling regime.
Corollary: Exceedances above high thresholds are asymptotically exponentially distributed:
$\lim_{u \to \infty} P(X_t > x + u \mid X_t > u) = \exp\{-\lambda x [1 - (1-p_1)^{1/\alpha}]\}$
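As a worked special case of the corollary, write $\lambda$ for the rate and $\alpha$ for the shape of the Gamma noise in the rising regime (the reading assumed here). With exponential increments, i.e. $\alpha = 1$, the limiting exceedance distribution has rate $\lambda p_1$. For geometric regime durations this agrees with the elementary fact that a geometric sum of i.i.d. exponential variables is again exponential:

```latex
\begin{align*}
\alpha = 1:\quad & 1 - (1 - p_1)^{1/\alpha} = p_1, \\
& \lim_{u \to \infty} P(X_t > x + u \mid X_t > u) = \exp(-\lambda p_1 x), \\
\text{check:}\quad & E_i \overset{\text{i.i.d.}}{\sim} \mathrm{Exp}(\lambda),\;
  P(N = k) = p_1 (1 - p_1)^{k-1},\; k \ge 1
  \;\Longrightarrow\; \sum_{i=1}^{N} E_i \sim \mathrm{Exp}(\lambda p_1).
\end{align*}
```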

Limiting cluster quantities in the model I. Even when the regime lengths are negative binomial, the extremal index is p 1, and the limiting cluster size distribution is geometric with parameter p 1.

Limiting cluster quantities in the model II.
If $\alpha = 1$, the limiting aggregate excess distribution is $W = E_1 + 2E_2 + \dots + N E_N$
– where $N$ is geometric with parameter $p_1$
– the variables $E_i$ are exponential with parameter $\lambda$, independent of each other and of $N$
The exponential moments are infinite, but all polynomial moments are finite.
Anderson and Dancy (1992) suggested modelling the aggregate excesses of a hydrological data set with a Weibull distribution.
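The limiting aggregate excess is easy to simulate directly from its representation as a weighted sum of exponentials. The sketch below assumes that $N$ is geometric on 1, 2, ... with success probability $p_1$ and that the exponential parameter is a rate $\lambda$; the function name is hypothetical. Under these assumptions $E(W \mid N) = N(N+1)/(2\lambda)$ and $E[N(N+1)] = 2/p_1^2$, so the mean is $E(W) = 1/(\lambda p_1^2)$, which gives a simple check of the simulation.

```python
import numpy as np

def simulate_aggregate_excess(lam, p1, size, seed=0):
    """Draw `size` samples of W = E_1 + 2*E_2 + ... + N*E_N.

    N is geometric on {1, 2, ...} with parameter p1 and the E_i are
    i.i.d. exponential with rate lam, independent of N.
    """
    rng = np.random.default_rng(seed)
    n = rng.geometric(p1, size=size)           # cluster sizes, support starts at 1
    w = np.empty(size)
    for i, ni in enumerate(n):
        e = rng.exponential(scale=1.0 / lam, size=ni)
        weights = np.arange(1, ni + 1)          # 1, 2, ..., N
        w[i] = np.dot(weights, e)
    return w
```

For example, with `lam=1.0` and `p1=0.5` the sample mean should be close to 4.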

Slow convergence to the limiting quantities
The plot gives $\theta(u,p) = P(M_{1,p} \le u \mid X_0 > u)$ if [symbol lost] $= p_1 = 0.5$, $p_0 = 0.1$, $a = 0.5$ and $\alpha = \beta_0 = \beta_1 = 1$
– for $p = 100$ and $200$ and
– for $u$ ranging from the 99% to the 99.99% quantile
$\theta = \lim_{p \to \infty} \lim_{u \to \infty} \theta(u,p)$

Parameter estimation
Estimation of the whole model with hidden regimes:
– (reversible jump) MCMC
– maximum likelihood if $\beta_1 = \beta_0 = 1$ (i.e. in the Markov-switching case), but it is computationally infeasible
However, if we focus only on extremal dynamics and assume that the regime durations (at least above a high level) are geometrically distributed, we can write down the likelihood based solely on data during floods (i.e. above a high threshold).
$\alpha = 1$ is also assumed (in accordance with the empirical data).

Exponential QQ-plot for the positive increments above the threshold 900 m³/s

Likelihood computations
Likelihood can be determined recursively:
– $q_t = P(I_t = 1 \mid X_t, X_{t-1}, \dots)$
– $q_{1,\mathrm{cond}} = P(I_t = 1 \mid X_{t-1}, \dots) = (1-p_1) q_{t-1} + p_0 (1 - q_{t-1})$
– $q_{0,\mathrm{cond}} = P(I_t = 0 \mid X_{t-1}, \dots) = p_1 q_{t-1} + (1-p_0)(1 - q_{t-1})$
– $f_1 = f(X_t, I_t = 1 \mid X_{t-1}, \dots) = q_{1,\mathrm{cond}}\, f_{\mathrm{Exp}(\lambda)}(X_t - X_{t-1})$
– $f_0 = f(X_t, I_t = 0 \mid X_{t-1}, \dots) = q_{0,\mathrm{cond}}\, f_{N(0,\sigma)}(X_t - a X_{t-1})$
– $f(X_t \mid X_{t-1}, \dots) = f_0 + f_1$
– $q_t = f_1 / (f_0 + f_1)$
Some care is needed:
– at the beginning of the floods $q_t$ is determined from the tail behaviour of the model
– at the end of the floods the observation is censored
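The recursion above is a forward filter over the hidden regime and can be sketched in a few lines. This is an illustrative version only: it treats $\sigma$ as a standard deviation, takes the initial regime probability `q_init` as a user-supplied value rather than deriving it from the tail behaviour, and ignores the censoring at the end of a flood. The function name and arguments are hypothetical.

```python
import math

def rs_flood_loglik(x, a, lam, sigma, p1, p0, q_init=1.0):
    """Log-likelihood of one flood episode x[0], x[1], ... given x[0].

    Returns (loglik, qs), where qs[t-1] = P(I_t = 1 | X_t, X_{t-1}, ...).
    """
    q = q_init                      # assumed P(I_0 = 1 | X_0)
    loglik, qs = 0.0, []
    for prev, cur in zip(x[:-1], x[1:]):
        # One-step-ahead regime probabilities.
        q1_cond = (1 - p1) * q + p0 * (1 - q)
        q0_cond = p1 * q + (1 - p0) * (1 - q)
        # Rising regime: exponential density of the increment (0 if negative).
        inc = cur - prev
        f_exp = lam * math.exp(-lam * inc) if inc >= 0 else 0.0
        # Falling regime: normal density of the AR(1) residual.
        resid = cur - a * prev
        f_norm = math.exp(-0.5 * (resid / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
        f1 = q1_cond * f_exp
        f0 = q0_cond * f_norm
        total = f0 + f1             # may underflow to 0 for extreme residuals
        loglik += math.log(total)
        q = f1 / total              # filtered probability of the rising regime
        qs.append(q)
    return loglik, qs
```

Because a decrease is impossible in the rising regime, the filtered probability drops to exactly zero after any negative increment, which illustrates why the regimes separate well at high levels.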

Advantages of using only the data over a threshold Model dynamics may be different at lower levels –For physical reasons, the rate of decay in the falling regime (characterised by a) is varying over the decay Fast maximum likelihood estimation –Smaller sample size –Regimes separate very well at high levels

Application to flood analysis
Data: 50 years of daily water discharge series at Tivadar (river Tisza) – about [number missing] observations
We assume $\alpha = \beta_0 = \beta_1 = 1$
Threshold: 900 m³/s (about the 98% quantile)
Parameter estimates and asymptotic standard errors:
– $p_1 = 0.598$ (0.037): on average 1.7 days of further increase, in accordance with the empirical value
– $p_0 = 0.027$ (0.011): has a negligible effect on the dynamics over the threshold
– $a = 0.823$ (0.007): high persistence even in the falling regime
– $\lambda$ = [value missing] (0.0003)
– $\sigma = 137.1$ (8.0)

Empirical and simulated flood dynamics
The shapes of the empirical and simulated floods are very similar.
Subasymptotic behaviour is important:
– Simulated water discharge remains over the threshold for 1.4 days on average after the peak

Exceedances over a threshold
Maximal exceedance over a threshold is approximately exponential with parameter $\lambda p_1 = 1/392$ in the model, in good accordance with the empirical distribution.
The plot shows the exceedance over the threshold 1250 m³/s.
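A short arithmetic check ties these numbers together. It assumes that the exponential parameter quoted on this slide is the product $\lambda p_1$ (the $\alpha = 1$ case of the tail corollary) and uses the reported estimate $p_1 = 0.598$; the resulting value of $\lambda$ is therefore implied by the other figures, not quoted from the presentation.

```latex
\begin{align*}
\lambda p_1 = \frac{1}{392},\quad p_1 = 0.598
  \;&\Longrightarrow\; \lambda \approx \frac{1}{392 \times 0.598} \approx 0.0043\ (\mathrm{m^3/s})^{-1},
  \qquad \frac{1}{\lambda} \approx 234\ \mathrm{m^3/s}, \\
\text{mean geometric duration:}\quad \frac{1}{p_1} &= \frac{1}{0.598} \approx 1.67\ \text{days}.
\end{align*}
```

The second line matches the "1.7 days of further increase" quoted with the estimate of $p_1$.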

Aggregate excess (flood volume)
Threshold = 1250 m³/s
Operational definition: two floods are separated when the water discharge goes below a lower threshold (900 m³/s) between them.
There are only 48 such floods in 50 years.
Empirical mean: 72.1 million m³
Simulated mean: 76.9 million m³
The QQ-plot shows the fit of the distribution, too.
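The operational definition above can be sketched as a single pass over the daily series. The sketch assumes that the input holds daily mean discharges in m³/s, so that one day of excess corresponds to 86,400 seconds of flow; the function name and this unit conversion are illustrative assumptions, not details from the presentation.

```python
import numpy as np

SECONDS_PER_DAY = 86_400

def flood_volumes(discharge, upper, lower):
    """Flood volumes (m^3) above `upper`, with floods separated by dips below `lower`.

    `discharge` is assumed to be a daily series in m^3/s.
    """
    volumes, current = [], 0.0
    for value in np.asarray(discharge, dtype=float):
        if value < lower:
            # Dropping below the lower threshold closes the current flood.
            if current > 0:
                volumes.append(current)
            current = 0.0
        else:
            # Accumulate the excess above the upper threshold (0 if below it).
            current += max(value - upper, 0.0)
    if current > 0:
        volumes.append(current)     # series ended during a flood
    return [v * SECONDS_PER_DAY for v in volumes]
```

With `upper=1250` and `lower=900`, two peaks above 1250 m³/s count as one flood unless the discharge falls below 900 m³/s between them.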

Dependence of p 1 on the threshold

Conclusions
The limiting cluster quantities can be determined in our physically motivated regime switching model.
Simulations are still needed, since the subasymptotic behaviour is important at the relevant thresholds.
To determine return levels of, e.g., flood volume, the occurrence of extreme events should also be modelled, by a Poisson process.
Further work: what parametric multivariate extreme value distribution does a reasonable multivariate regime switching model suggest?

References
Anderson, C.W. and Dancy, G.P. (1992): The severity of extreme events, Research Report 409/92, University of Sheffield.
Bálint, G. and Szilágyi, J. (2005): A hybrid, Markov-chain based model for daily streamflow generation, Journal of Hydrol. Engineering, in press.
Brandt, A. (1986): The stochastic equation $Y_{n+1} = A_n Y_n + B_n$ with stationary coefficients, Adv. in Appl. Prob., 18.
Elek, P. and Márkus, L. (2004): A long range dependent model with nonlinear innovations for simulating daily river flows, Natural Hazards and Earth System Sciences, 4.
Elek, P. and Márkus, L. (2005): A light-tailed conditionally heteroscedastic model with applications to river flows, in preparation.
Robert, C. (2000): Extremes of alpha-ARCH models, in: Measuring Risk in Complex Stochastic Systems (ed. by Franke et al.), XploRe e-books.
Segers, J. (2003): Functionals of clusters of extremes, Adv. in Appl. Prob., 35.
Smith, R.L., Tawn, J.A. and Coles, S.G. (1997): Markov chain models for threshold exceedances, Biometrika, 84.

Thank you for your attention!