Extreme values and risk Adam Butler Biomathematics & Statistics Scotland CCTC meeting, September 2007
Extreme value theory (EVT) is a branch of statistics concerned with the frequency & size of rare events. EVT methods are widely used in finance, hydrology & engineering, usually for risk assessment, but are not yet widely used in the biological sciences.
Risk assessment: What is the probability that we will have more than 100mm of rain on a given day?
Risk management: I need to build a flood defense, and I want the probability that it fails on any particular day to be less than 1-in … How high should it be?
What is the chance of getting a log daily return of less than –0.1? (i.e. a drop in value of about 9.5% or more since the previous day)
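The link between a log return and a percentage change can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the log daily return is defined as log(today's value / previous day's value); the function name is illustrative and does not come from the talk.

```python
import math

def log_return_to_fractional_change(log_return):
    """Convert a log return r = log(V_today / V_previous) into the
    fractional change in value, V_today / V_previous - 1."""
    return math.exp(log_return) - 1.0

# A log daily return of -0.1 corresponds to a fall of roughly 9.5%.
change = log_return_to_fractional_change(-0.1)
print(f"fractional change in value: {change:.4f}")
```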
Common features
We are interested in a process that can be quantified, and for which we have some data…
…and we want to use these data to say something about the probability that a rare or extreme event will occur
We will usually be interested in events that are beyond the range of the data, i.e. we want to extrapolate. Extrapolation is rarely advisable, but it is sometimes unavoidable, especially when doing risk assessment. The standard approach would be to assume that the data come from, for example, a normal distribution…
P(X < –0.1)
…but: the extreme values don’t play much of a role when we estimate the parameters, so the model that we end up fitting might not describe the extreme values at all well…
Empirical: P(X < –0.05)
Normal: P(X < –0.05)
…and, worse still, extrapolations beyond the range of the data often differ radically between models that provide a very similar fit to the bulk of the data…
Cauchy: P(X < –0.1) 0.02
Normal: P(X < –0.1)
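The contrast between agreement in the bulk and disagreement in the tail can be illustrated numerically. The sketch below is not a fit to the data in the talk: it assumes a hypothetical standard deviation of 0.02 for the log daily returns, centres both models at zero, and picks the Cauchy scale so that the two distributions share the same median and quartiles.

```python
import math

def normal_cdf(x, mean, sd):
    """P(X < x) for a normal distribution, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def cauchy_cdf(x, location, scale):
    """P(X < x) for a Cauchy distribution."""
    return 0.5 + math.atan((x - location) / scale) / math.pi

# Hypothetical values, not estimates from the talk's data.
SD = 0.02                                    # assumed sd of log daily returns
NORMAL_UPPER_QUARTILE = 0.6744897501960817   # 75% point of the standard normal
GAMMA = NORMAL_UPPER_QUARTILE * SD           # Cauchy scale with matching quartiles

# The two models agree at the lower quartile, in the bulk of the data...
bulk_normal = normal_cdf(-GAMMA, 0.0, SD)
bulk_cauchy = cauchy_cdf(-GAMMA, 0.0, GAMMA)

# ...but give very different answers far out in the tail.
tail_normal = normal_cdf(-0.1, 0.0, SD)
tail_cauchy = cauchy_cdf(-0.1, 0.0, GAMMA)

print(f"P(X < lower quartile): normal {bulk_normal:.4f}, Cauchy {bulk_cauchy:.4f}")
print(f"P(X < -0.1):           normal {tail_normal:.3g}, Cauchy {tail_cauchy:.3g}")
```

Under these assumptions the two tail probabilities differ by several orders of magnitude, even though the models are indistinguishable at the quartiles.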
In EVT we adopt the principle that we should only make use of the most extreme data that we have observed: we throw away almost all of the data.
Threshold exceedances
We consider exceedances of a high threshold. EVT tells us that a good statistical model for exceedances, x, is the Generalised Pareto Distribution (GPD),
$P(x) = 1 - \left[1 + \xi \left(x / \sigma\right)\right]^{-1/\xi}$ (x > 0)
σ = “scale parameter”, ξ = “shape parameter”
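A minimal sketch of the two steps on this slide: extracting the exceedances of a threshold, then evaluating the GPD distribution function. The names `exceedances` and `gpd_cdf`, the symbols sigma (scale) and xi (shape), and the rainfall values are illustrative assumptions; the xi = 0 case is handled as the exponential limit of the formula.

```python
import math

def exceedances(data, threshold):
    """Amounts by which observations exceed the threshold; all other
    observations are discarded."""
    return [value - threshold for value in data if value > threshold]

def gpd_cdf(x, sigma, xi):
    """P(exceedance <= x) under the GPD with scale sigma > 0 and shape xi."""
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    if x <= 0.0:
        return 0.0
    if abs(xi) < 1e-12:
        # Limit as xi -> 0: the exponential distribution.
        return 1.0 - math.exp(-x / sigma)
    base = 1.0 + xi * x / sigma
    if base <= 0.0:
        # xi < 0 and x is beyond the upper end point -sigma / xi.
        return 1.0
    return 1.0 - base ** (-1.0 / xi)

# Hypothetical daily rainfall totals (mm), made up for illustration.
rain = [0.0, 3.2, 31.5, 12.0, 26.1, 0.4, 48.9, 7.7]
excess = exceedances(rain, 25.0)
print(f"kept {len(excess)} of {len(rain)} observations: {excess}")
print(f"GPD cdf at x = 1 with sigma = 1, xi = 1: {gpd_cdf(1.0, 1.0, 1.0)}")
```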
GPD: impact of the scale parameter
σ = 1, σ = 2, σ = 3 (ξ = 0)
σ = “scale parameter”, ξ = “shape parameter”
GPD: impact of the shape parameter
ξ = 0, ξ = 1, ξ = –0.5 (σ = 1)
σ = “scale parameter”, ξ = “shape parameter”
Threshold: u = 25mm. Scale and shape parameters (σ and ξ) estimated by maximum likelihood to be 7.70 and … respectively. P(X > 100) estimated to be … (once in 131 years)
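A sketch of how a fitted GPD might be turned into a tail probability, a return period and a design height, using the standard relation P(X > x) = ζ_u [1 + ξ(x − u)/σ]^(−1/ξ) for x above the threshold u, where ζ_u = P(X > u). Only the threshold (25mm) and the scale estimate (7.70) are taken from the slide; the shape value `XI` and the exceedance rate `ZETA_U` below are hypothetical stand-ins, so the output is not expected to reproduce the figure quoted on the slide.

```python
import math

DAYS_PER_YEAR = 365.25

def tail_probability(x, u, sigma, xi, zeta_u):
    """P(X > x) for x >= u, where zeta_u = P(X > u)."""
    if x < u:
        raise ValueError("x must be at or above the threshold u")
    excess = x - u
    if abs(xi) < 1e-12:
        return zeta_u * math.exp(-excess / sigma)
    base = 1.0 + xi * excess / sigma
    if base <= 0.0:
        # xi < 0 and x lies beyond the upper end point of the distribution.
        return 0.0
    return zeta_u * base ** (-1.0 / xi)

def return_level(p, u, sigma, xi, zeta_u):
    """Level x with P(X > x) = p, for 0 < p <= zeta_u."""
    if not 0.0 < p <= zeta_u:
        raise ValueError("p must lie in (0, zeta_u]")
    if abs(xi) < 1e-12:
        return u + sigma * math.log(zeta_u / p)
    return u + (sigma / xi) * ((zeta_u / p) ** xi - 1.0)

U, SIGMA = 25.0, 7.70   # threshold and scale estimate from the slide
XI = 0.1                # hypothetical shape parameter
ZETA_U = 0.05           # hypothetical P(X > u) on any given day

p_100 = tail_probability(100.0, U, SIGMA, XI, ZETA_U)
years = 1.0 / (p_100 * DAYS_PER_YEAR)
height = return_level(p_100, U, SIGMA, XI, ZETA_U)

print(f"P(X > 100mm) on a given day: {p_100:.3g}")
print(f"return period: about {years:.0f} years")
print(f"level exceeded with that daily probability: {height:.1f}mm")
```

`return_level` inverts `tail_probability`, which is the calculation needed for the risk management question: given a target daily failure probability, it returns the height that is exceeded with that probability.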
…but why is the GPD the “right model” to use?
In theory: for almost any random variable X, the distribution of the exceedances of a high threshold u will tend towards the GPD model as u tends towards infinity.
In practice: we use a threshold that is high but still finite; we rely on the fact that if this level is sufficiently high then the asymptotic result will still be approximately true.
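The asymptotic argument can be illustrated with a distribution whose tail is known exactly. This sketch assumes a hypothetical heavy-tailed variable with P(X > x) = 1 / (1 + x²) for x ≥ 0, which is not a GPD itself; its exceedances of u approach a GPD with shape ξ = 1/2 and scale σ = u/2. The code measures the largest gap between the exact conditional distribution of the exceedances and that limiting GPD for several thresholds.

```python
def survival(x):
    """Hypothetical survival function P(X > x) = 1 / (1 + x^2), x >= 0."""
    return 1.0 / (1.0 + x * x)

def conditional_excess_survival(y, u):
    """Exact P(X - u > y | X > u)."""
    return survival(u + y) / survival(u)

def gpd_survival(y, sigma, xi):
    """P(exceedance > y) under the GPD, for xi > 0 and y >= 0."""
    return (1.0 + xi * y / sigma) ** (-1.0 / xi)

def max_gap(u, steps=1000):
    """Largest absolute difference between the exact distribution of the
    exceedances of u and the limiting GPD (xi = 1/2, sigma = u/2),
    evaluated on a grid of y values from 0 to 10u."""
    xi, sigma = 0.5, u / 2.0
    grid = [10.0 * u * i / steps for i in range(steps + 1)]
    return max(
        abs(conditional_excess_survival(y, u) - gpd_survival(y, sigma, xi))
        for y in grid
    )

gaps = {u: max_gap(u) for u in (1.0, 5.0, 25.0)}
for u, gap in gaps.items():
    print(f"threshold u = {u:5.1f}: largest gap from the GPD = {gap:.5f}")
```

The gap shrinks as the threshold rises, which is the sense in which a high but finite threshold makes the GPD approximately right.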
“Parameter stability plot”
Other extreme value models
A related approach involves analysing the maximum values per day, per month or per year (block maxima). EVT suggests that a good model to use in this case is the GEV (Generalised Extreme Value) distribution.
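A minimal sketch of the block maxima approach, assuming the usual parameterisation of the GEV with location mu, scale sigma and shape xi, for which the distribution function is exp{−[1 + ξ(z − μ)/σ]^(−1/ξ)}. The function names and the toy data are illustrative and do not come from the talk.

```python
import math

def block_maxima(data, block_size):
    """Maximum of each complete block of block_size consecutive values;
    an incomplete final block is dropped."""
    n_blocks = len(data) // block_size
    return [
        max(data[i * block_size:(i + 1) * block_size])
        for i in range(n_blocks)
    ]

def gev_cdf(z, mu, sigma, xi):
    """P(block maximum <= z) under the GEV with location mu, scale sigma > 0
    and shape xi."""
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    s = (z - mu) / sigma
    if abs(xi) < 1e-12:
        # Limit as xi -> 0: the Gumbel distribution.
        if s < -700.0:
            return 0.0  # avoids overflow in exp(-s); the true value is ~0
        return math.exp(-math.exp(-s))
    base = 1.0 + xi * s
    if base <= 0.0:
        # Outside the support: below the lower end point when xi > 0,
        # above the upper end point when xi < 0.
        return 0.0 if xi > 0.0 else 1.0
    return math.exp(-base ** (-1.0 / xi))

# Toy series split into blocks of three observations.
series = [1, 5, 2, 7, 3, 4, 9]
maxima = block_maxima(series, 3)
print(f"block maxima: {maxima}")
print(f"GEV cdf at the location parameter: {gev_cdf(2.0, 2.0, 3.0, 0.3):.4f}")
```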
Advantages
Robust: relies on weak assumptions; avoids bias
Theoretically sound: justified by asymptotic theory
Quick & relatively easy to use
Honest …about the uncertainties involved in making statements about very rare events
Disadvantages
Inefficient: most of the data are thrown away; …we may over-estimate uncertainty; …relies on having a large sample size
Asymptotics: the theory only holds exactly for infinitely extreme events
Difficult to extend to multivariate case
Data quality: sensitive to errors in extreme data
Practicalities
Basic course:
Software: routines in… R, Genstat, S-plus, Matlab
Extremes toolkit:
Recommended book: Coles (2001) An introduction to statistical modeling of extreme values. Springer.
Contact me: