1 Abdel H. El-Shaarawi National Water Research Institute and Department of Mathematics and Statistics, McMaster University Data-driven.

Presentation transcript:

1 Abdel H. El-Shaarawi National Water Research Institute and Department of Mathematics and Statistics, McMaster University Data-driven and Physically-based Models for Characterization of Processes in Hydrology, Hydraulics, Oceanography and Climate Change January 6-28, 2008 IMS, Singapore Modeling Extreme Events Data

2 Outline Some references Examples of extreme events data Types of extreme events data Commonly used models for extremes: Distributions of order statistics Generalized extreme value distributions Generalized Pareto distributions Parameter and quantile estimation of extremes Summary and concluding remarks

3 References Beirlant, J., Goegebeur, Y., Segers, J., and Teugels, J. (2004), Statistics of Extremes: Theory and Applications, New York: John Wiley & Sons. Castillo, E. and Hadi, A. S. (1994), Parameter and Quantile Estimation for the Generalized Extreme-Value Distribution, Environmetrics, 5, 417–432. Castillo, E. and Hadi, A. S. (1995), A Method for Estimating Parameters and Quantiles of Continuous Distributions of Random Variables, Computational Statistics and Data Analysis, 20, 421–439.

4 References Castillo, E., Hadi, A. S., Balakrishnan, N., and Sarabia, J. M. (2006), Extreme Value and Related Models in Engineering and Science Applications, New York: John Wiley & Sons. Coles, S. (2001), An Introduction to Statistical Modeling of Extreme Values, London: Springer-Verlag. El-Shaarawi, A. H. and Hadi, A. S., Modified Likelihood Function for Parameter and Quantile Estimation, work in progress. Nadarajah, S. and El-Shaarawi, A. H. (2006), On the Ratios for Extreme Value Distributions with Applications to Rainfall Modeling, Environmetrics. Kotz, S. and Nadarajah, S. (2000), Extreme Value Distributions: Theory and Applications, London: Imperial College Press.

5 Software: S-plus & R Stuart Coles S-plus package available at URL: URL: extRemes R package available at

6 Examples of Extreme Events Data In many statistical applications, the interest is centered on estimating some population characteristics based on random samples taken from a population under study. For example, we wish to estimate: the average rainfall, the average temperature, the median income, … etc.

7 Examples of Extreme Events Data In other areas of application, we are interested not in estimating the average but in estimating the maximum or the minimum. Some examples: 1. Ocean Engineering: In the design of offshore platforms, breakwaters, dikes and other harbor works, engineers rely on knowledge of the probability distribution of the maximum wave height, not the average wave height.

8 Examples of Extreme Events Data 2. Structural Engineering: Modern building codes and standards require: estimation of extreme wind speeds and their recurrence intervals during the lifetime of the building; knowledge of the largest loads acting on the structure during its lifetime; and seismic incidence, that is, the maximum earthquake intensity during the lifetime of the building.

9 Examples of Extreme Events Data 3. Designing Dams: Engineers are interested not in the probability distribution of the average flood but in that of the maximum flood. 4. Agriculture: Farmers are interested in both the minimum and the maximum rainfall (drought versus flooding). 5. Insurance: Insurance companies are interested in the maximum insurance claims.

10 Examples of Extreme Events Data 6. Pollution Control: The pollution of air and water has become a common problem in many countries due to large concentrations of people, traffic, and industries (producing smoke and human, chemical, and nuclear wastes, etc.). Government regulations require pollution indices to remain below a given critical level. Thus, the regulations are satisfied if, and only if, the largest pollution concentration during the period of interest is less than the critical level.

11 Nile meter

12 U.S. Bureau of the census, Watson and Pauly (2002) Living resources: food security

13 Niagara River Fraser River

14 Upstream-Downstream Water Quality Monitoring Human and Ecosystem Health: Regulations and Control

15 Time Plots: Fraser Hope

16 Evolution of the Flow along the Fraser River Hansard/Red Pass

17 Max of log (Flow) at Hope

18 Some Results for Max (Hope)

19 Two More Examples: Wave Height & Temperature (Basel) Yearly maximum significant wave-height data

20 Two Stations: Ratio of GEV Distributions W=X/(X+Y)

21

22

23 Seoul Rainfall Data

24 Microbiological Regulations (Human health)

25 Approximate expression for probability of compliance with the regulations

26 Sample size n=5 and 10 # of simulations =10000

27 Ratio of single sample rejection probability to that of the mean rule (n = 5,10 and 20)

28 The Temperature Data: Change-Point

29 Relative Likelihood Function for the Change Point

30 Relative Likelihood function for the Change Point (Temp. Data)

31 Q-Q plots for the two segments

32 Return Levels

33 Outline Some references Examples of extreme events data Types of extreme events data Commonly used models for extremes: Distributions of order statistics Generalized extreme value distributions Generalized Pareto distributions Parameter and quantile estimation of extremes Summary and concluding remarks

34 Types of Extreme Events Data The choice of model and estimation methods depends on the type of available data. Data, $x_1, x_2, \ldots, x_n$, drawn from a possibly unknown population, are available. We wish to: 1. Find an appropriate parametric model, $F(x; \theta)$, that fits the data reasonably well. 2. Estimate the parameters, $\theta$, and quantiles, $X(p)$, of such a model.

35 Types of Extreme Events Data Examples: 1. Complete Data: All n observations are available. Daily/monthly energy consumption. Daily/monthly rainfall, stream discharge or flood flow.

36 Types of Extreme Events Data Examples: 2.Maxima/Minima: Only maxima or minima are available. Maximum/minimum daily/monthly temperatures Maximum daily/monthly wave heights Maximum daily/monthly wind speeds, pollution concentrations, etc.

37 Types of Extreme Events Data 3. Exceedances over/under a threshold: When using yearly maxima (minima), an important part of the information contained in large (small) values, other than the extremes occurring in the same year, is lost. The alternative is to use the exceedances over (under) a given threshold.

38 Exceedances Over/Under a Threshold We are interested in events that cause failure such as exceedances of a random variable over a threshold value. For example, waves can destroy a breakwater when their heights exceed a given value, say 9 meters. Then it does not matter whether the height of a wave is 9.5, 10 or 12 meters because the consequences of these events are similar.

39 Exceedances Over/Under a Threshold So, only the failure-causing observations, those exceeding a given threshold, are available. Definition: Let X be a random variable and u be a given threshold value. The event {X = x} is said to be an exceedance at the level u if x > u.

40 Summary: Types of Data Extreme events data come in one of three types: 1. Complete observations, 2. Maxima/Minima, or 3. Exceedances over/under a threshold value

41 Outline Some references Examples of extreme events data Types of extreme events data Commonly used models for extremes: Distributions of order statistics Generalized extreme value distributions Generalized Pareto distributions Parameter and quantile estimation of extremes Summary and concluding remarks

42 Commonly Used Models for Extremes The choice of model depends on the type of available data: Distributions of Order Statistics (DOS): used when we have complete data. Generalized Extreme Value (GEV) Distribution (AKA Von Mises family): used for maxima/minima type of data. Generalized Pareto Distribution (GPD): used for exceedances over/under threshold type of data.

43 Distributions of Order Statistics Let $X_1, X_2, \ldots, X_n$ be a sample of size n from a possibly unknown cdf $F(x; \theta)$, depending on an unknown vector-valued parameter $\theta$. Let $X_{1:n} < X_{2:n} < \cdots < X_{n:n}$ be the corresponding order statistics. $X_{i:n}$ is called the ith order statistic. Of particular interest are the minimum, $X_{1:n}$, and the maximum, $X_{n:n}$, order statistics.

44 Distributions of Order Statistics The distributions of the order statistics are well known. For example, for an independent sample: The cdf of the maximum order statistic is: $F_{n:n}(x) = [F(x; \theta)]^n$. The cdf of the minimum order statistic is: $F_{1:n}(x) = 1 - [1 - F(x; \theta)]^n$.
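
A quick numerical check can make the two order-statistic formulas concrete. The sketch below assumes an independent sample and uses the standard results that the maximum has cdf [F(x)]^n and the minimum has cdf 1 - [1 - F(x)]^n. The Uniform(0, 1) parent, the sample size n = 5, and the helper names cdf_max and cdf_min are illustrative choices, not part of the original slides.

```python
import random

def cdf_max(F, n):
    """Return the cdf of the largest of n iid draws with parent cdf F."""
    return lambda x: F(x) ** n

def cdf_min(F, n):
    """Return the cdf of the smallest of n iid draws with parent cdf F."""
    return lambda x: 1.0 - (1.0 - F(x)) ** n

def uniform_cdf(x):
    """Illustrative parent distribution: Uniform(0, 1)."""
    return min(max(x, 0.0), 1.0)

n = 5
F_max = cdf_max(uniform_cdf, n)
F_min = cdf_min(uniform_cdf, n)

# Monte Carlo comparison (seeded so the run is repeatable).
rng = random.Random(0)
reps = 20000
samples = [[rng.random() for _ in range(n)] for _ in range(reps)]
emp_max = sum(max(s) <= 0.8 for s in samples) / reps
emp_min = sum(min(s) <= 0.2 for s in samples) / reps

print(f"P(max <= 0.8): formula {F_max(0.8):.4f}, simulated {emp_max:.4f}")
print(f"P(min <= 0.2): formula {F_min(0.2):.4f}, simulated {emp_min:.4f}")
```

The same two helpers apply to any parent cdf, which is exactly why the approach needs F and n to be known.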

45 Problems with Distributions of OS The distributions of the order statistics have the following practical problems: 1. The cdf of the parent population, $F(x; \theta)$, is usually unknown. 2. When the data consist only of maxima or minima, the sample sizes are usually unknown.

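46 Non-Degenerate Limiting Distributions The answer to the above problem is to look for normalizing sequences $a_n$, $b_n > 0$, $c_n$, $d_n > 0$ such that the limits (1) $\lim_{n \to \infty} [F(a_n + b_n x)]^n = H(x)$ and (2) $\lim_{n \to \infty} \{1 - [1 - F(c_n + d_n x)]^n\} = L(x)$ are non-degenerate cdfs. Theorem: 1. The only non-degenerate cdf family satisfying (1) is the Maximal Generalized Extreme Value Distribution ($\mathrm{GEV}_M$). 2. The only non-degenerate cdf family satisfying (2) is the Minimal Generalized Extreme Value Distribution ($\mathrm{GEV}_m$).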

47 Generalized Extreme Value Distributions Thus, there are two GEV distributions, one maximal, $\mathrm{GEV}_M$, and one minimal, $\mathrm{GEV}_m$. The GEV (AKA Von Mises) distributions were introduced by Jenkinson (1955). They are used when we have a large sample or when the observations themselves are either minima or maxima. Their cdfs are given later.

48 Generalized Extreme Value Distributions The GEV distributions are now widely used to model extremes of natural and environmental data. Examples are found in: the Flood Studies Report of the UK's Natural Environment Research Council (1975); several articles in Tiago de Oliveira (1984); Hosking, Wallis, and Wood (1985); Castillo et al. (2006).

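49 Maximal Generalized Extreme Value The cumulative distribution function (cdf) of the maximal $\mathrm{GEV}_M$ distribution is: $H(x; \lambda, \delta, \kappa) = \exp\{-[1 - \kappa (x - \lambda)/\delta]^{1/\kappa}\}$ for $1 - \kappa (x - \lambda)/\delta \ge 0$ when $\kappa \ne 0$, and $H(x; \lambda, \delta, 0) = \exp\{-\exp[-(x - \lambda)/\delta]\}$ for $-\infty < x < \infty$ when $\kappa = 0$.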

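50 Minimal Generalized Extreme Value The cumulative distribution function (cdf) of the minimal $\mathrm{GEV}_m$ distribution is: $L(x; \lambda, \delta, \kappa) = 1 - \exp\{-[1 + \kappa (x - \lambda)/\delta]^{1/\kappa}\}$ for $1 + \kappa (x - \lambda)/\delta \ge 0$ when $\kappa \ne 0$, and $L(x; \lambda, \delta, 0) = 1 - \exp\{-\exp[(x - \lambda)/\delta]\}$ for $-\infty < x < \infty$ when $\kappa = 0$.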

51 Relationship Between $\mathrm{GEV}_M$ and $\mathrm{GEV}_m$ Theorem: If the cdf of X is $L(\lambda, \delta, \kappa)$, then the cdf of $Y = -X$ is $H(-\lambda, \delta, \kappa)$. Implication: One form of the cdf can be obtained from the other.
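
The theorem can be checked numerically. The sketch below assumes the Jenkinson-type parametrization with location lam, scale delta and shape kappa, in which the maximal cdf is exp{-[1 - kappa (x - lam)/delta]^(1/kappa)} and the minimal cdf is 1 - exp{-[1 + kappa (x - lam)/delta]^(1/kappa)}; the function names are hypothetical. If X has the minimal cdf, then P(-X <= y) = 1 - L(-y), and this should coincide with the maximal cdf evaluated with location -lam.

```python
import math

def gev_max_cdf(x, lam, delta, kappa):
    """Maximal GEV cdf H(x; lam, delta, kappa) in the assumed Jenkinson form."""
    z = (x - lam) / delta
    if kappa == 0.0:
        return math.exp(-math.exp(-z))
    t = 1.0 - kappa * z
    if t <= 0.0:
        # Beyond the upper endpoint (kappa > 0) or below the lower one (kappa < 0).
        return 1.0 if kappa > 0 else 0.0
    return math.exp(-t ** (1.0 / kappa))

def gev_min_cdf(x, lam, delta, kappa):
    """Minimal GEV cdf L(x; lam, delta, kappa) in the assumed Jenkinson form."""
    z = (x - lam) / delta
    if kappa == 0.0:
        return 1.0 - math.exp(-math.exp(z))
    t = 1.0 + kappa * z
    if t <= 0.0:
        # Below the lower endpoint (kappa > 0) or beyond the upper one (kappa < 0).
        return 0.0 if kappa > 0 else 1.0
    return 1.0 - math.exp(-t ** (1.0 / kappa))

lam, delta = 1.5, 2.0
grid = [-5.0 + 0.25 * k for k in range(41)]   # y values from -5 to 5
max_gap = 0.0
for kappa in (-0.3, 0.0, 0.4):
    for y in grid:
        lhs = 1.0 - gev_min_cdf(-y, lam, delta, kappa)   # cdf of Y = -X
        rhs = gev_max_cdf(y, -lam, delta, kappa)         # H with location -lam
        max_gap = max(max_gap, abs(lhs - rhs))

print("largest discrepancy:", max_gap)
```

In practice this means only one of the two families needs to be implemented: minima can be handled by negating the data and fitting the maximal form.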

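52 Maximal Generalized Extreme Value The $\mathrm{GEV}_M$ family has three parameters: $\lambda$ is a location parameter, $\delta$ is a scale parameter ($\delta > 0$), and $\kappa$ is a shape parameter. The parameter $\kappa$ is the most important of the three. The pth quantile is (0 < p < 1): $x_p = \lambda + \delta [1 - (-\log p)^{\kappa}]/\kappa$ for $\kappa \ne 0$, and $x_p = \lambda - \delta \log(-\log p)$ for $\kappa = 0$.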
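
The quantile function is what turns a fitted model into a design value. The sketch below assumes the Jenkinson-type form of the maximal GEV, with cdf exp{-[1 - kappa (x - lam)/delta]^(1/kappa)}, whose inverse is lam + delta [1 - (-log p)^kappa]/kappa. The function names, the parameter values, and the return_level helper (which uses the usual convention that the T-block return level is the quantile at p = 1 - 1/T) are illustrative assumptions.

```python
import math

def gev_max_cdf(x, lam, delta, kappa):
    """Maximal GEV cdf in the assumed Jenkinson form."""
    z = (x - lam) / delta
    if kappa == 0.0:
        return math.exp(-math.exp(-z))
    t = 1.0 - kappa * z
    if t <= 0.0:
        return 1.0 if kappa > 0 else 0.0
    return math.exp(-t ** (1.0 / kappa))

def gev_max_quantile(p, lam, delta, kappa):
    """p-th quantile (0 < p < 1) of the maximal GEV."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    c = -math.log(p)
    if kappa == 0.0:
        return lam - delta * math.log(c)
    return lam + delta * (1.0 - c ** kappa) / kappa

def return_level(T, lam, delta, kappa):
    """Level exceeded on average once every T blocks (e.g. years)."""
    return gev_max_quantile(1.0 - 1.0 / T, lam, delta, kappa)

# Hypothetical parameter values for yearly maxima.
lam, delta, kappa = 10.0, 2.0, 0.2
for T in (10, 50, 100):
    print(f"{T:>4}-year return level: {return_level(T, lam, delta, kappa):.3f}")
```

With kappa > 0 the return levels approach the finite upper endpoint lam + delta/kappa as T grows, whereas with kappa < 0 they grow without bound, which is one reason the shape parameter matters most.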

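53 Special Cases of the Maximal GEV The family of $\mathrm{GEV}_M$ has three special cases: 1. The Maximal Weibull distribution is obtained when $\kappa > 0$. Its cdf is: $H(x; \lambda, \delta, \kappa) = \exp\{-[1 - \kappa (x - \lambda)/\delta]^{1/\kappa}\}$ for $x \le \lambda + \delta/\kappa$, and $H = 1$ otherwise.

54 Special Cases of the Maximal GEV 2. The Maximal Gumbel distribution is obtained when $\kappa = 0$. Its cdf is: $H(x; \lambda, \delta) = \exp\{-\exp[-(x - \lambda)/\delta]\}$ for $-\infty < x < \infty$.

55 Special Cases of the Maximal GEV 3. The Maximal Frechet distribution is obtained when $\kappa < 0$. Its cdf is: $H(x; \lambda, \delta, \kappa) = \exp\{-[1 - \kappa (x - \lambda)/\delta]^{1/\kappa}\}$ for $x \ge \lambda + \delta/\kappa$, and $H = 0$ otherwise.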

56 Weibull, Gumbel, and Frechet Weibull and Frechet converge to Gumbel as $\kappa \to 0$.
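
The convergence can be seen numerically by measuring the largest gap between the Weibull or Frechet cdf and the Gumbel cdf over a grid as the shape parameter shrinks. This is a minimal sketch assuming the Jenkinson-type form exp{-[1 - kappa z]^(1/kappa)} with standardized variable z = (x - lam)/delta; the grid and the function names are illustrative.

```python
import math

def gev_max_cdf_std(z, kappa):
    """Standardized maximal GEV cdf (location 0, scale 1), assumed Jenkinson form."""
    if kappa == 0.0:
        return math.exp(-math.exp(-z))
    t = 1.0 - kappa * z
    if t <= 0.0:
        return 1.0 if kappa > 0 else 0.0
    return math.exp(-t ** (1.0 / kappa))

def gap_to_gumbel(kappa):
    """Largest absolute difference from the Gumbel cdf on a grid over [-5, 10]."""
    grid = [-5.0 + 0.05 * k for k in range(301)]
    return max(abs(gev_max_cdf_std(z, kappa) - gev_max_cdf_std(z, 0.0)) for z in grid)

for kappa in (0.5, 0.1, 0.01, 0.001):
    print(f"kappa = {kappa:>6}: Weibull gap {gap_to_gumbel(kappa):.5f}, "
          f"Frechet gap {gap_to_gumbel(-kappa):.5f}")
```

Both gaps shrink roughly in proportion to |kappa|, so for shape values near zero the three families are hard to distinguish from data.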

57 Summary The GEV family can be used when: 1. The cdf of the parent population, $F(x; \theta)$, is unknown. 2. The sample size is very large (no degeneracy problems). 3. The data consist only of maxima or minima (we do not need to know the sample sizes).

58 Outline Some references Examples of extreme events data Types of extreme events data Commonly used models for extremes: Distributions of order statistics Generalized extreme value distributions Generalized Pareto distributions Parameter and quantile estimation of extremes Summary and concluding remarks

59 Types of Extreme Events Data Recall the three types of extreme events data: 1. Complete Data: All n observations are available. Use distributions of order statistics if we know F(x) and n is not too large; else, use GEV. 2. Maxima/Minima: Only maxima or minima are available. Use GEV. 3. Exceedances over/under a threshold: Only observations exceeding a given threshold are available. Use GPD.

60 Exceedances Over/Under a Threshold As mentioned earlier, we are interested in events that cause failure such as exceedances of a random variable over a threshold value. The differences between the actual values and the threshold value are called exceedances over/under the threshold.
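
Extracting exceedances from a record is a one-line filter. The sketch below follows the definition on this slide (the difference between the actual value and the threshold); the wave heights are made-up numbers and the 9-metre threshold echoes the breakwater example, while the function names are hypothetical.

```python
def exceedances_over(data, u):
    """Amounts by which observations exceed the threshold u (values x - u for x > u)."""
    return [x - u for x in data if x > u]

def exceedances_under(data, u):
    """Amounts by which observations fall below the threshold u (values u - x for x < u)."""
    return [u - x for x in data if x < u]

# Hypothetical wave heights in metres.
heights = [7.2, 9.5, 8.1, 10.0, 12.0, 6.4]

over = exceedances_over(heights, 9.0)
under = exceedances_under(heights, 7.0)

print("exceedances over 9 m:", over)
print("number of exceedances under 7 m:", len(under))
```

Only the observations beyond the threshold survive, so the sample size of the exceedance data depends on the threshold chosen.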

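61 Generalized Maximal Pareto Distributions Pickands (1975) demonstrates that when the threshold tends to the upper end of the range of the random variable, the exceedances follow a generalized Pareto distribution, $\mathrm{GPD}_M(\delta, \kappa)$, with cdf $F(x; \delta, \kappa) = 1 - (1 - \kappa x/\delta)^{1/\kappa}$ for $\kappa \ne 0$, and $F(x; \delta, 0) = 1 - \exp(-x/\delta)$ for $\kappa = 0$, where $x > 0$ and $1 - \kappa x/\delta \ge 0$.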

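62 Generalized Maximal Pareto Distribution The $\mathrm{GPD}_M$ family has two parameters: $\delta$ is a scale parameter ($\delta > 0$) and $\kappa$ is a shape parameter. The pth quantile is (0 < p < 1): $x_p = \delta [1 - (1 - p)^{\kappa}]/\kappa$ for $\kappa \ne 0$, and $x_p = -\delta \log(1 - p)$ for $\kappa = 0$. Note that when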

63 Special Cases of the Maximal GPD The $\mathrm{GPD}_M$ has three special cases: 1. When $\kappa = 0$, the $\mathrm{GPD}_M$ reduces to the Exponential distribution with mean $\delta$. 2. When $\kappa = 1$, the $\mathrm{GPD}_M$ reduces to the Uniform $U(0, \delta)$. 3. When $\kappa < 0$, the $\mathrm{GPD}_M$ becomes the Pareto distribution.
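
The three special cases can be verified directly from the cdf. This is a minimal sketch assuming the form F(x) = 1 - (1 - kappa x/delta)^(1/kappa) for the maximal GPD, with scale delta and shape kappa (symbol names chosen here for illustration), together with its inverse.

```python
import math

def gpd_max_cdf(x, delta, kappa):
    """Maximal GPD cdf in the assumed form 1 - (1 - kappa*x/delta)^(1/kappa)."""
    if x <= 0.0:
        return 0.0
    if kappa == 0.0:
        return 1.0 - math.exp(-x / delta)
    t = 1.0 - kappa * x / delta
    if t <= 0.0:
        return 1.0          # only reachable when kappa > 0 and x >= delta/kappa
    return 1.0 - t ** (1.0 / kappa)

def gpd_max_quantile(p, delta, kappa):
    """p-th quantile (0 < p < 1) of the maximal GPD."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if kappa == 0.0:
        return -delta * math.log(1.0 - p)
    return delta * (1.0 - (1.0 - p) ** kappa) / kappa

delta = 2.0
# kappa = 1: Uniform(0, delta), so the cdf is x/delta.
print("uniform case     :", gpd_max_cdf(0.5, delta, 1.0), "vs", 0.5 / delta)
# kappa -> 0: Exponential with mean delta.
print("exponential case :", gpd_max_cdf(1.0, delta, 1e-8), "vs", 1.0 - math.exp(-1.0 / delta))
# kappa < 0: heavy (Pareto-type) upper tail with unbounded support.
print("pareto case      :", gpd_max_cdf(2.0, delta, -0.5), "vs", 1.0 - (1.0 + 0.5 * 2.0 / delta) ** -2.0)
```

For kappa > 0 the support is bounded above by delta/kappa, which is why the cdf is clamped to 1 beyond that point.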

64 Generalized Minimal Pareto Distribution A similar family exists for the case of exceedances under a threshold. These are called the Generalized Minimal Pareto distributions or the Reversed Generalized Pareto distributions.

65 Outline Some references Examples of extreme events data Types of extreme events data Commonly used models for extremes: Distributions of order statistics Generalized extreme value distributions Generalized Pareto distributions Parameter and quantile estimation of extremes Summary and concluding remarks

66 Parameter and Quantile Estimation Available estimation methods include: 1. The maximum likelihood method (MLE): Jenkinson (1969), Prescott and Walden (1980, 1983), Smith (1984, 1985). 2. The method of moments (MOM).

67 Parameter and Quantile Estimation 3. The probability weighted moments (PWM): Greenwood et al. (1979), Hosking et al. (1985). 4. The Elemental Percentile Method (EPM): Castillo and Hadi (1995). 5. Order Statistics (Least Squares): El-Shaarawi. 6. Modified Likelihood Function (MLF): El-Shaarawi and Hadi (work in progress).
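
Of the methods listed, the probability weighted moments approach is the simplest to write down in closed form. The sketch below follows the PWM estimators of Hosking et al. (1985) for the maximal GEV as best recalled here: unbiased sample moments b0, b1, b2, the approximation kappa = 7.8590 c + 2.9554 c^2 with c = (2 b1 - b0)/(3 b2 - b0) - log 2/log 3, and closed-form scale and location. The function names, the Gumbel fallback for kappa near zero, and the deterministic "sample" built from exact quantiles are assumptions made for illustration.

```python
import math

def sample_pwms(xs):
    """Unbiased sample estimates b0, b1, b2 of E[X F(X)^r], r = 0, 1, 2 (needs n >= 3)."""
    x = sorted(xs)
    n = len(x)
    b0 = sum(x) / n
    b1 = sum(j * x[j] for j in range(n)) / (n * (n - 1))
    b2 = sum(j * (j - 1) * x[j] for j in range(n)) / (n * (n - 1) * (n - 2))
    return b0, b1, b2

def gev_pwm_fit(xs):
    """PWM estimates (lam, delta, kappa) for the maximal GEV, Jenkinson form."""
    b0, b1, b2 = sample_pwms(xs)
    c = (2.0 * b1 - b0) / (3.0 * b2 - b0) - math.log(2.0) / math.log(3.0)
    kappa = 7.8590 * c + 2.9554 * c * c
    if abs(kappa) < 1e-8:
        # Gumbel limit (assumed fallback): scale from 2*b1 - b0, Euler's constant for location.
        delta = (2.0 * b1 - b0) / math.log(2.0)
        return b0 - 0.5772156649 * delta, delta, 0.0
    g = math.gamma(1.0 + kappa)
    delta = (2.0 * b1 - b0) * kappa / (g * (1.0 - 2.0 ** (-kappa)))
    lam = b0 + delta * (g - 1.0) / kappa
    return lam, delta, kappa

def gev_max_quantile(p, lam, delta, kappa):
    """Quantile of the maximal GEV (kappa != 0), used to build a test sample."""
    return lam + delta * (1.0 - (-math.log(p)) ** kappa) / kappa

# Deterministic stand-in for a sample: exact quantiles at evenly spaced probabilities.
n = 2000
xs = [gev_max_quantile((j + 0.5) / n, 10.0, 2.0, 0.2) for j in range(n)]
lam_hat, delta_hat, kappa_hat = gev_pwm_fit(xs)
print(f"lam = {lam_hat:.3f}, delta = {delta_hat:.3f}, kappa = {kappa_hat:.3f}")
```

No iteration is needed, which is the main practical attraction of PWM; the quadratic approximation for kappa is only intended for moderate shape values.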

68 Problems With Traditional Estimators Traditional methods of estimation (MLE and the moments-based methods) have problems because: The range of the distribution depends on the parameters: $x < \lambda + \delta/\kappa$ for $\kappa > 0$, and $x > \lambda + \delta/\kappa$ for $\kappa < 0$. So, the MLEs do not have the usual asymptotic properties.

69 Problems With Traditional Estimators The MLE requires numerical solutions. For some samples, the likelihood may not have a local maximum. For $\kappa > 1$, the MLE does not exist (the likelihood can be made infinite).
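
The claim that the likelihood can be made infinite when the shape exceeds 1 is easy to reproduce. The sketch below assumes the Jenkinson-type maximal GEV density f(x) = (1/delta) t^(1/kappa - 1) exp(-t^(1/kappa)) with t = 1 - kappa (x - lam)/delta, fixes kappa = 1.5 and delta = 1, and slides the location so that the upper endpoint lam + delta/kappa approaches the largest observation from above. The five data values and the function name are made up for illustration.

```python
import math

def gev_max_loglik(data, lam, delta, kappa):
    """Log-likelihood of the maximal GEV (kappa != 0); -inf outside the support."""
    if delta <= 0.0 or kappa == 0.0:
        raise ValueError("need delta > 0 and kappa != 0")
    total = -len(data) * math.log(delta)
    for x in data:
        t = 1.0 - kappa * (x - lam) / delta
        if t <= 0.0:
            return float("-inf")
        total += (1.0 / kappa - 1.0) * math.log(t) - t ** (1.0 / kappa)
    return total

data = [0.0, 0.5, 1.0, 1.5, 2.0]      # hypothetical observations
delta, kappa = 1.0, 1.5

logliks = []
for eps in (1e-2, 1e-4, 1e-6, 1e-8):
    # Upper endpoint lam + delta/kappa sits eps above the sample maximum.
    lam = max(data) - delta / kappa + eps
    logliks.append(gev_max_loglik(data, lam, delta, kappa))
    print(f"gap to endpoint {eps:.0e}: log-likelihood = {logliks[-1]:.3f}")
```

Each hundredfold reduction of the gap adds a roughly constant amount to the log-likelihood, so it grows without bound and no maximizer exists on this path.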

70 Problems With Traditional Estimators When $\kappa < -1$, the mean and higher moments do not exist. So, the MOM and PWM estimates do not exist when $\kappa < -1$. The PWM estimators are good for cases where $-0.5 < \kappa < 0.5$. Outside this range of $\kappa$, the PWM estimates may not exist, and if they do exist their performance worsens as $\kappa$ increases.

71 Recently Proposed Estimation Methods This leaves us with two recently proposed methods for estimating the parameters and quantiles of the extreme models: the Elemental Percentile Method (EPM) of Castillo and Hadi (1995), and the Modified Likelihood Function (MLF) of El-Shaarawi and Hadi (work in progress).

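72 Elemental Percentile method (EPM) 1. Initial estimates are obtained by equating three distinct order statistics to their corresponding percentiles: $F(x_{i:n}; \theta) = p_{i:n}$, $F(x_{j:n}; \theta) = p_{j:n}$, $F(x_{r:n}; \theta) = p_{r:n}$, with $i < j < r$, where the $p_{i:n}$ are plotting positions.

73 Elemental Percentile method (EPM) 2. Substituting the cdf of the $\mathrm{GEV}_M$, we obtain: $x_{s:n} = \lambda + \delta [1 - C_s^{\kappa}]/\kappa$ for $s = i, j, r$, where $C_s = -\log p_{s:n}$. These are three equations in three unknowns: $\lambda$, $\delta$, and $\kappa$.

74 Elemental Percentile method (EPM) To solve these equations, we eliminate $\lambda$ and $\delta$, and obtain: $(x_{j:n} - x_{r:n})/(x_{i:n} - x_{r:n}) = (1 - A_{jr}^{\kappa})/(1 - A_{ir}^{\kappa})$, where $A_{jr} = C_j/C_r$ and $A_{ir} = C_i/C_r$. Solving this equation for $\kappa$ by the bisection method, we obtain an initial estimate $\hat{\kappa}_{ijr}$.

75 Elemental Percentile method (EPM) Substituting $\hat{\kappa}_{ijr}$ in two of the above equations and solving for $\delta$ and $\lambda$: $\hat{\delta}_{ijr} = \hat{\kappa}_{ijr} (x_{i:n} - x_{r:n})/(C_r^{\hat{\kappa}_{ijr}} - C_i^{\hat{\kappa}_{ijr}})$ and $\hat{\lambda}_{ijr} = x_{i:n} - \hat{\delta}_{ijr} [1 - C_i^{\hat{\kappa}_{ijr}}]/\hat{\kappa}_{ijr}$.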

76 Elemental Percentile method (EPM) Theorem: The initial estimates are asymptotically normal and consistent. Final estimates of $\lambda$, $\delta$, and $\kappa$ are obtained by combining the initial estimates from all possible triplets through a suitable robust function, such as the trimmed mean, to obtain efficient estimates.
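
The EPM steps can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: it assumes the Jenkinson-type maximal GEV quantile x = lam + delta [1 - (-log p)^kappa]/kappa, plotting positions p_{i:n} = (i - 0.35)/n, a bisection bracket of [-10, 10] for kappa, and the median (a fully trimmed mean) in place of a general trimmed mean for combining triplets. All function names are hypothetical.

```python
import math
from itertools import combinations
from statistics import median

def gev_max_quantile(p, lam, delta, kappa):
    """Quantile of the maximal GEV in the assumed Jenkinson form (kappa != 0)."""
    return lam + delta * (1.0 - (-math.log(p)) ** kappa) / kappa

def _ratio(kappa, a, b):
    """(1 - a^kappa) / (1 - b^kappa), continued by its limit at kappa = 0."""
    if kappa == 0.0:
        return math.log(a) / math.log(b)
    return math.expm1(kappa * math.log(a)) / math.expm1(kappa * math.log(b))

def elemental_estimate(xi, xj, xr, pi, pj, pr, lo=-10.0, hi=10.0):
    """Solve the three percentile equations for one triplet i < j < r."""
    ci, cj, cr = (-math.log(p) for p in (pi, pj, pr))
    a, b = cj / cr, ci / cr            # 1 < a < b because p_i < p_j < p_r
    d = (xj - xr) / (xi - xr)          # lies in (0, 1) for distinct values
    # _ratio decreases from 1 to 0 as kappa grows, so bisection finds the root.
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if _ratio(mid, a, b) > d:
            lo = mid
        else:
            hi = mid
    kappa = 0.5 * (lo + hi)
    if abs(kappa) < 1e-8:              # Gumbel limit of the two remaining equations
        delta = (xi - xr) / (math.log(cr) - math.log(ci))
        lam = xi + delta * math.log(ci)
    else:
        delta = kappa * (xi - xr) / (cr ** kappa - ci ** kappa)
        lam = xi - delta * (1.0 - ci ** kappa) / kappa
    return lam, delta, kappa

def epm_fit(xs):
    """Combine the elemental estimates from all triplets by their medians."""
    x = sorted(xs)
    n = len(x)
    p = [(i + 1 - 0.35) / n for i in range(n)]   # assumed plotting positions
    estimates = [
        elemental_estimate(x[i], x[j], x[r], p[i], p[j], p[r])
        for i, j, r in combinations(range(n), 3)
        if x[i] < x[j] < x[r]                    # skip triplets with ties
    ]
    lams, deltas, kappas = zip(*estimates)
    return median(lams), median(deltas), median(kappas)

# Sanity check on data that sit exactly on a GEV quantile curve.
n = 12
xs = [gev_max_quantile((i + 1 - 0.35) / n, 10.0, 2.0, 0.2) for i in range(n)]
lam_hat, delta_hat, kappa_hat = epm_fit(xs)
print(f"lam = {lam_hat:.6f}, delta = {delta_hat:.6f}, kappa = {kappa_hat:.6f}")
```

Because every triplet gives an estimate whenever its three values are distinct, the method does not suffer from the non-existence problems of the likelihood and moment methods; the number of triplets grows as n^3, so for large samples a random subset of triplets is a natural shortcut.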

77 The Modified Likelihood Function (MLF) The MLF method can be thought of as a marriage between the maximum likelihood method and the method of moments. The ideas behind the method are: 1. The log likelihood function is:

78 The Modified Likelihood Function (MLF) 2. The modified likelihood: A Taylor series expansion of around gives

79 The Modified Likelihood Function (MLF) 3. Let where are plotting positions. 4. Substitute these in the modified likelihood and solve for .

80 The Modified Likelihood Function (MLF) We think this will be a happy marriage, but to be sure we are investigating (analytically and using simulation) the properties of the proposed estimators and their dependence on the choice of the plotting positions $p_{i:n}$. This is still work in progress.

81 Outline Some references Examples of extreme events data Types of extreme events data Commonly used models for extremes: Distributions of order statistics Generalized extreme value distributions Generalized Pareto distributions Parameter and quantile estimation of extremes Summary and concluding remarks

82 Summary The choice of models for extremes depends on the type of data available: 1. Complete Data: All n observations are available. Use distributions of order statistics if we know F(x) and n is not too large; else, use GEV. 2. Maxima/Minima: Only maxima or minima are available. Use GEV. 3. Exceedances over/under a threshold: Only observations exceeding a given threshold are available. Use GPD.