New approaches in extreme-value modeling A.Zempléni, A. Beke, V. Csiszár (Eötvös Loránd University, Budapest) Flood Risk Workshop, 08.07.2002.



Analysis of extreme values
Probably the most important part of the project. Its aim is to estimate return levels: values that are expected to be exceeded once in a given period (100 years, for example).
– Classical methods: based on annual maxima (other values are not used).
– Peaks-over-threshold methods: use all values higher than a given (high) threshold.

Extreme-value distributions (for modeling annual maxima)
Let $X_1, X_2, \dots, X_n$ be independent, identically distributed random variables. If we can find norming constants $a_n$, $b_n$ such that $[\max(X_1, X_2, \dots, X_n) - a_n]/b_n$ has a nondegenerate limit, then this limit is necessarily a max-stable, or so-called extreme-value, distribution.
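The limit statement can be checked by simulation. The sketch below is only an illustration under stated assumptions: it takes unit exponentials, for which the norming constants a_n = ln(n) and b_n = 1 are known to work, and compares the empirical distribution of the normalised maxima with the Gumbel distribution function exp(-exp(-x)). The function names are made up for this example.

```python
import math
import random


def normalised_maxima(n, replications, seed=0):
    """Simulate [max(X_1, ..., X_n) - a_n] / b_n for unit exponentials.

    Assumption of this sketch: a_n = ln(n) and b_n = 1, the standard
    norming constants for the unit exponential distribution.
    """
    rng = random.Random(seed)
    a_n, b_n = math.log(n), 1.0
    return [
        (max(rng.expovariate(1.0) for _ in range(n)) - a_n) / b_n
        for _ in range(replications)
    ]


def gumbel_cdf(x):
    """Standard Gumbel distribution function."""
    return math.exp(-math.exp(-x))


def empirical_cdf(sample, x):
    """Share of sample values not exceeding x."""
    return sum(v <= x for v in sample) / len(sample)


maxima = normalised_maxima(n=200, replications=2000)
for x in (-1.0, 0.0, 1.0, 2.0):
    print(f"x={x:+.1f}  empirical={empirical_cdf(maxima, x):.3f}  "
          f"Gumbel={gumbel_cdf(x):.3f}")
```

The two columns should agree up to simulation noise; other parent distributions need other norming constants and may lead to the Fréchet or Weibull type instead.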

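Characterisation of extreme-value distributions
Limit distributions of normalised maxima:
– Fréchet: $\Phi_\alpha(x) = \exp(-x^{-\alpha})$ ($x > 0$), where $\alpha$ is a positive parameter.
– Weibull: $\Psi_\alpha(x) = \exp(-(-x)^{\alpha})$ ($x < 0$), where $\alpha$ is a positive parameter.
– Gumbel: $\Lambda(x) = \exp(-e^{-x})$.
(Location and scale parameters can be incorporated.)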

Estimation methods
– Maximum likelihood, based on the unified parametrisation $G(x) = \exp\{-[1 + \xi (x - \mu)/\sigma]^{-1/\xi}\}$ if $1 + \xi (x - \mu)/\sigma > 0$: the most widely used, with optimal asymptotic properties if $\xi > -0.5$.
– Probability-weighted moments (PWM)
– Method of L-moments
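As an illustration of what maximum likelihood optimises, here is a minimal sketch of the negative log-likelihood, assuming the usual GEV form G(x) = exp{-[1 + ξ(x-μ)/σ]^(-1/ξ)} with location μ, scale σ and shape ξ. The function names and the simulated sample are hypothetical, and the numerical optimiser is deliberately left out.

```python
import math
import random


def gev_neg_log_likelihood(x, mu, sigma, xi):
    """Negative log-likelihood of the GEV in the (mu, sigma, xi) parametrisation.

    Returns math.inf if sigma <= 0 or if an observation violates the
    support constraint 1 + xi * (x - mu) / sigma > 0.
    """
    if sigma <= 0:
        return math.inf
    total = len(x) * math.log(sigma)
    for v in x:
        z = (v - mu) / sigma
        if abs(xi) < 1e-9:
            # Gumbel limit (xi -> 0)
            total += z + math.exp(-z)
        else:
            t = 1.0 + xi * z
            if t <= 0:
                return math.inf
            total += (1.0 + 1.0 / xi) * math.log(t) + t ** (-1.0 / xi)
    return total


def gev_sample(n, mu, sigma, xi, seed=0):
    """Inverse-transform sampling from the GEV (requires xi != 0)."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        u = rng.uniform(1e-12, 1.0 - 1e-12)  # keep u strictly inside (0, 1)
        sample.append(mu + sigma * ((-math.log(u)) ** (-xi) - 1.0) / xi)
    return sample


# Simulated data with known parameters (illustration only).
data = gev_sample(500, mu=0.0, sigma=1.0, xi=0.1, seed=1)
print("at the true parameters:", gev_neg_log_likelihood(data, 0.0, 1.0, 0.1))
print("with a shifted location:", gev_neg_log_likelihood(data, 1.0, 1.0, 0.1))
```

The value at the true parameters should be clearly smaller than at the shifted location; a maximum likelihood fit would pass this function to a general-purpose minimiser.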

Probability-weighted moments
– Analogous to the method of moments.
– It puts more weight on the high values.
– The estimators are obtained by equating the empirical and the theoretical weighted moments and solving the equations for the parameter vector.

Method of L-moments
The L-moment analogues of the basic characteristics (mean, variance, skewness and kurtosis) of the observed distribution are equated to their respective theoretical values. These values can be estimated with the help of the probability-weighted moments.
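The two moment-based methods can be sketched together. The code below assumes the standard unbiased sample PWMs b_0, b_1, b_2 and the widely used approximation of Hosking, Wallis and Wood (1985) for the GEV shape; it is a sketch, not necessarily the procedure used in the project. Note that Hosking's shape k equals -ξ in the (μ, σ, ξ) parametrisation. All function names are illustrative.

```python
import math
import random


def sample_pwms(x):
    """Unbiased estimates b_0, b_1, b_2 of the PWMs E[X F(X)^k] (needs n >= 3)."""
    xs = sorted(x)
    n = len(xs)
    b0 = sum(xs) / n
    b1 = sum((i - 1) / (n - 1) * v for i, v in enumerate(xs, start=1)) / n
    b2 = sum((i - 1) * (i - 2) / ((n - 1) * (n - 2)) * v
             for i, v in enumerate(xs, start=1)) / n
    return b0, b1, b2


def gev_from_lmoments(x):
    """Estimate (mu, sigma, xi) by matching the first three L-moments."""
    b0, b1, b2 = sample_pwms(x)
    l1 = b0                       # L-location (mean)
    l2 = 2.0 * b1 - b0            # L-scale
    l3 = 6.0 * b2 - 6.0 * b1 + b0
    t3 = l3 / l2                  # L-skewness
    c = 2.0 / (3.0 + t3) - math.log(2.0) / math.log(3.0)
    k = 7.8590 * c + 2.9554 * c * c   # approximate solution, k = -xi
    if abs(k) < 1e-8:
        # Gumbel case: closed-form L-moment estimators
        sigma = l2 / math.log(2.0)
        return l1 - 0.5772156649015329 * sigma, sigma, 0.0
    g = math.gamma(1.0 + k)
    sigma = l2 * k / ((1.0 - 2.0 ** (-k)) * g)
    mu = l1 - sigma * (1.0 - g) / k
    return mu, sigma, -k


def gev_sample(n, mu, sigma, xi, seed=0):
    """Inverse-transform sampling from the GEV (requires xi != 0)."""
    rng = random.Random(seed)
    return [
        mu + sigma * ((-math.log(rng.uniform(1e-12, 1.0 - 1e-12))) ** (-xi) - 1.0) / xi
        for _ in range(n)
    ]


# Simulated data with known parameters (illustration only).
print(gev_from_lmoments(gev_sample(2000, mu=0.0, sigma=1.0, xi=0.1, seed=1)))
```

On the simulated sample the printed estimates should lie near (0, 1, 0.1); the approximation for k is intended for moderate shapes only.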

Comparison
Maximum likelihood is preferable, since
– its asymptotic properties are known, allowing the construction of confidence intervals;
– covariates can be incorporated into the model.
For the other methods, there is no firm theory.

Further investigations
Estimates for return levels.
Confidence bounds should be calculated; possible methods:
– based on asymptotic properties of the maximum likelihood estimator
– profile likelihood
– resampling methods (bootstrap, jackknife)
– Bayesian approach
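For a fitted GEV model the T-year return level is the quantile of order 1 - 1/T of the annual-maximum distribution. The sketch below assumes the (μ, σ, ξ) parametrisation of the GEV; the parameter values are invented for illustration and are not estimates from the project.

```python
import math


def gev_cdf(x, mu, sigma, xi):
    """GEV distribution function in the (mu, sigma, xi) parametrisation."""
    if abs(xi) < 1e-9:
        return math.exp(-math.exp(-(x - mu) / sigma))
    t = 1.0 + xi * (x - mu) / sigma
    if t <= 0:
        # below the lower endpoint (xi > 0) or above the upper endpoint (xi < 0)
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))


def gev_return_level(T, mu, sigma, xi):
    """Level exceeded by the annual maximum with probability 1/T in any year."""
    y = -math.log(1.0 - 1.0 / T)
    if abs(xi) < 1e-9:
        return mu - sigma * math.log(y)
    return mu - sigma / xi * (1.0 - y ** (-xi))


def gev_upper_endpoint(mu, sigma, xi):
    """Finite right endpoint of the distribution when xi < 0, else infinity."""
    return mu - sigma / xi if xi < 0 else math.inf


# Hypothetical parameters (cm), chosen only to illustrate the formulas.
mu, sigma, xi = 500.0, 80.0, -0.2
for T in (100, 500, 1000):
    print(f"{T:>4}-year return level: {gev_return_level(T, mu, sigma, xi):.1f} cm")
print(f"upper endpoint: {gev_upper_endpoint(mu, sigma, xi):.1f} cm")
```

With a negative shape the return levels increase with T but stay below the finite right endpoint μ - σ/ξ.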

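Confidence intervals
For maximum likelihood:
– By asymptotic normality of the estimator: $\hat\theta_i \pm z_{1-\alpha/2} \sqrt{v_{ii}}$, where $v_{ii}$ is the $(i,i)$th element of the inverse of the information matrix and $z_{1-\alpha/2}$ is the standard normal quantile.
– By profile likelihood.
For the other, nonparametric methods: by bootstrap.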
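A percentile bootstrap interval for a return level can be sketched as follows. To keep the example short and dependency-free, the estimator is a Gumbel fit by the method of moments; this is a stand-in chosen for this sketch, not the estimator used in the talk, and the data are simulated. Any estimator that maps a sample to a return level could be plugged in instead.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329


def gumbel_return_level(x, T):
    """T-year return level from a Gumbel fit by the method of moments.

    Stand-in estimator (an assumption of this sketch).
    """
    sigma = statistics.stdev(x) * math.sqrt(6.0) / math.pi
    mu = statistics.mean(x) - EULER_GAMMA * sigma
    return mu - sigma * math.log(-math.log(1.0 - 1.0 / T))


def bootstrap_interval(x, statistic, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval: resample with replacement, re-estimate."""
    rng = random.Random(seed)
    n = len(x)
    reps = sorted(statistic(rng.choices(x, k=n)) for _ in range(n_boot))
    lower = reps[int(n_boot * (1.0 - level) / 2.0)]
    upper = reps[int(n_boot * (1.0 + level) / 2.0) - 1]
    return lower, upper


# Hypothetical annual maxima (cm), simulated from a Gumbel law for illustration.
rng = random.Random(1)
annual_maxima = [
    600.0 - 80.0 * math.log(-math.log(rng.uniform(1e-12, 1.0 - 1e-12)))
    for _ in range(100)
]
point = gumbel_return_level(annual_maxima, T=100)
low, high = bootstrap_interval(annual_maxima, lambda s: gumbel_return_level(s, 100))
print(f"100-year level: {point:.0f} cm, 95% bootstrap interval: [{low:.0f}, {high:.0f}] cm")
```

The percentile interval is the simplest variant; it resamples the annual maxima as if they were independent, which is an additional assumption.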

Profile likelihood
One coordinate of the parameter vector is fixed, and the maximization is with respect to the other components: $\ell_p(\theta_i) = \max_{\theta_{-i}} \ell(\theta_i, \theta_{-i})$.
Its main advantages:
– The uncertainty can be visualized.
– More exact (asymmetric) confidence bounds.
– Model selection for nested models is possible by the likelihood ratio test.
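A small sketch of a profile-likelihood interval, restricted to the Gumbel special case (ξ = 0) so that it fits in a few lines: for a fixed scale σ the location that maximises the log-likelihood has the closed form μ(σ) = -σ ln(mean(exp(-x/σ))). The interval collects all σ whose profile log-likelihood is within 3.841/2 of the maximum (3.841 is the 95% quantile of the chi-square distribution with one degree of freedom). The grid and the simulated data are assumptions of this example.

```python
import math
import random


def gumbel_profile_loglik(x, sigma):
    """Profile log-likelihood of sigma; mu is maximised out in closed form."""
    n = len(x)
    m = min(x)
    # mu_hat(sigma) = -sigma * ln(mean(exp(-x / sigma))), computed stably
    s = sum(math.exp(-(v - m) / sigma) for v in x)
    mu_hat = m - sigma * math.log(s / n)
    z = [(v - mu_hat) / sigma for v in x]
    return -n * math.log(sigma) - sum(z) - sum(math.exp(-zi) for zi in z)


def profile_interval(x, grid, crit=3.841):
    """Grid maximiser of the profile and the likelihood-ratio interval around it."""
    values = [(gumbel_profile_loglik(x, s), s) for s in grid]
    best_ll, best_sigma = max(values)
    inside = [s for ll, s in values if 2.0 * (best_ll - ll) <= crit]
    return best_sigma, min(inside), max(inside)


# Simulated standard Gumbel sample (illustration only).
rng = random.Random(2)
sample = [-math.log(-math.log(rng.uniform(1e-12, 1.0 - 1e-12))) for _ in range(200)]
grid = [0.3 + 0.005 * i for i in range(400)]
sigma_hat, lower, upper = profile_interval(sample, grid)
print(f"sigma_hat={sigma_hat:.3f}, 95% profile interval: [{lower:.3f}, {upper:.3f}]")
```

The interval is typically not symmetric around the estimate, which is the advantage mentioned on the slide. For the three-parameter GEV the inner maximisation has to be done numerically.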

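Model diagnostics
– Probability plot (P-P plot), the points: $(\hat F(x_{(i)}),\ p_i)$, $i = 1, \dots, n$.
– Quantile plot (Q-Q plot), the points: $(\hat F^{-1}(p_i),\ x_{(i)})$, $i = 1, \dots, n$.
Here $x_{(1)} \le \dots \le x_{(n)}$ is the ordered sample, $\hat F$ is the fitted distribution function and $p_i$ is the empirical probability assigned to $x_{(i)}$ (a common choice is $p_i = i/(n+1)$).
Both diagrams should be close to the unit diagonal if the fit is good.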

An example: a simulated 100-element sample of unit exponentials
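The points of both diagnostic plots for such a sample can be computed as in the sketch below. It assumes the plotting positions i/(n+1) and uses the true unit exponential as the model; the model fitted on the original slide is not recorded in the transcript. Plotting the returned pairs is left to any graphics tool.

```python
import math
import random


def pp_qq_points(sample, cdf, quantile):
    """Return the P-P and Q-Q plot points for a sample and a model.

    Assumption of this sketch: plotting positions p_i = i / (n + 1).
    """
    xs = sorted(sample)
    n = len(xs)
    probs = [i / (n + 1) for i in range(1, n + 1)]
    pp = [(cdf(x), p) for x, p in zip(xs, probs)]       # (model prob., empirical prob.)
    qq = [(quantile(p), x) for x, p in zip(xs, probs)]  # (model quantile, observation)
    return pp, qq


rng = random.Random(0)
sample = [rng.expovariate(1.0) for _ in range(100)]
pp, qq = pp_qq_points(
    sample,
    cdf=lambda x: 1.0 - math.exp(-x),
    quantile=lambda p: -math.log(1.0 - p),
)
print("largest P-P distance from the diagonal:",
      round(max(abs(a - b) for a, b in pp), 3))
print("largest Q-Q points:", [(round(a, 2), round(b, 2)) for a, b in qq[-3:]])
```

The P-P points stay close to the diagonal everywhere, while the Q-Q points usually scatter most in the upper tail, where the largest observations are the most variable.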

Peaks over threshold methods
Events that exceed a given (high) threshold are considered extreme.
Advantages:
– More data can be used.
– Estimators are not affected by the small “floods”.
Disadvantages:
– Dependence on the threshold choice.
– Declustering is not always obvious.

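Theoretical foundations
Let $X_1, X_2, \dots, X_n$ be independent, identically distributed random variables. If the normalised maximum of this sequence converges to an extreme value distribution (with parameters $\mu, \sigma, \xi$), then
$P(X - u \le y \mid X > u) \approx 1 - (1 + \xi y/\tilde\sigma)^{-1/\xi}$
if $y > 0$ and $1 + \xi y/\tilde\sigma > 0$, where $\tilde\sigma = \sigma + \xi (u - \mu)$. The approximation holds asymptotically as $n$ and $u$ increase.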
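The generalized Pareto form of the excess distribution can be illustrated on a case where it holds exactly. For a Pareto variable with P(X > x) = x^(-α), x ≥ 1, the excesses over any threshold u ≥ 1 follow a GPD with shape ξ = 1/α and scale u/α. The sketch compares the empirical excess distribution with this GPD; the choice of α, u and the function names are assumptions of the example.

```python
import math
import random


def gpd_cdf(y, sigma, xi):
    """Generalized Pareto distribution function for an excess y >= 0."""
    if abs(xi) < 1e-9:
        return 1.0 - math.exp(-y / sigma)
    t = 1.0 + xi * y / sigma
    if t <= 0:
        return 1.0  # beyond the upper endpoint (only possible when xi < 0)
    return 1.0 - t ** (-1.0 / xi)


def excess_check(alpha, u, n=20000, seed=0, points=(0.1, 0.5, 1.0)):
    """Compare empirical and GPD probabilities P(X - u <= y | X > u)."""
    rng = random.Random(seed)
    sample = [rng.paretovariate(alpha) for _ in range(n)]  # P(X > x) = x ** -alpha
    excesses = [x - u for x in sample if x > u]
    xi, sigma_u = 1.0 / alpha, u / alpha
    rows = []
    for y in points:
        empirical = sum(e <= y for e in excesses) / len(excesses)
        rows.append((y, empirical, gpd_cdf(y, sigma_u, xi)))
    return rows


for y, empirical, model in excess_check(alpha=4.0, u=1.5):
    print(f"y={y:.1f}  empirical={empirical:.3f}  GPD={model:.3f}")
```

For other parent distributions the agreement is only approximate and improves as the threshold grows.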

Inference
Similar to the annual maxima method:
– Maximum likelihood is to be preferred.
– Confidence bounds can be based on profile likelihood.
– Model fit can be analyzed by P-P plots and Q-Q plots.
– Return levels/upper bounds can be estimated.

Threshold selection
Mean excess plot: for any threshold $u$, plot the mean of $X - u$ (over those observations for which $X > u$) against $u$. If the generalized Pareto model is true, this plot should be nearly linear. The interpretation is made difficult by the great variability near the upper endpoint of the observations.
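The points of the mean excess plot are straightforward to compute. The sketch below also returns the number of exceedances behind each point, which makes the growing variability at high thresholds visible; the simulated exponential sample and the threshold grid are assumptions of the example.

```python
import random


def mean_excess(sample, u):
    """Mean of X - u over the observations with X > u."""
    excesses = [x - u for x in sample if x > u]
    if not excesses:
        raise ValueError("no observation exceeds the threshold")
    return sum(excesses) / len(excesses)


def mean_excess_curve(sample, thresholds):
    """Points (u, mean excess, number of exceedances); empty thresholds are skipped."""
    curve = []
    for u in thresholds:
        count = sum(x > u for x in sample)
        if count > 0:
            curve.append((u, mean_excess(sample, u), count))
    return curve


# Unit exponential data: the mean excess is 1 at every threshold in theory.
rng = random.Random(0)
sample = [rng.expovariate(1.0) for _ in range(1000)]
for u, value, count in mean_excess_curve(sample, [0.5 * i for i in range(7)]):
    print(f"u={u:.1f}  mean excess={value:.2f}  exceedances={count}")
```

A flat curve corresponds to shape ξ = 0; an increasing linear trend points to ξ > 0 and a decreasing one to ξ < 0.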

Another, very recent method: minimum cross-entropy
Kullback introduced the concept of probabilistic distance (cross-entropy) of a posterior distribution $h(t)$ from a prior $f(t)$: $D(h, f) = \int h(t) \ln[h(t)/f(t)]\,dt$. The method (Pandey, 2002) minimizes the cross-entropy of $x(t)$ (the observed quantile function of exceedances) with respect to its prior estimate $y(t)$, which is chosen as the exponential (motivated by its central role within the GPD family).
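To make the distance concrete, the sketch below evaluates Kullback's cross-entropy, the integral of h(t) ln[h(t)/f(t)], numerically for two exponential densities and compares it with the known closed form ln(a/b) + b/a - 1 for rates a (posterior) and b (prior). It illustrates the definition only, not the constrained minimisation of the method; the integration range and step count are assumptions.

```python
import math


def cross_entropy(h, f, lower, upper, steps=100_000):
    """Midpoint-rule approximation of the integral of h(t) * ln(h(t) / f(t))."""
    width = (upper - lower) / steps
    total = 0.0
    for i in range(steps):
        t = lower + (i + 0.5) * width
        ht = h(t)
        if ht > 0:
            total += ht * math.log(ht / f(t)) * width
    return total


def exp_pdf(rate):
    """Density of the exponential distribution with the given rate."""
    return lambda t: rate * math.exp(-rate * t)


a, b = 2.0, 1.0
numeric = cross_entropy(exp_pdf(a), exp_pdf(b), 0.0, 30.0)
exact = math.log(a / b) + b / a - 1.0
print(f"numeric={numeric:.5f}  closed form={exact:.5f}")
```

The distance is zero when posterior and prior coincide and positive otherwise, and it is not symmetric in its two arguments.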

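Moment conditions
PWM constraints: $\int_0^1 t^k x(t)\,dt = b_k$, $k = 0, 1, \dots, N$ (usually $N = 3$ is used), where $b_k = \frac{1}{n} \sum_{i=1}^{n} \frac{(i-1)(i-2)\cdots(i-k)}{(n-1)(n-2)\cdots(n-k)}\, x_{(i)}$ is an unbiased estimator for the $k$th PWM, based on the ordered sample $x_{(1)} \le \dots \le x_{(n)}$ of size $n$.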

Some results for Vásárosnamény
Six threshold values were used: 440, 480, 520, 560, 600 and 640 centimetres. As constraints, we considered the first four PWMs, that is, $N = 3$. The estimated 100-, 500-, and 1000-year return levels

Comments
The results were not as stable as was claimed in the original paper. We intended to add bootstrap confidence bounds to the estimates, but this was too time-consuming in its original version, and the simplifications we used have not proven to be realistic.

Stationary sequences
If independence does not hold (as is the case for the original daily observations), the limit of the normalized maxima is still a GEV distribution, provided that the dependence among far-apart observations tends to diminish. (See the talk of S. Gáspár.) So the GEV model for annual maxima has a sound theoretical background. For POT models, the maxima of the clusters of exceedances may be used. (Clusters need to be defined.)
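One common way to define clusters is runs declustering: a cluster ends once a fixed number of consecutive observations have stayed at or below the threshold. This rule and the run length are assumptions of the sketch below; the talk does not say which cluster definition was used, and the data are invented.

```python
def cluster_maxima(series, threshold, run_length):
    """Maxima of clusters of exceedances under runs declustering.

    A new cluster starts when an exceedance follows at least `run_length`
    consecutive observations that did not exceed the threshold.
    """
    maxima = []
    current_max = None  # maximum of the cluster currently being built
    below = 0           # consecutive non-exceedances since the last exceedance
    for value in series:
        if value > threshold:
            if current_max is None:
                current_max = value
            elif below >= run_length:
                maxima.append(current_max)  # close the previous cluster
                current_max = value
            else:
                current_max = max(current_max, value)
            below = 0
        else:
            below += 1
    if current_max is not None:
        maxima.append(current_max)
    return maxima


# Hypothetical daily water levels (arbitrary units).
levels = [1, 5, 6, 2, 1, 1, 7, 3, 8, 1]
print(cluster_maxima(levels, threshold=4, run_length=2))
```

In this toy series the exceedances 5 and 6 form the first cluster, while 7 and 8 are separated by a single low value only and therefore share the second cluster.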

To cope with nonstationarity
– Linear regression-type models can be incorporated into the maximum likelihood framework.
– Profile likelihood and likelihood-ratio tests can be performed for nested models.

Water-level data example: Vásárosnamény
At least two observations per day for each station (there are approx. 50 stations) for 100 years. Reduction: one observation per day.

Annual maxima

The fitted model (shape= , right-endpoint: 778 cm)

Another station (estimated max: 945 cm)