Regionalization of Statistics Describing the Distribution of Hydrologic Extremes
Jery R. Stedinger, Cornell University
Research with G. Tasker, E. Martins, D. Reis, A. Gruber, V. Griffis, D.I. Jeong and Y.O. Kim
SAMSI Workshop, 23 January 2008

Extreme Value Theory & Hydrology
The annual maximum flood may be a daily maximum or an instantaneous maximum.
The annual maximum 24-hour rainfall may be a daily maximum or the maximum 1440-minute value.
Annual maxima are not the maximum of an I.I.D. series:
- Years have definite “wet” and “dry” seasons.
- Daily values are correlated.
- Because of El Niño and atmospheric patterns, some years are extreme-event prone and others are not.
Peaks over threshold, i.e. a partial duration series (PDS), is another alternative.

Outline
- Summarizing data: moments and L-moments
- Parameter estimation for GEV
  - Use of a prior on $\kappa$
  - PDS versus AMS with GMLEs
- Bayesian GLS regression for regionalization
- Concluding observations

Definitions: Product-Moments
- Mean, measure of location: $\mu_x = E[X]$
- Variance, measure of spread: $\sigma_x^2 = E[(X - \mu_x)^2]$
- Coefficient of skewness, measure of asymmetry: $\gamma_x = E[(X - \mu_x)^3] / \sigma_x^3$

Conventional Moment Ratios
Conventional descriptions of shape are:
- Coefficient of variation, CV: $\sigma / \mu$
- Coefficient of skewness, $\gamma$: $E[(X - \mu)^3] / \sigma^3$
- Coefficient of kurtosis: $E[(X - \mu)^4] / \sigma^4$

Samples drawn from a Gumbel distribution.

L-Moments
An alternative to product moments, now widely used in hydrology.

L-Moments: an alternative
L-moments can summarize data as conventional moments do, using linear combinations of the ordered observations.
Because L-moments avoid squaring and cubing the data, their ratios do not suffer from the severe bias problems encountered with product moments.
They are estimated using order statistics.

L-Moments: an alternative
Let $X_{(i|n)}$ be the $i$th observation in ascending order in a sample of size $n$ (so $i = 1$ is the smallest).
Measure of scale, half the expected difference between the largest and smallest observations in a sample of 2:
$\lambda_2 = (1/2)\, E[ X_{(2|2)} - X_{(1|2)} ]$
Measure of asymmetry:
$\lambda_3 = (1/3)\, E[ X_{(3|3)} - 2 X_{(2|3)} + X_{(1|3)} ]$
where $\lambda_3 > 0$ for positively skewed distributions.

L-Moments: an alternative
Measure of kurtosis:
$\lambda_4 = (1/4)\, E[ X_{(4|4)} - 3 X_{(3|4)} + 3 X_{(2|4)} - X_{(1|4)} ]$
For highly kurtotic distributions, $\lambda_4$ is large. For the uniform distribution, $\lambda_4 = 0$.

Dimensionless L-moment ratios
- L-moment coefficient of variation (L-CV): $\tau_2 = \lambda_2 / \mu$
- L-moment coefficient of skew (L-skewness): $\tau_3 = \lambda_3 / \lambda_2$
- L-moment coefficient of kurtosis (L-kurtosis): $\tau_4 = \lambda_4 / \lambda_2$
(Note: Hosking calls L-CV $\tau$ instead of $\tau_2$.)
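The definitions above translate directly into sample estimators. The following is a minimal sketch, assuming the standard unbiased probability-weighted-moment route to sample L-moments; the function name `sample_l_moments` is hypothetical and not from the talk.

```python
def sample_l_moments(data):
    """Return (l1, l2, t3, t4): mean, L-scale, L-skewness and L-kurtosis.

    Uses unbiased probability-weighted moments b0..b3 of the ordered sample.
    """
    x = sorted(data)
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations")
    b0 = b1 = b2 = b3 = 0.0
    for j, xj in enumerate(x):  # j = 0 for the smallest observation
        b0 += xj
        b1 += xj * j / (n - 1)
        b2 += xj * j * (j - 1) / ((n - 1) * (n - 2))
        b3 += xj * j * (j - 1) * (j - 2) / ((n - 1) * (n - 2) * (n - 3))
    b0, b1, b2, b3 = b0 / n, b1 / n, b2 / n, b3 / n
    l1 = b0
    l2 = 2 * b1 - b0
    l3 = 6 * b2 - 6 * b1 + b0
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0
    return l1, l2, l3 / l2, l4 / l2


l1, l2, t3, t4 = sample_l_moments([1, 2, 3, 4, 5])
l_cv = l2 / l1  # L-CV, called tau_2 in these slides
```

For the evenly spaced sample used here, the L-skewness and L-kurtosis are both zero, which mirrors the statement that the uniform distribution has zero fourth L-moment.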

Samples drawn from a Gumbel distribution.

Generalized Extreme Value (GEV) distribution
Gumbel's Type I, II & III extreme value distributions:
$F(x) = \exp\{ -[ 1 - (\kappa/\alpha)(x - \xi) ]^{1/\kappa} \}$ for $\kappa \ne 0$
$\kappa$ = shape, $\alpha$ = scale, $\xi$ = location.
Mostly $-0.3 < \kappa \le 0$. [Others use a shape parameter of the opposite sign.]

GEV Prob. Density Function

GEV Prob. Density Function large x

Simple GEV L-Moment Estimators
Using L-moments (Hosking, Wallis & Wood, 1985):
$c = 2/(\tau_3 + 3) - \ln(2)/\ln(3)$, where $\tau_3 = \lambda_3 / \lambda_2$
then
$\kappa = 7.8590\,c + 2.9554\,c^2$, for $|\tau_3| \le 0.5$
$\alpha = \kappa \lambda_2 / [ \Gamma(1+\kappa)\,(1 - 2^{-\kappa}) ]$
$\xi = \lambda_1 + \alpha [ \Gamma(1+\kappa) - 1 ] / \kappa$
Quantiles: $x_p = \xi + (\alpha/\kappa) \{ 1 - [ -\ln(p) ]^{\kappa} \}$
The method of L-moments is simple and attractive.
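The estimator formulas on this slide can be sketched in a few lines. This is an illustrative implementation under the parameterization used in this talk (shape $\kappa$, scale $\alpha$, location $\xi$); the function names are hypothetical, and the Gumbel branch for $\kappa$ near zero is an added safeguard rather than something stated on the slide.

```python
import math

EULER_GAMMA = 0.5772156649015329


def gev_from_l_moments(l1, l2, t3):
    """Return (xi, alpha, kappa) from the mean, L-scale and L-skewness."""
    c = 2.0 / (t3 + 3.0) - math.log(2.0) / math.log(3.0)
    kappa = 7.8590 * c + 2.9554 * c * c
    if abs(kappa) < 1e-8:  # Gumbel limit, added here to avoid dividing by zero
        alpha = l2 / math.log(2.0)
        return l1 - EULER_GAMMA * alpha, alpha, 0.0
    g = math.gamma(1.0 + kappa)
    alpha = kappa * l2 / (g * (1.0 - 2.0 ** (-kappa)))
    xi = l1 + alpha * (g - 1.0) / kappa
    return xi, alpha, kappa


def gev_quantile(p, xi, alpha, kappa):
    """Quantile x_p with non-exceedance probability p."""
    if kappa == 0.0:
        return xi - alpha * math.log(-math.log(p))
    return xi + (alpha / kappa) * (1.0 - (-math.log(p)) ** kappa)


# The 0.999 quantile of GEV(xi=0, alpha=1, kappa=-0.20) is about 14.9
x999 = gev_quantile(0.999, 0.0, 1.0, -0.20)
```

Feeding the exact L-moments of a GEV with $\kappa = -0.2$ into `gev_from_l_moments` returns a shape estimate within about 0.001 of the true value, which is the accuracy expected of the polynomial approximation.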

Index Flood Methodology
Research has demonstrated potential advantages of index flood procedures for combining regional and at-site data to improve the estimators at individual sites.

Hosking and Wallis (1997)
Development of L-moments for regional flood frequency analysis. Research done in the 1980-1995 period.
J.R.M. Hosking and J.R. Wallis, Regional Frequency Analysis: An Approach Based on L-moments, Cambridge University Press, 1997.

Compute the regional average L-CV and L-CS, which yield the regional dimensionless quantile $y_p$.

Index Flood Methodology
Use data from hydrologically "similar" basins to estimate a dimensionless flood distribution, which is scaled by the at-site sample mean.
"Substitutes space for time" by using regional information to compensate for relatively short records at each site.
Most of these studies have used the GEV distribution and L-moments or equivalent.
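The index flood idea can be illustrated with a short sketch: average the at-site L-CV and L-skewness across the region, fit a dimensionless GEV growth curve with unit mean, and scale it by the at-site mean. The record-length weighting and all function names below are assumptions made for illustration; the slides do not specify the weighting.

```python
import math


def regional_average(values, record_lengths):
    """Record-length-weighted average of an at-site statistic (assumed weighting)."""
    total = sum(record_lengths)
    return sum(v * n for v, n in zip(values, record_lengths)) / total


def growth_curve_quantile(p, t2, t3):
    """Quantile y_p of a GEV with mean 1, L-CV t2 and L-skewness t3.

    Assumes the fitted kappa is not zero (t3 not equal to about 0.17).
    """
    c = 2.0 / (t3 + 3.0) - math.log(2.0) / math.log(3.0)
    kappa = 7.8590 * c + 2.9554 * c * c
    g = math.gamma(1.0 + kappa)
    alpha = kappa * t2 / (g * (1.0 - 2.0 ** (-kappa)))  # lambda_2 = t2 when mean = 1
    xi = 1.0 + alpha * (g - 1.0) / kappa
    return xi + (alpha / kappa) * (1.0 - (-math.log(p)) ** kappa)


def index_flood_quantile(p, site_mean, t2_sites, t3_sites, record_lengths):
    """At-site quantile: at-site mean times the regional growth factor y_p."""
    t2 = regional_average(t2_sites, record_lengths)
    t3 = regional_average(t3_sites, record_lengths)
    return site_mean * growth_curve_quantile(p, t2, t3)


# Hypothetical three-site region; the numbers are made up for illustration
q100 = index_flood_quantile(0.99, 500.0, [0.30, 0.34, 0.28], [0.25, 0.35, 0.30], [20, 30, 15])
```

Only the at-site mean is estimated from the short local record; the shape of the distribution comes entirely from the pooled regional statistics.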

Trouble with MLEs for GEV
Case: N = 15, X ~ GEV($\xi = 0$, $\alpha = 1$, $\kappa = -0.20$)
MLE solution: $x_{0.999}$ = 6,000,000 (estimated) versus $x_{0.999}$ = 14.9 (true)

Parameter Estimators for the 3-parameter GEV distribution
1. Maximum Likelihood (ML)
2. Method of Moments (MOM)
3. Method of L-moments (LM)
4. Generalized Maximum Likelihood (GML): introduces a prior distribution for $\kappa$ that keeps the estimator within (-0.5, +0.5) and encourages values within (-0.3, +0.1).
   Martins, E.S., and J.R. Stedinger, Generalized Maximum Likelihood GEV quantile estimators for hydrologic data, Water Resour. Res., 36(3), 737-744, 2000.
Alternatively, one can use a penalty to enforce the constraint that $\kappa > -1$:
   Coles, S.G., and M.J. Dixon, Likelihood-Based Inference for Extreme Value Models, Extremes, 2(1), 5-23, 1999.
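A GML estimator maximizes the log-likelihood plus the log of the prior on $\kappa$. The sketch below shows one way to write that objective. The Beta prior on (-0.5, 0.5) with parameters p = 6 and q = 9 is an assumption made here for illustration (it has mean -0.10 and keeps most mass in roughly -0.3 to +0.1); the slide itself does not state the prior's form, and all function names are hypothetical.

```python
import math


def gev_log_likelihood(x, xi, alpha, kappa):
    """GEV log-likelihood with F(x) = exp{-[1 - kappa (x - xi)/alpha]^(1/kappa)}."""
    if alpha <= 0.0:
        return -math.inf
    total = -len(x) * math.log(alpha)
    for obs in x:
        z = (obs - xi) / alpha
        if abs(kappa) < 1e-8:  # Gumbel limit
            total += -z - math.exp(-z)
        else:
            y = 1.0 - kappa * z
            if y <= 0.0:  # observation outside the support
                return -math.inf
            total += (1.0 / kappa - 1.0) * math.log(y) - y ** (1.0 / kappa)
    return total


def log_prior_kappa(kappa, p=6.0, q=9.0):
    """Log density of an assumed Beta(p, q) prior for kappa on (-0.5, 0.5)."""
    if not -0.5 < kappa < 0.5:
        return -math.inf
    log_beta = math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)
    return (p - 1.0) * math.log(0.5 + kappa) + (q - 1.0) * math.log(0.5 - kappa) - log_beta


def gml_objective(params, x):
    """Quantity to maximize over params = (xi, alpha, kappa)."""
    xi, alpha, kappa = params
    lp = log_prior_kappa(kappa)
    if lp == -math.inf:
        return -math.inf
    return gev_log_likelihood(x, xi, alpha, kappa) + lp
```

The objective can be handed to any general-purpose numerical optimizer (negated if the optimizer minimizes). Because the prior is minus infinity outside (-0.5, 0.5), the shape estimate cannot run off to the extreme values that produce absurd ML quantile estimates in small samples.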

Prior distribution on GEV $\kappa$

Performance of Alternative Estimators of $x_{0.99}$ for the GEV distribution, n = 25
[Figure: RMSE versus shape parameter $\kappa$ from -0.3 to 0.3 for the ML, LM, MOM and GML estimators.]

Performance of Alternative Estimators of $x_{0.99}$ for the GEV distribution, n = 100
[Figure: RMSE versus shape parameter $\kappa$ from -0.3 to 0.3 for the ML, LM, MOM and GML estimators.]

GEV Estimators
In 1985, when Hosking, Wallis and Wood introduced L-moment (PWM) estimators for the GEV, they were much better than MLEs and quantile estimators.
In 1998, Madsen and Rosbjerg demonstrated that MOM estimators were not so bad, perhaps better than L-moments.
Finally, in 2000, Martins & Stedinger demonstrated that adding realistic control of the GEV shape parameter $\kappa$ yielded estimators that dominated the competition.
This is a distribution with a modest-accuracy regional description of the shape parameter.

Partial Duration or Annual Maximum Series
By seeing more little floods, do we know more about big floods?

Partial Duration Series (PDS) Peaks over threshold (POT)

Poisson/Pareto model for PDS
$\lambda$ = arrival rate for floods > $x_0$, which follow a Poisson process.
$G(x) = \Pr[X \le x]$ for peaks over threshold $x > x_0$ is a Generalized Pareto distribution:
$G(x) = 1 - \{ 1 - \kappa [ (x - x_0)/\alpha ] \}^{1/\kappa}$
Then annual maximums have a Generalized Extreme Value distribution:
$F(x) = \exp\{ -( 1 - \kappa [ (x - \xi)/\alpha' ] )^{1/\kappa} \}$
$\xi = x_0 + \alpha (1 - \lambda^{-\kappa})/\kappa$
$\alpha' = \alpha \lambda^{-\kappa}$
with the same $\kappa$.
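The Poisson/Pareto to GEV correspondence can be checked numerically: with Poisson arrivals at rate $\lambda$, the annual maximum has CDF $\exp\{-\lambda[1 - G(x)]\}$, and this should equal the GEV CDF with the converted parameters. The sketch below is illustrative; the function names and the numerical values are hypothetical.

```python
import math


def gpd_cdf(x, x0, alpha, kappa):
    """Generalized Pareto CDF for peaks over the threshold x0."""
    return 1.0 - (1.0 - kappa * (x - x0) / alpha) ** (1.0 / kappa)


def gev_cdf(x, xi, alpha, kappa):
    """GEV CDF in the parameterization used in this talk."""
    return math.exp(-((1.0 - kappa * (x - xi) / alpha) ** (1.0 / kappa)))


def pds_to_ams(lam, x0, alpha, kappa):
    """Convert Poisson/Pareto (PDS) parameters to GEV (AMS) parameters."""
    xi = x0 + alpha * (1.0 - lam ** (-kappa)) / kappa
    alpha_ams = alpha * lam ** (-kappa)
    return xi, alpha_ams, kappa


lam, x0, alpha, kappa = 5.0, 10.0, 2.0, -0.2  # made-up values
xi, alpha_ams, _ = pds_to_ams(lam, x0, alpha, kappa)
x = 20.0
annual_max_cdf = math.exp(-lam * (1.0 - gpd_cdf(x, x0, alpha, kappa)))
gev_value = gev_cdf(x, xi, alpha_ams, kappa)  # agrees with annual_max_cdf
```

The agreement holds for any $x$ above the threshold $x_0$, which is the range where the partial duration model is defined.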

Which is more precise: AMS or PDS?
Consider the case where only 2 parameters are estimated.
Fix $\kappa = 0$, corresponding to Poisson arrivals with exponential exceedances: the Shane & Lynn (1964) model for flood risk.

Poisson Arrivals with Exponential Exceedances ($\kappa = 0$)

Which is more precise: AMS (GEV) or PDS (GP)?
RMSE-ratio = RMSE(PDS-XXX) / RMSE(AMS-GMLE)
Now estimate 3 parameters using PDS data, employing XXX = MOM, L-moments (LM) and GML with the Generalized Pareto distribution, and compare the RMSE of PDS-XXX to the RMSE of the AMS-GMLE GEV estimator.

RMSE of 3 PDS estimators vs AMS-GML, $\lambda$ = 5 events/year
[Figure: RMSE ratio PDS/AMS-GMLE versus shape parameter $\kappa$ from -0.3 to +0.3.]

RMSE of 3 PDS estimators vs AMS-GML, $\kappa = -0.30$
[Figure: RMSE ratio PDS/AMS-GMLE versus $\lambda$, events per year.]

Conclusions: PDS versus AMS
- For $\kappa < 0$, with PDS data, GML quantile estimators are again generally better than MOM, LM and ML.
- Precision of GML quantile estimators is insensitive to $\lambda$.
- A year of PDS data is generally worth a year of AMS data for estimating the 100-year flood when employing the GMLE estimators of GP and GEV parameters: more little floods do not tell us about the distribution of large floods.

GLS Regression for Regional Analyses
GOAL: Obtain efficient estimators of the mean, standard deviation, T-yr flood, or GEV parameters as a function of physiographic basin characteristics, and provide the precision of that estimator.
MODEL: log[Statistic-of-interest] = $\alpha + \beta_1 \log(\text{Area}) + \beta_2 \log(\text{Slope}) + \ldots + \text{Error}$

GLS Analysis: Complications
With available records, we only obtain sample estimates of the statistic of interest, denoted $\hat{y}_i$.
The total error $\varepsilon_i$ is a combination of:
(i) time-sampling error $\eta_i$ in the sample estimators $\hat{y}_i$, which are often cross-correlated, and
(ii) underlying model error $\delta_i$ (true lack of fit).
The variance of those errors about the prediction $X\beta$ depends on the statistics of interest at each site.
$\hat{y} = X\beta + \delta + \eta = X\beta + \varepsilon$
(prediction $X\beta$, model error $\delta$, sampling error $\eta$, total error $\varepsilon$)

GLS for Regionalization
Use available record lengths $n_i$, concurrent record lengths $m_{ij}$, regional estimates of standard deviations $\sigma_i$ (or of $\tau_{2i}$, $\tau_{3i}$), and cross-correlations $\rho_{ij}$ of floods to estimate the variances and cross-correlations of the sampling errors $\eta_i$ in $\hat{y}_i$.
With the true model error variance $\sigma_\delta^2$, determine the covariance matrix $\Lambda(\sigma_\delta^2)$ of the residual errors:
$\Lambda(\sigma_\delta^2) = \sigma_\delta^2 I + \Sigma(\hat{y})$
where $\Sigma(\hat{y})$ is the covariance matrix of the estimator $\hat{y}$.

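GLS Analysis: Solution
GLS regression model (Stedinger & Tasker, 1985, 1989):
$\hat{y} = X\beta + \varepsilon$
with parameter estimator $b$ for $\beta$:
$\{ X^T \Lambda(\sigma_\delta^2)^{-1} X \}\, b = X^T \Lambda(\sigma_\delta^2)^{-1} \hat{y}$
Can estimate the model-error variance $\sigma_\delta^2$ using moments:
$(\hat{y} - Xb)^T \Lambda(\sigma_\delta^2)^{-1} (\hat{y} - Xb) = n - k$
$\Lambda(\sigma_\delta^2) = \sigma_\delta^2 I + \Sigma(\hat{y})$
$n$ = dimension of $y$; $k$ = dimension of $b$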
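The two equations on this slide must be solved together, because $b$ depends on $\sigma_\delta^2$ through $\Lambda$. One simple way is a one-dimensional search on $\sigma_\delta^2$. The following sketch uses NumPy and bisection; the bisection scheme and function names are illustrative choices, not necessarily how the cited papers solve it.

```python
import numpy as np


def gls_estimate(X, y, Lam):
    """Solve {X^T Lam^-1 X} b = X^T Lam^-1 y for b."""
    Li_X = np.linalg.solve(Lam, X)
    Li_y = np.linalg.solve(Lam, y)
    return np.linalg.solve(X.T @ Li_X, X.T @ Li_y)


def moment_gap(X, y, Sigma, s2):
    """Weighted residual sum of squares minus (n - k) for a trial s2."""
    n, k = X.shape
    Lam = s2 * np.eye(n) + Sigma
    r = y - X @ gls_estimate(X, y, Lam)
    return r @ np.linalg.solve(Lam, r) - (n - k)


def mm_model_error_variance(X, y, Sigma, tol=1e-10):
    """Method-of-moments estimate of the model error variance (0 if no root)."""
    if moment_gap(X, y, Sigma, 0.0) <= 0.0:
        return 0.0  # sampling error alone explains the residuals
    lo, hi = 0.0, 1.0
    while moment_gap(X, y, Sigma, hi) > 0.0:  # expand until the root is bracketed
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if moment_gap(X, y, Sigma, mid) > 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Made-up data: 6 sites, constant plus one explanatory variable
x = np.arange(6.0)
X = np.column_stack([np.ones(6), x])
y = 1.0 + 2.0 * x + np.array([0.5, -0.5, 0.5, -0.5, 0.5, -0.5])
Sigma = 0.01 * np.eye(6)  # hypothetical sampling covariance of the estimators
s2_hat = mm_model_error_variance(X, y, Sigma)
b_hat = gls_estimate(X, y, s2_hat * np.eye(6) + Sigma)
```

When the sampling covariance is large relative to the residuals, the moment equation has no positive root and the function returns zero.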

Likelihood function for model error variance $\sigma_\delta^2$ (Tibagi River, Brazil, n = 17)
The maximum of the likelihood may be at zero, but larger values are very probable.
Zero is clearly not in the middle of the likely range of values.
Method of moments has the same problem: a zero estimate.

Advantages of Bayesian Analysis
Provides the posterior distribution of the parameters $\beta$ and the model error variance $\sigma_\delta^2$, and the predictive distribution for the dependent variable.
The Bayesian approach is a natural solution to the problem.

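Bayesian GLS Model
Prior distribution: $\pi(\beta, \sigma_\delta^2)$
- Parameters $\beta$ are multivariate normal.
- Model error variance $\sigma_\delta^2$ has an exponential distribution with parameter $\lambda$; $E[\sigma_\delta^2] = 1/\lambda$, with $\lambda = 24$.
Likelihood function: assume the data are multivariate $N[X\beta, \Lambda]$.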

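Quasi-Analytic Bayesian GLS
Joint posterior distribution: $f(\beta, \sigma_\delta^2 \mid \hat{y}) \propto \ell(\hat{y} \mid \beta, \sigma_\delta^2)\, \pi(\beta, \sigma_\delta^2)$
Marginal posterior of $\sigma_\delta^2$: $f(\sigma_\delta^2 \mid \hat{y}) = \int f(\beta, \sigma_\delta^2 \mid \hat{y})\, d\beta$
where the normal likelihood and prior are integrated analytically to determine $f$ in closed form.

Example of a posterior of $\sigma_\delta^2$ (Model 1, Tibagi, Brazil, n = 17)
- MM-GLS for $\sigma_\delta^2$: 0.000
- MLE-GLS for $\sigma_\delta^2$: 0.000
- Bayesian GLS for $\sigma_\delta^2$: 0.046
[Figure: posterior density versus model error variance $\sigma_\delta^2$.]

Quasi-Analytic Result
From the joint posterior distribution, one can compute the marginal posterior of $\beta$ and its moments by 1-dimensional numerical integrations.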
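The quasi-analytic idea can be sketched as follows: integrate $\beta$ out of the normal likelihood in closed form, then handle $\sigma_\delta^2$ by one-dimensional numerical integration on a grid. For simplicity this sketch assumes a flat prior on $\beta$ (a limiting case of the multivariate normal prior mentioned in the talk) and an exponential prior with a hypothetical rate; the function names and data are illustrative only.

```python
import numpy as np


def log_marginal_posterior(s2, X, y, Sigma, rate):
    """Unnormalized log posterior of s2 with beta integrated out (flat prior on beta)."""
    n = len(y)
    Lam = s2 * np.eye(n) + Sigma
    Li_X = np.linalg.solve(Lam, X)
    A = X.T @ Li_X                        # X^T Lam^-1 X
    b = np.linalg.solve(A, Li_X.T @ y)    # GLS estimate for this s2
    r = y - X @ b
    quad = r @ np.linalg.solve(Lam, r)
    _, logdet_lam = np.linalg.slogdet(Lam)
    _, logdet_a = np.linalg.slogdet(A)
    # exponential prior on s2, then the Gaussian integral over beta
    return -rate * s2 - 0.5 * logdet_lam - 0.5 * logdet_a - 0.5 * quad


def posterior_on_grid(X, y, Sigma, rate, s2_max, m=2001):
    """Return grid, normalized posterior density, and posterior mean of s2."""
    grid = np.linspace(0.0, s2_max, m)
    logf = np.array([log_marginal_posterior(s, X, y, Sigma, rate) for s in grid])
    f = np.exp(logf - logf.max())
    step = np.diff(grid)
    area = np.sum(0.5 * (f[1:] + f[:-1]) * step)  # trapezoid rule
    pdf = f / area
    gf = grid * pdf
    mean = np.sum(0.5 * (gf[1:] + gf[:-1]) * step)
    return grid, pdf, mean


# Made-up data in which sampling error dominates the residuals
x = np.arange(6.0)
X = np.column_stack([np.ones(6), x])
y = 1.0 + 2.0 * x + np.array([0.5, -0.5, 0.5, -0.5, 0.5, -0.5])
Sigma = 1.0 * np.eye(6)
grid, pdf, post_mean = posterior_on_grid(X, y, Sigma, rate=24.0, s2_max=2.0)
```

In this made-up example the residuals are small relative to the sampling variance, so a moment estimator of $\sigma_\delta^2$ would be zero, while the posterior mean is strictly positive. That is the behavior the Tibagi example illustrates.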

Bayesian GLS for Regionalization of Flood Characteristics in Korea
Dae Il Jeong, Post-doctoral Researcher, Cornell University
Jery R. Stedinger, Professor, Cornell University
Young-Oh Kim, Associate Professor, Seoul National University
Jang Hyun Sung, Graduate Student, Seoul National University

Korean River Basins
- Land area: 120,000 km$^2$
- Major river basins: Han, Nakdong, Geum
- Total annual precipitation (TAP): 1283 mm
- Two thirds of TAP occurs during the 3-month flood season (Jul-Sep)
- Available sites: 31
- Average record length: 22 years
[Map: Han, Nakdong and Geum River basins.]

Korean Application
Regional estimators of L-CV $\tau_2$ and L-CS $\tau_3$ for flood frequency analysis using the GEV distribution.
6 explanatory variables:
- 2 indicators (Han-Nakdong-Geum basins)
- log of drainage area
- log of channel slope
- mean precipitation
- SD of annual maximum precipitation

Cross-correlation of concurrent maxima

Monte Carlo results for cross-correlation of L-CS estimators, GEV with $\kappa = -0.3$ and $\tau_2 = 0.3$
[Figure: cross-correlation of L-CS estimators versus $\rho_{xy}$, the cross-correlation of annual maxima.]

Regression Results: L-CV
Model    Const.            Ln(Area)                    Mean Ppt                    Model Error Var.   Avg Sampling Var.   AVP (GLS)   Pseudo R^2 (%)   ERL (years)
B-GLS0   0.4178 (0.0306)   -                           -                           0.0077 (0.0033)    0.0009              0.0087      0                14
B-GLS2   0.4220 (0.0285)   -0.0416 (0.0116) [0.1 %]    -0.1307 (0.0522) [1.3 %]    0.0043 (0.0021)    0.0015              0.0057      45               21
Standard error in parentheses ( ); p-value in brackets [ ].

Performance Measures
- Average Variance of Prediction (AVP): how well the model estimates the true value of the quantity of interest, on average across sites.
- Pseudo $R^2$: improvement of GLS(k) versus GLS(0).
- Effective Record Length (ERL): relative uncertainty of the regional estimate compared to an at-site estimator.

Regression Results: L-CS $\tau_3$
Model    Const.            Ln(Area)                    Model Error Var.   Avg Sampling Var.   AVP      Pseudo R^2 (%)   ERL (years)
B-GLS0   0.3402 (0.0538)   -                           0.0094 (0.0056)    0.0029              0.0123   0                39
B-GLS1   0.3405 (0.0489)   -0.0535 (0.0183) [0.6 %]    0.0060 (0.0044)    0.0035              0.0094   37               51
Standard error in parentheses ( ); p-value in brackets [ ].

Model Diagnostic Measures
Pseudo ANOVA table:
- Variation explained by the regional model
- Residual variation due to model errors
- Residual variation due to sampling errors
- Represents a partition of the TOTAL variation

Pseudo ANOVA Table for L-CV and L-CS
Source                    Degrees of freedom   Sum of squares                                    L-CV    L-CS
Model                     k = 1 or 2           $n[\sigma_\delta^2(0) - \sigma_\delta^2(k)]$      0.108   0.106
Model error $\delta$      n - k - 1            $n\,\sigma_\delta^2(k)$                           0.132   0.185
Sampling error $\eta$     n                    $\mathrm{tr}[\Sigma(\hat{y})]$                    0.156   0.624
Total                     2n - 1               sum of the above                                  0.396   0.916
EVR                                                                                              1.18    3.38
MBV                                                                                              2.89    3.76
Pseudo R^2                                                                                       45 %    37 %
ERL (years)                                                                                      21      51
We need GLS regression analysis.

Conclusion: Value in Korea
- The regional estimator for the L-coefficient of variation should be combined with its at-site estimator: ERL($\tau_2$) = 21 years ≈ average record length (22 yrs).
- The regional estimator for L-skewness was more precise than the at-site estimators: ERL($\tau_3$) = 51 years > average record length (22 yrs).
- Clearly advantageous to use BOTH regional and at-site information in the analysis of annual maxima.

Diagnostic Statistics
Statistics for evaluating data concerns, precision of predicted values, sources of variation, and model adequacy:
- Leverage and influence
- Measures of prediction precision
- Pseudo $R^2$ and ANOVA
- Modeling diagnostics: EVR & MBV
- Bayesian plausibility level

Bayesian Hierarchical Model: Solve whole problem at once?
Assume for each site $i$, $i = 1, \ldots, K$:
$X_{it} \sim \mathrm{GEV}(\xi_i, \alpha_i, \kappa_i)$, $t = 1, \ldots, n_i$
where for the parameters we have
$\xi_i \sim N(\mu_\xi, \sigma_\xi^2)$
$\alpha_i \sim N(\mu_\alpha, \sigma_\alpha^2)$, where perhaps one uses the ratio $\alpha_i/\xi_i$ or the coefficient of variation
$\kappa_i \sim N(\mu_\kappa, \sigma_\kappa^2)$
with priors on $\mu_\xi, \sigma_\xi^2$; $\mu_\alpha, \sigma_\alpha^2$; $\mu_\kappa, \sigma_\kappa^2$, whose values for each site $i$ may depend on at-site physiographic characteristics of that site.
Ignores cross-correlations: need a multivariate model for the K variates?
Beware of special cases and lack of fit.

Concluding Remarks
- The GEV distribution is used by many water agencies and countries to describe the distribution of extremes.
- L-moments provide simple estimators, but they are not efficient.
- Generalized Maximum Likelihood Estimators [GMLEs] (modest prior on $\kappa$) solve problems with MLEs and were the most precise.
- PDS (GPD-Poisson) is no better than AMS (GEV) when estimating three parameters with GMLE.

Final Comments
- Regional regression procedures should account for the precision of at-site estimators and their cross-correlations, as can be done with Generalized Least Squares regression. Otherwise estimates of model accuracy and of the precision of parameter estimates will be in error.
- When the model error variance is small relative to the errors in estimated hydrologic statistics, the Bayesian model error variance estimator is particularly attractive.

Hosking and Wallis (1997)
We can do better than simple index flood procedures that everywhere use regional average L-CV $\tau_2$ and L-CS $\tau_3$ values.

Conclusion: Applicability of GLS
Developed a Bayesian Generalized Least Squares modeling framework to analyze regional information addressing distribution parameters, recognizing:
- sampling error in at-site estimators as a function of record length, cross-correlation of concurrent events, and concurrent record lengths, and
- regional model error (true precision of the regional model).
Developed regression models for L-CV and L-CS for Korean annual maximum floods using B-GLS analysis.

Flood Frequency References
Background reading:
Stedinger, J.R., Flood Frequency Analysis and Statistical Estimation of Flood Risk, Chapter 12, Inland Flood Hazards: Human, Riparian and Aquatic Communities, E.E. Wohl (ed.), Cambridge University Press, Stanford, United Kingdom, 2000.
References:
Hosking, J.R.M., L-Moments: Analysis and Estimation of Distributions Using Linear Combinations of Order Statistics, J. of Royal Statistical Society, B, 52(2), 105-124, 1990.
Hosking, J.R.M., and J.R. Wallis, Regional Frequency Analysis: An Approach Based on L-moments, Cambridge University Press, 1997.
Martins, E.S., and J.R. Stedinger, Generalized Maximum Likelihood GEV quantile estimators for hydrologic data, Water Resources Research, 36(3), 737-744, 2000.
Martins, E.S., and J.R. Stedinger, Generalized Maximum Likelihood Pareto-Poisson Flood Risk Analysis for Partial Duration Series, Water Resources Research, 37(10), 2559-2567, 2001.
Stedinger, J.R., and L. Lu, Appraisal of Regional and Index Flood Quantile Estimators, Stochastic Hydrology and Hydraulics, 9(1), 49-75, 1995.

GLS References
Griffis, V.W., and J.R. Stedinger, The Use of GLS Regression in Regional Hydrologic Analyses, J. of Hydrology, 344(1-2), 82-95, 2007 [doi:10.1016/j.jhydrol.2007.06.023].
Gruber, Andrea M., Dirceu S. Reis Jr., and Jery R. Stedinger, Models of Regional Skew Based on Bayesian GLS Regression, Paper 40927-3285, World Environ. & Water Resour. Conf. - Restoring our Natural Habitat, K.C. Kabbes editor, Tampa, FL, May 15-18, 2007.
Jeong, Dae Il, Jery R. Stedinger, Young-Oh Kim, and Jang Hyun Sung, Bayesian GLS for Regionalization of Flood Characteristics in Korea, Paper 40927-2736, World Environ. & Water Resour. Conf. - Restoring our Natural Habitat, Tampa, FL, May 15-18, 2007.
Martins, E.S., and J.R. Stedinger, Cross-correlation among estimators of shape, Water Resources Research, 38(11), doi:10.1029/2002WR001589, 26 November 2002.
Reis, D.S., Jr., J.R. Stedinger, and E.S. Martins, Bayesian generalized least squares regression with application to log Pearson type 3 regional skew estimation, Water Resour. Res., 41, W10419, doi:10.1029/2004WR003445, 2005.
Stedinger, J.R., and G.D. Tasker, Regional Hydrologic Analysis, 1. Ordinary, Weighted and Generalized Least Squares Compared, Water Resour. Res., 21(9), 1421-1432, 1985.
Tasker, G.D., and J.R. Stedinger, Estimating Generalized Skew With Weighted Least Squares Regression, J. of Water Resources Planning and Management, 112(2), 225-237, 1986.
Tasker, G.D., and J.R. Stedinger, An Operational GLS Model for Hydrologic Regression, J. of Hydrology, 111(1-4), 361-375, 1989.

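Pseudo $R^2$ for GLS
We are not interested in the total error $\varepsilon$, which includes sampling error $\eta$ that the model cannot explain.
The traditional adjusted $R^2$ is based on the total error.
Instead: how much of the critical model error $\delta$ can we explain, where $\mathrm{Var}[\delta] = \sigma_\delta^2(k)$ for a model with $k$ parameters?
Consider the GLS model: $\hat{y} = X\beta + \varepsilon = X\beta + \eta + \delta$
Pseudo $R^2 = 1 - \sigma_\delta^2(k) / \sigma_\delta^2(0)$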

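Pseudo ANOVA Table
Source                    Degrees of freedom   Estimator
Model                     k                    $n[\sigma_\delta^2(0) - \sigma_\delta^2(k)]$
Model error $\delta$      n - k - 1            $n\,\sigma_\delta^2(k)$
Sampling error $\eta$     n                    $\mathrm{tr}[\Sigma(\hat{y})]$
Total                     2n - 1               sum of the above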

Modeling Diagnostics
To evaluate whether OLS might be sufficient, consider the Error Variance Ratio (EVR): the sampling-error sum of squares divided by the model-error sum of squares.
If EVR > 20%, then the sampling error $\eta$ in the estimators of $y$ is potentially an important fraction of the observed total error $\varepsilon = \eta + \delta$.
Do we need WLS or GLS to correctly analyze these data?

Modeling Diagnostics
EVR > 20% suggests a need for WLS or GLS. But when is cross-correlation so large that a GLS analysis is needed?
Misrepresentation of Beta Variance (MBV): describes the error made by WLS in its evaluation of the precision of the estimator $b_0$ of the constant term.
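As a worked check, the diagnostics can be recomputed from the rounded values reported in the Korean regression and pseudo ANOVA tables. The sketch below assumes pseudo $R^2 = 1 - \sigma_\delta^2(k)/\sigma_\delta^2(0)$ and EVR = sampling-error sum of squares over model-error sum of squares, which is consistent with the tabulated numbers; the function names are hypothetical.

```python
def pseudo_r2(s2_k, s2_0):
    """Fraction of the model error variance of GLS(0) explained by GLS(k)."""
    return 1.0 - s2_k / s2_0


def error_variance_ratio(ss_sampling, ss_model_error):
    """EVR: sampling-error sum of squares over model-error sum of squares."""
    return ss_sampling / ss_model_error


# Rounded inputs copied from the Korean tables in this talk
r2_lcv = pseudo_r2(0.0043, 0.0077)              # table reports 45 %
r2_lcs = pseudo_r2(0.0060, 0.0094)              # table reports 37 %
evr_lcv = error_variance_ratio(0.156, 0.132)    # table reports 1.18
evr_lcs = error_variance_ratio(0.624, 0.185)    # table reports 3.38
```

The recomputed values match the tabulated ones to within the rounding of the inputs. Both EVR values are far above the 20% guideline, which is why the talk concludes that OLS is not adequate for these data.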

OLS, WLS and GLS for L-CS
Model    Const.            Ln(Area)            Model Error Var.   Avg Sampling Var.   AVP (new)   Pseudo R^2 (%)   ERL (years)
OLS1     0.3679 (0.0267)   -0.0472 (0.0181)    0.0221             0.0014              0.0235      16               21
B-WLS1   0.3792 (0.0261)   -0.0492 (0.0206)    0.0059 (0.0047)    0.0016              0.0074      31               65
B-GLS1   0.3405 (0.0489)   -0.0535 (0.0188)    0.0060 (0.0044)    0.0035              0.0094      37               51
Standard error in parentheses ( ).
