Stochastic Hydrology Fundamentals of Hydrological Frequency Analysis Professor Ke-sheng Cheng Department of Bioenvironmental Systems Engineering National Taiwan University 8/3/2019 Dept. of Bioenvironmental Systems Engineering, National Taiwan University
General concept of hydrological frequency analysis. Hydrological frequency analysis determines the magnitude of a hydrological variable that corresponds to a given exceedance probability. It can be conducted for many hydrological variables, including floods, rainfalls, and droughts. The work is best understood by treating the variable of interest as a random variable.
Let X represent the hydrological (random) variable under investigation. A value x_c is chosen such that an event is said to occur if X assumes a value exceeding x_c. Each time a random experiment (or trial) is conducted, the event may or may not occur. We are interested in the number of Bernoulli trials up to and including the first success (the first occurrence of the event), which is described by the geometric distribution.
Geometric distribution. The geometric distribution gives the probability that the first success occurs on the xth of a sequence of independent and identical Bernoulli trials.
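Below is a minimal Python sketch of the geometric probability mass function. It assumes p is the per-trial success probability (here the exceedance probability P(X > x_c)) and w is the trial on which the first success occurs, so that P(W = w) = p(1 - p)^(w - 1). The function name `geometric_pmf` is illustrative only.

```python
def geometric_pmf(w: int, p: float) -> float:
    """P(W = w) = p * (1 - p)**(w - 1): w - 1 failures followed by the first success."""
    if w < 1:
        return 0.0
    return p * (1.0 - p) ** (w - 1)


p = 0.01  # e.g. a per-year exceedance probability of 1%
print(geometric_pmf(1, p))                               # success on the first trial
print(sum(geometric_pmf(w, p) for w in range(1, 3001)))  # close to 1 (truncated series)
```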
Recurrence interval vs return period. Average number of trials needed to achieve the first success.
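The following sketch simulates the number of trials (e.g. years) up to the first exceedance and compares the sample average with 1/p. For a geometric random variable, 1/p is the expected number of trials to the first success, which is the usual return period T. The simulation setup and the function names are assumptions made for illustration.

```python
import random


def simulate_recurrence_intervals(p: float, n_intervals: int, seed: int = 1) -> list[int]:
    """Number of Bernoulli trials up to and including the first success,
    simulated n_intervals times with success probability p per trial."""
    rng = random.Random(seed)
    intervals = []
    for _ in range(n_intervals):
        w = 1
        while rng.random() >= p:   # failure: no exceedance in this trial
            w += 1
        intervals.append(w)
    return intervals


def return_period(p: float) -> float:
    """Expected number of trials to the first success, E[W] = 1 / p."""
    return 1.0 / p


p = 0.05                                # per-trial exceedance probability
w = simulate_recurrence_intervals(p, 20_000)
print(return_period(p))                 # 20.0
print(sum(w) / len(w))                  # sample mean of W, close to 20
print(min(w), max(w))                   # individual intervals vary widely
```

The spread between `min(w)` and `max(w)` shows why a 20-year event can occur in consecutive years or not at all for several decades. Only the average is 20.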
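The general equation of frequency analysis: $x_T = \mu_X + K_T\,\sigma_X$, where $x_T$ is the magnitude with return period T, $\mu_X$ and $\sigma_X$ are the mean and standard deviation of X, and $K_T$ is the frequency factor. The frequency factor depends on the distribution type and on T, and for skewed distributions such as Pearson type III it also depends on the coefficient of skewness.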
Steps: (1) collect the required data; (2) estimate the mean, standard deviation and coefficient of skewness; (3) determine an appropriate distribution; (4) calculate x_T using the general equation. Calculating x_T therefore involves determining the type of distribution for X and estimating its mean and standard deviation. The former is done with goodness-of-fit (GOF) tests, and the latter with parametric point estimation.
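The sketch below illustrates the four steps. It assumes the general equation takes the frequency-factor form x_T = μ + K_T σ and uses two distributions whose frequency factors have closed forms:

- the normal distribution, where K_T is the standard normal quantile at 1 - 1/T;
- the Gumbel (EV1) distribution with method-of-moments parameters.

The sample values are hypothetical, and no GOF step is performed.

```python
import math
import statistics


def k_normal(T: float) -> float:
    """Normal frequency factor: standard normal quantile at non-exceedance 1 - 1/T."""
    return statistics.NormalDist().inv_cdf(1.0 - 1.0 / T)


def k_gumbel(T: float) -> float:
    """Gumbel (EV1) frequency factor, method-of-moments form."""
    return -(math.sqrt(6.0) / math.pi) * (0.5772 + math.log(math.log(T / (T - 1.0))))


def design_value(sample: list[float], T: float, k_func) -> float:
    """x_T = mean + K_T * standard deviation, using sample estimates."""
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)        # (n - 1) denominator
    return mean + k_func(T) * sd


# Hypothetical annual maximum depths (mm), for illustration only.
sample = [210.0, 185.0, 340.0, 158.0, 276.0, 198.0, 421.0, 231.0, 167.0, 306.0]
for T in (10, 50, 100):
    print(T, design_value(sample, T, k_normal), design_value(sample, T, k_gumbel))
```

For skewed families such as Pearson type III, K_T also depends on the coefficient of skewness. That is why step (2) estimates it even though the two families used here do not need it.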
Data series for frequency analysis. Complete duration series: a complete duration series consists of all the observed data. Partial duration series: a partial duration series is a series of data selected so that their magnitudes are greater than a predefined base value. If the base value is selected so that the number of values in the series equals the number of years of record, the series is called an “annual exceedance series”.
Extreme value series: an extreme value series includes the largest or smallest values occurring in each of the equally long time intervals of the record. If the time interval is one year and the largest values are used, the result is an “annual maximum series”. Annual exceedance series and annual maximum series are different. Peak-over-threshold series. Data independency: why is it important?
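This small sketch shows how an annual maximum series and an annual exceedance series drawn from the same record can differ. The hypothetical event peaks are grouped by year; the dictionary layout `records` and both function names are assumptions. The sketch does not check whether the selected events are independent.

```python
def annual_maximum_series(records: dict[int, list[float]]) -> list[float]:
    """Largest value in each year (exactly one value per year)."""
    return [max(values) for _, values in sorted(records.items())]


def annual_exceedance_series(records: dict[int, list[float]]) -> list[float]:
    """Partial duration series whose base value keeps as many values as there
    are years of record, i.e. the n_years largest values overall."""
    n_years = len(records)
    all_values = sorted((v for values in records.values() for v in values), reverse=True)
    return all_values[:n_years]


# Hypothetical event peaks (e.g. daily rainfall, mm) for three years of record.
records = {
    2001: [120.0, 95.0, 40.0],
    2002: [60.0, 55.0, 30.0],
    2003: [150.0, 130.0, 20.0],
}
print(annual_maximum_series(records))     # [120.0, 60.0, 150.0]
print(annual_exceedance_series(records))  # [150.0, 130.0, 120.0]
```

The 2002 maximum (60 mm) appears only in the annual maximum series. Meanwhile, 2003 contributes two values to the annual exceedance series.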
Parameter estimation: method of moments; maximum likelihood method; method of L-moments (gaining more attention in recent years). Depending on the distribution type, parameter estimation may involve estimating the mean, standard deviation and/or coefficient of skewness. The gamma distribution is used below as an example of parameter estimation.
Gamma distribution parameter estimation. The gamma distribution is a special case of the Pearson type III distribution (with zero location parameter). Gamma density: $f_X(x) = \frac{x^{\beta-1} e^{-x/\alpha}}{\alpha^{\beta}\,\Gamma(\beta)}$, $x > 0$, with $\mu_X = \alpha\beta$, $\sigma_X = \alpha\sqrt{\beta}$ and $\gamma_X = 2/\sqrt{\beta}$. Here $\mu_X$, $\sigma_X$, and $\gamma_X$ are the mean, standard deviation, and coefficient of skewness of X (or Y), respectively, and $\alpha$ and $\beta$ are respectively the scale and shape parameters of the gamma distribution.
MOM estimators
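Here is a sketch of method-of-moments estimation for a two-parameter gamma distribution. It assumes a scale α and shape β parameterization, so mean = αβ and variance = α²β. Solving these gives α̂ = s²/x̄ and β̂ = x̄²/s². The synthetic sample and the function name `gamma_mom` are illustrative assumptions.

```python
import random
import statistics


def gamma_mom(sample: list[float]) -> tuple[float, float]:
    """Method-of-moments (scale alpha, shape beta) from mean = alpha * beta
    and variance = alpha**2 * beta."""
    mean = statistics.mean(sample)
    var = statistics.variance(sample)     # sample variance, (n - 1) denominator
    alpha = var / mean                    # scale
    beta = mean ** 2 / var                # shape
    return alpha, beta


rng = random.Random(42)
# Note: random.gammavariate(alpha, beta) takes alpha = SHAPE and beta = SCALE,
# the opposite naming to the one used in gamma_mom.
sample = [rng.gammavariate(3.0, 2.0) for _ in range(5000)]   # shape 3, scale 2
alpha_hat, beta_hat = gamma_mom(sample)
print(alpha_hat, beta_hat)   # roughly 2 (scale) and 3 (shape)
```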
Maximum likelihood estimator
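Here is a sketch of maximum likelihood estimation for the same scale α and shape β parameterization of the gamma distribution. Setting the score equations to zero gives:

- α̂ = x̄/β̂;
- ln β̂ − ψ(β̂) = ln x̄ − mean(ln x), where ψ is the digamma function.

The left side of the second equation decreases monotonically in β, so bisection finds the root. The standard library has no digamma, so `digamma` here is a hand-written recurrence plus asymptotic-series approximation (an implementation choice, not a library call).

```python
import math
import random


def digamma(x: float) -> float:
    """psi(x) for x > 0: shift x up with psi(x) = psi(x + 1) - 1/x, then use the
    asymptotic series ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return result + math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))


def gamma_mle(sample: list[float], tol: float = 1e-10) -> tuple[float, float]:
    """ML estimates (scale alpha, shape beta) of a two-parameter gamma."""
    n = len(sample)
    mean = sum(sample) / n
    s = math.log(mean) - sum(math.log(x) for x in sample) / n   # > 0 unless all equal

    def f(beta: float) -> float:
        return math.log(beta) - digamma(beta) - s               # decreasing in beta

    lo, hi = 1.0, 1.0
    while f(lo) < 0.0:      # root lies to the left: shrink lo
        lo /= 2.0
    while f(hi) > 0.0:      # root lies to the right: grow hi
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)
    return mean / beta, beta


rng = random.Random(7)
sample = [rng.gammavariate(3.0, 2.0) for _ in range(5000)]   # shape 3, scale 2
alpha_ml, beta_ml = gamma_mle(sample)
print(alpha_ml, beta_ml)     # roughly 2 (scale) and 3 (shape)
```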
Evaluating bias of different estimators of coefficient of skewness
Evaluating mean square error of different estimators of coefficient of skewness
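Bias and mean square error of skewness estimators can be evaluated by Monte Carlo simulation. The sketch below compares two estimators:

- the moment estimator g1 = m3/m2^1.5, with 1/n central moments;
- the adjusted Fisher-Pearson estimator G1 = g1·√(n(n−1))/(n−2).

Both are assumptions, since the estimators actually compared in the slides are not listed. Samples come from a gamma population whose true skewness is 2/√β. The same seed is used for both estimators, so they see identical samples.

```python
import math
import random


def skew_moment(x: list[float]) -> float:
    """g1 = m3 / m2**1.5 with 1/n central moments."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    return m3 / m2 ** 1.5


def skew_adjusted(x: list[float]) -> float:
    """Adjusted Fisher-Pearson estimator G1 = g1 * sqrt(n (n - 1)) / (n - 2)."""
    n = len(x)
    return skew_moment(x) * math.sqrt(n * (n - 1)) / (n - 2)


def bias_and_mse(estimator, shape: float, n: int, reps: int, seed: int = 0):
    """Monte Carlo bias and MSE for gamma(shape) samples of size n;
    the true skewness of the gamma distribution is 2 / sqrt(shape)."""
    rng = random.Random(seed)
    true_skew = 2.0 / math.sqrt(shape)
    estimates = [estimator([rng.gammavariate(shape, 1.0) for _ in range(n)])
                 for _ in range(reps)]
    bias = sum(estimates) / reps - true_skew
    mse = sum((g - true_skew) ** 2 for g in estimates) / reps
    return bias, mse


for name, est in [("g1", skew_moment), ("G1", skew_adjusted)]:
    print(name, bias_and_mse(est, shape=3.0, n=30, reps=2000))
```

Both estimators typically underestimate the skewness of small samples from a skewed population. The adjustment factor reduces, but does not remove, this downward bias.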
Techniques for goodness-of-fit (GOF) tests. A good reference for a detailed discussion of GOF tests is: Goodness-of-Fit Techniques, edited by R.B. D’Agostino and M.A. Stephens, 1986. Techniques: probability plotting; chi-square test; Kolmogorov-Smirnov test; moment-ratio diagram method; L-moments-based GOF tests.
Probability plotting: fundamental concept; probability papers; empirical CDF vs theoretical CDF; misuse of probability plotting.
Suppose the true underlying distribution depends on a location parameter a and a scale parameter b (they need not be the mean and standard deviation, respectively). The CDF of such a distribution can be written as $F(x) = G\left(\frac{x-a}{b}\right) = G(z)$, with $z = \frac{x-a}{b}$, where Z is referred to as the standardized variable and G(z) is the CDF of Z. If the random sample truly comes from the CDF F, then $Z = G^{-1}(F(X)) = (X-a)/b$ is linearly related to X. In practice, Z is found using $Z = G^{-1}(F_n(X))$.
Here x represents the observed values of the random variable X, and $F_n(x)$ represents the empirical cumulative distribution function (ECDF) of X based on a random sample of size n. A probability plot is a plot of $z = G^{-1}(F_n(x))$ against x.
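A probability plot can be sketched for the Gumbel family, where G(z) = exp(−exp(−z)) and G⁻¹(p) = −ln(−ln p). The ECDF is replaced by the Weibull plotting position i/(n+1) on ascending ranks. This is an assumption made so that the largest value is not assigned F_n = 1, which would map to infinity. If the sample is Gumbel, the (x, z) points fall near a straight line, and a least-squares line x = a + b·z roughly recovers the location and scale. The synthetic data and parameters are hypothetical.

```python
import math
import random


def gumbel_reduced_variate(p: float) -> float:
    """z = G^{-1}(p) for the standard Gumbel CDF G(z) = exp(-exp(-z))."""
    return -math.log(-math.log(p))


def probability_plot_points(sample: list[float]) -> list[tuple[float, float]]:
    """(x, z) pairs with z = G^{-1}(i / (n + 1)) for ascending rank i."""
    xs = sorted(sample)
    n = len(xs)
    return [(x, gumbel_reduced_variate(i / (n + 1))) for i, x in enumerate(xs, start=1)]


def fit_line(points):
    """Least-squares fit x = a + b * z; returns (a, b, correlation of x and z)."""
    n = len(points)
    xbar = sum(x for x, _ in points) / n
    zbar = sum(z for _, z in points) / n
    szz = sum((z - zbar) ** 2 for _, z in points)
    sxx = sum((x - xbar) ** 2 for x, _ in points)
    sxz = sum((x - xbar) * (z - zbar) for x, z in points)
    b = sxz / szz
    return xbar - b * zbar, b, sxz / math.sqrt(sxx * szz)


rng = random.Random(3)
a_true, b_true = 100.0, 25.0          # hypothetical location and scale
sample = [a_true + b_true * gumbel_reduced_variate(rng.uniform(1e-12, 1 - 1e-12))
          for _ in range(200)]
a_hat, b_hat, r = fit_line(probability_plot_points(sample))
print(a_hat, b_hat, r)                # near 100 and 25, with r close to 1
```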
Most plotting position methods are empirical. Let n be the total number of values to be plotted and m the rank of a value in a list ordered by descending magnitude. For large n, the exceedance probability of the mth largest value, $x_m$, is $P(X \ge x_m) = m/n$. Plotting position formulas of various methods are shown in the following table.
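Many plotting position formulas can be written in the general form P(X ≥ x_m) = (m − b)/(n + 1 − 2b). The constants b below are the commonly quoted ones for the Weibull, Cunnane, Gringorten and Hazen methods. They are assumptions here and may not match every entry in the table. Unlike m/n, none of these formulas assigns probability 1 to the smallest value.

```python
# Commonly quoted constants b in P(X >= x_m) = (m - b) / (n + 1 - 2b).
PLOTTING_POSITION_B = {
    "Weibull": 0.0,      # m / (n + 1)
    "Cunnane": 0.4,      # (m - 0.4) / (n + 0.2)
    "Gringorten": 0.44,  # (m - 0.44) / (n + 0.12)
    "Hazen": 0.5,        # (m - 0.5) / n
}


def exceedance_probability(m: int, n: int, method: str = "Weibull") -> float:
    """Exceedance probability of the m-th largest of n values (m = 1 is the largest)."""
    b = PLOTTING_POSITION_B[method]
    return (m - b) / (n + 1 - 2 * b)


n = 20
for method in PLOTTING_POSITION_B:
    # probabilities assigned to the largest (m = 1) and smallest (m = n) values
    print(method, exceedance_probability(1, n, method), exceedance_probability(n, n, method))
```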
Misuse of probability plotting (figure: Log Pearson Type III?)
Misuse of probability plotting (figure: 48-hr rainfall depth, Log Pearson Type III?)
Fitting a probability distribution to an annual maximum series (non-parametric GOF tests). How do we fit a probability distribution to a random sample? What type of distribution should be adopted? What are the parameter values for the distribution? How good is our fit?
Chi-square GOF test
Chi-square Goodness-of-fit test in R
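As a language-neutral illustration (in Python rather than R), the sketch computes the chi-square statistic χ² = Σ(O − E)²/E. It uses k classes of equal probability 1/k under a normal distribution fitted to the data. Degrees of freedom are k − 1 − p, where p is the number of parameters estimated from the sample. The choice of a normal model, k = 8, and the synthetic data are assumptions. The critical value 11.07 is the tabulated chi-square value for 5 degrees of freedom at α = 0.05.

```python
import random
import statistics


def chi_square_statistic(sample: list[float], dist: statistics.NormalDist, k: int):
    """Chi-square GOF statistic with k equal-probability classes under `dist`.
    Returns (statistic, observed counts, expected count per class)."""
    n = len(sample)
    # interior class boundaries: the j/k quantiles of the hypothesized distribution
    edges = [dist.inv_cdf(j / k) for j in range(1, k)]
    observed = [0] * k
    for x in sample:
        cls = sum(1 for e in edges if x > e)   # index of the class containing x
        observed[cls] += 1
    expected = n / k
    stat = sum((o - expected) ** 2 / expected for o in observed)
    return stat, observed, expected


rng = random.Random(11)
sample = [rng.gauss(50.0, 10.0) for _ in range(100)]
fitted = statistics.NormalDist(statistics.mean(sample), statistics.stdev(sample))
k, p = 8, 2                                  # classes; parameters estimated from data
stat, observed, expected = chi_square_statistic(sample, fitted, k)
dof = k - 1 - p
critical_05 = 11.07                          # chi-square table value, 5 dof, alpha = 0.05
print(stat, observed, dof, "reject" if stat > critical_05 else "do not reject")
```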
Kolmogorov-Smirnov GOF test. The chi-square test compares the empirical histogram against the theoretical histogram. By contrast, the K-S test compares the empirical cumulative distribution function (ECDF) against the theoretical CDF.
To measure the difference between $F_n(x)$ and $F(x)$, ECDF statistics based on the vertical distances between $F_n(x)$ and $F(x)$ have been proposed. The Kolmogorov-Smirnov statistic uses the largest vertical distance, $D_n = \sup_x |F_n(x) - F(x)|$.
Stochastic convergence: almost-sure convergence, or convergence with probability 1.
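The statistic D_n = sup|F_n(x) − F(x)| converges to 0 with probability 1 as n grows (the Glivenko-Cantelli theorem). This sketch computes D_n for uniform samples of increasing size. For a continuous F, the supremum is attained at the order statistics, so it suffices to check both sides of each ECDF jump. The sample sizes and seed are arbitrary choices.

```python
import random


def ks_statistic(sample: list[float], cdf) -> float:
    """D_n = sup_x |F_n(x) - F(x)| for continuous F, evaluated just before
    and at each jump of the ECDF (i.e. at the order statistics)."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        fx = cdf(x)
        d = max(d, i / n - fx, fx - (i - 1) / n)
    return d


def uniform_cdf(x: float) -> float:
    """CDF of U(0, 1)."""
    return min(max(x, 0.0), 1.0)


rng = random.Random(5)
d_by_n = {}
for n in (10, 100, 1000, 10000):
    d_by_n[n] = ks_statistic([rng.random() for _ in range(n)], uniform_cdf)
    print(n, d_by_n[n])          # shrinks toward 0 as n grows
```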
Hypothesis test using Dn
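This is a sketch of the K-S decision rule: reject H0 at level α when D_n exceeds the critical value D_{n,α}. Instead of a full table, it uses the common large-sample approximation D_{n,α} ≈ c_α/√n, with c_α = 1.36 for α = 0.05 and 1.63 for α = 0.01, usually quoted for n > 35. The synthetic exponential sample and the two hypothesized CDFs are illustrative assumptions.

```python
import math
import random
import statistics


def ks_statistic(sample: list[float], cdf) -> float:
    """D_n = sup_x |F_n(x) - F(x)| for a continuous hypothesized CDF."""
    xs = sorted(sample)
    n = len(xs)
    return max(max(i / n - cdf(x), cdf(x) - (i - 1) / n) for i, x in enumerate(xs, start=1))


def ks_test(sample: list[float], cdf, c_alpha: float = 1.36):
    """Return (D_n, approximate critical value c_alpha / sqrt(n), reject H0?)."""
    d = ks_statistic(sample, cdf)
    critical = c_alpha / math.sqrt(len(sample))
    return d, critical, d > critical


def exponential_cdf(x: float) -> float:
    """CDF of an exponential distribution with mean 1."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0


rng = random.Random(8)
sample = [rng.expovariate(1.0) for _ in range(200)]
print(ks_test(sample, exponential_cdf))                     # H0 true here
print(ks_test(sample, statistics.NormalDist(1.0, 1.0).cdf))  # H0 false: N(1, 1)
```

These critical values assume that F is fully specified in advance. If its parameters are estimated from the same sample, the tabulated values make the test conservative.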
Critical values $D_{n,\alpha}$ of the Kolmogorov-Smirnov test statistic
K-S Goodness-of-fit test in R (ks.test)
Interpretation of the probability distribution of the test statistic
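The distribution of D_n under H0 can be approximated by simulation. When H0 is true and F is continuous, F(X) is uniform on (0, 1). The null distribution of D_n therefore does not depend on F, and uniform samples suffice. The empirical 95th percentile is the value that D_n exceeds only 5% of the time when H0 holds, and it should be close to the tabulated critical value. The sample size, replication count and seed are arbitrary assumptions.

```python
import math
import random


def ks_statistic_uniform(u: list[float]) -> float:
    """D_n for a sample tested against the U(0, 1) CDF F(u) = u."""
    us = sorted(u)
    n = len(us)
    return max(max(i / n - x, x - (i - 1) / n) for i, x in enumerate(us, start=1))


def null_distribution(n: int, reps: int, seed: int = 0) -> list[float]:
    """Sorted Monte Carlo draws of D_n under H0 (valid for any continuous F)."""
    rng = random.Random(seed)
    return sorted(ks_statistic_uniform([rng.random() for _ in range(n)]) for _ in range(reps))


n, reps = 50, 2000
d_null = null_distribution(n, reps)
q95 = d_null[int(0.95 * reps) - 1]       # empirical 95th percentile of D_n
print(q95, 1.36 / math.sqrt(n))          # both roughly 0.19
```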
IDF curve fitting using Horner’s equation: fitting the intensity-duration-frequency (IDF) relationship of the design storm depths.
DDF curves
IDF curves
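This sketch goes from design depths to an IDF fit for a single return period. It assumes Horner’s equation in the form i = a/(t + b)^c, with intensity i = D(t)/t. For a fixed b the relation ln i = ln a − c·ln(t + b) is linear, so a and c follow from ordinary least squares, and b is chosen by a grid search on the squared error in ln i. This is only one simple fitting strategy; nonlinear least squares is another. The synthetic depths are generated from known parameters so the fit can be checked.

```python
import math


def horner_intensity(t: float, a: float, b: float, c: float) -> float:
    """Horner's equation i = a / (t + b) ** c (assumed form)."""
    return a / (t + b) ** c


def fit_horner(durations, intensities, b_grid):
    """Grid search over b; for each b, fit ln i = ln a - c ln(t + b) by OLS."""
    y = [math.log(i) for i in intensities]
    best = None
    for b in b_grid:
        x = [math.log(t + b) for t in durations]
        n = len(x)
        xbar, ybar = sum(x) / n, sum(y) / n
        sxx = sum((xi - xbar) ** 2 for xi in x)
        sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
        slope = sxy / sxx
        intercept = ybar - slope * xbar
        sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
        if best is None or sse < best[0]:
            best = (sse, math.exp(intercept), b, -slope)
    _, a, b, c = best
    return a, b, c


# Synthetic design depths D(t) in mm for durations t in minutes (hypothetical),
# converted to intensities i = D(t) / t in mm/min.
durations = [10, 20, 30, 60, 120, 180, 360, 720, 1440]
depths = [horner_intensity(t, 1500.0, 12.0, 0.75) * t for t in durations]
intensities = [d / t for d, t in zip(depths, durations)]
b_grid = [0.5 * k for k in range(0, 121)]            # b from 0 to 60 minutes
print(fit_horner(durations, intensities, b_grid))    # close to (1500, 12, 0.75)
```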
Alternative IDF fitting (Return-period specific)
Further discussions on frequency analysis: extracting annual maximum series; probabilistic interpretation of the design total depth; joint distribution of duration and total depth; selection of the best-fit distribution.
Annual maximum series. Data in an annual maximum series are considered IID and therefore form a random sample. For a given design duration t_r, we continuously move a window of size t_r along the time axis and select the maximum total depth within the window in each year. The annual maximum rainfall is NOT determined from the real storm duration; instead, an artificially chosen design duration is used for this purpose.
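This is a sketch of extracting annual maximum depths for a design duration t_r with a moving window over hourly rainfall. Two assumptions are made here: the data arrive as a per-year list of hourly depths, and windows are not allowed to cross year boundaries. The rainfall values are hypothetical.

```python
def max_window_total(depths: list[float], window: int) -> float:
    """Largest total depth within any run of `window` consecutive time steps."""
    if window > len(depths):
        raise ValueError("window longer than the record")
    total = sum(depths[:window])
    best = total
    for k in range(window, len(depths)):
        total += depths[k] - depths[k - window]   # slide the window one step
        best = max(best, total)
    return best


def annual_maximum_depths(hourly: dict[int, list[float]], duration_hours: int) -> dict[int, float]:
    """Annual maximum series for design duration t_r (hours); windows stay within each year."""
    return {year: max_window_total(v, duration_hours) for year, v in sorted(hourly.items())}


# Hypothetical hourly rainfall (mm) for two very short "years" of record.
hourly = {
    2018: [0, 5, 12, 30, 8, 0, 0, 2, 40, 1],
    2019: [3, 3, 3, 3, 3, 3, 0, 0, 0, 0],
}
print(annual_maximum_depths(hourly, 1))   # {2018: 40, 2019: 3}
print(annual_maximum_depths(hourly, 3))   # {2018: 50, 2019: 9}
```

For 2018 the 1-hr maximum (40 mm) and the 3-hr maximum (50 mm) come from different parts of the record. Each annual maximum is defined by the chosen design duration, not by the duration of any real storm.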
Random sample for estimation of design storm depth. The design storm depth of a specified duration t_r with return period T is the value of D(t_r) whose probability of exceedance equals 1/T. Estimating the design storm depth requires collecting a random sample of size n, i.e., {x1, x2, …, xn}. A random sample is a collection of independently observed and identically distributed (IID) data.
Probabilistic interpretation of the design storm depth
Note that the total depth in the depth-duration-frequency relationship represents only the total amount of rainfall over the design duration, not the real storm duration. The probability distributions in the preceding figure therefore do not represent distributions of the total depth of real storm events. More specifically, the preceding figure does not represent the bivariate distribution of duration and total depth of real storm events.
Using annual maximum series for rainfall frequency analysis is more of a clever and convenient engineering practice. The annual maximum data provide little information about the characteristics of the duration and total depth of real storm events.
Joint distribution of the total depth and duration. The total rainfall depth of a storm event varies with its storm duration [a bivariate distribution for (D, t_r)]. For a given storm duration t_r, the total depth D(t_r) is considered a random variable, and its magnitudes corresponding to specific exceedance probabilities are estimated [a conditional distribution]. In general,
Selection of the best-fit distribution. Methods of model selection based on loss of information: Akaike information criterion (AIC); Schwarz's Bayesian information criterion (BIC); Hannan-Quinn information criterion (HQIC); Anderson-Darling criterion (ADC). Common practices of WRA-Taiwan: SE and U; SSE and SE.
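Information-criteria-based model selection: $\mathrm{AIC} = -2\ln L(\hat\theta) + 2p$, $\mathrm{BIC} = -2\ln L(\hat\theta) + p\ln n$, $\mathrm{HQIC} = -2\ln L(\hat\theta) + 2p\ln(\ln n)$. Here $\ln L(\hat\theta)$ is the log-likelihood function for the parameter $\theta$ associated with the model, evaluated at its maximum likelihood estimate; n is the sample size; and p is the dimension of the parametric space. The model with the smallest criterion value is preferred.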
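The sketch below compares three candidate distributions by AIC, BIC and HQIC, using the standard forms −2 ln L + 2p, −2 ln L + p ln n and −2 ln L + 2p ln(ln n). The candidates (normal, lognormal, exponential) were chosen because their maximum likelihood fits have closed forms. They are illustrative choices, not the candidate set used by WRA or nsRFA. The data are synthetic lognormal values, so the lognormal candidate should score best.

```python
import math
import random


def loglik_normal(x):
    """Maximized log-likelihood of a normal fit (ML variance uses 1/n); p = 2."""
    n = len(x)
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1), 2


def loglik_lognormal(x):
    """Normal fit to ln x, minus sum(ln x) for the Jacobian of y = ln x; p = 2."""
    logs = [math.log(v) for v in x]
    ll, p = loglik_normal(logs)
    return ll - sum(logs), p


def loglik_exponential(x):
    """Maximized log-likelihood of an exponential fit (rate = 1 / mean); p = 1."""
    n = len(x)
    return -n * math.log(sum(x) / n) - n, 1


def information_criteria(loglik: float, p: int, n: int) -> dict[str, float]:
    return {
        "AIC": -2 * loglik + 2 * p,
        "BIC": -2 * loglik + p * math.log(n),
        "HQIC": -2 * loglik + 2 * p * math.log(math.log(n)),
    }


rng = random.Random(2019)
x = [rng.lognormvariate(3.0, 0.6) for _ in range(2000)]
candidates = {"normal": loglik_normal, "lognormal": loglik_lognormal,
              "exponential": loglik_exponential}
scores = {}
for name, fit in candidates.items():
    ll, p = fit(x)
    scores[name] = information_criteria(ll, p, len(x))
best = min(scores, key=lambda name: scores[name]["AIC"])
print(best, scores[best])
```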
WRA practice. p: number of distribution parameters. The Weibull plotting position formula is used to calculate the cumulative probability.
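Below is a hedged sketch of an SSE/SE comparison in the spirit of the WRA practice. The exact WRA definitions are not given here, so the following are assumptions:

- SSE is the sum of squared differences between the sorted observations and the fitted quantiles at the Weibull plotting positions F = i/(n + 1);
- SE = √(SSE/(n − p)).

A Gumbel fit by method of moments serves as the example candidate, and the rainfall values are hypothetical.

```python
import math
import statistics


def gumbel_mom_quantile_function(sample):
    """Quantile function of a Gumbel (EV1) fit with method-of-moments parameters."""
    scale = math.sqrt(6.0) * statistics.stdev(sample) / math.pi
    loc = statistics.mean(sample) - 0.5772 * scale
    return lambda F: loc - scale * math.log(-math.log(F))


def sse_and_se(sample, quantile_function, p):
    """SSE between sorted data and fitted quantiles at F = i / (n + 1), and
    SE = sqrt(SSE / (n - p)) (assumed definitions)."""
    xs = sorted(sample)
    n = len(xs)
    sse = sum((x - quantile_function(i / (n + 1))) ** 2 for i, x in enumerate(xs, start=1))
    return sse, math.sqrt(sse / (n - p))


# Hypothetical annual maximum 24-hr rainfall depths (mm), for illustration only.
amax = [212.0, 185.5, 340.2, 158.0, 276.4, 198.3, 421.7, 230.9, 167.2, 305.8,
        189.4, 254.1, 372.6, 221.0, 176.8]
sse, se = sse_and_se(amax, gumbel_mom_quantile_function(amax), p=2)
print(sse, se)
```

Under this scheme, the candidate distribution with the smallest SE would be preferred.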
Model selection based on information criteria using R: the nsRFA package, MSClaio2008(x).
MSClaio2008
When the sample size n is small with respect to the number of estimated parameters p, the AIC may perform inadequately. In those cases a second-order variant of AIC, called AICc, should be used: $\mathrm{AICc} = -2\ln L(\hat\theta) + 2p + \frac{2p(p+1)}{n-p-1}$. Indicatively, AICc should be used when n/p < 40.
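Here is a short sketch of the small-sample correction AICc = −2 ln L + 2p + 2p(p + 1)/(n − p − 1), together with the n/p < 40 rule of thumb. The log-likelihood value in the usage lines is arbitrary, chosen only to show how the correction shrinks as n grows.

```python
def aicc(loglik: float, n: int, p: int) -> float:
    """Second-order AIC: AIC + 2p(p + 1) / (n - p - 1), where AIC = -2 lnL + 2p."""
    if n - p - 1 <= 0:
        raise ValueError("AICc requires n > p + 1")
    aic = -2.0 * loglik + 2.0 * p
    return aic + 2.0 * p * (p + 1) / (n - p - 1)


def prefer_aicc(n: int, p: int) -> bool:
    """Rule of thumb: use AICc when n / p < 40."""
    return n / p < 40


print(aicc(-100.0, n=30, p=2), prefer_aicc(30, 2))      # about 204.44, True
print(aicc(-100.0, n=1000, p=2), prefer_aicc(1000, 2))  # about 204.01, False
```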
Rationale of the information criteria. The Akaike information criterion uses the Kullback-Leibler divergence as the discrepancy measure between the true model f(x) and the approximating model g(x). Information and entropy.
What is information? Consider the following statements: I will eat some food tomorrow. A major earthquake will strike Taiwan tomorrow. Which statement conveys more information?
Definition of entropy. Reference: 侯如真, 2001. Theoretical study on the application of information entropy to rain gauge network design. Master's thesis, Graduate Institute of Agricultural Engineering, National Taiwan University.
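This sketch uses the standard Shannon definitions: the information content of an outcome with probability p is −ln p, and entropy is H = −Σ pᵢ ln pᵢ, in nats (use log base 2 for bits). These may differ in notation from the definition in the cited thesis. It echoes the earthquake question above: a nearly certain statement carries little information, while a rare one carries much more. The probabilities are hypothetical.

```python
import math


def information_content(p: float) -> float:
    """Self-information -ln(p) of an outcome with probability p (nats)."""
    return -math.log(p)


def entropy(probs: list[float]) -> float:
    """Shannon entropy H = -sum p_i ln p_i (nats); zero-probability terms contribute 0."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical probabilities: an almost-certain event vs a rare event.
print(information_content(0.999), information_content(0.001))  # small vs large
print(entropy([0.5, 0.5]), math.log(2))   # equal: the maximum for two outcomes
print(entropy([0.99, 0.01]))              # smaller: the outcome is nearly predictable
```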
Kullback-Leibler Divergence
If there are several candidate distributions, we only need to calculate H(X|q_i(X)), since H(X|p(X)) is a constant. In practical applications, this term is estimated following Akaike (1973); the estimate involves p_j, the number of parameters of the jth model.
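For discrete distributions, the cross-entropy is H(X|q) = −Σ pᵢ ln qᵢ and the divergence is D(p‖q) = Σ pᵢ ln(pᵢ/qᵢ) = H(X|q) − H(X|p). Because H(X|p) is the same for every candidate q, ranking candidates by D(p‖q) or by H(X|q) gives the same order. This small sketch checks that identity numerically. The "true" distribution and the two candidates are hypothetical.

```python
import math


def cross_entropy(p: list[float], q: list[float]) -> float:
    """H(X|q) = -sum p_i ln q_i: expected -ln q(X) when X actually follows p."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p: list[float], q: list[float]) -> float:
    """D(p || q) = sum p_i ln(p_i / q_i) = H(X|q) - H(X|p) >= 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


p_true = [0.5, 0.3, 0.2]                        # hypothetical "true" model
candidates = {"q1": [0.4, 0.4, 0.2], "q2": [0.2, 0.3, 0.5]}
h_p = cross_entropy(p_true, p_true)             # entropy of p: same for every candidate
for name, q in candidates.items():
    print(name, kl_divergence(p_true, q), cross_entropy(p_true, q) - h_p)
```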