Published by Clifton Lang. Modified over 5 years ago.
Stochastic Hydrology Fundamentals of Hydrological Frequency Analysis
Professor Ke-sheng Cheng Department of Bioenvironmental Systems Engineering National Taiwan University 8/3/2019 Dept. of Bioenvironmental Systems Engineering, National Taiwan University
General concept of hydrological frequency analysis
Hydrological frequency analysis is the task of determining the magnitude of a hydrological variable that corresponds to a given exceedance probability. Frequency analysis can be conducted for many hydrological variables, including floods, rainfall, and droughts. The task is better understood by treating the variable of interest as a random variable.
Let X represent the hydrological (random) variable under investigation. A value $x_c$ is chosen such that an event is said to occur if X assumes a value exceeding $x_c$. Each time a random experiment (or trial) is conducted, the event may or may not occur. We are interested in the number of Bernoulli trials needed until the first success occurs. This number can be described by the geometric distribution.
Geometric distribution
The geometric distribution gives the probability that the first success occurs on the x-th trial in a sequence of independent and identical Bernoulli trials.
Recurrence interval vs return period
Return period: the average number of trials needed to achieve the first success. The recurrence interval is the number of trials actually observed between successive occurrences, so the return period is its expected value.
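As a hedged numerical illustration of the statement above: if the event X > x_c has exceedance probability p in each independent trial (for example, each year), the trial on which the first success occurs has the geometric probability mass function (1 - p)^(x - 1) p, whose mean is 1/p. The function names and the value p = 0.01 are assumptions made for this sketch; they do not come from the slides.

```python
def geometric_pmf(x: int, p: float) -> float:
    """P(first success occurs on trial x) for per-trial success probability p."""
    if x < 1 or not 0.0 < p <= 1.0:
        raise ValueError("need x >= 1 and 0 < p <= 1")
    return (1.0 - p) ** (x - 1) * p


def mean_trials_to_first_success(p: float, max_trials: int = 100_000) -> float:
    """Approximate E[X] by a truncated sum of x * pmf(x); the exact value is 1/p."""
    return sum(x * geometric_pmf(x, p) for x in range(1, max_trials + 1))


p = 0.01  # assumed exceedance probability per trial
return_period = mean_trials_to_first_success(p)
total_probability = sum(geometric_pmf(x, p) for x in range(1, 100_001))
print(return_period, total_probability)
```

The truncated sum agrees with the closed form 1/p, which is why an event with exceedance probability 0.01 per year is commonly called a 100-year event.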
The general equation of frequency analysis
Collecting required data.
Estimating the mean, standard deviation and coefficient of skewness.
Determining the appropriate distribution.
Calculating $x_T$ using the general equation.
It is apparent that the calculation of $x_T$ involves determining the type of distribution for X and estimating its mean and standard deviation. The former can be done by goodness-of-fit (GOF) tests and the latter is accomplished by parametric point estimation.
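The general equation itself is not legible in this transcript. The sketch below assumes the widely used frequency-factor form x_T = mean + K_T * (standard deviation), and further assumes a normal distribution so that K_T is the standard normal quantile at non-exceedance probability 1 - 1/T. The sample values and function names are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def normal_frequency_factor(T: float) -> float:
    """K_T for a normal distribution: the standard normal quantile at 1 - 1/T."""
    if T <= 1.0:
        raise ValueError("return period T must exceed 1")
    return NormalDist().inv_cdf(1.0 - 1.0 / T)


def design_value(sample: list[float], T: float) -> float:
    """x_T = sample mean + K_T * sample standard deviation (assumed general equation)."""
    return mean(sample) + normal_frequency_factor(T) * stdev(sample)


# Hypothetical annual maximum series (units arbitrary).
annual_max = [112.0, 98.5, 143.2, 120.1, 87.4, 156.8, 131.0, 104.3, 118.7, 139.9]
x_100 = design_value(annual_max, 100)
print(round(x_100, 1))
```

For other distributions only the frequency factor changes, which is why the choice of distribution and the estimation of its moments are the two tasks singled out above.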
Data series for frequency analysis
Complete duration series
A complete duration series consists of all the observed data.
Partial duration series
A partial duration series is a series of data selected so that their magnitude is greater than a predefined base value. If the base value is selected so that the number of values in the series equals the number of years of record, the series is called an “annual exceedance series”.
Extreme value series
An extreme value series is a data series that includes the largest or smallest values occurring in each of the equal-length time intervals of the record. If the time interval is taken as one year and the largest values are used, then we have an “annual maximum series”.
Annual exceedance series and annual maximum series are different.
Peak-over-threshold series
Data independence: why is it important?
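To make the difference between the two series concrete, the following sketch builds both from the same list of event peaks. The event data are hypothetical, and the peaks are assumed to be independent events already separated from one another.

```python
from collections import defaultdict


def annual_maximum_series(events: list[tuple[int, float]]) -> dict[int, float]:
    """Largest value in each year; events are (year, value) pairs."""
    by_year = defaultdict(list)
    for year, value in events:
        by_year[year].append(value)
    return {year: max(values) for year, values in sorted(by_year.items())}


def annual_exceedance_series(events: list[tuple[int, float]]) -> list[float]:
    """The n largest values overall, where n is the number of years of record."""
    n_years = len({year for year, _ in events})
    return sorted((value for _, value in events), reverse=True)[:n_years]


# Hypothetical event peaks: (year, peak value).
events = [(2001, 80), (2001, 150), (2001, 140),
          (2002, 60), (2002, 70),
          (2003, 130), (2003, 90)]
ams = annual_maximum_series(events)
aes = annual_exceedance_series(events)
print(ams)
print(aes)
```

Here the second-largest event of 2001 enters the annual exceedance series, while the largest event of 2002 is dropped from it, so the two series contain different values even though both have one entry per year of record.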
Parameter estimation
Method of moments
Maximum likelihood method
Method of L-moments (gaining more attention in recent years)
Depending on the distribution type, parameter estimation may involve estimation of the mean, standard deviation and/or coefficient of skewness.
Parameter estimation is exemplified here by the gamma distribution.
Gamma distribution parameter estimation
The gamma distribution is a special case of the Pearson type III distribution (with zero location parameter). The gamma density is
$$f_X(x) = \frac{1}{\beta^{\alpha}\,\Gamma(\alpha)}\, x^{\alpha-1} e^{-x/\beta}, \quad x > 0,$$
with
$$\mu = \alpha\beta, \quad \sigma = \sqrt{\alpha}\,\beta, \quad \gamma = \frac{2}{\sqrt{\alpha}},$$
where $\mu$, $\sigma$, and $\gamma$ are the mean, standard deviation, and coefficient of skewness of X (or Y), respectively, and $\beta$ and $\alpha$ are respectively the scale and shape parameters of the gamma distribution.
MOM estimators
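The estimator formulas on this slide were not recovered. As a sketch under stated assumptions, the code below uses the standard two-parameter gamma relations mean = shape * scale and variance = shape * scale^2, and solves them with the sample mean and the (n - 1)-denominator sample variance. The data and the function name are hypothetical.

```python
from statistics import mean, variance


def gamma_mom(sample: list[float]) -> tuple[float, float]:
    """Method-of-moments estimates (shape, scale) for a two-parameter gamma."""
    m = mean(sample)
    s2 = variance(sample)  # sample variance with n - 1 denominator (an assumption)
    if m <= 0.0 or s2 <= 0.0:
        raise ValueError("gamma MOM needs a positive mean and variance")
    shape = m * m / s2
    scale = s2 / m
    return shape, scale


# Hypothetical positive-valued sample.
data = [12.1, 30.4, 18.9, 25.3, 9.8, 41.7, 22.0, 15.6]
shape_hat, scale_hat = gamma_mom(data)
print(shape_hat, scale_hat)
```

An alternative moment estimator takes the shape from the sample skewness as (2 / skewness)^2; with short records the skewness estimate is noisy, so the two routes can give noticeably different shapes.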
Maximum likelihood estimator
Evaluating bias of different estimators of coefficient of skewness
Evaluating mean square error of different estimators of coefficient of skewness
Techniques for goodness-of-fit test
A good reference for a detailed discussion of GOF tests is: Goodness-of-Fit Techniques, edited by R.B. D’Agostino and M.A. Stephens, 1986.
Probability plotting
Chi-square test
Kolmogorov-Smirnov test
Moment-ratio diagram method
L-moments-based GOF tests
Probability plotting
Fundamental concept
Probability papers
Empirical CDF vs theoretical CDF
Misuse of probability plotting
Suppose the true underlying distribution depends on a location parameter $\mu$ and a scale parameter $\sigma$ (they need not be the mean and standard deviation, respectively). The CDF of such a distribution can be written as
$$F(x) = G\left(\frac{x-\mu}{\sigma}\right) = G(z),$$
where $Z = (X-\mu)/\sigma$ is referred to as the standardized variable and $G(z)$ is the CDF of Z. If the random sample is truly from the cumulative distribution $F(X)$, then $Z = G^{-1}(F(X))$ and X are linearly related. In practice, Z can be found by using $Z = G^{-1}(F_n(X))$.
Also, let $F_n(X)$ represent the empirical cumulative distribution function (ECDF) of X based on a random sample of size n. A probability plot is a plot of $z = G^{-1}(F_n(x))$ on x, where x represents the observed values of the random variable X.
Most plotting position methods are empirical. If n is the total number of values to be plotted and m is the rank of a value in a list ordered by descending magnitude, the exceedance probability of the m-th largest value, $x_m$, is, for large n, approximately $m/n$; commonly used plotting position formulas refine this estimate (table not reproduced).
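The table of plotting position formulas is missing from this transcript. As one hedged example, the sketch below uses the Weibull formula m/(n + 1) for the exceedance probability of the m-th largest value and then forms normal probability plot coordinates z = G^{-1}(1 - exceedance probability), with G assumed to be the standard normal CDF. Ties are not treated specially, and the function names are illustrative.

```python
from statistics import NormalDist


def weibull_exceedance_positions(sample: list[float]) -> list[tuple[float, float]]:
    """(value, exceedance probability) pairs using m / (n + 1), m = descending rank."""
    n = len(sample)
    ranked = sorted(sample, reverse=True)
    return [(x, m / (n + 1)) for m, x in enumerate(ranked, start=1)]


def normal_probability_plot_points(sample: list[float]) -> list[tuple[float, float]]:
    """(x, z) pairs with z the standard normal quantile of the non-exceedance probability."""
    std_normal = NormalDist()
    return [(x, std_normal.inv_cdf(1.0 - p))
            for x, p in weibull_exceedance_positions(sample)]


positions = weibull_exceedance_positions([3.0, 1.0, 2.0])
points = normal_probability_plot_points([3.0, 1.0, 2.0])
print(positions)
print(points)
```

If the sample really comes from a normal distribution, the (x, z) points should fall close to a straight line; a different G would be used for another candidate distribution.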
Misuse of probability plotting
Log Pearson Type III?
48-hr rainfall depth: Log Pearson Type III?
Fitting a probability distribution to annual maximum series (Non-parametric GOF tests)
How do we fit a probability distribution to a random sample?
What type of distribution should be adopted?
What are the parameter values for the distribution?
How good is our fit?
Chi-square GOF test
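The details of the test on the following slides were not recovered. The sketch below shows one common way to set it up, under assumptions that may differ from the lecture: the hypothesized distribution is normal with parameters estimated from the sample, the classes are k equal-probability intervals, and the statistic is the sum of (observed - expected)^2 / expected with k - 1 - 2 degrees of freedom. The sample is hypothetical.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, stdev


def chi_square_statistic_normal(sample: list[float], k: int = 5):
    """Chi-square GOF statistic for a fitted normal using k equal-probability classes."""
    n = len(sample)
    fitted = NormalDist(mean(sample), stdev(sample))
    # Interior class boundaries at the 1/k, 2/k, ... quantiles of the fitted normal.
    edges = [fitted.inv_cdf(i / k) for i in range(1, k)]
    observed = [0] * k
    for x in sample:
        observed[bisect_right(edges, x)] += 1
    expected = n / k  # same expected count in every class
    chi2 = sum((o - expected) ** 2 / expected for o in observed)
    dof = k - 1 - 2  # two parameters (mean, standard deviation) were estimated
    return chi2, dof, observed


# Hypothetical sample of size 20.
sample = [41.2, 55.0, 48.3, 62.7, 39.5, 51.1, 58.4, 45.9, 49.6, 53.2,
          60.1, 43.7, 47.0, 56.8, 50.4, 44.4, 52.5, 64.9, 46.6, 54.3]
chi2, dof, observed = chi_square_statistic_normal(sample, k=5)
print(chi2, dof, observed)
```

The statistic is then compared with the chi-square critical value for the given degrees of freedom (5.991 for 2 degrees of freedom at the 5% level); a larger statistic leads to rejecting the hypothesized distribution.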
Chi-square Goodness-of-fit test in R
Kolmogorov-Smirnov GOF test
The chi-square test compares the empirical histogram against the theoretical histogram. By contrast, the K-S test compares the empirical cumulative distribution function (ECDF) against the theoretical CDF.
In order to measure the difference between $F_n(X)$ and $F(X)$, ECDF statistics based on the vertical distances between $F_n(X)$ and $F(X)$ have been proposed.
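The best-known statistic of this kind is the Kolmogorov-Smirnov statistic, the largest vertical distance between the ECDF and the hypothesized CDF. The sketch below computes it for a fully specified continuous CDF; the function name is an assumption, and the step-function ECDF is checked just before and just after each ordered observation.

```python
from math import sqrt
from typing import Callable


def ks_statistic(sample: list[float], cdf: Callable[[float], float]) -> float:
    """Largest vertical distance between the ECDF of sample and the given CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # ECDF jumps from (i - 1)/n to i/n at x; check both sides of the jump.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def uniform_cdf(x: float) -> float:
    """CDF of the uniform distribution on [0, 1]."""
    return min(max(x, 0.0), 1.0)


d_two = ks_statistic([0.25, 0.75], uniform_cdf)
d_one = ks_statistic([0.5], uniform_cdf)
approx_critical_5pct = 1.36 / sqrt(100)  # large-sample approximation for n = 100
print(d_two, d_one, approx_critical_5pct)
```

The approximation 1.36 / sqrt(n) for the 5% critical value holds for large n when the parameters of the hypothesized distribution are specified in advance; it is not valid when they are estimated from the same sample.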
Stochastic convergence
Almost-sure convergence, or convergence with probability 1
Hypothesis test using $D_n$
Critical values for the Kolmogorov-Smirnov test
K-S Goodness-of-fit test in R (ks.test)
Interpretation of the probability distribution of the test statistic
IDF curve fitting using Horner’s equation
The intensity-duration-frequency (IDF) relationship of the design storm depths
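The fitted equation is not legible in this transcript. The sketch below assumes the common three-parameter Horner form i = a / (t + b)^c for a single return period and fits it with a simple scheme chosen for this illustration only: for each trial b on a grid, ln(i) is regressed on ln(t + b) by ordinary least squares, and the b with the smallest squared error in log space is kept. The data are synthetic and the parameter values are hypothetical.

```python
from math import exp, log


def horner_intensity(t: float, a: float, b: float, c: float) -> float:
    """Rainfall intensity for duration t under the assumed Horner form a / (t + b)^c."""
    return a / (t + b) ** c


def fit_horner(durations, intensities, b_grid):
    """Grid search over b with log-linear least squares for a and c."""
    y = [log(i) for i in intensities]
    n = len(y)
    y_bar = sum(y) / n
    best = None
    for b in b_grid:
        x = [log(t + b) for t in durations]
        x_bar = sum(x) / n
        sxx = sum((xi - x_bar) ** 2 for xi in x)
        sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
        slope = sxy / sxx
        intercept = y_bar - slope * x_bar
        sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
        if best is None or sse < best[0]:
            best = (sse, exp(intercept), b, -slope)
    _, a, b, c = best
    return a, b, c


# Synthetic intensities generated from known (hypothetical) parameters.
durations = [10.0, 20.0, 30.0, 60.0, 120.0, 360.0, 720.0, 1440.0]  # minutes
intensities = [horner_intensity(t, 1500.0, 25.0, 0.7) for t in durations]
b_grid = [float(b) for b in range(0, 61, 5)]
a_hat, b_hat, c_hat = fit_horner(durations, intensities, b_grid)
print(a_hat, b_hat, c_hat)
```

Minimizing the error in log space weights relative errors equally across durations; a nonlinear least-squares fit on the original intensities would weight short, intense durations more heavily.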
DDF curves
IDF curves
Alternative IDF fitting (Return-period specific)
Further discussions on frequency analysis
Extracting annual maximum series
Probabilistic interpretation of the design total depth
Joint distribution of duration and total depth
Selection of the best-fit distribution
Annual maximum series
Data in an annual maximum series are considered IID and therefore form a random sample. For a given design duration $t_r$, we continuously move a window of size $t_r$ along the time axis and select the maximum total value within the window in each year. Determination of the annual maximum rainfall is NOT based on the real storm duration; instead, an artificially chosen design duration is used for this purpose.
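The moving-window extraction described above can be sketched as follows. The input layout (one list of hourly depths per year) and the function name are assumptions, and windows that straddle a year boundary are ignored in this simplified version.

```python
def annual_max_depth(hourly_by_year: dict[int, list[float]],
                     duration_hours: int) -> dict[int, float]:
    """Annual maximum total depth for a design duration, by sliding-window sums."""
    result = {}
    for year, series in hourly_by_year.items():
        if len(series) < duration_hours:
            raise ValueError(f"year {year} has fewer than {duration_hours} values")
        window = sum(series[:duration_hours])
        best = window
        for k in range(duration_hours, len(series)):
            # Slide the window one hour: add the new value, drop the oldest one.
            window += series[k] - series[k - duration_hours]
            best = max(best, window)
        result[year] = best
    return result


# Hypothetical hourly rainfall depths (mm) for two short "years".
hourly = {2001: [0, 5, 10, 0, 0, 20, 1], 2002: [3, 3, 3, 3]}
max_2h = annual_max_depth(hourly, 2)
max_3h = annual_max_depth(hourly, 3)
print(max_2h, max_3h)
```

Note that the 2-hour maximum for 2001 comes from the last two hours of the record regardless of where any actual storm started or ended, which is exactly the sense in which the design duration is artificial.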
Random sample for estimation of design storm depth
The design storm depth of a specified duration $t_r$ with return period T is the value of $D(t_r)$ whose probability of exceedance equals $1/T$. Estimation of the design storm depth requires collecting a random sample of size n, i.e., $\{x_1, x_2, \ldots, x_n\}$. A random sample is a collection of independently observed and identically distributed (IID) data.
Probabilistic interpretation of the design storm depth
It should also be noted that, since the total depth in the depth-duration-frequency relationship only represents the total amount of rainfall within the design duration (not the real storm duration), the probability distributions in the preceding figure do not represent distributions of the total depth of real storm events. More specifically, the preceding figure does not represent the bivariate distribution of duration and total depth of real storm events.
The use of annual maximum series for rainfall frequency analysis is more of an intelligent and convenient engineering practice; the annual maximum data do not provide much information about the duration and total depth of real storm events.
Joint distribution of the total depth and duration
The total rainfall depth of a storm event varies with its storm duration. [A bivariate distribution for $(D, t_r)$.] For a given storm duration $t_r$, the total depth $D(t_r)$ is considered a random variable, and its magnitudes corresponding to specific exceedance probabilities are estimated. [Conditional distribution]
Selection of the best-fit distribution
Methods of model selection based on loss of information
Akaike information criterion (AIC)
Schwarz's Bayesian information criterion (BIC)
Hannan-Quinn information criterion (HQIC)
Anderson-Darling criterion (ADC)
Common practices of WRA-Taiwan
SE and U
SSE and SE
Information-criteria-based model selection
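$$\mathrm{AIC} = -2\ln L(\hat{\theta}) + 2p, \quad \mathrm{BIC} = -2\ln L(\hat{\theta}) + p\ln n, \quad \mathrm{HQIC} = -2\ln L(\hat{\theta}) + 2p\ln(\ln n),$$
where $\ln L(\hat{\theta})$ is the log-likelihood function for the parameter $\theta$ associated with the model, evaluated at its estimate $\hat{\theta}$, n is the sample size, and p is the dimension of the parametric space.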
WRA practice
p: number of distribution parameters.
The Weibull plotting position formula is used to calculate the cumulative probability.
Model selection based on information criteria using R
The nsRFA package: MSClaio2008(x)
When the sample size, n, is small with respect to the number of estimated parameters, p, the AIC may perform inadequately. In those cases a second-order variant of AIC, called AICc, should be used:
$$\mathrm{AICc} = -2\ln L(\hat{\theta}) + 2p\,\frac{n}{n-p-1}.$$
Indicatively, AICc should be used when $n/p < 40$.
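As a hedged sketch of how these criteria are computed once a maximized log-likelihood is available, the code below uses the standard definitions AIC = -2 ln L + 2p, BIC = -2 ln L + p ln n, HQIC = -2 ln L + 2p ln(ln n) and AICc = -2 ln L + 2p n / (n - p - 1). The log-likelihood values and model labels are hypothetical; this is not the nsRFA implementation.

```python
from math import log


def information_criteria(log_likelihood: float, n: int, p: int) -> dict[str, float]:
    """AIC, AICc, BIC and HQIC from a maximized log-likelihood."""
    if n <= p + 1:
        raise ValueError("AICc requires n > p + 1")
    minus_two_ll = -2.0 * log_likelihood
    return {
        "AIC": minus_two_ll + 2.0 * p,
        "AICc": minus_two_ll + 2.0 * p * n / (n - p - 1),
        "BIC": minus_two_ll + p * log(n),
        "HQIC": minus_two_ll + 2.0 * p * log(log(n)),
    }


n = 30  # hypothetical record length in years
two_param = information_criteria(-100.0, n, p=2)   # e.g. a two-parameter candidate
three_param = information_criteria(-99.5, n, p=3)  # e.g. a three-parameter candidate
use_aicc = n / 3 < 40  # indicative rule from the text, applied to the larger model
print(two_param)
print(three_param)
print(use_aicc)
```

In this made-up case the third parameter improves the log-likelihood by only 0.5, which is not enough to offset its penalty, so every criterion prefers the two-parameter model (smaller is better).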
Rationale of the information criteria
The Akaike information criterion uses the Kullback-Leibler divergence as the discrepancy measure between the true model f(x) and the approximating model g(x).
Information and entropy
What is information? Consider the following statements:
I will eat some food tomorrow.
A major earthquake will strike Taiwan tomorrow.
Which statement conveys more information?
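Definition of entropy
Source: 侯如真 (2001). A theoretical study of the application of information entropy to rain gauge network design. Master's thesis, Graduate Institute of Agricultural Engineering, National Taiwan University.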
Kullback-Leibler Divergence
If there are several candidate distributions, we only need to calculate $H(X|q_j(X))$ for each candidate, since $H(X|p(X))$ is a constant. In practical applications, the above term is estimated by an expression due to Akaike (1973) that involves $p_j$, the number of parameters of the j-th model.
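The argument above can be checked numerically for discrete distributions. The sketch assumes that H(X|q) denotes the cross-entropy -sum p ln q of a candidate q under the true distribution p, so that the Kullback-Leibler divergence is the cross-entropy minus the entropy of p. The probability vectors are hypothetical, and each candidate must be positive wherever p is.

```python
from math import log


def entropy(p: list[float]) -> float:
    """Shannon entropy -sum p ln p (natural logarithm)."""
    return -sum(pi * log(pi) for pi in p if pi > 0.0)


def cross_entropy(p: list[float], q: list[float]) -> float:
    """-sum p ln q: expected negative log-probability of q under p."""
    return -sum(pi * log(qi) for pi, qi in zip(p, q) if pi > 0.0)


def kl_divergence(p: list[float], q: list[float]) -> float:
    """Kullback-Leibler divergence of q from p, as cross-entropy minus entropy."""
    return cross_entropy(p, q) - entropy(p)


true_p = [0.5, 0.3, 0.2]                      # hypothetical true distribution
candidates = {"q1": [0.4, 0.4, 0.2], "q2": [0.8, 0.1, 0.1]}
by_kl = sorted(candidates, key=lambda k: kl_divergence(true_p, candidates[k]))
by_cross_entropy = sorted(candidates, key=lambda k: cross_entropy(true_p, candidates[k]))
print(by_kl, by_cross_entropy)
```

Because the entropy of p is the same for every candidate, ranking by cross-entropy and ranking by divergence give the same order, which is why only the candidate-dependent term needs to be estimated.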