Frequency Analysis Professor Ke-Sheng Cheng


Frequency Analysis Professor Ke-Sheng Cheng Dept. of Bioenvironmental Systems Engineering National Taiwan University

General interpretation of hydrological frequency analysis
Hydrological frequency analysis is the work of determining the magnitude of a hydrological variable that corresponds to a given probability of exceedance. Frequency analysis can be conducted for many hydrological variables, including floods, rainfall, and droughts. The work is easier to understand if the variable of interest is treated as a random variable.

Let X represent the hydrological (random) variable under investigation. A value xc associated with some event is chosen such that the event is said to occur if X assumes a value exceeding xc. Each time a random experiment (or trial) is conducted, the event may or may not occur. We are interested in the number of Bernoulli trials needed for the first success to occur, which can be described by the geometric distribution.

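Geometric distribution
The geometric distribution gives the probability that the first success occurs on the x-th trial in a sequence of independent and identical Bernoulli trials. If p is the probability of success in each trial, then $P(X = x) = (1-p)^{x-1}p$ for x = 1, 2, …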

Recurrence interval vs return period
Return period: the average number of trials needed to achieve the first success.
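The link between the per-trial exceedance probability and the average number of trials to the first success can be checked numerically. The sketch below is only an illustration: the function names and the value p = 0.02 are assumptions, not taken from the slides. It sums x·P(x) for the geometric distribution and compares the result with the closed-form value 1/p.

```python
def geometric_pmf(x, p):
    """Probability that the first success occurs on trial x (x = 1, 2, ...)."""
    return (1.0 - p) ** (x - 1) * p


def mean_trials_to_first_success(p, max_trials=100_000):
    """Numerically sum x * P(x); the closed-form value is 1 / p."""
    return sum(x * geometric_pmf(x, p) for x in range(1, max_trials + 1))


p = 0.02  # assumed exceedance probability per trial (e.g. per year)
return_period = mean_trials_to_first_success(p)
print(f"p = {p}, average trials to first success = {return_period:.2f}")
print(f"closed form 1/p = {1.0 / p:.2f}")
```

With one trial per year, an event with exceedance probability 0.02 therefore has a return period of about 50 years.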

The frequency factor equation

It is apparent that calculation of xT involves determining the type of distribution for X and estimating its mean and standard deviation. The former can be done by a GOF test and the latter by parametric point estimation. The procedure is: (1) collect the required data; (2) determine an appropriate distribution; (3) estimate the mean and standard deviation; (4) calculate xT using the general equation.
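As an illustration of the last step, the general frequency factor equation is commonly written as xT = mean + KT × standard deviation, where KT depends on the distribution type and the return period T. The sketch below assumes the Gumbel (EV-1) distribution purely as an example choice; the function names and the sample statistics are hypothetical.

```python
import math


def gumbel_frequency_factor(T):
    """Frequency factor K_T of the Gumbel (EV-1) distribution, return period T > 1."""
    euler_gamma = 0.5772156649
    return -(math.sqrt(6.0) / math.pi) * (
        euler_gamma + math.log(math.log(T / (T - 1.0)))
    )


def design_value(mean, std, T):
    """General frequency factor equation: x_T = mean + K_T * std."""
    return mean + gumbel_frequency_factor(T) * std


# Hypothetical annual-maximum statistics (mm), for illustration only.
mean, std = 100.0, 30.0
for T in (2, 10, 100):
    print(f"T = {T:>3} years: x_T = {design_value(mean, std, T):.1f} mm")
```

For another distribution only the frequency factor function changes; the mean and standard deviation are still estimated from the data.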

Data series used for frequency analysis
Complete duration series: a complete duration series consists of all the observed data.
Partial duration series: a partial duration series is a series of data selected so that their magnitude is greater than a predefined base value. If the base value is selected so that the number of values in the series equals the number of years of record, the series is called an “annual exceedance series”.

Extreme value series
An extreme value series is a data series that includes the largest or smallest values occurring in each of the equal-length time intervals of the record. If the time interval is taken as one year and the largest values are used, we have an “annual maximum series”.
Data independence: why is it important?
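The difference between an annual maximum series and an annual exceedance series can be seen on a small record. The following is a minimal sketch; the record values and function names are made up for illustration.

```python
from collections import defaultdict


def annual_maximum_series(records):
    """records: list of (year, value). Keep the largest value of each year."""
    by_year = defaultdict(list)
    for year, value in records:
        by_year[year].append(value)
    return {year: max(values) for year, values in sorted(by_year.items())}


def annual_exceedance_series(records):
    """Keep the N largest values of the whole record, N = number of years."""
    n_years = len({year for year, _ in records})
    values = sorted((value for _, value in records), reverse=True)
    return values[:n_years]


# Hypothetical record of (year, observed peak), illustration only.
records = [(2001, 50), (2001, 80), (2001, 75), (2002, 30), (2002, 40), (2003, 90)]
print(annual_maximum_series(records))
print(annual_exceedance_series(records))
```

In this record the annual exceedance series takes two values from 2001 and none from 2002, which is one reason independence of the selected data needs to be checked.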

Techniques for goodness-of-fit test
A good reference for detailed discussion of GOF tests is: Goodness-of-fit Techniques, edited by R.B. D’Agostino and M.A. Stephens, 1986.
Probability plotting
Chi-square test
Kolmogorov-Smirnov test
Moment-ratios diagram method
L-moments-based GOF tests

Rainfall frequency analysis
Consider the event total rainfall at a location. What is a storm event?
Parameters related to the partition of storm events:
Minimum inter-event time
A threshold value for rainfall depth
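One possible way to apply the two parameters is sketched below. It assumes an hourly rainfall series, treats two wet periods as separate events when they are divided by at least the minimum inter-event time of dry hours, and discards events whose total depth is below the threshold. The slides do not spell out these rules, so the details and all names here are assumptions.

```python
def partition_storm_events(rain, min_inter_event_time, depth_threshold):
    """Split an hourly rainfall series into storm events.

    Returns a list of (start_index, duration_hours, total_depth).
    """
    spans = []
    start = None     # index of the first wet hour of the current event
    last_wet = None  # index of the most recent wet hour
    for i, depth in enumerate(rain):
        if depth <= 0:
            continue
        # A long enough dry gap closes the current event.
        if start is not None and i - last_wet - 1 >= min_inter_event_time:
            spans.append((start, last_wet))
            start = None
        if start is None:
            start = i
        last_wet = i
    if start is not None:
        spans.append((start, last_wet))

    events = []
    for s, e in spans:
        total = sum(rain[s:e + 1])
        if total >= depth_threshold:  # drop trivial events
            events.append((s, e - s + 1, total))
    return events


# Hypothetical hourly depths (mm), illustration only.
rain = [0, 2, 3, 0, 1, 0, 0, 0, 0, 5, 6, 0, 0, 0, 0.2, 0]
print(partition_storm_events(rain, min_inter_event_time=3, depth_threshold=1.0))
```

Changing either parameter changes the number, duration, and total depth of the events, so the partition rule should be fixed before any frequency analysis of event totals.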

Total depths of storm events
The total rainfall depth of a storm event varies with its storm duration. [A bivariate distribution for (D, tr).] For a given storm duration tr, the total depth D(tr) is considered a random variable, and its magnitudes corresponding to specific exceedance probabilities are estimated. [Conditional distribution] In general,

Probabilistic Interpretation of the Design Storm Depth

Random Sample For Estimation of Design Storm Depth
The design storm depth of a specified duration with return period T is the value of D(tr) whose probability of exceedance equals 1/T. Estimation of the design storm depth requires collecting a random sample of size n, i.e., {x1, x2, …, xn}. A random sample is a collection of independently observed and identically distributed (IID) data.

Annual Maximum Series
Data in an annual maximum series are considered IID and therefore form a random sample. For a given design duration tr, we continuously move a window of size tr along the time axis and select the maximum window total in each year. Determination of the annual maximum rainfall is NOT based on the real storm duration; instead, an artificially chosen design duration is used for this purpose.
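The moving-window selection can be sketched as follows. This is an illustration under simplifying assumptions: rainfall is given as hourly depths grouped by year, the duration is a whole number of hours, and windows are not allowed to span two years. The data and names are hypothetical.

```python
def annual_max_window_depth(hourly_rain_by_year, duration):
    """For each year, slide a window of `duration` hours along the series
    and keep the largest window total."""
    result = {}
    for year, rain in hourly_rain_by_year.items():
        window_total = sum(rain[:duration])
        best = window_total
        for i in range(duration, len(rain)):
            # Move the window one hour forward: add the new hour, drop the oldest.
            window_total += rain[i] - rain[i - duration]
            best = max(best, window_total)
        result[year] = best
    return result


# Hypothetical hourly depths (mm), illustration only.
data = {2001: [0, 1, 4, 2, 0, 0, 3], 2002: [5, 0, 0, 0, 2, 2, 2]}
print(annual_max_window_depth(data, duration=3))
```

The window total is taken regardless of where actual storms begin or end, which is the sense in which the design duration is artificial.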

Fitting A Probability Distribution to Annual Maximum Series
How do we fit a probability distribution to a random sample?
What type of distribution should be adopted?
What are the parameter values for the distribution?
How good is our fit?

Chi-square GOF test

Kolmogorov-Smirnov GOF test
The chi-square test compares the empirical histogram against the theoretical histogram. In contrast, the K-S test compares the empirical cumulative distribution function (ECDF) against the theoretical CDF.

In order to measure the difference between Fn(X) and F(X), ECDF statistics based on the vertical distances between Fn(X) and F(X) have been proposed.
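A common ECDF statistic of this kind is the Kolmogorov-Smirnov statistic, the largest vertical distance between Fn(x) and F(x). The sketch below evaluates it at the jump points of the ECDF against a hypothesized normal CDF; the sample values and function names are illustrative assumptions.

```python
import math


def normal_cdf(x, mean=0.0, std=1.0):
    """CDF of the normal distribution via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def ks_statistic(sample, cdf):
    """D_n = sup |F_n(x) - F(x)|, checked just after and just before each jump."""
    xs = sorted(sample)
    n = len(xs)
    d_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(xs))
    d_minus = max(cdf(x) - i / n for i, x in enumerate(xs))
    return max(d_plus, d_minus)


sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]  # hypothetical data
print(f"D_n = {ks_statistic(sample, normal_cdf):.4f}")
```

The hypothesized distribution is rejected when the statistic exceeds the tabulated critical value for the given sample size and significance level.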

Hypothesis test using Dn

Critical values of Dn for the Kolmogorov-Smirnov test

GOF test using L-moment-ratios diagram (LMRD)
Concept of identifying appropriate distributions using moment-ratio diagrams (MRD):
Product-moment-ratio diagram (PMRD)
L-moment-ratio diagram (LMRD)
Two-parameter distributions: normal, Gumbel (EV-1), etc.
Three-parameter distributions: log-normal, Pearson type III, GEV, etc.

Moment ratios are unique properties of probability distributions and sample moment ratios of ordinary skewness and kurtosis have been used for selection of probability distribution. The L-moments uniquely define the distribution if the mean of the distribution exists, and the L-skewness and L-kurtosis are much less biased than the ordinary skewness and kurtosis.

A two-parameter distribution with a location and a scale parameter plots as a single point on the LMRD, whereas a three-parameter distribution with location, scale, and shape parameters plots as a curve, and distributions with more than one shape parameter are generally associated with regions on the diagram. However, the theoretical points or curves of the various probability distributions on the LMRD cannot account for the uncertainties induced by estimating parameters from random samples.

Ordinary (or product) moment-ratios diagram (PMRD)

The ordinary (or product) moment ratios diagram

Sample estimates of product moment ratios

90% and 95% regions (D'Agostino and Stephens, 1986)

Even though the joint distribution of the ordinary sample skewness and sample kurtosis is asymptotically normal, this asymptotic property is a poor approximation in small and moderate-sized samples, particularly when the underlying distribution is even moderately skewed.

Scattering of sample moment ratios of the normal distribution (100,000 random samples)

L-moments and the L-moment ratios diagram

L-moment-ratio diagram of various distributions

Sample estimates of L-moment ratios (probability weighted moment estimators)

Sample estimates of L-moment ratios (plotting-position estimators)

Hosking and Wallis (1997) indicated that the plotting-position estimator is not an unbiased estimator of the L-moment ratio, but its bias tends to zero in large samples. The two sample estimators are respectively referred to as the probability-weighted-moment estimator and the plotting-position estimator of the L-moment ratio.
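The two estimators can be sketched as follows. The probability-weighted-moment version uses the usual unbiased sample probability weighted moments b0 to b3. For the plotting-position version, the plotting position p_i = (i − 0.35)/n is an assumption; the slides do not show which plotting position was used. The function name is hypothetical.

```python
def l_moment_ratios(sample, method="pwm"):
    """Return (sample L-skewness, sample L-kurtosis) for a sample with n >= 4."""
    x = sorted(sample)
    n = len(x)
    b = [0.0, 0.0, 0.0, 0.0]
    for i, xi in enumerate(x, start=1):
        if method == "pwm":
            # Unbiased probability-weighted-moment weights.
            w1 = (i - 1) / (n - 1)
            w2 = w1 * (i - 2) / (n - 2)
            w3 = w2 * (i - 3) / (n - 3)
        else:
            # Plotting-position weights, ASSUMING p_i = (i - 0.35) / n.
            p = (i - 0.35) / n
            w1, w2, w3 = p, p ** 2, p ** 3
        for r, w in enumerate((1.0, w1, w2, w3)):
            b[r] += w * xi / n
    # Sample L-moments from the probability weighted moments.
    l2 = 2 * b[1] - b[0]
    l3 = 6 * b[2] - 6 * b[1] + b[0]
    l4 = 20 * b[3] - 30 * b[2] + 12 * b[1] - b[0]
    return l3 / l2, l4 / l2


sample = [12.1, 15.3, 9.8, 22.4, 17.0, 11.2, 30.5, 14.9]  # hypothetical data
print("PWM estimator:              ", l_moment_ratios(sample, "pwm"))
print("plotting-position estimator:", l_moment_ratios(sample, "pp"))
```

On the same sample the two estimators generally give slightly different values, and the difference shrinks as the sample size grows.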

Establishing acceptance region for L-moment ratios
The standard normal and standard Gumbel distributions (zero mean and unit standard deviation) are used to exemplify the approach for constructing acceptance regions on the L-moment-ratio diagram. The L-moment ratios (L-skewness τ3, L-kurtosis τ4) of the normal and Gumbel distributions are respectively (0, 0.1226) and (0.1699, 0.1504).

Stochastic simulation of the normal and Gumbel distributions
For each of the standard normal and standard Gumbel distributions, a total of 100,000 random samples were generated for each of the specified sample sizes n = 20, 30, 40, 50, 60, 75, 100, 150, 250, 500, and 1,000. For each of the 100,000 samples, the sample L-skewness and L-kurtosis were calculated using both the probability-weighted-moment estimator and the plotting-position estimator.
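A scaled-down version of this simulation can be sketched with the standard library alone. It uses only the probability-weighted-moment estimator and 2,000 samples per run instead of 100,000 to keep the run time short; the function names, the seed, and the reduced sample count are assumptions made for illustration.

```python
import math
import random


def l_ratios_pwm(sample):
    """Sample L-skewness and L-kurtosis from unbiased probability weighted moments."""
    x = sorted(sample)
    n = len(x)
    b0 = b1 = b2 = b3 = 0.0
    for i, xi in enumerate(x, start=1):
        w1 = (i - 1) / (n - 1)
        w2 = w1 * (i - 2) / (n - 2)
        w3 = w2 * (i - 3) / (n - 3)
        b0 += xi / n
        b1 += w1 * xi / n
        b2 += w2 * xi / n
        b3 += w3 * xi / n
    l2 = 2 * b1 - b0
    return (6 * b2 - 6 * b1 + b0) / l2, (20 * b3 - 30 * b2 + 12 * b1 - b0) / l2


def draw_normal(rng):
    return rng.gauss(0.0, 1.0)


def draw_gumbel(rng):
    """Gumbel variate with zero mean and unit standard deviation (inverse transform)."""
    scale = math.sqrt(6.0) / math.pi
    loc = -0.5772156649 * scale
    u = rng.random()
    while u == 0.0:  # avoid log(0)
        u = rng.random()
    return loc - scale * math.log(-math.log(u))


def simulate_cloud(draw, n, n_samples, seed=0):
    """Generate n_samples samples of size n and return their (t3, t4) pairs."""
    rng = random.Random(seed)
    return [l_ratios_pwm([draw(rng) for _ in range(n)]) for _ in range(n_samples)]


def summarize(cloud):
    """Means, variances, and correlation of the simulated (t3, t4) pairs."""
    m = len(cloud)
    mean3 = sum(p[0] for p in cloud) / m
    mean4 = sum(p[1] for p in cloud) / m
    var3 = sum((p[0] - mean3) ** 2 for p in cloud) / (m - 1)
    var4 = sum((p[1] - mean4) ** 2 for p in cloud) / (m - 1)
    cov = sum((p[0] - mean3) * (p[1] - mean4) for p in cloud) / (m - 1)
    return mean3, mean4, var3, var4, cov / math.sqrt(var3 * var4)


for name, draw in (("normal", draw_normal), ("Gumbel", draw_gumbel)):
    stats = summarize(simulate_cloud(draw, n=50, n_samples=2000))
    print(name, [round(s, 4) for s in stats])
```

The printed means should lie near the population L-moment ratios of the two distributions, and the variances and correlation describe the spread of the simulated cloud.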

Scattering of sample L-moment ratios Normal distribution (100,000 random samples)

Normal distribution or non-normal distribution? 95% and 99% acceptance regions (100,000 random samples)

Scattering of sample L-moment ratios Gumbel distribution (100,000 random samples)


For both distribution types, the joint distribution of sample L-skewness and L-kurtosis seems to resemble a bivariate normal distribution for a larger sample size (n = 100). However, for sample size n = 20, the joint distribution of sample L-skewness and L-kurtosis seems to differ from the bivariate normal. For the Gumbel distribution in particular, the sample L-moments of both estimators are positively skewed.

For smaller sample sizes (n = 20 and 50), the distribution cloud of sample L-moment ratios estimated by the plotting-position method appears to have its center located away from the population values (τ3, τ4), an indication of biased estimation. However, for sample size n = 100 the bias is almost unnoticeable, suggesting that the bias in L-moment-ratio estimation using the plotting-position estimator is negligible for larger sample sizes.

In contrast, the distribution cloud of the sample L-moment ratios estimated by the probability-weighted-moment method appears to have its center almost coincide with the population values (τ3, τ4).

Bias of sample L-skewness and L-kurtosis - Normal distribution

Bias of sample L-skewness and L-kurtosis - Gumbel distribution

Mardia test for bivariate normality of sample L-skewness and L-kurtosis


It appears that the assumption of a bivariate normal distribution for the sample L-skewness and L-kurtosis of both distributions is valid for moderate to large sample sizes. However, for random samples of the normal distribution with small sample sizes, the bivariate normal assumption may not be adequate. Similarly, the bivariate normal assumption for the sample L-skewness and L-kurtosis of the Gumbel distribution may not be adequate for small sample sizes.

Establishing acceptance regions for LMRD-based GOF tests
For moderate to large sample sizes, the sample L-skewness and L-kurtosis of both the normal and Gumbel distributions have asymptotic bivariate normal distributions. Using this property, the acceptance region of a GOF test based on sample L-skewness and L-kurtosis can be determined by the equiprobable density contour of the bivariate normal distribution whose enclosed probability equals 1 − α, where α is the significance level.

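The probability density function of a p-dimensional multivariate normal distribution with mean vector $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$ is generally expressed by
$$f(\mathbf{x}) = \frac{1}{(2\pi)^{p/2}\,|\boldsymbol{\Sigma}|^{1/2}} \exp\left[-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})\right].$$
The probability density function depends on the random vector X only through the quadratic form $(\mathbf{X}-\boldsymbol{\mu})^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{X}-\boldsymbol{\mu})$, which has a chi-square distribution with p degrees of freedom.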

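Therefore, probability density contours of a multivariate normal distribution with mean vector $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$ can be expressed by $(\mathbf{x}-\boldsymbol{\mu})^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu}) = c$ for any constant $c > 0$. For a bivariate normal distribution (p = 2) the above equation represents an equiprobable ellipse, and a set of equiprobable ellipses can be constructed by assigning $\chi^2_{\alpha,2}$, the upper-α quantile of the chi-square distribution with 2 degrees of freedom, to c for various values of α.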

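Consequently, the acceptance region of a GOF test based on the sample L-skewness and L-kurtosis is expressed by $(\mathbf{x}-\boldsymbol{\mu})^{T}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu}) \le \chi^2_{\alpha,2}$, where x is the vector of sample L-skewness and L-kurtosis, $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ are its mean vector and covariance matrix, and $\chi^2_{\alpha,2}$ is the upper quantile of the chi-square distribution with 2 degrees of freedom at significance level α.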

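For a bivariate normal random vector $\mathbf{X} = (X_1, X_2)^{T}$ with means $\mu_1, \mu_2$, standard deviations $\sigma_1, \sigma_2$, and correlation coefficient ρ, the density contour at level c of the quadratic form can also be expressed as
$$\frac{1}{1-\rho^2}\left[\left(\frac{x_1-\mu_1}{\sigma_1}\right)^2 - 2\rho\left(\frac{x_1-\mu_1}{\sigma_1}\right)\left(\frac{x_2-\mu_2}{\sigma_2}\right) + \left(\frac{x_2-\mu_2}{\sigma_2}\right)^2\right] = c.$$
However, the expected values and covariance matrix of sample L-skewness and L-kurtosis are unknown and can only be estimated from random samples generated by stochastic simulation.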

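Thus, in construction of the equiprobable ellipses, the population parameters (mean vector $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$) must be replaced by their sample estimates (sample mean vector $\bar{\mathbf{x}}$ and sample covariance matrix $\mathbf{S}$). This gives the Hotelling’s T2 statistic $T^2 = (\mathbf{x}-\bar{\mathbf{x}})^{T}\mathbf{S}^{-1}(\mathbf{x}-\bar{\mathbf{x}})$.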

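The Hotelling’s T2 is distributed as a multiple of an F-distribution with 2 and N − 2 degrees of freedom, where N is the number of simulated samples used to estimate the mean vector and covariance matrix. For large N, the multiplier tends to 2, and 2F(2, N − 2) tends to the chi-square distribution with 2 degrees of freedom. Therefore, the distribution of the Hotelling’s T2 can be well approximated by the chi-square distribution with 2 degrees of freedom.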

Thus, if the sample L-skewness and L-kurtosis of a random sample of size n fall outside the corresponding ellipse, i.e., $T^2 > \chi^2_{\alpha,2}$, the null hypothesis that the random sample originates from a normal or Gumbel distribution is rejected.

Variation of 95% acceptance regions with respect to sample size n. What if n = 36? (100,000 random samples)

Empirical relationships between parameters of acceptance regions and sample size
Since the 95% acceptance regions of the proposed GOF tests depend on the sample size n, it is worth investigating the feasibility of establishing empirical relationships between the 95% acceptance region and the sample size. Such empirical relationships can be established using the following regression model

Empirical relationships between the sample size and parameters of the bivariate distribution of sample L-skewness and L-kurtosis


Example
Suppose that a random sample of size n = 44 is available, and the plotting-position sample L-skewness and L-kurtosis are calculated as (0.214, 0.116). We want to test whether the sample originates from the Gumbel distribution.

From the regression models for plotting-position estimators, we find the means, variances, and correlation coefficient of the sample L-skewness and L-kurtosis to be respectively 0.1784, 0.1369, 0.005119, 0.002924, and 0.6039. The Hotelling’s T2 is then calculated as 0.9908. The value of T2 is much smaller than the threshold value of 5.99, the upper 5% quantile of the chi-square distribution with 2 degrees of freedom.

The null hypothesis that the random sample originates from the Gumbel distribution is not rejected.
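The arithmetic of this example can be retraced with a short sketch. It assumes that the five regression outputs are, in order, the two means, the two variances, and the correlation coefficient of sample L-skewness and L-kurtosis; under that reading the quadratic form comes out close to the quoted 0.9908, with the small difference attributable to rounding of the inputs. The threshold uses the fact that a chi-square variable with 2 degrees of freedom satisfies P(X > c) = exp(−c/2). Function names are hypothetical.

```python
import math


def hotelling_t2(t3, t4, mean3, mean4, var3, var4, rho):
    """Quadratic form (x - m)' S^-1 (x - m), written out for the bivariate case."""
    z3 = (t3 - mean3) / math.sqrt(var3)
    z4 = (t4 - mean4) / math.sqrt(var4)
    return (z3 * z3 - 2.0 * rho * z3 * z4 + z4 * z4) / (1.0 - rho * rho)


def chi2_upper_quantile_2df(alpha):
    """Upper-alpha quantile of chi-square with 2 d.f.: P(X > c) = exp(-c / 2)."""
    return -2.0 * math.log(alpha)


# Numbers from the example (n = 44, Gumbel, plotting-position estimators).
t2 = hotelling_t2(0.214, 0.116, 0.1784, 0.1369, 0.005119, 0.002924, 0.6039)
threshold = chi2_upper_quantile_2df(0.05)
print(f"T2 = {t2:.4f}, threshold = {threshold:.3f}, reject = {t2 > threshold}")
```

Since T2 is well below the threshold, the point (0.214, 0.116) lies inside the 95% acceptance ellipse for n = 44.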

95% acceptance regions of L-moments-based GOF test for the normal distribution Acceptance ellipses correspond to various sample sizes (n = 20, 30, 40, 50, 60, 75, 100, 150, 250, 500, and 1,000).


95% acceptance regions of L-moments-based GOF test for the Gumbel distribution Acceptance ellipses correspond to various sample sizes (n = 20, 30, 40, 50, 60, 75, 100, 150, 250, 500, and 1,000).


Validity check of the LMRD acceptance regions
The sample-size-dependent acceptance regions established using the empirical relationships described in the last section are further checked for their validity. This is done by stochastically generating 10,000 random samples for both the standard normal and Gumbel distributions, for each of the sample sizes n = 20, 30, 40, 50, 60, 75, 100, 150, 250, 500, and 1,000.

For the sample-size-dependent 95% acceptance regions to be valid, the rejection rate should be very close to the level of significance (α = 0.05), or equivalently the acceptance rate should be very close to 0.95.

Acceptance rate of the validity check for sample-size-dependent 95% acceptance regions of sample L-skewness and L-kurtosis pairs. Based on 10,000 random samples for any given sample size n.
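A simplified version of such a check is sketched below for the normal distribution. It differs from the procedure described above in two ways that should be kept in mind: the ellipse parameters are estimated directly from one simulated cloud rather than from the regression models (which are not reproduced here), and only 2,000 samples are used per step. Only the probability-weighted-moment estimator is used, and all names are hypothetical.

```python
import math
import random


def l_ratios_pwm(sample):
    """Sample L-skewness and L-kurtosis from unbiased probability weighted moments."""
    x = sorted(sample)
    n = len(x)
    b0 = b1 = b2 = b3 = 0.0
    for i, xi in enumerate(x, start=1):
        w1 = (i - 1) / (n - 1)
        w2 = w1 * (i - 2) / (n - 2)
        w3 = w2 * (i - 3) / (n - 3)
        b0 += xi / n
        b1 += w1 * xi / n
        b2 += w2 * xi / n
        b3 += w3 * xi / n
    l2 = 2 * b1 - b0
    return (6 * b2 - 6 * b1 + b0) / l2, (20 * b3 - 30 * b2 + 12 * b1 - b0) / l2


def fit_ellipse(cloud):
    """Estimate means, variances, and correlation of the (t3, t4) cloud."""
    m = len(cloud)
    mean3 = sum(p[0] for p in cloud) / m
    mean4 = sum(p[1] for p in cloud) / m
    var3 = sum((p[0] - mean3) ** 2 for p in cloud) / (m - 1)
    var4 = sum((p[1] - mean4) ** 2 for p in cloud) / (m - 1)
    cov = sum((p[0] - mean3) * (p[1] - mean4) for p in cloud) / (m - 1)
    return mean3, mean4, var3, var4, cov / math.sqrt(var3 * var4)


def t2(point, params):
    """Bivariate quadratic form of a (t3, t4) point relative to the fitted ellipse."""
    mean3, mean4, var3, var4, rho = params
    z3 = (point[0] - mean3) / math.sqrt(var3)
    z4 = (point[1] - mean4) / math.sqrt(var4)
    return (z3 * z3 - 2.0 * rho * z3 * z4 + z4 * z4) / (1.0 - rho * rho)


def acceptance_rate(n, n_fit=2000, n_check=2000, alpha=0.05, seed=1):
    """Fit the ellipse on one set of normal samples, then test independent ones."""
    rng = random.Random(seed)

    def draw_point():
        return l_ratios_pwm([rng.gauss(0.0, 1.0) for _ in range(n)])

    params = fit_ellipse([draw_point() for _ in range(n_fit)])
    threshold = -2.0 * math.log(alpha)  # chi-square upper quantile, 2 d.f.
    accepted = sum(t2(draw_point(), params) <= threshold for _ in range(n_check))
    return accepted / n_check


print(f"acceptance rate for n = 50: {acceptance_rate(50):.3f}")
```

An acceptance rate near 0.95 indicates that the ellipse behaves as a 95% acceptance region for that sample size; a clear departure would point to a poor bivariate normal approximation.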

End of this session.