Concepts in Probability, Statistics and Stochastic Modeling
Loucks et al., 2005, Chapter 7
Learning Objective: Be able to use probability and statistics to quantify uncertainty and natural variability in physical quantities.
How to Express a Distribution
Cumulative distribution; probability density; equation; probability plot.
Which method conveys the information best to you?
Carl Friedrich Gauß, immortalized
A random variable X is a variable whose outcomes (values) are governed by the laws of chance.
Probability density function: $f_X(x) = dF_X(x)/dx$, so that $P(a \le X \le b) = \int_a^b f_X(x)\,dx$
Cumulative distribution function: $F_X(x) = P(X \le x)$
Continuous and Discrete Random Variables From: Loucks, D. P., E. van Beek, J. R. Stedinger, J. P. M. Dijkman and M. T. Villars, (2005), Water Resources Systems Planning and Management: An Introduction to Methods, Models and Applications, UNESCO, Paris, 676 p, http://hdl.handle.net/1813/2804
Generating a random variable from a given distribution
1. Generate U from a uniform distribution between 0 and 1.
2. Solve for $X = F^{-1}(U)$.
Basis: $P(X < x) = P(F^{-1}(U) < x) = P(U < F(x)) = F(x)$, so $F^{-1}(U)$ is a random variable with CDF $F(x)$.
[Figure: CDF F(X) plotted against X, with U read off the vertical axis]
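The two steps above can be sketched in a few lines of Python. This is a minimal illustration, assuming an exponential distribution because its CDF, $F(x) = 1 - e^{-\lambda x}$, inverts in closed form; the rate value, the seed and the function name `sample_exponential` are hypothetical choices, not taken from the slides.

```python
import math
import random

def sample_exponential(rate, rng):
    """Draw one exponential variate by inverting F(x) = 1 - exp(-rate * x)."""
    u = rng.random()                      # step 1: U ~ Uniform[0, 1)
    return -math.log(1.0 - u) / rate      # step 2: X = F^{-1}(U)

rng = random.Random(42)                   # fixed seed so the run is repeatable
rate = 2.0                                # hypothetical rate parameter
draws = [sample_exponential(rate, rng) for _ in range(100_000)]
sample_mean = sum(draws) / len(draws)
print(f"sample mean = {sample_mean:.3f}, theoretical mean = {1.0 / rate:.3f}")
```

The sample mean should land close to 1/rate, which is a quick check that the generated values follow the intended distribution.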
Generating a pseudo-random number
There is a lot of lore about this. Refer to: Press, W. H., B. P. Flannery, S. A. Teukolsky and W. T. Vetterling, (1988), Numerical Recipes in C: The Art of Scientific Computing, Cambridge University Press, New York, 735 p.
Congruential method: $r_{i+1} = (a\,r_i + c) \bmod m$
Each r is an integer random number between 0 and m-1. Dividing by (m-1) gives a number between 0 and 1; the sequence repeats after at most m numbers. Numerical Recipes gives "good" choices for a, c and m.
R has built-in functions: runif generates uniform random numbers, and others such as rnorm and rgamma generate values from other distributions.
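A minimal Python sketch of the congruential method follows. The default constants (a = 1664525, c = 1013904223, m = 2^32) are ones commonly attributed to Numerical Recipes; treat them as an assumption and check the reference before relying on them. The function name `lcg` and the seed are hypothetical.

```python
def lcg(seed, a=1664525, c=1013904223, m=2**32):
    """Yield integers in [0, m-1] from the recurrence r_new = (a * r_old + c) mod m."""
    r = seed
    while True:
        r = (a * r + c) % m
        yield r

m = 2**32
gen = lcg(seed=12345)                                # hypothetical seed
uniforms = [next(gen) / (m - 1) for _ in range(5)]   # divide by (m-1): values in [0, 1]
print(uniforms)
```

For real work a library generator (runif in R, the random module in Python) is a safer choice than a hand-rolled one.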
Moments of Random Variables
L-Moments
Probability weighted moments
L-moment estimators
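As an illustration of how L-moment estimators can be computed from probability weighted moments, here is a minimal Python sketch. It assumes the unbiased sample estimators $b_r = \frac{1}{n}\sum_{j=1}^{n} \frac{\binom{j-1}{r}}{\binom{n-1}{r}} x_{(j)}$ with the data in ascending order, and the standard relations between $b_r$ and the first four L-moments. The function name and the flow values are hypothetical, and at least four data points are needed.

```python
from math import comb

def sample_l_moments(data):
    """Return (l1, l2, l3, l4) from unbiased probability weighted moments b0..b3."""
    x = sorted(data)                      # ascending order statistics
    n = len(x)
    # With zero-based index j, comb(j, r) plays the role of C(j-1, r) for rank j+1.
    b = [
        sum(comb(j, r) * x[j] for j in range(n)) / (n * comb(n - 1, r))
        for r in range(4)
    ]
    l1 = b[0]
    l2 = 2 * b[1] - b[0]
    l3 = 6 * b[2] - 6 * b[1] + b[0]
    l4 = 20 * b[3] - 30 * b[2] + 12 * b[1] - b[0]
    return l1, l2, l3, l4

# Hypothetical flows, for illustration only.
flows = [1200.0, 2500.0, 3100.0, 4800.0, 6900.0, 9400.0, 15200.0, 27000.0]
l1, l2, l3, l4 = sample_l_moments(flows)
print(f"L-CV = {l2 / l1:.3f}, L-skewness = {l3 / l2:.3f}, L-kurtosis = {l4 / l2:.3f}")
```

The ratios l3/l2 and l4/l2 are the quantities plotted on an L-moment diagram.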
L-Moment Diagrams From: Loucks, D. P., E. van Beek, J. R. Stedinger, J. P. M. Dijkman and M. T. Villars, (2005), Water Resources Systems Planning and Management: An Introduction to Methods, Models and Applications, UNESCO, Paris, 676 p, http://hdl.handle.net/1813/2804
If needed. PLOT THE DISTRIBUTION OF THESE IN R AS A LIVE DEMO!! USE THE NIST ONLINE MANUAL AS A GUIDE TO THE RANGE ON THE X-AXIS AND THE SHAPE PARAMETER. From: Salas, J. D., J. W. Delleur, V. Yevjevich and W. L. Lane, (1980), Applied Modeling of Hydrologic Time Series, Water Resources Publications, Littleton, Colorado, 484 p.
Fitting a probability distribution to data
Hillsborough River at Zephyr Hills, September flows
$\bar{x}$ = 8621 mgal, S = 8194 mgal, n = 31
Method of Moments: use the sample moments as estimates of the population moments, then solve for the distribution parameters.
Method of Moments: Gamma distribution
$\alpha = \bar{x}^2/S^2$ = 1.1, $\beta = \bar{x}/S^2$ = $1.3 \times 10^{-4}$
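The gamma fit can be reproduced with a short Python sketch. It assumes the gamma parameterization with mean $\alpha/\beta$ and variance $\alpha/\beta^2$ (shape $\alpha$, rate $\beta$), and uses the September flow statistics quoted for the Hillsborough River (mean 8621 mgal, S = 8194 mgal). The function name `gamma_moments` is hypothetical.

```python
def gamma_moments(mean, std):
    """Method-of-moments estimates for a gamma pdf with mean alpha/beta, variance alpha/beta**2."""
    alpha = (mean / std) ** 2      # shape: mean^2 / variance
    beta = mean / std ** 2         # rate: mean / variance
    return alpha, beta

alpha, beta = gamma_moments(8621.0, 8194.0)
print(f"alpha = {alpha:.2f}, beta = {beta:.2e} per mgal")
```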
Method of Moments: Log-Normal distribution
$\sigma_Y^2$ = 0.643, $\mu_Y$ = 8.29, where Y = ln(X)
Method of Maximum Likelihood
"Back into" the estimate by assuming the parameters we are trying to estimate from the data are known.
How likely are the sample values we have, given a certain set of parameter values? We can express this as the joint density of the random sample given the parameter values.
After we obtain the data (random sample), we use the joint density to define the likelihood function.
Say each data point is treated as an independent sample from the probability distribution. For a given distribution with parameters $\theta$, the likelihood is then $L(\theta) = \prod_{i=1}^{n} f(x_i \mid \theta)$.
Likelihood
ln(L) = -311 (for gamma); ln(L) = -312 (for log-normal)
Could maximize L or ln(L) to select the parameters rather than fitting moments.
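To make the idea of selecting parameters by maximizing ln(L) concrete, here is a minimal Python sketch for the gamma distribution. The flow values are hypothetical (the Hillsborough record is not reproduced in these slides), and the crude grid search over the shape parameter is a stand-in for a proper optimizer. It uses the fact that, for a fixed shape $\alpha$, ln(L) is maximized at rate $\beta = \alpha/\bar{x}$.

```python
import math

def gamma_log_likelihood(data, alpha, beta):
    """ln L = sum of ln f(x_i), f(x) = beta**alpha * x**(alpha-1) * exp(-beta*x) / Gamma(alpha)."""
    n = len(data)
    return (n * (alpha * math.log(beta) - math.lgamma(alpha))
            + (alpha - 1.0) * sum(math.log(x) for x in data)
            - beta * sum(data))

# Hypothetical flows (mgal), for illustration only.
flows = [1200.0, 2500.0, 3100.0, 4800.0, 6900.0, 9400.0, 15200.0, 27000.0]
n = len(flows)
mean = sum(flows) / n
var = sum((x - mean) ** 2 for x in flows) / (n - 1)

# Method-of-moments estimates, for comparison.
alpha_mom, beta_mom = mean ** 2 / var, mean / var
ll_mom = gamma_log_likelihood(flows, alpha_mom, beta_mom)

# Grid search over alpha from 0.50 to 4.49; beta = alpha / mean for each trial alpha.
trial_alphas = [0.5 + 0.01 * k for k in range(400)]
best_ll, best_alpha = max(
    (gamma_log_likelihood(flows, a, a / mean), a) for a in trial_alphas
)
best_beta = best_alpha / mean

print(f"moments: alpha = {alpha_mom:.2f}, ln(L) = {ll_mom:.2f}")
print(f"max likelihood: alpha = {best_alpha:.2f}, ln(L) = {best_ll:.2f}")
```

The same comparison across candidate distributions (for example gamma versus log-normal) is what the ln(L) values on the slide summarize.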
Normalization
Much theory relies on the central limit theorem and so applies to normal distributions.
Where the data are not normally distributed, normalizing transformations are used: log, or Box-Cox (log is a special case of Box-Cox).
Box-Cox Normalization
The Box-Cox family of transformations includes the logarithmic transformation as a special case ($\lambda = 0$). It is defined as:
$z = (x^{\lambda} - 1)/\lambda$ for $\lambda \neq 0$
$z = \ln(x)$ for $\lambda = 0$
where z is the transformed data, x is the original data and $\lambda$ is the transformation parameter.
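The definition translates directly into code. A minimal Python sketch, with the hypothetical function name `box_cox`, assuming strictly positive data:

```python
import math

def box_cox(x, lam):
    """Box-Cox transform of a positive value x with transformation parameter lam."""
    if x <= 0:
        raise ValueError("Box-Cox requires x > 0")
    if lam == 0:
        return math.log(x)                # logarithmic special case
    return (x ** lam - 1.0) / lam

# As lam approaches 0 the general formula approaches ln(x).
for lam in (1.0, 0.5, 0.01, 0.0):
    print(lam, box_cox(5.0, lam))
```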
Box-Cox Normalization
So… the log looked OK ($\lambda$ = 0). Is that what we really want? Let's skip the derivations for now and look at the answer for our three proposed methods.
Determining Transformation Parameters
Trial and error: apply a series of trial lambda values and evaluate a goodness-of-fit statistic for each.
PPCC (Filliben's statistic): R^2 of the best-fit line of the QQ-plot
Kolmogorov-Smirnov (KS) test (any distribution): p-value
Shapiro-Wilk test for normality: p-value
Quantiles
Rank the data: $x_1 \le x_2 \le x_3 \le \dots \le x_n$, each with an associated probability $p_i$.
Theoretical distribution, e.g. standard normal: $q_i$ is the distribution-specific theoretical quantile associated with ranked data value $x_i$.
Quantile-Quantile Plots
[Figure: QQ-plot for raw flows ($x_i$ vs. $q_i$) and QQ-plot for log-transformed flows ($\ln(x_i)$ vs. $q_i$)]
A transformation is needed to make the raw flows normally distributed.
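The pairing of ranked data with theoretical quantiles, and the straightness of the resulting QQ-plot, can be sketched in Python as below. The slides do not state a plotting-position formula, so the Blom form $p_i = (i - 0.375)/(n + 0.25)$ used here is an assumption, as are the function names and the hypothetical flow values. The correlation returned by `ppcc` is one way to score how close the QQ-plot is to a straight line.

```python
import math
from statistics import NormalDist

def normal_quantiles(n, a=0.375):
    """Standard normal quantiles q_i at plotting positions p_i = (i - a) / (n + 1 - 2a)."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - a) / (n + 1 - 2 * a)) for i in range(1, n + 1)]

def ppcc(data):
    """Correlation between the ranked data and the standard normal quantiles."""
    x = sorted(data)
    q = normal_quantiles(len(x))
    mx, mq = sum(x) / len(x), sum(q) / len(q)
    sxq = sum((xi - mx) * (qi - mq) for xi, qi in zip(x, q))
    sxx = sum((xi - mx) ** 2 for xi in x)
    sqq = sum((qi - mq) ** 2 for qi in q)
    return sxq / math.sqrt(sxx * sqq)

# Hypothetical flows, for illustration only.
flows = [1200.0, 2500.0, 3100.0, 4800.0, 6900.0, 9400.0, 15200.0, 27000.0]
r_raw = ppcc(flows)
r_log = ppcc([math.log(x) for x in flows])
print(f"raw flows: r = {r_raw:.3f}; log flows: r = {r_log:.3f}")
```

For this skewed sample the log-transformed values track the normal quantiles more closely than the raw values do.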
Example: Determining Transformation Parameters
Alafia River historical monthly flows
Evaluate using all three criteria:
Test a range of lambda values from -2 to 2 by 0.1 for Filliben's and KS.
Test a range of lambda values from -1 to 1 by 0.1 for Shapiro-Wilk (errors occur for larger lambda values).
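The trial-and-error search over lambda can be sketched in Python as follows. This is an illustration only: the flows are hypothetical rather than the Alafia River record, the score is a probability plot correlation computed with an assumed Blom plotting position, and the helper names are hypothetical.

```python
import math
from statistics import NormalDist

def box_cox(x, lam):
    """Box-Cox transform of a positive value x."""
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam

def ppcc(data):
    """Correlation between ranked data and standard normal quantiles (Blom positions assumed)."""
    x = sorted(data)
    n = len(x)
    q = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mx, mq = sum(x) / n, sum(q) / n
    sxq = sum((xi - mx) * (qi - mq) for xi, qi in zip(x, q))
    sxx = sum((xi - mx) ** 2 for xi in x)
    sqq = sum((qi - mq) ** 2 for qi in q)
    return sxq / math.sqrt(sxx * sqq)

# Hypothetical flows, for illustration only.
flows = [1200.0, 2500.0, 3100.0, 4800.0, 6900.0, 9400.0, 15200.0, 27000.0]

# Trial lambda values from -2 to 2 in steps of 0.1.
lambdas = [round(-2.0 + 0.1 * k, 1) for k in range(41)]
scores = {lam: ppcc([box_cox(x, lam) for x in flows]) for lam in lambdas}
best_lambda = max(scores, key=scores.get)
print(f"best lambda = {best_lambda}, PPCC = {scores[best_lambda]:.4f}")
```

Plotting `scores` against `lambdas` gives a Box-Cox normality plot; swapping `ppcc` for another statistic changes the criterion without changing the search.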
Box-Cox Normality Plot for Monthly September Flows on Alafia R. Using PPCC
This is close to 0: $\lambda$ = -0.14
Kolmogorov-Smirnov Test
Specifically, it computes the largest difference between the target CDF $F_X(x)$ and the observed CDF $F^*(x)$. The test statistic D is:
$D = \max_i \left[ \frac{i}{n} - F_X(x_{(i)}),\; F_X(x_{(i)}) - \frac{i-1}{n} \right]$
where $x_{(i)}$ is the ith value when the random sample of size n is sorted in ascending order.
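A minimal Python sketch of the KS statistic, assuming the usual form in which the empirical CDF steps from (i-1)/n to i/n at the ith smallest observation. The flows are hypothetical, the target CDF is a normal distribution fitted to their logarithms, and no p-value is computed here.

```python
import math
from statistics import NormalDist, mean, stdev

def ks_statistic(data, cdf):
    """Largest gap between the target CDF and the empirical CDF at the order statistics."""
    x = sorted(data)                      # ascending order statistics
    n = len(x)
    return max(
        max(i / n - cdf(xi), cdf(xi) - (i - 1) / n)
        for i, xi in enumerate(x, start=1)
    )

# Hypothetical flows, for illustration only.
flows = [1200.0, 2500.0, 3100.0, 4800.0, 6900.0, 9400.0, 15200.0, 27000.0]
log_flows = [math.log(x) for x in flows]
target = NormalDist(mean(log_flows), stdev(log_flows))
d = ks_statistic(log_flows, target.cdf)
print(f"KS statistic D = {d:.3f}")
```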
Box-Cox Normality Plot for Monthly September Flows on Alafia R. Using Kolmogorov-Smirnov (KS) Statistic
This is not as close to 0: $\lambda$ = -0.39
http://www.itl.nist.gov/div898/software/dataplot/refman1/auxillar/wilkshap.htm
Box-Cox Normality Plot for Monthly September Flows on Alafia R. Using Shapiro-Wilk Statistic
This is close to 0: $\lambda$ = -0.14. Same as PPCC.