Concepts in Probability, Statistics and Stochastic Modeling

Loucks et al., 2005, Chapter 7
Learning Objective: Be able to use probability and statistics to quantify uncertainty and natural variability in physical quantities.

2 How to Express a Distribution
Equation
Cumulative distribution
Probability density
Probability plot
Which method conveys the information best to you?

3 Carl Friedrich Gauß, immortalized

4 A random variable X is a variable whose outcomes (values) are governed by the laws of chance.
Probability density function

5 Cumulative distribution function

6 Continuous and Discrete Random Variables
From: Loucks, D. P., E. van Beek, J. R. Stedinger, J. P. M. Dijkman and M. T. Villars, (2005), Water Resources Systems Planning and Management: An Introduction to Methods, Models and Applications, UNESCO, Paris, 676 p.

7 Generating a random variable from a given distribution
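Generate U from a uniform distribution between 0 and 1.
Solve for $X = F^{-1}(U)$.
Basis: $P(X < x) = P(F^{-1}(U) < x) = P(U < F(x)) = F(x)$, so $F^{-1}(U)$ is a random variable with CDF $F(x)$.
[Figure: the CDF $F(x)$ plotted against $x$; a value U on the vertical axis maps to X on the horizontal axis.]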
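The inversion recipe can be sketched for a distribution whose CDF inverts in closed form. The example below assumes an exponential distribution with CDF F(x) = 1 - exp(-rate * x); the function name `sample_exponential`, the rate and the seed are illustrative choices, not part of the slides.

```python
import math
import random

def sample_exponential(rate, n, seed=0):
    """Draw n exponential variates by inverting F(x) = 1 - exp(-rate * x)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        u = rng.random()                # U ~ Uniform[0, 1)
        x = -math.log(1.0 - u) / rate   # X = F^-1(U)
        samples.append(x)
    return samples

samples = sample_exponential(rate=2.0, n=100_000)
sample_mean = sum(samples) / len(samples)
print(sample_mean)  # should be near the theoretical mean 1 / rate
```

For distributions without a closed-form inverse (the Normal, for example), the same idea applies with a numerical inverse CDF.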

8 Generating a Pseudo random number
There is a lot of lore about this. Refer to: Press, W. H., B. P. Flannery, S. A. Teukolsky and W. T. Vetterling, (1988), Numerical Recipes in C: The Art of Scientific Computing, Cambridge University Press, New York, 735 p.
Congruential method: $r_i = (a \, r_{i-1} + c) \bmod m$
Each $r_i$ is an integer random number between 0 and $m - 1$. Dividing by $(m - 1)$ gives a number between 0 and 1. The sequence repeats after at most m numbers.
Numerical Recipes gives "good" choices for a, c and m.
R has built-in functions such as runif to generate uniform random numbers, as well as functions for other distributions, e.g. rnorm, rgamma.
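A minimal sketch of a linear congruential generator follows, written in Python rather than R. The default constants a = 1664525, c = 1013904223 and m = 2^32 are one of the "quick and dirty" choices listed in Numerical Recipes; the function names and the toy parameters (a = 5, c = 3, m = 16) are illustrative assumptions used only to make the repeat cycle visible.

```python
from itertools import islice

def lcg_integers(seed, a, c, m):
    """Yield integers in [0, m - 1] from r_i = (a * r_{i-1} + c) mod m."""
    r = seed
    while True:
        r = (a * r + c) % m
        yield r

def lcg_uniform(seed, a=1664525, c=1013904223, m=2**32):
    """Scale the integer sequence by (m - 1) to get numbers between 0 and 1."""
    for r in lcg_integers(seed, a, c, m):
        yield r / (m - 1)

first_five = list(islice(lcg_uniform(seed=42), 5))
print(first_five)

# A toy generator with m = 16 shows the cycle: the 17th value repeats the 1st.
small_cycle = list(islice(lcg_integers(seed=0, a=5, c=3, m=16), 17))
print(small_cycle)
```

Generators this simple are fine for illustrating the idea; for real work the built-in generators of R or Python are the safer choice.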

9 Moments of Random Variables

10 L-Moments

11 Probability weighted moments

12 L-moment estimators
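As a sketch of how L-moment estimators can be computed, the code below uses the standard unbiased probability weighted moment estimators b0 to b3 and the usual linear combinations that give the first four L-moments. It is assumed, not confirmed by the slides, that these are the estimators intended; the function name `sample_l_moments` is illustrative.

```python
def sample_l_moments(data):
    """Return the first four sample L-moments (l1, l2, l3, l4)."""
    x = sorted(data)          # x[0] is the smallest observation
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations")
    b0 = b1 = b2 = b3 = 0.0
    for j, xj in enumerate(x, start=1):   # j is the rank, 1..n
        b0 += xj
        b1 += xj * (j - 1) / (n - 1)
        b2 += xj * (j - 1) * (j - 2) / ((n - 1) * (n - 2))
        b3 += xj * (j - 1) * (j - 2) * (j - 3) / ((n - 1) * (n - 2) * (n - 3))
    b0, b1, b2, b3 = b0 / n, b1 / n, b2 / n, b3 / n
    l1 = b0                                  # location (the mean)
    l2 = 2 * b1 - b0                         # scale
    l3 = 6 * b2 - 6 * b1 + b0                # numerator of L-skewness
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0    # numerator of L-kurtosis
    return l1, l2, l3, l4

l1, l2, l3, l4 = sample_l_moments([1.0, 2.0, 3.0, 4.0, 5.0])
print(l1, l2, l3 / l2, l4 / l2)  # mean, L-scale, L-skewness, L-kurtosis
```

The ratios l3 / l2 and l4 / l2 are the quantities plotted on an L-moment diagram.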

13 L-Moment Diagrams
From: Loucks, D. P., E. van Beek, J. R. Stedinger, J. P. M. Dijkman and M. T. Villars, (2005), Water Resources Systems Planning and Management: An Introduction to Methods, Models and Applications, UNESCO, Paris, 676 p.

USE THE NIST ONLINE MANUAL AS A GUIDE TO THE RANGE ON THE X-AXIS AND THE SHAPE PARAMETER.
From: Salas, J. D., J. W. Delleur, V. Yevjevich and W. L. Lane, (1980), Applied Modeling of Hydrologic Time Series, Water Resources Publications, Littleton, Colorado, 484 p.

16 Fitting a probability distribution to data
Hillsborough River at Zephyr Hills, September flows
$\bar{x} = 8621$ mgal
$S = 8194$ mgal
$n = 31$

17 Method of Moments
Use the sample moments as estimates of the population moments, then solve for the distribution parameters.

18 Method of Moments
Gamma distribution: $\alpha = 1.1$, $\beta = 1.3 \times 10^{-3}$

19 Method of Moments
Log-Normal distribution: $\sigma_Y^2 = 0.643$, $\mu_Y = 8.29$
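The method of moments for these two distributions can be sketched as below. The sketch assumes the gamma density is parameterized by a shape alpha and a rate beta (mean alpha/beta, variance alpha/beta^2) and that the log-normal is described by the mean mu_y and variance sigma2_y of Y = ln(X); these parameter names and function names are assumptions. The inputs are the rounded sample mean and standard deviation quoted for the Hillsborough River, so the results may differ from the values quoted on the slides.

```python
import math

def gamma_method_of_moments(mean, std):
    """Match mean = alpha / beta and variance = alpha / beta**2."""
    alpha = (mean / std) ** 2   # shape
    beta = mean / std ** 2      # rate
    return alpha, beta

def lognormal_method_of_moments(mean, std):
    """Match the real-space mean and variance; return log-space parameters."""
    sigma2_y = math.log(1.0 + (std / mean) ** 2)
    mu_y = math.log(mean) - 0.5 * sigma2_y
    return mu_y, sigma2_y

x_bar, s = 8621.0, 8194.0   # September flows, mgal
alpha, beta = gamma_method_of_moments(x_bar, s)
mu_y, sigma2_y = lognormal_method_of_moments(x_bar, s)
print(alpha, beta)
print(mu_y, sigma2_y)
```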

20 Method of Maximum Likelihood
“Back into” the estimate by assuming the parameters we are trying to estimate from the data are known.
How likely are the sample values we have, given a certain set of parameter values? We can express this as the joint density of the random sample given the parameter values.
After we obtain the data (random sample), we use the joint density to define the likelihood function.
Say each data point is treated as an independent sample from the probability distribution. For a given distribution with density $f(x \mid \theta)$, the likelihood is then $L(\theta) = \prod_{i=1}^{n} f(x_i \mid \theta)$.

21 Likelihood
ln(L) = -311 (for gamma)
ln(L) = -312 (for log-normal)
We could maximize L or ln(L) to select parameters rather than fitting moments.
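A log-likelihood of this kind is the sum of the log densities at the observed values. The sketch below assumes a gamma density with shape alpha and rate beta and a log-normal density with log-space mean mu and variance sigma2; the function names and the short `flows` list are hypothetical and are not the river record behind the values above.

```python
import math

def gamma_log_likelihood(data, alpha, beta):
    """ln L for f(x) = beta**alpha * x**(alpha - 1) * exp(-beta * x) / Gamma(alpha)."""
    return sum(
        alpha * math.log(beta) + (alpha - 1.0) * math.log(x)
        - beta * x - math.lgamma(alpha)
        for x in data
    )

def lognormal_log_likelihood(data, mu, sigma2):
    """ln L when ln(x) is Normal with mean mu and variance sigma2."""
    return sum(
        -math.log(x) - 0.5 * math.log(2.0 * math.pi * sigma2)
        - (math.log(x) - mu) ** 2 / (2.0 * sigma2)
        for x in data
    )

flows = [1200.0, 3400.0, 5600.0, 8900.0, 15000.0, 27000.0]  # hypothetical sample
mean_flow = sum(flows) / len(flows)

# With alpha fixed at 1 the gamma is an exponential, whose maximum
# likelihood rate is 1 / mean; any other rate gives a smaller ln L.
ll_at_mle = gamma_log_likelihood(flows, 1.0, 1.0 / mean_flow)
ll_elsewhere = gamma_log_likelihood(flows, 1.0, 2.0 / mean_flow)
print(ll_at_mle, ll_elsewhere)
```

In practice both parameters would be varied together, typically with a numerical optimizer, and the parameter set with the largest ln(L) retained.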

22 Normalization
Much theory relies on the central limit theorem, so it applies to Normal distributions.
Where the data are not normally distributed, normalizing transformations are used:
Log
Box-Cox (Log is a special case of Box-Cox)

23 Box-Cox Normalization
The Box-Cox family of transformations includes the logarithmic transformation as a special case ($\lambda = 0$). It is defined as:
$z = (x^{\lambda} - 1)/\lambda$ for $\lambda \neq 0$
$z = \ln(x)$ for $\lambda = 0$
where z is the transformed data, x is the original data and $\lambda$ is the transformation parameter.
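The two-branch definition translates directly into code. This is a minimal sketch; the function name `box_cox` and the check for positive inputs are illustrative choices.

```python
import math

def box_cox(x, lam):
    """Box-Cox transform of a positive value x with parameter lam."""
    if x <= 0:
        raise ValueError("Box-Cox requires x > 0")
    if lam == 0:
        return math.log(x)            # logarithmic special case
    return (x ** lam - 1.0) / lam

print(box_cox(4.0, 0.5))     # square-root-type transform
print(box_cox(4.0, 0.0))     # natural log
print(box_cox(4.0, 1e-6))    # approaches the log as lam goes to 0
```

The last line illustrates why the log is treated as a member of the family: the general branch converges to ln(x) as the parameter shrinks toward zero.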

24 Box-Cox Normalization
So… the log looked OK ($\lambda = 0$). Is that what we really want? Let’s skip the derivations for now and look at the answer for our three proposed methods.

25 Determining Transformation Parameters
Trial and error: apply a series of trial lambda values and evaluate a statistic for each.
PPCC (Filliben’s Statistic): $R^2$ of the best-fit line of the QQ-plot
Kolmogorov-Smirnov (KS) Test (any distribution): p-value
Shapiro-Wilk Test for Normality: p-value

26 Quantiles
Rank the data: $x_1, x_2, x_3, \ldots, x_n$, with associated probabilities $p_i$.
Theoretical distribution, e.g. Standard Normal: $q_i$ is the distribution-specific theoretical quantile associated with ranked data value $x_i$.
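The ranking step can be sketched as follows. The slides do not say how each probability is assigned, so the Blom plotting position p_i = (i - 0.375) / (n + 0.25) used here is an assumption, as are the function name `qq_pairs` and the sample values.

```python
from statistics import NormalDist

def qq_pairs(data):
    """Return (x_i, p_i, q_i) for each ranked value, smallest first."""
    ordered = sorted(data)
    n = len(ordered)
    standard_normal = NormalDist()   # mean 0, standard deviation 1
    pairs = []
    for i, x in enumerate(ordered, start=1):
        p = (i - 0.375) / (n + 0.25)      # assumed plotting position
        q = standard_normal.inv_cdf(p)    # theoretical quantile
        pairs.append((x, p, q))
    return pairs

pairs = qq_pairs([30.0, 10.0, 20.0])      # hypothetical values
for x, p, q in pairs:
    print(x, round(p, 3), round(q, 3))
```

Plotting x against q gives the quantile-quantile plot; points close to a straight line suggest the data follow the chosen theoretical distribution.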

27 Quantile-Quantile Plots
[Figure: QQ-plot for Raw Flows ($x_i$ against $q_i$) and QQ-plot for Log-Transformed Flows ($\ln(x_i)$ against $q_i$).]
A transformation is needed to make the raw flows Normally distributed.

28 Example: Determining Transformation Parameters
Alafia River historical monthly flows
Evaluate using all three criteria.
Test a range of lambda values from -2 to 2 by 0.1 for Filliben’s and KS.
Test a range of lambda values from -1 to 1 by 0.1 for Shapiro-Wilk (errors for larger lambda values).
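The trial-and-error search can be sketched for the PPCC criterion. The code scans lambda from -2 to 2 in steps of 0.1 and keeps the value whose transformed data correlate best with standard normal quantiles. Several things here are assumptions: the Blom plotting positions, the use of the correlation coefficient itself rather than its square, the function names, and the synthetic `flows`, which are constructed to be exactly log-normal and are not the Alafia River record.

```python
import math
from statistics import NormalDist

def box_cox(x, lam):
    """Box-Cox transform of a positive value x."""
    return math.log(x) if lam == 0 else (x ** lam - 1.0) / lam

def ppcc(values):
    """Correlation between ordered values and standard normal quantiles."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumed plotting positions (Blom): p_i = (i - 0.375) / (n + 0.25)
    q = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25)) for i in range(1, n + 1)]
    mean_x = sum(ordered) / n
    mean_q = sum(q) / n
    sxq = sum((x - mean_x) * (v - mean_q) for x, v in zip(ordered, q))
    sxx = sum((x - mean_x) ** 2 for x in ordered)
    sqq = sum((v - mean_q) ** 2 for v in q)
    return sxq / math.sqrt(sxx * sqq)

def best_lambda(data, lambdas):
    """Return the trial lambda with the highest PPCC."""
    return max(lambdas, key=lambda lam: ppcc([box_cox(x, lam) for x in data]))

lambdas = [round(-2.0 + 0.1 * k, 1) for k in range(41)]   # -2.0, -1.9, ..., 2.0

# Synthetic sample of 31 values whose logs fall exactly on normal quantiles.
n = 31
flows = [
    math.exp(8.0 + 0.8 * NormalDist().inv_cdf((i - 0.375) / (n + 0.25)))
    for i in range(1, n + 1)
]
lam_hat = best_lambda(flows, lambdas)
print(lam_hat)
```

Because the synthetic sample is log-normal by construction, the search should settle on the log transform; real flow records generally give a nonzero lambda.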

29 Box-Cox Normality Plot for Monthly September Flows on Alafia R.
Using PPCC: $\lambda = -0.14$. This is close to 0.

30 Kolmogorov-Smirnov Test
Specifically, it computes the largest difference between the target CDF $F_X(x)$ and the observed CDF $F^*(x)$. The test statistic $D_2$ is:
$D_2 = \max_{i = 1, \ldots, n} \left| F^*(X_{(i)}) - F_X(X_{(i)}) \right|$
where $X_{(i)}$ is the ith largest observed value in the random sample of size n.
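A minimal sketch of the statistic is given below. It assumes the observed CDF is the usual step function that jumps by 1/n at each observation, so the gap to the target CDF is checked just before and just after each jump; the data are sorted in ascending order, and the function name `ks_statistic` and the sample values are illustrative.

```python
from statistics import NormalDist

def ks_statistic(data, target_cdf):
    """Largest gap between the empirical step CDF and target_cdf."""
    ordered = sorted(data)
    n = len(ordered)
    d = 0.0
    for i, x in enumerate(ordered, start=1):
        f = target_cdf(x)
        # The empirical CDF is (i - 1) / n just below x and i / n at x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

# Hypothetical transformed values compared with a standard normal target.
z = [-1.2, -0.4, 0.1, 0.6, 1.5]
d_normal = ks_statistic(z, NormalDist().cdf)
print(d_normal)

# A simple check against a Uniform(0, 1) target.
d_uniform = ks_statistic([0.25, 0.75], lambda x: x)
print(d_uniform)
```

A smaller value means the sample sits closer to the target distribution, which is why it can be used to rank trial transformation parameters.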

31 Box-Cox Normality Plot for Monthly September Flows on Alafia R.
Using Kolmogorov-Smirnov (KS) Statistic: $\lambda = -0.39$. This is not as close to 0.


33 Box-Cox Normality Plot for Monthly September Flows on Alafia R.
Using Shapiro-Wilk Statistic: $\lambda$ is the same as with PPCC. This is close to 0.

