Concepts in Probability, Statistics and Stochastic Modeling

Loucks et al., 2005, Chapter 7
Learning Objective: Be able to use probability and statistics to quantify uncertainty and natural variability in physical quantities.

How to Express a Distribution: cumulative distribution function, probability density function, equation, or probability plot. Which method conveys the information best to you?

Carl Friedrich Gauß, immortalized

A random variable X is a variable whose outcomes (values) are governed by the laws of chance. Probability density function

Cumulative distribution function
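As a reminder, the standard relationships between the two functions for a continuous random variable $X$ can be written as follows. The symbols $f_X$ (density) and $F_X$ (cumulative distribution) follow common textbook usage and are assumed here.

```latex
F_X(x) = P(X \le x) = \int_{-\infty}^{x} f_X(u)\,du, \qquad
f_X(x) = \frac{dF_X(x)}{dx}, \qquad
\int_{-\infty}^{\infty} f_X(x)\,dx = 1
```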

Continuous and Discrete Random Variables. From: Loucks, D. P., E. van Beek, J. R. Stedinger, J. P. M. Dijkman and M. T. Villars, (2005), Water Resources Systems Planning and Management: An Introduction to Methods, Models and Applications, UNESCO, Paris, 676 p, http://hdl.handle.net/1813/2804

Generating a random variable from a given distribution
1. Generate $U$ from a uniform distribution between 0 and 1.
2. Solve for $X = F^{-1}(U)$.
Basis: $P(F^{-1}(U) < x) = P(U < F(x)) = F(x)$, so $X = F^{-1}(U)$ is a random variable with CDF $F(x)$.
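A minimal sketch of the two steps, assuming an exponential distribution because its CDF $F(x) = 1 - e^{-\text{rate}\,x}$ inverts in closed form. The function names and the choice of distribution are illustrative, not taken from the slides.

```python
import math
import random


def exponential_inverse_cdf(u, rate):
    """Inverse of F(x) = 1 - exp(-rate * x), valid for 0 <= u < 1."""
    return -math.log(1.0 - u) / rate


def sample_exponential(n, rate, seed=0):
    """Draw n exponential variates by inverse-transform sampling."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    # Step 1: U ~ Uniform(0, 1); Step 2: X = F^{-1}(U)
    return [exponential_inverse_cdf(rng.random(), rate) for _ in range(n)]


if __name__ == "__main__":
    draws = sample_exponential(10000, rate=2.0)
    print(sum(draws) / len(draws))  # should be near 1 / rate
```

The same pattern works for any distribution whose inverse CDF can be evaluated, analytically or numerically.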

Generating a pseudo-random number
There is a lot of lore about this. Refer to: Press, W. H., B. P. Flannery, S. A. Teukolsky and W. T. Vetterling, (1988), Numerical Recipes in C: The Art of Scientific Computing, Cambridge University Press, New York, 735 p.
Congruential method: $r_{i+1} = (a\,r_i + c) \bmod m$. Each $r$ is an integer random number between 0 and $m-1$. Dividing by $(m-1)$ gives a number between 0 and 1; the sequence repeats after at most $m$ numbers. Numerical Recipes gives "good" choices for $a$, $c$ and $m$.
R has built-in functions such as runif to generate uniform random numbers, as well as functions for other distributions, e.g. rnorm, rgamma.
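The congruential recurrence can be sketched in a few lines. The default constants for `a`, `c` and `m` below are one commonly quoted set and are an assumption here, not values taken from the slide. For real work a library generator is the safer choice.

```python
def lcg(seed, a=1664525, c=1013904223, m=2**32):
    """Yield uniform numbers in [0, 1] from r_next = (a * r + c) mod m."""
    r = seed % m
    while True:
        r = (a * r + c) % m  # integer state between 0 and m - 1
        yield r / (m - 1)    # scale to the unit interval


if __name__ == "__main__":
    gen = lcg(seed=42)
    print([round(next(gen), 4) for _ in range(5)])
```

Because the state is a single integer below `m`, the stream must cycle after at most `m` draws, which is easy to see with a tiny modulus such as `m=16`.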

Moments of Random Variables

L-Moments

Probability weighted moments

L-moment estimators
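As a rough illustration of how L-moment estimators are usually computed, the sketch below forms the unbiased sample probability weighted moments $b_0$, $b_1$, $b_2$ from the ranked data and combines them into the first three L-moments. The function name is hypothetical and the formulas are the standard textbook ones, assumed rather than copied from the slide.

```python
def sample_l_moments(data):
    """Return (l1, l2, l3) from unbiased probability weighted moments."""
    x = sorted(data)  # ascending order statistics
    n = len(x)
    if n < 3:
        raise ValueError("need at least three observations")
    b0 = sum(x) / n
    # j is the 1-based rank of each ordered value
    b1 = sum((j - 1) * xj for j, xj in enumerate(x, start=1)) / (n * (n - 1))
    b2 = sum((j - 1) * (j - 2) * xj for j, xj in enumerate(x, start=1)) / (
        n * (n - 1) * (n - 2)
    )
    l1 = b0                     # location (equals the sample mean)
    l2 = 2 * b1 - b0            # scale
    l3 = 6 * b2 - 6 * b1 + b0   # numerator of the L-skewness l3 / l2
    return l1, l2, l3


if __name__ == "__main__":
    print(sample_l_moments([1.0, 2.0, 3.0, 4.0, 5.0]))
```

For a symmetric sample the third L-moment is zero, and a right-skewed sample gives a positive value.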

L-Moment Diagrams. From: Loucks, D. P., E. van Beek, J. R. Stedinger, J. P. M. Dijkman and M. T. Villars, (2005), Water Resources Systems Planning and Management: An Introduction to Methods, Models and Applications, UNESCO, Paris, 676 p, http://hdl.handle.net/1813/2804

If needed. PLOT THE DISTRIBUTION OF THESE IN R AS A LIVE DEMO!! USE THE NIST ONLINE MANUAL AS A GUIDE TO THE RANGE ON THE X-AXIS AND THE SHAPE PARAMETER. From: Salas, J. D., J. W. Delleur, V. Yevjevich and W. L. Lane, (1980), Applied Modeling of Hydrologic Time Series, Water Resources Publications, Littleton, Colorado, 484 p.

Fitting a probability distribution to data. Hillsborough River at Zephyr Hills, September flows: $\bar{x} = 8621$ mgal, $S = 8194$ mgal, $n = 31$

Method of Moments: use the sample moments as estimates of the population moments, then solve for the distribution parameters.

Method of Moments: Gamma distribution. $\alpha = 1.1$, $\beta = 1.3 \times 10^{-3}$

Method of Moments Log-Normal distribution =0.643 =8.29
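The moment-matching step can be sketched as below. Two assumptions are built in: the gamma distribution is parameterized by shape and rate (mean = shape/rate, variance = shape/rate²), and the log-normal fit matches the real-space mean and standard deviation rather than the moments of the logged data. The slides may use other conventions or units, so the numbers this prints need not agree with the values quoted there.

```python
import math


def gamma_method_of_moments(mean, std):
    """Shape and rate of a gamma distribution (mean = a/b, var = a/b**2)."""
    variance = std ** 2
    shape = mean ** 2 / variance
    rate = mean / variance
    return shape, rate


def lognormal_method_of_moments(mean, std):
    """Mean and variance of ln(X) matching the real-space mean and std."""
    cv_squared = (std / mean) ** 2
    sigma2 = math.log(1.0 + cv_squared)
    mu = math.log(mean) - 0.5 * sigma2
    return mu, sigma2


if __name__ == "__main__":
    # Sample mean and standard deviation quoted for the September flows (mgal)
    print(gamma_method_of_moments(8621.0, 8194.0))
    print(lognormal_method_of_moments(8621.0, 8194.0))
```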

Method of Maximum Likelihood
“Back into” the estimate by assuming the parameters we are trying to estimate from the data are known. How likely are the sample values we have, given a certain set of parameter values? We can express this as the joint density of the random sample given the parameter values. After we obtain the data (the random sample), we use the joint density to define the likelihood function. Each data point is treated as an independent sample from the probability distribution, so for a given distribution with parameters $\theta$ the likelihood is the product of the individual densities: $L(\theta) = \prod_{i=1}^{n} f(x_i \mid \theta)$.

Likelihood: ln(L) = -311 (for gamma), ln(L) = -312 (for log-normal). Could use maximization of L or ln(L) to select parameters rather than fitting moments.
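To make the ln(L) comparison concrete, here is a hedged sketch of the two log-likelihoods, treating each observation as an independent draw. The gamma density is assumed to use a shape and rate parameterization, and the flow values and parameter values in the usage lines are made up for illustration. They are not the Hillsborough River record.

```python
import math


def gamma_log_likelihood(data, shape, rate):
    """Sum of ln f(x_i) for independent gamma observations."""
    n = len(data)
    return (
        n * (shape * math.log(rate) - math.lgamma(shape))
        + (shape - 1.0) * sum(math.log(x) for x in data)
        - rate * sum(data)
    )


def lognormal_log_likelihood(data, mu, sigma2):
    """Sum of ln f(x_i) for independent log-normal observations."""
    total = 0.0
    for x in data:
        y = math.log(x)
        total += -y - 0.5 * math.log(2.0 * math.pi * sigma2)
        total -= (y - mu) ** 2 / (2.0 * sigma2)
    return total


if __name__ == "__main__":
    flows = [1200.0, 3400.0, 5100.0, 8800.0, 15000.0, 21000.0]  # made-up values
    print(gamma_log_likelihood(flows, shape=1.5, rate=2e-4))     # hypothetical parameters
    print(lognormal_log_likelihood(flows, mu=8.7, sigma2=0.64))  # hypothetical parameters
```

Maximizing either function over its parameters, for example with a grid search or a numerical optimizer, gives the maximum likelihood estimates.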

Normalization: Much theory relies on the central limit theorem and so applies to normal distributions. Where the data are not normally distributed, normalizing transformations are used: Log, or Box-Cox (Log is a special case of Box-Cox).

Box-Cox Normalization: The Box-Cox family of transformations includes the logarithmic transformation as a special case ($\lambda = 0$). It is defined as $z = (x^{\lambda} - 1)/\lambda$ for $\lambda \neq 0$ and $z = \ln(x)$ for $\lambda = 0$, where $z$ is the transformed data, $x$ is the original data and $\lambda$ is the transformation parameter.
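A small sketch of the transformation as defined on this slide. The function name and the check for strictly positive data are additions for illustration.

```python
import math


def box_cox(values, lam):
    """Apply the Box-Cox transform to strictly positive values."""
    if any(x <= 0 for x in values):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return [math.log(x) for x in values]  # logarithmic special case
    return [(x ** lam - 1.0) / lam for x in values]


if __name__ == "__main__":
    flows = [2.0, 5.0, 10.0]     # made-up values
    print(box_cox(flows, 0.0))   # same as taking logs
    print(box_cox(flows, 0.5))
```

As the transformation parameter approaches zero the power branch converges to the logarithm, which is why the log counts as a member of the family.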

Box-Cox Normalization: So… the log looked OK ($\lambda = 0$). Is that what we really want? Let’s skip the derivations for now and look at the answer for our three proposed methods.

Determining Transformation Parameters
Trial and error: apply a series of trial lambda values and evaluate a statistic for each.
PPCC (Filliben’s statistic): R² of the best-fit line of the QQ-plot
Kolmogorov-Smirnov (KS) test (any distribution): p-value
Shapiro-Wilk test for normality: p-value

Quantiles: Rank the data $x_1, x_2, x_3, \ldots, x_n$ and assign each ranked value a cumulative probability $p_i$. Using a theoretical distribution, e.g. the standard normal, $q_i$ is the distribution-specific theoretical quantile associated with ranked data value $x_i$.
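The ranking and quantile lookup can be sketched as follows. The plotting position $p_i = (i - 0.5)/n$ is an assumption, since the slide does not say which formula is used, and the function name is hypothetical.

```python
from statistics import NormalDist


def normal_quantile_pairs(data):
    """Pair each ranked value x_i with its standard normal quantile q_i."""
    ranked = sorted(data)
    n = len(ranked)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    pairs = []
    for i, x in enumerate(ranked, start=1):
        p = (i - 0.5) / n  # assumed plotting position
        pairs.append((x, std_normal.inv_cdf(p)))
    return pairs


if __name__ == "__main__":
    for x, q in normal_quantile_pairs([310.0, 120.0, 95.0, 560.0]):  # made-up values
        print(x, round(q, 3))
```

Plotting the ranked values against their quantiles gives the QQ-plot. Points falling near a straight line suggest the data follow the theoretical distribution.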

Quantile-Quantile Plots. [Figure: QQ-plot for raw flows ($x_i$ vs. $q_i$) and QQ-plot for log-transformed flows ($\ln(x_i)$ vs. $q_i$).] A transformation is needed to make the raw flows normally distributed.

Example: Determining Transformation Parameters. Alafia River historical monthly flows. Evaluate using all three criteria. Test a range of lambda values from -2 to 2 by 0.1 for Filliben’s and KS. Test a range of lambda values from -1 to 1 by 0.1 for Shapiro-Wilk (errors for larger lambda values).
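The trial-and-error search over lambda might look like the sketch below, which uses the PPCC criterion only. It assumes the plotting position $(i - 0.5)/n$ and returns the correlation r between ranked values and normal quantiles (square it for R²). All function names are illustrative, and the input is expected to be strictly positive.

```python
import math
from statistics import NormalDist


def box_cox(values, lam):
    """Box-Cox transform of strictly positive values."""
    if lam == 0:
        return [math.log(x) for x in values]
    return [(x ** lam - 1.0) / lam for x in values]


def ppcc(values):
    """Correlation between sorted values and standard normal quantiles."""
    z = sorted(values)
    n = len(z)
    q = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    z_mean, q_mean = sum(z) / n, sum(q) / n
    cov = sum((a - z_mean) * (b - q_mean) for a, b in zip(z, q))
    var_z = sum((a - z_mean) ** 2 for a in z)
    var_q = sum((b - q_mean) ** 2 for b in q)
    return cov / math.sqrt(var_z * var_q)


def best_lambda(values, start=-2.0, stop=2.0, step=0.1):
    """Return (lambda, ppcc) with the highest PPCC on the trial grid."""
    count = int(round((stop - start) / step)) + 1
    best = None
    for k in range(count):
        lam = round(start + k * step, 10)  # avoid floating-point drift
        score = ppcc(box_cox(values, lam))
        if best is None or score > best[1]:
            best = (lam, score)
    return best


if __name__ == "__main__":
    flows = [120.0, 95.0, 310.0, 560.0, 80.0, 1500.0, 240.0, 420.0]  # made-up values
    print(best_lambda(flows))
```

Swapping `ppcc` for a KS or Shapiro-Wilk p-value would give the other two criteria with the same loop.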

Box-Cox Normality Plot for Monthly September Flows on Alafia R. Using PPCC. This is close to 0: $\lambda = -0.14$

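Kolmogorov-Smirnov Test: Specifically, it computes the largest difference between the target CDF $F_X(x)$ and the observed CDF $F^*(x)$. The test statistic $D$ is $D = \max_{1 \le i \le n} \left[ \frac{i}{n} - F_X(x_{(i)}),\ F_X(x_{(i)}) - \frac{i-1}{n} \right]$, where $x_{(i)}$ is the $i$th value of the random sample of size $n$ sorted in ascending order.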
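A minimal sketch of the statistic, assuming the usual one-sample form in which the empirical CDF steps from $(i-1)/n$ to $i/n$ at the $i$th smallest observation. The function name and the sample values are hypothetical, and the sketch returns only the statistic, not a p-value.

```python
from statistics import NormalDist


def ks_statistic(data, cdf):
    """Largest gap between the empirical CDF and a target CDF."""
    x = sorted(data)
    n = len(x)
    d = 0.0
    for i, value in enumerate(x, start=1):
        f = cdf(value)
        # the empirical CDF steps from (i - 1) / n to i / n at each value
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


if __name__ == "__main__":
    sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.7]  # made-up values
    print(ks_statistic(sample, NormalDist(0.0, 1.0).cdf))
```

Any callable CDF can be passed in, which matches the note that the KS test applies to any distribution.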

Box-Cox Normality Plot for Monthly September Flows on Alafia R. Using Kolmogorov-Smirnov (KS) Statistic. This is not as close to 0: $\lambda = -0.39$

http://www.itl.nist.gov/div898/software/dataplot/refman1/auxillar/wilkshap.htm

Box-Cox Normality Plot for Monthly September Flows on Alafia R. Using Shapiro-Wilk Statistic. This is close to 0: $\lambda = -0.14$. Same as PPCC.