Alafia River: Autocorrelation of standardized flow


Alafia River: Autocorrelation of standardized flow

Alafia River: Monthly streamflow distribution

Storage-Yield Analysis: Sequent Peak Procedure
Rt = y (a constant release equal to the yield)
Kt = Kt-1 + Rt - Qt; if Kt < 0, set Kt = 0
Required storage S = Max(Kt)
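The recursion above can be sketched in R. The inflow series and yield below are made-up numbers, and a single pass over the record is shown (in practice the record is often repeated to handle carry-over):

```r
# Sequent peak sketch: K_t = max(0, K_{t-1} + R - Q_t), S = max(K_t).
# Q is an illustrative monthly inflow series; R is a constant yield.
sequent_peak <- function(Q, R) {
  K <- numeric(length(Q))
  Kprev <- 0
  for (t in seq_along(Q)) {
    K[t] <- max(0, Kprev + R - Q[t])   # accumulated deficit, floored at zero
    Kprev <- K[t]
  }
  max(K)                               # required storage S
}

Q <- c(5, 7, 8, 4, 3, 3, 2, 1, 3, 6, 8, 9)  # illustrative inflows
sequent_peak(Q, R = 4.5)                     # S = 11 for this series
```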

Reservoir Storage-Yield Analysis
[Figure: storage-yield curves plotted against the ratio R/Q.]

Box Plot
Outliers: beyond 1.5*IQR
Whiskers: to 1.5*IQR or the largest value within it
Box: 25th percentile to 75th percentile
Line: Median (50th percentile), not the mean
Note: The range shown by the box is called the "Inter-Quartile Range" or IQR. This is a robust measure of spread: it is insensitive to outliers since it is based purely on the ranks of the values.
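The elements listed above map directly onto R's `boxplot()`. A minimal sketch with illustrative data, with one value planted beyond the upper fence:

```r
# Illustrative sample: 50 roughly normal values plus one planted outlier
set.seed(2)
x <- c(rnorm(50, mean = 100, sd = 10), 160)

q   <- quantile(x, c(0.25, 0.5, 0.75))  # box edges and median line
iqr <- IQR(x)                           # the robust spread measure

# plot = FALSE returns the boxplot statistics without drawing
b <- boxplot(x, plot = FALSE)
b$out   # points beyond the 1.5*IQR whiskers (includes the value 160)
```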

Reservoir Reliability Analysis

General function fitting

General function fitting - independent data samples
[Figure: independent data vectors, each pairing inputs (x1, x2, x3, ...) with an output y.]
Example: linear regression y = a*x + b + e

Time series function fitting: a single series x1, x2, ..., xt

Time series autoregressive function fitting - Method of delays
Sample data vectors are constructed using lagged copies of the single time series (the trajectory matrix); the number of lags is the embedding dimension:
x1 x2 x3 x4
x2 x3 x4 x5
x3 x4 x5 x6
...
xt-3 xt-2 xt-1 xt
Example: AR(1) model xt = phi*xt-1 + e
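A minimal R sketch of the method of delays, using `embed()` to build the trajectory matrix and a least-squares fit of the AR(1) coefficient; the series is synthetic and the variable names are ours:

```r
# Synthetic AR(1) series with known coefficient phi = 0.7
set.seed(1)
n <- 500
x <- as.numeric(arima.sim(model = list(ar = 0.7), n = n))

# embed() stacks lagged copies: column 1 is x_t, column 2 is x_{t-1}
traj <- embed(x, 2)    # trajectory matrix, embedding dimension 2
xt   <- traj[, 1]
xlag <- traj[, 2]

# Least-squares estimate of phi for a zero-mean series (no intercept)
phi_hat <- sum(xt * xlag) / sum(xlag^2)
phi_hat   # should land near 0.7
```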

Generating a random variable from a given distribution
Generate U from a uniform distribution between 0 and 1, then solve for X = F-1(U).
Basis: P(X < x) = P(U < F(x)) = P(F-1(U) < x), so F-1(U) is a random variable with CDF F(x).
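The inverse-CDF recipe can be sketched for a distribution whose inverse is known in closed form, here an exponential (an illustrative choice, not from the slides):

```r
# Inverse-CDF sampling for Exponential(lambda):
# F(x) = 1 - exp(-lambda*x), so F^{-1}(u) = -log(1 - u)/lambda
set.seed(42)
lambda <- 2
u <- runif(1e5)              # U ~ Uniform(0, 1)
x <- -log(1 - u) / lambda    # X = F^{-1}(U) ~ Exponential(lambda)

mean(x)   # close to the theoretical mean 1/lambda = 0.5
```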

Fitting a probability distribution to data
Hillsborough River at Zephyr Hills, September flows: mean = 8621 mgal, S = 8194 mgal, n = 31

Method of Moments
Use the sample moments as estimates of the population parameters.

Method of Moments: Gamma distribution
alpha = 1.1, lambda = 1.3 x 10^-3

Method of Moments: Log-Normal distribution
sigma = 0.643, mu = 8.29
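Both fits above use the same moment-matching idea. A sketch for the Gamma case, plugging in the sample statistics quoted for the September flows (these are the standard moment-matching formulas, not code from the slides):

```r
# Sample statistics from the September-flows slide
xbar <- 8621   # sample mean (mgal)
S    <- 8194   # sample standard deviation (mgal)

# Match the first two moments of Gamma(shape, rate):
# mean = shape/rate, variance = shape/rate^2
shape <- (xbar / S)^2   # ~1.11, matching the slide's alpha
rate  <- xbar / S^2

c(shape = shape, rate = rate)
```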

Method of Maximum Likelihood
"Back into" the estimate by assuming the parameters we are trying to estimate from the data are known. How likely are the sample values we have, given a particular set of parameter values? We can express this as the joint density of the random sample given the parameter values. After we obtain the data (the random sample), we use this joint density to define the likelihood function: each data point is treated as an independent sample from the probability distribution, so for a given distribution the likelihood measures how probable the observed sample is as a function of the parameters.

Likelihood
ln(L) = -311 (for gamma); ln(L) = -312 (for log-normal)
Could use maximization of L or ln(L) to select parameters rather than fitting moments.
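A sketch of selecting parameters by maximizing ln(L) numerically with `optim()`. The data here are synthetic (drawn with parameter values echoing the Gamma slide), and the optimization is done on log-parameters for numerical stability:

```r
# Synthetic sample standing in for the flow data
set.seed(7)
x <- rgamma(200, shape = 1.1, rate = 1.3e-3)

# Negative log-likelihood for Gamma, parameterized on the log scale
# so that both parameters live on comparable, unconstrained ranges
negloglik <- function(logp) {
  -sum(dgamma(x, shape = exp(logp[1]), rate = exp(logp[2]), log = TRUE))
}

fit <- optim(c(0, log(1e-3)), negloglik)
shape_hat <- exp(fit$par[1])
rate_hat  <- exp(fit$par[2])

c(shape_hat, rate_hat)   # MLE of (shape, rate)
-fit$value               # maximized ln(L)
```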

Normalizing Transformations and fitting a marginal distribution
Much theory relies on the central limit theorem, so it applies to normal distributions. Where the data are not normally distributed, normalizing transformations are used:
Log
Box-Cox (log is a special case of Box-Cox)
A specific PDF, e.g. Gamma
A non-parametric PDF

Approach
Select the class of distributions you want to fit, then estimate parameters using an appropriate goodness-of-fit measure:
Likelihood
PPCC (Filliben's statistic)
Kolmogorov-Smirnov p-value
Shapiro-Wilk W

Normalizing transformation for an arbitrary distribution
[Figure: an arbitrary distribution F(x) on the x-axis mapped to a normal distribution Fn(y) on the y-axis.]
Normalizing transformation: y = Fn-1(F(x))
Back transformation: x = F-1(Fn(y))
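A sketch of the quantile-mapping idea with a known F, here an exponential, so that both the normalizing and back transformations are available in closed form (the choice of distribution is illustrative):

```r
# Data with a known CDF F(x) = 1 - exp(-2x)
set.seed(3)
x <- rexp(1e4, rate = 2)

# Normalizing transformation: y = Fn^{-1}(F(x)) via pexp then qnorm
y <- qnorm(pexp(x, rate = 2))

# Back transformation: x = F^{-1}(Fn(y)) via pnorm then qexp
x_back <- qexp(pnorm(y), rate = 2)

max(abs(x - x_back))   # round-trip error is tiny
c(mean(y), sd(y))      # y is approximately standard normal
```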

Kernel Density Estimate (KDE)
Place "kernels" at each data point and sum them up. The width of the kernel determines the level of smoothing. Determining how to choose the width of the kernel could be a full-day lecture!
[Figure: individual kernels and their sum, for narrow, medium, and wide kernels.]

1-d KDE of Log-transformed Flow
[Figure: KDEs at smoothing levels 0.2, 0.5, and 0.8; the rug plot shows the location of the data points.]

Non-parametric PDF in R

# Read in Willamette R. flow data
q <- matrix(scan("willamette_data.txt"), ncol = 3, byrow = TRUE)

# Assign variables
yr   <- q[, 1]
mo   <- q[, 2]
flow <- q[, 3]

# Format flows into a matrix; focus on January and February
fmat <- matrix(flow, ncol = 12, byrow = TRUE)

# Marginal distributions: create a histogram for each month, with the actual
# streamflow data on the x-axis, and overlay a KDE of the marginal
# distribution using a Gaussian kernel and the nrd0 bandwidth
par(mfrow = c(1, 2))
for (i in 1:2) {
  x <- fmat[, i]
  hist(x, nclass = 15, main = month.name[i], xlab = "cfs", probability = TRUE)
  lines(density(x, bw = "nrd0", na.rm = TRUE), col = 2)
  rug(x, col = 2)
  box()
}

Non-parametric CDF in R

cdf.r <- function(density) {
  x  <- density$x
  yt <- cumsum(density$y)
  n  <- length(yt)
  # Force onto the range 0,1 without checking for significant error
  y <- (yt - yt[1]) / (yt[n] - yt[1])
  list(x = x, y = y)
}

dd  <- density(x, bw = "nrd0", na.rm = TRUE)
cdf <- cdf.r(dd)
plot(cdf, type = "l")

ylookup.r <- function(x, cdf) {
  int <- sum(cdf$x < x)  # identifies the interval for interpolation
  n <- length(cdf$x)
  if (int < 1) {
    y <- cdf$y[1]
  } else if (int > n - 1) {
    y <- cdf$y[n]
  } else {
    y <- ((x - cdf$x[int]) * cdf$y[int + 1] + (cdf$x[int + 1] - x) * cdf$y[int]) /
         (cdf$x[int + 1] - cdf$x[int])
  }
  return(y)
}

xlookup.r <- function(y, cdf) {
  int <- sum(cdf$y < y)  # identifies the interval for interpolation
  n <- length(cdf$y)
  if (int < 1) {
    x <- cdf$x[1]
  } else if (int > n - 1) {
    x <- cdf$x[n]
  } else {
    x <- ((y - cdf$y[int]) * cdf$x[int + 1] + (cdf$y[int + 1] - y) * cdf$x[int]) /
         (cdf$y[int + 1] - cdf$y[int])
  }
  return(x)
}

Gamma: estimate parameters using moments or maximum likelihood

Box-Cox Normalization
The Box-Cox family of transformations includes the logarithmic transformation as a special case (lambda = 0). It is defined as:
z = (x^lambda - 1)/lambda;  lambda != 0
z = ln(x);  lambda = 0
where z is the transformed data, x is the original data and lambda is the transformation parameter.
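The definition above translates directly into a small R function (the function name is ours):

```r
# Box-Cox transformation as defined on the slide; the lambda = 0 branch
# is the log transform, which is also the limit of the general formula
boxcox_z <- function(x, lambda) {
  if (lambda == 0) log(x) else (x^lambda - 1) / lambda
}

boxcox_z(10, 1)      # (10 - 1)/1 = 9
boxcox_z(10, 0)      # ln(10)
boxcox_z(10, 0.001)  # close to ln(10), showing continuity at lambda = 0
```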

Log normalization with a lower bound: z = ln(x - tau)

Determining Transformation Parameters (lambda, tau)
PPCC (Filliben's statistic): R^2 of the best-fit line of the QQ plot
Kolmogorov-Smirnov (KS) test (any distribution): p-value
Shapiro-Wilk test for normality: p-value

Quantiles
Rank the data x1 <= x2 <= ... <= xn and assign each ranked value a plotting position pi; qi is the distribution-specific theoretical quantile (e.g. standard normal) associated with ranked data value xi.

Quantile-Quantile Plots
[Figure: QQ plot of raw flows (xi against qi) and of log-transformed flows (ln(xi) against qi).]
A transformation is needed to make the raw flows normally distributed.
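A sketch of the PPCC (Filliben's statistic) behind these plots: the correlation between the ranked data and normal quantiles at the Filliben plotting positions. The flows here are synthetic log-normal draws, so the log transform should move the statistic toward 1:

```r
# Synthetic flows standing in for the river data
set.seed(5)
flow <- rlnorm(50, meanlog = 9, sdlog = 0.8)

# PPCC: correlation of sorted data with normal quantiles at
# Filliben's plotting positions
ppcc <- function(x) {
  n <- length(x)
  i <- 1:n
  p <- ifelse(i == 1, 1 - 0.5^(1 / n),
       ifelse(i == n, 0.5^(1 / n),
              (i - 0.3175) / (n + 0.365)))
  cor(sort(x), qnorm(p))
}

ppcc(flow)        # raw flows: typically further from 1
ppcc(log(flow))   # log-transformed flows: closer to 1
```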

Box-Cox Normality Plot for Monthly September Flows on Alafia R. Using PPCC
This is close to 0: lambda = -0.14

Kolmogorov-Smirnov Test
Specifically, it computes the largest difference between the target CDF FX(x) and the observed (empirical) CDF F*(x). The test statistic Dn is:
Dn = max over i of max{ i/n - FX(X(i)), FX(X(i)) - (i-1)/n }
where X(i) is the ith smallest observed value in the random sample of size n.
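The statistic can be checked against `ks.test()` by computing the formula above over the order statistics; the sample is synthetic and the log-normal parameters are illustrative:

```r
# Synthetic sample of size 31, echoing the September-flows slide
set.seed(11)
flow <- rlnorm(31, meanlog = 8.9, sdlog = 0.8)

n  <- length(flow)
Fi <- plnorm(sort(flow), meanlog = 8.9, sdlog = 0.8)  # F_X at the order statistics

# D_n = max_i max{ i/n - F(X(i)), F(X(i)) - (i-1)/n }
D_manual <- max(pmax((1:n) / n - Fi, Fi - (0:(n - 1)) / n))

ks <- ks.test(flow, "plnorm", meanlog = 8.9, sdlog = 0.8)
all.equal(D_manual, unname(ks$statistic))   # the two agree
```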

Box-Cox Normality Plot for Monthly September Flows on Alafia R. Using the Kolmogorov-Smirnov (KS) Statistic
This is not as close to 0: lambda = -0.39

shapiro.test(x) in R http://www.itl.nist.gov/div898/software/dataplot/refman1/auxillar/wilkshap.htm
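A minimal sketch of `shapiro.test()` on raw versus log-transformed flows; the data are synthetic log-normal draws, so the W statistic should improve after taking logs:

```r
# Synthetic flows: log-normal, so log(flow) is exactly normal
set.seed(9)
flow <- rlnorm(31, meanlog = 9, sdlog = 0.8)

w_raw <- shapiro.test(flow)$statistic       # W for the skewed raw flows
w_log <- shapiro.test(log(flow))$statistic  # W for the log-transformed flows

c(w_raw, w_log)   # W closer to 1 indicates a sample closer to normal
```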

Box-Cox Normality Plot for Monthly September Flows on Alafia R. Using Shapiro-Wilks Statistic This is close to 0,  = -0.14. Same as PPCC.

Testing simulated marginal distributions

Testing correlation and skewness

Testing state dependent correlations