Published by Vincent Nelson. Modified over 9 years ago.
1
Statistics in WR: Lecture 1
Key Themes
– Knowledge discovery in hydrology
– Introduction to probability and statistics
– Definition of random variables
Reading: Helsel and Hirsch, Chapter 1
2
How is new knowledge discovered?
After completing the Handbook of Hydrology in 1993, I asked myself the question: how is new knowledge discovered in hydrology? I concluded:
– By deduction from existing knowledge
– By experiment in a laboratory
– By observation of the natural environment
3
Deduction – Isaac Newton
Deduction is the classical path of mathematical physics:
– Given a set of axioms
– Then by a logical process
– Derive a new principle or equation
In hydrology, the St Venant equations for open channel flow and Richards' equation for unsaturated flow in soils were derived in this way.
(1687) Three laws of motion and law of gravitation
http://en.wikipedia.org/wiki/Isaac_Newton
4
Experiment – Louis Pasteur
Experiment is the classical path of laboratory science: a simplified view of the natural world is replicated under controlled conditions.
In hydrology, Darcy’s law for flow in a porous medium was found this way.
Pasteur showed that microorganisms cause disease and developed vaccines. Foundations of scientific medicine.
http://en.wikipedia.org/wiki/Louis_Pasteur
5
Observation – Charles Darwin
Observation is the direct viewing and characterization of patterns and phenomena in the natural environment.
In hydrology, Horton discovered stream scaling laws by interpretation of stream maps.
On the Origin of Species, published Nov 24, 1859: most accessible book of great scientific imagination ever written.
6
Mean Annual Flow
7
Is there a relation between flow and water quality? Total Nitrogen in water
8
Are Annual Flows Correlated?
9
CE 397 Statistics in Water Resources, Lecture 2, 2009. David R. Maidment, Dept of Civil Engineering, University of Texas at Austin
10
Key Themes
– Statistics: parametric and non-parametric approaches
– Data visualization
– Distribution of data and the distribution of statistics of those data
Reading: Helsel and Hirsch pp. 17-51 (Sections 2.1 to 2.3)
Slides from Helsel and Hirsch (2002), “Techniques of water resources investigations of the USGS, Book 4, Chapter A3”
11
Characteristics of Water Resources Data
– Lower bound of zero
– Presence of “outliers”
– Positive skewness
– Non-normal distribution of data
– Data measured with thresholds (e.g. detection limits)
– Seasonal and diurnal patterns
– Autocorrelation: consecutive measurements are not independent
– Dependence on other uncontrolled variables, e.g. chemical concentration is related to discharge
12
Normal Distribution (from Helsel and Hirsch, 2002)
13
Lognormal Distribution (from Helsel and Hirsch, 2002)
14
Method of Moments (from Helsel and Hirsch, 2002)
15
Statistical measures
Location (Central Tendency)
– Mean
– Median
– Geometric mean
Spread (Dispersion)
– Variance
– Standard deviation
– Interquartile range
Skewness (Symmetry)
– Coefficient of skewness
Kurtosis (Flatness)
– Coefficient of kurtosis
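The measures listed on this slide can be computed with Python's standard library. The sketch below is only an illustration: the function name `describe` is hypothetical, and the skewness and kurtosis coefficients use the simple moment ratios (divide-by-n moments), which is an assumption since the slide does not say which variant is intended.

```python
import statistics


def describe(values):
    """Return location, spread, skewness and kurtosis measures for a sample."""
    n = len(values)
    mean = statistics.fmean(values)
    # Quartiles; the interquartile range is the third minus the first quartile.
    q1, _, q3 = statistics.quantiles(values, n=4)
    # Central moments with a divide-by-n convention (assumed variant).
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "mean": mean,
        "median": statistics.median(values),
        "geometric_mean": statistics.geometric_mean(values),  # needs values > 0
        "variance": statistics.variance(values),   # sample variance (n - 1)
        "std_dev": statistics.stdev(values),       # sample standard deviation
        "iqr": q3 - q1,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }


if __name__ == "__main__":
    flows = [1.0, 2.0, 4.0, 8.0, 16.0]  # made-up, positively skewed values
    for name, value in describe(flows).items():
        print(f"{name}: {value:.4f}")
```

For positively skewed data such as these made-up values, the mean exceeds the median, which is one reason the median and geometric mean are often preferred as location measures for water resources data.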
16
Histogram: Annual Streamflow for the Licking River at Catawba, Kentucky (03253500)
From Helsel and Hirsch (2002)
17
Quantile Plot (from Helsel and Hirsch, 2002)
18
Plotting positions
– i = rank of the data, with i = 1 the lowest
– n = number of data
– p = cumulative probability or “quantile” of the data value (its percentile value)
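The plotting-position formula itself did not survive in this transcript. As a sketch under that caveat, the code below uses the general form p = (i − a) / (n + 1 − 2a). The default a = 0.4 gives the Cunnane formula, and a = 0 gives the Weibull formula i / (n + 1). Which one the slide showed is an assumption; the function name `plotting_positions` is hypothetical.

```python
def plotting_positions(values, a=0.4):
    """Pair each sorted value with its plotting position p = (i - a) / (n + 1 - 2a).

    a = 0.4 is the Cunnane formula (assumed default); a = 0 is Weibull.
    """
    n = len(values)
    ranked = sorted(values)  # rank i = 1 is the lowest value
    return [(x, (i - a) / (n + 1 - 2 * a)) for i, x in enumerate(ranked, start=1)]


if __name__ == "__main__":
    annual_flows = [30.0, 10.0, 20.0]  # made-up values
    for value, p in plotting_positions(annual_flows):
        print(f"{value:6.1f}  p = {p:.3f}")
```

Plotting the sorted values against p gives a quantile plot. Converting p to a standard normal quantile first gives a normal probability plot.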
19
Normal Distribution Quantile Plot (from Helsel and Hirsch, 2002)
20
Probability Plot with Normal Quantiles (Z values), with axes q and z (from Helsel and Hirsch, 2002)
21
Annual Flows From HydroExcel
Annual Flows produced using Pivot Tables in Excel
22
23
CE 397 Statistics in Water Resources, Lecture 3, 2009. David R. Maidment, Dept of Civil Engineering, University of Texas at Austin
24
Key Themes
– Using HydroExcel for accessing water resources data using web services
– Descriptive statistics and histograms using Excel Analysis Toolpak
Reading: Chapter 11 of Applied Hydrology by Chow, Maidment and Mays
25
CE 397 Statistics in Water Resources, Lecture 4, 2009. David R. Maidment, Dept of Civil Engineering, University of Texas at Austin
26
Key Themes
– Frequency and probability functions
– Fitting methods
– Typical distributions
Reading: Chapter 4 of Helsel and Hirsch, pp. 97-116, on Hypothesis tests
27
28
Method of Moments
29
Maximum Likelihood
30
CE 397 Statistics in Water Resources, Lecture 5, 2009. David R. Maidment, Dept of Civil Engineering, University of Texas at Austin
31
Key Themes
– Using Excel to fit frequency and probability distributions
– Chi Square test and probability plotting
– Beginning hypothesis testing
Reading: Chapter 3 of Helsel and Hirsch, pp. 65-97, on Describing Uncertainty
Slides from Helsel and Hirsch, Chap. 4
32
33
Statistics in Water Resources, Lecture 6
Key Themes
– t-distribution for cases where the standard deviation is unknown
– Hypothesis testing
– Comparing two sets of data to see if they are different
Reading: Helsel and Hirsch, Chapter 6, Matched Pair Tests
34
Chi-Square Distribution http://en.wikipedia.org/wiki/Chi-square_distribution
35
t-, z and ChiSquare Source: http://en.wikipedia.org/wiki/Student's_t-distribution
36
Normal and t-distributions
[Figure: the normal distribution compared with t-distributions for ν = 1, 2, 3, 5, 10 and 30]
37
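Standard Normal and Student-t
Standard Normal z
– If $X_1, \ldots, X_n$ are independent and normally distributed with mean $\mu$ and standard deviation $\sigma$, and $\bar{X}_n$ is their sample mean,
– then $Z = (\bar{X}_n - \mu) / (\sigma / \sqrt{n})$ is normally distributed with mean 0 and std dev 1
Student’s t-distribution
– Applies to the case where the true standard deviation $\sigma$ is unknown and is replaced by its sample estimate $S_n$, so that $T = (\bar{X}_n - \mu) / (S_n / \sqrt{n})$ follows a t-distribution with $n - 1$ degrees of freedom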
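The difference between the two statistics on this slide is only the standard deviation used in the denominator. A minimal sketch follows; the function names and sample values are hypothetical.

```python
import math
import statistics


def z_statistic(sample, mu, sigma):
    """Standardized sample mean when the true standard deviation sigma is known."""
    n = len(sample)
    return (statistics.fmean(sample) - mu) / (sigma / math.sqrt(n))


def t_statistic(sample, mu):
    """Same quantity with sigma replaced by the sample estimate S_n (n - 1 divisor)."""
    n = len(sample)
    s_n = statistics.stdev(sample)
    return (statistics.fmean(sample) - mu) / (s_n / math.sqrt(n))


if __name__ == "__main__":
    sample = [2.0, 4.0, 6.0, 8.0]  # made-up observations
    print("z =", z_statistic(sample, mu=4.0, sigma=2.0))
    print("t =", t_statistic(sample, mu=4.0), "with", len(sample) - 1, "degrees of freedom")
```

The t value is compared against a t-distribution with n − 1 degrees of freedom, which has heavier tails than the standard normal for small n.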
38
The p-value is the probability of obtaining a test statistic at least as extreme as the one observed if the null hypothesis (H0) is true
– If the p-value is smaller than the significance level α (typically 0.05 or 0.025), then reject H0
– If the p-value is larger than α, then do not reject H0
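The decision rule above can be sketched for a test statistic that follows the standard normal distribution. This is an illustrative assumption: for a t statistic the same logic applies, but with the t-distribution CDF, which the Python standard library does not provide. The function names are hypothetical.

```python
from statistics import NormalDist


def p_value_z(z, tail="two-sided"):
    """p-value for a standard normal test statistic z.

    tail: "upper" or "lower" for a one-sided test, anything else for two-sided.
    """
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "upper":
        return 1.0 - std_normal.cdf(z)
    if tail == "lower":
        return std_normal.cdf(z)
    return 2.0 * (1.0 - std_normal.cdf(abs(z)))


def decide(p_value, alpha=0.05):
    """Apply the rejection rule at significance level alpha."""
    return "reject H0" if p_value < alpha else "do not reject H0"


if __name__ == "__main__":
    for z in (1.0, 1.96, 2.5):
        p = p_value_z(z)
        print(f"z = {z:4.2f}  two-sided p = {p:.4f}  ->  {decide(p)}")
```

A one-sided test puts all of α in one tail, so the same z gives half the two-sided p-value when it falls in the tested direction.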
39
One-sided test
40
Two-sided test
41
Statistics in WR: Lecture 7
Key Themes
– Statistics for populations and samples
– Suspended sediment sampling
– Testing for differences in means and variances
Reading: Helsel and Hirsch, Chapter 8, Correlation
42
Estimators of the Variance
– Maximum likelihood estimate for the population variance
– Unbiased estimate from a sample
http://en.wikipedia.org/wiki/Variance
43
Bias in the Variance
Common sense would suggest applying the population formula to the sample as well. That estimate is biased because the sample mean is generally somewhat closer to the observations in the sample than the population mean is. The sample mean is by definition in the middle of the sample, while the population mean may even lie outside the sample. The deviations from the sample mean will therefore often be smaller than the deviations from the population mean, so if the same formula is applied to both, the variance estimate will on average be somewhat smaller in the sample than in the population.
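The two estimators differ only in their divisor: n for the population (maximum likelihood) formula and n − 1 for the unbiased sample estimate. A minimal sketch, with a hypothetical function name and made-up data:

```python
import statistics


def variance_estimates(sample):
    """Return (divide-by-n estimate, divide-by-(n - 1) estimate) for a sample."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sum_sq = sum((x - mean) ** 2 for x in sample)  # squared deviations from the sample mean
    return sum_sq / n, sum_sq / (n - 1)


if __name__ == "__main__":
    data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # made-up values
    biased, unbiased = variance_estimates(data)
    print("divide by n:    ", biased)
    print("divide by n - 1:", unbiased)
```

The divide-by-n value is always the smaller of the two by the factor (n − 1)/n, which is the correction for the bias described above.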
44
Suspended Sediment Sampling http://pubs.usgs.gov/sir/2005/5077/
45
T-test with same variances
46
T-test with different variances
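The formulas on these two slides are not in the transcript. As a sketch, assuming the standard two-sample statistics are meant: the equal-variance test pools the two sample variances, while the unequal-variance (Welch) test keeps them separate and uses the Welch-Satterthwaite approximation for the degrees of freedom. The function names are hypothetical.

```python
import math
import statistics


def pooled_t(x, y):
    """t statistic and degrees of freedom assuming equal population variances."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    diff = statistics.fmean(x) - statistics.fmean(y)
    return diff / math.sqrt(pooled_var * (1 / nx + 1 / ny)), nx + ny - 2


def welch_t(x, y):
    """t statistic and approximate degrees of freedom for unequal variances."""
    nx, ny = len(x), len(y)
    vx = statistics.variance(x) / nx  # squared standard error of each mean
    vy = statistics.variance(y) / ny
    diff = statistics.fmean(x) - statistics.fmean(y)
    df = (vx + vy) ** 2 / (vx ** 2 / (nx - 1) + vy ** 2 / (ny - 1))
    return diff / math.sqrt(vx + vy), df


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up samples
    b = [2.0, 4.0, 6.0, 8.0, 10.0]
    print("pooled:", pooled_t(a, b))
    print("Welch: ", welch_t(a, b))
```

With equal sample sizes the two t statistics coincide and only the degrees of freedom differ.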
47
Statistics in WR: Lecture 8
Key Themes
– Replication in Monte Carlo experiments
– Testing paired differences and analysis of variance
– Correlation
Reading: Helsel and Hirsch, Chapter 9, Simple Regression
48
Statistics of Mean of Replicated Series
49
Patterns of data that all have correlation between x and y of 0.7
50
Monotonic nonlinear correlation Linear correlation Non-monotonic correlation
51
Statistics in WR: Lecture 9
Key Themes
– Using SAS to compute cross-correlation between two data series
– Using Excel to compute autocorrelation of a single data series
– Correlation length and the influence of the data interval on it
– Lagged cross-correlation between rainfall and flow
Reading: Helsel and Hirsch, Chapter 12, Trend Analysis
52
Correlation
– Correlation (or cross-correlation) measures the association between two sets of data (x, y)
– Autocorrelation measures the correlation of a dataset with lagged or displaced values of itself (either in time or space), e.g. x(t) with x(t – L), where L is the lag time
– Lagged cross-correlation measures the association between one series y(t) and lagged values of another series x(t – L)
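The three definitions above reduce to one operation: pair y(t) with x(t − L) and compute the correlation of the pairs. The sketch below does this with the Pearson coefficient on the overlapping part of the two series. That is one simple estimator and an assumption here; other autocorrelation estimators use the full-series mean and variance. The function names and data are hypothetical.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_cross_correlation(x, y, lag):
    """Correlation between y(t) and x(t - lag) for an integer lag >= 0."""
    if lag == 0:
        return pearson(x, y)
    return pearson(x[:-lag], y[lag:])  # drop the non-overlapping ends


def autocorrelation(x, lag):
    """Correlation of a series with itself shifted by lag."""
    return lagged_cross_correlation(x, x, lag)


if __name__ == "__main__":
    rain = [0.0, 5.0, 0.0, 0.0, 8.0, 0.0, 2.0, 0.0]  # made-up rainfall
    flow = [0.0, 0.0, 5.0, 0.0, 0.0, 8.0, 0.0, 2.0]  # rainfall delayed by one step
    for lag in range(3):
        print(lag, round(lagged_cross_correlation(rain, flow, lag), 3))
```

In this made-up case the cross-correlation peaks at lag 1, the delay built into the flow series.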
53
Statistics in WR: Lecture 10
Key Themes
– Trend analysis using Simple Linear Regression
– Characterization of outliers
– Multiple Linear Regression
Reading: Helsel and Hirsch, Chapter 11, Multiple Linear Regression
Slides are from Helsel and Hirsch, Chapter 9
54
H&H p.222
55
H&H p.226 Regression Formulas
56
H&H p.227 Regression Formulas
57
Statistics in WR: Lecture 11
Key Themes
– Simple Linear Regression
– Derivation of the normal equations
– Multiple Linear Regression
Reading: Helsel and Hirsch, Chapter 7, Comparing several independent groups
Reading: Barnett, Environmental Statistics, Chapter 10, Time series methods
Slides are from Helsel and Hirsch, Chapter 9
58
Regression Assumptions
59
Formulas used in the derivation of the normal equations
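The slide's formulas are not in the transcript. As a sketch of where the derivation ends up for simple linear regression y = b0 + b1 x: minimizing the sum of squared residuals gives the two normal equations n·b0 + b1·Σx = Σy and b0·Σx + b1·Σx² = Σxy, which the code below solves directly. The function name and data are hypothetical.

```python
def fit_line(x, y):
    """Solve the two normal equations of simple linear regression for (b0, b1)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xx = sum(a * a for a in x)
    sum_xy = sum(a * b for a, b in zip(x, y))
    # Eliminate b0 between the two equations to obtain the slope.
    b1 = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    # Back-substitute into n*b0 + b1*sum_x = sum_y for the intercept.
    b0 = (sum_y - b1 * sum_x) / n
    return b0, b1


if __name__ == "__main__":
    log_q = [1.0, 2.0, 3.0, 4.0]  # made-up predictor values
    tds = [3.0, 5.0, 7.0, 9.0]    # made-up response values
    intercept, slope = fit_line(log_q, tds)
    print(f"y = {intercept:.3f} + {slope:.3f} x")
```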
60
(1a) Plot the Data: TDS vs LogQ
61
(2) Interpret Regression Statistics
62
A good set of Residuals
63
Multiple Linear Regression
64
Simple vs Complex regression models
65
F-distribution
http://en.wikipedia.org/wiki/F-test
“If U is a Chisquare random variable with m degrees of freedom, V is a Chisquare random variable with n degrees of freedom, and if U and V are independent, then the ratio (U/m)/(V/n) has an F-distribution with (m, n) degrees of freedom.” Haan, Statistical Methods in Hydrology, p.122
The values of the F-statistic are tabulated at: http://www.itl.nist.gov/div898/handbook/eda/section3/eda3673.htm
66
Statistics in WR: Lecture 12
Key Themes
– Regression y|x and x|y
– Adjusted R²
– Time series and seasonal variations
67
R² and Adjusted R²
SUMMARY OUTPUT
Regression Statistics
Multiple R           0.950344
R Square             0.903154   (0.903154347)
Adjusted R Square    0.898543   (0.89854265)
Standard Error       159033.1
Observations         23
ANOVA
                   df   SS            MS            F          Significance F
Regression          1   4.95309E+12                 195.8399   4.07E-12
Residual (error)   21   5.31122E+11   25291521454
Total (y)          22   5.48421E+12
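The regression statistics in this output follow from the ANOVA sums of squares. The sketch below recomputes them from the regression and residual SS reported above, using the standard definitions R² = 1 − SS_residual/SS_total and adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1). The function name is hypothetical.

```python
import math


def regression_summary(ss_regression, ss_residual, n_obs, n_predictors=1):
    """Recompute R^2, adjusted R^2, standard error and F from ANOVA sums of squares."""
    ss_total = ss_regression + ss_residual
    df_residual = n_obs - n_predictors - 1
    r2 = 1.0 - ss_residual / ss_total
    adj_r2 = 1.0 - (1.0 - r2) * (n_obs - 1) / df_residual
    ms_residual = ss_residual / df_residual
    f_stat = (ss_regression / n_predictors) / ms_residual
    return {
        "r2": r2,
        "adj_r2": adj_r2,
        "std_error": math.sqrt(ms_residual),  # standard error of the regression
        "f": f_stat,
    }


if __name__ == "__main__":
    # Sums of squares and observation count taken from the summary output.
    summary = regression_summary(4.95309e12, 5.31122e11, n_obs=23)
    for name, value in summary.items():
        print(f"{name}: {value:.6g}")
```

Adjusted R² is always a little below R² because it penalizes each predictor by one degree of freedom, which matters when comparing simple and complex regression models.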
68
Time Series Trend: Tide Levels at San Diego http://tidesandcurrents.noaa.gov/sltrends/sltrends_station.shtml?stnid=9410170%20San%20Diego,%20CA
69
One harmonic
70
Five harmonics http://en.wikipedia.org/wiki/Fourier_series
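The one-harmonic and five-harmonic fits can be sketched as a truncated Fourier series. The code below assumes the data are equally spaced and cover exactly one full cycle (for example 12 monthly means or 24 hourly means), and that the number of harmonics is less than half the number of points. Under those assumptions the coefficients are simple sums. The function names and data are hypothetical.

```python
import math


def harmonic_coefficients(y, n_harmonics):
    """Return the mean a0 and a list of (a_k, b_k) pairs for k = 1..n_harmonics."""
    n = len(y)
    a0 = sum(y) / n
    coeffs = []
    for k in range(1, n_harmonics + 1):
        a_k = 2.0 / n * sum(v * math.cos(2 * math.pi * k * t / n) for t, v in enumerate(y))
        b_k = 2.0 / n * sum(v * math.sin(2 * math.pi * k * t / n) for t, v in enumerate(y))
        coeffs.append((a_k, b_k))
    return a0, coeffs


def reconstruct(a0, coeffs, n):
    """Evaluate the truncated Fourier series at t = 0..n-1."""
    return [
        a0 + sum(a_k * math.cos(2 * math.pi * k * t / n)
                 + b_k * math.sin(2 * math.pi * k * t / n)
                 for k, (a_k, b_k) in enumerate(coeffs, start=1))
        for t in range(n)
    ]


if __name__ == "__main__":
    # Made-up seasonal series: a mean plus an annual and a semi-annual cycle.
    series = [10 + 3 * math.cos(2 * math.pi * t / 12) + math.sin(4 * math.pi * t / 12)
              for t in range(12)]
    mean, one = harmonic_coefficients(series, 1)
    print("mean:", round(mean, 3), "first harmonic:", [round(c, 3) for c in one[0]])
```

Adding harmonics lets the fitted curve follow sharper features of the cycle, at the cost of two more coefficients per harmonic.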
71
Statistics in WR: Lecture 13
Key Themes
– ANOVA for sediment data
– Fourier series for diurnal cycles
– Fourier series for seasonal cycles
72
Analysis of Variance (ANOVA) Assumptions There are several variants (one factor, two factor, two factor with replication). We will deal just with One Factor ANOVA
73
Single Factor ANOVA
75
ANOVA Formulas
76
Single Factor ANOVA
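The ANOVA formulas are not in the transcript. As a sketch of the standard single factor calculation: the total variation is split into a between-group sum of squares (group means about the overall mean) and a within-group sum of squares (observations about their own group mean), and F is the ratio of the two mean squares. The function name and data are hypothetical.

```python
import statistics


def one_way_anova(groups):
    """Return sums of squares, degrees of freedom and F for single factor ANOVA."""
    all_values = [v for group in groups for v in group]
    grand_mean = statistics.fmean(all_values)
    group_means = [statistics.fmean(group) for group in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "f": f_stat,
    }


if __name__ == "__main__":
    # Made-up observations for three groups.
    result = one_way_anova([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
    print(result)
```

The computed F is compared with the F-distribution for (k − 1, N − k) degrees of freedom, where k is the number of groups and N the total number of observations.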
77
Groups of Sediment Load Data (Ex3)
– TWDB mean: 189,000 ton/yr
– USGS1 mean: 218,000 ton/yr
– USGS2 mean: 97,000 ton/yr
– Overall mean: 183,000 ton/yr
[Figure axis values: zero, 480,000, 3.5 x 10^6, 5.5 x 10^6]