Published by Erik Stone. Modified over 9 years ago.
1
BASIC STATISTICAL CONCEPTS
Statistical Moments & Probability Density Functions
The ocean is not “stationary”
“Stationary”: statistical properties remain constant in time
Data collected have signal and noise
Both signal and noise are assumed to have random behavior
Population, Sample
2
Most basic descriptive parameter for any set of measurements: the Sample Mean
$\bar{y} = \frac{1}{N} \sum_{i=1}^{N} y_i$
taken over the duration of a time series (“time average”) or over an ensemble of measurements (“ensemble mean”)
The sample mean is an unbiased estimate of the population mean, μ
The population mean, μ, can be regarded as the expected outcome E(y) of an event y. If the measurement is executed many times, μ would be the average outcome, i.e., it would be E(y) (e.g. the weight printed on a bag of chips)
3
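Sample Mean: locates the center of mass of the data distribution such that:
$\sum_{i=1}^{N} (y_i - \bar{y}) = 0$
Weighted Sample Mean:
$\bar{y} = \sum_{i=1}^{M} f_i \, y_i$
where $f_i$ is the relative frequency of occurrence of the $i$-th value and $M$ is the number of distinct values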
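The two properties on this slide can be checked numerically. The following is a minimal sketch, assuming a small made-up data set; the function name `weighted_mean` is illustrative and not from the slides.

```python
from collections import Counter

def weighted_mean(values):
    """Mean built from distinct values weighted by their relative frequency."""
    n = len(values)
    counts = Counter(values)  # number of occurrences of each distinct value
    return sum((count / n) * value for value, count in counts.items())

data = [2.0, 2.0, 3.0, 5.0]          # made-up sample, for illustration only
plain_mean = sum(data) / len(data)   # ordinary sample mean
wmean = weighted_mean(data)          # same quantity via relative frequencies

# Deviations from the sample mean sum to zero ("center of mass" property).
residual_sum = sum(y - plain_mean for y in data)
```

Both routes give the same mean because weighting each distinct value by its relative frequency is just a regrouping of the ordinary sum.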
4
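Variance: describes spread about the mean, or sample variability
Sample variance: $s'^2 = \frac{1}{N} \sum_{i=1}^{N} (y_i - \bar{y})^2$
Sample standard deviation: $s' = \sqrt{s'^2}$, the typical difference from the mean
Population variance (unbiased): $s^2 = \frac{1}{N-1} \sum_{i=1}^{N} (y_i - \bar{y})^2$
N needs to be > 1 to define the variance and standard deviation
Only for N < 30 are $s'$ and $s$ significantly different
Computationally more efficient form (only one pass through the data): $s^2 = \frac{1}{N-1} \left[ \sum_{i=1}^{N} y_i^2 - \frac{1}{N} \left( \sum_{i=1}^{N} y_i \right)^2 \right]$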
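The variance definitions on this slide can be sketched as follows: the sample variance (divide by N), the unbiased estimate (divide by N - 1), and the one-pass form that accumulates only the sum and the sum of squares. The function names and the data are assumptions made for illustration.

```python
def sample_variance(y):
    """Sample variance: mean squared deviation, dividing by N."""
    n = len(y)
    mean = sum(y) / n
    return sum((v - mean) ** 2 for v in y) / n

def unbiased_variance(y):
    """Unbiased estimate of the population variance, dividing by N - 1."""
    n = len(y)
    if n < 2:
        raise ValueError("N must be > 1 to define the variance")
    mean = sum(y) / n
    return sum((v - mean) ** 2 for v in y) / (n - 1)

def unbiased_variance_one_pass(y):
    """Same estimate from running sums, with a single pass through the data."""
    n, total, total_sq = 0, 0.0, 0.0
    for v in y:
        n += 1
        total += v
        total_sq += v * v
    if n < 2:
        raise ValueError("N must be > 1 to define the variance")
    return (total_sq - total * total / n) / (n - 1)

data = [1.0, 2.0, 3.0, 4.0, 5.0]     # made-up sample, for illustration only
s_prime_sq = sample_variance(data)
s_sq = unbiased_variance(data)
s_sq_fast = unbiased_variance_one_pass(data)
```

The one-pass form subtracts two large, nearly equal numbers when the mean is large compared with the spread, so it can lose precision in floating point; the two-pass form is safer in that case.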
5
The population variance estimate has one degree of freedom (d.o.f.) fewer than the sample variance, because we estimate the population variance from the sample using the sample mean (one less independent measure)
d.o.f., ν := number of independent pieces of data being used to make a calculation
= measure of how certain we are that our sample is representative of the entire population
The larger ν is, the more certain we are that our sample represents the entire population
Example: we have 2 observations. When estimating the mean we have 2 independent observations: ν = 2
But when estimating the variance we have one independent observation, because the two observations are at the same distance from the mean: ν = 1
6
Other Values of Importance
Range (1.27): maximum 0.66, minimum -0.61
Median (equal number of values above and below) = -0.007
Mode: value occurring most often
N = 1601
7
Mode = -0.3
Two modes: bimodal
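Range, median and mode can be computed with Python's standard `statistics` module. This is a minimal sketch on made-up values, not the 1601-point series shown on the slides.

```python
import statistics

data = [-0.3, 0.1, -0.3, 0.4, 0.0, -0.6, 0.7]   # made-up values, for illustration only

data_range = max(data) - min(data)    # range: maximum minus minimum
median = statistics.median(data)      # equal number of values above and below
modes = statistics.multimode(data)    # list of the most frequent value(s)

# A sample with two equally frequent values is bimodal.
bimodal_modes = statistics.multimode([1, 1, 2, 2, 3])
```

`statistics.multimode` returns every value tied for the highest count, which makes it convenient for detecting bimodal samples; for continuous measurements the data would normally be binned into a histogram first.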
8
Probability
Provides procedures to infer the population distribution from the sample distribution and to determine how good the inference is
The probability of a particular event occurring is the ratio of the number of occurrences of that event to the total number of occurrences of all possible events
P (a die showing ‘6’) = 1/6
The probability of a continuous variable is defined by a PROBABILITY DENSITY FUNCTION (PDF)
0 ≤ P(x) ≤ 1
9
Probability is measured by the area underneath the PDF, $f(x)$: $P(a \le x \le b) = \int_a^b f(x) \, dx$
11
Probability Density Function: Gauss, or Normal, or Bell
Within ±1σ: $\mathrm{erf}(1/\sqrt{2})$ = 68.3%
Within ±2σ: $\mathrm{erf}(2/\sqrt{2})$ = 95.4%
Within ±3σ: $\mathrm{erf}(3/\sqrt{2})$ = 99.7%
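The three percentages can be reproduced with the error function from Python's `math` module, and cross-checked as an area under the standard normal PDF with `statistics.NormalDist`. The helper name `prob_within` is an assumption made for this sketch.

```python
import math
from statistics import NormalDist

def prob_within(k):
    """Probability that a normal variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

p1 = prob_within(1)   # about 0.683
p2 = prob_within(2)   # about 0.954
p3 = prob_within(3)   # about 0.997

# Same quantity as an area under the standard normal PDF between -1 and +1.
std_normal = NormalDist(mu=0.0, sigma=1.0)
area_1sigma = std_normal.cdf(1.0) - std_normal.cdf(-1.0)
```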
12
Probability Density Function: Gauss, or Normal, or Bell
Standardized normal variable: $z = (y - \mu)/\sigma$
Within ±1σ: 68.3%
Within ±2σ: 95.4%
Within ±3σ: 99.7%
13
Probability Density Function: Gamma
β = 1
Curves for α = 1, 2, 3, 4
14
Probability Density Function: Gamma
β = 2
Curves for α = 1, 2, 3, 4
15
Probability Density Function: Chi Square
α = ν/2
Special case of the Gamma distribution for β = 2
Curves for ν = 2, 4, 6, 8 (variances 2ν = 4, 8, 12, 16)
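The relation between the Gamma and Chi Square densities can be sketched in a few lines. This assumes the common shape/scale parameterization, $f(x) = x^{\alpha-1} e^{-x/\beta} / (\beta^{\alpha} \Gamma(\alpha))$ for x > 0; the names `alpha` (shape), `beta` (scale) and `nu` (degrees of freedom) are assumptions, since the parameter symbols did not survive in the slide text.

```python
import math

def gamma_pdf(x, alpha, beta):
    """Gamma density with shape alpha and scale beta (assumed parameterization)."""
    if x <= 0:
        return 0.0  # density taken as zero outside x > 0 in this sketch
    return x ** (alpha - 1) * math.exp(-x / beta) / (beta ** alpha * math.gamma(alpha))

def chi2_pdf(x, nu):
    """Chi-square density with nu degrees of freedom: Gamma with alpha = nu/2, beta = 2."""
    return gamma_pdf(x, nu / 2, 2.0)

# Evaluate the four chi-square curves (nu = 2, 4, 6, 8) at one point, x = 2.
values_at_2 = {nu: chi2_pdf(2.0, nu) for nu in (2, 4, 6, 8)}
```

For ν = 2 the density reduces to a simple exponential, $\frac{1}{2} e^{-x/2}$, which gives a quick sanity check on the implementation.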
16
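CONFIDENCE INTERVALS
Confidence Interval for μ with σ known
For N > 30 (large enough sample), the 100(1 - α)% confidence interval is:
$\bar{y} - z_{\alpha/2} \frac{\sigma}{\sqrt{N}} < \mu < \bar{y} + z_{\alpha/2} \frac{\sigma}{\sqrt{N}}$
where $z_{\alpha/2}$ is the value of the standardized normal variable with cumulative probability 1 - α/2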
17
(1 - α/2) = 0.975
http://statistics.laerd.com/statistical-guides/normal-distribution-calculations.php
$z_{\alpha/2}$ = 1.96
18
The 100(1 - α)% C.I. is:
$\bar{y} - z_{\alpha/2} \frac{\sigma}{\sqrt{N}} < \mu < \bar{y} + z_{\alpha/2} \frac{\sigma}{\sqrt{N}}$
If α = 0.05, $z_{\alpha/2}$ = 1.96
Suppose we have a CT sensor at the outlet of a spring into the ocean. We obtain a burst sample of 50 measurements, once per second, with a sample mean of 26.5 °C and a standard deviation of 1.2 °C for the burst. What is the range of possible values, at the 95% confidence level, for the population mean?
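The question on this slide can be worked through numerically. The sketch below assumes that the burst standard deviation stands in for the population value (reasonable for N = 50) and obtains the critical value from `statistics.NormalDist` instead of a table; the function name `mean_ci_normal` is illustrative.

```python
import math
from statistics import NormalDist

def mean_ci_normal(mean, stdev, n, alpha=0.05):
    """Large-sample confidence interval for the population mean."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    half_width = z * stdev / math.sqrt(n)
    return mean - half_width, mean + half_width

# CT sensor burst: N = 50, sample mean 26.5 °C, standard deviation 1.2 °C.
low, high = mean_ci_normal(26.5, 1.2, 50)
```

With these inputs the half-width is about 0.33 °C, so the 95% confidence interval for the population mean is roughly 26.17 °C to 26.83 °C.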
19
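CONFIDENCE INTERVALS
Confidence Interval for μ with σ unknown
For N < 30 (small samples), the 100(1 - α)% confidence interval is:
$\bar{y} - t_{\alpha/2,\nu} \frac{s}{\sqrt{N}} < \mu < \bar{y} + t_{\alpha/2,\nu} \frac{s}{\sqrt{N}}$
where $t_{\alpha/2,\nu}$ comes from Student’s t-distribution with ν = (N - 1) degrees of freedom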
20
α/2 = 0.025
d.o.f. = 19
21
The 100(1 - α)% C.I. is:
$\bar{y} - t_{\alpha/2,\nu} \frac{s}{\sqrt{N}} < \mu < \bar{y} + t_{\alpha/2,\nu} \frac{s}{\sqrt{N}}$
If α = 0.05, $t_{0.025,19}$ = 2.093
Suppose we do 20 CTD profiles at one station in St Augustine Inlet. We obtain a mean at the surface of 16.5 °C and a standard deviation of 0.7 °C. What is the range of possible values, at the 95% confidence level, for the population mean?
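The small-sample case follows the same pattern with the t critical value. Python's standard library has no Student's t quantile function, so this sketch simply takes the tabulated value 2.093 quoted on the slide; the function name `mean_ci_t` is illustrative.

```python
import math

T_CRIT_0025_19 = 2.093   # t value for alpha/2 = 0.025 and 19 d.o.f., as quoted on the slide

def mean_ci_t(mean, stdev, n, t_crit):
    """Small-sample confidence interval for the population mean."""
    half_width = t_crit * stdev / math.sqrt(n)
    return mean - half_width, mean + half_width

# 20 CTD profiles: surface mean 16.5 °C, standard deviation 0.7 °C.
low, high = mean_ci_t(16.5, 0.7, 20, T_CRIT_0025_19)
```

The half-width is about 0.33 °C, giving a 95% confidence interval of roughly 16.17 °C to 16.83 °C for the population mean.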
22
CONFIDENCE INTERVALS
Confidence Interval for σ²
To determine the reliability of spectral peaks
Need to know the C.I. for σ² on the basis of s²
ν = (N - 1) degrees of freedom
23
Suppose that we have ν = 10 spectral estimates of a tidal record.
The 100(1 - α)% C.I. is:
$\frac{\nu s^2}{\chi^2_{\alpha/2,\nu}} < \sigma^2 < \frac{\nu s^2}{\chi^2_{1-\alpha/2,\nu}}$
The background variance near a distinct spectral peak is 0.3 m²
95% C.I. for the variance? How large would the peak have to be to stand out, statistically, from the background level?
α/2 = 0.025; 1 - α/2 = 0.975
Look at the Chi Square table:
24
Chi Square Table
The background variance lies in the range given by the 95% C.I.
The spectral peak has to be greater than 0.92 m² to distinguish it from background levels
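The 0.92 m² threshold can be reproduced as follows. The two chi-square values are assumptions taken from a standard table for 10 degrees of freedom (they are not legible in the slide text), and the function name `variance_ci` is illustrative.

```python
# Standard chi-square table values for nu = 10 (assumed, not read from the slide):
CHI2_UPPER_0025_10 = 20.48   # value exceeded with probability 0.025
CHI2_LOWER_0975_10 = 3.25    # value exceeded with probability 0.975

def variance_ci(s2, nu, chi2_upper, chi2_lower):
    """Confidence interval for the population variance from an estimate s2 with nu d.o.f."""
    return nu * s2 / chi2_upper, nu * s2 / chi2_lower

# Background variance 0.3 m^2 estimated with nu = 10 degrees of freedom.
low, high = variance_ci(0.3, 10, CHI2_UPPER_0025_10, CHI2_LOWER_0975_10)
```

The interval comes out at about 0.15 m² to 0.92 m². Note that it is not symmetric about 0.3 m², because the chi-square distribution is skewed.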