Published by Millicent Hines. Modified over 9 years ago.
1
Tests dealing with the mean of data samples:
- Testing a sample mean: Is it equal to, larger than, or smaller than a prescribed value?
- Comparing two sample sets: Are the mean values different?
- Comparing paired samples: Are the differences equal to, larger than, or smaller than a certain value?
Tests dealing with the variance of the samples:
- Testing a single sample variance: Is the variance equal to, greater than, or smaller than a prescribed value?
- Testing the ratio between the estimated variances from two sample sets: Are the variances equal? Is the ratio between the variances equal to 1, greater than 1, or smaller than 1?
Tests dealing with correlation coefficients:
- Testing the correlation coefficient obtained from two paired samples: Is the correlation equal to 0, larger than 0, or smaller than 0?
Tests dealing with regression parameters:
- Testing a simple linear regression model: (a) Is the regression coefficient different from 0, greater than 0, or smaller than 0? (b) With multiple predictors: Are all regression coefficients as a whole significantly different from 0? Which individual regression parameters are different from 0?
2
Testing the significance of the difference in speed (of starlings flying through a corridor with striped walls)

Experiment            Mean speed    Sample size n    Standard deviation (guessed)
Horizontal stripes    16.5 ft/s     10               1.5
Vertical stripes      15.3 ft/s     10               1

Step 1: Identifying the type of statistical test
We want to test the difference between the two mean values:
- The test compares two estimated means. (Both are random variables with an underlying probability density function, PDF.)
- The variances of the samples (and the variances of the means) are also unknown and must be estimated from the data.
- The samples are not paired (the experiments were all done independently).
3
Testing the significance of the difference in speed (of starlings flying through a corridor with striped walls)

The appropriate test is "a test for the difference of means under independence" (or "comparing two independent population means with unknown population standard deviations").
The null hypothesis is H0: The average speed is the same in both experiments.
If H0 is true, then the test variable z is a realization from a population with an approximately standard Gaussian distribution.*
*Note: This holds only for large sample sizes n1 and n2.
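The formula for z did not survive in the slide text, so the following is only a sketch that assumes the usual form for two independent means, z = (mean1 - mean2) / sqrt(s1^2/n1 + s2^2/n2), applied to the starling numbers from the table. The function names are illustrative, and with n = 10 per experiment the Gaussian approximation is rough, as the slide's note warns.

```python
import math

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """z statistic for the difference of two independent sample means.

    Assumed form: difference of the means divided by the standard error
    sqrt(sd1^2/n1 + sd2^2/n2).
    """
    standard_error = math.sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean1 - mean2) / standard_error

def standard_normal_cdf(x):
    """P(Z <= x) for a standard Gaussian variable, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Values from the slide: horizontal stripes vs. vertical stripes.
z = two_sample_z(16.5, 1.5, 10, 15.3, 1.0, 10)

# Two-sided p-value: probability of a |z| at least this large if H0 is true.
p_two_sided = 2.0 * (1.0 - standard_normal_cdf(abs(z)))

print(f"z = {z:.3f}, two-sided p = {p_two_sided:.3f}")
```

Under these assumptions z comes out slightly above 2, which would lead to rejecting H0 at the two-sided 5% level but not at the 1% level.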
4
Testing if Albany temperature anomalies from 1950-1980 were different from 0: January 1950-1980 anomalies with respect to the 1981-2010 climatological mean

Figure labels: dashed line = theoretical probability density function of our test variable; marked value = the test value calculated from the sample.

If H0 were true, our test value should be a random sample from this distribution. That means we would expect it to be close to zero. The further our test value lies in the tails of the distribution, the less likely it is to be part of the distribution.

* 'Student' (1908a). The probable error of a mean. Biometrika, 6, 1-25.
+ William S. Gosset: 'He received a degree from Oxford University in Chemistry and went to work as a "brewer" in 1899 at Arthur Guinness Son and Co. Ltd. in Dublin, Ireland' (Steve Fienberg. "William Sealy Gosset" (version 4). StatProb: The Encyclopedia Sponsored by Statistics and Probability Societies. Freely available at http://statprob.com/encyclopedia/WilliamSealyGOSSET.html)
5
Testing if Albany temperature anomalies from 1950-1980 were different from zero: Annual mean 1950-1980 anomalies with respect to the 1981-2010 climatological mean

Figure labels: the test value calculated from the sample; the test variable.

The test variable t is calculated from a random sample. Like any other quantity estimated from random samples, it is a random variable drawn from a theoretical population with
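As an illustration of how such a test variable is computed, here is a minimal sketch of the standard one-sample t statistic, t = (sample mean - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom. The slide's own formula is not in the text, so this form is an assumption, and the anomaly values below are made up for the example. They are not the Albany data.

```python
import math
import statistics

def one_sample_t(sample, mu0=0.0):
    """Return (t, degrees of freedom) for testing H0: population mean == mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)           # sample standard deviation (n - 1 denominator)
    standard_error = sd / math.sqrt(n)      # estimated standard deviation of the mean
    return (mean - mu0) / standard_error, n - 1

# Hypothetical temperature anomalies (illustrative only, not the Albany record).
anomalies = [-1.2, 0.4, -0.8, -2.1, 0.3, -0.5, -1.6, 0.9, -0.7, -1.1]

t_value, dof = one_sample_t(anomalies)
print(f"t = {t_value:.3f} with {dof} degrees of freedom")
```

A different random sample would give a different t, which is why t itself is treated as a random variable.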
6
Testing H0: Albany (New York Central Park) temperature anomalies from 1950-1980 are not different from 0.
Panels: NYC 1950-1980 Jan; Albany 1950-1980 Jan
Solid lines: Cumulative distribution function (of the test variable if H0 is true)
7
Testing H0: Albany (New York Central Park) temperature anomalies from 1950-1980 are not different from 0.
Alternative hypothesis: The mean anomaly was less than 0 (i.e. it was colder in 1950-1980 than in 1981-2010).
Panels: NYC 1950-1980 Jan; Albany 1950-1980 Jan
Solid lines: Cumulative distribution function (of the test variable if H0 is true)
Choose a significance level: 5% (0.05), one-sided t-test.
8
Testing H0: Albany (New York Central Park) temperature anomalies from 1950-1980 are not different from 0.
Alternative hypothesis: The mean anomaly was less than 0 (i.e. it was colder in 1950-1980 than in 1981-2010).
Panels: NYC 1950-1980 Jan; Albany 1950-1980 Jan
Solid lines: Cumulative distribution function (of the test variable if H0 is true)
Choose a significance level: 5% (0.05), one-sided t-test.
Outcomes marked in the panels: "Reject H0! Accept alternative!" and "Accept H0!"
9
Null hypothesis H0: Albany temperature anomalies from 1950-1980 are not different from 0.
Alternative hypothesis Ha: Temperature anomalies were negative.*
*Note that we formed anomalies with respect to the 1981-2010 climatology. Thus we test if 1950-1980 was significantly cooler than 1981-2010.
Figure labels: horizontal axis t, with 0 and t_crit marked. The area under the curve gives the probability P(t < t_crit).
10
Null hypothesis H0: Albany temperature anomalies from 1950-1980 are not different from 0.
Alternative hypothesis Ha: Temperature anomalies were negative.*
*Note that we formed anomalies with respect to the 1981-2010 climatology. Thus we test if 1950-1980 was significantly cooler than 1981-2010.
Figure labels: horizontal axis t, with 0, t_crit and the calculated t marked. The area under the curve gives the probability P(t < t_crit).
We reject the null hypothesis if the calculated t-value falls into the tail of the distribution. The significance level is usually chosen to be small: 0.1, 0.05 and 0.01 are typical values. We then say: "We reject the null hypothesis at the 10% (5%, 1%) level of significance."
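The one-sided decision rule can be sketched numerically. The code below assumes the test variable follows a Student t distribution under H0, integrates its density with Simpson's rule to get the lower-tail probability, and compares that with a chosen significance level. The calculated t of -2.15 with 9 degrees of freedom is a hypothetical input, not a value from the slides, and the helper names are illustrative.

```python
import math

def t_pdf(x, dof):
    """Probability density of Student's t distribution with dof degrees of freedom."""
    coef = math.gamma((dof + 1) / 2) / (math.sqrt(dof * math.pi) * math.gamma(dof / 2))
    return coef * (1 + x * x / dof) ** (-(dof + 1) / 2)

def t_cdf(x, dof, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|] plus the symmetry of the density."""
    a = abs(x)
    h = a / steps                            # steps must be even for Simpson's rule
    total = t_pdf(0.0, dof) + t_pdf(a, dof)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, dof)
    area = total * h / 3                     # area under the density between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area

def lower_tail_test(t_value, dof, alpha=0.05):
    """One-sided test of Ha: mean < 0. Returns (p-value, reject H0?)."""
    p_value = t_cdf(t_value, dof)            # area in the lower tail below the calculated t
    return p_value, p_value < alpha

# Hypothetical calculated t and degrees of freedom.
p_value, reject = lower_tail_test(-2.15, 9)
print(f"p = {p_value:.4f}, reject H0 at 5%: {reject}")
```

Comparing the tail area with the significance level is equivalent to checking whether the calculated t lies beyond t_crit.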
11
Null hypothesis H0: Albany temperature anomalies from 1950-1980 are not different from 0.
Alternative hypothesis Ha: Temperature anomalies were different from zero.*
*Note that we formed anomalies with respect to the 1981-2010 climatology. Thus we test if 1950-1980 was significantly different from 1981-2010.
Figure labels: horizontal axis t, with 0, -t_crit and +t_crit marked. The area under the curve to the left of -t_crit gives the probability P(t < -t_crit); the area to the right of +t_crit gives the probability P(t > +t_crit).
12
Null hypothesis H0: Albany temperature anomalies from 1950-1980 are not different from 0.
Alternative hypothesis Ha: Temperature anomalies were different from zero.
Figure labels: horizontal axis t, with 0, t_crit and the calculated t marked.
We cannot reject H0 at the two-sided significance level of 'p' percent (e.g. 5%).
13
Hypothesis/Conclusion       Null hypothesis H0 true          Null hypothesis H0 false
Null hypothesis accepted    Correct decision                 False decision (Type II error)
Null hypothesis rejected    False decision (Type I error)    Correct decision
14
Here we would reject H0 at the given significance level (α = 0.05).
Figure labels: H0; calculated test value.
Figure 5.1 from Wilks, "Statistical Methods in the Atmospheric Sciences" (2006)
15
Here we would accept H0 at the given significance level (α = 0.05).
Figure labels: H0; calculated test value.
Figure 5.1 from Wilks, "Statistical Methods in the Atmospheric Sciences" (2006)
16
Hypothesis/Conclusion       Null hypothesis H0 true          Null hypothesis H0 false
Null hypothesis accepted    Correct decision                 False decision (Type II error)
Null hypothesis rejected    False decision (Type I error)    Correct decision

Type I error: The probability of this error is given by the significance level α ("alpha").
Type II error: The probability of this type of error, β ("beta"), is usually hard to quantify.
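A small simulation can make the two error probabilities concrete. This sketch swaps in a simpler test than the t-test used earlier: a two-sided z-test with a known standard deviation, so that it runs with the standard library alone. The sample size, the true mean of 0.3 used for the "H0 false" case, and all names are assumptions chosen for illustration. It shows that α is fixed by the analyst, while β depends on the usually unknown true mean.

```python
import math
import random

def rejects_h0(sample, sigma, z_crit=1.96):
    """Two-sided z-test of H0: mean == 0 with known sigma (5% level for z_crit = 1.96)."""
    n = len(sample)
    z = (sum(sample) / n) / (sigma / math.sqrt(n))
    return abs(z) > z_crit

def rejection_rate(true_mean, n=30, sigma=1.0, trials=10000, seed=1):
    """Fraction of simulated Gaussian samples for which H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        rejections += rejects_h0(sample, sigma)
    return rejections / trials

# H0 true: every rejection is a Type I error, so the rate estimates alpha.
alpha_hat = rejection_rate(0.0)

# H0 false (hypothetical true mean 0.3): every non-rejection is a Type II error.
beta_hat = 1.0 - rejection_rate(0.3)

print(f"estimated alpha = {alpha_hat:.3f}, estimated beta = {beta_hat:.3f}")
```

Changing the assumed true mean or the sample size changes the estimated β but leaves the estimated α near 0.05.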