Experimental Statistics - week 3
Chapter 8: Inferences about More Than 2 Population Central Values

PC SAS on Campus: Library, BIC, Student Center
SAS Learning Edition: $125, http://support.sas.com/rnd/le/index.html
Hypothetical Sample Data

Scenario A          Scenario B
Pop 1   Pop 2       Pop 1   Pop 2
  5       8           3       7
  7       9          10       4
  6       6           3      12
  3       8           1       4
  4       9           8      13

For one scenario, | t | = 1.17
For the other scenario, | t | = 3.35
In general, for 2-sample t-tests: To show significance, we want the difference between groups to be ___________ compared to the variability within groups
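The two t values can be checked directly. The sketch below assumes the pooled-variance (equal-variance) two-sample t statistic and reads each scenario's table row by row as (Pop 1, Pop 2) pairs; the function name `pooled_t` is just an illustrative choice.

```python
import math
from statistics import mean, variance

def pooled_t(x, y):
    """Two-sample t statistic using the pooled variance estimate."""
    n1, n2 = len(x), len(y)
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    return (mean(x) - mean(y)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

# Both scenarios have sample means 5 and 8; only the spread differs
scenario_a = ([5, 7, 6, 3, 4], [8, 9, 6, 8, 9])
scenario_b = ([3, 10, 3, 1, 8], [7, 4, 12, 4, 13])

t_a = pooled_t(*scenario_a)
t_b = pooled_t(*scenario_b)
print(f"Scenario A: |t| = {abs(t_a):.2f}")
print(f"Scenario B: |t| = {abs(t_b):.2f}")
```

The difference in sample means is the same in both scenarios, so any difference in | t | comes entirely from the within-sample variability.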
Completely Randomized Design
1-Factor Analysis of Variance (ANOVA)
Setting (Assumptions):
- t populations
- populations are normal; $\mu_i$ and $\sigma^2$ denote the mean and (common) variance of the ith population
- mutually independent random samples are taken from the populations
- the sample sizes do not have to all be equal

1-Factor ANOVA
[Figure: normal population curves with means $\mu_1, \mu_2, \ldots, \mu_k$ and common standard deviation $\sigma$]
Question: Notes: - not directional i.e. no “1-sided / 2-sided” issues - alternative doesn’t say that all means are distinct
Completely Randomized Design 1-Factor Analysis of Variance Example data setup where t = 5 and n = 4
Notation:
A Sum-of-Squares Identity Note: This is for the case in which all sample sizes are equal ( n ) The 3 sums of squares measure: - variability between samples - variability within samples - total variability Question: Which measures what?
In words:
Total SS = SS between samples + within sample SS
where
TSS (total SS) = total sample variability
SSB (SS between samples) = variability due to factor effects
SSW (within sample SS) = variability due to uncontrolled error
Note: Formula for unequal sample sizes given on page 388
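For the equal-sample-size case, the identity can be written in the following standard form. The notation is an assumption made here for illustration: $y_{ij}$ is observation j in sample i, $\bar{y}_{i\cdot}$ is the mean of sample i, and $\bar{y}_{\cdot\cdot}$ is the overall mean of all $tn$ observations.

```latex
\underbrace{\sum_{i=1}^{t}\sum_{j=1}^{n}\left(y_{ij}-\bar{y}_{\cdot\cdot}\right)^{2}}_{\text{TSS}}
\;=\;
\underbrace{n\sum_{i=1}^{t}\left(\bar{y}_{i\cdot}-\bar{y}_{\cdot\cdot}\right)^{2}}_{\text{SSB}}
\;+\;
\underbrace{\sum_{i=1}^{t}\sum_{j=1}^{n}\left(y_{ij}-\bar{y}_{i\cdot}\right)^{2}}_{\text{SSW}}
```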
Pop 1 5 5 5 5 Pop 2 9 9 9 9 Pop 3 7 7 7 7
Pop 1 4 8 3 9 Pop 2 6 10 2 6 Pop 3 5 8 7 4
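The two small data sets above are extreme cases of the sum-of-squares identity. A minimal sketch that computes all three sums of squares for each (the function name `sums_of_squares` is illustrative, not from the course materials):

```python
def sums_of_squares(groups):
    """Return (TSS, SSB, SSW) for a list of samples."""
    all_obs = [y for g in groups for y in g]
    grand_mean = sum(all_obs) / len(all_obs)
    # Total variability: every observation against the grand mean
    tss = sum((y - grand_mean) ** 2 for y in all_obs)
    # Between-sample variability: sample means against the grand mean
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-sample variability: observations against their own sample mean
    ssw = sum((y - sum(g) / len(g)) ** 2 for g in groups for y in g)
    return tss, ssb, ssw

data_1 = [[5, 5, 5, 5], [9, 9, 9, 9], [7, 7, 7, 7]]
data_2 = [[4, 8, 3, 9], [6, 10, 2, 6], [5, 8, 7, 4]]

for name, data in [("first", data_1), ("second", data_2)]:
    tss, ssb, ssw = sums_of_squares(data)
    print(f"{name} data set: TSS = {tss:.1f}, SSB = {ssb:.1f}, SSW = {ssw:.1f}")
```

In the first data set all of the variability is between samples (SSW = 0); in the second, all three sample means equal 6, so all of the variability is within samples (SSB = 0).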
Recall: For the 2-sample t-test, we tested $H_0: \mu_1 = \mu_2$ using
$t = \dfrac{\bar{y}_1 - \bar{y}_2}{s_p \sqrt{1/n_1 + 1/n_2}}$
To show significance, we want the difference between groups to be large compared to the variability within groups
Note: Our test statistic for testing will be of the form This has an F distribution Question: What type of F values lead you to believe the null is NOT TRUE?
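In the usual textbook notation, the statistic is a ratio of two mean squares. The symbols below are assumptions for illustration: $t$ is the number of populations and $N$ is the total number of observations.

```latex
F \;=\; \frac{\text{MSB}}{\text{MSW}}
  \;=\; \frac{\text{SSB}/(t-1)}{\text{SSW}/(N-t)},
\qquad
F \sim F_{\,t-1,\;N-t} \quad \text{when } H_0 \text{ is true.}
```

The numerator measures variability between samples and the denominator measures variability within samples, which parallels the structure of the 2-sample t statistic.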
Analysis of Variance Table Note:
Note:
CAR DATA Example
For this analysis, 5 gasoline types (A - E) were to be tested. Twenty cars were selected for testing and were assigned randomly to the groups (i.e. the gasoline types). Thus, in the analysis, each gasoline type was tested on 4 cars. A performance-based octane reading was obtained for each car, and the question is whether the gasolines differ with respect to this octane reading.

Gasoline   Octane readings            Mean
A          91.7  91.2  90.9  90.6     91.10
B          91.7  91.9  90.9  90.9     91.35
C          92.4  91.2  91.6  91.0     91.55
D          91.8  92.2  92.0  91.4     91.85
E          93.1  92.9  92.4  92.4     92.70

ANOVA Table Output - car data
Source            SS      df   MS      F      p-value
Between samples   6.108    4   1.527   6.80   0.0025
Within            3.370   15   0.225
Totals            9.478   19
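The entries of this table can be reproduced from the raw readings. The sketch below is an illustration, not the course's SAS procedure; it assumes the fourth readings for gasolines B and E are 90.9 and 92.4, the values implied by the reported group means with 4 cars per group. It does not compute the p-value, since that requires the F distribution (F-table or statistical software).

```python
def one_way_anova(groups):
    """One-factor ANOVA quantities; sample sizes are allowed to differ."""
    samples = list(groups.values())
    n_total = sum(len(g) for g in samples)
    grand_mean = sum(sum(g) for g in samples) / n_total
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in samples)
    ssw = sum((y - sum(g) / len(g)) ** 2 for g in samples for y in g)
    df_b, df_w = len(samples) - 1, n_total - len(samples)
    msb, msw = ssb / df_b, ssw / df_w
    return {"ssb": ssb, "ssw": ssw, "df_b": df_b, "df_w": df_w,
            "msb": msb, "msw": msw, "f": msb / msw}

car = {
    "A": [91.7, 91.2, 90.9, 90.6],
    "B": [91.7, 91.9, 90.9, 90.9],  # last value inferred from the mean 91.35
    "C": [92.4, 91.2, 91.6, 91.0],
    "D": [91.8, 92.2, 92.0, 91.4],
    "E": [93.1, 92.9, 92.4, 92.4],  # last value inferred from the mean 92.70
}

res = one_way_anova(car)
print(f"Between  SS = {res['ssb']:.3f}  df = {res['df_b']}  MS = {res['msb']:.3f}  F = {res['f']:.2f}")
print(f"Within   SS = {res['ssw']:.3f}  df = {res['df_w']}  MS = {res['msw']:.3f}")
print(f"Totals   SS = {res['ssb'] + res['ssw']:.3f}  df = {res['df_b'] + res['df_w']}")
```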
F-table -- p.1106
Extracted from Ex. 8.2, pages 390-391
3 Methods for Reducing Hostility
12 students displaying similar hostility were randomly assigned to 3 treatment methods. Scores (HLT) at end of study recorded.

Method 1   96  79  91  85
Method 2   77  76  74  73
Method 3   66  73  69  66

Test:

ANOVA Table Output - hostility data
Source            SS   df   MS   F   p-value
Between samples
Within
Totals

SPSS ANOVA Table for Hostility Data
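One way to fill in such a table by hand is to work from the sample means and sample variances. The sketch below is a hedged illustration using Python's `statistics` module: it uses the fact that SSW is the sum of (n_i - 1) times each sample variance, so MSW is a pooled variance estimate, as in the 2-sample t-test. The p-value is left out because it requires the F distribution.

```python
from statistics import mean, variance

hostility = {
    "Method 1": [96, 79, 91, 85],
    "Method 2": [77, 76, 74, 73],
    "Method 3": [66, 73, 69, 66],
}

samples = list(hostility.values())
n_total = sum(len(g) for g in samples)
grand_mean = sum(sum(g) for g in samples) / n_total

# SSB: spread of the sample means around the grand mean
ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in samples)
# SSW: pooled within-sample variability, built from the sample variances
ssw = sum((len(g) - 1) * variance(g) for g in samples)

df_b, df_w = len(samples) - 1, n_total - len(samples)
msb, msw = ssb / df_b, ssw / df_w
f_stat = msb / msw

print(f"Between  SS = {ssb:.2f}  df = {df_b}  MS = {msb:.2f}  F = {f_stat:.2f}")
print(f"Within   SS = {ssw:.2f}  df = {df_w}  MS = {msw:.2f}")
print(f"Totals   SS = {ssb + ssw:.2f}  df = {df_b + df_w}")
```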
ANOVA Models
Note:
Example: Population has mean $\mu = 5$. Consider the random sample
For 1-factor ANOVA
Alternative form of the 1-Factor ANOVA Model (pages 394-395)
General Form of Model:
- random errors follow a Normal distribution, are independently distributed, and have zero mean and constant variance -- i.e. variability does not change from group to group
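In the standard textbook parameterization, the two forms of the model can be written as below. The symbols are assumptions for illustration: $\mu_i$ is the mean of population i, $\mu$ an overall mean, $\alpha_i$ the effect of the ith factor level, and $\varepsilon_{ij}$ the random error.

```latex
y_{ij} = \mu_i + \varepsilon_{ij}
\qquad\text{or, equivalently,}\qquad
y_{ij} = \mu + \alpha_i + \varepsilon_{ij},
\quad \mu_i = \mu + \alpha_i,
\qquad
\varepsilon_{ij} \overset{\text{indep.}}{\sim} N(0, \sigma^2)
```

Under this form, "no factor effects" corresponds to $\alpha_1 = \alpha_2 = \cdots = \alpha_t = 0$, i.e. all population means equal.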
Analysis of Variance Table Recall: Note: - if no factor effects, we expect F _____ - if factor effects, we expect F _____
The CAR data set as SAS needs to see it:
A 91.7
A 91.2
A 90.9
A 90.6
B 91.7
B 91.9
B 90.9
B 90.9
C 92.4
C 91.2
C 91.6
C 91.0
D 91.8
D 92.2
D 92.0
D 91.4
E 93.1
E 92.9
E 92.4
E 92.4

SAS file for CAR data
Case 1: Data within SAS FILE:

DATA one;
  INPUT gas $ octane;
  DATALINES;
A 91.7
A 91.2
. . .
E 92.4
;
PROC GLM;
  CLASS gas;
  MODEL octane = gas;
  TITLE 'Gasoline Example - Completely Randomized Design';
  MEANS gas;
RUN;
PROC MEANS MEAN VAR;
  CLASS gas;
RUN;
The SAS Output for CAR data:

Gasoline Example - Completely Randomized Design
General Linear Models Procedure
Dependent Variable: OCTANE

                         Sum of        Mean
Source            DF     Squares       Square        F Value   Pr > F
Model              4     6.10800000    1.52700000       6.80   0.0025
Error             15     3.37000000    0.22466667
Corrected Total   19     9.47800000

R-Square    C.V.        Root MSE     OCTANE Mean
0.644440    0.516836    0.4739902    91.710000

Source   DF   Type I SS     Mean Square   F Value   Pr > F
GAS       4   6.10800000    1.52700000       6.80   0.0025

Textbook Format for ANOVA Table Output - car data
Source            SS      df   MS      F      p-value
Between samples   6.108    4   1.527   6.80   0.0025
Within            3.370   15   0.225
Totals            9.478   19

Problem 1. Descriptive Statistics for CAR Data
The MEANS Procedure
Analysis Variable : octane

Mean         Std Dev     Minimum      Maximum
91.7100000   0.7062876   90.6000000   93.1000000

Problem 3. Descriptive Statistics by Gasoline
The MEANS Procedure
Analysis Variable : octane

gas   Mean         Std Dev     Minimum      Maximum
A     91.1000000   0.4690416   90.6000000   91.7000000
B     91.3500000   0.5259911   90.9000000   91.9000000
C     91.5500000   0.6191392   91.0000000   92.4000000
D     91.8500000   0.3415650   91.4000000   92.2000000
E     92.7000000   0.3559026   92.4000000   93.1000000
Question 1: Which gasolines are different? Question 2: Why didn’t we just do t-tests to compare all combinations of gasolines? i.e. compare A vs B A vs C . . . D vs E
Simulation: i.e. using computer to generate data under certain known conditions and observing the outcomes
Setting: Normal population with $\mu = 20$ and $\sigma = 5$
Simulation Experiment: Generate 2 samples of size n = 10 from this population and run a t-test to compare the sample means, i.e. test $H_0: \mu_1 = \mu_2$ vs. $H_a: \mu_1 \neq \mu_2$
Question: What do we expect to happen?

Simulation Results:
t-test procedure ($\alpha = .05$): Reject $H_0$ if | t | > 2.101

Sample   Mean   Std Dev
1        21.6   4.0
2        21.1   5.4

t = .235, so we do not reject $H_0$ (which is what we expected)
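The t value above can be recovered from the two summary lines alone. This sketch assumes the two numbers reported for each sample are its mean and standard deviation, and that the pooled-variance t statistic with n = 10 per sample (18 df) is used; under those assumptions it reproduces t = .235.

```python
import math

def pooled_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Pooled two-sample t statistic from sample means, SDs and sizes."""
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

t = pooled_t_from_summary(21.6, 4.0, 10, 21.1, 5.4, 10)
# 2.101 is the two-sided .05 cutoff quoted on the slide
print(f"t = {t:.3f}, reject H0: {abs(t) > 2.101}")
```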
Simulation results: Now suppose we obtain 10 samples and test

Sample   Mean   Std Dev
1        21.6   4.0
2        21.1   5.4
3        20.9   6.2
4        18.3   3.2
5        23.1   6.7
6        18.6   4.8
7        22.2   5.8
8        19.1   5.9
9        20.3   2.5
10       19.3   3.2

Note: Comparing means 4 vs 5 we get t = 2.33 -- i.e. we reject the null (but it’s true!!)

Suppose we run all possible t-tests at significance level $\alpha = .05$ to compare 10 sample means, each from a sample of size n = 10 from this population. It can be shown that there is a 63% chance that at least one pair of means will be declared significantly different from each other.
The F-test in ANOVA controls the overall significance level.

Probability of finding at least 2 of k means significantly different using multiple t-tests at the $\alpha = .05$ level when all means are actually equal.

k     Prob.
2     .05
3     .13
4     .21
5     .29
10    .63
20    .92
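The inflation shown in this table can be explored by simulation. The sketch below is a rough illustration under stated assumptions: every sample of size n = 10 comes from the same Normal population (mean 20, SD 5), each pair is compared with a pooled two-sample t-test, and the cutoff 2.101 (18 df) is used. The function names, the number of repetitions and the seed are arbitrary choices, so the estimated rates will only land in the neighborhood of the tabled values.

```python
import itertools
import math
import random
from statistics import mean, variance

T_CRIT = 2.101  # two-sided .05 cutoff for 18 df, as quoted on the slides

def any_pair_significant(samples, n):
    """True if at least one pairwise pooled t-test rejects H0."""
    summaries = [(mean(s), variance(s)) for s in samples]
    for (m1, v1), (m2, v2) in itertools.combinations(summaries, 2):
        sp2 = (v1 + v2) / 2              # pooled variance for equal n
        t = (m1 - m2) / math.sqrt(sp2 * 2 / n)
        if abs(t) > T_CRIT:
            return True
    return False

def familywise_error_rate(k, n=10, reps=2000, seed=1):
    """Estimate P(at least one false rejection) among k equal-mean samples."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        samples = [[rng.gauss(20, 5) for _ in range(n)] for _ in range(k)]
        hits += any_pair_significant(samples, n)
    return hits / reps

rates = {k: familywise_error_rate(k) for k in (2, 5, 10)}
for k, rate in rates.items():
    print(f"k = {k:2d}: estimated probability of a false difference = {rate:.2f}")
```

With k = 2 there is only one test, so the rate stays near .05; it grows quickly as the number of pairwise comparisons increases.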
Fisher’s Least Significant Difference (LSD) Protected LSD: Preceded by an F-test for overall significance. Only use the LSD if F is significant. X Unprotected: Not preceded by an F-test (like individual t-tests).
Gasoline Example - Completely Randomized Design -- All 5 Gasolines
The GLM Procedure
Dependent Variable: octane

                         Sum of
Source            DF     Squares       Mean Square   F Value   Pr > F
Model              4     6.10800000    1.52700000       6.80   0.0025
Error             15     3.37000000    0.22466667
Corrected Total   19     9.47800000

R-Square    Coeff Var   Root MSE    octane Mean
0.644440    0.516836    0.473990    91.71000

Source   DF   Type I SS     Mean Square   F Value   Pr > F
gas       4   6.10800000    1.52700000       6.80   0.0025
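Since the overall F-test is significant here, the protected LSD can be applied to the gasoline means. The sketch below assumes the usual LSD formula, t times sqrt(MSE (1/n_i + 1/n_j)), with the error mean square and 15 error df taken from the output above. The critical value 2.131 (upper .025 point of t with 15 df) comes from a standard t-table and is not given in the slides.

```python
import itertools
import math

means = {"A": 91.10, "B": 91.35, "C": 91.55, "D": 91.85, "E": 92.70}
n = 4                # cars per gasoline
mse = 0.22466667     # error mean square from the GLM output (15 df)
T_CRIT = 2.131       # assumption: t-table value for alpha = .05, 15 df

# Least significant difference for two means based on n observations each
lsd = T_CRIT * math.sqrt(mse * (1 / n + 1 / n))

# Pairs whose sample means differ by more than the LSD
different = [(a, b) for a, b in itertools.combinations(means, 2)
             if abs(means[a] - means[b]) > lsd]

print(f"LSD = {lsd:.3f}")
for a, b in different:
    print(f"{a} vs {b}: difference = {abs(means[a] - means[b]):.2f}")
```

Under these assumptions, gasoline E is declared different from each of the others, and A is declared different from D.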