Comparison of 2 Population Means
Goal: To compare 2 populations/treatments with respect to a numeric outcome
Sampling Design: Independent Samples (Parallel Groups) vs Paired Samples (Crossover Design)
Data Structure: Normal vs Non-normal
Sample Sizes: Large ($n_1, n_2 > 20$) vs Small
Independent Samples
Units in the two samples are different
Sample sizes may or may not be equal
Large-sample inference based on the Normal Distribution (Central Limit Theorem)
Small-sample inference depends on the distribution of individual outcomes (Normal vs non-Normal)
Parameters/Estimates (Independent Samples)
Parameter: $\mu_1 - \mu_2$
Estimator: $\bar{Y}_1 - \bar{Y}_2$
Estimated standard error: $\sqrt{S_1^2/n_1 + S_2^2/n_2}$
Shape of sampling distribution:
–Normal if data are normal
–Approximately normal if $n_1, n_2 > 20$
–Non-normal otherwise (typically)
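Large-Sample Test of $\mu_1 - \mu_2$
Null hypothesis: The population means differ by $\Delta_0$ (which is typically 0): $H_0: \mu_1 - \mu_2 = \Delta_0$
Alternative Hypotheses:
–1-Sided: $H_A: \mu_1 - \mu_2 > \Delta_0$
–2-Sided: $H_A: \mu_1 - \mu_2 \neq \Delta_0$
Test Statistic: $z_{obs} = \dfrac{(\bar{Y}_1 - \bar{Y}_2) - \Delta_0}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}$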
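Large-Sample Test of $\mu_1 - \mu_2$
Decision Rule:
–1-sided alternative
If $z_{obs} \geq z_{\alpha}$ ==> Conclude $\mu_1 - \mu_2 > \Delta_0$
If $z_{obs} < z_{\alpha}$ ==> Do not reject $\mu_1 - \mu_2 = \Delta_0$
–2-sided alternative
If $z_{obs} \geq z_{\alpha/2}$ ==> Conclude $\mu_1 - \mu_2 > \Delta_0$
If $z_{obs} \leq -z_{\alpha/2}$ ==> Conclude $\mu_1 - \mu_2 < \Delta_0$
If $-z_{\alpha/2} < z_{obs} < z_{\alpha/2}$ ==> Do not reject $\mu_1 - \mu_2 = \Delta_0$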
Large-Sample Test of $\mu_1 - \mu_2$
Observed Significance Level (P-Value)
–1-sided alternative: $P = P(z \geq z_{obs})$ (From the std. Normal distribution)
–2-sided alternative: $P = 2P(z \geq |z_{obs}|)$ (From the std. Normal distribution)
If P-Value $\leq \alpha$, then reject the null hypothesis
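The large-sample test and its P-values can be sketched in a few lines of Python. This is a minimal illustration, assuming the unpooled standard error $\sqrt{S_1^2/n_1 + S_2^2/n_2}$; the function name `large_sample_z_test` and the two data lists are hypothetical and are not taken from any study in these slides.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def large_sample_z_test(y1, y2, delta0=0.0):
    """Return (z_obs, one-sided P, two-sided P) for H0: mu1 - mu2 = delta0."""
    n1, n2 = len(y1), len(y2)
    # Estimated standard error of ybar1 - ybar2 (sample variances, not pooled)
    se = sqrt(variance(y1) / n1 + variance(y2) / n2)
    z_obs = (mean(y1) - mean(y2) - delta0) / se
    std_normal = NormalDist()
    p_one_sided = 1.0 - std_normal.cdf(z_obs)                 # P(Z >= z_obs)
    p_two_sided = 2.0 * (1.0 - std_normal.cdf(abs(z_obs)))    # 2 P(Z >= |z_obs|)
    return z_obs, p_one_sided, p_two_sided


# Hypothetical data: 25 observations per group (made up for illustration)
group1 = [10 + (i % 5) for i in range(25)]
group2 = [9 + (i % 5) for i in range(25)]

z_obs, p1, p2 = large_sample_z_test(group1, group2)
alpha = 0.05
reject_two_sided = p2 <= alpha
print(f"z_obs = {z_obs:.3f}, 1-sided P = {p1:.4f}, 2-sided P = {p2:.4f}")
```

Comparing the two-sided P-value with $\alpha$ gives the same decision as comparing $|z_{obs}|$ with $z_{\alpha/2}$.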
Large-Sample $(1-\alpha)100\%$ Confidence Interval for $\mu_1 - \mu_2$
Confidence Coefficient $(1-\alpha)$ refers to the proportion of times this rule would provide an interval that contains the true parameter value if it were applied over all possible samples
Rule: $(\bar{Y}_1 - \bar{Y}_2) \pm z_{\alpha/2} \sqrt{S_1^2/n_1 + S_2^2/n_2}$
Large-Sample $(1-\alpha)100\%$ Confidence Interval for $\mu_1 - \mu_2$
For 95% Confidence Intervals, $z_{.025} = 1.96$
Confidence Intervals and 2-sided tests give identical conclusions at the same $\alpha$-level:
–If entire interval is above 0, conclude $\mu_1 - \mu_2 > 0$
–If entire interval is below 0, conclude $\mu_1 - \mu_2 < 0$
–If interval contains 0, do not reject $\mu_1 - \mu_2 = 0$
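The "proportion of times" reading of the confidence coefficient can be checked by simulation. The sketch below is an illustration under assumed settings (normal data, $n_1 = n_2 = 30$, hypothetical means and standard deviation, a fixed random seed). It repeatedly draws two samples, builds the large-sample 95% interval, and counts how often the interval contains the true difference.

```python
import random
from math import sqrt
from statistics import mean, variance


def coverage(n1=30, n2=30, mu1=5.0, mu2=4.0, sigma=2.0,
             reps=2000, z_crit=1.96, seed=1):
    """Fraction of simulated 95% intervals that contain mu1 - mu2."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    true_diff = mu1 - mu2
    hits = 0
    for _ in range(reps):
        y1 = [rng.gauss(mu1, sigma) for _ in range(n1)]
        y2 = [rng.gauss(mu2, sigma) for _ in range(n2)]
        diff = mean(y1) - mean(y2)
        se = sqrt(variance(y1) / n1 + variance(y2) / n2)
        if diff - z_crit * se <= true_diff <= diff + z_crit * se:
            hits += 1
    return hits / reps


rate = coverage()
print(f"Empirical coverage over repeated samples: {rate:.3f}")
```

The empirical rate should land near 0.95. It tends to fall slightly below that value because the variances are estimated and $z$ is used in place of $t$.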
Example: Vitamin C for Common Cold
Outcome: Number of Colds During Study Period for Each Student
Group 1: Given Placebo
Group 2: Given Ascorbic Acid (Vitamin C)
Source: Pauling (1971)
2-Sided Test to Compare Groups
$H_0: \mu_1 - \mu_2 = 0$ (No difference in trt effects)
$H_A: \mu_1 - \mu_2 \neq 0$ (Difference in trt effects)
Test Statistic: $z_{obs} = 25.3$
Decision Rule ($\alpha = 0.05$):
–Conclude $\mu_1 - \mu_2 > 0$ since $z_{obs} = 25.3 > z_{.025} = 1.96$
95% Confidence Interval for $\mu_1 - \mu_2$
Point Estimate: $\bar{Y}_1 - \bar{Y}_2 = 0.30$
Estimated Std. Error: 0.0119
Critical Value: $z_{.025} = 1.96$
95% CI: $0.30 \pm 1.96(0.0119) = 0.30 \pm 0.023 = (0.277, 0.323)$
Entire interval > 0
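The interval arithmetic can be reproduced directly from the two numbers reported on the slide, the point estimate 0.30 and the estimated standard error 0.0119. The sketch below uses only those rounded values, so the ratio it computes is an approximation of the reported test statistic and not a recomputation from the raw data.

```python
from statistics import NormalDist

point_estimate = 0.30   # ybar1 - ybar2, as reported on the slide
std_error = 0.0119      # estimated standard error, as reported on the slide

# Critical value z_.025 from the standard normal distribution (about 1.96)
z_crit = NormalDist().inv_cdf(1 - 0.05 / 2)

margin = z_crit * std_error
lower, upper = point_estimate - margin, point_estimate + margin

# Ratio of the rounded inputs; the slide reports 25.3, presumably from unrounded values
z_ratio = point_estimate / std_error

print(f"95% CI: ({lower:.3f}, {upper:.3f}), margin = {margin:.3f}")
print(f"estimate / SE = {z_ratio:.1f}")
```

Because the whole interval lies above 0, the interval and the 2-sided test lead to the same conclusion.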
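Small-Sample Test for Normal Populations (P. 538)
Case 1: Common Variances ($\sigma_1^2 = \sigma_2^2 = \sigma^2$)
Null Hypothesis: $H_0: \mu_1 - \mu_2 = \Delta_0$
Alternative Hypotheses:
–1-Sided: $H_A: \mu_1 - \mu_2 > \Delta_0$
–2-Sided: $H_A: \mu_1 - \mu_2 \neq \Delta_0$
Test Statistic: $t_{obs} = \dfrac{(\bar{Y}_1 - \bar{Y}_2) - \Delta_0}{\sqrt{S_p^2 (1/n_1 + 1/n_2)}}$, where $S_p^2 = \dfrac{(n_1 - 1) S_1^2 + (n_2 - 1) S_2^2}{n_1 + n_2 - 2}$ is a “pooled” estimate of $\sigma^2$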
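Small-Sample Test for Normal Populations
Decision Rule: (Based on t-distribution with $\nu = n_1 + n_2 - 2$ df)
–1-sided alternative
If $t_{obs} \geq t_{\alpha,\nu}$ ==> Conclude $\mu_1 - \mu_2 > \Delta_0$
If $t_{obs} < t_{\alpha,\nu}$ ==> Do not reject $\mu_1 - \mu_2 = \Delta_0$
–2-sided alternative
If $t_{obs} \geq t_{\alpha/2,\nu}$ ==> Conclude $\mu_1 - \mu_2 > \Delta_0$
If $t_{obs} \leq -t_{\alpha/2,\nu}$ ==> Conclude $\mu_1 - \mu_2 < \Delta_0$
If $-t_{\alpha/2,\nu} < t_{obs} < t_{\alpha/2,\nu}$ ==> Do not reject $\mu_1 - \mu_2 = \Delta_0$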
Small-Sample Test for Normal Populations
Observed Significance Level (P-Value)
Special tables needed; P-values are printed by statistical software packages
–1-sided alternative: $P = P(t \geq t_{obs})$ (From the $t_\nu$ distribution)
–2-sided alternative: $P = 2P(t \geq |t_{obs}|)$ (From the $t_\nu$ distribution)
If P-Value $\leq \alpha$, then reject the null hypothesis
Small-Sample $(1-\alpha)100\%$ Confidence Interval for $\mu_1 - \mu_2$ - Normal Populations
Confidence Coefficient $(1-\alpha)$ refers to the proportion of times this rule would provide an interval that contains the true parameter value if it were applied over all possible samples
Rule: $(\bar{Y}_1 - \bar{Y}_2) \pm t_{\alpha/2,\nu} \sqrt{S_p^2 (1/n_1 + 1/n_2)}$
Interpretations same as for large-sample CI’s
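A minimal sketch of the equal-variance (pooled) procedure follows. The data are hypothetical, and the critical value $t_{.025,8} = 2.306$ is entered by hand from a standard t table, since the Python standard library has no t distribution. The function name `pooled_t` is an illustrative choice.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(y1, y2, delta0=0.0):
    """Return (difference, standard error, t_obs, df) using the pooled variance."""
    n1, n2 = len(y1), len(y2)
    df = n1 + n2 - 2
    # Pooled estimate of the common variance sigma^2
    sp2 = ((n1 - 1) * variance(y1) + (n2 - 1) * variance(y2)) / df
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    diff = mean(y1) - mean(y2)
    return diff, se, (diff - delta0) / se, df


# Hypothetical small samples (made up for illustration)
sample1 = [12, 15, 11, 14, 13]
sample2 = [10, 9, 12, 11, 8]

diff, se, t_obs, df = pooled_t(sample1, sample2)

t_crit = 2.306                      # t_{.025, 8} from a standard t table
reject = abs(t_obs) >= t_crit       # 2-sided test at alpha = 0.05
lower, upper = diff - t_crit * se, diff + t_crit * se
print(f"t_obs = {t_obs:.3f} on {df} df, 95% CI = ({lower:.3f}, {upper:.3f})")
```

With other sample sizes, the critical value has to be looked up again for $\nu = n_1 + n_2 - 2$.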
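Small-Sample Inference for Normal Populations (P.529)
Case 2: $\sigma_1^2 \neq \sigma_2^2$
Don’t pool variances: $t_{obs} = \dfrac{(\bar{Y}_1 - \bar{Y}_2) - \Delta_0}{\sqrt{S_1^2/n_1 + S_2^2/n_2}}$
Use “adjusted” degrees of freedom (Satterthwaite’s Approximation): $\nu^* = \dfrac{\left( S_1^2/n_1 + S_2^2/n_2 \right)^2}{\dfrac{(S_1^2/n_1)^2}{n_1 - 1} + \dfrac{(S_2^2/n_2)^2}{n_2 - 1}}$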
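The unequal-variance statistic and the adjusted degrees of freedom can be computed as below. This is a sketch assuming the usual Satterthwaite formula, applied to hypothetical data; `unpooled_t` is an illustrative name.

```python
from math import sqrt
from statistics import mean, variance


def unpooled_t(y1, y2, delta0=0.0):
    """Return (t_obs, adjusted df) without pooling the two sample variances."""
    n1, n2 = len(y1), len(y2)
    v1 = variance(y1) / n1          # S1^2 / n1
    v2 = variance(y2) / n2          # S2^2 / n2
    t_obs = (mean(y1) - mean(y2) - delta0) / sqrt(v1 + v2)
    # Satterthwaite's approximation to the degrees of freedom
    df_adj = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_obs, df_adj


# Hypothetical samples with visibly different spreads (made up for illustration)
sample1 = [12, 15, 11, 14, 13]
sample2 = [6, 14, 10, 8, 12]

t_obs, df_adj = unpooled_t(sample1, sample2)
print(f"t_obs = {t_obs:.3f}, adjusted df = {df_adj:.2f}")
```

The adjusted df always falls between $\min(n_1, n_2) - 1$ and $n_1 + n_2 - 2$. It is usually not an integer, so tables are read at the next lower integer or software is used.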
Example - Maze Learning (Adults/Children)
Groups: Adults ($n_1 = 14$) / Children ($n_2 = 10$)
Outcome: Average # of Errors in Maze Learning Task
Raw Data on next slide
Conduct a 2-sided test of whether mean scores differ
Construct a 95% Confidence Interval for true difference
Source: Gould and Perrin (1916)
Example - Maze Learning (Adults/Children)
Example - Maze Learning
Case 1 - Equal Variances
$H_0: \mu_1 - \mu_2 = 0$    $H_A: \mu_1 - \mu_2 \neq 0$    ($\alpha = 0.05$)
No significant difference between 2 age groups
Example - Maze Learning
Case 2 - Unequal Variances
$H_0: \mu_1 - \mu_2 = 0$    $H_A: \mu_1 - \mu_2 \neq 0$    ($\alpha = 0.05$)
No significant difference between 2 age groups
SPSS Output
Small Sample Test to Compare Two Medians - Nonnormal Populations
Two Independent Samples (Parallel Groups)
Procedure (Wilcoxon Rank-Sum Test):
–Rank measurements across samples from smallest (1) to largest ($n_1 + n_2$). Ties take average ranks.
–Obtain the rank sum for each group ($W_1, W_2$)
–1-sided tests: Conclude $H_A: M_1 > M_2$ if $W_2 \leq W_0$
–2-sided tests: Conclude $H_A: M_1 \neq M_2$ if $\min(W_1, W_2) \leq W_0$
–Values of $W_0$ are given in many texts for various sample sizes and significance levels. P-values are printed by statistical software packages.
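The ranking step, including average ranks for ties, can be sketched as follows. The helper names `average_ranks` and `rank_sums` and the two small samples are hypothetical. The critical value $W_0$ still has to come from a table or from software.

```python
def average_ranks(values):
    """Rank values from smallest (1) to largest, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1       # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sums(sample1, sample2):
    """Return (W1, W2): the rank sums of each group in the combined ranking."""
    ranks = average_ranks(list(sample1) + list(sample2))
    w1 = sum(ranks[:len(sample1)])
    w2 = sum(ranks[len(sample1):])
    return w1, w2


# Hypothetical data with one tie (3.1 appears in both groups)
w1, w2 = rank_sums([3.1, 4.5, 2.8, 5.0], [2.0, 3.1, 1.7])
print(f"W1 = {w1}, W2 = {w2}")
```

A quick check on any ranking is that $W_1 + W_2 = (n_1 + n_2)(n_1 + n_2 + 1)/2$, which is 28 for these seven observations.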
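Normal Approximation (Supp PP5-7)
Under the null hypothesis of no difference in the two groups (let $W = W_1$ from last slide):
$\mu_W = \dfrac{n_1 (n_1 + n_2 + 1)}{2}$    $\sigma_W = \sqrt{\dfrac{n_1 n_2 (n_1 + n_2 + 1)}{12}}$
A z-statistic can be computed, $z_{obs} = \dfrac{W - \mu_W}{\sigma_W}$, and an approximate P-value can be obtained from the Z-distribution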
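A short sketch of the normal approximation for the rank sum is given below. It assumes the standard null mean and standard deviation of $W_1$ with no correction for ties. The group sizes 14 and 10 match the maze-learning example, but the rank sum of 190 is a hypothetical value chosen only to show the arithmetic.

```python
from math import sqrt
from statistics import NormalDist


def rank_sum_z(w1, n1, n2):
    """Return (mu_W, sigma_W, z_obs, two-sided P) for the rank sum W1 of group 1."""
    mu_w = n1 * (n1 + n2 + 1) / 2
    sigma_w = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)    # no tie correction applied
    z_obs = (w1 - mu_w) / sigma_w
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z_obs)))
    return mu_w, sigma_w, z_obs, p_two_sided


# n1 = 14 and n2 = 10 as in the maze example; W1 = 190 is hypothetical
mu_w, sigma_w, z_obs, p_value = rank_sum_z(190, 14, 10)
print(f"mu_W = {mu_w}, sigma_W = {sigma_w:.3f}, z = {z_obs:.3f}, P = {p_value:.3f}")
```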
Example - Maze Learning
As with the t-test, no evidence of population group differences
Computer Output - SPSS
Inference Based on Paired Samples (Crossover Designs)
Setting: Each treatment is applied to each subject or pair (preferably in random order)
Data: $d_i$ is the difference in scores (Trt 1 - Trt 2) for subject (pair) $i$
Parameter: $\mu_D$ - Population mean difference
Sample Statistics: $\bar{d} = \dfrac{\sum_{i=1}^{n} d_i}{n}$    $s_d^2 = \dfrac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n - 1}$    $s_d = \sqrt{s_d^2}$
Test Concerning $\mu_D$
Null Hypothesis: $H_0: \mu_D = \Delta_0$ (almost always 0)
Alternative Hypotheses:
–1-Sided: $H_A: \mu_D > \Delta_0$
–2-Sided: $H_A: \mu_D \neq \Delta_0$
Test Statistic: $t_{obs} = \dfrac{\bar{d} - \Delta_0}{s_d / \sqrt{n}}$
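Test Concerning $\mu_D$
Decision Rule: (Based on t-distribution with $\nu = n - 1$ df)
–1-sided alternative
If $t_{obs} \geq t_{\alpha,\nu}$ ==> Conclude $\mu_D > \Delta_0$
If $t_{obs} < t_{\alpha,\nu}$ ==> Do not reject $\mu_D = \Delta_0$
–2-sided alternative
If $t_{obs} \geq t_{\alpha/2,\nu}$ ==> Conclude $\mu_D > \Delta_0$
If $t_{obs} \leq -t_{\alpha/2,\nu}$ ==> Conclude $\mu_D < \Delta_0$
If $-t_{\alpha/2,\nu} < t_{obs} < t_{\alpha/2,\nu}$ ==> Do not reject $\mu_D = \Delta_0$
Confidence Interval for $\mu_D$: $\bar{d} \pm t_{\alpha/2,\nu} \, s_d / \sqrt{n}$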
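The paired analysis reduces to a one-sample t procedure on the differences. A minimal sketch follows, using hypothetical scores for five subjects and the tabled critical value $t_{.025,4} = 2.776$ entered by hand. The function name `paired_t` is an illustrative choice.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(trt1, trt2, delta0=0.0):
    """Return (d_bar, s_d, t_obs, df) for paired scores, with d_i = trt1_i - trt2_i."""
    d = [a - b for a, b in zip(trt1, trt2)]
    n = len(d)
    d_bar = mean(d)
    s_d = stdev(d)                          # sample std. deviation of the differences
    t_obs = (d_bar - delta0) / (s_d / sqrt(n))
    return d_bar, s_d, t_obs, n - 1


# Hypothetical scores for 5 subjects measured under both treatments
trt1 = [5, 7, 6, 8, 9]
trt2 = [4, 5, 6, 5, 7]

d_bar, s_d, t_obs, df = paired_t(trt1, trt2)

t_crit = 2.776                              # t_{.025, 4} from a standard t table
margin = t_crit * s_d / sqrt(df + 1)
lower, upper = d_bar - margin, d_bar + margin
print(f"d_bar = {d_bar:.2f}, t_obs = {t_obs:.3f} on {df} df, "
      f"95% CI = ({lower:.3f}, {upper:.3f})")
```

Pairing removes subject-to-subject variation from the standard error, which is why the differences are analyzed and not the two columns separately.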
Example - Antiperspirant Formulations
Subjects - 20 Volunteers’ armpits
Treatments - Dry Powder vs Powder-in-Oil
Measurements - Average Rating by Judges
–Higher scores imply more disagreeable odor
Summary Statistics (Raw Data on next slide):
Source: E. Jungermann (1974)
Example Antiperspirant Formulations
Evidence that scores are higher (more unpleasant) for the dry powder (formulation 1)
Small-Sample Test For Nonnormal Data
Paired Samples (Crossover Design)
Procedure (Wilcoxon Signed-Rank Test):
–Compute differences $d_i$ (as in the paired t-test) and obtain their absolute values (ignoring 0s)
–Rank the observations by $|d_i|$ (smallest = 1), averaging ranks for ties
–Compute $W_+$ and $W_-$, the rank sums for the positive and negative differences, respectively
–1-sided tests: Conclude $H_A: M_1 > M_2$ if $W_- \leq T_0$
–2-sided tests: Conclude $H_A: M_1 \neq M_2$ if $\min(W_+, W_-) \leq T_0$
–Values of $T_0$ are given in many texts for various sample sizes and significance levels. P-values are printed by statistical software packages.
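The signed-rank bookkeeping can be sketched in a few lines. The function name `signed_rank_sums` and the paired scores are hypothetical. Integer scores are used so that the test for a zero difference is exact.

```python
def signed_rank_sums(trt1, trt2):
    """Return (W_plus, W_minus) after dropping zero differences and averaging tied ranks."""
    diffs = [a - b for a, b in zip(trt1, trt2) if a != b]   # ignore 0 differences
    abs_d = [abs(d) for d in diffs]
    order = sorted(range(len(abs_d)), key=lambda i: abs_d[i])
    ranks = [0.0] * len(abs_d)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of tied absolute differences
        while j + 1 < len(order) and abs_d[order[j + 1]] == abs_d[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1   # average of 1-based positions
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Hypothetical paired scores; the second pair has a zero difference and is dropped
trt1 = [10, 12, 9, 15, 11, 8]
trt2 = [8, 12, 10, 11, 14, 7]

w_plus, w_minus = signed_rank_sums(trt1, trt2)
print(f"W+ = {w_plus}, W- = {w_minus}")
```

With $n$ nonzero differences, $W_+ + W_- = n(n+1)/2$, which is 15 for the five differences kept here.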
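Normal Approximation (Supp PP18-21)
Under the null hypothesis of no difference in the two groups:
$\mu_{W_+} = \dfrac{n(n+1)}{4}$    $\sigma_{W_+} = \sqrt{\dfrac{n(n+1)(2n+1)}{24}}$
A z-statistic can be computed, $z_{obs} = \dfrac{W_+ - \mu_{W_+}}{\sigma_{W_+}}$, and an approximate P-value can be obtained from the Z-distribution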
Example - Caffeine and Endurance
Subjects: 9 well-trained cyclists
Treatments: 13mg Caffeine (Condition 1) vs 5mg (Condition 2)
Measurements: Minutes Until Exhaustion
This is a subset of a larger study (we’ll see later)
Step 1: Take absolute values of differences (eliminating 0s)
Step 2: Rank the absolute differences (averaging ranks for ties)
Step 3: Sum ranks for positive and negative true differences
Source: Pasman, et al (1995)
Example - Caffeine and Endurance Original Data
Example - Caffeine and Endurance
Absolute Differences
Ranked Absolute Differences
$W_+ = 28$
$W_- = 3 + 5 + 9 = 17$
Example - Caffeine and Endurance
Under the null hypothesis of no difference in the two groups:
There is no evidence that endurance times differ for the 2 doses (we will see later that both are higher than no dose)
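The normal approximation for this example can be worked from the rank sums reported above, $W_+ = 28$ and $W_- = 17$ with $n = 9$ cyclists. The sketch below assumes the standard null mean $n(n+1)/4$ and standard deviation $\sqrt{n(n+1)(2n+1)/24}$ for $W_+$, with no tie correction. It may not match the exact form of the statistic on the original slide.

```python
from math import sqrt
from statistics import NormalDist


def signed_rank_z(w_plus, n):
    """Return (mu, sigma, z_obs, two-sided P) for the signed-rank sum W+."""
    mu = n * (n + 1) / 4
    sigma = sqrt(n * (n + 1) * (2 * n + 1) / 24)    # no tie correction applied
    z_obs = (w_plus - mu) / sigma
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z_obs)))
    return mu, sigma, z_obs, p_two_sided


# W+ = 28 and n = 9 are taken from the caffeine example
mu, sigma, z_obs, p_value = signed_rank_z(28, 9)
print(f"mu = {mu}, sigma = {sigma:.3f}, z = {z_obs:.2f}, 2-sided P = {p_value:.2f}")
```

The P-value is far above 0.05, in line with the stated conclusion of no evidence of a difference between the two doses.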
SPSS Output
Note that SPSS takes the differences as MG5-MG13, while we used MG13-MG5
Data Sources
Pauling, L. (1971). “The Significance of the Evidence about Ascorbic Acid and the Common Cold,” Proceedings of the National Academy of Sciences of the United States of America, 11:
Gould, M.C. and F.A.C. Perrin (1916). “A Comparison of the Factors Involved in the Maze Learning of Human Adults and Children,” Journal of Experimental Psychology, 1:122-???
Jungermann, E. (1974). “Antiperspirants: New Trends in Formulation and Testing Technology,” Journal of the Society of Cosmetic Chemists, 25:
Pasman, W.J., M.A. van Baak, A.E. Jeukendrup, and A. de Haan (1995). “The Effect of Different Dosages of Caffeine on Endurance Performance Time,” International Journal of Sports Medicine, 16: