Breaking Statistical Rules: How bad is it really? Presented by Sio F. Kong Joint work with: Janet Locke, Samson Amede Advisor: Dr. C. K. Chauhan
Background Make inferences about populations based on information from random samples. The process is called Hypothesis Testing. It is used in many areas, such as Biology, Psychology, and Business.
Examples
Mean heart rates:
– White newborns vs. African American newborns.
Mean daily intake of saturated fat:
– Among a vegetarian population vs. 15 grams.
Mean SAT score:
– In a particular county vs. the national average.
Notations
Population means: μ1, μ2 (unknown most of the time)
Sample means: x̄1, x̄2
Population standard deviations: σ1, σ2 (unknown most of the time)
Sample standard deviations: S1, S2
Pooled standard deviation: Sp
Sample sizes: n1, n2
2-Sample Hypothesis Testing
Example:
Null Hypothesis H0: μ1 − μ2 = 0 (the two means are equal)
Research Hypothesis H1: μ1 − μ2 ≠ 0 (the two means are not equal)
If x̄1 − x̄2 is significantly away from 0, reject the Null Hypothesis. That is, conclude that the two means are NOT equal.
The corresponding test statistic has a t-distribution.
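For reference, the standard pooled two-sample t statistic can be written in the notation of these slides as sketched below. It is assumed here, since the slide does not spell it out, that this is the statistic being used; x̄1 and x̄2 denote the two sample means.

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{S_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},
\qquad
S_p^2 = \frac{(n_1 - 1)\, S_1^2 + (n_2 - 1)\, S_2^2}{n_1 + n_2 - 2}
```

When the null hypothesis is true and the conditions of the test hold, this statistic follows a t-distribution with n1 + n2 − 2 degrees of freedom.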
Important
This test statistic has a t-distribution under certain conditions:
– Samples are drawn randomly.
– If samples are small, populations need to be normally distributed.
– The two populations have equal variances, σ1 = σ2.
Objective To investigate the effect of violating the equal-variances condition on the testing procedure. Our textbook suggests that the effect of the violation is minimal when sample sizes are equal.
Measurement for a GOOD test
Two types of errors:
– Type 1 error: rejecting a true null hypothesis
– Type 2 error: failing to reject a false null hypothesis
– α = the probability of a Type 1 error. It is selected in advance, usually 5%.
– Power = 1 − Pr(Type 2 error). It can be calculated under various alternatives.
A test is good if the power is high under various alternatives while α stays at the selected level.
In this research…
1000 tests are generated by simulation in each situation.
Simulation studies are done to estimate:
– α: probability of rejecting a true null hypothesis
– Power: probability of rejecting a false null hypothesis
Both are estimated under various alternatives when the equal-variances assumption is violated.
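A minimal sketch of how such a simulation could be set up, assuming normally distributed populations and a two-sided pooled t-test at the 5% level. The function names (`pooled_t`, `rejection_rate`), the seed, and the hard-coded critical value 2.101 (the two-sided 5% point for 18 degrees of freedom, which applies when n1 + n2 = 20) are illustrative assumptions, not the authors' actual code.

```python
import math
import random
import statistics

# Two-sided 5% critical value of the t-distribution with 18 degrees of
# freedom. Assumption: n1 + n2 - 2 == 18, as in the designs on these slides.
T_CRIT_DF18 = 2.101


def pooled_t(x, y):
    """Pooled two-sample t statistic for samples x and y."""
    n1, n2 = len(x), len(y)
    s1_sq = statistics.variance(x)  # sample variance, divisor n - 1
    s2_sq = statistics.variance(y)
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    se = math.sqrt(sp_sq * (1 / n1 + 1 / n2))
    return (statistics.mean(x) - statistics.mean(y)) / se


def rejection_rate(mu1, mu2, sigma1, sigma2, n1=10, n2=10,
                   n_tests=1000, seed=0):
    """Fraction of simulated tests that reject H0: mu1 - mu2 = 0."""
    if n1 + n2 - 2 != 18:
        raise ValueError("T_CRIT_DF18 is only valid when n1 + n2 - 2 == 18")
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    rejections = 0
    for _ in range(n_tests):
        x = [rng.gauss(mu1, sigma1) for _ in range(n1)]
        y = [rng.gauss(mu2, sigma2) for _ in range(n2)]
        if abs(pooled_t(x, y)) > T_CRIT_DF18:
            rejections += 1
    return rejections / n_tests


if __name__ == "__main__":
    # H0 true (equal means): the rejection rate estimates alpha.
    alpha_hat = rejection_rate(10, 10, sigma1=2, sigma2=5)
    # H0 false (means differ by 4): the rejection rate estimates power.
    power_hat = rejection_rate(10, 14, sigma1=2, sigma2=5)
    print(f"alpha ~ {alpha_hat:.3f}, power ~ {power_hat:.3f}")
```

With only 1000 tests per situation, each estimate carries Monte Carlo error of roughly one to two percentage points, so small differences between settings should be read with care.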
Effect when σ1 ≠ σ2
Sample sizes: n1 = 10, n2 = 10
α is estimated with μ1 = 10, μ2 = 10. Power is estimated with μ1 = 10, μ2 = 14.

σ1, σ2      α       Power
2, 3        4.4%    89.8%
2, 4        5.4%    75.0%
2, 5        6.0%    60.1%
2, 10       8.0%    24.2%
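Reject H0 if |t| > t(α/2, n1 + n2 − 2), the two-sided critical value of the t-distribution with n1 + n2 − 2 degrees of freedom.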
Result

Condition not violated: σ1 = σ2 (in this example, σ1 = σ2 = 2)

n1, n2      α       Power
10, 10      5.2%    98.5%
12, 8       5.2%    98.9%
13, 7       5.0%    98.6%
14, 6       5.0%    97.7%

Conclusion: When σ1 = σ2, it does not matter whether n1 = n2, since equal sample sizes are not a requirement.

Condition violated: σ1 ≠ σ2 (in this example, σ1 = 2 and σ2 = 5)

n1, n2      α       Power
10, 10      6.5%    60.1%
12, 8       9.6%    65.2%
13, …       …%      66.6%
14, …       …%      64.2%

Conclusion: When σ1 ≠ σ2 and n1 ≠ n2, the effect on α is even larger.
Conclusion
With equal sample sizes, as the difference between σ1 and σ2 gets larger, α goes up and power goes down.
Other interesting observations:
– If the smaller sample has the larger standard deviation, α goes up.
– If the larger sample has the larger standard deviation, α goes down.
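One way to see why the direction of the effect depends on which sample has the larger standard deviation is to compare the true variance of x̄1 − x̄2 with what the pooled formula estimates on average. The sketch below does this arithmetic exactly. The function names are hypothetical, integer standard deviations are assumed, and the ratio is only a rough indicator of the direction of the change in α, not a calculation of α itself.

```python
from fractions import Fraction


def true_variance(sigma1, sigma2, n1, n2):
    """Actual variance of the difference in sample means."""
    return Fraction(sigma1 ** 2, n1) + Fraction(sigma2 ** 2, n2)


def expected_pooled_variance(sigma1, sigma2, n1, n2):
    """Expected value of the pooled estimate Sp^2 * (1/n1 + 1/n2)."""
    sp_sq = Fraction((n1 - 1) * sigma1 ** 2 + (n2 - 1) * sigma2 ** 2,
                     n1 + n2 - 2)
    return sp_sq * (Fraction(1, n1) + Fraction(1, n2))


def variance_ratio(sigma1, sigma2, n1, n2):
    """True variance divided by the expected pooled estimate.

    A ratio above 1 means the pooled formula understates the variance on
    average, so |t| tends to be too large and H0 is rejected too often.
    A ratio below 1 means the opposite.
    """
    return (true_variance(sigma1, sigma2, n1, n2)
            / expected_pooled_variance(sigma1, sigma2, n1, n2))


if __name__ == "__main__":
    # sigma1 = 2, sigma2 = 5 throughout, as in the slides' example.
    for n1, n2 in [(10, 10), (12, 8), (8, 12)]:
        ratio = variance_ratio(2, 5, n1, n2)
        print(f"n1={n1}, n2={n2}: ratio = {float(ratio):.3f}")
```

With equal sample sizes the ratio is exactly 1, which is consistent with the textbook's suggestion that the violation matters least when n1 = n2. The ratio rises above 1 when the smaller sample has the larger standard deviation (n1 = 12, n2 = 8) and falls below 1 when the larger sample does (n1 = 8, n2 = 12).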
Note This conclusion is only based on what this simulation study has shown. By selecting different parameters and choosing different alternatives, the result may be different.
Thank You!