Hypothesis Testing: One Sample Cases
Outline: – The logic of hypothesis testing – The Five-Step Model – Hypothesis testing for single sample means (z test) – One- vs. Two- tailed tests
Significant Differences Hypothesis testing is designed to detect significant differences: differences that did not occur by random chance. In the “one sample” case: we compare a random sample (from a large group) to a population. We compare a sample statistic to a population parameter to see if there is a significant difference.
Our Problem: The education department at a university has been accused of “grade inflation” so education graduates have much higher scores than students in general. Scores of all education graduates should be compared with the scores of all students. – There are 1000s of education graduates, far too many to interview. – How can this be investigated without interviewing all education graduates?
What we know: The mean score for all students is 2.70 and the s.d. is 0.70. These values are parameters (population values): μ = 2.70, σ = 0.70. Statistical information for a random sample of education graduates: sample mean X̄ = 3.00, sample size n = 117.
Questions to ask: Is there a difference between the parameter (2.70) and the statistic (3.00) sample mean? Could the observed difference have been caused by random chance? Is the difference real (significant)?
Two Possibilities: 1.The sample mean (3.00) is the same as the pop. mean (2.70). – The difference is trivial and caused by random chance. 2.The difference is real (significant). – Education graduates are different from all students.
The Null and Alternative Hypotheses: 1.Null Hypothesis (H 0 ) – The difference is caused by random chance. – The H 0 always states there is “no significant difference.” In this case, we mean that there is no significant difference between the population mean and the sample mean. 2.Alternative hypothesis (H 1 ) – “The difference is real”. – (H 1 ) always contradicts the H 0. One (and only one) of these explanations must be true. Which one?
Test the Explanations We always test the Null Hypothesis. Assuming that the H 0 is true: – What is the probability of getting the sample mean (3.00) if the H 0 is true and all education graduates really have a mean of 2.70? – In other words, the difference between the means is due to random chance. – If the probability associated with this difference is less than 5%, reject the null hypothesis.
Test the Hypotheses Use the 5% value as a guideline to identify differences that would be rare or extremely unlikely if H 0 is true. This “alpha” value defines the “region of rejection.” Use the Z score formula for single samples. If the probability is less than 5%, the calculated or “observed” Z score will be beyond ±1.96 (the “critical” Z score).
Two-tailed Hypothesis Test: When α = 5%, 2.5% of the area lies in each tail of the curve (area C). The 95% in the middle section represents no significant difference between the population and the sample mean. The cut-off between the middle section and each 2.5% tail is a Z-value of ±1.96.
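The ±1.96 cut-off can be checked numerically. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the variable names are illustrative and not part of the slides.

```python
from statistics import NormalDist

alpha = 0.05  # significance level used throughout these slides

# Two-tailed test: alpha is split equally, alpha / 2 in each tail.
z_critical = NormalDist().inv_cdf(1 - alpha / 2)

# Area of the standard normal curve between -z_critical and +z_critical.
middle_area = NormalDist().cdf(z_critical) - NormalDist().cdf(-z_critical)

print(f"critical Z = ±{z_critical:.2f}")
print(f"area in the middle section = {middle_area:.2%}")
```

Changing `alpha` to 0.01 moves the cut-off further out, which makes the region of rejection smaller.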
Testing Hypotheses: Using The Five Step Model… 1.Make Assumptions and meet test requirements. 2.State the null hypothesis. 3.Select the sampling distribution and establish the critical region. 4.Compute the test statistic. 5.Make a decision and interpret results.
Step 1: Make Assumptions and Meet Test Requirements Random sampling – Hypothesis testing assumes samples were selected using random sampling. – In this case, the sample of 117 cases was randomly selected from all education graduates. Sampling Distribution is normal in shape – This is a “large” sample (n≥100).
Step 2 State the Null Hypothesis H 0 : μ = 2.7 You can also state H 0 : No difference between the sample mean and the population parameter – (In other words, the sample mean of 3.0 is really the same as the population mean of 2.7 – the difference is not real but is due to chance.) – The sample of 117 comes from a population that has a mean score of 2.7. – The difference between 2.7 and 3.0 is trivial and caused by random chance.
Step 2 (cont.) State the Alternate Hypothesis H 1 : μ≠2.7 Or H 1 : There is a difference between the sample mean and the population parameter – The sample of 117 comes from a population that does not have a mean score of 2.7. In reality, it comes from a different population. – The difference between 2.7 and 3.0 reflects an actual difference between education graduates and other students. – Note that we are testing whether the sample comes from a population that is different from, or the same as, the general student population.
Step 3 Select Sampling Distribution and Establish the Critical Region Sampling Distribution= Z – Alpha (α) = 5% – α is the indicator of “rare” events. – Any difference with a probability less than α is rare and will cause us to reject the H 0.
Step 3 (cont.) Select Sampling Distribution and Establish the Critical Region Critical Region begins at Z= ± 1.96 – This is the critical Z score associated with α = 5%, two-tailed test. – If the obtained Z score falls in the Critical Region, or “the region of rejection,” then we would reject the H 0.
Step 4: Use Formula to Compute the Test Statistic Z for large samples (≥ 100)
Test the Hypotheses Substituting the values into the formula, we calculate a Z score of 4.62.
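The formula itself did not survive in this text, so the sketch below assumes the usual one-sample form, Z = (sample mean − population mean) / standard error. It computes the statistic two ways, because the standard error is sometimes written with √n and sometimes with √(n − 1). With the values from the slides, the √(n − 1) variant reproduces the Z = 4.62 reported in Step 5, while √n gives about 4.64. Either way the result is far beyond ±1.96. The function name `z_score` is illustrative.

```python
from math import sqrt

mu = 2.70      # population mean (all students)
sigma = 0.70   # standard deviation given on the slides
x_bar = 3.00   # sample mean (education graduates)
n = 117        # sample size

def z_score(x_bar, mu, sd, n_eff):
    """Z = (sample mean - population mean) / (sd / sqrt(n_eff))."""
    return (x_bar - mu) / (sd / sqrt(n_eff))

z_n = z_score(x_bar, mu, sigma, n)                # standard error uses sqrt(n)
z_n_minus_1 = z_score(x_bar, mu, sigma, n - 1)    # standard error uses sqrt(n - 1)

print(f"Z with sqrt(n):     {z_n:.2f}")
print(f"Z with sqrt(n - 1): {z_n_minus_1:.2f}")
```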
Step 5 Make a Decision and Interpret Results The obtained Z score falls in the Critical Region, so we reject the H 0. The obtained Z (4.62) is greater than 1.96. – If the H 0 were true, a sample mean of 3.00 would be very unlikely. – Therefore, the H 0 is rejected. Education graduates have a score that is significantly different from the general student body (Z = 4.62, α = 5%).* *Note: Always report significant statistics.
Looking at the curve: (Area C = Critical Region when α=.05)
Summary: The score of education graduates is significantly different from the score of the general student body. In hypothesis testing, we try to identify statistically significant differences that did not occur by random chance. In this example, the difference between the parameter 2.70 and the statistic 3.00 was large and unlikely (p < 5%) to have occurred by random chance.
Summary (cont.) We rejected the H 0 and concluded that the difference was significant. It is very likely that education graduates have scores higher than the general student body.
Rule of Thumb: If the test statistic is in the Critical Region (at α = 5%, beyond ±1.96): – Reject the H 0. The difference is significant. If the test statistic is not in the Critical Region (at α = 5%, between −1.96 and +1.96): – Fail to reject the H 0. The difference is not significant.
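The rule of thumb can be written as a small function. This is only a sketch of the decision rule for a two-tailed test; the function name `decide` and the second example value are hypothetical.

```python
def decide(z_obtained, z_critical=1.96):
    """Two-tailed decision rule: reject H0 when |Z| exceeds the critical value."""
    if abs(z_obtained) > z_critical:
        return "Reject H0: the difference is significant."
    return "Fail to reject H0: the difference is not significant."

print(decide(4.62))   # the Z score reported for the education graduates
print(decide(-1.20))  # a hypothetical Z score inside ±1.96
```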
Two-tailed vs. One-tailed Tests In a two-tailed test, the direction of the difference is not predicted. A two-tailed test splits the critical region equally on both sides of the curve. In a one-tailed test, the researcher predicts the direction (i.e. greater or less than) of the difference. All of the critical region is placed on the side of the curve in the direction of the prediction.
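To see how the placement of the critical region changes the cut-off, the sketch below computes critical Z scores for the two-tailed case and for each one-tailed case. The function name and the `tail` labels ("two", "upper", "lower") are illustrative choices, not terms from the slides.

```python
from statistics import NormalDist

def critical_region(alpha=0.05, tail="two"):
    """Return (lower, upper) critical Z scores; None marks a side with no critical region."""
    std_normal = NormalDist()
    if tail == "two":
        z = std_normal.inv_cdf(1 - alpha / 2)   # alpha split across both tails
        return (-z, z)
    if tail == "upper":
        return (None, std_normal.inv_cdf(1 - alpha))  # all of alpha in the upper tail
    if tail == "lower":
        return (std_normal.inv_cdf(alpha), None)      # all of alpha in the lower tail
    raise ValueError("tail must be 'two', 'upper', or 'lower'")

for tail in ("two", "upper", "lower"):
    print(tail, critical_region(0.05, tail))
```

At α = 5% the one-tailed cut-off (about 1.645) is closer to the center than the two-tailed cut-off (about 1.96), because the whole 5% sits on one side.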
One-Tail Z Test for Mean (σ Known) Assumptions – Population is normally distributed – If not normal, use large samples – Null hypothesis has =, ≤, or ≥ sign only Z Test Statistic: Z = (X̄ − μ) / (σ / √n)
Rejection Regions Left-tail test: H 0 : μ ≥ μ0, H 1 : μ < μ0. Reject H 0 in the lower tail; the sample mean must be significantly below μ0. Right-tail test: H 0 : μ ≤ μ0, H 1 : μ > μ0. Reject H 0 in the upper tail; small values don’t contradict H 0, so don’t reject H 0!
Example: One Tail Test Does an average box of cereal contain 368 grams of cereal? A random sample of 25 boxes showed X̄ = [value missing in source]. The company has specified σ to be 15 grams. Test at the 5% level. H 0 : μ ≤ 368 H 1 : μ > 368
Finding Critical Values: One Tail What is Z given α = 5%? Critical Value = InvNormal(.95) = 1.645. Use the % points table below the normal table in the formula book.
Example Solution: One Tail H 0 : μ ≤ 368 H 1 : μ > 368 α = 5% n = 25 Critical Value: 1.645 Test Statistic: Z = 1.50 Decision: Do Not Reject H 0 at α = 5%, because Z = 1.50 < critical value of 1.645 Conclusion: No evidence that the true mean is more than 368
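The one-tail solution can be sketched in code. The sample mean is not preserved in this text, so the value 372.5 below is an assumption, chosen because it yields the Z of 1.50 that appears on the p Value slide. Everything else (368 g, σ = 15 g, n = 25, α = 5%) comes from the example.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 368      # hypothesized mean (grams)
sigma = 15      # standard deviation specified by the company (grams)
n = 25          # number of boxes sampled
alpha = 0.05
x_bar = 372.5   # ASSUMPTION: not given in the text; chosen so that Z = 1.50

z = (x_bar - mu_0) / (sigma / sqrt(n))
z_critical = NormalDist().inv_cdf(1 - alpha)  # upper-tail critical value

reject = z > z_critical
print(f"Z = {z:.2f}, critical value = {z_critical:.3f}, reject H0: {reject}")
```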
p Value Solution Use the alternative hypothesis to find the direction of the test. From the Z table: the area below Z = 1.50 is 93.32%, so p Value = P(Z ≥ 1.50) = 1 − 93.32% = 6.68%.
p Value Solution (p Value = 6.68%) ≥ (α = 5%). Do Not Reject H 0. The test statistic is in the Do Not Reject region.
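The table lookup can be reproduced with the standard normal cumulative distribution. This is a minimal sketch for the upper-tail case only, taking Z = 1.50 from the slide.

```python
from statistics import NormalDist

z = 1.50        # test statistic from the slide
alpha = 0.05

area_below = NormalDist().cdf(z)   # cumulative area to the left of Z
p_value = 1 - area_below           # upper-tail area, because H1 is "greater than"

print(f"area below Z = {area_below:.2%}")
print(f"p-value      = {p_value:.2%}")
print("Reject H0" if p_value < alpha else "Do not reject H0")
```

The p-value approach and the critical-value approach always give the same decision: Z is below 1.645 exactly when the upper-tail area is above 5%.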
Example: Two Tail Test Does an average box of cereal contain 368 grams of cereal? A random sample of 25 boxes showed X̄ = [value missing in source]. The company has specified σ to be 15 grams. Test at the 5% level. H 0 : μ = 368 H 1 : μ ≠ 368
Example Solution: Two Tail H 0 : μ = 368 H 1 : μ ≠ 368 α = 5% n = 25 Critical Value: ±1.96 Test Statistic: falls inside ±1.96 Decision: Do Not Reject H 0 at α = 5% Conclusion: No evidence that the true mean is not 368
The Curve for Two- vs. One-tailed Tests at α =.05: Two-tailed test: “is there a significant difference?” One-tailed tests: “is the sample mean greater than µ?” “is the sample mean less than µ?”
Interpreting the p-value… The smaller the p-value, the more statistical evidence exists to support the alternative hypothesis. If the p-value is less than 1%, there is overwhelming evidence that supports the alternative hypothesis. If the p-value is between 1% and 5%, there is strong evidence that supports the alternative hypothesis. If the p-value is between 5% and 10%, there is weak evidence that supports the alternative hypothesis. If the p-value exceeds 10%, there is no evidence that supports the alternative hypothesis.
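These bands translate directly into a lookup. The sketch below is one way to encode them; the function name is hypothetical, and how a p-value exactly on a boundary (1%, 5%, 10%) is classified is an assumption, since the slide does not say.

```python
def evidence_for_h1(p_value):
    """Map a p-value to the verbal labels used on this slide."""
    if not 0 <= p_value <= 1:
        raise ValueError("p_value must be between 0 and 1")
    if p_value < 0.01:
        return "overwhelming"
    if p_value < 0.05:
        return "strong"
    if p_value < 0.10:   # assumption: a p-value of exactly 10% is treated as "none"
        return "weak"
    return "none"

print(evidence_for_h1(0.0668))  # p-value from the one-tail cereal example
```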
Type I error Occurs when an experimenter thinks she/he has a significant result, but it is really due to chance. Analogous to a “false positive” on a drug test. The risk of a Type I error is the same as the significance level, e.g., α = 5%. Solutions: use a more stringent significance level, use replication.
Type II error Occurs when a researcher fails to find a significant result when, in fact, there was something significant going on. Analogous to a “false negative” on a drug test. Must be calculated with a test of statistical “power,” e.g., given the sample size, how big would an effect have to be in order to detect it? Solutions: increase sample size, use more sensitive and precise measures, use replication.
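A small simulation can illustrate why the Type I risk equals the significance level. The sketch below repeatedly draws samples from a population in which H 0 is true (using the grade example's values and assuming normally distributed scores) and counts how often a two-tailed test at α = 5% rejects anyway. The trial count and seed are arbitrary choices.

```python
import random
from math import sqrt

random.seed(0)  # fixed seed so the run is repeatable

mu, sigma, n = 2.70, 0.70, 117   # population values from the grade example
z_critical = 1.96                # two-tailed critical value at alpha = 5%
trials = 5_000

false_positives = 0
for _ in range(trials):
    # Draw a sample from a population where H0 is true (mean really is mu).
    sample_mean = sum(random.gauss(mu, sigma) for _ in range(n)) / n
    z = (sample_mean - mu) / (sigma / sqrt(n))
    if abs(z) > z_critical:
        false_positives += 1

type_i_rate = false_positives / trials
print(f"share of true-H0 samples rejected: {type_i_rate:.3f}")
```

The printed share should land near 0.05, and it drifts closer as `trials` grows.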
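The question "how big would an effect have to be in order to detect it?" has a common approximate answer for a two-tailed Z test: the difference must be about (z for α/2 + z for the desired power) standard errors. The sketch below applies that approximation. The 80% power target is a conventional assumption, not a value from the slides, and the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist

def minimum_detectable_difference(sigma, n, alpha=0.05, power=0.80):
    """Approximate smallest true difference in means a two-tailed Z test detects with the given power."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = std_normal.inv_cdf(power)          # extra distance needed for the power target
    return (z_alpha + z_power) * sigma / sqrt(n)

d_117 = minimum_detectable_difference(sigma=0.70, n=117)
d_468 = minimum_detectable_difference(sigma=0.70, n=468)  # four times the sample size
print(f"n = 117: {d_117:.3f}")
print(f"n = 468: {d_468:.3f}")
```

Quadrupling the sample size halves the detectable difference, which is why increasing the sample size is listed as a remedy for Type II errors.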
Type I and Type II errors
Implications To some extent, Type I and Type II errors trade off with one another – Decreasing the chance of a Type I error may increase the chance of a Type II error. A Type I error is the more shocking of the two – Type I entails shouting “Eureka” when you haven’t really found it. – Scientific skepticism makes Type II errors more palatable.
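The trade-off can be made concrete by holding the sample fixed and varying α. The sketch below computes the Type II risk of a two-tailed Z test for a hypothetical true difference of 0.15 points, using σ = 0.70 and n = 117 from the grade example. The true difference and the function name are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_risk(true_diff, sigma, n, alpha):
    """Probability that a two-tailed Z test fails to reject H0 when the true mean differs by true_diff."""
    std_normal = NormalDist()
    z_critical = std_normal.inv_cdf(1 - alpha / 2)
    shift = true_diff / (sigma / sqrt(n))  # true difference in standard-error units
    # Chance that Z lands between the critical values, so H0 is wrongly retained.
    return std_normal.cdf(z_critical - shift) - std_normal.cdf(-z_critical - shift)

for alpha in (0.10, 0.05, 0.01):
    beta = type_ii_risk(true_diff=0.15, sigma=0.70, n=117, alpha=alpha)
    print(f"alpha = {alpha:.2f} -> Type II risk = {beta:.3f}")
```

As α shrinks, the printed Type II risk grows: a stricter test is harder to pass for real effects as well as for chance ones.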