Introduction to Hypothesis Testing Chapter 8
Applying what we know: inferential statistics
z-scores + probability distribution of sample means
HYPOTHESIS TESTING!
Some Familiar Concepts…
Sampling error: There is always some difference between samples and populations, even when the sample is untreated (control).
M ≠ μ just by chance.
So… how can we tell if a difference we observe is due to:
–chance (random sampling error or fluctuation), or
–a treatment effect or true group differences (differences that do exist in the population)?
…and Some New Concepts
H1: Alternate hypothesis
–What we believe to be true
–There is a change, difference, relationship
But it’s easier to disprove than to prove, so…
H0: Null hypothesis
–No change, no difference, no relationship
–Try to prove this is wrong!
–Disproving H0 provides support for (but does not prove) H1.
Decide ahead of time which sample statistics (means) are:
–likely to be obtained if H0 is true
–very unlikely to be obtained if H0 is true (critical region!)
Figure 8-3 (p. 236): The set of potential samples is divided into those that are likely to be obtained and those that are very unlikely if the null hypothesis is true.
What is this called? What is this value called?
THE HYPOTHESIZED (NULL) DISTRIBUTION
Sampling Distribution: Z-scores in a new light
$z = \dfrac{M - \mu}{\sigma_M}$
$z = \dfrac{\text{obtained } M - \text{hypothesized } \mu}{\text{standard error between } M \text{ and } \mu}$
Ratio of: obtained difference (distance) / typical, expected, standard distance
How far away from typical, or expected, is our sample?
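As a minimal sketch of this ratio, the Python below computes z for a single sample mean. It assumes the standard error is σ/√n, and the numbers (μ = 80, σ = 20, n = 25, M = 86) are hypothetical values chosen only for illustration, not data from the chapter.

```python
from math import sqrt

def z_statistic(sample_mean, mu_null, sigma, n):
    """Obtained difference divided by the standard (expected) distance."""
    standard_error = sigma / sqrt(n)  # sigma_M: typical distance between M and mu
    return (sample_mean - mu_null) / standard_error

# Hypothetical example values (not from the chapter)
z = z_statistic(sample_mean=86, mu_null=80, sigma=20, n=25)
print(z)  # obtained difference 6, standard error 4, so z = 1.5
```

Here the sample mean sits 1.5 standard errors above the hypothesized mean, which answers the question "how far away from expected is our sample?" in standard units.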
Hypotheses
A hypothesis states an expected relationship between two or more variables.
May be causal: one variable causes the other.
May be descriptive: one variable is simply related to the other.
Much of this chapter focuses on causal hypotheses, from experimental studies (treatment group and control group).
Where do Hypotheses come from?
Personal observations, opinions
Existing research
Theory
Models
–more specific and concrete than theories
–usually describe specific relationships among constructs/variables
Scientific Hypotheses Must Be:
Testable: Can a test be designed?
Falsifiable: Could it potentially be incorrect? Is there room for it to be disproven?
Precise: Is it clearly defined?
Rational: Does it fit with existing facts?
Parsimonious: Is it as simple as possible?
Hypotheses cannot be proven!
A single experiment cannot PROVE a hypothesis.
Hypotheses are only supported or not supported by scientific data.
We add evidence toward confirmation or disconfirmation of a hypothesis.
A Hypothesis Test and A Jury Trial
The null hypothesis:
–Hypothesis test: We assume there is no treatment (tx) effect until there is enough evidence to show otherwise.
–Jury trial: Assume an individual is innocent until proven guilty.
The alpha level:
–Hypothesis test: We are confident that the tx does have an effect because it is very unlikely that the data could occur simply by chance.
–Jury trial: Jury must be convinced beyond a reasonable doubt before finding the defendant guilty.
The sample data:
–Hypothesis test: The research study is conducted to gather data (evidence) to demonstrate that the treatment had an effect.
–Jury trial: Prosecutor presents evidence to demonstrate that the defendant is guilty.
The critical region:
–Hypothesis test: Either the sample data fall in the critical region (enough evidence to reject H0) or the data don’t fall in the critical region (not enough evidence to reject H0).
–Jury trial: Either there is enough evidence to convince the jury that the defendant is guilty, or there is not.
The conclusion:
–Hypothesis test: If the data aren’t in the critical region, the decision is to “fail to reject the null hypothesis.” We have not proven that the null is true; we simply have failed to reject it.
–Jury trial: If there is not enough evidence, the decision is “not guilty”.
Directional (one-tailed) vs. Nondirectional (two-tailed) Tests
Nondirectional hypothesis/test
–Critical region is split between both tails: on either side of the mean
–Allows for the possibility of a tx effect in either direction
–More common, more conservative test
Directional hypothesis/test
–H1 specifies the direction of the effect / difference
–Critical region is only in one tail (either above or below the mean)
–Less conservative
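One way to see the difference between the two kinds of test is to check the same z-score against each critical region. The sketch below assumes a normal sampling distribution and α = .05; the function name `in_critical_region` and the example value z = 1.80 are illustrative choices, not part of the chapter.

```python
from statistics import NormalDist

def in_critical_region(z, alpha=0.05, tail="two"):
    """Return True if z falls in the critical region for the chosen test."""
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    if tail == "two":
        # Nondirectional: alpha is split evenly between both tails
        return abs(z) >= std_normal.inv_cdf(1 - alpha / 2)
    if tail == "upper":
        # Directional: all of alpha sits in the upper tail
        return z >= std_normal.inv_cdf(1 - alpha)
    if tail == "lower":
        # Directional: all of alpha sits in the lower tail
        return z <= std_normal.inv_cdf(alpha)
    raise ValueError("tail must be 'two', 'upper', or 'lower'")

z = 1.80  # hypothetical obtained z-score
print(in_critical_region(z, tail="two"))    # False: short of the two-tailed boundary
print(in_critical_region(z, tail="upper"))  # True: beyond the one-tailed boundary
```

Under these assumptions, z = 1.80 does not reach the two-tailed boundary (about ±1.96) but does pass the one-tailed boundary (about 1.645), which is why the directional test is the less conservative of the two.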
Error
Type I: H0 true (treatment does not have an effect), but:
–Hypothesis test detects a false treatment effect
–We reject H0 even though it’s true
–We think we have support for H1 even though it’s not true
Type II: H0 false (treatment does have an effect), but:
–Hypothesis test fails to detect it
–We retain H0 even though it’s false
Type I and Type II Error
ACTUAL SITUATION: No Effect / H0 true
–Decision: Reject H0 (decide effect does exist): Type I Error, false positive (probability = α). Test too sensitive: detects a nonexistent effect.
–Decision: Retain H0 (decide no effect exists): True negative (no effect = correct!). Good specificity, selectivity to catch a non-effect.
ACTUAL SITUATION: Effect Exists / H0 false
–Decision: Reject H0 (decide effect does exist): True positive (effect exists = correct!). Ability to detect effect = POWER; p(reject false H0) = 1 − β. Good sensitivity to detect effect.
–Decision: Retain H0 (decide no effect exists): Type II Error, false negative (probability = β). Test too specific: fails to detect a true effect.
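The "No Effect / H0 true" case can be checked by simulation. The sketch below is an illustration under assumed values (μ = 80, σ = 20, n = 16, two-tailed α = .05, none taken from the chapter): it draws many untreated samples from the null population and counts how often the test rejects H0 anyway. Each such rejection is a Type I error, so the rejection rate should land near alpha.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def type_one_error_rate(mu=80, sigma=20, n=16, alpha=0.05, trials=5000, seed=1):
    """Estimate how often a two-tailed z-test rejects H0 when H0 is true."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = sigma / sqrt(n)
    rejections = 0
    for _ in range(trials):
        # Untreated sample: drawn from the null population itself
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        z = (fmean(sample) - mu) / standard_error
        if abs(z) >= z_critical:
            rejections += 1  # false positive: H0 is true but was rejected
    return rejections / trials

rate = type_one_error_rate()
print(f"Estimated Type I error rate: {rate:.3f}")
```

The estimate varies a little from run to run and seed to seed, but it stays close to the chosen alpha because alpha is, by construction, the probability of landing in the critical region when H0 is true.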
Power
Probability that a test will correctly:
–reject a false null hypothesis
–detect a real treatment effect
In other words: sensitivity of a statistical test to detect an effect that does exist
Group Activity!
Make a graphical representation of these concepts:
–Type I error (false positive)
–Type II error (false negative)
–True positive / negative
–Alpha, Beta, Power
–Sensitivity, specificity
Some ideas:
–Draw a concept map, decision tree, flow chart
–Sketch all possibilities using the null distribution, the alternative distribution (see pp )
–Use sample data / a sample hypothesis (H0 and H1)
–Use an analogy (like the trial by jury analogy)
Beyond p and chance: Effect Sizes
Limitations of hypothesis tests:
–give ratio of obtained to expected difference
–evaluate relative size of obtained difference (or tx effect)
–strongly influenced by sample size (big enough n → small σ_M → easy to reject H0!)
Effect sizes:
–give the absolute size of the obtained difference (or tx effect)
–scaled with std deviation, not std error
–thus, not influenced by sample size
Cohen’s d
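The contrast between the two scalings can be sketched numerically. The example below assumes Cohen's d is computed as the mean difference divided by the standard deviation, and uses hypothetical values (a 4-point difference, σ = 20) that are not from the chapter. The same difference is evaluated at three sample sizes.

```python
from math import sqrt

def z_statistic(mean_difference, sigma, n):
    """Difference scaled by the standard error, which shrinks as n grows."""
    return mean_difference / (sigma / sqrt(n))

def cohens_d(mean_difference, sigma):
    """Difference scaled by the standard deviation, which ignores n."""
    return mean_difference / sigma

mean_difference, sigma = 4, 20  # hypothetical values for illustration
for n in (16, 100, 400):
    z = z_statistic(mean_difference, sigma, n)
    d = cohens_d(mean_difference, sigma)
    print(f"n = {n:3d}   z = {z:.2f}   d = {d:.2f}")
```

With these numbers z climbs from 0.80 to 2.00 to 4.00 as n increases, while d stays at 0.20 throughout: the test becomes easier to pass, but the size of the effect has not changed.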
Figure 8-15 (p. 268) The relationship between sample size and power. The top figure (a) shows a null distribution and a 20-point treatment distribution based on samples of n = 16 and a standard error of 10 points. Notice that the right-hand critical boundary is located in the middle of the treatment distribution so that roughly 50% of the treated samples fall in the critical region. In the bottom figure (b) the distributions are based on samples of n = 100 and the standard error is reduced to 4 points. In this case, essentially all of the treated samples fall in the critical region and the hypothesis test has power of nearly 100%.
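The two panels described in the caption can be approximated with a short calculation. This is a sketch under stated assumptions: a two-tailed test with α = .05 (the caption does not state alpha), normal sampling distributions, and σ = 40, which is the value implied by a standard error of 10 at n = 16 and of 4 at n = 100. The function name `power_two_tailed` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_two_tailed(effect, sigma, n, alpha=0.05):
    """Probability that a treated sample mean lands in the critical region."""
    standard_error = sigma / sqrt(n)
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    boundary = z_critical * standard_error  # critical distance from the null mean
    # Treated sample means are centered 'effect' points above the null mean
    treated = NormalDist(mu=effect, sigma=standard_error)
    upper_tail = 1 - treated.cdf(boundary)
    lower_tail = treated.cdf(-boundary)
    return upper_tail + lower_tail

power_n16 = power_two_tailed(effect=20, sigma=40, n=16)    # panel (a): SE = 10
power_n100 = power_two_tailed(effect=20, sigma=40, n=100)  # panel (b): SE = 4
print(f"n = 16:  power = {power_n16:.3f}")
print(f"n = 100: power = {power_n100:.3f}")
```

Under these assumptions the first value comes out at roughly one half and the second very close to 1, in line with the "roughly 50%" and "nearly 100%" described for the two panels.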
For Wednesday
Finish reading Chapter 9
Finish HW Chapter 9 (turn in at start of class)