Lecture 2: Some general comments on statistical tests and statistical inference. Outline: elements of a statistical test; statistical null hypotheses; the meaning of p; inference: how to translate p into a conclusion; statistical errors in hypothesis testing; power; one- versus two-tailed tests; problems with statistical hypothesis testing. Bio 4118 Applied Biostatistics 2001
Elements of a statistical test: null hypothesis (H0); observations (data); test statistic; assumptions.
Statistical null hypotheses: The null hypothesis is the “default” to which you compare your data. Usually, one sets up the analysis so that rejecting the null hypothesis yields a pattern consistent with the biological prediction, so that in many cases the null hypothesis specifies a lack of pattern.
Biological questions versus statistical null hypotheses: Question: Do males and females differ in size? H0: The average sizes of males and females are equal. Question: Does age structure differ between two fish populations? H0: The age (frequency) distribution is independent of population.
Testing biological hypotheses: Pose a biological question (e.g., do drugs A and B differ in their efficacy in treating breast cancer?), generate a biological hypothesis (induction), then generate a biological prediction (deduction). [Figure: flowchart from biological question to biological hypothesis to prediction, with survival plotted for drugs A and B.]
Observations: Assume that the experimental data (observations) are collected in a proper manner. All observations are subject to a certain amount of measurement error!
Statistical analysis as model building: All statistical analyses begin with a mathematical model that supposedly “describes” the data, e.g., regression, ANOVA. “Model fitting” is then the process by which model parameters are estimated. [Figure: linear regression of Y on X; ANOVA for Groups 1 to 3, labelled with μ, μ2, α2 and ε42.]
Parameters, statistics and estimators: Population parameters characterize populations (which in general cannot be completely enumerated). Statistics (estimators) are estimates of population parameters obtained from a finite sample (e.g., the sample mean is an estimate of the population mean). The process by which one obtains an estimate of a population parameter from a finite sample is called an estimation procedure.
The meaning of p: Informal (and, strictly speaking, incorrect): the probability that the null hypothesis is true. Strictly correct: the probability of observing data at least as deviant (from the expected results) as the observed results if the null hypothesis were in fact true, assuming the data were properly collected and all statistical assumptions are met.
To reject or not reject? The decision to reject or accept the null hypothesis is based on p. This requires some agreement (convention) as to which p value we will consider significant. This threshold value is arbitrary!
Test statistics: In standard statistical analysis, p is estimated by reference to the distribution of an appropriate test statistic. If we know the distribution of the test statistic under H0, we can calculate the probability of getting a test statistic value at least as large (or as small) as the calculated value if H0 were true, i.e., p.
An example: Two samples (1, 2) have mean values that differ by some amount δ. What is the probability p of observing a difference at least this large under H0 that the two population means are in fact equal? [Figure: frequency distributions of Sample 1 and Sample 2.]
An example (cont’d): If H0 is true, the test statistic t follows the expected distribution shown. [Figure: frequency distributions of Sample 1 and Sample 2; probability (p) against t from -3 to 3.]
An example (cont’d): For the two samples, suppose t = 2.01. What is the probability of getting a value at least this large under H0 that the two means are in fact equal? Since p is small, a result this extreme would be unlikely if H0 were true. Therefore, reject H0. [Figure: frequency distributions of Sample 1 and Sample 2; probability against t from -3 to 3, with t = 2.01 marked.]
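The calculation for t = 2.01 can be sketched roughly as follows. The slide does not give the degrees of freedom, so this sketch assumes a large-sample (standard normal) approximation to the t distribution; that approximation and the function name `p_from_statistic` are assumptions, not part of the lecture. With small samples the exact t-based p would be somewhat larger.

```python
from statistics import NormalDist

def p_from_statistic(t_obs, two_tailed=True):
    """Approximate p for an observed test statistic under H0.

    Assumes the statistic is approximately standard normal under H0
    (a large-sample stand-in for the t distribution).
    """
    # Probability of a value beyond |t_obs| in one tail.
    one_tail = NormalDist().cdf(-abs(t_obs))
    return 2.0 * one_tail if two_tailed else one_tail

p_two = p_from_statistic(2.01)                    # both tails
p_one = p_from_statistic(2.01, two_tailed=False)  # one tail only
print(f"two-tailed p ~ {p_two:.4f}, one-tailed p ~ {p_one:.4f}")
```

Under this approximation the two-tailed p falls just below 0.05, which shows how close t = 2.01 sits to the conventional threshold.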
Inference: How to translate p into a conclusion? If p < 0.05, reject the null hypothesis, but keep p in mind! Report p, not just whether it is “significant” (or not). Remember, the p < 0.05 “convention” is entirely arbitrary!
“Statistical significance” and real-world decision-making: an example. If you were offered the same odds on each horse, on which would you bet? If you were a bookie, would you offer the same odds on each horse? And if you did, would you still be in business? [Figure: two horses, Clyde’s Fancy and Hypattia.]
Statistical errors in hypothesis testing: Two types: a true null hypothesis may be rejected, or a false null hypothesis may be accepted. Type I error (α): the probability of rejecting a true null hypothesis. Type II error (β): the probability of accepting a false null hypothesis.
Errors in inference:
Conclusion    Reality: H0 is true    Reality: H0 is false
Accept H0     no error               Type II error (β)
Reject H0     Type I error (α)       no error
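The meaning of α as a long-run error rate can be checked by simulation. The sketch below is illustrative only: it assumes a hypothetical setup (observations drawn from N(0, 1) with known σ = 1, a two-tailed z-test, a true H0: μ = 0), none of which comes from the lecture.

```python
import random
from statistics import NormalDist, mean

def type_i_error_rate(alpha=0.05, n=20, trials=4000, seed=1):
    """Fraction of simulated experiments that reject a true H0: mu = 0.

    Hypothetical setup: observations are N(0, 1) with known sigma = 1,
    tested with a two-tailed z-test.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = mean(sample) * n ** 0.5  # (mean - 0) / (sigma / sqrt(n)), sigma = 1
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

rate = type_i_error_rate()
print(f"observed rejection rate under a true H0: {rate:.3f}")
```

The observed rejection rate should land near the nominal α, because every rejection here is by construction a Type I error.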
Errors in inference: an example
Conclusion      Reality: No HIV (H0)    Reality: HIV (HA)
Seronegative    99%                     5%
Seropositive    1%                      95%
One- and two-tailed null hypotheses: For a 2-tailed H0, there are two rejection regions of size α/2. For a 1-tailed H0, there is one rejection region of size α. [Figure: probability against t from -3 to 3; one panel with α/2 in each tail and 1 - α between, and two panels with a single tail of size α and 1 - α elsewhere.]
Example: 2-tailed H0. No difference in populations, H0: μ1 = μ2. Since H0 is 2-tailed, we would reject H0 if μ1 - μ2 > 0 or μ1 - μ2 < 0. [Figure: frequency distributions of Sample 1 and Sample 2; probability against t from -3 to 3.]
Example: 1-tailed H0. Prediction: the average size of individuals in population 1 is greater than in population 2. H0: μ1 - μ2 ≤ 0. Since H0 is 1-tailed, we would reject H0 only if μ1 - μ2 > 0. [Figure: frequency distributions of Sample 1 and Sample 2; probability against t from -3 to 3.]
One- versus two-tailed hypotheses: 2-tailed hypothesis: reject if any non-random pattern is detected. 1-tailed hypothesis: reject only if a specified directional non-random pattern is detected. In the example shown, the 2-tailed H0: μ1 = μ2 is rejected, whereas the 1-tailed H0, whose direction agrees with the observed difference, is accepted. [Figure: frequency distributions of Sample 1 and Sample 2.]
Important note! For a given “directionality”, a 1-tailed test is more powerful than a 2-tailed test. Therefore, always specify the nature of H0 before your analysis! [Figure: rejection region of size α in one tail compared with α/2 in each tail.]
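The gain in power from a 1-tailed test can be illustrated with a short sketch. It assumes a z-test (normal test statistic with known standard error) and a hypothetical true difference of two standard errors in the predicted direction; both are assumptions chosen for illustration.

```python
from statistics import NormalDist

Z = NormalDist()

def power_one_tailed(shift, alpha=0.05):
    """Power of an upper-tailed z-test; shift = delta / standard error."""
    return 1.0 - Z.cdf(Z.inv_cdf(1.0 - alpha) - shift)

def power_two_tailed(shift, alpha=0.05):
    """Power of a two-tailed z-test; shift = delta / standard error."""
    z_crit = Z.inv_cdf(1.0 - alpha / 2.0)
    # Rejections in the upper tail plus rejections in the lower tail.
    return (1.0 - Z.cdf(z_crit - shift)) + Z.cdf(-z_crit - shift)

shift = 2.0  # hypothetical true difference of two standard errors
print(f"1-tailed power: {power_one_tailed(shift):.3f}")
print(f"2-tailed power: {power_two_tailed(shift):.3f}")
```

With no true difference (shift = 0) both functions return α, so the advantage of the 1-tailed test appears only when the true effect lies in the predicted direction.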
Parameters of statistical inference: Type I error rate (α); power (1 - Type II error rate = 1 - β); sample size (N); effect size (δ). Each of the above is a function of the other three. Hence, if three are known, so is the fourth.
Power: Power is the probability of rejecting the null hypothesis when it is false and a specified alternative hypothesis is true, i.e. 1 - β. Power can only be calculated when a specific alternative hypothesis is specified. Therefore, power depends on the alternative hypothesis. Powerful tests can detect small differences; weak tests can detect only large differences.
Calculating power: an example. Expected distribution of means of samples of 5 housefly wing lengths from normal populations specified by μ as shown above the curves and standard error σ_Ȳ = 1.74. The centre curve represents the null hypothesis, H0: μ = 45.5; the curves at the sides represent the alternative hypotheses, H1: μ = 37 or H1: μ = 54. Vertical lines delimit the 5% rejection regions for the null hypothesis. [Figure: three sampling distributions on an axis from 35 to 55.]
Power (cont’d): H0: μ = μ0 = 45.5; H1: μ = μ1. The type II error, β, increases as the alternative hypothesis, H1, approaches the null hypothesis, H0, that is, as μ1 approaches μ0: for μ1 = 54, β = 0.0018; for μ1 = 53, β = 0.0096; for μ1 = 50, β = 0.2676; for μ1 = 48.5, β = 0.5948. Shading represents β. Vertical lines mark off the 5% critical regions (2.5% in each tail) for the null hypothesis. To simplify the graph, the alternative distributions are shown for one tail only. [Figure: four panels on an axis from 40 to 60.]
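The β values in the housefly example can be approximately reproduced from the figures quoted on the slides (μ0 = 45.5, σ_Ȳ = 1.74, two-tailed α = 0.05). This is a minimal sketch assuming a normal sampling distribution of the mean; the function name `type_ii_error` is illustrative. Because σ_Ȳ is rounded to 1.74, the results come out close to, but not exactly equal to, the quoted 0.0018, 0.0096, 0.2676 and 0.5948.

```python
from statistics import NormalDist

MU_0 = 45.5     # null mean (from the slide)
SE_MEAN = 1.74  # standard error of the mean for n = 5 (from the slide)
ALPHA = 0.05

def type_ii_error(mu_1, mu_0=MU_0, se=SE_MEAN, alpha=ALPHA):
    """P(sample mean falls in the acceptance region | true mean = mu_1)."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    lower, upper = mu_0 - z_crit * se, mu_0 + z_crit * se
    sampling = NormalDist(mu_1, se)  # sampling distribution under H1
    return sampling.cdf(upper) - sampling.cdf(lower)

for mu_1 in (54, 53, 50, 48.5):
    beta = type_ii_error(mu_1)
    print(f"mu1 = {mu_1}: beta ~ {beta:.4f}, power ~ {1 - beta:.4f}")
```

The loop shows β rising, and power falling, as μ1 moves toward μ0.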
Effect size: Every null hypothesis in any statistical test implies a value for some population parameter. E.g., if two population means are equal, the absolute value of the difference δ between the two populations is zero. [Figure: frequency distributions of Sample 1 and Sample 2 along X.]
Effect size (cont’d): More generally, since H0 specifies a lack of some phenomenon, δ quantifies the degree to which the phenomenon is present. So if H0 is false, it is false to some specific degree, quantified by δ, the effect size. [Figure: frequency distributions of Sample 1 and Sample 2 along X.]
Types of power analysis I: power as a function of α, δ and N. Often done after a statistical test, when N (sample size) and effect size (δ) are known and the null hypothesis has been accepted. Then, for a specified α, we can calculate 1 - β (the power of the test). If 1 - β is low, then the Type II error rate is large, so there is a good chance we have accepted a false H0. [Figure: frequency distributions of Sample 1 and Sample 2 along X.]
Types of power analysis II: N as a function of α, δ and power. A certain effect size (δ) is anticipated (perhaps based on a preliminary sample), with a desired α and 1 - β. Given α, β and δ, we can calculate the minimum sample size Nmin required to achieve the desired specifications. This exercise can be very useful in planning experiments. [Figure: frequency distributions of Pre-sample 1 and Pre-sample 2 along X.]
Types of power analysis III: δ as a function of α, N and power. Given a desired α, 1 - β and N, what is the minimal detectable effect size δmin? If δmin is large, then only large deviations from H0 will be detected (i.e. will result in rejection of H0). Thus, we should be VERY VERY careful NOT to infer that some phenomenon does not exist if we accept H0. [Figure: frequency distributions of Sample 1 and Sample 2 along X.]
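The three types of power analysis can be sketched with one small set of functions. This is an illustration under stated assumptions, not the procedure used in the course: it assumes a two-tailed one-sample z-test with known standard deviation `sigma`, and all function names and example values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()

def power(alpha, delta, n, sigma):
    """Type I analysis: power (1 - beta) given alpha, delta and n."""
    z_crit = Z.inv_cdf(1.0 - alpha / 2.0)
    shift = abs(delta) * sqrt(n) / sigma  # effect in standard-error units
    return (1.0 - Z.cdf(z_crit - shift)) + Z.cdf(-z_crit - shift)

def min_sample_size(alpha, delta, target_power, sigma):
    """Type II analysis: smallest n whose power reaches the target."""
    if delta == 0:
        raise ValueError("delta must be non-zero")
    n = 1
    while power(alpha, delta, n, sigma) < target_power:
        n += 1
    return n

def min_detectable_effect(alpha, n, target_power, sigma):
    """Type III analysis: approximate delta_min (ignores the far tail)."""
    z_sum = Z.inv_cdf(1.0 - alpha / 2.0) + Z.inv_cdf(target_power)
    return z_sum * sigma / sqrt(n)

# Hypothetical planning values: alpha = 0.05, power = 0.80, sigma = 1.
n_min = min_sample_size(0.05, delta=1.0, target_power=0.80, sigma=1.0)
d_min = min_detectable_effect(0.05, n=n_min, target_power=0.80, sigma=1.0)
print(f"n_min = {n_min}, delta_min at that n ~ {d_min:.3f}")
print(f"power at delta_min ~ {power(0.05, d_min, n_min, 1.0):.3f}")
```

Feeding the output of one function into another recovers the starting values, which is the sense in which any one of α, power, N and δ is determined by the other three.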
Power: dependence on sample size. Power curves for testing H0: μ = 45.5 against H1: μ ≠ 45.5, for n = 5 and for n = 35. For a given true mean wing length, the probability of rejecting a false null hypothesis decreases as n decreases. [Figure: power (1 - β), from 0 to 1.0 with α marked, against wing length (x 0.1 mm) from 35 to 55, for n = 5 and n = 35.]
Why power matters: Two cases with identical means and variances that differ only in N (N = 200 and N = 30). In the first case, power is high and p < .05, therefore reject H0; in the second case, power is low and p > .05, therefore accept H0. [Figure: frequency distributions of size with means μ1 and μ2, for N = 200 and N = 30.]
Power: conclusions. If sample sizes are small, the power of any test is usually low. So, unless one knows the power of the analysis, a decision to accept the null hypothesis is meaningless! Conversely, if power is very high, rejection of the null is very likely, even if deviations from null expectations are small (and perhaps biologically meaningless)!
Statistical hypothesis testing: problems and caveats. Problem 1: many H0s are very unlikely to be true a priori, so that their rejection is not very informative. [Figure: average yield by treatment (Control, Treatment 1, Treatment 2).]
Statistical hypothesis testing: problems and caveats. Problem 2: the nominal type I error (e.g. α = 0.05) is entirely arbitrary, and may not bear any relationship to biological significance, and even less to decision-making. [Figure: probability against t from -3 to 3, with a threshold for decision-making marked.]
Statistical hypothesis testing: problems and caveats. Problem 3: p is the probability of obtaining a test statistic at least as extreme as that observed if H0 is true, but often the actual (sampling) distribution of the test statistic does not match the (assumed) distribution under the null. [Figure: probability against t from -3 to 3, comparing the sampled and null distributions.]
Statistical hypothesis testing: problems and caveats. Problem 4: for a fixed effect size, p depends on sample size (n), so that one can almost always reject H0 if the sample is sufficiently large, even if the observed effect is trivial. [Figure: curves for a larger and a smaller effect size against sample size (n), with the 0.05 threshold marked; vertical axis labelled Type I error.]
Statistical hypothesis testing: problems and caveats. Problem 5: since p depends on sample size (n), using a fixed nominal α (e.g. α = 0.05) as n increases is logically inconsistent: even for n = infinity and a true H0, α = 0.05! [Figure: nominal type I error (α) against sample size (n), comparing a fixed α (e.g. 0.05) with an α that depends on n.]
Statistical hypothesis testing: solutions. Avoid testing trivial null hypotheses. Distinguish between biological (or other) significance and statistical significance. Always provide estimates of effect sizes and their precision, statistical significance (or lack thereof) notwithstanding. Consider using randomization and/or resampling methods to generate the actual distribution of test statistics.
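One way the randomization idea might look in practice is a two-sample permutation test on the difference in means. This is a minimal sketch, not the method prescribed in the course: the function name `permutation_test`, the data and the number of permutations are all hypothetical.

```python
import random
from statistics import mean

def permutation_test(x, y, n_perm=5000, seed=0):
    """Two-tailed permutation p for the difference in means of x and y.

    The null distribution is built by repeatedly shuffling the pooled
    observations and re-splitting them into groups of the original sizes.
    """
    rng = random.Random(seed)
    observed = abs(mean(x) - mean(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_x]) - mean(pooled[n_x:]))
        if diff >= observed:
            as_extreme += 1
    # Count the observed arrangement itself so p is never exactly zero.
    return (as_extreme + 1) / (n_perm + 1)

# Hypothetical measurements for two groups.
group_1 = [1, 2, 3, 4, 5]
group_2 = [11, 12, 13, 14, 15]
print(f"permutation p ~ {permutation_test(group_1, group_2):.4f}")
```

Because the reference distribution comes from the data themselves, the test does not rely on the test statistic following an assumed theoretical distribution under the null.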