University of Ottawa - Bio 4118 – Applied Biostatistics © Antoine Morin and Scott Findlay 21/10/ :24 PM 1 Review and important concepts Biological questions and statistical hypotheses
Concepts map
Biological questions vs. statistical null hypotheses. Statistics can help you answer biological questions, but you must first learn to translate each biological question into a null hypothesis to be tested. Question: Do males and females differ in size? H0: The average sizes of males and females are equal. Question: Does age structure differ between two fish populations? H0: The age (frequency) distribution is independent of population.
The meaning of p. Informal shorthand (not strictly correct): the probability that the null hypothesis is true. Strictly correct: the probability of observing data at least as deviant (from the expected results) as the observed results if the null hypothesis were in fact true, assuming that the data were properly collected and that all statistical assumptions are met.
To reject or not reject? The decision to reject or accept the null hypothesis is based on p. This requires some agreement (convention) as to which p value we will consider significant. This threshold value is arbitrary!
Test statistics. In standard statistical analysis, p is estimated by reference to the distribution of an appropriate test statistic. If we know the distribution of the test statistic, we can calculate the probability of getting a test statistic value at least as large (or as small) as the calculated value if H0 were true, i.e., p.
An example. Two samples (1, 2) have mean values that differ by some amount. What is the probability p of observing a difference at least this large under H0 that the two population means are in fact equal? [Figure: frequency distributions of Sample 1 and Sample 2.]
An example (cont’d). If H0 is true, and if the other assumptions are met (we will get back to this...), the expected distribution of the test statistic t is the Student t distribution. [Figure: frequency distributions of Sample 1 and Sample 2; probability (p) as a function of t.]
An example (cont’d). For the two samples, suppose t = 2.01. What is the probability of getting a value at least this large under H0 that the two means are in fact equal? Since p is small, it is unlikely that H0 is true. Therefore, reject H0. [Figure: frequency distributions of Sample 1 and Sample 2; probability distribution of t with t = 2.01 marked.]
Inference: how to translate p into a conclusion? If p < 0.05, reject the null hypothesis... but keep p in mind! Report p, not just whether it is “significant” (or not). Remember, the p < 0.05 “convention” is entirely arbitrary!
“Statistical significance” and real-world decision-making: an example. If you were offered the same odds on each horse, on which would you bet? If you were a bookie, would you offer the same odds on each horse? And if you did, would you still be in business? [Figure: two horses, Clyde’s Fancy and Hypattia.]
Statistical errors in hypothesis testing. Two types: a true null hypothesis may be rejected, or a false null hypothesis may be accepted. Type I error (α): the probability of rejecting a true null hypothesis. Type II error (β): the probability of accepting a false null hypothesis.
Errors in inference.
Conclusion     Reality: H0 is true     Reality: H0 is false
Accept H0      no error                Type II error (β)
Reject H0      Type I error (α)        no error
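The two error rates can be estimated by simulation. The sketch below is an illustration only, not part of the original slides: it assumes a z-test with known standard deviation (rather than a t-test), and the sample size, effect and number of trials are arbitrary choices. When H0 is true, the rejection rate estimates α; when H0 is false, one minus the rejection rate estimates β.

```python
import random
from statistics import NormalDist, mean

def z_test_p(sample, mu0, sigma):
    """Two-tailed p-value for H0: population mean == mu0, with sigma known."""
    z = (mean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def rejection_rate(true_mu, mu0=0.0, sigma=1.0, n=20, alpha=0.05,
                   trials=4000, seed=1):
    """Fraction of simulated experiments in which H0 is rejected."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        if z_test_p(sample, mu0, sigma) < alpha:
            rejected += 1
    return rejected / trials

# H0 true: every rejection is a Type I error, so the rate estimates alpha.
type_i_rate = rejection_rate(true_mu=0.0)
# H0 false (hypothetical true mean of 0.5): every acceptance is a Type II error.
type_ii_rate = 1 - rejection_rate(true_mu=0.5)
print(type_i_rate, type_ii_rate)
```

With these settings the first rate should land near the nominal 0.05, while the second depends entirely on the assumed true mean and sample size.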
Errors in inference: an example (HIV testing). Columns (reality): No HIV, HIV. Rows (test result): Seronegative, Seropositive. Cell percentages: 99%, 95%, 5%, 1%.
One- and two-tailed null hypotheses. For a 2-tailed H0, there are two rejection regions, each of size α/2. For a 1-tailed H0, there is one rejection region of size α. [Figure: probability distributions of t showing the rejection regions.]
Example: 2-tailed H0. No difference between populations. H0: μ1 = μ2. Since H0 is 2-tailed, we would reject H0 if μ1 - μ2 > 0 or μ1 - μ2 < 0. [Figure: frequency distributions of Sample 1 and Sample 2; probability distribution of the test statistic.]
Example: 1-tailed H0. Biological hypothesis: the average size of individuals in population 1 is greater than in population 2. H0: μ1 - μ2 ≤ 0. Since H0 is 1-tailed, we would reject H0 only if μ1 - μ2 > 0. [Figure: frequency distributions of Sample 1 and Sample 2; probability distribution of the test statistic.]
One- versus two-tailed hypotheses. 2-tailed hypothesis: reject if any non-random pattern is detected. 1-tailed hypothesis: reject only if a specified directional non-random pattern is detected. The same pair of samples can therefore lead to rejection of the 2-tailed H0: μ1 = μ2 and to acceptance of a 1-tailed H0, when the observed difference lies in the direction that the 1-tailed H0 allows. [Figure: frequency distributions of Sample 1 and Sample 2.]
Important note! For a given “directionality”, a 1-tailed test is more powerful than a 2-tailed test. Therefore, always specify the nature of H0 before your analysis!
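The power advantage of a 1-tailed test can be checked numerically. This is a minimal sketch under a simplifying assumption: the test statistic is treated as standard normal (a large-sample stand-in for Student's t), and `shift` is a hypothetical name for the true difference expressed in standard-error units.

```python
from statistics import NormalDist

z = NormalDist()
alpha = 0.05

# Critical values: the 2-tailed test splits alpha between the two tails.
crit_two = z.inv_cdf(1 - alpha / 2)   # about 1.96
crit_one = z.inv_cdf(1 - alpha)       # about 1.645

def power_two_tailed(shift):
    """Probability of landing in either rejection region."""
    return (1 - z.cdf(crit_two - shift)) + z.cdf(-crit_two - shift)

def power_one_tailed(shift):
    """Probability of landing in the single upper rejection region."""
    return 1 - z.cdf(crit_one - shift)

shift = 2.0  # hypothetical true difference, in standard errors
print(power_one_tailed(shift), power_two_tailed(shift))
```

The gain only holds when the true difference lies in the direction specified in advance; a difference in the other direction is essentially never detected by the 1-tailed test.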
Parameters of statistical inference. Type I error rate (α). Power (1 - Type II error rate = 1 - β). Sample size (N). Effect size (δ). Each of the above is a function of the other three. Hence, if three are known, so is the fourth.
Power. Power is the probability of rejecting the null hypothesis when it is false and a specified alternate hypothesis is true, i.e. 1 - β. Power can only be calculated when a specific alternate hypothesis is stated. Therefore, power depends on the alternate hypothesis. Powerful tests can detect small differences; weak tests can detect only large differences.
Calculating power: an example. Expected distribution of means of samples of 5 housefly wing lengths from normal populations specified by μ (as shown above the curves) and σ_Ȳ. The centre curve represents the null hypothesis, H0: μ = 45.5; the curves at the sides represent the alternative hypotheses, H1: μ = 37 or H1: μ = 54. Vertical lines delimit the 5% rejection regions for the null hypothesis.
Power (cont’d). H0: μ = μ0 = 45.5; H1: μ = μ1, with μ1 = 54, 53, 50 and 48.5. The type II error, β, increases as the alternative hypothesis, H1, approaches the null hypothesis, H0, that is, as μ1 approaches μ0. Shading represents β. Vertical lines mark off the 5% critical regions (2.5% in each tail) for the null hypothesis. To simplify the graph, the alternative distributions are shown for one tail only.
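The growth of β as μ1 approaches μ0 can be reproduced with a few lines of code. This is a sketch, not the original calculation: the standard error of the mean did not survive in the slide text, so the value 1.744 used below is an assumption (it corresponds to a population standard deviation of 3.90 and n = 5), and the sample means are treated as exactly normal.

```python
from statistics import NormalDist

MU0 = 45.5     # null hypothesis mean, from the slide
SE = 1.744     # ASSUMED standard error of the mean (not given in the slide text)
ALPHA = 0.05

def type_ii_error(mu1, mu0=MU0, se=SE, alpha=ALPHA):
    """Beta: probability that a sample mean drawn under H1 (mean mu1)
    falls inside the acceptance region of the 2-tailed test of H0."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    lower, upper = mu0 - z_crit * se, mu0 + z_crit * se
    alternative = NormalDist(mu1, se)
    return alternative.cdf(upper) - alternative.cdf(lower)

# Alternatives listed on the slide, from farthest to nearest to mu0.
betas = {mu1: type_ii_error(mu1) for mu1 in (54, 53, 50, 48.5)}
for mu1, beta in betas.items():
    print(f"mu1 = {mu1}: beta = {beta:.4f}, power = {1 - beta:.4f}")
```

Under these assumptions β is negligible for μ1 = 54 and exceeds one half for μ1 = 48.5, which is the pattern the shaded areas describe.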
Effect size. Every null hypothesis in any statistical test implies a value for some population parameter. E.g. under the H0 that two population means are equal, the absolute value of the difference between the two population means is zero: |μ1 - μ2| = 0. [Figure: frequency distributions of Sample 1 and Sample 2.]
Effect size (cont’d). More generally, since H0 specifies a lack of some phenomenon, the effect size δ quantifies the degree to which the phenomenon is present. So if H0 is false, it is false to some specific degree, quantified by the effect size. [Figure: frequency distributions of Sample 1 and Sample 2.]
Types of power analysis I: power as a function of α, δ and N. Often done after a statistical test, where N (sample size) and effect size (δ) are determined and the null hypothesis has been accepted. Then, for a specified α, we can calculate 1 - β (the power of the test). If 1 - β is low, then the Type II error rate is large, so there is a good chance we have accepted a false H0. [Figure: frequency distributions of Sample 1 and Sample 2.]
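As an illustration of this first type of power analysis, the sketch below computes power for a comparison of two group means. It relies on assumptions not in the slides: a normal approximation to the t-test, equal group sizes, and a common standard deviation `sigma`; the function and parameter names are hypothetical.

```python
from statistics import NormalDist

def power_two_sample(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a 2-tailed test for a difference of two means."""
    z = NormalDist()
    se = sigma * (2 / n_per_group) ** 0.5      # standard error of the difference
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = abs(delta) / se                    # true difference in SE units
    return (1 - z.cdf(z_crit - shift)) + z.cdf(-z_crit - shift)

# Same effect size, two different sample sizes (hypothetical numbers).
low = power_two_sample(delta=0.5, sigma=1.0, n_per_group=10)
high = power_two_sample(delta=0.5, sigma=1.0, n_per_group=100)
print(low, high)
```

With ten observations per group the power is low, so accepting H0 in that situation says little about whether an effect of this size exists.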
Types of power analysis II: N as a function of α, δ and power. A certain effect size (δ) is anticipated (perhaps based on a preliminary sample), with a desired α and 1 - β. Given α, β and δ, we can calculate the minimum sample size N_min required to achieve the desired specifications. This exercise can be very useful in planning experiments. [Figure: frequency distributions of Pre-sample 1 and Pre-sample 2.]
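A minimum sample size can be sketched with the usual normal-approximation formula for comparing two means, n = 2((z_(1-α/2) + z_(1-β)) σ / δ)² per group. The formula choice, the equal-group-size design and the numbers below are assumptions for illustration; an exact calculation based on the t distribution gives slightly larger values.

```python
import math
from statistics import NormalDist

def min_n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Smallest per-group sample size reaching the desired power
    (normal approximation, 2-tailed test, equal group sizes)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

# Hypothetical planning values: anticipated difference of half a standard deviation.
n_min = min_n_per_group(delta=0.5, sigma=1.0)
print(n_min)
```

Because the required N scales with 1/δ², halving the anticipated effect size roughly quadruples the sample size needed.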
Types of power analysis III: δ as a function of α, N and power. Given a desired α, 1 - β and N, what is the minimal detectable effect size δ_min? If δ_min is large, then only large deviations from H0 will be detected (i.e. will result in rejection of H0). Thus, we should be VERY VERY careful NOT to infer that some phenomenon does not exist if we accept H0. [Figure: frequency distributions of Pre-sample 1 and Pre-sample 2.]
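The minimal detectable effect size can be sketched the same way, by solving the normal-approximation power relation for δ. As before, the two-group design with equal sample sizes and known common standard deviation is an assumption made for illustration, and the names are hypothetical.

```python
from statistics import NormalDist

def min_detectable_effect(sigma, n_per_group, alpha=0.05, power=0.80):
    """Smallest difference between two means detectable with the given
    alpha, power and per-group sample size (normal approximation)."""
    z = NormalDist()
    se = sigma * (2 / n_per_group) ** 0.5
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * se

d_small_n = min_detectable_effect(sigma=1.0, n_per_group=10)
d_large_n = min_detectable_effect(sigma=1.0, n_per_group=1000)
print(d_small_n, d_large_n)
```

With ten observations per group, only differences larger than about one standard deviation are reliably detected, so accepting H0 there does not rule out a sizeable effect.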
Power: dependence on sample size. Power curves for testing H0: μ = 45.5 against H1: μ ≠ 45.5, for n = 5 and for n = 35. For a given true mean wing length other than 45.5, the probability of rejecting the false null hypothesis decreases as N decreases. [Figure: power (1 - β) versus wing length (x 0.1 mm) for n = 5 and n = 35.]
Why power matters. Two pairs of samples with identical means and variances, but differing in N. In the first case (N = 200), power is large and p < .05, therefore reject H0. In the second case (N = 30), power is low and p > .05, therefore accept H0. [Figure: frequency distributions of size for N = 200 and N = 30.]
Power: conclusions. If sample sizes are small, the power of any test is usually low. So, unless one knows the power of the analysis, a decision to accept the null hypothesis is meaningless! Conversely, if power is very high, rejection of the null is very likely, even if deviations from null expectations are small (and perhaps biologically meaningless)!
Statistical hypothesis testing: problems and caveats. Problem 1: many H0s are very unlikely to be true a priori, so that their rejection is not very informative. [Figure: average yield for Control, Treatment 1 and Treatment 2.]
Statistical hypothesis testing: problems and caveats. Problem 2: the nominal type I error rate (e.g. α = 0.05) is entirely arbitrary, and may not bear any relationship to biological significance, and even less to decision-making. [Figure: probability distribution of t with a threshold for decision-making.]
Statistical hypothesis testing: problems and caveats. Problem 3: p is the probability of obtaining a test statistic at least as extreme as that observed if H0 is true, but often the actual (sampling) distribution of the test statistic does not match the (assumed) distribution under the null. [Figure: probability distributions of t, null versus sampled.]
Statistical hypothesis testing: problems and caveats. Problem 4: for a fixed effect size, p depends on sample size (n), so that one can almost always reject H0 if the sample is sufficiently large, even if the observed effect is trivial. [Figure: curves for larger and smaller effect sizes against sample size (n), with the type I error threshold at 0.05.]
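The dependence of p on n for a fixed effect can be made concrete. The sketch below assumes a two-group comparison analysed with a z-test (known common standard deviation) and an observed difference held fixed at a deliberately trivial value; all numbers are hypothetical.

```python
from statistics import NormalDist

def p_for_fixed_difference(diff, sigma, n_per_group):
    """Two-tailed p-value when the observed difference between two group
    means is `diff` and each group has n_per_group observations."""
    se = sigma * (2 / n_per_group) ** 0.5
    z_stat = diff / se
    return 2 * (1 - NormalDist().cdf(abs(z_stat)))

# A difference of 5% of a standard deviation, at increasing sample sizes.
p_by_n = {n: p_for_fixed_difference(0.05, 1.0, n) for n in (10, 100, 1000, 10000)}
for n, p in p_by_n.items():
    print(f"n = {n:>5} per group: p = {p:.4f}")
```

The same tiny difference is far from significant at n = 10 and clearly significant at n = 10000, which is why effect sizes should be reported alongside p.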
Statistical hypothesis testing: problems and caveats. Problem 5: since p depends on sample size (n), using a fixed nominal α (e.g. α = 0.05) as n increases is logically inconsistent: even for n = infinity and a true H0, α = 0.05! [Figure: nominal type I error (α) versus sample size (n), comparing a fixed α (e.g. 0.05) with an α that depends on n.]
Statistical hypothesis testing: solutions. Avoid testing trivial null hypotheses. Distinguish between biological (or other) significance and statistical significance. Always provide estimates of effect sizes and their precision, statistical significance (or lack thereof) notwithstanding. Consider using randomization and/or resampling methods to generate the actual distribution of test statistics.
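The last suggestion, generating the distribution of the test statistic by randomization, can be sketched as a permutation test on the difference between two group means. The data are made up for illustration, and the choice of statistic (absolute difference of means) and the number of permutations are assumptions, not a prescription from the slides.

```python
import random
from statistics import mean

def permutation_test(a, b, n_perm=5000, seed=0):
    """Two-tailed permutation p-value for a difference in group means.
    Group labels are shuffled to build the null distribution directly."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            as_extreme += 1
    # Count the observed arrangement itself so that p is never exactly zero.
    return (as_extreme + 1) / (n_perm + 1)

# Hypothetical measurements for two groups.
group_1 = [4.1, 5.0, 4.7, 5.3, 4.9, 5.6]
group_2 = [6.2, 5.9, 6.8, 6.1, 7.0, 6.5]
p_perm = permutation_test(group_1, group_2)
print(p_perm)
```

No distributional form is assumed for the test statistic here; the only assumption is that, under H0, group labels are exchangeable.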
The underlying principle of the t-test. If the match between observed and expected is poorer than would be expected on the basis of measurement precision, then we should reject the null hypothesis. [Figure: frequency distributions of fork length, observed versus expected, for a case where H0 is rejected and a case where H0 is accepted.]
Why correct for precision? Large differences between observed and expected may occur because (1) measurements are imprecise, or (2) the hypothesis is false, or (3) some combination of the two. Therefore, to conclude (2), we must first eliminate (1) and (3). [Figure: frequency distributions of fork length showing observations, the true distribution, and expected versus observed values, for a case where H0 is rejected and a case where H0 is accepted.]
Principle of the t-test. If the difference between the observed and expected results is much larger than the precision of the measurement, then something is wrong. If the difference between the observed results and those expected under the null hypothesis is much larger than the standard error, then the null hypothesis is probably incorrect.
Components of the t-test. Null hypothesis (H0). Observations. Test statistic (t). Assumptions.
Testing an extrinsic hypothesis. Test that the mean of a sample is equal to some theoretical value μ_T by calculating t = (Ȳ - μ_T) / s_Ȳ, where Ȳ is the sample mean and s_Ȳ = s/√n is its standard error. What is the probability of obtaining a t value as deviant as that observed, given that the null hypothesis is true? [Figure: observations, the true distribution, and expected versus observed values, for a case where H0 is rejected and a case where H0 is accepted.]
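The calculation of t for an extrinsic hypothesis takes only a few lines. In this sketch the sample values and the theoretical value are hypothetical, and the function name is an illustrative choice. The code stops at the statistic and its degrees of freedom; turning t into p requires the Student t distribution with n - 1 degrees of freedom, from tables or a statistics package.

```python
from statistics import mean, stdev

def one_sample_t(sample, mu_t):
    """t statistic and degrees of freedom for H0: population mean == mu_t."""
    n = len(sample)
    standard_error = stdev(sample) / n ** 0.5   # s / sqrt(n)
    t = (mean(sample) - mu_t) / standard_error
    return t, n - 1

# Hypothetical observations compared with a theoretical value of 10.0.
observations = [9.1, 10.4, 8.7, 9.8, 10.9, 9.5, 10.2, 9.0]
t_stat, df = one_sample_t(observations, mu_t=10.0)
print(t_stat, df)
```

The sign of t shows the direction of the deviation, and its magnitude expresses that deviation in units of the standard error.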
An example: growth rate in rainbow trout. Use the observed relationship between growth rate (μ) and pH to predict μ in a lake with pH = 4.5. The null hypothesis is H0: μ = μ_T. Compare the expected growth rate (μ = μ_T) with the average observed in a lake with pH = 4.5. [Figure: growth rate versus pH, with 10 mm/m marked; frequency distributions showing expected, observed and true values; H0 accepted.]
Inference: how to translate p into a conclusion? If p < 0.05, reject the null hypothesis... but keep p in mind! Report p, not just whether it is “significant” (or not). Remember, the p < 0.05 threshold is entirely arbitrary!
Assumptions. p is calculated assuming that the test statistic t is distributed as Student’s t (t_s), which has a well-known distribution. This assumption is strictly true only if the data are normally distributed.
The distribution of t versus Student’s t (t_s). Calculation of p assumes p(t) = p(t_s). But as data become increasingly non-normal, the deviation between the two increases, so calculated p values become increasingly inaccurate. [Figure: probability (p) distributions of t_s, of t for slightly non-normal data, and of t for highly non-normal data.]
What if data are not normal? The translation of t into p is incorrect. But the bias is often very small, especially with large samples, due to the Central Limit Theorem. So, use common sense... and worry only when p is close to the nominal level.
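The size of this bias can be explored by simulation. The sketch below is an illustration under stated assumptions: data are drawn from a strongly skewed exponential distribution with true mean 1, H0 (mean = 1) is therefore true, and a one-sample t-test is applied at the nominal 5% level. The critical values 2.776 (4 df) and 2.045 (29 df) are the standard two-tailed 5% values from t tables; the sample sizes and number of trials are arbitrary.

```python
import random
from statistics import mean, stdev

# Two-tailed 5% critical values of Student's t for df = n - 1.
T_CRIT = {5: 2.776, 30: 2.045}

def actual_type_i_rate(n, trials=4000, seed=2):
    """Rejection rate of a nominal 5% t-test when H0 is true
    but the data are exponential rather than normal."""
    rng = random.Random(seed)
    rejected = 0
    for _ in range(trials):
        sample = [rng.expovariate(1.0) for _ in range(n)]   # true mean is 1
        t = (mean(sample) - 1.0) / (stdev(sample) / n ** 0.5)
        if abs(t) > T_CRIT[n]:
            rejected += 1
    return rejected / trials

rate_n5 = actual_type_i_rate(5)
rate_n30 = actual_type_i_rate(30)
print(rate_n5, rate_n30)
```

In runs of this kind the actual error rate sits above the nominal 0.05 for the small sample and moves closer to it for the larger one, which is the Central Limit Theorem at work.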
What if data are not normal and p is close to α? Increase sample size. Transform data. Use another (non-parametric) test, one that does not assume normality.
Data transformations. Typically simple mathematical functions like log(X), sqrt(X), arcsin(X). The choice is based upon theory or trial and error. Problem 1: finding an appropriate transformation can be like finding a needle in a haystack. Problem 2: some data cannot be normalized!
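The trial-and-error approach can be illustrated by comparing the skewness of a variable before and after candidate transformations. The data below are simulated from a lognormal distribution, a case chosen (as an assumption) because the log transformation is known to normalize it; real data rarely behave this cleanly, and skewness is only one rough indicator of non-normality.

```python
import math
import random
from statistics import mean, pstdev

def skewness(values):
    """Sample skewness: close to 0 for symmetric data such as normal data."""
    m, s = mean(values), pstdev(values)
    return mean(((v - m) / s) ** 3 for v in values)

rng = random.Random(3)
raw = [rng.lognormvariate(0.0, 1.0) for _ in range(2000)]   # right-skewed data

skew_raw = skewness(raw)
skew_sqrt = skewness([math.sqrt(v) for v in raw])
skew_log = skewness([math.log(v) for v in raw])
print(skew_raw, skew_sqrt, skew_log)
```

Here the square root reduces the skew only partly, while the log removes almost all of it; with other data, a different transformation, or none at all, might work.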
Statistical analysis as model building. All statistical analyses begin with a mathematical model that supposedly “describes” the data, e.g., regression, ANOVA. “Model fitting” is then the process by which model parameters are estimated. [Figure: two panels, linear regression (Y versus X) and ANOVA (Y for Group 1, Group 2 and Group 3).]
Translating biological questions into statistical models. Question: Why does blackfly abundance vary spatially? Hypothesis: Food is the answer to everything. Prediction: Abundance is related to food availability. Model: Abundance = k + Food + Error. H0: Abundance = k + Error.
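The two competing models can be fitted and compared directly. In this sketch the data are hypothetical, the Food term is given an explicit slope coefficient `b` (an assumption, since the slide writes the model without one), and fitting is by ordinary least squares. The null model Abundance = k + Error is fitted by the mean alone.

```python
from statistics import mean

def fit_line(x, y):
    """Least-squares estimates of k and b in y = k + b * x + error."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return my - b * mx, b

def residual_ss(y, predicted):
    """Sum of squared differences between observed and predicted values."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))

# Hypothetical food availability and blackfly abundance at six sites.
food = [1, 2, 3, 4, 5, 6]
abundance = [12, 15, 21, 24, 30, 31]

k, b = fit_line(food, abundance)
rss_full = residual_ss(abundance, [k + b * f for f in food])      # k + Food + Error
rss_null = residual_ss(abundance, [mean(abundance)] * len(food))  # k + Error
print(k, b, rss_full, rss_null)
```

A large drop in residual sum of squares from the null model to the full model is what a formal test of H0 then evaluates against the residual variation.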