Elements of a statistical test Statistical null hypotheses


Lecture 2: Some general comments on statistical tests and statistical inference
Topics: elements of a statistical test; statistical null hypotheses; the meaning of p; inference (how to translate p into a conclusion); statistical errors in hypothesis testing; power; one- versus two-tailed tests; problems with statistical hypothesis testing.
Bio 4118 Applied Biostatistics 2001

Elements of a statistical test
Null hypothesis (H0); observations (data); test statistic; assumptions.

Statistical null hypotheses
The null hypothesis is the “default” to which you compare your data. Usually, one sets up the analysis such that if you reject the null hypothesis, you have a pattern which is consistent with the biological prediction, so that in many cases the null hypothesis specifies a lack of pattern.

Biological questions versus statistical null hypotheses
Biological question: Do males and females differ in size? Null hypothesis: The average sizes of males and females are equal.
Biological question: Does age structure differ between two fish populations? Null hypothesis: The age (frequency) distribution is independent of population.

Testing biological hypotheses
Pose a biological question (e.g., do drugs A and B differ in their efficacy in treating breast cancer?). Generate a biological hypothesis (induction). Generate a biological prediction (deduction).
[Figure: flow from biological question to biological hypothesis to prediction, with a plot of survival for drugs A and B.]

Observations
Assume that the experimental data (observations) are collected in a proper manner. All observations are subject to a certain amount of measurement error!

Statistical analysis as model building
All statistical analyses begin with a mathematical model that supposedly “describes” the data, e.g., regression, ANOVA. “Model fitting” is then the process by which model parameters are estimated.
[Figure: two panels. Linear regression: Y against X. ANOVA: Y for Groups 1, 2 and 3, annotated with the grand mean μ, a group mean μ2, a group effect α2 and a residual ε42.]

Parameters, statistics and estimators
Population parameters characterize populations (which in general cannot be completely enumerated). Statistics (estimators) are estimates of population parameters obtained from a finite sample (e.g., the sample mean is an estimate of the population mean). The process by which one obtains an estimate of a population parameter from a finite sample is called an estimation procedure.

The meaning of p
Informal (and, strictly, incorrect): the probability that the null hypothesis is true.
Strictly correct: the probability of observing data at least as deviant (from the expected results) as the observed results if in fact the null hypothesis were true, assuming the data were properly collected and all statistical assumptions are met.

To reject or not reject?
The decision to reject or accept the null hypothesis is based on p. This requires some agreement (convention) as to what p value we will consider significant. This threshold value is arbitrary!

Test statistics
In standard statistical analysis, p is estimated by reference to the distribution of an appropriate test statistic. If we know the distribution of the test statistic under H0, we can calculate the probability of getting a test statistic value at least as large (or as small) as the calculated value if H0 were true, i.e., p.

An example
Two samples (1, 2) have mean values that differ by some amount d. What is the probability p of observing a difference at least this large under the H0 that the two means are in fact equal?
[Figure: frequency distributions of Sample 1 and Sample 2.]

An example (cont’d)
If H0 is true, the test statistic t follows a known expected distribution.
[Figure: frequency distributions of Sample 1 and Sample 2, and the probability density of t under H0, plotted from t = -3 to 3.]

An example (cont’d)
For the two samples, suppose t = 2.01. What is the probability of getting a value at least this large under the H0 that the two means are in fact equal? Since p is small, a t value this large would be unlikely if H0 were true. Therefore, reject H0.
[Figure: frequency distributions of Sample 1 and Sample 2, and the probability density of t from -3 to 3 with t = 2.01 marked.]
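The tail probability for t = 2.01 can be sketched numerically. The slide does not give the degrees of freedom, so the t distribution cannot be evaluated exactly here; this sketch assumes a large sample and uses the standard normal distribution as an approximation. The function name `p_value` is hypothetical.

```python
from statistics import NormalDist

def p_value(stat, tails=2):
    """Approximate p for a test statistic, assuming it is standard normal under H0.

    With tails=1 the statistic is assumed to lie in the predicted direction.
    """
    upper_tail = 1.0 - NormalDist().cdf(abs(stat))
    return tails * upper_tail

# Test statistic from the slide; the normal approximation is an assumption.
p_one = p_value(2.01, tails=1)
p_two = p_value(2.01, tails=2)
print(f"one-tailed p = {p_one:.4f}, two-tailed p = {p_two:.4f}")
```

With small samples the t distribution has heavier tails, so the true p would be somewhat larger than this approximation.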

Inference: How to translate p into a conclusion?
If p < 0.05, reject the null hypothesis, but keep p in mind! Report p, not just whether it is “significant” (or not). Remember, the p < 0.05 “convention” is entirely arbitrary!

“Statistical significance” and real-world decision-making: an example
If you were offered the same odds on each horse, on which would you bet? If you were a bookie, would you offer the same odds on each horse? And if you did, would you still be in business?
[Figure: the two horses, Clyde’s Fancy and Hypattia.]

Statistical errors in hypothesis testing
Two types: a true null hypothesis may be rejected, or a false null hypothesis may be accepted.
Type I error rate (α): the probability of rejecting a true null hypothesis.
Type II error rate (β): the probability of accepting a false null hypothesis.
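The Type I error rate can be checked by simulation: draw many samples from a population in which H0 is true and count how often the test rejects. This is a minimal sketch assuming a two-tailed z-test with known standard deviation 1; the sample size, number of trials, seed and the name `type_i_error_rate` are illustrative choices, not part of the lecture.

```python
import random
from statistics import NormalDist, fmean

def type_i_error_rate(n=20, trials=2000, alpha=0.05, seed=1):
    """Fraction of simulated samples that reject a TRUE H0 (mean = 0, sd = 1)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = fmean(sample) * n ** 0.5  # (mean - 0) / (1 / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

rate = type_i_error_rate()
print(f"observed rejection rate under a true H0: {rate:.3f}")
```

The observed rate should fall close to the nominal α, since every rejection here is by construction a Type I error.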

Errors in inference
Conclusion    Reality: H0 is true    Reality: H0 is false
Accept H0     no error               Type II error (β)
Reject H0     Type I error (α)       no error

Errors in inference: an example
Conclusion      Reality: no HIV (H0)    Reality: HIV (HA)
Seronegative    99%                     5% (β)
Seropositive    1% (α)                  95%

One- and two-tailed null hypotheses
For a 2-tailed H0, there are two rejection regions of size α/2. For a 1-tailed H0, there is one rejection region of size α.
[Figure: probability density curves of t from -3 to 3, one with rejection regions of α/2 in each tail and 1 - α between them, the others with a single rejection region of α in one tail.]
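The difference in rejection regions can be made concrete by computing critical values. The sketch below assumes a standard normal test statistic (a large-sample stand-in for t) and α = 0.05; the statistic value 1.80 and the name `critical_values` are hypothetical.

```python
from statistics import NormalDist

def critical_values(alpha=0.05):
    """Return (one-tailed, two-tailed) upper critical values for a standard normal statistic."""
    z = NormalDist()
    one_tailed = z.inv_cdf(1 - alpha)      # whole alpha in one tail
    two_tailed = z.inv_cdf(1 - alpha / 2)  # alpha/2 in each tail
    return one_tailed, two_tailed

one, two = critical_values()
stat = 1.80  # hypothetical observed statistic, in the predicted direction
reject_one = stat > one
reject_two = abs(stat) > two
print(f"one-tailed critical value {one:.3f}: reject = {reject_one}")
print(f"two-tailed critical value {two:.3f}: reject = {reject_two}")
```

A statistic between the two critical values is rejected by the 1-tailed test but not by the 2-tailed test, which is why the choice must be made before looking at the data.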

Example: 2-tailed H0
No difference in populations, H0: μ1 = μ2. Since H0 is 2-tailed, we would reject H0 if μ1 - μ2 > 0 or μ1 - μ2 < 0.
[Figure: frequency distributions of Sample 1 and Sample 2, and the probability density of the test statistic from -3 to 3.]

Example: 1-tailed H0
Prediction: the average size of individuals in population 1 is greater than in population 2. H0: μ1 - μ2 ≤ 0. Since H0 is 1-tailed, we would reject H0 only if μ1 - μ2 > 0.
[Figure: frequency distributions of Sample 1 and Sample 2, and the probability density of the test statistic from -3 to 3.]

One versus two-tailed hypotheses
2-tailed hypothesis: reject if any non-random pattern is detected. 1-tailed hypothesis: reject only if a specified directional non-random pattern is detected.
For the samples shown: H0: μ1 = μ2 (2-tailed) is rejected, while the 1-tailed H0 (a directional inequality between μ1 and μ2) is accepted, because the observed difference lies in the direction that the null allows.
[Figure: frequency distributions of Sample 1 and Sample 2.]

Important note!
For a given “directionality”, a 1-tailed test is more powerful than a 2-tailed test. Therefore, always specify the nature of H0 before your analysis!
[Figure: upper tail of the probability distribution, comparing a rejection region of size α (1-tailed) with one of size α/2 (2-tailed).]
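The power advantage of a 1-tailed test can be illustrated with a short calculation. This sketch assumes a standard normal test statistic whose mean under the alternative is shifted by a hypothetical `shift` of 2 standard errors in the predicted direction; the function names are illustrative.

```python
from statistics import NormalDist

Z = NormalDist()

def power_one_tailed(shift, alpha=0.05):
    """Power when the whole rejection region (size alpha) is in the upper tail."""
    return 1 - Z.cdf(Z.inv_cdf(1 - alpha) - shift)

def power_two_tailed(shift, alpha=0.05):
    """Power with rejection regions of size alpha/2 in each tail."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    return (1 - Z.cdf(z_crit - shift)) + Z.cdf(-z_crit - shift)

shift = 2.0  # hypothetical true effect, in standard-error units
p1 = power_one_tailed(shift)
p2 = power_two_tailed(shift)
print(f"1-tailed power = {p1:.3f}, 2-tailed power = {p2:.3f}")
```

The gain exists only when the effect lies in the predicted direction; if it lies in the other direction, the 1-tailed test has almost no chance of rejecting.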

Parameters of statistical inference
Type I error rate (α); power (1 - Type II error rate = 1 - β); sample size (N); effect size (d).
Each of the above is a function of the other three. Hence, if three are known, so is the fourth.

Power
Power is the probability of rejecting the null hypothesis when it is false and a specified alternative hypothesis is true, i.e., 1 - β. Power can only be calculated when a specific alternative hypothesis is specified. Therefore, power depends on the alternative hypothesis. Powerful tests can detect small differences, weak tests only large differences.

Calculating power: an example
[Figure: expected distributions of means of samples of 5 housefly wing lengths from normal populations with the mean μ shown above each curve and standard error of the mean σȲ = 1.74, plotted from 35 to 55. The centre curve represents the null hypothesis, H0: μ = 45.5; the curves at the sides represent the alternative hypotheses, H1: μ = 37 and H1: μ = 54. Vertical lines delimit the 5% rejection regions for the null hypothesis.]

Power: cont’d
H0: μ = μ0 = 45.5; H1: μ = μ1.
μ1 = 54: β = 0.0018
μ1 = 53: β = 0.0096
μ1 = 50: β = 0.2676
μ1 = 48.5: β = 0.5948
[Figure: increase in type II error, β, as the alternative hypothesis, H1, approaches the null hypothesis, H0, that is, as μ1 approaches μ0. Shading represents β. Vertical lines mark off 5% critical regions (2.5% in each tail) for the null hypothesis. To simplify the graph, the alternative distributions are shown for one tail only.]
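The β values for the housefly example can be approximated directly from the numbers on the slides. This sketch assumes that sample means are normally distributed with standard error 1.74 and that the test is two-tailed at α = 0.05; the name `type_ii_error` is hypothetical.

```python
from statistics import NormalDist

MU0 = 45.5    # null hypothesis mean (from the slide)
SE = 1.74     # standard error of the mean for samples of 5 (from the slide)
ALPHA = 0.05  # two-tailed, 2.5% in each tail

def type_ii_error(mu1, mu0=MU0, se=SE, alpha=ALPHA):
    """Probability that the sample mean falls in the acceptance region when mu = mu1."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    lower, upper = mu0 - z_crit * se, mu0 + z_crit * se
    alternative = NormalDist(mu1, se)
    return alternative.cdf(upper) - alternative.cdf(lower)

betas = {mu1: type_ii_error(mu1) for mu1 in (54, 53, 50, 48.5)}
for mu1, beta in betas.items():
    print(f"mu1 = {mu1}: beta = {beta:.4f}, power = {1 - beta:.4f}")
```

The results should land close to the β values listed above; small differences are expected from rounding of the standard error and the critical value.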

Effect size
Every null hypothesis in any statistical test implies a value for some population parameter. E.g., if H0 states that two population means are equal, the absolute value of the difference d between the two populations is zero: d = |μ1 - μ2| = 0.
[Figure: frequency distributions of X for Sample 1 and Sample 2.]

Effect size (cont’d)
More generally, since H0 specifies a lack of some phenomenon, d quantifies the degree to which the phenomenon is present. So if H0 is false, it is false to some specific degree, quantified by d, the effect size.
[Figure: frequency distributions of X for Sample 1 and Sample 2.]

Types of power analysis I: power as a function of α, d and N
Often done after a statistical test, where N (sample size) and effect size (d) are known and the null hypothesis has been accepted. Then, for a specified α, we can calculate 1 - β (the power of the test). If 1 - β is low, then the Type II error rate is large, so there is a good chance we have accepted a false H0.
[Figure: frequency distributions of X for Sample 1 and Sample 2.]

Types of power analysis II: N as a function of α, d and power
A certain effect size (d) is anticipated (perhaps based on a preliminary sample), with a desired α and 1 - β. Given α, β and d, we can calculate the minimum sample size Nmin required to achieve the desired specifications. This exercise can be very useful in planning experiments.
[Figure: frequency distributions of X for Pre-sample 1 and Pre-sample 2.]
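A minimum sample size can be sketched with the usual normal-approximation formula, N = ((z(1-α/2) + z(1-β)) σ / d)². This assumes a one-sample, two-tailed z-test with known population standard deviation σ, which the slides do not specify; the function name and the example values are hypothetical.

```python
import math
from statistics import NormalDist

def min_sample_size(d, sigma, alpha=0.05, power=0.80):
    """Smallest N giving at least the requested power (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-tailed critical value
    z_beta = z.inv_cdf(power)           # quantile matching 1 - beta
    return math.ceil(((z_alpha + z_beta) * sigma / d) ** 2)

# Hypothetical planning values: effect of half a standard deviation.
n_min = min_sample_size(d=0.5, sigma=1.0)
print(f"N_min for d = 0.5, sigma = 1, alpha = 0.05, power = 0.80: {n_min}")
```

Halving the anticipated effect size roughly quadruples the required N, which is why a realistic estimate of d from a preliminary sample matters.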

Types of power analysis III: d as a function of α, N and power
Given a desired α, 1 - β and N, what is the minimal detectable effect size dmin? If dmin is large, then only large deviations from H0 will be detected (i.e., will result in rejection of H0). Thus, we should be VERY VERY careful NOT to infer that some phenomenon does not exist if we accept H0.
[Figure: frequency distributions of X for Sample 1 and Sample 2.]
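The minimal detectable effect can be sketched by solving the same normal-approximation relationship for d, giving dmin = (z(1-α/2) + z(1-β)) σ / √N. As before, this assumes a one-sample, two-tailed z-test with known σ; the function name and the values of N are illustrative.

```python
import math
from statistics import NormalDist

def min_detectable_effect(n, sigma, alpha=0.05, power=0.80):
    """Smallest effect detectable with the requested power (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return (z_alpha + z_beta) * sigma / math.sqrt(n)

# Effect sizes are in units of sigma because sigma is set to 1 here.
d_small_n = min_detectable_effect(n=5, sigma=1.0)
d_large_n = min_detectable_effect(n=35, sigma=1.0)
print(f"d_min with N = 5:  {d_small_n:.3f}")
print(f"d_min with N = 35: {d_large_n:.3f}")
```

With N = 5 only effects larger than about one standard deviation are reliably detected, so accepting H0 says little about smaller effects.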

Power: dependence on sample size
[Figure: power curves for testing H0: μ = 45.5 against H1: μ ≠ 45.5, for n = 5 and for n = 35. Horizontal axis: wing length (x 0.1 mm), from 35 to 55. Vertical axis: power (1 - β), from α to 1.0.]
For a given true mean wing length, the probability of rejecting a false null hypothesis decreases as N decreases.
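The two power curves can be approximated numerically. This sketch assumes normally distributed sample means and derives the population standard deviation from the standard error of 1.74 quoted earlier for samples of 5; that derivation, the chosen alternative means and the function name `power` are assumptions made for illustration.

```python
import math
from statistics import NormalDist

MU0 = 45.5
SIGMA = 1.74 * math.sqrt(5)  # assumed population SD implied by SE = 1.74 at n = 5

def power(mu1, n, mu0=MU0, sigma=SIGMA, alpha=0.05):
    """Two-tailed power of the test of H0: mu = mu0 when the true mean is mu1."""
    se = sigma / math.sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    alternative = NormalDist(mu1, se)
    beta = alternative.cdf(mu0 + z_crit * se) - alternative.cdf(mu0 - z_crit * se)
    return 1 - beta

for mu1 in (46, 48, 50):
    print(f"mu1 = {mu1}: power(n=5) = {power(mu1, 5):.3f}, power(n=35) = {power(mu1, 35):.3f}")
```

At μ1 = μ0 both curves bottom out at α, and for any other true mean the n = 35 curve lies above the n = 5 curve.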

Why power matters
Two pairs of samples have identical means and variances but differ in N. In the first case (N = 200), power is large and p < .05, therefore reject H0. In the second case (N = 30), power is low and p > .05, therefore accept H0.
[Figure: frequency distributions of size for each case, with means μ1 and μ2.]

Power: conclusions
If sample sizes are small, the power of any test is usually low. So, unless one knows the power of the analysis, a decision to accept the null hypothesis is meaningless! Conversely, if power is very high, rejection of the null is very likely, even if deviations from null expectations are small (and perhaps biologically meaningless)!

Statistical hypothesis testing: problems and caveats
Problem 1: many H0s are very unlikely to be true a priori, so that their rejection is not very informative.
[Figure: average yield by treatment (Control, Treatment 1, Treatment 2).]

Statistical hypothesis testing: problems and caveats
Problem 2: the nominal type I error rate (e.g. α = 0.05) is entirely arbitrary, and may not bear any relationship to biological significance, and even less to decision-making.
[Figure: probability density of t from -3 to 3, with a threshold for decision-making marked.]

Statistical hypothesis testing: problems and caveats
Problem 3: p is the probability of obtaining a test statistic at least as extreme as that observed if H0 is true, but often the actual (sampling) distribution of the test statistic does not match the (assumed) distribution under the null.
[Figure: sampled and null probability distributions of t, from -3 to 3.]

Statistical hypothesis testing: problems and caveats
Problem 4: for fixed effect size, p depends on sample size (n), so that one can almost always reject H0 if the sample is sufficiently large, even if the observed effect is trivial.
[Figure: p against sample size (n) for a larger and a smaller effect size, with the 0.05 type I error threshold marked.]
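The dependence of p on n for a fixed effect can be shown in a few lines. This sketch assumes a one-sample, two-tailed z-test with known standard deviation 1 and a hypothetical observed effect of 0.1 standard deviations; the function name is illustrative.

```python
import math
from statistics import NormalDist

def two_tailed_p(effect, n, sigma=1.0):
    """Two-tailed p for an observed mean difference `effect` from the null value."""
    z = effect / (sigma / math.sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Same trivial observed effect (0.1 SD) at increasing sample sizes.
p_by_n = {n: two_tailed_p(0.1, n) for n in (10, 100, 1000, 2000)}
for n, p in p_by_n.items():
    print(f"n = {n:5d}: p = {p:.4f}")
```

The effect is identical in every row; only n changes, yet p moves from clearly non-significant to far below 0.05.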

Statistical hypothesis testing: problems and caveats
Problem 5: since p depends on sample size (n), using a fixed nominal α (e.g. α = 0.05) as n increases is logically inconsistent: even for n = infinity and true H0, α = 0.05!
[Figure: nominal type I error (α) against sample size (n), comparing a fixed α (e.g. 0.05) with an α that depends on n.]

Statistical hypothesis testing: solutions
Avoid testing trivial null hypotheses. Distinguish between biological (or other) significance and statistical significance. Always provide estimates of effect sizes and their precision, statistical significance (or lack thereof) notwithstanding. Consider using randomization and/or resampling methods to generate the actual distribution of test statistics.
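A randomization test of the kind suggested above can be sketched as follows. The test statistic (absolute difference in means), the number of permutations, the seed and the two small data sets are all hypothetical choices made for illustration; they are not taken from the lecture.

```python
import random
from statistics import fmean

def permutation_test(sample1, sample2, n_perm=5000, seed=0):
    """Two-tailed randomization test on the difference in means.

    Group labels are shuffled to build the distribution of the statistic under H0.
    """
    rng = random.Random(seed)
    observed = abs(fmean(sample1) - fmean(sample2))
    pooled = list(sample1) + list(sample2)
    n1 = len(sample1)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(fmean(pooled[:n1]) - fmean(pooled[n1:]))
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            extreme += 1
    # Add one to numerator and denominator so p is never exactly zero.
    return (extreme + 1) / (n_perm + 1)

# Hypothetical measurements for two groups.
group_a = [10.1, 9.8, 10.5, 10.2, 9.9, 10.4]
group_b = [11.0, 11.3, 10.9, 11.4, 11.1, 11.2]
p_perm = permutation_test(group_a, group_b)
print(f"permutation p = {p_perm:.4f}")
```

Because the reference distribution comes from the data themselves, the test does not rely on the statistic following an assumed theoretical distribution.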