Statistical vs Clinical Significance

Slides:



Advertisements
Similar presentations
If you are viewing this slideshow within a browser window, select File/Save as… from the toolbar and save the slideshow to your computer, then open it.
Advertisements

Introduction to Hypothesis Testing
Chapter 10 Introduction to Inference
If you are viewing this slideshow within a browser window, select File/Save as… from the toolbar and save the slideshow to your computer, then open it.
If you are viewing this slideshow within a browser window, select File/Save as… from the toolbar and save the slideshow to your computer, then open it.
A Spreadsheet for Analysis of Straightforward Controlled Trials
Client Assessment and Other New Uses of Reliability Will G Hopkins Physiology and Physical Education University of Otago, Dunedin NZ Reliability: the Essentials.
Statistical vs Clinical or Practical Significance
If you are viewing this slideshow within a browser window, select File/Save as… from the toolbar and save the slideshow to your computer, then open it.
Chapter 17/18 Hypothesis Testing
Inferential Statistics and t - tests
Hypothesis Tests Steps and Notation (1-Sample)
Power and sample size.
Inferential Statistics
Statistical Analysis and Data Interpretation What is significant for the athlete, the statistician and team doctor? important Will Hopkins
Issues About Statistical Inference Dr R.M. Pandey Additional Professor Department of Biostatistics All-India Institute of Medical Sciences New Delhi.
MAGNITUDE-BASED INFERENCES
Statistical Issues in Research Planning and Evaluation
Evaluating Hypotheses Chapter 9. Descriptive vs. Inferential Statistics n Descriptive l quantitative descriptions of characteristics.
Cal State Northridge  320 Ainsworth Sampling Distributions and Hypothesis Testing.
1/55 EF 507 QUANTITATIVE METHODS FOR ECONOMICS AND FINANCE FALL 2008 Chapter 10 Hypothesis Testing.
Business Statistics: A Decision-Making Approach, 6e © 2005 Prentice-Hall, Inc. Chap 8-1 Business Statistics: A Decision-Making Approach 6 th Edition Chapter.
Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 9-1 Chapter 9 Fundamentals of Hypothesis Testing: One-Sample Tests Basic Business Statistics.
Copyright © 2010 Pearson Education, Inc. Publishing as Prentice Hall Statistics for Business and Economics 7 th Edition Chapter 9 Hypothesis Testing: Single.
Ch. 9 Fundamental of Hypothesis Testing
BCOR 1020 Business Statistics
Data Analysis Statistics. Levels of Measurement Nominal – Categorical; no implied rankings among the categories. Also includes written observations and.
Statistics for Managers Using Microsoft® Excel 5th Edition
INFERENTIAL STATISTICS – Samples are only estimates of the population – Sample statistics will be slightly off from the true values of its population’s.
Hypothesis Testing. A Research Question Everybody knows men are better drivers than women. Hypothesis: A tentative explanation that accounts for a set.
Chapter 10 Hypothesis Testing
Confidence Intervals and Hypothesis Testing - II
Business Statistics, A First Course (4e) © 2006 Prentice-Hall, Inc. Chap 9-1 Chapter 9 Fundamentals of Hypothesis Testing: One-Sample Tests Business Statistics,
Fundamentals of Hypothesis Testing: One-Sample Tests
Tests of significance & hypothesis testing Dr. Omar Al Jadaan Assistant Professor – Computer Science & Mathematics.
1 Today Null and alternative hypotheses 1- and 2-tailed tests Regions of rejection Sampling distributions The Central Limit Theorem Standard errors z-tests.
Health and Disease in Populations 2001 Sources of variation (2) Jane Hutton (Paul Burton)
1 Statistical Inference Greg C Elvers. 2 Why Use Statistical Inference Whenever we collect data, we want our results to be true for the entire population.
Chapter 8 Introduction to Hypothesis Testing
Lecture 7 Introduction to Hypothesis Testing. Lecture Goals After completing this lecture, you should be able to: Formulate null and alternative hypotheses.
Individual values of X Frequency How many individuals   Distribution of a population.
Inferential Statistics 2 Maarten Buis January 11, 2006.
Psy B07 Chapter 4Slide 1 SAMPLING DISTRIBUTIONS AND HYPOTHESIS TESTING.
 If you are viewing this slideshow within a browser window, select File/Save as… from the toolbar and save the slideshow to your computer, then open it.
Chapter 20 Testing hypotheses about proportions
Statistical Hypotheses & Hypothesis Testing. Statistical Hypotheses There are two types of statistical hypotheses. Null Hypothesis The null hypothesis,
Hypothesis Testing State the hypotheses. Formulate an analysis plan. Analyze sample data. Interpret the results.
Introduction to sample size and power calculations Afshin Ostovar Bushehr University of Medical Sciences.
1 Chapter 9 Hypothesis Testing. 2 Chapter Outline  Developing Null and Alternative Hypothesis  Type I and Type II Errors  Population Mean: Known 
Statistics for Managers Using Microsoft Excel, 4e © 2004 Prentice-Hall, Inc. Chap 8-1 Chapter 8 Fundamentals of Hypothesis Testing: One-Sample Tests Statistics.
Chap 8-1 A Course In Business Statistics, 4th © 2006 Prentice-Hall, Inc. A Course In Business Statistics 4 th Edition Chapter 8 Introduction to Hypothesis.
Lecture 9 Chap 9-1 Chapter 2b Fundamentals of Hypothesis Testing: One-Sample Tests.
Statistical Inference for the Mean Objectives: (Chapter 9, DeCoursey) -To understand the terms: Null Hypothesis, Rejection Region, and Type I and II errors.
How confident are we in the estimation of mean/proportion we have calculated?
Hypothesis Testing An understanding of the method of hypothesis testing is essential for understanding how both the natural and social sciences advance.
Fall 2002Biostat Statistical Inference - Confidence Intervals General (1 -  ) Confidence Intervals: a random interval that will include a fixed.
Chap 8-1 Fundamentals of Hypothesis Testing: One-Sample Tests.
Stats Lunch: Day 3 The Basis of Hypothesis Testing w/ Parametric Statistics.
Basic Business Statistics, 11e © 2009 Prentice-Hall, Inc. Chap 9-1 Chapter 9 Fundamentals of Hypothesis Testing: One-Sample Tests Basic Business Statistics.
Business Statistics, A First Course (4e) © 2006 Prentice-Hall, Inc. Chap 9-1 Chapter 9 Fundamentals of Hypothesis Testing: One-Sample Tests Business Statistics,
Hypothesis Testing and Statistical Significance
Statistical Inference for the Mean Objectives: (Chapter 8&9, DeCoursey) -To understand the terms variance and standard error of a sample mean, Null Hypothesis,
Hypothesis Tests for 1-Proportion Presentation 9.
Copyright © 2013, 2009, and 2007, Pearson Education, Inc. 1 FINAL EXAMINATION STUDY MATERIAL III A ADDITIONAL READING MATERIAL – INTRO STATS 3 RD EDITION.
Statistical Analysis and Data Interpretation: What is Important for the Athlete and Statistician Will G Hopkins Institute of Sport and Recreation Research.
Chapter 9 Fundamentals of Hypothesis Testing: One-Sample Tests
Statistics for Managers Using Microsoft® Excel 5th Edition
If you are viewing this slideshow within a browser window, select File/Save as… from the toolbar and save the slideshow to your computer, then open it.
Week 11 Chapter 17. Testing Hypotheses about Proportions
The end of statistical significance
Presentation transcript:

Statistical vs Clinical Significance Will G Hopkins Auckland University of Technology Auckland, NZ Other titles: Statistical vs clinical, practical, or mechanistic significance. A more meaningful way to make inferences from a sample. Statistical significance is unethical; clinical significance isn’t. What are the chances your finding is beneficial or harmful? Publishing without hypotheses and statistical significance. Non-significant effect?  No problem! probability beneficial trivial smallest clinically harmful value harmful value of effect statistic

Statistical significance Confidence limits Summary Background Misinterpretation of data Making inferences Sample  population Statistical significance P values and null hypotheses Confidence limits Precision of estimation Clinical, practical, or mechanistic significance Probabilities of benefit and harm Smallest worthwhile effect How to use possible, likely, very likely, almost certain Examples

Background Most researchers and students misinterpret statistical significance and non-significance. Few people know the meaning of the P value that defines statistical significance. Reviewers and editors reject some papers with statistically non-significant effects that should be published. Use of confidence limits instead of a P value is only a partial solution to these problems. We’re trying to make inferences about a population from a sample. What's missing is some way to make inferences about the clinical or practical significance of an effect.

Making Inferences in Research We study a sample to get an observed value of a statistic representing an interesting effect, such as the relationship between physical activity and health or performance. But we want the true (= population) value of the statistic. The observed value and the variability in the sample allow us to make an inference about the true value. Use of the P value and statistical significance is one approach to making such inferences. Its use-by date was December 31, 1999. There are better ways to make inferences.

P Values and Statistical Significance Based on notion that we can disprove, but not prove, things. Therefore, we need something to disprove. Let's assume the true effect is zero: the null hypothesis. If the value of the observed effect is unlikely under this assumption, we reject (disprove) the null hypothesis. "Unlikely" is related to (but not equal to) a probability or P value. P < 0.05 is regarded as unlikely enough to reject the null hypothesis (i.e., to conclude the effect is not zero). We say the effect is statistically significant at the 0.05 or 5% level. Some folks also say "there is a real effect". P > 0.05 means not enough evidence to reject the null. We say the effect is statistically non-significant. Some folks accept the null and say "there is no effect".

Problems with this philosophy We can disprove things only in pure mathematics, not in real life. Failure to reject the null doesn't mean we have to accept the null. In any case, true effects in real life are never zero. Never. So, THE NULL HYPOTHESIS IS ALWAYS FALSE! Therefore, to assume that effects are zero until disproved is illogical, and sometimes impractical or even unethical. 0.05 is arbitrary. The answer? We need better ways to represent the uncertainties of real life: Better interpretation of the classical P value More emphasis on (im)precision of estimation, through use of confidence limits for the true value Better types of P value, representing probabilities of clinical or practical benefit and harm

Better Interpretation of the Classical P Value P/2 is the probability that the true value is negative. Example: P = 0.24 observed value probability value of effect statistic positive negative probability distribution of true value given the observed value (P value)/2 = 0.12 Easier to understand, and avoids statistical significance, but… Problem: having to halve the P value is awkward, although we could use one-tailed P values directly. Problem: focus is still on zero or null value of the effect.

Confidence (or Likely) Limits of the True Value These define a range within which the true value is likely to fall. "Likely" is usually a probability of 0.95 (defining 95% limits). probability distribution of true value given the observed value observed value probability value of effect statistic positive negative Area = 0.95 lower likely limit upper likely limit Problem: 0.95 is arbitrary and gives an impression of imprecision. 0.90 or less would be better. Problem: still have to assess the upper and lower limits and the observed value in relation to clinically important values.

Clinical Significance Statistical significance focuses on the null value of the effect. More important is clinical significance defined by the smallest clinically beneficial and harmful values of the effect. These values are usually equal and opposite in sign. Example: smallest clinically harmful value value of effect statistic positive negative smallest clinically beneficial value observed value We now combine these values with the observed value to make a statement about clinical significance.

value of effect statistic The smallest clinically beneficial and harmful values help define probabilities that the true effect could be clinically beneficial, trivial, or harmful (Pbeneficial, Ptrivial, Pharmful). Pbeneficial = 0.80 smallest clinically beneficial value These Ps make an effect easier to assess and (hopefully) to publish. Warning: these Ps are NOT the proportions of + ive, non- and - ive responders in the population. The calculations are easy. Put the observed value, smallest beneficial/harmful value, and P value into the confidence-limits spreadsheet at newstats.org. More challenging: choosing the smallest clinically important value, interpreting the probabilities, and publishing the work. probability value of effect statistic positive negative observed value Ptrivial = 0.15 Pharmful = 0.05 smallest clinically harmful value

Choosing the Smallest Clinically Important Value If you can't meet this challenge, quit the field. For performance in many sports, ~0.5% increases a top athlete's chances of winning. The default for most other populations is Cohen's set of smallest worthwhile effect sizes. This approach applies to the smallest clinically, practically and/or mechanistically important effects. Correlations: 0.10 Relative risks: ~1.2, depending on prevalence of the disease or other condition. Changes or differences in the mean: 0.20 between-subject standard deviations.

More on differences or changes in the mean… Why the between-subject standard deviation is important: intelligence females males Trivial effect (0.1x SD): intelligence females males Very large effect (3x SD): You must also use the between-subject standard deviation when analyzing the change in the mean in an experiment. Many meta-analysts wrongly use the SD of the change score.

Interpreting the Probabilities You should describe outcomes in plain language in your paper. Therefore you need to describe the probabilities that the effect is beneficial, trivial, and/or harmful. Suggested schema: The effect… beneficial/trivial/harmful is not…, is almost certainly not… Probability <0.01 Chances <1% Odds <1:99 is very unlikely to be… 0.01–0.05 1–5% 1:99–1:19 is unlikely to be…, is probably not… 0.05–0.25 5–25% 1:19–1:3 is possibly (not)…, may (not) be… 0.25–0.75 25–75% 1:3–3:1 is likely to be…, is probably… is very likely to be… is…, is almost certainly… 0.75–0.95 0.95–0.99 >0.99 75–95% 95–99% >99% 3:1–19:1 19:1–99:1 >99:1

Publishing the Outcome Example: TABLE 2. Differences in improvements in kayaking performance between the slow, explosive and control training groups, and chances that the differences are substantial (greater than the smallest worthwhile change of 0.5%) for a top kayaker. Mean improvement (%) and 90% confidence limits 3.1; ±1.6 Chances (% and qualitative) of substantial improvementa 99.6; almost certain Compared groups Slow - control Explosive - control Slow - explosive 2.0; ±1.2 98; very likely 1.1; ±1.4 74; possible aChances of substantial decline in performance all <5% (very unlikely).

Examples showing use of the spreadsheet and the clinical importance of p=0.20 More examples on supplementary slides at end of slideshow.

Summary When you report your research… Show the observed magnitude of the effect. Attend to precision of estimation by showing 90% confidence limits of the true value. Show the P value if you must, but do not test a null hypothesis and do not mention statistical significance. Attend to clinical, practical or mechanistic significance by stating the smallest worthwhile value then showing the probabilities that the true effect is beneficial, trivial, and/or harmful (or substantially positive, trivial, and/or negative). Make a qualitative statement about the clinical or practical significance of the effect, using unlikely, very likely, and so on.

This presentation is available from: See Sportscience 6, 2002

Supplementary slides: Original meaning of P value More examples of clinical significance

Traditional Interpretation of the P Value Example: P = 0.20 for an observed positive value of a statistic If the true value is zero, there is a probability of 0.20 of observing a more extreme positive or negative value. probability probability distribution of observed value if true value = 0 P value = 0.1 + 0.1 observed value value of effect statistic positive negative Problem: huh? (Hard to understand.) Problem: everything that's wrong with statistical significance.

More Examples of Clinical Significance Examples for a minimum worthwhile change of 2.0 units. Example 1–clinically beneficial, statistically non-significant (inappropriately rejected by editors): The observed effect of the treatment was 6.0 units (90% likely limits –1.8 to 14 units; P = 0.20). The chances that the true effect is practically beneficial/trivial/harmful are 80/15/5%. Example 2–clinically beneficial, statistically significant (no problem with publishing): The observed effect of the treatment was 3.3 units (90% likely limits 1.3 to 5.3 units; P = 0.007). The chances that the true effect is practically beneficial/trivial/harmful are 87/13/0%.

Example 3–clinically unclear, statistically non-significant (the worst kind of outcome, due to small sample or large error of measurement; usually rejected, but could/should be published to contribute to a future meta-analysis): The observed effect of the treatment was 2.7 units (90% likely limits –5.9 to 11 units; P = 0.60). The chances that the true effect is practically beneficial/trivial/harmful are 55/26/18%. Example 4–clinically unclear, statistically significant (good publishable study; true effect is on the borderline of beneficial): The observed effect of the treatment was 1.9 units (90% likely limits 0.4 to 3.4 units; P = 0.04). The chances that the true effect is practically beneficial/trivial/harmful are 46/54/0%.

Example 5–clinically trivial, statistically significant (publishable rare outcome that can arise from a large sample size; usually misinterpreted as a worthwhile effect): The observed effect of the treatment was 1.1 units (90% likely limits 0.4 to 1.8 units; P = 0.007). The chances that the true effect is practically beneficial/trivial/harmful are 1/99/0%. Example 6–clinically trivial, statistically non-significant (publishable, but sometimes not submitted or accepted): The observed effect of the treatment was 0.3 units (90% likely limits –1.7 to 2.3 units; P = 0.80). The chances that the true effect is practically beneficial/trivial/harmful are 8/89/3%.