Presentation transcript:

If you are viewing this slideshow within a browser window, select File/Save as… from the toolbar and save the slideshow to your computer, then open it directly in PowerPoint. When you open the file, use the full-screen view to see the information on each slide build sequentially. For full-screen view, click on the full-screen icon at the lower left of your screen. To go forwards, left-click or hit the space bar, PgDn or → key. To go backwards, hit the PgUp or ← key. To exit from full-screen view, hit the Esc (escape) key.

Making Inferences: Clinical vs Statistical Significance
Will G Hopkins (will@clear.net.nz), AUT University, Auckland, NZ
Hypothesis testing, P values, statistical significance; confidence limits; chances of benefit/harm.
[Title figure: probability distribution of the value of an effect statistic, with harmful, trivial and beneficial regions.]

This slideshow is an updated version of the slideshow in: Batterham AM, Hopkins WG (2005). Making meaningful inferences about magnitudes. Sportscience 9, 6-13. See link at sportsci.org. Other resources: Hopkins WG (2007). A spreadsheet for deriving a confidence interval, mechanistic inference and clinical inference from a p value. Sportscience 11, 16-20. See sportsci.org. Hopkins WG, Marshall SW, Batterham AM, Hanin J (2009). Progressive statistics for studies in sports medicine and exercise science. Medicine and Science in Sports and Exercise 41, 3-12. (Also available at sportsci.org: Sportscience 13, 55-70, 2009.)

Background A major aim of research is to make an inference about an effect in a population based on study of a sample. Null-hypothesis testing via the P value and statistical significance is the traditional but flawed approach to making an inference. Precision of estimation via confidence limits is an improvement. But what's missing is some way to make inferences about the clinical, practical or mechanistic significance of an effect. I will explain how to do it via confidence limits using values for the smallest beneficial and harmful effect. I will also explain how to do it by calculating and interpreting chances that an effect is beneficial, trivial, and harmful.

Hypothesis Testing, P Values and Statistical Significance Based on the notion that we can disprove, but not prove, things. Therefore, we need a thing to disprove. Let's try the null hypothesis: the population or true effect is zero. If the value of the observed effect is unlikely under this assumption, we reject (disprove) the null hypothesis. Unlikely is related to (but not equal to) the P value. P < 0.05 is regarded as unlikely enough to reject the null hypothesis (that is, to conclude the effect is not zero or null). We say the effect is statistically significant at the 0.05 or 5% level. Some folks also say there is a real effect. P > 0.05 means there is not enough evidence to reject the null. We say the effect is statistically non-significant. Some folks also accept the null and say there is no effect.
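The mechanics can be shown in a minimal Python sketch. The sample values and the use of a one-sample t test are my own hypothetical illustration, not part of the original slide; it simply reproduces the P < 0.05 decision described above.

```python
# Hypothetical illustration of the traditional test: a two-sided one-sample
# t test of the null hypothesis that the true (population) effect is zero.
from scipy import stats

change_scores = [1.2, 0.4, 2.1, -0.3, 1.8, 0.9, 1.5, 0.2]  # made-up sample of change scores

t_stat, p_value = stats.ttest_1samp(change_scores, popmean=0.0)

if p_value < 0.05:
    print(f"P = {p_value:.3f}: statistically significant at the 5% level (reject the null)")
else:
    print(f"P = {p_value:.3f}: statistically non-significant (fail to reject the null)")
```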

Problems with this philosophy… We can disprove things only in pure mathematics, not in real life. Failure to reject the null doesn't mean we have to accept the null. In any case, true effects are always "real", never zero. So… The null hypothesis is always false! Therefore, to assume that effects are zero until disproved is illogical and sometimes impractical or unethical. 0.05 is arbitrary. The P value is not a probability of anything in reality. Some useful effects aren't statistically significant. Some statistically significant effects aren't useful. Non-significant is usually misinterpreted as unpublishable. So good data don't get published. Solution: clinical significance or magnitude-based inferences via confidence limits and chances of benefit and harm. Statistical significance = null-based inferences.

Clinical Significance via Confidence Limits
Start with confidence limits, which define a range within which we infer the true, population or large-sample value is likely to fall. Likely is usually a probability of 0.95 (for 95% limits). Caution: the confidence interval is not a range of responses!
[Figure: probability distribution of the true value, given the observed value, on the negative-to-positive axis of the effect statistic; the area between the lower and upper likely limits is 0.95. The limits are also represented as a confidence interval: the likely range of the true value.]
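As a rough Python sketch of where such limits come from, assuming a t-distributed effect statistic: the function name and the back-calculation of the standard error from a published P value are my own, modelled on the spreadsheet approach mentioned later in the slideshow. The numbers come from the worked example later in the deck.

```python
# Sketch: derive confidence limits for an effect from its observed value, its
# two-sided P value against zero, and the degrees of freedom, assuming a
# t-distributed statistic.
from scipy.stats import t

def confidence_limits(observed, p_value, df, conf_level=0.90):
    se = abs(observed) / t.ppf(1 - p_value / 2, df)          # standard error implied by the P value
    half_width = t.ppf(1 - (1 - conf_level) / 2, df) * se    # half-width of the interval
    return observed - half_width, observed + half_width

lower, upper = confidence_limits(observed=1.5, p_value=0.03, df=18)
print(f"90% confidence limits: {lower:.1f} to {upper:.1f}")   # about 0.4 to 2.6
```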

For clinical significance, we interpret confidence limits in relation to the smallest clinically beneficial and harmful effects. These are usually equal and opposite in sign. Harm is the opposite of benefit, not side effects. They define regions of beneficial, trivial, and harmful values. The next slide is the key to clinical or practical significance. All you need is these two things: the confidence interval and a sense of what is important (e.g., beneficial and harmful).
[Figure: the effect-statistic axis divided by the smallest clinically harmful effect and the smallest clinically beneficial effect into harmful, trivial and beneficial regions.]

Put the confidence interval and these regions together to make a decision about clinically significant, clear or decisive effects. UNDERSTAND THIS SLIDE!
[Figure: confidence intervals at various positions across the harmful, trivial and beneficial regions of the effect statistic, each labelled with two calls (Clinically decisive? / Statistically significant?): Yes: use it. / Yes; Yes: use it. / Yes; Yes: use it. / No; Yes: depends. / No; Yes: don't use it. / Yes; Yes: don't use it. / No; Yes: don't use it. / No; Yes: don't use it. / Yes; Yes: don't use it. / Yes; No: need more research. / No. Annotation: why hypothesis testing is unethical and impractical!]

Making a crude call on magnitude: declare the observed magnitude of clinically clear effects.
[Figure: confidence intervals across the harmful, trivial and beneficial regions of the effect statistic, labelled Beneficial, Beneficial, Trivial, Trivial, Harmful and Unclear.]
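One possible coding of this crude call, with my own function and argument names, assuming positive values are beneficial:

```python
# Sketch of the crude call: an effect is unclear if its confidence interval
# spans both the smallest harmful and the smallest beneficial value; otherwise
# declare the observed magnitude. Positive values are assumed beneficial.
def crude_call(lower, upper, observed, smallest_important):
    if lower < -smallest_important and upper > smallest_important:
        return "Unclear"                      # interval reaches harmful AND beneficial values
    if observed > smallest_important:
        return "Beneficial"
    if observed < -smallest_important:
        return "Harmful"
    return "Trivial"

print(crude_call(lower=0.4, upper=2.6, observed=1.5, smallest_important=1.0))   # Beneficial
```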

Clinical Significance via Clinical Chances
We calculate probabilities that the true effect could be clinically beneficial, trivial, or harmful (Pbeneficial, Ptrivial, Pharmful). These Ps are NOT the proportions of positive, non- and negative responders in the population. Calculating the Ps is easy: put the observed value, smallest beneficial/harmful value, and P value into a spreadsheet at newstats.org. The Ps allow a more detailed call on magnitude, as follows…
[Figure: probability distribution of the true value around the observed value on the effect-statistic axis; the smallest beneficial and smallest harmful values divide it into Pbeneficial = 0.80, Ptrivial = 0.15 and Pharmful = 0.05.]
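A minimal Python sketch of the kind of calculation such a spreadsheet performs, assuming a t-distributed effect statistic and that positive values are beneficial; the function and variable names are mine. The example values come from the first worked example on the spreadsheet slide later in the deck.

```python
# Sketch: chances that the true effect is beneficial, trivial or harmful, given
# the observed value, its two-sided P value against zero, the degrees of freedom
# and the smallest clinically important value.
from scipy.stats import t

def clinical_chances(observed, p_value, df, smallest_important):
    se = abs(observed) / t.ppf(1 - p_value / 2, df)                 # standard error implied by P
    p_beneficial = 1 - t.cdf((smallest_important - observed) / se, df)
    p_harmful = t.cdf((-smallest_important - observed) / se, df)
    p_trivial = 1 - p_beneficial - p_harmful
    return p_beneficial, p_trivial, p_harmful

# Observed 1.5, P = 0.03, 18 degrees of freedom, smallest beneficial/harmful values of +1/-1:
pb, pt, ph = clinical_chances(observed=1.5, p_value=0.03, df=18, smallest_important=1.0)
print(f"Pbeneficial = {pb:.2f}, Ptrivial = {pt:.2f}, Pharmful = {ph:.4f}")   # about 0.78, 0.22, 0.0005
```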

Making a more detailed call on magnitudes using chances of benefit and harm.
[Figure: confidence intervals across the harmful, trivial and beneficial regions, each annotated with chances (%) that the effect is harmful/trivial/beneficial:
0.01/0.3/99.7: most likely beneficial
0.1/7/93: likely beneficial
2/33/65: possibly beneficial (but risk of harm >0.5% is unacceptable, unless chance of benefit is high enough); mechanistic: possibly +ive
1/59/40: clinical: unclear; mechanistic: possibly +ive
0.2/97/3: very likely trivial
2/94/4: likely trivial
28/70/2: possibly harmful
74/26/0.2: possibly harmful
97/3/0.01: very likely harmful
9/60/31: mechanistic and clinical: unclear]

Use this table for the plain-language version of chances. An effect should be almost certainly not harmful (<0.5%) and at least possibly beneficial (>25%) before you decide to use it. But you can tolerate higher chances of harm if chances of benefit are much higher: e.g., 3% harm and 76% benefit = clearly useful. I use an odds ratio of benefit/harm of >66 in such situations.

The effect… (beneficial/trivial/harmful)    Probability    Chances     Odds
is almost certainly not…                    <0.005         <0.5%       <1:199
is very unlikely to be…                     0.005–0.05     0.5–5%      1:199–1:19
is unlikely to be…, is probably not…        0.05–0.25      5–25%       1:19–1:3
is possibly (not)…, may (not) be…           0.25–0.75      25–75%      1:3–3:1
is likely to be…, is probably…              0.75–0.95      75–95%      3:1–19:1
is very likely to be…                       0.95–0.995     95–99.5%    19:1–199:1
is almost certainly…                        >0.995         >99.5%      >199:1
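The scale and the decision rule lend themselves to a short sketch. The thresholds are taken from the table and the text above; the function names are mine, and probabilities are assumed to lie strictly between 0 and 1.

```python
# Sketch: map a probability to the plain-language scale above, and apply the
# stated decision rule (harm < 0.5% and benefit > 25%, or odds ratio of
# benefit to harm > 66 when harm is higher).
def qualitative(p):
    if p < 0.005:  return "almost certainly not"
    if p < 0.05:   return "very unlikely"
    if p < 0.25:   return "unlikely / probably not"
    if p < 0.75:   return "possibly / may be"
    if p < 0.95:   return "likely / probably"
    if p < 0.995:  return "very likely"
    return "almost certain"

def use_effect(p_beneficial, p_harmful):
    if p_harmful < 0.005 and p_beneficial > 0.25:
        return True
    odds_ratio = (p_beneficial / (1 - p_beneficial)) / (p_harmful / (1 - p_harmful))
    return odds_ratio > 66

print(qualitative(0.78))                                   # likely / probably
print(use_effect(p_beneficial=0.76, p_harmful=0.03))       # True: 3% harm, 76% benefit
```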

Two examples of use of the spreadsheet for clinical chances:

Example 1: P value 0.03; value of statistic 1.5; confidence level 90%; 18 degrees of freedom; confidence limits 0.4 to 2.6; threshold values for clinical chances +1 and -1. Chances (% or odds) that the true value of the statistic is: clinically positive 78% (3:1), likely, probable; clinically trivial 22% (1:3), unlikely, probably not; clinically negative (1:2071), almost certainly not.

Example 2: P value 0.20; value of statistic 2.4; confidence level 90%; 18 degrees of freedom; confidence limits -0.7 to 5.5; threshold values +1 and -1. Chances that the true value is: clinically positive 78% (3:1), likely, probable; clinically trivial 19% (1:4), unlikely, probably not; clinically negative 3% (1:30), very unlikely.

Both these effects are clinically decisive, clear, or significant.

How to Publish Clinical Chances
Example of a table from a randomized controlled trial:

TABLE 1. Differences in improvements in kayaking sprint speed between slow, explosive and control training groups.

Compared groups        Mean improvement (%) and 90% confidence limits    Qualitative outcome (a)
Slow - control         3.1; ±1.6                                         Almost certainly beneficial
Explosive - control    2.6; ±1.2                                         Very likely beneficial
Slow - explosive       0.5; ±1.4                                         Unclear

(a) with reference to a smallest worthwhile change of 0.5%.

Problem: what's the smallest clinically important effect? If you can't answer this question, quit the field. This problem also applies to hypothesis testing, because it determines the sample size you need to test the null properly. Example: in many solo sports, a ~0.5% change in power output substantially changes a top athlete's chances of winning. The default for most other populations and effects is Cohen's set of smallest values. These values apply to clinical, practical and/or mechanistic importance… Standardized changes or differences in the mean: 0.20 of the between-subject standard deviation. In a controlled trial, it's the SD of all subjects in the pre-test, not the SD of the change scores. Correlations: 0.10. Injury or health risk, odds or hazard ratios: 1.1-1.3.
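A small sketch of these defaults; the pre-test SD value is hypothetical.

```python
# Default smallest worthwhile values listed above; the pre-test SD is made up.
smallest_worthwhile = {
    "standardized mean difference": 0.20,              # x between-subject SD (pre-test SD in a trial)
    "correlation": 0.10,
    "injury/health risk, odds or hazard ratio": 1.1,   # up to 1.3, depending on the effect
    "solo-sport power output change (%)": 0.5,
}

pretest_sd = 4.2   # hypothetical between-subject SD of the pre-test scores
delta = smallest_worthwhile["standardized mean difference"] * pretest_sd
print(f"Smallest worthwhile change: {delta:.2f} raw units")
```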

Problem: these new approaches are not yet mainstream. Confidence limits at least are coming in, so look for and interpret the importance of the lower and upper limits. You can use a spreadsheet to convert a published P value into a more meaningful magnitude-based inference. If the authors state “P<0.05” you can’t do it properly. If they state “P>0.05” or “NS”, you can’t do it at all. Problem: these approaches, and hypothesis testing, deal with uncertainty about an effect in a population. But effects like risk of injury or changes in physiology or performance can apply to individuals. Alas, more information and analyses are needed to make inferences about effects on individuals. Researchers almost always ignore this issue, because… they don’t know how to deal with it, and/or… they don’t have enough data to deal with it properly.

Summary
Show the observed magnitude of the effect. Attend to precision of estimation by showing 90% confidence limits of the true value. Do NOT show P values, do NOT test a hypothesis and do NOT mention statistical significance. Attend to clinical, practical or mechanistic significance by… stating, with justification, the smallest worthwhile effect, then… interpreting the confidence limits in relation to this effect, or… estimating probabilities that the true effect is beneficial, trivial, and/or harmful (or substantially positive, trivial, and/or negative). Make a qualitative statement about the clinical or practical significance of the effect, using unlikely, very likely, and so on. Remember, it applies to populations, not individuals.

For related articles and resources, see A New View of Statistics at newstats.org.
[Site map: Summarizing Data (Simple & Effect Statistics, Precision of Measurement); Generalizing to a Population (Confidence Limits, Statistical Models, Dimension Reduction, Sample-Size Estimation).]