Probability & Significance Everything you always wanted to know about p-values* *but were afraid to ask Evidence Based Chiropractic April 10, 2003.

Causality Criteria (addendum to April 3 lecture)
An association between A & B does not indicate the presence or direction of causality.
– If a town with high unemployment has a high crime rate ("the ecological fallacy"):
  – Do the unemployed commit the crimes?
  – Would improved employment result in less crime?
Tests for causality:
– Is the association strong?
– Is it consistent from study to study?
– Did the postulated cause precede the postulated effect?
– Is there a dose-response gradient? (more cause → more effect)
– Does the association make biological sense?
– Is the association specific?
– Are there previously proven analogous causal associations?

Statistical Tests
– Employed in explanatory studies to assess the role of chance as an explanation of the pattern observed in the data.
– Most commonly assess how two groups compare on an outcome:
  – Is the pattern most probably not due to chance? The difference is statistically significant.
  – Is the pattern likely due to chance? The difference is not statistically significant.
– No matter how well the study is performed, either conclusion could be wrong.

p-values (p = probability)
– A statistical value that indicates the probability of seeing a pattern at least this extreme if chance alone were at work; it tells us how confident we can be in the conclusion.
– "This result was significant at p < 0.05":
  – Statistically speaking, and all other things being equal, we could expect a result like this to occur by chance no more than 5 times in every 100 trials.
– Example: test 100 coins by flipping each one 100 times.
  – One coin comes up heads 73 times.
  – We suspect this is not an ordinary fair coin, although it is possible for a fair coin to produce this result by chance.
  – We want to know the probability that a fair coin would produce 73/100 heads: how confident are we that this is not a fair coin?
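The coin example can be worked out exactly with the binomial distribution. A minimal sketch in Python (the function name `binom_tail` is ours, not from the lecture):

```python
from math import comb

def binom_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more heads in n flips."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Probability that a fair coin gives 73 or more heads in 100 flips
p_one_sided = binom_tail(100, 73)

# Two-sided p-value: a coin biased toward tails would be equally suspicious
p_two_sided = 2 * p_one_sided

print(f"one-sided: {p_one_sided:.2e}  two-sided: {p_two_sided:.2e}")
```

The probability comes out far below 0.05, so we would be very confident that this is not a fair coin.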

Erroneous conclusions: Type I & Type II (see handout)
Type I:
– Like a false positive: a difference is shown when in "truth" there is none.
– A chance of more than 5% is typically unacceptable in RCTs.
Type II:
– Like a false negative: no difference is shown when in "truth" there is one.
– A chance of 10-20% is usually considered acceptable.
– Consider the risk especially if the sample size is small and the difference "feels" clinically important.
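The Type I (false positive) rate can be seen directly by simulation: when the null hypothesis is actually true, a test run at α = 0.05 still rejects about 5% of the time. A hypothetical sketch (the test, sample sizes, and seed are our own illustration):

```python
import random

random.seed(0)  # reproducible runs

def z_test_rejects(sample, mu0=0.0, sigma=1.0, z_crit=1.96):
    """Two-sided z-test: reject H0 (population mean == mu0) at alpha = 0.05?"""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / n ** 0.5)
    return abs(z) > z_crit

# 2000 simulated studies drawn from a population where H0 is TRUE (mean really is 0)
trials = 2000
false_positives = sum(
    z_test_rejects([random.gauss(0.0, 1.0) for _ in range(25)])
    for _ in range(trials)
)
rate = false_positives / trials
print(f"observed Type I error rate: {rate:.3f}")  # hovers near alpha = 0.05
```

Every rejection counted here is a false positive, since no real difference exists in the simulated population.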

Determinants of 'power':
– Define what constitutes a "true" (clinically important) difference.
– Determine acceptable levels of Type I and Type II errors: an increase in one means a decrease in the other (a tradeoff).
– Calculate the necessary sample size, and recruit, allowing for losses.
This should be thoroughly described in any 'Methods' section!
– Example: Bove, JAMA, 280(18); 1998.

The lower the β, the higher the power; the higher the β, the lower the power.
Increased α (e.g. from .01 to .05 or .10):
– Increases the chance of saying there is a difference when there is not (Type I error), decreasing the rigor of the test.
– Decreases the chance of saying there is no difference when there is one (Type II error), increasing the power.
Decreased α (e.g. from .05 or .10 to .01):
– We are only willing to risk being wrong 1 in 100 times by saying there is an effect when there is not.
– Limits the chances of concluding there is an effect: lowers the power as well as the Type I error risk.
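The α/power tradeoff can be illustrated numerically for a two-sided one-sample z-test under the normal approximation. The function below is our own sketch, not part of the lecture; the effect size and sample size are arbitrary illustrative numbers:

```python
from statistics import NormalDist

def power_two_sided(effect, n, alpha, sigma=1.0):
    """Approximate power of a two-sided one-sample z-test to detect a mean shift."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)   # rejection cutoff for this alpha
    shift = effect / (sigma / n ** 0.5)  # how far the alternative sits, in SE units
    # Probability the test statistic lands outside +/- z_crit under the alternative
    return nd.cdf(-z_crit - shift) + (1 - nd.cdf(z_crit - shift))

# Same effect and sample size: raising alpha raises power (and the Type I risk)
for alpha in (0.01, 0.05, 0.10):
    print(f"alpha = {alpha}: power = {power_two_sided(0.5, 30, alpha):.3f}")
```

Note that with a zero true effect the "power" collapses to α itself: the test rejects exactly as often as the Type I error rate allows.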

The 'power' of an RCT
– The probability of correctly concluding that A is not equal to B: if there is a difference, the probability that you will statistically detect it.
– Power = 1 - p(failing to detect a true difference, a.k.a. a Type II error).
– The sample size needed to 'power' an RCT must be calculated a priori, and depends upon:
  – The expected or clinically important difference
  – The acceptable p-value (Type I error probability)
  – The acceptable power (1 - Type II error probability)

Sample size calculations
– When a small difference between groups is considered clinically important, a larger sample size is needed.
– Setting the significance level at .01 instead of .05 increases the rigor of the study (we are less willing to accept a Type I error); a larger sample size is needed.
– Increasing the odds of recognizing an actual difference (lowering the Type II error) increases the power of the study; a larger sample size is needed.
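All three effects above follow from the standard normal-approximation formula for comparing two means, n per arm = 2((z₁₋α/₂ + z₁₋β)σ/Δ)². A minimal sketch (the function and the example numbers are ours, not from the lecture):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Normal-approximation sample size per arm to detect a mean difference
    `delta` between two groups with outcome standard deviation `sigma`."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)  # two-sided Type I error cutoff
    z_beta = nd.inv_cdf(power)           # power = 1 - Type II error
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)

print(n_per_group(delta=10, sigma=20))              # baseline
print(n_per_group(delta=5, sigma=20))               # smaller important difference: ~4x larger n
print(n_per_group(delta=10, sigma=20, alpha=0.01))  # more rigor: larger n
print(n_per_group(delta=10, sigma=20, power=0.90))  # more power: larger n
```

Halving the clinically important difference roughly quadruples the required sample, which is why "small but important" differences demand large trials.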

Sample size is not happenstance!
To draw conclusions about the effectiveness of a treatment (i.e. the difference between two groups' outcomes), the RCT must have the statistical power to detect a real difference, because we are drawing conclusions about a population based upon a sample.
– If the study says A = B: caution, small numbers increase the chance of a Type II error.
– If the study says A ≠ B: caution, small numbers increase the chance of a Type I error.

An essential component of the 'Methods' section
If a published study does not disclose the details of how the required sample size was estimated, including:
– The expected or clinically important difference sought
– The acceptable probability of making a Type I error
– The desired power to detect a difference if there is one
– The statistical package or computer program used to calculate the needed sample size from the above
then the statistical conclusions can be interesting and informative, but not convincing!

Absence of evidence is not evidence of absence*
– RCTs are intended to statistically detect a difference if there is one. What if p > .05?
– A 'negative' study? Not really.
– Evidence that the treatments are equivalent? No.
– We can say only: "There is no evidence that the groups are different."
*Altman DG, Bland JM. BMJ 1995;311:485 (19 August).

Once again…
– Sample size affects the probability of detecting a difference between groups if there is one.
– Sample size affects the probability that a difference between samples reflects a real difference in the underlying population, not just a random occurrence.