Biostatistics in Practice, Session 3. Youngju Pak, Ph.D. UCLA Clinical and Translational Science Institute, LA BioMed/Harbor-UCLA Medical Center, School of Medicine.

Table of Contents
 Analogy of hypothesis testing
 How to compute a P-value and interpret it
 Understanding the sampling distribution and confidence intervals (CI)
 How to interpret a CI
 The relationship between a P-value and a CI

The procedure of statistical inference: a sample is drawn from the population by some sampling mechanism (random sample or convenience sample); sample estimates of the population parameters (descriptive statistics) are computed; and CIs or P-values are then used to make inferences about the population parameters.

Analogy for hypothesis testing. Example: a bet between two friends.
Suppose you and a friend are playing a “fun” gambling game. Your friend has a coin which you flip: if “tails”, your friend pays you $1; if “heads”, you pay your friend $1. After 10 plays, you got 9 heads. Do you trust your friend? Is this a fair coin? What is your argument?

Statistical Hypothesis Testing
 H0: fair coin (null) vs. Ha: unfair coin (alternative)
 Assume the coin is “fair” (assume H0 is true).
 You and your friend have to put a threshold value on the definition of “being RARE”. That is, if Prob(# of H = 9 or more | 10 trials) is less than a certain value, say α, then we consider 9 heads out of 10 trials something that rarely happens when the coin is fair, and thus very unlikely if the coin were fair.
 The rule: if Prob(# of H = 9 or more | 10 trials) < 0.05 (α), where α is the Type I error rate (= the level of significance), then your friend would agree to conclude that it was not a fair coin and thus reject H0 in favor of Ha.

Statistical Hypothesis Testing, continued
 Collect the data and compute the evidence under the assumption that H0 (fair coin) is true: P(# of H = 9 or more | 10 trials) ≈ 0.011 (1.1%).
 Make a decision: P(# of H = 9 or more | 10 trials) ≈ 1.1% < 5%.
 Thus, the result is VERY unlikely to happen if it was a fair coin.
 We found significant evidence to disprove H0 in favor of Ha.
 Therefore, conclude that it was an UNFAIR coin (and thus the bet is invalid).
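For concreteness, this tail probability can be computed directly from the binomial distribution. The short Python sketch below is not part of the original slides; it simply reproduces the ≈ 1.1% figure under the fair-coin (H0) assumption.

```python
from math import comb

# Tail probability for the coin example: P(9 or more heads in 10 flips of a fair coin).
n, p = 10, 0.5
p_value = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(9, n + 1))
print(round(p_value, 4))  # 0.0107, i.e., about 1.1%
```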

How to interpret a P-value of 1.1%, in general?
 A P-value is predicated on the assumption that H0 is true.
 A P-value is NOT the probability of the alternative being correct.
 A P-value should be used as evidence to DISPROVE H0, not to prove Ha.

How to interpret P-values: Example. Acute secondary adrenal insufficiency (AI) after traumatic brain injury (TBI): a prospective study.
Objective: to determine the prevalence, clinical characteristics, and effect of AI in TBI patients.
Procedure: 80 TBI and 41 non-TBI patients were followed during hospitalization for up to 9 days; blood samples were taken every 8 hours and vital signs recorded every hour. A subject is classified as AI if 2 successive serum cortisol values are low.

Goal: Do the groups differ by more than is expected by chance?
First, we need to:
 Specify the experimental units (persons? blood draws?).
 Specify a single outcome for each unit (e.g., yes/no (binary) or continuous).
 Examine the raw data, e.g., with a histogram, to check the test assumptions.
 Specify the group summary measure to be used (e.g., % or mean or median over units) → descriptive statistics.
 Choose a particular statistical test for that outcome and make the inference with inferential statistics (CI, P-value).

Outcome Type → Statistical Test (Cohan (2005) Crit Care Med; 33:)
 Means → t test
 Medians → Wilcoxon test
 %s → chi-square test

t-Test for Minimal Mean Arterial Pressure (MAP): Step 1
1. Calculate a standardized quantity for the particular test, a “test statistic”.
AI group: N = 42, mean = 56.2, SD = 10.78, SE(mean) = 10.78/√42 = 1.66.
Non-AI group: N = 38, mean = 63.4, SD = 8.71, SE(mean) = 8.71/√38 = 1.41.
Difference in group means = 63.4 − 56.2 = 7.2 (“signal”).
SE(Diff) ≈ sqrt(SE1² + SE2²) = sqrt(1.66² + 1.41²) ≈ 2.2 (“noise” due to random sampling).
→ Test statistic = t = (63.4 − 56.2)/2.2 ≈ 3.28, a signal-to-noise ratio.
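To make this signal-to-noise calculation reproducible, here is a minimal Python sketch. Note that the group means of 56.2 and 63.4 are reconstructed from the AI mean reported later in the slides and the stated difference of 7.2, so treat them as illustrative.

```python
from math import sqrt

# Two-sample "signal to noise" ratio from summary statistics, following the slide.
mean_ai, sd_ai, n_ai = 56.2, 10.78, 42      # AI group (mean reconstructed from the slides)
mean_non, sd_non, n_non = 63.4, 8.71, 38    # non-AI group (mean = 56.2 + 7.2, reconstructed)

se_ai = sd_ai / sqrt(n_ai)                  # about 1.66
se_non = sd_non / sqrt(n_non)               # about 1.41
se_diff = sqrt(se_ai**2 + se_non**2)        # about 2.2 ("noise")
t_stat = (mean_non - mean_ai) / se_diff     # about 3.3; the slide reports 3.28 after rounding
print(round(se_diff, 2), round(t_stat, 2))
```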

t-Test for Minimal MAP: Step 2
2. Compare the test statistic to what it is expected to be if the (populations represented by the) groups do not differ (H0).
Often, t approximately follows a normal bell curve, so about 95% of its values are expected between −2 and +2 (for example, the probability of falling between −2 and −1 is the area under the curve there, about 13.6%). The observed value is 3.28. Does a t statistic of 3.28 seem “RARE” to you? Why?

t-Test for Minimal MAP: P-Value
Under H0, about 95% of the t distribution is expected to fall between −2 and +2; the observed value is 3.28.
One-sided P-value = Prob(T ≥ 3.28) = 0.0007 (the area in the tail beyond 3.28).
In practice, a two-sided P-value is usually used: two-sided P-value = 2 × one-sided P-value = 2 × 0.0007 = 0.0014 < 0.05.
 Conclude: the groups differ, since a test statistic ≥ 3.28 has a < 5% chance of occurring if there is no difference in the entire population.
 Smaller P-values ↔ more evidence of group differences.
In short: declare the groups to differ if the test statistic would be RARE when H0 is true. (How rare? See below.)
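To show how the P-value follows from the t statistic, here is a minimal sketch assuming a Welch-type two-sample t test; the slides do not state which test variant or degrees of freedom were used, so the exact numbers may differ slightly from those reported.

```python
from math import sqrt
from scipy import stats

# Convert the test statistic into a P-value (sketch; Welch-type assumptions).
se_ai, se_non = 10.78 / sqrt(42), 8.71 / sqrt(38)
t_stat = 7.2 / sqrt(se_ai**2 + se_non**2)      # about 3.3 (slide: 3.28)

# Welch-Satterthwaite degrees of freedom, roughly 77 here
df = (se_ai**2 + se_non**2) ** 2 / (se_ai**4 / (42 - 1) + se_non**4 / (38 - 1))

p_one_sided = stats.t.sf(t_stat, df)           # about 0.0007
p_two_sided = 2 * p_one_sided                  # about 0.0014, i.e. p < 0.05
print(round(df), round(p_one_sided, 4), round(p_two_sided, 4))
```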

One-sided or two-sided P-values?
These tests come in one-sided and two-sided versions:
A two-sided P-value assumes that differences (between groups, or pre-to-post) are possible in both directions, e.g., an increase or a decrease.
A one-sided P-value assumes that these differences can only be an increase, or only a decrease, or that one group can only have higher (or only lower) responses than the other group. This situation is very rare, and a one-sided test is generally not acceptable.

Tests on Percentages
Is 26.3% vs. 61.9% statistically significant (p < 0.05), i.e., is the difference so large that it has a < 5% chance of occurring by chance if the groups do not really differ?
Solution: the same theme as for means. Find a test statistic and compare it to its expected values if the groups do not differ. See the next slide.

Tests on Percentages
(A t-test cannot be used to compare lab data with multiple blood draws per subject.)
Here, the signal in the test statistic is a squared quantity and is expected to be about 1 if the groups do not differ; 95% of the chi-square distribution falls below 5.99. The observed test statistic is 10.2 >> 5.99, so p < 0.05; in fact, p = 0.002.

Tests on Percentages: Chi-Square
The chi-square test statistic (10.2 in this example) is found by first calculating the expected number of AI patients with MAP < 60, and the same for non-AI patients, if AI and non-AI patients really do not differ on this. Then chi-square is the sum of standardized squared deviations, Σ (Observed − Expected)² / Expected. This should be close to 1, as in the graph on the previous slide, if the groups do not differ. The value 10.2 seems too big (extreme) to have happened by chance (probability = 0.002), i.e., if there is no difference among “all” TBI subjects (H0).
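The calculation described above can be sketched in a few lines of Python. The 2×2 cell counts are reconstructed from the reported percentages and group sizes (61.9% of 42 ≈ 26; 26.3% of 38 ≈ 10), so they are approximate rather than taken directly from the paper.

```python
# Chi-square statistic for the 2x2 comparison, following the formula above.
observed = {
    ("AI", "MAP<60"): 26, ("AI", "MAP>=60"): 16,
    ("nonAI", "MAP<60"): 10, ("nonAI", "MAP>=60"): 28,
}
row_totals = {"AI": 42, "nonAI": 38}
col_totals = {"MAP<60": 36, "MAP>=60": 44}
n = 80

# Expected count in each cell = row total * column total / overall total.
chi_square = sum(
    (obs - row_totals[r] * col_totals[c] / n) ** 2 / (row_totals[r] * col_totals[c] / n)
    for (r, c), obs in observed.items()
)
print(round(chi_square, 1))  # about 10.2, matching the value quoted on the slide
```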

How rare is being “RARE”?
Convention: “too deviant” is about |t| > 2. Why not choose, say, |t| > 3, so that our chances of wrongly declaring a difference are even smaller, < 1% (in fact < 0.5%)?
Answer: then the chances of missing a real difference are increased, the converse wrong conclusion. This is analogous to setting the threshold for a diagnostic test of disease.

A statistically significant result:
 is not necessarily an important or even an interesting result
 may not be scientifically interesting or clinically significant
 With large sample sizes, very small differences may turn out to be statistically significant. In such a case, the practical implications of any findings must be judged on other than statistical grounds.
 Statistical significance does not imply practical significance.

How to interpret insignificant P-values
 Possible explanations:
1. There is no difference (H0 is true).
2. There is a real difference (Ha is true), but we fail to detect it because of a small sample size (Type II error).
 There is no way to determine whether a non-significant difference is the result of a small sample size or of the null hypothesis being correct.
 Thus, insignificant P-values should almost always be regarded as INCONCLUSIVE rather than as an indication of no effect (we fail to reject the null).
 An insignificant P-value does NOT prove H0.

Back to Paper: Normal Range
What is the “normal” range for the lowest MAP in AI patients, i.e., 95% of subjects were in approximately what range?
(Summary data: AI group N = 42, SD = 10.8; non-AI group N = 38, SD = 8.7.)

Back to Paper: Normal Range (continued)
Answer: 56.2 ± 2(10.8) ≈ 35 to 78 (the AI group mean ± 2 SD).

Back to Paper: Confidence Intervals
Δ = 63.4 − 56.2 = 7.2 is the best guess for the difference in lowest MAP between the means of “all” AI and non-AI patients.
SE(Diff of means) ≈ sqrt(SE1² + SE2²) = sqrt(1.66² + 1.41²) ≈ 2.2 (AI: N = 42, SD = 10.8, SE = 1.66; non-AI: N = 38, SD = 8.7, SE = 1.41).
We are 95% sure that the true difference is within ≈ 7.2 ± 2 SE(Diff) = 7.2 ± 2(2.2) = 2.8 to 11.6; a quick check of this arithmetic is sketched below.
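The check below is a minimal sketch that reproduces the interval from the summary statistics, assuming the slide's ± 2 × SE rule of thumb rather than an exact t-based multiplier.

```python
from math import sqrt

# 95% CI for the difference in mean lowest MAP, using the +/- 2 * SE rule of thumb.
diff = 7.2                                   # difference in group means (non-AI minus AI)
se_diff = sqrt((10.8 / sqrt(42)) ** 2 + (8.7 / sqrt(38)) ** 2)   # about 2.2
lower, upper = diff - 2 * se_diff, diff + 2 * se_diff
print(round(se_diff, 2), round(lower, 1), round(upper, 1))  # 2.18, 2.8, 11.6, matching the slide
```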

Sampling distribution and CI
 Sampling distribution: the distribution of a statistic (such as a sample mean or a t statistic) under repeated sampling from a target population.
 We can calculate a statistic from one random sample and use it as a point estimate of the population parameter.
 How precise that statistic is, however, depends on its sampling distribution.
 Since the sample mean is used most commonly, the sampling distribution of the mean is used most commonly.
 Simulation of a sampling distribution or of a confidence interval for the sample mean: a minimal simulation sketch is given below.
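To make the idea of a sampling distribution concrete, the following minimal simulation sketch (not part of the slides) draws many samples from a hypothetical normal population, records each sample mean, and checks how often the mean ± 1.96 × SE interval covers the true mean. The population values used are arbitrary illustrative choices.

```python
import random
import statistics

# Simulate the sampling distribution of the sample mean and the coverage of a 95% CI.
random.seed(0)
true_mean, true_sd, n, n_sims = 60.0, 10.0, 40, 5000

sample_means, covered = [], 0
for _ in range(n_sims):
    sample = [random.gauss(true_mean, true_sd) for _ in range(n)]
    m = statistics.mean(sample)
    se = statistics.stdev(sample) / n ** 0.5
    sample_means.append(m)
    if m - 1.96 * se <= true_mean <= m + 1.96 * se:
        covered += 1

print(round(statistics.mean(sample_means), 2))   # close to the true mean of 60
print(round(statistics.stdev(sample_means), 2))  # close to true_sd / sqrt(n), about 1.58
print(covered / n_sims)                          # coverage close to 0.95
```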

Confidence Interval
 When your study is underpowered (e.g., pilot data) or overpowered (e.g., national surveys), the confidence interval provides the range in which the true effect (a population parameter) lies.
 How well does your sample mean (m) reflect the true mean?
 Generic form of the 95% CI for a mean (or proportion):
Lower limit: sample mean (proportion) − 1.96 × SE
Upper limit: sample mean (proportion) + 1.96 × SE
The quantity 1.96 × SE is also usually called the “margin of error”.
 SE measures the variability in the sampling distribution of the sample mean (or proportion) under repeated sampling.
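As an illustration of this generic form applied to a proportion, the sketch below uses the 61.9% figure from the AI example; the cell count (26 of 42) is reconstructed from that percentage, and the binomial SE formula is a standard assumption rather than something stated on the slides.

```python
from math import sqrt

# 95% CI for a proportion using the generic form: estimate +/- 1.96 * SE.
# Illustrative numbers: about 61.9% of the 42 AI patients had a lowest MAP < 60.
p_hat, n = 26 / 42, 42
se = sqrt(p_hat * (1 - p_hat) / n)   # standard binomial SE, about 0.075
margin = 1.96 * se                   # the "margin of error"
print(round(p_hat - margin, 3), round(p_hat + margin, 3))  # roughly 0.47 to 0.77
```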

Revisiting the food additives study
2. Look at the left side of the bottom panel of Figure 3 and recall what we have said about confidence intervals. Would you conclude that there is a change in hyperactivity under Mix A?
3. Repeat question 2 for the placebo.

Revisiting the food additives study, continued.

Possible values for the real effect; zero is “ruled out”.

Revisiting the food additives study, continued
4. Do you think that the positive conclusion for question 3 has been “proven”? Yes, with 95% confidence.
5. Do you think that the negative conclusion for question 2 has been “proven”? No, since more subjects would give a narrower confidence interval.
Hypothesis testing makes a yes-or-no conclusion about whether there is an effect and quantifies the chances of a correct conclusion either way. Confidence intervals give the possible magnitudes of effects.

Confidence Intervals ↔ Hypothesis Tests
The food additives study: confidence intervals corresponding to p > 0.05, p ≈ 0.05, and p < 0.05.

Confidence Intervals ↔ Hypothesis Tests: 95% confidence intervals (the AI study)
Non-overlapping 95% confidence intervals, as here, are sufficient for a significant (p < 0.05) group difference. However, non-overlap is not necessary: the intervals can overlap and the groups can still differ significantly.

Power of a Study
Statistical power is the sensitivity of a study to detect real effects, if they exist. It needs to be balanced against the likelihood of wrongly declaring effects when they are nonexistent; today, we have been keeping that error at < 5%. Power is the topic of the next session (Session 4).