Published by Everett Jacobs. Modified over 8 years ago.
1 Basics of Inferential Statistics
Mark A. Weaver, PhD
Family Health International
Office of AIDS Research, NIH
ICSSC, FHI
Lucknow, India, March 2010
2 Outline
1. What are inferential statistics?
2. Randomization versus random sampling
3. Two methods of statistical inference
– Hypothesis testing
– Estimation and confidence intervals
3 What Are Inferential Statistics?
Inferential statistics – methods for drawing conclusions about a population based on data from a sample of that population
4 Framework
Population (N) → Sample (n): collect data in the sample and conduct data analysis
Sample (n) → Population (N): make inferences
N = population size; n = sample size
5 Foundation for Inferential Statistics
Q: What allows us to make valid statistical inferences about a large population based only on a relatively small sample?
A: PROBABILITY
Q: Where does probability come from?
A: A random process, i.e., random sampling or randomization
6 Two Special Cases
Two important cases for which we are able to derive exact probability distributions on which to base inferences:
1. Random sampling
2. Randomization
Note: important distinction between randomization and random sampling!
– Both induce the randomness required for statistical inference.
– However, they allow for different “spaces” of inference.
– They are rarely used together in the same study!
7 Random Sampling
Helps to establish external validity (generalizability)
– Study population is randomly drawn from the target population.
– Useful for estimating/comparing parameters from larger target populations.
– Statistical inference extends to the target population.
8 Randomization
Helps to maintain internal validity (unbiased assessment of intervention effect)
– Study population is typically a convenience sample: not random! Subjects meet inclusion criteria and provide consent.
– Useful for comparing effectiveness of interventions within your study population.
– Fairly strong assumptions are required to generalize results beyond the study population.
– Statistical inference really only applies to randomized subjects!
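The difference between the two random processes can be sketched in a few lines of Python. This is only an illustration: the population of 10,000 subject IDs, the sample size of 100, and the two-arm split are hypothetical and do not come from the presentation.

```python
import random

rng = random.Random(2010)  # fixed seed so the sketch is reproducible

# Hypothetical target population of N = 10,000 subject IDs (an assumption).
population = list(range(10_000))

# Random sampling: every subject in the target population has the same
# chance of entering the study, which is what supports external validity.
random_sample = rng.sample(population, k=100)

# Randomization: start from a convenience sample (here simply the first
# 100 IDs, i.e. NOT randomly drawn) and randomly assign half to each arm.
convenience_sample = population[:100]
shuffled = convenience_sample[:]
rng.shuffle(shuffled)
arm_a, arm_b = shuffled[:50], shuffled[50:]

print(len(random_sample), len(arm_a), len(arm_b))
```

In the first case the randomness lies in who is studied; in the second it lies only in who receives which intervention, so the probability statements refer to the assignment, not to the wider population.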
9 Inference in Nonrandom Studies?
Random sampling is implicitly assumed when
– Applying statistical models, p-values, or confidence intervals to observational data
– Generalizing statistical inferences from an RCT to some broader population
Such generalizations have been called non-statistical inference, or inferences without a basis in probability
– Rely on “clinical judgment” as opposed to “statistical inference”
How reasonable is this implicit assumption…
10 “Arguments regarding the ‘representativeness’ of a nonrandomly selected sample are irrelevant to the question of its randomness: a random sample is random because of the sampling procedure used to select it, not because of the composition of the sample.”
Edgington and Onghena (2007), Randomization Tests, 4th ed.
11 “I have never met random samples except when sampling has been under human control and choice as in random sampling from a finite population or in experimental randomization in the comparative experiment.”
Kempthorne (1979), Sankhya
12 “In most epidemiologic studies, randomization and random sampling play little or no role in the assembly of study cohorts. I therefore conclude that probabilistic interpretations of conventional statistics are rarely justified, and that such interpretations may encourage misinterpretation of nonrandomized studies.”
Greenland (1990), Epidemiology
13 Inference in Nonrandom Studies?
How reasonable is this implicit assumption of random sampling?
Be very careful not to overstate your inferences from nonrandom studies!
– Many published clinical findings from nonrandom studies have failed to replicate in RCTs
– See Ioannidis, JAMA 2005 and PLoS Med. 2005
14 The Logic of Hypothesis Testing
Statistical hypothesis – a statement about parameters of a population
Null hypothesis (H0) – the hypothesis to be tested; often includes the hypothesis of no difference
– H0: Avg. BP in group A ≥ Avg. BP in group B
Alternative hypothesis (HA) – corresponds to the research hypothesis
– HA: Avg. BP in group A < Avg. BP in group B
H0 and HA are mutually exclusive and exhaustive
15 The Logic of Hypothesis Testing
Goal of hypothesis testing – reject H0!
How do we decide to reject or not?
– Obtain data via a random process
– If data are consistent with H0, then do not reject
– Otherwise, if data are inconsistent, then reject H0 and conclude HA
P-value = probability of getting a test statistic as extreme as or more extreme than the one observed, assuming H0 is true
Decision rule:
– If p-value < α, reject H0
– If p-value ≥ α, do not reject H0
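As a rough illustration of the p-value and decision rule, the sketch below runs an exact randomization (permutation) test for the blood-pressure hypotheses. The BP values and the choice of α = 0.05 are made up for the example, and the permutation test is one possible way to obtain a p-value from the randomization itself, not necessarily the test the presenter had in mind. It evaluates the boundary of H0, i.e. no difference between groups.

```python
from itertools import combinations
from statistics import mean

# Hypothetical systolic BP values (mmHg); not data from the presentation.
group_a = [118, 122, 125, 119, 121]
group_b = [130, 128, 135, 127, 133]
alpha = 0.05  # significance level chosen before looking at the data

pooled = group_a + group_b
observed = mean(group_a) - mean(group_b)

# Enumerate every way the 10 subjects could have been split 5/5 and
# recompute the difference in means under each possible assignment.
diffs = []
for idx in combinations(range(len(pooled)), len(group_a)):
    a = [pooled[i] for i in idx]
    b = [pooled[i] for i in range(len(pooled)) if i not in idx]
    diffs.append(mean(a) - mean(b))

# One-sided p-value for HA: mean A < mean B, i.e. the share of assignments
# giving a difference as small as or smaller than the observed one.
p_value = sum(d <= observed for d in diffs) / len(diffs)
decision = "reject H0" if p_value < alpha else "do not reject H0"
print(f"p = {p_value:.4f}: {decision}")
```

Because every value in the hypothetical group A is below every value in group B, only 1 of the 252 possible assignments is as extreme as the observed one, so the p-value is 1/252.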
16 Type I and Type II Errors and Power
In truth, H0 is either true or false, but we never get to know the truth.
Based only on observed data, we decide to either reject H0 or not.

Decision            Truth: H0 is true                        Truth: H0 is false
Do not reject H0    Correct decision (1 - α)                 Type II error (β)
Reject H0           Type I error (α) = significance level    Correct decision (1 - β) = power
17 Interpreting “The Power to Detect”
Suppose the protocol says “study has 90% power to detect a mean difference of 5 between groups.”
This does not mean:
1. That there is a 90% chance of concluding that the true mean difference between groups is at least 5
2. That there is a 90% chance of observing a mean difference between groups of at least 5
It simply means that there is a 90% chance of making a decision to reject the null hypothesis if the true (but unknown) mean difference is 5
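A small simulation can make this reading of power concrete. The sketch assumes a two-sided z-test with known standard deviation 10 and 85 subjects per group; both numbers are assumptions chosen so that the design has roughly 90% power for a true difference of 5, and neither comes from the presentation.

```python
import random
from statistics import NormalDist, mean

rng = random.Random(42)  # fixed seed for reproducibility

# Assumed design values: alpha, the difference to detect, the known SD,
# and the per-group sample size (SD and n are illustrative assumptions).
alpha, true_diff, sd, n_per_group = 0.05, 5.0, 10.0, 85
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
se = sd * (2 / n_per_group) ** 0.5  # SE of the difference in means


def one_trial(diff):
    """Simulate one two-group study; return True if H0 is rejected."""
    a = [rng.gauss(0.0, sd) for _ in range(n_per_group)]
    b = [rng.gauss(diff, sd) for _ in range(n_per_group)]
    z = (mean(b) - mean(a)) / se
    return abs(z) > z_crit


n_sims = 2000
# Power: how often we reject when the true difference really is 5.
power = sum(one_trial(true_diff) for _ in range(n_sims)) / n_sims
# Type I error rate: how often we reject when the true difference is 0.
type_i = sum(one_trial(0.0) for _ in range(n_sims)) / n_sims
print(f"estimated power = {power:.3f}, estimated type I error = {type_i:.3f}")
```

The estimated power is the fraction of simulated studies that reject H0 when the true difference is 5. It says nothing about how often the observed difference is at least 5, which is the misreading the slide warns against.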
18 Interpreting Results
Suppose we decided before the study that α = 0.05, and the study was designed with 90% power to detect a mean difference of 5
Suppose we observe p-value = 0.06
– Reject or not?
– What is the probability that we would be making a type II error if we decide not to reject H0 in this case?
Now suppose we observe a p-value of 0.001
– Reject or not?
– What is the probability of a type I error here?
How does knowing a study’s power help interpret results?
19 Absence of Evidence …
Is not evidence of absence!
That is, not rejecting the null hypothesis does not provide evidence that the null is true.
P-values provide evidence against the null.
Avoid the following conclusions:
– “no difference between groups”
– “treatment was ineffective”
– “no association between X and Y”
Instead, say “insufficient evidence of” difference, effect, or association
20 Genesis of a Confidence Interval
[Figure: axis centered at μ with -1.96 SE and +1.96 SE marked, plus an interval spanning -1.96 se to +1.96 se]
21 Interpreting a Confidence Interval
For a 95% CI, we have 95% “confidence” that the true value is somewhere within the interval
– The true value is either in the observed interval or it is not; we can’t know for sure
We have no more confidence regarding the center of the interval than we do around the endpoints
– The true value can be anywhere within the interval, not necessarily near the middle
– The point estimate should not necessarily be regarded as the “best estimate” or most likely value for the true value
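The “95% confidence” refers to the long-run behavior of the procedure, which a simulation can illustrate. The sketch below assumes a normal population with known standard deviation and builds the interval as the sample mean ± 1.96 SE; the values of μ, σ, and n are hypothetical.

```python
import random
from statistics import mean

rng = random.Random(7)  # fixed seed for reproducibility

# Hypothetical true mean, known SD, sample size, and number of repeats.
mu, sigma, n, n_sims = 120.0, 15.0, 50, 2000
se = sigma / n ** 0.5  # standard error of the sample mean

covered = 0
for _ in range(n_sims):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar = mean(sample)
    lower, upper = xbar - 1.96 * se, xbar + 1.96 * se
    # Any single interval either contains mu or it does not.
    covered += lower <= mu <= upper

coverage = covered / n_sims
print(f"share of intervals containing mu: {coverage:.3f}")
```

Across repeated samples about 95% of the intervals contain μ, but for any one observed interval we cannot tell whether it is among them.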
22 Confidence Intervals and Hypothesis Tests
Close relationship between confidence intervals and hypothesis tests
In fact, a confidence interval can be regarded as a family of hypothesis tests
– Any value not contained within a 100(1 – α)% CI would have been rejected by a 2-sided test of size α
– Example: 95% CI for OR (1.25, 2)
– CI does not contain the value 1
– Thus, can reject H0: OR = 1 at the 5% level
– CIs used frequently for conducting tests in certain contexts, e.g., non-inferiority designs
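The odds-ratio example can be checked numerically. The sketch below assumes the interval is a Wald interval that is symmetric on the log-odds scale; the presentation does not say how the interval was computed, so the recovered standard error and z statistic are illustrative only.

```python
from math import erfc, log, sqrt

lower, upper = 1.25, 2.0  # 95% CI for the OR, from the example
null_value = 1.0          # H0: OR = 1

# Duality: the 2-sided 5% test rejects exactly when the null value
# falls outside the 95% CI.
rejects = not (lower <= null_value <= upper)

# Assumption: Wald interval, symmetric on the log scale, so the point
# estimate and SE of log(OR) can be backed out of the endpoints.
z95 = 1.959964
log_or = (log(lower) + log(upper)) / 2
se = (log(upper) - log(lower)) / (2 * z95)

z = (log_or - log(null_value)) / se
p_value = erfc(abs(z) / sqrt(2))  # two-sided normal p-value

print(f"reject H0: {rejects}, z = {z:.2f}, p < 0.05: {p_value < 0.05}")
```

Under that assumption the test statistic is about 3.82, so the two-sided p-value is well below 0.05, in agreement with reading the decision directly off the interval.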
23 Key Points
Statistical inference requires a random process
Random sampling and randomization are not the same thing
Goal of hypothesis testing is (almost) always to reject the null hypothesis
– Deciding not to reject tells you little
– Don’t over-interpret non-significant results
Confidence intervals provide a range of plausible values for true parameters, none of which is more “likely” to be the true one