Statistical Inference June 30-July 1, 2004

Statistical Inference The process of making guesses about the truth from a sample. We observe a sample and use it to make guesses about the whole population; the truth itself is not observable.

FOR EXAMPLE: What's the average weight of all medical students in the US?
1. We could go out and measure all US medical students (>65,000).
2. Or, we could take a sample and make inferences about the truth from our sample.
Using what we observe:
1. We can test an a priori guess (hypothesis testing).
2. We can estimate the true value (confidence intervals).

Statistical Inference is based on Sampling Variability
Sample statistic – we summarize a sample into one number; e.g., it could be a mean, a difference in means or proportions, or an odds ratio.
– E.g., the average blood pressure of a sample of 50 American men.
– E.g., the difference in average blood pressure between a sample of 50 men and a sample of 50 women.
Sampling variability – if we could repeat an experiment many, many times on different samples with the same number of subjects, the resulting sample statistic would not always be the same (because of chance!).
Standard error – a measure of the sampling variability (a function of sample size).

Sampling Variability: random students. The Truth (not knowable): the average weight of all 65,000+ US medical students at this moment is exactly 150 lbs. [Figure: weights of individual randomly drawn students, e.g., 189.3 lbs, 92.1 lbs, 152.3 lbs, 169.2 lbs, 110.3 lbs]

Sampling Variability: random samples of 5 students. The Truth (not knowable): the average of all 65,000+ US medical students at this moment is exactly 150 lbs. [Figure: sample means from samples of 5 students, e.g., 139.3 lbs, 152.1 lbs, 158.3 lbs, 149.2 lbs, 170.3 lbs]

Sampling Variability: samples of 50 students. The Truth (not knowable): the average of all 65,000+ US medical students at this moment is exactly 150 lbs. [Figure: sample means from samples of 50 students; individual values not preserved in the transcript]

Sampling Variability: samples of 150 students. The Truth (not knowable): the average of all 65,000+ US medical students at this moment is exactly 150 lbs. [Figure: sample means from samples of 150 students; individual values not preserved in the transcript]

The Central Limit Theorem: how sample statistics vary
Many sample statistics (e.g., the sample average) follow a normal distribution:
– They center around the true population value (e.g., the true mean weight).
– They become less variable (by a predictable amount) as sample size increases:
  standard error of a sample statistic = standard deviation / √(sample size)
Remember: the standard deviation reflects the variability of the characteristic in the population.
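
To make the standard-error formula concrete, here is a minimal SAS sketch; the population SD of 15 lbs is borrowed from the weight example that appears later in this deck, and the variable names are mine. It prints the standard error of a sample mean for a few sample sizes.

  data _null_;
    sd = 15;                      /* assumed population standard deviation (lbs) */
    do n = 2, 10, 120;            /* sample sizes */
      se = sd / sqrt(n);          /* standard error = SD / sqrt(sample size) */
      put n= se=;
    end;
  run;

For n = 120 this gives roughly 1.37 lbs, the value used in the confidence-interval slides near the end.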

The Central Limit Theorem: Illustration
I had SAS generate 1000 random observations from each of the following probability distributions:
– ~N(10,5)
– ~Exp(1)
– Uniform on [0,1]
– ~Bin(40, .05)

~N(10,5)

Uniform on [0,1]

~Exp(1)

~Bin(40,.05)

The Central Limit Theorem: Illustration
I then had SAS generate 1000 averages of 2, 1000 averages of 5, and 1000 averages of 100 random observations from each probability distribution… (Refer to the end of SAS LAB ONE, which we will implement next Wednesday, July 7.)
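
The lab code itself is not reproduced here, but a minimal SAS sketch of this kind of simulation, shown for the Exp(1) case only (the dataset name, variable names, and seed are mine), looks like this: it draws 1000 samples, averages 2, 5, or 100 observations, and plots the averages of 100.

  data clt_exp;
    call streaminit(20040630);                   /* arbitrary seed */
    do rep = 1 to 1000;                          /* 1000 simulated samples per sample size */
      do n = 2, 5, 100;
        total = 0;
        do i = 1 to n;
          total = total + rand('EXPONENTIAL');   /* one draw from Exp(1) */
        end;
        avg = total / n;                         /* the sample average */
        output;
      end;
    end;
  run;

  proc sgplot data=clt_exp(where=(n=100));
    histogram avg;                               /* should look approximately normal, centered near 1 */
  run;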

~N(10,25): average of 1 (original distribution)

~N(10,25): 1000 averages of 2

~N(10,25): 1000 averages of 5

~N(10,25): 1000 averages of 100

Uniform on [0,1]: average of 1 (original distribution)

Uniform: 1000 averages of 2

Uniform: 1000 averages of 5

Uniform: 1000 averages of 100

~Exp(1): average of 1 (original distribution)

~Exp(1): 1000 averages of 2

~Exp(1): 1000 averages of 5

~Exp(1): 1000 averages of 100

~Bin(40,.05): average of 1 (original distribution)

~Bin(40,.05): 1000 averages of 2

~Bin(40,.05): 1000 averages of 5

~Bin(40,.05): 1000 averages of 100

The Central Limit Theorem: formally
If all possible random samples, each of size n, are taken from any population with mean μ and standard deviation σ, the sampling distribution of the sample means (averages) will:
1. have mean μ
2. have standard deviation σ/√n (the standard error)
3. be approximately normally distributed regardless of the shape of the parent population (normality improves with larger n)

Example Pretend that the mean weight of medical students was 128 lbs with a standard deviation of 15 lbs…

Hypothetical histogram of weights of US medical students (computer-generated): mean = 128 lbs; standard deviation = 15 lbs. The standard deviation reflects the natural variability of weights in the population.

Average weights from 1000 samples of 2

Average weights from 1000 samples of 10

Average weights from 1000 samples of 120

Using Sampling Variability In reality, we only get to take one sample!! But, since we have an idea about how sampling variability works, we can make inferences about the truth based on one sample.

Hypothesis Testing

The null hypothesis is the “straw man” that we are trying to shoot down. Example 1: Possible null hypothesis: “mean weight of medical students = 128 lbs” Let’s say we take one sample of 120 medical students and calculate their average weight….

Expected Sampling Variability for n=120 if the true weight is 128 (and SD=15) What are we going to think if our 120-student sample has an average weight of 143??

The P-value associated with this experiment: the probability of our sample average being 143 lbs or more, IF the true average weight is 128, is < .0001. This gives us evidence that 128 isn't a good guess.
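
A minimal SAS sketch of that p-value calculation, using the normal approximation from the CLT and the numbers on the slides (null mean 128, SD 15, n = 120, observed mean 143); variable names are mine:

  data _null_;
    se = 15 / sqrt(120);                       /* standard error of the mean under the null */
    pval = 1 - cdf('NORMAL', 143, 128, se);    /* P(sample mean >= 143 | true mean = 128) */
    put pval=;
  run;

The result is vanishingly small, consistent with the "< .0001" above.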

Estimation (a preview) We’d estimate based on these data that the average weight is somewhere closer to 143 lbs. And we could state the precision of this estimate (a “confidence interval”—to come later)

Expected Sampling Variability for n=2 What are we going to think if our 2-student sample has an average weight of 143?

Expected Sampling Variability for n=2
P-value = 11%; i.e., about 11 out of 100 "average of 2" experiments will yield values of 143 or higher even if the true mean weight is only 128.

The P-value The p-value is the probability that we would have seen our data (or something even more unexpected) just by chance if the null hypothesis (null value) were true. Small p-values mean our data would be very unlikely if the null value were true.

The P-value By convention, p-values of < .05 are often accepted as "statistically significant" in the medical literature, but this is an arbitrary cut-off. A cut-off of p < .05 means that, when the null hypothesis is true, about 5 of 100 experiments will appear significant just by chance (a "Type I error").

What factors affect the p-value?
– The effect size
– The variability of the sample data
– The sample size

Statistical Power Note that, though we found the same sample value (143 lbs) in our 120-student sample and our 2-student sample, we only rejected the null (and concluded that med students weigh more on average than 128 lbs) based on the 120-student sample. Larger samples give us more statistical power…
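
A minimal sketch of what "more power" means numerically, using the weight example: I assume a one-sided z-test at alpha = 0.05 and that the true mean really is 143 lbs; these test settings are my assumption, not stated on the slides.

  data _null_;
    mu0 = 128; mu1 = 143; sd = 15;                 /* null mean, assumed true mean, SD */
    zcrit = quantile('NORMAL', 0.95);              /* one-sided alpha = 0.05 */
    do n = 2, 120;
      se = sd / sqrt(n);
      cutoff = mu0 + zcrit*se;                     /* reject H0 if the sample mean exceeds this */
      power = 1 - cdf('NORMAL', cutoff, mu1, se);  /* chance of rejecting when the mean really is 143 */
      put n= power=;
    end;
  run;

Under these assumptions, power is roughly 0.4 with n = 2 but essentially 1.0 with n = 120.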

Hypothesis Testing: example 2 Hypothesis: more babies are born in November (9 months after Valentine's Day) than in other months. Empirical evidence: our researcher observed that 6/19 kids in one classroom had November birthdays.

Hypothesis Testing Is a contest between… the null hypothesis and the alternative hypothesis.
– The null hypothesis (abbreviated H0) is usually the hypothesis of no difference. Example: no more babies are born in November (9 months after Valentine's Day) than in any other month.
– The alternative hypothesis (abbreviated Ha). Example: more babies are born in November (9 months after Valentine's Day) than in other months.

The Steps
1. Define your null and alternative hypotheses:
– H0: P(being born in November) = 1/12
– Ha: P(being born in November) > 1/12 (a "one-sided" test)

The Steps
2. Figure out the "null distribution":
– If I observe a class of 19 students and each student has a probability of 1/12 of being born in November…
– Sounds BINOMIAL! In math-speak: Class ~ Binomial(19, 1/12).
If the null is true, how many November births should I expect to see?
– Expected November births = 19 × (1/12) ≈ 1.6
– Reasonable variability = √[19 × (1/12) × (11/12)] ≈ 1.2
If I see 0-3 November births, it seems reasonable that the null is true; anything else is suspicious…
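
A minimal SAS sketch that tabulates this null distribution (the dataset and variable names are mine); it shows why 0-3 November births would be unsurprising and 6 would not:

  data nulldist;
    do k = 0 to 19;                             /* possible numbers of November births in a class of 19 */
      prob = pdf('BINOMIAL', k, 1/12, 19);      /* P(exactly k) under Binomial(19, 1/12) */
      output;
    end;
  run;

  proc print data=nulldist noobs;
  run;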

The Steps 3. Observe (experimental data) We see 6/19 babies were born in November in this case.

The Steps 4. Calculate a “p-value” and compare to a preset “significance level”

The Almighty P-Value
The P-value, roughly translated, is "the probability of seeing something as extreme as you did due to chance alone."
Example: the probability that we would have seen 6 or more November births out of 19 if the probability of a random child being born in November were only 1/12 (i.e., based on the null distribution).
Easy to calculate in SAS:
  data _null_;
    pval = 1 - cdf('BINOMIAL', 5, 1/12, 19);   /* P(X >= 6) = 1 - P(X <= 5) under Binomial(19, 1/12) */
    put pval;
  run;

The Steps
4a. Calculate a p-value:
  data _null_;
    pval = 1 - cdf('BINOMIAL', 5, 1/12, 19);   /* gives p = .0035 */
    put pval;
  run;
4b. Compare it to a preset significance level: .0035 < .05. (5% is often chosen due to convention/history.)

The Steps
5. Reject or fail to reject (accept) H0. In this case, reject H0.

Summary: The Underlying Logic… Hypothesis testing follows this logic: Assume A. If A, then B. Not B. Therefore, not A. But throw in a bit of uncertainty: if A, then probably B…

Summary: It goes something like this… The assumption: the probability of being born in November is 1/12. If the assumption is true, then it is highly likely that we will see fewer than 6 November births (since the probability of seeing 6 or more is .0035, or 3-4 times out of 1000). We saw 6 November births. Therefore, the assumption is likely to be wrong.

Example 3: the odds ratio Null hypothesis: There is no association between an exposure and a disease (odds ratio=1.0).

Example 3: Sampling Variability of the null Odds Ratio (OR) (100 cases/100 controls/10% exposed)

The Sampling Variability of the natural log of the OR (lnOR) is more Gaussian. Sample values far from lnOR = 0 give us evidence of an association; such values are very unlikely if there is no association in nature.
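
A minimal sketch of working on the lnOR scale: the 2x2 counts below are hypothetical (30/100 cases exposed, 10/100 controls exposed), and the standard-error formula is the standard Woolf formula for lnOR, which is not given on the slides.

  data _null_;
    /* hypothetical counts: a, b = exposed, unexposed cases; c, d = exposed, unexposed controls */
    a = 30; b = 70; c = 10; d = 90;
    oddsratio = (a*d) / (b*c);            /* odds ratio */
    lnor = log(oddsratio);                /* natural log of the OR */
    se   = sqrt(1/a + 1/b + 1/c + 1/d);   /* Woolf standard error of lnOR */
    lo   = exp(lnor - 1.96*se);           /* 95% CI, transformed back to the OR scale */
    hi   = exp(lnor + 1.96*se);
    put oddsratio= lnor= se= lo= hi=;
  run;

With these made-up counts the OR is about 3.9 with a 95% CI of roughly 1.8 to 8.4, i.e., an interval that stays above 1.0.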

Statistical Power
Statistical power here is the probability of concluding that there is an association between exposure and disease if an association truly exists.
– The stronger the association, the more likely we are to pick it up in our study.
– The more people we sample, the more likely we are to conclude that there is an association if one exists (because the sampling variability is reduced).

Error and Power
Type I error (false positive): concluding that the observed effect is real when it's just due to chance.
Type II error (false negative): missing a real effect.
Power (the flip side of Type II error): the probability of seeing a real effect.

Think of Pascal's Wager:
Your Decision   |  The TRUTH: God Exists   |  The TRUTH: God Doesn't Exist
Reject God      |  BIG MISTAKE             |  Correct
Accept God      |  Correct (big payoff)    |  MINOR MISTAKE

Type I and Type II Error in a box (columns give the true state of the null hypothesis, H0):
Your Statistical Decision  |  H0 True            |  H0 False
Reject H0                  |  Type I error (α)   |  Correct
Do not reject H0           |  Correct            |  Type II error (β)

Statistical vs. Clinical Significance Consider a hypothetical trial comparing death rates in 12,000 patients with multi-organ failure receiving a new inotrope with 12,000 patients receiving usual care. If there were a 1% reduction in mortality in the treatment group (49% deaths versus 50% in the usual-care group), this could be statistically significant (p < .05) because of the very large sample size. However, such a small difference in death rates may not be clinically important.

Confidence Intervals (Estimation)

Confidence intervals don't presuppose a null value. They show our best guess at the plausible range of values for the population characteristic, based on our data. Across repeated samples, 95% confidence intervals contain the true population value approximately 95% of the time.

A 95% CI should contain the true value about 19 times out of 20. [Figure: confidence intervals from many repeated samples plotted against the true value X; most intervals cover X, but roughly 1 in 20 misses it.]
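
A minimal SAS sketch of that idea, assuming the weight example (true mean 128 lbs, SD 15 lbs, samples of 120) and a known SD; it counts how many of 1000 simulated 95% CIs cover the true mean. The seed and variable names are mine.

  data _null_;
    call streaminit(2004);                              /* arbitrary seed */
    truemean = 128; sd = 15; n = 120;
    covered = 0;
    do rep = 1 to 1000;
      total = 0;
      do i = 1 to n;
        total = total + rand('NORMAL', truemean, sd);   /* one simulated student weight */
      end;
      xbar = total / n;
      se = sd / sqrt(n);
      if (truemean >= xbar - 1.96*se) and (truemean <= xbar + 1.96*se) then covered = covered + 1;
    end;
    put covered=;                                       /* expect roughly 950 of 1000 */
  run;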

Confidence Intervals
Confidence interval = (sample statistic) ± (multiplier reflecting how confident we want to be) × (standard error)

95% CI from a sample of 120: 143 +/- 2 x (1.37) = 140.26 – 145.74

95% CI from a sample of 10: 143 +/- 2 x (4.74) = 133.52 – 152.48

99.7% CI from a sample of 10: 143 +/- 3 x (4.74) = 128.78 – 157.22
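
A minimal SAS sketch reproducing the intervals above; the 2x and 3x multipliers are the rule-of-thumb values used on the slides (rather than the exact normal quantiles), and the variable names are mine.

  data _null_;
    xbar = 143; sd = 15;
    do n = 120, 10;
      se = sd / sqrt(n);
      lower95 = xbar - 2*se;          /* 2-SE rule of thumb for a 95% CI */
      upper95 = xbar + 2*se;
      put n= se= lower95= upper95=;
    end;
    /* the 99.7% interval for n = 10 uses 3 SEs */
    se10 = sd / sqrt(10);
    lower997 = xbar - 3*se10;
    upper997 = xbar + 3*se10;
    put lower997= upper997=;
  run;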

What Confidence Intervals Do
– They indicate the uncertainty about the size of a population characteristic or effect. Wider CIs indicate less certainty.
– Confidence intervals can also answer the question of whether an association exists or a treatment is beneficial or harmful (analogous to p-values). E.g., if the CI of an odds ratio includes the value 1.0, we cannot be confident that the exposure is associated with the disease.