Introduction to Biostatistics, Harvard Extension School, Fall 2007 © Scott Evans, Ph.D. & Lynne Peeples, M.S.
Counts and Proportions

Counts and Proportions
• Coin flipping
  ◦ What's the chance that heads comes up 8 or more times in 10 flips of a coin?
  ◦ Binomial distribution
• Celtics' winning record
  ◦ Are the Celtics really an average (.500) team despite their great start this season?
  ◦ One-sample test of proportion
• Who is the better free throw shooter, Kevin Garnett or Ray Allen?
  ◦ Two-sample test of two proportions
• Biomedical research…

Where we've been… and where we're going
Variables of interest?
• One variable
  ◦ Continuous (methods from before the midterm)
  ◦ Binary
    ▪ One-sample test of proportion
    ▪ Two-sample test of proportions
    ▪ Exact methods / Normal approximation
• Two variables
  ◦ Both continuous
    ▪ Interested in prediction: simple linear regression
    ▪ Interested in association: Pearson correlation (both variables normal) or Spearman correlation (not normal)
  ◦ One continuous, one categorical: ANOVA
  ◦ Both binary (in two weeks)
• More than two variables: multiple linear regression

Binary Data
• Often in medical and public health studies, our endpoint of interest is binary (dichotomous)
• Examples:
  ◦ disease vs. no disease
  ◦ response vs. no response
  ◦ death vs. no death
  ◦ success vs. failure

Binary Data
• Continuous endpoints are often dichotomized into a binary endpoint as well
• For example, in a study of the effect of a drug on LDL levels, each subject's LDL measurement at the end of the study (a continuous measure) may be dichotomized into "response" vs. "no response"
  ◦ based on a cut-point defining whether the LDL level has been reduced to an acceptable, normal, or safe level

Binary Data
1. There are only two possible (mutually exclusive) outcomes, often called a "success" and a "failure"
2. Each experiment is identical to, and independent of, all the others, and the probability of a "success" is p
  ◦ Thus, the probability of a "failure" is 1 − p
• Each experiment is called a Bernoulli trial
  ◦ E.g., throwing a die and observing whether or not it comes up six, tossing a coin, or recording whether a cancer patient survives

Binary Data: Smoking Status Example
• Define X = 1 for a smoker and X = 0 for a non-smoker
• If "success" is the event that a randomly selected individual is a smoker, and previous research shows that about 29% of adults in the United States smoke, then
  $P(X=1) = p = 0.29$ and $P(X=0) = 1 - p = 0.71$
• This is an example of a Bernoulli trial
  ◦ We select one individual at random, each selection is carried out independently, and each time the probability that the selected individual is a "success" is the same

Binary Data: Smoking Status Example
• If we selected two adults and looked at their smoking status, the possible outcomes are:
  ◦ Neither is a smoker
  ◦ Only one is a smoker
  ◦ Both are smokers
• If we define X as the number of smokers among these two individuals, then
  ◦ X = 0: neither is a smoker
  ◦ X = 1: only one is a smoker
  ◦ X = 2: both are smokers

Binary Data: Smoking Status Example
$P(X=0) = (1-p)^2 = (0.71)^2 = 0.5041$
$P(X=1) = P(\text{exactly one of the two individuals is a smoker}) = p(1-p) + (1-p)p = 2p(1-p) = 2(0.29)(0.71) = 0.4118$
$P(X=2) = p^2 = (0.29)^2 = 0.0841$
Notice that $P(X=0) + P(X=1) + P(X=2) = 0.5041 + 0.4118 + 0.0841 = 1.000$
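
These three probabilities can be checked by brute force. The minimal sketch below (Python, standard library only; the function name `smokers_distribution` is our own illustration, not from the course materials) enumerates every smoker/non-smoker outcome for independently selected adults and adds up the probability of each outcome according to how many smokers it contains.

```python
from itertools import product

# Probability that a randomly selected US adult smokes (from the example above)
p = 0.29

def smokers_distribution(p: float, n: int = 2) -> dict[int, float]:
    """Return P(X = k) for k = 0..n, where X counts smokers among n
    independently selected adults, by enumerating every possible outcome."""
    dist = {k: 0.0 for k in range(n + 1)}
    # Each outcome is a tuple of 0/1 indicators (1 = smoker)
    for outcome in product((0, 1), repeat=n):
        prob = 1.0
        for is_smoker in outcome:
            prob *= p if is_smoker else 1 - p
        dist[sum(outcome)] += prob
    return dist

dist = smokers_distribution(p)
for k, prob in dist.items():
    print(f"P(X={k}) = {prob:.4f}")
print(f"total = {sum(dist.values()):.3f}")
```

The two outcomes (smoker, non-smoker) and (non-smoker, smoker) both land in X = 1, which is where the factor of 2 in $2p(1-p)$ comes from.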

Binary Data: Smoking Status Example
• Bar chart
  ◦ A plot of the probability distribution of X, covering all of the values X can take (in the previous example, x = 0, 1, and 2)

Binomial Distribution
• The binomial distribution closely models the behavior of a variable that counts successes in repeated Bernoulli experiments
• To describe a binomial distribution, we need to specify two parameters:
  ◦ the probability of "success" (p)
  ◦ the number of Bernoulli experiments (n)
• One way of looking at p is as the proportion of times an experiment is successful when it is repeated a large number of times

Binomial Distribution
• The "exact binomial probability" is the probability calculated from the binomial distribution:
  $$P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}, \quad x = 0, 1, \dots, n$$
• It is computed by repeatedly plugging the values of n, x, and p into this formula
  ◦ Always correct to do so, but tedious
  ◦ Only practical if a computer does the calculation for you!

Binomial Distribution
• Given these parameters, the mean and standard deviation of a binomial distribution are:
  $$\mu = np, \qquad \sigma = \sqrt{np(1-p)}$$
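
As a sanity check on these formulas, the sketch below (Python, standard library only; `binom_pmf` and `binom_mean_sd` are hypothetical helper names of our own) evaluates the exact binomial probabilities for the smoking example (n = 30, p = 0.29), computes the mean and standard deviation directly from that distribution, and compares them with $np$ and $\sqrt{np(1-p)}$.

```python
import math

def binom_pmf(x: int, n: int, p: float) -> float:
    """Exact binomial probability P(X = x) = C(n, x) p^x (1 - p)^(n - x)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def binom_mean_sd(n: int, p: float) -> tuple[float, float]:
    """Closed-form mean np and standard deviation sqrt(np(1 - p))."""
    return n * p, math.sqrt(n * p * (1 - p))

n, p = 30, 0.29
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]

# Mean and SD computed directly from the probability distribution...
mean_direct = sum(x * px for x, px in enumerate(pmf))
sd_direct = math.sqrt(sum((x - mean_direct) ** 2 * px for x, px in enumerate(pmf)))

# ...should agree with the closed-form expressions
mean_formula, sd_formula = binom_mean_sd(n, p)
print(f"mean: {mean_direct:.4f} vs {mean_formula:.4f}")
print(f"SD:   {sd_direct:.4f} vs {sd_formula:.4f}")
```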

Binomial Distribution
• An easier way…
• If n is sufficiently large, then the statistic
  $$Z = \frac{X - np}{\sqrt{np(1-p)}}$$
  is approximately normally distributed with mean 0 and standard deviation 1 (a standard normal distribution)
• In general, we say n is "large" if np and n(1 − p) are both ≥ 5

Binomial Distribution
• A better normal approximation to the binomial is given by
  $$Z = \frac{X + 0.5 - np}{\sqrt{np(1-p)}} \quad \text{when } X < np,$$
  and by
  $$Z = \frac{X - 0.5 - np}{\sqrt{np(1-p)}} \quad \text{when } X > np$$
• This is called a continuity correction

Binary Data: Smoking Status Example
• For example, if we sample n = 30 subjects from the population, what is the probability that at most 6 of these individuals smoke?
  ◦ In other words, we want the proportion of samples of size n = 30 in which at most 6 people smoke
• With p = 0.29 and n = 30, we have:
  ◦ np = 8.7 and n(1 − p) = 21.3, both ≥ 5
  ◦ X = 6 < np = 8.7, so we should apply the continuity correction

Binary Data: Smoking Status Example
• Normal approximation to the binomial with continuity correction:
  $$P(X \le 6) \approx P\left(Z \le \frac{6 + 0.5 - 8.7}{\sqrt{30(0.29)(0.71)}}\right) = P(Z \le -0.885) \approx 0.188$$
• The exact binomial probability is 0.190, which is very close to this approximate value
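
The comparison on this slide can be reproduced with a short sketch (Python, standard library only; `binom_cdf` and `normal_approx_cdf` are our own illustrative names). It computes the exact binomial probability $P(X \le 6)$ and the normal approximation with and without the continuity correction. For a lower-tail probability $P(X \le x)$ the correction adds 0.5 to x, which matches the slide's rule for X < np.

```python
import math
from statistics import NormalDist

def binom_cdf(x: int, n: int, p: float) -> float:
    """Exact P(X <= x) for X ~ binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))

def normal_approx_cdf(x: int, n: int, p: float, continuity: bool = True) -> float:
    """Normal approximation to P(X <= x); with continuity=True, x is replaced by x + 0.5."""
    mu, sigma = n * p, math.sqrt(n * p * (1 - p))
    x_adj = x + 0.5 if continuity else x
    return NormalDist().cdf((x_adj - mu) / sigma)

n, p, x = 30, 0.29, 6
exact = binom_cdf(x, n, p)
approx_cc = normal_approx_cdf(x, n, p, continuity=True)
approx_plain = normal_approx_cdf(x, n, p, continuity=False)
print(f"exact = {exact:.3f}, with correction = {approx_cc:.3f}, without = {approx_plain:.3f}")
```

Dropping the correction noticeably understates the probability here, which is the point of the "better approximation" on the previous slide.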

Sampling Distribution of Proportions
• Our thinking about estimation (including confidence intervals) and hypothesis testing does not change when dealing with proportions
• We may use the Central Limit Theorem for binary data too (the CLT applies to any distribution with a finite variance, including the Bernoulli)
• Again, we must be careful when n is small

Sampling Distribution of Proportions
• Since the proportion of successes in the general population will not be known, we must estimate it
• In general, such an estimate is derived by calculating the proportion of successes in a sample of n experiments (trials) as follows:
  $$\hat{p} = \frac{x}{n}$$
  where x is the number of successes in the sample of size n

Sampling Distribution of Proportions
• The sampling distribution of a proportion has mean μ and standard deviation σ:
  ◦ Mean: $\mu = p$
  ◦ Standard deviation: $\sigma = \sqrt{p(1-p)/n}$
• The statistic
  $$Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$$
  is approximately distributed according to the standard normal distribution (again, the normal approximation is particularly good when np ≥ 5 and n(1 − p) ≥ 5)
• NOTE: these are our previous binomial parameters divided by n (x → x/n)

Sampling Distribution of Proportions
• Note that if we multiply the numerator and denominator of Z by n, we have
  $$Z = \frac{x - np}{\sqrt{np(1-p)}},$$
  which is the familiar form encountered in our study of the binomial distribution
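
A quick simulation can make the sampling distribution of $\hat{p}$ concrete. This is an illustration we have added, not part of the course materials: the sketch (Python, standard library only; `simulate_phat` and the seed are our own choices) draws many samples of n = 50 Bernoulli trials with p = 0.10 and compares the mean and standard deviation of the simulated $\hat{p}$ values with $p$ and $\sqrt{p(1-p)/n}$.

```python
import math
import random
import statistics

def simulate_phat(n: int, p: float, reps: int, seed: int = 1) -> list[float]:
    """Draw `reps` samples of n Bernoulli(p) trials; return each sample proportion."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

n, p = 50, 0.10
phats = simulate_phat(n, p, reps=20_000)

mean_sim = statistics.fmean(phats)
sd_sim = statistics.pstdev(phats)
sd_theory = math.sqrt(p * (1 - p) / n)
print(f"simulated mean = {mean_sim:.4f}  (theory: {p})")
print(f"simulated SD   = {sd_sim:.4f}  (theory: {sd_theory:.4f})")
```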

Hypothesis Testing
• Similar to hypothesis testing with continuous data, one may perform hypothesis tests on binary data:
  ◦ 1-sample test of a proportion: $H_0: p = p_0$ vs. $H_A: p \ne p_0$
  ◦ 2-sample test comparing proportions: $H_0: p_1 = p_2$ vs. $H_A: p_1 \ne p_2$

Lung Cancer Example
• Consider five-year survival among patients who have been diagnosed with lung cancer
• The mean proportion of individuals surviving is p = 0.10
• This implies that the standard deviation of 5-year survival is:
  $$\sigma = \sqrt{p(1-p)} = \sqrt{0.10(0.90)} = 0.30$$

Lung Cancer Example
• Suppose we select n = 50 lung cancer patients and are interested in the probability that 10 or more of these patients are still alive after 5 years
• In other words, if we select repeated samples of n = 50 patients diagnosed with lung cancer, what fraction of the samples will have 20% or more survivors?

Lung Cancer Example
• Since np = 50(0.1) = 5 and n(1 − p) = 50(0.9) = 45 are both ≥ 5, the normal approximation should be adequate
• Then,
  $$P(\hat{p} \ge 0.20) = P\left(Z \ge \frac{0.20 - 0.10}{\sqrt{0.10(0.90)/50}}\right) = P(Z \ge 2.36) \approx 0.009$$
• There is only a 0.9% probability that 10 (20%) or more patients in a sample of 50 would survive five years
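
The calculation above can be sketched as follows (Python, standard library only; `prob_phat_at_least` and `binom_sf` are hypothetical names of our own). It reproduces the normal approximation without a continuity correction, as on the slide, and for comparison also computes the exact binomial tail $P(X \ge 10)$, which the slide does not report.

```python
import math
from statistics import NormalDist

def prob_phat_at_least(threshold: float, n: int, p: float) -> float:
    """Normal approximation (no continuity correction) to P(p-hat >= threshold)."""
    se = math.sqrt(p * (1 - p) / n)
    z = (threshold - p) / se
    return 1 - NormalDist().cdf(z)

def binom_sf(x: int, n: int, p: float) -> float:
    """Exact P(X >= x) for X ~ binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x, n + 1))

approx = prob_phat_at_least(0.20, n=50, p=0.10)
exact = binom_sf(10, n=50, p=0.10)
print(f"normal approximation: {approx:.4f}   exact binomial: {exact:.4f}")
```

With np = 5 sitting exactly at the rule-of-thumb boundary, the exact tail comes out noticeably larger than the normal approximation, though both are small.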

Lung Cancer Example
• In the previous example, we did not know the true proportion of 5-year survivors among individuals under 40 years of age who have been diagnosed with lung cancer
• If it is known from previous studies that the five-year survival rate of lung cancer patients older than 40 is 8.2%, we might want to test whether five-year survival among younger lung cancer patients is the same as that among older ones

Lung Cancer Example
• In a sample of n = 52 patients under the age of 40 who have been diagnosed with lung cancer, the proportion surviving after five years is $\hat{p} = 6/52 = 0.115$ (6 patients)
• Is this within sampling variability of the known 5-year survival of older patients, $p_0 = 0.082$?

Lung Cancer Example
• The test of hypothesis is constructed as follows:
  ◦ $H_0: p = 0.082$
  ◦ $H_A: p \ne 0.082$
  ◦ α = 0.05

Lung Cancer Example
• The test statistic is:
  $$Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} = \frac{0.115 - 0.082}{\sqrt{0.082(0.918)/52}} = 0.87$$
  $$P(|Z| > 0.87) = P(Z > 0.87) + P(Z < -0.87) \approx 0.192 + 0.192 = 0.384$$
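
A minimal sketch of this one-sample test (Python, standard library only; `one_sample_prop_ztest` is our own illustrative name, not a STATA or library function). As in the formula above, the standard error uses $p_0$, the value assumed under the null hypothesis.

```python
import math
from statistics import NormalDist

def one_sample_prop_ztest(x: int, n: int, p0: float) -> tuple[float, float]:
    """Two-sided one-sample z test of H0: p = p0 (normal approximation).
    The standard error is computed under H0, i.e. from p0."""
    p_hat = x / n
    se0 = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se0
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

z, p_value = one_sample_prop_ztest(x=6, n=52, p0=0.082)
print(f"z = {z:.2f}, two-sided p = {p_value:.3f}")
```

Because $\hat{p} = 6/52$ is not rounded to 0.115 here, z comes out near 0.88 and the p-value near 0.38 rather than exactly 0.87 and 0.384; as with the STATA output, the differences are round-off.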

Lung Cancer Example
• Since the p-value associated with the two-sided test is > α = 0.05, we do NOT reject the null hypothesis
• That is, there is not sufficient evidence to indicate that five-year survival of lung cancer patients younger than 40 differs from that of older patients

Lung Cancer Example
• STATA output for this normal approximation:
• Again, we do not reject the null hypothesis
• Note that any differences from the hand calculations are due to round-off error (see Lab #8 for the appropriate STATA code)

Lung Cancer Example
• Carrying out an exact binomial test in STATA, we see that the p-value associated with the two-sided test is 0.318, which is close to the one calculated previously
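
How a two-sided exact p-value is defined matters, because the binomial distribution is not symmetric when $p_0 \ne 0.5$. The sketch below (Python, standard library only; the function names are our own) assumes one common convention, the one used by R's `binom.test`: add up the null probabilities of every outcome that is no more likely than the observed one. We have not checked STATA's documentation, so treat the agreement with the 0.318 above as a consistency check rather than a specification of STATA's method.

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    """Exact binomial probability P(X = k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def exact_binom_test_two_sided(x: int, n: int, p0: float, rel_tol: float = 1e-7) -> float:
    """Two-sided exact binomial p-value: total probability under H0 of all
    outcomes whose probability is <= that of the observed outcome x.
    rel_tol guards against floating-point ties."""
    probs = [binom_pmf(k, n, p0) for k in range(n + 1)]
    cutoff = probs[x] * (1 + rel_tol)
    return min(1.0, sum(pk for pk in probs if pk <= cutoff))

p_value = exact_binom_test_two_sided(x=6, n=52, p0=0.082)
print(f"exact two-sided p = {p_value:.3f}")
```

Simply doubling the upper tail $P(X \ge 6)$ would give about 0.50 instead, which is why the convention should always be stated.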

Confidence Intervals
• Similar to testing hypotheses involving proportions, we can construct confidence intervals for a population proportion (or for a difference between two proportions)
• Again, these intervals will be based on the statistic
  $$Z = \frac{\hat{p} - p}{\sqrt{\hat{p}(1-\hat{p})/n}},$$
  where $\hat{p}$ and $\sqrt{\hat{p}(1-\hat{p})/n}$ are the estimates of the proportion and its associated standard deviation, respectively

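Confidence Intervals
• Two-sided 100(1 − α)% confidence interval:
  $$\hat{p} \pm z_{1-\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$$
• One-sided 100(1 − α)% confidence intervals:
  ◦ Lower bound: $\hat{p} - z_{1-\alpha}\sqrt{\hat{p}(1-\hat{p})/n}$
  ◦ Upper bound: $\hat{p} + z_{1-\alpha}\sqrt{\hat{p}(1-\hat{p})/n}$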

Lung Cancer Example
• In the previous example, 6 of 52 lung cancer patients under 40 years of age were alive after five years. Using the normal approximation (justified since $n\hat{p} = 52(0.115) = 5.98 \ge 5$ and $n(1-\hat{p}) = 52(0.885) = 46.02 \ge 5$), an approximate 95% confidence interval for the true proportion p is given by
  $$0.115 \pm 1.96\sqrt{0.115(0.885)/52} = 0.115 \pm 0.087 = (0.028,\ 0.202)$$

Lung Cancer Example
• In other words, we are 95% confident that the true five-year survival of lung cancer patients < 40 years of age is between 2.8% and 20.2%
• Note that this interval contains 8.2% (the five-year survival rate among lung cancer patients older than 40)
• Thus, the interval agrees with the hypothesis test, which did not reject the null hypothesis of equal five-year survival in lung cancer patients older than 40 and in younger patients
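
The normal-approximation interval above is often called the Wald interval. A minimal sketch (Python, standard library only; `wald_ci` is our own name) computes it for 6 of 52 survivors:

```python
import math
from statistics import NormalDist

def wald_ci(x: int, n: int, conf: float = 0.95) -> tuple[float, float]:
    """Normal-approximation two-sided CI: p-hat +/- z * sqrt(p-hat(1 - p-hat)/n)."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # about 1.96 for 95%
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

lo, hi = wald_ci(6, 52)
print(f"95% CI: ({lo:.3f}, {hi:.3f})")
```

Without rounding $\hat{p}$ to 0.115, the lower limit is about 0.0285, so it may print as 0.029 instead of 0.028; the difference is round-off only.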

Exact Confidence Intervals
• "Exact" confidence intervals for a binomial parameter are possible
  ◦ These do not rely on the normal approximation to the binomial (i.e., use of the CLT)
  ◦ Computationally very intensive (particularly for large N)
  ◦ Typically require special programming/software

Exact Confidence Intervals
• General rule:
  ◦ Use exact confidence intervals whenever software is available and the computation is feasible with the available computing resources
• If N is large…
  ◦ It is OK to use the normal approximation (as the CLT kicks in)
• If N is small…
  ◦ The normal approximation may not be appropriate
  ◦ Use exact CIs if possible

Lung Cancer Example
• In the previous example, the exact binomial confidence interval is close to the interval we calculated using the normal approximation
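
The slides do not name the exact method behind this interval; the sketch below assumes the widely used Clopper–Pearson interval, which inverts two one-sided exact binomial tests. The lower limit is the p at which $P(X \ge x) = \alpha/2$ and the upper limit is the p at which $P(X \le x) = \alpha/2$. Both are found by bisection (Python, standard library only; all function names are our own).

```python
import math

def binom_cdf(x: int, n: int, p: float) -> float:
    """Exact P(X <= x) for X ~ binomial(n, p)."""
    return sum(math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))

def _bisect(f, lo: float, hi: float, iters: int = 100) -> float:
    """Root of an increasing function f on [lo, hi] by bisection."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(x: int, n: int, conf: float = 0.95) -> tuple[float, float]:
    """Exact (Clopper-Pearson) two-sided CI for a binomial proportion."""
    alpha = 1 - conf
    # Lower limit: P(X >= x) = alpha/2; this tail probability increases with p
    lower = 0.0 if x == 0 else _bisect(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2, 0.0, 1.0)
    # Upper limit: P(X <= x) = alpha/2; this probability decreases with p
    upper = 1.0 if x == n else _bisect(
        lambda p: alpha / 2 - binom_cdf(x, n, p), 0.0, 1.0)
    return lower, upper

lo, hi = clopper_pearson(6, 52)
print(f"exact 95% CI: ({lo:.3f}, {hi:.3f})")
```

Compare its output with the normal-approximation interval (0.028, 0.202) computed above.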

Two Proportions
• Comparing two proportions is similar to comparing two means…
• First and second group proportions: $\hat{p}_1 = x_1/n_1$ and $\hat{p}_2 = x_2/n_2$
• Under the assumption that the two population proportions are equal, we may derive a pooled estimate of the common proportion:
  $$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$$

Two Proportions
• Using this pooled estimate, we can derive a pooled estimate of the standard deviation (standard error) of the difference between the two sample proportions, assuming the true proportion is equal in the two groups:
  $$\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
• Hypothesis tests comparing two proportions are based on the statistic:
  $$Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

Two Proportions: Hypothesis Test
• Hypothesis test for a difference in two proportions:
  ◦ $H_0: p_1 = p_2$ vs. $H_A: p_1 \ne p_2$
  ◦ Reject $H_0$ at level α if $|Z| > z_{1-\alpha/2}$

Car Accident Example
• A study investigating morbidity and mortality among pediatric victims of motor vehicle accidents
• Two random samples were selected to investigate the effectiveness of seat belts:
  1. $n_1 = 123$ from a population of children who were wearing seat belts at the time of the accident
  2. $n_2 = 290$ from a group of children who were not wearing seat belts at the time of the accident

Car Accident Example
• In the first group, $x_1 = 3$ children died, while in the second $x_2 = 13$ died
• Consequently, $\hat{p}_1 = 3/123 = 0.024$ and $\hat{p}_2 = 13/290 = 0.045$
• The estimated difference between these proportions: $\hat{p}_1 - \hat{p}_2 = -0.0204$

Car Accident Example
• We wish to determine whether the death rate differs between the two groups, so we carry out the test of hypothesis proposed earlier:
  $$\hat{p} = \frac{3 + 13}{123 + 290} = 0.0387, \qquad Z = \frac{-0.0204}{\sqrt{0.0387(0.9613)\left(\frac{1}{123} + \frac{1}{290}\right)}} \approx -0.98, \qquad p \approx 0.33 > 0.05$$
• Thus, there is not sufficient evidence to conclude that children not wearing seat belts die at a different rate than children wearing seat belts
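
The pooled two-sample z test can be sketched as follows (Python, standard library only; `two_prop_ztest` is a hypothetical helper of our own, not a STATA command). The standard error uses the pooled proportion, reflecting the null-hypothesis assumption $p_1 = p_2$.

```python
import math
from statistics import NormalDist

def two_prop_ztest(x1: int, n1: int, x2: int, n2: int) -> tuple[float, float]:
    """Two-sided z test of H0: p1 = p2, with the pooled proportion in the SE."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 2 * (1 - NormalDist().cdf(abs(z)))

# Seat belt group (3 of 123 died) vs. no seat belt group (13 of 290 died)
z, p_value = two_prop_ztest(3, 123, 13, 290)
print(f"z = {z:.2f}, two-sided p = {p_value:.2f}")
```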

Two Proportions: Confidence Intervals
• Confidence intervals for the difference of two proportions are also based on the statistic:
  $$Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}}$$
• The standard deviation estimate is
  $$\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$

Two Proportions: Confidence Intervals
• Note the discrepancy between hypothesis testing and CIs:
  ◦ We no longer need to assume that the two proportions are equal, so the estimate of the standard deviation in the denominator is not a pooled estimate; it is the square root of the sum of the estimated variances in each group
  ◦ This difference may lead to disagreement between the decision reached by the usual hypothesis test and the decision reached by checking whether the confidence interval contains zero (an infrequent issue)

Two Proportions: Confidence Intervals
• Two-sided 100(1 − α)% confidence interval for $p_1 - p_2$:
  $$(\hat{p}_1 - \hat{p}_2) \pm z_{1-\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$

Car Accident Example
• A two-sided 95% confidence interval for the true difference in death rates between children wearing seat belts and those not wearing them is given by:
  $$-0.0204 \pm 1.96\sqrt{\frac{0.0244(0.9756)}{123} + \frac{0.0448(0.9552)}{290}} = -0.0204 \pm 0.036 = (-0.057,\ 0.016)$$
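
A minimal sketch of this interval (Python, standard library only; `two_prop_ci` is our own illustrative name). Note that, unlike the test on the earlier slide, the standard error is unpooled.

```python
import math
from statistics import NormalDist

def two_prop_ci(x1: int, n1: int, x2: int, n2: int, conf: float = 0.95) -> tuple[float, float]:
    """Normal-approximation CI for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

lo, hi = two_prop_ci(3, 123, 13, 290)
print(f"95% CI for p1 - p2: ({lo:.3f}, {hi:.3f})")
```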

Car Accident Example
• STATA output:

Car Accident Example
• That is, we are 95% confident that the true difference between the two groups lies between 5.7% in favor of children wearing seat belts and 1.6% in favor of children not wearing seat belts
• Since zero (the difference hypothesized under the null hypothesis) is included in the confidence interval, we fail to reject the null hypothesis
  ◦ There is not sufficient evidence to demonstrate a benefit of seat belts
  ◦ But there is enough of a trend that we may want to do further studies…

Special Case
• What if we observe 0 events or responses?
  ◦ How do we get a CI for the response rate when the estimated variability, $\hat{p}(1-\hat{p})/n$, is 0?
• Example: ACTG A5129, "Absence of Sustained Hyperlactatemia Among HIV-Infected Patients with Risk Factors for Mitochondrial Toxicity"

Special Case
• From the abstract: Wohl DA, Pilcher CD, Evans SR, Revuelta M, McComsey G, Yang Y, Zackin R, Alston B, Welch S, Basar M, Kashuba A, Kondo P, Martinez A, Giardini J, Quinn J, Littles M, Wingfield H, Koletar SL. "Absence of Sustained Hyperlactatemia Among HIV-Infected Patients with Risk Factors for Mitochondrial Toxicity." Journal of Acquired Immune Deficiency Syndromes (JAIDS), 35:3: , 2004.

Special Case
• There were no episodes of symptomatic hyperlactatemia or lactic acidosis during the study
• A 95% confidence interval for the prevalence of hyperlactatemia given our data (i.e., 0 out of 83) is:
  $$95\%\ \text{CI} = \left(0,\ 1 - \alpha^{1/n}\right) = \left(0,\ 1 - 0.05^{1/83}\right) = (0,\ 0.0354)$$
• In other words, the true prevalence of hyperlactatemia as defined by this study is likely to be less than 3.54%
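
Where $1 - \alpha^{1/n}$ comes from: with 0 events observed, the upper limit is the largest p for which seeing no events in n trials still has probability at least α, i.e. the p solving $(1-p)^n = \alpha$. A minimal sketch (Python, standard library only; `zero_event_upper_bound` is our own name) computes it for n = 83 and checks that defining property.

```python
def zero_event_upper_bound(n: int, alpha: float = 0.05) -> float:
    """Upper limit 1 - alpha**(1/n) for a proportion when 0 of n trials are events.
    At this p, the probability of observing 0 events, (1 - p)**n, equals alpha."""
    return 1 - alpha ** (1 / n)

upper = zero_event_upper_bound(83)
print(f"95% CI: (0, {upper:.4f})")

# Check the defining property: P(0 events in 83 trials | p = upper) = 0.05
print(f"P(0 events) at the upper limit = {(1 - upper) ** 83:.4f}")
```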

Review: Coin Flipping
• What's the chance that heads comes up 8 or more times in 10 flips of a coin?
• Since x = 8 > np = 10(0.5) = 5, we apply the normal approximation with continuity correction:
  $$Z = \frac{8 - 0.5 - 5}{\sqrt{10(0.5)(0.5)}} = \frac{2.5}{1.581} = 1.581$$
• What's the area in the upper tail of the standard normal, above 1.581?

Review: Coin Flipping
• $P(Z > 1.581) \approx 0.057$
• Thus, there's a 5.7% chance that we could get 8, 9, or 10 heads in 10 flips of the coin
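
For a fair coin the exact answer is easy to get, because every sequence of 10 flips has probability $0.5^{10}$. The sketch below (Python, standard library only) computes both the continuity-corrected normal approximation from the slide and the exact binomial tail, so the two can be compared.

```python
import math
from statistics import NormalDist

n, p, x = 10, 0.5, 8
mu, sigma = n * p, math.sqrt(n * p * (1 - p))

# Normal approximation with continuity correction (x > np, so subtract 0.5)
z = (x - 0.5 - mu) / sigma
approx = 1 - NormalDist().cdf(z)

# Exact tail P(X >= 8) = [C(10,8) + C(10,9) + C(10,10)] * 0.5**10
exact = sum(math.comb(n, k) for k in range(x, n + 1)) / 2**n

print(f"z = {z:.3f}, normal approximation = {approx:.3f}, exact = {exact:.4f}")
```

The exact tail is 56/1024 ≈ 0.055, so the corrected approximation (about 0.057) is quite close even with only 10 flips.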

Review: Basketball Statistics
• As of November 27th, the Celtics have 10 wins out of 11 games: $\hat{p} = 10/11 = 0.909$
• How likely is it that they just had a good start and are really just a .500 team?
  ◦ $H_0$: the Celtics win 50% of games (p = 0.500)
  ◦ $H_A$: the Celtics win > 50% of games (p > 0.500)
  ◦ Set α = 0.05

Review: Basketball Statistics
• One-sample test of proportion (one-sided):
  $$Z = \frac{0.909 - 0.500}{\sqrt{0.500(0.500)/11}} = 2.71, \qquad P(Z > 2.71) \approx 0.003$$
• There is a 0.3% chance of observing a record at least as good as the Celtics' present winning record if they are really a .500 team
• Since p < 0.05, we reject the null hypothesis that their winning percentage is 50%: the Celtics are NOT a .500 team!
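
A minimal sketch of this one-sided test (Python, standard library only). Alongside the normal approximation used on the slide, it also computes the exact one-sided binomial tail $P(X \ge 10)$ for n = 11 and p = 0.5, which the slide does not report.

```python
import math
from statistics import NormalDist

wins, games, p0 = 10, 11, 0.5
p_hat = wins / games

# One-sided z test of H0: p = 0.5 vs. HA: p > 0.5; the SE uses p0 under H0
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / games)
p_one_sided = 1 - NormalDist().cdf(z)

# Exact one-sided binomial tail: P(X >= 10 | n = 11, p = 0.5)
exact = sum(math.comb(games, k) for k in range(wins, games + 1)) / 2**games

print(f"z = {z:.2f}, normal-approx p = {p_one_sided:.4f}, exact p = {exact:.4f}")
```

The exact tail (12/2048, about 0.006) is somewhat larger than the normal approximation, which is unsurprising with only 11 games, but both are well below α = 0.05, so the conclusion is unchanged.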

Review: Basketball Statistics
• One-sided confidence interval
  ◦ We are 95% confident that the true winning percentage of the Celtics is at least 0.66 (the interval runs from 0.66 upward)
  ◦ Since 0.5 does not fall in the interval, we can say the Celtics are NOT a .500 team!
  ◦ We reject the null hypothesis that their winning percentage is 50%

Review: Basketball Statistics
• Comparing free throw percentages:
  ◦ Kevin Garnett:
  ◦ Ray Allen:
  ◦ Combined:
  ◦ And pooled estimate of S.D.:

Review: Basketball Statistics
• $H_0: p_1 = p_2$ and $H_A: p_1 \ne p_2$
• Two-sample test of proportions
  ◦ Two-sided test this time…
• No statistically significant difference between Garnett's and Allen's free throw percentages…

Review: Basketball Statistics
• Confidence interval for the difference in proportions:
• Since zero falls in this confidence interval, we cannot reject the null hypothesis that Garnett and Allen are equally proficient free throw shooters