Introduction to Biostatistics, Harvard Extension School, Spring, 2007 © Scott Evans, Ph.D. & Lynne Peeples, M.S.
Counts and Proportions


Counts and Proportions
- Coin flipping: What’s the chance that heads comes up 8 or more times in 10 flips of a coin?
  - Binomial distribution
- Batting average: Is Ichiro a .400+ hitter?
  - One-sample test of proportion
- Which is the better offensive team, the Mariners or the Red Sox?
  - Two-sample test of two proportions
- Biomedical research…

Where we’ve been… and where we’re going
Variables of interest?
- One variable
  - Continuous (methods from before the midterm)
  - Binary
    - One-sample test of proportion
    - Two-sample test of proportions
    - Approaches: exact methods or normal approximation
- Two variables
  - Both continuous
    - Interested in prediction: simple linear regression
    - Interested in association: Pearson correlation (both variables normal) or Spearman correlation (not normal)
  - One continuous, one categorical: ANOVA
  - Both binary (in two weeks)
- More than two variables: multiple linear regression

Binary Data
- Often in medical and public health studies, our endpoint of interest is binary (dichotomous).
- Examples:
  - disease vs. no disease
  - response vs. no response
  - death vs. no death
  - success vs. failure

Binary Data
- Continuous endpoints are often dichotomized into a binary endpoint as well.
- For example, in a study of the effect of a drug on LDL levels, each subject’s LDL measurement at the end of the study (a continuous measure) may be dichotomized into “response” vs. “no response” based on a cut-point defining whether the LDL level has been reduced to an acceptable, normal, or safe level.

Binary Data
1. There are only two possible (mutually exclusive) outcomes, often called a “success” and a “failure”.
2. Each experiment is identical to (and independent of) all the others, and the probability of a “success” is p. Thus, the probability of a “failure” is 1 − p.
- Each experiment is called a Bernoulli trial. Examples include throwing a die and observing whether or not it comes up six, tossing a coin, or recording whether a cancer patient survives.

Binary Data: Smoking Status Example
- Define X = 1 for a smoker and X = 0 for a non-smoker.
- If “success” is the event that a randomly selected individual is a smoker, and it is known from previous research that about 29% of adults in the United States smoke, then $P(X=1) = p = 0.29$ and $P(X=0) = 1 - p = 0.71$.
- This is an example of a Bernoulli trial (we select just one individual at random, each selection is carried out independently, and each time the probability that the selected individual is a “success” is constant).

Binary Data: Smoking Status Example
- If we select two adults and look at their smoking status, the possible outcomes are:
  - Neither is a smoker
  - Only one is a smoker
  - Both are smokers
- If we define X as the number of smokers among these two individuals, then:
  - X = 0: Neither is a smoker
  - X = 1: Only one is a smoker
  - X = 2: Both are smokers

Binary Data: Smoking Status Example
- $P(X=0) = (1-p)^2 = (0.71)^2 = 0.5041$
- $P(X=1) = P(\text{1st is a smoker and 2nd is not}) + P(\text{1st is not and 2nd is a smoker}) = p(1-p) + (1-p)p = 2p(1-p) = 2(0.29)(0.71) = 0.4118$
- $P(X=2) = p^2 = (0.29)^2 = 0.0841$
- Notice that $P(X=0) + P(X=1) + P(X=2) = 0.5041 + 0.4118 + 0.0841 = 1.000$.
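The enumeration behind these three probabilities can be checked with a short Python sketch (Python is an illustrative choice here; the course itself uses STATA). It lists every smoker/non-smoker outcome for two independent adults, assuming p = 0.29 as above, and adds up the probabilities by number of smokers.

```python
from itertools import product

p = 0.29  # probability that a randomly selected adult smokes

# Enumerate every outcome for two independent adults (1 = smoker, 0 = non-smoker)
# and accumulate the probability of each value of X = number of smokers.
dist = {0: 0.0, 1: 0.0, 2: 0.0}
for outcome in product([0, 1], repeat=2):
    prob = 1.0
    for smoker in outcome:
        prob *= p if smoker else 1 - p
    dist[sum(outcome)] += prob

for x, prob in dist.items():
    print(f"P(X={x}) = {prob:.4f}")
print(f"Total = {sum(dist.values()):.4f}")
```

The two mixed outcomes (smoker, non-smoker) and (non-smoker, smoker) both land in X = 1, which is where the factor of 2 in $2p(1-p)$ comes from.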

Binary Data: Smoking Status Example
- Bar chart: a plot of the probability distribution of X, covering all of the values that X can take (in the previous example, X = 0, 1, and 2).

Binomial Distribution
- The binomial distribution describes the number of successes in a fixed number of repeated, independent Bernoulli experiments.
- When describing the binomial distribution, we need to specify two parameters:
  - The probability of “success” (p)
  - The number of Bernoulli experiments (n)
- One way of looking at p is as the proportion of times that an experiment is successful when it is repeated a large number of times.

Binomial Distribution
- Given these parameters, the mean and standard deviation of a binomial distribution are $\mu = np$ and $\sigma = \sqrt{np(1-p)}$.
- If n is sufficiently large, then the statistic $Z = \frac{X - np}{\sqrt{np(1-p)}}$ is approximately distributed as normal with mean 0 and standard deviation 1 (a standard normal distribution).
- A better normal approximation is given by $Z = \frac{X + 0.5 - np}{\sqrt{np(1-p)}}$ when $X < np$, and $Z = \frac{X - 0.5 - np}{\sqrt{np(1-p)}}$ when $X > np$. This is called a continuity correction.
- In general, we say n is “large” if np and n(1 − p) are both ≥ 5.

Binary Data: Smoking Status Example
- For example, suppose that we want to find the proportion of samples of size n = 30 in which at most six individuals smoke.
- With p = 0.29 and n = 30, we have:
  - np = 8.7 and n(1 − p) = 21.3, both > 5
  - X = 6 < np = 8.7, so we should apply the continuity correction…

Binary Data: Smoking Status Example
- Normal approximation to the binomial with continuity correction: $P(X \le 6) \approx P\left(Z \le \frac{6 + 0.5 - 8.7}{\sqrt{30(0.29)(0.71)}}\right) = P(Z \le -0.885) \approx 0.188$
- The exact binomial probability is 0.190, which is very close to the approximate value.
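A minimal Python sketch comparing the exact probability $P(X \le 6) = \sum_{i=0}^{6}\binom{30}{i}(0.29)^i(0.71)^{30-i}$ with the continuity-corrected normal approximation. It uses only the standard library (`math.comb`, `math.erf`); the helper names `binom_cdf` and `norm_cdf` are our own, not from any statistics package.

```python
import math

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

n, p, k = 30, 0.29, 6
mean = n * p                      # np = 8.7
sd = math.sqrt(n * p * (1 - p))   # sqrt(np(1-p)), about 2.485

# k < np, so the continuity correction adds 0.5 to k.
z = (k + 0.5 - mean) / sd
approx = norm_cdf(z)
exact = binom_cdf(k, n, p)

print(f"z = {z:.3f}, normal approx = {approx:.3f}, exact = {exact:.3f}")
```

Running it should give z ≈ −0.885, an approximation of about 0.188, and an exact value of about 0.190.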

Sampling Distribution of Proportions
- Our thinking in terms of estimation (including confidence intervals) and hypothesis testing does not change when dealing with proportions.
- We may use the Central Limit Theorem for binary data as well (the CLT applies to any distribution with a finite variance, including the Bernoulli).
- But note that the CLT is an asymptotic result (as n → ∞).
- Thus, we must be careful when n is small.

Sampling Distribution of Proportions
- Since the proportion of successes in the general population will not be known, we must estimate it. Such an estimate is usually derived by calculating the proportion of successes in a sample of n experiments (trials): $\hat{p} = \frac{x}{n}$, where x is the number of successes in the sample of size n.

Sampling Distribution of Proportions
- The sampling distribution of a proportion has mean μ and standard deviation σ:
  - Mean: $\mu = p$
  - Standard deviation: $\sigma = \sqrt{\frac{p(1-p)}{n}}$
- And the statistic $Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$ is approximately distributed according to the standard normal distribution (again, the normal approximation is particularly good when np > 5 and n(1 − p) > 5).
- NOTE: These are our previous parameters divided by n (x → x/n): $np/n = p$ and $\sqrt{np(1-p)}/n = \sqrt{p(1-p)/n}$.

Sampling Distribution of Proportions
- Note that if we multiply the numerator and denominator of Z by n, we have $Z = \frac{x - np}{\sqrt{np(1-p)}}$, which is the familiar form encountered in our study of the binomial distribution.
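To see why the mean and standard deviation of $\hat{p}$ are p and $\sqrt{p(1-p)/n}$, the following sketch computes them directly from the exact binomial distribution of x. The values n = 50 and p = 0.10 are an illustrative choice (they match the lung cancer example); no normal approximation is involved, so the agreement is exact up to floating-point error.

```python
import math

n, p = 50, 0.10  # illustrative values, as in the lung cancer example

# Exact sampling distribution of p_hat = x / n, where x ~ Binomial(n, p)
pmf = [math.comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]
mean_phat = sum((x / n) * w for x, w in enumerate(pmf))
var_phat = sum((x / n - mean_phat) ** 2 * w for x, w in enumerate(pmf))

print(f"E[p_hat]  = {mean_phat:.6f}  (p = {p})")
print(f"SD[p_hat] = {math.sqrt(var_phat):.6f}  "
      f"(sqrt(p(1-p)/n) = {math.sqrt(p * (1 - p) / n):.6f})")
```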

Hypothesis Testing
- Similar to hypothesis testing with continuous data, one may perform hypothesis tests on binary data:
  - 1-sample test of a proportion: $H_0: p = p_0$ vs. $H_A: p \ne p_0$
  - 2-sample test comparing proportions: $H_0: p_1 = p_2$ vs. $H_A: p_1 \ne p_2$

Lung Cancer Example
- Consider the five-year survival among patients under 40 who have been diagnosed with lung cancer. The mean proportion of individuals surviving is p = 0.10, implying that the standard deviation of the 5-year survival is $\sqrt{p(1-p)} = \sqrt{0.10(0.90)} = 0.30$.

Lung Cancer Example
- If we select repeated samples of n = 50 patients diagnosed with lung cancer, what fraction of the samples will have 20% or more survivors? That is, what percent of the time will 50(0.20) = 10 or more patients be alive after 5 years?
- Since np = 50(0.10) = 5 and n(1 − p) = 50(0.90) = 45 > 5, the normal approximation should be adequate. Then, $P(\hat{p} \ge 0.20) \approx P\left(Z \ge \frac{0.20 - 0.10}{\sqrt{0.10(0.90)/50}}\right) = P(Z \ge 2.36) \approx 0.009$
- Only 0.9% of the time will the proportion of lung cancer patients surviving past five years be 20% or more.
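A sketch of this calculation in Python, assuming (as the hand calculation appears to) that no continuity correction is applied. `norm_sf` is a hypothetical helper name for the upper-tail normal probability, implemented with `math.erfc`.

```python
import math

def norm_sf(z):
    """Upper-tail probability P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

n, p = 50, 0.10       # sample size and assumed 5-year survival proportion
threshold = 0.20      # 20% or more survivors in a sample

se = math.sqrt(p * (1 - p) / n)   # SD of the sampling distribution of p_hat
z = (threshold - p) / se
print(f"z = {z:.2f}, P(p_hat >= 0.20) = {norm_sf(z):.4f}")
```

It should print z = 2.36 and a tail probability of about 0.009.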

Lung Cancer Example
- In the previous example, we did not know the true proportion of 5-year survivors among individuals under 40 years of age who have been diagnosed with lung cancer.
- If it is known from previous studies that the five-year survival rate of lung cancer patients older than 40 years is 8.2%, we might want to test whether the five-year survival among the younger lung cancer patients is the same as that of the older ones.

Lung Cancer Example
- From a sample of n = 52 patients under the age of 40 who have been diagnosed with lung cancer, 6 were alive after five years, so the proportion surviving is $\hat{p} = 6/52 = 0.115$.
- Is this within sampling variability of the known 5-year survival of older patients (0.082)?

Lung Cancer Example
- The test of hypothesis is constructed as follows: $H_0: p = 0.082$ vs. $H_A: p \ne 0.082$, with α = 0.05.

Lung Cancer Example
- The test statistic is $Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} = \frac{0.115 - 0.082}{\sqrt{0.082(0.918)/52}} = 0.87$
- $P(|Z| > 0.87) = P(Z > 0.87) + P(Z < -0.87) = 2(0.192) = 0.384$

Lung Cancer Example
- Since the p-value associated with the two-sided test is > α = 0.05, we do NOT reject the null hypothesis.
- That is, there is not sufficient evidence to indicate that the five-year survival of lung cancer patients younger than 40 years of age is different from that of older patients.
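The one-sample z-test can be sketched in Python as follows; the function name `one_sample_prop_ztest` is hypothetical. The standard error uses the null value $p_0$, matching the hand calculation.

```python
import math

def norm_sf(z):
    """Upper-tail probability P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def one_sample_prop_ztest(x, n, p0):
    """Two-sided one-sample z-test of H0: p = p0 (normal approximation).

    The standard error is computed under H0, i.e. from p0, not from p_hat.
    """
    p_hat = x / n
    se0 = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se0
    p_value = 2 * norm_sf(abs(z))
    return p_hat, z, p_value

p_hat, z, p_value = one_sample_prop_ztest(x=6, n=52, p0=0.082)
print(f"p_hat = {p_hat:.4f}, z = {z:.2f}, two-sided p = {p_value:.3f}")
```

Because it uses the unrounded $\hat{p} = 6/52 = 0.1154$, it gives z ≈ 0.88 and a two-sided p-value of about 0.38 rather than 0.87 and 0.384: the same kind of round-off difference that shows up when comparing hand calculations with STATA.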

Lung Cancer Example
- STATA output for this normal approximation (not reproduced).
- Again, we do not reject the null hypothesis. Any differences from the hand calculations are due to round-off error. (NOTE: See Lab #8 for the appropriate STATA code.)

Lung Cancer Example
- Carrying out an exact binomial test in STATA, we see that the p-value associated with the two-sided test is 0.318, which is close to that calculated previously.
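An exact binomial test can also be computed without STATA. The sketch below uses one common definition of the two-sided exact p-value: the sum of the probabilities of all outcomes that are no more likely under $H_0$ than the observed count. We assume, without having checked STATA’s documentation, that this is the definition behind the 0.318 above; for x = 6, n = 52, $p_0 = 0.082$ it does give about 0.318. The function names are hypothetical.

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def exact_binom_test_two_sided(x, n, p0, rel_tol=1e-7):
    """Two-sided exact binomial p-value.

    Sums the probabilities of all outcomes no more likely than the observed
    count under H0. Other two-sided definitions (e.g. doubling the smaller
    tail) exist and can give different answers.
    """
    probs = [binom_pmf(k, n, p0) for k in range(n + 1)]
    cutoff = probs[x] * (1 + rel_tol)  # small tolerance for floating-point ties
    return min(1.0, sum(q for q in probs if q <= cutoff))

print(f"exact two-sided p = {exact_binom_test_two_sided(6, 52, 0.082):.3f}")
```

Here the outcomes counted are x = 0, 1 in the lower tail and x ≥ 6 in the upper tail.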

Confidence Intervals
- Similar to hypothesis tests involving proportions, we can construct confidence intervals for a population proportion (or for a difference between two proportions).
- Again, these intervals will be based on the statistic $Z = \frac{\hat{p} - p}{\hat{\sigma}_{\hat{p}}}$, where $\hat{p} = x/n$ and $\hat{\sigma}_{\hat{p}} = \sqrt{\hat{p}(1-\hat{p})/n}$ are the estimates of the proportion and its associated standard deviation, respectively.

Confidence Intervals
- One-sided confidence interval: lower bound $\hat{p} - z_{1-\alpha}\sqrt{\hat{p}(1-\hat{p})/n}$ (or upper bound $\hat{p} + z_{1-\alpha}\sqrt{\hat{p}(1-\hat{p})/n}$)
- Two-sided confidence interval: $\hat{p} \pm z_{1-\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$

Lung Cancer Example
- In the previous example, 6 out of 52 lung cancer patients under 40 years of age were alive after five years. Using the normal approximation (which is justified since $n\hat{p} = 52(0.115) = 5.98 > 5$ and $n(1-\hat{p}) = 52(0.885) = 46.02 > 5$), an approximate 95% confidence interval for the true proportion p is given by $0.115 \pm 1.96\sqrt{0.115(0.885)/52} = 0.115 \pm 0.087 = (0.028, 0.202)$

Lung Cancer Example
- In other words, we are 95% confident that the true five-year survival of lung cancer patients < 40 years of age is between 2.8% and 20.2%.
- Note that this interval contains 8.2% (the five-year survival rate among lung cancer patients older than 40 years of age). This agrees with the hypothesis test, which did not reject the null hypothesis of equal five-year survival between lung cancer patients older than 40 and younger patients.
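A minimal sketch of this normal-approximation (often called Wald) interval, using `statistics.NormalDist` from the Python standard library for the critical value; `wald_ci` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def wald_ci(x, n, conf=0.95):
    """Normal-approximation CI: p_hat +/- z * sqrt(p_hat(1 - p_hat)/n)."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # about 1.96 for 95%
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

low, high = wald_ci(6, 52)
print(f"95% CI: ({low:.4f}, {high:.4f})")
```

With the unrounded $\hat{p} = 6/52$ it gives about (0.0285, 0.2022); the 2.8% lower limit in the hand calculation reflects rounding $\hat{p}$ to 0.115.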

Lung Cancer Example
- Using exact binomial confidence intervals in the previous example gives an interval close to the one calculated with the normal approximation.

Two Proportions
- Comparing two proportions is similar to a two-mean comparison…
- First and second group proportions: $\hat{p}_1 = x_1/n_1$ and $\hat{p}_2 = x_2/n_2$
- Under the assumption that the two population proportions are equal, we may derive a pooled estimate of the common proportion: $\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$

Two Proportions
- Using this pooled estimate, we can derive a pooled estimate of the standard deviation of $\hat{p}_1 - \hat{p}_2$ (with the proportion assumed equal between the two groups) as $\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$
- Hypothesis tests comparing two proportions are based on the statistic $Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$

Two Proportions: Hypothesis Test
- Hypothesis test for a difference in two proportions: $H_0: p_1 = p_2$ vs. $H_A: p_1 \ne p_2$; compute the pooled Z statistic and reject $H_0$ at level α if $|Z| > z_{1-\alpha/2}$ (1.96 for α = 0.05).

Car Accident Example
- In a study investigating morbidity and mortality among pediatric victims of motor vehicle accidents, information regarding the effectiveness of seat belts was collected.
- Two random samples were selected: one of size $n_1 = 123$ from a population of children who were wearing seat belts at the time of the accident, and another of size $n_2 = 290$ from a group of children who were not wearing seat belts at the time of the accident.

Car Accident Example
- In the first group, $x_1 = 3$ children died, while in the second, $x_2 = 13$ died.
- Consequently, $\hat{p}_1 = 3/123 = 0.0244$ and $\hat{p}_2 = 13/290 = 0.0448$.
- The estimated difference between these proportions: $\hat{p}_1 - \hat{p}_2 = 0.0244 - 0.0448 = -0.0204$

Car Accident Example
- We wish to determine whether the death rate is different in the two groups, and carry out the test of hypothesis as proposed earlier: $H_0: p_1 = p_2$ vs. $H_A: p_1 \ne p_2$, with pooled $\hat{p} = 16/413 = 0.0387$ and $Z = \frac{-0.0204}{\sqrt{0.0387(0.9613)(1/123 + 1/290)}} = -0.98$, so $P(|Z| > 0.98) = 2(0.1635) \approx 0.33 > 0.05$.
- Thus, there is not sufficient evidence to conclude that the death rates differ between children wearing seat belts and children not wearing seat belts.
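The pooled two-proportion test for the seat belt data, sketched in Python; `two_prop_ztest` is a hypothetical name, and the p-value uses the normal approximation.

```python
import math
from statistics import NormalDist

def two_prop_ztest(x1, n1, x2, n2):
    """Two-sided z-test of H0: p1 = p2 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p1 - p2, z, p_value

diff, z, p_value = two_prop_ztest(x1=3, n1=123, x2=13, n2=290)
print(f"diff = {diff:.4f}, z = {z:.2f}, two-sided p = {p_value:.3f}")
```

It gives a difference of about −0.0204, z ≈ −0.98, and a two-sided p-value of about 0.3, so $H_0$ is not rejected at α = 0.05.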

Two Proportions: Confidence Intervals
- Confidence intervals for the difference of two proportions are also based on the statistic $Z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\hat{\sigma}_{\hat{p}_1 - \hat{p}_2}}$
- The standard deviation estimate is $\hat{\sigma}_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$

Two Proportions: Confidence Intervals
- Note the discrepancy between hypothesis testing and CIs:
  - We no longer need to assume that the two proportions are equal, so the estimate of the standard deviation in the denominator is not a pooled estimate; it is the square root of the sum of the estimated variances of the two group proportions.
  - This difference may (infrequently) lead to disagreements between decisions reached through the usual hypothesis test and decisions reached by checking whether the confidence interval contains zero.

Two Proportions: Confidence Intervals
- A two-sided 100(1 − α)% confidence interval for $p_1 - p_2$: $(\hat{p}_1 - \hat{p}_2) \pm z_{1-\alpha/2}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$

Car Accident Example
- A two-sided 95% confidence interval for the true difference in death rates between children wearing seat belts and those who were not is given by: $-0.0204 \pm 1.96\sqrt{\frac{0.0244(0.9756)}{123} + \frac{0.0448(0.9552)}{290}} = -0.0204 \pm 0.0362 = (-0.057, 0.016)$

Car Accident Example
- STATA output (not reproduced).

Car Accident Example
- That is, we are 95% confident that the true difference between the two groups lies between 5.7% in favor of children wearing seat belts and 1.6% in favor of children not wearing seat belts. Since zero (the difference hypothesized under the null hypothesis) is included in the confidence interval, we fail to reject the null hypothesis. These data do not provide statistically significant evidence of a benefit of seat belts.
- Click it, or Ticket!!
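The unpooled interval can be sketched the same way. Note that its standard error is the square root of the sum of the two estimated variances, not a pooled value. `diff_prop_ci` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def diff_prop_ci(x1, n1, x2, n2, conf=0.95):
    """Normal-approximation CI for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

low, high = diff_prop_ci(3, 123, 13, 290)
print(f"95% CI for p1 - p2: ({low:.3f}, {high:.3f})")
```

It returns approximately (−0.057, 0.016), matching the 5.7% and 1.6% limits described above.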

Exact Confidence Intervals
- “Exact” confidence intervals for a binomial parameter are possible.
- These do not rely on the normal approximation to the binomial (i.e., use of the CLT).
- Computationally very intensive (particularly for large n).
- May require special programming/software.

Exact Confidence Intervals
- General rule:
  - Use exact confidence intervals whenever software is available and their computation is feasible given the computing resources.
  - If n is large… it is OK to use the normal approximation (as the CLT kicks in).
  - If n is small… the normal approximation may not be appropriate; use exact CIs if possible.
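One standard “exact” interval is the Clopper-Pearson interval: the lower limit is the p for which $P(X \ge x) = \alpha/2$ and the upper limit is the p for which $P(X \le x) = \alpha/2$. The sketch below finds both limits by bisection on the exact binomial CDF, using only the Python standard library. We assume, but have not confirmed, that this is the definition behind the STATA exact interval; the function names are hypothetical.

```python
import math

def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    if k < 0:
        return 0.0
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(min(k, n) + 1))

def bisect(f, lo=0.0, hi=1.0, iters=100):
    """Root of a monotone function f on [lo, hi] (f changes sign on the interval)."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid   # root lies in the upper half
        else:
            hi = mid                # root lies in the lower half
    return (lo + hi) / 2

def clopper_pearson(x, n, conf=0.95):
    """Exact (Clopper-Pearson) two-sided CI for a binomial proportion."""
    a = (1 - conf) / 2
    # Lower limit: P(X >= x | p) = a  (defined as 0 when x = 0)
    lower = 0.0 if x == 0 else bisect(lambda p: (1 - binom_cdf(x - 1, n, p)) - a)
    # Upper limit: P(X <= x | p) = a  (defined as 1 when x = n)
    upper = 1.0 if x == n else bisect(lambda p: binom_cdf(x, n, p) - a)
    return lower, upper

low, high = clopper_pearson(6, 52)
print(f"Exact 95% CI for 6/52: ({low:.3f}, {high:.3f})")
```

Bisection is slow compared with the closed-form beta-quantile expression, but it keeps the sketch dependency-free and makes the definition of each limit explicit.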

Special Case
- What if we observe 0 events or responses?
- How do we get a CI for the response rate when the estimated variability is 0 (with $\hat{p} = 0$, the normal-approximation standard error $\sqrt{\hat{p}(1-\hat{p})/n}$ is 0)?
- Example: ACTG A5129, “Absence of Sustained Hyperlactatemia Among HIV-Infected Patients with Risk Factors for Mitochondrial Toxicity”

Special Case
- From the study abstract (excerpt not reproduced).


Special Case
- There were no episodes of symptomatic hyperlactatemia or lactic acidosis during the study.
- A 95% confidence interval for the prevalence of hyperlactatemia given our data (i.e., 0 out of 83) is: 95% CI = $(0, 1 - \alpha^{1/n}) = (0, 1 - 0.05^{1/83}) = (0, 0.0354)$. The upper limit is the value of p for which observing 0 events in n trials has probability α, i.e. it solves $(1-p)^n = \alpha$ (an exact one-sided bound).
- In other words, the true prevalence of hyperlactatemia as defined by this study is likely to be less than 3.54%.
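The zero-event upper limit is a one-line calculation. This Python sketch, with n = 83 and α = 0.05 taken from the example, computes it and checks that it satisfies $(1-p)^n = \alpha$.

```python
n, alpha = 83, 0.05

# With 0 events, the exact one-sided upper limit solves (1 - p)^n = alpha.
upper = 1 - alpha ** (1 / n)
print(f"95% CI: (0, {upper:.4f})")   # about 3.54%

# Check: the chance of seeing 0 events in n trials when p = upper equals alpha.
assert abs((1 - upper) ** n - alpha) < 1e-12
```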

Review: Coin Flipping
- What’s the chance that heads comes up 8 or more times in 10 flips of a coin?
- Since x = 8 > np = 10(0.5) = 5, we apply the normal approximation with continuity correction: $Z = \frac{8 - 0.5 - 5}{\sqrt{10(0.5)(0.5)}} = \frac{2.5}{1.581} = 1.581$
- What’s the area in the upper tail of the standard normal, above 1.581?

Review: Coin Flipping
- Thus, there’s a 5.7% chance that we could get 8, 9, or 10 heads in 10 flips of the coin.
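With only 10 flips, the exact binomial tail is easy to compute, so this Python sketch places it next to the continuity-corrected normal approximation used above.

```python
import math
from statistics import NormalDist

n, p, k = 10, 0.5, 8

# Normal approximation with continuity correction (k > np, so subtract 0.5)
z = (k - 0.5 - n * p) / math.sqrt(n * p * (1 - p))
approx = 1 - NormalDist().cdf(z)

# Exact tail P(X >= 8) = P(8) + P(9) + P(10); with p = 0.5 every sequence of
# 10 flips has probability 0.5**10, so we only need to count sequences.
exact = sum(math.comb(n, i) for i in range(k, n + 1)) * p**n

print(f"z = {z:.3f}, approx = {approx:.4f}, exact = {exact:.4f}")
```

The exact tail is 56/1024 ≈ 0.0547, so the 5.7% from the normal approximation is slightly high here.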

Review: Baseball Statistics
- As of April 16th, Ichiro has had 9 hits in 31 at bats: $\hat{p} = 9/31 = 0.290$
- How likely is it that he could still be a .400 hitter (just having a bad start)?
- $H_0$: Ichiro bats .400 (p = 0.400)
- $H_A$: Ichiro bats < .400 (p < 0.400)
- Set α = 0.05

Review: Baseball Statistics
- One-sample test of proportion (one-sided)
- There is a 10.4% chance of observing a batting average this low (or lower) if Ichiro is truly a .400 hitter.
- Since P > 0.05, we can’t say Ichiro is NOT a .400 hitter!
- We don’t reject the null hypothesis that Ichiro is a .400 hitter.

Review: Baseball Statistics
- Another (better?) Japanese Mariner…
- Johjima has 10 hits in 21 at bats: $\hat{p}_J = 10/21 = 0.476$
- Carry out the same hypothesis test, but now test whether Johjima may be a .500 hitter: $H_0: p_J = .500$ vs. $H_A: p_J < .500$, with $Z = \frac{0.476 - 0.500}{\sqrt{0.500(0.500)/21}} = -0.22$ (p ≈ 0.41)…
- Don’t reject… Johjima’s average falls well within the distribution expected for a .500+ hitter!!

Review: Baseball Statistics
- Maybe a better test would be to see whether he’s just been lucky… and is really just a moderately good hitter (.300):
  - $H_0: p_J = .300$
  - $H_A: p_J > .300$
- $Z = \frac{0.476 - 0.300}{\sqrt{0.300(0.700)/21}} = \frac{0.176}{0.100} = 1.76$, so $P(Z > 1.76) \approx 0.039 < 0.05$.
- We reject… Johjima’s average is significantly above .300!!
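Both of Johjima’s one-sided tests can be run with one small Python helper. `one_sided_prop_test` is a hypothetical name; like the hand calculations, it uses the normal approximation with the standard error computed from the null value $p_0$.

```python
import math
from statistics import NormalDist

def one_sided_prop_test(x, n, p0, alternative):
    """One-sample z-test of H0: p = p0 against a one-sided alternative.

    alternative: "less" (HA: p < p0) or "greater" (HA: p > p0).
    """
    if alternative not in ("less", "greater"):
        raise ValueError("alternative must be 'less' or 'greater'")
    p_hat = x / n
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    cdf = NormalDist().cdf(z)
    return z, (cdf if alternative == "less" else 1 - cdf)

# Johjima: 10 hits in 21 at bats
for p0, alt in [(0.500, "less"), (0.300, "greater")]:
    z, p_value = one_sided_prop_test(10, 21, p0, alt)
    print(f"H0: p = {p0:.3f} vs {alt}: z = {z:.2f}, p = {p_value:.3f}")
```

It gives z ≈ −0.22 (p ≈ 0.41, do not reject) for the .500 test and z ≈ 1.76 (p ≈ 0.039, reject) for the .300 test.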

Review: Baseball Statistics
- Comparing batting averages of the Mariners and Red Sox (overall, as of April 16th):
  - Mariners: $\hat{p}_M$
  - Red Sox: $\hat{p}_R$
  - Combined: $\hat{p}$
  - And pooled estimate of S.D.: $\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_M} + \frac{1}{n_R}\right)}$

Review: Baseball Statistics
- $H_0: p_M = p_R$ and $H_A: p_M \ne p_R$
- Two-sample test of proportions (two-sided test this time)…
- No statistically significant difference between the Mariners’ and Red Sox batting averages…
