BPS - 3rd Ed., Chapter 16: Inference about a Population Mean
Conditions for Inference about a Mean
• The data are a simple random sample (SRS) of size n.
• The population has a Normal distribution with mean μ and standard deviation σ.
  - Both μ and σ are unknown.
• Because σ is unknown, we cannot use z procedures to infer μ.
  - This is a more realistic situation than in prior chapters.
Standard Error
When we do not know the population standard deviation σ, we use the sample standard deviation s to estimate the standard deviation of x̄, which is σ/√n. The result is
$$ SE_{\bar{x}} = \frac{s}{\sqrt{n}} $$
This statistic is called the standard error of the mean.
One-Sample t Statistic
When we use s instead of σ, the one-sample z statistic, $z = (\bar{x} - \mu)/(\sigma/\sqrt{n})$, becomes the one-sample t statistic:
$$ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} $$
• The t statistic does NOT follow a Normal distribution. It follows a t distribution with n − 1 degrees of freedom.
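The standard error and the t statistic described above take only a few lines to compute. The following is a minimal sketch in Python using only the standard library. The function name `one_sample_t` and the sample values are hypothetical and serve only to show the arithmetic.

```python
import math
import statistics

def one_sample_t(data, mu0):
    """Return (x_bar, s, se, t, df) for a one-sample t statistic against mu0."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)      # sample standard deviation (divides by n - 1)
    se = s / math.sqrt(n)           # standard error of the mean, s / sqrt(n)
    t = (x_bar - mu0) / se          # one-sample t statistic
    return x_bar, s, se, t, n - 1   # df = n - 1

# Hypothetical sample, used only to illustrate the arithmetic.
sample = [2, 4, 4, 4, 5, 5, 7, 9]
x_bar, s, se, t, df = one_sample_t(sample, mu0=4)
print(f"x_bar = {x_bar}, s = {s:.3f}, SE = {se:.3f}, t = {t:.3f}, df = {df}")
```

The only change from the z statistic is that `statistics.stdev` (the sample s) stands in for the unknown σ.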
The t Distributions
• The t density curve is similar to the standard Normal curve (symmetric, centered at 0, bell-shaped), but it has broader tails.
• The t distributions form a family of distributions.
  - Each member of the family is identified by its degrees of freedom.
  - The notation t(k) means "a t distribution with k degrees of freedom."
The t Distributions
As k increases, the t(k) density curve approaches the standard Normal (z) curve more closely. This happens because s becomes a better estimate of σ as the sample size increases.
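The broader tails and the convergence toward the Normal curve can be checked numerically. The sketch below uses only the Python standard library. It evaluates the t(k) density at x = 3 for several values of k and compares each result with the standard Normal density. The helper `t_pdf` is a hypothetical name. It implements the usual t density, Γ((k+1)/2) / (√(kπ) Γ(k/2)) · (1 + x²/k)^(−(k+1)/2), through `math.lgamma`.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of the t(df) distribution at x (computed on the log scale for stability)."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

z_height = NormalDist().pdf(3.0)  # standard Normal density at x = 3
for k in (2, 7, 30, 1000):
    print(f"t({k}) density at 3: {t_pdf(3.0, k):.5f}   standard Normal: {z_height:.5f}")
```

Each t density at x = 3 is larger than the Normal one (more probability in the tails), and the gap shrinks as k grows.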
Using Table C
Table C gives t critical values t* with upper-tail probability p, along with the corresponding confidence level C. The bottom row of the table gives z*, because a t distribution with infinitely many degrees of freedom is the standard Normal (Z) distribution.
Using Table C
t* = … (for 7 df)
The t table below highlights the t* critical value with probability … to its right for the t(7) curve.
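Software can reproduce the Table C entries. The sketch below assumes a standard-library-only setting, so it does not call a statistics package. Instead, it integrates the t density numerically with Simpson's rule and inverts the upper-tail probability by bisection. The names `t_pdf`, `t_upper_tail`, and `t_critical` are hypothetical helpers, and the results are only as accurate as the numerical integration.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of the t(df) distribution at x."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=1000):
    """P(T > t) for T ~ t(df), by Simpson's rule on [0, |t|]; steps must be even."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    tail = 0.5 - total * h / 3   # symmetry about 0 gives P(T > 0) = 0.5
    return tail if t >= 0 else 1 - tail

def t_critical(p, df, hi=100.0):
    """t* with upper-tail probability p (assumed 0 < p < 0.5), found by bisection."""
    lo = 0.0
    for _ in range(40):
        mid = (lo + hi) / 2
        if t_upper_tail(mid, df) > p:
            lo = mid   # tail area still too large, so t* lies further right
        else:
            hi = mid
    return (lo + hi) / 2

p = 0.025  # upper-tail probability for 95% confidence
for df in (2, 7, 30, 1000):
    print(f"df = {df:>4}: t* = {t_critical(p, df):.3f}")
print(f"z* (bottom row): {NormalDist().inv_cdf(1 - p):.3f}")
```

With a statistics library available, its quantile function would replace this hand-rolled inversion. The point here is only that t* shrinks toward z* as the degrees of freedom grow.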
One-Sample t Confidence Interval
Take an SRS of size n from a population with unknown mean μ and unknown standard deviation σ. A level C confidence interval for μ is given by:
$$ \bar{x} \pm t^* \frac{s}{\sqrt{n}} $$
where t* is the critical value for confidence level C from the t(n − 1) density curve.
Illustrative Example: American Adult Heights
A study of an SRS of 8 American adults yields an average height of x̄ = 67.2 inches and a standard deviation of s = 3.9 inches. With df = 8 − 1 = 7, Table C gives t* = 2.365 for 95% confidence. A 95% confidence interval for the average height of all American adults (μ) is therefore:
$$ \bar{x} \pm t^* \frac{s}{\sqrt{n}} = 67.2 \pm 2.365 \frac{3.9}{\sqrt{8}} = 67.2 \pm 3.26 = (63.9,\ 70.5) $$
"We are 95% confident that the average height of all American adults is between 63.9 and 70.5 inches."
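The interval arithmetic in the heights example can be reproduced in a few lines. This is a sketch: `t_interval` is a hypothetical helper, and t* is passed in directly as the Table C value 2.365 for df = 7 rather than being computed.

```python
import math

def t_interval(x_bar, s, n, t_star):
    """One-sample t interval: x_bar ± t* · s / sqrt(n)."""
    margin = t_star * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Heights example: n = 8, x_bar = 67.2, s = 3.9; t* = 2.365 (Table C, df = 7, 95%).
low, high = t_interval(67.2, 3.9, 8, t_star=2.365)
print(f"95% CI: {low:.1f} to {high:.1f} inches")  # → 63.9 to 70.5
```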
One-Sample t Test
The t test is similar in form to the z test learned earlier. To test H0: μ = μ0, the test statistic is:
$$ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} $$
The P-value of t is determined from the t(n − 1) curve, using the t table (Table C).
P-value for Testing Means
P-values are computed from the t(n − 1) distribution, assuming H0: μ = μ0 is true.
• Ha: μ > μ0. The P-value is the probability of getting a value as large as or larger than the observed test statistic t.
• Ha: μ < μ0. The P-value is the probability of getting a value as small as or smaller than the observed test statistic t.
• Ha: μ ≠ μ0. The P-value is two times the probability of getting a value as large as or larger than the absolute value of the observed test statistic t.
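The three P-value rules can be written as one function. This is a minimal sketch that assumes only the Python standard library. The t(df) tail area comes from Simpson's-rule integration of the density. The names `t_pdf`, `t_upper_tail`, and `t_p_value` and the `alternative` strings are hypothetical choices, not any library's API.

```python
import math

def t_pdf(x, df):
    """Density of the t(df) distribution at x."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=1000):
    """P(T > t) for T ~ t(df), by Simpson's rule on [0, |t|]; steps must be even."""
    a = abs(t)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    tail = 0.5 - total * h / 3   # symmetry about 0 gives P(T > 0) = 0.5
    return tail if t >= 0 else 1 - tail

def t_p_value(t, df, alternative):
    """P-value of a one-sample t test; alternative is 'greater', 'less', or 'two-sided'."""
    if alternative == "greater":     # Ha: mu > mu0, area to the right of t
        return t_upper_tail(t, df)
    if alternative == "less":        # Ha: mu < mu0, area to the left of t
        return 1 - t_upper_tail(t, df)
    if alternative == "two-sided":   # Ha: mu != mu0, twice the area beyond |t|
        return 2 * t_upper_tail(abs(t), df)
    raise ValueError(f"unknown alternative: {alternative!r}")

# Hypothetical test statistic and degrees of freedom.
print(f"{t_p_value(-2.1, 12, 'two-sided'):.4f}")
```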
Weight Gain: Illustrative Example
To study weight gain, ten people between the ages of 30 and 40 determine their change in weight over a 1-year interval. Here are their weight changes:
Do these data provide statistically significant evidence that people in this age range gain weight over a year?
Weight Gain Illustration
1. Hypotheses: H0: μ = 0; Ha: μ > 0
2. Test statistic: $t = \frac{\bar{x} - 0}{s/\sqrt{10}} = 2.70$ (df = 10 − 1 = 9)
3. P-value: P-value = P(T > 2.70) = … (using a computer). From Table C, the P-value is between 0.01 and 0.02, because t = 2.70 lies between t* = 2.398 (p = 0.02) and t* = 2.821 (p = 0.01).
4. Conclusion: Since the P-value is smaller than α = 0.05, there is significant evidence against the null hypothesis, in favor of the alternative that people in this age range gain weight, on average, over a year.
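The Table C reasoning in step 3, locating t between two adjacent critical values, can be sketched as a lookup. The row below is a subset of the df = 9 row of the standard t table. The constant `DF9_ROW` and the helper `bracket_p_value` are hypothetical names.

```python
# (upper-tail probability p, critical value t*) pairs from the df = 9 row of Table C (subset).
DF9_ROW = [(0.05, 1.833), (0.025, 2.262), (0.02, 2.398), (0.01, 2.821),
           (0.005, 3.250), (0.001, 4.297), (0.0005, 4.781)]

def bracket_p_value(t, row):
    """Return (p_low, p_high) with p_low < P(T > t) <= p_high, from one Table C row.

    The row must be ordered by increasing t*. None marks an open end, when t falls
    below the first t* (P-value above the largest p) or beyond the last one.
    """
    p_high = None
    for p, t_star in row:
        if t < t_star:
            return p, p_high   # t lies between the previous t* and this one
        p_high = p
    return None, p_high        # t exceeds every t* in the row

p_low, p_high = bracket_p_value(2.70, DF9_ROW)
print(f"{p_low} < P-value < {p_high}")  # weight-gain example, df = 9
```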
Matched Pairs t Procedures
• Matched pairs samples allow us to compare responses to two treatments within paired subjects.
• Apply the one-sample t procedures to the observed differences within pairs.
• The parameter μ is the mean difference in the responses to the two treatments within matched pairs in the entire population.
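The matched pairs procedure is simply the one-sample t procedure applied to the differences, so the computation reduces to a few lines. The sketch below is in Python with a hypothetical helper `matched_pairs_t`. The pollution values are invented for illustration and are not the case-study data.

```python
import math
import statistics

def matched_pairs_t(first, second, mu0=0.0):
    """Form differences (first - second), then apply the one-sample t statistic to them."""
    if len(first) != len(second):
        raise ValueError("matched pairs need the same number of observations")
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)              # sample mean of the differences
    s_d = statistics.stdev(diffs)               # sample standard deviation of the differences
    t = (d_bar - mu0) / (s_d / math.sqrt(n))    # one-sample t on the differences
    return diffs, d_bar, s_d, t, n - 1

# Invented pollution indexes for 8 days (illustration only).
area_a = [3, 5, 4, 6, 5, 7, 4, 6]
area_b = [1, 2, 2, 3, 2, 3, 2, 3]
diffs, d_bar, s_d, t, df = matched_pairs_t(area_a, area_b)
print(f"differences = {diffs}, d_bar = {d_bar}, s = {s_d:.4f}, t = {t:.2f}, df = {df}")
```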
Air Pollution Case Study: Matched Pairs
• Pollution index measurements were recorded for two areas of a city on each of 8 days.
• To analyze, subtract the Area B level from the Area A level on each day. The 8 differences form a single sample.
• Are the average pollution levels the same for the two areas of the city?
[Table: Day | Area A | Area B | A − B]
These 8 differences have x̄ = … and s = ….
Case Study
1. Hypotheses: H0: μ = 0; Ha: μ ≠ 0
2. Test statistic: $t = \frac{\bar{x} - 0}{s/\sqrt{8}} = \ldots$ (df = 8 − 1 = 7)
3. P-value: P-value = 2P(T > …) = … (using a computer). From Table C, the P-value is smaller than 2(0.0005) = 0.001, because t = … is greater than t* = 5.408 (upper-tail area = 0.0005).
4. Conclusion: Since the P-value is smaller than α = 0.001, there is strong evidence ("highly significant") that the mean pollution levels are different for the two areas of the city.
Case Study: Air Pollution
Find a 95% confidence interval to estimate the mean difference in pollution indexes (A − B) between the two areas of the city. With df = 8 − 1 = 7, Table C gives t* = 2.365:
$$ \bar{x} \pm t^* \frac{s}{\sqrt{8}} = \ldots $$
We are 95% confident that the pollution index in area A exceeds that of area B by an average of … to … index points.
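The paired confidence interval works the same way as the one-sample interval, applied to the differences. The sketch below is in Python with a hypothetical helper `paired_t_interval`. The pollution values are invented for illustration and are not the case-study data. The value t* = 2.365 is the Table C entry for df = 7 and 95% confidence.

```python
import math
import statistics

def paired_t_interval(first, second, t_star):
    """Interval for the mean difference: d_bar ± t* · s_d / sqrt(n), with d = first - second."""
    diffs = [a - b for a, b in zip(first, second)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    margin = t_star * statistics.stdev(diffs) / math.sqrt(n)
    return d_bar - margin, d_bar + margin

# Invented pollution indexes for 8 days (illustration only).
area_a = [3, 5, 4, 6, 5, 7, 4, 6]
area_b = [1, 2, 2, 3, 2, 3, 2, 3]
low, high = paired_t_interval(area_a, area_b, t_star=2.365)  # Table C, df = 7, 95%
print(f"95% CI for mean difference (A - B): {low:.3f} to {high:.3f}")
```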
Conditions for t Procedures (Weight Gain Illustration)
1. Valid data?
2. Unconfounded comparison? (lurking variables are balanced)
3. Simple random sample?
4. Normal distribution?
Except in the case of small samples, conditions 1 to 3 are more important than condition 4, because t procedures are "robust" to violations of condition 4.
Robustness of the Normality Assumption
• The t confidence interval and test are exact when the population distribution is Normal.
• However, no real data are exactly Normal.
• A confidence interval or significance test is robust when its confidence level or P-value does not change very much when the conditions for using the procedure are violated.
• The t procedures are robust against non-Normality when n is moderate to large.
t Procedure Robustness
• Sample size less than 15: Use t procedures if the data appear close to Normal (symmetric, single peak, no outliers). If the data are skewed or if outliers are present, do not use t.
• Sample size at least 15: The t procedures can be used except in the presence of outliers or strong skewness in the data.
• Large samples: The t procedures can be used even for clearly skewed distributions when the sample is large, roughly n ≥ 40.
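The three guidelines can be summarized as a rough decision function. This is only a sketch. Judging skewness and outliers still requires looking at a stemplot or histogram, so those judgments are passed in as arguments. The treatment of outliers for n ≥ 40 is an assumption, since the large-sample guideline addresses skewness only. The function `t_procedures_ok` and its argument values are hypothetical.

```python
def t_procedures_ok(n, skew="none", outliers=False):
    """Rough check of the sample-size guidelines for t procedures.

    skew: "none", "mild", or "strong", as judged from a plot of the data.
    outliers: whether the plot shows outliers.
    """
    if skew not in ("none", "mild", "strong"):
        raise ValueError(f"unknown skew level: {skew!r}")
    if n < 15:
        # Small samples: data must look close to Normal (no skew, no outliers).
        return skew == "none" and not outliers
    if n < 40:
        # Moderate samples: fine unless there are outliers or strong skewness.
        return skew != "strong" and not outliers
    # n >= 40: even clearly skewed data are acceptable. Still rejecting outliers
    # here is an assumption of this sketch, not part of the stated guideline.
    return not outliers

print(t_procedures_ok(20, skew="strong"))   # moderate n, strong skew
print(t_procedures_ok(500, skew="strong"))  # large n, strong skew, no outliers
```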
Air Pollution Illustration: Assessing Normality
• Recall the air pollution illustration.
• Is it reasonable to assume that the sampling distribution of x̄ is Normal?
• While we cannot judge Normality from just 8 observations, a stemplot of the data (right) shows no outliers, clusters, or extreme skewness.
• Thus, the t test should be accurate.
Can we use t?
The stemplot shows a moderately sized data set (n = 20) with strong negative skew. We cannot trust the t procedures in this instance.
Can we use t?
• This histogram shows the distribution of word lengths in Shakespeare's plays. The sample is very large.
• The data have a strong positive skew, but there are no outliers. We can use the t procedures since n ≥ 40.
Can we use t?
• This histogram shows the heights of college students.
• The distribution is close to Normal, so we can trust the t procedures for any sample size.