Thus we have (Xbar - m )/(s/sqrt(n)) which has a Z distribution if: Inference about the mean of a population of measurements (m) is based on the standardized value of the sample mean (Xbar). The standardization involves subtracting the mean of Xbar and dividing by the standard deviation of Xbar – recall that Mean of Xbar is m ; and Standard deviation of Xbar is s/sqrt(n) Thus we have (Xbar - m )/(s/sqrt(n)) which has a Z distribution if: Population is normal and s is known ; or if n is large so CLT takes over…
But what if s is unknown?? Then this standardized Xbar doesn’t have a Z distribution anymore, but a so-called t-distribution with n-1 degrees of freedom… Since s is unknown, the standard deviation of Xbar, s/sqrt(n), is unknown. We estimate it by the so-called standard error of Xbar, s/sqrt(n), where s=the sample standard deviation. There is a t-distribution for every value of the sample size; we’ll use t(k) to stand for the particular t-distribution with k degrees of freedom. There are some properties of these t-distributions that we should note…
Every t-distribution looks like a N(0,1) distribution; i. e Every t-distribution looks like a N(0,1) distribution; i.e., it is centered and symmetric around 0 and has the same characteristic “bell” shape… however, the standard deviation of t(k) {sqrt(k/(k-2))} is greater than 1, the s.d. of Z so the t-distribution density curve is more spread out than Z. Probabilities involving r.v.s that have the t(k) distributions are given by areas under the t(k) density curve … Table D in the back of our book gives us the probabilities we need…
The good news is that everything we’ve already learned about constructing confidence intervals and testing hypotheses about m carries through under the assumption of unknown s … So e.g., a 95% confidence interval for m based on a SRS from a population with unknown s is Xbar +/- t*(s.e.(Xbar)) Recall that s.e.(Xbar) = s/sqrt(n). Here t* is the appropriate tabulated value from Table D so that the area between –t* and +t* is .95 As we did before, if we change the level of confidence then the value of t* must change appropriately…
Similarly, we may test hypotheses using this t-distributed standardized Xbar… e.g., to test the H0: m =m0 against Ha: m >m0 we use (Xbar - m0)/(s/sqrt(n)) which has a t-distribution with n-1 df, assuming the null hypothesis is true. See page 422 (7.1, 3/7) for a complete summary of hypothesis testing in the case of “the one-sample t-test” … HW: Read section 7.1 thru p. 433; go over all the examples carefully and answer the HW questions following them: #7.1-7.9 Work on the following problems (p.441 ff) (use software as needed): #7.15-7.22, 7.25, 7.32, 7.35-7.37, 7.41.
To summarize the analysis: Is there a difference in aggressive behavior of patients on "moon days" compared with "non-moon days"? To summarize the analysis: when the data comes in matched pairs, the analysis is performed on the differences between the paired measurements then use the t-statistic with n-1 d.f. (n = # of pairs) to construct confidence intervals and test hypotheses on the true mean difference.
In a matched pairs design, subjects are matched in pairs and the outcomes are compared within each matched pair. A coin toss could determine which of the two subjects gets the treatment and which gets the control… One special kind of matched pairs design is when a subject acts as his/her own control, as in a before/after study… See example 7.7 on page 428ff (7.1, 4/7). Note that the paired observations (# of agressive behaviors) are subtracted and the difference in scores becomes the single number analyzed with a one-sample t-statistic with n-1 df, where n=the number of pairs… see the top of page 431 and the next page for a summary of the process. HW Read through p.433. Go over Example 7.7 then do #7.32, 7.35, 7.41.
Read the section on Robustness of the t procedures (starting p. 432 (7 Read the section on Robustness of the t procedures (starting p.432 (7.1, 5/7))… note the definition of the statistical term robust – essentially, a statistic is robust if it is insensitive to violations of the assumptions made when the statistic is used. For example, the t-statistic requires normality of the population… how sensitive is the t-statistic to violations of normality?? Look at the practical guidelines for inference on a single mean at bottom of p.432… If the sample size is < 15, use the t procedures if the data are close to normal. If the sample size is >= 15 then unless there is strong non-normality or outliers, t procedures are OK If the sample size is large (say n >= 40) then even if the distribution is skewed, t procedures are OK