Comparing k Population Means – One-way Analysis of Variance (ANOVA)
The F test – for comparing k means
Situation: We have k normal populations. Let μi and σi denote the mean and standard deviation of population i, i = 1, 2, 3, …, k.
Note: we assume that the standard deviation is the same for each population: σ1 = σ2 = … = σk = σ.
We want to test H0: μ1 = μ2 = … = μk (the k population means are equal) against HA: at least two of the means differ.
The ANOVA Table – a convenient method for displaying the calculations for the F-test
Source     d.f.     Sum of Squares   Mean Square   F-ratio
Between    k − 1    SS_Between       MS_Between    MS_Between / MS_Within
Within     N − k    SS_Within        MS_Within
Total      N − 1    SS_Total
To compute F (and the ANOVA table entries), first compute:
1) Ti = the total of the observations in sample i, and ni = the size of sample i
2) G = T1 + T2 + … + Tk = the grand total, and N = n1 + n2 + … + nk
3) Σ x² = the sum of squares of all N observations
4) Σ Ti²/ni
5) CM = G²/N (the correction for the mean)
Then
1) SS_Total = Σ x² − CM
2) SS_Between = Σ Ti²/ni − CM
3) SS_Within = SS_Total − SS_Between
4) F = MS_Between / MS_Within, where MS_Between = SS_Between/(k − 1) and MS_Within = SS_Within/(N − k)
(a Python sketch of these computations follows)
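A minimal sketch of these computations in Python, not taken from the slides: the three groups and their values below are made up for illustration, and scipy's f_oneway is used only as a cross-check on the hand computation of F.

```python
import numpy as np
from scipy import stats

# Hypothetical data for k = 3 groups (not from the slides)
samples = [np.array([12.0, 15.0, 14.0, 11.0]),
           np.array([18.0, 20.0, 17.0, 19.0]),
           np.array([14.0, 13.0, 16.0, 15.0])]

k = len(samples)
N = sum(len(s) for s in samples)
G = sum(s.sum() for s in samples)                      # grand total
CM = G**2 / N                                          # correction for the mean

ss_total = sum((s**2).sum() for s in samples) - CM     # SS_Total
ss_between = sum(s.sum()**2 / len(s) for s in samples) - CM
ss_within = ss_total - ss_between

ms_between = ss_between / (k - 1)                      # MS_Between
ms_within = ss_within / (N - k)                        # MS_Within
F = ms_between / ms_within

p_value = stats.f.sf(F, k - 1, N - k)                  # right-tail area of the F distribution
print(F, p_value)

# Cross-check with scipy's built-in one-way ANOVA
print(stats.f_oneway(*samples))
```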
The χ² test for independence
Situation: We have two categorical variables R and C. The number of categories of R is r, and the number of categories of C is c. We observe n subjects from the population and count xij = the number of subjects for which R = i and C = j. (R = rows, C = columns.)
Example: Both systolic blood pressure (C) and serum cholesterol (R) were measured for a sample of n = 1237 subjects.
The categories for blood pressure are: <126, 127–146, 147–166, 167+.
The categories for cholesterol are: <200, 200–219, 220–259, 260+.
Table: the two-way frequency table of observed counts xij for the 1237 subjects (not reproduced here).
The χ² test for independence
Define Eij = (Ri Cj)/n = the expected frequency in the (i, j)th cell in the case of independence, where Ri = the total of row i and Cj = the total of column j.
Justification for Eij = (Ri Cj)/n in the case of independence:
Let pij = P[R = i, C = j]. In the case of independence, pij = P[R = i] P[C = j] = ri gj, where ri = P[R = i] and gj = P[C = j].
Estimating ri by Ri/n and gj by Cj/n, the expected frequency in the (i, j)th cell in the case of independence is
Eij = n pij ≈ n (Ri/n)(Cj/n) = (Ri Cj)/n.
Then to test H0: R and C are independent against HA: R and C are not independent, use the test statistic
χ² = Σi Σj (xij − Eij)² / Eij
where Eij = the expected frequency in the (i, j)th cell in the case of independence and xij = the observed frequency in the (i, j)th cell.
Sampling distribution of the test statistic when H0 is true: the χ² distribution with degrees of freedom ν = (r − 1)(c − 1).
Critical and acceptance region:
Reject H0 if χ² ≥ χ²α (the upper-α critical value of the χ² distribution with ν degrees of freedom).
Accept H0 if χ² < χ²α.
Standardized residuals: rij = (xij − Eij)/√Eij, used to see which cells contribute most to the test statistic.
For the example, the degrees of freedom are ν = (r − 1)(c − 1) = (4 − 1)(4 − 1) = 9, and the computed test statistic exceeds the critical value, so we reject H0 using α = 0.05.
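A minimal sketch of the χ² test in Python: the small 2 × 3 table below is hypothetical (the slides' 4 × 4 blood-pressure/cholesterol counts are not reproduced in these notes), but the 4 × 4 case is handled the same way. It computes Eij, the test statistic, the critical value, and the standardized residuals, with scipy's chi2_contingency as a cross-check.

```python
import numpy as np
from scipy import stats

# Hypothetical observed frequencies xij (2 rows, 3 columns)
x = np.array([[30, 20, 10],
              [20, 25, 15]])

row_totals = x.sum(axis=1, keepdims=True)    # Ri
col_totals = x.sum(axis=0, keepdims=True)    # Cj
n = x.sum()

E = row_totals * col_totals / n              # Eij = Ri Cj / n
chi2 = ((x - E)**2 / E).sum()                # test statistic
df = (x.shape[0] - 1) * (x.shape[1] - 1)     # nu = (r - 1)(c - 1)
crit = stats.chi2.ppf(0.95, df)              # critical value for alpha = 0.05

std_resid = (x - E) / np.sqrt(E)             # standardized residuals

print(chi2, df, crit, chi2 >= crit)
print(std_resid)

# scipy equivalent (no continuity correction, as for tables larger than 2x2)
chi2_sp, p, df_sp, E_sp = stats.chi2_contingency(x, correction=False)
print(chi2_sp, p)
```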
Linear Regression – Hypothesis Testing and Estimation
The Least Squares Line – fitting the best straight line to “linear” data
The equation for the least squares line is ŷ = a + b x.
Let (x1, y1), (x2, y2), …, (xn, yn) denote the n observed data points, and define
Sxx = Σ (xi − x̄)², Syy = Σ (yi − ȳ)², Sxy = Σ (xi − x̄)(yi − ȳ).
Computing formulae:
Sxx = Σ xi² − (Σ xi)²/n
Syy = Σ yi² − (Σ yi)²/n
Sxy = Σ xi yi − (Σ xi)(Σ yi)/n
Then the slope of the least squares line can be shown to be b = Sxy / Sxx,
and the intercept of the least squares line can be shown to be a = ȳ − b x̄.
The residual sum of squares: RSS = Σ (yi − ŷi)², where ŷi = a + b xi.
Computing formula: RSS = Syy − (Sxy)²/Sxx.
Estimating σ, the standard deviation in the regression model:
s = √(RSS / (n − 2)); computing formula: s = √((Syy − (Sxy)²/Sxx) / (n − 2)).
This estimate of σ is said to be based on n − 2 degrees of freedom.
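A minimal sketch of these computing formulas in Python, on made-up (x, y) values (the fire-damage data of the later example are not reproduced in these notes):

```python
import numpy as np

# Hypothetical data (not from the slides)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

S_xx = (x**2).sum() - x.sum()**2 / n
S_yy = (y**2).sum() - y.sum()**2 / n
S_xy = (x * y).sum() - x.sum() * y.sum() / n

b = S_xy / S_xx                       # slope of the least squares line
a = y.mean() - b * x.mean()           # intercept of the least squares line

RSS = S_yy - S_xy**2 / S_xx           # residual sum of squares
s = np.sqrt(RSS / (n - 2))            # estimate of sigma on n - 2 degrees of freedom

print(b, a, RSS, s)
```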
Sampling distributions of the estimators
The sampling distribution of the slope of the least squares line: it can be shown that b has a normal distribution with mean β and standard deviation σb = σ / √Sxx.
The sampling distribution of the intercept of the least squares line: it can be shown that a has a normal distribution with mean α and standard deviation σa = σ √(1/n + x̄²/Sxx).
(1 − α)100% confidence limits for the slope β: b ± tα/2 · s/√Sxx, where tα/2 is the critical value for the t-distribution with n − 2 degrees of freedom.
(1 − α)100% confidence limits for the intercept α: a ± tα/2 · s √(1/n + x̄²/Sxx), where tα/2 is the critical value for the t-distribution with n − 2 degrees of freedom.
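Continuing the same hypothetical 5-point data set from the previous sketch (an assumption for illustration only, with α = 0.05), a minimal sketch of these confidence limits in Python:

```python
import numpy as np
from scipy import stats

# Same hypothetical data as before (not from the slides)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

S_xx = (x**2).sum() - x.sum()**2 / n
S_yy = (y**2).sum() - y.sum()**2 / n
S_xy = (x * y).sum() - x.sum() * y.sum() / n
b = S_xy / S_xx
a = y.mean() - b * x.mean()
s = np.sqrt((S_yy - S_xy**2 / S_xx) / (n - 2))

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, n - 2)          # t_{alpha/2}, n - 2 d.f.

se_b = s / np.sqrt(S_xx)                            # standard error of the slope
se_a = s * np.sqrt(1 / n + x.mean()**2 / S_xx)      # standard error of the intercept

print(b - t_crit * se_b, b + t_crit * se_b)         # limits for the slope beta
print(a - t_crit * se_a, a + t_crit * se_a)         # limits for the intercept alpha
```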
Example: In this example we are studying building fires in a city and are interested in the relationship between
X = the distance between the building that sent out the alarm and the closest fire hall, and
Y = the cost of the damage (in $1000s).
The data were collected on n = 15 fires.
The Data: the 15 observed (x, y) pairs (table not reproduced here).
Scatter Plot: Y (damage cost) versus X (distance).
Computations: from these data the summary sums, Sxx, Syy, Sxy, the slope b, the intercept a, and the estimate s are computed using the formulas above (numerical details not reproduced here).
Least Squares Line: ŷ = 4.92 x + 10.28
95% confidence limits for the slope β: 4.07 to 5.77, using t0.025 = 2.160, the critical value for the t-distribution with n − 2 = 13 degrees of freedom.
95% confidence limits for the intercept α: 7.21 to 13.35, using t0.025 = 2.160, the critical value for the t-distribution with n − 2 = 13 degrees of freedom.
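As a rough arithmetic check of how these limits are formed (the slides do not show s or Sxx, so the standard error below is back-calculated from the reported interval rather than taken from the slides):

```latex
% Implied standard error of the slope from the reported 95% limits:
\frac{5.77 - 4.07}{2\,t_{0.025}} = \frac{1.70}{2 \times 2.160} \approx 0.394,
% so the interval reproduces as
b \pm t_{0.025}\,\frac{s}{\sqrt{S_{xx}}} \approx 4.92 \pm 2.160 \times 0.394 \approx (4.07,\ 5.77),
% and both reported intervals are centred at the least squares estimates:
\tfrac{4.07 + 5.77}{2} = 4.92, \qquad \tfrac{7.21 + 13.35}{2} = 10.28.
```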