Warm-up
Ch.11 Inference for Linear Regression (Day 1)
1. The following is from a particular region's mortality table. What is the probability that a 20-year-old will survive to be 60? (A) (B) (C) (D) (E)
Table columns: Age, Number Surviving. Number Surviving values, in order: 10,000; 9,700; 9,240; 7,800; 4,300
2. Fifty-three percent of adults say they have trouble sleeping. If a doctor contacts an SRS of 85 adults, what is the probability that over 55% will say they have trouble sleeping? (A) (B) (C) (D) (E)
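One way to work the trouble-sleeping question is with the normal approximation to the sampling distribution of a sample proportion. The sketch below assumes that approach is the intended one; the variable names are only illustrative, and the numbers 0.53, 85, and 0.55 come straight from the question.

```python
from math import sqrt
from statistics import NormalDist

p = 0.53       # population proportion stated in the question
n = 85         # size of the SRS
cutoff = 0.55  # "over 55%" of the sample

# Large-counts check for the normal approximation:
# both np and n(1 - p) should be at least 10.
large_counts_ok = n * p >= 10 and n * (1 - p) >= 10

# Standard deviation of the sampling distribution of p-hat.
sd_p_hat = sqrt(p * (1 - p) / n)

# Standardize the cutoff and take the upper-tail area.
z = (cutoff - p) / sd_p_hat
prob_over = 1 - NormalDist().cdf(z)

print(f"large counts ok: {large_counts_ok}")
print(f"sd = {sd_p_hat:.4f}, z = {z:.2f}, P(p-hat > 0.55) = {prob_over:.3f}")
```

Under these assumptions the probability comes out to roughly 0.36.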
Stand up for index card.
Agenda until Cumulative Test (4/30 and 5/2)
Thursday 4/18 – 11.2
Monday 4/22 – 8.3 and 8.4
Wednesday 4/24 – Finish 8.4, start 9.3 (H.W.: multiple-choice portion of practice cumulative test)
Friday 4/26 – 9.3 and 9.4 (H.W.: free-response portion of practice cumulative test)
Tuesday 4/30 – Review answers to free response; 30 minutes to take 12 multiple-choice questions (*no extra time given)
Thursday 5/2 – Free-response portion: 50 minutes to take 3 free-response questions, a little more than 15 minutes each.
Inference for Regression
Remember that when we covered scatter plots, we fit the least squares regression line (LSRL). It included an intercept and a slope, and the LSRL describes the set of numerical data in hopes of predicting the response variable for a given value of the explanatory variable.
When sampling, different sets of data give different estimates of the intercept and slope, and so a different LSRL.
For inference about a regression line we use the true (population) regression line, also called the line of means or line of averages.
Since there are two estimates, we work with n – 2 degrees of freedom.
For A.P. Statistics we are only interested in inference for the slope (β₁).
Your book gives y = (β₀ + β₁x) + ε
Response = prediction from regression line + random deviation
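To make the n – 2 degrees of freedom and the focus on the slope concrete, here is a minimal sketch of the slope's t statistic computed by hand. The data are hypothetical (invented for illustration, not from the book), and the names b0, b1, and se_b1 are illustrative choices for the sample intercept, sample slope, and standard error of the slope.

```python
from math import sqrt

# Hypothetical data, invented for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of squares and cross-products about the means.
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least squares estimates of slope and intercept.
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Residuals: observed response minus the LSRL prediction.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Two estimates (b0 and b1) were used, so df = n - 2.
df = n - 2
s = sqrt(sum(e ** 2 for e in residuals) / df)  # standard deviation of residuals
se_b1 = s / sqrt(sxx)                          # standard error of the slope

# Test statistic for H0: beta1 = 0.
t_stat = b1 / se_b1

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}, df = {df}")
print(f"s = {s:.4f}, SE(b1) = {se_b1:.4f}, t = {t_stat:.2f}")
```

The p-value would then come from a t distribution with n – 2 degrees of freedom, using a table or calculator.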
Problem 1 (checking conditions only…for now)
With the help of satellite images of Earth, craters from meteor impacts have been identified. More than 180 are now known. These are only a small sample, because many craters have been covered over or eroded away. Astronomers have recognized a roughly 35-million-year cycle of cratering. Here's a scatter plot of known impacts. Any ideas why both the x- and y-axes are on a log scale?
Checking Conditions
Linearity Assumption. The data on the scatter plot need to show a linear relationship.
Randomization
*Equal Variance Condition. In the residual plot, the points need to be scattered with constant spread along the whole line. No patterns.
*Nearly Normal Condition. A histogram of the residuals and/or a normal probability plot needs to be evaluated.
* Checking these conditions requires at least a residual plot.
Problem 2 Checking conditions
1. To get the residual plot, first enter the data in L1 and L2.
2. STAT -> CALC to LinReg(a+bx) L1, L2, Y1 (Y1 is under VARS -> Y-VARS, ENTER, ENTER)
3. Then go to STATPLOT.
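For anyone working away from the calculator, the same steps can be sketched in Python. This is only an illustration: the lists L1 and L2 hold hypothetical data, lin_reg is a made-up helper name, and the plotting positions (i - 0.5)/n used for the normal probability plot are one common convention, assumed here rather than taken from the calculator.

```python
from statistics import NormalDist

# Hypothetical data standing in for calculator lists L1 and L2.
L1 = [10, 20, 30, 40, 50, 60]
L2 = [12.0, 19.5, 31.0, 38.5, 52.0, 58.0]


def lin_reg(xs, ys):
    """Return (a, b) for the least squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


a, b = lin_reg(L1, L2)

# Residual = observed y minus predicted y.
resid = [y - (a + b * x) for x, y in zip(L1, L2)]

# Points for the residual plot: x on the horizontal axis, residual on the vertical.
residual_plot = list(zip(L1, resid))

# Points for a normal probability plot: sorted residuals against
# the normal quantiles at positions (i - 0.5) / n (an assumed convention).
n = len(resid)
sorted_resid = sorted(resid)
z_scores = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
normal_plot = list(zip(sorted_resid, z_scores))

for x_val, r in residual_plot:
    print(f"x = {x_val:>3}, residual = {r:+.3f}")
```

A residual plot with no pattern and roughly constant spread supports the Equal Variance Condition; a roughly straight normal probability plot supports the Nearly Normal Condition.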
[Figures: Residual Plot; Normal Probability Plot]
H.W. Check the conditions for E#13 on pg 769. Show a sketch of your scatterplot with the LSRL, a residual plot, and a normal probability plot of your residuals, in addition to addressing the conditions and whether they are met.
I need to use the remaining 35 minutes for Administration for those students who are taking the AP Exam and have not done AP administration in another class.
Step 2 and 3
Step 2: H₀: The distribution of opinions is the same for all five countries. H_A: The distribution of opinions is not the same for all five countries.
Step 3: χ² = ____; p-value ≈ 0.
Answers to 3 and 4 of χ² worksheet
3. The expected cell frequencies for all 5 countries would be 616 for Necessary, 234 for Unnecessary, and 30 for Undecided.
Step 1: The randomization condition is not met because the 1,000-person sample for each country is not specified to be random. It is questionable whether the 1,000 sampled are representative of each country. The count condition is met because there are counts for all cells of each category. The expected cell frequencies are all greater than 5, as shown above. The conditions are somewhat met for a chi-square model with 8 degrees of freedom for a test of homogeneity.
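The 8 degrees of freedom follow from (rows – 1)(columns – 1) with 5 countries and 3 opinion categories. The sketch below checks that arithmetic and the expected-count condition using the expected values stated above. Because the observed counts are not shown here, the chi-square calculation itself is demonstrated on a small hypothetical 2×2 table; the function names are illustrative.

```python
def expected_counts(observed):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]


def chi_square(observed):
    """Return (statistic, degrees of freedom, expected counts) for a two-way table."""
    expected = expected_counts(observed)
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return stat, df, expected


# Worksheet #3: 5 countries (rows) by 3 opinions (columns).
df_worksheet = (5 - 1) * (3 - 1)

# Expected counts stated in the answer; each should be at least 5.
worksheet_expected = [616, 234, 30]
expected_ok = all(e >= 5 for e in worksheet_expected)

# Hypothetical 2x2 table, invented only to show the calculation.
toy = [[10, 20], [20, 10]]
toy_stat, toy_df, toy_expected = chi_square(toy)

print(f"worksheet df = {df_worksheet}, expected counts ok: {expected_ok}")
print(f"toy table: chi-square = {toy_stat:.3f}, df = {toy_df}")
```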
Step 4 and #4 Step 1
Step 4: The p-value is approximately zero. I reject H₀. The distribution of opinions is not the same across the five countries. The chi-square value of ____ for 8 degrees of freedom shows a drastic difference between the expected and observed counts.
#4 Step 1: The sample of 5,387 is not random, but the sample is large enough that we do not suspect serious bias. The expected cell frequencies are shown to be greater than five. There are counts for all cells of the categories. The conditions are met for a χ² model with 12 degrees of freedom for a test of independence.
Remaining steps are for #4
Step 2: H₀: Eye color is independent of hair color. H_A: Eye color is not independent of hair color.
Step 3: χ² = ____; p-value ≈ 0.
Step 4: I reject H₀ in favor of H_A. Eye color is not independent of hair color. With a p-value of approximately zero and a chi-square of ____, there is strong evidence that these factors are related.