Published by Elfreda Palmer. Modified over 6 years ago.
1
The scatterplot shows the advertised prices (in thousands of dollars) plotted against ages (in years) for a random sample of Plymouth Voyagers on several dealers’ lots. A computer printout showing the results of fitting a straight line to the data by the method of least squares gives:
Price = – 1.13 Age
R-sq = 75.5%
Find the correlation coefficient for the relationship between price and age of Voyagers based on these data.
What is the slope of the regression line? Interpret it in the context of these data.
How will the size of the correlation coefficient change if the 10-year-old Voyager is removed from the data set? Explain.
How will the slope of the LSRL change if the 10-year-old Voyager is removed from the data?
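The first question comes down to a square root, because R-sq is the square of the correlation coefficient. A worked version of that step, using only the R-sq value and the sign of the slope from the printout:

```latex
r = -\sqrt{R^2} = -\sqrt{0.755} \approx -0.869
```

The negative root is taken because r always has the same sign as the slope of the least squares line, and the slope here is –1.13.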
2
Coefficient of Determination & Residual Plots
3
Coefficient of determination (r²) “The variation accounted for”
Shows how strong the association is by telling the percent of the variation in y that the regression line accounts for.
Remains the same no matter which variable is labeled x.
1 – r² is the percent of the original variation left in the residuals.
Interpretation: r² % of the total variation (change) in y is accounted for by the linear relationship with x.
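These properties can be checked numerically. The sketch below is an illustration only: the five data points are made up for the purpose, and the helper names `mean` and `r_squared` are hypothetical. It computes r² as 1 minus the share of variation left in the residuals, then swaps the roles of x and y.

```python
def mean(values):
    return sum(values) / len(values)

def r_squared(xs, ys):
    """Fraction of the variation in ys accounted for by the least squares line on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Variation left in the residuals (SSE) versus total variation in y (SST).
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - y_bar) ** 2 for y in ys)
    return 1 - sse / sst

# Made-up data, not from the slides.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r2_y_on_x = r_squared(xs, ys)        # y regressed on x
r2_x_on_y = r_squared(ys, xs)        # labels swapped: x regressed on y
left_in_residuals = 1 - r2_y_on_x    # share of variation the line does not account for

print(r2_y_on_x, r2_x_on_y, left_in_residuals)
```

Both calls return the same value, which is the sense in which r² does not depend on which variable is labeled x.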
4
r² example 1
Jimmy works at a restaurant and gets paid $8 an hour. He tracks how much total money he has earned each hour during his first shift.
5
r² example 1
What is my correlation coefficient? r = 1
What is my coefficient of determination? r² = 1
6
r² example 1
If I draw a line, what percent of the changes in the values of money can be explained by the regression line based on hours? 100%
7
r² example 1
We know 100% of the variation in money can be explained by the linear relationship with hours.
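Jimmy's earnings can be tabulated to confirm r = 1 and r² = 1. This is a minimal sketch: the 8-hour shift length is an assumption (the slides do not say how long the shift is), and `correlation` is a hypothetical helper written out from the usual formula.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

# Assumed shift length of 8 hours; pay is $8 per hour as stated on the slide.
hours = list(range(1, 9))
money = [8 * h for h in hours]   # total earned after each hour

r = correlation(hours, money)
r_sq = r ** 2

print(r, r_sq)
```

Because every point lies exactly on the line money = 8 × hours, there is no variation left over for the residuals.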
8
r² example 2
Will we be able to explain the relationship between hours and total money for a member of the wait staff with the same precision?
9
Coefficient of determination
Specifically, the value is the percentage of the variation of the dependent variable that is explained by the regression line based on the independent variable. In other words, in a bivariate data set, the y-values vary a certain amount. How much of that variation can be accounted for if we use a line to model the data?
10
How Big Should R² Be?
R² is always between 0% and 100%. What makes a “good” R² value depends on the kind of data you are analyzing and on what you want to do with it.
Experiments: 80%-90% or higher
Surveys: often 50% or lower
11
HAVE READY!
Homework #1-3 from yesterday
4 sets of notes ready to staple:
1. “Scatterplot Notes” (these are NOT PowerPoint notes)
2. “Linear Regression ch. 8” PowerPoint notes
3. “Coefficient of Determination” PowerPoint notes
4. “Residual Plots” PowerPoint notes
12
Residual Plots Determining if there is a linear relationship
13
How well does age predict the range of motion after knee surgery?
[Scatterplot: range of motion against age]
Using this model, approximately 30.6% of the variation in range of motion after knee surgery is accounted for by variation in age.
14
Assumptions and Conditions
Quantitative Variables Condition: Regression can only be done on two quantitative variables, so make sure to check this condition.
Straight Enough Condition: The linear model assumes that the relationship between the variables is linear. A scatterplot will let you check that the assumption is reasonable.
15
If the scatterplot is not straight enough, STOP HERE!
You can’t use a linear model for just any two variables, even if they are related. They must have a linear association or the model won’t mean a thing.
16
Does age affect the range of motion after knee surgery?
[Scatterplot: range of motion against age]
17
Assumptions and Conditions
Quantitative Variables Condition: Regression can only be done on two quantitative variables, so make sure to check this condition.
Straight Enough Condition: The linear model assumes that the relationship between the variables is linear. A scatterplot will let you check that the assumption is reasonable.
A residual plot will tell whether a linear model is appropriate for the data.
18
Points: (0,0), (3,10), (6,2)
Sum of squares = 61.25
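The line that produced the 61.25 did not survive in this transcript. As an assumption for illustration only, the sketch below uses y = 0.5x + 4, one line whose squared residuals for these three points add up to 61.25, and compares it with the least squares line. The function names are hypothetical.

```python
def sum_of_squared_residuals(points, slope, intercept):
    """Add up (observed y - predicted y)^2 over all points for a given line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in points)

def least_squares_line(points):
    """Slope and intercept of the line that minimizes the sum of squared residuals."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

points = [(0, 0), (3, 10), (6, 2)]   # the three points from the slide

# ASSUMPTION: candidate line y = 0.5x + 4 (not given in the source).
sse_candidate = sum_of_squared_residuals(points, 0.5, 4.0)

ls_slope, ls_intercept = least_squares_line(points)
sse_least_squares = sum_of_squared_residuals(points, ls_slope, ls_intercept)

print(sse_candidate, sse_least_squares)
```

Whatever candidate line is tried, the least squares line gives the smallest sum of squared residuals for these points.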
19
Residual plot A scatterplot of the (x, residual) pairs.
Residuals can be graphed against other statistics besides x.
The purpose is to tell whether a linear association exists between the x and y variables.
If no pattern exists between the points in the residual plot, then the association is linear.
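To see what a pattern in the residuals looks like, here is a small sketch that computes the (x, residual) pairs for curved data. The data (y = x² for x = 0 to 4) are made up for illustration, and `residuals` is a hypothetical helper.

```python
def residuals(xs, ys):
    """Residuals (observed - predicted) from the least squares line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Made-up curved data: y = x^2, which a straight line cannot follow.
xs = [0, 1, 2, 3, 4]
curved_ys = [x ** 2 for x in xs]

curved_residuals = residuals(xs, curved_ys)

# The (x, residual) pairs are what a residual plot displays.
for x, res in zip(xs, curved_residuals):
    print(x, res)
```

The residuals run positive, negative, negative, negative, positive: a U shape. That kind of curve in a residual plot is the signal that a linear model is not appropriate.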
20
[Two residual plots: one labeled “Linear”, one labeled “Not linear”]
21
Is there a linear relationship between age & range of motion?
Since there is no pattern in the residual plot, there is a linear relationship between age and range of motion.
22
Residual plots show the same pattern whether the residuals are plotted against x or against y-hat.
23
Computer Outputs
24
Computer-generated analysis:
Predictor   Coef   Stdev   T   P
Constant
Age
s =     R-sq = 30.6%     R-sq(adj) = 23.7%
25
Example:
a) What is the equation of the least squares line?
b) Where else in the printout do you find the information for the slope and y-intercept?
c) Roughly, what change in crown dieback would be associated with an increase of 1 in soil pH?
d) What value of crown dieback would you predict when soil pH = 4.0?
e) Would it be sensible to use the least squares line to predict crown dieback when soil pH = 5.67?
f) What is the correlation coefficient?
26
Example 1: The growth and decline of forests … included a scatter plot of y = mean crown dieback (%), which is one indicator of growth retardation, and x = soil pH. The statistical computer package MINITAB gives the following analysis:
The regression equation is
dieback = 31.0 – 5.79 soil pH
Predictor   Coef   Stdev   t-ratio   p
Constant
soil pH
s =     R-sq = 51.5%
27
a)
c) A decrease of about 5.79 percentage points in mean crown dieback
d)
e)
f) There is a moderate negative correlation between soil pH and percent crown dieback.
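The numeric parts of (d) and (f) follow directly from the printed regression equation and R-sq. The sketch below is one way to carry out that arithmetic, not the slide's own worked answer (which did not survive in this transcript); the function and constant names are hypothetical.

```python
from math import copysign, sqrt

# Values taken from the MINITAB output in Example 1.
INTERCEPT = 31.0
SLOPE = -5.79
R_SQ = 0.515

def predict_dieback(soil_ph):
    """Predicted mean crown dieback (%) from the least squares line."""
    return INTERCEPT + SLOPE * soil_ph

def correlation_from_r_sq(r_sq, slope):
    """r is the square root of R-sq, carrying the same sign as the slope."""
    return copysign(sqrt(r_sq), slope)

predicted_at_4 = predict_dieback(4.0)   # part (d): about 7.84
r = correlation_from_r_sq(R_SQ, SLOPE)  # part (f): about -0.72

print(predicted_at_4, r)
```

For part (e), the line should only be used for soil pH values within the range of the observed data, which is not shown in the transcript.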
28
Example 2: The following output data from MINITAB shows the number of teachers (in thousands) for each of the states plus the District of Columbia against the number of students (in thousands) enrolled in grades K
Predictor   Coef   Stdev   t-ratio   p
Constant
Enroll
s =     R-sq = 81.5%
a) LSRL:
b) r =