Published by Beverley Richard. Modified over 9 years ago.
1
Linear Regression
2
Simple Linear Regression
Using one variable to …
1) explain the variability of another variable
2) predict the value of another variable
Both are accomplished with the line that best fits a scatterplot.
3
Recall -- Definitions
Response (dependent) variable
– the variable whose variability is being explained or whose values are predicted
– plotted on the y-axis
Explanatory (independent, predictor) variable
– the variable used to explain variability or make predictions
– plotted on the x-axis
4
Review -- Line Characteristics
1. What is the most common equation of a line?
2. What does the slope tell us?
3. What does the intercept tell us?
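As a refresher for these three questions: a line is commonly written as y = intercept + slope × x, the intercept is the value of y when x = 0, and the slope is the change in y for a one-unit increase in x. A minimal Python sketch follows; the function name `line` and the numbers are illustrative assumptions, not from the slides.

```python
def line(x, intercept, slope):
    """Return the y-value on the line y = intercept + slope * x."""
    return intercept + slope * x

# Illustrative values only (not from the slides).
intercept, slope = 2.0, 0.5

# The intercept is the value of y when x = 0.
y_at_zero = line(0, intercept, slope)

# The slope is the change in y for a one-unit increase in x,
# no matter where on the line that increase starts.
change_per_unit = line(11, intercept, slope) - line(10, intercept, slope)

print(y_at_zero, change_per_unit)
```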
5
Finding the Best-Fit Line
[Figure: scatterplot of Y versus X showing several candidate lines]
We need an objective criterion for choosing among the candidate lines.
6
Finding the Best-Fit Line
Definition -- Predicted Y (ŷ)
The y-coordinate of the point on the line that corresponds to the observed x value.
Plug the value of x into the line equation to get ŷ.
7
Finding the Best-Fit Line
Definition -- Residual
Residual = Observed Y - Predicted Y
[Figure: scatterplot of Y versus X]
8
Finding the Best-Fit Line
Minimize the sum of residuals?
[Figure: scatterplot of Y versus X]
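One way to see why the plain sum of residuals is a poor criterion: positive and negative residuals cancel, so every line through the point of means (mean of x, mean of y) has a residual sum of zero, whatever its slope. The sketch below uses made-up data and a hypothetical helper `sum_of_residuals` to illustrate this.

```python
# Hypothetical data, made up for illustration (not from the slides).
xs = [80, 90, 100, 110, 120]
ys = [85, 100, 105, 118, 127]

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

def sum_of_residuals(slope):
    """Sum of (observed y - predicted y) for the line with this slope
    that passes through the point of means (x_bar, y_bar)."""
    intercept = y_bar - slope * x_bar
    return sum(y - (intercept + slope * x) for x, y in zip(xs, ys))

# Three very different candidate lines, including one that slopes the wrong way.
for slope in (-1.0, 0.0, 1.0):
    print(slope, round(sum_of_residuals(slope), 10))
```

All three candidate lines give a sum of essentially zero, so this criterion cannot tell a good line from a bad one.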
9
Finding the Best-Fit Line
Minimize the sum of squared residuals?
RSS = sum of squared residuals
Best-fit line = the line, out of all possible lines, that minimizes the RSS
Should the RSS be computed for all lines?
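The RSS does not have to be computed for every possible line: with one explanatory variable, the minimizing slope and intercept have a closed form (slope = Sxy / Sxx, intercept = mean of y minus slope times mean of x). The following is a minimal sketch on made-up data; the function names `rss` and `least_squares` are assumptions for illustration.

```python
# Hypothetical data, made up for illustration (not from the slides).
xs = [80, 90, 100, 110, 120]
ys = [85, 100, 105, 118, 127]

def rss(intercept, slope):
    """Residual sum of squares for the line y = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

def least_squares(xs, ys):
    """Closed-form least-squares intercept and slope for one predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

intercept, slope = least_squares(xs, ys)
best = rss(intercept, slope)

# Nudging the fitted line in any direction should never lower the RSS.
nudged = [rss(intercept + da, slope + db)
          for da in (-1.0, 0.0, 1.0)
          for db in (-0.05, 0.0, 0.05)]
print(intercept, slope, best, min(nudged) >= best)
```

The nudged lines are only a spot check, not a proof, but they show the fitted line doing at least as well as its neighbors.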
10
So ….
It is important to understand
– where the equation of the line comes from
– how to interpret the line
It is not important to compute the best-fit line “by hand”
11
Example -- Rabbit Metabolic Rate
Katzner et al. (1997; J. Wildl. Man. 78:1053-1062) examined the metabolic rate of pygmy rabbits (Brachylagus idahoensis) in the laboratory. In particular, they wanted to determine whether the variability in resting metabolic rate (ml O₂ g⁻¹ h⁻¹) at 20 °C could be adequately explained by body mass (g).
What is the response variable?
– Resting metabolic rate
What is the explanatory variable?
– Body mass
12
Example -- Rabbit Metabolic Rate
[Figure: scatterplot of Metabolic Rate versus Mass with best-fit line Y = 1.41 - 0.00124X; R-Sq = 55.4%]
In terms of the variables of the problem, what is the equation of the best-fit line?
– MetRate = 1.41 - 0.00124 × Mass
13
Example -- Rabbit Metabolic Rate
[Figure: scatterplot of Metabolic Rate versus Mass with best-fit line Y = 1.41 - 0.00124X; R-Sq = 55.4%]
In terms of the variables of the problem, interpret the value of the slope.
– For each additional gram of mass, the metabolic rate decreases by 0.00124 ml O₂ g⁻¹ h⁻¹, on average
14
Example -- Rabbit Metabolic Rate
[Figure: scatterplot of Metabolic Rate versus Mass with best-fit line Y = 1.41 - 0.00124X; R-Sq = 55.4%]
In terms of the variables of the problem, interpret the value of the y-intercept.
– Rabbits with no mass have a metabolic rate of 1.41 ml O₂ g⁻¹ h⁻¹, on average
15
Example -- Rabbit Metabolic Rate
[Figure: scatterplot of Metabolic Rate versus Mass with best-fit line Y = 1.41 - 0.00124X; R-Sq = 55.4%; the points (450, 0.85), (425, 0.82), and (425, 0.88) are marked]
What is the predicted metabolic rate for a mass of 450 g?
– 0.85 ml O₂ g⁻¹ h⁻¹, the point (450, 0.85) on the line
What is the predicted metabolic rate for a mass of 600 g?
What is the residual for a mass of 425 g and a metabolic rate of 0.82 ml O₂ g⁻¹ h⁻¹?
– Observed point (425, 0.82), predicted point (425, 0.88), so the residual is 0.82 - 0.88 = -0.06
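The predictions and the residual on this slide can be checked by plugging values into the reported line. The sketch below uses the slide's coefficients; the function name `predict_met_rate` is a hypothetical helper. The plot's mass axis appears to run from roughly 400 to 500 g, so the 600 g prediction is likely an extrapolation and should be read with caution.

```python
# Coefficients as reported on the slide: MetRate = 1.41 - 0.00124 * Mass.
INTERCEPT = 1.41
SLOPE = -0.00124

def predict_met_rate(mass_g):
    """Predicted resting metabolic rate for a body mass in grams."""
    return INTERCEPT + SLOPE * mass_g

pred_450 = predict_met_rate(450)   # about 0.85, matching the point (450, 0.85)
pred_600 = predict_met_rate(600)   # about 0.67, but 600 g is beyond the plotted masses
pred_425 = predict_met_rate(425)   # about 0.88, matching the point (425, 0.88)

# Residual = observed - predicted, for the observation (425, 0.82).
residual_425 = 0.82 - pred_425     # about -0.06

print(round(pred_450, 2), round(pred_600, 2), round(residual_425, 2))
```

The negative residual means this rabbit's observed metabolic rate lies below the line.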
16
One More Regression Statistic
r² = coefficient of determination
= proportion of the total variability in the response variable that is explained by knowing the value of the explanatory variable
17
Visualizing r²
[Figure: scatterplot with axes Height and Weight, marking the total variability in Y, the variability explained, and the variability remaining]
r² = Variability Explained / Total Variability in Y
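The ratio can be computed directly: total variability is the sum of squared deviations of y from its mean, remaining variability is the RSS around the best-fit line, and explained variability is the difference. The sketch below runs on made-up data (not the height and weight data in the figure), and all variable names are illustrative assumptions.

```python
# Hypothetical data, made up for illustration (not from the slides).
xs = [80, 90, 100, 110, 120]
ys = [85, 100, 105, 118, 127]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope and intercept for one explanatory variable.
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar

# Total variability in y: squared deviations from the mean of y.
total = sum((y - y_bar) ** 2 for y in ys)

# Variability remaining: squared residuals around the best-fit line.
remaining = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Variability explained is what the line accounts for.
explained = total - remaining
r_squared = explained / total

print(round(total, 2), round(remaining, 2), round(r_squared, 3))
```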
18
Characteristics of r²
What range of values can r² take?
– 0 ≤ r² ≤ 1
Which relationship is stronger -- r² = 0.5 or 0.9?
Which relationship gives “better” predictions -- r² = 0.5 or 0.9?
19
Example -- Rabbit Metabolic Rate
[Figure: scatterplot of Metabolic Rate versus Mass with best-fit line Y = 1.41 - 0.00124X; R-Sq = 55.4%]
What proportion of the variability in metabolic rate is explained by knowing mass?
– r² = 0.554
What is the correlation between metabolic rate and mass?
– r = -(0.554)^0.5 = -0.744 (negative because the slope of the line is negative)
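In simple linear regression, r is the square root of r² with the sign of the slope. A short sketch of that step, using the values reported on the slide:

```python
import math

r_squared = 0.554   # R-Sq = 55.4 % from the slide
slope = -0.00124    # slope of the best-fit line from the slide

# r has the same sign as the slope; its magnitude is the square root of r^2.
r = math.copysign(math.sqrt(r_squared), slope)

print(round(r, 3))
```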
20
Simple Linear Regression in R
Examine handout
– lm()
– rSquared()
– fitPlot()
– predict()
21
Regression is the Most Used and Most Abused Statistical Technique
Assumptions:
– A line adequately models the data
– Homoscedasticity: same scatter of points along the entire line
– Residuals at any given value of the explanatory variable are normally distributed
– Residuals at any given value of the explanatory variable are independent
Slide labels: Intro, Advanced
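These assumptions are usually judged by eye from a plot of the residuals. As a rough numerical stand-in for the homoscedasticity check only, the sketch below compares the typical size of the residuals in the lower and upper halves of the x range. The data are made up so that the scatter grows with x, and the "half versus half" comparison is an illustrative assumption rather than a formal test; it says nothing about normality or independence.

```python
# Hypothetical data whose scatter grows with x (made up for illustration).
xs = list(range(1, 11))
noise = [0.1, -0.1, 0.2, -0.2, 0.3, -1.0, 1.5, -2.0, 2.5, -3.0]
ys = [2 * x + e for x, e in zip(xs, noise)]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope and intercept for one explanatory variable.
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar

# Residual = observed y - predicted y, kept in order of increasing x.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

def mean_abs(values):
    """Average absolute value, used here as a simple measure of scatter."""
    return sum(abs(v) for v in values) / len(values)

half = n // 2
spread_low = mean_abs(residuals[:half])    # scatter at small x
spread_high = mean_abs(residuals[half:])   # scatter at large x

# A ratio far from 1 suggests the scatter is not the same along the line.
print(round(spread_low, 2), round(spread_high, 2),
      round(spread_high / spread_low, 1))
```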
22
A Line Models the Data
[Figure: four example scatterplots]
23
Homoscedasticity
[Figure: three example scatterplots]
24
r² doesn’t depend on x because of homoscedasticity
[Figure: scatterplot with axes Height and Weight, marking the total variability in Y, the variability explained, and the variability remaining]
25
Other Problems
Outliers
– a problem because the model does not fit that point
– may or may not be removed
Influential Points
– a point that would markedly change the line if it were removed
– typically an outlier in the x direction
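The effect of an influential point can be seen by fitting the line with and without it. In the sketch below, the data and the extra point (200, 100) are made up for illustration, and `least_squares` is a hypothetical helper using the closed-form slope and intercept.

```python
def least_squares(xs, ys):
    """Closed-form least-squares intercept and slope for one predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical data, made up for illustration (not from the slides).
xs = [80, 90, 100, 110, 120]
ys = [85, 100, 105, 118, 127]

# One extra point far out in the x direction that does not follow the trend.
xs_with = xs + [200]
ys_with = ys + [100]

intercept_without, slope_without = least_squares(xs, ys)
intercept_with, slope_with = least_squares(xs_with, ys_with)

# A single point with an extreme x value pulls the line toward itself.
print(round(slope_without, 3), round(slope_with, 3))
```

Here one point out of six flattens the slope almost to zero, which is why such points deserve a closer look before the line is interpreted.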