Section 5.2: Linear Regression: Fitting a Line to Bivariate Data
Dependent (response) variable: the y variable.
Independent (predictor, or explanatory) variable: the x variable.
The relationship y = a + bx is the equation of a straight line. The value b, called the slope of the line, is the amount by which y increases when x increases by 1 unit. The value a, called the intercept (or sometimes the y-intercept or vertical intercept) of the line, is the height of the line above the value x = 0.
Example: For the line y = 7 + 3x, the intercept is a = 7; when x increases by 1 unit, y increases by b = 3.
Example: For the line y = 17 − 4x, the intercept is a = 17; when x increases by 1 unit, y changes by b = −4 (i.e., y decreases by 4).
Least-Squares Lines
The most widely used criterion for measuring the goodness of fit of a line y = a + bx to bivariate data (x₁, y₁), …, (xₙ, yₙ) is the sum of the squared deviations about the line:

  Σ [yᵢ − (a + bxᵢ)]²,  summed over i = 1, …, n

The line that gives the best fit to the data is the one that minimizes this sum; it is called the least-squares line or the sample regression line.
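The criterion above can be sketched in a few lines of code. This is a minimal illustration, not part of the lecture: the data points are hypothetical, chosen only to show that a line close to the least-squares fit gives a smaller sum of squared deviations than a poorly chosen line.

```python
# Hypothetical data points for illustration (not from the lecture).
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 4, 7, 8]

def sse(a, b):
    """Sum of squared vertical deviations of the points from y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# The least-squares line is the (a, b) pair that minimizes this sum.
# For these points the best fit is a = 1.8, b = 1.2; any other line
# produces a larger sum of squared deviations.
print(sse(1.8, 1.2))  # small: near the best-fitting line
print(sse(0.0, 0.0))  # large: a poor fit
```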
Coefficients a and b
The slope of the least-squares line is

  b = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²

and the y-intercept is

  a = ȳ − b x̄

We write the equation of the least-squares line as

  ŷ = a + bx

where the ^ above y (read as "y-hat") indicates a prediction of y resulting from the substitution of a particular x value into the equation.
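The deviation formulas for b and a translate directly into code. Again a hedged sketch with hypothetical data; the prediction step at the end shows ŷ being obtained by substituting a particular x value into the fitted equation.

```python
# Least-squares slope and intercept via the deviation formulas:
#   b = sum((x - xbar)*(y - ybar)) / sum((x - xbar)**2)
#   a = ybar - b * xbar
# Hypothetical data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 4, 7, 8]

n = len(xs)
xbar = sum(xs) / n
ybar = sum(ys) / n

b = (sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
     / sum((x - xbar) ** 2 for x in xs))
a = ybar - b * xbar

print(f"y-hat = {a:.2f} + {b:.2f}x")

# Prediction: substitute a particular x value into the equation.
x_new = 6
y_hat = a + b * x_new
print(y_hat)
```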
Calculating Formula for the Slope of the Least-Squares Line

  b = [Σxy − (Σx)(Σy)/n] / [Σx² − (Σx)²/n]

This shortcut form is algebraically equivalent to the deviation formula but requires only the running totals Σx, Σy, Σxy, and Σx².
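The calculating formula can be checked against the deviation formula; both should give the same slope. A minimal sketch, using the same hypothetical data as above:

```python
# Shortcut (calculating) formula for the least-squares slope:
#   b = (sum(x*y) - sum(x)*sum(y)/n) / (sum(x**2) - sum(x)**2 / n)
# Hypothetical data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 4, 7, 8]
n = len(xs)

sxy = sum(x * y for x, y in zip(xs, ys))  # sum of x*y
sx, sy = sum(xs), sum(ys)                 # sums of x and of y
sxx = sum(x * x for x in xs)              # sum of x squared

b = (sxy - sx * sy / n) / (sxx - sx ** 2 / n)
print(b)  # same slope as the deviation formula gives
```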
Example: Greyhound
Calculations
Classwork
Activity: Exploring Correlation and Regression