Chapter 11: Simple Linear Regression
McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression
Where We’re Going
Introduce the straight-line linear regression model as a means of relating one quantitative variable to another quantitative variable.
Introduce the correlation coefficient as a means of relating one quantitative variable to another quantitative variable.
Assess how well the simple linear regression model fits the sample data.
Use the simple linear regression model to predict the value of one variable given the value of another variable.
11.1: Probabilistic Models
There may be a deterministic relationship connecting two variables, y and x, but we may not know exactly what it is, or the connection between the variables may be imprecise or random. The unknown or unknowable influence is referred to as the random error. A probabilistic model therefore combines a specific connection between the variables with influences we cannot specify exactly in each case:
y = f(x) + random error
11.1: Probabilistic Models
The relationship between goals and attacks in soccer seems at first glance to be deterministic …
11.1: Probabilistic Models
But if you consider how often the attackers actually score and how many shots the keeper saves, the rigid model becomes more variable.
11.1: Probabilistic Models
General Form of Probabilistic Models
y = Deterministic part + Random error
where y is the variable of interest, and the mean value of the random error is assumed to be 0.
11.1: Probabilistic Models
The goal of linear regression analysis is to find the straight line that comes closest to all of the points in the scatter plot simultaneously.
11.1: Probabilistic Models
A Linear Probabilistic Model
y = β0 + β1x + random error
where
y = dependent variable
x = independent variable
β0 + β1x = deterministic component
β0 = y-intercept
β1 = slope of the line
11.1: Probabilistic Models
β0, the y-intercept, and β1, the slope of the line, are population parameters, and are unknown. Regression analysis is designed to estimate these parameters.
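A minimal Python sketch of the linear probabilistic model, for illustration only. The parameter values (β0 = 1, β1 = 2), the x values, and the use of normally distributed errors with standard deviation 0.5 are assumptions made here; the model itself only requires the random error to have mean 0.

```python
import random


def simulate_linear_model(beta0, beta1, xs, sigma, seed=0):
    """Return y = beta0 + beta1*x + random error for each x.

    The random error is drawn from a normal distribution with mean 0 and
    standard deviation sigma (an assumption made for this sketch).
    """
    rng = random.Random(seed)
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]


# Hypothetical parameter values and x values, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
deterministic = [1.0 + 2.0 * x for x in xs]
ys = simulate_linear_model(beta0=1.0, beta1=2.0, xs=xs, sigma=0.5)

for x, d, y in zip(xs, deterministic, ys):
    print(f"x={x}  deterministic part={d:.2f}  y={y:.2f}  random error={y - d:+.2f}")
```

In practice β0 and β1 are unknown; the simulation fixes them only to show how the deterministic part and the random error combine to produce y.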
11.2: Fitting the Model: The Least Squares Approach
Values on the line are the predicted values of total offerings given the average offering. The vertical distances between the scattered dots and the line are the errors of prediction. The line is chosen to minimize the sum of the squared errors of prediction; the method of finding this line is called the method of least squares.
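11.2: Fitting the Model: The Least Squares Approach
Estimates: \( \hat{\beta}_1 = SS_{xy} / SS_{xx} \) and \( \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \), where \( SS_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) \) and \( SS_{xx} = \sum (x_i - \bar{x})^2 \)
Deviation or prediction error: \( y_i - \hat{y}_i \), where \( \hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i \)
SSE: \( \sum (y_i - \hat{y}_i)^2 \)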
11.2: Fitting the Model: The Least Squares Approach
The least squares line is the line that has the following two properties:
The sum of the errors (SE) equals 0.
The sum of squared errors (SSE) is smaller than that for any other straight-line model.
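The two properties of the least squares line can be checked numerically. The sketch below uses the standard least squares formulas (slope = SSxy / SSxx, intercept = mean of y minus slope times mean of x) on a small toy data set. The data and the function names are invented for this illustration and do not come from the text.

```python
def least_squares_fit(xs, ys):
    """Return (b0, b1), the least squares intercept and slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sum_of_squared_errors(xs, ys, b0, b1):
    """Return SSE for the line y = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


# Toy data invented for this sketch (not from the text).
xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]

b0, b1 = least_squares_fit(xs, ys)
errors = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
sse = sum_of_squared_errors(xs, ys, b0, b1)

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
print(f"sum of errors = {sum(errors):.6f}")
print(f"SSE = {sse:.2f}")

# A line with a slightly different slope has a larger SSE.
print(f"SSE with slope b1 + 0.1 = {sum_of_squared_errors(xs, ys, b0, b1 + 0.1):.2f}")
```

The sum of the errors is zero up to floating-point rounding, and perturbing either the slope or the intercept increases SSE.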
11.5: The Correlation Coefficient
The correlation coefficient, r, is a measure of the strength of the linear relationship between two variables. It is computed as follows: \( r = SS_{xy} / \sqrt{SS_{xx} \, SS_{yy}} \), where \( SS_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) \), \( SS_{xx} = \sum (x_i - \bar{x})^2 \) and \( SS_{yy} = \sum (y_i - \bar{y})^2 \)
An r value that is close to zero suggests there may not be a linear relationship between the variables.
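As a rough illustration, the correlation coefficient can be computed from the sums of squares SSxy, SSxx and SSyy. The function name and all four data sets below are invented for this sketch.

```python
import math


def correlation_coefficient(xs, ys):
    """Return r = SSxy / sqrt(SSxx * SSyy)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    return ss_xy / math.sqrt(ss_xx * ss_yy)


# Hypothetical data sets, chosen only to show the range of r.
xs = [1, 2, 3, 4, 5]
r_rising = correlation_coefficient(xs, [2, 4, 6, 8, 10])   # points on a rising line
r_falling = correlation_coefficient(xs, [10, 8, 6, 4, 2])  # points on a falling line
r_curved = correlation_coefficient(xs, [2, 1, 0, 1, 2])    # symmetric V shape
r_scatter = correlation_coefficient(xs, [1, 1, 2, 2, 4])   # rising, with scatter

print(f"rising line:  r = {r_rising:+.3f}")
print(f"falling line: r = {r_falling:+.3f}")
print(f"V shape:      r = {r_curved:+.3f}")
print(f"scattered:    r = {r_scatter:+.3f}")
```

The V-shaped data set shows why r near zero only rules out a linear relationship: y clearly depends on x there, but not along a straight line.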
11.5: The Correlation Coefficient
[Figure: three scatter plots of y against x. Positive linear relationship: r → +1. No linear relationship: r ≈ 0. Negative linear relationship: r → -1.]
Values of r equal to +1 or -1 require each point in the scatter plot to lie on a single straight line.
11.5: The Correlation Coefficient
High |r|:
  x provides important information about y.
  Predictions based on the model are more accurate.
Low |r|:
  Knowing values of x does not substantially improve predictions of y.
  There may be no relationship between x and y, or it may be more subtle than a linear relationship.
Predict values of y with the mean of y if no other information is available.
Predict values of y|x based on a hypothesized linear relationship.
Evaluate the power of x to predict values of y with the correlation coefficient.
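One way to see how much x helps is to compare the squared prediction errors made when every y is predicted by the mean of y with those made by the least squares line. The sketch below does this on toy data invented for the illustration. In simple linear regression the proportional reduction in squared error equals r squared, so a high |r| means a large improvement.

```python
def fit_line(xs, ys):
    """Return (b0, b1), the least squares intercept and slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = ss_xy / ss_xx
    return y_bar - b1 * x_bar, b1


# Toy data invented for this sketch (not from the text).
xs = [1, 2, 3, 4, 5]
ys = [1, 1, 2, 2, 4]

y_bar = sum(ys) / len(ys)
b0, b1 = fit_line(xs, ys)

# Squared error when every y is predicted by the mean of y.
ss_yy = sum((y - y_bar) ** 2 for y in ys)
# Squared error when each y is predicted from x with the fitted line.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

reduction = 1 - sse / ss_yy
print(f"squared error using the mean of y: {ss_yy:.2f}")
print(f"squared error using the line:      {sse:.2f}")
print(f"proportional reduction:            {reduction:.3f}")
```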
11.7: A Complete Example
A fire insurance company wants to relate the amount of fire damage in major residential fires in a city to the distance to the nearest fire station. A sample of 15 recent fires in the city is selected. The amount of damage (y, measured in thousands of dollars) and the distance between the fire and the nearest fire station (x, measured in miles) are recorded for each fire.
11.7: A Complete Example
How does the proximity of a residential fire to the fire station (x) affect the damages (y) from the fire?
y = f(x)
y = β0 + β1x + error
The data (found in Table 11.7, p. 610) produce the following straight line.
11.7: A Complete Example
The data produce the following estimates (in thousands of dollars): \( \hat{\beta}_0 = 10.28 \) and \( \hat{\beta}_1 = 4.92 \)
So the estimated damage equals $10,280 plus $4,920 for each mile from the nearest fire station, or \( \hat{y} = 10.28 + 4.92x \)
The coefficient of correlation, r, is r = 0.96
11.7: A Complete Example
Suppose we have a new residential fire whose distance from the nearest fire station is 3.5 miles. We can estimate the damage with the model estimates: \( \hat{y} = 10.28 + 4.92(3.5) = 27.5 \)
The damage due to a residential fire 3.5 miles from the nearest station will be about $27,500.
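The prediction can be reproduced with a few lines of Python. The sketch uses the estimates given in the example ($10,280 plus $4,920 per mile, written in thousands of dollars); the function and constant names are chosen here for illustration.

```python
# Estimates from the example, in thousands of dollars.
INTERCEPT = 10.28  # estimated y-intercept
SLOPE = 4.92       # estimated increase in damage per mile


def predict_damage(distance_miles):
    """Return the estimated damage, in thousands of dollars."""
    return INTERCEPT + SLOPE * distance_miles


damage = predict_damage(3.5)
print(f"Estimated damage: about ${damage * 1000:,.0f}")
```

The estimate is only as reliable as the fitted line, so it is best applied to distances within the range covered by the 15 sampled fires.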