Chapter 11: Simple Linear Regression

McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression
Where We’ve Been
- Presented methods for estimating and testing population parameters for a single sample
- Extended those methods to allow for a comparison of population parameters for multiple samples

Where We’re Going
- Introduce the straight-line linear regression model as a means of relating one quantitative variable to another quantitative variable
- Introduce the correlation coefficient as a means of relating one quantitative variable to another quantitative variable
- Assess how well the simple linear regression model fits the sample data
- Use the simple linear regression model to predict the value of one variable given the value of another variable

11.1: Probabilistic Models
There may be a deterministic reality connecting two variables, y and x. But we may not know exactly what that reality is, or there may be an imprecise, or random, connection between the variables. The unknown/unknowable influence is referred to as the random error. So our probabilistic models refer to a specific connection between variables, as well as influences we can’t specify exactly in each case:
y = f(x) + random error

11.1: Probabilistic Models
The relationship between home runs and runs in baseball seems at first glance to be deterministic …

11.1: Probabilistic Models
But if you consider how many runners are on base when the home run is hit, or even how often the batter misses a base and is called out, the rigid model becomes more variable.

11.1: Probabilistic Models
General Form of Probabilistic Models
y = Deterministic component + Random error
where y is the variable of interest, and the mean value of the random error is assumed to be 0, so that
E(y) = Deterministic component.

11.1: Probabilistic Models
The goal of regression analysis is to find the straight line that comes closest to all of the points in the scatter plot simultaneously.

11.1: Probabilistic Models
A First-Order Probabilistic Model
$y = \beta_0 + \beta_1 x + \varepsilon$
where
y = dependent variable
x = independent variable
$\beta_0 + \beta_1 x$ = E(y) = deterministic component
$\varepsilon$ = random error component
$\beta_0$ = y-intercept
$\beta_1$ = slope of the line

11.1: Probabilistic Models 0, the y – intercept, and 1, the slope of the line, are population parameters, and invariably unknown. Regression analysis is designed to estimate these parameters. McClave: Statistics, 11th ed. Chapter 11: Simple Linear Regression

11.2: Fitting the Model: The Least Squares Approach
Step 1: Hypothesize the deterministic component of the probabilistic model, $E(y) = \beta_0 + \beta_1 x$.
Step 2: Use sample data to estimate the unknown parameters in the model.

11.2: Fitting the Model: The Least Squares Approach
Values on the line are the predicted values of total offerings given the average offering. The distances between the scattered dots and the line are the errors of prediction. The line’s estimated parameters are the values that minimize the sum of the squared errors of prediction, and the method of finding those values is called the method of least squares.

11.2: Fitting the Model: The Least Squares Approach
Estimates: $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$
Deviation: $y_i - \hat{y}_i$
SSE: $\sum (y_i - \hat{y}_i)^2$

11.2: Fitting the Model: The Least Squares Approach
The least squares line is the line that has the following two properties:
1. The sum of the errors (SE) equals 0.
2. The sum of squared errors (SSE) is smaller than that for any other straight-line model.
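
The two properties above can be checked numerically. The following is a minimal sketch, assuming the standard closed-form estimates (slope = SSxy / SSxx, intercept = mean of y minus slope times mean of x); the five data points and the function names `least_squares_fit` and `sse` are illustrative and do not come from the slides.

```python
def least_squares_fit(x, y):
    """Return (b0, b1): the least squares intercept and slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sse(x, y, b0, b1):
    """Sum of squared errors of prediction for the line b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


# Illustrative data only (an assumption, not taken from the slides).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 1.0, 2.0, 2.0, 4.0]

b0, b1 = least_squares_fit(x, y)

# Property 1: the errors of prediction sum to zero (up to rounding).
sum_of_errors = sum(yi - (b0 + b1 * xi) for xi, yi in zip(x, y))

# Property 2: nudging either coefficient away from the fit increases SSE.
best_sse = sse(x, y, b0, b1)
nudged = [
    sse(x, y, b0 + d0, b1 + d1)
    for d0 in (-0.1, 0.0, 0.1)
    for d1 in (-0.1, 0.0, 0.1)
    if (d0, d1) != (0.0, 0.0)
]

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"sum of errors = {sum_of_errors:.2e}, SSE = {best_sse:.3f}")
print(f"smallest SSE among nudged lines = {min(nudged):.3f}")
```

Checking a handful of nearby lines does not prove the minimum, but it shows the behavior the second property describes.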

11.2: Fitting the Model: The Least Squares Approach
Can home runs be used to predict errors? Is there a relationship between the number of home runs a team hits and the quality of its fielding?

11.2: Fitting the Model: The Least Squares Approach
Home Runs (x)   Errors (y)   $x_i^2$   $x_i y_i$
158             126          24964     19908
155             87           24025     13485
139             65           19321     9035
191             95           36481     18145
124             119          15376     14756
$\sum x_i = 767$   $\sum y_i = 492$   $\sum x_i^2 = 120167$   $\sum x_i y_i = 75329$
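
As a cross-check on the table, the sketch below recomputes the column sums directly from the five (x, y) pairs and then applies the usual shortcut formulas SSxx = Σx² − (Σx)²/n and SSxy = Σxy − (Σx)(Σy)/n. The variable names are illustrative, and the shortcut formulas are assumed here because the slide that presumably showed them did not survive extraction.

```python
home_runs = [158, 155, 139, 191, 124]  # x, from the table
errors = [126, 87, 65, 95, 119]        # y, from the table
n = len(home_runs)

# Column sums recomputed from the raw pairs.
sum_x = sum(home_runs)
sum_y = sum(errors)
sum_x2 = sum(x * x for x in home_runs)
sum_xy = sum(x * y for x, y in zip(home_runs, errors))

# Shortcut formulas for the sums of squares.
ss_xx = sum_x2 - sum_x ** 2 / n
ss_xy = sum_xy - sum_x * sum_y / n

# Least squares estimates of the slope and intercept.
b1 = ss_xy / ss_xx
b0 = sum_y / n - b1 * (sum_x / n)

print(f"sum x = {sum_x}, sum y = {sum_y}, sum x^2 = {sum_x2}, sum xy = {sum_xy}")
print(f"SSxx = {ss_xx:.1f}, SSxy = {ss_xy:.1f}")
print(f"slope = {b1:.4f}, intercept = {b0:.2f}")
```

The resulting SSxx and SSxy agree with the values quoted later in the deck (2509 and -143.8).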

11.2: Fitting the Model: The Least Squares Approach
These results suggest that teams that hit more home runs are (slightly) better fielders (maybe not what we expected). There are, however, only five observations in the sample. It is important to take a closer look at the assumptions we made and the results we got.

11.3: Model Assumptions
Assumptions
1. The mean of the probability distribution of $\varepsilon$ is 0.
2. The variance, $\sigma^2$, of the probability distribution of $\varepsilon$ is constant.
3. The probability distribution of $\varepsilon$ is normal.
4. The values of $\varepsilon$ associated with any two values of y are independent.

11.3: Model Assumptions
The variance, $\sigma^2$, is used in every test statistic and confidence interval used to evaluate the model. Invariably, $\sigma^2$ is unknown and must be estimated.
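
A minimal sketch of that estimation step for the home runs and errors data, assuming the usual estimator $s^2 = SSE / (n - 2)$ (the slide that presumably stated it did not survive extraction). Two degrees of freedom are lost because two parameters, the intercept and the slope, are estimated from the data. Variable names are illustrative.

```python
import math

home_runs = [158, 155, 139, 191, 124]  # x, from the deck's table
errors = [126, 87, 65, 95, 119]        # y, from the deck's table
n = len(home_runs)

# Least squares fit.
x_bar = sum(home_runs) / n
y_bar = sum(errors) / n
ss_xx = sum((x - x_bar) ** 2 for x in home_runs)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(home_runs, errors))
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

# Sum of squared errors of prediction around the fitted line.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(home_runs, errors))

# Estimated variance and standard deviation of the random error.
s_squared = sse / (n - 2)
s = math.sqrt(s_squared)

print(f"SSE = {sse:.1f}, s^2 = {s_squared:.1f}, s = {s:.2f}")
```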

11.4: Assessing the Utility of the Model: Making Inferences about the Slope $\beta_1$
Note: There may be many different patterns in the scatter plot when there is no linear relationship.

11.4: Assessing the Utility of the Model: Making Inferences about the Slope $\beta_1$
A critical step in the evaluation of the model is to test whether $\beta_1 = 0$:
$H_0: \beta_1 = 0$
$H_a: \beta_1 \neq 0$
[Figure: three scatter plots of y against x. Positive relationship: $\beta_1 > 0$. No relationship: $\beta_1 = 0$. Negative relationship: $\beta_1 < 0$.]

11.4: Assessing the Utility of the Model: Making Inferences about the Slope $\beta_1$
The four assumptions described above produce a normal sampling distribution for the slope estimate $\hat{\beta}_1$, with mean $\beta_1$ and standard deviation $\sigma_{\hat{\beta}_1} = \sigma / \sqrt{SS_{xx}}$. Because $\sigma$ is unknown, it is replaced by s, giving $s_{\hat{\beta}_1} = s / \sqrt{SS_{xx}}$, called the estimated standard error of the least squares slope estimate.

11.4: Assessing the Utility of the Model: Making Inferences about the Slope $\beta_1$
Here $\hat{y}$ = E(Errors|HRs).
Home Runs (x)   Errors (y)   $x_i - \bar{x}$   $(x_i - \bar{x})^2$   $\hat{y}_i$   $y_i - \hat{y}_i$   $(y_i - \hat{y}_i)^2$
158             126          4.6               21.16                 98.14         27.86               776.4
155             87           1.6               2.56                  98.31         -11.31              127.9
139             65           -14.4             207.4                 99.23         -34.23              1171
191             95           37.6              1414                  96.25         -1.245              1.55
124             119          -29.4             864.4                 100.1         18.92               357.8
$\sum x_i = 767$, $\sum y_i = 492$, $\bar{x} = 153.4$, $SS_{xx} = 2509$, SSE = 2435
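
The test statistic itself did not survive extraction. The sketch below assumes the standard form, t = (slope estimate) / (estimated standard error of the slope), with the standard error computed as $s / \sqrt{SS_{xx}}$, and compares it with the critical value 3.182 that the deck quotes for 3 degrees of freedom. Variable names are illustrative.

```python
import math

home_runs = [158, 155, 139, 191, 124]  # x, from the table
errors = [126, 87, 65, 95, 119]        # y, from the table
n = len(home_runs)
T_CRIT = 3.182  # two-tailed 5% critical value for 3 df, as quoted in the deck

# Least squares fit.
x_bar = sum(home_runs) / n
y_bar = sum(errors) / n
ss_xx = sum((x - x_bar) ** 2 for x in home_runs)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(home_runs, errors))
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

# Estimated standard deviation of the random error.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(home_runs, errors))
s = math.sqrt(sse / (n - 2))

# Estimated standard error of the slope and the t statistic for H0: beta1 = 0.
se_b1 = s / math.sqrt(ss_xx)
t_stat = b1 / se_b1
reject_h0 = abs(t_stat) > T_CRIT

print(f"slope = {b1:.4f}, standard error = {se_b1:.3f}, t = {t_stat:.3f}")
print("reject H0" if reject_h0 else "do not reject H0")
```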

11.4: Assessing the Utility of the Model: Making Inferences about the Slope $\beta_1$
Since the t-value does not lead to rejection of the null hypothesis, we can conclude only that one of the following may hold:
- A different set of data may yield different results.
- There is a more complicated relationship.
- There is no relationship (non-rejection does not lead to this conclusion automatically).

11.4: Assessing the Utility of the Model: Making Inferences about the Slope $\beta_1$
Interpreting p-Values for $\beta$
Software packages report two-tailed p-values. To conduct one-tailed tests of hypotheses, the reported p-values must be adjusted.
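
The adjustment rule itself is missing from the transcript. A common convention, assumed here, is to halve the two-tailed p-value when the sign of the t statistic agrees with the alternative hypothesis and to use one minus that half otherwise. The function name `one_tailed_p` and its arguments are hypothetical.

```python
def one_tailed_p(two_tailed_p, t_stat, alternative):
    """Convert a reported two-tailed p-value to a one-tailed p-value.

    alternative: "greater" for Ha: beta1 > 0, "less" for Ha: beta1 < 0.
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    half = two_tailed_p / 2
    if alternative == "greater":
        sign_agrees = t_stat > 0
    else:
        sign_agrees = t_stat < 0
    # If the estimate points the "wrong" way, the evidence for Ha is weak.
    return half if sign_agrees else 1 - half


# Illustrative values (assumptions, not taken from the slides).
p_upper = one_tailed_p(0.04, 2.5, "greater")
p_lower = one_tailed_p(0.04, 2.5, "less")
print(p_upper, p_lower)
```

Under this rule, a reported two-tailed p-value below .0001 with a positive slope estimate gives an upper-tailed p-value below .00005, which matches the fire damage example in 11.7.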

11.4: Assessing the Utility of the Model: Making Inferences about the Slope $\beta_1$
A Confidence Interval on $\beta_1$
$\hat{\beta}_1 \pm t_{\alpha/2} \, s_{\hat{\beta}_1}$
where the estimated standard error is $s_{\hat{\beta}_1} = s / \sqrt{SS_{xx}}$ and $t_{\alpha/2}$ is based on (n - 2) degrees of freedom.

11.4: Assessing the Utility of the Model: Making Inferences about the Slope $\beta_1$
In the home runs and errors example, the estimated $\beta_1$ was $SS_{xy} / SS_{xx} = -143.8 / 2509 = -.0573$, and the estimated standard error was .569. With 3 degrees of freedom, $t_{.025} = 3.182$. The confidence interval is, therefore, $-.0573 \pm 3.182(.569)$, or approximately (-1.87, 1.75), which includes 0, so there may be no relationship between the two variables.
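
The interval can be recomputed from the five observations. This sketch assumes the interval has the standard form (slope estimate) ± t × (estimated standard error) and takes the critical value 3.182 from the deck rather than computing it; variable names are illustrative.

```python
import math

home_runs = [158, 155, 139, 191, 124]  # x, from the deck's table
errors = [126, 87, 65, 95, 119]        # y, from the deck's table
n = len(home_runs)
T_CRIT = 3.182  # t value for 3 degrees of freedom, as quoted in the deck

# Least squares fit.
x_bar = sum(home_runs) / n
y_bar = sum(errors) / n
ss_xx = sum((x - x_bar) ** 2 for x in home_runs)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(home_runs, errors))
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

# Estimated standard error of the slope.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(home_runs, errors))
s = math.sqrt(sse / (n - 2))
se_b1 = s / math.sqrt(ss_xx)

# 95% confidence interval on the slope.
lower = b1 - T_CRIT * se_b1
upper = b1 + T_CRIT * se_b1
contains_zero = lower <= 0.0 <= upper

print(f"slope = {b1:.4f}, interval = ({lower:.2f}, {upper:.2f})")
print(f"interval contains 0: {contains_zero}")
```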

11.5: The Coefficients of Correlation and Determination
The coefficient of correlation, r, is a measure of the strength of the linear relationship between two variables. It is computed as follows:
$r = \dfrac{SS_{xy}}{\sqrt{SS_{xx} \, SS_{yy}}}$

11.5: The Coefficients of Correlation and Determination
[Figure: three scatter plots of y against x. Positive linear relationship: $r \to +1$. No linear relationship: $r \approx 0$. Negative linear relationship: $r \to -1$.]
Values of r equal to +1 or -1 require each point in the scatter plot to lie on a single straight line.

11.5: The Coefficients of Correlation and Determination
In the example about home runs and errors, $SS_{xy} = -143.8$ and $SS_{xx} = 2509$. $SS_{yy}$ is computed as
$SS_{yy} = \sum y_i^2 - \dfrac{(\sum y_i)^2}{n} = 50856 - \dfrac{(492)^2}{5} = 2443.2$
so
$r = \dfrac{-143.8}{\sqrt{(2509)(2443.2)}} \approx -.058$
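
A short sketch of the same calculation from the raw data, assuming the standard definition of r as SSxy divided by the square root of SSxx times SSyy. Variable names are illustrative.

```python
import math

home_runs = [158, 155, 139, 191, 124]  # x, from the deck's table
errors = [126, 87, 65, 95, 119]        # y, from the deck's table
n = len(home_runs)

x_bar = sum(home_runs) / n
y_bar = sum(errors) / n

# Sums of squares and cross-products about the means.
ss_xx = sum((x - x_bar) ** 2 for x in home_runs)
ss_yy = sum((y - y_bar) ** 2 for y in errors)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(home_runs, errors))

# Coefficient of correlation.
r = ss_xy / math.sqrt(ss_xx * ss_yy)

print(f"SSxy = {ss_xy:.1f}, SSxx = {ss_xx:.1f}, SSyy = {ss_yy:.1f}")
print(f"r = {r:.3f}")
```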

11.5: The Coefficients of Correlation and Determination
An r value this close to zero suggests there may not be a linear relationship between the variables, which is consistent with our earlier look at the null hypothesis and the confidence interval on $\beta_1$.

11.5: The Coefficients of Correlation and Determination
The coefficient of determination, $r^2$, represents the proportion of the total sample variability around the mean of y that is explained by the linear relationship between x and y.
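
To make "proportion of variability explained" concrete, the sketch below computes the coefficient of determination for the home runs and errors data in two ways: as (SSyy − SSE) / SSyy, which is the standard definition assumed here, and as the square of r. In simple linear regression the two agree. Variable names are illustrative.

```python
import math

home_runs = [158, 155, 139, 191, 124]  # x, from the deck's table
errors = [126, 87, 65, 95, 119]        # y, from the deck's table
n = len(home_runs)

# Least squares fit.
x_bar = sum(home_runs) / n
y_bar = sum(errors) / n
ss_xx = sum((x - x_bar) ** 2 for x in home_runs)
ss_yy = sum((y - y_bar) ** 2 for y in errors)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(home_runs, errors))
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

# Variability left unexplained by the fitted line.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(home_runs, errors))

# Proportion of the total variability in y explained by x.
r_squared = (ss_yy - sse) / ss_yy

# Same quantity via the coefficient of correlation.
r = ss_xy / math.sqrt(ss_xx * ss_yy)

print(f"r^2 from sums of squares = {r_squared:.4f}")
print(f"r^2 from r               = {r ** 2:.4f}")
```

For these five teams, home runs explain well under 1% of the sample variability in errors.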

11.5: The Coefficients of Correlation and Determination
High $r^2$:
- x provides important information about y
- Predictions are more accurate based on the model
Low $r^2$:
- Knowing values of x does not substantially improve predictions on y
- There may be no relationship between x and y, or it may be more subtle than a linear relationship
Predict values of y with the mean of y if no other information is available.
Predict values of y|x based on a hypothesized linear relationship.
Evaluate the power of x to predict values of y with the coefficient of determination.

11.6: Using the Model for Estimation and Prediction
Statistical inference based on the linear regression model:
- Estimate the mean of y for a specific value of x, E(y)|x (over many experiments with this x-value)
- Estimate an individual value of y for a given x value (for a single experiment with this value of x)

11.6: Using the Model for Estimation and Prediction
Based on our model results, a team that hits 140 home runs is expected to make about 99.2 errors:
$\hat{y} = 107.19 - .0573(140) \approx 99.2$

11.6: Using the Model for Estimation and Prediction
A 95% Prediction Interval for an Individual Team’s Errors
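
The interval formulas did not survive extraction. The sketch below assumes the standard forms: for the mean of y at $x_p$, the half-width is $t \cdot s \sqrt{1/n + (x_p - \bar{x})^2 / SS_{xx}}$, and for an individual y the term under the square root gains an extra 1. It evaluates both at 140 home runs, using the critical value 3.182 quoted in the deck for 3 degrees of freedom. Variable names are illustrative.

```python
import math

home_runs = [158, 155, 139, 191, 124]  # x, from the deck's table
errors = [126, 87, 65, 95, 119]        # y, from the deck's table
n = len(home_runs)
T_CRIT = 3.182  # t value for 3 degrees of freedom, as quoted in the deck
X_P = 140       # home runs for which we estimate and predict

# Least squares fit and estimated error standard deviation.
x_bar = sum(home_runs) / n
y_bar = sum(errors) / n
ss_xx = sum((x - x_bar) ** 2 for x in home_runs)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(home_runs, errors))
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(home_runs, errors))
s = math.sqrt(sse / (n - 2))

# Point estimate at X_P.
y_hat = b0 + b1 * X_P

# Half-widths: confidence interval for the mean vs. prediction interval.
leverage = 1 / n + (X_P - x_bar) ** 2 / ss_xx
ci_half = T_CRIT * s * math.sqrt(leverage)
pi_half = T_CRIT * s * math.sqrt(1 + leverage)

print(f"estimate at x = {X_P}: {y_hat:.1f}")
print(f"95% CI for the mean:      {y_hat - ci_half:.1f} to {y_hat + ci_half:.1f}")
print(f"95% PI for an individual: {y_hat - pi_half:.1f} to {y_hat + pi_half:.1f}")
```

With only five observations and a large s, both intervals are very wide, and the prediction interval even extends below zero errors, which is another sign that this model has little practical value.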

11.6: Using the Model for Estimation and Prediction
Prediction intervals for individual new values of y are wider than confidence intervals on the mean of y because of the extra source of error.
Error in predicting a mean value of $y|x_p$ = error in $E(y|x_p)$
Error in predicting a specific value of $y|x_p$ = error in $E(y|x_p)$ + sampling error from the y population

11.6: Using the Model for Estimation and Prediction
Estimating y beyond the range of values associated with the observed values of x can lead to large prediction errors. Beyond the range of observed x values, the relationship may look very different.
[Figure: estimated relationship versus true relationship, with the range of observed values of x marked between $X_i$ and $X_j$.]

11.7: A Complete Example
Step 1: How does the proximity of a fire house (x) affect the damages (y) from a fire?
y = f(x)
$y = \beta_0 + \beta_1 x + \varepsilon$

11.7: A Complete Example
Step 2: The data (found in Table 11.7) produce estimates, in thousands of dollars, of about 10.28 for the intercept and 4.91 for the slope. The estimated damages equal $10,280 + $4,910 for each mile from the fire station, or
\hat{y} = 10.28 + 4.91x

11.7: A Complete Example
Step 3: The estimate of the standard deviation, $\sigma$, of $\varepsilon$ is s = 2.31635. Most of the observed fire damages will be within $2s \approx 4.63$ thousand dollars of the predicted value.

11.7: A Complete Example
Step 4: Test that the true slope is 0. SAS automatically performs a two-tailed test, with a reported p-value < .0001. The one-tailed p-value is < .00005, which provides strong evidence to reject the null.

11.7: A Complete Example
Step 4: A 95% confidence interval on $\beta_1$ from the SAS output is $4.071 \le \beta_1 \le 5.768$. The coefficient of determination, $r^2$, is .9235. The coefficient of correlation, r, is $\sqrt{.9235} \approx .96$, taken as positive because the estimated slope is positive.

11.7: A Complete Example
Suppose the distance from the nearest station is 3.5 miles. We can estimate the damage with the model estimates. We’re 95% sure the damage for a fire 3.5 miles from the nearest station will be between $22,324 and $32,667. Since the x-values in our sample range from .7 to 6.1, predictions about y for x-values beyond this range will be unreliable.
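
A minimal sketch of the point estimate behind that interval, using the coefficients as rounded on the Step 2 slide ($10,280 plus $4910 per mile) and refusing to extrapolate outside the observed range of .7 to 6.1 miles. The function name `predict_damage` and the choice to raise an error on extrapolation are illustrative assumptions, not part of the original example.

```python
B0 = 10.28               # intercept, thousands of dollars (rounded on the slide)
B1 = 4.91                # slope, thousands of dollars per mile (rounded on the slide)
X_MIN, X_MAX = 0.7, 6.1  # range of distances observed in the sample, in miles


def predict_damage(miles):
    """Estimated fire damage, in thousands of dollars, at a given distance."""
    if not X_MIN <= miles <= X_MAX:
        # Outside the sampled range the fitted line may not describe reality.
        raise ValueError(f"{miles} miles is outside the observed range")
    return B0 + B1 * miles


estimate = predict_damage(3.5)
print(f"estimated damage at 3.5 miles: {estimate:.3f} thousand dollars")

# The point estimate should fall inside the reported 95% interval.
inside_interval = 22.324 < estimate < 32.667
print(f"inside the reported interval: {inside_interval}")
```

The estimate sits slightly below the midpoint of the reported interval, presumably because the slide rounds the coefficients more coarsely than the software output did.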