Class 16: Thursday, Nov. 4 Note: I will e-mail you some info on the final project this weekend and will discuss it in class on Tuesday.

Presentation transcript:

Class 16: Thursday, Nov. 4 Note: I will e-mail you some info on the final project this weekend and will discuss it in class on Tuesday.

Predicting Emergency Calls to the AAA Club

R-Squared As in simple linear regression, R-squared measures the proportion of the variability in Y explained by the regression of Y on these X's. It lies between 0 and 1; values nearer 1 indicate that more of the variability is explained. Don't get excited just because R-squared has increased when you add more variables to the model: adding another explanatory variable will always increase R-squared. The right question to ask is not whether R-squared has increased when we add an explanatory variable to a model, but whether it has increased by a useful amount. The t-statistic and the associated p-value of the t-test for each coefficient answer this question.
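The slide gives the definition in words; as a formula (standard notation, not shown on the slide):

R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},

where \hat{y}_i are the predicted values from the regression and \bar{y} is the sample mean of Y. The numerator is the residual sum of squares, which can never increase when an explanatory variable is added; that is why R-squared always goes up.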

Overall F-test Test of whether any of the predictors are useful: H_0: \beta_1 = \beta_2 = \cdots = \beta_K = 0 vs. H_a: at least one of \beta_1, \ldots, \beta_K does not equal zero. Tests whether the model provides better predictions than the sample mean of Y. The p-value for the test is Prob>F in the Analysis of Variance table. Here p-value = 0.005: strong evidence that at least one of the predictors is useful for predicting ERS for the New York AAA club.
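For readers who want to reproduce the overall F-test outside JMP, here is a minimal sketch in Python with statsmodels; the file name ers.csv and the predictor names Calls, LowTemp, and Rain are placeholders for illustration, not the actual columns of the course data set.

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("ers.csv")                               # hypothetical AAA club data file
fit = smf.ols("ERS ~ Calls + LowTemp + Rain", df).fit()   # assumed predictor names
print(fit.fvalue, fit.f_pvalue)                           # overall F-statistic and Prob > F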

Assumptions of Multiple Linear Regression Model
1. Linearity: E(Y | X_1 = x_1, \ldots, X_K = x_K) = \beta_0 + \beta_1 x_1 + \cdots + \beta_K x_K.
2. Constant variance: The standard deviation of Y for the subpopulation of units with X_1 = x_1, \ldots, X_K = x_K is the same for all subpopulations.
3. Normality: The distribution of Y for the subpopulation of units with X_1 = x_1, \ldots, X_K = x_K is normally distributed for all subpopulations.
4. Independence: The observations are independent.
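Equivalently, the four assumptions can be collected into a single model statement (standard notation, not taken from the slide):

Y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_K x_{iK} + \varepsilon_i, \qquad \varepsilon_i \stackrel{iid}{\sim} N(0, \sigma^2),

where the linear mean function is assumption 1, the common \sigma^2 is assumption 2, the normal errors are assumption 3, and the independence of the \varepsilon_i is assumption 4.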

Assumptions for linear regression and their importance to inferences
Point prediction, point estimation: linearity, independence.
Confidence interval for slope, hypothesis test for slope, confidence interval for mean response: linearity, constant variance, independence, normality (only if n < 30).
Prediction interval: linearity, constant variance, independence, normality.

Checking Linearity Plot residuals versus each of the explanatory variables. Each of these plots should look like random scatter, with no pattern in the mean of the residuals. If residual plots show a problem, then we could try to transform the x-variable and/or the y-variable.
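A minimal sketch of this check in Python rather than JMP, reusing the hypothetical data set and column names from the F-test sketch above:

import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

df = pd.read_csv("ers.csv")                               # hypothetical data file
fit = smf.ols("ERS ~ Calls + LowTemp + Rain", df).fit()   # assumed predictor names
for x in ["Calls", "LowTemp", "Rain"]:
    plt.scatter(df[x], fit.resid)                         # residuals vs. this explanatory variable
    plt.axhline(0, color="gray")                          # mean of residuals should hug this line
    plt.xlabel(x)
    plt.ylabel("Residual")
    plt.show()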

Residual Plots in JMP After Fit Model, click the red triangle next to Response, click Save Columns, and click Residuals. Then use Fit Y by X with Y = Residuals and X = the explanatory variable of interest. Fit Line will draw a horizontal line with intercept zero: it is a property of the residuals from multiple linear regression that a least squares regression of the residuals on an explanatory variable has slope zero and intercept zero.

Residual by Predicted Plot Fit Model displays the Residual by Predicted Plot automatically in its output. This is a plot of the residuals versus the predicted Y's, \hat{Y}; we can think of the predicted Y's as summarizing all the information in the X's. As usual, we would like this plot to show random scatter. A pattern in the mean of the residuals as the predicted Y's increase indicates a problem with linearity; look at the residual plots versus each explanatory variable to isolate the problem, and consider transformations. A pattern in the spread of the residuals indicates a problem with constant variance.
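The corresponding plot in Python (a sketch with the same hypothetical names as above):

import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.formula.api as smf

df = pd.read_csv("ers.csv")                               # hypothetical data file
fit = smf.ols("ERS ~ Calls + LowTemp + Rain", df).fit()   # assumed predictor names
plt.scatter(fit.fittedvalues, fit.resid)                  # residuals vs. predicted Y's
plt.axhline(0, color="gray")
plt.xlabel("Predicted ERS")
plt.ylabel("Residual")
plt.show()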

Checking Normality As with simple linear regression, make a histogram of the residuals and a normal quantile plot of the residuals. Here normality appears to be violated: several points lie outside the confidence bands, and the distribution of the residuals is skewed to the right.
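Both plots in Python (sketch; same hypothetical setup as above):

import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("ers.csv")                               # hypothetical data file
fit = smf.ols("ERS ~ Calls + LowTemp + Rain", df).fit()   # assumed predictor names
plt.hist(fit.resid, bins=15)                              # histogram of residuals
plt.show()
sm.qqplot(fit.resid, line="q")                            # normal quantile plot with reference line
plt.show()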

Transformations to Remedy Nonconstant Variance and Nonnormality Nonconstant variance: When the variance of Y|X increases with the mean of Y|X, try transforming Y to log Y or Y to \sqrt{Y}. When the variance of Y|X decreases with the mean of Y|X, try transforming Y to 1/Y or Y to Y^2. Nonnormality: When the distribution of the residuals is skewed to the right, try transforming Y to log Y. When the distribution of the residuals is skewed to the left, try transforming Y to Y^2.
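A sketch of refitting after transforming Y to log Y (same hypothetical setup; the statsmodels formula interface lets np.log appear directly in the model formula):

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("ers.csv")                               # hypothetical data file
fit_log = smf.ols("np.log(ERS) ~ Calls + LowTemp + Rain", df).fit()
print(fit_log.summary())                                  # then recheck the residual plots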

Influential Points, High Leverage Points, Outliers As in simple linear regression, we identify high leverage and high influence points by checking the leverages and Cook's distances (use Save Columns to save Cook's D Influence and Hats). High influence points: Cook's distance > 1. High leverage points: hat value (leverage) greater than 3(K+1)/n, where K is the number of explanatory variables. Use the same guidelines for dealing with influential observations as in simple linear regression. A point that has an unusual Y given its explanatory variables is one whose residual is more than 3 RMSEs away from zero.
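The same diagnostics in Python (sketch; same hypothetical setup as above):

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("ers.csv")                               # hypothetical data file
fit = smf.ols("ERS ~ Calls + LowTemp + Rain", df).fit()   # assumed predictor names
infl = fit.get_influence()
leverage = infl.hat_matrix_diag                           # the hat values ("Hats")
cooks = infl.cooks_distance[0]                            # Cook's distances
K = len(fit.params) - 1                                   # number of explanatory variables
n = int(fit.nobs)
print(df[leverage > 3 * (K + 1) / n])                     # high leverage points
print(df[cooks > 1])                                      # high influence points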

Scatterplot Matrix Before fitting a multiple linear regression model, it is a good idea to make scatterplots of the response variable versus each of the explanatory variables. These can suggest transformations of the explanatory variables that need to be done, as well as potential outliers and influential points. Scatterplot matrix in JMP: click Analyze, Multivariate Methods, and Multivariate, then put the response variable first in the Y, Columns box, followed by the explanatory variables.
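A scatterplot matrix in Python (sketch; same hypothetical column names, with the response listed first as the slide recommends):

import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import scatter_matrix

df = pd.read_csv("ers.csv")                               # hypothetical data file
scatter_matrix(df[["ERS", "Calls", "LowTemp", "Rain"]])   # response first, then the X's
plt.show()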

In order to evaluate the benefits of a proposed irrigation scheme in Egypt, the relation of the yield Y of wheat to rainfall is investigated over several years (see rainfall.JMP). How can regression analysis help? Data table (rainfall.JMP): Year; Yield (Bu./Acre), Y; Total Spring Rainfall, R; Average Spring Temperature, T.

Simple Linear Regression of Yield on Rainfall Rainfall reduces yield!? Is irrigation a bad idea?

Interpretation of the coefficient of rainfall: the change in the mean yield that is associated with a one-inch increase in rainfall. Other important variables (lurking variables) are not held fixed and might tend to change as rainfall increases; in these data, temperature tends to decrease as rainfall increases.

Controlling for Known Lurking Variables: Multiple Regression To evaluate the benefits of the irrigation scheme, we want to know how changes in rainfall are associated with changes in yield when all other important variables (lurking variables), such as temperature, are held fixed. Multiple regression provides this. The coefficient on rainfall in the multiple regression of yield on rainfall and temperature is the change in the mean yield that is associated with a one-inch increase in rainfall when temperature is held fixed.
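A sketch of both fits for this example; the file rainfall.csv and the column names Yield, Rainfall, and Temperature are assumptions standing in for rainfall.JMP:

import pandas as pd
import statsmodels.formula.api as smf

wheat = pd.read_csv("rainfall.csv")                       # hypothetical export of rainfall.JMP
simple = smf.ols("Yield ~ Rainfall", wheat).fit()         # slope comes out negative (slide above)
multiple = smf.ols("Yield ~ Rainfall + Temperature", wheat).fit()
print(simple.params["Rainfall"], multiple.params["Rainfall"])   # with vs. without temperature held fixed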

Multiple Regression Analysis Rainfall is estimated to be beneficial once temperature is held fixed.