Lecture 24: Thurs., April 8th

Inference for Multiple Regression
Types of inferences:
– Confidence intervals/hypothesis tests for regression coefficients
– Confidence intervals for the mean response; prediction intervals
– Overall usefulness of predictors (F-test, R-squared)
– Effect tests (we will cover these later when we cover categorical explanatory variables)

Overall Usefulness of Predictors
Are any of the predictors useful? Does the mean of y change as any of the explanatory variables changes? The hypotheses are H0: beta_1 = beta_2 = ... = beta_p = 0 vs. Ha: at least one of the beta_j's does not equal zero. The test (called the overall F test) is carried out in the Analysis of Variance table. We reject H0 for large values of the F statistic; Prob>F is the p-value for this test. For the fish mercury data, Prob>F is less than 0.0001 – strong evidence that at least one of length/weight is a useful predictor of mercury concentration.
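The overall F test can be sketched numerically. Below is a minimal illustration with simulated data (the variable names and coefficient values are hypothetical, not the fish mercury data):

```python
import numpy as np
from scipy import stats

# Simulated stand-in for a response with two useful predictors
rng = np.random.default_rng(0)
n = 60
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2.0 + 1.5 * x1 + 0.5 * x2 + rng.normal(size=n)

# Fit the multiple regression by least squares
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Overall F test of H0: beta_1 = beta_2 = 0
p = 2                                           # number of predictors
tss = np.sum((y - y.mean()) ** 2)               # total sum of squares
rss = np.sum(resid ** 2)                        # residual sum of squares
F = ((tss - rss) / p) / (rss / (n - p - 1))
p_value = stats.f.sf(F, p, n - p - 1)           # this is Prob>F
```

Because the simulated predictors really do affect y, the F statistic is large and Prob>F is tiny, so the test correctly rejects H0.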

The R-Squared Statistic
The p-value from the overall F test tells us whether any of the predictors are useful, but it does not give a measure of how useful the predictors are. R-squared is a measure of how good the predictions from the multiple regression model are compared to using the mean of y (i.e., none of the predictors) to predict y. It has a similar interpretation as in simple linear regression: the R-squared statistic is the proportion of the variation in y explained by the multiple regression model. Total sum of squares: TSS = sum of (y_i - ybar)^2. Residual sum of squares: RSS = sum of (y_i - yhat_i)^2. Then R^2 = 1 - RSS/TSS.
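R-squared falls directly out of the two sums of squares. A small sketch with simulated data (names and coefficients hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 60
x1, x2 = rng.normal(size=(2, n))
y = 2.0 + 1.5 * x1 + 0.5 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

tss = np.sum((y - y.mean()) ** 2)   # variation in y around its mean
rss = np.sum(resid ** 2)            # variation left over after regression
r_squared = 1 - rss / tss           # proportion of variation explained
```

If the predictors were useless, RSS would be close to TSS and R-squared close to 0; here the model explains most of the variation.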

Air Pollution and Mortality
The data set pollution.JMP provides information about the relationship between pollution and mortality for 60 cities during 1959-1961. The variables are:
– y (MORT) = total age-adjusted mortality in deaths per 100,000 population
– PRECIP = mean annual precipitation (in inches)
– EDUC = median number of school years completed for persons 25 and older
– NONWHITE = percentage of the 1960 population that is nonwhite
– NOX = relative pollution potential of NOx (related to the tons of NOx emitted per day per square kilometer)
– SO2 = relative pollution potential of SO2

Multiple Regression and Causal Inference
Goal: figure out what the causal effect on mortality would be of decreasing air pollution (holding everything else in the world fixed). Confounding variable: a variable that is related to both air pollution in a city and mortality in a city. To figure out whether air pollution causes mortality, we want to compare mean mortality among cities with different air pollution levels but the same values of the confounding variables. If we include all of the confounding variables in the multiple regression model, the coefficient on air pollution represents the change in the mean of mortality that is caused by a one-unit increase in air pollution.

Omitted Variables
What happens if we omit a confounding variable from the regression, e.g., the percentage of smokers? Suppose we are interested in the causal effect of x_1 on y and believe that x_2 is a confounding variable, so that the true model is y = beta_0 + beta_1*x_1 + beta_2*x_2 + error, where beta_1 is the causal effect of x_1 on y. If we omit the confounding variable x_2, then the regression of y on x_1 alone estimates a coefficient beta_1* on x_1. How different are beta_1 and beta_1*?

Omitted Variables Bias Formula
Suppose that the omitted variable is related to the included one by x_2 = gamma_0 + gamma_1*x_1 + error. Then the coefficient from the short regression satisfies beta_1* = beta_1 + beta_2*gamma_1. The formula tells us the direction and magnitude of the bias from omitting a variable when estimating a causal effect. The formula also applies to the least squares estimates: the short-regression slope on x_1 is approximately b_1 + b_2*g_1, where g_1 is the least squares slope from regressing x_2 on x_1.
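The bias formula can be checked by simulation: with a large sample, the slope from the short regression (omitting x_2) lands near beta_1 + beta_2*gamma_1. All coefficient values below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000                      # large n so estimates are near their limits
beta1, beta2 = 1.0, 2.0          # hypothetical true coefficients
gamma1 = 0.5                     # slope of the omitted x2 on the included x1

x1 = rng.normal(size=n)
x2 = gamma1 * x1 + rng.normal(size=n)     # x2 is correlated with x1
y = beta1 * x1 + beta2 * x2 + rng.normal(size=n)

# Short regression: omit the confounder x2
X_short = np.column_stack([np.ones(n), x1])
b_short, *_ = np.linalg.lstsq(X_short, y, rcond=None)

# The formula predicts the short-regression slope converges to
# beta1 + beta2*gamma1 = 1.0 + 2.0*0.5 = 2.0, not the causal effect 1.0
bias_limit = beta1 + beta2 * gamma1
```

The short regression attributes part of x_2's effect to x_1, doubling the apparent effect in this example.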

Assumptions of the Multiple Linear Regression Model
For each subpopulation of explanatory variables (x_1, ..., x_p):
(A-1A) The mean of y is a linear function of the explanatory variables: E(y | x_1, ..., x_p) = beta_0 + beta_1*x_1 + ... + beta_p*x_p
(A-1B) The standard deviation of y is constant: SD(y | x_1, ..., x_p) = sigma
(A-1C) The distribution of y is normal [the distribution of the residuals should not depend on (x_1, ..., x_p)]
(A-2) The observations are independent of one another

Checking/Refining the Model
Tools for checking (A-1A) and (A-1B):
– Residual plots versus predicted (fitted) values
– Residual plots versus explanatory variables
– If the model is correct, there should be no pattern in the residual plots
Tool for checking (A-1C): histogram of residuals
Tool for checking (A-2): residual plot versus time or spatial order of observations
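These residual-based checks can be sketched in code. The example below fits a regression to simulated data, then checks the residual/fitted relationship and residual normality (a Shapiro-Wilk test standing in for the histogram); all names and data are hypothetical:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 60
x1, x2 = rng.normal(size=(2, n))
y = 2.0 + 1.5 * x1 + 0.5 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
resid = y - fitted

# (A-1A)/(A-1B): plot resid vs. fitted and look for patterns; note that
# by construction least squares residuals are uncorrelated with fitted values,
# so any visible pattern must be curvature or changing spread, not a trend line
corr_check = np.corrcoef(fitted, resid)[0, 1]

# (A-1C): normality of residuals (a formal test in place of a histogram)
shapiro_stat, shapiro_p = stats.shapiro(resid)
```

In practice one would scatterplot `resid` against `fitted` and against each explanatory variable, exactly as the slide describes.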

Model Building
1. Make a scatterplot matrix of the variables (using Analyze, Multivariate). Decide whether to transform any of the explanatory variables.
2. Fit the model.
3. Check residual plots for whether the assumptions of the multiple regression model are satisfied. Also look for outliers and influential points.
4. Make changes to the model and repeat steps 2-3 until an adequate model is found.

2. a) From the scatterplot of MORT vs. NOX we see that the NOX values are bunched very tightly together; a log transformation of NOX is needed. b) The curvature in MORT vs. SO2 indicates a log transformation of SO2 may be suitable. After the two transformations we have the following correlations:
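The effect of a log transformation on a tightly bunched, long-right-tailed variable can be illustrated with simulated data (a hypothetical NOX-like variable, not the actual pollution data):

```python
import numpy as np

rng = np.random.default_rng(6)
# Hypothetical NOX-like variable: most cities have small values,
# a few have very large ones (right-skewed)
nox = rng.lognormal(mean=2.0, sigma=1.0, size=60)

def skewness(v):
    """Sample skewness: third central moment / SD^3."""
    c = v - v.mean()
    return np.mean(c ** 3) / np.mean(c ** 2) ** 1.5

log_nox = np.log(nox)
# The log transform pulls in the long right tail, making the
# distribution far more symmetric and the scatterplot less "crunched"
```

After the transform, scatterplots of MORT vs. log NOX spread the points out instead of piling them near zero.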

Scatterplot Matrix

Dealing with Influential Observations
By influential observations, we mean one or several observations whose removal causes a different conclusion or course of action. Display 11.8 provides a strategy for dealing with suspected influential cases.

Cook’s Distance
Cook’s distance is a statistic that can be used to flag influential observations. After fitting the model, click on the red triangle next to Response, then Save Columns, Cook’s D Influence. A Cook’s distance close to or larger than 1 indicates a large influence.
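Cook's distance can also be computed by hand, which shows what JMP is saving in that column. A sketch with one deliberately planted influential point (simulated data, not the pollution data):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 30
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
x[0], y[0] = 6.0, -10.0                 # plant a high-leverage outlier

X = np.column_stack([np.ones(n), x])
H = X @ np.linalg.inv(X.T @ X) @ X.T    # hat matrix
h = np.diag(H)                          # leverages h_ii
beta = np.linalg.lstsq(X, y, rcond=None)[0]
e = y - X @ beta                        # residuals
k = X.shape[1]                          # parameters, incl. intercept
mse = np.sum(e ** 2) / (n - k)

# Cook's D combines the residual and the leverage of each observation
cooks_d = (e ** 2 / (k * mse)) * (h / (1 - h) ** 2)
```

The planted point has both high leverage and a large residual, so its Cook's D far exceeds 1 while every ordinary point stays near 0.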

Leverage Plots
The leverage plots produced by JMP provide a “simple regression view” of a multiple regression coefficient. (The leverage plot for a variable x_j is, in effect, a plot of x_j adjusted for the other explanatory variables vs. y adjusted for the other explanatory variables.) The slope of the line shown in the leverage plot equals the coefficient for that variable in the multiple regression. The distances from the points to the sloped line are the multiple regression residuals; the distance from a point to the horizontal line is the residual if the explanatory variable were not included in the model. These plots are used to identify outliers, leverage, and influential points for the particular regression coefficient in the multiple regression (use them the same way as in simple regression).
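The "simple regression view" can be verified numerically: if y and x_j are each residualized on the other explanatory variables, the simple regression slope between the two sets of residuals exactly equals the multiple regression coefficient on x_j. A sketch with simulated data (names hypothetical):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 80
x1 = rng.normal(size=n)
x2 = 0.6 * x1 + rng.normal(size=n)       # correlated predictors
y = 1.0 + 2.0 * x1 - 1.0 * x2 + rng.normal(size=n)

# Full multiple regression: coefficient on x2 is beta_full[2]
X = np.column_stack([np.ones(n), x1, x2])
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]

# Leverage-plot construction for x2: residualize y and x2 on the rest
Z = np.column_stack([np.ones(n), x1])
ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
rx = x2 - Z @ np.linalg.lstsq(Z, x2, rcond=None)[0]
slope = (rx @ ry) / (rx @ rx)   # simple regression slope of ry on rx
```

Plotting `ry` against `rx` gives the leverage (added-variable) plot, and `slope` matches the multiple regression coefficient exactly.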

The influential points can have an extreme impact on the analysis. An alternative model: because of the importance of NOX and SO2, one could choose the final model to be MORT vs. PRECIP, NONWHITE, EDUC, log NOX and log SO2. Notice that even though log NOX is not significant, one could still leave it in the model.

The enlarged observation, New Orleans, is an outlier for estimating each coefficient and is highly leveraged for estimating the coefficients of interest on log NOX and log SO2. Since New Orleans is both highly leveraged and an outlier, we expect it to be influential.