More Linear Regression: Outliers, Influential Points, and Confidence Interval Construction

Introduction
The following tutorial will show you how to:
- Make a scatterplot with confidence bands
- Find outliers and influential points in a data set
- Conduct multiple linear regression, including an interaction term
- Calculate confidence intervals for parameter estimates, as well as for individual and mean predictions

Consider the following data set: The file infant.txt contains data on the net food supply (number of calories per person per day) and the infant mortality rate (number of infant deaths per 1,000 live births) for 22 countries before World War II. Copy and paste the data into SAS using the following lines:

DATA infant;
   INPUT country $ food mortality;
   DATALINES;
[paste data lines here]
;

Plotting Confidence Bands We want to determine whether there is a relationship between a country's infant mortality rate and its net food supply. We also want to construct confidence bands around the regression line so that we can visually assess predicted mortality rates at a given level of food supply.

SAS Code for Confidence Bands Type the following code into SAS. It is similar to the regression analyses you have conducted before, but with two options added: "pred" requests the confidence bands for individual prediction, and "conf" requests the confidence bands for mean prediction.
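The code itself appears only as an image in the original slides and is missing from this transcript. A minimal sketch of what it plausibly looked like, using PROC REG's traditional PLOT statement; the "pred" and "conf" option names come from the slide's description, while the surrounding structure is an assumption:

```sas
PROC REG DATA = infant;
   MODEL mortality = food;
   /* conf overlays 95% bands for mean prediction,
      pred overlays 95% bands for individual prediction */
   PLOT mortality*food / conf pred;
RUN;
QUIT;
```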

Plot of Confidence Bands

Interpreting the Plot
- "PRED" is the regression line
- "U95M" is the upper 95% confidence band for mean prediction
- "L95M" is the lower 95% confidence band for mean prediction
- "U95" is the upper 95% confidence band for individual prediction
- "L95" is the lower 95% confidence band for individual prediction
Notice that the individual prediction bands are wider than the mean prediction bands.
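The width difference has a standard algebraic explanation (textbook formulas, not shown on the slide). For a new predictor value x_h, the two 95% intervals in simple linear regression are:

```latex
% 95% interval for the MEAN response at x_h
\hat{y}_h \pm t_{0.025,\,n-2}\; s \sqrt{\frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}

% 95% interval for an INDIVIDUAL new response at x_h
\hat{y}_h \pm t_{0.025,\,n-2}\; s \sqrt{1 + \frac{1}{n} + \frac{(x_h - \bar{x})^2}{\sum_i (x_i - \bar{x})^2}}
```

The extra 1 under the square root accounts for the variance of a single new observation, which is why the individual bands (U95/L95) always lie outside the mean bands (U95M/L95M).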

Now that you have eyeballed the prediction bands, there is a formal way to calculate mean and individual predictions for a given level of x (food). Suppose you wanted to know the mean and individual predicted mortality rates for a country with a net food supply of 2900 calories. There is a simple way to calculate this in SAS: add another line of data at the end of your data set with a made-up country name, a food value of 2900, and '.' for the mortality value. Remember, SAS treats periods (.) as missing data. It will not use the missing value when fitting the regression line, but it will calculate prediction CIs for this observation.

SAS Code: Add a new line to the datalines, after the last real country (Uruguay):

...
Country 2900 .
;

Re-run the DATA step so that "Country" is included in your data set, then type the following code into SAS:

PROC REG DATA = infant;
   MODEL mortality = food / clb clm cli;
RUN;

Explanation of SAS Code
- "clb" requests the 95% confidence intervals for the parameter (β) estimates
- "clm" requests the 95% confidence interval for mean prediction
- "cli" requests the 95% confidence interval for individual prediction

SAS Output

Interpreting Output The regression line is: Ŷ = b0 − 0.08(food), where b0 is the intercept estimate shown in the output. The 95% CI for β1 is [−0.11, −0.05]. Notice that this CI does not contain 0, so we reject H0: β1 = 0; there is a linear relationship between food supply and infant mortality. The same conclusion is reached by looking at the p-value for the test statistic (t* = −5.68, p-value < 0.0001).
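The equivalence used above (a 95% CI for β1 that excludes 0 corresponds to rejecting H0 at α = 0.05) follows because both are built from the same t statistic; in standard textbook notation:

```latex
t^* = \frac{b_1}{SE(b_1)}, \qquad
\text{95\% CI: } b_1 \pm t_{0.025,\,n-2}\, SE(b_1)
```

Zero lies outside the interval exactly when |t*| > t_{0.025, n−2}, i.e., exactly when the two-sided test rejects H0.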

95% CI Prediction Output

Interpreting the CI Output Notice that a new line has been added to your output (observation #23). This is the new "country" you added, with a food value of 2900 calories. Its predicted value (Ŷ) is 78.43, which you could also calculate by plugging food = 2900 into the regression line. The 95% CI for mean/average prediction is found under "95% CL Mean": [62.12, 94.74]. The 95% CI for individual/single prediction is found under "95% CL Predict": [2.81, 154.05]. Notice that the CI for individual prediction is much wider than that for mean prediction.

Outliers and Influential Points To determine whether your data set contains any outliers or points that are influencing your model, use the option "r" to request residuals and "influence" to request measures of influence in your SAS output:

PROC REG DATA = infant;
   MODEL mortality = food / r influence;
RUN;
QUIT;

Output from “r” and “influence”

Interpreting Output To determine whether a point is an outlier, look for a "Student Residual" (studentized residual) with an absolute value greater than 2.6. Observation #7 (Chile) is an outlier, and observation #16 (Japan) is close to being one. To determine whether a point is influential, look for a Cook's D value greater than 1. By that criterion, there appear to be no influential points.
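For reference, the two diagnostics have standard definitions (textbook background, not from the slide). With residual e_i, leverage h_ii, and p model parameters:

```latex
% internally studentized residual
r_i = \frac{e_i}{s\sqrt{1 - h_{ii}}}

% Cook's distance
D_i = \frac{r_i^2}{p}\cdot\frac{h_{ii}}{1 - h_{ii}}
```

A large |r_i| flags a response far from the fitted line; a large D_i flags a point whose removal would noticeably change the fitted coefficients.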