Multiple Linear Regression, General Linear Model, & Generalized Linear Model
With Thanks to My Students in AMS 572: Data Analysis

Outline
1. Introduction to Multiple Linear Regression
2. Statistical Inference
3. Topics in Regression Modeling
4. Example
5. Variable Selection Methods
6. Regression Diagnostics and Strategy for Building a Model

1. Introduction to Multiple Linear Regression

Multiple Linear Regression
Regression analysis is a statistical methodology for estimating the relationship of a response variable to a set of predictor variables. Multiple linear regression extends the simple linear regression model to the case of two or more predictor variables.
Example: A multiple regression analysis might show us that the demand for a product varies directly with changes in demographic characteristics (age, income) of a market area.
Historical Background
Galton, for example, found that the average height of sons of 71-inch fathers and of sons of 64-inch fathers were both close to 67 inches, an illustration of regression toward the mean. Francis Galton started using the term "regression" in his biology research. Karl Pearson and Udny Yule extended Galton's work to the statistical context. Legendre and Gauss developed the method of least squares used in regression analysis. Ronald Fisher developed the maximum likelihood method used in the related statistical inference (e.g., tests of the significance of regression).

History
Carl Friedrich Gauss (1777/4/30 - 1855/4/23): "I developed the fundamentals of least-squares analysis in 1795, at the age of eighteen. In 1809 I published Theoria Motus Corporum Coelestium in Sectionibus Conicis Solem Ambientum. In 1821 I published a further development of least squares analysis, Theoria combinationis observationum erroribus minimis obnoxiae, which includes the Gauss-Markov theorem."
Francis Galton (1822/2/16 - 1911/1/17): "You regard me as the founder of Biostatistics. In my research I found that tall parents usually have shorter children, and vice versa, so human height tends to regress to its mean. It was I who first used the word 'regression' for such phenomena and problems. Anyone who does not know me must know my cousin, Charles Darwin, who developed the theory of evolution."
Adrien-Marie Legendre (1752/9/18 - 1833/1/10): "In 1805 I published Nouvelles méthodes pour la détermination des orbites des comètes, in which I introduced the method of least squares to the world. I was the first person to publish an article on the method of least squares, which is the earliest form of regression." (Pronunciation: http://forvo.com/word/adrien_marie_legendre/)
Karl Pearson and Ronald Aylmer Fisher: "We both developed regression theory after Galton."
Most of the content on this page comes from Wikipedia.

Probabilistic Model
y_i is the observed value of the random variable (r.v.) Y_i, which depends on the fixed predictor values x_i1, x_i2, ..., x_ik according to the following model:
Y_i = β_0 + β_1 x_i1 + ... + β_k x_ik + ε_i,  i = 1, 2, ..., n.
Here β_0, β_1, ..., β_k are unknown model parameters, and n is the number of observations. The random errors ε_i are assumed to be independent r.v.'s with mean 0 and variance σ². Thus the Y_i are independent r.v.'s with mean μ_i and variance σ², where
μ_i = β_0 + β_1 x_i1 + ... + β_k x_ik.

Fitting the model
The least squares (LS) method is used to fit the above equation to the data. Specifically, LS provides estimates β̂_0, β̂_1, ..., β̂_k of the unknown model parameters that minimize
Q = Σ_{i=1}^{n} [y_i - (β_0 + β_1 x_i1 + ... + β_k x_ik)]²,
the sum of squared differences between the observed values y_i and the corresponding points on the fitted surface with the same x's. The LS estimates can be found by taking partial derivatives of Q with respect to the unknown parameters and setting them equal to 0; the result is a set of simultaneous linear equations (the normal equations). The resulting solutions β̂_0, β̂_1, ..., β̂_k are the least squares (LS) estimators of β_0, β_1, ..., β_k, respectively. Please note that the LS method is non-parametric: no probability distribution assumptions on Y or ε are needed.

Goodness of Fit of the Model
To evaluate the goodness of fit of the LS model, we use the residuals, defined by
e_i = y_i - ŷ_i,  i = 1, ..., n,
where the ŷ_i are the fitted values:
ŷ_i = β̂_0 + β̂_1 x_i1 + ... + β̂_k x_ik.
An overall measure of the goodness of fit is the error sum of squares
SSE = Σ e_i².
A few other definitions, similar to those in simple linear regression:
total sum of squares (SST): SST = Σ (y_i - ȳ)². SST is the SSE obtained when fitting the model Y_i = β_0 + ε_i, which ignores all the x's.
regression sum of squares (SSR): SSR = Σ (ŷ_i - ȳ)² = SST - SSE.

coefficient of multiple determination:
R² = SSR/SST = 1 - SSE/SST.
Values of R² closer to 1 represent better fits; for example, R² = 0.5 means that 50% of the variation in y is accounted for by the x's (in this case, all the x's together). Adding predictor variables never decreases R² and generally increases it.
multiple correlation coefficient (the positive square root of R²):
R = +√R² (only the positive square root is used).
R is a measure of the strength of the association between the predictors (x's) and the single response variable Y.

Multiple Regression Model in Matrix Notation
The multiple regression model can be represented in a compact form using matrix notation. Let
Y = (Y_1, ..., Y_n)',  y = (y_1, ..., y_n)',  ε = (ε_1, ..., ε_n)'
be the n x 1 vectors of the r.v.'s Y_i, their observed values y_i, and the random errors ε_i, respectively, for all n observations. Let
X = [1 x_11 ... x_1k; 1 x_21 ... x_2k; ...; 1 x_n1 ... x_nk]
be the n x (k + 1) matrix of the values of the predictor variables for all n observations (the first column of 1's corresponds to the constant term β_0).

Let
β = (β_0, β_1, ..., β_k)'  and  β̂ = (β̂_0, β̂_1, ..., β̂_k)'
be the (k + 1) x 1 vectors of unknown model parameters and their LS estimates, respectively. The model can then be rewritten as
Y = Xβ + ε.
The simultaneous linear (normal) equations whose solution yields the LS estimates are
X'X β̂ = X'y.
If the inverse of the matrix X'X exists, then the solution is given by
β̂ = (X'X)^{-1} X'y.
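As a sketch of how this matrix formula can be evaluated directly in SAS (assuming SAS/IML is available and a data set named revise exists with numeric variables father, mother, gender, and height, as built in the Galton example of Section 4), one could compute the LS estimates from the normal equations:

proc iml;
   use revise;                                /* hypothetical data set; see Section 4 */
   read all var {father mother gender} into X0;
   read all var {height} into y;
   n = nrow(X0);
   X = j(n, 1, 1) || X0;                      /* prepend a column of 1's for the intercept */
   beta_hat = inv(X`*X) * X`*y;               /* LS estimates: (X'X)^{-1} X'y */
   resid = y - X*beta_hat;
   s2 = ssq(resid) / (n - ncol(X));           /* unbiased estimate of the error variance */
   print beta_hat s2;
quit;

In practice, solve(X`*X, X`*y) is numerically preferable to forming the inverse explicitly.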

2. Statistical Inference

Determining the statistical significance of the predictor variables:
For statistical inference, we need the assumption that the ε_i are i.i.d. N(0, σ²) (*i.i.d. means independent and identically distributed). We test the hypotheses
H0: β_j = 0  vs.  H1: β_j ≠ 0.
If we cannot reject H0, then the corresponding variable x_j is not a significant predictor of y. It is easily shown that each β̂_j is normal with mean β_j and variance σ² v_jj, where v_jj is the jth diagonal entry of the matrix V = (X'X)^{-1}.

Deriving a pivotal quantity for the inference on β_j
Recall that β̂_j ~ N(β_j, σ² v_jj). The unbiased estimator of the unknown error variance σ² is given by
S² = SSE / (n - (k + 1)) = MSE.
We also know that (n - (k + 1)) S² / σ² = SSE/σ² ~ χ²_{n-(k+1)}, and that β̂_j and S² are statistically independent. With
Z = (β̂_j - β_j) / (σ √v_jj) ~ N(0, 1)
and by the definition of the t-distribution, we obtain the pivotal quantity for the inference on β_j:
T = (β̂_j - β_j) / (S √v_jj) ~ t_{n-(k+1)}.

Derivation of the Confidence Interval for β_j
Thus the 100(1 - α)% confidence interval for β_j is
β̂_j ± t_{n-(k+1), α/2} · SE(β̂_j),  where SE(β̂_j) = S √v_jj.
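In SAS, these confidence intervals can be requested with the CLB option on the MODEL statement of PROC REG. A minimal sketch, using the Galton variables from the example in Section 4:

proc reg data=revise;
   model height = father mother gender / clb alpha=0.05;   /* 95% CIs for each beta_j */
run;
quit;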

Derivation of the Hypothesis Test for β_j at the significance level α
Hypotheses: H0: β_j = 0 vs. H1: β_j ≠ 0.
The test statistic is
T = β̂_j / SE(β̂_j), which follows t_{n-(k+1)} under H0.
The decision rule of the test is derived from the Type I error rate α, that is, P(Reject H0 | H0 is true) = α. Therefore, we reject H0 at the significance level α if and only if |t_0| > t_{n-(k+1), α/2}, where t_0 is the observed value of T.

Another Hypothesis Test: for all the β's simultaneously
Now consider
H0: β_1 = β_2 = ... = β_k = 0  vs.  H1: β_j ≠ 0 for at least one j.
When H0 is true, the test statistic
F = MSR/MSE = [SSR/k] / [SSE/(n - (k + 1))] ~ F_{k, n-(k+1)}.
An alternative and equivalent way to make a decision for a statistical test is through the p-value, defined as
p = P(observe a test statistic value at least as extreme as the one observed | H0 is true).
At the significance level α, we reject H0 if and only if p < α.

The General Hypothesis Test
Consider the full model
Y_i = β_0 + β_1 x_i1 + ... + β_k x_ik + ε_i,  i = 1, 2, ..., n,
and a partial model containing only the first m (< k) predictors,
Y_i = β_0 + β_1 x_i1 + ... + β_m x_im + ε_i,  i = 1, 2, ..., n.
Hypotheses: H0: β_{m+1} = ... = β_k = 0  vs.  H1: β_j ≠ 0 for at least one j in {m+1, ..., k}.
Test statistic:
F = [(SSE_partial - SSE_full)/(k - m)] / [SSE_full/(n - (k + 1))], which follows F_{k-m, n-(k+1)} under H0.
Reject H0 when F > F_{k-m, n-(k+1), α}.
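This partial F test can be carried out in SAS with the TEST statement of PROC REG. A minimal sketch, using the cement data set example115 defined in Section 5 and testing whether x3 and x4 can be dropped from the full model:

proc reg data=example115;
   model y = x1 x2 x3 x4;
   drop_x3x4: test x3 = 0, x4 = 0;   /* partial F test of the full vs. reduced model */
run;
quit;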

Estimating and Predicting Future Observations
Let x* = (1, x*_1, ..., x*_k)' be a vector of predictor values, and let μ* = E(Y*) = x*'β and Ŷ* = x*'β̂. The pivotal quantity for μ* is
T = (Ŷ* - μ*) / (S √(x*'(X'X)^{-1} x*)) ~ t_{n-(k+1)}.
Using this pivotal quantity, we can derive a CI for the estimated mean μ*:
x*'β̂ ± t_{n-(k+1), α/2} · S √(x*'(X'X)^{-1} x*).
Additionally, we can derive a prediction interval (PI) to predict Y*:
x*'β̂ ± t_{n-(k+1), α/2} · S √(1 + x*'(X'X)^{-1} x*).
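In SAS, the CLM and CLI options on the MODEL statement of PROC REG print the confidence interval for the mean response and the prediction interval for an individual response at each observation. A minimal sketch, assuming the Galton variables of Section 4 and a hypothetical new observation whose response is left missing so that only the intervals are produced for it:

data newobs;
   father = 71; mother = 64; gender = 1; height = .;   /* response unknown */
run;
data combined;
   set revise newobs;
run;
proc reg data=combined;
   model height = father mother gender / clm cli;      /* CI for the mean and PI for Y* */
run;
quit;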

3. Topics in Regression Modeling

3.1 Multicollinearity
Definition: the predictor variables are (exactly or approximately) linearly dependent. This can cause serious numerical and statistical difficulties in fitting the regression model unless the "extra" predictor variables are deleted.

How does multicollinearity cause difficulties?
If approximate multicollinearity is present: X'X is nearly singular, which makes (X'X)^{-1} numerically unstable. This is reflected in large changes in the magnitudes of the β̂_j with small changes in the data. Moreover, the matrix (X'X)^{-1} has very large elements, so the variances of the β̂_j are large, which makes the β̂_j statistically nonsignificant.

Measures of Multicollinearity
1. The correlation matrix R: easy to compute, but it cannot reflect linear relationships involving more than two variables.
2. The determinant of R: can be used as a measure of how close the predictors are to exact linear dependence (values near 0 indicate severe multicollinearity).
3. Variance Inflation Factors (VIF): the diagonal elements of R^{-1}. A VIF > 10 is generally regarded as unacceptable.

3.2 Polynomial Regression
A special case of the linear model:
Y = β_0 + β_1 x + β_2 x² + ... + β_k x^k + ε.
Problems: the powers of x, i.e., x, x², ..., x^k, tend to be highly correlated. If k is large, the magnitudes of these powers tend to vary over a rather wide range. So, keeping k ≤ 3 is a good idea, and k > 5 should almost never be used.

Solutions
1. Center the x-variable: use x - x̄ in place of x. Effect: removes the non-essential multicollinearity in the data.
2. Furthermore, we can standardize the data by also dividing by the standard deviation of x: use (x - x̄)/s_x. Effect: helps to alleviate the second problem (widely varying magnitudes of the powers).
3. Use the first few principal components of the original variables instead of the original variables.
A minimal SAS sketch of solutions 1 and 2 follows.
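The sketch assumes a hypothetical data set poly with predictor x and response y; PROC STANDARD centers and scales x, and the quadratic term is then built from the standardized variable.

proc standard data=poly mean=0 std=1 out=poly_std;
   var x;                      /* x is replaced by (x - xbar)/s_x */
run;
data poly_std;
   set poly_std;
   x2 = x*x;                   /* quadratic term built from the standardized x */
run;
proc reg data=poly_std;
   model y = x x2;
run;
quit;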

3.3 Dummy Predictor Variables & The General Linear Model
How do we handle categorical predictor variables? If we have an ordinal variable, such as the prognosis of a patient (poor, average, good), we can assign numerical scores to the categories, e.g., poor = 1, average = 2, good = 3.

If we have a nominal variable with c ≥ 2 categories, we use c - 1 indicator variables x_1, ..., x_{c-1}, called dummy variables, to code it: x_j = 1 for the jth category (j = 1, ..., c - 1), and x_1 = ... = x_{c-1} = 0 for the cth category.

Why don't we just use c indicator variables x_1, ..., x_c? If we did, there would be a linear dependency among them: x_1 + x_2 + ... + x_c = 1 (the constant term). This would cause multicollinearity.

Example of dummy variables
Suppose we have four years of quarterly sales data for a certain brand of soda. How can we model the time trend and the seasonal effect by fitting a multiple regression equation?
Solution: we use the quarter number (1 through 16) as a predictor variable x_1. To model the seasonal trend, we use indicator variables x_2, x_3, x_4 for Winter, Spring, and Summer, respectively; for Fall, all three equal zero. That is: Winter = (1,0,0), Spring = (0,1,0), Summer = (0,0,1), Fall = (0,0,0). Then we have the model
Y = β_0 + β_1 x_1 + β_2 x_2 + β_3 x_3 + β_4 x_4 + ε.
A minimal SAS sketch of this coding is shown below.
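The sketch assumes a hypothetical data set soda with variables quarter = 1, ..., 16, a character variable season with values 'W', 'Sp', 'Su', 'F', and the response sales; SAS logical expressions evaluate to 1/0, which gives the indicator coding directly.

data soda2;
   set soda;
   x1 = quarter;               /* time trend */
   x2 = (season = 'W');        /* Winter indicator */
   x3 = (season = 'Sp');       /* Spring indicator */
   x4 = (season = 'Su');       /* Summer indicator; Fall is the baseline (0,0,0) */
run;
proc reg data=soda2;
   model sales = x1 x2 x3 x4;
run;
quit;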

Once the dummy variables are included, the resulting regression model is referred to as a "General Linear Model". This term must be distinguished from the "Generalized Linear Model", which includes the General Linear Model as a special case with the identity link function:
g(μ) = μ = E(Y) = Xβ.
The generalized linear model links the model parameters to the predictors through a link function g. As another example, we will look at the logit link in logistic regression this afternoon.

Another Example of a Generalized Linear Model: the Logistic Regression Model
In 1938, Ronald Fisher and Frank Yates suggested the logit link for regression with a binary response variable.

A popular model for a categorical response variable
The logistic regression model is perhaps the most popular generalized linear model for binary data. It is generally used to study the relationship between a binary response variable and a group of predictors (which can be either continuous or categorical): Y = 1 (true, success, yes, etc.) or Y = 0 (false, failure, no, etc.). The logistic regression model can be extended to model a categorical response variable with more than two categories; the resulting model is sometimes referred to as the multinomial logistic regression model (in contrast to the 'binomial' logistic regression for a binary response variable).

More on the rationale of the logistic regression model
Consider a binary response variable Y = 0 or 1 and a single predictor variable x. We want to model E(Y|x) = P(Y=1|x) as a function of x. The logistic regression model expresses the logistic transform of P(Y=1|x) as a linear function of the predictor:
log[ P(Y=1|x) / (1 - P(Y=1|x)) ] = β_0 + β_1 x.
This model can be rewritten as
E(Y|x) = P(Y=1|x)·1 + P(Y=0|x)·0 = P(Y=1|x) = exp(β_0 + β_1 x) / [1 + exp(β_0 + β_1 x)],
which is bounded between 0 and 1 for all values of x. The following linear model may sometimes violate this condition:
P(Y=1|x) = β_0 + β_1 x.

More on the properties of the logistic regression model
In simple logistic regression, the regression coefficient β_1 has the interpretation that it is the log of the odds ratio of a success event (Y = 1) for a unit change in x. For multiple predictor variables, the logistic regression model is
log[ P(Y=1|x_1, ..., x_k) / (1 - P(Y=1|x_1, ..., x_k)) ] = β_0 + β_1 x_1 + ... + β_k x_k.

Logistic Regression, SAS Procedure
http://www.ats.ucla.edu/stat/sas/output/SAS_logit_output.htm
PROC LOGISTIC: this page shows an example of logistic regression with footnotes explaining the output. The data were collected on 200 high school students, with measurements on various tests, including science, math, reading, and social studies. The response variable is a high writing test score (honcomp), where a writing score greater than or equal to 60 is considered high and less than 60 is considered low; we explore its relationship with gender (female), reading test score (read), and science test score (science). The data set used in this page can be downloaded from http://www.ats.ucla.edu/stat/sas/webbooks/reg/default.htm.
data logit;
   set "c:\temp\hsb2";
   honcomp = (write >= 60);
run;
proc logistic data=logit descending;
   model honcomp = female read science;
run;

Logistic Regression, SAS Output

4. Example (Now we are back to the General Linear Model)
Here we revisit Galton's classic study, "Regression Towards Mediocrity in Hereditary Stature". Galton performed a simple regression to predict offspring height based on the average parent height. The slope of the regression line was less than 1, showing that extremely tall parents had less extremely tall children. At the time, Galton did not have multiple regression as a tool, so he had to use other methods to account for the difference between male and female heights. We can now perform multiple regression on parent-offspring height and use multiple variables as predictors.

Example
Our model: Y = β_0 + β_1 x_1 + β_2 x_2 + β_3 x_3 + ε, where
Y = height of child
x_1 = height of father
x_2 = height of mother
x_3 = gender of child (1 = male, 0 = female)

Example
In matrix notation, β̂ = (X'X)^{-1} X'y. We find that:
β̂_0 = 15.34476
β̂_1 = 0.405978
β̂_2 = 0.321495
β̂_3 = 5.22595

Example
The data: the Galton data set contains 898 parent-child records; for each child it records the father's height, the mother's height, the child's gender, and the child's height (all heights in inches).

Example
Important calculation: the fitted value
ŷ_i = β̂_0 + β̂_1 x_i1 + β̂_2 x_i2 + β̂_3 x_i3
is the predicted height of each child given that child's set of predictor values.

Example
Are these coefficients significantly different from zero?
H0: β_j = 0  vs.  Ha: β_j ≠ 0
Reject H0j if |t_j| > t_{n-(k+1), α/2}, where t_j = β̂_j / SE(β̂_j).

Example
                 β-estimate      SE        t
Intercept          15.3         2.75      5.59*
Father Height       0.406       0.0292   13.9*
Mother Height       0.321       0.0313   10.3*
Gender              5.23        0.144    36.3*
* p < 0.05. We conclude that all the β's are significantly different from zero.

Example
Testing the model as a whole:
H0: β_1 = β_2 = β_3 = 0  vs.  Ha: at least one β_j ≠ 0.
Reject H0 if F > F_{3, 894, 0.05} ≈ 2.615.
Since F = 529.03 > 2.615, we reject H0 and conclude that our model predicts height better than chance.

Example
Making predictions: suppose George Clooney (father, 71 inches) and Madonna (mother, 64 inches) have a baby boy (gender = 1). The predicted height is
ŷ* = 15.34476 + 0.40598(71) + 0.32150(64) + 5.22595(1) ≈ 69.97 inches.
95% prediction interval: 69.97 ± 4.84 = (65.13, 74.81).

Example: SAS code
ods graphics on;
data revise;
   set mylib.galton;
   if sex = 'M' then gender = 1.0;
   else gender = 0.0;
run;
proc reg data=revise;
   title "Dependence of Child Heights on Parental Heights";
   model height = father mother gender / vif;
run;
quit;
Alternatively, one can use the PROC GLM procedure, which can incorporate the categorical variable (sex) directly via the CLASS statement.

Dependence of Child Heights on Parental Heights

The REG Procedure
Model: MODEL1
Dependent Variable: height

Number of Observations Read   898
Number of Observations Used   898

Analysis of Variance
Source            DF    Sum of Squares    Mean Square    F Value    Pr > F
Model              3        7365.90034     2455.30011     529.03    <.0001
Error            894        4149.16204        4.64112
Corrected Total  897             11515

Root MSE          2.15433    R-Square    0.6397
Dependent Mean   66.76069    Adj R-Sq    0.6385
Coeff Var         3.22694

Parameter Estimates
Variable    Label      DF    Parameter Estimate    Standard Error    t Value    Pr > |t|    Variance Inflation
Intercept   Intercept   1          15.34476            2.74696          5.59      <.0001          0
father      father      1           0.40598            0.02921         13.90      <.0001          1.00607
mother      mother      1           0.32150            0.03128         10.28      <.0001          1.00660
gender                  1           5.22595            0.14401         36.29      <.0001          1.00188

Example

Example By Gary Bedford & Christine Vendikos

5. Variable Selection Methods
A. Stepwise Regression

Variable selection methods
(1) Why do we need to select the variables?
(2) How do we select variables?
* stepwise regression
* best subsets regression

Stepwise Regression
(p-1)-variable model: Y_i = β_0 + β_1 x_i1 + ... + β_{p-1} x_{i,p-1} + ε_i.
p-variable model: Y_i = β_0 + β_1 x_i1 + ... + β_{p-1} x_{i,p-1} + β_p x_ip + ε_i.
At each step we test H0: β_p = 0 with the partial F statistic
F_p = (SSE_{p-1} - SSE_p) / [SSE_p / (n - p - 1)],
which compares the fit of the p-variable model with that of the (p-1)-variable model; x_p enters (or stays in) the model only if F_p exceeds a preset threshold.

Partial correlation coefficients
The partial correlation between Y and x_p, adjusting for x_1, ..., x_{p-1}, measures the additional contribution of x_p; its square equals (SSE_{p-1} - SSE_p) / SSE_{p-1}, so the partial F statistic above is an increasing function of it.

5. Variable Selection Methods
Stepwise Regression: SAS Example

Example 11.5 (T&D pg. 416), 11.9 (T&D pg. 431)
The following table shows data on the heat evolved in calories during the hardening of cement on a per gram basis (y) along with the percentages of four ingredients: tricalcium aluminate (x1), tricalcium silicate (x2), tetracalcium alumino ferrite (x3), and dicalcium silicate (x4).
No.   x1   x2   x3   x4      y
 1     7   26    6   60    78.5
 2     1   29   15   52    74.3
 3    11   56    8   20   104.3
 4    11   31    8   47    87.6
 5     7   52    6   33    95.9
 6    11   55    9   22   109.2
 7     3   71   17    6   102.7
 8     1   31   22   44    72.5
 9     2   54   18   22    93.1
10    21   47    4   26   115.9
11     1   40   23   34    83.8
12    11   66    9   12   113.3
13    10   68    8   12   109.4

SAS Program (stepwise variable selection is used)
data example115;
   input x1 x2 x3 x4 y;
   datalines;
7 26 6 60 78.5
1 29 15 52 74.3
11 56 8 20 104.3
11 31 8 47 87.6
7 52 6 33 95.9
11 55 9 22 109.2
3 71 17 6 102.7
1 31 22 44 72.5
2 54 18 22 93.1
21 47 4 26 115.9
1 40 23 34 83.8
11 66 9 12 113.3
10 68 8 12 109.4
;
run;
proc reg data=example115;
   model y = x1 x2 x3 x4 / selection=stepwise;
run;

Selected SAS output

The REG Procedure
Model: MODEL1
Dependent Variable: y

Stepwise Selection: Step 4
             Parameter      Standard
Variable      Estimate         Error     Type II SS    F Value    Pr > F
Intercept     52.57735       2.28617     3062.60416     528.91    <.0001
x1             1.46831       0.12130      848.43186     146.52    <.0001
x2             0.66225       0.04585     1207.78227     208.58    <.0001

Bounds on condition number: 1.0551, 4.2205

SAS Output (cont.)
All variables left in the model are significant at the 0.1500 level.
No other variable met the 0.1500 significance level for entry into the model.

Summary of Stepwise Selection
        Variable    Variable    Number     Partial      Model
Step    Entered     Removed     Vars In    R-Square    R-Square      C(p)    F Value    Pr > F
  1       x4                       1        0.6745      0.6745    138.731      22.80    0.0006
  2       x1                       2        0.2979      0.9725      5.4959    108.22    <.0001
  3       x2                       3        0.0099      0.9823      3.0182      5.03    0.0517
  4                   x4           2        0.0037      0.9787      2.6782      1.86    0.2054

5. Variable Selection Methods
B. Best Subsets Regression

Best Subsets Regression
In the stepwise regression algorithm, the final model is not guaranteed to be optimal in any specified sense. In best subsets regression, a subset of variables is chosen from the collection of all subsets of the k predictor variables that optimizes a well-defined objective criterion.

Best Subsets Regression
In stepwise regression, we get only a single final model. In best subsets regression, the investigator can specify the size of the predictor subset and compare the best models of each size.

Best Subsets Regression: Optimality Criteria
R²_p criterion: R²_p = 1 - SSE_p/SST, where p is the number of predictors in the subset model.
Adjusted R²_p criterion: adj R²_p = 1 - [(n - 1)/(n - p - 1)](1 - R²_p).
Cp criterion (recommended for its ease of computation and its ability to judge the predictive power of a model): the sample estimator, Mallows' Cp statistic, is given by
Cp = SSE_p / σ̂² - (n - 2(p + 1)),
where σ̂² is the MSE of the full model; subsets with Cp close to p + 1 (the number of parameters) fit well.
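A minimal SAS sketch of Cp-based subset selection, using the cement data set example115 from the stepwise example; the BEST= option limits how many of the best subsets are printed:

proc reg data=example115;
   model y = x1 x2 x3 x4 / selection=cp best=3;   /* rank subsets by Mallows' Cp */
run;
quit;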

Best Subsets Regression Algorithm
Note that our problem is to find the minimum of a given criterion function. Two approaches: (1) use the stepwise regression algorithm, replacing the partial F criterion with another criterion such as Cp; (2) enumerate all possible subsets and find the minimum of the criterion function. Other possibilities?

Best Subsets Regression & SAS
proc reg data=example115;
   model y = x1 x2 x3 x4 / selection=ADJRSQ;
run;
For the SELECTION= option, SAS has implemented 9 methods in total. For best subsets selection, we have the following options: Maximum R2 Improvement (MAXR), Minimum R2 Improvement (MINR), R2 Selection (RSQUARE), Adjusted R2 Selection (ADJRSQ), and Mallows' Cp Selection (CP).

6. Building a Multiple Regression Model: Steps and Strategy

Modeling is an iterative process. Several cycles of the steps may be needed before arriving at the final model. The basic process consists of seven steps.

Get Started and Follow the Steps
1. Decide the type of model needed (categorization by usage)
2. Collect the data
3. Explore the data
4. Divide the data
5. Fit candidate models
6. Select and evaluate a good model
7. Select the final model

Step I: Decide the type of model needed, according to its intended use. Main categories include: predictive, theoretical, control, inferential, and data summary. Sometimes a model serves multiple purposes.

Step II: Collect the Data
Determine the predictor (X) and response (Y) variables. The data should be relevant and bias-free.

Step III: Explore the Data
The linear regression model is sensitive to noise, so outliers and influential observations should be identified and treated cautiously.

Step IV: Divide the Data
Divide the data into a training set (for model building) and a test set (for model checking). How to divide? With a large sample, split the data half and half; with a small sample, make sure the training set still has more than 16 observations.

Step V: Fit Several Candidate Models
Fit the candidate models using the training set.

Step VI: Select and Evaluate a Good Model
Check the model assumptions and correct any violations (see the regression diagnostics below).

Step VII: Select the Final Model
Use the test set to compare the competing models by cross-validating them.

Regression Diagnostics (Step VI)
Graphical analysis of residuals:
* Plot the residuals vs. the x_i values or the fitted values. The residuals are the estimated errors, i.e., the differences between the actual y_i and the predicted ŷ_i.
* Plot a histogram or stem-and-leaf display of the residuals.
Purposes:
* Examine the functional form (linearity).
* Evaluate violations of the model assumptions.
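A minimal SAS sketch (variable names follow the Galton example of Section 4): with ODS Graphics enabled, PROC REG produces a panel of diagnostic plots automatically, and the PLOTS= option can request the residual plots explicitly.

ods graphics on;
proc reg data=revise plots(only)=(diagnostics residuals);
   model height = father mother gender;   /* residual and fit diagnostics are plotted */
run;
quit;
ods graphics off;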

Linear Regression Assumptions
1. The mean of the probability distribution of the error is 0.
2. The probability distribution of the error has constant variance.
3. The probability distribution of the error is normal.
4. The errors are independent.

Residual Plot for Functional Form (Linearity)
A curved pattern in the residuals suggests adding an x² term; a random scatter around zero indicates correct specification.

Residual Plot for Equal Variance
A fan-shaped pattern indicates unequal variance; a random scatter of constant spread indicates correct specification. Standardized residuals (the residual divided by the standard error of prediction) are typically used.

Residual Plot for Independence
A systematic pattern (e.g., over time or observation order) indicates the errors are not independent; a random scatter indicates correct specification.

Questions? www.ams.sunysb.edu/~zhu zhu@ams.sunysb.edu