STT592-002: Intro. to Statistical Learning
Linear Regression (Chapter 03)
Disclaimer: This PPT is modified based on IOM 530: Intro. to Statistical Learning.
Outline
- Linear Regression Model: Simple Linear, Multiple Linear, Multivariate Linear
- Least Squares Fit
- Measures of Fit
- Inference in Regression
- Other Considerations in Regression Model: Qualitative Predictors, Interaction Terms
- Non-Linear Regression Model
- Potential Fit Problems
- Linear vs. KNN Regression
Case 1: Advertisement Data
Advertising=read.csv("http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv", header=TRUE)
newdata=Advertising[,-1]
fix(newdata)
View(newdata)
names(newdata)
pairs(newdata)
Advertisement Data: background
Advertisement Data:
1. Is there a relationship between advertising budget (TV, Radio, or Newspaper) and sales?
2. How strong is the relationship between advertising budget and sales?
3. Which media contribute to sales?
Advertisement Data (cont.):
4. How accurately can we estimate the effect of each medium on sales?
5. How accurately can we predict future sales?
6. Is the relationship linear?
7. Is there synergy among the advertising media?
Advertisement Data: how to fit the data? Least squares estimation (LSE).
Simple Linear Regression: LSE background
Simple Linear Regression: LSE background. Under H0, the t-statistic b1/SE(b1) follows a t(n-2) distribution.
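The closed-form least squares solution behind these slides can be illustrated with a short sketch. This is a pure-Python example on made-up toy data (not the Advertising data); the formulas are the standard ones, b1 = Sxy/Sxx and b0 = ybar - b1*xbar.

```python
# Closed-form least squares for simple linear regression:
#   b1 = sum((x - xbar) * (y - ybar)) / sum((x - xbar)^2)
#   b0 = ybar - b1 * xbar
# Toy data below is hypothetical, not the Advertising data set.

def simple_ols(x, y):
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b1 = sxy / sxx          # slope estimate
    b0 = ybar - b1 * xbar   # intercept estimate
    return b0, b1

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = simple_ols(x, y)   # slope near 2, intercept near 0
```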
Advertisement Data for simple linear regression
lm.fit=lm(Sales~TV,data=Advertising)  ## to get Table 3.1
summary(lm.fit)
names(lm.fit)
coef(lm.fit)
confint(lm.fit)
Q: Is b1=0, i.e. is X an important variable?
We use a hypothesis test to answer this question: H0: b1 = 0 vs. Ha: b1 ≠ 0.
Calculate t = b1/SE(b1), the number of standard errors the estimate is away from zero. If |t| is large (equivalently, the p-value is small), we can be confident that b1 ≠ 0 and that there is a relationship. For the TV regression, the estimated slope is 17.67 SE's from 0, so the p-value is tiny.
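The t-statistic can be computed by hand once the fit is in place. A pure-Python sketch on toy data (hypothetical, so the t-value will not match the 17.67 quoted for TV):

```python
import math

# t-statistic for H0: b1 = 0 in simple linear regression:
#   t = b1 / SE(b1),  SE(b1) = sqrt(RSS / (n - 2) / Sxx)
# Toy data (hypothetical), not the Advertising figures quoted above.
x = [1, 2, 3, 4, 5, 6]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.0]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar
rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
se_b1 = math.sqrt(rss / (n - 2) / sxx)
t = b1 / se_b1   # large |t| => evidence that b1 != 0
```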
Measures of Fit: R2
Some of the variation in Y can be explained by variation in the X's and some cannot. R2 tells you the percentage of variance that can be explained by the regression on X. R2 is always between 0 and 1: zero means no variance has been explained; one means it has all been explained (a perfect fit to the data).
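The definition above translates directly into code. A minimal Python sketch, with hypothetical response and fitted values rather than output from a real regression:

```python
# R^2 = 1 - RSS/TSS: the fraction of variance in y explained by the fit.
# Toy values below are hypothetical, not a real regression output.
y     = [2.0, 4.0, 6.0, 8.0]
y_hat = [2.1, 3.8, 6.2, 7.9]   # predictions from some fitted model

ybar = sum(y) / len(y)
tss = sum((yi - ybar) ** 2 for yi in y)                 # total sum of squares
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # residual sum of squares
r2 = 1 - rss / tss   # close to 1 here: an almost perfect fit
```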
Multiple Linear Regression Model
Y: Quantitative Response; Xj: j-th predictor.
The parameters in the linear regression model are easy to interpret:
- β0 is the intercept (i.e. the average value of Y when all the X's are zero);
- βj is the slope for the j-th variable Xj: the average increase in Y when Xj is increased by one unit and all other X's are held constant.
Least Squares Estimate
We estimate the parameters using least squares, i.e. by minimizing the residual sum of squares
RSS = sum_i (yi - b0 - b1*xi1 - ... - bp*xip)^2.
Relationship between population and least squares lines
Population line: Y = β0 + β1X1 + ... + βpXp + ε. Least squares line: Yhat = b0 + b1X1 + ... + bpXp.
We would like to know β0 through βp, i.e. the population line. Instead we know b0 through bp, i.e. the least squares line. Hence we use b0 through bp as guesses for β0 through βp, and Yhat_i as a guess for Yi. The guesses will not be perfect, just as the sample mean xbar is not a perfect guess for the population mean μ.
Inference in Regression STT592-002: Intro. to Statistical Learning Inference in Regression 2 4 6 8 10 12 14 -10 -5 5 X Estimated (least squares) line. True (population) line. Unobserved The regression line from the sample is not the regression line from the population. What we want to do: Assess how well the line describes the plot. Guess the slope of the population line. Guess what value Y would take for a given X value
Some Relevant Questions
- Is βj = 0 or not? We can use a hypothesis test to answer this question. If we can't be sure that βj ≠ 0, then there is no point in using Xj as one of our predictors.
- Can we be sure that at least one of our X variables is a useful predictor, i.e. is it the case that β1 = β2 = ... = βp = 0?
Advertisement Data for multiple linear regression
## To get Table 3.4 ##
lm.fit=lm(Sales~TV+Radio+Newspaper,data=Advertising)
summary(lm.fit)
names(lm.fit)
coef(lm.fit)
confint(lm.fit)
1. Is bj=0, i.e. is Xj an important variable?
We use a hypothesis test to answer this question: H0: bj = 0 vs. Ha: bj ≠ 0.
Calculate t = bj/SE(bj), the number of standard errors the estimate is away from zero. If |t| is large (equivalently, the p-value is small), we can be confident that bj ≠ 0 and that there is a relationship.
Testing Individual Variables
Is there a (statistically detectable) linear relationship between Newspaper and Sales after all the other variables have been accounted for?
- No: big p-value in the multiple regression, despite a small p-value in the simple regression.
- Almost all the explaining that Newspaper could do in simple regression has already been done by TV and Radio in the multiple regression!
2. Is the whole regression explaining anything at all?
Test for: H0: all slopes = 0 (b1 = b2 = ... = bp = 0) vs. Ha: at least one slope ≠ 0.
The answer comes from the F-test in the ANOVA (ANalysis Of VAriance) table. The ANOVA table has many pieces of information; what we care about is the F-ratio and the corresponding p-value.
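The F-ratio from the ANOVA table can be computed by hand from the total and residual sums of squares. A Python sketch; n, p, TSS, and RSS below are illustrative numbers, not taken from the slides:

```python
# Overall F-statistic for H0: b1 = ... = bp = 0:
#   F = ((TSS - RSS) / p) / (RSS / (n - p - 1))
# The numbers below are hypothetical, chosen just to show the arithmetic.
n, p = 200, 3              # observations and predictors
tss, rss = 5417.0, 556.8   # total and residual sums of squares

f = ((tss - rss) / p) / (rss / (n - p - 1))
# An F-ratio far above 1 is strong evidence against H0.
```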
Multiple Linear Regression: LSE background
Q: how to find the p-value? Q: ANOVA?
F = MS(treatment) / MS(error)
Adjusted R-Square
R-square will always increase when more variables are added to the model, even if those variables are only weakly associated with the response. This is because adding another variable to the least squares equations must allow us to fit the training data (though not necessarily the testing data) more accurately.
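Adjusted R-square makes this penalty explicit. A Python sketch with hypothetical sums of squares, showing that adding a weak predictor can lower the adjusted value even though RSS falls:

```python
# Adjusted R^2 penalizes extra predictors:
#   AdjR2 = 1 - (RSS / (n - p - 1)) / (TSS / (n - 1))
# All numbers below are hypothetical, for illustration only.

def adj_r2(rss, tss, n, p):
    return 1 - (rss / (n - p - 1)) / (tss / (n - 1))

n, tss = 100, 400.0
small_model  = adj_r2(rss=120.0, tss=tss, n=n, p=2)
bigger_model = adj_r2(rss=119.9, tss=tss, n=n, p=3)  # barely better fit
# bigger_model comes out LOWER than small_model despite the smaller RSS:
# the tiny drop in RSS does not justify the extra predictor.
```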
Deciding on Important Variables: variable selection
lm.fit1=lm(Sales~Newspaper,data=Advertising)
summary(lm.fit1)
lm.fit2=lm(Sales~Newspaper+TV,data=Advertising)
summary(lm.fit2)
lm.fit3=lm(Sales~Newspaper+TV+Radio,data=Advertising)
summary(lm.fit3)
lm.fit4=lm(Sales~TV+Radio,data=Advertising)
summary(lm.fit4)
Model fits
Check adjusted R-square and RSE; plot the data to detect any synergy or interaction effect.
par(mfrow=c(2,2))
plot(lm.fit)
plot(predict(lm.fit), residuals(lm.fit))
plot(predict(lm.fit), rstudent(lm.fit))
plot(hatvalues(lm.fit))
which.max(hatvalues(lm.fit))
Outline
- The Linear Regression Model
- Least Squares Fit
- Measures of Fit
- Inference in Regression
- Other Considerations in Regression Model: Qualitative Predictors, Interaction Terms
- Potential Fit Problems
- Linear vs. KNN Regression
Credit Data:
Credit=read.csv("http://www-bcf.usc.edu/~gareth/ISL/Credit.csv", header=TRUE)
head(Credit)
newdata=Credit[,-1]
fix(newdata)
names(newdata)
pairs(newdata[,c(1, 2, 4, 5, 6, 7)])
Credit Data (cont.):
Q: Which variables are numerical/quantitative?
Q: Which variables are categorical/qualitative?
Qualitative Predictors
How do you put "gender", with category listings "men" and "women", into a regression equation? Code the categories as indicator variables (dummy variables). For example, we can code Males = 0 and Females = 1.
One qualitative predictor with two levels
Q: Investigate differences in credit card balance between males and females, ignoring the other variables for the moment. Two genders (male and female). Let xi = 1 if the i-th person is female and xi = 0 if male; then the regression equation is
balance_i = β0 + β1*xi + ε_i,
so β0 is the average balance for males and β0 + β1 the average balance for females.
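This coding can be verified numerically: regressing balance on a 0/1 indicator recovers the two group means. A pure-Python sketch with toy balances (hypothetical numbers, not the Credit data):

```python
# Dummy coding for a two-level factor: x = 1 (female), 0 (male).
# Regressing balance on x gives: intercept = male mean,
# slope = (female mean - male mean). Toy balances are hypothetical.
balances = [("male", 500.0), ("male", 520.0),
            ("female", 530.0), ("female", 550.0)]
x = [1.0 if g == "female" else 0.0 for g, _ in balances]
y = [b for _, b in balances]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
# b0 equals the male mean (510); b1 equals the female-minus-male gap (30)
```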
Credit Data:
Credit=read.csv("http://www-bcf.usc.edu/~gareth/ISL/Credit.csv", header=TRUE)
lm.fit=lm(Balance~Gender,data=Credit)
summary(lm.fit)
contrasts(Credit$Gender)
Two predictors: one quantitative, one qualitative with two levels
Y: Balance. We want to include income and gender. Two genders (male and female). Let xi = 1 if female and xi = 0 if male; then the regression equation is
balance_i = β0 + β1*income_i + β2*xi + ε_i.
β2 is the average extra balance each month that females have for a given income level. Males are the "baseline".
Other Coding Schemes
There are different ways to code categorical variables. Two genders (male and female). Let xi = +1 if female and xi = -1 if male; then the regression equation is
balance_i = β0 + β1*income_i + β2*xi + ε_i.
β2 is the average amount that females are above the overall average, for any given income level. β2 is also the average amount that males are below the overall average, for any given income level.
One qualitative predictor with more than two levels
Q: Investigate differences in credit card balance across Ethnicity levels, ignoring the other variables for the moment. With three levels of Ethnicity, two dummy variables are needed (always one fewer than the number of levels).
contrasts(Credit$Ethnicity)
Credit Data:
Credit=read.csv("http://www-bcf.usc.edu/~gareth/ISL/Credit.csv", header=TRUE)
lm.fit=lm(Balance~Ethnicity,data=Credit)
summary(lm.fit)
contrasts(Credit$Ethnicity)
Other Issues Discussed
- Interaction terms
- Non-linear effects
- Collinearity and Multicollinearity
- Model Selection
Interaction
When the effect on Y of increasing X1 depends on another variable X2 [a synergy effect].
Example: Maybe the effect on Salary (Y) of increasing Position (X1) depends on gender (X2)? For example, maybe male salaries go up faster (or slower) than female salaries as they get promoted.
Advertising example: TV and radio advertising both increase sales. Perhaps spending money on both of them increases sales more than spending the same amount on one alone?
Interaction in advertising
With the interaction term TV×Radio in the model:
- Spending $1 extra on TV increases average sales by 0.0191 + 0.0011×Radio.
- Spending $1 extra on Radio increases average sales by 0.0289 + 0.0011×TV.
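These marginal effects are easy to tabulate. A Python sketch using the slide's rounded coefficients; the function name is made up for illustration:

```python
# With model sales = b0 + b1*TV + b2*Radio + b3*TV*Radio, the effect of
# one extra TV dollar is b1 + b3*Radio, so it grows with Radio spend.
# Coefficients are the slide's rounded values; the helper name is hypothetical.
b1, b2, b3 = 0.0191, 0.0289, 0.0011

def extra_sales_per_tv_dollar(radio):
    return b1 + b3 * radio

low  = extra_sales_per_tv_dollar(0.0)    # no Radio spend
high = extra_sales_per_tv_dollar(50.0)   # TV dollars work harder with Radio
```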
Advertising Data:
Advertising=read.csv("http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv", header=TRUE)
lm.fit=lm(Sales~TV*Radio,data=Advertising)
summary(lm.fit)
Parallel Regression Lines
Regression equations:
- females: salary = 112.77 + 1.86 + 6.05 × position
- males: salary = 112.77 - 1.86 + 6.05 × position
Different intercepts, same slopes. Parallel lines have the same slope: dummy variables give the lines different intercepts, but their slopes are still the same.
Interaction Effects
Our model has forced the line for men and the line for women to be parallel. Parallel lines say that promotions have the same salary benefit for men as for women. If the lines aren't parallel, then promotions affect men's and women's salaries differently.
Should the Lines be Parallel?
[Figure: salary (110-170) vs. position (1-10), with separate fitted lines by gender.]
Interaction between gender and position: the interaction is not significant.
Collinearity and Multicollinearity
To detect collinearity, use the correlation matrix of the predictors. An element of the matrix with a large absolute value indicates a pair of highly correlated variables, and therefore a collinearity problem in the data. But not all collinearity problems can be detected by inspection of the correlation matrix: multicollinearity is collinearity among three or more variables, which can exist even if no pair of variables has a particularly high correlation. For multicollinearity, compute the variance inflation factor (VIF).
Collinearity and Multicollinearity
For multicollinearity, compute the variance inflation factor (VIF). VIF ≥ 1 always. As a rule of thumb, a VIF value that exceeds 5 or 10 indicates a problematic amount of collinearity.
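With only two predictors, the VIF reduces to 1/(1 - r^2) for their correlation r, which makes the rule of thumb easy to demonstrate. A pure-Python sketch on toy, deliberately collinear columns (hypothetical data):

```python
import math

# VIF for predictor j: 1 / (1 - R^2 of Xj regressed on the other X's).
# With just two predictors this reduces to 1 / (1 - r^2), where r is
# their correlation. Toy predictor columns below are hypothetical.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.1, 2.1, 2.9, 4.2, 4.8]   # nearly a copy of x1 -> collinear

def corr(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(va * vb)

r = corr(x1, x2)
vif = 1 / (1 - r ** 2)   # far above the rule-of-thumb cutoff of 5-10
```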
Collinearity and Multicollinearity
To address collinearity:
1) Drop one of the problematic variables from the regression.
2) Combine the collinear variables into a single predictor. For instance, we might take the average of standardized versions of the two variables to create a new variable.
Outline
- The Linear Regression Model
- Least Squares Fit
- Measures of Fit
- Inference in Regression
- Other Considerations in Regression Model: Qualitative Predictors, Interaction Terms
- Potential Fit Problems
- Linear vs. KNN Regression
Potential Fit Problems
There are a number of possible problems that one may encounter when fitting the linear regression model:
- Non-linearity of the data [check the residual plot]
- Dependence of the error terms
- Non-constant variance of the error terms
- Outliers
- High leverage points
- Collinearity
See Section 3.3.3 for more details.
Outline
- The Linear Regression Model
- Least Squares Fit
- Measures of Fit
- Inference in Regression
- Other Considerations in Regression Model: Qualitative Predictors, Interaction Terms
- Potential Fit Problems
- Linear vs. KNN Regression
K-Nearest Neighbors (KNN) classifier (Sec 2.2)
Given a positive integer K and a test observation x0, the KNN classifier first identifies the K points in the training data that are closest to x0, represented by N0. It then estimates the conditional probability for class j as the fraction of points in N0 whose response values equal j:
Pr(Y = j | X = x0) = (1/K) * sum over i in N0 of I(yi = j).
Finally, KNN applies the Bayes rule and classifies the test observation x0 to the class with the largest probability.
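The three steps just listed (find the neighbors, estimate the class fractions, pick the majority class) can be sketched in a few lines of Python. The 1-D training data and the function name are hypothetical:

```python
from collections import Counter

# Minimal KNN classifier sketch on hypothetical 1-D data: find the K
# nearest training points to x0, then predict the majority class.
train = [(1.0, "blue"), (1.5, "blue"), (2.0, "blue"),
         (5.0, "orange"), (5.5, "orange"), (6.0, "orange")]

def knn_classify(x0, k):
    # K nearest neighbors of x0 by absolute distance
    neighbors = sorted(train, key=lambda p: abs(p[0] - x0))[:k]
    # class fractions in the neighborhood; predict the largest
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

pred = knn_classify(2.2, k=3)   # nearest 3 points are all blue
```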
K-Nearest Neighbors (KNN) classifier (Sec 2.2)
A small training data set: 6 blue and 6 orange observations. Goal: make a prediction for the black cross. Consider K = 3. KNN identifies the 3 observations that are closest to the cross; this neighborhood is shown as a circle. It consists of 2 blue points and 1 orange point, giving estimated probabilities of 2/3 for the blue class and 1/3 for the orange class. KNN therefore predicts that the black cross belongs to the blue class.
KNN Regression
KNN regression is similar to the KNN classifier. To predict Y at a given value of X, take the k closest points to X in the training data and average their responses, i.e.
yhat(x0) = (1/k) * sum of yi over the k training points xi nearest to x0.
If k is small, KNN is much more flexible than linear regression. Is that better?
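A minimal sketch of this averaging rule, in Python with toy 1-D training data (hypothetical):

```python
# KNN regression sketch: the prediction at x0 is the average response of
# the k nearest training points. Toy 1-D training data is hypothetical.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0]
train_y = [1.1, 2.3, 2.9, 4.2, 5.0]

def knn_regress(x0, k):
    # indices of training points ordered by distance to x0
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x0))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / k

pred = knn_regress(2.5, k=2)   # average of the responses at x = 2 and x = 3
```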
KNN Fits for k = 1 and k = 9
KNN Fits in One Dimension (k = 1 and k = 9)
Linear Regression Fit
KNN vs. Linear Regression
Not So Good in High-Dimensional Situations