
STT592-002: Intro. to Statistical Learning



Presentation on theme: "STT592-002: Intro. to Statistical Learning" — Presentation transcript:

1 STT592-002: Intro. to Statistical Learning
Linear Regression, Chapter 03. Disclaimer: This PPT is adapted from IOM 530: Intro. to Statistical Learning.

2 STT592-002: Intro. to Statistical Learning
Outline
- Linear Regression Model: Simple Linear, Multiple Linear, Multivariate Linear
- Least Squares Fit
- Measures of Fit
- Inference in Regression
- Other Considerations in Regression Model: Qualitative Predictors, Interaction Terms
- Non-Linear Regression Model
- Potential Fit Problems
- Linear vs. KNN Regression

3 Case 1: Advertisement Data
Advertising=read.csv(" header=TRUE);
newdata=Advertising[,-1]
fix(newdata)
View(newdata)
names(newdata)
pairs(newdata)

4 Advertisement Data: background

5 Advertisement Data:
1. Is there a relationship b/w advertising budget (TV, Radio, or Newspaper) and sales?
2. How strong is the relationship b/w advertising budget and sales?
3. Which media contribute to sales?

6 Advertisement Data:
4. How accurately can we estimate the effect of each medium on sales?
5. How accurately can we predict future sales?
6. Is the relationship linear?
7. Is there synergy among the advertising media?

7 Advertisement Data: how to fit the data? → LSE

8 Simple Linear Regression: LSE background

9 Simple Linear Regression: LSE background

10 Simple Linear Regression: LSE background
t-statistic ~ t(n-2)

11 Advertisement Data for simple linear regression
lm.fit=lm(Sales~TV,data=Advertising)  ## to get Table 3.1
summary(lm.fit)
names(lm.fit)
coef(lm.fit)
confint(lm.fit)

12 Q: Is b1=0 i.e. is X an important variable?
We use a hypothesis test to answer this question: H0: b1 = 0 vs. Ha: b1 ≠ 0. Calculate t = b̂1 / SE(b̂1). If t is large (equivalently, if the p-value is small) we can be confident that b1 ≠ 0 and that there is a relationship. t is the number of standard errors the estimate b̂1 lies away from zero.
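The calculation on this slide can be sketched numerically. A minimal illustration in Python (the course code is in R; the data here are made up): fit the least squares line by hand and form t = b̂1/SE(b̂1), where SE(b̂1) = sqrt(RSS/(n−2))/sqrt(Σ(xi−x̄)²).

```python
import math

# Made-up data, roughly y = 2x, so the slope estimate should be near 2.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.9]
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

b1 = sxy / sxx                 # least squares slope
b0 = ybar - b1 * xbar          # least squares intercept

rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
se_b1 = math.sqrt(rss / (n - 2)) / math.sqrt(sxx)

t = b1 / se_b1                 # compare against the t(n-2) distribution
print(b1, t)
```

A large |t| (here far above any t(3) critical value) lets us reject H0: b1 = 0.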

13 Measures of Fit: R²
Some of the variation in Y can be explained by variation in the X's and some cannot. R² tells you the percentage of the variance that can be explained by the regression on X. R² is always between 0 and 1: zero means no variance has been explained; one means it has all been explained (a perfect fit to the data).
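R² can be computed as 1 − RSS/TSS. A short Python sketch with made-up data (the course uses R; this is only an illustration):

```python
# Made-up data for a simple linear fit.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.9]
n = len(x)

xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) \
     / sum((a - xbar) ** 2 for a in x)
b0 = ybar - b1 * xbar

rss = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))  # unexplained variation
tss = sum((b - ybar) ** 2 for b in y)                      # total variation

r2 = 1 - rss / tss   # fraction of variance explained, between 0 and 1
print(r2)
```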

14 Multiple Linear Regression Model
Y: Quantitative Response; Xj: j-th predictor. The model is Y = β0 + β1X1 + β2X2 + … + βpXp + ε. The parameters in the linear regression model are easy to interpret: β0 is the intercept (i.e. the average value for Y if all the X's are zero); βj is the slope for the j-th variable Xj, i.e. the average increase in Y when Xj is increased by one unit and all other X's are held constant.

15 Least Squares Estimate
We estimate the parameters using least squares, i.e. we minimize the residual sum of squares RSS = Σ_i (y_i − β0 − β1 x_i1 − … − βp x_ip)².

16 Relationship between population and least squares lines
Population line: Y = β0 + β1X1 + … + βpXp + ε. Least squares line: Ŷ = β̂0 + β̂1X1 + … + β̂pXp. We would like to know β0 through βp, i.e. the population line. Instead we know β̂0 through β̂p, i.e. the least squares line. Hence we use β̂0 through β̂p as guesses for β0 through βp, and Ŷi as a guess for Yi. The guesses will not be perfect, just as a sample mean is not a perfect guess for a population mean.

17 Inference in Regression
[Figure: scatter plot with the estimated (least squares) line and the true (population) line; the true line is unobserved.] The regression line from the sample is not the regression line from the population. What we want to do: assess how well the line describes the plot; guess the slope of the population line; guess what value Y would take for a given X value.

18 Some Relevant Questions
Is βj = 0 or not? We can use a hypothesis test to answer this question. If we can't be sure that βj ≠ 0, then there is no point in using Xj as one of our predictors. Can we be sure that at least one of our X variables is a useful predictor, i.e. is it the case that β1 = β2 = … = βp = 0?

19 Advertisement Data for multiple linear regression
## To get Table 3.4 ##
lm.fit=lm(Sales~TV+Radio+Newspaper,data=Advertising)
summary(lm.fit)
names(lm.fit)
coef(lm.fit)
confint(lm.fit)

20 1. Is bj=0 i.e. is Xj an important variable?
We use a hypothesis test to answer this question: H0: bj = 0 vs. Ha: bj ≠ 0. Calculate t = b̂j / SE(b̂j). If t is large (equivalently, if the p-value is small) we can be confident that bj ≠ 0 and that there is a relationship. t is the number of standard errors the estimate b̂j lies away from zero.

21 Testing Individual Variables
Is there a (statistically detectable) linear relationship between Newspaper and Sales after all the other variables have been accounted for? No: big p-value (even though Newspaper had a small p-value in simple regression). Almost all the explaining that Newspaper could do in simple regression has already been done by TV and Radio in multiple regression!

22 2. Is the whole regression explaining anything at all?
Test H0: all slopes are zero (b1 = b2 = … = bp = 0) vs. Ha: at least one slope ≠ 0. The answer comes from the F-test in the ANOVA (ANalysis Of VAriance) table. The ANOVA table has many pieces of information; what we care about is the F-ratio and the corresponding p-value.
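The F-ratio can be computed directly from RSS and TSS: F = ((TSS − RSS)/p) / (RSS/(n − p − 1)). A Python sketch with made-up data and a single predictor (p = 1); with one predictor the F-ratio equals the square of the slope's t-statistic.

```python
# Made-up data; with p = 1 this is the overall F test for simple regression.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.9]
n, p = len(x), 1

xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) \
     / sum((a - xbar) ** 2 for a in x)
b0 = ybar - b1 * xbar

rss = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
tss = sum((b - ybar) ** 2 for b in y)

# H0: all slopes are zero. A large F means at least one slope is nonzero.
f = ((tss - rss) / p) / (rss / (n - p - 1))
print(f)
```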

23 Multiple Linear Regression: LSE background
Q: how to find the p-value? Q: ANOVA? F = MS(treatment) / MS(error)

24 Adjusted R-Square
R-square will always increase when more variables are added to the model, even if those variables are only weakly associated with the response. This is because adding another variable to the least squares equations must allow us to fit the training data (though not necessarily the testing data) more accurately.
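Adjusted R² corrects for this: Adj R² = 1 − (RSS/(n − p − 1)) / (TSS/(n − 1)), so it rises only when a new variable reduces RSS by more than its cost in degrees of freedom. A Python sketch (made-up data, p = 1):

```python
# Made-up data; compare plain R^2 with the adjusted version.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.9]
n, p = len(x), 1

xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) \
     / sum((a - xbar) ** 2 for a in x)
b0 = ybar - b1 * xbar

rss = sum((b - (b0 + b1 * a)) ** 2 for a, b in zip(x, y))
tss = sum((b - ybar) ** 2 for b in y)

r2 = 1 - rss / tss
adj_r2 = 1 - (rss / (n - p - 1)) / (tss / (n - 1))  # penalizes extra predictors
print(r2, adj_r2)
```

Adjusted R² is always below R², and the gap widens as more weak predictors are added.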

25 Deciding on Important Variables: variable selection
## To get Table 3.4 ##
lm.fit1=lm(Sales~Newspaper,data=Advertising)
summary(lm.fit1)
lm.fit2=lm(Sales~Newspaper+TV,data=Advertising)
summary(lm.fit2)
lm.fit3=lm(Sales~Newspaper+TV+Radio,data=Advertising)
summary(lm.fit3)
lm.fit4=lm(Sales~TV+Radio,data=Advertising)
summary(lm.fit4)

26 Model fits
Check adjusted R-square and RSE; plot the data to detect any synergy or interaction effect.
par(mfrow=c(2,2))
plot(lm.fit)
plot(predict(lm.fit), residuals(lm.fit))
plot(predict(lm.fit), rstudent(lm.fit))
plot(hatvalues(lm.fit))
which.max(hatvalues(lm.fit))

27 Outline
- The Linear Regression Model
- Least Squares Fit
- Measures of Fit
- Inference in Regression
- Other Considerations in Regression Model: Qualitative Predictors, Interaction Terms
- Potential Fit Problems
- Linear vs. KNN Regression

28 Credit Data:
Credit=read.csv(" header=TRUE);
head(Credit)
newdata=Credit[,-1]
fix(newdata)
names(newdata)
pairs(newdata[,c(1, 2, 4, 5, 6, 7)])

29 Credit Data:
Credit=read.csv(" header=TRUE);
head(Credit)
newdata=Credit[,-1]
fix(newdata)
names(newdata)
pairs(newdata[,c(1, 2, 4, 5, 6, 7)])
Q: Numerical/Quantitative Variables? Q: Categorical/Qualitative Variables?

30 Qualitative Predictors
How do you put "gender", with categories "men" and "women", into a regression equation? Code them as indicator variables (dummy variables). For example, we can code Males = 0 and Females = 1.

31 One qualitative predictor with two levels
Q: Investigate differences in credit card balance between males and females, ignoring the other variables for the moment. Two genders (male and female). Let x_i = 1 if person i is female and x_i = 0 if person i is male; then the regression equation is balance_i = β0 + β1 x_i + ε_i.
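With a single 0/1 dummy, least squares simply recovers the two group means: the intercept is the baseline (male) mean and the slope is the female-minus-male difference. A Python sketch with made-up balances (the course fits this in R with lm):

```python
# Made-up balances; gender coded 0 = male (baseline), 1 = female.
balance = [500.0, 520.0, 480.0, 610.0, 640.0, 590.0]
gender  = [0.0,   0.0,   0.0,   1.0,   1.0,   1.0]
n = len(balance)

gbar = sum(gender) / n
bbar = sum(balance) / n
b1 = sum((g - gbar) * (b - bbar) for g, b in zip(gender, balance)) \
     / sum((g - gbar) ** 2 for g in gender)
b0 = bbar - b1 * gbar

male_mean = sum(balance[:3]) / 3
female_mean = sum(balance[3:]) / 3
print(b0, b1)   # b0 equals the male mean; b1 equals (female mean - male mean)
```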

32 Credit Data:
Credit=read.csv(" header=TRUE);
lm.fit=lm(Balance~Gender,data=Credit)
summary(lm.fit)
contrasts(Credit$Gender)

33 Two qualitative predictors with only two levels
Y: Balance. We want to include income and gender. Two genders (male and female). Let x_i = 1 if person i is female and x_i = 0 if male; then the regression equation is balance_i = β0 + β1 income_i + β2 x_i + ε_i. β2 is the average extra balance each month that females have for a given income level. Males are the "baseline".

34 Other Coding Schemes
There are different ways to code categorical variables. Two genders (male and female). Let x_i = +1 if person i is female and x_i = −1 if male; then the regression equation is balance_i = β0 + β1 income_i + β2 x_i + ε_i. β2 is the average amount that females are above the overall average, for any given income level; β2 is also the average amount that males are below the overall average, for any given income level.

35 One qualitative predictor with more than two levels
Q: Investigate differences in credit card balance between levels of Ethnicity, ignoring the other variables for the moment. Ethnicity has three levels, so two dummy variables are needed.
contrasts(Credit$Ethnicity)

36 Credit Data:
Credit=read.csv(" header=TRUE);
lm.fit=lm(Balance~Ethnicity,data=Credit)
summary(lm.fit)
contrasts(Credit$Ethnicity)

37 Other Issues Discussed
- Interaction terms
- Non-linear effects
- Collinearity and Multicollinearity
- Model Selection

38 Interaction
When the effect on Y of increasing X1 depends on another variable X2 (a synergy effect). Example: maybe the effect on Salary (Y) of increasing Position (X1) depends on gender (X2)? For example, maybe male salaries go up faster (or slower) than female salaries as they get promoted. Advertising example: TV and radio advertising both increase sales. Perhaps spending money on both of them may increase sales more than spending the same amount on one alone?

39 Interaction in advertising
Model with interaction: sales = β0 + β1·TV + β2·Radio + β3·(TV × Radio) + ε. Spending $1 extra on TV increases average sales by β1 + β3·Radio; spending $1 extra on Radio increases average sales by β2 + β3·TV. β3·(TV × Radio) is the interaction term.
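Numerically, an interaction means the marginal effect of TV is not constant. A Python sketch with hypothetical coefficient values (b0, b1, b2, b3 below are made up for illustration, not the fitted Advertising values):

```python
# Hypothetical coefficients for sales = b0 + b1*TV + b2*Radio + b3*TV*Radio.
b0, b1, b2, b3 = 6.75, 0.019, 0.029, 0.0011

def sales(tv, radio):
    return b0 + b1 * tv + b2 * radio + b3 * tv * radio

# The effect of $1 extra on TV depends on the Radio level: b1 + b3*Radio.
effect_radio_0  = sales(101, 0) - sales(100, 0)
effect_radio_50 = sales(101, 50) - sales(100, 50)
print(effect_radio_0, effect_radio_50)
```

At Radio = 0 the TV effect is just b1; at Radio = 50 it is b1 + 50·b3, which is the synergy the slide describes.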

40 Advertising Data:
Advertising=read.csv(" header=TRUE);
lm.fit=lm(Sales~TV*Radio,data=Advertising)
summary(lm.fit)

41 Parallel Regression Lines
[Figure: one regression line for women and one for men.] Regression equations: females: salary = (β0 + β2) + β1·position; males: salary = β0 + β1·position. Different intercepts, same slopes. Parallel lines have the same slope: dummy variables give the lines different intercepts, but their slopes are still the same.

42 Interaction Effects
Our model has forced the line for men and the line for women to be parallel. Parallel lines say that promotions have the same salary benefit for men as for women. If the lines aren't parallel, then promotions affect men's and women's salaries differently.

43 Should the Lines be Parallel?
[Figure: salary vs. position (1–10) for men and women, fitted with an interaction between gender and position; here the interaction is not significant.]

44 Collinearity and Multicollinearity
To detect collinearity, use the correlation matrix of the predictors. An element of the matrix with a large absolute value indicates a pair of highly correlated variables, and therefore a collinearity problem in the data. But not all collinearity problems can be detected by inspecting the correlation matrix. Multicollinearity is collinearity among three or more variables, which can exist even if no pair of variables has a particularly high correlation. To detect multicollinearity, compute the variance inflation factor (VIF).

45 Collinearity and Multicollinearity
For multicollinearity, compute the variance inflation factor (VIF). VIF ≥ 1. As a rule of thumb, a VIF value that exceeds 5 or 10 indicates a problematic amount of collinearity.
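VIF_j = 1/(1 − R²_{Xj|X−j}), where R²_{Xj|X−j} comes from regressing Xj on the other predictors. With just two predictors that R² is simply their squared correlation, which makes a hand computation easy. A Python sketch on made-up, nearly collinear predictors:

```python
# Two made-up predictors; x2 is nearly a copy of x1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.1, 2.2, 2.9, 4.1, 4.8]
n = len(x1)

m1, m2 = sum(x1) / n, sum(x2) / n
cov  = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
var1 = sum((a - m1) ** 2 for a in x1)
var2 = sum((b - m2) ** 2 for b in x2)

r2 = cov * cov / (var1 * var2)   # R^2 of x1 regressed on x2 (squared correlation)
vif = 1 / (1 - r2)
print(vif)                       # far above the 5-10 rule-of-thumb threshold
```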

46 Collinearity and Multicollinearity
To address collinearity: 1) drop one of the problematic variables from the regression; or 2) combine the collinear variables into a single predictor. For instance, we might take the average of standardized versions of the two variables to create a new variable.

47 Outline
- The Linear Regression Model
- Least Squares Fit
- Measures of Fit
- Inference in Regression
- Other Considerations in Regression Model: Qualitative Predictors, Interaction Terms
- Potential Fit Problems
- Linear vs. KNN Regression

48 Potential Fit Problems
There are a number of possible problems that one may encounter when fitting the linear regression model:
1. Non-linearity of the data [check the residual plot]
2. Dependence of the error terms
3. Non-constant variance of the error terms
4. Outliers
5. High leverage points
6. Collinearity
See Section for more details.

49 Outline
- The Linear Regression Model
- Least Squares Fit
- Measures of Fit
- Inference in Regression
- Other Considerations in Regression Model: Qualitative Predictors, Interaction Terms
- Potential Fit Problems
- Linear vs. KNN Regression

50 K-Nearest Neighbors (KNN) classifier (Sec2.2)
Given a positive integer K and a test observation x0, the KNN classifier first identifies the K points in the training data that are closest to x0, represented by N0. It then estimates the conditional probability for class j as the fraction of points in N0 whose response values equal j: Pr(Y = j | X = x0) = (1/K) Σ_{i ∈ N0} I(y_i = j). Finally, KNN applies the Bayes rule and classifies the test observation x0 to the class with the largest estimated probability.
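The rule on this slide takes only a few lines of code. A minimal Python sketch (the book's examples use R; the training points here are made up): find the K nearest training points, estimate the class fractions, and predict the majority class.

```python
import math
from collections import Counter

# Tiny made-up training set: (x, y, class label).
train = [(1, 1, "blue"), (2, 1, "blue"), (1, 2, "blue"),
         (5, 5, "orange"), (6, 5, "orange"), (5, 6, "orange")]

def knn_classify(x0, k):
    # Sort training points by Euclidean distance to x0 and keep the k nearest.
    nearest = sorted(train, key=lambda p: math.dist(x0, p[:2]))[:k]
    # Estimated Pr(Y = j | X = x0) is the class fraction among the neighbors;
    # the prediction is the class with the largest fraction.
    votes = Counter(label for _, _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_classify((1.5, 1.5), 3))
```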

51 K-Nearest Neighbors (KNN) classifier (Sec2.2)
A small training data set: 6 blue and 6 orange observations. Goal: make a prediction for the black cross. Consider K = 3. KNN identifies the 3 observations that are closest to the cross; this neighborhood is shown as a circle. It consists of 2 blue points and 1 orange point, resulting in estimated probabilities of 2/3 for the blue class and 1/3 for the orange class. KNN predicts that the black cross belongs to the blue class.

52 KNN Regression
KNN regression is similar to the KNN classifier. To predict Y for a given value of X, consider the k closest points to X in the training data and take the average of their responses, i.e. f̂(x0) = (1/k) Σ_{x_i ∈ N0} y_i. If k is small, KNN is much more flexible than linear regression. Is that better?
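The same idea in code: a minimal KNN regression sketch in Python (made-up 1-D data), averaging the responses of the k nearest training points.

```python
# Made-up 1-D training data.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.9]

def knn_regress(x0, k):
    # Indices of the k training points closest to x0.
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    # The prediction is the average response over that neighborhood.
    return sum(ys[i] for i in nearest) / k

print(knn_regress(2.5, 2))   # averages the responses at x = 2 and x = 3
```

With k = 1 the fit interpolates the training data (very flexible); larger k gives a smoother, more rigid fit, which is the trade-off the following slides illustrate.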

53 KNN Fits for k = 1 and k = 9

54 KNN Fits in One Dimension (k =1 and k = 9)

55 Linear Regression Fit

56 KNN vs. Linear Regression

57 Not So Good in High Dimensional Situations

