STT592-002: Intro. to Statistical Learning


STT592-002: Intro. to Statistical Learning
Linear Regression (Chapter 03)
Disclaimer: This PPT is modified based on IOM 530: Intro. to Statistical Learning

Outline
Linear Regression Model: Simple Linear, Multiple Linear, Multivariate Linear
Least Squares Fit
Measures of Fit
Inference in Regression
Other Considerations in Regression Model: Qualitative Predictors, Interaction Terms
Non-Linear Regression Model
Potential Fit Problems
Linear vs. KNN Regression

Case 1: Advertisement Data

Advertising = read.csv("http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv", header=TRUE)
newdata = Advertising[,-1]   # drop the first (row-index) column
fix(newdata)
View(newdata)
names(newdata)
pairs(newdata)

Advertisement Data: background

Advertisement Data: questions of interest
1. Is there a relationship between advertising budget (TV, Radio, or Newspaper) and sales?
2. How strong is the relationship between advertising budget and sales?
3. Which media contribute to sales?

Advertisement Data: questions of interest (continued)
4. How accurately can we estimate the effect of each medium on sales?
5. How accurately can we predict future sales?
6. Is the relationship linear?
7. Is there synergy among advertising media?

Advertisement Data: how do we fit the data? Use the least squares estimate (LSE).

Simple Linear Regression: LSE background

Simple Linear Regression: LSE background (continued)
Under H0: b1 = 0, the t-statistic follows a t distribution with n - 2 degrees of freedom.
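The least-squares formulas shown as images on the original slides did not survive the transcript; as a rough Python sketch of the same computations (the data below are invented purely for illustration):

```python
import math

# Toy data (hypothetical, for illustration only)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

n = len(x)
xbar = sum(x) / n
ybar = sum(y) / n

# Least-squares estimates: b1 = Sxy / Sxx, b0 = ybar - b1 * xbar
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = ybar - b1 * xbar

# Residual standard error and SE(b1); under H0: b1 = 0
# the t-statistic b1 / SE(b1) follows t(n - 2)
rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
sigma2 = rss / (n - 2)
se_b1 = math.sqrt(sigma2 / sxx)
t_stat = b1 / se_b1
```

This is exactly what `summary(lm(...))` reports in R as the coefficient estimate, its standard error, and the t value.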

Advertisement Data for simple linear regression

## To get Table 3.1 (note: the column names in the CSV are lower-case "sales", etc.)
lm.fit = lm(sales ~ TV, data=Advertising)
summary(lm.fit)
names(lm.fit)
coef(lm.fit)
confint(lm.fit)

Q: Is b1 = 0, i.e. is X an important variable?
We use a hypothesis test to answer this question: H0: b1 = 0 vs Ha: b1 ≠ 0.
Calculate t = b1_hat / SE(b1_hat), the number of standard errors the estimate lies from zero.
If |t| is large (equivalently, the p-value is small) we can conclude that b1 ≠ 0 and that there is a relationship.
For the TV coefficient, the estimate is 17.67 SEs from 0, so the p-value is essentially zero.

Measures of Fit: R2
Some of the variation in Y can be explained by variation in the X's and some cannot.
R2 tells you the proportion of the variance in Y that is explained by the regression on X.
R2 is always between 0 and 1: zero means no variance has been explained; one means all of it has been explained (a perfect fit to the data).
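A minimal sketch of the R2 computation, with TSS and RSS as in the textbook (the fitted values below are invented for illustration):

```python
# R^2 = 1 - RSS/TSS: the fraction of the variance in y explained by the fit
y     = [2.0, 4.0, 6.0, 8.0]
y_hat = [2.2, 3.8, 6.1, 7.9]   # fitted values from some regression (hypothetical)

ybar = sum(y) / len(y)
tss = sum((yi - ybar) ** 2 for yi in y)                 # total variation in y
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # unexplained variation
r2 = 1 - rss / tss
```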

Multiple Linear Regression Model
Y: quantitative response; Xj: the j-th predictor.
Model: Y = b0 + b1 X1 + ... + bp Xp + e.
The parameters in the linear regression model are very easy to interpret:
b0 is the intercept (i.e. the average value of Y when all the X's are zero);
bj is the slope for the j-th variable Xj: the average increase in Y when Xj is increased by one unit and all other X's are held constant.

Least Squares Estimate
We estimate the parameters using least squares, i.e. minimize
RSS = sum over i of (yi - b0 - b1 xi1 - ... - bp xip)^2.
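For multiple regression the same RSS criterion is minimized over all coefficients at once. A hedged NumPy sketch (the design matrix and responses are invented; the response is built as y = 0 + 1*x1 + 2*x2 so the answer is known):

```python
import numpy as np

# Toy design: n = 5 observations, p = 2 predictors (hypothetical numbers)
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = np.array([5.0, 4.0, 11.0, 10.0, 15.0])   # = 0 + 1*x1 + 2*x2 exactly

# Add an intercept column, then minimize RSS = ||y - X1 b||^2 by least squares
X1 = np.column_stack([np.ones(len(y)), X])
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
```

In R this is what `lm(y ~ x1 + x2)` does internally (via a QR decomposition rather than the normal equations).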

Relationship between population and least squares lines
Population line: Y = b0 + b1 X1 + ... + bp Xp + e.
Least squares line: Y_hat = b0_hat + b1_hat X1 + ... + bp_hat Xp.
We would like to know b0 through bp, i.e. the population line. Instead we know b0_hat through bp_hat, i.e. the least squares line.
Hence we use b0_hat through bp_hat as guesses for b0 through bp, and y_hat_i as a guess for Yi. The guesses will not be perfect, just as the sample mean is not a perfect guess for the population mean.

Inference in Regression
[Figure: scatterplot with the estimated (least squares) line and the true (population) line; the errors are unobserved.]
The regression line from the sample is not the regression line from the population.
What we want to do:
Assess how well the line describes the plot.
Guess the slope of the population line.
Guess what value Y would take for a given X value.

Some Relevant Questions
1. Is bj = 0 or not? We can use a hypothesis test to answer this question. If we can't be sure that bj ≠ 0, then there is no point in using Xj as one of our predictors.
2. Can we be sure that at least one of our X variables is a useful predictor, i.e. is it the case that b1 = b2 = ... = bp = 0?

Advertisement Data for multiple linear regression

## To get Table 3.4 ##
lm.fit = lm(sales ~ TV + radio + newspaper, data=Advertising)
summary(lm.fit)
names(lm.fit)
coef(lm.fit)
confint(lm.fit)

1. Is bj = 0, i.e. is Xj an important variable?
We use a hypothesis test to answer this question: H0: bj = 0 vs Ha: bj ≠ 0.
Calculate t = bj_hat / SE(bj_hat), the number of standard errors the estimate lies from zero.
If |t| is large (equivalently, the p-value is small) we can conclude that bj ≠ 0 and that there is a relationship.

Testing Individual Variables
Is there a (statistically detectable) linear relationship between Newspaper and Sales after all the other variables have been accounted for?
No: the p-value is large in the multiple regression, even though it was small in the simple regression.
Almost all the explaining that Newspaper could do in simple regression has already been done by TV and Radio in multiple regression.

2. Is the whole regression explaining anything at all?
Test H0: all slopes are 0 (b1 = b2 = ... = bp = 0) vs Ha: at least one slope ≠ 0.
The answer comes from the F test in the ANOVA (ANalysis Of VAriance) table.
The ANOVA table has many pieces of information; what we care about is the F ratio and the corresponding p-value.

Multiple Linear Regression: LSE background
Q: How do we find the p-value? Q: ANOVA?
F = (MS treatment) / (MS error)
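The F ratio can be computed directly from RSS and TSS. A Python sketch with illustrative numbers, chosen to be roughly on the scale of the Advertising fit in ISLR (they are not the exact table values):

```python
# F = ((TSS - RSS)/p) / (RSS/(n - p - 1)); a large F rejects H0: all slopes are 0
n, p = 200, 3              # e.g. Advertising data: 200 rows, 3 predictors
tss, rss = 5417.1, 556.8   # illustrative values, not copied from the textbook

ms_model = (tss - rss) / p       # "MS treatment" in the slide's notation
ms_error = rss / (n - p - 1)     # "MS error"
f_ratio = ms_model / ms_error
```

The p-value is then the upper tail of the F(p, n - p - 1) distribution at `f_ratio`; in R, `summary(lm.fit)` reports both.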

Adjusted R-Square
R-square will always increase when more variables are added to the model, even if those variables are only weakly associated with the response.
This is because adding another variable to the least squares equations must allow us to fit the training data (though not necessarily the testing data) at least as accurately.
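Adjusted R2 corrects for this by penalizing extra predictors. A sketch with made-up RSS/TSS numbers showing the penalty in action:

```python
# Adjusted R^2 = 1 - (RSS/(n - p - 1)) / (TSS/(n - 1))
# Unlike plain R^2, it can decrease when a nearly useless predictor is added.
def adjusted_r2(rss, tss, n, p):
    return 1 - (rss / (n - p - 1)) / (tss / (n - 1))

# Toy comparison: the extra predictor lowers RSS only slightly (invented numbers)
n, tss = 50, 100.0
r2_small = adjusted_r2(rss=40.0, tss=tss, n=n, p=2)
r2_big   = adjusted_r2(rss=39.9, tss=tss, n=n, p=3)  # barely better fit, one more predictor
```

Here the barely-improved fit does not pay for its extra degree of freedom, so the adjusted R2 goes down.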

Deciding on Important Variables: variable selection

lm.fit1 = lm(sales ~ newspaper, data=Advertising)
summary(lm.fit1)
lm.fit2 = lm(sales ~ newspaper + TV, data=Advertising)
summary(lm.fit2)
lm.fit3 = lm(sales ~ newspaper + TV + radio, data=Advertising)
summary(lm.fit3)
lm.fit4 = lm(sales ~ TV + radio, data=Advertising)
summary(lm.fit4)

Model fits
Compare adjusted R-square and RSE; plot the data to detect any synergy or interaction effect.

par(mfrow=c(2,2))
plot(lm.fit)
plot(predict(lm.fit), residuals(lm.fit))
plot(predict(lm.fit), rstudent(lm.fit))
plot(hatvalues(lm.fit))
which.max(hatvalues(lm.fit))

Outline (recap): The Linear Regression Model; Least Squares Fit; Measures of Fit; Inference in Regression; Other Considerations in Regression Model (Qualitative Predictors, Interaction Terms); Potential Fit Problems; Linear vs. KNN Regression

Credit Data:

Credit = read.csv("http://www-bcf.usc.edu/~gareth/ISL/Credit.csv", header=TRUE)
head(Credit)
newdata = Credit[,-1]
fix(newdata)
names(newdata)
pairs(newdata[,c(1, 2, 4, 5, 6, 7)])

Q: Which variables are numerical/quantitative? Q: Which are categorical/qualitative?

Qualitative Predictors
How do you put "gender", with category listings "men" and "women", into a regression equation?
Code it as an indicator variable (dummy variable): for example, we can code Males = 0 and Females = 1.

One qualitative predictor with two levels
Q: Investigate differences in credit card balance between males and females, ignoring the other variables for the moment.
Two genders (male and female). Let xi = 1 if the i-th person is female and 0 if male; then the regression equation is
balance = b0 + b1 xi + ei,
so the average balance is b0 for males and b0 + b1 for females.
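With a single 0/1 dummy, least squares reduces exactly to group means: the intercept is the baseline group's mean and the dummy coefficient is the difference of means. A sketch with invented balances:

```python
# With one 0/1 dummy, least squares gives:
# b0 = mean(balance | male), b1 = mean(balance | female) - mean(balance | male)
balances = [500.0, 520.0, 480.0, 540.0, 560.0, 550.0]
female   = [0,     0,     0,     1,     1,     1    ]   # dummy: 1 = female, 0 = male

male_mean   = sum(b for b, f in zip(balances, female) if f == 0) / female.count(0)
female_mean = sum(b for b, f in zip(balances, female) if f == 1) / female.count(1)
b0 = male_mean                 # intercept: average balance for the baseline (males)
b1 = female_mean - male_mean   # dummy coefficient: average female-male gap
```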

Credit Data:

Credit = read.csv("http://www-bcf.usc.edu/~gareth/ISL/Credit.csv", header=TRUE)
lm.fit = lm(Balance ~ Gender, data=Credit)
summary(lm.fit)
contrasts(Credit$Gender)

One quantitative and one qualitative predictor (two levels)
Y: Balance. We want to include income and gender.
Two genders (male and female). Let xi2 = 1 if the i-th person is female and 0 if male; then the regression equation is
balance = b0 + b1 income + b2 xi2 + ei.
b2 is the average extra balance each month that females have for a given income level. Males are the "baseline".

Other Coding Schemes
There are different ways to code categorical variables.
Two genders (male and female). Let xi = +1 if the i-th person is female and -1 if male; then the regression equation is
balance = b0 + b1 income + b2 xi + ei.
b2 is the average amount that females are above the average, for any given income level; b2 is also the average amount that males are below the average, for any given income level.

One qualitative predictor with more than two levels
Q: Investigate differences in credit card balance across Ethnicity, ignoring the other variables for the moment.
Three levels of Ethnicity (coded with two dummy variables):
contrasts(Credit$Ethnicity)

Credit Data:

Credit = read.csv("http://www-bcf.usc.edu/~gareth/ISL/Credit.csv", header=TRUE)
lm.fit = lm(Balance ~ Ethnicity, data=Credit)
summary(lm.fit)
contrasts(Credit$Ethnicity)

Other Issues Discussed
Interaction terms
Non-linear effects
Collinearity and Multicollinearity
Model Selection

Interaction
When the effect on Y of increasing X1 depends on another variable X2 (a synergy effect).
Example: maybe the effect on Salary (Y) of increasing Position (X1) depends on gender (X2)? For example, maybe male salaries go up faster (or slower) than female salaries as they get promoted.
Advertising example: TV and radio advertising both increase sales. Perhaps spending money on both of them increases sales more than spending the same amount on one alone?

Interaction in advertising
Model with an interaction term: sales = b0 + b1 TV + b2 Radio + b3 (TV × Radio) + e.
Spending $1 extra on TV increases average sales by 0.0191 + 0.0011 × Radio.
Spending $1 extra on Radio increases average sales by 0.0289 + 0.0011 × TV.
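The "0.0191 + 0.0011 × Radio" on the slide is just the partial derivative of sales with respect to TV under the interaction model. A sketch using the slide's quoted coefficients:

```python
# Marginal effect of each medium depends on the other through the interaction term
b1, b2, b3 = 0.0191, 0.0289, 0.0011   # coefficients quoted on the slide

def tv_effect(radio):
    # d(sales)/d(TV) = b1 + b3 * Radio
    return b1 + b3 * radio

def radio_effect(tv):
    # d(sales)/d(Radio) = b2 + b3 * TV
    return b2 + b3 * tv

effect_no_radio = tv_effect(0.0)    # TV's effect with no radio spending
effect_radio_50 = tv_effect(50.0)   # TV's effect when Radio = 50: larger, due to synergy
```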

Advertising Data:

Advertising = read.csv("http://www-bcf.usc.edu/~gareth/ISL/Advertising.csv", header=TRUE)
lm.fit = lm(sales ~ TV*radio, data=Advertising)
summary(lm.fit)

Parallel Regression Lines
Regression equations (one line for women, one for men):
females: salary = 112.77 + 1.86 + 6.05 × position
males: salary = 112.77 - 1.86 + 6.05 × position
Different intercepts, same slopes.
Parallel lines have the same slope. Dummy variables give the lines different intercepts, but their slopes are still the same.

Interaction Effects
Our model has forced the line for men and the line for women to be parallel.
Parallel lines say that promotions have the same salary benefit for men as for women.
If the lines aren't parallel, then promotions affect men's and women's salaries differently.

Should the Lines be Parallel?
[Figure: salary (roughly 110 to 170) vs. position (1 to 10), with separate fitted lines by gender.]
Interaction between gender and position: the interaction is not significant.

Collinearity and Multicollinearity
To detect collinearity, use the correlation matrix of the predictors. An element of the matrix with a large absolute value indicates a pair of highly correlated variables, and therefore a collinearity problem in the data.
But not all collinearity problems can be detected by inspecting the correlation matrix: multicollinearity is collinearity among three or more variables, which can exist even if no pair of variables has a particularly high correlation.
For multicollinearity, compute the variance inflation factor (VIF).

Collinearity and Multicollinearity
For multicollinearity, compute the variance inflation factor (VIF): for predictor Xj,
VIF = 1 / (1 - R^2 of Xj regressed on all the other predictors).
VIF ≥ 1. As a rule of thumb, a VIF value that exceeds 5 or 10 indicates a problematic amount of collinearity.
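A NumPy sketch of the VIF computation, with simulated predictors where one pair is collinear by construction (all data are invented):

```python
import numpy as np

def vif(X, j):
    """VIF of column j: regress X[:, j] on the remaining columns (plus an
    intercept) and return 1 / (1 - R^2). VIF >= 1; values above 5-10 flag
    a problematic amount of collinearity."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    Z = np.column_stack([np.ones(len(y)), others])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.05 * rng.normal(size=200)   # nearly a copy of x1: collinear by design
x3 = rng.normal(size=200)               # independent of the others
X = np.column_stack([x1, x2, x3])
```

Here `vif(X, 0)` and `vif(X, 1)` come out very large while `vif(X, 2)` stays near 1, matching the rule of thumb above.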

Collinearity and Multicollinearity
To deal with collinearity:
1) Drop one of the problematic variables from the regression.
2) Combine the collinear variables into a single predictor. For instance, we might take the average of standardized versions of two such variables to create a new variable.

Outline (recap): The Linear Regression Model; Least Squares Fit; Measures of Fit; Inference in Regression; Other Considerations in Regression Model (Qualitative Predictors, Interaction Terms); Potential Fit Problems; Linear vs. KNN Regression

Potential Fit Problems
There are a number of possible problems that one may encounter when fitting the linear regression model:
Non-linearity of the data [check the residual plot]
Dependence of the error terms
Non-constant variance of the error terms
Outliers
High leverage points
Collinearity
See Section 3.3.3 of the textbook for more details.

Outline (recap): The Linear Regression Model; Least Squares Fit; Measures of Fit; Inference in Regression; Other Considerations in Regression Model (Qualitative Predictors, Interaction Terms); Potential Fit Problems; Linear vs. KNN Regression

K-Nearest Neighbors (KNN) classifier (Sec 2.2)
Given a positive integer K and a test observation x0, the KNN classifier first identifies the K points in the training data that are closest to x0, represented by N0.
It then estimates the conditional probability for class j as the fraction of points in N0 whose response values equal j:
Pr(Y = j | X = x0) = (1/K) × (number of points in N0 with yi = j).
Finally, KNN applies the Bayes rule and classifies the test observation x0 to the class with the largest estimated probability.

K-Nearest Neighbors (KNN) classifier (Sec 2.2)
A small training data set: 6 blue and 6 orange observations. Goal: make a prediction for the black cross.
Consider K = 3. KNN identifies the 3 observations closest to the cross; this neighborhood is shown as a circle.
It consists of 2 blue points and 1 orange point, giving estimated probabilities of 2/3 for the blue class and 1/3 for the orange class.
KNN therefore predicts that the black cross belongs to the blue class.
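The K = 3 example can be mimicked in a short Python sketch; the coordinates below are invented so that the 3 nearest neighbors of the test point are 2 blue and 1 orange, as in the slide's figure:

```python
import math
from collections import Counter

def knn_classify(train, labels, x0, k):
    """Majority vote among the k training points closest to x0."""
    order = sorted(range(len(train)), key=lambda i: math.dist(train[i], x0))
    votes = Counter(labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-D points: the 3 nearest to (0, 0) are 2 blue, 1 orange
train  = [(0.1, 0.2), (-0.2, 0.1), (0.2, -0.1), (2.0, 2.0), (2.1, 1.9), (-2.0, 2.0)]
labels = ["blue",     "blue",      "orange",    "orange",   "orange",   "blue"]
pred = knn_classify(train, labels, x0=(0.0, 0.0), k=3)
```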

KNN Regression
KNN regression is similar to the KNN classifier. To predict Y for a given value of X, consider the k closest points to X in the training data and take the average of their responses, i.e.
f_hat(x0) = (1/k) × sum of yi over the k nearest neighbors of x0.
If k is small, KNN is much more flexible than linear regression. Is that better?
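The averaging step can be sketched directly (toy 1-D data, invented for illustration):

```python
def knn_regress(x_train, y_train, x0, k):
    """Predict y at x0 as the mean response of the k closest training x's."""
    order = sorted(range(len(x_train)), key=lambda i: abs(x_train[i] - x0))
    nearest = order[:k]
    return sum(y_train[i] for i in nearest) / k

x_train = [1.0, 2.0, 3.0, 10.0, 11.0]
y_train = [2.0, 4.0, 6.0, 20.0, 22.0]

# Neighbors of x0 = 2.1 for k = 3 are x = 2, 3, 1, so the prediction
# is the mean of their responses (4, 6, 2)
pred = knn_regress(x_train, y_train, x0=2.1, k=3)
```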

KNN Fits for k = 1 and k = 9

KNN Fits in One Dimension (k = 1 and k = 9)

Linear Regression Fit

KNN vs. Linear Regression

KNN: Not So Good in High-Dimensional Situations