LECTURE 03: LINEAR REGRESSION PT. 1 February 1, 2016 SDS 293 Machine Learning

Announcements 1/3 Smith RStudio Server accounts should now be set up for everyone who requested one. When you have time, please try logging in:

Announcements 2/3 Some of you have asked how to get Jupyter to play with R. Instructions are posted to Piazza; ask there if you get stuck!

Announcements 3/3 Deadline for registration: Feb. 4th. SDS has some (limited) funding to support students who wish to attend. Interested? Contact Jordan!

Outline
- Motivation and Running Example: Advertising
- Simple Linear Regression
  - Estimating coefficients
  - How good is this estimate?
  - How good is the model?
- Multiple Linear Regression
  - Estimating coefficients
  - Important questions
- Dealing with Qualitative Predictors
- Extending the Linear Model
  - Removing the additive assumption
  - Non-linear relationships
- Potential Problems

Motivation Why start a ML course with linear regression?

Running example

Last year’s advertising budget [scatterplot: Sales (in 1,000s of units sold) vs. Budget (in $1,000s)]

Your task

Questions you might ask:
1. Is there a relationship between budget and sales?
2. How strong is the relationship?
3. Which media contribute to sales?
4. How accurately can we estimate the effect?
5. How accurately can we predict future sales?
6. Is the relationship linear?
7. Is there synergy among the advertising media?
Linear regression can help us answer each of these.

Simple Linear Regression A straightforward, linear approach for predicting a quantitative response on the basis of a single predictor. Assumption: there is a roughly linear relationship between X (the predictor) and Y (the response). The response is approximately modeled as a linear function of the predictor:

$Y \approx \beta_0 + \beta_1 X$

where $\beta_0$ is the intercept and $\beta_1$ is the slope.

Simple Linear Regression In practice, $\beta_0$ and $\beta_1$ are unknown. Before we can use the linear model to predict future response values, we need to estimate them from the data. Our goal: find estimated coefficients $\hat{\beta}_0$ and $\hat{\beta}_1$ so that $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$ is "pretty close" to the observed response $y_i$ for each observation.

What does "pretty close" mean? The most common answer: choose the coefficients that minimize the RSS (we'll see other ways in Ch. 6).

Estimating coefficients with RSS Back to our hypothetical model: $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$. Define the residual $e_i = y_i - \hat{y}_i$ (the difference between the observed and predicted responses). We then define the residual sum of squares (RSS) to be:

$\text{RSS} = e_1^2 + e_2^2 + \cdots + e_n^2 = \sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right)^2$

Minimizing RSS: least squares Goal: find values for $\hat{\beta}_0$ and $\hat{\beta}_1$ that minimize RSS. Dusting off our calculus, the minimizers are:

$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$

where $\bar{x}$ and $\bar{y}$ are the mean values of the sample.
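To make the formulas concrete, here's a minimal R sketch. It assumes Advertising.csv has been downloaded from the ISLR book website into the working directory and has columns named TV and Sales (both assumptions, not part of the slides):

# Minimal sketch: least-squares estimates computed by hand.
# Assumes Advertising.csv (ISLR book website) is in the working directory
# with columns named TV and Sales.
ads <- read.csv("Advertising.csv")
x <- ads$TV
y <- ads$Sales

b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # slope
b0 <- mean(y) - b1 * mean(x)                                     # intercept
c(intercept = b0, slope = b1)

# Cross-check against R's built-in least-squares fit
fit <- lm(Sales ~ TV, data = ads)
coef(fit)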

Advertising example For the TV-only model, $\hat{\beta}_0 \approx 7.03$ and $\hat{\beta}_1 \approx 0.0475$: we'd expect to sell about 7,030 units without advertising, and every additional $1k spent on TV advertising corresponds to approximately 47.5 additional units sold.

How good is this estimate? Recall: we assume the true relationship between the predictor and the response takes the form

$Y = \beta_0 + \beta_1 X + \epsilon$

where $\epsilon$ is a mean-zero random error term. Consider this:

How good is this estimate? Different sets of observations will produce different estimated coefficients.
- Averaged over a huge number of samples, the estimates would be perfect! Least-squares is an unbiased estimator: i.e. it doesn't systematically over- or under-estimate.
- In reality, we usually don't have a huge number of samples to average over... but we still want to be able to predict.
So how close are our estimated parameters $\hat{\beta}_0$ and $\hat{\beta}_1$ to the true values?

Standard error To find out, we can borrow the idea of standard error (SE):

$\text{SE}(\hat{\mu})^2 = \frac{\sigma^2}{n}$

where $\sigma$ is the standard deviation of the population and $n$ is the number of samples. Note: the error gets smaller as the sample size increases.

Standard error In adapting this to our coefficients, we'll use the standard deviation of the error term $\epsilon$ as our value for $\sigma$ (why?). Let's start with the slope:

$\text{SE}(\hat{\beta}_1)^2 = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$

And now the intercept:

$\text{SE}(\hat{\beta}_0)^2 = \sigma^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \right]$

What happens as x spreads out? What happens when the mean of x is 0?

Residual standard error In practice, we don't know the standard deviation of $\epsilon$. We can estimate it using the RSS to get the residual standard error:

$\text{RSE} = \sqrt{\text{RSS} / (n - 2)}$

Standard error can be used to compute confidence intervals: a range of values that we believe, with some level of confidence, contains the true value. In linear regression, the (approximate) 95% confidence intervals are:

$\hat{\beta}_0 \pm 2 \cdot \text{SE}(\hat{\beta}_0), \qquad \hat{\beta}_1 \pm 2 \cdot \text{SE}(\hat{\beta}_1)$
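In R, both quantities fall straight out of a fitted lm object. A quick sketch, reusing the hypothetical ads data frame and fit model from the earlier example:

# Residual standard error: sqrt(RSS / (n - 2))
rss <- sum(residuals(fit)^2)
n   <- nrow(ads)
sqrt(rss / (n - 2))         # matches summary(fit)$sigma

# 95% confidence intervals for the coefficients
confint(fit, level = 0.95)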

Using SE for hypothesis testing Recall: we want to determine whether there is a relationship between advertising budget and sales. If there is NO relationship, what is the true value of $\beta_1$? NO relationship = NO slope: $\beta_1 = 0$. We can then compute a t-statistic,

$t = \frac{\hat{\beta}_1 - 0}{\text{SE}(\hat{\beta}_1)}$

and from it the probability (the p-value) that we would have observed our $\hat{\beta}_1$ by chance, assuming a true value of 0. If this probability is small, we conclude a relationship exists.
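In R, summary() on a fitted model reports exactly this test. A sketch, again using the earlier hypothetical fit:

# Estimates, standard errors, t-statistics, and p-values in one table
tab <- coef(summary(fit))
tab

# The t-statistic for TV, computed by hand from the table
tab["TV", "Estimate"] / tab["TV", "Std. Error"]   # matches tab["TV", "t value"]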

Advertising Example

How good is this model? Now that we know there is a relationship, we probably want to know how well the model captures it. We've seen one measure: RSE is (roughly) the average amount the response will deviate from the true regression line. RSE is an absolute measure, given in the same units as the response variable. Question: how do you know what a "good" RSE is?

How good is this model? $R^2$ is an alternative approach: it captures the proportion of variance explained by the model. We calculate it as follows:

$R^2 = \frac{\text{TSS} - \text{RSS}}{\text{TSS}} = 1 - \frac{\text{RSS}}{\text{TSS}}$

where $\text{TSS} = \sum_{i=1}^{n} (y_i - \bar{y})^2$ is the total variance in the response and RSS is the variance not explained after regression.
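Computing $R^2$ by hand makes the definition concrete; a sketch continuing with the earlier hypothetical fit:

tss <- sum((y - mean(y))^2)    # total variance in the response
rss <- sum(residuals(fit)^2)   # variance left unexplained after regression
1 - rss / tss                  # matches summary(fit)$r.squared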

TV advertising and sales Let's look at these measures for our advertising data. What does the RSE tell us? What does $R^2$ tell us?

Discussion: multiple predictors So far, we've only talked about the effect of one variable. Question: how do we handle multiple predictors?

Option 1: SLR for each predictor What problems do you see with this approach?

Option 2: extend the linear model We can accommodate multiple predictors by giving each one its own slope coefficient:

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p + \epsilon$

Each slope captures the average effect on Y of an increase in one predictor, holding all others constant. For example:

$\text{sales} = \beta_0 + \beta_1 \cdot \text{TV} + \beta_2 \cdot \text{radio} + \beta_3 \cdot \text{newspaper} + \epsilon$

Estimating coefficients in MLR We can estimate the parameters in MLR using the same least squares approach we used in SLR. Choose $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_p$ to minimize RSS:

$\text{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \cdots - \hat{\beta}_p x_{ip} \right)^2$
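In R, the same lm() call extends to multiple predictors. A sketch, assuming the Advertising.csv columns are named TV, Radio, Newspaper, and Sales:

# Multiple linear regression: one estimated slope per predictor
mlr <- lm(Sales ~ TV + Radio + Newspaper, data = ads)
summary(mlr)   # estimates, SEs, t-statistics, and p-values for each predictor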

Advertising example When we fit a MLR model to the advertising data, we get the following least-squares coefficients. What does this tell us? Do you notice anything unexpected?

What happened to newspaper ads? Does it make sense for the MLR to indicate no effect of newspaper advertising on sales? Let's look at the correlations between all the dimensions. In SLR, newspaper spending was "getting credit" for radio spending's work!
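You can see the confounding directly from the correlation matrix (same hypothetical ads data frame as above):

# Pairwise correlations; a substantial Radio-Newspaper correlation is what
# lets newspaper spending "take credit" for radio's effect in SLR
cor(ads[, c("TV", "Radio", "Newspaper", "Sales")])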

Questions we ask in MLR
1. Is at least one of the predictors useful in predicting the response?
2. Do all the predictors help to explain the response, or is only a subset of the predictors useful?
3. How well does the model fit the data?
4. Given a set of predictor values, what response value should we predict, and how accurate is our prediction?

Q1: Is at least one predictor useful? In SLR, we tested to see if the slope was 0 (no effect). In MLR, we need to test whether ALL of the slopes are 0:

$H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0$ versus $H_a$: at least one $\beta_j$ is non-zero.

Is at least one predictor useful? To do this, we compute the F-statistic:

$F = \frac{(\text{TSS} - \text{RSS}) / p}{\text{RSS} / (n - p - 1)}$

where p is the # of predictors and n is the sample size. A value close to 1 → no effect. Question: why look at the F-statistic and not just at the p-values for each predictor in turn? (hint: lots of predictors?)
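A sketch of the F-statistic computed from its definition, cross-checked against what summary() reports for the hypothetical mlr fit above:

p   <- 3                                    # number of predictors
n   <- nrow(ads)
rss <- sum(residuals(mlr)^2)
tss <- sum((ads$Sales - mean(ads$Sales))^2)
((tss - rss) / p) / (rss / (n - p - 1))     # matches summary(mlr)$fstatistic[1]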

Q2: Do we need them all? Now we know that at least one predictor has an effect: which one(s) is it? Determining which predictors are associated with the response is referred to as variable selection. I'll hint at some classic approaches today, and go into more detail when we get to Ch. 6.

Method 1: Exhaustive Search
1. Construct all possible models, each containing a different subset of predictors.
2. Evaluate them against one another, and select the one that performs best.
Problem: how many possible models are there on a dataset with p predictors? (There are $2^p$, so this blows up quickly; see the sketch below.)
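One way to run this search in R is regsubsets() from the leaps package (an assumption: leaps is not among this week's lab packages and may need installing):

library(leaps)   # provides regsubsets(); install.packages('leaps') if missing
best <- regsubsets(Sales ~ TV + Radio + Newspaper, data = ads)
summary(best)    # which predictors appear in the best model of each size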

Method 2: Forward selection
1. Begin with the null model: an intercept with no predictors.
2. Fit p simple linear regressions, and add the variable that results in the lowest RSS.
3. Add to that model the variable that results in the lowest RSS for the new two-variable model.
4. Iterate.
Question: when do we stop?

Method 3: Backward selection
1. Start with a MLR model containing all predictors.
2. Remove the predictor with the largest p-value (the least statistically significant).
3. Fit the new (p−1)-variable model, and remove the predictor with the largest p-value.
4. Iterate.
Question: when do we stop?

Method 4: Mixed selection
1. Start with the null model.
2. As with forward selection, iteratively add the variable that provides the best fit.
3. If the p-value for one of the variables in the model rises above a certain threshold, remove that variable.
4. Iterate until all variables in the model have a sufficiently low p-value, and all variables outside the model would have a large p-value if added.
(A sketch of these searches in R follows.)
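Base R's step() function implements these greedy searches, though it adds and drops variables by AIC rather than by the p-value rules above. A sketch on the hypothetical advertising models:

null <- lm(Sales ~ 1, data = ads)   # intercept-only (null) model
full <- lm(Sales ~ TV + Radio + Newspaper, data = ads)

step(null, scope = formula(full), direction = "forward")   # forward selection
step(full, direction = "backward")                         # backward selection
step(null, scope = formula(full), direction = "both")      # mixed selection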

Comparing selection methods
- Backward selection cannot be used if p > n (why?). Does forward selection have the same issue?
- Forward selection is a greedy approach. Problems?
- Mixed selection can remedy this, but at what cost?
In Ch. 6, we'll dig deeper into variable selection.

Q3: How well does the model fit the data? Just like in SLR, we can use RSE and $R^2$ to measure how well our model fits the data. Using the MLR model we created on the advertising data using all predictors: Question: what happens to the $R^2$ value if we remove newspaper from the model?

Q4: How confident are we? Now that we have a model, making a prediction is a piece of cake (just plug and chug!). We need to consider 3 kinds of uncertainty:
1. How far off are the coefficients? → confidence intervals
2. How far from linear is the true relationship? → ignore this for now
3. How much will any specific prediction vary from the true value, even if we had perfect coefficients? → prediction intervals
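R's predict() distinguishes these two interval types. A sketch with the hypothetical mlr fit and a made-up new budget:

new_budget <- data.frame(TV = 100, Radio = 20, Newspaper = 10)  # hypothetical values

# Uncertainty in the *average* response at these budget levels
predict(mlr, newdata = new_budget, interval = "confidence")

# Uncertainty in a *single* future observation (always wider)
predict(mlr, newdata = new_budget, interval = "prediction")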

Dealing with Qualitative Predictors So far, we've assumed all our predictors were quantitative. What happens when we have a qualitative predictor, such as gender or race? No problem! We'll just trick the model.

Two-level predictors Consider a qualitative predictor which only has two possible values (e.g. enrolled and auditing). Imagine we want to predict the number of assignments turned in using only this predictor. We can incorporate this into a regression model using a dummy variable:

$x_i = \begin{cases} 1 & \text{if the } i\text{th person is enrolled} \\ 0 & \text{if the } i\text{th person is auditing} \end{cases}$

Two-level predictors ({P1: "enrolled"}, {P2: "enrolled"}, {P3: "auditing"}, …) → ({P1: 1}, {P2: 1}, {P3: 0}, …)

Two-level predictors This gives the model:

$y_i = \beta_0 + \beta_1 x_i + \epsilon_i = \begin{cases} \beta_0 + \beta_1 + \epsilon_i & \text{if the } i\text{th person is enrolled} \\ \beta_0 + \epsilon_i & \text{if the } i\text{th person is auditing} \end{cases}$

Here $\beta_0$ is the average number of assignments turned in by students that are auditing, $\beta_0 + \beta_1$ is the average number of assignments turned in by students enrolled in the course, and $\beta_1$ is the average difference in number of assignments turned in by each group.

A note on dummy variables The decision to code enrolled students as 1 and auditing students as 0 is arbitrary; it has no effect on the model fit or on the predicted values, but it does alter the interpretation of the coefficients.
- If we swapped them, what would happen?
- If we used (-1, 1), what would happen?

Multi-level predictors When a qualitative predictor has more than two levels, a single dummy variable won't cut it. We'll have to create a dummy variable for all but one level. For example:

$x_{i1} = \begin{cases} 1 & \text{if the } i\text{th person is from Amherst} \\ 0 & \text{if the } i\text{th person is not from Amherst} \end{cases}$

$x_{i2} = \begin{cases} 1 & \text{if the } i\text{th person is from Mt. Holyoke} \\ 0 & \text{if the } i\text{th person is not from Mt. Holyoke} \end{cases}$

Multi-level predictors Then both of these variables can be used to obtain the model:

$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \epsilon_i = \begin{cases} \beta_0 + \beta_1 + \epsilon_i & \text{if the } i\text{th person is from Amherst} \\ \beta_0 + \beta_2 + \epsilon_i & \text{if the } i\text{th person is from Mt. Holyoke} \\ \beta_0 + \epsilon_i & \text{if the } i\text{th person is from Smith} \end{cases}$

In this model:
- $\beta_0$ is the average number of assignments turned in by students from Smith (the "baseline")
- $\beta_1$ is the difference in average number of assignments turned in by students from Amherst vs. Smith
- $\beta_2$ is the difference in average number of assignments turned in by students from Mt. Holyoke vs. Smith
A sketch of how R handles this coding automatically appears below.
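R builds these dummy variables automatically when a predictor is stored as a factor. A self-contained sketch with made-up data (all names and numbers hypothetical):

# Hypothetical data: assignments turned in, by college
college <- factor(c("Smith", "Amherst", "MtHolyoke", "Smith", "Amherst", "MtHolyoke"),
                  levels = c("Smith", "Amherst", "MtHolyoke"))   # Smith = baseline
assignments <- c(9, 10, 8, 10, 6, 7)

contrasts(college)   # the 0/1 dummy columns: one per non-baseline level
fit_dummy <- lm(assignments ~ college)
coef(fit_dummy)      # intercept = Smith average; other coefs = differences from Smith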

Lab: Linear Regression To do this week's lab in R, you'll need the car, ISLR, and MASS packages.
- All are already installed on the Smith RStudio server
- If you need to install them on your local version of R, run:
> install.packages('MASS')
> install.packages('ISLR')
> install.packages('car')
Instructions and code can be found at:
Full version can be found beginning on p. 109 of ISLR.

Assignment 1 PDF posted on course website. Problems from ISLR 3.7 (p ):
- Conceptual: 3.1, 3.4
- Applied: 3.8, 3.10
Due Wednesday Feb. 10 by 11:59pm.