BASIC REGRESSION CONCEPTS
Basic Idea Behind Regression
To determine how a particular variable (called the dependent variable -- y) is influenced by one or more other variables (called independent variables -- x1, x2, etc.)
Diagram: x1, x2, x3 -> y
5-step Regression Approach
1. Hypothesize a form of the model: hypothesize whether a linear relation, quadratic relation, etc. exists between y and the x’s
2. Determine the best estimates for the values of the parameters
3. Make and check any necessary assumptions
4. Evaluate the model: determine how good the model is
5. If the model is good -- use it for prediction and estimation
Confidence Interval Review
Suppose sales at Dollar Only Stores for 10 randomly selected weeks were:

Week    Sales
 1      101,000
 2       92,000
 3      110,000
 4      120,000
 5       90,000
 6       82,000
 7       93,000
 8       75,000
 9       91,000
10      105,000
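Confidence Interval for Average Weekly Sales
Assuming that weekly sales are normally distributed, a 95% confidence interval for average weekly sales is:
x̄ ± t(0.025, n-1) · s/√n
where x̄ is the sample mean, s is the sample standard deviation, and n = 10 (so there are 9 degrees of freedom).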
Using Excel
Lower limit: =E3-F16
Upper limit: =E3+F16
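The same interval can be sketched outside Excel. The snippet below is a minimal illustration, assuming the usual small-sample t interval (sample mean plus or minus the t critical value times s/√n); the critical value 2.262 for 9 degrees of freedom is taken from a standard t table, and the variable names are chosen here for illustration only.

```python
import math
import statistics

# Weekly sales for the 10 sampled weeks (from the Dollar Only Stores table).
sales = [101_000, 92_000, 110_000, 120_000, 90_000,
         82_000, 93_000, 75_000, 91_000, 105_000]

n = len(sales)
mean = statistics.mean(sales)   # sample mean
s = statistics.stdev(sales)     # sample standard deviation (n - 1 divisor)

# Assumption: t(0.025, 9) read from a standard t table.
T_CRIT = 2.262

margin = T_CRIT * s / math.sqrt(n)   # half-width of the interval
lower, upper = mean - margin, mean + margin

print(f"mean = {mean:,.0f}, s = {s:,.0f}")
print(f"95% CI: ({lower:,.0f}, {upper:,.0f})")
```

The interval is symmetric about the sample mean, so the lower and upper limits differ from it by the same margin.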
Regression Concepts
But couldn’t the sales (y) have been affected by one or more factors?
- Advertising dollars (x1)
- Average number of salesmen (x2)
- Hours of operation (x3)
- The weather (good or not good) (x4)
In this case we may want to “regress” y on one or more of these variables
The Basic Linear Regression Relation
We might hypothesize that sales are linearly dependent on all four of the previous variables, i.e.:

y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + ε

Here β0 + β1x1 + β2x2 + β3x3 + β4x4 is the regression part and ε is the error part.
- The β’s are (unknown) constants
- We shall estimate them by b0, b1, b2, b3 and b4
- ε is a random variable for the variability (error) when x1, x2, x3, and x4 take on a specific set of values
- ε has a distribution, a mean, and a standard deviation
Simple Linear Regression
Simple linear regression is when we regress y (Sales) on only one variable x (Ad $):

y = β0 + β1x + ε

Here,
β0 = the true value of the y-intercept
β1 = the true slope of the line
ε = a random variable for the “error”
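To make the role of the error term concrete, the following sketch simulates a few values of y at a fixed x. Everything numeric here is hypothetical: the "true" intercept, slope, and error standard deviation are invented for illustration, and the error is assumed to be normal with mean 0, which the slides have not yet stated.

```python
import random

# Hypothetical "true" parameters, chosen only for illustration.
BETA0 = 50_000.0   # true y-intercept
BETA1 = 50.0       # true slope: change in sales per extra ad dollar
SIGMA = 8_000.0    # standard deviation of the error (assumed normal, mean 0)

def simulate_sales(ad_dollars, rng):
    """Return one draw of y = BETA0 + BETA1 * x + error."""
    error = rng.gauss(0.0, SIGMA)   # random error for this week
    return BETA0 + BETA1 * ad_dollars + error

rng = random.Random(0)   # fixed seed so the run is repeatable
x = 1000
line_value = BETA0 + BETA1 * x   # the regression part, without error
draws = [simulate_sales(x, rng) for _ in range(5)]

for y in draws:
    print(f"x = {x}: y = {y:,.0f} (error = {y - line_value:+,.0f})")
```

Every draw uses the same x, yet the simulated sales differ; that spread around the line is what the error term represents.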
INPUT DATA
The Dollar Only Stores advertising for the corresponding 10 sample weeks is:

Week (i)   Ad $ (xi)   Sales (yi)
 1         1200        101,000
 2          800         92,000
 3         1000        110,000
 4         1300        120,000
 5          700         90,000
 6          800         82,000
 7         1000         93,000
 8          600         75,000
 9          900         91,000
10         1100        105,000
Step 1 -- Hypothesizing the form of the model
If we are regressing on only one variable -- use a scatterplot to determine an appropriate model
- Does it look like the data is relatively linear? y = β0 + β1x + ε
- Does it look curved? Perhaps y = β0 + β1x + β2x^2 + ε, etc.
LET’S SEE!
Scatterplot
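A rough version of this plot can be reproduced from the input data. The sketch below draws a character-based scatterplot using only the standard library; the function name text_scatter and the grid size are choices made here for illustration, and a plotting library or Excel chart would normally be used instead.

```python
# Ad dollars (x) and sales (y) for the 10 sample weeks, from the input data.
ad = [1200, 800, 1000, 1300, 700, 800, 1000, 600, 900, 1100]
sales = [101_000, 92_000, 110_000, 120_000, 90_000,
         82_000, 93_000, 75_000, 91_000, 105_000]

def text_scatter(xs, ys, width=40, height=12):
    """Return a list of strings forming a crude character scatterplot."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each point onto the character grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"   # row 0 is the top of the plot
    return ["|" + "".join(line) for line in grid] + ["+" + "-" * width]

plot = text_scatter(ad, sales)
print("Sales (y) vs. Ad $ (x)")
print("\n".join(plot))
```

The points climb from the lower left to the upper right, which is the pattern to look for when judging whether a straight line is a reasonable first hypothesis.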
Step 1
It looks like a straight line fits through the points fairly well. Thus, we hypothesize: y = β0 + β1x + ε
We now must get the best estimates for β0 and β1 -- This is step 2!
Review
- Regression seeks to explain how a dependent variable (y) is affected by independent variables (x1, x2, x3, etc.)
- Regression is a multi-step procedure.
- The first step is to hypothesize a form of the model.
- If there is only one independent variable, plot y vs. x to assist in forming the hypothesis.