Economics 173 Business Statistics, Lecture 22, Fall 2001 © Professor J. Petry
Introduction
Regression analysis is one of the most commonly used techniques in statistics. It is considered powerful for several reasons:
– It can cover a variety of mathematical models:
   linear relationships,
   non-linear relationships,
   qualitative variables.
– It provides efficient methods for model building, to select the best-fitting set of variables.
Polynomial Models
The independent variables may appear as functions of a number of predictor variables.
– Polynomial models of order p with one predictor variable:
   \( y = \beta_0 + \beta_1 x + \beta_2 x^2 + \dots + \beta_p x^p + \varepsilon \)
– Polynomial models with two predictor variables. For example:
   \( y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon \)
   \( y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon \)
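Polynomial models with one predictor variable
– First order model (p = 1): \( y = \beta_0 + \beta_1 x + \varepsilon \)
   [Figure: straight line of y against x, with intercept \( \beta_0 \) and slope \( \beta_1 \)]
– Second order model (p = 2): \( y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon \)
   [Figure: parabolas of y against x for \( \beta_2 < 0 \) and \( \beta_2 > 0 \)]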
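– Third order model (p = 3): \( y = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3 + \varepsilon \)
   [Figure: cubic curves of y against x for \( \beta_3 < 0 \) and \( \beta_3 > 0 \)]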
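For the second order model with one predictor, the sign of the squared-term coefficient decides whether the curve has a peak (negative) or a trough (positive). The sketch below evaluates the mean response and the turning point \( x = -\beta_1 / (2\beta_2) \). The function names and the coefficient values are assumptions made up for illustration; they do not come from the lecture.

```python
def second_order_mean(x, b0, b1, b2):
    """Mean response of the second order model b0 + b1*x + b2*x**2 (error term omitted)."""
    return b0 + b1 * x + b2 * x ** 2


def turning_point(b1, b2):
    """x-value where the parabola changes direction; requires b2 != 0."""
    if b2 == 0:
        raise ValueError("b2 must be non-zero for a second order model")
    return -b1 / (2 * b2)


# Hypothetical coefficients, chosen only for illustration (b2 < 0 gives a peak).
b0, b1, b2 = 1.0, 4.0, -0.5
peak_x = turning_point(b1, b2)
peak_y = second_order_mean(peak_x, b0, b1, b2)
print(peak_x, peak_y)
```

With a negative squared-term coefficient, the mean response on either side of the turning point is lower than at the turning point itself.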
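Polynomial models with two predictor variables
– First order model: \( y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon \)
   [Figure: planes of y over \( x_1 \) and \( x_2 \), with panels for \( \beta_1 < 0 \), \( \beta_1 > 0 \), \( \beta_2 > 0 \) and \( \beta_2 < 0 \)]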
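Polynomial models with two predictor variables
– First order model: \( y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon \)
   The effect of one predictor variable on y is independent of the effect of the other predictor variable on y. Holding \( x_2 \) fixed gives parallel lines in \( x_1 \):
   \( x_2 = 1 \): \( [\beta_0 + \beta_2(1)] + \beta_1 x_1 \)
   \( x_2 = 2 \): \( [\beta_0 + \beta_2(2)] + \beta_1 x_1 \)
   \( x_2 = 3 \): \( [\beta_0 + \beta_2(3)] + \beta_1 x_1 \)
– First order model with interaction: \( y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \varepsilon \)
   The two variables interact to affect the value of y. The slope in \( x_1 \) now changes with \( x_2 \):
   \( x_2 = 1 \): \( [\beta_0 + \beta_2(1)] + (\beta_1 + \beta_3(1)) x_1 \)
   \( x_2 = 2 \): \( [\beta_0 + \beta_2(2)] + (\beta_1 + \beta_3(2)) x_1 \)
   \( x_2 = 3 \): \( [\beta_0 + \beta_2(3)] + (\beta_1 + \beta_3(3)) x_1 \)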
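The difference between the two first order models can be checked numerically: with no interaction term, the line of y against x1 has the same slope at every value of x2; with interaction, the slope shifts by the interaction coefficient for each unit of x2. This is a minimal sketch; the helper name `line_for_x2` and all coefficient values are hypothetical.

```python
def line_for_x2(x2, b0, b1, b2, b3=0.0):
    """Return (intercept, slope) of y against x1 when x2 is held fixed.

    b3 is the interaction coefficient; b3 = 0 gives the plain first order model.
    """
    intercept = b0 + b2 * x2
    slope = b1 + b3 * x2
    return intercept, slope


# Hypothetical coefficients, for illustration only.
no_interaction = [line_for_x2(x2, b0=2.0, b1=1.5, b2=3.0) for x2 in (1, 2, 3)]
with_interaction = [line_for_x2(x2, b0=2.0, b1=1.5, b2=3.0, b3=0.5) for x2 in (1, 2, 3)]

print(no_interaction)    # same slope in every pair: parallel lines
print(with_interaction)  # slope grows with x2: lines are not parallel
```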
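– Second order model: \( y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \varepsilon \)
   \( x_2 = 1 \): \( y = [\beta_0 + \beta_2(1) + \beta_4(1^2)] + \beta_1 x_1 + \beta_3 x_1^2 + \varepsilon \)
   \( x_2 = 2 \): \( y = [\beta_0 + \beta_2(2) + \beta_4(2^2)] + \beta_1 x_1 + \beta_3 x_1^2 + \varepsilon \)
   \( x_2 = 3 \): \( y = [\beta_0 + \beta_2(3) + \beta_4(3^2)] + \beta_1 x_1 + \beta_3 x_1^2 + \varepsilon \)
– Second order model with interaction: \( y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1^2 + \beta_4 x_2^2 + \beta_5 x_1 x_2 + \varepsilon \)
   [Figure: curves of y against \( x_1 \) for \( x_2 = 1, 2, 3 \), one panel per model]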
Example 19.1
Location for a new restaurant
– A fast food restaurant chain tries to identify new locations that are likely to be profitable.
– The primary market for such restaurants is middle-income adults and their children (between the ages of 5 and 12).
– Which regression model should be proposed to predict the profitability of new locations?
Solution
– The dependent variable will be Gross Revenue.
– There are quadratic relationships between Revenue and each predictor variable. Why?
   Members of middle-class families are more likely to visit a fast food restaurant than members of poor or wealthy families.
   [Figure: Revenue against Income (Low, Middle, High)]
   Families with very young or older kids will not visit the restaurant as frequently as families with kids in the middle of the age range.
   [Figure: Revenue against Age (Low, Middle, High)]
– Proposed model:
   \( \text{Revenue} = \beta_0 + \beta_1\,\text{Income} + \beta_2\,\text{Age} + \beta_3\,\text{Income}^2 + \beta_4\,\text{Age}^2 + \beta_5\,(\text{Income})(\text{Age}) + \varepsilon \)
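To estimate a model like this one, each observation has to be expanded into the six terms of the equation: a constant, Income, Age, their squares, and their product. A minimal sketch of that expansion follows. The function names, the units of the inputs and the coefficient values are all assumptions for illustration, not estimates from the example.

```python
def revenue_terms(income, age):
    """Terms of the second order model with interaction, in the order
    constant, Income, Age, Income^2, Age^2, Income*Age."""
    return [1.0, income, age, income ** 2, age ** 2, income * age]


def predicted_revenue(income, age, betas):
    """Mean revenue for one location: sum of coefficient * term (error term omitted)."""
    return sum(b * t for b, t in zip(betas, revenue_terms(income, age)))


# Hypothetical coefficients beta_0 .. beta_5, for illustration only.
# Negative squared-term coefficients make revenue peak at mid-range values.
betas = [10.0, 6.0, 4.0, -1.0, -0.5, 0.2]

terms = revenue_terms(3, 2)
revenue = predicted_revenue(3, 2, betas)
print(terms, revenue)
```

Because the squared and product columns are computed from Income and Age, the model is still linear in its coefficients and can be estimated with ordinary multiple regression.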
Qualitative Independent Variables
In many real-life situations one or more independent variables are qualitative. Qualitative variables are included in a regression model via indicator variables. An indicator variable (I) can assume one of two values, “zero” or “one”:
I = 1 if the first of two conditions is met; 0 if the second condition is met.
Examples:
I = 1 if data were collected before; 0 if data were collected after
I = 1 if the temperature was below 50°; 0 if the temperature was 50° or more
I = 1 if a degree earned is in Finance; 0 if a degree earned is not in Finance
Example continued
The dealer believes that color is a variable that affects a car’s price. Three color categories are considered:
– White
– Silver
– Other colors
Note: Color is a qualitative variable.
\( I_1 \) = 1 if the color is white; 0 if the color is not white
\( I_2 \) = 1 if the color is silver; 0 if the color is not silver
And what about “Other colors”? Set \( I_1 = 0 \) and \( I_2 = 0 \).
Solution
– The proposed model is \( y = \beta_0 + \beta_1(\text{Odometer}) + \beta_2 I_1 + \beta_3 I_2 + \varepsilon \)
– The data
   [Data table: example rows marked White car, Other color, Silver color]
To represent a qualitative variable that has m possible categories (levels), we must create m − 1 indicator variables.
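The m − 1 rule can be sketched in a few lines: one level is chosen as the baseline and gets all zeros, and every other level gets its own 0/1 indicator. The function name `indicator_variables` and the level labels are assumptions for illustration; the choice of "Other" as baseline follows the car color example.

```python
def indicator_variables(value, levels, baseline):
    """Encode one categorical value as m-1 zero/one indicators.

    `levels` lists all m categories; `baseline` is the category that is
    represented by all indicators being zero.
    """
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return [1 if value == level else 0 for level in levels if level != baseline]


COLOR_LEVELS = ["White", "Silver", "Other"]

for color in COLOR_LEVELS:
    # Returned list is [I1, I2]: I1 flags white, I2 flags silver.
    print(color, indicator_variables(color, COLOR_LEVELS, baseline="Other"))
```

Using m indicators instead of m − 1 would make the indicator columns sum to the constant term, so the coefficients could not be estimated uniquely.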
From Excel we get the regression equation
   \( \text{PRICE} = b_0 - 0.0278\,(\text{ODOMETER}) + 45.2\,I_1 + 148\,I_2 \), where \( b_0 \) is the estimated intercept.
– For one additional mile the auction price decreases by 2.78 cents.
– A white car sells, on average, for $45.2 more than a car of the “Other color” category.
– A silver car sells, on average, for $148 more than a car of the “Other color” category.
The equation for a car of the “Other color” category: \( \text{Price} = b_0 - 0.0278\,(\text{Odometer}) + 45.2(0) + 148(0) \)
The equation for a car of white color: \( \text{Price} = b_0 - 0.0278\,(\text{Odometer}) + 45.2(1) + 148(0) \)
The equation for a car of silver color: \( \text{Price} = b_0 - 0.0278\,(\text{Odometer}) + 45.2(0) + 148(1) \)
[Figure: Price against Odometer, one line per color category]
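The three color equations differ only in their constant term, which a short sketch makes concrete. The odometer slope (2.78 cents per mile, written as 0.0278 dollars) and the color coefficients (45.2 and 148) are taken from the text. The intercept is not available here, so the function takes it as an argument, and the value used below is a placeholder assumption, not the fitted one.

```python
def predicted_price(odometer, color, intercept):
    """Predicted auction price in dollars from the fitted equation.

    The slope and color coefficients come from the text; `intercept` is
    not given there, so the caller must supply it.
    """
    i1 = 1 if color == "White" else 0   # indicator for white
    i2 = 1 if color == "Silver" else 0  # indicator for silver
    return intercept - 0.0278 * odometer + 45.2 * i1 + 148 * i2


PLACEHOLDER_INTERCEPT = 6000.0  # assumption for illustration only, not the fitted value

other = predicted_price(40000, "Other", PLACEHOLDER_INTERCEPT)
white = predicted_price(40000, "White", PLACEHOLDER_INTERCEPT)
silver = predicted_price(40000, "Silver", PLACEHOLDER_INTERCEPT)

# At equal mileage the gaps equal the color coefficients, whatever the intercept is.
print(round(white - other, 2), round(silver - other, 2))
```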
There is insufficient evidence to infer that a white car and a car of the “Other color” category sell for different auction prices. There is sufficient evidence to infer that a silver car sells for a higher price than a car of the “Other color” category.
Example
Create and identify indicator variables to represent the following qualitative variables.
– Religious affiliation (Catholic, Protestant, other)
– Working shift (8:00am to 4:00pm, 4:00pm to 12:00 midnight, 12:00 midnight to 8:00am)
– Supervisor (Ringo Star, Rondal Gondarfshkitka, Seymour Heinne, and Billy Bob Thorton)
   1. Assume there are no other supervisors
   2. Assume there are other supervisors
Model Building
Identify the dependent variable, and clearly define it.
List potential predictors.
– Bear in mind the problem of multicollinearity.
– Consider the cost of gathering, processing and storing data.
– Be selective in your choice (try to use as few variables as possible).
Gather the required observations (have at least six observations for each independent variable).
Identify several possible models.
– A scatter diagram of the dependent variable against each predictor variable can be helpful in formulating the right model.
– If you are uncertain, start with first order and second order models, with and without interaction.
– Try other relationships (transformations) if the polynomial models fail to provide a good fit.
Use statistical software to estimate the model.
Determine whether the required conditions are satisfied. If not, attempt to correct the problem.
Select the best model.
– Use the statistical output.
– Use your judgment!!
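The estimate-then-compare steps can be sketched as follows: fit a first order and a second order model to the same data by least squares and compare a fit statistic. Adjusted R-squared is used here as one common summary from statistical output; the lecture does not name a specific criterion, so that choice is an assumption. The data are made up, and the solver is a plain-Python stand-in for statistical software.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def adjusted_r2(rows, y, coefs):
    """Adjusted R-squared; k counts predictors, excluding the constant."""
    n, k = len(y), len(coefs) - 1
    mean_y = sum(y) / n
    sse = sum((yi - sum(c * v for c, v in zip(coefs, r))) ** 2 for r, yi in zip(rows, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - (sse / (n - k - 1)) / (sst / (n - 1))


# Made-up data with a curved relationship (12 observations).
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
y = [2.1, 3.9, 5.2, 6.1, 6.8, 7.1, 7.0, 6.7, 6.0, 5.1, 3.8, 2.2]

first_order = [[1, xi] for xi in x]             # columns: constant, x
second_order = [[1, xi, xi ** 2] for xi in x]   # columns: constant, x, x^2

b1 = fit(first_order, y)
b2 = fit(second_order, y)
r2_first = adjusted_r2(first_order, y, b1)
r2_second = adjusted_r2(second_order, y, b2)
print(r2_first, r2_second)
```

On these data the second order model fits far better than the first order one. A statistic like this is only one input: the required conditions and judgment about the variables still apply.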