
1 Lecture 26 Model Building (Chapters 20.2-20.3) HW6 due Wednesday, April 23rd by 5 p.m. Problem 3(d): Use JMP to calculate the prediction interval rather than by hand.

2 Curvature: Midterm Problem 10

3 Remedy I: Transformations Use Tukey’s Bulging Rule to choose a transformation.
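The bulging rule only points to a family of transformations (log, square root, square, reciprocal); in practice you try one and compare fits. A minimal Python sketch on simulated data (the course uses JMP, so this is only an illustration):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.uniform(1, 10, 200)
y = 3 * np.log(x) + rng.normal(0, 0.3, 200)  # curved: y flattens as x grows

def r2(y, X):
    """R-squared of an OLS fit of y on X (intercept added)."""
    return sm.OLS(y, sm.add_constant(X)).fit().rsquared

print("linear fit R^2:", round(r2(y, x), 3))
print("log-x fit  R^2:", round(r2(y, np.log(x)), 3))  # bulging rule suggests log x
```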

4 Remedy II: Polynomial Models
First order (multiple regression) model: y = β₀ + β₁x₁ + β₂x₂ + … + βₚxₚ + ε
Polynomial model in one predictor: y = β₀ + β₁x + β₂x² + … + βₚxᵖ + ε

5 Quadratic Regression

6 Polynomial Models with One Predictor Variable
First order model (p = 1): y = β₀ + β₁x + ε
Second order model (p = 2): y = β₀ + β₁x + β₂x² + ε. The parabola opens downward when β₂ < 0 and upward when β₂ > 0.


8 Polynomial Models with One Predictor Variable
Third order model (p = 3): y = β₀ + β₁x + β₂x² + β₃x³ + ε. The curve is S-shaped, and its direction flips with the sign of β₃ (β₃ < 0 vs. β₃ > 0).
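A sketch of how these model orders compare on data, using numpy on simulated values (not the course's JMP workflow):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-2, 2, 100)
y = 1 + 0.5 * x - 1.2 * x**2 + rng.normal(0, 0.4, 100)  # true relation is quadratic

for p in (1, 2, 3):
    coefs = np.polyfit(x, y, deg=p)      # highest-degree coefficient first
    resid = y - np.polyval(coefs, x)
    print(f"order {p}: residual sum of squares = {resid @ resid:.2f}")
```

The residual sum of squares drops sharply from order 1 to order 2 and barely changes at order 3, which is the usual signal to stop at the quadratic.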

9 Interaction Two independent variables x₁ and x₂ interact if the effect of x₁ on y is influenced by the value of x₂. Interaction can be brought into the multiple linear regression model by including the independent variable x₁·x₂. Example: the effect of education (x₁) on income may depend on IQ (x₂), as on the next slide.

10 Interaction Cont. "Slope" for x₁ = E(y | x₁+1, x₂) − E(y | x₁, x₂) = β₁ + β₃x₂ in the interaction model on the next slide. Is the expected income increase from an extra year of education higher for people with IQ 100 or with IQ 130 (or is it the same)?

11 Polynomial Models with Two Predictor Variables
First order model: y = β₀ + β₁x₁ + β₂x₂ + ε. The effect of one predictor variable on y is independent of the effect of the other predictor variable on y: for x₂ = 1, 2, 3 the lines are
y = [β₀ + β₂(1)] + β₁x₁
y = [β₀ + β₂(2)] + β₁x₁
y = [β₀ + β₂(3)] + β₁x₁
(parallel lines with common slope β₁).
First order model with interaction: y = β₀ + β₁x₁ + β₂x₂ + β₃x₁x₂ + ε. The two variables interact to affect the value of y: for x₂ = 1, 2, 3 the lines are
y = [β₀ + β₂(1)] + [β₁ + β₃(1)]x₁
y = [β₀ + β₂(2)] + [β₁ + β₃(2)]x₁
y = [β₀ + β₂(3)] + [β₁ + β₃(3)]x₁
(the slope changes with x₂).
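A sketch of recovering the interaction slope β₁ + β₃x₂ from a fitted model, in Python on simulated data (the variable names and data are illustrative, not from the lecture):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
df = pd.DataFrame({"x1": rng.uniform(0, 10, 300),
                   "x2": rng.choice([1, 2, 3], 300)})
df["y"] = 2 + 1.0*df.x1 + 0.5*df.x2 + 0.8*df.x1*df.x2 + rng.normal(0, 1, 300)

fit = smf.ols("y ~ x1 + x2 + x1:x2", data=df).fit()
b = fit.params
for x2 in (1, 2, 3):  # the slope in x1 changes with x2: b1 + b3*x2
    print(f"x2 = {x2}: slope for x1 = {b['x1'] + b['x1:x2'] * x2:.2f}")
```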

12 Polynomial Models with Two Predictor Variables
Second order model: y = β₀ + β₁x₁ + β₂x₂ + β₃x₁² + β₄x₂² + ε. For x₂ = 1, 2, 3 the curves are
y = [β₀ + β₂(1) + β₄(1²)] + β₁x₁ + β₃x₁² + ε
y = [β₀ + β₂(2) + β₄(2²)] + β₁x₁ + β₃x₁² + ε
y = [β₀ + β₂(3) + β₄(3²)] + β₁x₁ + β₃x₁² + ε
Second order model with interaction: y = β₀ + β₁x₁ + β₂x₂ + β₃x₁² + β₄x₂² + β₅x₁x₂ + ε.

13 Selecting a Model Several models have been introduced. How do we select the right model? Selecting a model: –Use your knowledge of the problem (variables involved and the nature of the relationship between them) to select a model. –Test the model using statistical techniques.

14 Selecting a Model; Example Example 20.1 The location of a new restaurant –A fast food restaurant chain tries to identify new locations that are likely to be profitable. –The primary market for such restaurants is middle-income adults and their children (between the ages of 5 and 12). –Which regression model should be proposed to predict the profitability of new locations?

15 Selecting a Model; Example Solution –The dependent variable will be Gross Revenue. –Quadratic relationships between Revenue and each predictor variable should be observed. Why? Members of middle-class families are more likely to visit a fast food restaurant than members of poor or wealthy families, so revenue peaks at middle incomes. Families with very young or older kids will not visit the restaurant as frequently as families with mid-range ages of kids, so revenue peaks at middle ages.

16 Selecting a Model; Example Solution –The quadratic regression model proposed is
SALES = β₀ + β₁INCOME + β₂AGE + β₃INCOME² + β₄AGE² + β₅(INCOME)(AGE) + ε
where SALES = annual gross sales, INCOME = median annual household income in the neighborhood, and AGE = mean age of children in the neighborhood. Include the interaction term when in doubt, and test its relevance later.
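A sketch of fitting this model outside JMP with statsmodels, assuming the Xm20-02 table has been exported to CSV with columns named Sales, Income, and Age (the file name and column names are assumptions):

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("Xm20-02.csv")  # hypothetical CSV export of the JMP table

model = smf.ols("Sales ~ Income + Age + I(Income**2) + I(Age**2) + Income:Age",
                data=df).fit()
print(model.summary())  # check the t-test on the Income:Age interaction term
```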

17 Selecting a Model; Example Example 20.2 –To verify the validity of the model proposed in Example 20.1 for recommending the location of a new fast food restaurant, 25 areas with fast food restaurants were randomly selected. –Each area included one of the firm's restaurants and three competing restaurants. –Data collected included (Xm20-02.jmp): Previous year's annual gross sales. Mean annual household income. Mean age of children.

18 Selecting a Model; Example (Xm20-02): screenshot of the data table, showing the collected data plus added computed columns.

19 Quadratic Relationships – Graphical Illustration

20 Model Validation

21 20.3 Nominal Independent Variables In many real-life situations one or more independent variables are nominal. Including nominal variables in a regression analysis model is done via indicator (or dummy) variables. An indicator variable (I) can assume one of two values, "zero" or "one". Examples:
I = 1 if data were collected before 1980; 0 if data were collected after 1980
I = 1 if the temperature was below 50°; 0 if the temperature was 50° or more
I = 1 if a degree earned is in Finance; 0 if a degree earned is not in Finance

22 Nominal Independent Variables; Example: Auction Car Price (II) Example 18.2 - revised (Xm18-02a) –Recall: A car dealer wants to predict the auction price of a car. –The dealer now believes that odometer reading and the car color are variables that affect a car's price. –Three color categories are considered: White, Silver, Other colors. Note: Color is a nominal variable.

23 Nominal Independent Variables; Example: Auction Car Price (II) Example 18.2 - revised (Xm18-02b)
I₁ = 1 if the color is white; 0 if the color is not white
I₂ = 1 if the color is silver; 0 if the color is not silver
The category "Other colors" is defined by I₁ = 0 and I₂ = 0.

24 Note: To represent the situation of three possible colors we need only two indicator variables. Conclusion: To represent a nominal variable with m possible categories, we must create m-1 indicator variables. How Many Indicator Variables?
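A sketch of building the m−1 indicators with pandas; the color categories are taken from slide 22, and everything else is illustrative:

```python
import pandas as pd

df = pd.DataFrame({"Color": ["White", "Silver", "Other", "White", "Other"]})
# drop_first=True drops the alphabetically first category ("Other"),
# which becomes the baseline with all indicators equal to 0
dummies = pd.get_dummies(df["Color"], prefix="I", drop_first=True)
print(dummies)  # columns I_Silver and I_White; "Other" rows are all zeros
```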

25 Nominal Independent Variables; Example: Auction Car Price Solution –The proposed model is y = β₀ + β₁(Odometer) + β₂I₁ + β₃I₂ + ε –The data: a scatterplot of price against odometer reading, marked by color group (white, silver, other).

26 Example: Auction Car Price – The Regression Equation From JMP (Xm18-02b) we get the regression equation
PRICE = 16701 − .0555(Odometer) + 90.48 I₁ + 295.48 I₂
Substituting the indicator values gives one line per color, all with slope −.0555:
Other color (I₁ = 0, I₂ = 0): Price = 16701 − .0555(Odometer)
White (I₁ = 1, I₂ = 0): Price = 16791.48 − .0555(Odometer)
Silver (I₁ = 0, I₂ = 1): Price = 16996.48 − .0555(Odometer)
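A small sketch that plugs the indicator values into the reported equation (coefficients copied from the JMP output above; the function name is illustrative):

```python
# coefficients copied from the JMP output reported above
b0, b_odo, b_white, b_silver = 16701, -0.0555, 90.48, 295.48

def price(odometer, white=0, silver=0):
    """Predicted auction price for a given odometer reading and color."""
    return b0 + b_odo*odometer + b_white*white + b_silver*silver

print("other :", price(40000))            # 16701.00 - .0555(Odometer)
print("white :", price(40000, white=1))   # 16791.48 - .0555(Odometer)
print("silver:", price(40000, silver=1))  # 16996.48 - .0555(Odometer)
```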

27 Example: Auction Car Price – The Regression Equation From JMP we get the regression equation
PRICE = 16701 − .0555(Odometer) + 90.48 I₁ + 295.48 I₂
A white car sells, on average, for $90.48 more than a car in the "Other color" category. A silver car sells, on average, for $295.48 more than a car in the "Other color" category. For each additional mile, the auction price decreases by 5.55 cents.

28 Example: Auction Car Price – The Regression Equation (Xm18-02b) There is insufficient evidence to infer that a white car and a car of "other color" sell for different auction prices. There is sufficient evidence to infer that a silver car sells for a higher price than a car in the "other color" category.

29 Nominal Independent Variables; Example: MBA Program Admission (MBA II) Recall: The Dean wanted to evaluate applications for the MBA program by predicting future performance of the applicants. The following three predictors were suggested: –Undergraduate GPA –GMAT score –Years of work experience It is now believed that the type of undergraduate degree should be included in the model. Note: The undergraduate degree is nominal data.

30 Nominal Independent Variables; Example: MBA Program Admission (II)
I₁ = 1 if B.A.; 0 otherwise
I₂ = 1 if B.B.A.; 0 otherwise
I₃ = 1 if B.Sc. or B.Eng.; 0 otherwise
The category "Other group" is defined by I₁ = 0, I₂ = 0, I₃ = 0.

31 MBA Program Admission (II)

32 20.4 Applications in Human Resources Management: Pay-Equity Pay-equity can be handled in two different forms: –Equal pay for equal work –Equal pay for work of equal value. Regression analysis is extensively employed in cases of equal pay for equal work.

33 Human Resources Management: Pay-Equity Example 20.3 (Xm20-03) –Is there sex discrimination against female managers in a large firm? –A random sample of 100 managers was selected and data were collected as follows: Annual salary, Years of education, Years of experience, Gender.

34 Human Resources Management: Pay-Equity Solution –Construct the following multiple regression model: y = β₀ + β₁Education + β₂Experience + β₃Gender + ε –Note the nature of the variables: Education – interval; Experience – interval; Gender – nominal (Gender = 1 if male, 0 otherwise).
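A sketch of this regression in Python, assuming Xm20-03 is exported to CSV with columns Salary, Education, Experience, and Gender coded 0/1 (the file name and column names are assumptions):

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("Xm20-03.csv")  # hypothetical CSV export of the JMP table
fit = smf.ols("Salary ~ Education + Experience + Gender", data=df).fit()

# the t-test on the Gender coefficient is the check for a salary gap
# after controlling for education and experience
print(fit.params["Gender"], fit.pvalues["Gender"])
```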

35 Human Resources Management: Pay-Equity Solution – Continued (Xm20-03) Analysis and Interpretation The model fits the data quite well. The model is very useful. Experience is a variable strongly related to salary. There is no evidence of sex discrimination.

36 Human Resources Management: Pay-Equity Solution – Continued (Xm20-03) Analysis and Interpretation Further studying the data we find: Average experience (years) for women is 12. Average experience (years) for men is 17. Average salary for female managers is $76,189. Average salary for male managers is $97,832.

37 20.5 Stepwise Regression Purposes of stepwise regression:
–Find strong predictors (stepwise forward)
–Eliminate weak predictors (stepwise backward)
–Prevent highly collinear groups of predictors from collectively entering the model (they degrade p-values)
The workings of stepwise regression:
–Predictors are entered/removed one at a time
–Stepwise forward: given the current model, enter the predictor that increases R² the most, provided its p-value is below 0.25
–Stepwise backward: given the current model, remove the predictor that decreases R² the least, provided its p-value is above 0.10
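A minimal Python sketch of the forward rule just described (add the candidate that raises R² the most, provided its p-value in that fit is below 0.25); this is a simplification, not JMP's exact algorithm:

```python
import statsmodels.formula.api as smf

def forward_stepwise(df, response, candidates, enter_p=0.25):
    """Greedy forward selection: repeatedly add the candidate that yields
    the highest R^2, as long as its p-value in that fit is below enter_p."""
    chosen = []
    while True:
        best = None
        for c in candidates:
            if c in chosen:
                continue
            formula = f"{response} ~ {' + '.join(chosen + [c])}"
            fit = smf.ols(formula, data=df).fit()
            if fit.pvalues[c] < enter_p and (best is None or fit.rsquared > best[1]):
                best = (c, fit.rsquared)
        if best is None:
            return chosen
        chosen.append(best[0])

# usage (illustrative): forward_stepwise(df, "y", ["x1", "x2", "x3"])
```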

38 Stepwise Regression in JMP
"Analyze" → "Fit Model"; response → Y, predictors → "add"
pull-down menu top right: "Standard Least Squares" → "Stepwise"; "Run Model"
Stepwise Fit window: updates automagically
–manual stepwise: check boxes in "Entered" column to enter and remove predictors
–stepwise forward/backward: "Step" to enter/remove one predictor, "Go" for automatic sequential selection
–"Direction" pull-down for "forward" (default), "backward", "mixed" selection strategies

39 Comments on Stepwise Regression Stepwise regression might not find the best model; you might find better models with manual search, where better means fewer predictors and larger R². Forward search stops when there is no predictor with p-value < 0.25 (can be changed in JMP). Backward search stops when there is no predictor with p-value > 0.10 (can be changed in JMP). Often one wants to search only models with certain predictors included; use the "Lock" column in JMP.

40 Practice Problems 20.6, 20.8, 20.22, 20.24

