Lecture 26 Model Building (Chapters 20.2-20.3). HW6 due Wednesday, April 23rd, by 5 p.m. Problem 3(d): Use JMP to calculate the prediction interval rather than by hand.

Curvature: Midterm Problem 10

Remedy I: Transformations Use Tukey’s Bulging Rule to choose a transformation.
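For instance, when y rises with diminishing returns in x, the bulging rule points toward moving x down the ladder of powers (e.g., log x). A minimal Python sketch with simulated data (statsmodels assumed; all numbers hypothetical):

```python
import numpy as np
import statsmodels.api as sm

# Simulated data where y rises with diminishing returns in x, so Tukey's
# bulging rule points toward moving x down the ladder (e.g., log x).
rng = np.random.default_rng(0)
x = rng.uniform(1, 100, 200)
y = 3 + 2 * np.log(x) + rng.normal(0, 0.5, 200)

linear = sm.OLS(y, sm.add_constant(x)).fit()          # untransformed fit
logged = sm.OLS(y, sm.add_constant(np.log(x))).fit()  # transformed predictor

print(f"R^2 untransformed: {linear.rsquared:.3f}")
print(f"R^2 with log(x):   {logged.rsquared:.3f}")
```

The transformed fit should show a noticeably higher R² and cleaner residuals.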

Remedy II: Polynomial Models. Multiple regression model: y = β₀ + β₁x₁ + β₂x₂ + … + βₚxₚ + ε. Polynomial model in one predictor: y = β₀ + β₁x + β₂x² + … + βₚxᵖ + ε.

Quadratic Regression

Polynomial Models with One Predictor Variable. First order model (p = 1): y = β₀ + β₁x + ε. Second order model (p = 2): y = β₀ + β₁x + β₂x² + ε; the parabola opens downward when β₂ < 0 and upward when β₂ > 0.

Polynomial Models with One Predictor Variable. Third order model (p = 3): y = β₀ + β₁x + β₂x² + β₃x³ + ε; the curve bends in opposite directions depending on whether β₃ < 0 or β₃ > 0.
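A minimal sketch of fitting the first, second, and third order models in Python (simulated data; statsmodels assumed):

```python
import numpy as np
import statsmodels.api as sm

# Simulated data whose true relationship is quadratic with beta2 < 0.
rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, 150)
y = 1 + 0.5 * x - 0.8 * x**2 + rng.normal(0, 1, 150)

for p in (1, 2, 3):  # first, second, and third order models
    X = sm.add_constant(np.column_stack([x**k for k in range(1, p + 1)]))
    fit = sm.OLS(y, X).fit()
    print(f"order {p}: R^2 = {fit.rsquared:.3f}")
```

The jump in R² from p = 1 to p = 2, and the negligible gain at p = 3, is the usual signal for choosing the quadratic model.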

Interaction. Two independent variables x₁ and x₂ interact if the effect of x₁ on y is influenced by the value of x₂. Interaction can be brought into the multiple linear regression model by including the independent variable x₁x₂. Example: E(Income) = β₀ + β₁Education + β₂IQ + β₃(Education)(IQ).

Interaction Cont. "Slope" for x₁ = E(y|x₁+1, x₂) − E(y|x₁, x₂) = β₁ + β₃x₂. Is the expected income increase from an extra year of education higher for people with IQ 100 or with IQ 130 (or is it the same)? The increase is β₁ + 100β₃ versus β₁ + 130β₃, so the answer depends on the sign of β₃.

Polynomial Models with Two Predictor Variables. First order model: y = β₀ + β₁x₁ + β₂x₂ + ε. The effect of one predictor variable on y is independent of the effect of the other: holding x₂ = k fixed gives y = [β₀ + β₂k] + β₁x₁, so the lines for x₂ = 1, 2, 3 are parallel. First order model with interaction: y = β₀ + β₁x₁ + β₂x₂ + β₃x₁x₂ + ε. The two variables interact to affect the value of y: holding x₂ = k fixed gives y = [β₀ + β₂k] + [β₁ + β₃k]x₁, so the lines for x₂ = 1, 2, 3 have different slopes.
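A short sketch of the two models in Python (simulated data; statsmodels assumed) makes the parallel-versus-fanning lines concrete: the fitted "slope" for x₁ at a given x₂ is b₁ + b₃x₂.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
df = pd.DataFrame({"x1": rng.uniform(0, 10, 300),
                   "x2": rng.choice([1.0, 2.0, 3.0], 300)})
# True model includes an interaction: the slope on x1 depends on x2.
df["y"] = 2 + 1.0 * df.x1 + 0.5 * df.x2 + 0.3 * df.x1 * df.x2 + rng.normal(0, 1, 300)

additive = smf.ols("y ~ x1 + x2", data=df).fit()     # parallel lines
interactive = smf.ols("y ~ x1 * x2", data=df).fit()  # expands to x1 + x2 + x1:x2

b = interactive.params
for k in (1, 2, 3):  # fitted slope for x1 at each level of x2
    print(f"x2 = {k}: slope for x1 = {b['x1'] + b['x1:x2'] * k:.2f}")
```

The additive fit forces one common slope; the interaction fit recovers a slope near 1.3, 1.6, and 1.9 at x₂ = 1, 2, 3.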

Polynomial Models with Two Predictor Variables. Second order model: y = β₀ + β₁x₁ + β₂x₂ + β₃x₁² + β₄x₂² + ε. Holding x₂ = k fixed gives the quadratic y = [β₀ + β₂k + β₄k²] + β₁x₁ + β₃x₁² + ε, one curve for each of x₂ = 1, 2, 3. Second order model with interaction: y = β₀ + β₁x₁ + β₂x₂ + β₃x₁² + β₄x₂² + β₅x₁x₂ + ε.

Selecting a Model Several models have been introduced. How do we select the right model? Selecting a model: –Use your knowledge of the problem (variables involved and the nature of the relationship between them) to select a model. –Test the model using statistical techniques.

Selecting a Model; Example. Example 20.1 The location of a new restaurant –A fast food restaurant chain tries to identify new locations that are likely to be profitable. –The primary market for such restaurants is middle-income adults and their children (between the ages of 5 and 12). –Which regression model should be proposed to predict the profitability of new locations?

Selecting a Model; Example. Solution –The dependent variable will be gross revenue. –Quadratic relationships between revenue and each predictor variable should be expected. Why? Members of middle-class families are more likely to visit a fast food restaurant than members of poor or wealthy families, so revenue peaks at middle incomes. Families with very young or older kids will not visit the restaurant as frequently as families with mid-range ages of kids, so revenue peaks at middle ages of children.

Selecting a Model; Example. Solution –The quadratic regression model built is Sales = β₀ + β₁INCOME + β₂AGE + β₃INCOME² + β₄AGE² + β₅(INCOME)(AGE) + ε. Include the interaction term when in doubt, and test its relevance later. SALES = annual gross sales; INCOME = median annual household income in the neighborhood; AGE = mean age of children in the neighborhood. A sketch of fitting this model appears below.
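A minimal sketch of the proposed model in Python with statsmodels' formula interface, using simulated stand-in data rather than the Xm20-02 file (all numbers hypothetical):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in for Xm20-02: 25 areas with SALES, INCOME, AGE.
rng = np.random.default_rng(3)
df = pd.DataFrame({"INCOME": rng.uniform(20, 70, 25),
                   "AGE": rng.uniform(3, 17, 25)})
df["SALES"] = (200 + 20 * df.INCOME - 0.25 * df.INCOME**2
               + 15 * df.AGE - 0.8 * df.AGE**2 + rng.normal(0, 30, 25))

# I(...) makes the formula parser square the columns inside the model matrix.
model = smf.ols("SALES ~ INCOME + AGE + I(INCOME**2) + I(AGE**2) + INCOME:AGE",
                data=df).fit()
print(model.summary().tables[1])  # t-test for each term, incl. the interaction
```

The t-test on the INCOME:AGE row is the "test its relevance later" step for the interaction term.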

Selecting a Model; Example. Example 20.2 –To verify the validity of the model proposed in Example 20.1 for recommending the location of a new fast food restaurant, 25 areas with fast food restaurants were randomly selected. –Each area included one of the firm's restaurants and three competing restaurants. –Data collected included (Xm20-02.jmp): previous year's annual gross sales, mean annual household income, and mean age of children.

Selecting a Model; Example. [Screenshot of the Xm20-02 data table, showing the collected data and the added data.]

Quadratic Relationships – Graphical Illustration

Model Validation

20.3 Nominal Independent Variables. In many real-life situations one or more independent variables are nominal. Including nominal variables in a regression model is done via indicator (or dummy) variables. An indicator variable (I) can assume one of two values, "zero" or "one". Examples: I = 1 if the data were collected before a given date, 0 if collected after; I = 1 if the temperature was below 50°, 0 if the temperature was 50° or more; I = 1 if a degree earned is in Finance, 0 if it is not in Finance.

Nominal Independent Variables; Example: Auction Car Price (II). Example 18.2 revised (Xm18-02a) –Recall: A car dealer wants to predict the auction price of a car. –The dealer now believes that odometer reading and car color are variables that affect a car's price. –Three color categories are considered: white, silver, and other colors. Note: Color is a nominal variable.

Nominal Independent Variables; Example: Auction Car Price (II). Example revised (Xm18-02b): I₁ = 1 if the color is white, 0 if the color is not white. I₂ = 1 if the color is silver, 0 if the color is not silver. The category "Other colors" is defined by I₁ = 0 and I₂ = 0.

How Many Indicator Variables? Note: To represent the situation of three possible colors we need only two indicator variables. Conclusion: To represent a nominal variable with m possible categories, we must create m − 1 indicator variables.
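As an illustration (hypothetical data, not the Xm18-02b file), pandas can build the m − 1 indicators directly:

```python
import pandas as pd

# Hypothetical color column with m = 3 categories.
df = pd.DataFrame({"Color": ["white", "silver", "other", "white", "other"]})

# drop_first=True keeps m - 1 = 2 indicator columns; the dropped category
# ("other", first alphabetically here) becomes the all-zeros baseline,
# matching the slide's I1 = I2 = 0 convention.
dummies = pd.get_dummies(df["Color"], drop_first=True)
print(dummies.astype(int))
```

Each row of the output has at most one 1; an "other color" car shows 0 in both columns.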

Nominal Independent Variables; Example: Auction Car Price. Solution –The proposed model is y = β₀ + β₁(Odometer) + β₂I₁ + β₃I₂ + ε. –The data: [scatterplot of price against odometer reading, with white, silver, and other-color cars plotted separately.]

Example: Auction Car Price – The Regression Equation. From JMP (Xm18-02b) we get the regression equation PRICE = b₀ − 0.0555(Odometer) + 90.48(I₁) + b₃(I₂). Substituting the indicator values gives three parallel lines: for an "other color" car (I₁ = 0, I₂ = 0), PRICE = b₀ − 0.0555(Odometer); for a white car (I₁ = 1, I₂ = 0), PRICE = (b₀ + 90.48) − 0.0555(Odometer); for a silver car (I₁ = 0, I₂ = 1), PRICE = (b₀ + b₃) − 0.0555(Odometer).

Example: Auction Car Price – The Regression Equation. From JMP we get the regression equation PRICE = b₀ − 0.0555(Odometer) + 90.48(I₁) + b₃(I₂). A white car sells, on average, for $90.48 more than a car of the "Other color" category. A silver car sells, on average, for b₃ dollars more than a car of the "Other color" category. For one additional mile, the auction price decreases by 5.55 cents.

Example: Auction Car Price – The Regression Equation (Xm18-02b). There is insufficient evidence to infer that a white car and a car of "other color" sell for different auction prices. There is sufficient evidence to infer that a silver car sells for a higher price than a car of the "other color" category.

Nominal Independent Variables; Example: MBA Program Admission (MBA II). Recall: The Dean wanted to evaluate applications for the MBA program by predicting the future performance of applicants. The following three predictors were suggested: –Undergraduate GPA –GMAT score –Years of work experience. It is now believed that the type of undergraduate degree should be included in the model. Note: The undergraduate degree is nominal data.

Nominal Independent Variables; Example: MBA Program Admission (II). I₁ = 1 if B.A., 0 otherwise. I₂ = 1 if B.B.A., 0 otherwise. I₃ = 1 if B.Sc. or B.Eng., 0 otherwise. The category "Other group" is defined by I₁ = 0, I₂ = 0, I₃ = 0.

MBA Program Admission (II)

20.4 Applications in Human Resources Management: Pay-Equity Pay-equity can be handled in two different forms: –Equal pay for equal work –Equal pay for work of equal value. Regression analysis is extensively employed in cases of equal pay for equal work.

Human Resources Management: Pay-Equity. Example 20.3 (Xm20-03) –Is there sex discrimination against female managers in a large firm? –A random sample of 100 managers was selected and data were collected as follows: annual salary, years of education, years of experience, and gender.

Human Resources Management: Pay-Equity. Solution –Construct the following multiple regression model: y = β₀ + β₁Education + β₂Experience + β₃Gender + ε. –Note the nature of the variables: Education – interval; Experience – interval; Gender – nominal (Gender = 1 if male, 0 otherwise). A sketch of fitting this model appears below.
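A minimal sketch of this model in Python, using simulated stand-in data rather than the Xm20-03 file (all numbers hypothetical):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in for Xm20-03: 100 managers.
rng = np.random.default_rng(4)
df = pd.DataFrame({"Education": rng.integers(12, 21, 100),
                   "Experience": rng.integers(0, 30, 100),
                   "Gender": rng.integers(0, 2, 100)})  # 1 = male, 0 = female
df["Salary"] = (30000 + 2000 * df.Education + 2500 * df.Experience
                + rng.normal(0, 8000, 100))  # no true gender effect built in

fit = smf.ols("Salary ~ Education + Experience + Gender", data=df).fit()
# The t-test on beta3 asks whether, holding education and experience fixed,
# men and women are paid differently.
print(f"Gender coefficient: {fit.params['Gender']:.0f}, "
      f"p-value: {fit.pvalues['Gender']:.3f}")
```

Because the dummy is tested while education and experience are held fixed, a raw salary gap (as in the averages two slides below) need not imply a significant β₃.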

Human Resources Management: Pay-Equity. Solution – Continued (Xm20-03). Analysis and Interpretation: The model fits the data quite well. The model is very useful. Experience is a variable strongly related to salary. There is no evidence of sex discrimination.

Human Resources Management: Pay-Equity. Solution – Continued (Xm20-03). Analysis and Interpretation: Further studying the data we find: average experience for women is 12 years; average experience for men is 17 years; average salary for female managers is $76,189; average salary for male managers is $97,832.

20.5 Stepwise Regression. Purposes of stepwise regression: –Find strong predictors (stepwise forward). –Eliminate weak predictors (stepwise backward). –Prevent highly collinear groups of predictors from collectively entering the model (they degrade p-values). The workings of stepwise regression (see the sketch below): –Predictors are entered/removed one at a time. –Stepwise forward: given the current model, enter the predictor that increases R² the most, if its p-value is below 0.25. –Stepwise backward: given the current model, remove the predictor that decreases R² the least, if its p-value is above 0.10.
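JMP does this interactively; as a rough illustration only, here is a minimal Python sketch of the forward rule, assuming statsmodels and a pandas DataFrame X of candidate predictors (all names hypothetical):

```python
import statsmodels.api as sm

def forward_stepwise(X, y, p_enter=0.25):
    """Greedy forward selection: repeatedly add the candidate predictor that
    yields the largest R^2, provided its p-value is below p_enter."""
    selected, remaining = [], list(X.columns)
    while remaining:
        best = None
        for col in remaining:
            fit = sm.OLS(y, sm.add_constant(X[selected + [col]])).fit()
            if fit.pvalues[col] < p_enter and (best is None or fit.rsquared > best[1]):
                best = (col, fit.rsquared)
        if best is None:  # no remaining predictor clears the entry threshold
            break
        selected.append(best[0])
        remaining.remove(best[0])
    return selected
```

The backward rule is the mirror image: start from the full model and repeatedly drop the predictor whose removal costs the least R², while its p-value exceeds 0.10.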

Stepwise Regression in JMP. "Analyze" → "Fit Model"; response → Y, predictors → "Add"; pull-down menu at top right: "Standard Least Squares" → "Stepwise"; "Run Model". Stepwise Fit window: updates automagically. –Manual stepwise: check boxes in the "Entered" column to enter and remove predictors. –Stepwise forward/backward: "Step" to enter/remove one predictor, "Go" for automatic sequential selection. –"Direction" pull-down for "forward" (default), "backward", and "mixed" selection strategies.

Comments on Stepwise Regression. Stepwise regression might not find the best model; you might find better models with a manual search, where "better" means fewer predictors and larger R². Forward search stops when there is no remaining predictor with p-value < 0.25 (the threshold can be changed in JMP). Backward search stops when there is no predictor with p-value > 0.10 (can be changed in JMP). Often one wants to search only models with certain predictors included; use the "Lock" column in JMP.

Practice Problems: 20.6, 20.8, 20.22, 20.24