Class 19: Tuesday, Nov. 16 Specially Constructed Explanatory Variables.

Interaction variables
Squared and higher polynomial terms for curvature
Dummy variables for categorical variables

Interaction Example The number of car accidents on a stretch of highway seems to be related to the number of vehicles that travel over it and the speed at which they are traveling. A city alderman has decided to ask the county sheriff to provide her with statistics covering the last few years, with the intention of examining these data statistically so that she can introduce new speed laws that will reduce traffic accidents. accidents.JMP contains data for different time periods on the number of cars passing along the stretch of road, the average speed of the cars, and the number of accidents during the time period. It seems plausible that the effect of increases in speed on accidents is greater when there are more cars on the road.

Interaction Interaction is a three-variable concept. One of these is the response variable (Y) and the other two are explanatory variables (X1 and X2). There is an interaction between X1 and X2 if the impact of an increase in X2 on Y depends on the level of X1. To incorporate an interaction in the multiple regression model, we add the explanatory variable X1*X2. There is evidence of an interaction if the coefficient on X1*X2 is significant (t-test has p-value < .05).
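
For readers who want to try this outside JMP, here is a minimal sketch in Python with statsmodels. The data are simulated stand-ins for the accidents.JMP file, and the column names Cars, Speed, and Accidents are assumptions, not taken from the file itself.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in for the accidents.JMP data (column names assumed)
rng = np.random.default_rng(0)
n = 100
df = pd.DataFrame({
    "Cars": rng.uniform(5, 15, n),    # traffic volume on the stretch of road
    "Speed": rng.uniform(50, 70, n),  # average speed of the cars
})
# Build a response with a true interaction so the t-test has something to detect
df["Accidents"] = (0.5 * df["Cars"] + 0.2 * df["Speed"]
                   + 0.05 * df["Cars"] * df["Speed"] + rng.normal(0, 2, n))

# Cars:Speed adds the product Cars*Speed as an extra explanatory variable
fit = smf.ols("Accidents ~ Cars + Speed + Cars:Speed", data=df).fit()
print(fit.summary())  # look at the t-test p-value on the Cars:Speed coefficient
```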

Interaction variables in JMP To add an interaction variable in Fit Model in JMP, add the usual explanatory variables first, then highlight X1 in the Select Columns box and X2 in the Construct Model Effects box. Then click Cross in the Construct Model Effects box. JMP creates the explanatory variable X1*X2.

Interactions in Accident Data Increases in speed have a worse impact on number of accidents when there are a large number of cars on the road than when there are a small number of cars on the road.

Notes on Interactions The need for interactions is not easily spotted with residual plots. It is best to try including an interaction term and see if it is significant. To better understand the multiple regression relationship when there is an interaction, it is useful to make an Interaction Plot. After Fit Model, click the red triangle next to Response, click Factor Profiling and then click Interaction Plots.

The plot on the left displays E(Accidents|Cars, Speed=56.6) and E(Accidents|Cars, Speed=62.5) as a function of Cars. The plot on the right displays E(Accidents|Cars=12.6, Speed) and E(Accidents|Cars=7, Speed) as a function of Speed. We can see that the impact of speed on Accidents depends critically on the number of cars on the road.
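
A hand-rolled version of such an interaction plot can be sketched in Python with matplotlib. The coefficients below are illustrative placeholders, not estimates from accidents.JMP.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative placeholder coefficients (not estimated from the data)
b0, b_cars, b_speed, b_inter = 1.0, 0.5, 0.2, 0.05

cars = np.linspace(5, 15, 50)
for speed in (56.6, 62.5):  # the two Speed levels shown in the left panel
    yhat = b0 + b_cars * cars + b_speed * speed + b_inter * cars * speed
    plt.plot(cars, yhat, label=f"Speed = {speed}")

plt.xlabel("Cars")
plt.ylabel("Predicted Accidents")
plt.legend()
plt.show()  # non-parallel lines are the visual signature of an interaction
```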

Aptitude-Treatment Interactions There is a large literature in education and psychology that investigates aptitude-treatment interactions: interactions between instructional strategies (more generally, treatments) and aptitudes (more generally, characteristics) of individuals. There is evidence that, in general, highly structured instructional strategies (e.g., a high level of external control, well-defined sequences/components) seem to help students with low ability but hinder those with high ability, relative to low-structure instructional strategies.

Examples of Interesting Interactions Y=Measure of psychological distress, X1=# of life events in the last three years that are personal disruptions (e.g., death in the family), X2=socioeconomic status. The coefficient on X1 is positive, the coefficient on X2 is negative, and the coefficient on X1*X2 is negative: subjects who possess greater resources in the form of higher SES are better able to withstand the mental stress of potentially traumatic life events. Y=Measure of depression, X1=Education, X2=Age. The coefficient on X1 is negative.

Fast Food Locations An analyst working for a fast food chain is asked to construct a multiple regression model to identify new locations that are likely to be profitable. The analyst has, for a sample of 25 locations, the annual gross revenue of the restaurant (y), the mean annual household income, and the mean age of children in the area. Data are in fastfoodchain.jmp

Squared Terms for Curvature To capture a quadratic relationship between X1 and Y, we add X1^2 (the square of X1) as an explanatory variable. To do this in JMP, add X1 to the model, then highlight X1 in the Select Columns box and highlight X1 in the Construct Model Effects box and click Cross.
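
Outside JMP, the same squared term can be added through a regression formula. Here is a minimal sketch in Python with statsmodels on simulated data; X1 and Y are generic placeholder names.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data with a genuinely curved relationship between X1 and Y
rng = np.random.default_rng(1)
df = pd.DataFrame({"X1": rng.uniform(0, 10, 80)})
df["Y"] = 3 + 2 * df["X1"] - 0.4 * df["X1"] ** 2 + rng.normal(0, 1, 80)

# I(X1**2) tells the formula parser to include the literal square of X1
fit = smf.ols("Y ~ X1 + I(X1**2)", data=df).fit()
print(fit.params)  # keep the linear term even if only the squared term is significant
```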

Notes on Squared Terms for Curvature If the t-test for the squared term has p-value < .05, indicating that there is curvature, then we keep the linear term in the model regardless of its p-value. Coefficients in a model with squared terms for curvature are tricky to interpret: if we have the explanatory variables X1 and X1^2 in the model, then we can't keep X1^2 fixed and change X1. As with interactions, to better understand the multiple regression relationship when there is a squared term for curvature, a plot is useful. After Fit Model, click the red triangle next to Response, click Factor Profiling and click Profiler. JMP shows a plot for each explanatory variable of how the mean of Y changes as the explanatory variable is increased and the other explanatory variables are held fixed at their mean values.

The left-hand plot is a plot of mean Revenue for different levels of Income when Age is held fixed at its mean value of 8.392. The bracket is a confidence interval for the mean response at Income=24.2, Age=8.392.

Regression Model for Fast Food Chain Data Interactions and polynomial terms can be combined in a multiple regression model. There is strong evidence of a quadratic relationship between revenue and age and between revenue and income, and moderate evidence of an interaction between age and income.
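
A sketch of such a combined model in Python with statsmodels, again on simulated stand-ins; Revenue, Income, and Age are assumed names for the fastfoodchain.jmp columns.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in for the 25 locations in fastfoodchain.jmp
rng = np.random.default_rng(2)
fastfood = pd.DataFrame({
    "Income": rng.uniform(15, 35, 25),  # mean annual household income
    "Age": rng.uniform(2, 15, 25),      # mean age of children in the area
})
fastfood["Revenue"] = (50 + 8 * fastfood["Income"] - 0.15 * fastfood["Income"] ** 2
                       + 10 * fastfood["Age"] - 0.5 * fastfood["Age"] ** 2
                       - 0.1 * fastfood["Income"] * fastfood["Age"]
                       + rng.normal(0, 5, 25))

# Quadratic terms in both predictors plus their interaction, all in one formula
fit = smf.ols("Revenue ~ Income + Age + I(Income**2) + I(Age**2) + Income:Age",
              data=fastfood).fit()
print(fit.summary())
```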

Categorical variables Categorical (nominal) variables: variables that define group membership, e.g., sex (male/female), color (blue/green/red), county (Bucks County, Chester County, Delaware County, Philadelphia County). How to use categorical variables as explanatory variables in regression analysis: –If the variable has two categories (e.g., sex (male/female), rain or no rain, snow or no snow), we define a variable that equals 1 for one of the categories and 0 for the other category.

Predicting Emergency Calls to the AAA Club
Rain forecast = 1 if rain is in the forecast, 0 if not
Snow forecast = 1 if snow is in the forecast, 0 if not
Weekday = 1 if weekday, 0 if not
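
A sketch of this kind of regression in Python with statsmodels; the data below are simulated and the variable names are lightly adapted from the slide.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in for the AAA emergency-call data
rng = np.random.default_rng(3)
n = 90
days = pd.DataFrame({
    "rain_forecast": rng.integers(0, 2, n),  # 1 if rain is in the forecast
    "snow_forecast": rng.integers(0, 2, n),  # 1 if snow is in the forecast
    "weekday": rng.integers(0, 2, n),        # 1 if weekday
})
days["calls"] = (60 + 30 * days["rain_forecast"] + 80 * days["snow_forecast"]
                 + 10 * days["weekday"] + rng.normal(0, 8, n))

# 0/1 dummy variables enter the regression like any other explanatory variable
fit = smf.ols("calls ~ rain_forecast + snow_forecast + weekday", data=days).fit()
print(fit.params)  # each slope = shift in mean calls when that indicator flips to 1
```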

Comparing Toy Factory Managers An analysis has shown that the time required to complete a production run in a toy factory increases with the number of toys produced. Data were collected for the time required to process 20 randomly selected production runs as supervised by three managers (A, B and C). Data in toyfactorymanager.JMP. How do the managers compare?

Marginal Comparison Marginal comparison could be misleading. We know that large production runs with more toys take longer than small runs with few toys. How can we be sure that Manager C has not simply been supervising very small production runs? Solution: run a multiple regression in which we include the size of the production run as an explanatory variable, along with manager, in order to control for the size of the production run.

Including a Categorical Variable in Multiple Regression: Wrong Approach We could assign codes to the managers, e.g., Manager A=0, Manager B=1, Manager C=2. This model says that for the same run size, Manager B is 31 minutes faster than Manager A and Manager C is 31 minutes faster than Manager B. This model restricts the difference between Managers A and B to be the same as the difference between Managers B and C; we have no reason to impose this restriction. If we use a different coding for manager, we get different results, e.g., under the coding Manager B=0, Manager A=1, Manager C=2, the model says Manager A is 5 min. faster than Manager B.

Including a Categorical Variable in Multiple Regression: Right Approach Create an indicator (dummy) variable for each category.
Manager[a] = 1 if Manager is A, 0 if Manager is not A
Manager[b] = 1 if Manager is B, 0 if Manager is not B
Manager[c] = 1 if Manager is C, 0 if Manager is not C
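
In Python with statsmodels, an analogous dummy coding can be generated automatically from a nominal column. Here is a sketch on simulated stand-ins for toyfactorymanager.JMP; the column names Manager, RunSize, and Time are assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in for toyfactorymanager.JMP (column names assumed)
rng = np.random.default_rng(4)
runs = pd.DataFrame({
    "Manager": rng.choice(["A", "B", "C"], size=60),
    "RunSize": rng.integers(50, 350, size=60),
})
manager_effect = runs["Manager"].map({"A": 53, "B": 0, "C": -9})
runs["Time"] = 120 + 0.25 * runs["RunSize"] + manager_effect + rng.normal(0, 10, 60)

# C(Manager) expands the nominal column into indicator (dummy) variables;
# one category is absorbed into the intercept as the reference level
fit = smf.ols("Time ~ RunSize + C(Manager)", data=runs).fit()
print(fit.params)  # coefficients are differences from the reference manager
```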

For a run size of 100, we can compute the estimated time for a run under each of Managers A, B and C from the fitted regression equation. For the same run size, Manager A is estimated to be on average 38.41-(-14.65)=53.06 minutes slower than Manager B and 38.41-(-23.76)=62.17 minutes slower than Manager C.

Categorical Variables in Multiple Regression in JMP Make sure that the categorical variable is coded as nominal. To change the coding, right-click on the variable's column, click Column Info and change the Modeling Type to nominal. Use Fit Model and include the categorical variable in the multiple regression. After Fit Model, click the red triangle next to Response and click Estimates, then Expanded Estimates (the initial output in JMP uses a different, more confusing coding of the dummy variables).

Equivalence of Using One 0/1 Dummy Variable and Two 0/1 Dummy Variables when the Categorical Variable has Two Categories The two models give equivalent predictions. The difference in the mean number of emergency calls between a day with a rain forecast and a day without a rain forecast, holding all other variables fixed, is the same under either coding.
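
A quick numerical check of this equivalence, sketched in Python with statsmodels on simulated data; the variable names are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data with a single binary (rain forecast) variable
rng = np.random.default_rng(5)
df = pd.DataFrame({"rain": rng.integers(0, 2, 50)})
df["no_rain"] = 1 - df["rain"]
df["calls"] = 60 + 30 * df["rain"] + rng.normal(0, 5, 50)

one = smf.ols("calls ~ rain", data=df).fit()                # one dummy + intercept
two = smf.ols("calls ~ 0 + rain + no_rain", data=df).fit()  # two dummies, no intercept
# The two parameterizations give identical fitted values
print(np.allclose(one.fittedvalues, two.fittedvalues))  # True
```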