Class 19: Tuesday, Nov. 16 Specially Constructed Explanatory Variables.

1 Class 19: Tuesday, Nov. 16 Specially Constructed Explanatory Variables

2 Interaction variables; squared and higher polynomial terms for curvature; dummy variables for categorical variables.

3 Interaction Example The number of car accidents on a stretch of highway seems to be related to the number of vehicles that travel over it and the speed at which they are traveling. A city alderman has decided to ask the county sheriff to provide her with statistics covering the last few years, with the intention of examining these data statistically so that she can introduce new speed laws that will reduce traffic accidents. accidents.JMP contains data for different time periods on the number of cars passing along the stretch of road, the average speed of the cars and the number of accidents during the time period. It seems plausible that the effect of increases in speed on accidents is greater when there are more cars on the road.

4 Interaction Interaction is a three-variable concept. One of these is the response variable (Y) and the other two are explanatory variables (X1 and X2). There is an interaction between X1 and X2 if the impact of an increase in X2 on Y depends on the level of X1. To incorporate interaction in a multiple regression model, we add the explanatory variable X1*X2. There is evidence of an interaction if the coefficient on X1*X2 is significant (t-test has p-value < .05).
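The definition above can be made concrete with a small sketch. The coefficients below are hypothetical values chosen for illustration, not estimates from accidents.JMP; the point is only that, once a product term is in the model, the change in the mean of Y for a one-unit increase in X2 works out to (coefficient on X2) + (coefficient on X1*X2) * X1, so it depends on the level of X1.

```python
# Hypothetical coefficients for illustration only (not estimates from accidents.JMP).
B0, B1, B2, B3 = 1.0, 0.5, 0.2, 0.1


def mean_y(x1, x2):
    """Mean of Y under a model that includes the interaction term X1*X2."""
    return B0 + B1 * x1 + B2 * x2 + B3 * x1 * x2


def effect_of_x2(x1):
    """Change in the mean of Y for a one-unit increase in X2, holding X1 fixed.

    Algebraically this equals B2 + B3 * x1.
    """
    return mean_y(x1, 1.0) - mean_y(x1, 0.0)


low_x1_effect = effect_of_x2(2.0)    # B2 + B3 * 2
high_x1_effect = effect_of_x2(10.0)  # B2 + B3 * 10
print(low_x1_effect, high_x1_effect)
```

With a positive coefficient on the product term, the effect of X2 is larger at the higher level of X1, which is the pattern the accident example anticipates for speed and number of cars.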

5 Interaction variables in JMP To add an interaction variable in Fit Model in JMP, add the usual explanatory variables first, then highlight X1 in the Select Columns box and X2 in the Construct Model Effects box. Then click Cross in the Construct Model Effects box. JMP creates the explanatory variable (X1 - mean of X1)*(X2 - mean of X2).
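JMP's Fit Model centers crossed terms by default, so the interaction variable it builds is a product of mean-centered variables rather than the raw product. A minimal sketch, under that assumption, of why this does not change the fitted model: expanding the centered product shows that the two forms describe the same regression surface, with the same coefficient on the product term and shifted intercept and main-effect coefficients. The function name and all numbers below are hypothetical.

```python
def centered_to_raw(a0, a1, a2, a3, m1, m2):
    """Convert coefficients of the centered form
         a0 + a1*x1 + a2*x2 + a3*(x1 - m1)*(x2 - m2)
    to the equivalent raw-product form
         b0 + b1*x1 + b2*x2 + b3*x1*x2.
    """
    b0 = a0 + a3 * m1 * m2
    b1 = a1 - a3 * m2
    b2 = a2 - a3 * m1
    b3 = a3  # the interaction coefficient is unchanged by centering
    return b0, b1, b2, b3


# Hypothetical coefficients and sample means, for illustration only.
a0, a1, a2, a3 = 1.0, 0.5, 0.2, 0.1
m1, m2 = 4.0, 6.0
b0, b1, b2, b3 = centered_to_raw(a0, a1, a2, a3, m1, m2)


def predict_centered(x1, x2):
    return a0 + a1 * x1 + a2 * x2 + a3 * (x1 - m1) * (x2 - m2)


def predict_raw(x1, x2):
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


print(predict_centered(3.0, 7.0), predict_raw(3.0, 7.0))
```

Because only the interaction coefficient is shared between the two forms, the main-effect coefficients in JMP output should be read with the centering in mind.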

6 Interactions in Accident Data Increases in speed have a worse impact on number of accidents when there are a large number of cars on the road than when there are a small number of cars on the road.

7 Notes on Interactions The need for interactions is not easily spotted with residual plots. It is best to try including an interaction term and see if it is significant. To understand better the multiple regression relationship when there is an interaction, it is useful to make an Interaction Plot. After Fit Model, click red triangle next to Response, click Factor Profiling and then click Interaction Plots.

8 Plot on left displays E(Accidents|Cars, Speed=56.6) and E(Accidents|Cars, Speed=62.5) as a function of Cars. Plot on right displays E(Accidents|Cars=12.6, Speed) and E(Accidents|Cars=7, Speed) as a function of Speed. We can see that the impact of Speed on Accidents depends critically on the number of cars on the road.

9 Aptitude-Treatment Interactions There is a large literature in education and psychology that investigates aptitude-treatment interactions: interactions between instructional strategies (more generally, treatments) and aptitudes (more generally, characteristics) of individuals. There is evidence that, in general, highly structured instructional strategies (e.g., high level of external control, well-defined sequences/components) seem to help students with low ability but hinder those with high ability, relative to low-structure instructional strategies.

10 Examples of Interesting Interactions Y=Measure of psychological distress, X1=# of life events in last three years that are personal disruptions (e.g., death in the family), X2=socioeconomic status (SES). The coefficient on X1 is positive, the coefficient on X2 is negative and the coefficient on X1*X2 is negative: subjects who possess greater resources in the form of higher SES are better able to withstand the mental stress of potentially traumatic life events. Y=Measure of depression, X1=Education, X2=Age. Coefficient on X1 is negative,

11 Fast Food Locations An analyst working for a fast food chain is asked to construct a multiple regression model to identify new locations that are likely to be profitable. For a sample of 25 locations, the analyst has the annual gross revenue of the restaurant (y), the mean annual household income and the mean age of children in the area. Data in fastfoodchain.jmp.

13 Squared Terms for Curvature To capture a quadratic relationship between X1 and Y, we add X1^2 as an explanatory variable. To do this in JMP, add X1 to the model, then highlight X1 in the Select Columns box, highlight X1 in the Construct Model Effects box and click Cross.
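Outside JMP, the same idea amounts to adding a column of squared values to the design matrix and fitting by least squares. The sketch below does this in plain Python on made-up, noise-free data; the helper names `solve` and `ols` are hypothetical, and the normal-equations approach is used only to keep the example self-contained, not because it is what JMP does internally.

```python
def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


# Hypothetical noise-free data generated from y = 2 + 3*x - 0.5*x^2.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [2 + 3 * xi - 0.5 * xi ** 2 for xi in x]

# Design matrix: intercept, linear term, squared term.
X = [[1.0, xi, xi ** 2] for xi in x]
b0, b1, b2 = ols(X, y)
print(b0, b1, b2)
```

A negative coefficient on the squared term, as in this made-up example, corresponds to a curve that rises and then flattens or turns down.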

15 Notes on Squared Terms for Curvature If the t-test for the squared term has p-value < .05, indicating that there is curvature, then we keep the linear term in the model regardless of its p-value. Coefficients in a model with squared terms for curvature are tricky to interpret. If we have explanatory variables X1 and X1^2 in the model, then we can't keep X1^2 fixed and change X1. As with interactions, to better understand the multiple regression relationship when there is a squared term for curvature, a plot is useful. After Fit Model, click red triangle next to Response, click Factor Profiling and click Profiler. JMP shows a plot for each explanatory variable of how the mean of Y changes as the explanatory variable is increased and the other explanatory variables are held fixed at their mean value.

16 Left hand plot is a plot of Mean Revenue for different levels of income when Age is held fixed at its mean value of 8.392. The 1208.257+/-32.825 is a confidence interval for the mean response at income=24.2, Age=8.392.
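What a profiler trace computes can be sketched in a few lines: evaluate the fitted mean over a grid of one variable while the other is held at its mean. The coefficients below are hypothetical placeholders, not the fitted values for fastfoodchain.jmp; only the mean age of 8.392 comes from the slide. The sketch also shows why a single "effect of income" cannot be read off one coefficient when a squared term is present: the change per unit of income differs along the curve.

```python
# Hypothetical coefficients for illustration only (not the fitted fast food model).
COEF = {"const": 100.0, "inc": 40.0, "inc2": -1.0, "age": 10.0, "age2": -0.5}

AGE_MEAN = 8.392  # mean age reported on the slide


def mean_revenue(income, age):
    """Mean revenue under a model with linear and squared terms in income and age."""
    c = COEF
    return (c["const"] + c["inc"] * income + c["inc2"] * income ** 2
            + c["age"] * age + c["age2"] * age ** 2)


def income_profile(incomes, age_fixed):
    """Profiler-style trace: mean revenue across incomes with age held fixed."""
    return [mean_revenue(inc, age_fixed) for inc in incomes]


trace = income_profile([15.0, 20.0, 25.0], AGE_MEAN)

# The change in mean revenue per unit of income is not constant along the curve.
step_low = mean_revenue(16.0, AGE_MEAN) - mean_revenue(15.0, AGE_MEAN)
step_high = mean_revenue(25.0, AGE_MEAN) - mean_revenue(24.0, AGE_MEAN)
print(trace, step_low, step_high)
```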

17 Regression Model for Fast Food Chain Data Interactions and polynomial terms can be combined in a multiple regression model. Strong evidence of a quadratic relationship between revenue and age and between revenue and income. Moderate evidence of an interaction between age and income.

18 Categorical variables Categorical (nominal) variables: Variables that define group membership, e.g., sex (male/female), color (blue/green/red), county (Bucks County, Chester County, Delaware County, Philadelphia County). How to use categorical variables as explanatory variables in regression analysis: –If the variable has two categories (e.g., sex (male/female), rain or no rain, snow or no snow), we define a variable that equals 1 for one of the categories and 0 for the other category.

19 Predicting Emergency Calls to the AAA Club Rain forecast = 1 if rain is in forecast, 0 if not; Snow forecast = 1 if snow is in forecast, 0 if not; Weekday = 1 if weekday, 0 if not.

20 Comparing Toy Factory Managers An analysis has shown that the time required to complete a production run in a toy factory increases with the number of toys produced. Data were collected for the time required to process 20 randomly selected production runs as supervised by three managers (A, B and C). Data in toyfactorymanager.JMP. How do the managers compare?

21 Marginal Comparison Marginal comparison could be misleading. We know that large production runs with more toys take longer than small runs with few toys. How can we be sure that Manager C has not simply been supervising very small production runs? Solution: Run a multiple regression in which we include size of the production run as an explanatory variable along with manager, in order to control for size of the production run.

22 Including Categorical Variable in Multiple Regression: Wrong Approach We could assign codes to the managers, e.g., Manager A=0, Manager B=1, Manager C=2. This model says that for the same run size, Manager B is 31 minutes faster than Manager A and Manager C is 31 minutes faster than Manager B. This model restricts the difference between Managers A and B to be the same as the difference between Managers B and C; we have no reason to do this. If we use a different coding for Manager, we get different results, e.g., with Manager B=0, Manager A=1, Manager C=2, Manager A is 5 min. faster than Manager B.
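The restriction described above follows directly from the coding, whatever the data look like. A small sketch: with codes 0, 1, 2 and a single slope on the coded variable, the A-to-B and B-to-C gaps are both forced to equal that slope. The slope of -31 minutes is taken from the "31 minutes faster" statement on the slide; the function and variable names are hypothetical.

```python
# Numeric coding from the slide: Manager A=0, Manager B=1, Manager C=2.
CODING = {"A": 0, "B": 1, "C": 2}

# One slope on the coded variable; -31 matches "31 minutes faster" on the slide.
SLOPE = -31.0


def manager_shift(manager, coding=CODING, slope=SLOPE):
    """Contribution of the numerically coded manager variable to predicted time."""
    return slope * coding[manager]


gap_ab = manager_shift("A") - manager_shift("B")  # how much slower A is than B
gap_bc = manager_shift("B") - manager_shift("C")  # how much slower B is than C
gap_ac = manager_shift("A") - manager_shift("C")  # forced to be exactly twice gap_ab
print(gap_ab, gap_bc, gap_ac)
```

No choice of slope can make the two adjacent gaps differ, which is why a separate indicator per category is needed.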

23 Including Categorical Variable in Multiple Regression: Right Approach Create an indicator (dummy) variable for each category. Manager[a] = 1 if Manager is A, 0 if Manager is not A; Manager[b] = 1 if Manager is B, 0 if Manager is not B; Manager[c] = 1 if Manager is C, 0 if Manager is not C.
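Constructing the indicator variables is mechanical. The sketch below builds one 0/1 column per category from a list of labels; the function name `make_dummies` and the example labels are hypothetical. It also checks that the indicators sum to 1 in every row, which is why all of them cannot be entered alongside an intercept without some constraint: software either drops one category or, as the sum-to-zero estimates in JMP's output suggest, constrains the coefficients.

```python
def make_dummies(labels):
    """Return a dict mapping each category to its 0/1 indicator column."""
    levels = sorted(set(labels))
    return {level: [1 if lab == level else 0 for lab in labels] for level in levels}


# Hypothetical manager labels for six production runs.
managers = ["A", "C", "B", "A", "B", "C"]
dummies = make_dummies(managers)

# Each run belongs to exactly one manager, so the indicators sum to 1 in every row.
row_sums = [sum(dummies[level][i] for level in dummies) for i in range(len(managers))]
print(dummies, row_sums)
```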

24 For a run of size 100, the estimated times for runs supervised by Managers A, B and C are ... For the same run size, Manager A is estimated to be on average 38.41-(-14.65)=53.06 minutes slower than Manager B and 38.41-(-23.76)=62.17 minutes slower than Manager C.
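The arithmetic on this slide can be checked directly. The three manager coefficients are the ones quoted in the text (38.41, -14.65, -23.76); that they sum to zero is consistent with the sum-to-zero coding JMP reports under Expanded Estimates. Each coefficient is then a deviation from the average over managers, and a comparison between two managers at the same run size is the difference of their coefficients. The names below are hypothetical.

```python
# Manager coefficients quoted on the slide (minutes).
effects = {"A": 38.41, "B": -14.65, "C": -23.76}


def difference(first, second):
    """Estimated extra minutes for `first` relative to `second` at the same run size."""
    return effects[first] - effects[second]


a_vs_b = difference("A", "B")
a_vs_c = difference("A", "C")
total = sum(effects.values())  # sum-to-zero coding: should be (numerically) zero
print(round(a_vs_b, 2), round(a_vs_c, 2), round(total, 2))
```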

25 Categorical Variables in Multiple Regression in JMP Make sure that the categorical variable is coded as nominal. To change the coding, right click on the column of the variable, click Column Info and change Modeling Type to nominal. Use Fit Model and include the categorical variable in the multiple regression. After Fit Model, click red triangle next to Response and click Estimates, then Expanded Estimates (the initial output in JMP uses a different, more confusing coding of the dummy variables).

26 Equivalence of Using One 0/1 Dummy Variable and Two 0/1 Dummy Variables when Categorical Variable has two categories Two models give equivalent predictions. The difference in mean number of Emergency calls between a day with a rain forecast and a day without a rain forecast holding all other variables fixed is 429.71=214.85-(-214.85).
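A minimal sketch of why the two parameterizations give the same predictions. The rain coefficient of 214.85 is from the slide; the intercept is a hypothetical placeholder, and all other explanatory variables are omitted because they contribute identically to both models. Doubling the rounded coefficient gives 429.70; the 429.71 on the slide presumably reflects unrounded estimates.

```python
RAIN_EFFECT = 214.85  # from the slide: coefficient for rain forecast; no-rain gets the negative
INTERCEPT = 4000.0    # hypothetical intercept for the two-dummy (sum-to-zero) model


def predict_two_dummies(rain):
    """Two 0/1 dummies whose coefficients sum to zero (+214.85 and -214.85)."""
    rain_yes = 1 if rain else 0
    rain_no = 1 - rain_yes
    return INTERCEPT + RAIN_EFFECT * rain_yes - RAIN_EFFECT * rain_no


# Equivalent model with a single 0/1 dummy for rain in the forecast.
one_dummy_intercept = INTERCEPT - RAIN_EFFECT  # prediction when no rain is forecast
one_dummy_coef = 2 * RAIN_EFFECT               # rain minus no-rain difference


def predict_one_dummy(rain):
    """One 0/1 dummy: the intercept is the no-rain level, the coefficient is the gap."""
    return one_dummy_intercept + one_dummy_coef * (1 if rain else 0)


print(predict_two_dummies(True), predict_one_dummy(True))
print(predict_two_dummies(False), predict_one_dummy(False))
```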

