Class 20: Thurs., Nov. 18 Specially Constructed Explanatory Variables –Dummy variables for categorical variables –Interactions involving dummy variables.

Class 20: Thurs., Nov. 18 Specially Constructed Explanatory Variables –Dummy variables for categorical variables –Interactions involving dummy variables. I will email you HW8 tomorrow; it will be due Tuesday, Nov. 30th. Schedule: –Tuesday, Nov. 23rd: One-way ANOVA –Tuesday, Nov. 30th: Review –Thursday, Dec. 2nd: Midterm II –Tuesday, Dec. 7th and Thursday, Dec. 9th: Two-way ANOVA

Categorical variables Categorical (nominal) variables: variables that define group membership, e.g., sex (male/female), color (blue/green/red), county (Bucks County, Chester County, Delaware County, Philadelphia County). How to use categorical variables as explanatory variables in regression analysis: –If the variable has two categories (e.g., male/female, rain or no rain, snow or no snow), we define a variable that equals 1 for one of the categories and 0 for the other.
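As a concrete illustration, here is a minimal Python (pandas) sketch of this 0/1 coding; the data frame and column names are hypothetical, not from the course data sets.

```python
import pandas as pd

# Hypothetical two-category variable.
df = pd.DataFrame({"sex": ["male", "female", "female", "male"]})

# 0/1 indicator: 1 for one category, 0 for the other.
df["female"] = (df["sex"] == "female").astype(int)
print(df)
```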

Predicting Emergency Calls to the AAA Club Rain forecast = 1 if rain is in the forecast, 0 if not Snow forecast = 1 if snow is in the forecast, 0 if not Weekday = 1 if the day is a weekday, 0 if not
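A sketch of how such a regression could be fit in Python with statsmodels; the data frame below is a hypothetical stand-in for the AAA calls data, with the 0/1 dummies already defined as on this slide.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical stand-in for the AAA emergency-calls data: one row per day.
df = pd.DataFrame({
    "calls":         [6500, 8200, 7400, 9100, 7000, 8800, 7600, 9300],
    "rain_forecast": [0, 1, 0, 1, 0, 1, 0, 1],
    "snow_forecast": [0, 0, 1, 1, 0, 0, 1, 1],
    "weekday":       [1, 1, 1, 1, 0, 0, 0, 0],
})

# Multiple regression of calls on the three dummy explanatory variables.
fit = smf.ols("calls ~ rain_forecast + snow_forecast + weekday", data=df).fit()
print(fit.params)  # each slope is the shift in mean calls when that dummy equals 1
```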

Comparing Toy Factory Managers An analysis has shown that the time required to complete a production run in a toy factory increases with the number of toys produced. Data were collected for the time required to process 20 randomly selected production runs as supervised by three managers (A, B and C). Data in toyfactorymanager.JMP. How do the managers compare?

Marginal Comparison A marginal comparison (comparing the managers' mean run times without accounting for run size) could be misleading: we know that large production runs with more toys take longer than small runs with few toys.

How can we be sure that Manager C's advantage is not simply due to having supervised smaller production runs? Solution: run a multiple regression that includes the size of the production run as an explanatory variable, along with manager, in order to control for run size.

Including a Categorical Variable in Multiple Regression: Wrong Approach We could assign numerical codes to the managers, e.g., Manager A = 0, Manager B = 1, Manager C = 2. This model says that, for the same run size, Manager B is 31 minutes faster than Manager A and Manager C is 31 minutes faster than Manager B. The model restricts the difference between Managers A and B to be the same as the difference between Managers B and C, and we have no reason to impose this restriction. If we use a different coding for manager, we get different results; e.g., with Manager B = 0, Manager A = 1, Manager C = 2, the fitted model says Manager A is 5 minutes faster than Manager B.

Including a Categorical Variable in Multiple Regression: Right Approach Create an indicator (dummy) variable for each category: –Manager[a] = 1 if the manager is A, 0 if not –Manager[b] = 1 if the manager is B, 0 if not –Manager[c] = 1 if the manager is C, 0 if not
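In equation form, the model with run size and the three indicator variables could be written as below; this is a sketch with our own coefficient names, and the sum-to-zero constraint is the one JMP imposes (discussed two slides later).

```latex
\[
\text{Time}_i = \beta_0 + \beta_1\,\text{RunSize}_i
              + \beta_2\,\text{Manager[a]}_i
              + \beta_3\,\text{Manager[b]}_i
              + \beta_4\,\text{Manager[c]}_i + e_i,
\qquad \beta_2 + \beta_3 + \beta_4 = 0.
\]
```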

For a run size of 100, the estimated mean run times for Managers A, B and C can be computed from the fitted equation. For the same run size, Manager A is estimated to be on average 38.41-(-14.65)=53.06 minutes slower than Manager B and 38.41-(-23.76)=62.17 minutes slower than Manager C.

Categorical Variables in Multiple Regression in JMP Make sure that the categorical variable is coded as nominal. To change the coding, right-click on the variable's column, click Column Info, and change Modeling Type to nominal. Use Fit Model and include the categorical variable in the multiple regression. After Fit Model, click the red triangle next to Response and click Estimates, then Expanded Estimates (the initial output in JMP uses a different, more confusing coding of the dummy variables).

The coefficients on Manager A, Manager B and Manager C add up to zero. So the positive coefficient on Manager A means that Manager A is slower than the average of Managers A, B and C, and the negative coefficients on Manager B and Manager C mean that these two managers are faster than that average. The coefficients on the indicator variables will always add up to zero in JMP. Caution: different software packages use different codings for indicator variables. The coding doesn't change the predictions from the multiple regression, but it does change the interpretation of the coefficients.
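As a rough illustration in Python/statsmodels (the data frame here is a simulated, hypothetical stand-in for toyfactorymanager.JMP): patsy's Sum contrast mimics JMP's sum-to-zero coding, while the default Treatment contrast compares each manager to a baseline manager, yet both codings give identical fitted values.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated, hypothetical stand-in for the toy factory data.
rng = np.random.default_rng(0)
df = pd.DataFrame({"manager": np.repeat(["a", "b", "c"], 20),
                   "run_size": rng.integers(50, 300, size=60)})
df["time"] = (150 + 0.25 * df["run_size"]
              + df["manager"].map({"a": 40, "b": -15, "c": -25})
              + rng.normal(0, 10, size=60))

# Sum-to-zero (effects) coding: each coefficient compares a manager to the average.
fit_sum = smf.ols("time ~ run_size + C(manager, Sum)", data=df).fit()
# Treatment coding: each coefficient compares a manager to the baseline category.
fit_trt = smf.ols("time ~ run_size + C(manager)", data=df).fit()

print(fit_sum.params)
print(fit_trt.params)
print(np.allclose(fit_sum.fittedvalues, fit_trt.fittedvalues))  # identical predictions
```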

Equivalence of Using One 0/1 Dummy Variable and Two 0/1 Dummy Variables When the Categorical Variable Has Two Categories The two models give equivalent predictions. The difference in the mean number of emergency calls between a day with a rain forecast and a day without one, holding all other variables fixed, is the same under either coding.
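One way to see the equivalence algebraically; this is our own sketch, using JMP's sum-to-zero coding for the two-coefficient version.

```latex
% Single 0/1 dummy (Rain = 1 if rain is forecast, 0 if not):
\[
\text{difference in mean calls (rain vs.\ no rain)} = \beta_{\text{Rain}}.
\]
% JMP's coding with two coefficients constrained so that Rain[0] + Rain[1] = 0:
\[
\text{difference} = \gamma_{\text{Rain[1]}} - \gamma_{\text{Rain[0]}}
                  = 2\,\gamma_{\text{Rain[1]}} = \beta_{\text{Rain}},
\]
% so the single 0/1 coefficient equals twice JMP's Rain[1] coefficient.
```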

Effect Tests Effect test for manager: H0: manager[a] = manager[b] = manager[c] = 0 vs. Ha: not all of manager[a], manager[b], manager[c] equal 0. The null hypothesis is that all managers are the same (in terms of mean run time) when run size is held fixed; the alternative hypothesis is that not all managers are the same when run size is held fixed. The p-value for the effect test is very small: strong evidence that not all managers are the same when run size is held fixed. Note: because JMP imposes the constraint manager[a]+manager[b]+manager[c]=0, this null hypothesis is equivalent to the hypothesis that all of the manager coefficients are equal. The effect test for Run Size tests the null hypothesis that the Run Size coefficient is 0 versus the alternative hypothesis that it isn't zero; it has the same p-value as the t-test.
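A rough Python/statsmodels analogue of JMP's effect tests is the partial F-test reported by anova_lm; the data frame is the same simulated, hypothetical stand-in used in the earlier sketch.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Same simulated, hypothetical stand-in for the toy factory data as before.
rng = np.random.default_rng(0)
df = pd.DataFrame({"manager": np.repeat(["a", "b", "c"], 20),
                   "run_size": rng.integers(50, 300, size=60)})
df["time"] = (150 + 0.25 * df["run_size"]
              + df["manager"].map({"a": 40, "b": -15, "c": -25})
              + rng.normal(0, 10, size=60))

fit = smf.ols("time ~ run_size + C(manager)", data=df).fit()

# Type II ANOVA: the C(manager) row is the effect test for manager
# (H0: all managers equal, holding run size fixed); the run_size row
# gives the same p-value as the t-test for the run size slope.
print(sm.stats.anova_lm(fit, typ=2))
```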

The effect test shows that the managers are not all equal. For the same run size, Manager C is best (lowest mean run time), followed by Manager B and then Manager A. The above model assumes no interaction between manager and run size, i.e., that the difference between the managers' mean run times is the same for all run sizes.

Interaction Model
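The transcript does not preserve the equation on this slide, but the interaction model being described has the general form sketched below (coefficient names are ours).

```latex
\[
\text{Time}_i = \beta_0 + \beta_1\,\text{RunSize}_i
              + \beta_2\,\text{Manager[a]}_i
              + \beta_3\,\text{Manager[b]}_i
              + \beta_4\,\text{Manager[c]}_i
\]
\[
\phantom{\text{Time}_i =}\; + \beta_5\,\text{Manager[a]}_i\,\text{RunSize}_i
              + \beta_6\,\text{Manager[b]}_i\,\text{RunSize}_i
              + \beta_7\,\text{Manager[c]}_i\,\text{RunSize}_i + e_i,
\]
```

so each manager gets his or her own intercept and run-size slope (with the same sum-to-zero constraints as before).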

Interaction Model in JMP To add interactions involving categorical variables in JMP, follow the same procedure as with two continuous variables: run Fit Model, add the usual explanatory variables first, then highlight one of the variables in the interaction in the Construct Model Effects box, highlight the other variable in the interaction in the Columns box, and click Cross in the Construct Model Effects box.
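The analogous fit in Python/statsmodels uses a formula cross term; again the data frame is a simulated, hypothetical stand-in for the toy factory data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in with a different run-size slope for each manager.
rng = np.random.default_rng(0)
df = pd.DataFrame({"manager": np.repeat(["a", "b", "c"], 20),
                   "run_size": rng.integers(50, 300, size=60)})
slope = df["manager"].map({"a": 0.26, "b": 0.14, "c": 0.26})
df["time"] = (150 + slope * df["run_size"]
              + df["manager"].map({"a": 40, "b": -15, "c": -25})
              + rng.normal(0, 10, size=60))

# run_size * C(manager) expands to run_size + C(manager) + run_size:C(manager),
# i.e. the interaction model with a separate run-size slope per manager.
fit_int = smf.ols("time ~ run_size * C(manager)", data=df).fit()
print(fit_int.params)
```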

Interaction Model Interaction between run size and manager: the effect on mean run time of increasing the run size by one unit is different for different managers. Effect test for the interaction (Manager*Run Size): this tests the null hypothesis that there is no interaction (the effect on mean run time of increasing run size is the same for all managers) against the alternative hypothesis that there is an interaction between run size and manager. The p-value is small, giving evidence that there is an interaction.

The runs supervised by Manager A appear abnormally time-consuming. Manager B has a higher initial fixed setup cost than Manager C but a lower per-unit production time (0.136 < 0.259).

Interaction Profile Plot The lower left-hand plot shows the mean run time vs. run size for the three managers A, B and C.

Interactions Involving Categorical Variables: General Approach First fit a model with an interaction between the categorical explanatory variable and the continuous explanatory variable. Use the effect test on the interaction to see if there is evidence of an interaction. If there is evidence of an interaction (p-value < 0.05 for the effect test), use the interaction model. If there is not strong evidence of an interaction (p-value > 0.05 for the effect test), use the model without the interaction. (A code sketch of this workflow follows below.)
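A minimal sketch of this workflow in Python/statsmodels; choose_model, its arguments, and the 0.05 cutoff are our hypothetical choices, not part of the course software.

```python
import statsmodels.api as sm
import statsmodels.formula.api as smf

def choose_model(df, response, x_cont, x_cat, alpha=0.05):
    """Fit the interaction model, run the effect test for the interaction,
    and keep the interaction only if its p-value is below alpha."""
    full = smf.ols(f"{response} ~ {x_cont} * C({x_cat})", data=df).fit()
    anova = sm.stats.anova_lm(full, typ=2)
    # The interaction row is the one whose label contains ':'.
    inter_row = [name for name in anova.index if ":" in name][0]
    if anova.loc[inter_row, "PR(>F)"] < alpha:
        return full  # evidence of an interaction: keep the interaction model
    return smf.ols(f"{response} ~ {x_cont} + C({x_cat})", data=df).fit()
```

For the toy factory sketch above, this would be called as choose_model(df, "time", "run_size", "manager").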

Example: A Sex Discrimination Lawsuit Did a bank discriminate by paying higher starting salaries to men than to women? Harris Trust and Savings Bank was sued by a group of female employees who accused the bank of paying lower starting salaries to women. The data in harrisbank.JMP are the starting salaries for all 32 male and all 61 female skilled, entry-level clerical employees hired by the bank between 1969 and 1977, along with the education levels and sex of the employees.

There is no evidence of an interaction between sex and education, so we fit the model without an interaction.

Discrimination Case Regression Results Strong evidence that there is a difference in the mean starting salaries of women and men with the same education level: men have a higher estimated mean starting salary than women of the same education level. 95% confidence interval for the mean difference = (2*$214.55, 2*$477.25) = ($429.10, $954.50). Bank's defense: omitted variable bias; variables such as seniority, age, and experience also need to be controlled for.