QM222 Nov. 7 Section D1 Multicollinearity Regression Tables What to do next on your project QM222 Fall 2016 Section D1.

Multicollinearity

Recall that the interpretation of a coefficient in a multiple regression is: the effect on Y of X changing by 1 if the other variables stay the same.
And the t-test tests the null: could this coefficient be zero?
Sometimes you run a regression of Y on two very highly correlated variables, like the number of toothbrushes sold and the amount of toothpaste sold in a country in a year.
Both t-stats will be very low, because each coefficient could be zero and the regression would predict approximately the same thing.
But if you drop one of them, the other would become highly significant.
E.g. GDP and unemployment.

What to do if you find that variables that you believe should be significant are not
If several variables are really measuring the same concept, drop one of them if its |t-stat| is less than ONE.
- If you drop a variable with a |t-stat| < 1, the adjusted R-squared increases.
- Which do you drop? The one with the lowest |t|. In other words, let the computer tell you which of the two variables you need.
- If you are right, the other variable will become more significant.
NEVER DROP MORE THAN ONE VARIABLE AT A TIME. If you do, you might drop BOTH highly correlated variables.
You can test whether two (or more) insignificant variables are significant together by writing this after you run the regression:
    test varname1 varname2
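The toothbrush and toothpaste story can be tried out on made-up data. The sketch below is only an illustration: the variable names, the seed, and the numbers used to simulate the data are all assumptions, not course data.

```stata
* Simulated data (hypothetical): two nearly identical explanatory variables
clear
set seed 12345
set obs 200
gen toothbrushes = rnormal(100, 10)
* toothpaste is almost a copy of toothbrushes
gen toothpaste = toothbrushes + rnormal(0, 0.5)
gen y = 2*toothbrushes + rnormal(0, 20)

* Check how correlated the two X variables are
correlate toothbrushes toothpaste

* Regression with both: look at each |t-stat|
regress y toothbrushes toothpaste

* Joint test: are the two variables significant together?
test toothbrushes toothpaste

* Drop ONE variable and compare the t-stat of the one that remains
regress y toothbrushes
```

With data built this way, the individual t-stats in the first regression would typically be small, while the joint test and the second regression would typically show strong significance.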

Making Regression Tables (see chapter 19)

Use Tables to report several regressions
Your different regressions will have different combinations of variables.
Why present more than 1 regression?
- To develop your ideas.
- Or for different dependent variables (list in column title).

Include either t-stats or coefficient standard errors in parentheses directly below the coefficient. In footnotes, say which you included.

Include the number of observations and at least the adjusted R-squared (and maybe the RMSE, aka SEE). Use asterisks to denote significance.

For any set of multiple dummies, include in a footnote what the excluded category is (here, year1965).
Note: If you are using i. for your dummies, Stata might use different reference categories for different regressions.
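One possible way to line up several regressions side by side is Stata's built-in estimates table. This is a rough sketch using the auto data set that ships with Stata, not the course's data, and the command described in chapter 19 may well be a different one; the model names m1 and m2 are made up for the example.

```stata
* Example data shipped with Stata
sysuse auto, clear

* Two regressions with different combinations of variables
regress price weight
estimates store m1
regress price weight foreign
estimates store m2

* Coefficients with standard errors below them, plus N, adjusted R-squared, RMSE
estimates table m1 m2, b(%9.2f) se(%9.2f) stats(N r2_a rmse)

* Same models with significance asterisks (star cannot be combined with se)
estimates table m1 m2, b(%9.2f) star(.1 .05 .01) stats(N r2_a rmse)
```

The footnote of the final table would still need to say by hand whether the numbers in parentheses are t-stats or standard errors, and what the excluded category is for any set of dummies.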

What to do next on your project

Assignment 6: ideally, have it done by Friday.
Post your current data set under Stata data set (if you can).
Run additional multiple regressions. Specifically:
- Think hard about whether there are additional omitted variables (i.e. confounding factors) that you can measure that are likely to be biasing your key coefficient(s). If you can find data on them, add them into the regressions. (If you really cannot think of anything beyond what you have, just write that.)
- Identify at least one omitted variable that you cannot measure, reason out the sign of the omitted variable bias, and explain here (Assignment 6) in 1-3 sentences why and in what direction it will bias your key coefficient.

Assignment 6 cont.
- If you have any numeric explanatory (X) variable, add a quadratic term in addition to your other variables to test if this nonlinear specification fits better. (If you are good at math and prefer to add a different nonlinear variable or to make your dependent variable nonlinear, be my guest.)
- Explain here (Assignment 6) what you learn from this result (1-3 sentences), and show it (e.g. with a graph).
- If you have a numeric explanatory (X) variable that is very skewed, think about whether top-coding it or taking the log of that variable is appropriate instead.
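As a rough sketch of the quadratic, log, and top-coding ideas, here is one way they might look in Stata. The auto data set stands in for your own data, and the evaluation point (3000) and the ceiling (4000) are arbitrary values chosen only for illustration.

```stata
sysuse auto, clear

* Quadratic term: add the square of a numeric X alongside the original X
gen double weight_sq = weight^2
regress price weight weight_sq foreign

* The slope of price with respect to weight now depends on weight:
*   d(price)/d(weight) = _b[weight] + 2*_b[weight_sq]*weight
display "Slope at weight = 3000: " _b[weight] + 2*_b[weight_sq]*3000

* Options for a very skewed X: take the log, or top-code it at a chosen ceiling
gen ln_weight = ln(weight)
gen weight_top = min(weight, 4000)
```

The t-stat on weight_sq is what tells you whether the curve fits better than a straight line.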

Another approach if you think the relationship between Y and X is really really nonlinear
You could try a set of dummy variables for different ranges of the variable, even though it is a numerical variable.
Only use this approach if you believe that the relationship between Y and X changes so much at every value that it can’t be estimated as a quadratic (or cubic, etc.).
Education is sometimes better as a set of dummies.
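A minimal sketch of range dummies, assuming the auto data set as a stand-in and cutoffs of 20 and 30 that are picked arbitrarily for the example:

```stata
sysuse auto, clear

* Indicator variables for ranges of a numeric variable (cutoffs are arbitrary)
gen mpg_low  = mpg <= 20 if !missing(mpg)
gen mpg_mid  = mpg > 20 & mpg <= 30 if !missing(mpg)
gen mpg_high = mpg > 30 if !missing(mpg)

* Leave one range out as the reference category (here mpg_low)
regress price mpg_mid mpg_high weight
```

Each coefficient is then that range compared with the excluded range, holding weight the same.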

Not in Assignment 6: if you have a very skewed Y (dependent) variable
- Try top-coding it (if you think that once it reaches a quite high level, it doesn’t matter how much higher it gets).
- Try changing it into an indicator variable.
- Try estimating the median Y, replacing regress with qreg.
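The three options for a skewed Y could look roughly like this. The auto data set is a stand-in, and the ceiling of 10000 and the variable names price_top and expensive are assumptions made for the example.

```stata
sysuse auto, clear

* Top-code Y at a chosen ceiling (10000 is an arbitrary illustration)
gen price_top = min(price, 10000)
regress price_top weight foreign

* Turn Y into an indicator variable
gen expensive = price > 10000 if !missing(price)
regress expensive weight foreign

* Estimate the median of Y instead of the mean: replace regress with qreg
qreg price weight foreign
```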

Assignment 6 cont.
- Think about whether you can and should use an interaction term. (This will be most useful if you think that different groups have different slopes.)
- Try at least one out in a multiple regression (with all your other variables as well). Copy and paste here (PS 6).
- Explain here what you learn from this interaction term result (1-3 sentences).

Review interaction terms:
If we think that the effect of X1 on Y depends on a different indicator variable X2 (e.g. scifi), the simplest way to model this in a regression is:
- Make an additional variable by multiplying X1 * X2.
- Make an additional variable by multiplying X1 * (1-X2), recalling that (1-X2) is 1 if X2=0.
- Run a regression of Y on 3 variables:
    X1*X2      (this is X1 for observations where X2 = 1)
    X1*(1-X2)  (this is X1 for observations where X2 = 0)
    X2

[Graph of this model: Revenues against Budget, with separate lines for SciFi movies and other movies]
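The two-slope model with an indicator variable can be sketched as follows. The auto data set stands in for the movie data: weight plays the role of X1 and foreign plays the role of the indicator X2, and the names weight_foreign and weight_domestic are made up for the example.

```stata
sysuse auto, clear

* X1 = weight (numeric), X2 = foreign (indicator: 1 = foreign, 0 = domestic)
gen weight_foreign  = weight * foreign
gen weight_domestic = weight * (1 - foreign)

* Regress Y on X1*X2, X1*(1-X2), and X2: each group gets its own slope
regress price weight_foreign weight_domestic foreign

* Test whether the two slopes are the same
test weight_foreign = weight_domestic
```

The coefficient on foreign is the difference in intercepts between the two groups, so the two lines can differ in both height and slope.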

Interaction terms with numeric variables (for those who dare):
If we think that the effect of X1 on Y depends on a different numeric variable X2, the simplest way to model this in a regression is:
- Make an additional variable by multiplying X1 * X2.
- Run a regression of Y on 3 variables: X1, X2, and X1*X2.
So Y = b0 + b1 X1 + b2 X2 + b3 X1*X2
Note that dY/dX1 = b1 + b3 X2
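For two numeric variables, a sketch under the same stand-in assumptions (auto data set, weight as X1, mpg as X2, and mpg = 20 chosen arbitrarily as the point at which to evaluate the slope):

```stata
sysuse auto, clear

* X1 = weight, X2 = mpg (both numeric); the interaction is their product
gen weight_mpg = weight * mpg
regress price weight mpg weight_mpg

* dY/dX1 = b1 + b3*X2, evaluated here at mpg = 20 (an arbitrary value)
lincom weight + 20*weight_mpg
```

lincom reports the estimated slope at that value of X2 together with its standard error, so you can see whether the effect of X1 is significant there.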

More generally, ask yourself if your regressions are really answering the question….
I like sophisticated approaches if you are using them correctly and if they are the most appropriate way to answer your question.

Assignment 6 cont.
- Decide which is the best regression or set of regressions that you will use in your project.
- Update your Current Project Status, including replacing/adding these regressions to Question 7.
- Also answer Question 9, which asks for the conclusions of your project as it now stands.
The more fully you answer Questions 7 and 9, the better feedback I can give you at your required meeting #2.

More things to be careful about

When not to control for a variable
I want to know how education affects men’s and women’s belief that people should legalize pot.
Grass: indicator variable equal to 1 if the person believes marijuana should be legalized.
If I run this regression:
    Grass = b0 + b1 education + b2 income + b3 age…
then the coefficient on education tells us “If someone gets a lot of education but has the same income as another person, how does the education affect grass?”
You might instead want to know “If someone gets a lot of education and as a result has higher income than another person as well as being better educated, how does the education affect grass?”
For this, run:
    Grass = b0 + b1 education + b3 age…
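The difference between the two regressions can be seen on made-up data. In this sketch everything is an assumption: the data are simulated, the outcome is a continuous score rather than the Grass indicator (to keep the arithmetic simple), and the true effects are set to 0.5 by construction.

```stata
* Simulated data (hypothetical): education raises income, and both affect the outcome
clear
set seed 2016
set obs 5000
gen education = rnormal(0, 1)
gen income = education + rnormal(0, 1)
gen outcome = 0.5*education + 0.5*income + rnormal(0, 1)

* Holding income fixed: the coefficient on education is only its direct effect
regress outcome education income

* Leaving income out: the coefficient also picks up the effect that works through income
regress outcome education
```

Because of how the data were built, the coefficient on education should come out near 0.5 in the first regression and near 1.0 in the second. Neither is wrong; they answer different questions.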

Some misunderstandings on multiple dummies
You cannot use categorical variables as numbers. This includes:
- Marital status
- Work status
Each coefficient is that variable versus the reference (excluded) category.
Often, it makes sense to choose the reference category to be the one you would most want as the comparison.
You can NEVER put all categories into the regression. Stata will omit one.
An example follows on a later slide.

When you have categorical variables
You cannot use categorical variables as numbers. This includes:
- Marital status
- Work status
- Etc.
Don’t use LOTS of dummies for important variables (whose coefficients you want to understand).
If you have a categorical variable with more than 10 categories, try to combine them into broader categories.
It’s okay to use dummies for a variable with more than 10 (or so) categories if they are only control variables that you don’t plan to discuss or report (e.g. occupation).

Or you can see if the coefficients on 2 categories (of the same thing) are similar by using Stata lincom tests.
Rather than i.marital, first make actual indicator variables:
    gen widow = marital==2
    gen divorced = marital==3
    gen separated = marital==4
    gen nevermarried = marital==5
    regress realrinc widow divorced separated nevermarried
Then, after the regression, test a linear combination:
    . lincom widow - nevermarried
RESULTS:
 ( 1)  widow - nevermarried = 0
    realrinc | Coef.   Std. Err.   t   P>|t|   [95% Conf. Interval]
         (1) |
This result tells me that I could combine widow and nevermarried into a single category if I want, since |t| < 1.

Some of you are getting close to writing your project up.
Do this ONLY after you meet with me.
Everyone needs to meet with me after they think they have the results they want to present.
If that is you, make an appointment.

What should the paper look like?
Put yourself in the client’s mind as they are reading it.
- Introduction: Motivate the paper. Address the client. Why is it interesting to them?
- Be sure to describe your data and data sources.
- Be sure to develop your ideas and have a logical train of thought.
- It needs to look professional and the English needs to be correct.
After you finish the paper, make an executive summary that an executive can read INSTEAD of the paper.
- It will repeat ideas and sentences from the introduction and conclusion, for sure.
- It should be understandable to someone who knows no statistics.
MORE on this later.
