Special topics. Importance of a variable Death penalty example. sum death bd- yv Variable | Obs Mean Std. Dev. Min Max -------------+--------------------------------------------------------

Slides:



Advertisements
Similar presentations
Dummy Variables and Interactions. Dummy Variables What is the the relationship between the % of non-Swiss residents (IV) and discretionary social spending.
Advertisements

Sociology 601 Class 24: November 19, 2009 (partial) Review –regression results for spurious & intervening effects –care with sample sizes for comparing.
Christopher Dougherty EC220 - Introduction to econometrics (chapter 1) Slideshow: exercise 1.16 Original citation: Dougherty, C. (2012) EC220 - Introduction.
Christopher Dougherty EC220 - Introduction to econometrics (chapter 4) Slideshow: interactive explanatory variables Original citation: Dougherty, C. (2012)
More on Regression Spring The Linear Relationship between African American Population & Black Legislators.
ELASTICITIES AND DOUBLE-LOGARITHMIC MODELS
Lecture 9 Today: Ch. 3: Multiple Regression Analysis Example with two independent variables Frisch-Waugh-Lovell theorem.
1 Nonlinear Regression Functions (SW Chapter 8). 2 The TestScore – STR relation looks linear (maybe)…
Sociology 601, Class17: October 27, 2009 Linear relationships. A & F, chapter 9.1 Least squares estimation. A & F 9.2 The linear regression model (9.3)
Christopher Dougherty EC220 - Introduction to econometrics (chapter 3) Slideshow: exercise 3.5 Original citation: Dougherty, C. (2012) EC220 - Introduction.
Lecture 4 This week’s reading: Ch. 1 Today:
Sociology 601 Class 19: November 3, 2008 Review of correlation and standardized coefficients Statistical inference for the slope (9.5) Violations of Model.
Sociology 601 Class 25: November 24, 2009 Homework 9 Review –dummy variable example from ASR (finish) –regression results for dummy variables Quadratic.
Sociology 601 Class 28: December 8, 2009 Homework 10 Review –polynomials –interaction effects Logistic regressions –log odds as outcome –compared to linear.
Introduction to Regression Analysis Straight lines, fitted values, residual values, sums of squares, relation to the analysis of variance.
1 Review of Correlation A correlation coefficient measures the strength of a linear relation between two measurement variables. The measure is based on.
So far, we have considered regression models with dummy variables of independent variables. In this lecture, we will study regression models whose dependent.
1 Michigan.do. 2. * construct new variables;. gen mi=state==26;. * michigan dummy;. gen hike=month>=33;. * treatment period dummy;. gen treatment=hike*mi;
Sociology 601 Class 23: November 17, 2009 Homework #8 Review –spurious, intervening, & interactions effects –stata regression commands & output F-tests.
A trial of incentives to attend adult literacy classes Carole Torgerson, Greg Brooks, Jeremy Miles, David Torgerson Classes randomised to incentive or.
Interpreting Bi-variate OLS Regression
1 Zinc Data EPP 245 Statistical Analysis of Laboratory Data.
1 Regression and Calibration EPP 245 Statistical Analysis of Laboratory Data.
Sociology 601 Class 26: December 1, 2009 (partial) Review –curvilinear regression results –cubic polynomial Interaction effects –example: earnings on married.
TESTING A HYPOTHESIS RELATING TO A REGRESSION COEFFICIENT This sequence describes the testing of a hypotheses relating to regression coefficients. It is.
SLOPE DUMMY VARIABLES 1 The scatter diagram shows the data for the 74 schools in Shanghai and the cost functions derived from a regression of COST on N.
Christopher Dougherty EC220 - Introduction to econometrics (chapter 3) Slideshow: precision of the multiple regression coefficients Original citation:
Christopher Dougherty EC220 - Introduction to econometrics (chapter 4) Slideshow: semilogarithmic models Original citation: Dougherty, C. (2012) EC220.
EDUC 200C Section 4 – Review Melissa Kemmerle October 19, 2012.
TOBIT ANALYSIS Sometimes the dependent variable in a regression model is subject to a lower limit or an upper limit, or both. Suppose that in the absence.
Christopher Dougherty EC220 - Introduction to econometrics (chapter 5) Slideshow: dummy variable classification with two categories Original citation:
Christopher Dougherty EC220 - Introduction to econometrics (chapter 5) Slideshow: two sets of dummy variables Original citation: Dougherty, C. (2012) EC220.
Christopher Dougherty EC220 - Introduction to econometrics (chapter 5) Slideshow: the effects of changing the reference category Original citation: Dougherty,
Christopher Dougherty EC220 - Introduction to econometrics (chapter 5) Slideshow: dummy classification with more than two categories Original citation:
DUMMY CLASSIFICATION WITH MORE THAN TWO CATEGORIES This sequence explains how to extend the dummy variable technique to handle a qualitative explanatory.
1 INTERACTIVE EXPLANATORY VARIABLES The model shown above is linear in parameters and it may be fitted using straightforward OLS, provided that the regression.
Bivariate Relationships Testing associations (not causation!) Continuous data  Scatter plot (always use first!)  (Pearson) correlation coefficient.
1 TWO SETS OF DUMMY VARIABLES The explanatory variables in a regression model may include multiple sets of dummy variables. This sequence provides an example.
1 PROXY VARIABLES Suppose that a variable Y is hypothesized to depend on a set of explanatory variables X 2,..., X k as shown above, and suppose that for.
Returning to Consumption
© 2002 Prentice-Hall, Inc.Chap 14-1 Introduction to Multiple Regression Model.
Serial Correlation and the Housing price function Aka “Autocorrelation”
How do Lawyers Set fees?. Learning Objectives 1.Model i.e. “Story” or question 2.Multiple regression review 3.Omitted variables (our first failure of.
Addressing Alternative Explanations: Multiple Regression
MultiCollinearity. The Nature of the Problem OLS requires that the explanatory variables are independent of error term But they may not always be independent.
EDUC 200C Section 3 October 12, Goals Review correlation prediction formula Calculate z y ’ = r xy z x for a new data set Use formula to predict.
What is the MPC?. Learning Objectives 1.Use linear regression to establish the relationship between two variables 2.Show that the line is the line of.
MULTIPLE REGRESSION WITH TWO EXPLANATORY VARIABLES: EXAMPLE 1 This sequence provides a geometrical interpretation of a multiple regression model with two.
Biostat 200 Lecture Simple linear regression Population regression equationμ y|x = α +  x α and  are constants and are called the coefficients.
. reg LGEARN S WEIGHT85 Source | SS df MS Number of obs = F( 2, 537) = Model |
Christopher Dougherty EC220 - Introduction to econometrics (chapter 5) Slideshow: exercise 5.2 Original citation: Dougherty, C. (2012) EC220 - Introduction.
POSSIBLE DIRECT MEASURES FOR ALLEVIATING MULTICOLLINEARITY 1 What can you do about multicollinearity if you encounter it? We will discuss some possible.
COST 11 DUMMY VARIABLE CLASSIFICATION WITH TWO CATEGORIES 1 This sequence explains how you can include qualitative explanatory variables in your regression.
STAT E100 Section Week 12- Regression. Course Review - Project due Dec 17 th, your TA. - Exam 2 make-up is Dec 5 th, practice tests have been updated.
RAMSEY’S RESET TEST OF FUNCTIONAL MISSPECIFICATION 1 Ramsey’s RESET test of functional misspecification is intended to provide a simple indicator of evidence.
1 CHANGES IN THE UNITS OF MEASUREMENT Suppose that the units of measurement of Y or X are changed. How will this affect the regression results? Intuitively,
SEMILOGARITHMIC MODELS 1 This sequence introduces the semilogarithmic model and shows how it may be applied to an earnings function. The dependent variable.
GRAPHING A RELATIONSHIP IN A MULTIPLE REGRESSION MODEL The output above shows the result of regressing EARNINGS, hourly earnings in dollars, on S, years.
1 BINARY CHOICE MODELS: LINEAR PROBABILITY MODEL Economists are often interested in the factors behind the decision-making of individuals or enterprises,
1 REPARAMETERIZATION OF A MODEL AND t TEST OF A LINEAR RESTRICTION Linear restrictions can also be tested using a t test. This involves the reparameterization.
1 In the Monte Carlo experiment in the previous sequence we used the rate of unemployment, U, as an instrument for w in the price inflation equation. SIMULTANEOUS.
1 COMPARING LINEAR AND LOGARITHMIC SPECIFICATIONS When alternative specifications of a regression model have the same dependent variable, R 2 can be used.
VARIABLE MISSPECIFICATION I: OMISSION OF A RELEVANT VARIABLE In this sequence and the next we will investigate the consequences of misspecifying the regression.
QM222 Class 16 & 17 Today’s New topic: Estimating nonlinear relationships QM222 Fall 2017 Section A1.
QM222 Class 8 Section A1 Using categorical data in regression
The slope, explained variance, residuals
QM222 Your regressions and the test
QM222 Class 15 Section D1 Review for test Multicollinearity
Introduction to Econometrics, 5th edition
Presentation transcript:

Special topics

Importance of a variable

Death penalty example. sum death bd- yv Variable | Obs Mean Std. Dev. Min Max death | bd | wv | ac | fv | vs | v2 | ms | yv |

Death penalty example. reg death bd-yv, beta death | Coef. Std. Err. P>|t| Beta bd | wv | ac | fv | vs | v2 | ms | yv | _cons |

Importance of a variable Three potential answers  Theoretical importance  Level importance  Dispersion importance

Importance of a variable Theoretical importance  Theoretical importance = Regression coefficient (b)  To compare explanatory variables, put them on the same scale E.g., vary between 0 and 1

Importance of a variable Level importance: most important in particular times and places  E.g., did the economy or presidential popularity matter more in congressional races in 2006?  Level importance= b j * x j

Importance of a variable Dispersion importance: what explains the variance on the dependent variable  E.g., given that the GOP won in this particular election, why did some people vote for them and others against?  Dispersion importance = Standardized coefficients, or alternatively Regression coefficient times standard deviation of explanatory variable In bivariate case, correlation

Which to use? Depends on the research question  Usually theoretical importance  Sometimes level importance  Dispersion importance not usually relevant

Interactions

How would we test whether defendants are sentenced to death more frequently for killing white strangers then you would expect from the coefficients on white victim and on victim stranger?. tab wv vs | vs wv | 0 1 | Total | | 26 1 | | Total | | 100

Interactions. g wvXvs = wv* vs. reg death bd yv ac fv v2 ms wv vs wvXvs death | Coef. Std. Err. t P>|t| (omitted) wv | vs | wvXvs | _cons | To interpret interactions, substitute the appropriate values for each variable E.g., what’s the effect for.099wv+.108vs +.330wvXvs White, non-stranger:.099(1)+.108(0)+.330(1)X(0) =.099 White, stranger:.099(1)+.108(1)+.330(1)X(1) =.537 Black, non-stranger:.099(0)+.108(0)+.330(0)X(0) = comparison Black, stranger:.099(0)+.108(1)+.330(1)X(0) =.108

Interactions. tab wv vs, sum(death) Means, Standard Deviations and Frequencies of death | vs wv | 0 1 | Total | | | | | | | | | | | | Total | |.49 | | | | 100

Analyzing and presenting your data

Code your variables to vary between 0 and 1  Use replace or recode commands  Makes coefficients comparable To check your coding and for missing values, always run summary before running regress  sum y x1 x2 x3  reg y x1 x2 x3 Always present the bivariate results first, then the multiple regression (or other) results Never present raw Stata output and papers

Partial residual scatter plots

Importance of plotting your data Importance of controls How do you plot your data after you’ve adjusted it for control variables? Example: inferences about candidates in Mexico from faces

Greatest competence disparity: pairing 10 AB Gubernatorial race A more competent Who won?  A by 65%

Regression Source | SS df MS Number of obs = F( 3, 29) = 4.16 Model | Prob > F = Residual | R-squared = Adj R-squared = Total | Root MSE = vote_a | Coef. Std. Err. t P>|t| [95% Conf. Interval] competent | incumbent | party_a | _cons | vote_a is vote share for Candidate A incumbent is a dummy variable for whether the party currently holds the office party_a is the vote share for the party of Candidate A in the previous election We want to create a scatter plot of vote_a by competent controlling for incumbent and party_a

Calculating partial residuals First run your regression with all the relevant variables. reg vote_a competent incumbent party_a To calculate the residual for the full model, use. predict e, res (This creates a new variable “e”, which equals to the residual.) Here, however, we want to generate the residual controlling only for some of the variables. To do this, we could manually predict vote_a based only on incumbent and party_a:. g y_hat = 0*.167+ incumbent* party_a*.212 We can then generate the partial residual. g partial_e = vote_a – y_hat Instead, can use the Stata adjust. adjust competent = 0, by(incumbent party_a) gen(y_hat). g partial_e = vote_a – y_hat

Calculating partial residuals Regression of the partial residual on competent should give you the same coefficient as in the earlier regression. It does.. reg partial_e competent Source | SS df MS Number of obs = F( 1, 31) = 4.52 Model | Prob > F = Residual | R-squared = Adj R-squared = Total | Root MSE = e | Coef. Std. Err. t P>|t| [95% Conf. Interval] competent | _cons | -7.25e

Compare scatter plot (top) with residual scatter plot (bottom) Residual plots especially important if results change when adding controls

Imputing missing data

Variables often have missing data Sources of missing data Missing data reduces estimate precision and may bias estimates To rescue data with missing cases: impute using other variables Imputing data can  Increase sample size and so increase precision of estimates  Reduce bias if data is not missing at random

Imputation example Car ownership in 1948 Say that some percentage of sample forgot to answer a question about whether they own a car The data set contains variables that predict car ownership: family_income, family_size, rural, urban, employed

Stata imputation command impute depvar varlist [weight] [if exp] [in range], generate(newvar1)  depvar is the variable whose missing values are to be imputed.  varlist is the list of variables on which the imputations are to be based  newvar1 is the new variable to contain the imputations Example  impute own_car family_income family_size rural suburban employed, g(i_own_car)

Rules about imputing Before you estimate a regression model, use the summary command to check for missing data Before you impute, check that relevant variables actually predict the variable with missing values (use regression or other estimator) Don’t use your studies’ dependent variable or key explanatory variable in the imputation (exceptions) Don’t impute missing values on your studies’ dependent variable express (exceptions) Always note whether imputation changed results If too much data as missing, imputation won’t help