Sociology 601 Class 25: November 24, 2009


Homework 9
Review
– dummy variable example from ASR (finish)
– regression results for dummy variables
Quadratic effects
– example: earnings and age
– plotting
F-tests comparing models
Example from Sociology of Religion

Review: Regression with Dummy Variables
Create dummy variables for age: why? Age is an interval variable, so what advantage is there to creating a series of dummies?
    gen byte age25=0 if age<.    /* new variable, age25, will be missing if age is missing */
    replace age25=1 if age>=25 & age<=29
    gen byte age30=0 if age<.
    replace age30=1 if age>=30 & age<=34
    gen byte age35=0 if age<.
    replace age35=1 if age>=35 & age<=39
    gen byte age40=0 if age<.
    replace age40=1 if age>=40 & age<=44
    gen byte age45=0 if age<.
    replace age45=1 if age>=45 & age<=49
    gen byte age50=0 if age<.
    replace age50=1 if age>=50 & age<=55
    * check age dummies (agecheck should =1 for all cases)
    egen byte agecheck=rowtotal(age25-age50)
    tab agecheck, missing

Stata Shortcut for Dummy Variables
    gen byte agecat= floor(age/5)*5
    tab agecat, gen(age)
    * floor function deletes decimal places:
    * e.g., at age 23: floor(23/5)*5 = floor(4.6)*5 = 4*5 = 20
    * check age dummies (agecheck should =1 for all cases)
    egen byte agecheck=rowtotal(age1-age6)
    tab agecheck, missing
    drop if age<25 | age>54
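The floor shortcut can be checked by hand outside Stata. The following is a minimal Python sketch of the same idea, not the course's code: `age_band`, `band_dummies`, and the five example ages are hypothetical names and values introduced only for illustration.

```python
import math

def age_band(age, width=5):
    """Lower bound of the band containing age, mirroring floor(age/5)*5."""
    return math.floor(age / width) * width

def band_dummies(ages, width=5):
    """One 0/1 indicator per observed band, similar in spirit to tab agecat, gen(age)."""
    bands = sorted({age_band(a, width) for a in ages})
    rows = [[1 if age_band(a, width) == b else 0 for b in bands] for a in ages]
    return bands, rows

# Hypothetical respondents aged 25-54 (not the GSS data).
ages = [25, 29, 30, 41, 54]
bands, rows = band_dummies(ages)

# Each respondent falls in exactly one band, so every row sums to 1,
# which is the same check the slide performs with agecheck.
row_totals = [sum(r) for r in rows]
print(bands)        # [25, 30, 40, 50]
print(row_totals)   # [1, 1, 1, 1, 1]
```

Only bands that actually occur in the data get a dummy, which is why the age range has to be restricted if exactly six dummies are expected.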

Regression with Age Dummy Variables
    . regress conrinc age2-age6 if sex==1
[Stata regression table: values not recoverable from the transcript. Overall test F(5, 719); rows for age2, age3, age4, age5, age6, and _cons.]
Same R-squared and overall F, but different b's and t's (although same relative order):
    . regress conrinc age1-age5 if sex==1
[Stata regression table: values not recoverable from the transcript. Overall test F(5, 719); rows for age1, age2, age3, age4, age5, and _cons.]
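Why the fit stays the same while the b's change can be seen with group means. When the only predictors are dummies for one categorical variable, the OLS intercept is the omitted group's mean and each coefficient is a group mean minus that reference mean. The Python sketch below relies on that fact; the earnings figures and the names `dummy_regression` and `fitted` are hypothetical and are not taken from the GSS results.

```python
from statistics import mean

def dummy_regression(groups, reference):
    """Coefficients of OLS on category dummies with `reference` omitted.

    The intercept equals the reference group's mean; each coefficient is
    that group's mean minus the reference mean.
    """
    means = {g: mean(v) for g, v in groups.items()}
    intercept = means[reference]
    coefs = {g: m - intercept for g, m in means.items() if g != reference}
    return intercept, coefs

def fitted(intercept, coefs, group):
    """Predicted value for a group (the reference group gets the intercept)."""
    return intercept + coefs.get(group, 0.0)

# Hypothetical earnings in $1000s by age band (illustration only).
earnings = {"25-29": [20, 24], "30-34": [28, 32], "35-39": [33, 37]}

a_young, b_young = dummy_regression(earnings, reference="25-29")
a_old, b_old = dummy_regression(earnings, reference="35-39")

print(a_young, b_young)   # intercept 22, coefficients +8 and +13
print(a_old, b_old)       # intercept 35, coefficients -13 and -5

# Different b's, identical predictions, hence identical R-squared.
same_fit = all(
    fitted(a_young, b_young, g) == fitted(a_old, b_old, g) for g in earnings
)
print(same_fit)           # True
```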

Plot Earnings by Age
    . tab age, sum(conrinc)
[Table: mean, standard deviation, and frequency of respondent income in constant dollars, by age of respondent, with a Total row; values not recoverable from the transcript.]

Regression Test for Curvilinearity
Test whether x has a curvilinear relationship with y: testing for a quadratic relationship is the most common, but not the only, method of testing for curvilinearity.
$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + e_i$$
Test whether $\beta_2 \neq 0$:
o if $\beta_2 > 0$, then U-shaped curve (or part of one)
o if $\beta_2 < 0$, then inverted-U curve (or part of one)
o if $\beta_2$ is neither significantly above nor significantly below 0, then revert to the linear equation by dropping $x^2$
$\beta_1$ is rather irrelevant in this test:
o if the p-values for both $\beta_2$ and $\beta_1$ are > .05 in the quadratic model, that does not mean there is no linear relationship.

Curvilinear Regression Equation: $\beta_2$
$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + e_i$$
$\beta_2$ (quadratic coefficient) determines how steeply the curve accelerates: $y = 2x^2$; $y = x^2$; $y = 0.5x^2$

Curvilinear Regression Equation: $\beta_2 < 0$
$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + e_i$$
If $\beta_2$ (quadratic coefficient) < 0, then the curve is an inverted U: $y = -2x^2$; $y = -x^2$; $y = -0.5x^2$

Curvilinear Regression Equation: Inflexion Point = Maximum | Minimum
$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + e_i$$
inflexion point (strictly, the turning point or vertex of the parabola) = value of x when y is a maximum or minimum $= -\beta_1 / (2\beta_2)$
o $y = -20x^2 + 800x$: inflexion = -800 / (-20 * 2) = 20 (i.e., below observed x values)
o $y = -100x^2 + 8000x$: inflexion = -8000 / (-100 * 2) = 40 (i.e., within the x range)
o $y = -20x^2 + 2400x$: inflexion = -2400 / (-20 * 2) = 60 (i.e., above observed values)
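The formula for the maximum or minimum follows from setting the slope of the quadratic to zero. A short derivation, using the same symbols as the regression equation, with $x^*$ introduced here as a label for the point in question:

```latex
\begin{aligned}
y &= \beta_0 + \beta_1 x + \beta_2 x^2 \\
\frac{dy}{dx} &= \beta_1 + 2\beta_2 x = 0
  \quad\Longrightarrow\quad x^* = -\frac{\beta_1}{2\beta_2} \\
\frac{d^2y}{dx^2} &= 2\beta_2
  \quad\begin{cases}
    < 0 & \text{maximum (inverted U)} \\
    > 0 & \text{minimum (U shape)}
  \end{cases}
\end{aligned}
```

The intercept $\beta_0$ drops out, so it shifts the curve up or down without moving $x^*$.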

Curvilinear Regression Equation: Inflexion Point = Maximum | Minimum
$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + e_i$$
For completeness, when $\beta_2$ is positive:
inflexion point = value of x when y is a maximum or minimum $= -\beta_1 / (2\beta_2)$
o $y = 20x^2 - 800x$: inflexion = 800 / (20 * 2) = 20 (i.e., below observed x values)
o $y = 100x^2 - 8000x$: inflexion = 8000 / (100 * 2) = 40 (i.e., within the x range)
o $y = 20x^2 - 2400x$: inflexion = 2400 / (20 * 2) = 60 (i.e., above observed values)
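The same arithmetic can be wrapped in a small helper that also reports whether the point lies inside the observed data. This is a Python sketch under stated assumptions: `turning_point` and `describe` are hypothetical names, and the observed range of 25 to 54 is an assumed range chosen only to match the labels on the slides.

```python
def turning_point(b1, b2):
    """x at which y = b0 + b1*x + b2*x**2 reaches its maximum or minimum."""
    if b2 == 0:
        raise ValueError("no turning point: the equation is linear when b2 is 0")
    return -b1 / (2 * b2)

def describe(b1, b2, x_min, x_max):
    """Return the turning point, its type, and where it falls relative to the data."""
    x_star = turning_point(b1, b2)
    kind = "maximum" if b2 < 0 else "minimum"
    if x_star < x_min:
        where = "below observed x values"
    elif x_star > x_max:
        where = "above observed x values"
    else:
        where = "within the x range"
    return x_star, kind, where

# Assumed observed range of x: 25 to 54 (hypothetical).
print(describe(800, -20, 25, 54))    # (20.0, 'maximum', 'below observed x values')
print(describe(-8000, 100, 25, 54))  # (40.0, 'minimum', 'within the x range')
```

When the point falls outside the observed range, the data show only one arm of the curve, so the fitted relationship is rising or falling throughout the sample even though the quadratic term is nonzero.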

Example: Regression with Curvilinear Age
    . gen int agesq=age*age
    . summarize age agesq
[Summary table for age and agesq (Obs, Mean, Std. Dev., Min, Max): values not recoverable from the transcript.]
    . regress conrinc age agesq if sex==1
[Stata regression table: values not recoverable from the transcript. Overall test F(2, 722); rows for age, agesq, and _cons.]
o $t_{\text{agesq}} = -3.52$; p < .001, so: curvilinear
o $b_{\text{agesq}}$ is negative, so: inverted U
o inflexion point $= -b_{\text{age}} / (2 \cdot b_{\text{agesq}}) = 47.4$, so maximum earnings at about age 47 and a half.

Cubic Polynomials
Occasionally (actually, rarely), it is worthwhile to investigate whether a more complex polynomial would better describe the curvilinear relationship.
Add a cubic term ($x^3$) to the previous quadratic equation:
$$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \beta_3 x_i^3 + e_i$$
Test $\beta_3 \neq 0$:
o if you can't show $\beta_3 \neq 0$, then revert to the quadratic model
o if the p-value for $\beta_3$ is > .05, then don't interpret $\beta_2$ and $\beta_1$ from the cubic model
If $\beta_3 \neq 0$, then the curve can have up to two bends (although not necessarily over the range of observed x's)
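Where a fitted cubic bends can be found from its first derivative, $\beta_1 + 2\beta_2 x + 3\beta_3 x^2$, which is a quadratic in x with either two distinct real roots (two bends) or none (a curve that keeps rising or keeps falling). The Python sketch below is illustrative only: `cubic_turning_points` is a hypothetical helper and the coefficients are textbook examples, not estimates from the earnings data.

```python
import math

def cubic_turning_points(b1, b2, b3):
    """Real x values where the slope of b0 + b1*x + b2*x**2 + b3*x**3 is zero."""
    if b3 == 0:
        raise ValueError("b3 is 0: use the quadratic formula -b1 / (2*b2) instead")
    # Discriminant of the derivative 3*b3*x**2 + 2*b2*x + b1.
    disc = (2 * b2) ** 2 - 4 * (3 * b3) * b1
    if disc <= 0:
        return []  # slope never changes sign: no bends
    root = math.sqrt(disc)
    return sorted([(-2 * b2 - root) / (6 * b3), (-2 * b2 + root) / (6 * b3)])

print(cubic_turning_points(-3, 0, 1))  # y = x^3 - 3x  -> [-1.0, 1.0]
print(cubic_turning_points(1, 0, 1))   # y = x^3 + x   -> []
```

Comparing the returned values with the minimum and maximum of the observed x shows whether either bend actually occurs inside the data.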

Cubic Polynomials: Earnings and Age Example
    . regress conrinc age agesq agecu if sex==1
[Stata regression table: values not recoverable from the transcript. Overall test F(3, 721); rows for age, agesq, agecu, and _cons.]
Note: after age cubed is entered, none of the coefficients is statistically significant (even though age and age squared were significant in the quadratic model).
So, since $\beta_{\text{agecubed}}$ is not statistically significant, revert to the quadratic model (DON'T conclude that age has no relationship with earnings!)

Cubic Polynomials: Actual Results
[Figure: not recoverable from the transcript.]

Inferences: F-tests Comparing Models
Comparing Regression Models, Agresti & Finlay, p 409:
$$F = \frac{(R_c^2 - R_r^2)/(k - g)}{(1 - R_c^2)/[N - (k + 1)]}$$
with $df_1 = k - g$ and $df_2 = N - (k + 1)$.
Where:
$R_c^2$ = R-square for complete model,
$R_r^2$ = R-square for reduced model,
k = number of explanatory variables in complete model,
g = number of explanatory variables in reduced model, and
N = number of cases.

Example: F-tests Comparing Models
Complete model: men's earnings on age, age squared, age cubed, education, and currently married dummy.
Reduced model: men's earnings on education and currently married dummy.
The F-test comparing the two models asks whether the age variables, as a group, have a significant relationship with earnings after controlling for education and marital status.

Example: F-tests Comparing Models
Complete model: men's earnings
    . regress conrinc age agesq agecu educ married if sex==1
[Stata regression table: values not recoverable from the transcript. Overall test F(5, 719); rows for age, agesq, agecu, educ, married, and _cons.]
Note: none of the three age coefficients is, by itself, statistically significant.
$R_c^2 = .2387$; k = 5.

Example: F-tests Comparing Models
Reduced model: men's earnings
    . regress conrinc educ married if sex==1
[Stata regression table: values not recoverable from the transcript. Overall test F(2, 722); rows for educ, married, and _cons.]
$R_r^2 = .1818$; g = 2.

Inferences: F-tests Comparing Models
$$F = \frac{(.2387 - .1818)/(5 - 2)}{(1 - .2387)/(725 - 6)} = \frac{.0569/3}{.7613/719} = 17.91$$
$df_1 = 5 - 2 = 3$; $df_2 = 725 - 6 = 719$
F = 17.91, df = (3, 719), p < .001 (Agresti & Finlay, table D, page 673)
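The model-comparison F can be reproduced from the two R-squared values alone. The following Python sketch applies the Agresti & Finlay formula to the figures reported for the complete model ($R_c^2 = .2387$, k = 5) and the reduced model ($R_r^2 = .1818$, g = 2) with N = 725; the function name `nested_f` is hypothetical, and the p-value lookup is left to the F table.

```python
def nested_f(r2_complete, r2_reduced, k, g, n):
    """F statistic comparing a complete model (k predictors) with a nested
    reduced model (g predictors), both fitted to the same n cases."""
    df1 = k - g            # number of predictors being tested as a group
    df2 = n - (k + 1)      # residual df of the complete model
    f = ((r2_complete - r2_reduced) / df1) / ((1 - r2_complete) / df2)
    return f, df1, df2

f, df1, df2 = nested_f(0.2387, 0.1818, k=5, g=2, n=725)
print(round(f, 2), df1, df2)   # 17.91 3 719
```

In Stata, running `test age agesq agecu` after the complete regression performs this joint test directly, so the hand calculation serves mainly as a check on the logic.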

Next: Regression with Interaction Effects
Examples with earnings:
o married x gender
o age x gender
o age x education
o marital status x gender