1  Estimation of constant-CV regression models
Alan H. Feiveson
NASA – Johnson Space Center, Houston, TX
SNASUG 2008, Chicago, IL

2  Variance Models with Simple Linear Regression

Constant variance:   y_i = β0 + β1 x_i + e_i,   V(e_i) = σ²
Constant CV:         y_i = β0 + β1 x_i + e_i,   V(e_i) = σ²(β0 + β1 x_i)²
Mixed-model form:    y = Xβ + Zu

3  Example: β0 = 1.0, β1 = 0.5, σ = 0.10

y_i = β0 + β1 x_i + e_i,   V(e_i) = σ²(β0 + β1 x_i)²

. clear
. set obs 100
. gen x = 10*uniform()
. gen mu = 1 + .5*x
. gen y = mu + .10*mu*invnorm(uniform())

Problem: estimate β0, β1, and σ.
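The slide's simulation can be mirrored outside Stata. A minimal Python sketch (not part of the deck; the larger sample size is chosen only to make the check sharp) generates the same constant-CV data and confirms that sd(y|x)/E(y|x) equals the constant σ:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000                      # large n so the empirical CV is close to sigma
b0, b1, sigma = 1.0, 0.5, 0.10   # the slide's true values

x = 10 * rng.uniform(size=n)
mu = b0 + b1 * x
y = mu + sigma * mu * rng.standard_normal(n)  # sd(y|x) = sigma*mu: constant CV

# (y - mu)/mu has standard deviation sigma for every x
cv = np.std((y - mu) / mu)
print(round(cv, 3))
```

The coefficient of variation of y given x is the same at every x, which is exactly what makes V(e_i) = σ²(β0 + β1 x_i)² a "constant-CV" model.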

4  Variance Stabilization

y_i = β0 + β1 x_i + e_i,   V(e_i) = σ²(β0 + β1 x_i)²

Taking logs approximately stabilizes the variance, but the mean of log y_i is no longer linear in x_i:

E(log y_i) = g(σ, β0, β1, x_i)

5  Approximate g(σ, β0, β1, x_i) by a polynomial in x, then do an OLS regression of log y on x.   (True values: β0 = 1.0, β1 = 0.5, σ = 0.10)

. gen z = log(y)
. gen x2 = x*x
. reg z x x2
. predict z_hat

(Stata output for the regression of z on x and x2 — ANOVA table and coefficients for x, x2, _cons; numeric values lost in transcription.)
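The same check can be run numerically. The sketch below (Python rather than the deck's Stata, with an arbitrary large sample) regresses log y on a quadratic in x, mirroring the slide's `reg z x x2`; because log y = log μ + log(1 + σz) has roughly constant spread σ, the residual RMSE recovers σ:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
b0, b1, sigma = 1.0, 0.5, 0.10

x = 10 * rng.uniform(size=n)
mu = b0 + b1 * x
y = mu + sigma * mu * rng.standard_normal(n)

# OLS of log y on [1, x, x^2], as on the slide
X = np.column_stack([np.ones(n), x, x * x])
z = np.log(y)
beta, *_ = np.linalg.lstsq(X, z, rcond=None)
resid = z - X @ beta
rmse = np.sqrt(resid @ resid / (n - 3))  # residual RMSE approximates sigma
print(round(rmse, 3))
```

The RMSE is slightly above σ because the quadratic only approximates g; that approximation error is the price of the variance-stabilizing transform.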

6  But what about β0 and β1?

7  Alternative: iteratively reweighted regression

. reg y x
. predict muh
. reg y x [w=1/muh^2]
. local rmse = e(rmse)
. gen wt = 1/muh^2
. summ wt
. local wbar = r(mean)
. local sigh = sqrt(`wbar')*`rmse'
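The iteration the next slides carry out in Stata can be sketched in Python (a sketch under the deck's model, not the deck's code): fit OLS, form weights 1/μ̂², refit by weighted least squares, and repeat; σ is then the standard deviation of the mean-scaled residuals, which matches the slide's `sqrt(wbar)*rmse` up to Stata's analytic-weight normalization.

```python
import numpy as np

def irls_constant_cv(x, y, iters=10):
    """IRLS for y = b0 + b1*x + e with V(e) = sigma^2*(b0 + b1*x)^2.
    Weights w = 1/mu^2 are recomputed from the current fitted mean."""
    X = np.column_stack([np.ones_like(x), x])
    w = np.ones_like(y)                      # iteration 0: plain OLS
    for _ in range(iters):
        WX = X * w[:, None]
        beta = np.linalg.solve(X.T @ WX, WX.T @ y)   # weighted least squares
        mu = X @ beta
        w = 1.0 / mu**2                      # update weights from fitted mean
    # with w = 1/mu^2, the scaled residuals (y - mu)/mu have sd sigma
    resid = (y - mu) / mu
    sigma = np.sqrt(resid @ resid / (len(y) - 2))
    return beta, sigma

rng = np.random.default_rng(2)
x = 10 * rng.uniform(size=50_000)
mu = 1.0 + 0.5 * x
y = mu + 0.10 * mu * rng.standard_normal(x.size)
beta, sigma = irls_constant_cv(x, y)
print(np.round(beta, 2), round(sigma, 3))
```

A few passes suffice in practice; the weights barely change after the second iteration, which is why the slides stop at ITERATION 3.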

8  ITERATION 0

. reg y x
(OLS output; numeric values lost in transcription)
. gen wt = 1/(_b[_cons] + _b[x]*x)^2

9  ITERATION 1

. reg y x [w=wt]
(analytic weights assumed)
(weighted-regression output; numeric values lost in transcription)
. replace wt = 1/(_b[_cons] + _b[x]*x)^2
(100 real changes made)

10  ITERATION 2

. reg y x [w=wt]
(analytic weights assumed)
(weighted-regression output; numeric values lost in transcription)
. replace wt = 1/(_b[_cons] + _b[x]*x)^2
(100 real changes made)

11  ITERATION 3

. noi reg y x [w=wt]
(analytic weights assumed)
(weighted-regression output; numeric values lost in transcription)
. summ wt
(summary of wt; numeric values lost in transcription)
. local wbar = r(mean)
. noi di e(rmse)*sqrt(`wbar')

True values: β0 = 1.0, β1 = 0.5, σ = 0.10

12  Can we do this using -xtmixed-?

. xtmixed y x ||???: x

How do we get -xtmixed- to estimate a non-constant residual variance?

y_i = β0 + β1 x_i + σ(β0 + β1 x_i)u_i
    = β0 + β1 x_i + c0 u_0i + c1 x_i u_1i,   where u_0i = u_1i and c1/c0 = β1/β0

Problems: degenerate dependency of the random effects (u_0i = u_1i); the coefficients of the random intercept and slope (c0 and c1) need to be constrained.
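The constraint c1/c0 = β1/β0 follows from splitting the single scaled error term (a one-line derivation, implicit on the slide):

```latex
\sigma(\beta_0+\beta_1 x_i)\,u_i
 = \underbrace{\sigma\beta_0}_{c_0}\,u_i + \underbrace{\sigma\beta_1}_{c_1}\,x_i u_i
 = c_0 u_{0i} + c_1 x_i u_{1i},
\qquad u_{0i}=u_{1i}=u_i,\qquad \frac{c_1}{c_0}=\frac{\beta_1}{\beta_0}.
```

So the two "random effects" are really one effect, and σ can be read off as c0/β0 (equivalently c1/β1) once the c's are estimated.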

13  Ex. 1: Ignore the dependency of the u's and the constraints on the c's

y_i = β0 + β1 x_i + c0 u_0i + c1 x_i u_1i

set obs 1000
gen x = 5*uniform()
gen mu = 3 + 1.4*x
gen u0 = invnorm(uniform())
gen u1 = invnorm(uniform())
gen y = mu + .05*u0 + .5*x*u1
gen ord = _n
xtmixed y x ||ord: x,noc

14  . xtmixed y x ||ord: x,noc nolog

Mixed-effects REML regression; 1000 groups of size 1 (one per observation).
(Coefficient table for x and _cons, random-effects estimates sd(x) and sd(Residual), and the LR test vs. linear regression; numeric values lost in transcription.)

True values: β0 = 3.0, β1 = 1.4, c0 = 0.05, c1 = 0.50

15  Ex. 2: No random intercept, covariate known

y_i = β0 + β1 x_i + c1 z_i u_1i

set obs 1000
gen x = 5*uniform()
gen z = 3 + 1.4*x
gen u1 = invnorm(uniform())
gen y = 3 + 1.4*x + .5*z*u1
gen ord = _n
xtmixed y x ||ord: z,noc

16  Ex. 2 (continued): xtmixed y x ||ord: z,noc

Performing EM optimization:
Performing gradient-based optimization:
Iteration 0: log restricted-likelihood = [value lost in transcription]
numerical derivatives are approximate
flat or discontinuous region encountered
Iteration 1: log restricted-likelihood = [value lost in transcription]
numerical derivatives are approximate

Garbage!

17  Ex. 2 (continued): expand the data and add a tiny artificial residual

expand 3
sort ord
gen yf = y + .001*invnorm(uniform())
xtmixed yf x ||ord: z,noc nolog

18  . xtmixed yf x ||ord: z,noc nolog

Mixed-effects REML regression; 1000 groups of 3 observations each.
(Coefficient table for x and _cons, random-effects estimates sd(z) and sd(Residual), and the LR test vs. linear regression; numeric values lost in transcription.)

True values: β0 = 3.0, β1 = 1.4, c1 = 0.50

19  Ex. 3: No random intercept (unknown covariate)

y_i = β0 + β1 x_i + σ(β0 + β1 x_i)u_i
    = β0 + β1 x_i + c0 u_0i + c1 x_i u_1i,   where u_0i = u_1i and c1/c0 = β1/β0

Degenerate dependency of the random effects (u_0i = u_1i); the coefficients of the random intercept and slope (c0 and c1) need to be constrained.

20  Ex. 3 (continued)

Recast the model with one error term and pretend z_i = β0 + β1 x_i is known. Then iterate.

21  y_i = β0 + β1 x_i + σ(β0 + β1 x_i)u_i = β0 + β1 x_i + c1 z_i u_1i

1. Expand and introduce an artificial "residual" error term.

. expand 3
. gen yf = y + .001*invnorm(uniform())

22  1. Expand and introduce an artificial "residual" error term.
2. Estimate z_i by OLS or another "easy" method.

. expand 3
. gen yf = y + .001*invnorm(uniform())
. reg y x
. predict zh

23  y_i = β0 + β1 x_i + c1 z_i u_1i   [z_i = β0 + β1 x_i is unknown]

1. Expand and introduce an artificial "residual" error term.
2. Estimate z_i by OLS or another "easy" method.
3. Fit the model pretending the prediction zh_i is the actual z_i.

. expand 3
. gen yf = y + .001*invnorm(uniform())
. reg y x
. predict zh
. xtmixed yf x ||ord: zh,noc nolog

24  y_i = β0 + β1 x_i + c1 z_i u_1i   [z_i = β0 + β1 x_i is unknown]

1. Expand and introduce an artificial "residual" error term.
2. Estimate z_i by OLS or another "easy" method.
3. Fit the model pretending the prediction zh_i is the actual z_i.
4. Iterate.

. expand 3
. gen yf = y + .001*invnorm(uniform())
. reg y x
. predict zh
. xtmixed yf x ||ord: zh,noc nolog
. drop zh
. predict zh

25  . xtmixed yf x ||ord: zh,noc

Mixed-effects REML regression; 1000 groups of 3 observations each.
(Coefficient table for x and _cons, random-effects estimates sd(zh) and sd(Residual), and the LR test vs. linear regression; numeric values lost in transcription.)

True values: β0 = 3.0, β1 = 1.4, c1 = 0.50

26  2-level model

y_ij = β0 + β1 x_ij + c1(β0 + β1 x_ij)u_1i
     + c2[β0 + β1 x_ij + c1(β0 + β1 x_ij)u_1i]u_2ij

z  = E(y_ij | x_ij)
zi = E(y_ij | x_ij, u_1i)

args NS nr be0 be1 c1 c2
drop _all
set obs `NS'
gen id = _n
gen u1 = invnorm(uniform())
expand `nr'
sort id
gen u2 = invnorm(uniform())
gen x = 10*uniform()
gen z = `be0' + `be1'*x
gen zi = z + `c1'*z*u1
gen y = zi + `c2'*zi*u2
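The two-level simulation can be re-created in Python to verify the constant-CV structure at each level (a sketch, not the deck's code; the values of NS, nr, c1, and c2 below are illustrative choices, not the talk's actual run):

```python
import numpy as np

rng = np.random.default_rng(3)
NS, nr = 2000, 25                          # groups and replicates (illustrative)
be0, be1, c1, c2 = 3.0, 1.4, 0.05, 0.50    # c1, c2 hypothetical

u1 = np.repeat(rng.standard_normal(NS), nr)  # group effect u_1i, constant within id
u2 = rng.standard_normal(NS * nr)            # observation effect u_2ij
x = 10 * rng.uniform(size=NS * nr)

z = be0 + be1 * x        # E(y_ij | x_ij)          ("z" on the slide)
zi = z + c1 * z * u1     # E(y_ij | x_ij, u_1i)    ("zi" on the slide)
y = zi + c2 * zi * u2

# constant CV at each level: sd relative to the conditional mean
v1 = np.std((zi - z) / z)    # close to c1
v2 = np.std((y - zi) / zi)   # close to c2
print(round(v1, 3), round(v2, 3))
```

The group-level spread is a constant fraction c1 of the marginal mean, and the observation-level spread a constant fraction c2 of the group-specific mean, which is exactly the nesting the `||id:` and `||obs:` equations on the next slides try to capture.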

27  2-level model (example)

(Figure lost in transcription.)

28  // [gen y = zi + `c2'*zi*u2]
gen obs = _n
expand 3
sort obs
gen yf = y + .001*invnorm(uniform())
xtmixed y x ||id: x,noc nolog
predict zh0
predict uh1i_0, reffects level(id)
gen zhi_0 = zh0 + uh1i_0
xtmixed yf x ||id: zh0,noc ||obs: zhi_0,noc nolog
predict zh1
predict uh1i_1, reffects level(id)
gen zhi_1 = zh1 + uh1i_1
xtmixed yf x ||id: zh1,noc ||obs: zhi_1,noc nolog
predict zh2
predict uh1i_2, reffects level(id)
gen zhi_2 = zh2 + uh1i_2
noi xtmixed yf x ||id: zh2,noc ||obs: zhi_2,noc nolog

29  . run nasug_2008_sim

Mixed-effects REML regression.
(Group table for id and obs — number of groups and minimum/average/maximum observations per group; numeric values lost in transcription.)

30  . run nasug_2008_sim (continued)

(Coefficient table for x and _cons; random-effects estimates sd(zh) at the id level, sd(zhi) at the obs level, and sd(Residual); LR test vs. linear regression; numeric values lost in transcription.)

31  Bayesian Estimation (WinBUGS)

(Model graph lost in transcription; visible node labels include c1 and c2.)

32  Comparison: WinBUGS vs. Stata (xtmixed)

WinBUGS: posterior summaries (mean, sd, 2.5%, median, 97.5%, start, sample) for nodes be0, be1, c1, c2.

Stata (xtmixed): coefficient table for yf (x, _cons) and random-effects estimates sd(xb) at the id level and sd(muhi) at the obs level.

(Numeric values lost in transcription.)