1 Estimation of constant-CV regression models
Alan H. Feiveson
NASA – Johnson Space Center, Houston, TX
SNASUG 2008, Chicago, IL
2 Variance Models with Simple Linear Regression

Constant variance: $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, $V(\varepsilon_i) = \sigma^2$

Constant coefficient of variation (CV): $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, $V(\varepsilon_i) = \sigma^2(\beta_0 + \beta_1 x_i)^2$

Mixed-model form: $y = X\beta + Zu$
3 Example: $\beta_0 = 1.0$, $\beta_1 = 0.5$, $\sigma = 0.10$

. clear
. set obs 100
. gen x = 10*uniform()
. gen mu = 1 + .5*x
. gen y = mu + .10*mu*invnorm(uniform())

Model: $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, $V(\varepsilon_i) = \sigma^2(\beta_0 + \beta_1 x_i)^2$

Problem: estimate $\beta_0$, $\beta_1$, and $\sigma$.
4 Variance Stabilization

$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, $V(\varepsilon_i) = \sigma^2(\beta_0 + \beta_1 x_i)^2$

Taking logs stabilizes the variance, but $E(\log y_i) = g(\beta, x_i)$, a nonlinear function of the regression parameters.
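A sketch of why the log transform works, writing $\mu_i = \beta_0 + \beta_1 x_i$ so that $y_i = \mu_i(1 + \sigma u_i)$ with $u_i$ standard normal (this expansion is my addition, not from the slides):

$\log y_i = \log \mu_i + \log(1 + \sigma u_i) \approx \log \mu_i + \sigma u_i - \tfrac{1}{2}\sigma^2 u_i^2$

To leading order, $V(\log y_i) \approx \sigma^2$, free of $x_i$, while $E(\log y_i) \approx \log(\beta_0 + \beta_1 x_i) - \tfrac{1}{2}\sigma^2 = g(\beta, x_i)$, which is nonlinear in the $\beta$'s; the next slide approximates this $g$ by a quadratic in $x$.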
5 Approximate $g(\beta, x_i)$ by a polynomial in $x$, then do an OLS regression of $\log y$ on $x$ and $x^2$:

. gen z = log(y)
. gen x2 = x*x
. reg z x x2
. predict z_hat

[regression table omitted in the source; since $V(\log y_i) \approx \sigma^2$, the Root MSE estimates $\sigma$]

True values: $\beta_0 = 1.0$, $\beta_1 = 0.5$, $\sigma = 0.10$
6 But what about $\beta_0$ and $\beta_1$?
7 Alternative: iteratively reweighted regression

. reg y x
. predict muh
. reg y x [w=1/muh^2]
. local rmse = e(rmse)
. gen wt = 1/muh^2
. summ wt
. local wbar = r(mean)
. local sigh = sqrt(`wbar')*`rmse'
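Why `sigh` estimates $\sigma$ (my reasoning; the slide does not spell this out): Stata rescales analytic weights to sum to $n$, i.e. it uses $\tilde w_i = w_i/\bar w$. Since $w_i = 1/\hat\mu_i^2$ and $V(e_i) \approx \sigma^2\mu_i^2$, we have $E(w_i e_i^2) \approx \sigma^2$, so

$\text{RMSE}^2 = \frac{1}{n-2}\sum_i \tilde w_i e_i^2 \approx \frac{\sigma^2}{\bar w}, \qquad \text{hence} \qquad \hat\sigma = \sqrt{\bar w}\cdot\text{RMSE}.$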
8 ITERATION 0

. reg y x

[OLS table omitted in the source]

. gen wt = 1/(_b[_cons] + _b[x]*x)^2
9 ITERATION 1

. reg y x [w=wt]
(analytic weights assumed)

[weighted OLS table omitted in the source]

. replace wt = 1/(_b[_cons] + _b[x]*x)^2
(100 real changes made)
10 ITERATION 2

. reg y x [w=wt]
(analytic weights assumed)

[weighted OLS table omitted in the source]

. replace wt = 1/(_b[_cons] + _b[x]*x)^2
(100 real changes made)
11 ITERATION 3

. noi reg y x [w=wt]
(analytic weights assumed)

[weighted OLS table omitted in the source]

. summ wt

[summary table omitted in the source]

. local wbar = r(mean)
. noi di e(rmse)*sqrt(`wbar')

True values: $\beta_0 = 1.0$, $\beta_1 = 0.5$, $\sigma = 0.10$
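The manual passes above can be wrapped in a loop that stops once the slope settles; a minimal sketch (the tolerance, iteration cap, and macro names are mine, not from the slides):

* iteratively reweighted least squares for the constant-CV model
reg y x                                 // iteration 0: unweighted OLS
gen wt = 1/(_b[_cons] + _b[x]*x)^2      // weights from the fitted mean
local b1_old = _b[x]
forvalues it = 1/20 {
    qui reg y x [aw=wt]                 // weighted OLS
    qui replace wt = 1/(_b[_cons] + _b[x]*x)^2
    if abs(_b[x] - `b1_old') < 1e-8 continue, break
    local b1_old = _b[x]
}
qui summ wt
di "sigma-hat = " e(rmse)*sqrt(r(mean)) // rescaled RMSE, as on slide 7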
12 Can we do this using -xtmixed-?

. xtmixed y x ||???: x

How do we get -xtmixed- to estimate a non-constant residual variance? Write the model in random-effects form:

$y_i = \beta_0 + \beta_1 x_i + (\beta_0 + \beta_1 x_i)u_i = \beta_0 + \beta_1 x_i + c_0 u_{0i} + c_1 x_i u_{1i}$, where $u_{0i} = u_{1i}$ and $c_1/c_0 = \beta_1/\beta_0$

Two obstacles: degenerate dependency of the random effects ($u_{0i} = u_{1i}$), and the coefficients of the random intercept and slope ($c_0$ and $c_1$) need to be constrained.
13 Ex. 1: Ignore the dependency of the u's and the constraints on the c's.

$y_i = \beta_0 + \beta_1 x_i + c_0 u_{0i} + c_1 x_i u_{1i}$

. set obs 1000
. gen x = 5*uniform()
. gen mu = 3 + 1.4*x
. gen u0 = invnorm(uniform())
. gen u1 = invnorm(uniform())
. gen y = mu + .05*u0 + .5*x*u1
. gen ord = _n
. xtmixed y x ||ord: x, noc

With one observation per group (ord = _n), the random-intercept term is absorbed into the residual, so only the random slope is specified.
14 . xtmixed y x ||ord: x, noc nolog

Mixed-effects REML regression; Number of obs = 1000; Group variable: ord; Number of groups = 1000; obs per group: min = 1, avg = 1.0, max = 1

[fixed-effect and random-effect estimates omitted in the source: sd(x) estimates $c_1$ and sd(Residual) picks up the $c_0 u_{0i}$ term]

True values: $\beta_0 = 3.0$, $\beta_1 = 1.4$, $c_0 = 0.05$, $c_1 = 0.50$
15 Ex. 2: No random intercept, covariate known.

$y_i = \beta_0 + \beta_1 x_i + c_1 z_i u_{1i}$

. set obs 1000
. gen x = 5*uniform()
. gen z = 3 + 1.4*x
. gen u1 = invnorm(uniform())
. gen y = 3 + 1.4*x + .5*z*u1
. gen ord = _n
. xtmixed y x ||ord: z, noc
16 Ex. 2 output:

. xtmixed y x ||ord: z, noc

Performing EM optimization:
Performing gradient-based optimization:
Iteration 0: log restricted-likelihood = ... (numerical derivatives are approximate; flat or discontinuous region encountered)
Iteration 1: log restricted-likelihood = ... (numerical derivatives are approximate)

Garbage! With one observation per group and no true residual error, the residual variance heads for zero and the likelihood goes flat.
17 Ex. 2 fix: expand the data and add a tiny artificial "residual":

. expand 3
. sort ord
. gen yf = y + .001*invnorm(uniform())
. xtmixed yf x ||ord: z, noc nolog

Each observation is tripled and jittered, so the within-group variation identifies sd(Residual) (≈ .001) and sd(z) is free to estimate $c_1$.
18 . xtmixed yf x ||ord: z, noc nolog

Mixed-effects REML regression; Number of obs = 3000; Group variable: ord; Number of groups = 1000; obs per group: min = 3, avg = 3.0, max = 3

[estimates omitted in the source: sd(z) estimates $c_1$; sd(Residual) recovers the artificial .001]

True values: $\beta_0 = 3.0$, $\beta_1 = 1.4$, $c_1 = 0.50$
19 Ex. 3: No random intercept, covariate unknown.

$y_i = \beta_0 + \beta_1 x_i + (\beta_0 + \beta_1 x_i)u_i = \beta_0 + \beta_1 x_i + c_0 u_{0i} + c_1 x_i u_{1i}$, where $u_{0i} = u_{1i}$ and $c_1/c_0 = \beta_1/\beta_0$

Same obstacles as before: degenerate dependency of the random effects, and the coefficients $c_0$ and $c_1$ need to be constrained.

20 Idea: recast the model with one error term and pretend $z_i = \beta_0 + \beta_1 x_i$ is known. Then iterate.
21 Recipe, for the recast model $y_i = \beta_0 + \beta_1 x_i + (\beta_0 + \beta_1 x_i)u_i = \beta_0 + \beta_1 x_i + c_1 z_i u_{1i}$, where $z_i = \beta_0 + \beta_1 x_i$ is unknown:

1. Expand and introduce an artificial "residual" error term.
2. Estimate $z_i$ by OLS or another "easy" method.
3. Fit the model pretending the prediction zh_i is the actual $z_i$.
4. Iterate (a loop sketch follows below).

. expand 3
. gen yf = y + .001*invnorm(uniform())
. reg y x
. predict zh
. xtmixed yf x ||ord: zh, noc nolog
. drop zh
. predict zh
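A minimal sketch of the full iteration (the loop count and the quiet/noisy handling are mine; the slides show only one unrolled pass). It assumes yf and ord already exist from step 1:

* iterate: refit xtmixed, refresh the estimated covariate zh, repeat
reg y x
predict zh                              // initial OLS estimate of z
forvalues it = 1/5 {
    qui xtmixed yf x ||ord: zh, noc
    drop zh
    qui predict zh                      // fixed-portion fit = updated z-hat
}
noi xtmixed yf x ||ord: zh, noc nolog   // final fit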
25 . xtmixed yf x ||ord: zh, noc

Mixed-effects REML regression; Number of obs = 3000; Group variable: ord; Number of groups = 1000; obs per group: min = 3, avg = 3.0, max = 3

[estimates omitted in the source]

True values: $\beta_0 = 3.0$, $\beta_1 = 1.4$, $c_1 = 0.50$
26 2-level model:

$y_{ij} = \beta_0 + \beta_1 x_{ij} + c_1(\beta_0 + \beta_1 x_{ij})u_{1i} + c_2[\beta_0 + \beta_1 x_{ij} + c_1(\beta_0 + \beta_1 x_{ij})u_{1i}]u_{2ij}$

with "z" $= E(y_{ij} \mid x_{ij})$ and "z_i" $= E(y_{ij} \mid x_{ij}, u_{1i})$.

Simulation do-file (arguments: number of subjects NS, replicates per subject nr, and the parameters):

args NS nr be0 be1 c1 c2
drop _all
set obs `NS'
gen id = _n
gen u1 = invnorm(uniform())
expand `nr'
sort id
gen u2 = invnorm(uniform())
gen x = 10*uniform()
gen z = `be0' + `be1'*x
gen zi = z + `c1'*z*u1
gen y = zi + `c2'*zi*u2
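The do-file is driven by positional arguments; a hypothetical call (the file name matches slide 29, but the argument values here are illustrative only):

* 500 subjects, 4 observations each; be0=3, be1=1.4, c1=0.5, c2=0.3
run nasug_2008_sim 500 4 3 1.4 0.5 0.3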
27 2-level model (example)
28 Estimation (continuing the do-file after gen y = zi + `c2'*zi*u2):

gen obs = _n
expand 3
sort obs
gen yf = y + .001*invnorm(uniform())
xtmixed y x ||id: x, noc nolog
predict zh0
predict uh1i_0, reffects level(id)
gen zhi_0 = zh0 + uh1i_0
xtmixed yf x ||id: zh0, noc ||obs: zhi_0, noc nolog
predict zh1
predict uh1i_1, reffects level(id)
gen zhi_1 = zh1 + uh1i_1
xtmixed yf x ||id: zh1, noc ||obs: zhi_1, noc nolog
predict zh2
predict uh1i_2, reffects level(id)
gen zhi_2 = zh2 + uh1i_2
noi xtmixed yf x ||id: zh2, noc ||obs: zhi_2, noc nolog
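The three unrolled passes can be collapsed into a loop; a sketch under the same setup (the variable bookkeeping and iteration count are mine, not from the slides):

* two-level iteration: refresh the population-level covariate zh and the
* subject-level covariate zhi after each fit
xtmixed y x ||id: x, noc nolog            // starting fit, as above
predict zh                                // fixed-portion fit of z
predict uh1i, reffects level(id)          // subject-level random effect
gen zhi = zh + uh1i
drop uh1i
forvalues it = 1/3 {
    qui xtmixed yf x ||id: zh, noc ||obs: zhi, noc
    qui predict zh_new                        // updated z-hat
    qui predict uh1i_new, reffects level(id)  // updated subject effect
    qui replace zh  = zh_new
    qui replace zhi = zh_new + uh1i_new
    drop zh_new uh1i_new
}
noi xtmixed yf x ||id: zh, noc ||obs: zhi, noc nolog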
29 . run nasug_2008_sim

Mixed-effects REML regression; Number of obs = [omitted in the source]

Group structure (No. of Groups; Observations per Group: Minimum, Average, Maximum) for the id and obs levels [counts omitted in the source]
30 . run nasug_2008_sim (continued)

[fixed-effect estimates for x and _cons omitted in the source]

Random-effects parameters: sd(zh) at the id level, sd(zhi) at the obs level, and sd(Residual); LR test vs. linear regression: chi2(2) [estimates omitted in the source]
31 Bayesian Estimation (WINBUGS)

[figure omitted: posterior plots labeled c1 and c2]
32 WINBUGS vs. STATA (xtmixed)

WinBUGS posterior summaries (node, mean, sd, 2.5%, median, 97.5%, start, sample) for be0, be1, c1, c2 [values omitted in the source]

Stata xtmixed: coefficients for x and _cons; random-effects parameters sd(xb) at the id level and sd(muhi) at the obs level [estimates omitted in the source]