Bayesian Multivariate Logistic Regression
by Sean O’Brien and David Dunson (Biometrics, 2004)
Presented by Lihan He, ECE, Duke University
May 16, 2008

Outline
Univariate logistic regression
Multivariate logistic regression
Prior specification and convergence
Posterior computation
Experimental results
Conclusions

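Univariate Logistic Regression Model
Model: $\operatorname{logit} \Pr(y_i = 1 \mid x_i, \beta) = x_i'\beta$
Equivalent: $y_i = 1(z_i > 0)$, with $z_i \sim L(z_i \mid x_i'\beta)$
$z_i$: latent variable; $L(\cdot)$: logistic density
Logistic density: $L(z \mid \mu) = \dfrac{\exp\{-(z-\mu)\}}{[1+\exp\{-(z-\mu)\}]^2}$
CDF: $F(z \mid \mu) = \dfrac{1}{1+\exp\{-(z-\mu)\}}$
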
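
The latent-variable form can be checked by simulation. The sketch below is a minimal illustration, not code from the paper: it draws logistic noise by inverting the logistic CDF, thresholds the latent variable at zero, and compares the empirical frequency of y = 1 with the logistic regression probability. The linear predictor value `eta = 0.8`, the sample size and the seed are arbitrary choices made for the example.

```python
import math
import random

def sample_y(eta, rng):
    """Draw y = 1(z > 0), where z = eta + standard logistic noise."""
    u = rng.random()
    while u == 0.0:                      # avoid log(0) in the inverse CDF
        u = rng.random()
    z = eta + math.log(u / (1.0 - u))    # inverse-CDF draw, z ~ logistic(location eta)
    return 1 if z > 0 else 0

def check_latent_representation(eta=0.8, n=200_000, seed=0):
    """Return (empirical frequency of y = 1, logistic model probability)."""
    rng = random.Random(seed)
    freq = sum(sample_y(eta, rng) for _ in range(n)) / n
    prob = 1.0 / (1.0 + math.exp(-eta))
    return freq, prob

freq, prob = check_latent_representation()
print(f"empirical Pr(y=1) = {freq:.4f}, logistic model = {prob:.4f}")
```

The two numbers should agree up to Monte Carlo error, which is what makes the thresholded latent variable an equivalent way of writing the model.
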
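Univariate Logistic Regression Model
Approximation using a t distribution: $L(z_i \mid x_i'\beta) \approx T_\nu(z_i \mid x_i'\beta, \sigma^2)$
Set $\nu = 7.3$ and $\sigma^2 = \pi^2(\nu-2)/(3\nu)$, so that the t variance $\sigma^2\nu/(\nu-2)$ equals the logistic variance $\pi^2/3$.
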
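
How close the t approximation is can be seen numerically. The sketch below assumes the settings recalled from O'Brien and Dunson's paper, namely 7.3 degrees of freedom and a scale chosen so that the t variance matches the logistic variance of pi^2/3; treat both as assumptions of this example. It evaluates the two densities on a grid and reports the largest pointwise gap.

```python
import math

NU = 7.3                                      # assumed degrees of freedom
SIGMA2 = math.pi ** 2 * (NU - 2) / (3 * NU)   # scale^2 that matches variance pi^2/3

def logistic_pdf(z, mu=0.0):
    """Logistic density, written in a symmetric, overflow-safe form."""
    e = math.exp(-abs(z - mu))
    return e / (1.0 + e) ** 2

def t_pdf(z, mu=0.0, sigma2=SIGMA2, nu=NU):
    """Density of a location-scale t distribution with nu degrees of freedom."""
    sigma = math.sqrt(sigma2)
    x = (z - mu) / sigma
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c - (nu + 1) / 2 * math.log1p(x * x / nu)) / sigma

grid = [i / 100 for i in range(-1000, 1001)]  # z from -10 to 10 in steps of 0.01
max_gap = max(abs(logistic_pdf(z) - t_pdf(z)) for z in grid)
print(f"max |logistic - t| on [-10, 10]: {max_gap:.5f}")
```
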
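Multivariate Logistic Regression Model
Binary variable for each output: $y_i = (y_{i1}, \dots, y_{ip})'$ with $y_{ij} = 1(z_{ij} > 0)$, where each $z_{ij}$ has a univariate logistic marginal density.
Multivariate logistic density: $L_p(z \mid \mu, R) = T_{p,\nu}(t \mid 0, R) \prod_{j=1}^{p} \dfrac{L(z_j \mid \mu_j)}{T_\nu(t_j \mid 0, 1)}$, with $t_j = F_\nu^{-1}\!\left(\dfrac{e^{z_j-\mu_j}}{1+e^{z_j-\mu_j}}\right)$
$F_\nu^{-1}(\cdot)$ is the inverse CDF of the $T_\nu(0, 1)$ density.
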
Multivariate Logistic Regression Model
Properties
The marginal univariate densities of $z_j$, for $j = 1, \dots, p$, have univariate logistic form.
For $p = 1$, the density reduces to the univariate logistic density.
$R$ is a correlation matrix (with 1's on the diagonal), reflecting the correlations between the $z_j$, and hence the correlations between the $y_j$.
For $R = \mathrm{diag}(1, \dots, 1)$, the density reduces to a product of univariate logistic densities, and the elements of $z$ are uncorrelated.
Good convergence properties for MCMC sampling.

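
One way to see these properties is to simulate correlated latent variables with logistic marginals. The sketch below is an illustration under stated assumptions, not the authors' code: it builds a bivariate draw by sampling a correlated t vector, pushing each coordinate through the t CDF to get uniforms, and then through the inverse logistic CDF. The degrees of freedom (7.3), the correlation 0.7, the locations and the Simpson-rule CDF are all choices made for the example.

```python
import math
import random

NU = 7.3  # assumed degrees of freedom of the underlying t distribution
LOG_C = (math.lgamma((NU + 1) / 2) - math.lgamma(NU / 2)
         - 0.5 * math.log(NU * math.pi))

def t_pdf(x):
    """Standard t density with NU degrees of freedom."""
    return math.exp(LOG_C - (NU + 1) / 2 * math.log1p(x * x / NU))

def t_cdf(x, steps=200):
    """Standard t CDF via Simpson's rule on [0, |x|]; steps must be even."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0) + t_pdf(a)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_pdf(k * h)
    half = total * h / 3.0
    return 0.5 + half if x >= 0 else 0.5 - half

def sample_bivariate_logistic(mu, rho, rng):
    """One draw (z1, z2) with logistic marginals, correlated through a t vector."""
    e1, e2 = rng.gauss(0, 1), rng.gauss(0, 1)
    normals = (e1, rho * e1 + math.sqrt(1 - rho * rho) * e2)  # correlation rho
    g = rng.gammavariate(NU / 2, 2 / NU)                      # chi^2_NU / NU
    z = []
    for mu_j, n_j in zip(mu, normals):
        u = t_cdf(n_j / math.sqrt(g))            # uniform marginal
        u = min(max(u, 1e-12), 1 - 1e-12)        # keep the logit finite
        z.append(mu_j + math.log(u / (1 - u)))   # logistic marginal at mu_j
    return z

def mean(values):
    return sum(values) / len(values)

rng = random.Random(1)
draws = [sample_bivariate_logistic((0.5, -1.0), 0.7, rng) for _ in range(4000)]
z1 = [d[0] for d in draws]
z2 = [d[1] for d in draws]
m1, m2 = mean(z1), mean(z2)
var1 = mean([(a - m1) ** 2 for a in z1])
var2 = mean([(b - m2) ** 2 for b in z2])
corr = mean([(a - m1) * (b - m2) for a, b in zip(z1, z2)]) / math.sqrt(var1 * var2)
print(f"mean z1 = {m1:.3f}, var z1 = {var1:.3f} (pi^2/3 = {math.pi ** 2 / 3:.3f})")
print(f"sample correlation of (z1, z2) = {corr:.3f}")
```

Each marginal should show mean close to its location and variance close to pi^2/3, while the sample correlation reflects the off-diagonal entry of the correlation matrix.
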
Multivariate Logistic Regression Model
Likelihood:
M-ary (ordered) variable for each output
Assume:
Define:

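
For an ordered M-ary output, one common construction, assumed here rather than taken from the slide, assigns category m when the latent variable falls between consecutive cutpoints. The sketch below shows that mapping; the function name `ordered_category` and the cutpoint values are hypothetical.

```python
import bisect

def ordered_category(z, cutpoints):
    """Return m in 1..M, where m - 1 is the number of cutpoints strictly below z.

    cutpoints must be sorted in increasing order and has length M - 1.
    """
    return bisect.bisect_left(cutpoints, z) + 1

cutpoints = [-1.0, 0.0, 1.5]   # hypothetical thresholds, giving M = 4 categories
labels = [ordered_category(z, cutpoints) for z in (-2.0, -0.5, 0.0, 0.7, 3.0)]
print(labels)   # [1, 2, 2, 3, 4]
```

With a single cutpoint at zero this reduces to the binary rule, with category 2 playing the role of y = 1.
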
Prior Specification and Convergence
or
R: uniform density on [-1, 1] for each off-diagonal element

Posterior Computation
Posterior: $\pi(\beta, R \mid y) \propto \pi(\beta)\,\pi(R) \prod_i \Pr(y_i \mid x_i, \beta, R)$. The prior and the likelihood are not conjugate.
Proposal distribution: use a multivariate t distribution to approximate the multivariate logistic density in the likelihood.
Importance sampling: sample from the proposal distribution to approximate samples from the posterior, and use importance weights for exact inference.

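
The importance-sampling idea can be illustrated in one dimension. The sketch below is a toy stand-in for the posterior computation, not the paper's sampler: the target is a standard logistic density, the proposal is a t density (assumed 7.3 degrees of freedom, variance matched to pi^2/3), and expectations under the target are estimated with self-normalized importance weights. The function names, sample size and seed are choices made for the example.

```python
import math
import random

NU = 7.3                                                  # assumed degrees of freedom
SIGMA = math.sqrt(math.pi ** 2 * (NU - 2) / (3 * NU))     # variance-matched scale
LOG_C = (math.lgamma((NU + 1) / 2) - math.lgamma(NU / 2)
         - 0.5 * math.log(NU * math.pi))

def logistic_pdf(z):
    """Target: standard logistic density."""
    e = math.exp(-abs(z))
    return e / (1.0 + e) ** 2

def t_pdf(z):
    """Proposal: t density with NU degrees of freedom and scale SIGMA."""
    x = z / SIGMA
    return math.exp(LOG_C - (NU + 1) / 2 * math.log1p(x * x / NU)) / SIGMA

def sample_t(rng):
    """Draw from the proposal as a scaled normal over sqrt(chi^2_NU / NU)."""
    g = rng.gammavariate(NU / 2, 2 / NU)
    return SIGMA * rng.gauss(0, 1) / math.sqrt(g)

def importance_estimate(h, n=100_000, seed=2):
    """Self-normalized importance-sampling estimate of E[h(z)] under the target."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n):
        z = sample_t(rng)
        w = logistic_pdf(z) / t_pdf(z)   # importance weight: target / proposal
        num += w * h(z)
        den += w
    return num / den

second_moment = importance_estimate(lambda z: z * z)
tail_prob = importance_estimate(lambda z: 1.0 if z > 1 else 0.0)
print(f"E[z^2] estimate = {second_moment:.3f} (exact pi^2/3 = {math.pi ** 2 / 3:.3f})")
print(f"Pr(z > 1) estimate = {tail_prob:.4f} (exact = {1 / (1 + math.e):.4f})")
```

Because the t tails are heavier than the logistic tails, the weights stay bounded, which is the property that makes a t proposal a convenient choice in this setting.
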
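Posterior Computation
Introduce latent variables, including z, so that the proposal can be expressed in a form with tractable full conditionals.
Sample the latent variables, including z, from their full conditionals, which are available because the likelihood is then conjugate to the prior.
Update R using a Metropolis step (accept/reject): set R to the candidate value with the acceptance probability; otherwise keep the current value.
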
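
The accept/reject update for the correlation can be sketched on a toy problem. The example below is not the paper's update: it assumes standardized bivariate normal data in place of the latent variables, a single correlation parameter with a uniform prior on (-1, 1), and a symmetric random-walk proposal. The step size, chain length, burn-in and seeds are arbitrary.

```python
import math
import random

def log_lik(r, n, sxx, sxy, syy):
    """Log-likelihood (up to a constant) of correlation r for standardized normal pairs."""
    return (-0.5 * n * math.log(1 - r * r)
            - (sxx - 2 * r * sxy + syy) / (2 * (1 - r * r)))

def metropolis_corr(pairs, iters=6000, step=0.05, seed=3):
    """Random-walk Metropolis chain for a correlation with a uniform(-1, 1) prior."""
    rng = random.Random(seed)
    n = len(pairs)
    sxx = sum(a * a for a, _ in pairs)
    syy = sum(b * b for _, b in pairs)
    sxy = sum(a * b for a, b in pairs)
    r = 0.0
    cur = log_lik(r, n, sxx, sxy, syy)
    chain = []
    for _ in range(iters):
        cand = r + rng.gauss(0, step)            # symmetric proposal
        if abs(cand) < 1:                        # prior density is zero outside (-1, 1)
            new = log_lik(cand, n, sxx, sxy, syy)
            # Accept with probability min(1, exp(new - cur)).
            if math.log(1.0 - rng.random()) < new - cur:
                r, cur = cand, new
        chain.append(r)                          # a rejected step repeats the current value
    return chain

# Simulated data with true correlation 0.6 (hypothetical, for illustration only).
data_rng = random.Random(4)
pairs = []
for _ in range(1000):
    e1, e2 = data_rng.gauss(0, 1), data_rng.gauss(0, 1)
    pairs.append((e1, 0.6 * e1 + math.sqrt(1 - 0.36) * e2))

chain = metropolis_corr(pairs)
kept = chain[1000:]                              # discard burn-in
post_mean = sum(kept) / len(kept)
print(f"posterior mean of the correlation: {post_mean:.3f}")
```
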
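Posterior Computation
Importance weights for inference: each draw from the proposal is weighted by the ratio of the target posterior density to the proposal density.
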
Application
Subjects: 584 twin pregnancies
Output: small for gestational age (SGA), defined as a birthweight below the 10th percentile for a given gestational age in a reference population
Binary output: $y_{ij} \in \{0, 1\}$, $i = 1, \dots, 584$, $j = 1, 2$
Covariates: $x_{ij}$ for the $i$th pregnancy and the $j$th infant

Application
The estimated regression coefficients are nearly identical to those from the study of AP.
Female gender ($\beta_1$), prior preterm delivery ($\beta_4$, $\beta_5$) and smoking ($\beta_8$) are associated with an increased risk of SGA.
Outcomes for twins are highly correlated, as represented by $R$.

Conclusions
A multivariate logistic density is proposed for the multivariate logistic regression model.
The proposed multivariate logistic density is closely approximated by a multivariate t distribution.
It has properties that facilitate efficient sampling and guaranteed convergence.
Its marginals are univariate logistic densities.
The correlation structure is embedded within the model.