Published by Paola Claybrooks. Modified over 9 years ago.
1 Connections between MCMC and Likelihood Methods. Donald A. Pierce with Ruggero Bellio. Winter 2010, OSU. Slides are at www.science.oregonstate.edu/~piercedo/osu-mcmc-mpl.ppt
2 It is popular these days to be “Bayesian”, in large part due to the utility of MCMC and in particular (Win)BUGS. However, substantive prior information is seldom used, the aim being “objective Bayes”, so connections to likelihood inference are interesting. Largely, the gain from MCMC is in handling rather intractable likelihood functions: integrating over latent variates, e.g. latent cluster effects or covariates observed with error. However, if everything except the observed data is a random variable, issues of inference become highly (too?) automatic.
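3 A key issue in this is the contrast between profile and integrated likelihoods, in which the nuisance parameter is respectively maximized out of, or integrated out of, the likelihood. Modern higher-order likelihood theory suggests, surprisingly, that integrated likelihoods can overcome shortcomings of profile likelihood. A posterior for the interest parameter is an instance of integrated likelihood. That is, the joint posterior is proportional to the prior times the likelihood, so the marginal posterior for the interest parameter is the likelihood integrated over the nuisance parameter.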
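In symbols, and assuming notation that this transcript does not carry ($\psi$ for the interest parameter, $\lambda$ for the nuisance parameter, $L(\psi,\lambda)$ for the likelihood, $\pi$ for prior densities), the two likelihoods being contrasted can be written as below. This is a sketch of the standard definitions, not a recovery of the slide's own formulas.

```latex
% Assumed notation: \psi = interest parameter, \lambda = nuisance parameter
L_P(\psi) = \max_{\lambda} L(\psi, \lambda) = L(\psi, \hat\lambda_\psi),
\qquad
L_I(\psi) = \int L(\psi, \lambda)\, \pi(\lambda)\, d\lambda .

% Marginal posterior under independent priors \pi(\psi)\,\pi(\lambda):
p(\psi \mid y) \;\propto\; \pi(\psi) \int L(\psi, \lambda)\, \pi(\lambda)\, d\lambda
             \;=\; \pi(\psi)\, L_I(\psi) .
```

With $\pi(\psi)$ flat, the marginal posterior of $\psi$ is proportional to $L_I(\psi)$.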
4 An integrated likelihood is approximated very well by a Laplace approximation. Hence the MCMC posterior for “flat” priors is essentially the profile likelihood multiplied by an adjustment factor arising from that approximation. We will see that this depends substantially on the representation of the nuisance parameter, a dependence to be avoided in frequentist or likelihood inference. The approximation above is, within reason, valid for any such representation (not that this is so comforting).
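A small numerical check of the Laplace approximation, on a hypothetical toy model chosen only for illustration (it is not the presentation's example): a normal sample with interest parameter sigma and nuisance parameter mu under a flat prior. The function names are made up for this sketch. Because the log-likelihood is exactly quadratic in mu, the Laplace form, profile likelihood times sqrt(2*pi/j) with j = n/sigma^2, reproduces the integrated likelihood essentially exactly here; in general it is only an approximation.

```python
import numpy as np

def log_lik(sigma, mu, y):
    """Normal log-likelihood on a grid of mu, dropping -(n/2)*log(2*pi)."""
    n = y.size
    return -n * np.log(sigma) - np.sum((y[:, None] - mu) ** 2, axis=0) / (2 * sigma**2)

def log_profile(sigma, y):
    """Profile log-likelihood: mu replaced by its constrained MLE, the sample mean."""
    return log_lik(sigma, np.array([y.mean()]), y)[0]

def log_integrated(sigma, y, half_width=10.0, n_grid=20001):
    """Integrate the likelihood over mu (flat prior) with the trapezoid rule."""
    mu = np.linspace(y.mean() - half_width, y.mean() + half_width, n_grid)
    ll = log_lik(sigma, mu, y)
    shift = ll.max()                      # stabilise the exponentials
    f = np.exp(ll - shift)
    integral = np.sum((f[1:] + f[:-1]) * np.diff(mu)) / 2.0
    return shift + np.log(integral)

def log_laplace(sigma, y):
    """Laplace approximation: log L_P + 0.5*log(2*pi) - 0.5*log(j_mumu)."""
    j_mumu = y.size / sigma**2            # observed information for mu at fixed sigma
    return log_profile(sigma, y) + 0.5 * np.log(2 * np.pi) - 0.5 * np.log(j_mumu)

rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, scale=2.0, size=15)
for sigma in (1.0, 2.0, 3.0):
    print(sigma, log_integrated(sigma, y), log_laplace(sigma, y))
```

The adjustment factor depends on sigma through j, which is why the integrated and profile likelihoods have different shapes even in this simple case.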
5 Regarding “flat” priors: in practice those used in the WinBUGS manual examples seem advisable, i.e. proper but very diffuse: for parameters on the whole real line, e.g. dnorm(0,1E-6), and implicitly for the logs of inherently positive parameters, e.g. dgamma(1E-6,1E-6). The latter is to obtain approximate invariance to scale for scale parameters, a natural requirement. If, to facilitate convergence, the prior is chosen otherwise, then for likelihood analysis one should divide the posterior of the interest parameter by the prior. Geyer & Thompson (1992 JRSS-B) gave a method for computing the likelihood using MCMC, but the proposal here is far simpler.
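To see why dgamma(1E-6,1E-6) acts as an approximately flat prior on the log of a positive parameter, the sketch below evaluates the implied log density of log(x) at widely separated values of x. It uses only the standard Gamma(shape, rate) density; the function name is made up here. For reference, the second argument of BUGS dnorm is a precision, so dnorm(0,1E-6) has standard deviation 1000.

```python
import math

def log_density_of_log_x(x, shape=1e-6, rate=1e-6):
    """Log density of u = log(x) when x ~ Gamma(shape, rate).

    This is the Gamma log density of x plus log(x), the log of the
    Jacobian dx/du = x.
    """
    log_gamma_pdf = (shape * math.log(rate) - math.lgamma(shape)
                     + (shape - 1.0) * math.log(x) - rate * x)
    return log_gamma_pdf + math.log(x)

# Four orders of magnitude in x, nearly identical log density on the log scale
values = [log_density_of_log_x(x) for x in (0.01, 1.0, 100.0)]
spread = max(values) - min(values)
print(values)
print(spread)

# dnorm(0, 1E-6) in BUGS: precision 1e-6, hence this standard deviation
sd_normal = 1.0 / math.sqrt(1e-6)
print(sd_normal)
```

The prior is proper, but over any practically relevant range it behaves like a density proportional to 1/x, which is the scale-invariant choice the slide describes.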
6 An attempt to improve generally on profile likelihood was the Cox-Reid approximate conditional likelihood (ACL), which requires that the nuisance parameter be represented as ‘orthogonal’ to the interest parameter, i.e. that the constrained MLE of the nuisance parameter varies slowly with the interest parameter. However, orthogonal parameters are not at all uniquely defined, resulting in arbitrariness of the ACL that must be resolved. A partial indication of our interests is that the ACL is formally the same as the above approximation to the posterior for the interest parameter using flat priors.
7 Barndorff-Nielsen developed the modified profile likelihood (MPL), which is invariant to the representation of the nuisance parameter, a really key issue. This was a remarkable stroke of intuition, and B-N only showed that the MPL approximates what is desired for the primary special settings: exponential families, regression-scale models, etc. We have been developing the idea that what the MPL approximates in general is a suitable integrated likelihood, hence with close connections to MCMC.
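For reference, Barndorff-Nielsen's modified profile likelihood is usually written as below. The notation is assumed here, since the transcript carries no formulas: $\hat\lambda_\psi$ is the constrained MLE of the nuisance parameter, $j_{\lambda\lambda}$ is the observed information block for $\lambda$, and the derivative is taken with the ancillary held fixed.

```latex
L_{MP}(\psi) \;=\; L_P(\psi)\,
  \bigl| j_{\lambda\lambda}(\psi, \hat\lambda_\psi) \bigr|^{-1/2}\,
  \left| \frac{\partial \hat\lambda_\psi}{\partial \hat\lambda} \right|^{-1}
```

Dropping the last factor leaves $L_P(\psi)\,|j_{\lambda\lambda}(\psi,\hat\lambda_\psi)|^{-1/2}$, the Cox-Reid form, which is why that version needs an orthogonal nuisance parameter while the MPL does not.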
8 Example (Pierce & Peters 1992): case-control (CC) study, 40 sets with 2:1 matching, 30/80 of controls “exposed”. Plot: solid line PL; dashed lines conditional likelihood and MPL.
9 The concept of ‘orthogonal’ parameter, for ACL and for MCMC, needs clarification. In principle there is an ‘ideal’ choice of orthogonal parameter such that the integrated likelihood, i.e. the Bayes posterior (with uniform priors), approximates the MPL. Some goals are: (a) to actually compute this, either from the likelihood or from the posterior samples; (b) to recover the PL from the posterior distribution; and (c) to approximate the MPL in this way, even if not as in (a). These are not completed, but some progress has been made.
10 Example: binary data on 50 subjects, with repeated observations at up to five times, for a total of 220 observations. Suitable for a logistic mixed model with latent random intercepts for subjects. The interest parameter is the standard deviation of the random intercepts. There are seven nuisance parameters: a constant term, 2 treatment parameters, and 4 for time effects. The usual parametrization is not orthogonal: the vector of canonical regression parameters is ‘attenuated’, suggesting an approximately orthogonal parameter.
11 WinBUGS posterior densities of the interest parameter using flat priors. Heavy line: original parametrization. Light line: using the approximately orthogonal nuisance parameters.
12 Posterior samples: sigma vs constant term, for original and orthogonal parametrizations. This provides a clue that we can use posterior samples to assess and correct for lack of orthogonality.
13 An important but confusing issue: clearly, if we transform the posterior samples of the nuisance parameter, the marginal distribution of the interest parameter is unchanged. Part of the reason that reparametrization of the nuisance parameter matters is that it is done in the model specification, where, in contrast to the above, there is no (implicit) Jacobian involved in the density. Having samples from the joint distribution of the interest and nuisance parameters, it would be possible but impractical to divide the density by the Jacobian, to avoid re-doing the MCMC. We can achieve this aim otherwise by resampling from the posterior samples with weights inversely proportional to the reciprocal Jacobian.
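The resampling idea can be sketched as weighted resampling (sampling-importance-resampling) of the stored draws. Everything named here is hypothetical: the draws are simulated rather than MCMC output, and phi = lam * exp(-psi) is an arbitrary reparametrization chosen only so that the Jacobian is easy to write down. Under the assumption that the stored draws have density proportional to L(psi, lam), i.e. a flat prior on lam, a flat prior on phi corresponds to density L(psi, lam) * |d phi / d lam| in the same coordinates, so that factor is used as the weight.

```python
import numpy as np

def resample_with_jacobian(psi, lam, jacobian, rng, size=None):
    """Resample joint draws (psi, lam) with weights proportional to `jacobian`."""
    w = np.asarray(jacobian, dtype=float)
    w = w / w.sum()
    size = psi.size if size is None else size
    idx = rng.choice(psi.size, size=size, replace=True, p=w)
    return psi[idx], lam[idx]

rng = np.random.default_rng(1)
n = 200_000
psi = rng.normal(0.0, 1.0, size=n)   # stand-in for posterior draws of psi
lam = rng.normal(0.0, 1.0, size=n)   # stand-in for posterior draws of lam

# Hypothetical reparametrization phi = lam * exp(-psi): |d phi / d lam| = exp(-psi)
jac = np.exp(-psi)
psi_new, lam_new = resample_with_jacobian(psi, lam, jac, rng)

print(psi.mean(), psi_new.mean())
```

In this toy case the reweighted draws of psi are approximately N(-1, 1) rather than N(0, 1), which shows how far the marginal for the interest parameter can move when the nuisance representation is changed in the model specification. Highly variable weights reduce the effective sample size, so in practice the weights should be inspected before the resampled draws are trusted.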
14 Recall that, to a very good approximation, the MCMC posterior for flat priors is essentially the profile likelihood times a Laplace adjustment factor, which can be expressed approximately in a form whose final factor can be estimated from the MCMC samples at hand. We can thus approximate the PL by dividing the posterior density of the interest parameter by our estimate of that factor. There are, however, issues involving the distinction between posterior and sampling theory.
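One way to write the approximation this slide refers to, in assumed notation ($\psi$ interest, $\lambda$ nuisance, $j_{\lambda\lambda}$ the observed information block for $\lambda$, $c$ and $c'$ constants). The second form, which replaces the inverse information by the posterior covariance of $\lambda$ given $\psi$, is offered as a plausible reading and is not the slide's own formula.

```latex
p(\psi \mid y)
  \;\approx\; c\, L_P(\psi)\, \bigl| j_{\lambda\lambda}(\psi, \hat\lambda_\psi) \bigr|^{-1/2}
  \;\approx\; c\, L_P(\psi)\, \bigl| \operatorname{Cov}(\lambda \mid \psi, y) \bigr|^{1/2},
\qquad\text{so}\qquad
L_P(\psi) \;\approx\; c'\,
  \frac{p(\psi \mid y)}{\bigl| \operatorname{Cov}(\lambda \mid \psi, y) \bigr|^{1/2}} .
```

The conditional covariance is a posterior quantity while $j_{\lambda\lambda}$ is a likelihood quantity, which is one place where the distinction between posterior and sampling theory enters.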
15 A transparent way to do this, although there may be more accurate ways: choose bins for the interest parameter (e.g. 20, using quantiles), compute the factor for each bin, and then smooth (the logs of) these values by quadratic regression on the bin classmarks.
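The binning recipe can be sketched as follows. It assumes that the factor being estimated is the square root of the determinant of the conditional covariance of the nuisance parameters given the interest parameter (an assumption, since the transcript omits the formula). The function name and the simulated draws are made up for the example.

```python
import numpy as np

def fit_log_adjustment(psi, lam, n_bins=20):
    """Quadratic fit of 0.5*log|Cov(lam | psi in bin)| against bin classmarks.

    psi: (n,) draws of the interest parameter; lam: (n,) or (n, p) nuisance draws.
    Returns polynomial coefficients (highest power first) for np.polyval.
    """
    lam = np.asarray(lam, dtype=float).reshape(psi.size, -1)
    edges = np.quantile(psi, np.linspace(0.0, 1.0, n_bins + 1))
    which = np.clip(np.searchsorted(edges, psi, side="right") - 1, 0, n_bins - 1)
    marks, half_logdet = [], []
    for b in range(n_bins):
        in_bin = which == b
        cov = np.atleast_2d(np.cov(lam[in_bin], rowvar=False))
        sign, logdet = np.linalg.slogdet(cov)
        marks.append(0.5 * (edges[b] + edges[b + 1]))   # bin classmark (midpoint)
        half_logdet.append(0.5 * logdet)
    return np.polyfit(marks, half_logdet, deg=2)

# Simulated stand-in for MCMC output: one nuisance parameter whose
# conditional log standard deviation is exactly quadratic in psi.
rng = np.random.default_rng(2)
n = 100_000
psi = rng.uniform(0.5, 1.5, size=n)
true_log_sd = 0.2 + 0.5 * psi - 0.3 * psi**2
lam = rng.normal(0.0, np.exp(true_log_sd))

coef = fit_log_adjustment(psi, lam)
print(np.polyval(coef, 1.0))   # compare with the true value 0.2 + 0.5 - 0.3 = 0.4
```

The fitted quadratic would then be subtracted from the log of a density estimate of the interest parameter's posterior, giving an approximate log PL up to an additive constant.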
16 Plot. Red, right: MCMC posterior, original parametrization. Red, left (dashed): after the above adjustment. Black: PL computed by quadrature.
17 What should be the meaning of ‘orthogonal’ parameter for use in the APL? We said earlier that the constrained MLE of the nuisance parameter should vary slowly with the interest parameter, which is related to the more usual definition that the (expected) cross-information terms are zero. But if a nuisance parameter satisfies this definition then so does any 1-1 transformation of it, which is very unsatisfactory. Further, this could not be a requirement for validity of the APL, since linear transformations leave the APL unchanged even though not conforming at all to such requirements. This suggests more difficulties than first thought in utilizing plots such as on slide 12 for such purposes.
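18 There is in principle a reparametrization such that the MPL and the integrated likelihood (IL) agree (related to Severini, 2007 Biometrika). The constrained MLE of the nuisance parameter can be thought of as a function of the interest parameter and the MLE if the MLE is sufficient, and otherwise also of an ancillary. If the nuisance-parameter MLE is taken there as a variable, this defines a nuisance parameter representation. This representation of the nuisance parameter (NP) depends on the data, which is no real problem for Bayesian methods. Define the new nuisance parameter as the inverse function solving the equation that sets this function equal to the original nuisance parameter. Then the MPL is the Laplace approximation to the integrated likelihood based on this representation of the nuisance parameter.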
19 Theory for this: the Laplace approximations in the two nuisance parametrizations differ only by a Jacobian factor, and we are matching that Jacobian with the final factor of the MPL. Actually we need only derivatives. The difficulty in all this is in utilizing, for likelihood, variations in the MLE while holding fixed a suitable ancillary “a”. Roughly speaking, a suitable ancillary is the ratio of observed to expected information.
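The Jacobian statement can be made explicit in assumed notation, with $\phi = \phi(\psi, \lambda)$ an alternative representation of the nuisance parameter. At the constrained MLE the score for the nuisance parameter is zero, so the observed information transforms by the chain rule alone. This is a sketch of the standard argument, not a transcription of the slide.

```latex
j_{\phi\phi}(\psi, \hat\phi_\psi)
  = \left( \frac{\partial \lambda}{\partial \phi} \right)^{\!\top}
    j_{\lambda\lambda}(\psi, \hat\lambda_\psi)
    \left( \frac{\partial \lambda}{\partial \phi} \right)
\quad\Longrightarrow\quad
\bigl| j_{\phi\phi}(\psi, \hat\phi_\psi) \bigr|^{-1/2}
  = \bigl| j_{\lambda\lambda}(\psi, \hat\lambda_\psi) \bigr|^{-1/2}
    \left| \frac{\partial \phi}{\partial \lambda} \right|_{\lambda = \hat\lambda_\psi}
```

Since the profile likelihood itself is unchanged by the reparametrization, the Laplace approximation under $\phi$ equals the MPL, up to a constant, when $|\partial\phi/\partial\lambda|$ evaluated at $\hat\lambda_\psi$ matches $|\partial\hat\lambda_\psi/\partial\hat\lambda|^{-1}$. Only these derivatives at the constrained MLE are needed, not the full function $\phi(\psi,\lambda)$.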
20 Ex: Two exponential samples with different means. Reparametrize orthogonally. Then the constrained MLE provides the corresponding parametric function. Set this equal to the nuisance parameter and solve for the inverse. Then, up to Laplace approximation, the MPL is the IL for this nuisance parameter representation. Plot: log PL, log ACL and MCMC posterior with the “obvious” orthogonal parametrization; for this example MPL = PL.
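A sketch of this example under assumed details that the transcript does not give: equal sample sizes n, means lam and psi*lam, interest parameter psi (the ratio of means), and orthogonal nuisance parameter phi with means phi/sqrt(psi) and phi*sqrt(psi). Function names are made up. It computes the log PL, the Cox-Reid log ACL with nuisance phi, and the same adjustment with nuisance log(phi), where the adjustment turns out to be constant in psi, consistent with the slide's remark that MPL = PL here.

```python
import numpy as np

def phi_hat(psi, sx, sy, n):
    """Constrained MLE of phi when the means are phi/sqrt(psi) and phi*sqrt(psi)."""
    return (np.sqrt(psi) * sx + sy / np.sqrt(psi)) / (2 * n)

def log_pl(psi, sx, sy, n):
    """Profile log-likelihood for psi, the ratio of the two means."""
    return -2 * n * np.log(phi_hat(psi, sx, sy, n)) - 2 * n

def log_acl_phi(psi, sx, sy, n):
    """Cox-Reid adjustment with nuisance phi: observed information 2n / phi_hat^2."""
    j = 2 * n / phi_hat(psi, sx, sy, n) ** 2
    return log_pl(psi, sx, sy, n) - 0.5 * np.log(j)

def log_acl_logphi(psi, sx, sy, n):
    """Same adjustment with nuisance log(phi): observed information 2n, free of psi."""
    return log_pl(psi, sx, sy, n) - 0.5 * np.log(2 * n)

rng = np.random.default_rng(3)
n = 5
x = rng.exponential(scale=1.0, size=n)
y = rng.exponential(scale=2.0, size=n)
sx, sy = x.sum(), y.sum()

psi_hat = sy / sx                      # MLE of the ratio of means
for f in (log_pl, log_acl_phi, log_acl_logphi):
    # Drop in each log-likelihood between psi_hat and 2 * psi_hat
    drop = f(2 * psi_hat, sx, sy, n) - f(psi_hat, sx, sy, n)
    print(f.__name__, drop)
```

Under these assumptions the log ACL with nuisance phi is (2n-1)/(2n) times the log PL plus a constant, so it is a flattened version of the PL, while with nuisance log(phi) it coincides with the PL up to a constant. Both phi and log(phi) are orthogonal to psi, which illustrates the arbitrariness of the ACL noted on slide 6.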
21 Our MCMC example is not very suitable for investigating all this: the MPL is (again) very near the PL. When the likelihood is intractable, or when the MLE is not sufficient, can we use the MCMC to approximate the MPL? Is it better to approximate the reparametrization for which IL = MPL, or to compute the required Jacobian more directly? An issue is whether there can, in principle, be enough information in the likelihood, or in the posterior samples, to approximate the MPL. Can we tell from the posterior samples how the joint distribution would change for slightly different data?
22 There is yet another parametrization such that locally the nuisance parameter becomes a translation parameter. In this parametrization the answer to that question is “yes”. An aim is to capitalize on this without solving for that new parametrization, perhaps taking advantage of the fact that the product of the final two terms in the MPL is invariant to reparametrization. We have had some success for a single nuisance parameter, but there remains much to do.