
1 Bayesian Parametric and Semi-Parametric Hierarchical Models: An Application to Disinfection By-Products and Spontaneous Abortion. Rich MacLehose, November 9th, 2006

2 Outline: 1. Brief introduction to hierarchical models. 2. Introduce 2 'standard' parametric models. 3. Extend them with 2 semi-parametric models. 4. Applied example of disinfection by-products and spontaneous abortion.

3 Bayesian Hierarchical Models. A natural way to model many applied problems: the first-stage parameters μ_i are assumed exchangeable and drawn from a shared distribution G, and G may depend on further coefficients that extend the hierarchy.

4 Bayesian Hierarchical Models. Frequently, researchers are interested in estimating a large number of coefficients, possibly with a large amount of correlation between predictors. Hierarchical models may be particularly useful here because they allow 'borrowing' of information between groups.

5 Bayesian Hierarchical Models For example, we wish to estimate the effect of 13 chemicals on early pregnancy loss. The chemicals are highly correlated, making standard frequentist methods unstable.

6 Hierarchical Regression Models. Traditional models treat the outcome as random; hierarchical models also treat the coefficients as random.

7 Some Parametric and Semi-Parametric Bayesian Hierarchical Models: 1. Simplest hierarchical model (P1). 2. Fully Bayesian hierarchical model (P2). 3. Dirichlet process prior. 4. Dirichlet process prior with selection component.

8 1: The first parametric model (P1). A simple "one-level" hierarchical model, popularized by Greenland in epidemiology. He refers to it as "semi-Bayes", a name that may also refer to the asymptotic methods commonly used in fitting such models. It has seen use in nutritional, genetic, occupational, and cancer research.

9 Hierarchical Models: Bayes and Shrinkage. μ is our prior belief about the size of the effect and ϕ² is our uncertainty about it. Effect estimates from hierarchical models are 'shrunk' (moved) toward the prior distribution. Shrinkage: for a Bayesian, it is the natural consequence of combining the prior with the data; for a frequentist, it deliberately introduces bias to reduce MSE (biased but more precise). The amount of shrinkage depends on the prior variance.

10 Hierarchical Models: Bayes and Shrinkage. In the simple Normal-Normal setting, y ~ N(Xβ, σ²I) with prior β ~ N(μ1, ϕ²I), so the posterior is β | y ~ N(m, V), where V = (XᵀX/σ² + I/ϕ²)⁻¹, m = V(Xᵀy/σ² + μ1/ϕ²), and I is the p×p identity matrix. y_i may be either a continuous response or an imputed latent response (via Albert and Chib's data augmentation).
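The Normal-Normal posterior above can be sketched in a few lines. This is a minimal illustration (not the authors' code), assuming known σ² = 1 and simulated data; the key point is that the posterior mean is a ridge-type estimator shrunk toward the prior mean μ.

```python
import numpy as np

def p1_posterior(X, y, mu=0.0, phi2=0.5, sigma2=1.0):
    """Posterior mean and covariance for beta ~ N(mu*1, phi2*I), y ~ N(X beta, sigma2*I)."""
    p = X.shape[1]
    prior_prec = np.eye(p) / phi2                       # (phi2 * I)^{-1}
    V = np.linalg.inv(X.T @ X / sigma2 + prior_prec)    # posterior covariance
    m = V @ (X.T @ y / sigma2 + prior_prec @ (mu * np.ones(p)))  # posterior mean
    return m, V

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
beta_true = np.array([1.0, 0.0, -0.5])
y = X @ beta_true + rng.normal(size=500)
m, V = p1_posterior(X, y, mu=0.0, phi2=0.5)
```

Shrinking ϕ² toward zero pulls the posterior mean toward the prior mean; letting ϕ² grow recovers the ML estimate, which is exactly the behavior the next slide plots for ϕ² = 2.0, 1.0, and 0.5.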

11 Model P1: Shrinkage. [Figure: shrinkage of estimates under the SB model with μ = 0 and ϕ² = 2.0, 1.0, and 0.5.]

12 The problem with model P1. It assumes the prior variance is known with certainty, giving constant shrinkage of all coefficients. Sensitivity analyses can show how results change under different prior variances, but the data themselves contain information on the prior variance.

13 2: A richer parametric model (P2). Places a prior distribution on ϕ², reducing dependence on the choice of prior variance. A prior could be placed on μ as well (in some situations).

14 Properties of model P2. The prior distribution on ϕ² allows it to be updated by the data: as the variability of the estimates around the prior mean increases, so does ϕ², and as that variability decreases, so does ϕ². The result is adaptive shrinkage of all coefficients.

15 Posterior Sampling for Model P2. In the simple Normal-Normal setting with ϕ² ~ Inv-Gamma(α₁, α₂), the conditional posteriors are β | ϕ², y ~ N(m, V) as in model P1, and ϕ² | β ~ Inv-Gamma(α₁ + p/2, α₂ + Σⱼ(βⱼ − μ)²/2).
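These two conditional updates are all a Gibbs sampler for P2 needs. The sketch below (illustrative, not the authors' code) uses the simpler normal-means form, treating hypothetical ML estimates b_j with known squared standard errors s2_j as the data:

```python
import numpy as np

def p2_gibbs(b, s2, a1=3.39, a2=1.33, iters=2000, seed=1):
    """Gibbs sampler: b_j ~ N(beta_j, s2_j), beta_j ~ N(0, phi2), phi2 ~ Inv-Gamma(a1, a2)."""
    rng = np.random.default_rng(seed)
    p = len(b)
    phi2 = a2 / (a1 - 1)                      # start at the prior mean of phi2
    beta_draws = np.empty((iters, p))
    for t in range(iters):
        # beta_j | phi2, b_j: conjugate normal update (prior mean 0)
        post_var = 1.0 / (1.0 / s2 + 1.0 / phi2)
        post_mean = post_var * (b / s2)
        beta = rng.normal(post_mean, np.sqrt(post_var))
        # phi2 | beta: inverse-gamma update (sample Gamma, then invert)
        phi2 = 1.0 / rng.gamma(a1 + p / 2.0, 1.0 / (a2 + 0.5 * np.sum(beta**2)))
        beta_draws[t] = beta
    return beta_draws

b = np.array([0.9, -0.2, 0.1, 1.1])   # hypothetical ML log-odds estimates
s2 = np.full(4, 0.25)                 # hypothetical squared standard errors
draws = p2_gibbs(b, s2)
```

Because ϕ² is itself updated each iteration, the amount of shrinkage adapts to how spread out the β's are, which is the "adaptive shrinkage" contrasted with P1 on the next slides.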

16 Adaptive Shrinkage of Model P2. Model / prior variance / ϕ² given data / shrinkage: P1 has a fixed prior variance and constant shrinkage; P2 has a random prior variance, and when ϕ² | data decreases, shrinkage increases.

17 Adaptive Shrinkage of Model P2. Model / prior variance / ϕ² given data / shrinkage: P1 has a fixed prior variance and constant shrinkage; P2 has a random prior variance, and when ϕ² | data increases, shrinkage decreases.

18 The Problem with Model P2. How sure are we of our parametric specification of the prior? Can we do better by grouping coefficients into clusters and then shrinking the cluster-specific coefficients separately, so that the amount of shrinkage varies by coefficient?

19 Clustering Coefficients

20 3: Dirichlet Process Priors. A popular Bayesian non-parametric approach. Rather than specifying β_j ~ N(μ, ϕ²), we specify β_j ~ D, where D is an unknown distribution. D needs a prior distribution: D ~ DP(λ, D₀), where D₀ is a base distribution such as N(μ, ϕ²) and λ is a precision parameter; as λ gets large, D converges to D₀.

21 Dirichlet Process Prior. An extension of the finite mixture model David presented last week: as the number of components k becomes infinitely large, the finite mixture specification becomes equivalent to a DPP.

22 Equivalent Representations of the DPP. The Polya urn representation, and the stick-breaking representation, in which D = Σ_k w_k δ_{θ_k} with w_k = v_k Π_{l<k}(1 − v_l), v_k ~ Beta(1, λ), and atoms θ_k ~ D₀. Here P is the number of coefficients and D₀ is a Normal-Inverse-Gamma base distribution.
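A truncated stick-breaking draw makes the role of λ concrete. This is an illustrative sketch (with a generic N(0,1) base rather than the talk's Normal-Inverse-Gamma): small λ concentrates the weight on a few atoms, large λ spreads it out so D looks like D₀.

```python
import numpy as np

def stick_breaking(lam, k=200, seed=0):
    """Truncated stick-breaking draw from DP(lam, D0) with D0 = N(0, 1)."""
    rng = np.random.default_rng(seed)
    v = rng.beta(1.0, lam, size=k)                 # stick-breaking proportions
    leftover = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    w = v * leftover                               # mixture weights
    atoms = rng.normal(0.0, 1.0, size=k)           # atoms drawn from D0
    return w, atoms

w_small, _ = stick_breaking(lam=1.0)               # a few dominant atoms
w_large, _ = stick_breaking(lam=100.0)             # weight spread thinly over many atoms
```

The inverse Simpson index 1/Σw² gives a rough "effective number of atoms", which grows with λ, matching the λ = 1 versus λ = 100 realizations shown on the next slide.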

23 Realizations from a Dirichlet Process. [Figure: realizations of D for λ = 1 and λ = 100, with base distribution D₀ = N(0, 1).]

24 Dirichlet Process Prior. The discrete nature of the DPP implies clustering, and the probability of clustering increases as λ decreases. In this application, we want to cluster coefficients. This is soft clustering: coefficients are clustered at each iteration of the Gibbs sampler, not assumed to be clustered together with certainty.

25 Dirichlet Process Prior. [Figure: prior for β₁ for a given D₀ and given β₂ through β₁₀.]

26 Posterior Inference for the DPP. Use the Polya urn scheme: each β_j is drawn either fresh from the base distribution or from the existing values of the other coefficients, with weights proportional to λ and to the cluster sizes.

27 Posterior Inference for the DPP. Coefficients are assigned to clusters based on the weights w₀ⱼ through w_Pⱼ. After assignment, the cluster-specific coefficients are updated to improve mixing. The DPP precision parameter λ can be made random as well.
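Marginalizing over D, the Polya urn weights define a Chinese-restaurant-process prior on partitions of the coefficients. The sketch below (illustrative only; real posterior sampling would also weight by the likelihood) simulates prior partitions of 32 coefficients to show how λ controls the number of clusters:

```python
import numpy as np

def crp_partition(p, lam, seed=0):
    """Draw a prior partition of p coefficients from the Polya urn / CRP with precision lam."""
    rng = np.random.default_rng(seed)
    labels = [0]
    for i in range(1, p):
        counts = np.bincount(labels)
        # join existing cluster c with prob n_c/(i+lam); open a new one with prob lam/(i+lam)
        probs = np.append(counts, lam) / (i + lam)
        labels.append(rng.choice(len(probs), p=probs))
    return np.array(labels)

few = crp_partition(32, lam=0.5)     # small lambda: coefficients pile into few clusters
many = crp_partition(32, lam=50.0)   # large lambda: most coefficients get their own cluster
```

This is the sense in which the probability of clustering increases as λ decreases.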

28 4: Dirichlet Process Prior with Variable Selection. A minor modification to the Dirichlet process prior model. We may desire a more parsimonious model: if some DBPs have no effect, we would prefer to eliminate them from the model, and forward/backward selection results in inappropriate confidence intervals.

29 Dirichlet Process Prior with Variable Selection. We incorporate a selection model in the Dirichlet process's base distribution: π is the probability that a coefficient has no effect (a point mass at zero), and (1 − π) is the probability that it is drawn from N(μ, ϕ²).

30 Dirichlet Process with Variable Selection. A coefficient is equal to zero (no effect) with probability π; a priori, we expect this to happen (π × 100)% of the time. We place a prior distribution on π to allow the data to guide inference.
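The modified base distribution is a spike-and-slab mixture. A minimal sketch of draws from it (parameter values are illustrative; the talk's analysis also places Beta priors on π and an inverse-gamma prior on ϕ²):

```python
import numpy as np

def draw_base(n, pi, mu=0.0, phi2=0.31, seed=0):
    """Draws from D0 = pi * delta_0 + (1 - pi) * N(mu, phi2)."""
    rng = np.random.default_rng(seed)
    is_null = rng.random(n) < pi                   # point mass at zero with prob pi
    slab = rng.normal(mu, np.sqrt(phi2), size=n)   # slab draw otherwise
    return np.where(is_null, 0.0, slab)

samples = draw_base(10000, pi=0.5)
```

With π = 0.5 about half the draws are exactly zero, so coefficients can cluster at the null while the rest are shrunk toward μ as before.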

31 Posterior Inference. Gibbs sampling proceeds as in the previous model, except that the weights are modified: there is an additional weight for the null cluster.

32 Dirichlet Process Prior with Variable Selection. [Figure: prior for β₁ for a given D₀ and given β₂ through β₁₀.]

33 Simulations. Four hierarchical models: how do they compare? The increased complexity of these hierarchical models seems to make sense, but what does it gain us? Simulated datasets of size n = 500.

34 MSE of Hierarchical Models

35 Example: Spontaneous Abortion and Disinfection By-Products. Pregnancy loss prior to 20 weeks of gestation. Very common (>30% of all pregnancies), yet relatively little is known about its causes: maternal age, smoking, prior pregnancy loss, occupational exposures, caffeine, and disinfection by-products (DBPs).

36 Disinfection By-Products (DBPs). A vast array of DBPs are formed in the disinfection process. We focus on 2 main types: trihalomethanes (THMs): CHCl3, CHBr3, CHCl2Br, CHClBr2; and haloacetic acids (HAAs): ClAA, Cl2AA, Cl3AA, BrAA, Br2AA, Br3AA, BrClAA, Br2ClAA, BrCl2AA.

37 Specific Aim. To estimate the effect of each of the 13 constituent DBPs (4 THMs and 9 HAAs) on SAB. The problem: the DBPs are very highly correlated; for example, ρ = 0.91 between Cl2AA and Cl3AA.

38 Right From the Start. Enrolled 2507 women from three metropolitan areas in the US, 2001-2004. Recruitment: prenatal care practices (52%), health department (32%), promotional mailings (3%), drug stores, referral, etc. (13%).

39 Preliminary Analysis. Discrete-time hazard model including all 13 DBPs (categorized into 32 coefficients), with time to event measured in gestational weeks until loss. The α's are week-specific intercepts (weeks 5…20); the z's are confounders (smoking, alcohol use, ethnicity, maternal age); x_kij is the concentration of the kth category of DBP for the ith individual in the jth week.
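A discrete-time hazard model like this is typically fit as a logistic regression on an expanded person-week dataset: each pregnancy contributes one Bernoulli row per week at risk, with the event indicator set only in the week of loss. A sketch of that expansion step (names and data are illustrative, not from the study):

```python
def person_week_rows(gest_week_at_end, lost, first_week=5, last_week=20):
    """Expand one pregnancy into per-week at-risk rows of (week, event)."""
    rows = []
    final = min(gest_week_at_end, last_week)
    for week in range(first_week, final + 1):
        event = int(lost and week == gest_week_at_end)  # event only in the week of loss
        rows.append((week, event))
    return rows

# a pregnancy lost at week 9 contributes weeks 5-9, with the event in week 9
rows = person_week_rows(9, lost=True)
```

Fitting a logistic model to the expanded rows, with week dummies for the α's plus the confounders and DBP categories, gives the discrete-time hazard estimates.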

40 Results of Logistic Regression

41 Several large but imprecise effects are seen; 4 of 32 coefficients are statistically significant. The imprecision makes us question the results and motivates a better analytic approach.

42 DBPs and SAB: model P1. Little prior evidence of an effect, so specify μ = 0. Calculate ϕ² from the existing literature: the largest plausible effect is OR = 3.0, so ϕ = (ln(3.0) − ln(1/3)) / (2 × 1.96) = 0.56, giving ϕ² = 0.3142.
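The slide's prior-variance arithmetic in full: if 95% of log odds ratios should fall between ln(1/3) and ln(3), that range spans 2 × 1.96 prior standard deviations, and ϕ² is the square of the resulting SD.

```python
import math

# 95% of effects assumed between OR = 1/3 and OR = 3 on the log-odds scale
phi = (math.log(3.0) - math.log(1.0 / 3.0)) / (2 * 1.96)  # prior standard deviation
phi2 = phi ** 2                                            # prior variance, ~0.3142
```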

43 Semi-Bayes Results. [Figure: red = ML estimates, black = SB estimates.]

44 DBPs and SAB: Model P2. μ = 0 and ϕ² is random. Choose α₁ = 3.39, α₂ = 1.33, so that E(ϕ²) = 0.31 (as in the semi-Bayes analysis) and V(ϕ²) = 0.07 (at ϕ²'s 95th percentile, 95% of the β's fall between OR = 6 and OR = 1/6, the most extreme effects we believe possible).

45 Fully-Bayes Results. [Figure: red = ML and semi-Bayes estimates, black = fully-Bayes estimates.]

46 DBPs and SAB: Dirichlet Process Priors. μ = 0, α₁ = 3.39, α₂ = 1.33; ν₁ = 1, ν₂ = 1, an uninformative choice for λ.

47 Dirichlet Process Priors Results

48 DBPs and SAB: Dirichlet Process Priors with Selection Component. μ = 0, α₁ = 3.39, α₂ = 1.33, ν₁ = 1, ν₂ = 1; ω₁ = 1.5, ω₂ = 1.5, giving E(π) = 0.5 with 95% CI (0.01, 0.99).

49 Selection Component Results

50 Conclusions (Hierarchical Models). Semi-Bayes: assumes β is random. Fully-Bayes: also assumes ϕ² is random. Dirichlet process: assumes the prior distribution itself is random. Dirichlet process with selection component: assumes the prior distribution is random and allows coefficients to cluster at the null. Performance (MSE) can improve with increasing complexity.

51 Conclusions (DBPs and SAB). Semi-Bayes models provided the least shrinkage; Dirichlet process models, the most. These results are in contrast to previous research: we find very little evidence of an effect of any constituent DBP on SAB.

52 Future Directions. Enormous-dimensional data (e.g., SNPs): cluster effects to reduce dimensions. Algorithmic problems in large datasets: the retrospective DP.

