
Bayesian Parametric and Semi-Parametric Hierarchical Models: An Application to Disinfection By-Products and Spontaneous Abortion. Rich MacLehose, November 9th, 2006

Outline
1. Brief introduction to hierarchical models
2. Introduce 2 'standard' parametric models
3. Extend them with 2 semi-parametric models
4. Applied example of disinfection by-products and spontaneous abortion

Bayesian Hierarchical Models
A natural way to model many applied problems
The coefficients μ_i are assumed exchangeable, arising from a common prior distribution G
G may depend on further coefficients that extend the hierarchy

Bayesian Hierarchical Models
Researchers are frequently interested in estimating a large number of coefficients, possibly with substantial correlation among the predictors.
Hierarchical models can be particularly useful here because they allow 'borrowing' of information between groups.

Bayesian Hierarchical Models
For example, we wish to estimate the effect of 13 chemicals on early pregnancy loss. The chemicals are highly correlated, making standard frequentist methods unstable.

Hierarchical Regression Models
Traditional models treat the outcome as random.
Hierarchical models also treat the coefficients as random.

Some Parametric and Semi-Parametric Bayesian Hierarchical Models
1. Simplest hierarchical model (P1)
2. Fully Bayesian hierarchical model (P2)
3. Dirichlet process prior
4. Dirichlet process prior with selection component

1: The First Parametric Model (P1)
A simple "one-level" hierarchical model
Popularized by Greenland in epidemiology
– he refers to it as "semi-Bayes"
– "semi-Bayes" may also refer to the asymptotic methods commonly used in fitting such models
Has seen use in nutritional, genetic, occupational, and cancer research

Hierarchical Models: Bayes and Shrinkage
μ is our prior belief about the size of the effect, and ϕ² is our uncertainty in it.
Effect estimates from hierarchical models are 'shrunk' (moved) toward the prior distribution.
Shrinkage:
– for the Bayesian: a natural consequence of combining the prior with the data
– for the frequentist: introducing bias to reduce MSE (biased but more precise)
The amount of shrinkage depends on the prior variance.
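
In the simplest scalar case the shrinkage is explicit (a standard result, stated here for intuition, not from the original slide): with a single estimate b of variance v and prior β ~ N(μ, ϕ²), the posterior mean is the precision-weighted average (b/v + μ/ϕ²) / (1/v + 1/ϕ²), so a smaller prior variance ϕ² pulls the estimate more strongly toward μ.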

Hierarchical Models: Bayes and Shrinkage
In the simple Normal-Normal setting,
y ~ N(Xβ, σ²I), β ~ N(μ1, ϕ²I)
so the posterior is
β | y ~ N(m, V)
where
V = (X'X/σ² + I/ϕ²)⁻¹ and m = V(X'y/σ² + μ1/ϕ²)
and I is the p×p identity matrix. The y_i may be either continuous responses or imputed latent responses (via Albert and Chib).
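
A minimal numerical sketch of this conjugate update (function and variable names are illustrative, not from the talk; assumes known σ² and a common prior mean μ):

```python
import numpy as np

def normal_normal_posterior(X, y, sigma2, mu, phi2):
    """Posterior of beta under y ~ N(X beta, sigma2*I), beta ~ N(mu*1, phi2*I)."""
    p = X.shape[1]
    V = np.linalg.inv(X.T @ X / sigma2 + np.eye(p) / phi2)   # posterior covariance
    m = V @ (X.T @ y / sigma2 + np.full(p, mu) / phi2)       # posterior mean
    return m, V
```

Shrinkage is visible directly: as phi2 shrinks toward zero, m is pulled toward mu; as phi2 grows, m approaches the least-squares estimate.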

Model P1: Shrinkage
[Figure: shrinkage under the semi-Bayes model with μ = 0 and ϕ² = 2.0, 1.0, and 0.5]

The Problem with Model P1
Assumes the prior variance is known with certainty
– constant shrinkage of all coefficients
Sensitivity analyses can show how results change with different prior variances
But the data themselves contain information about the prior variance

2: A Richer Parametric Model (P2)
Places a prior distribution on ϕ²
– reduces dependence on any single choice of prior variance
Could place a prior on μ as well (in some situations)

Properties of Model P2
The prior distribution on ϕ² allows it to be updated by the data:
– as the variability of estimates around the prior mean increases, so does ϕ²
– as the variability of estimates around the prior mean decreases, so does ϕ²
The result is adaptive shrinkage of all coefficients.

Posterior Sampling for Model P2
In the simple Normal-Normal setting, with ϕ² ~ Inverse-Gamma(α_1, α_2), the full conditional posteriors are:
β | ϕ², y ~ N(m, V), with V = (X'X/σ² + I/ϕ²)⁻¹ and m = V(X'y/σ² + μ1/ϕ²)
ϕ² | β ~ Inverse-Gamma(α_1 + p/2, α_2 + ½ Σ_j (β_j − μ)²)
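
A sketch of the resulting Gibbs sampler under these two conditionals (a minimal sketch, assuming known σ²; all names are illustrative):

```python
import numpy as np

def gibbs_p2(X, y, sigma2, mu, a1, a2, n_iter=2000, seed=0):
    """Gibbs sampler sketch for model P2:
    beta ~ N(mu*1, phi2*I), phi2 ~ Inverse-Gamma(a1, a2)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    phi2 = a2 / a1                      # crude starting value
    XtX, Xty = X.T @ X, X.T @ y
    draws = []
    for _ in range(n_iter):
        # beta | phi2, y: the conjugate normal update from the previous slide
        V = np.linalg.inv(XtX / sigma2 + np.eye(p) / phi2)
        m = V @ (Xty / sigma2 + np.full(p, mu) / phi2)
        beta = rng.multivariate_normal(m, V)
        # phi2 | beta: inverse-gamma update (draw gamma, invert)
        shape = a1 + p / 2.0
        rate = a2 + 0.5 * np.sum((beta - mu) ** 2)
        phi2 = 1.0 / rng.gamma(shape, 1.0 / rate)
        draws.append((beta, phi2))
    return draws
```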

Adaptive Shrinkage of Model P2

Model   Prior variance   ϕ² given data   Shrinkage
P1      fixed            (fixed)         constant
P2      random           decreases       increases
P2      random           increases       decreases

The Problem with Model P2
How sure are we of our parametric specification of the prior?
Can we do better by grouping coefficients into clusters and then shrinking the cluster-specific coefficients separately?
– the amount of shrinkage would then vary by coefficient

Clustering Coefficients

3: Dirichlet Process Priors
A popular Bayesian non-parametric approach
Rather than specifying β_j ~ N(μ, ϕ²), we specify β_j ~ D
– D is an unknown distribution
– D needs a prior distribution: D ~ DP(λ, D_0)
D_0 is a base distribution, such as N(μ, ϕ²)
λ is a precision parameter; as λ gets large, D converges to D_0

Dirichlet Process Prior
An extension of the finite mixture model David presented last week:
β_j ~ Σ_{h=1}^k π_h δ(θ_h), with θ_h ~ D_0 and (π_1, …, π_k) ~ Dirichlet(λ/k, …, λ/k)
As k becomes infinitely large, this specification becomes equivalent to a DPP.

Equivalent Representations of DPP
Pólya urn representation:
β_j | β_1, …, β_{j−1} ~ (λ/(λ + j − 1)) D_0 + Σ_{l<j} (1/(λ + j − 1)) δ(β_l)
Stick-breaking representation:
D = Σ_h π_h δ(θ_h), with π_h = V_h Π_{l<h}(1 − V_l), V_h ~ Beta(1, λ), θ_h ~ D_0
where P is the number of coefficients and D_0 is a Normal–Inverse-Gamma base distribution.
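
A sketch of a truncated draw from DP(λ, D_0) via stick-breaking (the truncation level and all names are illustrative):

```python
import numpy as np

def stick_breaking_draw(lam, base_draw, truncation=200, rng=None):
    """Truncated stick-breaking realization of D ~ DP(lam, D0):
    returns weights pi_h and atoms theta_h with D ~= sum_h pi_h * delta(theta_h)."""
    rng = rng or np.random.default_rng()
    v = rng.beta(1.0, lam, size=truncation)                     # V_h ~ Beta(1, lam)
    pi = v * np.cumprod(np.concatenate(([1.0], 1.0 - v[:-1])))  # pi_h = V_h * prod_{l<h}(1 - V_l)
    theta = base_draw(truncation, rng)                          # atoms drawn from D0
    return pi, theta

# Example with D0 = N(0, 1): a small lam puts most weight on a few atoms
pi, theta = stick_breaking_draw(1.0, lambda k, rng: rng.normal(0.0, 1.0, size=k))
```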

Realizations from a Dirichlet Process
[Figure: draws from DP(λ, D_0) with D_0 = N(0,1), for λ = 1 and λ = 100]

Dirichlet Process Prior
The discrete nature of the DPP implies clustering
The probability of clustering increases as λ decreases
In this application, we want to cluster coefficients
Soft clustering: coefficients are clustered at each iteration of the Gibbs sampler, not assumed to be clustered together with certainty

Dirichlet Process Prior
[Figure: prior for β_1 for a given D_0 and β_2 to β_10]

Posterior Inference for DPP
Use the Pólya urn scheme: conditional on all the other coefficients, β_j is drawn from a mixture of the base-distribution posterior and point masses at the other coefficients' values,
β_j | β_(−j), y ~ w_0j G_0j + Σ_{l≠j} w_lj δ(β_l)
where w_0j is proportional to λ times the marginal likelihood of the data under a new draw from D_0, w_lj is proportional to the likelihood of the data at β_l, and G_0j is the posterior of β_j under base distribution D_0.

Posterior Inference for DPP
Coefficients are assigned to clusters based on the weights w_0j to w_Pj.
After assignment, cluster-specific coefficients are updated to improve mixing.
The DPP precision parameter λ can be treated as random as well.
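
A conceptual sketch of this assignment step (the helpers marg_lik_new and lik_at are hypothetical stand-ins for the model-specific likelihood computations, which the talk does not spell out):

```python
import numpy as np

def polya_urn_assign(j, beta, lam, marg_lik_new, lik_at, rng=None):
    """One cluster-assignment step for coefficient j in a Polya-urn Gibbs
    sampler: weight lam * (marginal likelihood under a fresh draw from D0)
    for opening a new cluster, versus the likelihood of the data at each
    other coefficient's current value."""
    rng = rng or np.random.default_rng()
    others = [l for l in range(len(beta)) if l != j]
    w = np.array([lam * marg_lik_new()] + [lik_at(beta[l]) for l in others])
    w = w / w.sum()                       # normalize to assignment probabilities
    k = rng.choice(len(w), p=w)
    return "new" if k == 0 else others[k - 1]
```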

4: Dirichlet Process Prior with Variable Selection
A minor modification to the Dirichlet process prior model
We may desire a more parsimonious model
– if some DBPs have no effect, we would prefer to eliminate them from the model
– forward/backward selection results in inappropriate confidence intervals

Dirichlet Process Prior with Variable Selection
We incorporate a selection model in the Dirichlet process's base distribution:
D_0 = π δ(0) + (1 − π) N(μ, ϕ²)
π is the probability that a coefficient has no effect; (1 − π) is the probability that it is N(μ, ϕ²).

Dirichlet Process with Variable Selection
A coefficient is equal to zero (no effect) with probability π
A priori, we expect this to happen (π × 100)% of the time
We place a prior distribution on π to allow the data to guide inference
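
A minimal sketch of sampling from this base distribution (names illustrative; in the full model π itself receives a prior, as above, with the talk later choosing ω_1 = ω_2 = 1.5):

```python
import numpy as np

def base_with_null(pi, mu, phi2, size, rng=None):
    """Draw from D0 = pi * delta(0) + (1 - pi) * N(mu, phi2):
    a point mass at zero with probability pi, otherwise a normal draw."""
    rng = rng or np.random.default_rng()
    draws = rng.normal(mu, np.sqrt(phi2), size)
    draws[rng.random(size) < pi] = 0.0   # null cluster: exactly zero
    return draws
```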

Posterior Inference
Gibbs sampling proceeds as in the previous model, except that the weights are modified: there is an additional weight for the null cluster at zero.

Dirichlet Process Prior with Variable Selection
[Figure: prior for β_1 for a given D_0 and β_2 to β_10]

Simulations
Four hierarchical models: how do they compare?
The increased complexity of these hierarchical models seems to make sense, but what does it gain us?
Simulated datasets of size n = 500

MSE of Hierarchical Models

Example: Spontaneous Abortion and Disinfection By-Products
Spontaneous abortion (SAB): pregnancy loss prior to 20 weeks of gestation
Very common (>30% of all pregnancies)
Relatively little is known about its causes:
– maternal age, smoking, prior pregnancy loss, occupational exposures, caffeine
– disinfection by-products (DBPs)

Disinfection By-Products (DBPs)
A vast array of DBPs are formed in the disinfection process
We focus on 2 main types:
– trihalomethanes (THMs): CHCl3, CHBr3, CHCl2Br, CHClBr2
– haloacetic acids (HAAs): ClAA, Cl2AA, Cl3AA, BrAA, Br2AA, Br3AA, BrClAA, Br2ClAA, BrCl2AA

Specific Aim
To estimate the effect of each of the 13 constituent DBPs (4 THMs and 9 HAAs) on SAB
The problem: DBPs are very highly correlated
– for example: ρ = 0.91 between Cl2AA and Cl3AA

Right From the Start
Enrolled 2507 women from three metropolitan areas in the US
Recruitment:
– prenatal care practices (52%)
– health department (32%)
– promotional mailings (3%)
– drug stores, referral, etc. (13%)

Preliminary Analysis
Discrete-time hazard model including all 13 DBPs (categorized into 32 coefficients)
– time to event: gestational weeks until loss
logit P(loss in week j | no loss before week j) = α_j + γ'z_i + Σ_k β_k x_kij
The α's are week-specific intercepts (weeks 5…20)
The z's are confounders: smoking, alcohol use, ethnicity, maternal age
x_kij is the concentration of the kth category of DBP for the ith individual in the jth week
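
A discrete-time hazard model of this kind can be fit as an ordinary logistic regression on person-week records; a minimal sketch with hypothetical column names:

```python
import pandas as pd

# Each woman contributes one row per gestational week at risk (weeks 5-20),
# with event = 1 only in the week a loss occurs.
person_weeks = pd.DataFrame({
    "week":   [5, 6, 7, 5, 6],            # gestational week -> alpha_j intercepts
    "event":  [0, 0, 1, 0, 0],            # 1 = spontaneous abortion that week
    "smoke":  [1, 1, 1, 0, 0],            # one confounder z
    "dbp_x1": [0.3, 0.3, 0.3, 0.1, 0.1],  # one DBP concentration category x_kij
})

# logit P(event_ij = 1) = alpha_j + gamma * z_i + beta * x_ij
# e.g., with statsmodels:
#   import statsmodels.formula.api as smf
#   fit = smf.logit("event ~ C(week) + smoke + dbp_x1", data=person_weeks).fit()
```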

Results of Logistic Regression

Several large but imprecise effects are seen
4 of 32 coefficients are statistically significant
The imprecision makes us question these results
– motivating a better analytic approach

DBPs and SAB: Model P1
Little prior evidence of an effect: specify μ = 0
Calculate ϕ² from the existing literature; largest plausible effect: OR = 3.0
ϕ = (ln(3.0) − ln(1/3)) / (2 × 1.96) = 0.5605, so ϕ² = 0.3142
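
To spell out the calculation: if 95% of plausible effects lie between OR = 1/3 and OR = 3, the ±1.96ϕ prior interval on the log-odds scale must span ln(3) − ln(1/3) = 2 ln(3) ≈ 2.197, giving ϕ ≈ 2.197/3.92 ≈ 0.5605 and hence ϕ² ≈ 0.3142.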

Semi-Bayes Results
[Figure: red = ML estimates, black = semi-Bayes estimates]

DBPs and SAB: Model P2
μ = 0
ϕ² is random: choose α_1 = 3.39, α_2 = 1.33
– E(ϕ²) = 0.31 (as in the semi-Bayes analysis)
– V(ϕ²) = 0.07 (at ϕ²'s 95th percentile, 95% of the β's will fall between OR = 6 and OR = 1/6 … the most extreme we believe to be possible)

Fully-Bayes Results
[Figure: red = ML and semi-Bayes estimates, black = fully-Bayes estimates]

DBPs and SAB: Dirichlet Process Priors
μ = 0, α_1 = 3.39, α_2 = 1.33
ν_1 = 1, ν_2 = 1 ⇒ an uninformative choice for λ

Dirichlet Process Priors Results

DBPs and SAB: Dirichlet Process Priors with Selection Component
μ = 0, α_1 = 3.39, α_2 = 1.33, ν_1 = 1, ν_2 = 1
ω_1 = 1.5, ω_2 = 1.5 ⇒ E(π) = 0.5, 95% CI (0.01, 0.99)

Selection Component Results

Conclusions (Hierarchical Models)
Semi-Bayes: assumes β is random
Fully-Bayes: assumes ϕ² is also random
Dirichlet process: assumes the prior distribution itself is random
Dirichlet process with selection component: assumes the prior distribution is random and allows coefficients to cluster at the null
Performance (MSE) can improve with this increasing complexity

Conclusions (DBPs and SAB)
Semi-Bayes models provided the least shrinkage; Dirichlet process models, the most
These results are in contrast to previous research
Very little evidence of an effect of any constituent DBP on SAB

Future Directions
Enormous-dimensional data
– e.g., SNPs
– cluster effects to reduce dimensions
Algorithmic problems in large datasets
– the retrospective DP