Bayesian Interval Mapping

Slides:



Advertisements
Similar presentations
MCMC estimation in MlwiN
Advertisements

METHODS FOR HAPLOTYPE RECONSTRUCTION
Bayesian Estimation in MARK
Ch 11. Sampling Models Pattern Recognition and Machine Learning, C. M. Bishop, Summarized by I.-H. Lee Biointelligence Laboratory, Seoul National.
1 QTL mapping in mice Lecture 10, Statistics 246 February 24, 2004.
April 2010UW-Madison © Brian S. Yandell1 Bayesian QTL Mapping Brian S. Yandell University of Wisconsin-Madison UW-Madison,
Model SelectionSeattle SISG: Yandell © QTL Model Selection 1.Bayesian strategy 2.Markov chain sampling 3.sampling genetic architectures 4.criteria.
Sessão Temática 2 Análise Bayesiana Utilizando a abordagem Bayesiana no mapeamento de QTL´s Roseli Aparecida Leandro ESALQ/USP 11 o SEAGRO / 50ª RBRAS.
End of Chapter 8 Neil Weisenfeld March 28, 2005.
Bayes Factor Based on Han and Carlin (2001, JASA).
Modeling Menstrual Cycle Length in Pre- and Peri-Menopausal Women Michael Elliott Xiaobi Huang Sioban Harlow University of Michigan School of Public Health.
April 2008UW-Madison © Brian S. Yandell1 Bayesian QTL Mapping Brian S. Yandell University of Wisconsin-Madison UW-Madison,
Model Inference and Averaging
CSC321: 2011 Introduction to Neural Networks and Machine Learning Lecture 11: Bayesian learning continued Geoffrey Hinton.
Bayesian MCMC QTL mapping in outbred mice Andrew Morris, Binnaz Yalcin, Jan Fullerton, Angela Meesaq, Rob Deacon, Nick Rawlins and Jonathan Flint Wellcome.
Experimental Design and Data Structure Supplement to Lecture 8 Fall
October 2007Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison
October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison
Lecture 3: Statistics Review I Date: 9/3/02  Distributions  Likelihood  Hypothesis tests.
BayesNCSU QTL II: Yandell © Bayesian Interval Mapping 1.what is Bayes? Bayes theorem? Bayesian QTL mapping Markov chain sampling18-25.
Tutorial I: Missing Value Analysis
Lecture 23: Quantitative Traits III Date: 11/12/02  Single locus backcross regression  Single locus backcross likelihood  F2 – regression, likelihood,
1 Chapter 8: Model Inference and Averaging Presented by Hui Fang.
CS Statistical Machine learning Lecture 25 Yuan (Alan) Qi Purdue CS Nov
Statistical Methods. 2 Concepts and Notations Sample unit – the basic landscape unit at which we wish to establish the presence/absence of the species.
13 October 2004Statistics: Yandell © Inferring Genetic Architecture of Complex Biological Processes Brian S. Yandell 12, Christina Kendziorski 13,
April 2008UW-Madison © Brian S. Yandell1 Bayesian QTL Mapping Brian S. Yandell University of Wisconsin-Madison UW-Madison,
September 2002Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Quantitative Trait Loci using Markov chain Monte Carlo in Experimental Crosses.
April 2008Stat 877 © Brian S. Yandell1 Reversible Jump Details Brian S. Yandell University of Wisconsin-Madison April.
ModelNCSU QTL II: Yandell © Model Selection for Multiple QTL 1.reality of multiple QTL3-8 2.selecting a class of QTL models comparing QTL models16-24.
I. Statistical Methods for Genome-Enabled Prediction of Complex Traits OUTLINE THE CHALLENGES OF PREDICTING COMPLEX TRAITS ORDINARY LEAST SQUARES (OLS)
Bayesian Variable Selection in Semiparametric Regression Modeling with Applications to Genetic Mappping Fei Zou Department of Biostatistics University.
Generalization Performance of Exchange Monte Carlo Method for Normal Mixture Models Kenji Nagata, Sumio Watanabe Tokyo Institute of Technology.
Markov Chain Monte Carlo in R
Plant Microarray Course
PSY 626: Bayesian Statistics for Psychological Science
Bayesian Semi-Parametric Multiple Shrinkage
MCMC Output & Metropolis-Hastings Algorithm Part I
MCMC Stopping and Variance Estimation: Idea here is to first use multiple Chains from different initial conditions to determine a burn-in period so the.
Identifying QTLs in experimental crosses
Bayesian data analysis
Model Inference and Averaging
Model Selection for Multiple QTL
Linear and generalized linear mixed effects models
Jun Liu Department of Statistics Stanford University
Bayesian Model Selection for Multiple QTL
Bayesian Inference for QTLs in Inbred Lines I
Gene mapping in mice Karl W Broman Department of Biostatistics
Bayesian inference Presented by Amir Hadadi
PSY 626: Bayesian Statistics for Psychological Science
Remember that our objective is for some density f(y|) for observations where y and  are vectors of data and parameters,  being sampled from a prior.
Bayesian Model Selection for Quantitative Trait Loci with Markov chain Monte Carlo in Experimental Crosses Brian S. Yandell University of Wisconsin-Madison.
Inferring Genetic Architecture of Complex Biological Processes Brian S
Jax Workshop © Brian S. Yandell
Bayesian Inference for QTLs in Inbred Lines
Igor V. Cadez, Padhraic Smyth, Geoff J. Mclachlan, Christine and E
Bayesian Interval Mapping
Bayesian Model Selection for Multiple QTL
Where did we stop? The Bayes decision rule guarantees an optimal classification… … But it requires the knowledge of P(ci|x) (or p(x|ci) and P(ci)) We.
Seattle SISG: Yandell © 2007
Inferring Genetic Architecture of Complex Biological Processes Brian S
Seattle SISG: Yandell © 2006
Note on Outbred Studies
Brian S. Yandell University of Wisconsin-Madison
PSY 626: Bayesian Statistics for Psychological Science
Robust Full Bayesian Learning for Neural Networks
Seattle SISG: Yandell © 2009
Opinionated Lessons #39 MCMC and Gibbs Sampling in Statistics
Ch 3. Linear Models for Regression (2/2) Pattern Recognition and Machine Learning, C. M. Bishop, Previously summarized by Yung-Kyun Noh Updated.
Bayesian Interval Mapping
Presentation transcript:

Bayesian Interval Mapping Bayesian strategy Markov chain sampling sampling genetic architectures criteria for model selection QTL 2: Bayes Seattle SISG: Yandell © 2010

QTL model selection: key players observed measurements y = phenotypic trait m = markers & linkage map i = individual index (1,…,n) missing data missing marker data q = QT genotypes alleles QQ, Qq, or qq at locus unknown quantities  = QT locus (or loci)  = phenotype model parameters  = QTL model/genetic architecture pr(q|m,,) genotype model grounded by linkage map, experimental cross recombination yields multinomial for q given m pr(y|q,,) phenotype model distribution shape (assumed normal here) unknown parameters  (could be non-parametric) m y q    after Sen Churchill (2001) QTL 2: Bayes Seattle SISG: Yandell © 2010

1. Bayesian strategy for QTL study augment data (y,m) with missing genotypes q study unknowns (,,) given augmented data (y,m,q) find better genetic architectures  find most likely genomic regions = QTL =  estimate phenotype parameters = genotype means =  sample from posterior in some clever way multiple imputation (Sen Churchill 2002) Markov chain Monte Carlo (MCMC) (Satagopan et al. 1996; Yi et al. 2005, 2007) QTL 2: Bayes Seattle SISG: Yandell © 2010

Bayes posterior for normal data actual mean actual mean prior mean prior mean small prior variance large prior variance QTL 2: Bayes Seattle SISG: Yandell © 2010

Bayes posterior for normal data model yi =  + ei environment e ~ N( 0, 2 ), 2 known likelihood y ~ N( , 2 ) prior  ~ N( 0, 2 ),  known posterior: mean tends to sample mean single individual  ~ N( 0 + b1(y1 – 0), b12) sample of n individuals shrinkage factor (shrinks to 1) QTL 2: Bayes Seattle SISG: Yandell © 2010

what values are the genotypic means? phenotype model pr(y|q,) data means prior mean data mean posterior means QTL 2: Bayes Seattle SISG: Yandell © 2010

Bayes posterior QTL means posterior centered on sample genotypic mean but shrunken slightly toward overall mean phenotype mean: genotypic prior: posterior: shrinkage: QTL 2: Bayes Seattle SISG: Yandell © 2010

partition genotypic effects on phenotype phenotype depends on genotype genotypic value partitioned into main effects of single QTL epistasis (interaction) between pairs of QTL QTL 2: Bayes Seattle SISG: Yandell © 2010

partitition genotypic variance consider same 2 QTL + epistasis centering variance genotypic variance heritability QTL 2: Bayes Seattle SISG: Yandell © 2010

posterior mean ≈ LS estimate QTL 2: Bayes Seattle SISG: Yandell © 2010

pr(q|m,) recombination model pr(q|m,) = pr(geno | map, locus)  pr(geno | flanking markers, locus) q? markers distance along chromosome QTL 2: Bayes Seattle SISG: Yandell © 2010

Seattle SISG: Yandell © 2010 QTL 2: Bayes Seattle SISG: Yandell © 2010

what are likely QTL genotypes q? how does phenotype y improve guess? what are probabilities for genotype q between markers? recombinants AA:AB all 1:1 if ignore y and if we use y? QTL 2: Bayes Seattle SISG: Yandell © 2010

posterior on QTL genotypes q full conditional of q given data, parameters proportional to prior pr(q | m, ) weight toward q that agrees with flanking markers proportional to likelihood pr(y | q, ) weight toward q with similar phenotype values posterior recombination model balances these two this is the E-step of EM computations QTL 2: Bayes Seattle SISG: Yandell © 2010

Where are the loci  on the genome? prior over genome for QTL positions flat prior = no prior idea of loci or use prior studies to give more weight to some regions posterior depends on QTL genotypes q pr( | m,q) = pr() pr(q | m,) / constant constant determined by averaging over all possible genotypes q over all possible loci  on entire map no easy way to write down posterior QTL 2: Bayes Seattle SISG: Yandell © 2010

what is the genetic architecture ? which positions correspond to QTLs? priors on loci (previous slide) which QTL have main effects? priors for presence/absence of main effects same prior for all QTL can put prior on each d.f. (1 for BC, 2 for F2) which pairs of QTL have epistatic interactions? prior for presence/absence of epistatic pairs depends on whether 0,1,2 QTL have main effects epistatic effects less probable than main effects QTL 2: Bayes Seattle SISG: Yandell © 2010

Seattle SISG: Yandell © 2010  = genetic architecture: loci: main QTL epistatic pairs effects: add, dom aa, ad, dd QTL 2: Bayes Seattle SISG: Yandell © 2010

Bayesian priors & posteriors augmenting with missing genotypes q prior is recombination model posterior is (formally) E step of EM algorithm sampling phenotype model parameters  prior is “flat” normal at grand mean (no information) posterior shrinks genotypic means toward grand mean (details for unexplained variance omitted here) sampling QTL loci  prior is flat across genome (all loci equally likely) sampling QTL genetic architecture model  number of QTL prior is Poisson with mean from previous IM study genetic architecture of main effects and epistatic interactions priors on epistasis depend on presence/absence of main effects QTL 2: Bayes Seattle SISG: Yandell © 2010

Seattle SISG: Yandell © 2010 2. Markov chain sampling construct Markov chain around posterior want posterior as stable distribution of Markov chain in practice, the chain tends toward stable distribution initial values may have low posterior probability burn-in period to get chain mixing well sample QTL model components from full conditionals sample locus  given q, (using Metropolis-Hastings step) sample genotypes q given ,,y, (using Gibbs sampler) sample effects  given q,y, (using Gibbs sampler) sample QTL model  given ,,y,q (using Gibbs or M-H) QTL 2: Bayes Seattle SISG: Yandell © 2010

MCMC sampling of unknowns (q,µ,) for given genetic architecture  Gibbs sampler genotypes q effects µ not loci  Metropolis-Hastings sampler extension of Gibbs sampler does not require normalization pr( q | m ) = sum pr( q | m,  ) pr( ) QTL 2: Bayes Seattle SISG: Yandell © 2010

Gibbs sampler for two genotypic means want to study two correlated effects could sample directly from their bivariate distribution assume correlation  is known instead use Gibbs sampler: sample each effect from its full conditional given the other pick order of sampling at random repeat many times QTL 2: Bayes Seattle SISG: Yandell © 2010

Gibbs sampler samples:  = 0.6 N = 50 samples N = 200 samples QTL 2: Bayes Seattle SISG: Yandell © 2010

full conditional for locus cannot easily sample from locus full conditional pr( |y,m,µ,q) = pr(  | m,q) = pr( q | m,  ) pr( ) / constant constant is very difficult to compute explicitly must average over all possible loci  over genome must do this for every possible genotype q Gibbs sampler will not work in general but can use method based on ratios of probabilities Metropolis-Hastings is extension of Gibbs sampler QTL 2: Bayes Seattle SISG: Yandell © 2010

Metropolis-Hastings idea f() want to study distribution f() take Monte Carlo samples unless too complicated take samples using ratios of f Metropolis-Hastings samples: propose new value * near (?) current value  from some distribution g accept new value with prob a Gibbs sampler: a = 1 always g(–*) QTL 2: Bayes Seattle SISG: Yandell © 2010

Metropolis-Hastings for locus  added twist: occasionally propose from entire genome QTL 2: Bayes Seattle SISG: Yandell © 2010

Metropolis-Hastings samples N = 200 samples N = 1000 samples narrow g wide g narrow g wide g     histogram histogram histogram histogram QTL 2: Bayes Seattle SISG: Yandell © 2010

3. sampling genetic architectures search across genetic architectures  of various sizes allow change in number of QTL allow change in types of epistatic interactions methods for search reversible jump MCMC Gibbs sampler with loci indicators complexity of epistasis Fisher-Cockerham effects model general multi-QTL interaction & limits of inference QTL 2: Bayes Seattle SISG: Yandell © 2010

Seattle SISG: Yandell © 2010 reversible jump MCMC consider known genotypes q at 2 known loci  models with 1 or 2 QTL M-H step between 1-QTL and 2-QTL models model changes dimension (via careful bookkeeping) consider mixture over QTL models H QTL 2: Bayes Seattle SISG: Yandell © 2010

geometry of reversible jump 2 2 1 1 QTL 2: Bayes Seattle SISG: Yandell © 2010

geometry allowing q and  to change 2 2 1 1 QTL 2: Bayes Seattle SISG: Yandell © 2010

collinear QTL = correlated effects linked QTL = collinear genotypes correlated estimates of effects (negative if in coupling phase) sum of linked effects usually fairly constant QTL 2: Bayes Seattle SISG: Yandell © 2010

sampling across QTL models  1 m+1 2 … m L action steps: draw one of three choices update QTL model  with probability 1-b()-d() update current model using full conditionals sample QTL loci, effects, and genotypes add a locus with probability b() propose a new locus along genome innovate new genotypes at locus and phenotype effect decide whether to accept the “birth” of new locus drop a locus with probability d() propose dropping one of existing loci decide whether to accept the “death” of locus QTL 2: Bayes Seattle SISG: Yandell © 2010

Gibbs sampler with loci indicators consider only QTL at pseudomarkers every 1-2 cM modest approximation with little bias use loci indicators in each pseudomarker  = 1 if QTL present  = 0 if no QTL present Gibbs sampler on loci indicators  relatively easy to incorporate epistasis Yi, Yandell, Churchill, Allison, Eisen, Pomp (2005 Genetics) (see earlier work of Nengjun Yi and Ina Hoeschele) QTL 2: Bayes Seattle SISG: Yandell © 2010

Bayesian shrinkage estimation soft loci indicators strength of evidence for j depends on  0    1 (grey scale) shrink most s to zero Wang et al. (2005 Genetics) Shizhong Xu group at U CA Riverside QTL 2: Bayes Seattle SISG: Yandell © 2010

other model selection approaches include all potential loci in model assume “true” model is “sparse” in some sense Sparse partial least squares Chun, Keles (2009 Genetics; 2010 JRSSB) LASSO model selection Foster (2006); Foster Verbyla Pitchford (2007 JABES) Xu (2007 Biometrics); Yi Xu (2007 Genetics) Shi Wahba Wright Klein Klein (2008 Stat & Infer) QTL 2: Bayes Seattle SISG: Yandell © 2010

4. criteria for model selection balance fit against complexity classical information criteria penalize likelihood L by model size || IC = – 2 log L( | y) + penalty() maximize over unknowns Bayes factors marginal posteriors pr(y | ) average over unknowns QTL 2: Bayes Seattle SISG: Yandell © 2010

classical information criteria start with likelihood L( | y, m) measures fit of architecture () to phenotype (y) given marker data (m) genetic architecture () depends on parameters have to estimate loci (µ) and effects () complexity related to number of parameters | | = size of genetic architecture BC: | | = 1 + n.qtl + n.qtl(n.qtl - 1) = 1 + 4 + 12 = 17 F2: | | = 1 + 2n.qtl +4n.qtl(n.qtl - 1) = 1 + 8 + 48 = 57 QTL 2: Bayes Seattle SISG: Yandell © 2010

classical information criteria construct information criteria balance fit to complexity Akaike AIC = –2 log(L) + 2 || Bayes/Schwartz BIC = –2 log(L) + || log(n) Broman BIC = –2 log(L) +  || log(n) general form: IC = –2 log(L) + || D(n) compare models hypothesis testing: designed for one comparison 2 log[LR(1, 2)] = L(y|m, 2) – L(y|m, 1) model selection: penalize complexity IC(1, 2) = 2 log[LR(1, 2)] + (|2| – |1|) D(n) QTL 2: Bayes Seattle SISG: Yandell © 2010

information criteria vs. model size WinQTL 2.0 SCD data on F2 A=AIC 1=BIC(1) 2=BIC(2) d=BIC() models 1,2,3,4 QTL 2+5+9+2 epistasis 2:2 AD epistasis QTL 2: Bayes Seattle SISG: Yandell © 2010

Seattle SISG: Yandell © 2010 Bayes factors ratio of model likelihoods ratio of posterior to prior odds for architectures averaged over unknowns roughly equivalent to BIC BIC maximizes over unknowns BF averages over unknowns QTL 2: Bayes Seattle SISG: Yandell © 2010

scan of marginal Bayes factor & effect QTL 2: Bayes Seattle SISG: Yandell © 2010

issues in computing Bayes factors BF insensitive to shape of prior on  geometric, Poisson, uniform precision improves when prior mimics posterior BF sensitivity to prior variance on effects  prior variance should reflect data variability resolved by using hyper-priors automatic algorithm; no need for user tuning easy to compute Bayes factors from samples sample posterior using MCMC posterior pr( | y, m) is marginal histogram QTL 2: Bayes Seattle SISG: Yandell © 2010

Bayes factors & genetic architecture  | | = number of QTL prior pr() chosen by user posterior pr( |y,m) sampled marginal histogram shape affected by prior pr(A) pattern of QTL across genome gene action and epistasis QTL 2: Bayes Seattle SISG: Yandell © 2010

BF sensitivity to fixed prior for effects QTL 2: Bayes Seattle SISG: Yandell © 2010

BF insensitivity to random effects prior QTL 2: Bayes Seattle SISG: Yandell © 2010