Bayesian Model Selection for Quantitative Trait Loci with Markov chain Monte Carlo in Experimental Crosses Brian S. Yandell University of Wisconsin-Madison.

Slides:



Advertisements
Similar presentations
METHODS FOR HAPLOTYPE RECONSTRUCTION
Advertisements

1 QTL mapping in mice Lecture 10, Statistics 246 February 24, 2004.
April 2010UW-Madison © Brian S. Yandell1 Bayesian QTL Mapping Brian S. Yandell University of Wisconsin-Madison UW-Madison,
Model SelectionSeattle SISG: Yandell © QTL Model Selection 1.Bayesian strategy 2.Markov chain sampling 3.sampling genetic architectures 4.criteria.
Sessão Temática 2 Análise Bayesiana Utilizando a abordagem Bayesiana no mapeamento de QTL´s Roseli Aparecida Leandro ESALQ/USP 11 o SEAGRO / 50ª RBRAS.
End of Chapter 8 Neil Weisenfeld March 28, 2005.
Quantitative Genetics
Modeling Menstrual Cycle Length in Pre- and Peri-Menopausal Women Michael Elliott Xiaobi Huang Sioban Harlow University of Michigan School of Public Health.
April 2008UW-Madison © Brian S. Yandell1 Bayesian QTL Mapping Brian S. Yandell University of Wisconsin-Madison UW-Madison,
QTL 2: OverviewSeattle SISG: Yandell © Seattle Summer Institute 2010 Advanced QTL Brian S. Yandell, UW-Madison
October 2001Jackson Labs Workshop © Brian S. Yandell 1 Efficient and Robust Statistical Methods for Quantitative Trait Loci Analysis Brian S. Yandell University.
Experimental Design and Data Structure Supplement to Lecture 8 Fall
October 2007Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison
October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S. Yandell University of Wisconsin-Madison
Lecture 12: Linkage Analysis V Date: 10/03/02  Least squares  An EM algorithm  Simulated distribution  Marker coverage and density.
Lecture 3: Statistics Review I Date: 9/3/02  Distributions  Likelihood  Hypothesis tests.
Lecture 24: Quantitative Traits IV Date: 11/14/02  Sources of genetic variation additive dominance epistatic.
June 1999NCSU QTL Workshop © Brian S. Yandell 1 Bayesian Inference for QTLs in Inbred Lines II Brian S Yandell University of Wisconsin-Madison
BayesNCSU QTL II: Yandell © Bayesian Interval Mapping 1.what is Bayes? Bayes theorem? Bayesian QTL mapping Markov chain sampling18-25.
Yandell © June Bayesian analysis of microarray traits Arabidopsis Microarray Workshop Brian S. Yandell University of Wisconsin-Madison
Tutorial I: Missing Value Analysis
Lecture 22: Quantitative Traits II
Lecture 23: Quantitative Traits III Date: 11/12/02  Single locus backcross regression  Single locus backcross likelihood  F2 – regression, likelihood,
Chapter 22 - Quantitative genetics: Traits with a continuous distribution of phenotypes are called continuous traits (e.g., height, weight, growth rate,
Yandell © 2003Wisconsin Symposium III1 QTLs & Microarrays Overview Brian S. Yandell University of Wisconsin-Madison
13 October 2004Statistics: Yandell © Inferring Genetic Architecture of Complex Biological Processes Brian S. Yandell 12, Christina Kendziorski 13,
April 2008UW-Madison © Brian S. Yandell1 Bayesian QTL Mapping Brian S. Yandell University of Wisconsin-Madison UW-Madison,
June 1999NCSU QTL Workshop © Brian S. Yandell 1 Bayesian Inference for QTLs in Inbred Lines II Brian S Yandell University of Wisconsin-Madison
September 2002Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Quantitative Trait Loci using Markov chain Monte Carlo in Experimental Crosses.
NSF StatGen: Yandell © NSF StatGen 2009 Bayesian Interval Mapping Brian S. Yandell, UW-Madison overview: multiple.
April 2008Stat 877 © Brian S. Yandell1 Reversible Jump Details Brian S. Yandell University of Wisconsin-Madison April.
ModelNCSU QTL II: Yandell © Model Selection for Multiple QTL 1.reality of multiple QTL3-8 2.selecting a class of QTL models comparing QTL models16-24.
I. Statistical Methods for Genome-Enabled Prediction of Complex Traits OUTLINE THE CHALLENGES OF PREDICTING COMPLEX TRAITS ORDINARY LEAST SQUARES (OLS)
Bayesian Variable Selection in Semiparametric Regression Modeling with Applications to Genetic Mappping Fei Zou Department of Biostatistics University.
Plant Microarray Course
Bayesian Semi-Parametric Multiple Shrinkage
MCMC Output & Metropolis-Hastings Algorithm Part I
Identifying QTLs in experimental crosses
upstream vs. ORF binding and gene expression?
Model Selection for Multiple QTL
Genome Wide Association Studies using SNP
Jun Liu Department of Statistics Stanford University
Bayesian Model Selection for Multiple QTL
Bayesian Inference for QTLs in Inbred Lines I
Graphical Diagnostics for Multiple QTL Investigation Brian S
Gene mapping in mice Karl W Broman Department of Biostatistics
Inferring Genetic Architecture of Complex Biological Processes BioPharmaceutical Technology Center Institute (BTCI) Brian S. Yandell University of Wisconsin-Madison.
Inferring Genetic Architecture of Complex Biological Processes Brian S
Jax Workshop © Brian S. Yandell
Bayesian Inference for QTLs in Inbred Lines
High Throughput Gene Mapping Brian S. Yandell
Bayesian Interval Mapping
Brian S. Yandell University of Wisconsin-Madison 26 February 2003
Bayesian Model Selection for Multiple QTL
Genome-wide Association Studies
Bayesian Interval Mapping
Ch13 Empirical Methods.
Seattle SISG: Yandell © 2007
examples in detail days to flower for Brassica napus (plant) (n = 108)
Inferring Genetic Architecture of Complex Biological Processes Brian S
Seattle SISG: Yandell © 2006
Note on Outbred Studies
Brian S. Yandell University of Wisconsin-Madison
Robust Full Bayesian Learning for Neural Networks
Seattle SISG: Yandell © 2009
Lecture 9: QTL Mapping II: Outbred Populations
Chapter 7 Beyond alleles: Quantitative Genetics
Bayesian Interval Mapping
Quantitative Trait Locus (QTL) Mapping
Presentation transcript:

Bayesian Model Selection for Quantitative Trait Loci with Markov chain Monte Carlo in Experimental Crosses Brian S. Yandell University of Wisconsin-Madison www.stat.wisc.edu/~yandell/statgen↑ Jackson Laboratory, September 2004 September 2004 Jax Workshop © Brian S. Yandell

Jax Workshop © Brian S. Yandell outline What is the goal of QTL study? Bayesian priors & posteriors Model search using MCMC Gibbs sampler and Metropolis-Hastings Reversible jump MCMC Fully MCMC approach (loci indicators) Model assessment Bayes factors model selection diagnostics simulation and Brassica napus example September 2004 Jax Workshop © Brian S. Yandell

1. what is the goal of QTL study? uncover underlying biochemistry identify how networks function, break down find useful candidates for (medical) intervention epistasis may play key role statistical goal: maximize number of correctly identified QTL basic science/evolution how is the genome organized? identify units of natural selection additive effects may be most important (Wright/Fisher debate) select “elite” individuals predict phenotype (breeding value) using suite of characteristics (phenotypes) translated into a few QTL statistical goal: mimimize prediction error September 2004 Jax Workshop © Brian S. Yandell

advantages of multiple QTL approach improve statistical power, precision increase number of QTL that can be detected better estimates of loci and effects: less bias, smaller intervals improve inference of complex genetic architecture infer number of QTL and their pattern across chromosomes construct “good” estimates of effects gene action (additive, dominance) and epistatic interactions assess relative contributions of different QTL improve estimates of genotypic values want less bias (more accurate) and smaller variance (more precise) balance in mean squared error = MSE = (bias)2 + variance always a compromise… September 2004 Jax Workshop © Brian S. Yandell

why worry about multiple QTL? many, many QTL may affect most any trait how many QTL are detectable with these data? limits to useful detection (Bernardo 2000) depends on sample size, heritability, environmental variation consider probability that a QTL is in the model avoid sharp in/out dichotomy major QTL usually selected, minor QTL sampled infrequently build M = model = genetic architecture into model M = {loci 1,2,…,m, plus interactions 12,13,…} directly allow uncertainty in genetic architecture model selection over number of QTL, genetic architecture use Bayes factors and model averaging to identify “better” models September 2004 Jax Workshop © Brian S. Yandell

Pareto diagram of QTL effects major QTL on linkage map minor QTL major QTL 3 polygenes 2 4 5 1 September 2004 Jax Workshop © Brian S. Yandell

interval mapping basics observed measurements Y = phenotypic trait X = markers & linkage map i = individual index 1,…,n missing data missing marker data Q = QT genotypes alleles QQ, Qq, or qq at locus unknown quantities  = QT locus (or loci)  = phenotype model parameters m = number of QTL pr(Q|X,,m) genotype model grounded by linkage map, experimental cross recombination yields multinomial for Q given X pr(Y|Q,,m) phenotype model distribution shape (assumed normal here) unknown parameters  (could be non-parametric) after Sen Churchill (2001) September 2004 Jax Workshop © Brian S. Yandell

2. Bayesian priors for QTL genomic region = locus  may be uniform over genome pr( | X ) = 1 / length of genome or may be restricted based on prior studies missing genotypes Q depends on marker map and locus for QTL pr( Q | X,  ) genotype (recombination) model is formally a prior genotypic means and variance  = ( Gq, 2 ) pr(  ) = pr( Gq | 2 ) pr(2 ) use conjugate priors for normal phenotype pr( Gq | 2 ) = normal pr(2 ) = inverse chi-square September 2004 Jax Workshop © Brian S. Yandell

Bayesian model posterior augment data (Y,X) with unknowns Q study unknowns (,,Q) given data (Y,X) properties of posterior pr(,,Q | Y, X ) sample from posterior in some clever way multiple imputation or MCMC September 2004 Jax Workshop © Brian S. Yandell

genotype prior model: pr(Q|X,) locus  is distance along linkage map map assumed known from earlier study  identifies flanking marker interval use flanking markers to approximate prior on Q slight inaccuracy by ignoring multipoint map function use next flanking markers if missing data pr(Q|X,) = pr(geno | map, locus)  pr(geno | flanking markers, locus) Q? September 2004 Jax Workshop © Brian S. Yandell

how does phenotype Y improve posterior for genotype Q? what are probabilities for genotype Q between markers? recombinants AA:AB all 1:1 if ignore Y and if we use Y? September 2004 Jax Workshop © Brian S. Yandell

posterior on QTL genotypes full conditional of Q given data, parameters proportional to prior pr(Q | Xi,  ) weight toward Q that agrees with flanking markers proportional to likelihood pr(Yi|Q,) weight toward Q so that group mean GQ  Yi phenotype and flanking markers may conflict posterior recombination balances these two weightsE E step of EM algorithm September 2004 Jax Workshop © Brian S. Yandell

idealized phenotype model trait = mean + additive + error trait = effect_of_geno + error pr( trait | geno, effects ) September 2004 Jax Workshop © Brian S. Yandell

priors & posteriors: normal data large prior variance small prior variance September 2004 Jax Workshop © Brian S. Yandell

priors & posteriors: normal data model Yi =  + Ei environment E ~ N( 0, 2 ), 2 known likelihood Y ~ N( , 2 ) prior  ~ N( 0, 2 ),  known posterior: mean tends to sample mean single individual  ~ N( 0 + B1(Y1 – 0), B12) sample of n individuals fudge factor (shrinks to 1) September 2004 Jax Workshop © Brian S. Yandell

prior & posteriors: genotypic means Gq September 2004 Jax Workshop © Brian S. Yandell

prior & posteriors: genotypic means Gq posterior centered on sample genotypic mean but shrunken slightly toward overall mean  is related to heritability prior: posterior: fudge factor: September 2004 Jax Workshop © Brian S. Yandell

What if variance 2 is unknown? sample variance is proportional to chi-square ns2 / 2 ~ 2 ( n ) likelihood of sample variance s2 given n, 2 conjugate prior is inverse chi-square 2 / 2 ~ 2 (  ) prior of population variance 2 given , 2 posterior is weighted average of likelihood and prior (2+ns2) / 2 ~ 2 ( +n ) posterior of population variance 2 given n, s2, , 2 empirical choice of hyper-parameters 2= s2/3, =6 E(2 | ,2) = s2/2, Var(2 | ,2) = s4/4 September 2004 Jax Workshop © Brian S. Yandell

multiple QTL phenotype model phenotype affected by genotype & environment pr(Y|Q=q,) ~ N(Gq , 2) Y = GQ + environment partition genotypic mean into QTL effects Gq =  + 1q + . . . + mq + 12q + . . . Gq = mean + main effects + epistatic interactions general form of QTL effects for model M Gq =  + sumj in M jq |M| = number of terms in model M < 2m can partition prior and posterior into effects jq (details omitted) September 2004 Jax Workshop © Brian S. Yandell

prior & posterior on number of QTL what prior on number of QTL? uniform over some range Poisson with prior mean geometric with prior mean prior influences posterior good: reflects prior belief push data in discovery process bad: skeptic revolts! “answer” depends on “guess” September 2004 Jax Workshop © Brian S. Yandell

3. QTL Model Search using MCMC construct Markov chain around posterior want posterior as stable distribution of Markov chain in practice, the chain tends toward stable distribution initial values may have low posterior probability burn-in period to get chain mixing well update m-QTL model components from full conditionals update locus  given Q,X (using Metropolis-Hastings step) update genotypes Q given ,,Y,X (using Gibbs sampler) update effects  given Q,Y (using Gibbs sampler) September 2004 Jax Workshop © Brian S. Yandell

Jax Workshop © Brian S. Yandell Gibbs sampler idea two correlated normals (genotypic means in BC) could draw samples from both together but easier to sample one at a time Gibbs sampler: sample each from its full conditional pick order of sampling at random repeat N times September 2004 Jax Workshop © Brian S. Yandell

Gibbs sampler samples:  = 0.6 N = 50 samples N = 200 samples September 2004 Jax Workshop © Brian S. Yandell

Jax Workshop © Brian S. Yandell How to sample a locus ? cannot easily sample from locus full conditional pr( |Y,X,,Q) = pr(  | X,Q) = pr( ) pr( Q | X,  ) / constant to explicitly determine constant, must average over all possible genotypes over entire map Gibbs sampler will not work in general but can use method based on ratios of probabilities Metropolis-Hastings is extension of Gibbs sampler September 2004 Jax Workshop © Brian S. Yandell

Metropolis-Hastings idea f() want to study distribution f() take Monte Carlo samples unless too complicated Metropolis-Hastings samples: current sample value  propose new value * from some distribution g(,*) Gibbs sampler: g(,*) = f(*) accept new value with prob A Gibbs sampler: A = 1 g(–*) September 2004 Jax Workshop © Brian S. Yandell

Jax Workshop © Brian S. Yandell MCMC realization added twist: occasionally propose from whole domain September 2004 Jax Workshop © Brian S. Yandell

Metropolis-Hastings samples N = 200 samples N = 1000 samples narrow g wide g narrow g wide g September 2004 Jax Workshop © Brian S. Yandell

sampling multiple loci 1 m+1 2 … m L action steps: draw one of three choices update m-QTL model with probability 1-b(m+1)-d(m) update current model using full conditionals sample m QTL loci, effects, and genotypes add a locus with probability b(m+1) propose a new locus along genome innovate new genotypes at locus and phenotype effect decide whether to accept the “birth” of new locus drop a locus with probability d(m) propose dropping one of existing loci decide whether to accept the “death” of locus September 2004 Jax Workshop © Brian S. Yandell

Jax Workshop © Brian S. Yandell reversible jump MCMC consider known genotypes Q at 2 known loci  models with 1 or 2 QTL jump between 1-QTL and 2-QTL models adjust parameters when model changes  and 1 differ due to collinearity of QTL genotypes September 2004 Jax Workshop © Brian S. Yandell

geometry of reversible jump 2 2 1 1 September 2004 Jax Workshop © Brian S. Yandell

geometry allowing Q and  to change 2 2 1 1 September 2004 Jax Workshop © Brian S. Yandell

Gibbs sampler with loci indicators partition genome into intervals at most one QTL per interval interval = marker interval or large chromosome region use loci indicators in each interval  = 1 if QTL in interval  = 0 if no QTL Gibbs sampler on loci indicators still need to adjust genetic effects for collinearity of Q see work of Nengjun Yi (and earlier work of Ina Hoeschele) September 2004 Jax Workshop © Brian S. Yandell

epistatic interactions model space issues 2-QTL interactions only? Fisher-Cockerham partition vs. tree-structured? general interactions among multiple QTL model search issues epistasis between significant QTL check all possible pairs when QTL included? allow higher order epistasis? epistasis with non-significant QTL whole genome paired with each significant QTL? pairs of non-significant QTL? Yi Xu (2000) Genetics; Yi, Xu, Allison (2003) Genetics; Yi (2004) September 2004 Jax Workshop © Brian S. Yandell

limits of epistatic inference power to detect effects epistatic model size grows exponentially |M| = 3m for general interactions power depends on ratio of n to model size want n / |M| to be fairly large (say > 5) n = 100, m = 3, n / |M| ≈ 4 empty cells mess up adjusted (Type 3) tests missing q1Q2 / q1Q2 or q1Q2q3 / q1Q2q3 genotype null hypotheses not what you would expect can confound main effects and interactions can bias AA, AD, DA, DD partition September 2004 Jax Workshop © Brian S. Yandell

Jax Workshop © Brian S. Yandell 4. Model Assessment balance model fit against model complexity smaller model bigger model model fit miss key features fits better prediction may be biased no bias interpretation easier more complicated parameters low variance high variance information criteria: penalize L by model size |M| compare IC = – 2 log L( M | Y ) + penalty( M ) Bayes factors: balance posterior by prior choice compare pr( data Y | model M ) September 2004 Jax Workshop © Brian S. Yandell

Jax Workshop © Brian S. Yandell QTL Bayes factors BF = posterior odds / prior odds BF equivalent to BIC simple comparison: 1 vs 2 QTL same as LOD test general comparison of models want Bayes factor >> 1 m = number of QTL indexes model complexity genetic architecture also important September 2004 Jax Workshop © Brian S. Yandell

Bayes factors to assess models Bayes factor: which model best supports the data? ratio of posterior odds to prior odds ratio of model likelihoods equivalent to LR statistic when comparing two nested models simple hypotheses (e.g. 1 vs 2 QTL) Bayes Information Criteria (BIC) Schwartz introduced for model selection in general settings penalty to balance model size (p = number of parameters) September 2004 Jax Workshop © Brian S. Yandell

BF sensitivity to fixed prior for effects September 2004 Jax Workshop © Brian S. Yandell

BF insensitivity to random effects prior September 2004 Jax Workshop © Brian S. Yandell

simulations and data studies simulated F2 intercross, 8 QTL (Stephens, Fisch 1998) n=200, heritability = 50% detected 3 QTL increase to detect all 8 n=500, heritability to 97% QTL chr loci effect 1 1 11 –3 2 1 50 –5 3 3 62 +2 4 6 107 –3 5 6 152 +3 6 8 32 –4 7 8 54 +1 8 9 195 +2 posterior September 2004 Jax Workshop © Brian S. Yandell

loci pattern across genome notice which chromosomes have persistent loci best pattern found 42% of the time Chromosome m 1 2 3 4 5 6 7 8 9 10 Count of 8000 8 2 0 1 0 0 2 0 2 1 0 3371 9 3 0 1 0 0 2 0 2 1 0 751 7 2 0 1 0 0 2 0 1 1 0 377 9 2 0 1 0 0 2 0 2 1 0 218 9 2 0 1 0 0 3 0 2 1 0 218 9 2 0 1 0 0 2 0 2 2 0 198 September 2004 Jax Workshop © Brian S. Yandell

B. napus 8-week vernalization whole genome study 108 plants from double haploid similar genetics to backcross: follow 1 gamete parents are Major (biennial) and Stellar (annual) 300 markers across genome 19 chromosomes average 6cM between markers median 3.8cM, max 34cM 83% markers genotyped phenotype is days to flowering after 8 weeks of vernalization (cooling) Stellar parent requires vernalization to flower available in R/bim package Ferreira et al. (1994); Kole et al. (2001); Schranz et al. (2002) September 2004 Jax Workshop © Brian S. Yandell

Markov chain Monte Carlo sequence burnin (sets up chain) mcmc sequence number of QTL environmental variance h2 = heritability (genetic/total variance) LOD = likelihood September 2004 Jax Workshop © Brian S. Yandell

Jax Workshop © Brian S. Yandell MCMC sampled loci subset of chromosomes N2, N3, N16 points jittered for view blue lines at markers note concentration on chromosome N2 includes all models September 2004 Jax Workshop © Brian S. Yandell

Bayesian model assessment row 1: # QTL row 2: pattern col 1: posterior col 2: Bayes factor note error bars on bf evidence suggests 4-5 QTL N2(2-3),N3,N16 September 2004 Jax Workshop © Brian S. Yandell

Bayesian estimates of loci & effects model averaging: at least 4 QTL histogram of loci blue line is density red lines at estimates estimate additive effects (red circles) grey points sampled from posterior blue line is cubic spline dashed line for 2 SD September 2004 Jax Workshop © Brian S. Yandell

Bayesian model diagnostics pattern: N2(2),N3,N16 col 1: density col 2: boxplots by m environmental variance 2 = .008,  = .09 heritability h2 = 52% LOD = 16 (highly significant) but note change with m September 2004 Jax Workshop © Brian S. Yandell

Bayesian software for QTLs R/bim (Satagopan Yandell 1996; Gaffney 2001) www.stat.wisc.edu/~yandell/qtl/software/Bmapqtl www.r-project.org contributed package version available within WinQTLCart (statgen.ncsu.edu/qtlcart) Bayesian IM with epistasis (Nengjun Yi, U AB) separate C++ software (papers with Xu) plans in progress to incorporate into R/bim R/qtl (Broman et al. 2003) biosun01.biostat.jhsph.edu/~kbroman/software Pseudomarker (Sen Churchill 2002) www.jax.org/staff/churchill/labsite/software Bayesian QTL / Multimapper Sillanpää Arjas (1998) www.rni.helsinki.fi/~mjs Stephens & Fisch (email) September 2004 Jax Workshop © Brian S. Yandell

Jax Workshop © Brian S. Yandell R/bim: our software www.stat.wisc.edu/~yandell/qtl/software/Bmapqtl R contributed library (www.r-project.org) library(bim) is cross-compatible with library(qtl) Bayesian module within WinQTLCart WinQTLCart output can be processed using R library Software history initially designed by JM Satagopan (1996) major revision and extension by PJ Gaffney (2001) whole genome multivariate update of effects; long range position updates substantial improvements in speed, efficiency pre-burnin: initial prior number of QTL very large upgrade (H Wu, PJ Gaffney, CF Jin, BS Yandell 2003) epistasis in progress (H Wu, BS Yandell, N Yi 2004) September 2004 Jax Workshop © Brian S. Yandell

Jax Workshop © Brian S. Yandell many thanks Michael Newton Daniel Sorensen Daniel Gianola Liang Li Hong Lan Hao Wu Nengjun Yi David Allison Tom Osborn David Butruille Marcio Ferrera Josh Udahl Pablo Quijada Alan Attie Jonathan Stoehr Gary Churchill Jaya Satagopan Fei Zou Patrick Gaffney Chunfang Jin Yang Song Elias Chaibub Neto Xiaodan Wei USDA Hatch, NIH/NIDDK, Jackson Labs September 2004 Jax Workshop © Brian S. Yandell