Bayesian Interval Mapping (Yandell, NCSU QTL II)
1. what is Bayes? Bayes theorem?
2. Bayesian QTL mapping
3. Markov chain sampling
4. sampling across architectures
1. who was Bayes? what is Bayes theorem?
Reverend Thomas Bayes
–part-time mathematician
–buried in Bunhill Cemetery, Moorgate, London
–famous paper in 1763, Phil Trans Roy Soc London
–was Bayes the first with this idea? (Laplace?)
basic idea (from Bayes' original example)
–two billiard balls tossed at random (uniform) on table
–where is first ball if the second is to its left (right)?
prior: pr(θ) = 1
likelihood: pr(Y | θ) = θ^(1–Y) (1–θ)^Y
posterior: pr(θ | Y) = ?
[figure: first and second balls on the table, outcomes Y = 0 and Y = 1]
what is Bayes theorem?
where is first ball if the second is to its left (right)?
prior: probability of parameter before observing data
–pr(θ) = pr(parameter)
–equal chance of being anywhere on the table
posterior: probability of parameter after observing data
–pr(θ | Y) = pr(parameter | data)
–more likely to left if first ball is toward the right end of table
likelihood: probability of data given parameters
–pr(Y | θ) = pr(data | parameter)
–basis for classical statistical inference
Bayes theorem
–posterior = likelihood * prior / pr(data)
–normalizing constant pr(Y) often drops out of calculation
[figure: posterior for θ when Y = 0 and when Y = 1]
where is the second ball given the first?
prior: pr(θ) = 1
likelihood: pr(Y | θ) = θ^(1–Y) (1–θ)^Y
posterior: pr(θ | Y) = ?
[figure: first ball and second ball on the table, outcomes Y = 0 and Y = 1]
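Working out the posterior for this billiard example is a one-line calculation (not on the original slides, but it follows directly from the uniform prior and the likelihood above):

```latex
\mathrm{pr}(\theta \mid Y)
  = \frac{\mathrm{pr}(Y \mid \theta)\,\mathrm{pr}(\theta)}{\mathrm{pr}(Y)}
  = \frac{\theta^{1-Y}(1-\theta)^{Y}}{\int_0^1 t^{1-Y}(1-t)^{Y}\,dt}
  = 2\,\theta^{1-Y}(1-\theta)^{Y}, \qquad 0 \le \theta \le 1 .
```

So pr(θ | Y = 0) = 2θ and pr(θ | Y = 1) = 2(1 − θ): observing which side of the first ball the second ball lands on tilts the flat prior toward the corresponding end of the table.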
Bayes posterior for normal data
[figure: posterior under a large prior variance vs. a small prior variance, marking the actual mean and the prior mean]
Bayes posterior for normal data
model: Y_i = μ + E_i
environment: E_i ~ N(0, σ²), σ² known
likelihood: Y_i ~ N(μ, σ²)
prior: μ ~ N(μ_0, κσ²), κ known
posterior: mean tends to sample mean
–single individual: μ ~ N(μ_0 + B_1 (Y_1 – μ_0), B_1 σ²)
–sample of n individuals: μ ~ N(μ_0 + B_n (Ȳ – μ_0), B_n σ²/n)
–fudge factor B_n (shrinks to 1 as n grows)
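A minimal numeric sketch of this conjugate update, assuming the prior μ ~ N(μ_0, κσ²) stated above, for which the fudge factor works out to B_n = nκ/(nκ + 1); the function and variable names are illustrative, not from the slides:

```python
import numpy as np

def normal_posterior(y, mu0, kappa, sigma2):
    """Posterior of mu for Y_i ~ N(mu, sigma2) with prior mu ~ N(mu0, kappa*sigma2).

    Returns posterior mean, posterior variance, and the 'fudge factor'
    B_n = n*kappa / (n*kappa + 1), which shrinks toward 1 as n grows.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    B = n * kappa / (n * kappa + 1.0)
    post_mean = mu0 + B * (y.mean() - mu0)   # pulled from prior mean toward sample mean
    post_var = B * sigma2 / n
    return post_mean, post_var, B

# the posterior mean moves from the prior mean (0) toward the sample mean (about 10)
rng = np.random.default_rng(1)
y = rng.normal(loc=10.0, scale=2.0, size=25)
print(normal_posterior(y, mu0=0.0, kappa=1.0, sigma2=4.0))
```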
Bayesian QTL mapping
observed measurements
–Y = phenotypic trait
–X = markers & linkage map
–i = individual index, 1,…,n
missing data
–missing marker data
–Q = QT genotypes (alleles QQ, Qq, or qq at locus)
unknown quantities of interest
–λ = QT locus (or loci)
–μ = phenotype model parameters
–m = number of QTL
–M = genetic model = genetic architecture
pr(Q | X, λ) recombination model
–grounded by linkage map, experimental cross
–recombination yields multinomial for Q given X
pr(Y | Q, μ) phenotype model
–distribution shape (could be assumed normal)
–unknown parameters μ (could be non-parametric)
after Sen & Churchill (2001)
QTL Mapping (Gary Churchill)
[diagram: relationships among QTL, markers, and trait]
Key Idea: Crossing two inbred lines creates linkage disequilibrium, which in turn creates associations and linked segregating QTL
pr(Q | X, λ) recombination model
pr(Q | X, λ) = pr(geno | map, locus)
≈ pr(geno | flanking markers, locus)
[figure: chromosome with markers, putative QTL genotypes Q?, recombination rates, and distance along chromosome]
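A small sketch of this recombination model for a backcross with two fully informative flanking markers, using the Haldane map function (so no interference); the helper names are hypothetical, not from the slides:

```python
import numpy as np

def haldane(d_cM):
    """Map distance in cM -> recombination fraction (Haldane, no interference)."""
    return 0.5 * (1.0 - np.exp(-2.0 * d_cM / 100.0))

def backcross_geno_prob(m_left, m_right, d_left, d_right):
    """pr(Q | flanking markers, locus) for a backcross, genotypes coded 0 = AA, 1 = AB.

    d_left, d_right are map distances (cM) from the putative QTL to its flanking
    markers; the multinomial over Q comes from the two recombination events.
    """
    rL, rR = haldane(d_left), haldane(d_right)
    weights = {}
    for q in (0, 1):
        pL = rL if q != m_left else 1.0 - rL     # recombination (or not) on the left
        pR = rR if q != m_right else 1.0 - rR    # recombination (or not) on the right
        weights[q] = pL * pR
    total = sum(weights.values())
    return {q: w / total for q, w in weights.items()}

# QTL midway between flanking markers 20 cM apart that disagree: roughly 50:50
print(backcross_geno_prob(m_left=0, m_right=1, d_left=10.0, d_right=10.0))
```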
pr(Y | Q, μ) phenotype model
pr(trait | genotype, effects)
trait = genetic + environment
assume genetic uncorrelated with environment
[figure: phenotype distribution Y by genotype, with genotypic mean G_QQ]
Bayesian priors for QTL
missing genotypes Q
–pr(Q | X, λ)
–recombination model is formally a prior
effects μ = (G, σ²)
–pr(μ) = pr(G_q | σ²) pr(σ²)
–use conjugate priors for normal phenotype
  pr(G_q | σ²) = normal
  pr(σ²) = inverse chi-square
each locus λ may be uniform over genome
–pr(λ | X) = 1 / length of genome
combined prior
–pr(Q, μ, λ | X) = pr(Q | X, λ) pr(μ) pr(λ | X)
Bayesian model posterior
augment data (Y, X) with unknowns Q
study unknowns (λ, μ, Q) given data (Y, X)
–properties of posterior pr(λ, μ, Q | Y, X)
sample from posterior in some clever way
–multiple imputation or MCMC
how does phenotype Y improve posterior for genotype Q?
what are probabilities for genotype Q between markers?
–recombinants AA:AB are all 1:1 if we ignore Y
–and if we use Y?
[figure: genotype probabilities between markers with and without the phenotype]
posterior on QTL genotypes
full conditional for Q depends on data for individual i
–proportional to prior pr(Q | X_i, λ)
  weight toward Q that agrees with flanking markers
–proportional to likelihood pr(Y_i | Q, μ)
  weight toward Q = q with group mean G_q near Y_i
phenotype and prior recombination may conflict
–posterior recombination balances these two weights
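A sketch of this full conditional for one individual, combining the recombination prior with a normal phenotype model (the function name, genotype labels, and numbers are illustrative, not from the slides):

```python
import numpy as np

def genotype_full_conditional(y_i, prior_q, means, sigma2):
    """pr(Q_i = q | Y_i, X_i, lambda, mu) up to normalization:
    recombination prior pr(Q_i = q | X_i, lambda) times normal likelihood pr(Y_i | q, mu).

    prior_q: dict genotype -> prior probability from the recombination model
    means:   dict genotype -> genotypic mean G_q
    """
    weights = {}
    for q, pq in prior_q.items():
        lik = np.exp(-0.5 * (y_i - means[q]) ** 2 / sigma2) / np.sqrt(2.0 * np.pi * sigma2)
        weights[q] = pq * lik
    total = sum(weights.values())
    return {q: w / total for q, w in weights.items()}

# a phenotype near G_qq pulls the posterior toward qq even though the markers favor Qq
print(genotype_full_conditional(y_i=4.8,
                                prior_q={"Qq": 0.7, "qq": 0.3},
                                means={"Qq": 10.0, "qq": 5.0},
                                sigma2=4.0))
```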
posterior genotypic means G_q
[figure: posterior densities of the genotypic means]
posterior genotypic means G_q
posterior centered on sample genotypic mean, but shrunken slightly toward overall mean
–same conjugate form as for the single normal mean above, applied within each genotype group q (n_q individuals, group sample mean Ȳ_q, overall mean Ȳ)
prior: G_q ~ N(Ȳ, κσ²)
posterior: G_q ~ N(Ȳ + B_q (Ȳ_q – Ȳ), B_q σ²/n_q)
fudge factor: B_q = n_q κ / (n_q κ + 1)
What if variance σ² is unknown?
sample variance is proportional to chi-square
–n s² / σ² ~ χ²(n)
–likelihood of sample variance s² given n, σ²
conjugate prior is inverse chi-square
–ν τ² / σ² ~ χ²(ν)
–prior of population variance σ² given ν, τ²
posterior is weighted average of likelihood and prior
–(ν τ² + n s²) / σ² ~ χ²(ν + n)
–posterior of population variance σ² given n, s², ν, τ²
empirical choice of hyper-parameters
–τ² = s²/3, ν = 6
–E(σ² | ν, τ²) = s²/2, Var(σ² | ν, τ²) = s⁴/4
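As a quick check on this empirical choice (a calculation added here, using the standard moments of the scaled inverse chi-square distribution):

```latex
E(\sigma^2 \mid \nu, \tau^2) = \frac{\nu\tau^2}{\nu - 2}
   = \frac{6 \cdot s^2/3}{4} = \frac{s^2}{2},
\qquad
\mathrm{Var}(\sigma^2 \mid \nu, \tau^2)
   = \frac{2\nu^2\tau^4}{(\nu - 2)^2(\nu - 4)}
   = \frac{2 \cdot 36 \cdot s^4/9}{16 \cdot 2} = \frac{s^4}{4}.
```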
Markov chain sampling of architectures
construct Markov chain around posterior
–want posterior as stable distribution of Markov chain
–in practice the chain tends toward stable distribution
  initial values may have low posterior probability
  burn-in period to get chain mixing well
hard to sample (Q, λ, μ, M) from joint posterior
–update (Q, λ, μ) from full conditionals for model M
–update genetic model M
Gibbs sampler idea
toy problem
–want to study two correlated effects
–could sample directly from their bivariate distribution
instead use Gibbs sampler:
–sample each effect from its full conditional given the other
–pick order of sampling at random
–repeat many times
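A minimal sketch of this toy Gibbs sampler, assuming the two correlated effects are bivariate normal with unit variances and correlation ρ (the setting suggested by the ρ = 0.6 figure on the next slide):

```python
import numpy as np

def gibbs_bivariate_normal(rho=0.6, n_samples=200, seed=0):
    """Gibbs sampler for (theta1, theta2) ~ N(0, [[1, rho], [rho, 1]]).

    Each full conditional is normal, theta_j | theta_other ~ N(rho*theta_other, 1 - rho**2),
    and the update order is chosen at random each sweep, as on the slide.
    """
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)
    draws = np.empty((n_samples, 2))
    cond_sd = np.sqrt(1.0 - rho ** 2)
    for t in range(n_samples):
        for j in rng.permutation(2):            # pick order of sampling at random
            theta[j] = rng.normal(rho * theta[1 - j], cond_sd)
        draws[t] = theta
    return draws

samples = gibbs_bivariate_normal(rho=0.6, n_samples=200)
print(np.corrcoef(samples.T)[0, 1])             # close to 0.6
```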
Gibbs sampler samples: ρ = 0.6
[figure: Gibbs sampler draws of the two effects after N = 50 and N = 200 samples]
MCMC sampling of (λ, Q, μ)
Gibbs sampler
–genotypes Q
–effects μ = (G, σ²)
–not loci λ
Metropolis-Hastings sampler
–extension of Gibbs sampler
–does not require normalization
  pr(Q | X) = sum over λ of pr(Q | X, λ) pr(λ)
Metropolis-Hastings idea
want to study distribution f(θ)
–take Monte Carlo samples unless too complicated
–take samples using ratios of f
Metropolis-Hastings samples:
–current sample value θ
–propose new value θ* from some distribution g(θ, θ*)
  Gibbs sampler: g(θ, θ*) = f(θ*)
–accept new value with probability A = min{1, f(θ*) g(θ*, θ) / [f(θ) g(θ, θ*)]}
  Gibbs sampler: A = 1
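A generic random-walk Metropolis-Hastings sketch of this idea; with a symmetric proposal g the acceptance probability reduces to A = min(1, f(θ*)/f(θ)), so only ratios of f are ever needed. The target here is a toy stand-in, not the locus full conditional of the later slides:

```python
import numpy as np

def metropolis_hastings(log_f, theta0, proposal_sd=0.5, n_samples=1000, seed=0):
    """Random-walk Metropolis-Hastings for a target f known only up to a constant.

    Proposes theta* ~ N(theta, proposal_sd**2); since this g is symmetric the
    acceptance probability is A = min(1, f(theta*) / f(theta)).
    """
    rng = np.random.default_rng(seed)
    theta = theta0
    draws = np.empty(n_samples)
    for t in range(n_samples):
        theta_star = rng.normal(theta, proposal_sd)
        log_A = min(0.0, log_f(theta_star) - log_f(theta))
        if np.log(rng.uniform()) < log_A:
            theta = theta_star                  # accept proposed value
        draws[t] = theta                        # otherwise keep current value
    return draws

# toy target: standard normal, unnormalized
draws = metropolis_hastings(lambda x: -0.5 * x * x, theta0=3.0)
print(draws.mean(), draws.std())
```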
Metropolis-Hastings samples
[figure: draws after N = 200 and N = 1000 samples, for narrow vs. wide proposal g]
full conditional for locus
cannot easily sample from locus full conditional
–pr(λ | Y, X, μ, Q) = pr(λ | X, Q) = pr(Q | X, λ) pr(λ) / constant
constant is very difficult to compute explicitly
–must average over all possible loci over genome
–must do this for every possible genotype Q
Gibbs sampler will not work in general
–but can use method based on ratios of probabilities
–Metropolis-Hastings is extension of Gibbs sampler
Metropolis-Hastings Step
pick new locus based upon current locus
–propose new locus λ* from some distribution g(λ, λ*)
  pick value near current one? (usually)
  pick uniformly across genome? (sometimes)
–accept new locus with probability A
  otherwise stick with current value
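A sketch of one such locus update, taking the "pick uniformly across genome" proposal so that g and the uniform prior pr(λ) cancel and A = min(1, pr(Q | X, λ*) / pr(Q | X, λ)); the callback name is hypothetical and stands in for the recombination model:

```python
import numpy as np

def mh_locus_update(lam, log_pr_Q_given_locus, genome_length, rng):
    """One Metropolis-Hastings update of the QTL locus lambda.

    The new locus is proposed uniformly over the genome, so the proposal and the
    uniform prior cancel and the acceptance ratio is pr(Q | X, lam*) / pr(Q | X, lam).
    """
    lam_star = rng.uniform(0.0, genome_length)          # propose new locus
    log_A = min(0.0, log_pr_Q_given_locus(lam_star) - log_pr_Q_given_locus(lam))
    if np.log(rng.uniform()) < log_A:
        return lam_star                                  # accept the jump
    return lam                                           # otherwise stick with current locus

# toy surface peaked near 40 cM stands in for log pr(Q | X, lambda)
rng = np.random.default_rng(0)
toy_log_pr = lambda lam: -0.5 * ((lam - 40.0) / 5.0) ** 2
lam = 10.0
for _ in range(500):
    lam = mh_locus_update(lam, toy_log_pr, genome_length=100.0, rng=rng)
print(lam)                                               # typically ends near 40
```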
sampling across architectures
search across genetic architectures M of various sizes
–allow change in m = number of QTL
–allow change in types of epistatic interactions
methods for search
–reversible jump MCMC
–Gibbs sampler with loci indicators
complexity of epistasis
–Fisher-Cockerham effects model
–general multi-QTL interaction & limits of inference
search across genetic architectures
think model selection in multiple regression
–but regressors (QTL genotypes) are unknown
linked loci are collinear regressors
–correlated effects
consider only main genetic effects here
–epistasis just gets more complicated
model selection in regression
consider known genotypes Q at 2 known loci
–models with 1 or 2 QTL
jump between 1-QTL and 2-QTL models
adjust parameters when model changes
–effect estimate for the first locus changes between models 1 and 2
–due to collinearity of QTL genotypes
geometry of reversible jump
[figure: sampled effects for loci 1 and 2 under the 1-QTL and 2-QTL models]
geometry allowing Q and λ to change
[figure: sampled effects for loci 1 and 2 when genotypes Q and loci λ also change]
reversible jump MCMC idea
Metropolis-Hastings updates: draw one of three choices
–update m-QTL model with probability 1 – b(m+1) – d(m)
  update current model using full conditionals
  sample m QTL loci, effects, and genotypes
–add a locus with probability b(m+1)
  propose a new locus and innovate new genotypes & genotypic effect
  decide whether to accept the "birth" of new locus
–drop a locus with probability d(m)
  propose dropping one of existing loci
  decide whether to accept the "death" of locus
Satagopan & Yandell (1996, 1998); Sillanpää & Arjas (1998); Stephens & Fisch (1998)
–these build on the RJ-MCMC idea of Green (1995); Richardson & Green (1997)
[figure: genome from 0 to L with current loci λ_1, λ_2, …, λ_m and a proposed locus λ_m+1]
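A skeleton of the move-selection step only (the birth/death acceptance ratios of Green 1995 are not shown, and for simplicity the birth and death probabilities are taken constant in m rather than the b(m+1), d(m) of the slide):

```python
import numpy as np

def choose_move(m, m_max, b=0.25, d=0.25, rng=None):
    """Pick the next reversible-jump move for an m-QTL model: update, birth, or death.

    Births are disabled at m = m_max and deaths at m = 0; the leftover
    probability goes to updating the current model via its full conditionals.
    """
    if rng is None:
        rng = np.random.default_rng()
    p_birth = b if m < m_max else 0.0
    p_death = d if m > 0 else 0.0
    u = rng.uniform()
    if u < p_birth:
        return "birth"     # propose a new locus, genotypes, and genotypic effect
    if u < p_birth + p_death:
        return "death"     # propose dropping one of the existing loci
    return "update"        # Gibbs / Metropolis-Hastings updates within the current model

print([choose_move(m=2, m_max=10, rng=np.random.default_rng(1)) for _ in range(8)])
```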
Gibbs sampler with loci indicators
consider only QTL at pseudomarkers
–every 1-2 cM
–modest approximation with little bias
use loci indicators γ in each pseudomarker
–γ = 1 if QTL present
–γ = 0 if no QTL present
Gibbs sampler on loci indicators γ
–relatively easy to incorporate epistasis
–Yi, Yandell, Churchill, Allison, Eisen, Pomp (2005, Genetics)
  (see earlier work of Nengjun Yi and Ina Hoeschele)
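A sketch of one indicator update in such a Gibbs sampler, assuming the phenotype model supplies the data likelihood with and without a QTL at the pseudomarker (the function and argument names are illustrative, not from Yi et al. 2005):

```python
import numpy as np

def sample_indicator(log_lik_in, log_lik_out, prior_prob, rng):
    """Draw one loci indicator (1 = QTL present at this pseudomarker, 0 = absent).

    The full-conditional log-odds combine the prior inclusion odds with the
    likelihood ratio of the data with vs. without a QTL at this pseudomarker.
    """
    log_odds = np.log(prior_prob) - np.log1p(-prior_prob) + (log_lik_in - log_lik_out)
    p_in = 1.0 / (1.0 + np.exp(-log_odds))
    return int(rng.uniform() < p_in)

rng = np.random.default_rng(0)
print(sample_indicator(log_lik_in=-100.0, log_lik_out=-104.0, prior_prob=0.05, rng=rng))
```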