Download presentation
Presentation is loading. Please wait.
Published byJason Thornton Modified over 9 years ago
1
June 1999NCSU QTL Workshop © Brian S. Yandell 1 Bayesian Inference for QTLs in Inbred Lines II Brian S Yandell University of Wisconsin-Madison www.stat.wisc.edu/~yandell NCSU Statistical Genetics June 1999
2
NCSU QTL Workshop © Brian S. Yandell 2 Many Thanks Michael Newton Daniel Sorensen Daniel Gianola Jaya Satagopan Patrick Gaffney Fei Zou Tom Osborn David Butruille Marcio Ferrera Josh Udahl Pablo Quijada USDA Hatch Grants
3
June 1999NCSU QTL Workshop © Brian S. Yandell 3 Overview II quick review of trait model –single & multiple QTL –details of Gibbs sampler full conditionals –vector notation reversible jump MCMC –multiple regression –number of QTLs deconstructing Bayesian LODs
4
June 1999NCSU QTL Workshop © Brian S. Yandell 4 Quick Review of trait Model single QTL details of Gibbs sampler –normal priors & likelihoods mean, additive effects –inverse gamma prior for variance or inverse chi-square –vague priors lead to usual estimates as posterior means multiple QTL trait model –model with vector notation
5
June 1999NCSU QTL Workshop © Brian S. Yandell 5 Single QTL trait Model trait = mean + additive + error trait = effect_of_geno + error prob( trait | geno, effects )
6
June 1999NCSU QTL Workshop © Brian S. Yandell 6 Gibbs Sampler updates variance mean traits additive genos
7
June 1999NCSU QTL Workshop © Brian S. Yandell 7 Full Conditional for mean normal prior with large variance leads to normal posterior posterior mean posterior variance
8
June 1999NCSU QTL Workshop © Brian S. Yandell 8 Full Conditional for additive Effect normal prior with large variance leads to normal posterior posterior mean posterior variance
9
June 1999NCSU QTL Workshop © Brian S. Yandell 9 Full Conditional for variance inverse gamma prior with large v/a posterior distribution posterior mean
10
June 1999NCSU QTL Workshop © Brian S. Yandell 10 MCMC run for variance
11
June 1999NCSU QTL Workshop © Brian S. Yandell 11 Alternative for Variance: use Inverse Chi-square inverse chi-square prior with large d,v posterior distribution
12
June 1999NCSU QTL Workshop © Brian S. Yandell 12 Multiple QTL model trait = mean + add1 + add2 + error trait = effect_of_genos + error prob( trait | genos, effects )
13
June 1999NCSU QTL Workshop © Brian S. Yandell 13 Vector Notation for QTLs inner product for sum condense notation
14
June 1999NCSU QTL Workshop © Brian S. Yandell 14 Multiple loci vector of loci across linkage map careful bookkeeping during update –identifiability & bump hunting –possibility of two loci in one marker interval ordered loci are sufficient
15
June 1999NCSU QTL Workshop © Brian S. Yandell 15 Posterior: Multiple QTLs posterior = likelihood * prior / constant posterior( paramaters | data ) prob( genos, effects, loci | traits, map ) is proportional to
16
June 1999NCSU QTL Workshop © Brian S. Yandell 16 MCMC for Multiple QTLs construct Markov chain around posterior update one (or several) components at a time –update effects given geno types & traits –update loci given geno types & traits –update geno types give loci & effects update all terms for each locus at one time? –open questions of efficient mixing
17
June 1999NCSU QTL Workshop © Brian S. Yandell 17 MCMC Updates effects loci traits genos
18
June 1999NCSU QTL Workshop © Brian S. Yandell 18 MCMC Conditions construct Markov chain with stable distribution ergodic Markov chain –reversibile (detailed balance) –irreducible (can get from any value to any other) –aperiodic (no fixed pattern) –positive recurrent (chance to visit all possible values)
19
June 1999NCSU QTL Workshop © Brian S. Yandell 19 Reversible Jump MCMC basic idea of Green(1995) model selection in regression how many QTLs? –number of QTL is random –estimate the number m RJ-MCMC vs. Bayes factors other similar ideas
20
June 1999NCSU QTL Workshop © Brian S. Yandell 20 Jumping the Number of QTL model changes with number of QTL –almost analogous to stepwise regression –use reversible jump MCMC to change number book keeping helps in comparing models change of variables between models prior on number of QTL –uniform over some range –Poisson with prior mean
21
June 1999NCSU QTL Workshop © Brian S. Yandell 21 Posterior: Number of QTL posterior = likelihood * prior / constant posterior( paramaters | data ) prob( genos, effects, loci, m | traits, map ) is proportional to
22
June 1999NCSU QTL Workshop © Brian S. Yandell 22 Reversible Jump Choices action step: draw one of three choices update step with probability 1-b(m+1)-d(m) –update current model –loci, effects, geno types as before add a locus with probability b(m+1) –propose a new locus –innovate effect and geno types at new locus –decide whether to accept the “birth” of new locus drop a locus with probability d(m) –pick one of existing loci to drop –decide whether to accept the “death” of locus
23
June 1999NCSU QTL Workshop © Brian S. Yandell 23 Markov chain for number m add a new locus drop a locus update current model 01mm-1m+1... m
24
June 1999NCSU QTL Workshop © Brian S. Yandell 24 Jumping QTL number & loci
25
June 1999NCSU QTL Workshop © Brian S. Yandell 25 RJ-MCMC Updates effects loci traits genos add locus drop locus b(m+1) d(m)d(m) 1-b(m+1)-d(m)
26
June 1999NCSU QTL Workshop © Brian S. Yandell 26 Propose to Add a locus propose a new locus –similar proposal to ordinary update uniform chance over genome easier to avoid interval with another QTL –need geno types at locus & model effect innovate effect & geno types at new locus –draw geno types based on recombination (prior) no dependence on trait model yet –draw effect as in Green’s reversible jump adjust for collinearity modify other parameters accordingly check acceptance...
27
June 1999NCSU QTL Workshop © Brian S. Yandell 27 Propose to Drop a locus choose an existing locus –equal weight for all loci ? –more weight to loci with small effects ? “drop” effect & geno types at old locus –adjust effects at other loci for collinearity –this is reverse jump of Green (1995) check acceptance … –do not drop locus, effects & geno types –until move is accepted
28
June 1999NCSU QTL Workshop © Brian S. Yandell 28 Acceptance of Reversible Jump accept birth of new locus with probability min(1,A) accept death of old locus with probability min(1,1/A)
29
June 1999NCSU QTL Workshop © Brian S. Yandell 29 Acceptance of Reversible Jump move probabilities birth & death proposals Jacobian between models –fudge factor –see stepwise regression example mm+1
30
June 1999NCSU QTL Workshop © Brian S. Yandell 30 RJ-MCMC: Number of QTL
31
June 1999NCSU QTL Workshop © Brian S. Yandell 31 Posterior # QTL for 8-week Data 98% credible region for m : (1,3) based on 1 million steps with prior mean of 3
32
June 1999NCSU QTL Workshop © Brian S. Yandell 32 How Good is RJ-MCMC? simulations with 0, 1 or 2 QTL –strong effects (additive = 2, variance = 1) –linked loci 36cM apart differences with number of QTL –clear differences by actual number –works well with 100,000, better with 1M effect of Poisson prior mean –larger prior mean shifts posterior up –but prior does not take over
33
June 1999NCSU QTL Workshop © Brian S. Yandell 33 Posterior for Simulated Data 0,1 or 2 large QTL prior Poisson mean of 2 100,000 RJ-MCMC runs
34
June 1999NCSU QTL Workshop © Brian S. Yandell 34 Effect of Prior Mean
35
June 1999NCSU QTL Workshop © Brian S. Yandell 35 # QTL in Brassica Data 4-week & 8-week vernalization –log( days to flower) –105 lines, 10 markers –modest effects –evidence of 1 or 2 QTL using Bayes factors histograms of posterior number of QTL –depends somewhat on prior –mode is 1 or 2 QTL 90% credible sets –all include 2 QTL –include 1 QTL if prior not huge
36
June 1999NCSU QTL Workshop © Brian S. Yandell 36 #QTL for Brassica 8-week
37
June 1999NCSU QTL Workshop © Brian S. Yandell 37 Brassica #QTL 90% Credible Sets 8-week 4-week
38
June 1999NCSU QTL Workshop © Brian S. Yandell 38 Brassica #QTL Comparison
39
June 1999NCSU QTL Workshop © Brian S. Yandell 39 Reversible Jump II reversible jump MCMC details –can update model with m QTL –have basic idea of jumping models –now: careful bookkeeping between models RJ-MCMC & Bayes factors –Bayes factors from RJ-MCMC chain –components of Bayes factors
40
June 1999NCSU QTL Workshop © Brian S. Yandell 40 RJ-MCMC Updates effects loci traits genos add locus drop locus b(m+1) d(m)d(m) 1-b(m+1)-d(m)
41
June 1999NCSU QTL Workshop © Brian S. Yandell 41 Reversible Jump Idea expand idea of MCMC to compare models adjust for parameters in different models –augment smaller model with innovations –constraints on larger model calculus “change of variables” is key –add or drop parameter(s) –carefully compute the Jacobian consider stepwise regression –Mallick (1995) & Green (1995) –efficient calculation with Hausholder decomposition
42
June 1999NCSU QTL Workshop © Brian S. Yandell 42 Model Selection in Regression known regressors (e.g. markers ) –models with 1 or 2 regressors jump between models –centering regressors simplifies calculations
43
June 1999NCSU QTL Workshop © Brian S. Yandell 43 Slope Estimate for 1 Regressor recall least squares estimate of slope note relation of slope to correlation
44
June 1999NCSU QTL Workshop © Brian S. Yandell 44 2 Correlated Regressors slopes adjusted for other regressors
45
June 1999NCSU QTL Workshop © Brian S. Yandell 45 Gibbs Sampler for Model 1 mean slope variance
46
June 1999NCSU QTL Workshop © Brian S. Yandell 46 Gibbs Sampler for Model 2 mean slopes variance
47
June 1999NCSU QTL Workshop © Brian S. Yandell 47 Updates from 2->1 drop 2nd regressor adjust other regressor
48
June 1999NCSU QTL Workshop © Brian S. Yandell 48 Updates from 1->2 add 2nd slope, adjusting for collinearity adjust other slope & variance
49
June 1999NCSU QTL Workshop © Brian S. Yandell 49 Model Selection in Regression known regressors (e.g. markers ) –models with 1 or 2 regressors jump between models –augment with new innovation z
50
June 1999NCSU QTL Workshop © Brian S. Yandell 50 Change of Variables change variables from model 1 to model 2 calculus issues for integration –need to formally account for change of variables –infinitessimal steps in integration (db) –involves partial derivatives (next page)
51
June 1999NCSU QTL Workshop © Brian S. Yandell 51 Jacobian & the Calculus Jacobian sorts out change of variables –careful: easy to mess up here!
52
June 1999NCSU QTL Workshop © Brian S. Yandell 52 Geometry of Reversible Jump
53
June 1999NCSU QTL Workshop © Brian S. Yandell 53 QT additive Reversible Jump
54
June 1999NCSU QTL Workshop © Brian S. Yandell 54 Credible Set for additive 90% & 95% sets based on normal regression line corresponds to slope of updates
55
June 1999NCSU QTL Workshop © Brian S. Yandell 55 Efficient Updating of additive more computations when m > 2 want to avoid matrix inverses –decompose matrix instead –solve linear system of equations use linear algebra –Hausholder (QR) decomposition –LAPACK User’s Guide (1995, 2nd ed) Anderson et al., SIAM.
56
June 1999NCSU QTL Workshop © Brian S. Yandell 56 Hausholder (QR) Decomposition decomposition –G is upper triangular –F is orthogonal orthogonality design matrix
57
June 1999NCSU QTL Workshop © Brian S. Yandell 57 QR & Regression model error piece model piece estimators
58
June 1999NCSU QTL Workshop © Brian S. Yandell 58 Absorbing Old Model old model –m regressors –QR decomposition new model –m+1 regressor –use QR to absorb old model
59
June 1999NCSU QTL Workshop © Brian S. Yandell 59 Adjusted Slope Estimators old slopes –note m=1 case added slope –note sum of squares variance –note Jacobian new slopes
60
June 1999NCSU QTL Workshop © Brian S. Yandell 60 How To Infer loci ? if m is known, use fixed MCMC –histogram of loci –issue of bump hunting combining loci estimates in RJ-MCMC –some steps are from wrong model too few loci (bias) too many loci (variance/identifiability) –condition on number of loci subsets of Markov chain
61
June 1999NCSU QTL Workshop © Brian S. Yandell 61 Brassica 8-week Data locus MCMC with m=2
62
June 1999NCSU QTL Workshop © Brian S. Yandell 62 Jumping QTL number & loci
63
June 1999NCSU QTL Workshop © Brian S. Yandell 63 RJ-MCMC loci chain
64
June 1999NCSU QTL Workshop © Brian S. Yandell 64 Raw Histogram of loci
65
June 1999NCSU QTL Workshop © Brian S. Yandell 65 Conditional Histograms
66
June 1999NCSU QTL Workshop © Brian S. Yandell 66 Bayes Factors ratio of posterior odds to prior odds –RJ-MCMC gives posterior on number of QTL –prior is Poisson
67
June 1999NCSU QTL Workshop © Brian S. Yandell 67 #QTL for Brassica 8-week
68
June 1999NCSU QTL Workshop © Brian S. Yandell 68 RJ-Bayes Factors (8-week Brassica data)
69
June 1999NCSU QTL Workshop © Brian S. Yandell 69 Simulation Study of Prior Effect how dramatic is the effect of prior? simulations of 0, 1 or 2 QTL –QTL have large effect additive = 2, variance = 1 –2 QTL spaced 36cM apart –sample sized of 105 RJ-MCMC runs of 100,000
70
June 1999NCSU QTL Workshop © Brian S. Yandell 70 Effect of Prior Mean
71
June 1999NCSU QTL Workshop © Brian S. Yandell 71 Bayes Factor prior of 2 prior of 4
72
June 1999NCSU QTL Workshop © Brian S. Yandell 72 Computing Bayes Factors arithmetic mean –using samples from prior –mean across Monte Carlo or MCMC runs –can be inefficient if prior differs from posterior harmonic mean –using samples from posterior –more efficient but less stable –careful choice of weight h() close to posterior
73
June 1999NCSU QTL Workshop © Brian S. Yandell 73 Stable Bayes Factors Satagopan, Raftery & Newton (1999) –weighted harmonic mean –absorb variance (normal to t dist) replace by
74
June 1999NCSU QTL Workshop © Brian S. Yandell 74 Bayes Factors & LODs others have tried arithmetic & harmonic mean why not geometric mean? terms that are averaged are log likelihoods...
75
June 1999NCSU QTL Workshop © Brian S. Yandell 75 Bayesian LOD Bayesian “LOD” computed at each step –based on LR given sampled geno types and effects –can be larger or smaller than profile LOD –informal diagnostic of fit –combine to for geometric estimates of Bayes factors
76
June 1999NCSU QTL Workshop © Brian S. Yandell 76 Compare LODs scatter plot of loci and Bayesian LODs –same BLOD for all loci of step –overlay LOD from interval mapping (red) –overlay LOD from CIM (green) –vertical lines at true or inferred loci (blue) steps with higher BLOD –may have more likely genotypes –basis for MCMC step always up to higher likelihood sometimes down to lower likelihood
77
June 1999NCSU QTL Workshop © Brian S. Yandell 77 BLODs with no QTL LOD profile roughly zero most BLOD values should be negative –no pattern across linkage group distribution similar to rescaled chi- square with 1 df –-2log(LR) approximately chi-square –assignment of genotypes “irrelevant” RJ-MCMC skewed with inferred QTL –numbers indicate #QTL
78
June 1999NCSU QTL Workshop © Brian S. Yandell 78 Simulated Data with No QTL
79
June 1999NCSU QTL Workshop © Brian S. Yandell 79 BLOD: no QTL
80
June 1999NCSU QTL Workshop © Brian S. Yandell 80 RJ-MCMC BLOD: no QTL
81
June 1999NCSU QTL Workshop © Brian S. Yandell 81 BLODs with 1 QTL LOD peaks at correct locus most BLOD values near locus –some considerably larger than LOD –inferred genotype vs EM average non-central distribution of BLOD –rescaled non-central chi-square? RJ-MCMC dispersed from peak –locus proposal has local dispersion
82
June 1999NCSU QTL Workshop © Brian S. Yandell 82 LOD for 1 QTL
83
June 1999NCSU QTL Workshop © Brian S. Yandell 83 RJ-MCMC BLOD: 1 QTL
84
June 1999NCSU QTL Workshop © Brian S. Yandell 84 BLODs with 2 QTL incorrect fit of 1 QTL model –LOD peaks at ghost locus –BLOD values near LOD peaks correct 2-QTL model –IM misses, CIM gets loci –BLOD at loci –BLOD approx. 2x LOD simultaneous fit of both loci RJ-MCMC dispersed from peak –need to look conditional on m=2
85
June 1999NCSU QTL Workshop © Brian S. Yandell 85 LOD for 2 QTL with 1 Fit
86
June 1999NCSU QTL Workshop © Brian S. Yandell 86 LOD for 2 QTL with 2 Fit
87
June 1999NCSU QTL Workshop © Brian S. Yandell 87 RJ-MCMC BLOD: 2 QTL
88
June 1999NCSU QTL Workshop © Brian S. Yandell 88 Brassica BLODs 4-week clearer than 8-week –ghost QTL or smear for 8-week? MCMC with m=2 fairly clear RJ-MCMC dispersed –conditioning on m=2 similar to MCMC not shown –mixing models together –local proposal moves hamper mixing
89
June 1999NCSU QTL Workshop © Brian S. Yandell 89 Brassica 4-week BLOD Map
90
June 1999NCSU QTL Workshop © Brian S. Yandell 90 Brassica 8-week BLOD Map
91
June 1999NCSU QTL Workshop © Brian S. Yandell 91 4-week RJ-MCMC BLOD
92
June 1999NCSU QTL Workshop © Brian S. Yandell 92 8-week RJ-MCMC BLOD
93
June 1999NCSU QTL Workshop © Brian S. Yandell 93 The Art of MCMC convergence issues –burn-in period & when to stop proper mixing of the chain –smart proposals & smart updates frequentist approach –simulated annealing: reaching the peak –simulated tempering: heating & cooling the chain Bayesian approach –influence of priors on posterior –Rao-Blackwell smoothing bump-hunting for mixtures (e.g. QTL)
94
June 1999NCSU QTL Workshop © Brian S. Yandell 94 RJ-MCMC Software General MCMC software –U Bristol links http://www.stats.bris.ac.uk/MCMC/pages/links.html –BUGS (Bayesian inference Using Gibbs Sampling) http://www.mrc-bsu.cam.ac.uk/bugs/ Our MCMC software for QTLs –C code using LAPACK ftp://ftp.stat.wisc.edu/pub/yandell/revjump.tar.gz –coming soon: perl preprocessing (to/from QtlCart format) Splus post processing Bayes factor computation
95
June 1999NCSU QTL Workshop © Brian S. Yandell 95 RJ-MCMC Software Details input files –marker.dist –marker.mark –trait.y output file –result.write –result.error distances between loci (cM) marker genotypes (-1,1) –one line per marker trait phenotypes results errors if any
96
June 1999NCSU QTL Workshop © Brian S. Yandell 96 trait.y file missing value = -99.000000 8.097789 7.928476 12.538893 8.370929 8.347937 11.252023 10.687911 6.961958 12.318708 11.985510 11.841586 8.339165 9.347868 11.933833 7.593694 12.248663 13.183897 9.928043 8.326296...
97
June 1999NCSU QTL Workshop © Brian S. Yandell 97 marker.dist file 8.8 11.8 6.8 8.7 10.7 10.5 5.1 14.7
98
June 1999NCSU QTL Workshop © Brian S. Yandell 98 marker.mark file one row per marker, one column per line -1 -1 -1 1 -1 1 1... -1 -1 1 1 -1 1 1... -1 -1 1 -1 -1 1 1... -1 -1 1 -1 -1 1 -1... -1 1 1 1 -1 1 -1...
99
June 1999NCSU QTL Workshop © Brian S. Yandell 99 nval.dat file 1# 1=revjump,0=no 105 10# n=individuals markers 10000 10# N=MCMC_runs skips 2# m 0 0.5# mu sigmasq 0 0# initial b’s 3.5 7.5# initial loci 0 10 1 1# prior(mu)~N(0,10) prior(sigmasq)~IG(1,1) 0 10# prior(b)~N(0,1) 4# prior(m)~Poisson(4)
100
June 1999NCSU QTL Workshop © Brian S. Yandell 100 result.write file m mu b(1)..b(m) sigmasq lambda(1)..lambda(m) move LOD 2 3.0702 -0.0820 0.0975 0.2085 10.5301 83.7381 1 -0.3943 3 3.0512 0.0343 -0.1166 -0.0600 0.0853 18.4030 51.2329 76... 2 3.0400 -0.0415 -0.0946 0.0619 28.6062 74.0904 3 6.1284... 1 3.1045 -0.1412 0.0851 76.9420 3 6.1310 2 3.0669 -0.0434 -0.1576 0.1091 17.4566 80.7973 3 4.3184 propose accept birth 384474 100625 death 227999 100625 locus 387527 259705 m=0 m=1 m=2 m=3 m=4 m=5... 1936 428306 425105 126871 16703 1047 32 0 0 0
101
June 1999NCSU QTL Workshop © Brian S. Yandell 101 Bayes Factor References MA Newton & AE Raftery (1994) “Approximate Bayesian inference with the weighted likelihood bootstrap”, J Royal Statist Soc B 56: 3-48. RE Kass & AE Raftery (1995) “Bayes factors”, J Amer Statist Assoc 90: 773-795. JM Satagopan, MA Newton & AE Rafter (1999) ms in prep, mailto:satago@biosta.mskcc.org.
102
June 1999NCSU QTL Workshop © Brian S. Yandell 102 Reversible Jump MCMC References PJ Green (1995) “Reversible jump Markov chain Monte Carlo computation and Bayesian model determination”, Biometrika 82: 711-732. S Richardson & PJ Green (1997) “On Bayesian analysis of mixture with an unknown of components”, J Royal Statist Soc B 59: 731-792. BK Mallick (1995) “Bayesian curve estimation by polynomials of random order”, TR 95-19, Math Dept, Imperial College London. L Kuo & B Mallick (1996) “Bayesian variable selection for regression models”, ASA Proc Section on Bayesian Statistical Science, 170-175.
103
June 1999NCSU QTL Workshop © Brian S. Yandell 103 QTL Reversible Jump MCMC: Inbred Lines JM Satagopan & BS Yandell (1996) “Estimating the number of quantitative trait loci via Bayesian model determination”, Proc JSM Biometrics Section. DA Stephens & RD Fisch (1998) “Bayesian analysis of quantitative trait locus data using reversible jump Markov chain Monte Carlo”, Biometrics 54: 1334-1347. MJ Sillanpaa & E Arjas (1998) “Bayesian mapping of multiple quantitative trait loci from incomplete inbred line cross data”, Genetics 148: 1373-1388. R Waagepetersen & D Sorensen (1999) “Understanding reversible jump MCMC”, mailto:sorensen@inet.uni2.dk.
104
June 1999NCSU QTL Workshop © Brian S. Yandell 104 QTL Reversible Jump MCMC: Pedigrees S Heath (1997) “Markov chain Monte Carlo segregation and linkage analysis for oligenic models”, Am J Hum Genet 61: 748-760. I Hoeschele, P Uimari, FE Grignola, Q Zhang & KM Gage (1997) “Advances in statistical methods to map quantitative trait loci in outbred populations”, Genetics 147:1445-1457. P Uimari and I Hoeschele (1997) “Mapping linked quantitative trait loci using Bayesian analysis and Markov chain Monte Carlo algorithms”, Genetics 146: 735-743. MJ Sillanpaa & E Arjas (1999) “Bayesian mapping of multiple quantitative trait loci from incomplete outbred offspring data”, Genetics 151, 1605-1619.
105
June 1999NCSU QTL Workshop © Brian S. Yandell 105
106
June 1999NCSU QTL Workshop © Brian S. Yandell 106 MCMC run of loci for 2 QTL
107
June 1999NCSU QTL Workshop © Brian S. Yandell 107 Simulated Data effect Correlation
108
June 1999NCSU QTL Workshop © Brian S. Yandell 108
109
June 1999NCSU QTL Workshop © Brian S. Yandell 109
110
June 1999NCSU QTL Workshop © Brian S. Yandell 110 Samples from MCMC ergodic Markov chain –sample averages agree tend to mean of stable dist –but samples are dependent ordinary sample summaries –means, quantiles, histograms –high probability density (HPD) regions sample chain for long run (100,000-1,000,000) use time series methods to study chain –standard error of estimates –Monte Carlo error
111
June 1999NCSU QTL Workshop © Brian S. Yandell 111 QTL Bayesian Posterior posterior = likelihood * prior / constant posterior distribution is proportional to –likelihood of traits & genos given effects & locus –prior distribution of effects & locus posterior distribution of genotypes, effects & locus given traits and linkage map
112
June 1999NCSU QTL Workshop © Brian S. Yandell 112 Brassica napus Experimental Results two QTLs inferred on LG9 corroborated by Butruille (1998) now exploiting synteny with Arabidopsis thaliana
113
June 1999NCSU QTL Workshop © Brian S. Yandell 113 Brassica napus Experimental Design 3 vernalization treatments –none –4 week –8 week only consider 8 week Vernalization here non-flowering lines for none, 4-week skew suggests log transformation data average over 3 blocks (EU = line)
114
June 1999NCSU QTL Workshop © Brian S. Yandell 114 Scatter Plot of 2 loci Subset
115
June 1999NCSU QTL Workshop © Brian S. Yandell 115 RJ-MCMC Changing loci
116
June 1999NCSU QTL Workshop © Brian S. Yandell 116
117
June 1999NCSU QTL Workshop © Brian S. Yandell 117 Bayesian Inference for QTLs in Inbred Lines I Brian S Yandell University of Wisconsin-Madison www.stat.wisc.edu/~yandell NCSU Statistical Genetics June 1999
118
NCSU QTL Workshop © Brian S. Yandell 118 Overview Part I: Single QTL –Modeling a trait with a QTL –Likelihoods, Frequentists & Bayesians –Profile LODs & Bayesian Posteriors –Graphical Summaries –Testing & Estimation Part II: Multiple QTLs –Sampling from a Markov Chain –Issues for Multiple QTLs –Bayes Factors & Model Selection –Brassica data analysis
119
June 1999NCSU QTL Workshop © Brian S. Yandell 119 Part I: Single QTL Modelling a trait with a QTL –linear model for trait given geno type –recombination near loci for geno type Bayesian Posterior Likelihoods –EM & MCMC –Frequentists & Bayesians Review Interval Maps & Profile LODs Case Study: Simulated Single QTL
120
June 1999NCSU QTL Workshop © Brian S. Yandell 120 QTL Components observed data on individual –trait: field or lab measurement log( days to flowering ), yield, … –markers: from wet lab (RFLPs, etc.) linkage map of markers assumed known unobserved data on individual –geno: genotype (QQ=1/Qq=0/qq=-1) unknown model parameters –effects: mean, difference, variance –locus: quantitative trait locus (QTL)
121
June 1999NCSU QTL Workshop © Brian S. Yandell 121 Single QTL trait Model trait = mean + additive + error trait = effect_of_geno + error prob( trait | geno, effects )
122
June 1999NCSU QTL Workshop © Brian S. Yandell 122 Simulated Data with 1 QTL
123
June 1999NCSU QTL Workshop © Brian S. Yandell 123 Recombination and Distance no interference--easy approximation –Haldane map function –no interference with recombination all computations consistent in approximation –rely on given map marker loci assumed known –1-to-1 relation of distance to recombination –all map functions are approximate assume marker positions along map are known
124
June 1999NCSU QTL Workshop © Brian S. Yandell 124 markers, QTL & recombination rates distance along chromosome ? ?
125
June 1999NCSU QTL Workshop © Brian S. Yandell 125 Interval Mapping of QT geno type can express probabilities in terms of distance –locus is distance along linkage map –flanking markers sufficient if no missing data –could consider more complicated relationship prob( geno | locus, map ) = prob( geno | locus, flanking markers )
126
June 1999NCSU QTL Workshop © Brian S. Yandell 126 Basic Idea of Likelihood Use build likelihood in steps –build from trait & genotypes at locus –likelihood for individual i –log likelihood over individuals maximum likelihood (interval mapping) –EM method (Lander & Botstein 1989) –MCMC method (Guo & Thompson 1994) posterior distribution (Bayesian) –analytical methods (e.g. Carlin & Louis 1998) –MCMC method (Satagopan et al 1996)
127
June 1999NCSU QTL Workshop © Brian S. Yandell 127 Bayes Theorem posteriors and priors –prior: prob( parameters ) –posterior: prob( parameters | data ) posterior = likelihood * prior / constant posterior distribution is proportional to –likelihood of parameters given data –prior distribution of parameters
128
June 1999NCSU QTL Workshop © Brian S. Yandell 128 Bayesian Prior “prior” belief used to infer “posterior” estimates –higher weight for more probable parameter values based on prior knowledge –use previous study to inform current study weather prediction: tomorrow is much like today previous QTL studies on related organisms –historical criticism: can get “religious” about priors often want negligible effect of prior on posterior –pick non-informative priors all parameter values equally likely large variance on priors –always check sensitivity to prior
129
June 1999NCSU QTL Workshop © Brian S. Yandell 129 How to Study Posterior? exact methods –exact if possible –can be difficult or impossible to analyze approximate methods –importance sampling –numerical integration –Monte Carlo & other Monte Carlo methods –easy to implement –independent samples MCMC methods –handle hard problems –art to efficient use –correlated samples
130
June 1999NCSU QTL Workshop © Brian S. Yandell 130 QTL Bayesian Inference study posterior distribution of locus & effects –sample joint distribution locus, effects & genotypes –study marginal distribution of locus effects –overall mean, genotype difference, variance locus & effects together estimates & confidence regions –histograms, boxplots & scatter plots –HPD regions
131
June 1999NCSU QTL Workshop © Brian S. Yandell 131 Posterior for locus & effect
132
June 1999NCSU QTL Workshop © Brian S. Yandell 132 QTL Marginal Posterior posterior = likelihood * prior / constant posterior distribution is proportional to –likelihood of traits & genos given effects & locus –prior distribution of effects & locus is proportional to
133
June 1999NCSU QTL Workshop © Brian S. Yandell 133 Marginal Posterior Summary marginal posterior for locus & effects highest probability density (HPD) region –smallest region with highest probability –credible region for locus & effects HPD with 50,80,90,95% –range of credible levels can be useful –marginal bars and bounding boxes –joint regions (harder to draw)
134
June 1999NCSU QTL Workshop © Brian S. Yandell 134 HPD Region for locus & effect
135
June 1999NCSU QTL Workshop © Brian S. Yandell 135 QTL Bayesian Posterior posterior = likelihood * prior / constant posterior( paramaters | data ) prob( genos, effects, loci | trait, map ) is proportional to
136
June 1999NCSU QTL Workshop © Brian S. Yandell 136 Frequentist or Bayesian? Frequentist approach –fixed parameters range of values –maximize likelihood ML estimates find the peak –confidence regions random region invert a test –hypothesis testing 2 nested models Bayesian approach –random parameters distribution –posterior distribution posterior mean sample from dist –credible sets fixed region given data HPD regions –model selection/critique Bayes factors
137
June 1999NCSU QTL Workshop © Brian S. Yandell 137 Frequentist or Bayesian? Frequentist approach –maximize over mixture of QT genotypes –locus profile likelihood max over effects –HPD region for locus natural for locus –1-2 LOD drop work to get effects –approximate shape of likelihood peak Bayesian approach –joint distribution over QT genotypes –sample distribution joint effects & loci –HPD regions for joint locus & effects use density estimator
138
June 1999NCSU QTL Workshop © Brian S. Yandell 138 Interval Mapping for Quantitative Trait Loci profile likelihood (LOD) across QTL –scan whole genome locus by locus use flanking markers for interval mapping –maximize likelihood ratio (LOD) at locus best estimates of effects for each locus EM method (Lander & Botstein 1989)
139
June 1999NCSU QTL Workshop © Brian S. Yandell 139 Profile LOD for 1 QTL
140
June 1999NCSU QTL Workshop © Brian S. Yandell 140 Building trait Likelihood likelihood is mixture across possible geno types sum over all possible geno types at locus like( effects, locus | trait ) = sum of prob( trait, genos | effects, locus )
141
June 1999NCSU QTL Workshop © Brian S. Yandell 141 Likelihood over Individuals product of trait probabilities across individuals –product of sum across possible geno types like( effects, locus | traits, map ) = product of prob( trait | effects, locus, map )
142
June 1999NCSU QTL Workshop © Brian S. Yandell 142 Interval Mapping Tests profile LOD across possible loci in genome –maximum likelihood estimates of effects at locus –LOD is rescaling of L(effects, locus|y) test for evidence of QTL at each locus –LOD score (LR test) –adjust (?) for multiple comparisons
143
June 1999NCSU QTL Workshop © Brian S. Yandell 143 Interval Mapping Estimates confidence region for locus –based on inverting test of no QTL –2 LODs down gives approximate CI for locus –based on chi-square approximation to LR confidence region for effects –approximate CI for effect based on normal –point estimate from profile LOD
144
June 1999NCSU QTL Workshop © Brian S. Yandell 144 IM Confidence Region
145
June 1999NCSU QTL Workshop © Brian S. Yandell 145 How to Proceed? figure out how to construct posterior –Markov chain Monte Carlo run Markov chain summarize results diagnostics
146
June 1999NCSU QTL Workshop © Brian S. Yandell 146 MCMC Run for 1 locus Data
147
June 1999NCSU QTL Workshop © Brian S. Yandell 147 Part II: Multiple QTL Sampling from the Posterior –Markov chain Monte Carlo Gibbs sampler for effects Metropolis-Hastings for loci Issues for 2 QTL Bayes factors & Model Selection Simulated data for 0,1,2 QTL Brassica data on days to flowering
148
June 1999NCSU QTL Workshop © Brian S. Yandell 148 Posterior for effects start with joint distribution of effects –does not depend on locus given geno types –standard linear model problem is proportional to
149
June 1999NCSU QTL Workshop © Brian S. Yandell 149 How to Study Posterior? exact methods –exact if possible –can be difficult or impossible to analyze approximate methods –importance sampling –numerical integration –Monte Carlo & other Monte Carlo methods –easy to implement –independent samples MCMC methods –handle hard problems –art to efficient use –correlated samples
150
June 1999NCSU QTL Workshop © Brian S. Yandell 150 Ordinary Monte Carlo independent samples of joint distribution chaining (or peeling) of effects requires numerical integration –possible analytically here –very messy in general
151
June 1999NCSU QTL Workshop © Brian S. Yandell 151 Gibbs Sampler for effects set up Markov chain around posterior for effects sample from posterior by sampling from full conditionals –conditional posterior of each parameter given the other –update parameter by sampling full conditional update mean update additive update variance
152
June 1999NCSU QTL Workshop © Brian S. Yandell 152 Markov chain updates variance mean traits additive genos
153
June 1999NCSU QTL Workshop © Brian S. Yandell 153 Markov chain idea future events given the present state are independent of past events update chain based on current value can make chain arbitrarily complicated chain converges to stable pattern 01 1-q q p 1-p
154
June 1999NCSU QTL Workshop © Brian S. Yandell 154 Markov chain Monte Carlo can study arbitrarily complex models –need only specify how parameters affect each other –can reduce to specifying full conditionals construct Markov chain with “right” model –update some parameters given data and others –can fudge on “right” (importance sampling) –next step depends only on current estimates nice Markov chains have nice properties –sample summaries make sense –consider almost as random sample from distribution
155
June 1999NCSU QTL Workshop © Brian S. Yandell 155 MCMC run of mean & effect
156
June 1999NCSU QTL Workshop © Brian S. Yandell 156 Care & Use of MCMC sample chain for long run (100,000-1,000,000) –longer for more complicated likelihoods –use diagnostic plots to assess “mixing” standard error of estimates –use histogram of posterior –compute variance of posterior--just another summary studying the Markov chain –Monte Carlo error of series (Geyer 1992) time series estimate based on lagged auto-covariances –convergence diagnostics for “proper mixing”
157
June 1999NCSU QTL Workshop © Brian S. Yandell 157 MCMC Idea for QTLs construct Markov chain around posterior –want posterior as stable distribution of Markov chain –in practice, the chain tends toward stable distribution initial values may have low posterior probability burn-in period to get chain mixing well update one (or several) components at a time –update effects given geno types & traits –update locus given geno types & traits –update geno types give locus & effects
158
June 1999NCSU QTL Workshop © Brian S. Yandell 158 Markov chain updates effects locus traits genos
159
June 1999NCSU QTL Workshop © Brian S. Yandell 159 Full Conditional for genos full conditional for geno type depends on –effects via trait model –locus via recombination model can explicitly decompose by individual j –binomial (or trinomial) probability
160
June 1999NCSU QTL Workshop © Brian S. Yandell 160 Prior for locus no prior information on locus –uniform prior over genome –use framework map choose interval proportional to length then pick uniform position within interval prior information from other studies concentrate on credible regions use posterior of previous study as new prior
161
June 1999NCSU QTL Workshop © Brian S. Yandell 161 Full Conditional for locus cannot easily sample from locus full conditional cannot explicitly determine full conditional –difficult to normalize –need to consider all possible geno types over entire map Gibbs sampler will not work –but can get something proportional...
162
June 1999NCSU QTL Workshop © Brian S. Yandell 162 Metropolis-Hastings Step pick new locus based upon current locus –propose new locus from distribution q( ) pick value near current one? pick uniformly across genome? –accept new locus with probability a() Gibbs sampler is special case of M-H –always accept new proposal acceptance insures right stable distribution
163
June 1999NCSU QTL Workshop © Brian S. Yandell 163 MCMC run: 2 loci assuming only 1
164
June 1999NCSU QTL Workshop © Brian S. Yandell 164 Multiple QTL model trait = mean + add1 + add2 + error trait = effect_of_genos + error prob( trait | genos, effects )
165
June 1999NCSU QTL Workshop © Brian S. Yandell 165 Simulated Data with 2 QTL
166
June 1999NCSU QTL Workshop © Brian S. Yandell 166 Issues for Multiple QTL how many QTL influence a trait? –1, several (oligogenic) or many (polygenic)? –how many are supported by the data? searching for 2 or more QTL –conditional search (IM, CIM) –simultaneous search (MIM) epistasis (inter-loci interaction) –many more parameters to estimate –effects of ignored QTL
167
June 1999NCSU QTL Workshop © Brian S. Yandell 167 LOD for 2 QTL
168
June 1999NCSU QTL Workshop © Brian S. Yandell 168 Interval Mapping Approach interval mapping (IM) –scan genome for 1 QTL composite interval mapping (CIM) –scan for 1 QTL while adjusting for others –use markers as surrogates for other QTL multiple interval mapping (MIM) –search for multiple QTL
169
June 1999NCSU QTL Workshop © Brian S. Yandell 169 MCMC run with 2 loci
170
June 1999NCSU QTL Workshop © Brian S. Yandell 170 Bayesian Approach simultaneous search for multiple QTL use Bayesian paradigm –easy to consider joint distributions –easy to modify later for other types of data counts, proportions, etc. –employ MCMC to estimate posterior dist study estimates of loci & effects use Bayes factors for model selection –number of QTL
171
June 1999NCSU QTL Workshop © Brian S. Yandell 171 Effects for 2 Simulated QTL
172
June 1999NCSU QTL Workshop © Brian S. Yandell 172 Brassica napus Data 4-week & 8-week vernalization effect –log(days to flower) genetic cross of –Stellar (annual canola) –Major (biennial rapeseed) 105 F1-derived double haploid (DH) lines –homozygous at every locus (QQ or qq) 10 molecular markers (RFLPs) on LG9 –two QTLs inferred on LG9 (now chromosome N2) –corroborated by Butruille (1998) –exploiting synteny with Arabidopsis thaliana
173
June 1999NCSU QTL Workshop © Brian S. Yandell 173 Brassica 4- & 8-week Data
174
June 1999NCSU QTL Workshop © Brian S. Yandell 174 Brassica Data LOD Maps
175
June 1999NCSU QTL Workshop © Brian S. Yandell 175 4-week vs 8-week vernalization 4-week vernalization longer time to flower larger LOD at 40cM modest LOD at 80cM loci well determined cMadd 40.30 80.16 8-week vernalization shorter time to flower larger LOD at 80cM modest LOD at 40cM loci poorly determined cMadd 40.06 80.13
176
June 1999NCSU QTL Workshop © Brian S. Yandell 176 Brassica Credible Regions
177
June 1999NCSU QTL Workshop © Brian S. Yandell 177 Collinearity of QTLs multiple QT geno types are correlated –QTL linked on same chromosome –difficult to distinguish if close estimates of QT effects are correlated –poor identifiability of effects parameters –correlations give clue of how much to trust which QTL to go after in breeding? –largest effect ? –may be biased by nearby QTL
178
June 1999NCSU QTL Workshop © Brian S. Yandell 178 Brassica effect Correlations
179
June 1999NCSU QTL Workshop © Brian S. Yandell 179 Bayes Factors Which model (1 or 2 or 3 QTLs?) has higher probability of supporting the data? –ratio of posterior odds to prior odds –ratio of model likelihoods –related to likelihood ratio test of simple hypotheses
180
June 1999NCSU QTL Workshop © Brian S. Yandell 180 Bayes Factors & LR equivalent to LR statistic when –comparing two nested models –simple hypotheses (e.g. 1 vs 2 QTL) Bayes Information Criteria (BIC) in general –Schwartz introduced for model selection –penalty for different number of parameters p
181
June 1999NCSU QTL Workshop © Brian S. Yandell 181 Model Determination using Bayes Factors pick most plausible model –histogram for range of models –posterior distribution of models –use Bayes theorem –often assume flat prior across models posterior distribution of number of QTLs
182
June 1999NCSU QTL Workshop © Brian S. Yandell 182 Brassica Bayes Factors compare models for 1, 2, 3 QTL Bayes factor and - 2log(LR) large value favors first model 8-week vernalization only here
183
June 1999NCSU QTL Workshop © Brian S. Yandell 183 Computing Bayes Factors using samples from prior –mean across Monte Carlo or MCMC runs –can be inefficient if prior differs from posterior using samples from posterior –weighted harmonic mean more efficient but unstable –care in choosing weight h() close to posterior –increase stability: absorb variance (normal to t dist)
184
June 1999NCSU QTL Workshop © Brian S. Yandell 184 MCMC Software General MCMC software –U Bristol links http://www.stats.bris.ac.uk/MCMC/pages/links.html –BUGS (Bayesian inference Using Gibbs Sampling) http://www.mrc-bsu.cam.ac.uk/bugs/ Our MCMC software for QTLs –C code using LAPACK ftp://ftp.stat.wisc.edu/pub/yandell/revjump.tar.gz –coming soon: perl preprocessing (to/from QtlCart format) Splus post processing Bayes factor computation
185
June 1999NCSU QTL Workshop © Brian S. Yandell 185 Bayes & MCMC References CJ Geyer (1992) “Practical Markov chain Monte Carlo”, Statistical Science 7: 473-511 L Tierney (1994) “Markov Chains for exploring posterior distributions”, The Annals of Statistics 22: 1701-1728 (with disc:1728-1762). A Gelman, J Carlin, H Stern & D Rubin (1995) Bayesian Data Analysis, CRC/Chapman & Hall. BP Carlin & TA Louis (1996) Bayes and Empirical Bayes Methods for Data Analysis, CRC/ Chapman & Hall. WR Gilks, S Richardson, & DJ Spiegelhalter (Ed 1996) Markov Chain Monte Carlo in Practice, CRC/Chapman & Hall.
186
June 1999NCSU QTL Workshop © Brian S. Yandell 186 QTL References D Thomas & V Cortessis (1992) “A Gibbs sampling approach to linkage analysis”, Hum. Hered. 42: 63-76. I Hoeschele & P vanRanden (1993) “Bayesian analysis of linkage between genetic markers and quantitative trait loci. I. Prior knowledge”, Theor. Appl. Genet. 85:953-960. I Hoeschele & P vanRanden (1993) “Bayesian analysis of linkage between genetic markers and quantitative trait loci. II. Combining prior knowledge with experimental evidence”, Theor. Appl. Genet. 85:946-952. SW Guo & EA Thompson (1994) “Monte Carlo estimation of mixed models for large complex pedigrees”, Biometrics 50: 417-432. JM Satagopan, BS Yandell, MA Newton & TC Osborn (1996) “A Bayesian approach to detect quantitative trait loci using Markov chain Monte Carlo”, Genetics 144: 805-816.
187
June 1999NCSU QTL Workshop © Brian S. Yandell 187 Brassica Data: 1 QTL
188
June 1999NCSU QTL Workshop © Brian S. Yandell 188 Brassica 8-week Data: 2 QTL
189
June 1999NCSU QTL Workshop © Brian S. Yandell 189 Brassica 8-week Data: 2 QTL
190
June 1999NCSU QTL Workshop © Brian S. Yandell 190 Geometry of Reversible Jump
191
June 1999NCSU QTL Workshop © Brian S. Yandell 191 Simulation of Jumps
192
June 1999NCSU QTL Workshop © Brian S. Yandell 192 Mean & Variance Estimates
193
June 1999NCSU QTL Workshop © Brian S. Yandell 193 Another Way to Jump many ways to parameterize models can augment both models
194
June 1999NCSU QTL Workshop © Brian S. Yandell 194 Simulated Data two loci, estimates from data Bayes factors for model selection
195
June 1999NCSU QTL Workshop © Brian S. Yandell 195 Posterior Number of loci (simulated after 8-week) distribution across m by prior mean
196
June 1999NCSU QTL Workshop © Brian S. Yandell 196 Bayes Factors (simulated after 8-week)
197
June 1999NCSU QTL Workshop © Brian S. Yandell 197 Sensitivity of Number to Prior
198
June 1999NCSU QTL Workshop © Brian S. Yandell 198 Number of loci Brassica 8-week distribution across m by prior mean
199
June 1999NCSU QTL Workshop © Brian S. Yandell 199 Sensitivity of Prior: Simulations
200
June 1999NCSU QTL Workshop © Brian S. Yandell 200 Profile LOD & LR profile LOD related to LR test at a locus –maximum likelihood estimates of effects –weighted average over all possible geno types
201
June 1999NCSU QTL Workshop © Brian S. Yandell 201 Bayes Factors Which model (1 or 2 or 3 QTLs?) has higher probability of supporting the data? –ratio of posterior odds to prior odds –ratio of model likelihoods –related to likelihood ratio test of simple hypotheses
202
June 1999NCSU QTL Workshop © Brian S. Yandell 202 Bayes Factors & LR equivalent to LR statistic when –comparing two nested models –simple hypotheses (e.g. 1 vs 2 QTL) Bayes Information Criteria (BIC) in general –Schwartz introduced for model selection –penalty for different number of parameters p
203
June 1999NCSU QTL Workshop © Brian S. Yandell 203 Model Determination using Bayes Factors pick most plausible model –histogram for range of models –posterior distribution of models –use Bayes theorem –often assume flat prior across models posterior distribution of number of QTLs
204
June 1999NCSU QTL Workshop © Brian S. Yandell 204 Computing Bayes Factors samples from prior distribution –mean across Monte Carlo or MCMC runs –can be inefficient if prior differs from posterior samples from posterior distribution –weighted harmonic mean more efficient but unstable –care in choosing weight h() close to posterior –increase stability absorbing variance (normal to t dist)
205
June 1999NCSU QTL Workshop © Brian S. Yandell 205 Brassica Bayes Factors compare models for 1, 2, 3 QTL Bayes factor (and - 2log(LR) ) arrows point to favored model
206
June 1999NCSU QTL Workshop © Brian S. Yandell 206 QTLs and Polygenes phenotype = design + QTLs + polygenes + error QTLs:quantitative trait loci polygenes:many genes of small effect spread throughout genome distinction is arbitrary, depending on sample size magnitude of effects design/cross/marker polymorphism analogy to multiple regression
207
June 1999NCSU QTL Workshop © Brian S. Yandell 207 Polygenes and Inbred Lines same (raw) genetic correlation across cohort: –1/2 for DH –2/3 for F2 –4/7 for F3 –1/2 for RI modified by specific information: –major & minor QTLs –marker surrogates for polygenes (CIM)
208
June 1999NCSU QTL Workshop © Brian S. Yandell 208 Composite QTL model trait = mean + add + dom + other + error trait = effect_of_geno + other + error prob( trait | genos, effects, other ) other ( ): other linked and unlinked QTLs, and polygenes
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.