Published by Hilary McKinney. Modified over 9 years ago.
1
[Figure: phylogenetic tree including Leptothorax gredosi, Leptothorax racovitzae and Camponotus herculeanus, with support values between 0.58 and 1.00 on the branches] Thomas Bayes 1702-1761
2
Bayesian inference Computational phylogenetics CSC 10.-12.12.2006 Mikko Kolkkala
3
How to read a tree?
4
Bayesian inference. Phylogenetic applications only very recently ("Why?" We'll return to that…). Controversial philosophy: a subjective probability concept, with degrees of belief measured as probabilities. A learning process: prior and posterior probabilities. Spam filters. "Subjective!" "Quack!"
5
p = probability; D = data; Θ = model/hypothesis/parameters; | = conditional probability, read: "provided that"
6
An example
Suppose we have ten identical-looking dice: nine ordinary ones and one loaded so that a six appears with probability 1/2. Let's pick one die at random. The probability of it being loaded is (of course) 1/10 (= prior). Next, we roll the die once and get a six. What is the probability now that we have picked the loaded die?
$$p(\text{loaded die} \mid \text{a six}) = \frac{p(\text{a six} \mid \text{loaded die})\, p(\text{loaded die})}{p(\text{a six})} = \frac{1/2 \cdot 1/10}{1/2 \cdot 1/10 + 1/6 \cdot 9/10} = \frac{1}{4} \quad (= \text{posterior})$$
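The dice calculation can be checked with exact fractions. This is a minimal sketch; the function name `posterior` and its argument names are illustrative and do not come from the slides.

```python
from fractions import Fraction


def posterior(prior, likelihood, likelihood_alt):
    """Bayes' theorem for two competing hypotheses.

    prior          -- p(hypothesis)
    likelihood     -- p(data | hypothesis)
    likelihood_alt -- p(data | alternative hypothesis)
    """
    # p(data) sums over both hypotheses, weighted by their priors
    evidence = likelihood * prior + likelihood_alt * (1 - prior)
    return likelihood * prior / evidence


p_loaded = Fraction(1, 10)            # one loaded die among ten
p_six_given_loaded = Fraction(1, 2)   # loaded die shows a six half the time
p_six_given_fair = Fraction(1, 6)     # ordinary die

p_loaded_given_six = posterior(p_loaded, p_six_given_loaded, p_six_given_fair)
print(p_loaded_given_six)  # 1/4
```

A single six raises the belief in the loaded die from 1/10 to 1/4; feeding the posterior back in as the new prior after each further roll is the "learning process" mentioned earlier.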
7
An exercise: a reliable test? Test for a rare disease (prevalence 0.1 %). Disease: positive result with probability 0.99. No disease: positive result with probability 0.05. Given a positive result, what is the probability that the individual tested does not have the disease? Answer: 0.98 (http://en.wikipedia.org/wiki/Bayesian_inference)
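The answer to the exercise can be reproduced in a few lines. This is a sketch only; the function and parameter names are assumptions made for illustration.

```python
def prob_healthy_given_positive(prevalence, sensitivity, false_positive_rate):
    """p(no disease | positive test) from Bayes' theorem."""
    p_positive_and_sick = sensitivity * prevalence
    p_positive_and_healthy = false_positive_rate * (1.0 - prevalence)
    return p_positive_and_healthy / (p_positive_and_sick + p_positive_and_healthy)


# Numbers from the exercise: prevalence 0.1 %, 0.99 and 0.05 positive rates
p_healthy = prob_healthy_given_positive(0.001, 0.99, 0.05)
print(round(p_healthy, 2))  # 0.98
```

Because the disease is so rare, false positives from the healthy majority swamp the true positives, even though the test itself looks accurate.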
8
$$p(\text{model} \mid \text{data}) = \frac{p(\text{data} \mid \text{model})\, p(\text{model})}{p(\text{data})}$$
In the dice example: model = "loaded die", data = "a six".
9
From dice to biology. Data: DNA alignment. Models: nucleotide substitution models, tree shape and branch lengths.
$$p(\text{model} \mid \text{data}) = \frac{p(\text{data} \mid \text{model})\, p(\text{model})}{p(\text{data})}$$
10
Posterior distribution: p(model | data). Prior distribution: p(model). Likelihood function: p(data | model).
11
If this Bayesian thing is so excellent, why hasn't it been used in phylogenetic analyses? No one can solve the equations! Numerical solutions are possible, but only with powerful computers.
12
MCMC = Markov Chain Monte Carlo. Parameters: tree topology, branch lengths, probabilities for nucleotide substitutions. "Exploring the tree space". [Figure: probability plotted over the parameter space, © Fredrik Ronquist]
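How a Markov chain "explores" a probability landscape can be sketched with a plain Metropolis sampler on a single continuous parameter. This is a toy illustration under stated assumptions: the target density (a bell curve centred at 3), the step size, the seed and all names are made up for the example, and a phylogenetic sampler would also propose changes to tree topology and branch lengths.

```python
import math
import random


def metropolis(log_density, start, n_steps, step_size, rng):
    """Random-walk Metropolis sampler for an unnormalized log density."""
    x = start
    log_p = log_density(x)
    samples = []
    for _ in range(n_steps):
        # Symmetric proposal: a small uniform step around the current state
        proposal = x + rng.uniform(-step_size, step_size)
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p_new / p_old)
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            x, log_p = proposal, log_p_new
        samples.append(x)
    return samples


def log_bell_curve(x):
    """Hypothetical target: unnormalized normal density, mean 3, sd 1."""
    return -0.5 * (x - 3.0) ** 2


rng = random.Random(1)
chain = metropolis(log_bell_curve, start=0.0, n_steps=20000, step_size=2.0, rng=rng)
kept = chain[2000:]  # discard the early, start-dependent part of the chain
print(sum(kept) / len(kept))  # close to the target mean of 3
```

Only ratios of the density are needed, so the normalizing constant p(data), the part nobody can solve, never has to be computed.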
13
Metropolis-Coupled Markov Chain Monte Carlo: MCMCMC = (MC)³. "Heated chains": a "flattened" parameter landscape.
14
(MC)³ © John Huelsenbeck
15
(MC)³ © John Huelsenbeck
16
(MC)³: swap of states © John Huelsenbeck
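The idea of heated chains and state swaps can be sketched on a toy landscape with two well-separated peaks. Everything here is an assumption made for illustration: the two-peak target, the incremental heating scheme beta = 1 / (1 + temperature * i), the temperature value, the step size and the names. Only the cold chain (beta = 1) is sampled; heated chains see the target raised to the power beta, which flattens it.

```python
import math
import random


def log_target(x):
    """Unnormalized log density with peaks at -4 and +4 (hypothetical)."""
    a = -0.5 * (x + 4.0) ** 2
    b = -0.5 * (x - 4.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def accept(log_ratio, rng):
    """Metropolis rule: accept with probability min(1, exp(log_ratio))."""
    return rng.random() < math.exp(min(0.0, log_ratio))


def mc3(n_steps, betas, step_size, rng):
    states = [-4.0 for _ in betas]  # every chain starts in the left peak
    cold_samples = []
    n_swaps = 0
    for _ in range(n_steps):
        # Ordinary Metropolis update in each chain on the heated target p(x)**beta
        for i, beta in enumerate(betas):
            proposal = states[i] + rng.uniform(-step_size, step_size)
            if accept(beta * (log_target(proposal) - log_target(states[i])), rng):
                states[i] = proposal
        # Propose swapping the states of two randomly chosen chains
        i, j = rng.sample(range(len(betas)), 2)
        log_ratio = (betas[i] - betas[j]) * (log_target(states[j]) - log_target(states[i]))
        if accept(log_ratio, rng):
            states[i], states[j] = states[j], states[i]
            n_swaps += 1
        cold_samples.append(states[0])  # chain 0 is the cold chain
    return cold_samples, n_swaps


temperature = 0.5  # assumed value, chosen only for this toy example
betas = [1.0 / (1.0 + temperature * i) for i in range(4)]
rng = random.Random(42)
samples, n_swaps = mc3(20000, betas, 1.0, rng)
print(n_swaps, min(samples), max(samples))
```

With small steps, a single cold chain would tend to stay in the peak where it started; the heated chains cross the valley more easily and hand their states down to the cold chain through swaps.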
17
Posterior probabilities of clades are obtained directly. No need for bootstrapping.
18
Standard models: JC, F81, K80, HKY85, K81, TrN, TIM, TVM, SYM, GTR. Substitution types: 1-6. Nucleotide frequencies: equal / estimated from the data. Invariable sites: no / estimate ("+I"). Evolutionary rate: equal / Γ-distributed ("+G"). Etc.
19
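JC (Jukes-Cantor): a single rate a for every substitution (rate matrix rows: .aaa / a.aa / aa.a / aaa.) and equal base frequencies π_A = π_C = π_G = π_T = 1/4. GTR: general time-reversible model. [Figure: substitution diagrams between A, C, G and T]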
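What the single-rate assumption of JC implies can be illustrated with the standard closed-form transition probabilities of the model. This is a sketch; the function name is illustrative and the branch length of 0.1 expected substitutions per site is an arbitrary example value.

```python
import math


def jc69_transition_probabilities(branch_length):
    """Jukes-Cantor probabilities along a branch.

    branch_length is measured in expected substitutions per site.
    Returns (p_same, p_diff): the probability that a site ends in the
    same nucleotide, and the probability of each specific different one.
    """
    decay = math.exp(-4.0 * branch_length / 3.0)
    p_same = 0.25 + 0.75 * decay
    p_diff = 0.25 - 0.25 * decay
    return p_same, p_diff


p_same, p_diff = jc69_transition_probabilities(0.1)
print(p_same, p_diff)
```

Because all rates and base frequencies are equal, two numbers describe the whole 4 x 4 probability matrix; GTR needs a separate rate for each of the six substitution types plus the base frequencies.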
20
Characters independent? No way. Time reversible: G→C = C→G? RNA genes.
21
SSR models (site-specific rates) for coding regions: a different evolutionary rate for the 1st, 2nd and 3rd codon positions. Problematic (see Buckley et al. 2001, Syst. Biol. 50:67-86).
22
But how to choose the model? Well, nobody said it would be easy. How many parameters does it take to fit an elephant? 30.
23
“What do you consider the largest map that would be really useful?" "About six inches to the mile." "Only six inches! […] We actually made a map of the country, on the scale of a mile to the mile!" (Lewis Carroll 1893)
24
Choosing a model: AIC (Akaike information criterion), AICc (corrected Akaike information criterion, for small samples), BIC (Bayesian information criterion). Programs: Modeltest (bad), FindModel (plop!), MrAic?
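The three criteria trade goodness of fit against the number of parameters, using their standard definitions. The log-likelihoods, parameter counts and sample size below are hypothetical numbers invented for the example, and counting only substitution-model parameters and using the alignment length as the sample size are simplifying assumptions.

```python
import math


def aic(log_likelihood, k):
    """Akaike information criterion; k = number of free parameters."""
    return 2 * k - 2 * log_likelihood


def aicc(log_likelihood, k, n):
    """AIC with the small-sample correction; n = sample size."""
    return aic(log_likelihood, k) + 2 * k * (k + 1) / (n - k - 1)


def bic(log_likelihood, k, n):
    """Bayesian information criterion."""
    return k * math.log(n) - 2 * log_likelihood


n_sites = 500  # hypothetical alignment length used as the sample size
# Hypothetical fits: (log-likelihood, number of free model parameters)
models = {"JC": (-2500.0, 0), "GTR+G": (-2480.0, 9)}

for name, (lnl, k) in models.items():
    print(name, aic(lnl, k), aicc(lnl, k, n_sites), bic(lnl, k, n_sites))
```

Lower values are better. With these made-up numbers AIC prefers GTR+G while BIC prefers JC, which shows that the criteria penalize extra parameters differently and need not agree.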
26
Most commonly used program: MrBayes. Future? Alignment and phylogeny co-estimation: BAli-Phy (Redelings & Suchard 2005), Beast (Lunter et al. 2005).
Redelings, B. D. & Suchard, M. A. 2005: Joint Bayesian estimation of alignment and phylogeny. Syst. Biol. 54: 401-418
Lunter, G. et al. 2005: Bayesian coestimation of phylogeny and sequence alignment. BMC Bioinformatics 6:83
27
Travelling salesman: find the shortest route through the cities (another NP-complete problem).
Cities (N): 10; routes (N!): 10! = 3 628 800
Cities (N): 69; routes (N!): 69! = 1.7 x 10^98
How about studying them all? At a rate of a million routes per second it would take 5 x 10^84 years.
World record: Sweden, 24 987 cities, 84.8 CPU years. 24 987! = ?
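The route counts can be recomputed directly. This is a minimal sketch; the helper names are illustrative, and a year is taken to be 365.25 days.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_enumerate(n_cities, routes_per_second=1e6):
    """Years needed to check all N! routes at the given rate."""
    return math.factorial(n_cities) / routes_per_second / SECONDS_PER_YEAR


def log10_factorial(n):
    """log10(n!) via the log-gamma function, usable when n! is astronomically large."""
    return math.lgamma(n + 1) / math.log(10)


print(math.factorial(10))             # 3628800
print(f"{math.factorial(69):.1e}")    # 1.7e+98
print(f"{years_to_enumerate(69):.1e}")
print(round(log10_factorial(24987)))  # order of magnitude of 24 987!
```

The same explosion applies to tree space: the number of possible topologies grows so quickly with the number of taxa that exhaustive evaluation is out of the question, which is why sampling methods are used instead.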
28
Acknowledgements: Fredrik Ronquist, John Huelsenbeck, Wife and Mom
29
MrBayes
-Command-line interface
-UNIX, Macintosh and PC platforms
Huelsenbeck, J. & Ronquist, F. 2001: Bioinformatics 17: 754-755 (2005: v. 3.1)
30
MrBayes resources: homepage, manual, wiki, FAQ, mailing list (archives)
31
Running the analysis. All you have to do: type execute filename.nex at the MrBayes > prompt and press Enter. Replace filename.nex with your nexus file containing the MrBayes commands (type the full path if the file is not in the same folder as the MrBayes program).
32
MrBayes: an example nexus file
#nexus
begin data;
  dimensions ntax=6 nchar=20;
  format datatype=dna;
  matrix
  Otus1 aaaaaaaaaaaaaaaaaaaa
  Otus2 aaaaaaaaaaaaaaaaaaaa
  Otus3 aaaaaaaaaaaaaaaaaaaa
  Otus4 cccccccccccccccccccc
  Otus5 gggggggggggggggggggg
  Otus6 tttttttttttttttttttt
  ;
end;
begin mrbayes;
  mcmcp ngen=100000 samplefreq=100;
  mcmc;
end;
33
A real thing:
34
MrBayes: after the run. To summarize the parameter values, type: sump burnin= To summarize the trees, type: sumt burnin= (in both cases with a proper burnin value).
35
burn-in © Fredrik Ronquist
36
MrBayes: after the run. Burn-in discards the initial samples, taken before the analysis reached convergence (burnin=2500 if you have run a million generations, sampled every 100th of them, and want to discard the first 25%).
Note: you have to run "enough" generations.
-Check the plot generated by sump; there should be no obvious trends
-The standard deviation of split frequencies should be less than 0.01
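The burnin value is counted in samples, not in generations, which is where the 2500 comes from. A small helper makes the arithmetic explicit; the function name and its arguments are illustrative and are not MrBayes options.

```python
def burnin_samples(n_generations, sample_freq, burnin_fraction):
    """Number of stored samples to discard as burn-in."""
    n_samples = n_generations // sample_freq  # samples written during the run
    return int(n_samples * burnin_fraction)


# A million generations, sampled every 100th, first 25 % discarded
burnin = burnin_samples(1_000_000, 100, 0.25)
print(burnin)  # 2500
```

For the example nexus file with ngen=100000 and samplefreq=100, the same 25 % rule would give burnin=250.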
37
MrBayes: models. Restriction: can handle only 24 substitution models. Command, for example: lset nst=6 rates=invgamma. Confused? Try typing: help lset. Priors, command: prset. The defaults (try help prset) should work fine for most analyses.
38
Cladistic parsimony. Prefer the tree with the fewest evolutionary steps; only parsimony-informative sites count.
Otus1 aaaaaaaaaaaaaaaaaaaa
Otus2 aaaaaaaaaaaaaaaaaaaa
Otus3 aaaaaaaaaaaaaaaaaaaa
Otus4 cccccccccccccccccccc
Otus5 gggggggggggggggggggg
Otus6 tttttttttttttttttttt
[Figure: tree with tips Otus1 to Otus6]
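Whether a site is parsimony-informative can be checked mechanically, using the usual definition that at least two different character states must each occur in at least two taxa. The sketch below applies it to the example alignment; the function name is illustrative.

```python
from collections import Counter


def is_parsimony_informative(column):
    """True if at least two states each occur in at least two taxa."""
    counts = Counter(column)
    return sum(1 for c in counts.values() if c >= 2) >= 2


alignment = {
    "Otus1": "aaaaaaaaaaaaaaaaaaaa",
    "Otus2": "aaaaaaaaaaaaaaaaaaaa",
    "Otus3": "aaaaaaaaaaaaaaaaaaaa",
    "Otus4": "cccccccccccccccccccc",
    "Otus5": "gggggggggggggggggggg",
    "Otus6": "tttttttttttttttttttt",
}

# Transpose the alignment so that each tuple is one site (column)
columns = list(zip(*alignment.values()))
n_informative = sum(is_parsimony_informative(col) for col in columns)
print(n_informative)  # 0
```

Every site varies, yet none is parsimony-informative: c, g and t each occur in only one taxon, so all tree shapes need the same number of steps and parsimony cannot choose between them.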
39
Fain and Houde 2004: Evolution 58: 2558-2573
40
Exercises (MrBayes):
1. Study the program defaults with the help command (e.g. lset and prset)
2. Run the program with a few arbitrary sequences (e.g. palikka.nex)
-Try the sump and sumt commands with different burnin values
-Study the files made by the program: where is the tree?
3. Run the program with some real data (e.g. your own or birds.txt)
-Align the sequences
-Put them into a nexus file
-Try to find out how to select the JC, K2P and GTR models, with and without gamma-distributed rate variation, and with and without correction for invariable sites
-Try the model suggested by FindModel (AIC criterion)