

1 [Figure: phylogenetic tree including Leptothorax gredosi, Leptothorax racovitzae and Camponotus herculeanus, with node support values between 0.58 and 1.00] Thomas Bayes 1702-1761

2 Bayesian inference. Computational phylogenetics, CSC, 10.-12.12.2006. Mikko Kolkkala

3 How to read a tree?

4 Bayesian inference: phylogenetic applications only very recently ("Why?" We'll return to that.) Controversial philosophy: a subjective probability concept, with degrees of belief measured as probabilities. A learning process: prior and posterior probabilities. Example application: spam filters. Subjective! Quack!

5 Notation: p = probability; D = data; Θ = model/hypothesis/parameters; "|" = conditional probability, read "provided that".

6 An example: Suppose we have ten identical-looking dice, nine ordinary ones and one loaded so that a six comes up with probability 1/2. We pick one die at random. The probability that it is the loaded one is (of course) 1/10 (= prior). Next, we roll the die once and get a six. What is the probability now that we have picked the loaded die?
$$p(\text{loaded die} \mid \text{a six}) = \frac{p(\text{a six} \mid \text{loaded die})\, p(\text{loaded die})}{p(\text{a six})} = \frac{\frac{1}{2} \cdot \frac{1}{10}}{\frac{1}{2} \cdot \frac{1}{10} + \frac{1}{6} \cdot \frac{9}{10}} = \frac{1}{4} \quad (= \text{posterior})$$
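The same calculation can be sketched in Python with exact fractions. The helper name `posterior` is our own illustration, not from the slides; it applies Bayes' theorem over a discrete set of hypotheses. Feeding the posterior back in as the prior for a second six shows the "learning process" of slide 4.

```python
from fractions import Fraction


def posterior(prior, likelihood):
    """Bayes' theorem over a discrete set of hypotheses.

    prior:      dict hypothesis -> p(hypothesis)
    likelihood: dict hypothesis -> p(data | hypothesis)
    returns:    dict hypothesis -> p(hypothesis | data)
    """
    joint = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(joint.values())  # p(data), summed over all hypotheses
    return {h: joint[h] / evidence for h in joint}


prior = {"loaded": Fraction(1, 10), "fair": Fraction(9, 10)}
p_six = {"loaded": Fraction(1, 2), "fair": Fraction(1, 6)}

post = posterior(prior, p_six)
print(post["loaded"])  # 1/4

# Learning: yesterday's posterior is today's prior; roll a second six
post2 = posterior(post, p_six)
print(post2["loaded"])  # 1/2
```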

7 An exercise: a reliable test? A test for a rare disease (prevalence 0.1 %): if the disease is present, the result is positive with probability 0.99; if it is absent, the result is positive with probability 0.05. If the test is positive, what is the probability that the individual tested does not have the disease? Answer: 0.98 (http://en.wikipedia.org/wiki/Bayesian_inference)
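A short Python check of the answer, using only the numbers on the slide (the function name is ours). Most positives come from the large healthy group, which is why the answer is so high.

```python
def p_no_disease_given_positive(prevalence, sensitivity, false_positive_rate):
    """p(no disease | positive test) by Bayes' theorem."""
    p_pos_and_disease = prevalence * sensitivity
    p_pos_and_healthy = (1 - prevalence) * false_positive_rate
    p_pos = p_pos_and_disease + p_pos_and_healthy  # p(positive), the denominator
    return p_pos_and_healthy / p_pos


p = p_no_disease_given_positive(prevalence=0.001, sensitivity=0.99,
                                false_positive_rate=0.05)
print(round(p, 2))  # 0.98
```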

8 $$p(\text{model} \mid \text{data}) = \frac{p(\text{data} \mid \text{model})\, p(\text{model})}{p(\text{data})}$$
"loaded die" = model, "a six" = data

9 From dice to biology: data = DNA alignment; models = nucleotide substitution models, tree shape and branch lengths.
$$p(\text{model} \mid \text{data}) = \frac{p(\text{data} \mid \text{model})\, p(\text{model})}{p(\text{data})}$$

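10 $$\underbrace{p(\Theta \mid D)}_{\text{posterior distribution}} = \frac{\overbrace{p(D \mid \Theta)}^{\text{likelihood function}}\;\overbrace{p(\Theta)}^{\text{prior distribution}}}{p(D)}$$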

11 If this Bayesian thing is so excellent, why hasn't it been used in phylogenetic analyses until recently? Nobody can solve the equations analytically! Numerical solutions are possible, but only with powerful computers.
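Part of the problem is the denominator p(D): for a phylogeny it is a sum over every possible tree topology, with integrals over branch lengths and substitution parameters inside each term, and the number of topologies explodes with the number of taxa. A minimal sketch counting unrooted, fully resolved topologies with the standard (2n - 5)!! formula (function name is ours):

```python
def n_unrooted_trees(n_taxa):
    """Number of unrooted binary tree topologies for n_taxa labelled tips:
    (2n - 5)!! = 3 * 5 * ... * (2n - 5)."""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa")
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):  # odd numbers 3 .. 2n-5
        count *= k
    return count


for n in (4, 6, 10):
    print(n, n_unrooted_trees(n))
# 4 3
# 6 105
# 10 2027025
```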

12 MCMC = Markov chain Monte Carlo. Parameters: tree topology, branch lengths, probabilities for nucleotide substitutions. "Exploring the tree space" [Figure: probability plotted over parameter space, © Fredrik Ronquist]
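To make the idea concrete, here is a minimal random-walk Metropolis sketch for a single parameter: the JC69 branch length between two sequences. The data (50 differences in 500 sites), the exponential prior with rate 10, the step size and the starting value are all hypothetical choices for illustration; real programs such as MrBayes also propose changes to the topology and many other parameters. The point is that only ratios of posterior densities are needed, so the unsolvable p(D) cancels.

```python
import math
import random


def log_posterior(d, n_sites, n_diff, prior_rate=10.0):
    """Unnormalized log posterior of a JC69 branch length d (expected
    substitutions per site) for two aligned sequences differing at n_diff
    of n_sites sites; assumed prior: exponential with rate prior_rate."""
    if d <= 0.0:
        return -math.inf  # outside the support: branch lengths are positive
    p_diff = 0.75 * (1.0 - math.exp(-4.0 * d / 3.0))  # JC69 P(a site differs)
    log_lik = (n_sites - n_diff) * math.log(1.0 - p_diff) + n_diff * math.log(p_diff)
    log_prior = math.log(prior_rate) - prior_rate * d
    return log_lik + log_prior


def metropolis(n_sites, n_diff, n_gen=50_000, step=0.05, seed=1):
    """Random-walk Metropolis sampler; returns one sample per generation."""
    rng = random.Random(seed)
    d = 0.5  # arbitrary start, deliberately far from the optimum
    current = log_posterior(d, n_sites, n_diff)
    samples = []
    for _ in range(n_gen):
        proposal = d + rng.gauss(0.0, step)  # symmetric proposal
        candidate = log_posterior(proposal, n_sites, n_diff)
        # Accept with probability min(1, posterior ratio); p(D) cancels
        if rng.random() < math.exp(min(0.0, candidate - current)):
            d, current = proposal, candidate
        samples.append(d)
    return samples


samples = metropolis(n_sites=500, n_diff=50)
kept = samples[len(samples) // 4:]  # discard the first 25 % as burn-in
posterior_mean = sum(kept) / len(kept)
print(f"posterior mean branch length: {posterior_mean:.3f}")
```

The printed mean should lie close to the JC69 distance for 10 % differing sites (about 0.107), pulled slightly towards zero by the prior.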

13 Metropolis-coupled Markov chain Monte Carlo: MCMCMC = (MC)³. "Heated" chains: a "flattened" parameter landscape.

14 (MC)³ [figure © John Huelsenbeck]

15 (MC)³ [figure © John Huelsenbeck]

16 (MC)³: swap of states [figure © John Huelsenbeck]
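A minimal (MC)³ sketch on a toy one-dimensional "posterior" with two peaks separated by a deep valley, where a single cold chain would get stuck. The target, the number of chains, the heating value and the step size are hypothetical; the heating has the incremental form β = 1/(1 + λi), and only the cold chain (β = 1) is sampled. Heated chains see the landscape raised to the power β < 1, i.e. flattened, so they cross the valley, and accepted swaps hand their states to the cold chain.

```python
import math
import random


def log_target(x):
    """Unnormalized log density of an equal mixture of N(-4, 1) and N(4, 1)."""
    a = -0.5 * (x - 4.0) ** 2
    b = -0.5 * (x + 4.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))  # stable log-sum-exp


def mc3(n_gen=20_000, n_chains=4, heat=2.0, step=1.0, seed=7):
    rng = random.Random(seed)
    betas = [1.0 / (1.0 + heat * i) for i in range(n_chains)]  # chain 0 is cold
    states = [4.0] * n_chains  # all chains start on the right-hand peak
    cold_samples = []
    for _ in range(n_gen):
        # 1) Metropolis update in every chain on its heated (flattened) target
        for c in range(n_chains):
            x = states[c]
            y = x + rng.gauss(0.0, step)
            log_ratio = betas[c] * (log_target(y) - log_target(x))
            if rng.random() < math.exp(min(0.0, log_ratio)):
                states[c] = y
        # 2) propose swapping the states of two randomly chosen chains
        i, j = rng.sample(range(n_chains), 2)
        log_ratio = (betas[i] - betas[j]) * (log_target(states[j]) - log_target(states[i]))
        if rng.random() < math.exp(min(0.0, log_ratio)):
            states[i], states[j] = states[j], states[i]
        cold_samples.append(states[0])  # only the cold chain is sampled
    return cold_samples


samples = mc3()
left = sum(1 for x in samples if x < 0) / len(samples)
print(f"fraction of cold-chain samples on the left peak: {left:.2f}")
```

Although every chain starts on the right-hand peak, the cold chain ends up spending a substantial share of its samples on the left one.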

17 Clade posterior probabilities directly: no need for bootstrapping
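The posterior probability of a clade can be estimated as the fraction of trees sampled after burn-in that contain it, which is what takes the place of bootstrap support. A minimal sketch, assuming trees are already represented as sets of clades (frozensets of hypothetical taxon names A to D); real programs parse the sampled trees from their output files.

```python
from collections import Counter


def clade_posteriors(sampled_trees):
    """sampled_trees: list of trees, each a set of clades (frozensets of
    taxon names). Returns clade -> fraction of sampled trees containing it."""
    counts = Counter(clade for tree in sampled_trees for clade in tree)
    n = len(sampled_trees)
    return {clade: k / n for clade, k in counts.items()}


# Four hypothetical post-burn-in samples of a rooted 4-taxon tree
trees = [
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABD")},
    {frozenset("AC"), frozenset("ACD")},
]
post = clade_posteriors(trees)
for clade, p in sorted(post.items(), key=lambda kv: -kv[1]):
    print("".join(sorted(clade)), p)
```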

18 Standard models: F81, JC, HKY85, K80, K81, TrN, TVM, TIM, SYM, GTR
Substitution types: 1-6
Nucleotide frequencies: equal / estimated from the data
Invariable sites: no / estimated ("+I")
Evolutionary rate: equal / Γ-distributed ("+G")
Etc.

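19 JC (Jukes-Cantor): all substitution rates equal (a), equal nucleotide frequencies $\pi_A = \pi_C = \pi_G = \pi_T = 1/4$
$$Q_{\mathrm{JC}} = \begin{pmatrix} \cdot & a & a & a \\ a & \cdot & a & a \\ a & a & \cdot & a \\ a & a & a & \cdot \end{pmatrix} \quad \text{(rows and columns A, C, G, T)}$$
GTR: General time-reversible model
[Figure: substitutions between A, C, G and T]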
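Under JC, transition probabilities over a branch of length d (expected substitutions per site) have a simple closed form, and inverting the probability that a site differs gives the JC distance correction. As d grows, the expected proportion of differing sites approaches 0.75, so the correction is only defined below that value. A minimal sketch (function names are ours):

```python
import math


def jc69_transition(d):
    """JC69 transition probabilities for branch length d:
    returns (P[same base], P[one specific different base])."""
    e = math.exp(-4.0 * d / 3.0)
    return 0.25 + 0.75 * e, 0.25 - 0.25 * e


def jc69_distance(p):
    """JC69-corrected distance from the observed proportion p of differing
    sites; defined only for p < 0.75, where JC69 sequences saturate."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


same, other = jc69_transition(0.1)
print(round(same + 3 * other, 12))   # each row of P sums to 1 -> 1.0
print(round(jc69_distance(0.1), 4))  # 0.1073
print(round(3 * jc69_transition(50.0)[1], 6))  # saturation level -> 0.75
```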

20 Are characters independent? No way. Time-reversible: is G → C the same as C → G? RNA genes

21 SSR models (site-specific rates): a different evolutionary rate for the 1st, 2nd and 3rd codon positions in coding regions. Problematic (see Buckley et al. 2001, Syst. Biol. 50: 67-86)

22 But how to choose the model? Well, nobody said it would be easy. How many parameters does it take to fit an elephant? (30)

23 “What do you consider the largest map that would be really useful?" "About six inches to the mile." "Only six inches! […] We actually made a map of the country, on the scale of a mile to the mile!" (Lewis Carroll 1893)

24 Choosing a model: AIC (Akaike information criterion), AICc (corrected Akaike information criterion, for small samples), BIC (Bayesian information criterion). Programs: Modeltest (bad), FindModel (plop!), MrAic?
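The three criteria trade goodness of fit (maximized log-likelihood lnL) against the number of free parameters k, and all of them prefer the lower score. A minimal sketch with invented numbers for two hypothetical models M1 and M2; what to use as the sample size n in phylogenetics is itself debated, and alignment length is assumed here. The example shows that the criteria can disagree: BIC penalizes extra parameters more heavily.

```python
import math


def aic(log_l, k):
    """Akaike information criterion; k = number of free parameters."""
    return 2 * k - 2 * log_l


def aicc(log_l, k, n):
    """Second-order (small-sample) corrected AIC; n = sample size."""
    return aic(log_l, k) + 2 * k * (k + 1) / (n - k - 1)


def bic(log_l, k, n):
    """Bayesian information criterion."""
    return k * math.log(n) - 2 * log_l


# Hypothetical maximized log-likelihoods and parameter counts
candidates = {"M1": (-3250.4, 5), "M2": (-3241.9, 10)}
n_sites = 1000  # hypothetical sample size (alignment length)

criteria = {
    "AIC": lambda log_l, k: aic(log_l, k),
    "AICc": lambda log_l, k: aicc(log_l, k, n_sites),
    "BIC": lambda log_l, k: bic(log_l, k, n_sites),
}
best = {name: min(candidates, key=lambda m: score(*candidates[m]))
        for name, score in criteria.items()}
print(best)  # {'AIC': 'M2', 'AICc': 'M2', 'BIC': 'M1'}
```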

25

26 Most commonly used program: MrBayes. Future? Alignment and phylogeny co-estimation: BAli-Phy (Redelings & Suchard 2005), BEAST (Lunter et al. 2005).
Lunter, G. et al. 2005: Bayesian coestimation of phylogeny and sequence alignment. BMC Bioinformatics 6: 83.
Redelings, B. D. & Suchard, M. A. 2005: Joint Bayesian estimation of alignment and phylogeny. Syst. Biol. 54: 401-418.

27 Travelling salesman: find the shortest route through N cities (another NP-complete problem). How about studying all the routes?
Cities (N): 10; 69; 24 978
Routes (N!): 10! = 3 628 800; 69! ≈ 1.7 × 10^98; 24 978! = ?
At a rate of a million routes per second, checking all 69! routes would take 5 × 10^84 years.
World record: Sweden, 24 978 cities (84.8 CPU years)
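The slide's arithmetic can be checked in a few lines of Python. Like the slide, this counts N! orderings (the number of distinct closed tours is actually (N - 1)!/2, which does not change the conclusion); the rate of a million routes per second is the slide's, and a year of 365.25 days is our assumption.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600
RATE = 1e6  # routes checked per second, as on the slide


def years_to_enumerate(n_cities):
    """Years needed to check all n! orderings of n cities at RATE per second."""
    return math.factorial(n_cities) / RATE / SECONDS_PER_YEAR


print(math.factorial(10))               # 3628800
print(f"{math.factorial(69):.2e}")      # 1.71e+98
print(f"{years_to_enumerate(69):.1e}")  # 5.4e+84

# 24 978! is far too large for a float: count its decimal digits instead
digits = math.floor(math.lgamma(24_978 + 1) / math.log(10)) + 1
print(digits, "digits")
```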

28 Acknowledgements: Fredrik Ronquist, John Huelsenbeck, Wife and Mom

29 MrBayes
- Command-line interface
- UNIX, Macintosh and PC platforms
Huelsenbeck, J. P. & Ronquist, F. 2001: Bioinformatics 17: 754-755 (2005: v. 3.1)

30 MrBayes: homepage, manual, wiki, FAQ, mailing list (archives)

31 Running the analysis. All you have to do: type execute filename.nex* at the MrBayes > prompt and press Enter.
*Replace filename.nex with your NEXUS file containing the MrBayes commands (give the full path if the file is not in the same folder as the MrBayes program).

32 MrBayes: an example NEXUS file
#NEXUS
begin data;
  dimensions ntax=6 nchar=20;
  format datatype=dna;
  matrix
  Otus1 aaaaaaaaaaaaaaaaaaaa
  Otus2 aaaaaaaaaaaaaaaaaaaa
  Otus3 aaaaaaaaaaaaaaaaaaaa
  Otus4 cccccccccccccccccccc
  Otus5 gggggggggggggggggggg
  Otus6 tttttttttttttttttttt
  ;
end;
begin mrbayes;
  mcmcp ngen=100000 samplefreq=100;
  mcmc;
end;
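When you have many sequences, a file like this is easier to generate than to type. A minimal Python sketch that builds the same layout from a dict of aligned sequences; the function name and output file name are hypothetical, and taxon names are assumed to contain no spaces or punctuation (NEXUS would require quoting them).

```python
import os
import tempfile


def nexus_for_mrbayes(sequences, ngen=100000, samplefreq=100):
    """Return NEXUS text with a DNA data block and a minimal MrBayes block.
    sequences: dict taxon name -> aligned DNA sequence (all equal length)."""
    lengths = {len(seq) for seq in sequences.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned (all the same length)")
    nchar = lengths.pop()
    width = max(len(name) for name in sequences) + 2  # pad names into a column
    rows = "\n".join(f"  {name.ljust(width)}{seq}" for name, seq in sequences.items())
    return (
        "#NEXUS\n"
        "begin data;\n"
        f"  dimensions ntax={len(sequences)} nchar={nchar};\n"
        "  format datatype=dna;\n"
        "  matrix\n"
        f"{rows}\n"
        "  ;\n"
        "end;\n\n"
        "begin mrbayes;\n"
        f"  mcmcp ngen={ngen} samplefreq={samplefreq};\n"
        "  mcmc;\n"
        "end;\n"
    )


# Rebuild the example matrix: Otus1-3 all a, Otus4 c, Otus5 g, Otus6 t
example = {f"Otus{i}": base * 20 for i, base in enumerate("aaacgt", start=1)}
text = nexus_for_mrbayes(example)
path = os.path.join(tempfile.gettempdir(), "example.nex")  # hypothetical name
with open(path, "w") as handle:
    handle.write(text)
print(text)
```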

33 A real thing:

34 MrBayes: after the run. To summarize the parameter values, type: sump burnin= ; to summarize the trees, type: sumt burnin= (with a proper burnin value)

35 Burn-in [figure © Fredrik Ronquist]

36 MrBayes: after the run. Burnin discards the samples taken before the analysis reached convergence (burnin=2500 if you have run a million generations, sampled every 100th of them, and want to discard the first 25 %). Note: you have to run "enough" generations:
- Check the plot generated by sump; there should be no obvious trends.
- The average standard deviation of split frequencies should be less than 0.01.
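Both numbers on this slide can be sketched in Python. The burnin arithmetic follows the slide (burnin counted in samples, not generations). The convergence diagnostic below is the plain definition, the mean over splits of each split's standard deviation of frequency across independent runs; it does not reproduce MrBayes' exact rules (for example, which low-frequency splits it leaves out), and the split frequencies are invented.

```python
import statistics


def burnin_in_samples(ngen, samplefreq, fraction):
    """Samples to discard when burnin is counted in samples, not generations."""
    n_samples = ngen // samplefreq
    return int(n_samples * fraction)


def avg_std_split_freqs(runs):
    """runs: one dict per independent run, mapping split -> frequency of that
    split among the run's post-burn-in trees. Returns the mean, over all splits
    seen in any run, of the split's standard deviation across runs."""
    splits = set().union(*runs)
    sds = [statistics.stdev([run.get(split, 0.0) for run in runs]) for split in splits]
    return sum(sds) / len(sds)


print(burnin_in_samples(1_000_000, 100, 0.25))  # 2500

# Hypothetical split frequencies from two independent runs
run1 = {"AB": 0.95, "ABC": 0.60}
run2 = {"AB": 0.96, "ABC": 0.58}
asdsf = avg_std_split_freqs([run1, run2])
print(f"{asdsf:.4f}", "converged" if asdsf < 0.01 else "keep running")
```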

37 MrBayes: models. Restriction: can handle only 24 substitution models. Example command: lset nst=6 rates=invgamma. Confused? Try typing: help lset. Priors, command: prset. The defaults (try help prset) should work fine for most analyses.
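One way to arrive at the count of 24 (our reading; the slide does not spell it out) is three substitution schemes (lset nst=1, 2 or 6), equal or estimated base frequencies, and four rate options (equal, +G, +I, +I+G). A Python sketch that enumerates those combinations as MrBayes command strings; the helper is hypothetical, and the commands would go in the mrbayes block before mcmc. Check them against help lset and help prset in your version.

```python
from itertools import product

NST = (1, 2, 6)  # one rate / transitions vs transversions / six rates
RATES = ("equal", "gamma", "propinv", "invgamma")  # -, +G, +I, +I+G


def mrbayes_model_commands(nst, equal_freqs, rates):
    """Build lset/prset commands for one model (hypothetical helper)."""
    cmds = [f"lset nst={nst} rates={rates};"]
    if equal_freqs:  # otherwise base frequencies are estimated (the default)
        cmds.append("prset statefreqpr=fixed(equal);")
    return " ".join(cmds)


models = [mrbayes_model_commands(nst, eq, r)
          for nst, eq, r in product(NST, (True, False), RATES)]
print(len(models))  # 24
print(mrbayes_model_commands(1, True, "equal"))      # JC
print(mrbayes_model_commands(2, True, "gamma"))      # K2P (K80) + G
print(mrbayes_model_commands(6, False, "invgamma"))  # GTR + I + G
```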

38 Cladistic parsimony: prefer the tree with the fewest evolutionary steps; only parsimony-informative sites count.
Otus1 aaaaaaaaaaaaaaaaaaaa
Otus2 aaaaaaaaaaaaaaaaaaaa
Otus3 aaaaaaaaaaaaaaaaaaaa
Otus4 cccccccccccccccccccc
Otus5 gggggggggggggggggggg
Otus6 tttttttttttttttttttt
[Figure: tree of Otus1, Otus2, Otus3, Otus4, Otus5, Otus6]
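A site is parsimony-informative only if at least two different states each occur in at least two taxa. A small Python check (ignoring gaps and ambiguity codes, an assumption of this sketch) shows that the example matrix has no informative sites at all, so parsimony cannot prefer any tree for it.

```python
from collections import Counter


def is_parsimony_informative(column):
    """True if at least two different states each occur in >= 2 taxa."""
    counts = Counter(column)
    return sum(1 for c in counts.values() if c >= 2) >= 2


matrix = {
    "Otus1": "a" * 20, "Otus2": "a" * 20, "Otus3": "a" * 20,
    "Otus4": "c" * 20, "Otus5": "g" * 20, "Otus6": "t" * 20,
}
columns = zip(*matrix.values())  # one tuple of states per alignment column
n_informative = sum(is_parsimony_informative(col) for col in columns)
print(n_informative)  # 0
```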

39 Fain & Houde 2004: Evolution 58: 2558-2573

40 Exercises: MrBayes
1. Study the program defaults with the help command (e.g. help lset and help prset).
2. Run the program with a few arbitrary sequences (e.g. palikka.nex).
- Try the sump and sumt commands with different burnin values.
- Study the files made by the program: where is the tree?
3. Run the program with some real data (e.g. your own or birds.txt).
- Align the sequences.
- Put them into a NEXUS file.
- Try to find out how to select the JC, K2P and GTR models, with and without gamma-distributed rate variation, and with and without a correction for invariable sites.
- Try the model suggested by FindModel (AIC criterion).

