Lecture 14 – Consensus Trees & Nodal Support


I. Consensus Trees. Consensus trees are best treated as visual summaries of the agreement and disagreement among source trees, and they can be generated from any number (two or more) of source trees.

I.A. Strict Consensus Trees. A strict consensus tree retains only those nodes that occur in every source tree; any conflicting node is collapsed into a polytomy.

I.B. Adams Consensus Trees. Adams consensus trees work by congruence of three-taxon statements: rather than collapsing conflicting nodes, they simply relocate the offending taxa. This is why they often have more resolution than strict consensus trees.

I.C. Majority Rule Consensus Trees. Nodes that occur in a majority (more than 50%) of the source trees are left resolved in the majority-rule consensus tree.
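The two rules above can be sketched in a few lines. This is a minimal illustration, assuming each tree is represented simply as a set of clades, where each clade is a frozenset of taxon names; the function names are hypothetical, not from any phylogenetics library.

```python
from collections import Counter

def clade_counts(trees):
    """Count each clade (a frozenset of taxa) across a list of trees,
    where each tree is given as a set of clades."""
    return Counter(clade for tree in trees for clade in tree)

def strict_consensus(trees):
    """Clades present in every source tree."""
    counts = clade_counts(trees)
    return {c for c, n in counts.items() if n == len(trees)}

def majority_rule_consensus(trees):
    """Clades present in a strict majority (>50%) of source trees;
    such clades are guaranteed to be mutually compatible."""
    counts = clade_counts(trees)
    return {c for c, n in counts.items() if n > len(trees) / 2}

# Three source trees for taxa A-E that disagree about one clade
trees = [
    {frozenset("AB"), frozenset("ABE"), frozenset("CD")},
    {frozenset("AB"), frozenset("ABC"), frozenset("CD")},
    {frozenset("AB"), frozenset("ABC"), frozenset("CD")},
]
strict = strict_consensus(trees)            # only AB and CD are in all three
majority = majority_rule_consensus(trees)   # adds ABC, found in 2 of 3 trees
```

Note that the strict-majority cutoff (`>`, not `>=`) is what guarantees the retained clades are mutually compatible and can be assembled into a single tree.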

I.D. Consensus Trees are Not Phylogenies.

Taxon   Characters 1-6
B       1 0 1 0 0 0
C       0 1 0 1 1 1
D       0 1 0 1 1 1
E       1 1 1 1 1 0

On one MP tree, characters 2 & 5 are homoplasious; on the other, characters 1 & 3 are homoplasious.

I.D. Consensus Trees are Not Phylogenies. If we do the wrong thing by treating the consensus tree as a phylogeny and mapping the characters onto it, all four of those characters are homoplasious: the strict consensus tree (for taxa A-E) requires 10 steps. Therefore, one should never optimize characters or estimate branch lengths on a consensus tree.

II. Nodal Support: Decay Index. The idea is that if an MP tree is, say, 996 steps, and group (A,B,C) is found on that tree, we may wish to know how much longer the best tree is that doesn't contain group (A,B,C). On one MP tree, Homo & Pan are sister taxa, whereas on the other, Pan and Gorilla are sisters; therefore each of these two groups has a decay index of 0. The group (Homo, Pan, Gorilla) occurs on both. The shortest tree that doesn't contain this group is 1012 steps, so the Homo/Pan/Gorilla node has a decay index of 16.

II. Nodal Support: Decay Index. We can do this for each node. However, these are just numbers, and it's very difficult to decide how large a decay index must be before it is meaningful.
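The per-node calculation is simple arithmetic: the decay (Bremer) index of a group is the length of the shortest tree lacking that group minus the length of the best tree overall. A minimal sketch (hypothetical function name), using the hominid numbers from the lecture:

```python
def decay_index(best_length, constrained_lengths):
    """Bremer/decay index for each group: the extra steps required by
    the shortest tree that LACKS that group.

    best_length: length (steps) of the overall most-parsimonious tree.
    constrained_lengths: {group: length of shortest tree without that group}.
    """
    return {group: length - best_length
            for group, length in constrained_lengths.items()}

# MP tree is 996 steps; shortest tree without (Homo, Pan, Gorilla) is 1012;
# (Homo, Pan) already conflicts with an equally short MP tree, hence 996.
support = decay_index(996, {("Homo", "Pan", "Gorilla"): 1012,
                            ("Homo", "Pan"): 996})
# decay index 16 for (Homo, Pan, Gorilla); 0 for (Homo, Pan)
```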

II. Nodal Support: Decay Index (Baron et al. 2017. Nature, 543:501): "Very well supported, with Bremer support of 4."

Nodal Support: The Bootstrap. The method was developed by Efron in 1979 to generate confidence intervals in cases where the true underlying distribution of a variable can't be assessed. The idea is that the distribution of the original sample (if it's large enough) conveys much about the underlying true distribution. We can therefore treat our original sample as if it were the true distribution and take repeated samples from it to mimic the variability we would see if we could resample from the true distribution. In the case of phylogenies, we resample characters from the original sample (i.e., the data matrix) that we have collected.
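Resampling characters means drawing alignment columns with replacement until a pseudo-matrix of the original size is built; a tree is then estimated from each pseudo-matrix. A minimal sketch, assuming the alignment is a plain dict of taxon-to-sequence strings (the function name is illustrative):

```python
import random

def bootstrap_replicate(alignment, rng=random):
    """Resample alignment columns (characters) with replacement.

    alignment: {taxon: sequence string}; all sequences the same length.
    Returns a new alignment of the same dimensions in which each column
    is drawn at random, with replacement, from the original columns.
    """
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {taxon: "".join(seq[i] for i in cols)
            for taxon, seq in alignment.items()}

aln = {"A": "ACGTAC", "B": "ACGTAA", "C": "TCGTAA"}
rep = bootstrap_replicate(aln, random.Random(1))
# rep has the same taxa and sequence length as aln; in a real analysis
# a tree would be estimated from each of ~100-1000 such replicates,
# and nodal support read off the consensus of the replicate trees
```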


Nodal Support: The Bootstrap.
1) The size of the original sample is a critical factor in the performance of the bootstrap. This makes intuitive sense: the original sample is taken as a proxy for the underlying parametric distribution, so it must be large enough to reflect the relevant features of that distribution.
2) For large data sets, bootstraps can take a very long time. Felsenstein (1993, PHYLIP manual) has suggested that the uncertainty captured by bootstrap resampling is much larger than the error associated with extensive branch swapping in estimating the optimum tree for each bootstrap replicate. FastBoot analysis: starting tree with no branch swapping. Full searches: include model selection, parameter optimization, etc. Intermediate: no model selection or parameter optimization, cursory swapping.
3) As long as we're using a consistent estimator of phylogeny, bootstrap values on nodes tend to be conservative as confidence intervals (Hillis & Bull, 1993. Syst. Biol., 42:182).

Bayesian Estimation of Nodal Support. Bayes' Theorem:

P(Hi | D) = P(Hi) P(D | Hi) / Σj P(Hj) P(D | Hj)

where P(Hi | D) is the posterior probability of hypothesis i, given the data D; P(Hi) is the prior probability of hypothesis i (the quantification of prior knowledge); and P(D | Hi) is the probability of the data given hypothesis i (the regular likelihood function). The denominator is the product of prior and likelihood summed across all s competing hypotheses.
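For a finite set of hypotheses, the theorem is just arithmetic. A minimal sketch (hypothetical function name; the numbers are made up for illustration):

```python
def posterior(priors, likelihoods):
    """Posterior probabilities over competing hypotheses via Bayes' theorem.

    priors[i] = P(H_i); likelihoods[i] = P(D | H_i).
    The denominator P(D) is the sum of prior x likelihood over all
    s hypotheses, which is what makes the posteriors sum to 1.
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

# Three hypotheses with a flat prior: the posterior is then simply the
# normalized likelihood
post = posterior([1/3, 1/3, 1/3], [0.02, 0.01, 0.01])
# post is [0.5, 0.25, 0.25]
```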

Bayesian Estimation of Nodal Support. For phylogenies:

P(ti | D) = P(ti) P(D | ti) / Σj P(tj) P(D | tj)

which calculates the probability of tree i, given the data and the prior, and assuming the model of nucleotide substitution. The prior probability of tree i, P(ti), is usually set to 1/s, where s is the number of possible trees. This represents an admission of ignorance and is called a flat prior or a uniform prior (or an uninformative prior). The summation in the denominator is then across all topologies.

Bayesian Estimation of Nodal Support. Markov chain Monte Carlo (MCMC) makes Bayesian estimation practical. We start the MCMC with some tree (ti); let's say it's a random tree. We then propose a change to the tree (generating ti+1), for example by a random NNI. Under the flat prior, P(ti+1) = P(ti) = 1/s, so the acceptance ratio R reduces to the likelihood ratio P(D | ti+1) / P(D | ti). If R > 1, we accept the new tree. If R < 1, we draw a random number (between 0 & 1); if this is < R, we accept the change, and if not, we return to the previous state (tree). Thus we'll accept slightly worse trees more often than we'll accept much worse trees.
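This accept/reject rule can be sketched directly. The sketch below assumes a flat tree prior (so R is the likelihood ratio) and user-supplied `propose` and `likelihood` functions, both hypothetical stand-ins for a real NNI move and a real likelihood calculation:

```python
import random

def metropolis_step(current_tree, current_like, propose, likelihood, rng=random):
    """One Metropolis update on topology under a flat tree prior.

    With P(t_i+1) = P(t_i), the acceptance ratio R reduces to the
    likelihood ratio. `propose` generates a random neighboring tree
    (e.g. one NNI away); `likelihood` returns P(D | tree).
    """
    candidate = propose(current_tree)
    cand_like = likelihood(candidate)
    R = cand_like / current_like
    if R >= 1 or rng.random() < R:    # always accept improvements;
        return candidate, cand_like   # accept worse trees with prob. R
    return current_tree, current_like # reject: stay at the current tree

# An improvement (R = 2.0) is always accepted:
tree, like = metropolis_step("t1", 1.0, lambda t: "t2", lambda t: 2.0)
```

Running many such steps, recording the tree visited at each generation, is what produces the sample whose clade frequencies approximate posterior probabilities.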

Bayesian Estimation of Nodal Support. If we run the chain long enough, the frequencies of states in the chain (i.e., trees) will eventually converge to their frequencies in the posterior probability distribution. First, though, we have to discard the early generations of the chain (the burn-in). This is a first approximation.

Bayesian Estimation of Nodal Support. Second, all that matters in theory is that every potential state (i.e., every tree) can be reached by a series of single steps of our proposal mechanism (e.g., SPR branch swaps); the MCMC will then be ergodic. In practice, convergence properties depend on the proposal mechanisms. Lakner et al. (2008. Syst. Biol. 57:86) evaluated seven types of topology proposals on several empirical data sets, including local branch changes that collapse branch lengths and variations on NNI, SPR, and TBR swaps. Adding the eSPR & eTBR moves improves convergence.

Bayesian Estimation of Nodal Support. Third, convergence diagnostics must address the parameter of interest; usually, that's topology or nodal probabilities. (Figures contrast a run with poor convergence against one with good convergence.)

Bayesian Estimation of Nodal Support. Summarizing the posterior distribution of trees. (Figure: consensus of the posterior sample, with node labels such as posterior probability 0.90 over bootstrap values 73/47, and 1.00 over 99/100.)

Bayesian Estimation of Nodal Support. Posterior probabilities are often higher than bootstrap values. Under the best-case scenario (correct model), both ML bootstrap values and Bayesian posterior probabilities are conservative, but posterior probabilities are less so.

Bayesian Estimation of Nodal Support. We need to place priors on all parameters that we treat as random variables (program output shown):

1 -- Revmat:    Prior = Dirichlet(1.00,1.00,1.00,1.00,1.00,1.00)
2 -- Statefreq: Prior = Dirichlet
3 -- Shape:     Prior = Uniform(0.05,50.00)
4 -- Pinvar:    Prior = Uniform(0.00,1.00)
5 -- Topology:  Prior = All topologies equally probable a priori
6 -- Brlens:    Prior = Branch lengths are unconstrained: Exponential(10.0)

Bayesian Estimation of Nodal Support. We then need to propose changes to each variable. The chain will use the following moves:

With prob.   Chain will change
 3.57 %      param. 1 (revmat) with multiplier
 3.57 %      param. 2 (state frequencies) with Dirichlet proposal
 3.57 %      param. 3 (gamma shape) with multiplier
 3.57 %      param. 4 (prop. invariants) with beta proposal
53.57 %      param. 5 (topology and branch lengths) with LOCAL
10.71 %      param. 5 (topology and branch lengths) with extending TBR
10.71 %      param. 6 (branch lengths) with multiplier
10.71 %      param. 6 (branch lengths) with nodeslider

In this way we can incorporate the uncertainty in each of these parameters into our inference of interest.

Bayesian Estimation of Nodal Support. Finally, we need to understand the difference between marginal and joint likelihoods and probabilities (Holder & Lewis 2003). The joint likelihood is evaluated at the MLE of the other parameters; the marginal likelihood is evaluated across their whole distribution. In the example, tree TA has a higher joint likelihood than TB, but a lower marginal likelihood.
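The distinction can be shown numerically. The sketch below evaluates a likelihood on a grid of values of a nuisance parameter (e.g., a branch length), assuming a uniform prior so the marginal likelihood is just the grid average; the likelihood values are invented to mimic the TA/TB contrast, with TA sharply peaked and TB flatter:

```python
def joint_and_marginal(likelihood_grid):
    """Joint vs. marginal likelihood over a gridded nuisance parameter.

    likelihood_grid: P(D | tree, theta) at equally spaced theta values.
    The joint likelihood is evaluated at the MLE (the grid maximum);
    the marginal likelihood averages over the grid (uniform prior).
    """
    joint = max(likelihood_grid)
    marginal = sum(likelihood_grid) / len(likelihood_grid)
    return joint, marginal

# Tree A: sharply peaked likelihood surface; tree B: flatter surface
tree_a = [0.01, 0.02, 0.90, 0.02, 0.01]
tree_b = [0.30, 0.40, 0.50, 0.40, 0.30]

ja, ma = joint_and_marginal(tree_a)
jb, mb = joint_and_marginal(tree_b)
# ja > jb (A wins on joint likelihood) while ma < mb (A loses on
# marginal likelihood), mirroring the TA/TB example on the slide
```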