The star-tree paradox in Bayesian phylogenetics Bengt Autzen Department of Philosophy, Logic and Scientific Method LSE.



Overview
1) Introduction
2) The facts
3) The (alleged) paradox
4) The star-tree paradox and the meaning of posterior probabilities of trees
5) Symmetry

1. Introduction What is the ‘star-tree paradox’ about? “The star-tree paradox refers to the conjecture that the posterior probabilities for […] the three rooted trees for three species […] do not approach 1/3 when the data are generated using the star tree and when the amount of data approaches infinity.” (Yang, 2007)

2. The facts 2.1 Phylogenetic estimation problem given three species a) The tree topologies: the star tree T0 and the three binary trees T1, T2, and T3.

2. The facts (cont.) b) The (synthetic) data: three DNA sequences of n nucleotides each, with nucleotides treated as binary characters. Hence there are 2^3 = 8 possible data configurations at a nucleotide site, which collapse, up to the labelling of x and y, into four site patterns xxx, xxy, yxx, and xyx, where x and y are any two different states. The data are summarized as the counts n0, n1, n2, and n3 of these four site patterns.
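The reduction of three aligned binary sequences to the four site-pattern counts can be sketched as follows (a minimal illustration; the function name and the use of 0/1 strings for binary characters are my own choices, not part of the slides):

```python
from collections import Counter

def site_pattern_counts(seq1, seq2, seq3):
    """Summarize three aligned binary sequences into the four site-pattern
    counts n0 (xxx), n1 (xxy), n2 (yxx), n3 (xyx)."""
    counts = Counter()
    for a, b, c in zip(seq1, seq2, seq3):
        if a == b == c:
            counts["xxx"] += 1   # n0: all three species identical at this site
        elif a == b:
            counts["xxy"] += 1   # n1: species 1 and 2 share a state
        elif b == c:
            counts["yxx"] += 1   # n2: species 2 and 3 share a state
        else:                    # with binary characters, a == c is the only remaining case
            counts["xyx"] += 1   # n3: species 1 and 3 share a state
    return [counts[p] for p in ("xxx", "xxy", "yxx", "xyx")]
```

For example, `site_pattern_counts("0011", "0010", "0110")` classifies the four columns as xxx, xxy, xxx, yxx.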

2. The facts (cont.) c) Model of nucleotide substitution: a 2-state symmetric Markov process. Under tree T1 (internal branch length t0, tip branch length t1), the site-pattern probabilities are
p0 = (1 + e^(-4t1) + 2e^(-4(t0+t1)))/4,
p1 = (1 + e^(-4t1) - 2e^(-4(t0+t1)))/4,
p2 = p3 = (1 - e^(-4t1))/4.
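Under the 2-state symmetric model, the site-pattern probabilities for tree T1 (internal branch t0, tip branches t1, lengths in expected substitutions per site) can be computed as a quick sketch; the function name is illustrative. A useful sanity check: at t0 = 0 the tree degenerates to the star tree, and the three asymmetric patterns become equiprobable.

```python
import math

def pattern_probs_T1(t0, t1):
    """Site-pattern probabilities (p0, p1, p2, p3) under tree T1 = ((1,2),3)
    for the 2-state symmetric model, internal branch t0, tip branches t1."""
    u = math.exp(-4 * t1)           # tip-to-tip term for species 1 and 2
    v = math.exp(-4 * (t0 + t1))    # tip-to-tip term involving species 3
    p0 = (1 + u + 2 * v) / 4        # xxx
    p1 = (1 + u - 2 * v) / 4        # xxy: the pattern matching T1's split
    p2 = (1 - u) / 4                # yxx
    p3 = (1 - u) / 4                # xyx
    return p0, p1, p2, p3
```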

2. The facts (cont.) d) Likelihood function for tree T1 (with proportionality constant C): L(T1; t0, t1) = C p0^n0 p1^n1 p2^n2 p3^n3, where C is the multinomial coefficient n!/(n0! n1! n2! n3!).

2. The facts (cont.) Similarly for trees T2 and T3: their likelihood functions have the same form, with the roles of the counts n1, n2, and n3 permuted (T2 favours pattern yxx, T3 favours pattern xyx).

2. The facts (cont.) e) Prior probabilities: the three binary trees T1, T2, and T3 have equal prior probability 1/3; the star tree T0 is thus assigned prior probability 0. The prior distribution on the branch lengths t0, t1 is the same for each tree, with a smooth joint probability density function that is bounded and everywhere nonzero (e.g., an exponential prior; Yang and Rannala (2005)).

2. The facts (cont.) 2.2 Steel and Matsen's theorem (Steel and Matsen (2007)): Consider sequences of length n generated by the star tree with strictly positive edge length t, and let n0, n1, n2, and n3 be the resulting site-pattern counts. Suppose further that the aforementioned assumptions about the process of nucleotide substitution and the prior distributions hold. Then…

2. The facts (cont.) Steel and Matsen's theorem (cont.): For any ε > 0 and each binary tree Ti (i = 1, 2, 3), the probability that n0, n1, n2, and n3 have the property that P(Ti | n0, n1, n2, n3) > 1 - ε does not converge to 0 as n tends to infinity.

2. The facts (cont.) 2.3 Simulation results Yang (2007): For data sets of size n = 3*10^9 simulated under the star tree, the posterior distribution over the three binary trees fails to come close to the uniform distribution (1/3, 1/3, 1/3) in a substantial fraction of data sets. Specifically, at least one of the three posterior probabilities is > 0.95 in 4.23% of data sets and > 0.99 in 0.79% of data sets; at least one of the three posterior probabilities is < 0.05 in 17.3% of data sets and < 0.01 in 2.6% of data sets.
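A small-scale sketch of such a simulation (not Yang's actual experiment: the sequence length, number of replicates, exponential-prior mean, and the grid quadrature over branch lengths are all simplifying choices of my own):

```python
import math
import random

def pattern_probs(t0, t1):
    # Site-pattern probabilities under ((1,2),3): internal branch t0, tips t1.
    u, v = math.exp(-4 * t1), math.exp(-4 * (t0 + t1))
    return ((1 + u + 2*v) / 4, (1 + u - 2*v) / 4, (1 - u) / 4, (1 - u) / 4)

def simulate_star_counts(n, t, rng):
    # Under the star tree (t0 = 0) the three asymmetric patterns are equiprobable.
    p = pattern_probs(0.0, t)
    counts = [0, 0, 0, 0]
    for _ in range(n):
        r, acc = rng.random(), 0.0
        for i, pi in enumerate(p):
            acc += pi
            if r <= acc:
                counts[i] += 1
                break
        else:
            counts[3] += 1  # guard against floating-point rounding in acc
    return counts

def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def tree_posteriors(counts, mean=0.1, grid=40, tmax=1.0):
    """Posterior probabilities of T1, T2, T3 under equal tree priors (1/3 each)
    and i.i.d. exponential(mean) priors on t0 and t1, with the branch-length
    integral approximated by a midpoint rule on [0, tmax] (a crude quadrature)."""
    h = tmax / grid
    ts = [(j + 0.5) * h for j in range(grid)]
    logf = [-t / mean - math.log(mean) for t in ts]  # log exponential density
    n0, n1, n2, n3 = counts
    orderings = [(n1, n2, n3), (n2, n1, n3), (n3, n2, n1)]  # T1, T2, T3
    logm = []
    for a, b, c in orderings:
        terms = []
        for i, t0 in enumerate(ts):
            for j, t1 in enumerate(ts):
                p = pattern_probs(t0, t1)
                ll = (n0 * math.log(p[0]) + a * math.log(p[1])
                      + b * math.log(p[2]) + c * math.log(p[3]))
                terms.append(ll + logf[i] + logf[j] + 2 * math.log(h))
        logm.append(logsumexp(terms))  # log marginal likelihood of this tree
    z = logsumexp(logm)
    return [math.exp(x - z) for x in logm]

rng = random.Random(1)
reps, extreme = 50, 0
for _ in range(reps):
    counts = simulate_star_counts(2000, 0.2, rng)
    if max(tree_posteriors(counts)) > 0.95:
        extreme += 1
print(f"fraction of star-tree replicates with some posterior > 0.95: {extreme / reps:.2f}")
```

Even this toy version shows the qualitative point: star-tree data occasionally produce a very confident posterior for one binary tree, and the star-tree paradox is the claim that this does not wash out as n grows.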

3. The (alleged) paradox Question: What is paradoxical about the 'star-tree paradox'? Steel and Matsen's theorem, as well as the simulation results, conflict with Yang's criteria for what a 'reasonable' Bayesian method should satisfy…

3. The (alleged) paradox Yang's criteria (Yang, 2007):
1) The posterior probabilities of the three binary trees converge to the uniform distribution (1/3, 1/3, 1/3) as n tends to infinity if the 'true' tree is the star tree and only the three binary trees are assigned positive priors.
2) If a binary tree is the true tree, its posterior probability should converge to 1 as n tends to infinity.

3. The (alleged) paradox How to justify Yang’s criteria? Maybe they follow from the meaning of posterior probabilities of trees?

4. The star-tree paradox and the meaning of posterior probabilities of trees A suggested interpretation of PP of trees: “We use the case where the full model is correct – that is, where the analysis model matches the simulation model – to illustrate the interpretation of posterior probabilities for trees. When the data are simulated under the prior and when the full analysis model is correct, the posterior for a tree is the probability that the tree is true.” (Yang and Rannala, 2005, p. 457)

4. The star-tree … (cont.) What do Yang and Rannala mean by the 'probability that a tree is correct'? For a given tree with PP x, consider all trees whose PP falls in a small interval (of length 0.02) containing x; the frequency with which the true (i.e., data-generating) tree appears among them is called the 'probability that the tree with PP x is correct'.

4. The star-tree … (cont.) Example: Trees with PP between 0.94 and 0.96 all have PP close to 0.95. Among them, about 95% are posterior probabilities of the true tree, while the others (about 5%) are posterior probabilities of one of the two incorrect trees.

4. The star-tree … (cont.) Problem with Yang and Rannala's interpretation of PP: In the case of criterion 1) the simulation model and the analysis model do not match! That is, the star-tree topology gets zero prior in the analysis model. The relation between the PP of a tree and what Yang and Rannala call the 'probability that the tree is correct' is an empirical phenomenon, not a conceptual necessity.

4. The star-tree … (cont.) Where does this leave us regarding the meaning of the PP of trees? Prior and posterior probabilities as (subjective) degrees of belief?

5. Symmetry A further justification for Yang's criterion 1) might come from symmetry considerations. Aren't the three binary trees, in an intuitive sense, equally similar (or dissimilar) to the star tree?

5. Symmetry However, why should the symmetry of the problem result in convergence of the PP of the trees to the uniform distribution (1/3, 1/3, 1/3)? There are symmetries in the behaviour of the PP of trees as n tends to infinity, but they are of a different kind (see Steel and Matsen's theorem).

References
Steel, M. and Matsen, F. (2007): 'The Bayesian "Star Paradox" Persists for Long Finite Sequences', Mol. Biol. Evol. 24(4).
Yang, Z. (2007): 'Fair-Balance Paradox, Star-tree Paradox, and Bayesian Phylogenetics', Mol. Biol. Evol. 24(8).
Yang, Z. and Rannala, B. (2005): 'Branch-Length Prior Influences Bayesian Posterior Probability of Phylogeny', Syst. Biol. 54(3).