Consensus

Consensus tree
A consensus tree summarizes information common to two or more trees.
[Figure: three example trees on the taxa a, b, c, d, e.]

Strict consensus
Strict consensus includes only those groups that occur in all the trees being considered.
[Figure: the three example trees and their strict consensus.]
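The definition above can be sketched in a few lines of Python. This is a minimal illustration, assuming each tree is already given as the set of its groups (leaf sets); the trees `T1`, `T2`, `T3` are hypothetical and are not the trees drawn on the slide.

```python
# Hypothetical example: each tree is described by the set of its groups,
# and each group is a frozenset of taxon names (not the slide's trees).
T1 = {frozenset("ab"), frozenset("de"), frozenset("cde")}
T2 = {frozenset("ab"), frozenset("cd"), frozenset("cde")}
T3 = {frozenset("ce"), frozenset("cde"), frozenset("bcde")}


def strict_consensus(trees):
    """Return only the groups that occur in every tree."""
    return set.intersection(*(set(tree) for tree in trees))


consensus = strict_consensus([T1, T2, T3])
# Print each surviving group as a sorted string of taxon names.
print(sorted("".join(sorted(group)) for group in consensus))
```

In this made-up input the group {a,b} occurs in two of the three trees, so it is dropped by the strict consensus; only {c,d,e} remains.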

Problem: the split {a,b} is found in 2 out of the 3 trees, but is not shown in the strict consensus.

Majority-rule consensus
Majority-rule consensus: splits that are found in the majority of the trees are shown.
[Figure: the three example trees and their majority-rule consensus.]

The percentage of the trees supporting each split is indicated.
[Figure: the majority-rule consensus with its splits labelled 100 and 67.]
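A minimal Python sketch of majority-rule consensus with support percentages follows. It assumes each tree is given as a set of groups (leaf sets); the three trees and the function name `majority_rule` are hypothetical, chosen only so that one group has 100% support and another 67%.

```python
from collections import Counter

# Hypothetical trees, each described by the set of its groups (leaf sets).
T1 = {frozenset("ab"), frozenset("de"), frozenset("cde")}
T2 = {frozenset("ab"), frozenset("cd"), frozenset("cde")}
T3 = {frozenset("ce"), frozenset("cde"), frozenset("bcde")}


def majority_rule(trees, threshold=0.5):
    """Map each group found in more than `threshold` of the trees to its support (%)."""
    counts = Counter(group for tree in trees for group in set(tree))
    n = len(trees)
    return {g: 100.0 * c / n for g, c in counts.items() if c / n > threshold}


support = majority_rule([T1, T2, T3])
for group, percent in sorted(support.items(), key=lambda item: -item[1]):
    print("".join(sorted(group)), round(percent))
```

Groups that occur in only one of the three trees fall below the 50% threshold and are left out of the consensus.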

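Problem with majority-rule consensus
In this example the majority-rule consensus equals the strict consensus. However, if we consider only {b,c,d}, then in both trees b is closer to c than b is to d or c is to d, and neither consensus shows this.
[Figure: two trees on the taxa a to e, with majority-rule consensus = strict consensus.]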
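The point can be checked on a small made-up case. The sketch below assumes rooted trees written as nested tuples of leaf names; `tree1` and `tree2` are hypothetical (not the slide's trees), and `restrict` and `clades` are illustrative helper names. The two trees share no group at all, yet they agree once pruned to {b,c,d}.

```python
def restrict(tree, keep):
    """Prune a rooted tree (nested tuples of leaf names) to the leaves in `keep`."""
    if isinstance(tree, str):
        return tree if tree in keep else None
    kids = [k for k in (restrict(child, keep) for child in tree) if k is not None]
    if not kids:
        return None
    # A node left with a single child is collapsed into that child.
    return kids[0] if len(kids) == 1 else tuple(kids)


def clades(tree):
    """Return the leaf sets of all internal nodes below the root."""
    found = set()

    def walk(node, is_root):
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child, False) for child in node))
        if not is_root:
            found.add(below)
        return below

    walk(tree, True)
    return found


# Hypothetical trees on the taxa a to e.
tree1 = (((("a", "b"), "c"), "d"), "e")
tree2 = ((("b", "c"), "d"), ("a", "e"))

print(clades(tree1) & clades(tree2))   # groups shared by the full trees
print(restrict(tree1, {"b", "c", "d"}))
print(restrict(tree2, {"b", "c", "d"}))
```

Both restricted trees group b with c before d, which is the kind of shared information that a consensus built from whole groups cannot display.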

Adams consensus
Adams consensus gives the subtrees that are common to all trees. It is useful when one or more sequences have unclear positions but there is a subset of sequences whose relationships are the same in all trees.
[Figure: example trees on the taxa a to e and their Adams consensus.]

Problem with the consensus of all the MP (maximum parsimony) trees
Our goal is to evaluate the reliability of different clades. In other words, we do not want to rely on just one best tree, but rather to estimate the support for each split based on many equally likely or highly likely trees.

Bootstrap (and jackknife)

Now we have a tree, but how robust is this tree?
[Figure: tree with leaves African-1, Dugong, African-2, Mam-3, African-ref, Mam-4, Mam-6, Asian-2, African-3, Asian-3, Asian-1, Mam-5.]

Jackknife
A. We create new data sets by sampling half of the characters (random sampling without replacement). We generate 100 pseudo-data sets. Note: we do not change the number of sequences, just the number of positions!
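Step A can be sketched as follows. This is a minimal illustration, assuming the alignment is a dict from sequence name to an equal-length string; the toy alignment, the seed, and the function name `jackknife_sample` are made up for the example.

```python
import random


def jackknife_sample(alignment, rng):
    """Keep a random half of the columns (no replacement); all sequences are kept."""
    n_positions = len(next(iter(alignment.values())))
    # rng.sample draws distinct column indices, i.e. without replacement.
    columns = sorted(rng.sample(range(n_positions), n_positions // 2))
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


# Hypothetical toy alignment: 4 sequences, 10 positions.
alignment = {
    "Sp1": "ATCTGAACGT",
    "Sp2": "ATCTGAACGC",
    "Sp3": "ACTTAGACGC",
    "Sp4": "ACCTAGTCGT",
}

rng = random.Random(0)  # fixed seed so the run is repeatable
pseudo_data_sets = [jackknife_sample(alignment, rng) for _ in range(100)]
print(len(pseudo_data_sets), pseudo_data_sets[0])
```

Every pseudo-data set still has four sequences, but only five of the ten positions.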

Jackknife
B. We reconstruct a tree from each data set.
[Figure: three pseudo-data sets, each shown with the tree on Sp1 to Sp4 reconstructed from it.]
Data set 1: 1: TATTT, 2: CATTT, 3: CACTT, N: AACTT
Data set 2: 1: TTTAT, 2: TAACC, 3: TAACC, N: TGGGA
Data set 3: 1: TTGTA, 2: TAGAC, 3: TAAAC, N: TGAGG

Jackknife
C. We compute the majority-rule consensus. In 67% of the data sets, the split between Sp1+Sp2 and the rest of the tree was found.
[Figure: three trees on Sp1 to Sp4 and their majority-rule consensus, with splits labelled 67% and 100%.]

Bootstrap
The same as jackknife, but instead of sampling N/2 positions, we sample N positions with replacement.

Bootstrap
A. Resample: draw positions with replacement to build each pseudo-data set.
Original data: 1: ATCTG…A, 2: ATCTG…C, 3: ACTTA…C, N: ACCTA…T
Resample (positions 11244x): 1: AATTT…T, 2: AATTT…G, 3: AACTT…T, N: AACTT…T
Resample (positions 47789…x): 1: TTTAT…T, 2: TAACC…G, 3: TAACC…T, N: TGGGA…T
Resample (positions 15578…N): 1: AGGTA…T, 2: AGGAC…G, 3: AAAAC…A, N: AAAGG…C
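The resampling step can be sketched in Python. As a simplifying assumption, the example uses only the first five characters shown on the slide for each sequence (the slide elides the rest), and the helper names `resample_columns` and `bootstrap_sample` are hypothetical. Drawing positions 1, 1, 2, 4, 4 reproduces the start of the slide's first resampled data set.

```python
import random


def resample_columns(alignment, columns):
    """Build a pseudo-data set from the given 0-based column indices."""
    return {name: "".join(seq[i] for i in columns) for name, seq in alignment.items()}


def bootstrap_sample(alignment, rng):
    """Draw as many columns as the alignment has, with replacement."""
    n_positions = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_positions) for _ in range(n_positions)]
    return resample_columns(alignment, columns)


# First five positions of the slide's original data (truncated: an assumption).
alignment = {"1": "ATCTG", "2": "ATCTG", "3": "ACTTA", "N": "ACCTA"}

# Positions 1, 1, 2, 4, 4 on the slide are indices 0, 0, 1, 3, 3 here.
fixed = resample_columns(alignment, [0, 0, 1, 3, 3])
print(fixed)

rng = random.Random(1)  # fixed seed so the run is repeatable
print(bootstrap_sample(alignment, rng))
```

Because sampling is with replacement, some positions appear more than once in a pseudo-data set and others not at all, while the number of positions stays the same.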

Bootstrap
B. Reconstruct a tree from each data set.
[Figure: the three resampled data sets (positions 11244x, 47789…x, 15578…N), each shown with the tree on Sp1 to Sp4 reconstructed from it.]

Bootstrap
C. Compute the majority-rule consensus.
[Figure: three trees on Sp1 to Sp4 and their majority-rule consensus, with splits labelled 67% and 100%.]
Remark: in a bootstrap tree, branch lengths have no meaning.