Parsimony and searching tree-space. Phylogenetics Workshop, August 2006. Barbara Holland.

Parsimony and searching tree-space

The basic idea: To infer trees we want to find clades (groups) that are supported by synapomorphies (shared derived traits). More simply put, we assume that if species are similar it is usually due to common descent rather than chance.

Sometimes the data agrees. [Figure: a tree drawn along a time axis, with nodes labelled by the sequences ACCGCTTA, ACTGCTTA, ACTGCTAA, ACCCCTTA and ACCCCATA.]

Sometimes not. [Figure: a tree drawn along a time axis, with nodes labelled by the sequences ACCGCTTA, ACTGCTTA, ACTGCTAA, ACTGCTTC, ACCCCTTA, ACCCCTTC and ACCCCATA.]

Homoplasy: When two or more characters cannot fit on the same tree without at least one of them undergoing a parallel change or a reversal, this is called homoplasy. [Figure: the same tree as on the previous slide, with sequences ACCGCTTA, ACTGCTTA, ACTGCTAA, ACTGCTTC, ACCCCTTA, ACCCCTTC and ACCCCATA.]

How can we choose the best tree? To decide which tree is best we can use an optimality criterion. Parsimony is one such criterion. It chooses the tree that requires the fewest mutations to explain the data. The Principle of Parsimony is the general scientific principle that accepts the simpler of two explanations as preferable.

S1 ACCCCTTC
S2 ACCCCATA
S3 ACTGCTTC
S4 ACTGCTAA

Candidate trees: (1,2),(3,4) and (1,3),(2,4). The slides score the two trees one site at a time:
Sites 1 and 2 (A A A A, C C C C): 0 mutations on either tree.
Site 3 (C C T T): 1 mutation on (1,2),(3,4); 2 mutations on (1,3),(2,4).
Site 4 (C C G G): 1 mutation on (1,2),(3,4); 2 mutations on (1,3),(2,4).
Sites 6 (T A T T), 7 (T T T A) and 8 (C A C A) are scored in the same way.

According to the parsimony optimality criterion we should prefer the tree (1,2),(3,4) over the tree (1,3),(2,4), as it requires fewer mutations.
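The site-by-site count above can be checked by brute force. The sketch below is not part of the original slides: it assumes an unrooted four-taxon tree with two internal nodes, tries every pair of nucleotides at those nodes, and keeps the smallest number of changes over the five edges. The function names `site_cost` and `per_site_costs` are illustrative only.

```python
from itertools import product

# Sequences copied from the slide; taxa are numbered 1 to 4.
SEQS = {1: "ACCCCTTC", 2: "ACCCCATA", 3: "ACTGCTTC", 4: "ACTGCTAA"}


def site_cost(left, right, site):
    """Fewest changes at one site on the four-taxon tree (left),(right)."""
    best = None
    # x and y are trial states for the two internal nodes.
    for x, y in product("ACGT", repeat=2):
        changes = sum(SEQS[t][site] != x for t in left)    # edges to the left pair
        changes += sum(SEQS[t][site] != y for t in right)  # edges to the right pair
        changes += int(x != y)                             # the internal edge
        if best is None or changes < best:
            best = changes
    return best


def per_site_costs(left, right):
    """Per-site minimum change counts for the whole alignment."""
    return [site_cost(left, right, s) for s in range(len(SEQS[1]))]


costs_a = per_site_costs((1, 2), (3, 4))
costs_b = per_site_costs((1, 3), (2, 4))
print("(1,2),(3,4):", costs_a, "total", sum(costs_a))
print("(1,3),(2,4):", costs_b, "total", sum(costs_b))
```

Under these assumptions the totals are 6 mutations for (1,2),(3,4) and 7 for (1,3),(2,4), which agrees with the preference stated on the slide. Brute force is only practical for tiny trees like this one.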

Maximum Parsimony: The parsimony criterion tries to minimise the number of mutations required to explain the data. The “Small Parsimony Problem” is to compute the number of mutations required on a given tree. For small examples it is straightforward to see how many mutations are needed. [Figure: two small trees on Cat, Dog, Rat and Mouse, with nodes labelled A or G.]

The Fitch algorithm: For larger examples we need an algorithm to solve the small parsimony problem. [Figure: a tree with tips a to h.] States at the site: a = A, b = A, c = C, d = C, e = G, f = G, g = T, h = A.

The Fitch algorithm: Label the tips of the tree with the observed sequence at the site. [Figure: tips labelled A A C C G G T A.]

The Fitch algorithm: Pick an arbitrary root to work towards. [Figure: tips labelled A A C C G G T A.]

The Fitch algorithm: Work from the tips of the tree towards the root. Label each node with the intersection of the states of its child nodes. If the intersection is empty, label the node with the union and add one to the cost. [Figure: tips labelled A A C C G G T A; internal nodes labelled A, {A,T}, A, {C,G}, C and {A,C,G}; cost 4.]
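The bottom-up pass described above can be sketched in a few lines. This is an illustration, not the presenter's code. The tip states are those on the slide, but the shape of the slide's tree cannot be recovered from the transcript, so the ladder-shaped topology `TREE` below is a hypothetical stand-in. Trees are written as nested pairs, with tip names as strings.

```python
def fitch(node, states):
    """Return (state set, cost) for the subtree rooted at `node`.

    A tip is a name (str); an internal node is a pair (left, right).
    """
    if isinstance(node, str):
        return {states[node]}, 0
    left_set, left_cost = fitch(node[0], states)
    right_set, right_cost = fitch(node[1], states)
    common = left_set & right_set
    if common:
        # Non-empty intersection: keep it, no extra cost.
        return common, left_cost + right_cost
    # Empty intersection: take the union and add one to the cost.
    return left_set | right_set, left_cost + right_cost + 1


# Observed states at the site, as listed on the slide.
SITE = dict(zip("abcdefgh", "AACCGGTA"))

# ASSUMPTION: hypothetical topology, not the tree drawn on the slide.
TREE = ((((((("a", "b"), "c"), "d"), "e"), "f"), "g"), "h")

root_set, cost = fitch(TREE, SITE)
print(sorted(root_set), cost)
```

With this assumed topology the cost also comes out as 4, although the sets at the internal nodes differ from those in the slide's figure because the tree shape differs.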

Fitch continued… The Fitch algorithm also has a second phase that allocates states to the internal nodes but it does not affect the cost. To find the Fitch cost of an alignment for a particular tree we just sum the Fitch costs of all the sites.
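A minimal sketch of both phases, applied to the four-sequence example from earlier in the talk, is given here. It assumes one common form of the second phase: a node takes its parent's state when that state is in its own set, and otherwise takes any state from its set (here the alphabetically first). The function names and the nested-pair tree encoding are illustrative choices, not taken from the slides.

```python
SEQS = {"S1": "ACCCCTTC", "S2": "ACCCCATA", "S3": "ACTGCTTC", "S4": "ACTGCTAA"}


def first_pass(node, column, sets):
    """Bottom-up phase: fill `sets` (keyed by node) and return the cost."""
    if isinstance(node, str):
        sets[node] = {column[node]}
        return 0
    cost = first_pass(node[0], column, sets) + first_pass(node[1], column, sets)
    left, right = sets[node[0]], sets[node[1]]
    if left & right:
        sets[node] = left & right
    else:
        sets[node] = left | right
        cost += 1
    return cost


def second_pass(node, parent_state, sets):
    """Top-down phase: choose one state per node, count changes on edges."""
    options = sets[node]
    state = parent_state if parent_state in options else min(options)
    changes = int(parent_state is not None and state != parent_state)
    if isinstance(node, tuple):
        changes += second_pass(node[0], state, sets)
        changes += second_pass(node[1], state, sets)
    return changes


def alignment_cost(tree, seqs):
    """Sum the per-site Fitch costs over all sites of the alignment."""
    total = 0
    for site in range(len(next(iter(seqs.values())))):
        column = {name: seq[site] for name, seq in seqs.items()}
        sets = {}
        cost = first_pass(tree, column, sets)
        # The states chosen top-down realise exactly `cost` changes.
        assert second_pass(tree, None, sets) == cost
        total += cost
    return total


cost_12_34 = alignment_cost((("S1", "S2"), ("S3", "S4")), SEQS)
cost_13_24 = alignment_cost((("S1", "S3"), ("S2", "S4")), SEQS)
print(cost_12_34, cost_13_24)
```

The internal `assert` checks, site by site, that the second phase does not change the cost, which is the point made on the slide.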

The “large parsimony problem”: The small parsimony problem (finding the score of a given tree) can be solved in time linear in the size of the tree. The large parsimony problem is to find the tree with minimum score. It is known to be NP-hard.

How many trees are there?

#species   #unrooted binary tip-labelled trees
4          3
5          3*5 = 15
6          3*5*7 = 105
7          3*5*7*9 = 945
10         2,027,025
20         2.2*10^20
n          (2n-5)!!

An exact search for the best tree, where each tree is evaluated according to some optimality criterion such as parsimony, quickly becomes intractable as the number of species increases.

Counting trees: each new species can be attached to any edge of the current tree, so the counts multiply: 1 x 3 = 3, 1 x 3 x 5 = 15.
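The (2n-5)!! formula can be evaluated directly. The short sketch below is an illustration with a hypothetical function name; it multiplies the number of edges available each time a species is added, using the fact that an unrooted binary tree on k tips has 2k-3 edges.

```python
def count_unrooted_trees(n):
    """Number of unrooted binary tip-labelled trees on n >= 3 species."""
    if n < 3:
        raise ValueError("need at least 3 species")
    count = 1
    # A tree on k tips has 2k-3 edges, each a place to attach tip k+1.
    for k in range(3, n):
        count *= 2 * k - 3
    return count


for n in (4, 5, 6, 7, 10, 20):
    print(n, count_unrooted_trees(n))
```

Python integers have arbitrary precision, so the count for 20 species is computed exactly rather than as a floating-point approximation.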

Search strategies
Exact search: possible for small n only
Branch and Bound: up to ~20 taxa
Local search (heuristics): pick a good starting tree and use moves within a “neighbourhood” to find a better tree
Meta-heuristics: genetic algorithms, simulated annealing, the ratchet

Exact searches: For a small number of taxa (n <= 12) it is possible to compute the score of every tree. Branch and Bound searches are also guaranteed to find the optimal solution, but use some clever rules to avoid having to check all trees. They may be effective for up to 25 taxa.

No need to evaluate this whole branch of the search tree, as no tree can have a score better than 9
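One way such a search could be organised is sketched below. It is an illustration, not the method shown in the lost figure. It assumes trees are grown by adding one taxon at a time to every edge, and that a partial tree is abandoned as soon as its Fitch score reaches the best complete score found so far, which is safe because adding taxa cannot lower a parsimony score. The first taxon is fixed as the rooting tip so that each unrooted tree is generated once. All function names are hypothetical.

```python
SEQS = {"S1": "ACCCCTTC", "S2": "ACCCCATA", "S3": "ACTGCTTC", "S4": "ACTGCTAA"}


def fitch_score(tree, seqs):
    """Parsimony score of a rooted binary tree given as nested pairs."""
    def visit(node, site):
        if isinstance(node, str):
            return {seqs[node][site]}, 0
        (ls, lc), (rs, rc) = visit(node[0], site), visit(node[1], site)
        if ls & rs:
            return ls & rs, lc + rc
        return ls | rs, lc + rc + 1

    n_sites = len(next(iter(seqs.values())))
    return sum(visit(tree, site)[1] for site in range(n_sites))


def insertions(subtree, leaf):
    """Yield every tree made by attaching `leaf` to one edge of `subtree`."""
    yield (subtree, leaf)  # the edge above this subtree
    if isinstance(subtree, tuple):
        left, right = subtree
        for t in insertions(left, leaf):
            yield (t, right)
        for t in insertions(right, leaf):
            yield (left, t)


def branch_and_bound(seqs):
    """Return (score, tree) for one most-parsimonious tree (>= 3 taxa)."""
    names = sorted(seqs)
    best = {"score": float("inf"), "tree": None}

    def extend(rest, next_index):
        tree = (names[0], rest)  # names[0] is fixed as the rooting tip
        score = fitch_score(tree, seqs)
        if score >= best["score"]:
            return  # bound: no completion of this tree can do better
        if next_index == len(names):
            best["score"], best["tree"] = score, tree
            return
        for bigger in insertions(rest, names[next_index]):
            extend(bigger, next_index + 1)

    extend((names[1], names[2]), 3)
    return best["score"], best["tree"]


best_score, best_tree = branch_and_bound(SEQS)
print(best_score, best_tree)
```

On the four-sequence example this returns the tree that groups S1 with S2 and S3 with S4. Because the bound uses `>=`, ties are discarded and only one optimal tree is reported; a search for all optimal trees would need `>` and a list of results.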

The problem of local optima

Nearest Neighbor Interchange (NNI)
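A local search built on NNI moves might look like the sketch below. It is an illustration under stated assumptions rather than the procedure from the slides: trees are rooted nested pairs, an NNI move swaps a node's sibling with one of that node's two children, and the search repeatedly moves to the best-scoring neighbour until none is strictly better. Because the parsimony score does not depend on the root position, some rooted neighbours are the same unrooted tree; they are evaluated anyway for simplicity. All names are hypothetical.

```python
SEQS = {"S1": "ACCCCTTC", "S2": "ACCCCATA", "S3": "ACTGCTTC", "S4": "ACTGCTAA"}


def fitch_score(tree, seqs):
    """Parsimony score of a rooted binary tree given as nested pairs."""
    def visit(node, site):
        if isinstance(node, str):
            return {seqs[node][site]}, 0
        (ls, lc), (rs, rc) = visit(node[0], site), visit(node[1], site)
        if ls & rs:
            return ls & rs, lc + rc
        return ls | rs, lc + rc + 1

    n_sites = len(next(iter(seqs.values())))
    return sum(visit(tree, site)[1] for site in range(n_sites))


def nni_neighbours(tree):
    """Yield the rooted trees that are one NNI move away from `tree`."""
    if isinstance(tree, str):
        return
    left, right = tree
    if isinstance(left, tuple):   # swap `right` with a child of `left`
        a, b = left
        yield ((right, b), a)
        yield ((a, right), b)
    if isinstance(right, tuple):  # swap `left` with a child of `right`
        a, b = right
        yield (a, (left, b))
        yield (b, (a, left))
    for t in nni_neighbours(left):   # moves deeper in the left subtree
        yield (t, right)
    for t in nni_neighbours(right):  # moves deeper in the right subtree
        yield (left, t)


def hill_climb(start, seqs):
    """Greedy local search: stop when no neighbour has a lower score."""
    tree, score = start, fitch_score(start, seqs)
    while True:
        candidates = [(fitch_score(t, seqs), t) for t in nni_neighbours(tree)]
        if not candidates:
            return tree, score
        best_score, best_tree = min(candidates, key=lambda c: c[0])
        if best_score >= score:
            return tree, score  # a local optimum, not necessarily global
        tree, score = best_tree, best_score


start = ("S1", (("S2", "S3"), "S4"))
final_tree, final_score = hill_climb(start, SEQS)
print(fitch_score(start, SEQS), "->", final_score, final_tree)
```

With only four taxa every tree is one NNI move from every other, so this example cannot get stuck. On larger data sets the same loop can stop at a local optimum, which is why larger neighbourhoods and meta-heuristics are used.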

Subtree Pruning and Regrafting (SPR)

Tree Bisection and Reconnection (TBR)