Building Phylogenies Parsimony 2.

Methods
Distance-based
Parsimony
Maximum likelihood

Searching for an MP tree
Exhaustive search (exact)
Branch-and-bound search (exact)
Heuristic search methods:
  Stepwise addition
  Branch swapping
  Star decomposition

Exhaustive Enumeration
Order the taxa: s1, s2, ..., sn.
Build the (unique) unrooted tree for s1, s2, s3.
Try all possible places to add s4, and score each tree.
Try all places to add s5 to each of the previous trees, and score again.
Continue in the same way until sn has been added.
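The enumeration steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: trees are represented as nested pairs, the unrooted tree is held as a rooted tree hanging off the first taxon, and the names `insertions`, `all_unrooted_trees` and `count_unrooted_trees` are hypothetical helpers, not part of the original slides. The sketch also checks the enumeration against the known count of unrooted binary trees, (2n-5)!!.

```python
def insertions(tree, leaf):
    """Yield every tree made by attaching `leaf` to one edge of `tree`.

    A tree is a leaf name (str) or a pair (left, right). Attaching the new
    leaf "above" a node corresponds to splitting the edge leading to it.
    """
    yield (tree, leaf)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insertions(left, leaf):
            yield (new_left, right)
        for new_right in insertions(right, leaf):
            yield (left, new_right)


def all_unrooted_trees(taxa):
    """Enumerate all unrooted binary trees on `taxa` (at least 3 names)."""
    first, second, third, *rest = taxa
    # The unique 3-taxon tree, written as a rooted pair attached to `first`.
    subtrees = [(second, third)]
    for leaf in rest:
        subtrees = [t for sub in subtrees for t in insertions(sub, leaf)]
    return [(first, sub) for sub in subtrees]


def count_unrooted_trees(n):
    """Return (2n-5)!! = 3 * 5 * ... * (2n-5)."""
    count = 1
    for k in range(3, 2 * n - 4, 2):
        count *= k
    return count


for n in range(3, 8):
    taxa = [f"s{i}" for i in range(1, n + 1)]
    print(n, len(all_unrooted_trees(taxa)), count_unrooted_trees(n))
```

The number of trees grows very quickly with the number of taxa, which is why exhaustive enumeration is practical only for small data sets.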

Adding the 4th taxon [S05]

Adding the 5th taxon [S05]

[S05]

Branch and bound
Similar to exhaustive search, except that we maintain:
  the score of the best tree obtained so far;
  a lower bound on the score of the best tree that can be obtained from this point forward.
If the score of the current partial tree (a lower bound, since adding taxa cannot reduce the parsimony score) exceeds the current best score, backtrack and take the next available path.
When a tip of the search tree is reached, the tree is either optimal so far (and hence retained) or suboptimal (and rejected).
When all paths leading from the initial 3-taxon tree have been explored, the algorithm terminates, and all most-parsimonious trees will have been identified.
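A minimal sketch of this search follows, assuming Fitch's algorithm for scoring unordered characters and the score of the partial tree as the lower bound. The alignment format (a dict from taxon name to sequence string), the nested-pair tree representation, and all function names are assumptions made for illustration.

```python
def fitch(tree, states):
    """Return (state set, steps) for one site; `states` maps leaf -> state."""
    if not isinstance(tree, tuple):
        return {states[tree]}, 0
    left_set, left_steps = fitch(tree[0], states)
    right_set, right_steps = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1


def parsimony_score(tree, alignment):
    """Sum the Fitch step counts over all sites of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


def insertions(tree, leaf):
    """Yield every tree made by attaching `leaf` to one edge of `tree`."""
    yield (tree, leaf)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insertions(left, leaf):
            yield (new_left, right)
        for new_right in insertions(right, leaf):
            yield (left, new_right)


def branch_and_bound(alignment):
    """Return (best score, list of all most-parsimonious trees)."""
    taxa = list(alignment)
    best = {"score": float("inf"), "trees": []}

    def extend(subtree, next_index):
        tree = (taxa[0], subtree)
        score = parsimony_score(tree, alignment)
        if score > best["score"]:
            return                      # bound: this path cannot give an optimal tree
        if next_index == len(taxa):     # tip of the search tree: all taxa placed
            if score < best["score"]:
                best["score"], best["trees"] = score, []
            best["trees"].append(tree)
            return
        for bigger in insertions(subtree, taxa[next_index]):
            extend(bigger, next_index + 1)

    extend((taxa[1], taxa[2]), 3)       # start from the unique 3-taxon tree
    return best["score"], best["trees"]


toy = {"A": "AA", "B": "AA", "C": "GG", "D": "GG"}
print(branch_and_bound(toy))
```

In practice the search is usually started with an upper bound from a quick heuristic tree, so that pruning begins before the first complete tree is reached; the sketch omits this for brevity.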

Branch Swapping
Local search approach: define a "neighborhood" for a tree.
Neighbors are obtained by rearranging branches: cut and paste.
Instead of exhaustively exploring tree space, just try the neighbors.

Branch Swapping
Nearest-Neighbor Interchange (NNI)
Subtree Pruning and Regrafting (SPR)
Tree Bisection and Reconnection (TBR)

Nearest-Neighbor Interchange

All 15 5-taxon trees, connected by NNIs
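As a rough illustration of the NNI neighborhood, the sketch below generates the trees one interchange away from a given tree. It assumes the same nested-pair representation as a rooted tree attached to a fixed first taxon; `nni_below` and `nni_neighbors` are hypothetical names. Each internal edge contributes two rearrangements, so a tree on n taxa has 2(n-3) NNI neighbors, which gives 4 for each of the 5-taxon trees.

```python
def nni_below(tree):
    """Yield rearrangements around every internal edge inside `tree`."""
    if not isinstance(tree, tuple):
        return
    left, right = tree
    # Internal edge between this node and its left child (a, b):
    # exchange the sibling `right` with a, then with b.
    if isinstance(left, tuple):
        a, b = left
        yield ((right, b), a)
        yield ((a, right), b)
    # Internal edge between this node and its right child (a, b):
    # exchange the sibling `left` with a, then with b.
    if isinstance(right, tuple):
        a, b = right
        yield (a, (left, b))
        yield (b, (a, left))
    # Internal edges deeper in either child.
    for new_left in nni_below(left):
        yield (new_left, right)
    for new_right in nni_below(right):
        yield (left, new_right)


def nni_neighbors(tree):
    """Yield the NNI neighbors of an unrooted tree written as (first_taxon, subtree)."""
    first, subtree = tree
    for new_subtree in nni_below(subtree):
        yield (first, new_subtree)


start = ("A", ("B", ("C", ("D", "E"))))
for neighbor in nni_neighbors(start):
    print(neighbor)
```

A branch-swapping search would score each neighbor, move to the best one if it improves on the current tree, and repeat until no neighbor is better.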

Subtree Pruning and Regrafting

Tree Bisection and Reconnection

Stepwise Addition
A greedy method.
Start with a 3-taxon tree.
Add taxa one at a time, keeping only the best tree found so far.
No guarantee of optimality, but it may provide a good starting point for a search.
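The greedy procedure can be sketched as follows, again assuming Fitch scoring, an alignment given as a dict from taxon name to sequence, and nested-pair trees. The function names are illustrative, and the taxa are simply added in the order they appear in the dict, which is one choice among several possible addition orders.

```python
def fitch(tree, states):
    """Return (state set, steps) for one site; `states` maps leaf -> state."""
    if not isinstance(tree, tuple):
        return {states[tree]}, 0
    left_set, left_steps = fitch(tree[0], states)
    right_set, right_steps = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1


def parsimony_score(tree, alignment):
    """Sum the Fitch step counts over all sites of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {name: seq[i] for name, seq in alignment.items()})[1]
        for i in range(n_sites)
    )


def insertions(tree, leaf):
    """Yield every tree made by attaching `leaf` to one edge of `tree`."""
    yield (tree, leaf)
    if isinstance(tree, tuple):
        left, right = tree
        for new_left in insertions(left, leaf):
            yield (new_left, right)
        for new_right in insertions(right, leaf):
            yield (left, new_right)


def stepwise_addition(alignment):
    """Greedily build one tree; returns (score, tree). Not guaranteed optimal."""
    taxa = list(alignment)
    first = taxa[0]
    subtree = (taxa[1], taxa[2])        # the unique 3-taxon tree
    for leaf in taxa[3:]:
        # Keep only the best placement of this taxon (first one wins ties).
        subtree = min(
            insertions(subtree, leaf),
            key=lambda candidate: parsimony_score((first, candidate), alignment),
        )
    tree = (first, subtree)
    return parsimony_score(tree, alignment), tree


toy = {"A": "AA", "B": "AA", "C": "GG", "D": "GG"}
print(stepwise_addition(toy))
```

Because earlier placements are never revisited, the result depends on the addition order; the tree is typically handed to a branch-swapping search for refinement.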

A problem with parsimony: Long-branch attraction
Convergent evolution along long branches can mislead parsimony.
[Figure: example with states G and A in which parsimony recovers an incorrect tree.]

Compatibility
A set of characters is compatible if there exists a tree on which each character state emerges exactly once.
[Figure: tree on taxa A, B, C, D with character-state changes marked on its branches.]

Consistency index
Homoplasy: multiple emergence of the same state in a phylogeny.
Perfect fit (= compatible characters) implies no homoplasy.
Let $m_i$ = the minimum number of steps possible for site $i$ on any tree, and $s_i$ = the minimum number of steps for site $i$ given the tree.
The consistency index is $CI = m_i / s_i$, with $0 \le CI \le 1$.
CI measures the amount of homoplasy in the tree.
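A small sketch of the per-site calculation follows. It assumes unordered characters, so that m_i is the number of distinct states at the site minus one and s_i is the Fitch step count on the given tree. The function names, the nested-pair tree format, and the choice to return `None` for a constant site (where m_i = s_i = 0) are assumptions for illustration.

```python
def fitch(tree, states):
    """Return (state set, steps) for one site; `states` maps leaf -> state."""
    if not isinstance(tree, tuple):
        return {states[tree]}, 0
    left_set, left_steps = fitch(tree[0], states)
    right_set, right_steps = fitch(tree[1], states)
    common = left_set & right_set
    if common:
        return common, left_steps + right_steps
    return left_set | right_set, left_steps + right_steps + 1


def consistency_index(tree, states):
    """Return m_i / s_i for one site, or None if the site is constant."""
    min_steps = len(set(states.values())) - 1   # m_i: fewest steps on any tree
    tree_steps = fitch(tree, states)[1]         # s_i: fewest steps on this tree
    if tree_steps == 0:
        return None                             # constant site: ratio undefined
    return min_steps / tree_steps


tree = ("A", ("B", ("C", "D")))
compatible_site = {"A": "A", "B": "A", "C": "G", "D": "G"}
homoplastic_site = {"A": "A", "B": "G", "C": "A", "D": "G"}
print(consistency_index(tree, compatible_site))   # one change on one branch
print(consistency_index(tree, homoplastic_site))  # the same state arises twice
```

The first site fits the tree without homoplasy and gets CI = 1; the second needs two steps where one would suffice on another tree, so its CI is 0.5.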

The bootstrap
A bootstrap sample is obtained by sampling sites randomly with replacement.
This gives a data matrix with the same number of taxa and the same number of characters as the original one.
Construct phylogenies for the samples.
For each branch in the original tree, compute the fraction of bootstrap samples in which that branch appears.
This assigns a bootstrap support value to each branch.
Idea: if a grouping has a lot of support, it will be supported by at least some positions in most of the bootstrap samples.
Can be applied to other methods of phylogenetic reconstruction.
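The resampling and counting steps might look like the sketch below. The tree-building method is left as a parameter (`build_tree`, a hypothetical callable that takes an alignment and returns a nested-pair tree rooted on the same first taxon every time), so any reconstruction method can be plugged in. Branches are identified by the set of taxa below them, which is an assumption that only works because all trees share that rooting.

```python
import random


def bootstrap_sample(alignment, rng):
    """Resample alignment columns with replacement, keeping the same size."""
    n_sites = len(next(iter(alignment.values())))
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }


def clades(tree):
    """Return (leaf set, set of leaf sets below each internal node)."""
    if not isinstance(tree, tuple):
        return frozenset([tree]), set()
    left_leaves, left_clades = clades(tree[0])
    right_leaves, right_clades = clades(tree[1])
    leaves = left_leaves | right_leaves
    return leaves, left_clades | right_clades | {leaves}


def bootstrap_support(alignment, build_tree, n_reps=100, seed=0):
    """Map each internal branch of the original tree to its support fraction."""
    rng = random.Random(seed)
    n_taxa = len(alignment)

    def internal_branches(tree):
        # Drop trivial groupings (single-taxon branches and the whole tree).
        return {c for c in clades(tree)[1] if 2 <= len(c) <= n_taxa - 2}

    reference = internal_branches(build_tree(alignment))
    counts = dict.fromkeys(reference, 0)
    for _ in range(n_reps):
        replicate = build_tree(bootstrap_sample(alignment, rng))
        for branch in reference & internal_branches(replicate):
            counts[branch] += 1
    return {branch: hits / n_reps for branch, hits in counts.items()}
```

For example, `bootstrap_support(alignment, build_tree=my_method)` would return one fraction per internal branch of the tree that `my_method` (a placeholder name) infers from the full alignment.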