BNFO 602 Phylogenetics Usman Roshan.

Summary of last time: models of evolution; distance-based tree reconstruction (neighbor joining, UPGMA).

Why phylogenetics? Study of evolution: the origin and migration of humans, the origin and spread of disease. Many applications in comparative bioinformatics: sequence alignment; motif detection (phylogenetic motifs, evolutionary trace, phylogenetic footprinting); correlated mutation (useful for structural contact prediction); protein interaction; gene networks; vaccine development; and many more.

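Maximum Parsimony: a character-based method. Finding the optimal tree is NP-hard (reduction to the Steiner tree problem). Widely used in phylogenetics. Generally slower than NJ but more accurate; faster than ML. Assumes that sites evolve i.i.d. (independently and identically distributed).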

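Maximum Parsimony. Input: a set S of n aligned sequences of length k. Output: a phylogenetic tree T leaf-labeled by the sequences in S, together with additional sequences of length k labeling the internal nodes of T, such that $\sum_{(u,v) \in E(T)} H(u,v)$ is minimized, where H(u,v) is the Hamming distance between the sequences at the endpoints of edge (u,v).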

Maximum parsimony (example). Input: four sequences ACT, ACA, GTT, GTA. Question: which of the three trees has the best MP score?

Maximum Parsimony. [Figure: the three possible unrooted trees on the four sequences ACT, ACA, GTT, GTA.]

Maximum Parsimony. [Figure: the three trees with labeled internal nodes and edge lengths; their MP scores are 7, 5, and 4. The tree with MP score 4, which groups ACT with ACA and GTT with GTA, is the optimal MP tree.]
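The score of a fully labeled tree is the sum of Hamming distances over its edges. The following is a minimal sketch of that computation, not code from the lecture. The node names (a to d for leaves, u and v for internal nodes) are hypothetical, and the internal labels ACA and GTA are one labeling that achieves score 4 on the tree grouping ACT with ACA and GTT with GTA.

```python
def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def tree_length(edges, labels):
    """Sum of Hamming distances over all edges of a fully labeled tree."""
    return sum(hamming(labels[u], labels[v]) for u, v in edges)


# Hypothetical node names; u and v are the two internal nodes.
labels = {"a": "ACT", "b": "ACA", "c": "GTT", "d": "GTA",
          "u": "ACA", "v": "GTA"}
edges = [("a", "u"), ("b", "u"), ("u", "v"), ("c", "v"), ("d", "v")]

print(tree_length(edges, labels))  # 4
```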

Maximum Parsimony: computational complexity. Finding the optimal MP tree is NP-hard. For a fixed tree, an optimal labeling of the internal nodes can be computed in linear time, O(nk). [Figure: the tree on ACT, ACA, GTT, GTA with MP score 4.]
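The slides do not name the linear-time labeling method. A common choice is Fitch's algorithm, so the sketch below is an assumption about the intended technique rather than the lecture's own code. It roots the unrooted tree on an internal edge (the score does not depend on where the root is placed), represents the tree as nested pairs, and names each leaf by its sequence for brevity. With a constant-size alphabet it does O(1) work per node and site, which gives O(nk) overall.

```python
def fitch_site(node, seqs, site):
    """Return (state set, substitution count) for one site of a rooted binary tree."""
    if isinstance(node, str):          # leaf: its observed character, no cost
        return {seqs[node][site]}, 0
    left, right = node
    lset, lcost = fitch_site(left, seqs, site)
    rset, rcost = fitch_site(right, seqs, site)
    common = lset & rset
    if common:                         # children can agree: no extra change
        return common, lcost + rcost
    return lset | rset, lcost + rcost + 1   # disagreement costs one change


def fitch_score(tree, seqs):
    """Minimum number of substitutions needed on a fixed tree topology."""
    k = len(next(iter(seqs.values())))
    return sum(fitch_site(tree, seqs, s)[1] for s in range(k))


# Leaves are named by their own sequence (an illustrative shortcut).
seqs = {s: s for s in ["ACT", "ACA", "GTT", "GTA"]}

print(fitch_score((("ACT", "ACA"), ("GTT", "GTA")), seqs))  # 4
print(fitch_score((("ACT", "GTT"), ("ACA", "GTA")), seqs))  # 5
```

This version returns only the score. Recovering an actual labeling of the internal nodes needs a second, top-down pass, which is omitted here.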

Local search strategies. [Figure: cost plotted over the space of phylogenetic trees, with a global optimum and a local optimum marked.]

Local search for MP:
  Determine a candidate solution s.
  While s is not a local minimum:
    Find a neighbor s' of s such that MP(s') < MP(s).
    If found, set s = s'; else return s and exit.
Time complexity: unknown. The search could run for a very long time or end quickly, depending on the starting tree and the local move.
We need to specify how to construct the starting tree and which local move to use.
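The loop above can be sketched generically. This is an illustration under stated assumptions: `neighbors` and `cost` are hypothetical callbacks (for MP they would be a local move such as NNI and the MP score), and the demonstration uses a toy one-dimensional cost landscape instead of real trees.

```python
def local_search(start, neighbors, cost):
    """First-improvement local search: move to a strictly better neighbor
    until none exists, then return the local minimum reached."""
    s = start
    while True:
        better = next((t for t in neighbors(s) if cost(t) < cost(s)), None)
        if better is None:
            return s          # s is a local minimum
        s = better


# Toy landscape: position i has cost landscape[i]; neighbors are i-1 and i+1.
landscape = [5, 3, 4, 2, 1, 6]

def neighbors(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(landscape)]

def cost(i):
    return landscape[i]

print(local_search(0, neighbors, cost))  # 1 (a local minimum with cost 3)
print(local_search(5, neighbors, cost))  # 4 (the global minimum with cost 1)
```

Starting from position 0 the search stops at cost 3 even though cost 1 exists elsewhere, which is why the result depends on the starting solution and on the move.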

Starting tree for MP: a random phylogeny (O(n) time) or Greedy-MP.
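The slides do not say how the random phylogeny is built. One way to get an unrooted binary tree in O(n) time is random stepwise addition: start with a three-leaf star and attach each remaining leaf to a randomly chosen edge. The sketch below assumes that construction; the function name and the internal node names (`i0`, `i1`, ...) are hypothetical, and leaf names must not collide with them.

```python
import random


def random_unrooted_tree(leaves, rng=random):
    """Build a random unrooted binary tree as a list of edges.

    Each new leaf subdivides a randomly chosen edge, which takes O(1)
    time per leaf and O(n) time overall.
    """
    if len(leaves) < 3:
        raise ValueError("need at least three leaves")
    edges = [(leaf, "i0") for leaf in leaves[:3]]   # three-leaf star
    next_id = 1
    for leaf in leaves[3:]:
        idx = rng.randrange(len(edges))
        u, v = edges[idx]
        w = f"i{next_id}"                           # new internal node
        next_id += 1
        edges[idx] = (u, w)                         # split edge (u, v) at w
        edges.append((w, v))
        edges.append((leaf, w))                     # hang the new leaf on w
    return edges


tree = random_unrooted_tree(["ACT", "ACA", "GTT", "GTA"], random.Random(0))
print(len(tree))  # 5 edges, i.e. 2n - 3 for n = 4
```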

Greedy-MP: takes O(n^2 k^2) time.

Local moves for MP: NNI (nearest neighbor interchange). For each internal edge we get two different topologies, so the neighborhood size is 2n-6.
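The count 2n-6 follows from the standard facts that an unrooted binary tree on n leaves has 2n-3 edges, n of which are attached to leaves:

$$
\underbrace{(2n-3)}_{\text{all edges}} - \underbrace{n}_{\text{leaf edges}} = n-3 \ \text{internal edges}, \qquad 2 \cdot (n-3) = 2n-6 \ \text{NNI neighbors}.
$$

For the four-sequence example, n = 4 gives 2 neighbors: each of the three trees is one NNI move away from the other two.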

Local moves for MP: SPR (subtree pruning and regrafting). The neighborhood size is quadratic in the number of taxa. Computing the minimum number of SPR moves between two rooted phylogenies is NP-hard.

Local moves for MP: TBR (tree bisection and reconnection). The neighborhood size is cubic in the number of taxa. Computing the minimum number of TBR moves between two unrooted phylogenies is NP-hard.

Local optima are a problem

Iterated local search: escape local optima by perturbation. [Figure, built up over three slides: starting from a local optimum, a perturbation produces a new tree, and local search is then run on the output of the perturbation.]
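The perturb-then-search cycle can be sketched as follows. This is a generic illustration, not the procedure of any specific MP program: `neighbors`, `cost`, and `perturb` are hypothetical callbacks, the rule of keeping a new optimum only when it is strictly better is one possible acceptance criterion, and the demonstration uses a toy cost landscape instead of trees.

```python
import random


def local_search(start, neighbors, cost):
    """Move to a strictly better neighbor until none exists."""
    s = start
    while True:
        better = next((t for t in neighbors(s) if cost(t) < cost(s)), None)
        if better is None:
            return s
        s = better


def iterated_local_search(start, neighbors, cost, perturb, rounds):
    """Repeat: perturb the current local optimum, run local search on the
    result, and keep the new optimum only if it has lower cost."""
    best = local_search(start, neighbors, cost)
    for _ in range(rounds):
        candidate = local_search(perturb(best), neighbors, cost)
        if cost(candidate) < cost(best):
            best = candidate
    return best


# Toy landscape: position i has cost landscape[i]; neighbors are i-1 and i+1.
landscape = [5, 3, 4, 2, 1, 6]
rng = random.Random(0)

result = iterated_local_search(
    start=0,
    neighbors=lambda i: [j for j in (i - 1, i + 1) if 0 <= j < len(landscape)],
    cost=lambda i: landscape[i],
    perturb=lambda i: rng.randrange(len(landscape)),  # random jump
    rounds=20,
)
print(result, landscape[result])
```

Plain local search from position 0 stops at cost 3. The perturbation gives the search a chance to reach the basin of the better optimum.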

ILS for MP: Ratchet (Nixon 1999), Iterative-DCM3 (Roshan et al. 2004), TNT (Goloboff et al. 1999).

Maximum Likelihood: find the tree that has the highest likelihood. Problems: What is the likelihood of a tree with branch lengths and internal nodes? What if no internal nodes are given? (Felsenstein's algorithm.) What if no branch lengths are given?
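When the internal nodes are not given, Felsenstein's algorithm sums over all possible internal states by dynamic programming. The sketch below is a minimal version under assumptions the slides do not state: the Jukes-Cantor substitution model, a uniform root distribution, branch lengths in expected substitutions per site, and a hypothetical tree encoding in which an internal node is a tuple of (child, branch length) pairs and a leaf is its name.

```python
import math

STATES = "ACGT"


def jc_prob(x, y, t):
    """Jukes-Cantor probability of state x becoming y along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if x == y else 0.25 - 0.25 * e


def partial(node, seqs, site):
    """Partial likelihoods: P(observed leaves below node | node has state x)."""
    if isinstance(node, str):                      # leaf: indicator vector
        return {x: 1.0 if seqs[node][site] == x else 0.0 for x in STATES}
    result = {x: 1.0 for x in STATES}
    for child, t in node:                          # multiply over children
        below = partial(child, seqs, site)
        for x in STATES:
            result[x] *= sum(jc_prob(x, y, t) * below[y] for y in STATES)
    return result


def likelihood(tree, seqs):
    """Likelihood of the alignment, treating sites as independent."""
    k = len(next(iter(seqs.values())))
    total = 1.0
    for site in range(k):
        root = partial(tree, seqs, site)
        total *= sum(0.25 * root[x] for x in STATES)   # uniform root prior
    return total


# Leaves are named by their own sequence; all branch lengths are made-up values.
seqs = {s: s for s in ["ACT", "ACA", "GTT", "GTA"]}
tree = (((("ACT", 0.1), ("ACA", 0.1)), 0.1),
        ((("GTT", 0.1), ("GTA", 0.1)), 0.1))
print(likelihood(tree, seqs))
```

In practice the per-site values are usually combined as a sum of logarithms to avoid numerical underflow on long alignments.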

Maximum Likelihood: NP-hard, like maximum parsimony (MP). Local search heuristics similar to those for MP are used.