Inferring Phylogeny using Permutation Patterns on Genomic Data. Md Enamul Karim (1), Laxmi Parida (2), Arun Lakhotia (1). (1) University of Louisiana at Lafayette; (2) IBM T. J. Watson Research Center.

Phylogeny: Reconstruction of the evolutionary relationships among a collection of organisms, usually in the form of a tree.

Phylogenetic data: Behavioral, morphological, metabolic, etc. Molecular data: sequence data, gene-order data, etc. The focus here is gene-order data.

Why gene-order data? Low error rate. Rearrangements are rare evolutionary events that are unlikely to cause "silent" changes, so they can help infer evolutionary history over millions of years.

Genome rearrangements: [Figure: examples of an inversion, a transposition, and an inverted transposition applied to a signed gene order.]
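The three rearrangement events named on this slide can be sketched on a signed gene order as follows. This is a minimal illustration, assuming a linear genome stored as a list of signed integers; the function names and the index choices in the usage lines are hypothetical and are not taken from the slides.

```python
from typing import List

Genome = List[int]  # signed gene order, e.g. [1, 2, -4, -3, 5]


def inversion(g: Genome, i: int, j: int) -> Genome:
    """Reverse the segment g[i:j] and flip the sign of every gene in it."""
    return g[:i] + [-x for x in reversed(g[i:j])] + g[j:]


def transposition(g: Genome, i: int, j: int, k: int) -> Genome:
    """Move the segment g[i:j] to just before original position k (requires k >= j)."""
    return g[:i] + g[j:k] + g[i:j] + g[k:]


def inverted_transposition(g: Genome, i: int, j: int, k: int) -> Genome:
    """Like transposition, but the moved segment is also reversed and sign-flipped."""
    moved = [-x for x in reversed(g[i:j])]
    return g[:i] + g[j:k] + moved + g[k:]


if __name__ == "__main__":
    genome = list(range(1, 11))  # identity order 1..10 (illustrative only)
    print(inversion(genome, 3, 8))
    print(transposition(genome, 3, 7, 9))
    print(inverted_transposition(genome, 3, 7, 9))
```

Each function returns a new list, so the original gene order is left unchanged and the events can be chained to simulate an evolutionary history.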

Breakpoint distance: The number of breakpoints is the number of adjacencies present in one genome but not in the other. For some datasets, a close-to-linear relationship between the number of breakpoints and the number of evolutionary events may exist. Can be used for building a phylogeny (Blanchette et al.).
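A breakpoint count along these lines can be sketched as below. The sketch assumes linear genomes over the same gene set, treats the adjacency (a, b) as identical to (-b, -a) because the same pair is read from the opposite strand, and does not add the framing genes that some formulations place at the two ends. The function names are hypothetical.

```python
from typing import List, Set, Tuple

Adjacency = Tuple[int, int]


def adjacencies(genome: List[int]) -> Set[Adjacency]:
    """Return the set of signed adjacencies of a linear gene order."""
    result = set()
    for a, b in zip(genome, genome[1:]):
        # (a, b) and (-b, -a) describe the same adjacency; keep one canonical form.
        result.add(min((a, b), (-b, -a)))
    return result


def breakpoint_distance(g1: List[int], g2: List[int]) -> int:
    """Count the adjacencies of g1 that do not occur in g2."""
    return len(adjacencies(g1) - adjacencies(g2))


if __name__ == "__main__":
    identity = list(range(1, 11))
    after_inversion = [1, 2, 3, -8, -7, -6, -5, -4, 9, 10]
    after_transposition = [1, 2, 3, 8, 9, 4, 5, 6, 7, 10]
    print(breakpoint_distance(identity, after_inversion))      # 2
    print(breakpoint_distance(identity, after_transposition))  # 3
```

In this toy case a single inversion breaks two adjacencies and a single transposition breaks three, so the same number of events can give different breakpoint counts.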

Limitations of breakpoint distance: The number of breakpoints created by a given number of inversions may vary. Also, transpositions generally create more breakpoints than inversions. Computing the breakpoint phylogeny is NP-hard.

MPBE (Maximum Parsimony on Binary Encoding): A heuristic for the breakpoint phylogeny (Cosner et al.). All ordered pairs of signed genes appearing consecutively are coded as binary features. Exponential time complexity, but much faster than BPAnalysis.
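The binary-encoding step can be illustrated roughly as follows: every adjacency seen in any genome becomes one column, and each genome gets a 1 where it has that adjacency. This is only a sketch of the encoding, not of MPBE itself, since the maximum parsimony search over the resulting matrix is omitted. Treating (a, b) and (-b, -a) as the same feature is an assumption made here, and the function names and toy genomes are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Adjacency = Tuple[int, int]


def canonical_adjacencies(genome: List[int]) -> Set[Adjacency]:
    """Signed adjacencies of a linear gene order, one canonical form per pair."""
    return {min((a, b), (-b, -a)) for a, b in zip(genome, genome[1:])}


def binary_encoding(genomes: Dict[str, List[int]]):
    """Encode each genome as a 0/1 vector over all adjacencies seen in any genome."""
    adj_sets = {name: canonical_adjacencies(g) for name, g in genomes.items()}
    features = sorted(set().union(*adj_sets.values()))
    matrix = {
        name: [1 if f in adj else 0 for f in features]
        for name, adj in adj_sets.items()
    }
    return features, matrix


if __name__ == "__main__":
    toy = {"A": [1, 2, 3], "B": [1, -2, 3]}  # toy gene orders, not real data
    features, matrix = binary_encoding(toy)
    for name, row in matrix.items():
        print(name, row)
```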

Limitations of MPBE: It may fail to find feasible solutions to the breakpoint phylogeny problem.

Observation: The closer the evolutionary histories of two genomes, the more permutations (at different granularities) they have in common. [Figure: example signed gene orders with shared permuted segments.]

Maximal pi-pattern (Eres et al.): Matches permutations at different granularities. Polynomial time complexity.

pi-pattern: A pattern that occurs, in any permutation of its characters, at least k times. Example: For S = acbcab and k = 2, all pi-patterns are: ac, bc, abc, abcc.
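The example can be checked with a brute-force enumeration: slide a window of every length over S, treat each window as a multiset of characters, and keep the multisets seen at least k times. This is an illustrative sketch only, not the polynomial-time algorithm of Eres et al. The function name and the `min_len` parameter are assumptions, and single-character patterns are excluded because the slide lists none.

```python
from collections import Counter
from typing import Set


def pi_patterns(s: str, k: int, min_len: int = 2) -> Set[str]:
    """Return all multisets (as sorted strings) that occur as a window at least k times."""
    found = set()
    for length in range(min_len, len(s) + 1):
        # Sorting a window's characters gives a canonical name for its multiset.
        counts = Counter(
            "".join(sorted(s[i:i + length]))
            for i in range(len(s) - length + 1)
        )
        found.update(p for p, c in counts.items() if c >= k)
    return found


if __name__ == "__main__":
    print(sorted(pi_patterns("acbcab", 2), key=lambda p: (len(p), p)))
    # ['ac', 'bc', 'abc', 'abcc']
```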

Cover: P1 covers P2 if every occurrence of P1 contains an occurrence of P2 and every occurrence of P2 lies within an occurrence of P1. Example: In S = acbcab, abc covers ac.

Maximal pi-pattern: A pi-pattern that is not covered by any other pi-pattern. Example: In S = acbcab, the pi-patterns are ac, bc, abc, abcc. The maximal pi-patterns are abc and abcc; abc is maximal because it is not covered by abcc.
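The cover relation and the maximality filter can be sketched by brute force, recording each occurrence as a half-open interval of positions in S. The code follows the two-way definition given on the slides (every P1 contains a P2, every P2 is within a P1). The function names are hypothetical, and this is not the algorithm of Eres et al.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Interval = Tuple[int, int]  # half-open [start, end) positions in S


def occurrences(s: str, min_len: int = 2) -> Dict[str, List[Interval]]:
    """Map each window multiset (as a sorted string) to the intervals where it occurs."""
    occ = defaultdict(list)
    for length in range(min_len, len(s) + 1):
        for i in range(len(s) - length + 1):
            occ["".join(sorted(s[i:i + length]))].append((i, i + length))
    return occ


def covers(occ_p1: List[Interval], occ_p2: List[Interval]) -> bool:
    """True if every P1 occurrence contains a P2 and every P2 lies within a P1."""
    def inside(inner: Interval, outer: Interval) -> bool:
        return outer[0] <= inner[0] and inner[1] <= outer[1]

    every_p1_has_p2 = all(any(inside(o2, o1) for o2 in occ_p2) for o1 in occ_p1)
    every_p2_in_p1 = all(any(inside(o2, o1) for o1 in occ_p1) for o2 in occ_p2)
    return every_p1_has_p2 and every_p2_in_p1


def maximal_pi_patterns(s: str, k: int, min_len: int = 2) -> Set[str]:
    """Keep the pi-patterns that no other pi-pattern covers."""
    occ = {p: o for p, o in occurrences(s, min_len).items() if len(o) >= k}
    return {
        p2 for p2 in occ
        if not any(p1 != p2 and covers(occ[p1], occ[p2]) for p1 in occ)
    }


if __name__ == "__main__":
    print(sorted(maximal_pi_patterns("acbcab", 2)))  # ['abc', 'abcc']
```

In this example abc is not covered by abcc because the occurrence "cab" at the end of S lies outside both occurrences of abcc.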

Results

Phylogeny for simulated evolution on synthetic data

12 genera of Campanulaceae and the outgroup tobacco

Tree1: MPBE tree

Tree2: Neighbor-joining tree (using a few different distances)

Tree3: Neighbor-joining tree using permutation patterns. 167 maximal pi-patterns (drawn from the full set of pi-patterns) are used as binary features. XOR is used as the distance measure. A distance/similarity matrix is created to find the neighbor-joining tree.
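The distance step can be sketched as follows, assuming each taxon is represented by a 0/1 vector saying which maximal pi-patterns it contains, and assuming "XOR distance" means the number of positions where two vectors differ. The taxa and vectors below are toy values, not the Campanulaceae data, and the neighbor-joining step that would consume the matrix is not shown.

```python
from typing import Dict, List, Tuple


def xor_distance(u: List[int], v: List[int]) -> int:
    """Number of binary features on which two taxa differ (bitwise XOR, summed)."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    return sum(a ^ b for a, b in zip(u, v))


def distance_matrix(features: Dict[str, List[int]]) -> Tuple[List[str], List[List[int]]]:
    """Pairwise XOR distances, with rows and columns in sorted taxon order."""
    names = sorted(features)
    matrix = [[xor_distance(features[a], features[b]) for b in names] for a in names]
    return names, matrix


if __name__ == "__main__":
    toy = {  # hypothetical presence/absence vectors over four patterns
        "X": [1, 0, 1, 1],
        "Y": [1, 1, 0, 1],
        "Z": [0, 0, 1, 0],
    }
    names, matrix = distance_matrix(toy)
    for name, row in zip(names, matrix):
        print(name, row)
```

The resulting symmetric matrix is the kind of input a neighbor-joining implementation expects.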

Tree3 vs Tree2

Conclusion: Permutation patterns may preserve more evolutionary information. Evolutionary events could be counted within permuted segments to develop a hybrid scheme. Current approaches remain unable to handle unequal gene content, a problem that maximal pi-patterns could address.