Binary Encoding and Gene Rearrangement Analysis Jijun Tang Tianjin University University of South Carolina (803) 777-8923.



Outline
Background
Maximum Likelihood Methods for Phylogenetic Reconstruction
Maximum Likelihood Methods for Ancestral Genome Inference
Conclusions

Phylogenetic Reconstruction

Data Type
Sequence data: DNA/RNA/protein sequences; strings over an alphabet of 4 or 20 characters.
Gene-order data.

Simple Rearrangements

Rearrangement Phylogeny

Median Problem
Goal: find M so that $D_{AM} + D_{BM} + D_{CM}$ is minimized.
NP-hard for most metric distances.
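The median objective can be made concrete with a small sketch. The slide does not name a distance, so this example assumes the breakpoint distance on unsigned, linear gene orders as a stand-in for D, and it searches all permutations by brute force. That is only feasible for a handful of genes, in line with the NP-hardness noted above. All function names are hypothetical.

```python
from itertools import permutations

def adjacencies(order):
    """Unordered neighbor pairs of a linear, unsigned gene order."""
    return {frozenset(pair) for pair in zip(order, order[1:])}

def breakpoint_distance(a, b):
    """Number of adjacencies of a that are absent from b (assumed distance D)."""
    return len(adjacencies(a) - adjacencies(b))

def median_score(m, genomes):
    """Sum of distances from candidate median m to every input genome."""
    return sum(breakpoint_distance(m, g) for g in genomes)

def brute_force_median(genomes):
    """Try every permutation of the gene set; only usable for very small n."""
    genes = sorted(genomes[0])
    return min(permutations(genes), key=lambda m: median_score(m, genomes))

A = (1, 2, 3, 4, 5)
B = (1, 2, 4, 3, 5)
C = (2, 1, 3, 4, 5)
M = brute_force_median([A, B, C])
print(M, median_score(M, [A, B, C]))
```

Practical median solvers replace the exhaustive search with branch-and-bound or heuristics; the score function is the only part that carries over.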

Binary Encoding

Biased Model
Model of evolution: duplications, insertions, and deletions of syntenic blocks.
Rearrangements: inversions, translocations, fusions, fissions.
Binary sequences: 1 (presence) vs. 0 (absence).
Adjacency: Pr(1 -> 0) vs. Pr(0 -> 1).
Gene content: Pr(1 -> 0) vs. Pr(0 -> 1).
Strong bias for adjacencies: Pr(1 -> 0) >> Pr(0 -> 1).
Losing an existing adjacency: Pr(1 -> 0) ~ 1/O(n).
Gaining a new adjacency: Pr(0 -> 1) ~ 1/O(n^2).
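One way to picture the bias is as a two-state continuous-time Markov chain with unequal gain and loss rates. The sketch below is illustrative only: it assumes n is the number of genes, takes the loss rate as exactly 1/n and the gain rate as exactly 1/n^2 (the slide gives only the orders of magnitude), and uses the standard closed form for a two-state chain. The function names are hypothetical.

```python
import math

def biased_rates(n):
    """Illustrative rates: gain (0 -> 1) ~ 1/n^2, loss (1 -> 0) ~ 1/n."""
    gain = 1.0 / (n * n)
    loss = 1.0 / n
    return gain, loss

def transition_matrix(gain, loss, t):
    """P[i][j] = Pr(state j after branch length t | state i at the start)."""
    total = gain + loss
    decay = math.exp(-total * t)
    p01 = gain / total * (1.0 - decay)  # absent adjacency is gained
    p10 = loss / total * (1.0 - decay)  # present adjacency is lost
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]

gain, loss = biased_rates(100)
for row in transition_matrix(gain, loss, 1.0):
    print(row)
```

Under these assumed rates, losing an adjacency is n times more likely than gaining one for any branch length.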

ML Phylogenetic Reconstruction

Simulated Results

Ancestral Inference
Step 1. Encode the gene orders as binary sequences.
Step 2. Set up the biased transition model.
Step 3. Reroot the tree at the target ancestor, and calculate the probability of each character state for every character at the root.
Step 4. Build the adjacency graph and use a greedy heuristic to assemble the adjacencies into a valid gene order for the target ancestor.
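Step 1 can be sketched as follows. The slides do not spell out the exact encoding, so this is an assumption: each gene is split into a tail and a head extremity, an adjacency is the unordered pair of neighboring extremities, and every adjacency seen in any genome becomes one binary character (1 for presence, 0 for absence). Genomes are taken to be signed, linear, and free of duplicated genes; the names are hypothetical.

```python
def extremities(gene):
    """Return (left end, right end) of a signed gene as (id, 't'/'h') tuples."""
    g = abs(gene)
    return ((g, "t"), (g, "h")) if gene > 0 else ((g, "h"), (g, "t"))

def adjacency_set(order):
    """Adjacencies of a signed linear gene order, as unordered extremity pairs."""
    ends = [extremities(g) for g in order]
    return {frozenset((left[1], right[0])) for left, right in zip(ends, ends[1:])}

def binary_encode(genomes):
    """Map each genome to a 0/1 row over all adjacencies seen in any genome."""
    sets = {name: adjacency_set(order) for name, order in genomes.items()}
    characters = sorted(set().union(*sets.values()), key=lambda a: sorted(a))
    matrix = {name: [1 if c in s else 0 for c in characters]
              for name, s in sets.items()}
    return characters, matrix

genomes = {"G1": [1, 2, 3], "G2": [1, -2, 3]}
characters, matrix = binary_encode(genomes)
for name, row in matrix.items():
    print(name, row)
```

Because adjacencies are unordered pairs of extremities, a genome and its full reversal (for example [1, 2, 3] and [-3, -2, -1]) get the same encoding.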

Step 3 – Root Tree
Probabilities are calculated recursively from the bottom up, so the target ancestor is placed at the root to prevent information loss.

Step 3 – Probabilities of Adjacencies
The likelihood of a tree given sequence data at the leaves can be computed (Felsenstein1981). Pick one tree, then pick one site.
[Figure: example tree with nodes labeled X, Y, Z, W]
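The likelihood computation for one tree and one site can be sketched with the pruning recursion for a single binary character. For brevity this example assumes that every branch shares one transition matrix P and that the tree is given as nested tuples with leaf names as strings; the tree shape, the numbers, and the function names are all illustrative.

```python
def conditional_likelihoods(node, leaf_states, P):
    """Return [L(0), L(1)]: probability of the leaf data below node, given its state."""
    if isinstance(node, str):  # leaf: the observed state has likelihood 1
        return [1.0 if leaf_states[node] == s else 0.0 for s in (0, 1)]
    result = [1.0, 1.0]
    for child in node:
        child_l = conditional_likelihoods(child, leaf_states, P)
        for s in (0, 1):
            # Sum over the child's possible states, weighted by the transition
            result[s] *= sum(P[s][c] * child_l[c] for c in (0, 1))
    return result

def site_likelihood(tree, leaf_states, P, root_prior):
    """Likelihood of one site: average the root's conditionals over the prior."""
    root_l = conditional_likelihoods(tree, leaf_states, P)
    return sum(root_prior[s] * root_l[s] for s in (0, 1))

tree = (("A", "B"), ("C", "D"))
P = [[0.99, 0.01], [0.10, 0.90]]  # biased: losing a 1 is more likely than gaining one
prior = [0.5, 0.5]
print(site_likelihood(tree, {"A": 1, "B": 1, "C": 0, "D": 1}, P, prior))
```

In a real analysis each branch has its own length and therefore its own matrix, and the per-site likelihoods are multiplied (or their logs summed) across all sites.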

Step 3 – Probabilities of Adjacencies
Posterior probabilities of the character states (0 and 1) can be calculated according to Yang (Yang1995), by summing over all ancestral states other than the root: 4 histories + 4 histories.
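The "4 histories + 4 histories" count can be reproduced by brute force on a four-leaf tree with a root and two other internal nodes: for each root state, the two remaining ancestors can take 2 x 2 = 4 state combinations. The sketch below assumes that tree shape, a single shared transition matrix P, and hypothetical names; it is an illustration of the summation, not the PMAG implementation.

```python
from itertools import product

def root_posterior(leaves, P, prior):
    """Posterior of the root state for the tree ((a, b), (c, d)), one binary site."""
    a, b, c, d = leaves
    joint = []
    for r in (0, 1):
        total = 0.0
        # 4 histories per root state: states (x, y) of the two other ancestors
        for x, y in product((0, 1), repeat=2):
            total += (P[r][x] * P[x][a] * P[x][b] *
                      P[r][y] * P[y][c] * P[y][d])
        joint.append(prior[r] * total)
    norm = sum(joint)
    return [j / norm for j in joint]

P = [[0.99, 0.01], [0.10, 0.90]]
print(root_posterior((1, 1, 1, 1), P, [0.5, 0.5]))
```

The enumeration grows exponentially with the number of ancestors, which is why the bottom-up recursion is used in practice; both give the same posterior at the root.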

Step 4 – Assemble Adjacencies
Independent adjacencies are assembled into valid gene order permutations by a greedy heuristic proposed by Jian Ma (Ma2007).
Sort the edges by weight.
Add the current heaviest edge to the path until a cycle is formed, then repeat the process until all vertices are traversed.
Remove the lightest edge in each cycle.
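A simplified variant of the greedy assembly can be sketched as follows. It is not the procedure of Ma2007: instead of forming cycles and then removing their lightest edge, it skips any edge that would close a cycle. Since edges are visited in descending weight, the skipped edge is the lightest one in that cycle. Adjacencies are assumed to be pairs of gene extremities (gene id, 'h' or 't') weighted by posterior probability; all names are hypothetical.

```python
def greedy_assemble(weighted_adjacencies, genes):
    """Pick adjacencies by descending weight, keeping every chromosome linear."""
    parent = {g: g for g in genes}  # union-find over genes, to detect cycles

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    used = set()   # extremities that already have a neighbor
    chosen = []
    ranked = sorted(weighted_adjacencies.items(), key=lambda kv: -kv[1])
    for (e1, e2), weight in ranked:
        if e1 in used or e2 in used:
            continue  # an extremity can take part in only one adjacency
        r1, r2 = find(e1[0]), find(e2[0])
        if r1 == r2:
            continue  # joining two ends of the same path would close a cycle
        parent[r1] = r2
        used.update((e1, e2))
        chosen.append((e1, e2))
    return chosen

weights = {
    ((1, "h"), (2, "t")): 0.9,
    ((2, "h"), (3, "t")): 0.8,
    ((1, "h"), (3, "t")): 0.6,  # conflicts with the first adjacency
    ((3, "h"), (1, "t")): 0.5,  # would turn the path 1-2-3 into a cycle
}
print(greedy_assemble(weights, genes=[1, 2, 3]))
```

The chosen adjacencies define one or more linear paths of genes; reading each path from a free end gives the signed gene order.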

Simulation Result
Both the transition model and the rerooting procedure are necessary.

Results-2
PMAG was compared with InferCarsPro (Ma2011) and GRAPPA_DCJ (Xu2008).

Tests on Large Scale Dataset
[Table: PMAG results by Genome #, Gene #, and Tree Diameter (1n, 2n, 3n, 4n)]

Conclusions
ML on binary encoding is more accurate and thousands of times faster than other methods.
Binary encoding reduces the complexity and allows us to use existing methods for sequence data.
The biased transition model and the rerooting procedure are very useful.
Future work: extend PMAG to handle a more general model of evolution, including gene indels and duplications. Missing adjacencies?

Thank You!