DCJUC: A Maximum Parsimony Simulator for Constructing Phylogenetic Tree of Genomes with Unequal Contents Zhaoming Yin Bader-Polo Joint Group Meeting, Nov.

Presentation transcript:

DCJUC: A Maximum Parsimony Simulator for Constructing Phylogenetic Tree of Genomes with Unequal Contents Zhaoming Yin Bader-Polo Joint Group Meeting, Nov 11, 2013

Contribution
Research Aspect
- A framework for finding the maximum parsimony tree when the input genomes have unequal contents.
- Proved that adequate subgraph theory applies to unequal-content data, which reduces the search space.
- Provides a benchmark for the HPC community.
Engineering Aspect
- Implements software with many state-of-the-art features, such as the supertree method, the GAS initialization method, and spectral partitioning.
- The software produces a tree with not only the topology but also the type and number of the different evolutionary events (visualization!).

Why Is the Phylogenetic Tree Problem Hard? For N genomes, there are (2N-5)!! possible unrooted binary tree topologies. For each topology, we need to compute at least one median, and the number of possible median orders is (g-2)!!, where g is the number of genes. Validating each candidate median is NP-hard if the gene content has duplications. So the complexity of computing the MP tree for genomes with unequal contents is: NP-hard over NP-hard over NP-hard!
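A minimal sketch of how fast the topology count grows, assuming the standard textbook count of (2N-5)!! unrooted binary trees on N labeled leaves. The function names are hypothetical and are not part of DCJUC.

```python
def double_factorial(n):
    """Return n * (n-2) * (n-4) * ... down to 1 or 2; 1 when n <= 1."""
    result = 1
    while n > 1:
        result *= n
        n -= 2
    return result


def num_unrooted_topologies(num_genomes):
    """Assumed formula: (2N-5)!! unrooted binary trees on N labeled leaves."""
    if num_genomes < 3:
        raise ValueError("need at least 3 genomes")
    return double_factorial(2 * num_genomes - 5)


if __name__ == "__main__":
    for n in (4, 8, 10, 20):
        print(n, num_unrooted_topologies(n))
```

Even before any median is computed, exhaustive enumeration of topologies is only practical for a handful of genomes.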

Phylogenetic Tree This picture presents the phylogeny of the “12 Drosophila.” From

Maximum Parsimony Concept Of all possible topologies, the most parsimonious tree is the one with the minimum total tree length.
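The selection rule can be sketched as follows. This is only an illustration: the candidate topologies, their edge names, and their edge lengths are made-up values, and the helper names are hypothetical.

```python
def total_tree_length(edge_lengths):
    """Sum the evolutionary distances assigned to every edge of one tree."""
    return sum(edge_lengths.values())


def most_parsimonious(candidates):
    """Return the name of the candidate tree with the smallest total length."""
    return min(candidates, key=lambda name: total_tree_length(candidates[name]))


# Hypothetical edge lengths for the three unrooted topologies on A, B, C, D.
candidates = {
    "AB|CD": {"A": 1, "B": 2, "C": 1, "D": 1, "internal": 2},
    "AC|BD": {"A": 2, "B": 2, "C": 2, "D": 1, "internal": 3},
    "AD|BC": {"A": 1, "B": 3, "C": 2, "D": 2, "internal": 1},
}

if __name__ == "__main__":
    print(most_parsimonious(candidates))
```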

Genome Rearrangement

Genome Rearrangement In the 1980s, Jeffrey Palmer studied the evolution of plant organelles by comparing the mitochondrial genomes of cabbage and turnip. The genes showed 99% similarity, yet these surprisingly similar gene sequences differed in gene order. This study helped pave the way to analyzing genome rearrangements in molecular evolution. Rearrangement types shown: inversion, transposition, inverted transposition.
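A minimal sketch of the three rearrangement types on a signed gene order, where the sign of each gene gives its orientation. The function names and the slice-based indexing convention are assumptions made for illustration.

```python
def inversion(genome, i, j):
    """Reverse genome[i:j] and flip the orientation of every gene in it."""
    return genome[:i] + [-g for g in reversed(genome[i:j])] + genome[j:]


def transposition(genome, i, j, k):
    """Move the segment genome[i:j] so it follows genome[j:k] (i < j < k)."""
    return genome[:i] + genome[j:k] + genome[i:j] + genome[k:]


def inverted_transposition(genome, i, j, k):
    """Like transposition, but the moved segment is also inverted."""
    moved = [-g for g in reversed(genome[i:j])]
    return genome[:i] + genome[j:k] + moved + genome[k:]


if __name__ == "__main__":
    print(inversion([1, 2, 3], 1, 3))
    print(transposition([1, 2, 3, 4, 5], 1, 3, 5))
    print(inverted_transposition([1, 2, 3, 4, 5], 1, 3, 5))
```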

Genome Median Computation

Genomes: ,2,3 | 1,-3,-2 | -2,-1,3
Candidate median 1,2,3 = 2 moves
Candidate median 2,-1,3 = 5 moves
…
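A minimal sketch of scoring a candidate median as the sum of its distances to three genomes. It assumes linear genomes and inversion-only distance found by brute-force breadth-first search, which is only feasible for a few genes; it is not the distance model used by DCJUC. The three input genomes below are an assumed example, and all names are hypothetical.

```python
from collections import deque


def inversions_of(genome):
    """Yield every genome reachable from `genome` by a single inversion."""
    n = len(genome)
    for i in range(n):
        for j in range(i + 1, n + 1):
            flipped = tuple(-g for g in reversed(genome[i:j]))
            yield genome[:i] + flipped + genome[j:]


def inversion_distance(source, target):
    """Minimum number of inversions from source to target (BFS)."""
    source, target = tuple(source), tuple(target)
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        genome, dist = queue.popleft()
        if genome == target:
            return dist
        for nxt in inversions_of(genome):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError("genomes must contain the same genes")


def median_score(candidate, genomes):
    """Total number of moves from the candidate median to all genomes."""
    return sum(inversion_distance(candidate, g) for g in genomes)


# Assumed example inputs (not taken from the slide).
genomes = [(1, 2, 3), (1, -3, -2), (-2, -1, 3)]

if __name__ == "__main__":
    print(median_score((1, 2, 3), genomes))
```

The median problem then asks for the candidate with the smallest such score.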

Step 1: Spectral Partition

Step 2: Compute MP Tree for Each Sub-Disk

Step 2-1: How to Compute Median (BNB)

Step 2-2: How to Compute Median (LK)

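Step 2-2: How to Evaluate Median
Median score: Dis(m,1) + Dis(m,2) + Dis(m,3), where m is the candidate median (med) and 1, 2, 3 are its three neighboring genomes.
Gene orders shown: 1, 2, 3, 3, 4, 6, 5; 1, 2, 3, 4, 3, 6, 5; 1, 2, 3, 4, 6, 3, 5; 1, 2, 5, 4, 6, 3, 3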

Step 2-2: How to Evaluate Median
1, 2, 3, 3, 4, 6, 5
1, 2, 3, 4, 3, 5
Find a mapping first (NP hard), dis = 1
1, 2, 3, 3, 4, 6, 5
-2, -1, 3, 3, 4, 5
Complete the loss (polynomial), dis = 2
1, 2, 3, 4, 6, 5
-2, -1, 3, 4, 6, 5
Compute DCJ (polynomial), dis = 3
1, 2, 3, 4, 6, 5
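A minimal sketch of the final, polynomial step only: the DCJ distance between two genomes that already have identical gene content, that is, after the mapping and loss-completion steps. It assumes each genome is a single circular chromosome and uses the standard formula for that case, distance = n - c, where n is the number of genes and c is the number of cycles in the adjacency graph. The function names are hypothetical.

```python
def adjacency_map(genome):
    """Map each gene extremity to the extremity it touches (circular genome)."""
    def left(g):   # extremity met first when reading the gene
        return (abs(g), "t" if g > 0 else "h")

    def right(g):  # extremity met last when reading the gene
        return (abs(g), "h" if g > 0 else "t")

    partner = {}
    n = len(genome)
    for i in range(n):
        a, b = right(genome[i]), left(genome[(i + 1) % n])
        partner[a] = b
        partner[b] = a
    return partner


def dcj_distance(genome_a, genome_b):
    """DCJ distance n - c for two circular genomes on the same gene set."""
    if sorted(map(abs, genome_a)) != sorted(map(abs, genome_b)):
        raise ValueError("genomes must have identical gene content")
    pa, pb = adjacency_map(genome_a), adjacency_map(genome_b)
    unvisited = set(pa)
    cycles = 0
    while unvisited:
        start = unvisited.pop()
        cycles += 1
        node = start
        while True:
            node = pa[node]          # follow an adjacency of genome A
            unvisited.discard(node)
            node = pb[node]          # then an adjacency of genome B
            if node == start:
                break
            unvisited.discard(node)
    return len(genome_a) - cycles


if __name__ == "__main__":
    print(dcj_distance([1, 2, 3, 4, 6, 5], [-2, -1, 3, 4, 6, 5]))
```

Under these assumptions a single inversion costs exactly one DCJ operation, which is why the two gene orders above are at distance 1.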

Step 3: Merge Disks
- Decompose the genomes into disks.
- Construct a tree for each disk.
- Merge the trees using a specific consensus method (strict, majority, etc.).
- Disambiguation.
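A minimal sketch of the strict and majority consensus rules named above. It assumes every input tree covers the same set of taxa and is represented as a set of splits (bipartitions); merging disks that cover different subsets of genomes needs a supertree method, which is not shown. All names and the example trees are hypothetical.

```python
from collections import Counter


def canonical(side, taxa):
    """Represent a split by the side that does not contain the first taxon."""
    side = frozenset(side)
    return side if min(taxa) not in side else frozenset(taxa) - side


def strict_consensus(trees):
    """Keep only the splits that appear in every tree."""
    return set.intersection(*(set(tree) for tree in trees))


def majority_consensus(trees):
    """Keep the splits that appear in more than half of the trees."""
    counts = Counter(split for tree in trees for split in set(tree))
    return {split for split, c in counts.items() if c > len(trees) / 2}


taxa = "abcde"
trees = [
    {canonical("ab", taxa), canonical("de", taxa)},
    {canonical("ab", taxa), canonical("cd", taxa)},
    {canonical("ab", taxa), canonical("de", taxa)},
]

if __name__ == "__main__":
    print(strict_consensus(trees))
    print(majority_consensus(trees))
```

The strict rule keeps fewer splits and so can leave the merged tree less resolved, which is one reason a later disambiguation step is needed.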

Step 4: Initialization
- Init by insertion, which is local.
- Init by prospection, which is global.

Step 5: Iterative Refinement

Review
Step 1: Spectral partition.
Step 2: Subtree construction.
Step 3: Supertree merge.
Step 4: Initialization of the complete tree using the General Adequate Subgraph (GAS) method.
Step 5: Iterative refinement until the complete tree converges.

Result: Simulated Data
seed, #theta + #gamma + #phi operations
We grow our own tree, so we know the total number of evolutionary events in the model tree.

Result: Accuracy
% of duplication: 0.1. % of loss: 0.1. Theta is the % of inversion.
There are 8 species, so the tree has 2*8-3 = 13 edges. The average accuracy is ~90%.
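The edge count can be checked with a short sketch. The accuracy function below is an assumption: it treats accuracy as the fraction of model-tree edges that also appear in the inferred tree, which may differ from the measure actually used for these results. All names are hypothetical.

```python
def num_edges_unrooted_binary(num_species):
    """An unrooted binary tree on n leaves has 2n - 3 edges."""
    return 2 * num_species - 3


def edge_accuracy(model_edges, inferred_edges):
    """Assumed metric: share of model-tree edges recovered by the inference."""
    model = set(model_edges)
    return len(model & set(inferred_edges)) / len(model)


if __name__ == "__main__":
    print(num_edges_unrooted_binary(8))
    # Hypothetical edge labels: 12 of the 13 model edges are recovered.
    model = set(range(13))
    inferred = set(range(12)) | {"wrong"}
    print(round(edge_accuracy(model, inferred), 3))
```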

Result – Real Data SCRaMbLE Matrix We can represent a SCRaMbLEd strain by its vector. The sign gives the orientation. The color encodes the position in the synthetic chromosome.

Result – Real Data #inversion:#insertion/deletion:#duplication

Parallel Method [Bader 05] Parallel search Load Balancing

Experimental Results (Parallel)

Why Many-core BnB? There are already many distributed-memory MIP BnB frameworks (PICO, PEBBL, ALPS, COIN-OR). Load balance in distributed BnB relies heavily on ramp-up, and run-time load balancing is not efficient. But today's petaflops machines are mostly hybrid systems (distributed plus many-core or accelerators).
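For readers unfamiliar with branch and bound, here is a minimal sequential sketch on the 0/1 knapsack problem, the benchmark named in the next slide. It is not the many-core implementation and says nothing about its load balancing. It assumes positive weights and uses the fractional-knapsack relaxation as the upper bound. The function name is hypothetical.

```python
def knapsack_bnb(values, weights, capacity):
    """Best total value of a 0/1 knapsack, found by depth-first BnB."""
    # Sort by value density so the greedy fractional bound is valid.
    items = sorted(zip(values, weights),
                   key=lambda vw: vw[0] / vw[1], reverse=True)

    def bound(index, value, room):
        """Upper bound: fill greedily, allowing one fractional item."""
        for v, w in items[index:]:
            if w <= room:
                value += v
                room -= w
            else:
                return value + v * room / w
        return value

    best = 0
    stack = [(0, 0, capacity)]  # (next item index, value so far, room left)
    while stack:
        index, value, room = stack.pop()
        best = max(best, value)
        # Prune finished nodes and nodes that cannot beat the incumbent.
        if index == len(items) or bound(index, value, room) <= best:
            continue
        v, w = items[index]
        stack.append((index + 1, value, room))               # skip the item
        if w <= room:
            stack.append((index + 1, value + v, room - w))   # take the item
    return best


if __name__ == "__main__":
    print(knapsack_bnb([60, 100, 120], [10, 20, 30], 50))
```

Each stack entry is an independent subproblem, which is what makes the search tree a natural unit of work to distribute across cores.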

Experimental Results (Intel Phi knapsack)