Tao Jiang Department of Computer Science


A Combinatorial Approach to Genome-Wide Ortholog Assignment: Beyond Sequence Similarity Search
Tao Jiang, Department of Computer Science, University of California, Riverside
Joint work with X. Chen, Z. Fu, J. Zheng, V. Vacic, P. Nan, Y. Zhong, and S. Lonardi

Outline
An introduction to orthology
Existing ortholog assignment methods
Ortholog assignment via genome rearrangement
An introduction to genome rearrangement
Computing signed reversal distance with duplicates
Minimum Common Substring Partition
Maximum Cycle Decomposition
Experimental results
Summary and future directions
9/17/2018


Orthology
[Figure: a gene family tree with duplication and speciation events, illustrating homologs, paralogs, and orthologs among mouse, chicken, and frog genes, with two genes labeled a and b (from http://www.ncbi.nlm.nih.gov/Education/BLASTinfo/Orthology.html)]

Orthology – the more complicated picture
[Figure: a gene tree with speciation events 1 and 2 and gene duplications 1 and 2, producing genes A1, B, C, B1, C1, B2, C2, and C3 in genomes G1, G2, and G3]
A true exemplar is the direct descendant of the ancestral gene of a given set of inparalogs.
A main ortholog pair consists of the two true exemplar genes of two co-orthologous gene sets.
Outparalogs evolved via a duplication prior to a given speciation event.
Inparalogs evolved via a duplication after a given speciation event.
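To make these definitions concrete, here is a minimal Python sketch, assuming a rooted gene tree whose internal nodes are labeled as speciation or duplication events with known, hypothetical ages. Two genes are orthologs when their lowest common ancestor is a speciation; otherwise they are paralogs, and the age of that duplication relative to a chosen speciation event separates inparalogs from outparalogs. The tree, the `Node` class, and the age values are illustrative assumptions, not part of the method presented in the talk.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity-based equality/hashing so nodes can be stored in sets
class Node:
    name: str
    event: Optional[str] = None  # "speciation" or "duplication" for internal nodes
    age: float = 0.0             # hypothetical time before present (larger = older)
    children: List["Node"] = field(default_factory=list, repr=False)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, *kids: "Node") -> "Node":
        for kid in kids:
            kid.parent = self
            self.children.append(kid)
        return self


def lineage(node: Optional[Node]) -> List[Node]:
    """The node followed by all of its ancestors up to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def lca(a: Node, b: Node) -> Node:
    ancestors_of_a = set(lineage(a))
    for node in lineage(b):
        if node in ancestors_of_a:
            return node
    raise ValueError("genes are not in the same tree")


def classify(a: Node, b: Node, speciation_age: float) -> str:
    """Relationship of two genes relative to a speciation event at speciation_age."""
    anc = lca(a, b)
    if anc.event == "speciation":
        return "ortholog"
    # Duplication: inparalogs if it happened after the speciation, else outparalogs.
    return "inparalog" if anc.age < speciation_age else "outparalog"


# Hypothetical tree: an old duplication (age 3), then speciation (age 2) into
# species X and Y, then a recent duplication (age 1) in species Y.
ya1, ya2, xa, xb, yb = (Node(n) for n in ["ya1", "ya2", "xa", "xb", "yb"])
recent_dup = Node("D1", "duplication", 1.0).add(ya1, ya2)
spec_a = Node("S_A", "speciation", 2.0).add(xa, recent_dup)
spec_b = Node("S_B", "speciation", 2.0).add(xb, yb)
root = Node("D0", "duplication", 3.0).add(spec_a, spec_b)

print(classify(xa, ya1, 2.0))   # ortholog
print(classify(ya1, ya2, 2.0))  # inparalog
print(classify(xa, yb, 2.0))    # outparalog
```

In practice the event labels and ages are exactly what is unknown; the sketch only shows how the definitions apply once a reconciled gene tree is available.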

Significance
Orthologous genes in different species are evolutionary and functional counterparts.
Many methods use orthologs in a critical way: function inference, protein structure prediction, motif finding, phylogenetic analysis, pathway reconstruction, and more.
Identification of orthologs, especially exemplar genes, is a fundamental and challenging problem.


Existing methods
Methods based on sequence similarity: BBH (bidirectional best hits), Inparanoid/Multiparanoid, PhiGs, COG/KOG, OrthoMCL, MGD, TOGA/EGO, KEGG, HomoloGene
Methods based on phylogenetic trees: reconciled trees, Orthostrapper, OrthologID, RAP, RIO, PhyOP, TreeFam
Methods that take gene locations into account: shared genomic synteny

Observations
Sequence similarity-based methods assume that all genes in a homologous family evolve at the same rate, so that divergence time can be estimated by comparing gene sequences.
Tree-based methods critically rely on the correctness of the reconstructed gene and species trees.
Gene location-based methods do not consider global genome rearrangements.


Molecular evolution
Local mutations: base substitution, base insertion, base deletion
Global rearrangements and duplications: inversion/reversal, translocation, transposition, fusion/fission, duplication/loss
A complete ortholog assignment system should make use of information from both levels of molecular evolution.

Genome rearrangement operations
Reversal (inversion): 1 2 3 4 5 6 7 8 9 → 1 2 3 -6 -5 -4 7 8 9
Translocation: 1 2 3 4 5 6 and 7 8 9 10 11 12 13 → 1 2 3 11 12 13 and 7 8 9 10 4 5 6
Fusion and fission: fusion joins two chromosomes into one (1 2 3 4 5 6 7 8); fission splits one chromosome into two.
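The operations above can be sketched on signed gene lists in Python. This is a minimal illustration, assuming each chromosome is a list of signed integers; the function names are hypothetical, the translocation shown is only the variant that exchanges chromosome tails (as in the example), and the split point used for fusion/fission is an arbitrary choice.

```python
from typing import List, Tuple

Chromosome = List[int]  # signed gene labels; a negative sign marks reversed orientation


def reversal(chrom: Chromosome, i: int, j: int) -> Chromosome:
    """Reverse the segment chrom[i..j] (inclusive) and flip the sign of each gene in it."""
    return chrom[:i] + [-g for g in reversed(chrom[i:j + 1])] + chrom[j + 1:]


def translocation(a: Chromosome, b: Chromosome, i: int, j: int) -> Tuple[Chromosome, Chromosome]:
    """Exchange the tail of a (from index i) with the tail of b (from index j)."""
    return a[:i] + b[j:], b[:j] + a[i:]


def fusion(a: Chromosome, b: Chromosome) -> Chromosome:
    """Join two chromosomes end to end."""
    return a + b


def fission(chrom: Chromosome, k: int) -> Tuple[Chromosome, Chromosome]:
    """Split a chromosome before index k."""
    return chrom[:k], chrom[k:]


print(reversal(list(range(1, 10)), 3, 5))
# [1, 2, 3, -6, -5, -4, 7, 8, 9]
print(translocation([1, 2, 3, 4, 5, 6], [7, 8, 9, 10, 11, 12, 13], 3, 4))
# ([1, 2, 3, 11, 12, 13], [7, 8, 9, 10, 4, 5, 6])
```

Applying the same reversal twice restores the original chromosome, which is why reversal distance is symmetric.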

Example
[Figure: an evolutionary scenario in which the ancestral genome a1 b c a2 d e f g undergoes speciation, followed by a reversal, gene duplications (creating copies a3 and a4), and a fission, yielding the two present-day genomes]
Given the evolutionary scenario, main ortholog pairs and inparalogs can be identified in a straightforward way.

The parsimony approach
Identify homologs using a sequence similarity search (e.g., BLASTp).
Reconstruct the evolutionary scenario on the basis of the parsimony principle: postulate the minimum possible number of rearrangement and duplication events in the evolution of two closely related genomes since their split, and assign orthologs accordingly.
The ortholog assignment problem can thus be formulated as finding a most parsimonious transformation of one genome into the other, without explicitly inferring their ancestral genome.

RD (rearrangement/duplication) distance
The RD distance between two genomes G and H is $d_{RD}(G,H) = d_R(G,H) + d_D(G,H)$, where $d_R(G,H)$ denotes the number of rearrangement events and $d_D(G,H)$ denotes the number of gene duplications in a most parsimonious transformation of one genome into the other.

The key algorithmic problem – SRDD
Assumptions: two related (unichromosomal) genomes; no inparalogs, i.e., no post-speciation duplications; no gene losses, and thus equal gene content; only reversals have occurred.
Signed Reversal Distance with Duplicates (SRDD): how can we find a shortest sequence of reversals transforming one genome into the other when duplicated genes are present?
The problem is almost untouched in the literature and generalizes the problem of sorting by reversals.
A high-throughput system for assigning orthologs on a genome scale.

When there are no (post-speciation) duplications, the most parsimonious rearrangement scenario may suggest the true orthology.


Sorting by reversals
Sort a permutation into the identity by reversals; distinct genes only; signed and unsigned versions.
A permutation: 3 4 1 2
Sorting the signed permutation: 3 4 1 2 → -2 -1 -4 -3 → 1 2 -4 -3 → 1 2 3 4
Sorting the unsigned permutation: 3 4 1 2 → 1 4 3 2 → 1 2 3 4
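For very small permutations, the reversal distance can be checked exhaustively. The sketch below (hypothetical function names; exponential in n, so only usable for a handful of genes) runs a breadth-first search over all permutations reachable by reversals, with and without sign flips.

```python
from collections import deque
from typing import Iterator, Sequence, Tuple

Perm = Tuple[int, ...]


def reversals_of(p: Perm, signed: bool) -> Iterator[Perm]:
    """All permutations reachable from p by a single reversal."""
    for i in range(len(p)):
        for j in range(i, len(p)):
            segment = p[i:j + 1][::-1]
            if signed:
                segment = tuple(-x for x in segment)  # a signed reversal also flips orientation
            yield p[:i] + segment + p[j + 1:]


def reversal_distance(perm: Sequence[int], signed: bool = True) -> int:
    """Exact reversal distance to the identity by breadth-first search (tiny n only)."""
    start = tuple(perm)
    target = tuple(range(1, len(start) + 1))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        p = queue.popleft()
        if p == target:
            return dist[p]
        for q in reversals_of(p, signed):
            if q not in dist:
                dist[q] = dist[p] + 1
                queue.append(q)
    raise ValueError("identity not reachable; unsigned input must be a permutation of 1..n")
```

`reversal_distance([3, 4, 1, 2])` returns 3 and `reversal_distance([3, 4, 1, 2], signed=False)` returns 2, matching the three signed and two unsigned reversals in the example above.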

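Sorting signed permutations
Hannenhalli-Pevzner (HP) theory: polynomial-time solvable.
Tools: the breakpoint graph; breakpoints, cycles, hurdles, and fortresses.
HP formula: $d(\pi) = b(\pi) - c(\pi) + h(\pi) + f(\pi)$, where $b(\pi)$ is the number of breakpoints, $c(\pi)$ the number of non-trivial cycles in the breakpoint graph, $h(\pi)$ the number of hurdles, and $f(\pi) = 1$ if $\pi$ is a fortress and 0 otherwise.
Example: for the permutation 3 4 1 2, the breakpoint graph is built on 0 5 6 7 8 1 2 3 4 9, and d = 3 - 1 + 1 + 0 = 3.
(Hannenhalli and Pevzner, STOC, 178-187, 1995)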
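The first two terms of the HP formula can be computed directly from the breakpoint graph. The sketch below is an illustration, not the HP algorithm itself: hurdles and fortresses are not detected. It builds the unsigned image (such as 0 5 6 7 8 1 2 3 4 9) of a signed permutation, counts breakpoints, walks the alternating cycles of black and gray edges, and returns b, the number of non-trivial cycles c, and the lower bound b - c. The function name is hypothetical.

```python
from typing import Sequence, Tuple


def breakpoint_graph_stats(perm: Sequence[int]) -> Tuple[int, int, int]:
    """Return (breakpoints b, non-trivial cycles c, lower bound b - c) for a signed permutation."""
    n = len(perm)
    # Unsigned image: +x -> 2x-1, 2x and -x -> 2x, 2x-1, framed by 0 and 2n+1.
    seq = [0]
    for x in perm:
        seq += [2 * x - 1, 2 * x] if x > 0 else [-2 * x, -2 * x - 1]
    seq.append(2 * n + 1)

    black = {}  # black edges join neighbouring genes: seq[2k] -- seq[2k+1]
    breakpoints = 0
    for k in range(n + 1):
        a, b = seq[2 * k], seq[2 * k + 1]
        black[a], black[b] = b, a
        if not (min(a, b) % 2 == 0 and abs(a - b) == 1):
            breakpoints += 1  # not an adjacency {2m, 2m+1}

    # Gray edges join 2m and 2m+1, i.e. v and v ^ 1; walk the alternating cycles.
    cycles, seen = 0, set()
    for start in range(2 * n + 2):
        if start in seen:
            continue
        cycles += 1
        v = start
        while True:
            seen.add(v)
            w = black[v]
            seen.add(w)
            v = w ^ 1
            if v == start:
                break
    adjacencies = (n + 1) - breakpoints  # each adjacency is a trivial cycle
    nontrivial = cycles - adjacencies
    return breakpoints, nontrivial, breakpoints - nontrivial
```

`breakpoint_graph_stats([3, 4, 1, 2])` returns `(3, 1, 2)`; the single hurdle in this example adds 1, giving the distance 3 from the slide.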

Sorting unsigned permutations
NP-hard (Caprara, 1997).
Breakpoint graph; maximum alternating cycle decomposition (NP-hard).
1.375-approximation (Berman et al., 2002).
Example: for the permutation 4 2 1 3, the breakpoint graph and an alternating cycle decomposition are built on 0 4 2 1 3 5.
(Caprara, RECOMB, 75-83, 1997)
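For unsigned permutations, a simpler bound follows from counting breakpoints: a reversal alters only the two adjacencies at its ends, so it removes at most two breakpoints. The sketch below (hypothetical function names) computes this classical lower bound; it is not the 1.375-approximation cited above.

```python
from math import ceil
from typing import Sequence


def unsigned_breakpoints(perm: Sequence[int]) -> int:
    """Breakpoints of an unsigned permutation of 1..n, framed by 0 and n + 1."""
    ext = [0, *perm, len(perm) + 1]
    # Neighbours that differ by exactly 1 form an adjacency; every other pair is a breakpoint.
    return sum(1 for x, y in zip(ext, ext[1:]) if abs(x - y) != 1)


def unsigned_lower_bound(perm: Sequence[int]) -> int:
    # Each reversal removes at most two breakpoints.
    return ceil(unsigned_breakpoints(perm) / 2)
```

For 4 2 1 3 (framed as 0 4 2 1 3 5) it counts 4 breakpoints and gives a bound of 2; for 3 4 1 2 it counts 3 breakpoints and gives a bound of 2, which the two-reversal unsigned sorting shown earlier attains.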

A brief history of sorting by reversals (signed and unsigned permutations)
Kececioglu and Sankoff (1995): 2-approximation (unsigned)
Hannenhalli and Pevzner (1995): polynomial (signed); polynomial for special cases (unsigned)
Bafna and Pevzner (1996): 1.5-approximation (signed); 1.75-approximation (unsigned)
Caprara (1997): NP-hard (unsigned)
Christie (1998): 1.5-approximation (unsigned)
Bader et al. (2001): linear time, distance only (signed)
Berman et al. (2002): 1.375-approximation (unsigned)
The work has also been extended to genomes with multiple chromosomes (Hannenhalli and Pevzner, 1995; Tesler, 2002; Ozery-Flato and Shamir, 2003).


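SRDD – The exhaustive method
Given genomes G and H, let $\mathcal{M}$ be the set of all possible ortholog assignments and, for $M \in \mathcal{M}$, let $G_M$ and $H_M$ be the genomes after orthologs have been assigned according to M (so that they contain no duplicates). Then
$d(G,H) = \min_{M \in \mathcal{M}} d(G_M, H_M)$.
Assume one family with ten duplicated genes in each genome: there are already 10! = 3,628,800 possible assignments to try.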
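The exhaustive method can be written out for toy inputs. In the sketch below, genes are hypothetical (family, orientation) pairs; each assignment matches the copies of every family one-to-one, the genomes are relabeled so that G becomes the identity, and the ordinary signed reversal distance is computed by breadth-first search. The function names are illustrative, and the doubly exponential cost (over assignments and over permutations) is exactly why this approach does not scale.

```python
from collections import Counter, deque
from itertools import permutations, product
from math import factorial, prod
from typing import List, Tuple

Gene = Tuple[str, int]  # (gene family, orientation +1 or -1); hypothetical encoding


def signed_reversal_distance(perm: Tuple[int, ...]) -> int:
    """Exact distance from a signed permutation of 1..n to the identity (tiny n only)."""
    target = tuple(range(1, len(perm) + 1))
    dist = {perm: 0}
    queue = deque([perm])
    while queue:
        p = queue.popleft()
        if p == target:
            return dist[p]
        for i in range(len(p)):
            for j in range(i, len(p)):
                q = p[:i] + tuple(-x for x in reversed(p[i:j + 1])) + p[j + 1:]
                if q not in dist:
                    dist[q] = dist[p] + 1
                    queue.append(q)
    raise ValueError("unreachable")


def count_assignments(genome: List[Gene]) -> int:
    """Number of one-to-one ortholog assignments: product of (family size)! over families."""
    return prod(factorial(c) for c in Counter(f for f, _ in genome).values())


def srdd_exhaustive(G: List[Gene], H: List[Gene]) -> int:
    fams = Counter(f for f, _ in G)
    if fams != Counter(f for f, _ in H):
        raise ValueError("genomes must have equal gene content")
    families = sorted(fams)
    occ_G = {f: [i for i, (g, _) in enumerate(G) if g == f] for f in families}
    occ_H = {f: [i for i, (g, _) in enumerate(H) if g == f] for f in families}
    best = None
    # Each choice matches, per family, the H copies (in order) to a permutation of the G copies.
    for choice in product(*(permutations(occ_G[f]) for f in families)):
        pi = [0] * len(H)
        for f, g_positions in zip(families, choice):
            for h_pos, g_pos in zip(occ_H[f], g_positions):
                # Relabel so that G reads +1 +2 ... +n; H then becomes a signed permutation.
                pi[h_pos] = H[h_pos][1] * G[g_pos][1] * (g_pos + 1)
        d = signed_reversal_distance(tuple(pi))
        best = d if best is None else min(best, d)
    return best
```

For G = a b a and H = -a -b -a (G reversed as a whole), `srdd_exhaustive` returns 1, and `count_assignments` on a single family of ten copies returns 3628800.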

SRDD – Hardness
SRDD is NP-hard, even when the maximum size of a gene family is limited to two.
Proof idea: a reduction from sorting an unsigned permutation by reversals, in which each element x is replaced by the pair +x -x:
unsigned permutation 3 4 1 2 → signed sequence with duplicates +3 -3 +4 -4 +1 -1 +2 -2
identity 1 2 3 4 → +1 -1 +2 -2 +3 -3 +4 -4
[Figure: Case 1 and Case 2 for the pair +3 -3, each labeled "no breakpoint"]
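The construction in the reduction is mechanical. This short sketch (hypothetical function name) only builds the SRDD instance from an unsigned permutation; it does not demonstrate the correctness argument, which the slide illustrates with Cases 1 and 2.

```python
from collections import Counter
from typing import List


def to_srdd_instance(perm: List[int]) -> List[int]:
    """Replace each element x of an unsigned permutation by the pair +x -x."""
    out: List[int] = []
    for x in perm:
        out += [x, -x]
    return out


source = to_srdd_instance([3, 4, 1, 2])
target = to_srdd_instance([1, 2, 3, 4])
# Every gene family (absolute value) occurs exactly twice, so the family size is at most two.
assert max(Counter(abs(g) for g in source).values()) == 2
```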

SRDD – A lower bound
Partial graph: for each pair of labels, count the number of edges linking two nodes with those labels; these counts give the number of breakpoints between the two genomes.
Let G and H be a pair of related genomes. Their reversal distance is lower bounded in terms of this number of breakpoints.
[Figure: partial graphs for the gene sequences +3 -1 -2 +1 +4 and +3 +1 +2 +1 +4, with each gene split into a head (h) and a tail (t) node]
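A simple breakpoint bound in the same spirit as the partial graph can be sketched as follows. Assuming genomes are lists of signed integers with duplicates, where 0 and a label larger than any gene serve as fixed end caps, the code counts adjacencies of G that have no counterpart in H, treating (x, y) and (-y, -x) as the same adjacency. Because a reversal changes only the two adjacencies at its ends, the distance is at least half this count. This is an illustrative bound with hypothetical function names, not necessarily the one used in the original work.

```python
from collections import Counter
from math import ceil
from typing import List


def adjacencies(genome: List[int], end_cap: int) -> Counter:
    """Multiset of adjacencies of a capped signed gene sequence.

    (x, y) and (-y, -x) describe the same adjacency read in opposite
    directions, so each adjacency is stored in a canonical form.
    """
    seq = [0, *genome, end_cap]
    return Counter(min((x, y), (-y, -x)) for x, y in zip(seq, seq[1:]))


def breakpoints_with_duplicates(G: List[int], H: List[int]) -> int:
    cap = max(abs(g) for g in G + H) + 1  # caps 0 and cap are never moved by reversals
    return sum((adjacencies(G, cap) - adjacencies(H, cap)).values())


def reversal_lower_bound(G: List[int], H: List[int]) -> int:
    # A reversal changes only the two adjacencies at its ends.
    return ceil(breakpoints_with_duplicates(G, H) / 2)
```

For the two sequences in the figure, +3 -1 -2 +1 +4 and +3 +1 +2 +1 +4, it finds 2 unmatched adjacencies and hence a bound of 1 reversal.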

(Sub)optimal assignment rules
[Figure: Rule one and Rule two, each illustrated by trivial and non-trivial assignments on example gene sequences (a b c f d e; a b c f d -e -d -b -c)]


The MCSP problem
Minimum Common Substring Partition (MCSP): given two related genomes G and H, partition each into the minimum number of substrings such that both partitions consist of the same substrings, where a substring may also be matched to its reversed, sign-flipped copy.
This may help eliminate many duplicates, but it differs from finding syntenic blocks.
Example: G: 3 1 2 -1 4; H: -4 1 2 3 1
Without loss of generality, we assume that the first genes of the two genomes are identical positive singletons, and likewise the last genes.
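The definition can be checked by brute force on small examples. The sketch below (hypothetical names; exponential in genome length) enumerates all ways of cutting each genome into contiguous blocks, identifies each block with its reversed, sign-flipped copy, and returns the fewest blocks for which both genomes yield the same multiset.

```python
from itertools import combinations
from typing import Iterator, List, Tuple

Block = Tuple[int, ...]


def canonical(block: Block) -> Block:
    """A block and its reversed, sign-flipped copy count as the same substring."""
    flipped = tuple(-x for x in reversed(block))
    return min(block, flipped)


def partitions(seq: List[int]) -> Iterator[List[Block]]:
    """All ways to cut seq into contiguous blocks, fewest blocks first."""
    n = len(seq)
    for k in range(n):  # k = number of cuts
        for cuts in combinations(range(1, n), k):
            bounds = (0, *cuts, n)
            yield [tuple(seq[a:b]) for a, b in zip(bounds, bounds[1:])]


def mcsp_bruteforce(G: List[int], H: List[int]) -> int:
    """Size of a minimum common substring partition (tiny inputs only)."""
    h_keys = {tuple(sorted(canonical(b) for b in part)) for part in partitions(H)}
    for part in partitions(G):
        if tuple(sorted(canonical(b) for b in part)) in h_keys:
            return len(part)
    raise ValueError("G and H do not have the same gene content")
```

On G: 3 1 2 -1 4 and H: 3 1 -2 -1 4 (the example on the pair-match slide) it returns 3, for instance 3 | 1 2 -1 | 4 against 3 | 1 -2 -1 | 4.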

MCSP – Hardness
Let k-MCSP denote the version of MCSP where each gene family has size at most k. The problem k-MCSP is NP-hard for any k > 1 (Goldstein, Kolman, and Zheng, ISAAC, 473-484, 2004).
Petr Kolman gave a linear-time approximation algorithm for k-MCSP (MFCS'05), and thus for k-SRDD. The approximation ratio was recently improved to O(k).

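MCSP – Pair-match graph
A single match pairs one gene of G with one gene of H; a pair-match pairs two consecutive genes of G with two consecutive genes of H, either directly or as a reversed, sign-flipped copy. Two pair-matches are incompatible if they cannot both hold in the same common substring partition. The pair-match graph has a vertex for each pair-match and an edge between each pair of incompatible pair-matches.
The maximum independent set problem on the pair-match graph (V, E) is equivalent to MCSP, i.e., $L(G,H) = n - |\mathrm{MIS}(V,E)|$, where $L(G,H)$ is the size of a minimum common substring partition of G and H, n is the genome size, and $\mathrm{MIS}(V,E)$ is a maximum independent set.
Example pair-matches for G: 3 1 2 -1 4 and H: 3 1 -2 -1 4: (3 1, 3 1), (2 -1, 1 -2), (-1 4, -1 4), (1 2, -2 -1).
(Goldstein, Kolman, and Zheng, ISAAC, 473-484, 2004)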
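The identity L(G, H) = n - |MIS(V, E)| can be checked on the slide's example. In this sketch, each pair-match is stored as a map from two positions in G to two positions in H. Treating two pair-matches as incompatible when together they would not map G positions one-to-one onto H positions is an assumption made for illustration, not necessarily the exact definition used by Goldstein, Kolman, and Zheng. The maximum independent set is found by exhaustive search, so this only works for tiny graphs. All function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

PairMatch = Dict[int, int]  # position in G -> position in H


def pair_matches(G: List[int], H: List[int]) -> List[PairMatch]:
    """Match each pair of consecutive genes in G with an equal pair in H,
    either directly or as its reversed, sign-flipped copy."""
    out = []
    for i in range(len(G) - 1):
        for j in range(len(H) - 1):
            if G[i] == H[j] and G[i + 1] == H[j + 1]:
                out.append({i: j, i + 1: j + 1})
            if G[i] == -H[j + 1] and G[i + 1] == -H[j]:
                out.append({i: j + 1, i + 1: j})
    return out


def incompatible(m1: PairMatch, m2: PairMatch) -> bool:
    """Conflict if together they do not map G positions one-to-one onto H positions."""
    for g, h in m1.items():
        if g in m2 and m2[g] != h:
            return True
    h_to_g = {h: g for g, h in m1.items()}
    return any(h in h_to_g and h_to_g[h] != g for g, h in m2.items())


def max_independent_set_size(n_nodes: int, edges: List[Tuple[int, int]]) -> int:
    """Exact MIS size by exhaustive search, largest subsets first."""
    for size in range(n_nodes, -1, -1):
        for subset in combinations(range(n_nodes), size):
            chosen = set(subset)
            if not any(a in chosen and b in chosen for a, b in edges):
                return size
    return 0


def mcsp_via_pair_match_graph(G: List[int], H: List[int]) -> int:
    matches = pair_matches(G, H)
    edges = [(a, b) for a, b in combinations(range(len(matches)), 2)
             if incompatible(matches[a], matches[b])]
    return len(G) - max_independent_set_size(len(matches), edges)
```

For G: 3 1 2 -1 4 and H: 3 1 -2 -1 4 it finds the four pair-matches listed above, a maximum independent set of size 2, and hence a partition of 5 - 2 = 3 substrings.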

MCSP – Approximation algorithm
APPROX-MCSP(G, H)  /* G and H are a pair of related genomes */
1. Construct the pair-match graph for G and H.
2. Find an approximate vertex cover C of the pair-match graph.
3. Identify segments based on the pair-matches not in C (which form an independent set).
4. Output all the segments as a common substring partition.
The size of the common substring partition found by APPROX-MCSP is bounded in terms of α, the approximation ratio of the vertex cover algorithm, and n, the genome size. In particular, for 2-MCSP the algorithm achieves an approximation ratio of 1.5.


Maximum cycle decomposition
What if some duplicates still remain?
Given any two genomes without duplicated genes, the rearrangement distance between them is given by the revised HP formula (Hannenhalli and Pevzner, 1995; Tesler, 2002; Ozery-Flato and Shamir, 2003).
We can therefore approximate the minimum rearrangement distance between two genomes with duplicates by decomposing their complete breakpoint graph so as to maximize the number of cycles and paths, minus a correction term from that formula.

MSOAR
MSOAR is a high-throughput system for ortholog assignment between closely related genomes.
MSOAR employs a heuristic algorithm to calculate the rearrangement/duplication (RD) distance between two genomes using the suboptimal assignment rules, MCSP, and MCD; the resulting transformation can be used to reconstruct a most parsimonious evolutionary scenario.
MSOAR extends SOAR by allowing for multichromosomal genomes and by detecting inparalogs.

“Noise” gene pair detection
The previous steps determine a one-to-one gene matching between the two genomes. Unmatched genes are removed and marked as inparalogs.
Next, remove gene pairs whose deletion decreases the rearrangement distance by at least two. Since each removed pair incurs two duplications, the RD distance does not increase. These deleted genes form inparalogs.

An outline of MSOAR
Input: Dataset A and Dataset B
Homology search:
1. Apply an all-vs.-all comparison by BLASTp.
2. Select only the BLAST hits with similarity score above a cutoff.
3. Keep up to five top bidirectional best hits.
Assign orthologs by minimizing the RD distance:
1. Apply the suboptimal assignment rules.
2. Apply minimum common substring partition.
3. Apply maximum cycle decomposition.
4. Detect inparalogs by identifying “noise” gene pairs.
Output: list of orthologous gene pairs
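The homology-search filter can be sketched over a table of BLASTp scores. The code below is a hypothetical illustration: it assumes a single symmetric similarity score per gene pair, applies the cutoff, and keeps a pair when each gene ranks among the other's top k hits (k = 5 by default, mirroring "up to five top bidirectional best hits"). MSOAR's actual filtering rules may differ.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def top_k_bidirectional_hits(scores: Dict[Tuple[str, str], float],
                             k: int = 5, cutoff: float = 0.0) -> List[Tuple[str, str]]:
    """Pairs (a, b) such that b is among a's k best hits and a is among b's k best hits.

    scores maps (gene in genome A, gene in genome B) to a similarity score;
    hits scoring below cutoff are discarded first.
    """
    hits_a: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    hits_b: Dict[str, List[Tuple[float, str]]] = defaultdict(list)
    for (a, b), s in scores.items():
        if s >= cutoff:
            hits_a[a].append((s, b))
            hits_b[b].append((s, a))

    def top(hits: Dict[str, List[Tuple[float, str]]]) -> Dict[str, Set[str]]:
        # Keep the k highest-scoring targets of each query.
        return {q: {t for _, t in sorted(h, reverse=True)[:k]} for q, h in hits.items()}

    top_a, top_b = top(hits_a), top(hits_b)
    return sorted((a, b) for a, bs in top_a.items() for b in bs if a in top_b[b])
```

With k = 1 this reduces to classical bidirectional best hits; larger k keeps several candidate partners per gene for the later rearrangement-based steps to choose among.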


Simulated data test
Simulated genome G: 100 distinct genes.
Simulated genome H: obtained by randomly performing reversals on G.
Experiment one: randomly copy some genes and insert the copies back into one of the two genomes.
Experiment two: randomly copy some genes and insert the copies back into both genomes.
(Inserted genes are inparalogs by definition.)
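A generator for such test instances might look like the following. The parameter names and default values are hypothetical (the original parameter values are not stated), and which genome receives the copies in experiment one, as well as the orientation of each copy, are assumptions of this sketch.

```python
import random
from typing import List, Tuple


def random_reversals(genome: List[int], r: int, rng: random.Random) -> List[int]:
    """Apply r random signed reversals, each spanning at least two genes."""
    g = list(genome)
    for _ in range(r):
        i, j = sorted(rng.sample(range(len(g)), 2))
        g[i:j + 1] = [-x for x in reversed(g[i:j + 1])]
    return g


def insert_copies(genome: List[int], k: int, rng: random.Random) -> List[int]:
    """Copy k randomly chosen genes (keeping their sign) and insert each at a random position."""
    g = list(genome)
    for _ in range(k):
        g.insert(rng.randrange(len(g) + 1), rng.choice(g))
    return g


def simulate(n: int = 100, reversals: int = 10, copies: int = 5,
             both: bool = False, seed: int = 0) -> Tuple[List[int], List[int]]:
    rng = random.Random(seed)
    G = list(range(1, n + 1))                # n distinct genes
    H = random_reversals(G, reversals, rng)  # H derived from G by reversals
    H = insert_copies(H, copies, rng)        # experiment one: copies in one genome (assumed H)
    if both:                                 # experiment two: copies in both genomes
        G = insert_copies(G, copies, rng)
    return G, H
```

Because every inserted gene is a copy made after the split, the inserted genes are exactly the inparalogs that an assignment method should recover.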

Simulated data test
Randomly generate two genomes for each parameter set and average the results over 20 random instances.
Compare our heuristic algorithm with the iterated exemplar algorithm (Sankoff, Bioinformatics, 1999).

Real data
Homo sapiens: Build 36.1 human genome assembly (UCSC hg18, March 2006), 20161 protein sequences in total.
Mus musculus: Build 36 mouse genome assembly (UCSC mm8, February 2006), 19199 protein sequences in total.

MSOAR vs. Inparanoid
Validation: official gene symbols extracted from UniProt release 6.0 (September 2005).
For 20161 human protein sequences and 19199 mouse protein sequences, MSOAR assigned 14362 ortholog pairs between human and mouse, of which 11050 are true positives, 1748 are unknown pairs, and 1508 are false positives, resulting in a sensitivity of 92.26% and a specificity of 87.99%.
[Table: comparison between MSOAR and Inparanoid (Remm et al., J. Mol. Biol., 2001)]
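As a check on the arithmetic, the reported specificity equals the ratio of true positives to all validated assignments (true plus false positives), with the unknown pairs excluded. The sensitivity also depends on the number of missed true pairs, which is not reported here, so it cannot be recomputed from these counts.

```latex
\text{specificity} = \frac{TP}{TP + FP} = \frac{11050}{11050 + 1508} = \frac{11050}{12558} \approx 0.8799 = 87.99\%
```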

MSOAR vs. Inparanoid
[Figure: genes on human chromosome 20 (SNRPB, STK35, TGM3, TGM6, ZNF343, TMC2, NOL5A, IDH3B) and their counterparts on mouse chromosome 2 (Snrpb, Stk35, Tgm3, Tgm6, Tmc2, Nol5a, Idh3b)]
The ortholog pair SNRPB (human) and Snrpb (mouse) are not bidirectional best hits, so this pair could be missed by sequence similarity-based ortholog assignment methods such as Inparanoid.

[Figure: number of main ortholog pairs assigned by MSOAR across the chromosome pairs]

[Figure: an alignment between syntenic blocks and MSOAR blocks]

Validation by HCOP
The HGNC Comparison of Orthology Predictions (HCOP) is a tool that integrates and displays the human-mouse orthology assertions made by Ensembl, HomoloGene, Inparanoid, PhIGS, MGD, and HGNC (http://www.gene.ucl.ac.uk/cgi-bin/nomenclature/hcop.pl).

Other validations
By PANTHER protein sequence classification (ftp://ftp.pantherdb.org/sequence_classifications/): MSOAR identified 14083 ortholog pairs with a valid Geneid between human and mouse, of which 11887 pairs (about 84.4%) have both orthologous genes in the same protein subfamily.


Summary and future work
Presented a novel approach to assign orthologs between two genomes via genome rearrangement and gene duplication.
Introduced a rearrangement/duplication (RD) distance for genome comparisons.
Proposed a heuristic algorithm for assigning orthologs under maximum parsimony.
Developed a high-throughput system for ortholog assignment (MSOAR).
Tested the system on simulated data and on real genomic data of human and mouse: MSOAR vs. the iterated exemplar algorithm; MSOAR vs. Inparanoid; various validation methods.
Future directions:
More efficient algorithms for MCSP and MCD.
Refine the evolutionary model for MSOAR (transposition, tandem duplication, gene loss, etc.).
Ortholog assignment for multiple genome comparison.
More explicit treatment of one-to-many and many-to-many orthology relationships.

References
X. Chen, J. Zheng, Z. Fu, P. Nan, Y. Zhong, S. Lonardi, and T. Jiang. Computing the assignment of orthologous genes via genome rearrangement. Proc. 3rd Asia-Pacific Bioinformatics Conference (APBC), 2005, pp. 363-378.
X. Chen, J. Zheng, Z. Fu, P. Nan, Y. Zhong, S. Lonardi, and T. Jiang. Assignment of orthologous genes via genome rearrangement. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 2(4), 2005, pp. 302-315.
Z. Fu, X. Chen, V. Vacic, P. Nan, Y. Zhong, and T. Jiang. A parsimony approach to genome-wide ortholog assignment. Proc. 10th Annual International Conference on Research in Computational Molecular Biology (RECOMB), 2006, pp. 578-594.
Z. Fu, X. Chen, V. Vacic, P. Nan, Y. Zhong, and T. Jiang. MSOAR: A high-throughput ortholog assignment system based on genome rearrangement. Submitted, 2007.
Z. Fu and T. Jiang. Clustering of main orthologs for multiple genomes. To be presented at the LSI Conference on Computational Systems Biology (CSB), 2007.

Acknowledgement
NSF
DoE Genomes to Life (GtL) program
National Key Project for Basic Research
NSFC
Changjiang Visiting Professorship, Tsinghua Univ.
Discussion with Marek Chrobak, Petr Kolman, and Lan Liu on MCSP and MCIP