School of CSE, Georgia Tech

Presentation transcript:

Analysis of a Real-World NP-Complete Graph Problem: DCJ Median Algorithm to Find the Ancestor of Three Genomes. Zhaoming Yin, School of CSE, Georgia Tech

Fundamentals Sequence alignment is a way of arranging the sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences. ...

Fundamentals In the 1980s Jeffrey Palmer studied the evolution of plant organelles by comparing the mitochondrial genomes of cabbage and turnip. The genes were about 99% similar in sequence, yet these nearly identical genes differed in gene order. This study helped pave the way to analyzing genome rearrangements in molecular evolution.
Original order: 1 2 3 4 5 6 7 8 9 10
Inversion: 1 2 -6 -5 -4 -3 7 8 9 10
Transposition: 1 2 7 8 3 4 5 6 9 10
Inverted transposition: 1 2 7 8 -6 -5 -4 -3 9 10
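The three rearrangements listed on this slide can be reproduced on a signed permutation with a few list operations. This is a minimal sketch: the function names and the half-open index convention (`i`, `j`, `k`) are illustrative choices, not something taken from the slides.

```python
def inversion(perm, i, j):
    """Reverse the block perm[i:j] and flip the sign of every gene in it."""
    return perm[:i] + [-g for g in reversed(perm[i:j])] + perm[j:]


def transposition(perm, i, j, k):
    """Move the block perm[i:j] so that it follows the block perm[j:k]."""
    return perm[:i] + perm[j:k] + perm[i:j] + perm[k:]


def inverted_transposition(perm, i, j, k):
    """Move the block perm[i:j] after perm[j:k], inverting the moved block."""
    return perm[:i] + perm[j:k] + [-g for g in reversed(perm[i:j])] + perm[k:]


identity = list(range(1, 11))
print(inversion(identity, 2, 6))
# [1, 2, -6, -5, -4, -3, 7, 8, 9, 10]
print(transposition(identity, 2, 6, 8))
# [1, 2, 7, 8, 3, 4, 5, 6, 9, 10]
print(inverted_transposition(identity, 2, 6, 8))
# [1, 2, 7, 8, -6, -5, -4, -3, 9, 10]
```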

Fundamentals Maximum parsimony phylogeny: optimize each ancestral node of an unrooted phylogeny in terms of its three or more immediate neighbours, modern or ancestral, and iterate across the tree until the objective function converges (to a local optimum) at all nodes.

Break Point Graph [Figure: breakpoint graphs for the genome pair 1 2 and -1 2, and for the pair 1 2 3 4 5 6 and 1 -5 -2 3 -6 -4. Each gene contributes two vertices, labelled index/extremity: 0/+1, 1/-1, 2/+2, 3/-2, 4/+3, 5/-3, 6/+4, 7/-4, 8/+5, 9/-5, 10/+6, 11/-6.]
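A breakpoint graph of two circular genomes can be sketched as two perfect matchings over the gene extremities, using the vertex numbering from the slide (gene g owns vertices 2(g-1) and 2g-1). Two things here are assumptions rather than content of the slides: that a positive gene is entered at vertex 2(g-1) and left at 2g-1, and that both genomes are single circular chromosomes with the same genes, in which case the DCJ distance is the number of genes minus the number of cycles. The function names are hypothetical.

```python
def extremities(gene):
    """Return (entry, exit) vertices of a signed gene (assumed orientation)."""
    g = abs(gene)
    first, second = 2 * (g - 1), 2 * g - 1
    return (first, second) if gene > 0 else (second, first)


def adjacency_matching(genome):
    """Map every vertex to its neighbour in a circular signed genome."""
    match = {}
    n = len(genome)
    for i in range(n):
        _, out_v = extremities(genome[i])
        in_v, _ = extremities(genome[(i + 1) % n])  # wrap around: circular
        match[out_v] = in_v
        match[in_v] = out_v
    return match


def count_cycles(m1, m2):
    """Count alternating cycles in the union of two perfect matchings."""
    seen = set()
    cycles = 0
    for start in m1:
        if start in seen:
            continue
        cycles += 1
        v = start
        while v not in seen:
            seen.add(v)
            u = m1[v]      # follow an edge of the first genome
            seen.add(u)
            v = m2[u]      # then an edge of the second genome
    return cycles


def dcj_distance(a, b):
    """DCJ distance for circular genomes on the same genes: n - cycles."""
    return len(a) - count_cycles(adjacency_matching(a), adjacency_matching(b))


print(dcj_distance([1, 2], [-1, 2]))
print(dcj_distance([1, 2, 3, 4, 5, 6], [1, -5, -2, 3, -6, -4]))
```

Linear chromosomes would add paths to the graph and change the distance formula, so they are deliberately left out of this sketch.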

MBG/0-Matching [Figure: multiple breakpoint graph (MBG) for the genomes 1 2 3 4 5 6, 1 -5 -2 3 -6 -4 and 1 3 5 -4 6 -2, shown with a 0-matching labelled 1 -5 -3 2 -4 6.]

Subgraph/Decomposer [Figure: the MBG for the genomes 1 2 3 4 5 6, 1 -5 -2 3 -6 -4 and 1 3 5 -4 6 -2 with the 0-matching 1 -5 -3 2 -4 6; a subgraph is highlighted and an H-crossing edge is marked.]

Adequate Subgraph Definition: In an MBG for a set of genomes G, a connected subgraph H of size m is an adequate subgraph if c_max(H) ≥ (1/2) m N_G; it is strongly adequate if c_max(H) > (1/2) m N_G. (m is the size of the subgraph and N_G is the number of genomes, which is 3 for the median-of-three problem.) Property: An adequate subgraph is simple if it does not contain another adequate subgraph. Lemma: An adequate subgraph is a decomposer.
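For very small subgraphs the definition can be checked by brute force. The sketch below rests on two assumptions that the slide does not spell out: the size m is taken to be half the number of vertices of H (the reading under which a two-vertex subgraph with two parallel edges meets the bound), and c_max(H) is taken to be the largest number of cycles that a perfect matching of 0-edges on H can form with the coloured edges lying inside H. The MBG is represented as one dict per genome mapping each vertex to its neighbour; all function names are hypothetical.

```python
from fractions import Fraction


def perfect_matchings(vertices):
    """Yield every perfect matching of an even-sized list of vertices."""
    if not vertices:
        yield []
        return
    first, rest = vertices[0], vertices[1:]
    for i, partner in enumerate(rest):
        for tail in perfect_matchings(rest[:i] + rest[i + 1:]):
            yield [(first, partner)] + tail


def count_cycles(zero_of, colour_of):
    """Count cycles alternating between 0-edges and edges of one colour."""
    seen, cycles = set(), 0
    for start in colour_of:
        if start in seen:
            continue
        v, visited = start, []
        while v in colour_of:          # stop if the walk leaves the subgraph
            u = colour_of[v]
            visited += [v, u]
            v = zero_of[u]
            if v == start:             # walk closed up: one cycle found
                cycles += 1
                seen.update(visited)
                break
    return cycles


def c_max(vertices, matchings):
    """Best cycle count over all 0-matchings of the subgraph (brute force)."""
    inside = set(vertices)
    restricted = [
        {v: u for v, u in m.items() if v in inside and u in inside}
        for m in matchings
    ]
    best = 0
    for zero in perfect_matchings(list(vertices)):
        zero_of = {}
        for a, b in zero:
            zero_of[a], zero_of[b] = b, a
        best = max(best, sum(count_cycles(zero_of, c) for c in restricted))
    return best


def is_adequate(vertices, matchings):
    """Return (adequate, strongly_adequate) under the assumptions above."""
    m = Fraction(len(vertices), 2)        # assumed: size = half the vertices
    bound = m * len(matchings) / 2        # (1/2) * m * N_G
    c = c_max(vertices, matchings)
    return c >= bound, c > bound


# Toy MBG on four vertices: colours 0 and 1 agree, colour 2 differs.
toy = [
    {0: 1, 1: 0, 2: 3, 3: 2},
    {0: 1, 1: 0, 2: 3, 3: 2},
    {0: 2, 2: 0, 1: 3, 3: 1},
]
print(is_adequate([0, 1], toy))
print(is_adequate([0, 2], toy))
```

The enumeration grows factorially with the subgraph size, so this is only meant to make the definition concrete, not to replace the pattern-based detection on the following slides.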

Adequate Subgraph

Algorithm: AS1()
for each v do
  if v[0]=v[1] or v[0]=v[2] or v[1]=v[2]
    these two points are AS; the edge connecting them is the major set;
  endif
endfor
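The AS1 test can be sketched directly, reading v[0], v[1], v[2] as the neighbours of v along the edges of the three colours (an interpretation of the slide's notation). The MBG is assumed to be given as three dicts, one per genome, each mapping a vertex to its neighbour; `find_as1` is a hypothetical name.

```python
def find_as1(matchings):
    """Return vertex pairs joined by at least two parallel coloured edges."""
    found = []
    used = set()
    for v in matchings[0]:
        if v in used:
            continue
        n0, n1, n2 = (m[v] for m in matchings)
        if n0 == n1 or n0 == n2:
            u = n0
        elif n1 == n2:
            u = n1
        else:
            continue                  # all three neighbours differ
        used.update((v, u))           # report each pair only once
        found.append((v, u))          # (v, u) is the edge of the major set
    return found


toy = [
    {0: 1, 1: 0, 2: 3, 3: 2},
    {0: 1, 1: 0, 2: 3, 3: 2},
    {0: 2, 2: 0, 1: 3, 3: 1},
]
print(find_as1(toy))
```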

Adequate Subgraph √ √

Algorithm: AS2() [Figure: cases (1) and (2), with edges coloured c, c1 and c2.]
for each color c do
  for each v do
    if v[c1][c]=v[c][c2] (1) or v[c2][c]=v[c][c1] (2) or v[c2][c1]=v[c][c2] (3) or v[c1][c2]=v[c][c1] (4)
      v, v[c], v[c1], v[c2] are AS;
      (1), major set is (v,v[c1]) and (v[c],v[c2]) or
      (2), major set is (v,v[c2]) and (v[c],v[c1]) or
      (3), major set is (v,v[c]) and (v[c1],v[c][c2]) or
      (4), major set is (v,v[c]) and (v[c2],v[c][c1])
    endif
  endfor
endfor

Algorithm: AS2() [Figure: cases (1) and (2), with edges coloured c, c1 and c2.]
for each color c do
  for each v do
    if v[c1][c]=v[c][c1] and (v[c1]=v[c][c2] || v[c1]=v[c][c2]) (1) or v[c1][c2]=v[c][c1] and (v[c1]=v[c][c2] || v[c1]=v[c][c2]) (2)
      v, v[c], v[c1], v[c2] are AS;
      (1), major set is (v,v[c1]) and (v[c],v[c2]) or
      (2), major set is (v,v[c2]) and (v[c],v[c1])
    endif
  endfor
endfor

Algorithm: AS2() [Figure: edges coloured c, c1 and c2.]
for each color c do
  for each v do
    if v[c1][c]=v[c][c1] and v[c2][c]=v[c][c2] and v[c1]!=v[c][c2] and v[c2]!=v[c][c1]
      v, v[c], v[c1], v[c2] are AS;
      (1), major set is (v,v[c1]) and (v[c],v[c2])
    endif
  endfor
endfor

Algorithm: AS2() In this case, there are two major sets. [Figure: edges coloured c and c1.]
for each color c do
  for each v do
    if v[c1][c]=v[c][c1] and type three is not found
      v, v[c], v[c1], v[c2] are AS;
      (1), major set is (v,v[c1]) and (v[c],v[c2]) and (v,v[c]) and (v[c1],v[c][c1])
    endif
  endfor
endfor

Adequate Subgraph √ √ √ √ √ √

Algorithm: AS4(), type 5-3-5 [Figure: vertices labelled core, po0, po1, po2, po11 and po22; edge colours c0, c1 and c2.]

Adequate Subgraph √ √ √ √ √ √ √ √ √ √

Algorithm: AS4()

Adequate Subgraph √ √ √ √ √ √ √ √ √ √ √ √ √

Algorithm: AS4()

Algorithm: AS4()

Adequate Subgraph √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √ √

Algorithm: Shrink() [Figure: graph with vertex labels 11 5 3 8 4 7 6 2 1 10 9.]

Algorithm: Shrink() [Figure: graph with vertex labels 11 5 2 5 3 4 6 1 7 6.]

Branch and Bound Algorithm

Branch and Bound Algorithm (1) If no branch has the current upper bound, decrease the bound. If no elements remain in memory, load more from disk.

Branch and Bound Algorithm (2) Get an intermediate subgraph and check whether it can be trimmed or whether it is the final solution. If too many elements are in memory, store some of them on disk.

Upper Bound and Lower Bound: Upper Bound. The DCJ distance between genomes obeys the triangle inequality. So, given three genomes G1, G2, G3, the distances between the median genome and them satisfy: [equation] Because the distance is defined by: [equation] therefore, the upper bound on the number of cycles is: [equation]
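One derivation consistent with the statements on this slide is sketched below. It assumes circular genomes on the same n genes, for which the DCJ distance is n minus the number of cycles in the breakpoint graph. The symbols M (median), d_{ij} (pairwise distance between G_i and G_j) and c(M, G_i) (cycles between the median and G_i) are notation introduced here, not taken from the slide.

```latex
\begin{align*}
% Triangle inequality through the median M, for each pair i < j:
d(M, G_i) + d(M, G_j) &\ge d_{ij} \\
% Summing the three inequalities counts each d(M, G_i) twice:
\sum_{i=1}^{3} d(M, G_i) &\ge \frac{d_{12} + d_{13} + d_{23}}{2} \\
% Assumed distance formula for circular genomes on n genes:
d(M, G_i) &= n - c(M, G_i) \\
% Substituting gives an upper bound on the total number of cycles:
\sum_{i=1}^{3} c(M, G_i) &\le 3n - \frac{d_{12} + d_{13} + d_{23}}{2}
\end{align*}
```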


Best First Search. Best-first search ensures that the search space is minimal. However, it needs a lot of space to store its footprint, which makes the branch and bound algorithm I/O bound. [Figure: numbered search nodes, with labels k and k+1.]
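The best-first strategy can be illustrated with a generic skeleton that always expands the open node with the largest upper bound. Everything in this sketch is illustrative: the function `best_first_max`, its callback parameters and the toy problem are hypothetical, the frontier is kept entirely in memory (the disk spilling mentioned on the slides is omitted), and the bound is assumed never to underestimate the best completion and to be exact on complete states.

```python
import heapq
import itertools


def best_first_max(root, expand, upper_bound, is_complete):
    """Best-first branch and bound for a maximisation problem."""
    tie = itertools.count()  # tie-breaker so states are never compared
    frontier = [(-upper_bound(root), next(tie), root)]
    while frontier:
        neg_bound, _, state = heapq.heappop(frontier)
        if is_complete(state):
            # No open node has a larger bound, so this state is optimal.
            return state, -neg_bound
        for child in expand(state):
            heapq.heappush(frontier, (-upper_bound(child), next(tie), child))
    return None, None


# Toy problem: pick one column per row, never the same column twice in a
# row, and maximise the sum of the picked entries.
rows = [[3, 1], [2, 5], [4, 6]]


def expand(state):
    return [state + (c,) for c in range(2) if not state or c != state[-1]]


def upper_bound(state):
    picked = sum(rows[r][c] for r, c in enumerate(state))
    return picked + sum(max(row) for row in rows[len(state):])


def is_complete(state):
    return len(state) == len(rows)


best_state, best_value = best_first_max((), expand, upper_bound, is_complete)
print(best_state, best_value)
```

Because nodes are popped in order of decreasing bound, the first complete state popped cannot be beaten, which is why the search can stop there. The price is that every open node must be stored, matching the memory pressure described on the slide.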

References
[1] Andrew Wei Xu and David Sankoff, Decompositions of multiple breakpoint graphs and rapid exact solutions to the median problem. In: K.A. Crandall and J. Lagergren (Eds.), Proceedings of the Workshop on Algorithms in Bioinformatics, WABI 2008, Lecture Notes in Bioinformatics 5251, Springer.
[2] Yancopoulos, S., Attie, O., Friedberg, R., Efficient sorting of genomic permutations by translocation, inversion and block interchange. Bioinformatics 21, 3340–3346 (2005).
[3] Andrew Wei Xu, A Fast and Exact Algorithm for the Median of Three Problem: A Graph Decomposition Approach. Journal of Computational Biology, 2009, 16(10), 1-13.