Genome Rearrangement By Ghada Badr Part II

2 Genome Models. Genomes can be modeled by permutations: each gene is assigned a unique number and occurs exactly once in the genome. Signed permutation: each gene is given a + or - sign to indicate the strand it resides on. Unsigned permutation: used when the corresponding strand is unknown.

3 Genome Rearrangement. Our problem: given a set of genomes and a set of possible evolutionary events (operations), find a shortest set of events transforming those genomes into one another. What are the rearrangement events (operations)?

4 Rearrangement Operations. Rearrangement operations affect gene order and gene content. There are various types. For a single-chromosome genome: inversions, transpositions, reverse transpositions, gene duplications, gene loss. For multi-chromosome genomes we add: translocations, fusions, fissions.

5 Rearrangement Problems. Our problem: given a set of genomes and a set of possible evolutionary events (operations), find a shortest set of events transforming those genomes into one another. Any set of operations yields a distance between genomes: the minimum number of operations needed to transform one genome into the other.

6 Rearrangement Problems. Our problem: given a set of genomes and a set of possible evolutionary events (operations), find a shortest set of events transforming those genomes into one another. Two classical problems: computing the distance d(.,.), and computing one optimal sorting sequence of events.

7 Rearrangement Operations. Can we have a unifying framework in which circular and linear chromosomes can coexist throughout evolving genomes? Can we have a unifying view of genome rearrangements? (Bergeron 2006) The Double Cut and Join (DCJ) operation was introduced to provide one.

8 Rearrangement Operations - DCJ. Double Cut-and-Join (DCJ) was first proposed by Yancopoulos et al. (2005). It models all the classical operations (inversions, translocations, fissions, fusions, transpositions, and block interchanges) with a single operation. This general model accounts for the genomic evidence that linear and circular chromosomes coexist in many genomes. Both the DCJ sorting and distance problems can be solved in O(n) time (Bergeron et al., 2006).

9 Rearrangement Operations - DCJ. A gene a is an oriented sequence of DNA that starts with a tail, written at, and ends with a head, written ah. Two consecutive genes do not necessarily have the same orientation, so the adjacency of two consecutive genes a and b can be of four different types: {ah,bt}, {ah,bh}, {at,bt}, {at,bh}. An extremity that is not adjacent to any other gene is called a telomere and is represented by a singleton set {ah} or {at}. Adjacencies and telomeres can represent genomes with one chromosome or with several.

10 Rearrangement Operations - DCJ. A genome is a set of adjacencies and telomeres such that the tail or head of any gene appears in exactly one adjacency or telomere. Example: Genome A: chr1: a c -d; chr2: b e; chr3: f g. Replace each gene by its two extremities: chr1: at ah ct ch dh dt; chr2: bt bh et eh; chr3: ft fh gt gh. Adjacencies: {ah, ct} {ch, dh} {bh, et} {fh, gt}. Telomeres: {at} {dt} {bt} {eh} {ft} {gh}. Hence A = {{ah, ct} {ch, dh} {bh, et} {fh, gt} {at} {dt} {bt} {eh} {ft} {gh}}.
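The conversion from chromosomes to adjacencies and telomeres can be sketched in a few lines of Python. This is only an illustration, not code from the slides: it assumes linear, non-empty chromosomes, and the function names `extremities` and `genome_to_sets` as well as the string encoding of genes ("a", "-d") are choices made here.

```python
def extremities(gene):
    """Return the (left, right) extremities of a signed gene such as 'a' or '-d'."""
    if gene.startswith("-"):
        name = gene[1:]
        return name + "h", name + "t"   # reversed gene: head first, then tail
    return gene + "t", gene + "h"       # forward gene: tail first, then head


def genome_to_sets(chromosomes):
    """Convert linear chromosomes into a set of adjacencies and telomeres.

    Each chromosome is a non-empty list of signed gene names. Every element
    of the result is a frozenset holding one extremity (a telomere) or two
    extremities (an adjacency).
    """
    result = set()
    for chrom in chromosomes:
        ends = [e for gene in chrom for e in extremities(gene)]
        result.add(frozenset([ends[0]]))        # left telomere
        result.add(frozenset([ends[-1]]))       # right telomere
        for i in range(1, len(ends) - 1, 2):    # junctions between neighbouring genes
            result.add(frozenset([ends[i], ends[i + 1]]))
    return result


# Genome A from the example: chr1: a c -d, chr2: b e, chr3: f g
genome_a = [["a", "c", "-d"], ["b", "e"], ["f", "g"]]
sets_a = genome_to_sets(genome_a)
```

With this encoding, `sets_a` holds the four adjacencies and six telomeres listed for Genome A.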

11 Rearrangement Operations - DCJ. DCJ operations: a) {p,q} {r,s} --> {p,r} {s,q} or {p,s} {q,r}

12 Rearrangement Operations - DCJ. DCJ operations: b) {p,q} {r} --> {p,r} {q} or {p} {q,r}

13 Rearrangement Operations - DCJ. DCJ operations: c) {q} {r} --> {q,r}

14 Rearrangement Operations - DCJ. DCJ operations, example. Genome A: chr1: a c -d; chr2: b e; chr3: f g. Adjacencies and telomeres are: {ah, ct} {ch, dh} {bh, et} {fh, gt} {at} {dt} {bt} {eh} {ft} {gh}. First option: {ah,ct} {fh,gt} --> {ah,fh} {ct,gt}, giving Genome A: chr1: a -f; chr2: b e; chr3: d -c g. Second option: {ah,ct} {fh,gt} --> {ah,gt} {ct,fh}, giving Genome A: chr1: a g; chr2: b e; chr3: f c -d.
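A DCJ operation on this set representation amounts to removing one or two elements and inserting their recombined extremities. The Python sketch below is a hedged illustration: the function name `dcj` and its argument convention (the caller passes both the elements to cut and the elements to join) are assumptions made here, and the checks only verify that the extremities involved are preserved.

```python
def dcj(genome, old, new):
    """Return a copy of `genome` in which the elements `old` are replaced by `new`.

    `genome` is a set of frozensets (adjacencies and telomeres); `old` and
    `new` are lists of one or two collections of extremities.
    """
    old = {frozenset(x) for x in old}
    new = {frozenset(x) for x in new}
    if not old <= genome:
        raise ValueError("the elements to cut must belong to the genome")
    if not (1 <= len(old) <= 2 and 1 <= len(new) <= 2):
        raise ValueError("a DCJ acts on at most two elements")
    if any(len(x) not in (1, 2) for x in new):
        raise ValueError("results must be adjacencies or telomeres")
    # Every extremity that was cut must reappear exactly once after joining.
    if sorted(e for x in old for e in x) != sorted(e for x in new for e in x):
        raise ValueError("a DCJ must preserve the extremities involved")
    return (genome - old) | new


# Genome A from the example, as adjacencies and telomeres
genome_a = {frozenset(x.split()) for x in
            ["ah ct", "ch dh", "bh et", "fh gt", "at", "dt", "bt", "eh", "ft", "gh"]}

# Case a): {ah,ct} {fh,gt} --> {ah,fh} {ct,gt}
option_1 = dcj(genome_a, [("ah", "ct"), ("fh", "gt")], [("ah", "fh"), ("ct", "gt")])
# Case a), other pairing: {ah,ct} {fh,gt} --> {ah,gt} {ct,fh}
option_2 = dcj(genome_a, [("ah", "ct"), ("fh", "gt")], [("ah", "gt"), ("ct", "fh")])
# Case c): two telomeres are joined into one adjacency
fused = dcj(genome_a, [("eh",), ("ft",)], [("eh", "ft")])
```

The same function covers cases a), b) and c) because each of them only regroups the extremities it cuts.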

15 DCJ sorting and Distance problems. Problem: given two genomes A and B defined on the same set of genes, find a shortest sequence of DCJ operations that transforms A into B. The length of such a sequence is called the DCJ distance between A and B, dcj(A,B).

16 DCJ sorting and Distance problems. Example: Genome A: chr1: a c -d; chr2: b e; chr3: f g. Genome B: chr1: a b c d; chr2: e f g. Replace each gene by its two extremities. A: at ah ct ch dh dt; bt bh et eh; ft fh gt gh. B: at ah bt bh ct ch dt dh; et eh ft fh gt gh. Get adjacencies and telomeres for each genome: A = {{ah, ct} {ch, dh} {bh, et} {fh, gt} {at} {dt} {bt} {eh} {ft} {gh}}, B = {{at} {ah, bt} {bh, ct} {ch, dt} {dh} {et} {eh, ft} {fh, gt} {gh}}.

17 DCJ sorting and Distance problems. Greedy algorithm to sort by DCJ, starting from Genome A and targeting Genome B:
Genome A: chr1: a c -d; chr2: b e; chr3: f g. {ah, ct} {ch, dh} {bh, et} {fh, gt} {at} {dt} {bt} {eh} {ft} {gh}
After step 1: chr1: a b e; chr2: c -d; chr3: f g. {ah, bt} {ch, dh} {bh, et} {fh, gt} {at} {dt} {ct} {eh} {ft} {gh}
After step 2: chr1: a b c -d; chr2: e; chr3: f g. {ah, bt} {ch, dh} {bh, ct} {fh, gt} {at} {dt} {et} {eh} {ft} {gh}
After step 3: chr1: a b c d; chr2: e; chr3: f g. {ah, bt} {ch, dt} {bh, ct} {fh, gt} {at} {dh} {et} {eh} {ft} {gh}
Genome B (target): chr1: a b c d; chr2: e f g. {at} {ah, bt} {bh, ct} {ch, dt} {dh} {et} {eh, ft} {fh, gt} {gh}
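The greedy procedure can be sketched as follows: for every adjacency of B that is missing from A, one DCJ creates it; afterwards, every telomere of B that is still inside an adjacency of A is freed. The sketch follows the greedy sorting idea of Bergeron et al. (2006) as far as it can be read from the slides; the function name `greedy_dcj_sort` is a choice made here, and the linear scan in `find` makes this version quadratic, whereas the O(n) bound would need an index from extremities to elements.

```python
def greedy_dcj_sort(a, b):
    """Transform genome `a` into genome `b`; return the genome after each DCJ.

    Both genomes are sets of frozensets (adjacencies and telomeres) over the
    same extremities.
    """
    a = set(a)
    steps = []

    def find(x):
        # Element of the current genome that contains extremity x (linear scan).
        return next(s for s in a if x in s)

    # Phase 1: create every adjacency {p,q} of b that a does not have yet.
    for adj in sorted((s for s in b if len(s) == 2), key=sorted):
        p, q = sorted(adj)
        u, v = find(p), find(q)
        if u != v:
            a -= {u, v}
            a.add(frozenset([p, q]))
            rest = (u - {p}) | (v - {q})   # leftover extremities are joined
            if rest:
                a.add(rest)
            steps.append(set(a))

    # Phase 2: free every telomere {p} of b that is still inside an adjacency.
    for tel in sorted((s for s in b if len(s) == 1), key=sorted):
        (p,) = tel
        u = find(p)
        if len(u) == 2:
            a.remove(u)
            a.add(frozenset([p]))
            a.add(u - {p})
            steps.append(set(a))
    return steps


def parse(text):
    """Helper for the example: 'ah ct, at' -> {{ah, ct}, {at}}."""
    return {frozenset(part.split()) for part in text.split(",")}


genome_a = parse("ah ct, ch dh, bh et, fh gt, at, dt, bt, eh, ft, gh")
genome_b = parse("at, ah bt, bh ct, ch dt, dh, et, eh ft, fh gt, gh")
steps = greedy_dcj_sort(genome_a, genome_b)
```

On the example, the first three entries of `steps` are the three intermediate genomes shown above, and a fourth DCJ joins {eh} and {ft} to reach Genome B.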

18 DCJ sorting and Distance problems. The greedy algorithm is optimal and runs in O(n) time.

19 DCJ sorting and Distance problems. Adjacency Graph (bipartite graph). Vertices: the adjacencies and telomeres of A on one side, {ah, ct} {ch, dh} {bh, et} {fh, gt} {at} {dt} {bt} {eh} {ft} {gh}, and those of B on the other side, {at} {ah, bt} {bh, ct} {ch, dt} {dh} {et} {eh, ft} {fh, gt} {gh}. Edges: for each extremity, connect the vertex of A that contains it to the vertex of B that contains it. The graph can easily be constructed in O(n) time and space.

20 DCJ sorting and Distance problems. Adjacency Graph (bipartite graph). Let C be the number of cycles and I the number of odd paths of the graph. In each iteration, the algorithm increments C by one or I by two. If sorted, both sides of the graph are {at} {ah, bt} {bh, ct} {ch, dt} {dh} {et} {eh, ft} {fh, gt} {gh}, and n = C + I/2, where n is the number of genes. Hence dcj(A,B) ≥ n - (C + I/2).

21 DCJ sorting and Distance problems. Adjacency Graph (bipartite graph). A side: {ah, ct} {ch, dh} {bh, et} {fh, gt} {at} {dt} {bt} {eh} {ft} {gh}. B side: {at} {ah, bt} {bh, ct} {ch, dt} {dh} {et} {eh, ft} {fh, gt} {gh}. The graph has 1 cycle, 4 odd paths, and 1 even path, so dcj(A,B) = n - (cycles + oddPaths/2) = 7 - 1 - 4/2 = 4.
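The distance formula can be checked with a short Python sketch that builds the adjacency graph implicitly and counts its components. The union-find bookkeeping and the name `dcj_distance` are choices made here for illustration; the sketch relies on every component being a cycle (as many edges as vertices) or a path (one edge fewer than vertices), and it assumes both genomes are defined over the same extremities.

```python
from collections import Counter


def dcj_distance(a, b):
    """Return (cycles, odd_paths, distance) for genomes a and b.

    Genomes are sets of frozensets (adjacencies and telomeres). Vertices are
    the elements of a and b; each extremity contributes one edge.
    """
    a, b = list(a), list(b)
    index_in_b = {x: j for j, s in enumerate(b) for x in s}
    parent = list(range(len(a) + len(b)))   # union-find over all vertices

    def root(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    edges = []
    for i, s in enumerate(a):
        for x in s:
            j = len(a) + index_in_b[x]      # vertex of b containing x
            edges.append((i, j))
            parent[root(i)] = root(j)

    vertex_count, edge_count = Counter(), Counter()
    for i in range(len(parent)):
        vertex_count[root(i)] += 1
    for i, _ in edges:
        edge_count[root(i)] += 1

    # A component is a cycle if it has as many edges as vertices,
    # and a path if it has exactly one edge fewer.
    cycles = sum(1 for r in vertex_count if edge_count[r] == vertex_count[r])
    odd_paths = sum(1 for r in vertex_count
                    if edge_count[r] == vertex_count[r] - 1 and edge_count[r] % 2 == 1)
    n_genes = sum(len(s) for s in a) // 2   # two extremities per gene
    return cycles, odd_paths, n_genes - (cycles + odd_paths // 2)


def parse(text):
    """Helper for the example: 'ah ct, at' -> {{ah, ct}, {at}}."""
    return {frozenset(part.split()) for part in text.split(",")}


genome_a = parse("ah ct, ch dh, bh et, fh gt, at, dt, bt, eh, ft, gh")
genome_b = parse("at, ah bt, bh ct, ch dt, dh, et, eh ft, fh gt, gh")
result = dcj_distance(genome_a, genome_b)
```

For the example, `result` is `(1, 4, 4)`, matching the count of 1 cycle and 4 odd paths and the distance 7 - 1 - 4/2 = 4.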

22 Genome Rearrangement and phylogeny. Genome rearrangement events are rare, so these changes of gene order enable biologists to reconstruct histories far back in time. Extend the notion of genome rearrangement distance to the optimal positioning of Steiner points in the appropriate space of a given distance metric. There are two phylogenetic versions of the Steiner problem (the first nested inside the second). Inner problem: optimizing the internal nodes of a given tree whose n leaves are labeled. Outer problem: optimizing over all trees with n leaves.

23 Genome Rearrangement and phylogeny. We will discuss the inner problem, defined as follows: given a fixed phylogeny (tree) T, together with a set of K permutations (genomes), each of size n, corresponding to the terminal (leaf) nodes, find a set of permutations corresponding to the internal nodes such that the total weight w(T) is minimized, where $w(T) = \sum_{(x,y) \in T} d(x,y)$. Here d(.,.) is the genome rearrangement distance metric defined on pairs of permutations.

24 Genome Rearrangement and phylogeny. Consider a heuristic for the problem of computing the internal nodes, where T is a star on three vertices. We will study a more basic problem, the median problem. Divide the problem on an arbitrary binary tree into a number of overlapping median problems and apply the median algorithm iteratively to search for a heuristic solution to the original problem. Internal nodes retain biological meaning, and edges represent transitions between genome states.

25 Median Problem. The median-based method for phylogeny reconstruction was first proposed by Sankoff and Blanchette (1998). The idea is to build the global solution by aggregating local solutions of the simplest problem: find a Steiner point M of three genomes. After an initialization step, the algorithm iterates over a tree, repeatedly resetting the permutations of internal nodes to the medians of their three neighbors, and continues until convergence.

26 Median Problem. The median of three, or simply the median problem: find a permutation M such that the sum of distances between M and each of the starting permutations $\Pi = \{\pi_1, \pi_2, \pi_3\}$ is minimized. That is, find a permutation M that minimizes the median score $S(M) = d_{1,M} + d_{2,M} + d_{3,M}$.

27 Median Problem. Constructing phylogeny from medians [figure].

28 Median Problem. The median problem: find a permutation M such that the sum of distances between M and each of the starting permutations $\Pi = \{\pi_1, \pi_2, \pi_3\}$ is minimized. What distance measures can we use? Distances: breakpoint, reversal, and others. Breakpoint medians have no straightforward biological interpretation and are not unique. Breakpoint medians score poorly compared to reversal medians.

29 Reversal Median. Reversal median problem: find a solution to the median problem using the reversal distance. That is, find a permutation M such that the sum of reversal distances between M and each of the starting genomes is minimized. The reversal median problem is NP-hard. Why?

30 Reversal Median. Reversal graph for n = 3. Vertices: all permutations of n = 3 elements. Edges: connect $\pi_1$ and $\pi_2$ if the reversal distance $d(\pi_1, \pi_2) = 1$.

31 Reversal Median. Reversal graph for n = 3. The distance $d(\pi_i, \pi_k)$ is the length of the shortest path between the corresponding vertices $v_i$ and $v_k$. Finding the median is equivalent to finding a minimum Steiner tree in the graph.
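For very small n the reversal graph can be built explicitly, which makes the shortest-path view of the distance and the median concrete. The Python sketch below assumes signed permutations encoded as tuples of non-zero integers (the slide figure may show a different encoding), and the names `neighbors`, `distances_from` and `brute_force_median` are chosen here for illustration. It is exhaustive and therefore only usable for tiny n.

```python
from collections import deque


def neighbors(perm):
    """Yield every signed permutation that is one reversal away from `perm`."""
    n = len(perm)
    for i in range(n):
        for j in range(i, n):
            # Reverse the segment perm[i..j] and flip the sign of each element.
            yield perm[:i] + tuple(-x for x in reversed(perm[i:j + 1])) + perm[j + 1:]


def distances_from(source):
    """Breadth-first search: reversal distance from `source` to every vertex."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in neighbors(v):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def brute_force_median(p1, p2, p3):
    """Return (M, score) minimizing d(p1,M) + d(p2,M) + d(p3,M) by exhaustion."""
    d1, d2, d3 = distances_from(p1), distances_from(p2), distances_from(p3)
    best = min(d1, key=lambda m: d1[m] + d2[m] + d3[m])
    return best, d1[best] + d2[best] + d3[best]


identity = (1, 2, 3)
dist = distances_from(identity)
median, score = brute_force_median((1, 2, 3), (-1, 2, 3), (1, -2, 3))
```

Here `dist` covers all signed permutations of three elements, and the exhaustive search is exactly what becomes infeasible as n grows.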

32 Reversal Median. Reversal graph for n = 3. The graph is huge: $|V| = n! \cdot 2^n$. A feasible graph-search algorithm is not possible! What technique can we use to develop an algorithm for this kind of problem?

33 Reversal Median. We will study a branch-and-bound algorithm by Adam Siepel. This algorithm depends only on the availability of a rapidly computable distance metric.

34 Reversal Median. The median score $S(M)$ of a set of equally sized permutations $\Pi = \{\pi_1, \pi_2, \pi_3\}$, separated by distances $d_{1,2}$, $d_{1,3}$, and $d_{2,3}$, obeys these bounds: $\frac{d_{1,2} + d_{1,3} + d_{2,3}}{2} \le S(M) \le \min\{(d_{1,2} + d_{2,3}), (d_{1,2} + d_{1,3}), (d_{2,3} + d_{1,3})\}$

35 Reversal Median. Assume $\varphi$ is on the shortest path between $\pi_1$ and the median M, and is separated from $\pi_1$, $\pi_2$, $\pi_3$ by distances $d_{1,\varphi}$, $d_{2,\varphi}$, and $d_{3,\varphi}$. Then the median score obeys: $d_{1,\varphi} + \frac{d_{2,\varphi} + d_{3,\varphi} + d_{2,3}}{2} \le S(M) \le d_{1,\varphi} + \min\{(d_{2,\varphi} + d_{3,\varphi}), (d_{3,\varphi} + d_{2,3}), (d_{2,3} + d_{2,\varphi})\}$
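Both pairs of bounds are simple arithmetic on pairwise distances, as the small Python sketch below shows. The function names are chosen here, and rounding the lower bound up with `math.ceil` is an assumption added on the grounds that a median score is a whole number of operations; the bounds as stated above use the unrounded half-sum.

```python
import math


def median_bounds(d12, d13, d23):
    """Lower and upper bound on the median score of three permutations."""
    lower = math.ceil((d12 + d13 + d23) / 2)
    upper = min(d12 + d23, d12 + d13, d23 + d13)
    return lower, upper


def bounds_through(d1_phi, d2_phi, d3_phi, d23):
    """Bounds on the median score when phi lies on a shortest path from pi_1 to M."""
    lower = d1_phi + math.ceil((d2_phi + d3_phi + d23) / 2)
    upper = d1_phi + min(d2_phi + d3_phi, d3_phi + d23, d23 + d2_phi)
    return lower, upper


# Example with pairwise distances 1, 1 and 2: the bounds coincide,
# so the median score is known without any search.
example = median_bounds(1, 1, 2)
# Same instance seen from phi = pi_1 (distance 0 from pi_1).
through_start = bounds_through(0, 1, 1, 2)
```

When the two bounds meet, as in this example, the search can stop immediately.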

36 Reversal Median. Algorithm (sketch):
- Establish lower and upper bounds on the median score, M_min and M_max, using a rapid reversal distance algorithm.
- Start with one of the three permutations, say $\pi_1$.
- Assume the median is M = $\pi_1$.
- Push the corresponding vertex v onto a priority stack s that keeps the best-scoring vertices first.
- While s is not empty:
  - Pop the most promising vertex v from s.
  - If the best score of v ≥ M_max, then stop.
  - Generate all possible vertices $\varphi$ that can be obtained from v by a single reversal.
  - For each possible unmarked $\varphi$:
    - Calculate the bounds $\varphi_{min}$ and $\varphi_{max}$ from the previous inequality.
    - If $\varphi_{max}$ = M_min, then M = $\varphi$ and stop (the median is found).
    - Add $\varphi$ to s only if $\varphi_{min}$ < M_max (pruning).
    - Update M_max = $\varphi_{max}$ if $\varphi_{max}$ < M_max.
  - End for loop.
- End while loop.
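The sketch above can be turned into a small runnable Python example. It is a simplified illustration under stated assumptions, not Siepel's implementation: the "rapid reversal distance algorithm" is replaced by breadth-first search over the whole reversal graph (only feasible for tiny n), the upper bound attached to a vertex is simply its own score as a candidate median, the lower bound is rounded up, and all function names are chosen here.

```python
import heapq
import math
from collections import deque


def neighbors(perm):
    """Yield every signed permutation that is one reversal away from `perm`."""
    n = len(perm)
    for i in range(n):
        for j in range(i, n):
            yield perm[:i] + tuple(-x for x in reversed(perm[i:j + 1])) + perm[j + 1:]


def distances_from(source):
    """Stand-in distance oracle: BFS distances from `source` to all vertices."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in neighbors(v):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def reversal_median(p1, p2, p3):
    """Branch-and-bound search for a reversal median; returns (M, score)."""
    d1, d2, d3 = distances_from(p1), distances_from(p2), distances_from(p3)
    d23 = d2[p3]

    def score(v):           # score of v itself as a candidate median
        return d1[v] + d2[v] + d3[v]

    def lower(v):           # lower bound for medians reached through v
        return d1[v] + math.ceil((d2[v] + d3[v] + d23) / 2)

    m_min = lower(p1)                   # global lower bound
    best, m_max = p1, score(p1)         # current median and global upper bound
    heap = [(lower(p1), p1)]            # priority queue ordered by lower bound
    marked = {p1}
    while heap and m_max > m_min:
        bound, v = heapq.heappop(heap)
        if bound >= m_max:              # nothing left can improve the median
            break
        for w in neighbors(v):
            if w in marked:
                continue
            marked.add(w)
            if score(w) < m_max:        # better candidate: tighten upper bound
                best, m_max = w, score(w)
            if lower(w) < m_max:        # pruning: keep only promising vertices
                heapq.heappush(heap, (lower(w), w))
    return best, m_max


median, median_score = reversal_median((1, 2, 3), (-1, 2, 3), (1, -2, 3))
```

A priority queue from `heapq` plays the role of the priority stack s, and vertices whose lower bound cannot beat the current M_max are never expanded.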

41 Reversal Median. Algorithm (sketch), running time: $O(n^{3d})$ with $d = \min\{d_{1,2} + d_{1,3} + d_{2,3}\}$, with a faster average running time.

42 Conclusions. Described the Double Cut and Join (DCJ) operation: a unifying view of genome rearrangements. Presented a branch-and-bound, median-based approach for building phylogenies using the reversal distance. Many other problems exist in genome rearrangement, such as the genome halving problem.

43-52 Genome Halving [animated figure sequence; panel label: Duplication]

53 References
1. Bergeron A. A very elementary presentation of the Hannenhalli-Pevzner theory. Discrete Applied Mathematics, vol. 146.
2. Marília D. V. Braga. Exploring the solution space of sorting by reversals when analyzing genome rearrangements. PhD thesis, University of Claude Bernard.
3. Guillaume Fertin, Anthony Labarre, Irena Rusu, Eric Tannier, Stéphane Vialette. Combinatorics of Genome Rearrangements. The MIT Press, Cambridge, England.
4. Siepel A. Exact algorithms for the reversal median problem. Master Thesis, University of New Mexico.
5. Yancopoulos S., Attie O., Friedberg R. Efficient sorting of genomic permutations by translocation, inversion and block exchange. Bioinformatics 21.
6. Anne Bergeron, Julia Mixtacki, Jens Stoye. A unifying view of Genome Rearrangements. WABI 2006, LNBI 4175.
7. Julia Mixtacki. Genome halving under DCJ revisited. Lecture Notes in Computer Science.
8. Richard C. Deonier, Simon Tavaré, Michael S. Waterman. Computational Genome Analysis, an introduction. Springer.
9. Neil C. Jones, Pavel A. Pevzner. An introduction to bioinformatics algorithms. MIT Press, 2004.