Genome Rearrangements CSCI 7000-005: Computational Genomics Debra Goldberg

Slides:



Advertisements
Similar presentations
A Simpler 1.5-Approximation Algorithm for Sorting by Transpositions Tzvika Hartman Weizmann Institute.
Advertisements

Sorting by reversals Bogdan Pasaniuc Dept. of Computer Science & Engineering.
School of CSE, Georgia Tech
(CHAPTER 8- Brooker Text) Chromosome Structure & Recombination Nov 1 & 6, 2007 BIO 184 Dr. Tom Peavy.
Greedy Algorithms CS 466 Saurabh Sinha. A greedy approach to the motif finding problem Given t sequences of length n each, to find a profile matrix of.
Lecture 24 Coping with NPC and Unsolvable problems. When a problem is unsolvable, that's generally very bad news: it means there is no general algorithm.
Greedy Algorithms CS 6030 by Savitha Parur Venkitachalam.
Gene an d genome duplication Nadia El-Mabrouk Université de Montréal Canada.
LECTURE 22 LARGE-SCALE CHROMOSOME CHANGES II  chapter 15  overview  chromosome number  chromosome structure  problems.
Sorting Cancer Karyotypes by Elementary Operations Michal Ozery-Flato and Ron Shamir School of Computer Science, Tel Aviv University.
Bioinformatics Chromosome rearrangements Chromosome and genome comparison versus gene comparison Permutations and breakpoint graphs Transforming Men into.
Greedy Algorithms And Genome Rearrangements
Genome Rearrangements CIS 667 April 13, Genome Rearrangements We have seen how differences in genes at the sequence level can be used to infer evolutionary.
Introduction to Bioinformatics Algorithms Greedy Algorithms And Genome Rearrangements.
Discrete Structures & Algorithms EECE 320 : UBC : Spring 2009 Matei Ripeanu 1.
Of Mice and Men Learning from genome reversal findings Genome Rearrangements in Mammalian Evolution: Lessons From Human and Mouse Genomes and Transforming.
Genome Rearrangements. Basic Biology: DNA Genetic information is stored in deoxyribonucleic acid (DNA) molecules. A single DNA molecule is a sequence.
Genomic Rearrangements CS 374 – Algorithms in Biology Fall 2006 Nandhini N S.
Transforming Cabbage into Turnip: Polynomial Algorithm for Sorting Signed Permutations by Reversals Journal of the ACM, vol. 46, No. 1, Jan 1999, pp
Genome Rearrangement SORTING BY REVERSALS Ankur Jain Hoda Mokhtar CS290I – SPRING 2003.
Genome Rearrangements, Synteny, and Comparative Mapping CSCI 4830: Algorithms for Molecular Biology Debra S. Goldberg.
1 Genome Rearrangements João Meidanis São Paulo, Brazil December, 2004.
Phylogenetic Tree Construction and Related Problems Bioinformatics.
7-1 Chapter 7 Genome Rearrangement. 7-2 Background In the late 1980‘s Jeffrey Palmer and colleagues discovered a remarkable and novel pattern of evolutionary.
Pancakes With A Problem Steven Rudich The chef at our place is sloppy, and when he prepares a stack of pancakes they come out all different sizes.
Genome Analysis Determine locus & sequence of all the organism’s genes More than 100 genomes have been analysed including humans in the Human Genome Project.
Pancakes With A Problem! Great Theoretical Ideas In Computer Science Vince Conitzer COMPSCI 102 Fall 2007 Lecture 1 August 27, 2007 Duke University.
CompSci 102 Spring 2012 Prof. Rodger January 11, 2012.
Pancakes With A Problem! Great Theoretical Ideas In Computer Science Anupam Gupta CS Fall 2O05 Lecture 1 Aug 29 th, 2OO5 Aug 29 th, 2OO5 Carnegie.
Genome Rearrangements Tseng Chiu Ting Sept. 24, 2004.
Genome Rearrangements Unoriented Blocks. Quick Review Looking at evolutionary change through reversals Find the shortest possible series of reversals.
Greedy Algorithms And Genome Rearrangements An Introduction to Bioinformatics Algorithms (Jones and Pevzner)
Semester Project: Greedy Algorithms and Genome Rearrangements August/17/2012 Name: Xuanyu Hu Professor: Elise de Doncker.
Genome Rearrangements [1] Ch Types of Rearrangements Reversal Translocation
The Selection Problem. 2 Median and Order Statistics In this section, we will study algorithms for finding the i th smallest element in a set of n elements.
Greedy Algorithms And Genome Rearrangements
Chap. 7 Genome Rearrangements Introduction to Computational Molecular Biology Chap ~
Sorting by Cuts, Joins and Whole Chromosome Duplications
Reconstructing Genomic Architectures of Tumor Genomes Pavel Pevzner and Ben Raphael Department of Computer Science & Engineering University of California,
Andrew’s Leap 2011 Pancakes With A Problem Steven Rudich.
Chap. 7 Genome Rearrangements Introduction to Computational Molecular Biology Chapter 7.1~7.2.4.
Greedy Algorithms CS 498 SS Saurabh Sinha. A greedy approach to the motif finding problem Given t sequences of length n each, to find a profile matrix.
Gene: A sequence of nucleotides coding for protein Gene Prediction Problem: Determine the beginning and end positions of genes in a genome Gene Prediction:
Comparative genomics Haixu Tang School of Informatics.
Pancakes With A Problem! Great Theoretical Ideas In Computer Science Steven Rudich CS Spring 2005 Lecture 3 Jan 18, 2005 Carnegie Mellon University.
1 Chapter 5-1 Greedy Algorithms Slides by Kevin Wayne. Copyright © 2005 Pearson-Addison Wesley. All rights reserved.
Genome Rearrangement By Ghada Badr Part I.
Introduction to Bioinformatics Algorithms Chapter 5 Greedy Algorithms and Genome Rearrangements By: Hasnaa Imad.
Genome Rearrangements. Turnip vs Cabbage: Look and Taste Different Although cabbages and turnips share a recent common ancestor, they look and taste different.
Genome Rearrangements. Turnip vs Cabbage: Look and Taste Different Although cabbages and turnips share a recent common ancestor, they look and taste different.
Outline Today’s topic: greedy algorithms
1 Genome Rearrangements (Lecture for CS498-CXZ Algorithms in Bioinformatics) Dec. 6, 2005 ChengXiang Zhai Department of Computer Science University of.
Lecture 4: Genome Rearrangements. End Sequence Profiling (ESP) C. Collins and S. Volik (UCSF Cancer Center) 1)Pieces of tumor genome: clones ( kb).
Lecture 2: Genome Rearrangements. Outline Cancer Sequencing Transforming Cabbage into Turnip Genome Rearrangements Sorting By Reversals Pancake Flipping.
CSCI2950-C Genomes, Networks, and Cancer
Original Synteny Vincent Ferretti, Joseph H. Nadeau, David Sankoff, 1996 Presented by: Suzy Sun.
Pancakes With A Problem!
Genome Rearrangement and Duplication Distance
Genome Rearrangements
CSE 5290: Algorithms for Bioinformatics Fall 2011
CSE 5290: Algorithms for Bioinformatics Fall 2009
Mutations Changes in the genetic material Gene Mutations
Prepared by Chen & Po-Chuan 2016/03/29
Greedy (Approximation) Algorithms and Genome Rearrangements
Lecture 3: Genome Rearrangements and Duplications
CSCI2950-C Lecture 4 Genome Rearrangements
Mattew Mazowita, Lani Haque, and David Sankoff
Greedy Algorithms And Genome Rearrangements
Greedy Algorithms And Genome Rearrangements
JAKUB KOVÁĆ, ROBERT WARREN, MARÍLIA D.V. BRAGA and JENS STOYE
Presentation transcript:

Genome Rearrangements CSCI : Computational Genomics Debra Goldberg

Turnip vs Cabbage Share a recent common ancestor Look and taste different

Turnip vs Cabbage Comparing mtDNA gene sequences yields no evolutionary information 99% similarity between genes These surprisingly identical gene sequences differed in gene order This study helped pave the way to analyzing genome rearrangements in molecular evolution

Turnip vs Cabbage: Different mtDNA Gene Order Gene order comparison:

Turnip vs Cabbage: Different mtDNA Gene Order Gene order comparison:

Turnip vs Cabbage: Different mtDNA Gene Order Gene order comparison:

Turnip vs Cabbage: Different mtDNA Gene Order Gene order comparison:

Turnip vs Cabbage: Gene Order Comparison Before After Evolution is manifested as the divergence in gene order

Transforming Cabbage into Turnip

Reversals , 2, 3, -8, -7, -6, -5, -4, 9, 10 Blocks represent conserved genes. In the course of evolution or in a clinical context, blocks 1,…,10 could be misread as 1, 2, 3, -8, -7, -6, -5, -4, 9, 10. 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 Blocks represent conserved genes.

Types of Mutations

Types of Rearrangements Reversal

Types of Rearrangements Translocation

Types of Rearrangements Fusion Fission

What are the similarity blocks and how to find them? What is the architecture of the ancestral genome? What is the evolutionary scenario for transforming one genome into the other? Unknown ancestor ~ 75 million years ago Mouse (X chrom.) Human (X chrom.) Genome rearrangements

Why do we care?

SKY (spectral karyotyping)

Robertsonian Translocation 1314

Robertsonian Translocation Translocation of chromosomes 13 and 14 No net gain or loss of genetic material: normal phenotype. Increased risk for an abnormal child or spontaneous pregnancy loss 1314

Philadelphia Chromosome 9 22

Philadelphia Chromosome Seen in about 90% of patients with Chronic myelogenous leukemia (CML) A translocation between chromosomes 9 and 22 (part of 22 is attached to 9) 922

Inversion 16

Inversion Chromosome on the right is inverted near the centromere 16

Colon cancer

Colon Cancer

Comparative Mapping: Mouse vs Human Genome Humans and mice have similar genomes, but their genes are ordered differently ~245 rearrangements –Reversals –Fusions –Fissions –Translocations

Waardenburg’s Syndrome: Mouse Provides Insight into Human Genetic Disorder Characterized by pigmentary dysphasia Gene implicated linked to human chromosome 2 It was not clear where exactly on chromosome 2

Waardenburg’s syndrome and splotch mice A breed of mice (with splotch gene) had similar symptoms caused by the same type of gene as in humans Scientists identified location of gene responsible for disorder in mice Finding the gene in mice gives clues to where same gene is located in humans

Reversals: Example  =  

Reversals: Example  =  

Reversals and Gene Orders Gene order represented by permutation   1   i-1  i  i  j-1  j  j  n   1   i-1  j  j  i+1  i  j  n Reversal  ( i, j ) reverses (flips) the elements from i to j in  ,j)

Reversal Distance Problem Goal: Given two permutations, find shortest series of reversals to transform one into another Input: Permutations  and  Output: A series of reversals  1,…  t transforming  into  such that t is minimum t - reversal distance between  and  d( ,  ) = smallest possible value of t, given  

Sorting By Reversals Problem Goal: Given a permutation, find a shortest series of reversals that transforms it into the identity permutation (1 2 … n ) Input: Permutation  Output: A series of reversals  1, …  t transforming  into the identity permutation such that t is minimum min t =d(  ) = reversal distance of 

Sorting by reversals Example: 5 steps

Sorting by reversals Example: 4 steps What is the reversal distance for this permutation? Can it be sorted in 3 steps?

Pancake Flipping Problem Chef prepares unordered stack of pancakes of different sizes The waiter wants to sort (rearrange) them, smallest on top, largest at bottom He does it by flipping over several from the top, repeating this as many times as necessary Christos Papadimitrou and Bill Gates flip pancakes

Sorting By Reversals: A Greedy Algorithm Example: permutation  = First three elements are already in order prefix(  ) = length of already sorted prefix – prefix(  ) = 3 Idea: increase prefix(  ) at every step

Doing so,  can be sorted Number of steps to sort permutation of length n is at most (n – 1) Greedy Algorithm: An Example

Greedy Algorithm: Pseudocode SimpleReversalSort(  ) 1 for i  1 to n – 1 2 j  position of element i in  (i.e.,  j = i) 3 if j ≠i 4    *  (i, j) 5 output  6 if  is the identity permutation 7 return

Analyzing SimpleReversalSort SimpleReversalSort does not guarantee the smallest number of reversals and takes five steps on  = : Step 1: Step 2: Step 3: Step 4: Step 5:

But it can be sorted in two steps:  = –Step 1: –Step 2: So, SimpleReversalSort(  ) is not optimal Optimal algorithms are unknown for many problems; approximation algorithms used Analyzing SimpleReversalSort (cont ’ d)

Approximation Algorithms These algorithms find approximate solutions rather than optimal solutions The approximation ratio of an algorithm A on input  is: A(  ) / OPT(  ) where A(  ) -solution produced by algorithm A OPT(  ) - optimal solution of the problem

Approximation Ratio / Performance Guarantee Approximation ratio (performance guarantee) of algorithm A: max approximation ratio of all inputs of size n –For algorithm A that minimizes objective function (minimization algorithm): max |  | = n A(  ) / OPT(  ) –For maximization algorithm: min |  | = n A(  ) / OPT(  )

 =    2  3 …  n-1  n A pair of elements  i and  i + 1 are adjacent if  i+1 =  i + 1 For example:  = (3, 4) or (7, 8) and (6,5) are adjacent pairs Adjacencies and Breakpoints

There is a breakpoint between any adjacent element that are non-consecutive:  = Pairs (1,9), (9,3), (4,7), (8,2) and (2,5) form breakpoints of permutation  b(  ) - # breakpoints in permutation   Breakpoints: An Example

Adjacency & Breakpoints An adjacency – consecutive A breakpoint – not consecutive π = adjacencies breakpoints Extend π with π 0 = 0 and π n+1 = n+1

Add  0 =0 and  n + 1 =n+1 at ends of  Example: Extending with 0 and 10 Note: A new breakpoint was created after extending Extending Permutations  =  =

 Each reversal eliminates at most 2 breakpoints.  = b(  ) = b(  ) = b(  ) = b(  ) = 0 Reversal Distance and Breakpoints This implies: reversal distance ≥ #breakpoints / 2

Sorting By Reversals: A Better Greedy Algorithm BreakPointReversalSort(  ) 1 while b(  ) > 0 2 Among all possible reversals, choose reversal  minimizing b(   ) 3     (i, j) 4 output  5 return Problem: this algorithm may work forever

Strips Strip: an interval between two consecutive breakpoints in a permutation –Decreasing strip: elements in decreasing order (e.g. 6 5) –Increasing strip: elements in increasing order (e.g. 7 8) –Consider single-element strips decreasing except strips 0 and n+1 are increasing

Reducing the Number of Breakpoints Theorem 1: If permutation  contains at least one decreasing strip, then there exists a reversal  which decreases the number of breakpoints (i.e. b(   ) < b(  ) )

Things To Consider For  = b(  ) = 5 –Choose decreasing strip with the smallest element k in  (k = 2 in this case)

Things To Consider For  = b(  ) = 5 –Choose decreasing strip with the smallest element k in  (k = 2 in this case)

Things To Consider For  = b(  ) = 5 –Choose decreasing strip with the smallest element k in  (k = 2 in this case) –Find k – 1 in the permutation

Things To Consider For  = b(  ) = 5 –Choose decreasing strip with the smallest element k in  (k = 2 in this case) –Find k – 1 in the permutation –Reverse segment between k and k-1: b(  ) = 4

Reducing the Number of Breakpoints Again –If there is no decreasing strip, there may be no reversal  that reduces the number of breakpoints (i.e. b(   ) ≥ b(  ) for any reversal  ). –By reversing an increasing strip ( # of breakpoints stay unchanged ), we will create a decreasing strip at the next step. Then the number of breakpoints will be reduced in the next step (theorem 1).

Things To Consider (cont’d) There are no decreasing strips in , for:  = b(  ) = 3   (3,4)= b(  ) = 3  (3,4) does not change the # of breakpoints  (3,4) creates a decreasing strip thus guaranteeing that the next step will decrease the # of breakpoints.

ImprovedBreakpointReversalSort ImprovedBreakpointReversalSort(  ) 1 while b(  ) > 0 2 if  has a decreasing strip 3 Among all possible reversals, choose reversal  that minimizes b(   ) 4 else 5 Choose a reversal  that flips an increasing strip in  6     7 output  8 return

ImprovedBreakPointReversalSort is an approximation algorithm with a performance guarantee of at most 4 –It eliminates at least one breakpoint in every two steps; at most 2b(  ) steps –Approximation ratio: 2b(  ) / d(  ) –Optimal algorithm eliminates at most 2 breakpoints in every step: d(  )  b(  ) / 2 –Performance guarantee: ( 2b(  ) / d(  ) )  [ 2b(  ) / (b(  ) / 2) ] = 4 Performance Guarantee

Signed Permutations Up to this point, reversal sort algorithms sorted unsigned permutations But genes have directions… so we should consider signed permutations 5’5’ 3’3’  =

Signed Permutation Genes are directed fragments of DNA Genes in the same position but different orientations do not have same gene order These two permutations are not equivalent gene sequences

Signed permutations are easier! Polynomial time (optimal) algorithm is known