Genome Rearrangements …and YOU!! Presented by: Kevin Gaittens.



Overview
Bio background
Definitions and Set-up
Reality-Desire
Good Components
Bad Components
Fin

Biological Background
Comparing entire genomes across species
Need a “distance” measure
Interested in larger differences than just single insertions/deletions, etc.
Genome Rearrangements – a chromosome piece (gene) being moved or copied to another location, or transferred to another chromosome altogether

Definitions
Block – section of genome possibly containing more than one gene; treated as one unit
Homologous – when two blocks contain the same genes. Homologous blocks have the same number label
Reversal – reversing a series of blocks and also their orientations; distance is measured in number of reversals

Example of Reversal Red – right orientation Black – left orientation

Goals
Want the smallest number of reversals that transforms one genome into another
–Parsimony assumption – assume Nature changes optimally
Desire a polynomial-time solution
The oriented problem has a poly-time solution; the unoriented problem is NP-hard

Example Add circle if orientation changes

One solution

Breakpoints
Act as a minimum (they bound the number of reversals from below)
A breakpoint happens when:
–the first/last label in the original is not the first/last label in the target
–OR 2 labels are consecutive in the original, but not in the target
–OR 2 labels are consecutive in both original and target, but their relative orientation differs between the two: …5 4… and …5 4…
–NOTE: If a pair of labels appears as an exact reversal in the target, there is NO breakpoint: …4 5… and …5 4… do not have a breakpoint

Breakpoints for Last Example Goal reminder: is different than first of target No breakpoint between 1 and 2 since exact reversal in target 2 and 3 not consecutive in target 3 and 4 match, thus no breakpoint 5 is different from last in target 4 and 5 are not consecutive in target

Mathy Stuff :o)
Let L be a finite set of labels
$L^{o} = \bigcup_{a \in L} \{\overrightarrow{a}, \overleftarrow{a}\}$
$|x|$ removes the arrow from $x$
Ex: $|\overrightarrow{a}| = |\overleftarrow{a}| = a$

Cont’d
An oriented permutation over L is a mapping $\alpha: [1..n] \to L^{o}$ such that for any $a \in L$ there is exactly one $i \in [1..n]$ with $|\alpha(i)| = a$
Basically, the permutation “picks” an orientation for each label. If $\overrightarrow{a}$ is picked, then $\overleftarrow{a}$ will not be

Example n = 4 L = {1, 2, 3, 4} α = ( 2, 1, 4, 3 ) So α(3) = 4

Identity Permutation Special case Permutation I such that I(i) = i for all i between 1 and n For n = 3, I = ( 1 2 3)

Reversals
Let i and j be two indices with 1 ≤ i ≤ j ≤ n
[i, j] indicates a reversal affecting elements α(i) through α(j)

Example Given α = ( 2, 3, 4, 1) α[2,3] = ( 2, 4, 3, 1) Note: similar to boxing scheme used earlier

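More Math!
In general:
$$(\alpha[i,j])(k) = \begin{cases} \overline{\alpha(i+j-k)} & \text{if } i \le k \le j \\ \alpha(k) & \text{otherwise} \end{cases}$$
$\overline{\alpha(k)}$ means $\alpha(k)$ with its orientation reversed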
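
A minimal sketch of the reversal formula in Python. The encoding is an assumption, not something the slides specify: a block with right orientation is stored as a positive integer and a block with left orientation as a negative one. The name `apply_reversal` is hypothetical.

```python
def apply_reversal(perm, i, j):
    """Return a copy of perm with the reversal [i, j] applied.

    perm is a list of nonzero ints (sign = orientation, an assumed encoding);
    i and j are 1-based and inclusive, as on the slides.
    """
    if not 1 <= i <= j <= len(perm):
        raise ValueError("need 1 <= i <= j <= n")
    # Reverse the order of the affected blocks and flip each orientation.
    segment = [-x for x in reversed(perm[i - 1:j])]
    return perm[:i - 1] + segment + perm[j:]


# Blocks 2..3 are reversed and their signs flipped.
print(apply_reversal([2, 3, 4, 1], 2, 3))  # → [2, -4, -3, 1]
```

Position k inside [i, j] receives the flipped block from position i + j - k, which is exactly the first case of the formula; positions outside the interval are copied unchanged.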

Sorting by Reversals
This is the main goal
Given 2 permutations α and β, seek the minimum number of reversals to transform α into β
$\alpha p_1 p_2 p_3 \dots p_t = \beta$, where $p_1, p_2, \dots, p_t$ are reversals
The smallest such t is called the reversal distance of α with respect to β and is denoted by $d_\beta(\alpha)$

Sorting con’t
Look for reversals p that “make progress” towards β:
$d_\beta(\alpha p) < d_\beta(\alpha)$, or equivalently $d_\beta(\alpha p) = d_\beta(\alpha) - 1$
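
For very small inputs the reversal distance can be checked by brute force. The sketch below is a plain breadth-first search over all signed permutations, not the polynomial algorithm these slides build up to; it is only meant as a reference for sanity-checking bounds. The signed-integer encoding (positive = right orientation, negative = left) and the name `reversal_distance_bfs` are assumptions.

```python
from collections import deque


def reversal_distance_bfs(start, target):
    """Exact reversal distance by exhaustive search.

    The state space has 2^n * n! elements, so this is only usable for tiny n.
    """
    start, target = tuple(start), tuple(target)
    n = len(start)
    dist = {start: 0}
    queue = deque([start])
    while queue:
        perm = queue.popleft()
        if perm == target:
            return dist[perm]
        # Try every reversal [i, j] (0-based here, inclusive).
        for i in range(n):
            for j in range(i, n):
                flipped = tuple(-x for x in reversed(perm[i:j + 1]))
                nxt = perm[:i] + flipped + perm[j + 1:]
                if nxt not in dist:
                    dist[nxt] = dist[perm] + 1
                    queue.append(nxt)
    raise ValueError("target cannot be reached from start by reversals")


print(reversal_distance_bfs([-3, -2, -1], [1, 2, 3]))  # → 1
```

Because BFS visits permutations in order of distance, the first time the target is dequeued its recorded distance is the minimum number of reversals.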

Breakpoints
Add labels L and R to α to get the “extended version”
One example of an extended α is: (L, 2, 3, 1, 6, 5, 4, R)
If β is the identity, then the breakpoints are at…

Breakpoints
None at 5 4: the reverse pair 4 5 is in β
2 is not the first block of β
2 and 3 are consecutive, but their orientations differ from what β needs and are not a complete reversal
3 and 1 are not consecutive in β
1 and 6 are not consecutive in β
6 and 5 are consecutive, but not a complete reversal (the orientation of 6 prevents it)
4 is not the final block in β

Breakpoints con’t Can remove at most 2 breakpoints with each reversal Thus, b(α) – b(αp) ≤ 2 This also means that b(α)/2 ≤ d(α) This is a lower bound for d(α)
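
A small sketch of the breakpoint count and the resulting lower bound, assuming the target β is the identity with every block in right orientation and assuming blocks are encoded as signed integers (positive = right, negative = left). L and R are represented by 0 and n+1. The function names are hypothetical.

```python
def count_breakpoints(perm):
    """Count breakpoints of a signed permutation relative to the identity."""
    n = len(perm)
    extended = [0] + list(perm) + [n + 1]  # 0 plays the role of L, n+1 of R
    # A neighbouring pair (x, y) is fine exactly when y == x + 1; this covers
    # both "+4 +5" and its exact reversal "-5 -4".
    return sum(1 for x, y in zip(extended, extended[1:]) if y - x != 1)


def breakpoint_lower_bound(perm):
    """b(alpha)/2 rounded up: each reversal removes at most 2 breakpoints."""
    return (count_breakpoints(perm) + 1) // 2


print(count_breakpoints([-3, -2, -1]))       # → 2
print(breakpoint_lower_bound([-3, -2, -1]))  # → 1
```

In the example only the two ends are breakpoints, and a single reversal of the whole permutation removes both, so the bound is tight here.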

Bps cont’d b(α)/2 is lower bound However, this is rarely achievable Want a better lower bound Look to something called reality-desire diagram

Reality-Desire Happens when 2 labels are adjacent, but do not “want” to be adjacent Reality – neighbor a certain label has in α Desire – neighbor the label has in β

Diagram Oriented labels can be viewed as a battery Positive terminal at tip of arrow Negative at tail - a +

Example
[Figure: desire and reality edges for α and αp]

Example Extended α: L R Replace labels by terminals & reality edges: L R Add desire edges

Diagram To create diagram of reality-desire: –Arrange all terminal nodes around a circle with L and R at the top –L to the left of R and all other nodes following α counterclockwise –Reality edges will be along circumference –Desire edges will be the chords

Diagram of Reality-Desire Happens where not breakpoint

Interpretation
The number of cycles in RD(α) is $c_\beta(\alpha)$; it is the number of connected parts of the diagram
β itself has no breakpoints
Notice $c_\beta(\beta) = n+1$
–Why?
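
A sketch of counting cycles in the reality-desire diagram when β is the identity with all blocks in right orientation. The numbering of terminals is an assumption made for the code: the tail (negative terminal) of label a becomes node 2a-1, the tip (positive terminal) becomes 2a, L becomes 0 and R becomes 2n+1. Blocks are signed integers (positive = right orientation). The name `count_cycles` is hypothetical.

```python
def count_cycles(perm):
    """Number of cycles in RD(perm) with the identity as target."""
    n = len(perm)
    # Walk along the circle: L, then the two terminals of each block, then R.
    nodes = [0]
    for x in perm:
        a = abs(x)
        nodes += [2 * a - 1, 2 * a] if x > 0 else [2 * a, 2 * a - 1]
    nodes.append(2 * n + 1)

    # Reality edges join neighbouring terminals of different blocks.
    reality = {}
    for k in range(0, 2 * n + 2, 2):
        u, v = nodes[k], nodes[k + 1]
        reality[u], reality[v] = v, u

    def desire(u):
        # Desire edges join the tip of a to the tail of a+1: nodes 2a and 2a+1.
        return u + 1 if u % 2 == 0 else u - 1

    seen, cycles = set(), 0
    for start in nodes:
        if start in seen:
            continue
        cycles += 1
        u = start
        while u not in seen:  # alternate reality edge, then desire edge
            v = reality[u]
            seen.update((u, v))
            u = desire(v)
    return cycles


print(count_cycles([1, 2, 3]))     # → 4
print(count_cycles([-3, -2, -1]))  # → 3
```

For the identity every reality edge coincides with a desire edge, so each of the n+1 reality edges forms its own two-edge cycle, which is one way to see why the count for β is n+1.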

Effects of a Reversal
Let (s,t) and (u,v) be the two reality edges characterizing a reversal p, with (s,t) preceding (u,v) in the permutation α. Then RD(αp) differs from RD(α) as follows:
1. Reality edges (s,t) and (u,v) are replaced by (s,u) and (t,v)
2. Desire edges remain unchanged
3. The section of the circle going from node t to node u, including these extremities, in counterclockwise direction, is reversed

Our Example Reversing (-1,-4) and (+4, +5)

Definitions
Let e and f be two reality edges belonging to the same cycle in RD(α)
If the orientations induced by e and f coincide, they are convergent
–Walk counterclockwise from the start of e (passing through desire edges) until you reach the beginning of f. If the end of f is still counterclockwise, then e and f converge
Divergent otherwise

Walking Convergent Still counterclockwise (+3,+2) to (-1,-4)

How Reversals Affect Cycles If e and f belong to different cycles, c(αp)=c(α) -1

If e and f belong to the same cycle and converge, c(αp)=c(α)

If e and f belong to the same cycle and diverge, c(αp)=c(α) +1

Summary If e and f: belong to different cycles, c(αp)=c(α) -1 belong to same cycle & converge, c(αp)=c(α) belong to same cycle & diverge, c(αp)=c(α)+1

Lower Bound
Since the number of cycles changes by at most 1 per reversal, we can get a new lower bound for the number of reversals
Suppose $\alpha p_1 p_2 \dots p_t = \beta$, so $c_\beta(\alpha p_1 p_2 \dots p_t) = c_\beta(\beta) = n+1$
$c_\beta(\alpha p_1) - c_\beta(\alpha) \le 1$
$c_\beta(\alpha p_1 p_2) - c_\beta(\alpha p_1) \le 1$
…
$c_\beta(\alpha p_1 \dots p_t) - c_\beta(\alpha p_1 \dots p_{t-1}) \le 1$

Lower Bound
Add these inequalities to get $n+1 - c_\beta(\alpha) \le t$
If $p_1, p_2, \dots, p_t$ is an optimal sorting, then $t = d_\beta(\alpha)$, so $n+1 - c_\beta(\alpha) \le d_\beta(\alpha)$
Very good lower bound
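
Written out, the "add" step is a telescoping sum: every intermediate cycle count cancels, leaving only the first and last terms. Here the empty product for $k = 1$ is read as $\alpha$ itself.

```latex
\sum_{k=1}^{t} \Bigl[ c_\beta(\alpha p_1 \dots p_k) - c_\beta(\alpha p_1 \dots p_{k-1}) \Bigr]
  = c_\beta(\alpha p_1 \dots p_t) - c_\beta(\alpha)
  = (n+1) - c_\beta(\alpha)
  \le \sum_{k=1}^{t} 1 = t
```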

Good/Bad Cycles A cycle is “good” if it has two divergent reality edges If not, it is considered “bad” Good cycles have at least two desire edges that cross –Not all cycles that have crossing edges are good Call cycles “proper” if they have at least four edges

Good/Bad cont’d If we only have good cycles, lower bound d(α) ≥ n+1 – c(α) is an equality How could it be possible for it to be an equality if there are a few bad cycles mixed in to start?

Interleave
Twisting one cycle while breaking another is only possible if the two cycles are such that some desire edge from one of the cycles crosses some desire edge from the other
These two cycles “interleave” in this case

Interleave

Interleaving Graph
Important to verify which cycles interleave with which other cycles
Take as nodes the proper cycles of RD(α)
Two nodes are adjacent iff the cycles interleave
Connected components are classified as good or bad
If a component consists only of bad cycles, it is bad. Otherwise, it is said to be good
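
A sketch of building the interleaving graph and classifying its components. The input format is an assumption for illustration: each proper cycle is given by name together with its desire edges, and each desire edge is a pair of positions on the circle (0, 1, 2, … counterclockwise). Which cycles are good is passed in rather than computed. All function names are hypothetical.

```python
def chords_cross(c1, c2):
    """True if two chords, given as pairs of circle positions, cross."""
    a, b = sorted(c1)
    c, d = sorted(c2)
    # Chords cross when exactly one endpoint of one lies between the other's.
    return a < c < b < d or c < a < d < b


def interleave(chords1, chords2):
    """Two cycles interleave if some desire edge of one crosses one of the other."""
    return any(chords_cross(x, y) for x in chords1 for y in chords2)


def interleaving_components(chords_by_cycle):
    """Connected components of the interleaving graph, as a list of sets."""
    names = list(chords_by_cycle)
    adjacent = {name: set() for name in names}
    for idx, u in enumerate(names):
        for v in names[idx + 1:]:
            if interleave(chords_by_cycle[u], chords_by_cycle[v]):
                adjacent[u].add(v)
                adjacent[v].add(u)
    components, seen = [], set()
    for name in names:
        if name in seen:
            continue
        stack, comp = [name], set()
        while stack:  # depth-first search from this cycle
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adjacent[u] - comp)
        seen |= comp
        components.append(comp)
    return components


def classify_components(chords_by_cycle, good_cycles):
    """Label each component good if it contains at least one good cycle."""
    return [(comp, "good" if comp & good_cycles else "bad")
            for comp in interleaving_components(chords_by_cycle)]
```

For example, with hypothetical cycles `{"A": [(0, 5), (1, 4)], "B": [(2, 7), (3, 6)], "C": [(8, 11), (9, 10)]}` and `good_cycles={"C"}`, cycles A and B interleave and form one bad component while C forms a good component on its own. The pairwise check over all chords is quadratic, in line with the O(n^2) cost the complexity slides quote for interleaving.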

RD to Interleave Gray filled-in circles are good cycles

Choosing a Reversal

C is the only good cycle Let e = (L, +3), f=(-3,-4), g=(-1,+2) f & g converge, so not a good choice

e and g e and g diverge and produce 2 good components with 1 cycle each

e and f e and f produce a single good component with two cycles

Reversal Choosing cont’d A reversal characterized by two divergent edges of the same cycle is a sorting reversal iff its application does not lead to the creation of bad components So reversing e & f or e & g are both acceptable

Bad Components Good components can be sorted as in previous slide First step in dealing with bad components is to classify them Component Y “separates” components X and Z if all chords in RD(α) that link a terminal in X to one in Z cross a desire edge of Y

E separates F and D What are some other separations?

Definitions Hurdle – bad component that does not separate two bad components Nonhurdle – bad component that separates two bad components

Definitions cont’d
X protects nonhurdle Y if removal of X would cause Y to become a hurdle
–That is, whenever Y separates 2 bad components, X is one of them
Superhurdle – hurdle that protects a nonhurdle
Simple hurdle – hurdle that does not protect a nonhurdle
F protects E

Classification

Formula for Reversal Distance
d(α) = n + 1 – c(α) + h(α) + f(α)
h(α) = number of hurdles
f(α) = 1 if α is a fortress, 0 otherwise
–In a fortress, a nonhurdle will become a hurdle at some point
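
The formula is easy to evaluate once the counts are known. The sketch below only does the arithmetic; it assumes the number of cycles, hurdles and superhurdles has already been obtained from RD(α), and it uses the fortress definition from these slides (an odd number of hurdles, all of them superhurdles). Function and parameter names are hypothetical.

```python
def is_fortress(num_hurdles, num_superhurdles):
    """Odd number of hurdles, and every hurdle is a superhurdle."""
    return num_hurdles % 2 == 1 and num_superhurdles == num_hurdles


def reversal_distance(n, num_cycles, num_hurdles, num_superhurdles=0):
    """d = n + 1 - c + h + f, with f = 1 only for a fortress."""
    f = 1 if is_fortress(num_hurdles, num_superhurdles) else 0
    return n + 1 - num_cycles + num_hurdles + f


# Identity on 3 blocks: 4 cycles, no hurdles, so nothing to do.
print(reversal_distance(3, 4, 0))  # → 0
```

With no bad components, h and f are both 0 and the value collapses to the cycle lower bound n + 1 – c(α).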

Fortress
A fortress is a permutation where there is an odd number of hurdles and all of them are superhurdles.
Fortresses require an extra reversal since a nonhurdle will become a hurdle at some point

Definitions X and Y are “opposite” hurdles when we find the same number of hurdles when walking around the circle counterclockwise from X to Y as we do clockwise. Note: only when even number hurdles

Hurdle Cutting Reverse edges in same component Used only with simple hurdles

Final Algorithm
While α is not β:
    if there is a good component in RD(α) then
        pick two divergent edges in this component, ensuring that the reversal does not create a bad component
    else if h(α) is even then
        return merging of two opposite hurdles
    else if there is a simple hurdle then
        return a reversal cutting this hurdle
    else // fortress
        return merging of any two hurdles

Fortress Handling Fortress, so choose any 2 hurdles and merge C is good C A B

Complexity
Constructing RD(α) takes linear time
Finding the cycles is O(n)
For each cycle, determine good/bad
–This is O(n) per cycle, so $O(n^2)$ total
Determining interleaving can be done in $O(n^2)$
Counting hurdles etc. can be done in linear time given the other knowledge

Complexity cont’d
Finding a sorting reversal for good components is the most expensive step, since we need to ensure we don’t create bad components
Since a reversal is identified by a pair of edges, there are $O(n^2)$ candidate reversals. For each one, $O(n^2)$ time is spent checking the resulting permutation: $O(n^4)$ total
We need to do this $d_\beta(\alpha)$ times, so $O(n^5)$ altogether

Final Slide, Huzzah! Found accurate distance measure for genome movements Found a poly-time solution for solving the problem Played with fun graphs