Greedy Algorithms CS 6030 by Savitha Parur Venkitachalam.


1 Greedy Algorithms CS 6030 by Savitha Parur Venkitachalam

2 Outline: greedy approach to motif searching; genome rearrangements; sorting by reversals; greedy algorithms for sorting by reversals; approximation algorithms; breakpoint reversal sort.

3 Greedy motif searching: Developed by Gerald Hertz and Gary Stormo in 1989. CONSENSUS is the tool based on this greedy algorithm. It is faster than the brute-force and simple motif search algorithms. It is an approximation algorithm with an unknown approximation ratio.

4 Greedy motif search – Pseudocode

5 Greedy motif search – Steps. Input: DNA sequences, t (number of sequences), n (length of one sequence), l (length of the motif to search for). Output: a set of starting points of l-mers, one per sequence. The algorithm performs an exhaustive search, using Hamming distance, on the first two sequences and forms a 2 × l seed matrix from the two closest l-mers. It then scans each of the remaining t-2 sequences to find the l-mer that best matches the seed and adds it as the next row of the seed matrix.
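The steps above can be sketched in Python as follows. This is a minimal illustration and not the CONSENSUS implementation: the function names are hypothetical, all sequences are assumed to have the same length and to use the uppercase alphabet ACGT, and "best matches the seed" is assumed to mean the highest consensus score (the sum over columns of the most frequent nucleotide count).

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def consensus_score(rows):
    """Assumed scoring: per column, count of the most frequent nucleotide."""
    return sum(max(col.count(c) for c in "ACGT") for col in zip(*rows))


def greedy_motif_search(dna, l):
    """Return one 0-based start position per sequence in dna."""
    n = len(dna[0])
    starts = range(n - l + 1)

    # Exhaustive search over the first two sequences for the closest l-mers.
    s1, s2 = min(
        product(starts, starts),
        key=lambda p: hamming(dna[0][p[0]:p[0] + l], dna[1][p[1]:p[1] + l]),
    )
    seed = [dna[0][s1:s1 + l], dna[1][s2:s2 + l]]
    result = [s1, s2]

    # Greedy scan: extend the seed matrix one sequence at a time.
    for seq in dna[2:]:
        best = max(starts, key=lambda s: consensus_score(seed + [seq[s:s + l]]))
        seed.append(seq[best:best + l])
        result.append(best)
    return result


print(greedy_motif_search(["GGACGTGG", "CACGTCCC", "TTTTACGT"], 4))
```

Because each later row is chosen against whatever the seed already contains, a poor choice in the first two sequences is never revisited, which is why the result is only approximate.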

6 Complexity: The exhaustive search on the first two sequences requires l(n-l+1)² operations, which is O(l·n²). The sequential scan of the remaining t-2 sequences requires l(n-l+1)(t-2) operations, which is O(l·n·t). Thus the running time of greedy motif search is O(l·n² + l·n·t). If t is small compared to n, the algorithm behaves as O(l·n²).
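To make the two terms concrete, the following small sketch evaluates both operation counts for given l, n and t. The function name and the sample values (l = 8, n = 100, t = 10) are illustrative assumptions, not figures from the slides.

```python
def greedy_motif_search_ops(l, n, t):
    """Return (seed_ops, scan_ops) using the counts stated on the slide."""
    positions = n - l + 1                  # number of l-mers in one sequence
    seed_ops = l * positions ** 2          # exhaustive search on 2 sequences
    scan_ops = l * positions * (t - 2)     # greedy scan of the other t-2
    return seed_ops, scan_ops


seed_ops, scan_ops = greedy_motif_search_ops(l=8, n=100, t=10)
print(seed_ops, scan_ops)  # 69192 5952
```

With t much smaller than n, the seed search dominates, which is the O(l·n²) behaviour noted above.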

7 Consensus tool: The greedy motif algorithm may miss the optimal motif. The CONSENSUS tool saves a large number of seed matrices and can check the sequences in random order, so it is less likely to miss the optimal motif.

8 Genome rearrangements: Gene rearrangements result in a change of gene ordering. A series of gene rearrangements can alter the genomic architecture of a species. Cabbage and turnip genes are 99% similar. Fewer than 250 genomic rearrangements have occurred since the divergence of human and mouse.

11 History of Chromosome X (figure; source: Rat Consortium, Nature, 2004)

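12 Types of rearrangements ("|" separates chromosomes)
Reversal: 1 2 3 4 5 6 → 1 2 -5 -4 -3 6
Translocation: 1 2 3 | 4 5 6 → 1 2 6 | 4 5 3
Fusion: two chromosomes → one chromosome 1 2 3 4 5 6
Fission: one chromosome 1 2 3 4 5 6 → two chromosomes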

13 Greedy algorithms in gene rearrangements: Biologists are interested in finding the smallest number of reversals in an evolutionary sequence. This number gives a lower bound on the number of rearrangements and a measure of the similarity between two species. Two greedy algorithms are used: simple reversal sort and breakpoint reversal sort.

14 Gene order: Gene order is represented by a permutation π:
π = π_1 … π_(i-1) π_i π_(i+1) … π_(j-1) π_j π_(j+1) … π_n
Reversal ρ(i, j) reverses (flips) the elements from position i to position j in π:
π · ρ(i, j) = π_1 … π_(i-1) π_j π_(j-1) … π_(i+1) π_i π_(j+1) … π_n

15 Reversal example
π = 1 2 3 4 5 6 7 8
ρ(3,5) ↓
1 2 5 4 3 6 7 8
ρ(5,6) ↓
1 2 5 4 6 3 7 8
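A reversal is easy to express with list slicing. The sketch below uses a hypothetical helper name, `apply_reversal`, and 1-based inclusive positions to match the ρ(i, j) notation of the slides; it replays the two reversals of the example.

```python
def apply_reversal(perm, i, j):
    """Return perm with positions i..j (1-based, inclusive) reversed."""
    return perm[:i - 1] + perm[i - 1:j][::-1] + perm[j:]


pi = [1, 2, 3, 4, 5, 6, 7, 8]
pi = apply_reversal(pi, 3, 5)
print(pi)  # [1, 2, 5, 4, 3, 6, 7, 8]
pi = apply_reversal(pi, 5, 6)
print(pi)  # [1, 2, 5, 4, 6, 3, 7, 8]
```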

16 Reversal distance problem. Goal: given two permutations, find the shortest series of reversals that transforms one into the other. Input: permutations π and σ. Output: a series of reversals ρ_1, …, ρ_t transforming π into σ such that t is minimum. The minimum t is the reversal distance between π and σ, written d(π, σ).

17 Sorting by reversal. Goal: given a permutation, find a shortest series of reversals that transforms it into the identity permutation. Input: permutation π. Output: a series of reversals ρ_1, …, ρ_t transforming π into the identity permutation such that t is minimum.

18 Sorting by reversal - Greedy algorithm: When sorting the permutation π = 1 2 3 6 4 5, the first three elements are already in order, so it does not make sense to break them. The length of the already sorted prefix of π is denoted prefix(π); here prefix(π) = 3. This suggests a greedy algorithm: increase prefix(π) at every step.

19 Simple reversal sort – Pseudocode: A very general approach leads to an algorithm that sorts by moving the i-th element to the i-th position.
SimpleReversalSort(π)
  for i ← 1 to n - 1
    j ← position of element i in π (i.e., π_j = i)
    if j ≠ i
      π ← π · ρ(i, j)
      output π
    if π is the identity permutation
      return

20 Example – SimpleReversalSort is not optimal. Input: 6 1 2 3 4 5.
612345 → 162345 → 126345 → 123645 → 123465 → 123456
Greedy SimpleReversalSort takes 5 steps, whereas an optimal solution takes only 2 steps:
612345 → 543216 → 123456
A closely related problem is the pancake flipping problem, in which only prefixes may be reversed.
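The greedy prefix-extending idea can be sketched in a few lines of Python. The function name is an assumption, and the sketch returns the list of intermediate permutations instead of printing them; run on 6 1 2 3 4 5 it takes the five steps listed above.

```python
def simple_reversal_sort(perm):
    """Sort perm (a permutation of 1..n) by moving element i to position i.

    Returns the permutation obtained after each reversal.
    """
    perm = list(perm)
    n = len(perm)
    identity = list(range(1, n + 1))
    steps = []
    for i in range(1, n):
        j = perm.index(i) + 1                     # 1-based position of i
        if j != i:
            perm[i - 1:j] = perm[i - 1:j][::-1]   # apply rho(i, j)
            steps.append(list(perm))
        if perm == identity:
            break
    return steps


for step in simple_reversal_sort([6, 1, 2, 3, 4, 5]):
    print(step)
```

The same input can be sorted with two reversals, ρ(1,6) followed by ρ(1,5), so on this input the greedy result is 5/2 times the optimum.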

21 Approximation ratio: These algorithms produce an approximate solution rather than an optimal one. The approximation ratio of an algorithm A on input π is A(π) / OPT(π), where A(π) is the solution produced by A and OPT(π) is the optimal solution. For an algorithm A that minimizes the objective function (minimization algorithm), the worst case over inputs of size n is max_{|π| = n} A(π) / OPT(π). For a maximization algorithm it is min_{|π| = n} A(π) / OPT(π).

22 Breakpoints – A different face of greed. In a permutation π = π_1 … π_n:
- if π_i and π_(i+1) are consecutive numbers, the pair is an adjacency
- if π_i and π_(i+1) are not consecutive numbers, the pair is a breakpoint
Example: π = 1 | 9 | 3 4 | 7 8 | 2 | 6 5
Pairs (1,9), (9,3), (4,7), (8,2) and (2,6) form breakpoints. Pairs (3,4), (7,8) and (6,5) form adjacencies.
b(π) is the number of breakpoints in permutation π. Our goal is to eliminate all breakpoints and thus form the identity permutation.
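Counting breakpoints only requires looking at neighbouring pairs. In this sketch the helper names are hypothetical, and "consecutive" is taken to mean that the two values differ by exactly 1 in either direction, which is what the example's treatment of (6,5) as an adjacency implies.

```python
def breakpoint_pairs(seq):
    """Neighbouring pairs whose values do not differ by exactly 1."""
    return [(a, b) for a, b in zip(seq, seq[1:]) if abs(a - b) != 1]


def adjacency_pairs(seq):
    """Neighbouring pairs whose values differ by exactly 1."""
    return [(a, b) for a, b in zip(seq, seq[1:]) if abs(a - b) == 1]


pi = [1, 9, 3, 4, 7, 8, 2, 6, 5]
print(breakpoint_pairs(pi))  # [(1, 9), (9, 3), (4, 7), (8, 2), (2, 6)]
print(adjacency_pairs(pi))   # [(3, 4), (7, 8), (6, 5)]
```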

23 Breakpoint reversal sort – Steps. Put two elements π_0 = 0 and π_(n+1) = n+1 at the ends of π. Eliminate breakpoints using reversals. Each reversal eliminates at most 2 breakpoints, which implies reversal distance ≥ #breakpoints / 2.
π = 2 3 1 4 6 5
0 2 3 1 4 6 5 7   b(π) = 5
0 1 3 2 4 6 5 7   b(π) = 4
0 1 2 3 4 6 5 7   b(π) = 2
0 1 2 3 4 5 6 7   b(π) = 0
The approach is not reliable as stated: nothing guarantees that some reversal reduces b(π) at every step, so it may run forever.

24 Pseudocode – Breakpoint reversal sort
BreakpointReversalSort(π)
  while b(π) > 0
    among all possible reversals, choose reversal ρ minimizing b(π · ρ)
    π ← π · ρ
    output π
  return
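A brute-force Python sketch of this loop is given below. The function names and the `max_steps` guard are assumptions added for illustration: the guard is there because the greedy choice alone does not guarantee termination. Ties between equally good reversals are broken by taking the first one found, which is an arbitrary choice.

```python
def count_breakpoints(ext):
    """Number of neighbouring pairs that do not differ by exactly 1."""
    return sum(abs(a - b) != 1 for a, b in zip(ext, ext[1:]))


def breakpoint_reversal_sort(perm, max_steps=1000):
    """Greedily apply the reversal that minimizes the breakpoint count.

    Returns the permutation after each reversal (without the 0 / n+1 frame).
    """
    n = len(perm)
    ext = [0] + list(perm) + [n + 1]      # frame with 0 and n+1
    steps = []
    while count_breakpoints(ext) > 0:
        if len(steps) >= max_steps:
            raise RuntimeError("greedy choice did not terminate")
        best = None
        for i in range(1, n + 1):
            for j in range(i + 1, n + 1):
                cand = ext[:i] + ext[i:j + 1][::-1] + ext[j + 1:]
                if best is None or count_breakpoints(cand) < count_breakpoints(best):
                    best = cand
        ext = best
        steps.append(ext[1:-1])
    return steps


for step in breakpoint_reversal_sort([2, 3, 1, 4, 6, 5]):
    print(step)
```

On 2 3 1 4 6 5 the sketch needs three reversals, which matches the lower bound of b(π) / 2 = 2.5 rounded up, although the reversals it picks need not be the ones a hand-worked example uses.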

25 Using strips: A strip is an interval between two consecutive breakpoints in a permutation. A decreasing strip is a strip of elements in decreasing order; an increasing strip is a strip of elements in increasing order.
Example: 0 1 | 9 | 4 3 | 7 8 | 2 | 5 6 | 10
A single-element strip can be declared either increasing or decreasing. We choose to declare single-element strips decreasing, with the exception of the strips containing 0 and n+1.
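The strip decomposition can be computed by cutting the framed permutation at every breakpoint. The sketch below uses a hypothetical function name and applies the convention stated above for single-element strips.

```python
def strips(ext):
    """Split a framed permutation (0 ... n+1) into (elements, kind) strips."""
    last = len(ext) - 1                   # value n+1 sits at the last index
    result, start = [], 0
    for i in range(1, len(ext) + 1):
        # Close the current strip at the end or at a breakpoint.
        if i == len(ext) or abs(ext[i] - ext[i - 1]) != 1:
            strip = ext[start:i]
            if len(strip) == 1:
                kind = "increasing" if strip[0] in (0, last) else "decreasing"
            else:
                kind = "increasing" if strip[0] < strip[-1] else "decreasing"
            result.append((strip, kind))
            start = i
    return result


for elements, kind in strips([0, 1, 9, 4, 3, 7, 8, 2, 5, 6, 10]):
    print(elements, kind)
```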

26 Reducing breakpoints: Choose the decreasing strip containing the smallest element k among all decreasing strips in π. Find k-1 in the permutation. Reverse the segment between k and k-1.
Example: π = 1 4 6 5 7 8 3 2
0 1 4 6 5 7 8 3 2 9   b(π) = 5
0 1 2 3 8 7 5 6 4 9   b(π) = 4
0 1 2 3 4 6 5 7 8 9   b(π) = 2
0 1 2 3 4 5 6 7 8 9   b(π) = 0
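One step of this rule can be sketched as follows. The helper names are hypothetical, and "the segment between k and k-1" is interpreted as the segment that starts just after whichever of the two comes first and ends at the other, so that k and k-1 become neighbours. Repeating the step on the example reproduces the rows shown above.

```python
def strip_bounds(ext):
    """(start, end) index pairs, inclusive, of the strips of ext."""
    bounds, start = [], 0
    for i in range(1, len(ext) + 1):
        if i == len(ext) or abs(ext[i] - ext[i - 1]) != 1:
            bounds.append((start, i - 1))
            start = i
    return bounds


def is_decreasing(ext, start, end):
    """Single-element strips count as decreasing unless they hold 0 or n+1."""
    if start == end:
        return ext[start] not in (0, len(ext) - 1)
    return ext[start] > ext[end]


def reduce_step(ext):
    """Apply one k / k-1 reversal; return None if no decreasing strip exists."""
    decreasing = [v for s, e in strip_bounds(ext) if is_decreasing(ext, s, e)
                  for v in ext[s:e + 1]]
    if not decreasing:
        return None
    k = min(decreasing)
    p, q = ext.index(k), ext.index(k - 1)
    lo, hi = (q + 1, p) if q < p else (p + 1, q)
    return ext[:lo] + ext[lo:hi + 1][::-1] + ext[hi + 1:]


ext = [0, 1, 4, 6, 5, 7, 8, 3, 2, 9]
while ext is not None:
    print(ext)
    ext = reduce_step(ext)
```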

27 ImprovedBreakpointReversalSort: Sometimes a permutation does not contain any decreasing strips. In that case an increasing strip has to be reversed so that it becomes a decreasing strip. Taking this into consideration gives an improved algorithm:
ImprovedBreakpointReversalSort(π)
  while b(π) > 0
    if π has a decreasing strip
      among all possible reversals, choose reversal ρ that minimizes b(π · ρ)
    else
      choose a reversal ρ that flips an increasing strip in π
    π ← π · ρ
    output π
  return

28 Example – ImprovedBreakpointReversalSort: There are no decreasing strips in π:
π = 0 1 2 | 5 6 7 | 3 4 | 8   b(π) = 3
π · ρ(6,7) = 0 1 2 | 5 6 7 | 4 3 | 8   b(π) = 3
ρ(6,7) does not change the number of breakpoints, but it creates a decreasing strip, thus guaranteeing that the next step will decrease the number of breakpoints.
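The improved algorithm can be sketched in Python as below. All names are hypothetical. Two choices are assumptions of this sketch and not part of the pseudocode: ties between equally good reversals are broken by taking the first one found, and when no decreasing strip exists the last increasing strip that contains neither 0 nor n+1 is flipped, which on the example input is ρ(6,7).

```python
def count_breakpoints(ext):
    return sum(abs(a - b) != 1 for a, b in zip(ext, ext[1:]))


def strip_bounds(ext):
    """(start, end) index pairs, inclusive, of the strips of ext."""
    bounds, start = [], 0
    for i in range(1, len(ext) + 1):
        if i == len(ext) or abs(ext[i] - ext[i - 1]) != 1:
            bounds.append((start, i - 1))
            start = i
    return bounds


def is_decreasing(ext, start, end):
    """Single-element strips count as decreasing unless they hold 0 or n+1."""
    if start == end:
        return ext[start] not in (0, len(ext) - 1)
    return ext[start] > ext[end]


def flip(ext, i, j):
    """Reverse positions i..j (inclusive) of the framed permutation."""
    return ext[:i] + ext[i:j + 1][::-1] + ext[j + 1:]


def improved_breakpoint_reversal_sort(perm):
    """Return (list of reversals (i, j), sorted permutation)."""
    n = len(perm)
    ext = [0] + list(perm) + [n + 1]
    reversals = []
    while count_breakpoints(ext) > 0:
        bounds = strip_bounds(ext)
        if any(is_decreasing(ext, s, e) for s, e in bounds):
            # Brute force: reversal that leaves the fewest breakpoints.
            i, j = min(
                ((i, j) for i in range(1, n + 1) for j in range(i + 1, n + 1)),
                key=lambda r: count_breakpoints(flip(ext, *r)),
            )
        else:
            # All strips increase: flip the last one not touching 0 or n+1.
            i, j = [(s, e) for s, e in bounds if s > 0 and e < n + 1][-1]
        ext = flip(ext, i, j)
        reversals.append((i, j))
    return reversals, ext[1:-1]


reversals, result = improved_breakpoint_reversal_sort([1, 2, 5, 6, 7, 3, 4])
print(reversals, result)
```

On this input the first reversal leaves b(π) at 3 and the two that follow bring it to 0, for three reversals in total.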

29 Approximation ratio – ImprovedBreakpointReversalSort: The performance guarantee is 4.
– The algorithm eliminates at least one breakpoint in every two steps, so it takes at most 2b(π) steps.
– Its approximation ratio is therefore at most 2b(π) / d(π).
– An optimal algorithm eliminates at most 2 breakpoints in every step, so d(π) ≥ b(π) / 2.
– Performance guarantee: 2b(π) / d(π) ≤ 2b(π) / (b(π) / 2) = 4

30 References: An Introduction to Bioinformatics Algorithms, Neil C. Jones and Pavel A. Pevzner. Slides for Chapter 5: http://bix.ucsd.edu/bioalgorithms/slides.php#Ch5

31 Questions

