Final presentation: Tandem Cyclic Alignment
Sequence Alignment
Needleman-Wunsch algorithm – global alignment, fixed gap penalty
Waterman-Smith-Beyer algorithm – local alignment, affine gap penalty function
Gotoh's algorithm – local alignment, affine gap penalty function
Needleman-Wunsch Algorithm (Global Alignment)
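A minimal sketch of global alignment with a fixed gap penalty (every gap position costs the same). The function name `needleman_wunsch` and the scores `match=1`, `mismatch=-1`, `gap=-2` are illustrative assumptions, not values from the slides.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score and one optimal alignment, fixed per-symbol gap penalty."""
    n, m = len(a), len(b)

    def sub(x, y):
        return match if x == y else mismatch

    # F[i][j] = best score for aligning a[:i] with b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # a_i with b_j
                          F[i - 1][j] + gap,                          # a_i against a gap
                          F[i][j - 1] + gap)                          # b_j against a gap
    # Traceback from the bottom-right corner to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `needleman_wunsch("AC", "A")` returns `(-1, "AC", "A-")`: one match plus one gap.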
Waterman-Smith-Beyer Algorithm (Local Alignment)
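The Waterman-Smith-Beyer recurrence charges a gap of length k with g(k) directly and tries every possible gap length at each cell, so it works for any gap function (affine included) at the price of O(nm(n+m)) time. A minimal sketch, assuming a local variant that floors scores at 0 as the slide title indicates; the name `wsb_local` and the scores `match=2`, `mismatch=-1` are hypothetical.

```python
def wsb_local(a, b, gap_cost, match=2, mismatch=-1):
    """Best local alignment score with an arbitrary gap cost function gap_cost(k)."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]  # row 0 and column 0 stay 0 (local)
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            score = max(0, H[i - 1][j - 1] + s)
            # a[i-k:i] aligned against a gap of length k
            for k in range(1, i + 1):
                score = max(score, H[i - k][j] - gap_cost(k))
            # b[j-k:j] aligned against a gap of length k
            for k in range(1, j + 1):
                score = max(score, H[i][j - k] - gap_cost(k))
            H[i][j] = score
            best = max(best, score)
    return best
```

With an affine cost such as `lambda k: 1 + k`, aligning `"AAAA"` with `"AAGGAA"` bridges the `GG` with one gap (score 5); raising the opening cost to `lambda k: 3 + k` makes the gap unprofitable and the best local alignment is just `AA` (score 4).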
Gotoh's Algorithm (Local Alignment)
Consider the gapless sequences a and b. Let $g(k) = uk + v$ (with $u, v \ge 0$) be an affine gap penalty function, and let $w(a_i, b_j)$ be a cost function. D is the distance matrix. P is the matrix of minimal distances over all alignments of prefixes of a and b in which b ends in a gap (that is, $a_i$ is aligned to a gap). Q is the matrix of minimal distances over all alignments in which a ends in a gap (that is, $b_j$ is aligned to a gap).
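In this notation the three matrices can be filled with the standard Gotoh recurrences below. Because a gap already open in P or Q is extended at cost u instead of g(1) = u + v, no cell has to look back over all gap lengths. The initial values shown are the usual ones for the distance (minimization) form and are an assumption of this write-up.

```latex
\begin{aligned}
P_{i,j} &= \min\{\, D_{i-1,j} + g(1),\; P_{i-1,j} + u \,\}\\
Q_{i,j} &= \min\{\, D_{i,j-1} + g(1),\; Q_{i,j-1} + u \,\}\\
D_{i,j} &= \min\{\, D_{i-1,j-1} + w(a_i, b_j),\; P_{i,j},\; Q_{i,j} \,\}\\[4pt]
D_{0,0} &= 0, \quad D_{i,0} = P_{i,0} = g(i), \quad D_{0,j} = Q_{0,j} = g(j), \quad P_{0,j} = Q_{i,0} = \infty
\end{aligned}
```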
Gotoh's Algorithm
Uses dynamic programming with three matrices (D, P and Q) instead of one. The traceback must track movement through all three matrices.
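A minimal sketch of the three-matrix computation and a traceback that records which matrix it is currently in. It computes the minimal distance for aligning all of a against all of b; the name `gotoh`, the defaults `u=1`, `v=2` and the unit mismatch cost `w` are illustrative assumptions.

```python
import math


def gotoh(a, b, u=1, v=2, w=lambda x, y: 0 if x == y else 1):
    """Minimal alignment distance with affine gap cost g(k) = u*k + v, plus traceback."""
    INF = math.inf
    n, m = len(a), len(b)

    def g(k):
        return u * k + v

    D = [[INF] * (m + 1) for _ in range(n + 1)]
    P = [[INF] * (m + 1) for _ in range(n + 1)]  # last column: a_i over a gap
    Q = [[INF] * (m + 1) for _ in range(n + 1)]  # last column: b_j over a gap
    D[0][0] = 0
    for i in range(1, n + 1):
        D[i][0] = P[i][0] = g(i)
    for j in range(1, m + 1):
        D[0][j] = Q[0][j] = g(j)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            P[i][j] = min(D[i - 1][j] + g(1), P[i - 1][j] + u)
            Q[i][j] = min(D[i][j - 1] + g(1), Q[i][j - 1] + u)
            D[i][j] = min(D[i - 1][j - 1] + w(a[i - 1], b[j - 1]), P[i][j], Q[i][j])

    # Traceback: `state` names the matrix whose value we are currently explaining.
    out_a, out_b = [], []
    i, j, state = n, m, "D"
    while i > 0 or j > 0:
        if state == "D":
            if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + w(a[i - 1], b[j - 1]):
                out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
            elif i > 0 and D[i][j] == P[i][j]:
                state = "P"
            else:
                state = "Q"
        elif state == "P":  # emit a_i against a gap; stay in P while the gap is extended
            extend = i > 1 and P[i][j] == P[i - 1][j] + u
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
            state = "P" if extend else "D"
        else:  # state "Q": emit b_j against a gap; stay in Q while the gap is extended
            extend = j > 1 and Q[i][j] == Q[i][j - 1] + u
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
            state = "Q" if extend else "D"
    return D[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

For instance, `gotoh("AAGGTT", "AATT")` opens a single gap of length 2 (cost g(2) = 4) and returns `(4, "AAGGTT", "AA--TT")`. Two separate one-symbol gaps would cost 2·g(1) = 6.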
Tandem Repeats
Tandem repeats are a special class of repeats with very short repeat units, often only a few nucleotides long. For example, one tandem repeat in humans comprises hundreds of copies of the 6-nucleotide unit TTAGGG. Such short-unit repeats are often called microsatellites. In eukaryotic genomes, repeats with longer units of up to 25 nucleotides (called minisatellites) are also abundant. They are located mostly in non-transcribed regions.
Finding Tandem Repeats
A straightforward approach to finding tandem repeats with a repeat unit of length k is to look for consecutive exact occurrences of a pattern of length k. This can be done in linear time for a fixed k, for example by scanning for positions where $s_i = s_{i+k}$. However, some of the repeat units are often mutated, so mismatches must be allowed when looking for these imperfect repeats. Obtaining an efficient algorithm becomes much more difficult as the number of allowed mismatches increases.
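The exact-repeat scan can be sketched as follows. Every maximal stretch of positions x with s[x] == s[x + k] marks a substring of period k, and it is reported when it holds at least `min_copies` full copies. The function name and the `min_copies` parameter are hypothetical. A reported region may also have a shorter period that divides k (for example `"AAAA"` with k = 2).

```python
def exact_tandem_repeats(s, k, min_copies=2):
    """Half-open intervals (start, end) of s that are exact tandem repeats with period k."""
    n = len(s)
    results = []
    i = 0
    while i + k < n:
        if s[i] != s[i + k]:
            i += 1
            continue
        # Extend the run of positions x with s[x] == s[x + k]
        j = i
        while j + k < n and s[j] == s[j + k]:
            j += 1
        # s[i : j + k] has period k; keep it if it holds enough whole copies
        if j - i + k >= min_copies * k:
            results.append((i, j + k))
        i = j
    return results
```

For example, `exact_tandem_repeats("xx" + "TTAGGG" * 3 + "yy", 6)` returns `[(2, 20)]`, the three copies of TTAGGG.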
Finding Tandem Repeats by Alignment
If the dominating repeat pattern is known, another way to locate imperfect repeats is to solve the following alignment problem. Let p be a pattern (the repeat unit) of length m, and let s be a sequence (the search string) of length n. Let $p^n$ be the concatenation of n copies of p. Finding an imperfect tandem repeat is then equivalent to finding an optimal local alignment between $p^n$ and s. Because $p^n$ has length mn, aligning it directly against s takes $O(mn^2)$ time.
[Figure: s aligned against p p p …]
Local alignment [figure: s against consecutive copies of p]
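A direct sketch of the reduction: build $p^n$ explicitly and run an ordinary local (Smith-Waterman style) alignment against s. The names `smith_waterman` and `tandem_repeat_score` and the scores `match=2`, `mismatch=-1`, `gap=-2` are assumptions. This is the $O(mn^2)$ baseline that the wraparound method improves on.

```python
def smith_waterman(x, y, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of x against y with a linear gap penalty."""
    prev = [0] * (len(y) + 1)
    best = 0
    for i in range(1, len(x) + 1):
        cur = [0] * (len(y) + 1)
        for j in range(1, len(y) + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            cur[j] = max(0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def tandem_repeat_score(p, s, **scores):
    """Score of the best imperfect tandem repeat of unit p inside s, via p^n with n = len(s)."""
    return smith_waterman(s, p * len(s), **scores)
```

With these scores, `tandem_repeat_score("TTAGGG", "TTAGGGTTCGGGTTAGGG")` gives 33: seventeen matches and the single mutated base aligned as a mismatch.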
Wraparound Method, O(mn)
When aligning a sequence against tandem repeats, use the wraparound method to avoid redundant calculations. Treat the section with tandem repeats separately and write the repeated unit only once in the similarity matrix. Align as usual, except that when the end of the repeated unit is reached, use the value in its last column as the predecessor of the first column in the next row, and repeat this procedure for every row.
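Written as a recurrence for local alignment, the wrap simply redefines the left neighbour of column 1 as column m. The symbols H, σ (substitution score) and δ (linear gap penalty, δ > 0) are notation assumed here, not taken from the slides. The table has n + 1 rows and m columns, hence O(mn) time. Note that for j = 1 the horizontal term refers to $H_{i,m}$ in the same row.

```latex
j^- = \begin{cases} j - 1, & 2 \le j \le m \\ m, & j = 1 \end{cases}
\qquad H_{0,j} = 0

H_{i,j} = \max\{\, 0,\; H_{i-1,j^-} + \sigma(s_i, p_j),\; H_{i-1,j} - \delta,\; H_{i,j^-} - \delta \,\}
```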
Wraparound Algorithm
When developing a dynamic programming implementation of the wraparound algorithm, there is a problem with determining the Q matrix. Because column 1 follows column |b| cyclically, defining $Q_{i,1}$ requires $Q_{i,|b|}$, which is computed later in the same row. Hence two passes over each row are needed to determine Q correctly.
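A minimal sketch of the two-pass row update, assuming a linear gap penalty so that the horizontal-gap values play the role of Q. The name `wraparound_local` and the scores are hypothetical. Pass 1 fills the row left to right with the diagonal wrap from the last column. Pass 2 feeds the last column back into column 0 and propagates the horizontal gap once more. With a negative gap score, going around the cycle a second time can never help, so two passes suffice.

```python
def wraparound_local(p, s, match=2, mismatch=-1, gap=-2):
    """Best local alignment of s against any stretch of p p p ..., in O(len(p) * len(s))."""
    assert gap < 0, "two passes are only enough when gaps are penalized"
    m = len(p)
    prev = [0] * m  # previous row; column j (0-based) holds the score ending at p[j]
    best = 0
    for ch in s:
        cur = [0] * m
        # Pass 1: usual local step; for j == 0, prev[j - 1] is prev[m - 1] (the wrap)
        for j in range(m):
            sc = match if ch == p[j] else mismatch
            cur[j] = max(0, prev[j - 1] + sc, prev[j] + gap)
            if j > 0:
                cur[j] = max(cur[j], cur[j - 1] + gap)
        # Pass 2: the gap move into column 0 depends on column m-1 of this same row
        cur[0] = max(cur[0], cur[m - 1] + gap)
        for j in range(1, m):
            cur[j] = max(cur[j], cur[j - 1] + gap)
        best = max(best, max(cur))
        prev = cur
    return best
```

Only one row of m cells is kept at a time, so memory is O(m). On the TTAGGG examples used for the $p^n$ reduction it returns the same scores (24 and 33) without building $p^n$.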
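Cyclic global alignment, $O(n^2 m)$
Given sequences X and Y (with n = |X| and m = |Y|), find the best-scoring alignment of $X^{[i]}$ against Y over all possible i, $1 \le i \le |X|$, where $X^{[i]} = x_i \ldots x_n x_1 \ldots x_{i-1}$ is a cyclic rotation of X. All of Y and exactly one whole (cyclically permuted) copy of X must occur in the alignment. Trying all n rotations with an O(nm) alignment each gives $O(n^2 m)$.
[Figure: Y aligned against X]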
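The brute-force approach can be sketched by aligning every rotation separately. This sketch assumes unit-cost edit distance as the score, so "best" means minimal. The names `edit_distance` and `cyclic_global_naive` are hypothetical, and X must be non-empty.

```python
def edit_distance(x, y):
    """Unit-cost global alignment distance (Levenshtein), O(|x| * |y|)."""
    prev = list(range(len(y) + 1))
    for i in range(1, len(x) + 1):
        cur = [i] + [0] * len(y)
        for j in range(1, len(y) + 1):
            cur[j] = min(prev[j - 1] + (x[i - 1] != y[j - 1]),  # match / substitution
                         prev[j] + 1,                           # x_i against a gap
                         cur[j - 1] + 1)                        # y_j against a gap
        prev = cur
    return prev[-1]


def cyclic_global_naive(X, Y):
    """Best distance over all rotations X[i:] + X[:i]; n rotations x O(nm) = O(n^2 m)."""
    return min(edit_distance(X[i:] + X[:i], Y) for i in range(len(X)))
```

For example, `cyclic_global_naive("ABCD", "CDAB")` is 0 because the rotation starting at C matches Y exactly, although `edit_distance("ABCD", "CDAB")` alone is 4.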
The Maes algorithm for cyclic global alignment, O(nm log n)
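A sketch of the divide-and-conquer idea, assuming unit-cost edit distance and a grid whose rows index XX = X X and whose columns index Y, so rotation k is a path from node (k, 0) to node (k + n, m). After the path for rotation 0 is found (rotation n is the same path shifted down n rows), the middle rotation is searched only between the two known enclosing paths, and the recursion continues on both halves. The name `maes_cyclic` is hypothetical. The code uses dictionaries for clarity rather than the bookkeeping needed to reach the stated bound exactly.

```python
def maes_cyclic(X, Y):
    """Minimal unit-cost distance between Y and the best rotation of a non-empty X."""
    n, m = len(X), len(Y)
    XX = X + X

    def best_path(k, top, bot):
        # Shortest path for rotation k using only rows top[j]..bot[j] of column j.
        dist, back = {}, {}
        for j in range(m + 1):
            for r in range(max(top[j], k), min(bot[j], k + n) + 1):
                if (r, j) == (k, 0):
                    dist[(r, j)], back[(r, j)] = 0, None
                    continue
                cands = []
                if (r - 1, j) in dist:  # XX[r-1] against a gap
                    cands.append((dist[(r - 1, j)] + 1, (r - 1, j)))
                if (r, j - 1) in dist:  # Y[j-1] against a gap
                    cands.append((dist[(r, j - 1)] + 1, (r, j - 1)))
                if (r - 1, j - 1) in dist:  # XX[r-1] against Y[j-1]
                    sub = 0 if XX[r - 1] == Y[j - 1] else 1
                    cands.append((dist[(r - 1, j - 1)] + sub, (r - 1, j - 1)))
                if cands:
                    dist[(r, j)], back[(r, j)] = min(cands)
        # Trace the path back, recording its highest and lowest row in every column.
        lo, hi = [None] * (m + 1), [None] * (m + 1)
        node = (k + n, m)
        while node is not None:
            r, j = node
            lo[j] = r if lo[j] is None else min(lo[j], r)
            hi[j] = r if hi[j] is None else max(hi[j], r)
            node = back[node]
        return dist[(k + n, m)], lo, hi

    d0, lo0, hi0 = best_path(0, [0] * (m + 1), [2 * n] * (m + 1))
    best = d0
    hi_n = [r + n for r in hi0]  # rotation n = rotation 0 shifted down by n rows
    stack = [(0, lo0, n, hi_n)]  # (k1, upper boundary rows, k2, lower boundary rows)
    while stack:
        k1, top, k2, bot = stack.pop()
        if k2 - k1 < 2:
            continue  # no rotation strictly between k1 and k2
        k = (k1 + k2) // 2
        d, lo, hi = best_path(k, top, bot)
        best = min(best, d)
        stack.append((k1, top, k, hi))
        stack.append((k, lo, k2, bot))
    return best
```

The result should agree with trying every rotation separately. For example, `maes_cyclic("ABCD", "CDAB")` is 0. The saving comes from the recursion depth of about $\log_2 n$ and from each call filling only the band between two neighbouring paths.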
Non-crossing alignments
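Informally, the non-crossing property can be stated in the grid of XX = X X against Y, where rotation k is a monotone path $P_k$ from node (k, 0) to node (k + n, m). The statement below is a paraphrase under that assumed setup. The usual exchange argument swaps the segments of two optimal paths between two shared nodes, which cannot change either path's cost.

```latex
\text{If } 0 \le k_1 < k < k_2 \le n \text{ and } P_{k_1}, P_{k_2} \text{ are optimal paths, then there is an optimal } P_k
\text{ with no node above } P_{k_1} \text{ and no node below } P_{k_2}.
```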
Tandem Cyclic Alignment [figure: Y aligned against X*]
An example
No alignment crosses "the same" alignment more than once
Proof
O(nm log n) [figure labels: C, C+1, C-1; Y; XXXXXXX]
Bounded wraparound dynamic programming