Download presentation
1
Alignment II Dynamic Programming
2
Pair-wise sequence alignments
Idea: Display one sequence above another with spaces inserted in both to reveal similarity A: C A T - T C A - C | | | | | B: C - T C G C A G C
3
Two types of alignment S = CTGTCGCTGCACG T = TGCCGTG Global alignment
Local alignment CTGTCG-CTGCACG -TGC-CG-TG---- CTGTCGCTGCACG-- TGC-CGTG
4
Global alignment: Scoring
CTGTCG-CTGCACG -TGC-CG-TG---- Reward for matches: Mismatch penalty: Space penalty: score(A) = w – x - y w = #matches x = #mismatches y = #spaces
5
Global alignment: Scoring
Reward for matches: 10 Mismatch penalty: 2 Space penalty: 5 C T G T C G – C T G C - T G C – C G – T G - Total = 11
6
Optimum Alignment The score of an alignment is a measure of its quality Optimum alignment problem: Given a pair of sequences X and Y, find an alignment (global or local) with maximum score The similarity between X and Y, denoted sim(X,Y), is the maximum score of an alignment of X and Y
7
Alignment algorithms Global: Needleman-Wunsch Local: Smith-Waterman
NW and SW use dynamic programming Variations: Gap penalty functions Scoring matrices
8
Global Alignment: Algorithm
9
Theorem. C(i,j) satisfies the following relationships:
Initial conditions: Recurrence relation: For 1 i n, 1 j m:
10
Justification S1 S2 . . . Si-1 Si T1 T2 . . . Tj-1 Tj
C(i-1,j-1) + w(Si,Tj) C(i-1,j) S1 S Si — T1 T Tj-1 Tj C(i,j-1)
11
Example Case 1: Line up Si with Tj i - 1 i S: C A T T C A C
T: C - T T C A G j -1 j Case 2: Line up Si with space i - 1 i S: C A T T C A - C T: C - T T C A G - j Case 3: Line up Tj with space i S: C A T T C A C - T: C - T T C A - G j -1 j
12
Computation Procedure
C(i-1,j) C(i-1,j-1) C(i,j-1) C(i,j) C(n,m)
13
λ C T C G C A G C λ -5 -10 -15 -20 -25 -30 -35 10 5 C A T T C We first compute T[i, j] for the smallest possible values of i and j, then for increasing values of i and j Usually performed with a table of size (n + 1) X (m + 1) A C +10 for match, -2 for mismatch, -5 for space
14
λ C T C G C A G C λ -5 -10 -15 -20 -25 -30 -35 -40 10 5 8 3 -2 -7 15 13 -4 20 18 28 23 26 33 * C A T T C A C Traceback can yield both optimum alignments
15
End-gap free alignment
Gaps at the start or end of alignment are not penalized Match: +2 Mismatch and space: -1 Best global Best end-gap free Score = 1 Score = 9
16
Motivation: Shotgun assembly
Shotgun assembly produces large set of partially overlapping subsequences from many copies of one unknown DNA sequence. Problem: Use the overlapping sections to ”paste” the subsequences together. Overlapping pairs will have low global alignment score, but high end-space free score because of overlap.
17
Motivation: Shotgun assembly
18
Algorithm Same as global alignment, except:
Initialize with zeros (free gaps at start) Locate max in the last row/column (free gaps at end)
19
λ C T C G C A G C λ C A T T C We first compute T[i, j] for the smallest possible values of i and j, then for increasing values of i and j Usually performed with a table of size (n + 1) X (m + 1) A G +10 for match, -2 for mismatch, -5 for gap
20
Local Alignment: Motivation
Ignoring stretches of non-coding DNA: Non-coding regions are more likely to be subjected to mutations than coding regions. Local alignment between two sequences is likely to be between two exons. Locating protein domains: Proteins of different kind and of different species often exhibit local similarities Local similarities may indicate ”functional subunits”.
21
Local alignment: Example
S = g g t c t g a g T = a a a c g a Match: +2 Mismatch and space: -1 Best local alignment: g g t c t g a g a a a c – g a - Score = 5
22
Local Alignment: Algorithm
C [i, j] = Score of optimally aligning a suffix of s with a suffix of t. Initialize top row and leftmost column to zero.
23
λ C T C G C A G C λ 1 2 C A T T C A C +1 for a match, -1 for a mismatch, -5 for a space
24
Some Results Most pairwise sequence alignment problems can be solved in O(mn) time. Space requirement can be reduced to O(m+n), while keeping run-time fixed [Myers88]. Highly similar sequences can be aligned in O(dn) time, where d measures the distance between the sequences [Landau86].
25
Reducing space requirements
O(mn) tables are often the limiting factor in computing large alignments There is a linear space technique that only doubles the time required [Hirschberg77]
26
λ C T C G C A G C λ C A T T C We first compute T[i, j] for the smallest possible values of i and j, then for increasing values of i and j Usually performed with a table of size (n + 1) X (m + 1) A G IDEA: We only need the previous row to calculate the next
27
Linear-space Alignments
mn + ½ mn + ¼ mn + 1/8 mn + 1/16 mn + … = 2 mn
28
Affine Gap Penalty Functions
Gap penalty = h + gk where k = length of gap h = gap opening penalty g = gap continuation penalty Can also be solved in O(nm) time using dynamic programming
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.