Sequence Alignment 11/24/2018.


1 Sequence Alignment 11/24/2018

2 Motivation: Types
Two sequences of the same length in which some characters differ (database search):
aagtacggaga
aagcaccgaga
Two sequences of different lengths, with possible gaps in one of them (database search):
aaccaccgaga
aa-caccgaga

3 Motivation: Types
Match the longest prefix of one sequence with a suffix of the other (fragment assembly):
aaacgtcgata
gatacgatg
Local alignment: the longest matching substrings across the two sequences (homolog search):
gatacgatgctagtttacg
agagcgatgcataattcgaatga

4 Motivation: Types
Multiple sequence alignment (page 71), used for comparative studies of sequences.

5 Formalizing sequence comparison
In an alignment, each character either matches the corresponding character (+1), or does not match it (-1), or is paired with an inserted gap (-2).
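
As a small illustration of this scoring scheme, the sketch below scores an alignment that is already written out as two equal-length rows. The function names `column_score` and `alignment_score` are illustrative choices, not taken from the slides; the inputs are the example pairs from slide 2 in lowercase.

```python
MATCH, MISMATCH, GAP = 1, -1, -2  # values from slide 5

def column_score(a, b):
    """Score one column of a two-row alignment; '-' denotes a gap."""
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH

def alignment_score(row_s, row_t):
    """Sum the column scores of two aligned rows of equal length."""
    if len(row_s) != len(row_t):
        raise ValueError("aligned rows must have the same length")
    return sum(column_score(a, b) for a, b in zip(row_s, row_t))

# Same length, two mismatches: 9 matches - 2 mismatches
print(alignment_score("aagtacggaga", "aagcaccgaga"))  # 7
# One gap: 10 matches - one gap penalty
print(alignment_score("aaccaccgaga", "aa-caccgaga"))  # 8
```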

6 Global Alignment
Needleman-Wunsch (1970) dynamic programming algorithm (Smith-Waterman (1981) is the local-alignment variant). Scoring matrix for alignment (p 31). Initialize the boundaries of the scoring matrix for gaps in front of either string. Meaning of an entry in the matrix: entry (i, j) is the best score for aligning s[1..i] with t[1..j]. The corner element is the final score.

7 Global Alignment
Three alternatives in each iteration. Order of calculation: row-wise or column-wise. The algorithm (p 52). Recursive recovery of the alignment from the corner element (m and n, the string lengths, are constants). The variable len is returned by the algorithm. Convention for tie breaking.
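
The matrix fill and the recovery from the corner element can be sketched as follows. This is a minimal version under the +1/-1/-2 scoring of slide 5; the name `global_align` is illustrative, the recovery is written as a loop rather than recursively, and the tie-breaking order (diagonal first, then a gap in t, then a gap in s) is an assumed convention that may differ from the one on p 52.

```python
def global_align(s, t, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_s, aligned_t) for a global alignment of s and t."""
    n, m = len(s), len(t)
    a = [[0] * (m + 1) for _ in range(n + 1)]
    # Boundaries: a prefix aligned entirely against gaps.
    for i in range(1, n + 1):
        a[i][0] = i * gap
    for j in range(1, m + 1):
        a[0][j] = j * gap
    # Row-wise fill; each entry takes the best of three alternatives.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            p = match if s[i - 1] == t[j - 1] else mismatch
            a[i][j] = max(a[i - 1][j - 1] + p, a[i - 1][j] + gap, a[i][j - 1] + gap)
    # Recovery from the corner element back to (0, 0).
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        p = match if i > 0 and j > 0 and s[i - 1] == t[j - 1] else mismatch
        if i > 0 and j > 0 and a[i][j] == a[i - 1][j - 1] + p:
            top.append(s[i - 1]); bottom.append(t[j - 1]); i -= 1; j -= 1
        elif i > 0 and a[i][j] == a[i - 1][j] + gap:
            top.append(s[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(t[j - 1]); j -= 1
    return a[n][m], "".join(reversed(top)), "".join(reversed(bottom))

score, row_s, row_t = global_align("aaccaccgaga", "aacaccgaga")
print(score)
print(row_s)
print(row_t)
```

The corner entry `a[n][m]` is the score; the two returned rows have equal length and reproduce s and t once the gap characters are removed.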

8 Local alignment
The alignment may stop anywhere, so the minimum score is zero, even on the boundaries. The best local alignment is where the score is maximal in the matrix. Recovery starts from that maximum and stops at a zero value.
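
A minimal sketch of the local variant follows, assuming the same +1/-1/-2 scoring. The name `local_align` is illustrative, and when several cells share the maximum the first one found in row-wise order is used, which is an arbitrary choice.

```python
def local_align(s, t, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_s, aligned_t) for a best local alignment."""
    n, m = len(s), len(t)
    a = [[0] * (m + 1) for _ in range(n + 1)]  # boundaries stay at zero
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            p = match if s[i - 1] == t[j - 1] else mismatch
            # The extra alternative 0 lets an alignment start anywhere.
            a[i][j] = max(0, a[i - 1][j - 1] + p, a[i - 1][j] + gap, a[i][j - 1] + gap)
            if a[i][j] > best:
                best, bi, bj = a[i][j], i, j
    # Recovery starts at the maximum and stops at the first zero entry.
    top, bottom = [], []
    i, j = bi, bj
    while i > 0 and j > 0 and a[i][j] > 0:
        p = match if s[i - 1] == t[j - 1] else mismatch
        if a[i][j] == a[i - 1][j - 1] + p:
            top.append(s[i - 1]); bottom.append(t[j - 1]); i -= 1; j -= 1
        elif a[i][j] == a[i - 1][j] + gap:
            top.append(s[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(t[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))

print(local_align("xxacgtyy", "zzacgtww"))  # (4, 'acgt', 'acgt')
```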

9 Semi-global (as-required) alignment
Four alternatives: penalty-free gaps in front of string s, in front of t, at the back of s, or at the back of t. Prefix-suffix matching is obtained by combining these alternatives. E.g., suffix of s with prefix of t: free gaps at the back of s and in front of t.

10 Semi-global alignment
Example: p 56. Free gaps in front of a string: zeros in the boundary row or column that represents those gaps. Free gaps at the back of a string: recovery starts from the maximum of the last row or column instead of the corner. The two may be combined as required. Exercise: how to combine them to match a suffix of s with a prefix of t.
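
One possible answer to the exercise is sketched below, assuming rows index s, columns index t, and the +1/-1/-2 scoring. Free gaps in front of t become a zero first column, and free gaps at the back of s become a maximum over the last row. The name `overlap_score` is illustrative, and only the score and the length of the matched prefix of t are returned, not the alignment itself.

```python
def overlap_score(s, t, match=1, mismatch=-1, gap=-2):
    """Best score for aligning some suffix of s with some prefix of t.

    Returns (score, j), where t[:j] is the prefix of t that takes part.
    """
    n, m = len(s), len(t)
    a = [[0] * (m + 1) for _ in range(n + 1)]
    # Gaps in front of s are still charged: first row is j * gap.
    for j in range(1, m + 1):
        a[0][j] = j * gap
    # Gaps in front of t are free: first column stays zero.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            p = match if s[i - 1] == t[j - 1] else mismatch
            a[i][j] = max(a[i - 1][j - 1] + p, a[i - 1][j] + gap, a[i][j - 1] + gap)
    # Gaps at the back of s are free: take the best entry of the last row.
    best_j = max(range(m + 1), key=lambda j: a[n][j])
    return a[n][best_j], best_j

# Fragment-assembly pair from slide 3: suffix "gata" meets prefix "gata".
print(overlap_score("aaacgtcgata", "gatacgatg"))  # (4, 4)
```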

11 Generalized gap penalty
A block of k consecutive gaps is penalized either the same as a single gap or by some formula w(k). Each block of gaps is treated as one unit (like a character). The boundary (first row and column) is initialized with w(k).

12 Generalized gap penalty
Three interacting matrices: one for alignments ending in a character pair, scored with p(i, j); one for alignments ending in gaps in s; one for alignments ending in gaps in t. Formula on p 63.

13 Affine gap penalty
A generalized gap penalty with w(k) = h + gk: the first gap of a block costs more (h + g) than each following gap (g). The formula changes slightly with the known w(k): the block gap matrices compare only the previous elements, so the complexity is reduced.
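
The three-matrix scheme with w(k) = h + gk can be sketched as below. This is a score-only version under assumed values h = 2 and g = 1 with +1/-1 for match and mismatch; the matrix names `a`, `b`, `c` and the function name `affine_score` are illustrative, and the exact formula on p 63 may be arranged differently.

```python
NEG = float("-inf")

def affine_score(s, t, match=1, mismatch=-1, h=2, g=1):
    """Global alignment score where a block of k gaps costs h + g*k."""
    n, m = len(s), len(t)
    # a: last column pairs two characters; b: last column is a gap in s;
    # c: last column is a gap in t.
    a = [[NEG] * (m + 1) for _ in range(n + 1)]
    b = [[NEG] * (m + 1) for _ in range(n + 1)]
    c = [[NEG] * (m + 1) for _ in range(n + 1)]
    a[0][0] = 0
    for j in range(1, m + 1):
        b[0][j] = -(h + g * j)   # boundary initialized with w(j)
    for i in range(1, n + 1):
        c[i][0] = -(h + g * i)   # boundary initialized with w(i)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            p = match if s[i - 1] == t[j - 1] else mismatch
            a[i][j] = p + max(a[i - 1][j - 1], b[i - 1][j - 1], c[i - 1][j - 1])
            # Opening a block costs h + g; extending it costs only g.
            b[i][j] = max(a[i][j - 1] - (h + g), b[i][j - 1] - g, c[i][j - 1] - (h + g))
            c[i][j] = max(a[i - 1][j] - (h + g), b[i - 1][j] - (h + g), c[i - 1][j] - g)
    return max(a[n][m], b[n][m], c[n][m])

# Six matches and one block of three gaps: 6 - (2 + 3*1)
print(affine_score("acgtttacg", "acgacg"))  # 1
```

Each entry looks only at a constant number of previous entries, which is why the affine case keeps the quadratic running time of the basic algorithm.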

14 Multiple sequence alignment
A scoring function for each column: a character or a gap for each sequence. Combinatorics: 2^k - 1 for k sequences (the -1 excludes the column made up entirely of gaps). But . . .

15 Multiple sequence alignment
The order of arguments of the function should not matter: f(I,-,v) = f(I,v,-). Score pairwise on a column. Combinatorics: (k choose 2). For k=10, 2^k - 1 = 1023, kC2 = 45. We need gap-to-gap scoring now.

16 Multiple sequence alignment
The total score can be measured either way: as a sum over all columns, or as a sum over all pairs of sequences. If p(-, -) = 0, the two scores are the same.
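
The counts on slide 15 and the equivalence on slide 16 can be checked with a short sketch. It assumes the +1/-1/-2 scoring for character and gap pairs; the names `p`, `sp_by_columns` and `sp_by_pairs` are illustrative, and the pairwise version scores each induced two-row alignment after dropping columns where both rows hold a gap.

```python
from itertools import combinations
from math import comb

GAP = "-"

def p(a, b, gap_gap=0):
    """Pair score: +1 match, -1 mismatch, -2 character vs gap, gap_gap for two gaps."""
    if a == GAP and b == GAP:
        return gap_gap
    if a == GAP or b == GAP:
        return -2
    return 1 if a == b else -1

def sp_by_columns(rows, gap_gap=0):
    """Sum, over all columns, of the scores of all pairs within the column."""
    return sum(p(a, b, gap_gap) for col in zip(*rows) for a, b in combinations(col, 2))

def sp_by_pairs(rows):
    """Sum, over all pairs of sequences, of the induced pairwise alignment score."""
    total = 0
    for r1, r2 in combinations(rows, 2):
        # Columns with a gap in both rows vanish from the induced alignment.
        total += sum(p(a, b) for a, b in zip(r1, r2) if not (a == GAP and b == GAP))
    return total

rows = ["ac-gt", "a--gt", "acgg-"]
print(sp_by_columns(rows), sp_by_pairs(rows))   # -4 -4
print(sp_by_columns(rows, gap_gap=-2))          # -6, no longer equal to the pairwise sum
print(2**10 - 1, comb(10, 2))                   # 1023 45
```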

17 Multiple sequence alignment
Consider aligning three sequences s1, s2, and s3. The (i, j, k)-th entry of the scoring matrix is for aligning s1[1..i], s2[1..j], s3[1..k]. The matrix is 3D with dimension n x m x l, for |s1|=n, |s2|=m, |s3|=l.

18 Multiple sequence alignment
Each entry in the scoring matrix sits at a corner of a 3D box. The optimal score is the maximum taken over the other 7 corners: A[i-1, j, k], A[i, j-1, k], A[i, j, k-1], A[i-1, j-1, k], A[i-1, j, k-1], A[i, j-1, k-1], A[i-1, j-1, k-1], that is, (i, j, k) minus a nonzero bit-vector. In each case the sum-of-pairs score of the corresponding column is added [EXAMPLE]. Initialization: (-4)i for 1<=i<=n, for two gaps against each character of a prefix of s1; likewise for s2 and s3.
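
The 7-corner recurrence can be sketched as follows, assuming sum-of-pairs column scores with +1/-1/-2 and p(-, -) = 0. The names `msa3_score` and `p` are illustrative. The boundary values are not set separately here; they fall out of the same recurrence, which gives (-4)i along each axis.

```python
from itertools import combinations, product

def p(a, b):
    """Pair score with p(-, -) = 0."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return -2
    return 1 if a == b else -1

def msa3_score(s1, s2, s3):
    """Optimal sum-of-pairs score for aligning three sequences."""
    seqs = (s1, s2, s3)
    n, m, l = len(s1), len(s2), len(s3)
    # The 7 nonzero bit-vectors: bit 1 means "consume a character of that sequence".
    moves = [b for b in product((0, 1), repeat=3) if any(b)]
    A = {(0, 0, 0): 0}
    for i in range(n + 1):
        for j in range(m + 1):
            for k in range(l + 1):
                if (i, j, k) == (0, 0, 0):
                    continue
                best = float("-inf")
                for b in moves:
                    prev = (i - b[0], j - b[1], k - b[2])
                    if min(prev) < 0:
                        continue  # that corner lies outside the matrix
                    # Column for this move: a character where the bit is 1, else a gap.
                    col = [seq[idx - 1] if bit else "-"
                           for seq, idx, bit in zip(seqs, (i, j, k), b)]
                    col_score = sum(p(x, y) for x, y in combinations(col, 2))
                    best = max(best, A[prev] + col_score)
                A[(i, j, k)] = best
    return A[(n, m, l)]

print(msa3_score("acg", "acg", "acg"))  # 9: three columns, three matching pairs each
print(msa3_score("aaa", "", ""))        # -12: the boundary value (-4)i for i = 3
```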

19 Multiple sequence alignment
For k sequences, the matrix is k-dimensional. Each entry is a calculation over the 2^k - 1 other corners of the "box". Formula on page 72.

20 Alignment improvements
Alignment can also be computed from the back: s[i+1..n], t[j+1..m]. Front and back alignments can be combined to "cut" an alignment: compute the two matrices, add them, and align according to the summed matrix.
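
A minimal sketch of the "cut" idea follows, assuming the +1/-1/-2 scoring. The front matrix scores prefixes, the back matrix scores suffixes (computed here by aligning the reversed strings), and their sum at (i, j) is the best score among global alignments that pass through that cell. The function names are illustrative.

```python
def prefix_scores(s, t, match=1, mismatch=-1, gap=-2):
    """a[i][j] = best global score of s[:i] against t[:j]."""
    n, m = len(s), len(t)
    a = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        a[i][0] = i * gap
    for j in range(1, m + 1):
        a[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            p = match if s[i - 1] == t[j - 1] else mismatch
            a[i][j] = max(a[i - 1][j - 1] + p, a[i - 1][j] + gap, a[i][j - 1] + gap)
    return a

def suffix_scores(s, t):
    """b[i][j] = best global score of s[i:] against t[j:]."""
    n, m = len(s), len(t)
    r = prefix_scores(s[::-1], t[::-1])
    return [[r[n - i][m - j] for j in range(m + 1)] for i in range(n + 1)]

def best_cut_in_row(s, t, i):
    """Return (j, score): where an optimal alignment can cross row i."""
    a, b = prefix_scores(s, t), suffix_scores(s, t)
    total = [a[i][j] + b[i][j] for j in range(len(t) + 1)]
    j = max(range(len(total)), key=total.__getitem__)
    return j, total[j]

s, t = "aaccaccgaga", "aacaccgaga"
print(prefix_scores(s, t)[len(s)][len(t)])   # global score
print(best_cut_in_row(s, t, len(s) // 2))    # cut point and the same score
```

For every row i the best summed entry equals the global score, so the alignment can be split at that cell and each half aligned separately.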

21 Alignment improvements
When the lengths of the two sequences are comparable and a good global alignment is expected, recovery runs mostly along the diagonal. Computation can then focus on a strip of fixed width k around the diagonal (k-band). This is more efficient because only the relevant cells are used.
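
The banded computation can be sketched as below, assuming the +1/-1/-2 scoring. Only cells with |i - j| <= k are filled and anything outside the band counts as unreachable. The name `kband_score` and the dictionary storage are illustrative choices; the result is optimal only if some optimal alignment stays inside the band.

```python
def kband_score(s, t, k, match=1, mismatch=-1, gap=-2):
    """Best global score among alignments that stay within |i - j| <= k."""
    n, m = len(s), len(t)
    if abs(n - m) > k:
        raise ValueError("band too narrow to reach the corner element")
    neg = float("-inf")
    a = {(0, 0): 0}  # only in-band cells are ever stored
    for i in range(n + 1):
        for j in range(max(0, i - k), min(m, i + k) + 1):
            if i == 0 and j == 0:
                continue
            cands = []
            if i > 0 and j > 0:
                p = match if s[i - 1] == t[j - 1] else mismatch
                cands.append(a.get((i - 1, j - 1), neg) + p)
            if i > 0:
                cands.append(a.get((i - 1, j), neg) + gap)  # may lie outside the band
            if j > 0:
                cands.append(a.get((i, j - 1), neg) + gap)  # may lie outside the band
            a[(i, j)] = max(cands)
    return a[(n, m)]

print(kband_score("aaccaccgaga", "aacaccgaga", 1))  # 8, same as the full matrix
print(kband_score("acgt", "acgt", 0))               # 4, the diagonal alone suffices
```

The number of filled cells is roughly (2k + 1) times the sequence length instead of n times m.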

22 Multiple sequence alignment: Star alignment
One sequence is placed at the center and all others are aligned pairwise against it. Which sequence to put at the center? Try each: build a 2D similarity matrix for all pairs and pick the row with the best sum (the largest sum for similarity scores, the smallest if the entries are distances) [page 79].
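
Choosing the center can be sketched as follows, assuming the entries are global similarity scores under +1/-1/-2 so that the best row is the one with the largest sum. The names `sim` and `pick_center` are illustrative, and the step of merging the pairwise alignments into one multiple alignment is not shown.

```python
def sim(s, t, match=1, mismatch=-1, gap=-2):
    """Global alignment score of s and t, keeping only two rows of the matrix."""
    prev = [j * gap for j in range(len(t) + 1)]
    for i, a in enumerate(s, 1):
        cur = [i * gap]
        for j, b in enumerate(t, 1):
            p = match if a == b else mismatch
            cur.append(max(prev[j - 1] + p, prev[j] + gap, cur[j - 1] + gap))
        prev = cur
    return prev[-1]

def pick_center(seqs):
    """Return (index of the center sequence, list of row sums)."""
    totals = [sum(sim(s, t) for j, t in enumerate(seqs) if j != i)
              for i, s in enumerate(seqs)]
    center = max(range(len(seqs)), key=totals.__getitem__)
    return center, totals

print(pick_center(["acgt", "acga", "tcga"]))  # (1, [2, 4, 2])
```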

23 Multiple sequence alignment: Tree alignment
A spanning tree over the sequences: the nodes are sequences. Each edge is labeled with the similarity between its pair of nodes. The total tree score, aggregated over the edges, should be maximal. A star is a special case of a tree.
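
Given a symmetric similarity matrix, one way to obtain a spanning tree with maximal total edge similarity is a Prim-style greedy growth, sketched below. This is an assumed technique for illustration, not necessarily the construction the slides have in mind; the name `max_spanning_tree` and the example matrix are hypothetical.

```python
def max_spanning_tree(sim):
    """Grow a spanning tree from node 0, always adding the most similar outside node.

    sim is a symmetric matrix of pairwise similarities.
    Returns (list of edges, total similarity over the edges).
    """
    n = len(sim)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Best edge that connects the current tree to a node outside it.
        u, v = max(((u, v) for u in in_tree for v in range(n) if v not in in_tree),
                   key=lambda e: sim[e[0]][e[1]])
        edges.append((u, v))
        in_tree.add(v)
    return edges, sum(sim[u][v] for u, v in edges)

# Hypothetical similarities for three sequences; node 1 is similar to both others.
similarity = [[0, 2, 0],
              [2, 0, 2],
              [0, 2, 0]]
edges, total = max_spanning_tree(similarity)
print(edges, total)
```

In this small example the resulting tree is a star centered on node 1, in line with the remark that a star is a special tree.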

24 PAM matrix for matching residues

25 BLAST search engine

