Multiple Alignment – Material based on Chapter 14 of the book: Dan Gusfield, Algorithms on Strings, Trees and Sequences, Cambridge University Press.


1 Multiple Alignment – Material based on Chapter 14 of the book: Dan Gusfield, Algorithms on Strings, Trees and Sequences, Cambridge University Press

2 Three common representations There are three common kinds of family representations that come from multiple string comparison:
▫ Profile representations
▫ Consensus sequence representations
▫ Signature representations

3 Family representations and alignments with profiles Definition: Given a multiple alignment of a set of strings, a profile for that multiple alignment specifies, for each column, the frequency with which each character appears in that column. A profile is sometimes also called a weight matrix in the biological literature.
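The definition above can be illustrated with a minimal Python sketch. The function name `build_profile`, the use of "-" for a space, and the sample alignment are assumptions made for illustration only; they do not come from the slides.

```python
from collections import Counter


def build_profile(alignment):
    """Return one dict per column mapping each character to its frequency.

    `alignment` is a list of equal-length rows; '-' stands for a space
    (an assumption of this sketch, not a convention fixed by the slides).
    """
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all rows of a multiple alignment must have equal length")
    profile = []
    for j in range(width):
        # Count how often each character occurs in column j.
        counts = Counter(row[j] for row in alignment)
        profile.append({ch: c / len(alignment) for ch, c in counts.items()})
    return profile


# Hypothetical four-row alignment used only as an example.
example = ["abc-", "ab-a", "acca", "cb-a"]
profile = build_profile(example)
```

Each entry `profile[j]` plays the role of one column of the weight matrix; the frequencies in a column sum to 1.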

4 How to optimally align a string to a profile Definition: For a character y and column j, let p(y, j) be the frequency with which character y appears in column j of the profile, and let S(x, j) denote the score for aligning character x with column j. Let V(i, j) denote the value of the optimal alignment of the prefix S[1..i] with the first j columns of the profile C.
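The slide gives the definitions but not the recurrence, so the following Python sketch fills it in under stated assumptions: S(x, j) is taken to be the frequency-weighted sum of a pairwise score s(x, y) over the characters y of column j, and V(i, j) is computed with the usual three-way maximum known from two-string alignment. The function names, the "-" space symbol and the example scoring function are all hypothetical.

```python
def column_score(x, column, s):
    """S(x, j): frequency-weighted score of aligning character x with a column."""
    return sum(freq * s(x, y) for y, freq in column.items())


def align_to_profile(string, profile, s, gap="-"):
    """Value V(n, m) of an optimal alignment of `string` with `profile`.

    `profile` is a list of dicts (character -> frequency), one per column.
    `s(x, y)` is a pairwise score that is maximised (an assumption here).
    """
    n, m = len(string), len(profile)
    V = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):      # string characters aligned with spaces
        V[i][0] = V[i - 1][0] + s(string[i - 1], gap)
    for j in range(1, m + 1):      # profile columns aligned with spaces
        V[0][j] = V[0][j - 1] + column_score(gap, profile[j - 1], s)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            x, col = string[i - 1], profile[j - 1]
            V[i][j] = max(
                V[i - 1][j - 1] + column_score(x, col, s),  # x against column j
                V[i - 1][j] + s(x, gap),                    # x against a space
                V[i][j - 1] + column_score(gap, col, s),    # space against column j
            )
    return V[n][m]


def example_score(x, y):
    """Hypothetical scoring: match 2, mismatch or space -1, space/space 0."""
    if x == "-" and y == "-":
        return 0
    if x == "-" or y == "-":
        return -1
    return 2 if x == y else -1


value = align_to_profile("ab", [{"a": 1.0}, {"b": 0.5, "c": 0.5}], example_score)
```

In this example the best alignment places "a" on the first column (score 2) and "b" on the second (score 0.5 * 2 + 0.5 * -1 = 0.5). The table has (n + 1)(m + 1) cells, just as in two-string alignment.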

5 Signature representations of families The major collections of protein signatures are the PROSITE database and the BLOCKS database derived from it. Helicases are proteins that help unwind double-stranded DNA so that the DNA can be read for duplication, transcription, recombination, or repair. A large fraction of the available information on the structure and possible functions of the helicases has been obtained by computer-assisted comparative analysis of their amino acid sequences. This approach has led to the delineation of motifs and patterns that are conserved in different subsets of the helicases.

6 Introduction to computing multiple string alignments Definition: Given a set of k > 2 strings S = {S_1, S_2, ..., S_k}, a local multiple alignment of S is obtained by selecting one substring S_i' from each string S_i in S and then globally aligning those substrings.

7 How to score multiple alignments Definition: Given a multiple alignment M, the induced pairwise alignment of two strings S_i and S_j is obtained from M by removing all rows except the two rows for S_i and S_j. That is, the induced alignment is the multiple alignment M restricted to S_i and S_j. Any two opposing spaces in that induced alignment can be removed if desired. Definition: The score of an induced pairwise alignment is determined using any chosen scoring scheme for two-string alignment in the standard manner.
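A small Python sketch of the induced pairwise alignment follows. The function names, the "-" space symbol, the sample alignment and the default costs (match 0, mismatch 1, space 1) are assumptions chosen for illustration; the slide leaves the scoring scheme open.

```python
def induced_pairwise_alignment(alignment, i, j, gap="-"):
    """Restrict a multiple alignment to rows i and j, dropping opposing spaces."""
    kept = [(a, b) for a, b in zip(alignment[i], alignment[j])
            if not (a == gap and b == gap)]
    return "".join(a for a, _ in kept), "".join(b for _, b in kept)


def pairwise_score(row_a, row_b, smatch=0, smis=1, sspace=1, gap="-"):
    """Score a two-row alignment column by column (illustrative cost scheme)."""
    total = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            total += sspace
        elif a == b:
            total += smatch
        else:
            total += smis
    return total


# Hypothetical multiple alignment of three strings.
M = ["ac-g", "a--g", "-ctg"]
rows_01 = induced_pairwise_alignment(M, 0, 1)
rows_02 = induced_pairwise_alignment(M, 0, 2)
```

Here the third column of rows 0 and 1 holds two opposing spaces, so it is dropped from their induced alignment.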

8 Multiple alignment with the sum-of-pairs (SP) objective function Definition: The sum-of-pairs (SP) score of a multiple alignment M is the sum of the scores of the pairwise global alignments induced by M. The SP alignment problem: Compute a global multiple alignment M with minimum sum-of-pairs score.
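As a minimal sketch of the SP score, the Python function below sums a column-by-column pairwise cost over every pair of rows. The name `sp_score`, the "-" space symbol, the sample alignment and the default costs (match 0, mismatch 1, space 1, two opposing spaces 0) are assumptions for illustration.

```python
from itertools import combinations


def sp_score(alignment, smatch=0, smis=1, sspace=1, gap="-"):
    """Sum-of-pairs score: total score of all induced pairwise alignments."""
    total = 0
    for row_a, row_b in combinations(alignment, 2):
        for a, b in zip(row_a, row_b):
            if a == gap and b == gap:
                continue            # opposing spaces contribute nothing
            if a == gap or b == gap:
                total += sspace
            elif a == b:
                total += smatch
            else:
                total += smis
    return total


# Hypothetical multiple alignment of three strings.
M = ["ac-g", "a--g", "-ctg"]
score = sp_score(M)
```

For this alignment the three induced pairwise alignments score 1, 2 and 3, so the SP score is 6. Evaluating the score of a given alignment takes time proportional to (number of pairs) x (number of columns); the hard part of the SP problem is finding the alignment that minimises it.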

9 An exact solution to the SP alignment problem Definition: Let S_1, S_2, and S_3 denote three strings of lengths n_1, n_2, and n_3, respectively, and let D(i, j, k) be the optimal SP score for aligning S_1[1..i], S_2[1..j], and S_3[1..k]. The score for a match, mismatch, or space is specified by the variables smatch, smis, and sspace, respectively.

10 Recurrences for a nonboundary cell (i, j, k)
for i := 1 to n_1 do
  for j := 1 to n_2 do
    for k := 1 to n_3 do
    begin
      if (S_1(i) = S_2(j)) then c_ij := smatch else c_ij := smis;
      if (S_1(i) = S_3(k)) then c_ik := smatch else c_ik := smis;
      if (S_2(j) = S_3(k)) then c_jk := smatch else c_jk := smis;
      d_1 := D(i-1, j-1, k-1) + c_ij + c_ik + c_jk;
      d_2 := D(i-1, j-1, k) + c_ij + 2*sspace;
      d_3 := D(i-1, j, k-1) + c_ik + 2*sspace;
      d_4 := D(i, j-1, k-1) + c_jk + 2*sspace;
      d_5 := D(i-1, j, k) + 2*sspace;
      d_6 := D(i, j-1, k) + 2*sspace;
      d_7 := D(i, j, k-1) + 2*sspace;
      D(i, j, k) := Min[d_1, d_2, d_3, d_4, d_5, d_6, d_7];
    end;
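The recurrences above can be turned into a runnable Python sketch. The slide covers only nonboundary cells, so this version makes one assumption: boundary cells are handled by applying the same seven column types and skipping any that would step outside the table, with D(0, 0, 0) = 0. The function name `sp_align3` and the default costs (match 0, mismatch 1, space 1) are also illustrative choices.

```python
from itertools import product


def sp_align3(s1, s2, s3, smatch=0, smis=1, sspace=1):
    """Optimal (minimum) SP score of a global alignment of three strings."""
    n1, n2, n3 = len(s1), len(s2), len(s3)
    inf = float("inf")
    D = [[[inf] * (n3 + 1) for _ in range(n2 + 1)] for _ in range(n1 + 1)]
    D[0][0][0] = 0

    def pair(a, b):
        # None stands for a space; two opposing spaces cost nothing.
        if a is None and b is None:
            return 0
        if a is None or b is None:
            return sspace
        return smatch if a == b else smis

    # The seven column types: each string contributes a character (1) or a space (0).
    moves = [m for m in product((0, 1), repeat=3) if any(m)]

    for i, j, k in product(range(n1 + 1), range(n2 + 1), range(n3 + 1)):
        if i == j == k == 0:
            continue
        best = inf
        for di, dj, dk in moves:
            if di > i or dj > j or dk > k:
                continue            # this column type is unavailable at the boundary
            a = s1[i - 1] if di else None
            b = s2[j - 1] if dj else None
            c = s3[k - 1] if dk else None
            cost = pair(a, b) + pair(a, c) + pair(b, c)
            best = min(best, D[i - di][j - dj][k - dk] + cost)
        D[i][j][k] = best
    return D[n1][n2][n3]
```

For example, `sp_align3("a", "a", "b")` places all three characters in one column, giving one match and two mismatches. The table has (n_1 + 1)(n_2 + 1)(n_3 + 1) cells with seven candidates each, so the running time grows with the product of the three string lengths.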

11 A speedup for the exact solution Definition: Let d_{1,2}(i, j) be the edit distance between suffixes S_1[i..n] and S_2[j..n] of strings S_1 and S_2. Define d_{1,3}(i, k) and d_{2,3}(j, k) analogously. Key idea: Recall that D(i, j, k) is the optimal SP score for aligning S_1[1..i], S_2[1..j], and S_3[1..k]. Let z be the SP score of some known multiple alignment of the three strings, so that z is an upper bound on the optimal score. If D(i, j, k) + d_{1,2}(i, j) + d_{1,3}(i, k) + d_{2,3}(j, k) is greater than z, then node (i, j, k) cannot be on any optimal path, and so (in a forward computation) D(i, j, k) need not be sent forward to any cell.
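A rough Python sketch of this pruning idea follows, written as a forward dynamic program over a dictionary of reached cells. It rests on several assumptions not fixed by the slide: the caller supplies z as an upper bound on the optimum, the pairwise suffix distances use the same costs as the three-way alignment, and the suffix for cell (i, j, k) is taken to be the characters that remain after the first i, j, and k characters. All function names are hypothetical.

```python
from itertools import product


def suffix_distances(a, b, smatch=0, smis=1, sspace=1):
    """d[i][j] = minimum alignment cost of the suffixes a[i:] and b[j:]."""
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            options = []
            if i < n and j < m:
                options.append(d[i + 1][j + 1] + (smatch if a[i] == b[j] else smis))
            if i < n:
                options.append(d[i + 1][j] + sspace)
            if j < m:
                options.append(d[i][j + 1] + sspace)
            if options:
                d[i][j] = min(options)
    return d


def sp_align3_pruned(s1, s2, s3, z, smatch=0, smis=1, sspace=1):
    """Forward DP for the 3-string SP score; returns (score, cells expanded).

    z must be at least the optimal SP score, otherwise the result is unreliable.
    """
    strings = (s1, s2, s3)
    n1, n2, n3 = len(s1), len(s2), len(s3)
    d12 = suffix_distances(s1, s2, smatch, smis, sspace)
    d13 = suffix_distances(s1, s3, smatch, smis, sspace)
    d23 = suffix_distances(s2, s3, smatch, smis, sspace)

    def pair(a, b):
        if a is None and b is None:
            return 0
        if a is None or b is None:
            return sspace
        return smatch if a == b else smis

    moves = [m for m in product((0, 1), repeat=3) if any(m)]
    D = {(0, 0, 0): 0}
    expanded = 0
    # Lexicographic order guarantees every predecessor is final before a cell is read.
    for cell in product(range(n1 + 1), range(n2 + 1), range(n3 + 1)):
        cur = D.get(cell)
        if cur is None:
            continue                      # never reached by an unpruned cell
        i, j, k = cell
        if cur + d12[i][j] + d13[i][k] + d23[j][k] > z:
            continue                      # pruned: cannot lie on an optimal path
        expanded += 1
        for move in moves:
            nxt = tuple(p + step for p, step in zip(cell, move))
            if nxt[0] > n1 or nxt[1] > n2 or nxt[2] > n3:
                continue
            a, b, c = (s[p] if step else None
                       for s, p, step in zip(strings, cell, move))
            cand = cur + pair(a, b) + pair(a, c) + pair(b, c)
            if cand < D.get(nxt, float("inf")):
                D[nxt] = cand
    return D.get((n1, n2, n3)), expanded


# Without a bound nothing is pruned; with a tight bound fewer cells may be expanded.
full, cells_full = sp_align3_pruned("acgt", "agt", "cgt", z=float("inf"))
tight, cells_tight = sp_align3_pruned("acgt", "agt", "cgt", z=full)
```

The bound is valid because the remaining part of any three-way alignment induces three pairwise alignments of the remaining suffixes, and each costs at least the corresponding pairwise distance. In practice z would come from a heuristic alignment rather than from a first exact run as in this demonstration.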

