Intro to Alignment Algorithms: Global and Local
Algorithmic Functions of Computational Biology
Professor Istrail

Sequence Comparison
Biomolecular sequences:
  DNA sequences (strings over the 4-letter alphabet {A, C, G, T})
  RNA sequences (strings over the 4-letter alphabet {A, C, G, U})
  Protein sequences (strings over the 20-letter alphabet of amino acids)
Sequence similarity helps in the discovery of genes and in the prediction of the structure and function of proteins.

The Basic Similarity Analysis Algorithm
  Global Similarity
  Scoring Schemes
  Edit Graphs
  Alignment = Path in the Edit Graph
  The Principle of Optimality
  The Dynamic Programming Algorithm
  The Traceback

Jupiter’s code: Alignment
Sequences: Massa and Master
Three possible alignments:
  Mas-sa
  Master

  Mass-a
  Master

  Massa-
  Master

Sequence Alignment
Input: two sequences over the same alphabet
Output: an alignment of the two sequences
Example:
  GCGCATTTGAGCGA
  TGCGTTAGGGTGACCA
A possible alignment:
  -GCGCATTTGAGCGA--
  TGCG--TTAGGGTGACC
Each column is a match, a mismatch, or an indel.

Consider two sequences X and Y over the same alphabet; their letters x_i and y_j belong to that alphabet.

Scoring Schemes
Unit-score (1 for identical letters, 0 otherwise):
      A  C  G  T
  A   1  0  0  0
  C   0  1  0  0
  G   0  0  1  0
  T   0  0  0  1

Alignment
  ACG
  |||
  AGG
Score = s(A,A) + s(C,G) + s(G,G) = 1 + 0 + 1 = 2 (unit cost)
  A|A: A is aligned with A
  C|G: C is aligned with G
  G|G: G is aligned with G

Gaps
“-” is the gap symbol.
Sequences: ACATGGAAT and ACAGGAAAT
Optimal alignments and their scores:
  Without gaps (score 7):
    ACATGGAAT
    ACAGGAAAT
  With gaps (score 8):
    ACATGG-AAT
    ACA-GGAAAT
A second pair of sequences, AAAGGG and GGGAAA, is compared in the same way.

s(x,y) = the score for aligning x with y
s(-,y) = the score for aligning - with y
s(x,-) = the score for aligning x with -

Alignment Score
  A-CG-G
  ATCGTG
Score = s(A,A) + s(-,T) + s(C,C) + s(G,G) + s(-,T) + s(G,G)
The alignment score is the sum of the scores of the pairwise aligned symbols.

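The column-by-column sum can be sketched in a few lines of Python. This is only an illustration: the function names `unit_score` and `alignment_score` are made up here, and scoring a gap column as 0 is an assumption, chosen because it reproduces the scores 2, 7 and 8 quoted on the earlier slides.

```python
def unit_score(a: str, b: str) -> int:
    """Unit-score scheme: 1 for two identical letters, 0 otherwise.

    Assumption: a column containing the gap symbol "-" scores 0.
    """
    return 1 if a == b and a != "-" else 0


def alignment_score(row_x: str, row_y: str, score=unit_score) -> int:
    """Sum the scores of the pairwise aligned symbols, column by column."""
    if len(row_x) != len(row_y):
        raise ValueError("both rows of an alignment must have the same length")
    return sum(score(a, b) for a, b in zip(row_x, row_y))


print(alignment_score("ACG", "AGG"))                # 2
print(alignment_score("ACATGGAAT", "ACAGGAAAT"))    # 7
print(alignment_score("ACATGG-AAT", "ACA-GGAAAT"))  # 8
```

Any other scoring scheme can be passed in through the `score` argument, as long as it accepts the gap symbol.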
Scoring Scheme: Dayhoff score
Partial alignment of the monkey and trout somatotropin proteins:
  PTIPLSRLFDNAMLRAHRLHQ
  SAIENQRLFNIAVSRVQHLHL
[Table: Dayhoff score matrix, rows and columns indexed by - A R N D C Q E G H I L K M F P S T W Y V]

Scoring Functions
Scoring function = a sum of terms, one for each pair of aligned residues and one for each gap
The meaning = the log of the relative likelihood that the sequences are related, compared to being unrelated
Identities and conservative substitutions are positive terms
Non-conservative substitutions are negative terms

The Edit Graph
Suppose that we want to align AGT with AT.
We construct a graph in which alignments of the two sequences correspond to paths between the Begin and End nodes of the graph. This is the Edit Graph.

AGT has length 3
AT has length 2
The Edit Graph has (3+1)*(2+1) = 12 nodes
[Figure: grid of nodes labeled with the sequence AGT and the sequence AT]

AGT indexes the columns, and AT indexes the rows of this “table”.

The graph is directed. The nodes (i,j) will hold values.

Directed edges get as labels pairs of aligned letters.
[Figure: edit graph for AGT and AT. Horizontal edges are labeled A/-, G/-, T/-; vertical edges are labeled -/A, -/T; diagonal edges are labeled A/A, G/A, T/A, A/T, G/T, T/T]

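As a rough sketch of this construction, the graph can be generated directly from the two strings. The function name `edit_graph` and the tuple layout of nodes and edges are illustrative choices, not something the slides prescribe; the first index follows AGT (columns) and the second follows AT (rows).

```python
def edit_graph(x: str, y: str):
    """Return (nodes, edges) of the edit graph for x (columns) and y (rows).

    A node is (i, j) with 0 <= i <= len(x) and 0 <= j <= len(y).
    An edge is (source, target, label); the label is the aligned pair.
    """
    nodes = [(i, j) for j in range(len(y) + 1) for i in range(len(x) + 1)]
    edges = []
    for i, j in nodes:
        if i < len(x):                    # horizontal edge: x[i] against a gap
            edges.append(((i, j), (i + 1, j), (x[i], "-")))
        if j < len(y):                    # vertical edge: a gap against y[j]
            edges.append(((i, j), (i, j + 1), ("-", y[j])))
        if i < len(x) and j < len(y):     # diagonal edge: x[i] against y[j]
            edges.append(((i, j), (i + 1, j + 1), (x[i], y[j])))
    return nodes, edges


nodes, edges = edit_graph("AGT", "AT")
print(len(nodes))  # 12
print(len(edges))  # 23
```

The 23 edges split into 9 horizontal, 8 vertical and 6 diagonal ones.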
Alignment = Path in the Edit Graph
[Figure: edit graph for AGT and AT with one path highlighted]
Example path:
  AGT
  A-T
Every path from Begin to End corresponds to an alignment.
Every alignment corresponds to a path between Begin and End.

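The correspondence between paths and alignments can be checked by brute force on a small example. The sketch below (with the hypothetical name `all_alignments`) walks every Begin-to-End path and spells out the alignment it encodes; it is exponential in the sequence lengths and is only meant for tiny inputs.

```python
def all_alignments(x: str, y: str):
    """Yield every alignment of x and y as a pair of gapped strings."""
    def walk(i, j, row_x, row_y):
        if i == len(x) and j == len(y):    # reached the End node
            yield row_x, row_y
            return
        if i < len(x) and j < len(y):      # diagonal edge: x[i] with y[j]
            yield from walk(i + 1, j + 1, row_x + x[i], row_y + y[j])
        if i < len(x):                     # x[i] aligned with a gap
            yield from walk(i + 1, j, row_x + x[i], row_y + "-")
        if j < len(y):                     # a gap aligned with y[j]
            yield from walk(i, j + 1, row_x + "-", row_y + y[j])
    yield from walk(0, 0, "", "")


alignments = list(all_alignments("AGT", "AT"))
print(len(alignments))                # 25
print(("AGT", "A-T") in alignments)   # True
```

For AGT and AT there are 25 distinct paths, hence 25 alignments, which is why the slides turn to dynamic programming instead of enumeration.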
The Principle of Optimality
The optimal answer to a problem is expressed in terms of optimal answers to its sub-problems.

Dynamic Programming
Given: two sequences X and Y
Find: an optimal alignment of X with Y
We are looking for the optimal alignment = the maximal-score path in the Edit Graph from the Begin vertex to the End vertex.
Part 1: Compute the optimal alignment score first
Part 2: Construct an optimal alignment

The DP Matrix S(i,j)
[Figure: the grid for AGT and AT with the entries S(1,0) and S(2,1) marked]

The DP Matrix
Matrix S = [S(i,j)]
S(i,j) = the score of the maximal-score path from the Begin vertex to the vertex (i,j)
The optimal path to (i,j) must pass through one of the vertices (i-1,j-1), (i-1,j), (i,j-1).

Optimal path to (i,j) through (i-1,j)
Score = score of the optimal path to (i-1,j) + s(x_i, -), the score of aligning x_i with a gap
      = S(i-1,j) + s(x_i, -)

Optimal path to (i,j) through (i-1,j-1)
Score = score of the optimal path to (i-1,j-1) + s(x_i, y_j), the score of aligning x_i with y_j
      = S(i-1,j-1) + s(x_i, y_j)

Optimal path to (i,j) through (i,j-1)
Score = score of the optimal path to (i,j-1) + s(-, y_j), the score of aligning a gap with y_j
      = S(i,j-1) + s(-, y_j)

The Basic Algorithm
S(i,j) = MAX of
  S(i-1, j-1) + s(x_i, y_j)
  S(i-1, j)   + s(x_i, -)
  S(i, j-1)   + s(-, y_j)
where s(a, b) is the score for aligning a with b.

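A minimal sketch of Part 1 (computing the optimal score) follows. The slides do not fix a scoring scheme at this point, so `simple_score` (match +1, mismatch -1, gap -1) is an assumption made purely for illustration, as are the function names and the choice to initialize the first row and column with accumulated gap scores.

```python
def simple_score(a: str, b: str) -> int:
    """Hypothetical scheme: match +1, mismatch -1, any gap column -1."""
    if a == "-" or b == "-":
        return -1
    return 1 if a == b else -1


def global_score_matrix(x: str, y: str, s=simple_score):
    """Fill the DP matrix S, where S[i][j] scores x[:i] against y[:j]."""
    n, m = len(x), len(y)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):              # first column: x against gaps only
        S[i][0] = S[i - 1][0] + s(x[i - 1], "-")
    for j in range(1, m + 1):              # first row: gaps against y only
        S[0][j] = S[0][j - 1] + s("-", y[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(
                S[i - 1][j - 1] + s(x[i - 1], y[j - 1]),  # x_i with y_j
                S[i - 1][j] + s(x[i - 1], "-"),           # x_i with a gap
                S[i][j - 1] + s("-", y[j - 1]),           # a gap with y_j
            )
    return S


S = global_score_matrix("AGT", "AT")
print(S[3][2])  # 1
```

The bottom-right entry is the optimal global score; the table takes time and space proportional to the product of the two sequence lengths.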
Optimal Alignment and Traceback
[Figure: edit graph for AGT and AT with the optimal path highlighted]
Optimal alignment:
  AGT
  A-T

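Part 2 (constructing an optimal alignment) can be sketched by walking back from the End vertex and, at each step, choosing a predecessor whose score explains the current entry. As before, the scoring scheme `simple_score` (match +1, mismatch -1, gap -1), the function names, and the tie-breaking order (diagonal first) are assumptions for illustration only.

```python
def simple_score(a: str, b: str) -> int:
    """Hypothetical scheme: match +1, mismatch -1, any gap column -1."""
    if a == "-" or b == "-":
        return -1
    return 1 if a == b else -1


def global_alignment(x: str, y: str, s=simple_score):
    """Return (score, row_x, row_y) for one optimal global alignment."""
    n, m = len(x), len(y)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = S[i - 1][0] + s(x[i - 1], "-")
    for j in range(1, m + 1):
        S[0][j] = S[0][j - 1] + s("-", y[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(S[i - 1][j - 1] + s(x[i - 1], y[j - 1]),
                          S[i - 1][j] + s(x[i - 1], "-"),
                          S[i][j - 1] + s("-", y[j - 1]))

    # Traceback: start at the End vertex and follow edges that attain the max.
    row_x, row_y = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + s(x[i - 1], y[j - 1]):
            row_x.append(x[i - 1]); row_y.append(y[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] + s(x[i - 1], "-"):
            row_x.append(x[i - 1]); row_y.append("-")
            i -= 1
        else:                              # only the (i, j-1) edge remains
            row_x.append("-"); row_y.append(y[j - 1])
            j -= 1
    return S[n][m], "".join(reversed(row_x)), "".join(reversed(row_y))


print(global_alignment("AGT", "AT"))  # (1, 'AGT', 'A-T')
```

When several optimal alignments exist, this sketch returns only one of them; a different tie-breaking order may return another.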
The Basic Algorithm: Local Similarity
S(i,j) = MAX of
  0   (we add this)
  S(i-1, j-1) + s(x_i, y_j)
  S(i-1, j)   + s(x_i, -)
  S(i, j-1)   + s(-, y_j)

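The effect of the extra 0 can be seen in a short sketch. The scoring scheme (match +1, mismatch -1, gap -1), the function names, and the two example strings are all made up for illustration. With the 0 option, the first row and column stay at 0 and the best local score is the largest entry anywhere in the matrix, not the bottom-right one.

```python
def simple_score(a: str, b: str) -> int:
    """Hypothetical scheme: match +1, mismatch -1, any gap column -1."""
    if a == "-" or b == "-":
        return -1
    return 1 if a == b else -1


def local_score(x: str, y: str, s=simple_score) -> int:
    """Best local similarity score between a substring of x and one of y."""
    n, m = len(x), len(y)
    S = [[0] * (m + 1) for _ in range(n + 1)]   # borders stay 0
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            S[i][j] = max(
                0,                                        # restart here
                S[i - 1][j - 1] + s(x[i - 1], y[j - 1]),  # x_i with y_j
                S[i - 1][j] + s(x[i - 1], "-"),           # x_i with a gap
                S[i][j - 1] + s("-", y[j - 1]),           # a gap with y_j
            )
            best = max(best, S[i][j])
    return best


# Made-up sequences that share only the segment AGT.
print(local_score("TTAGTCC", "GGAGTAA"))  # 3
```

A traceback for the local case would start at the cell holding `best` and stop at the first 0 entry.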
General Scoring Schemes
Assumptions:
1. Independence of mutations at different sites (additive scoring scheme)
2. Gaps of any length are considered one mutation
All of the efficient alignment algorithms that employ the dynamic programming method rely fundamentally on the fact that the scoring function is additive.

Substitution Matrices
Consider an ungapped alignment of two equal-length sequences.
Compute the probability that the two sequences are related.
Compute the probability that the two sequences are not related.
Compute the ratio of the two probabilities.

Random Model R
Every letter z occurs independently with probability q_z.
Algorithmic Functions of Computational Biology - Course 3, Professor Istrail

Match Model M
Aligned pairs of residues a, b occur with joint probability p_ab.

Log-odds ratio
P(x,y | M) / P(x,y | R) = prod_i p_{x_i y_i} / ( prod_i q_{x_i} * prod_i q_{y_i} )
S = log [ P(x,y | M) / P(x,y | R) ] = sum_i s(x_i, y_i)
where s(a,b) = log [ p_ab / (q_a q_b) ]
s(a,b) = the substitution matrix
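A small numerical sketch may help connect the two models to the substitution matrix. The two-letter alphabet and every probability below are invented for illustration (they are not taken from any real substitution matrix); the code assumes the standard log-odds form s(a,b) = log(p_ab / (q_a q_b)) and checks that summing s over the columns equals the log of the likelihood ratio.

```python
import math

# Hypothetical toy models over the alphabet {A, G}; values are made up.
q = {"A": 0.5, "G": 0.5}                         # random model R
p = {("A", "A"): 0.4, ("A", "G"): 0.1,
     ("G", "A"): 0.1, ("G", "G"): 0.4}           # match model M (sums to 1)


def s(a: str, b: str) -> float:
    """Log-odds substitution score s(a, b) = log(p_ab / (q_a * q_b))."""
    return math.log(p[(a, b)] / (q[a] * q[b]))


def log_odds(x: str, y: str) -> float:
    """Score of an ungapped alignment: the sum of s over aligned pairs."""
    if len(x) != len(y):
        raise ValueError("an ungapped alignment needs equal-length sequences")
    return sum(s(a, b) for a, b in zip(x, y))


def likelihood_ratio(x: str, y: str) -> float:
    """P(x, y | M) / P(x, y | R), computed directly from the two models."""
    p_match = math.prod(p[(a, b)] for a, b in zip(x, y))
    p_random = math.prod(q[a] for a in x) * math.prod(q[b] for b in y)
    return p_match / p_random


x, y = "AAG", "AGG"
print(s("A", "A") > 0, s("A", "G") < 0)                               # True True
print(math.isclose(log_odds(x, y), math.log(likelihood_ratio(x, y))))  # True
```

With these toy numbers identities score positively and substitutions negatively, in line with the earlier slide on scoring functions.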