Sequence Alignment
Oct 9, 2002
Joon Lee
Genomics & Computational Biology

Dynamic Programming
- Optimization problems: find the best decisions, one after another
- Subproblems are not independent
- Subproblems share subsubproblems
- Solve each subproblem once and save its answer in a table

Four Steps of DP
1. Characterize the structure of an optimal solution
2. Recursively define the value of an optimal solution
3. Compute the value of an optimal solution in a bottom-up fashion
4. Construct an optimal solution from computed information
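
As a small illustration of steps 2 and 3, the sketch below fills a table bottom-up for a simpler relative of alignment: the length of the longest common subsequence of two strings. The function name `lcs_length` and the choice of this problem are assumptions made for illustration; they are not taken from the slides.

```python
def lcs_length(x, y):
    """Length of the longest common subsequence of x and y (bottom-up DP)."""
    # table[i][j] holds the answer for the prefixes x[:i] and y[:j];
    # row 0 and column 0 stand for the empty prefix.
    table = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i in range(1, len(x) + 1):
        for j in range(1, len(y) + 1):
            if x[i - 1] == y[j - 1]:
                # Extend the best answer of the two shorter prefixes.
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                # Reuse the better of the two saved neighbouring answers.
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[-1][-1]


print(lcs_length("GAATTCAGTTA", "GGATCGA"))  # prints 6
```

Each cell is computed once from cells already in the table, which is the "save its answer in a table" idea in practice.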

Sequence Alignment
Sequence 1: G A A T T C A G T T A
Sequence 2: G G A T C G A

Align or insert gap

G A A T T C A G T T A
|   |   | |   |     |
G G A _ T C _ G _ _ A

G _ A A T T C A G T T A
|     |   | |   |     |
G G _ A _ T C _ G _ _ A

Three Steps of SA
1. Initialization: gap penalty
2. Scoring: matrix fill
3. Alignment: trace back

Step 1: Initialization

        G   A   A   T   T   C   A   G   T   T   A
G  -2
G  -4
A  -6
T  -8
C -10
G -12
A -14
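
The initialization can be sketched in a few lines. The sketch assumes the gap penalty of -2 used throughout the deck, and it assumes the top row is filled the same way as the first column (the top-row values are not visible in the slide text). The name `init_matrix` is hypothetical.

```python
GAP = -2  # gap penalty used in the slides


def init_matrix(seq1, seq2, gap=GAP):
    """Build the score matrix with seq1 across the columns and seq2 down the rows."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    S = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        S[i][0] = i * gap  # first column: i letters of seq2 aligned to gaps
    for j in range(1, cols):
        S[0][j] = j * gap  # top row (assumed): j letters of seq1 aligned to gaps
    return S


S = init_matrix("GAATTCAGTTA", "GGATCGA")
print([row[0] for row in S])  # prints [0, -2, -4, -6, -8, -10, -12, -14]
```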

Step 2: Scoring
A = a_1 a_2 ... a_n, B = b_1 b_2 ... b_m
S_ij : score at (i,j)
s(a_i b_j) : matching score between a_i and b_j
w : gap penalty
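
The recurrence that ties these symbols together seems to have been a figure and does not survive in the text. Assuming the usual global-alignment form, which agrees with the worked cells in this deck, it would read as follows, with w taken as a negative number (w = -2 in the examples):

```latex
S_{ij} = \max
\begin{cases}
  S_{i-1,\,j-1} + s(a_i b_j) & \text{(align } a_i \text{ with } b_j\text{)} \\
  S_{i-1,\,j} + w            & \text{(gap)} \\
  S_{i,\,j-1} + w            & \text{(gap)}
\end{cases}
```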

Step 2: Scoring
Match: +2
Mismatch: -1
Gap: -2

Step 2: Scoring

        G   A   A   T   T   C   A   G   T   T   A
G  -2   2
G  -4
A  -6
T  -8
C -10
G -12
A -14

Cell at row 1 (G), column 1 (G): match
  diagonal:    0 + 2    =  2
  from above: -2 + (-2) = -4
  from left:  -2 + (-2) = -4
  maximum: 2

Step 2: Scoring

        G   A   A   T   T   C   A   G   T   T   A
G  -2   2   0
G  -4
A  -6
T  -8
C -10
G -12
A -14

Cell at row 1 (G), column 2 (A): mismatch
  diagonal:   -2 + (-1) = -3
  from above: -4 + (-2) = -6
  from left:   2 + (-2) =  0
  maximum: 0

Step 2: Scoring

        G   A   A   T   T   C   A   G   T   T   A
G  -2   2   0
G  -4   0
A  -6
T  -8
C -10
G -12
A -14

Cell at row 2 (G), column 1 (G): match
  diagonal:   -2 + 2    =  0
  from above:  2 + (-2) =  0
  from left:  -4 + (-2) = -6
  maximum: 0

Step 2: Scoring
(completed score matrix: G A A T T C A G T T A across the columns, G G A T C G A down the rows)
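
The matrix fill can be reproduced with a short sketch. It assumes the scoring scheme from the deck (match +2, mismatch -1, gap -2) and a top row initialized like the first column; `fill_matrix` is a hypothetical name.

```python
MATCH, MISMATCH, GAP = 2, -1, -2  # scoring scheme from the slides


def fill_matrix(seq1, seq2):
    """Return the score matrix with seq1 across the columns and seq2 down the rows."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    S = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        S[i][0] = i * GAP
    for j in range(1, cols):
        S[0][j] = j * GAP
    for i in range(1, rows):
        for j in range(1, cols):
            s = MATCH if seq2[i - 1] == seq1[j - 1] else MISMATCH
            S[i][j] = max(S[i - 1][j - 1] + s,  # diagonal: align the two letters
                          S[i - 1][j] + GAP,    # from above: gap
                          S[i][j - 1] + GAP)    # from the left: gap
    return S


S = fill_matrix("GAATTCAGTTA", "GGATCGA")
# Print one row per line, labelled with the letter of sequence 2.
for label, row in zip(" GGATCGA", S):
    print(label, " ".join(f"{v:3d}" for v in row))
```

The first three filled cells agree with the hand calculations (2, 0 and 0), and the bottom-right cell holds the score of the best global alignment.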

Step 3: Trace back
(traceback path through the score matrix: G A A T T C A G T T A across the columns, G G A T C G A down the rows)

Step 3: Trace back

G A A T T C A G T T A
G G A _ T C _ G _ _ A

G A A T T C A G T T A
G G A T _ C _ G _ _ A
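
A traceback can be sketched as a walk from the bottom-right cell back to the top-left, at each step choosing a neighbour that could have produced the current score. The sketch below is one possible implementation under the deck's scoring scheme, not the slides' own code; `align` is a hypothetical name. Where several moves tie, it prefers the diagonal, then the left, so it reports only one of the equally good alignments.

```python
MATCH, MISMATCH, GAP = 2, -1, -2  # scoring scheme from the slides


def align(seq1, seq2):
    """Globally align seq1 and seq2; return (score, aligned_seq1, aligned_seq2)."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    S = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        S[i][0] = i * GAP
    for j in range(1, cols):
        S[0][j] = j * GAP
    for i in range(1, rows):
        for j in range(1, cols):
            s = MATCH if seq2[i - 1] == seq1[j - 1] else MISMATCH
            S[i][j] = max(S[i - 1][j - 1] + s, S[i - 1][j] + GAP, S[i][j - 1] + GAP)

    # Trace back from the bottom-right cell to the top-left cell.
    out1, out2 = [], []
    i, j = rows - 1, cols - 1
    while i > 0 or j > 0:
        s = MATCH if i > 0 and j > 0 and seq2[i - 1] == seq1[j - 1] else MISMATCH
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + s:
            out1.append(seq1[j - 1])  # diagonal move: align the two letters
            out2.append(seq2[i - 1])
            i, j = i - 1, j - 1
        elif j > 0 and S[i][j] == S[i][j - 1] + GAP:
            out1.append(seq1[j - 1])  # move left: gap in sequence 2
            out2.append("_")
            j -= 1
        else:
            out1.append("_")          # move up: gap in sequence 1
            out2.append(seq2[i - 1])
            i -= 1
    return S[-1][-1], "".join(reversed(out1)), "".join(reversed(out2))


score, top, bottom = align("GAATTCAGTTA", "GGATCGA")
print(score)
print(" ".join(top))
print(" ".join(bottom))
```

Changing the order of the three checks would surface a different alignment with the same score.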

Exercise
Sequences: GCATCCG (columns) and GATCG (rows)
Match: +2
Mismatch: -1
Gap: -2

Exercise
Sequences: GCATCCG (columns) and GATCG (rows)
Match: +2
Mismatch: -1
Gap: -2

G C A T C C G
G A T C G
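
One way to check an answer to the exercise is to score a candidate alignment column by column. The helper below is a minimal sketch with a hypothetical name, `score_alignment`; the candidate alignment passed to it is an assumption for illustration, not the answer shown on the slide.

```python
MATCH, MISMATCH, GAP = 2, -1, -2  # scoring scheme stated in the exercise


def score_alignment(top, bottom, gap_char="_"):
    """Score two equal-length gapped strings column by column."""
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have equal length")
    total = 0
    for a, b in zip(top, bottom):
        if a == gap_char or b == gap_char:
            total += GAP
        elif a == b:
            total += MATCH
        else:
            total += MISMATCH
    return total


# Candidate alignment (an assumption): five matches and two gaps.
print(score_alignment("GCATCCG", "G_ATC_G"))  # prints 6
```

The same helper gives 3 for either alignment of GAATTCAGTTA and GGATCGA from the trace-back step.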

Amino acids
Match/mismatch score → substitution matrix
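
In code, this amounts to replacing the two-valued match/mismatch rule with a table lookup. The sketch below uses a tiny table whose entries are illustrative values only; real work would load a published matrix such as BLOSUM62 or PAM250. The names `TOY_MATRIX` and `s` are hypothetical.

```python
# Illustrative values only (an assumption), covering three amino acids:
# L = leucine, I = isoleucine, D = aspartate.
TOY_MATRIX = {
    ("L", "L"): 4, ("I", "I"): 4, ("D", "D"): 6,
    ("L", "I"): 2,   # chemically similar residues: mild reward
    ("L", "D"): -4,  # dissimilar residues: penalty
    ("I", "D"): -3,
}


def s(a, b, matrix=TOY_MATRIX):
    """Look up the substitution score, treating the matrix as symmetric."""
    return matrix[(a, b)] if (a, b) in matrix else matrix[(b, a)]


print(s("I", "L"), s("L", "D"))  # prints 2 -4
```

The matrix-fill step stays the same; only the term s(a_i b_j) now comes from the table.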

Global & Local alignment
Global: Needleman-Wunsch algorithm
Local: Smith-Waterman algorithm
From Mount, Bioinformatics, Chap. 3
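
For contrast with the global fill, a local (Smith-Waterman style) score can be sketched with two changes: every cell is floored at 0, and the result is the largest cell anywhere in the matrix rather than the bottom-right one. The sketch assumes the same scoring scheme as the global example and a linear gap penalty; `local_score` is a hypothetical name.

```python
MATCH, MISMATCH, GAP = 2, -1, -2  # same scheme as the global example (assumed)


def local_score(seq1, seq2):
    """Best local alignment score with a linear gap penalty."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    H = [[0] * cols for _ in range(rows)]  # first row and column stay at 0
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            s = MATCH if seq2[i - 1] == seq1[j - 1] else MISMATCH
            H[i][j] = max(0,                    # start a fresh local alignment
                          H[i - 1][j - 1] + s,  # extend along the diagonal
                          H[i - 1][j] + GAP,    # gap
                          H[i][j - 1] + GAP)    # gap
            best = max(best, H[i][j])
    return best


print(local_score("TTACGTTT", "CCACGCC"))  # prints 6 (the shared ACG)
print(local_score("GAATTCAGTTA", "GGATCGA"))
```

Because poorly matching flanks are never charged, the local score is at least as high as the global score for the same pair of sequences.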

References
Sequence alignment with Java applet –