#5 - Dynamic Programming


#5 - Dynamic Programming BCB 444/544 8/29/07 Lecture 5 Dynamic Programming #5_Aug29 Revised 8/30 BCB 444/544 F07 ISU Dobbs #5 - Dynamic Programming BCB 444/544 Fall 07 Dobbs

Required Reading (before lecture)
Mon Aug 27 - for Lecture #4: Pairwise Sequence Alignment. Chp 3 - pp 31-41
Wed Aug 29 - for Lecture #5: Dynamic Programming. Eddy: What is Dynamic Programming? 2004 Nature Biotechnol 22:909 http://www.nature.com/nbt/journal/v22/n7/abs/nbt0704-909.html
Thurs Aug 30 - Lab #2: Databases, ISU Resources & Pairwise Sequence Alignment
Fri Aug 31 - for Lecture #6: Scoring Matrices and Alignment Statistics. Chp 3 - pp 41-49

Review: Chp 2 - Biological Databases
Xiong: Chp 2 Introduction to Biological Databases
What is a Database?
Types of Databases
Biological Databases
Pitfalls of Biological Databases
Information Retrieval from Biological Databases

Types of Databases
3 major types of electronic databases:
Flat files - simple text files; no organization to facilitate retrieval
Relational - data organized as tables ("relations"); shared features among tables allow rapid search
Object-oriented - data organized as "objects"; objects associated hierarchically

Examples of Biological Databases
1- Primary
DNA sequences: GenBank (USA), EMBL (European Molecular Biology Lab), DDBJ (DNA Data Bank of Japan)
Structures (Protein, DNA, RNA): PDB (Protein Data Bank), NDB (Nucleic Acid Database)

Examples of Biological Databases (cont.)
2- Secondary
Protein sequences: Swiss-Prot, TrEMBL, PIR (these recently combined into UniProt)
3- Specialized
Species-specific (or "taxonomic" specific): FlyBase, WormBase, AceDB, PlantDB
Molecule-specific, disease-specific
See: http://www.oxfordjournals.org/nar/database/c/

SUMMARY: #2 - Biological Databases
BEWARE!
BONUS POINT (1 added to semester total): Who is the Icelandic scientist who founded the company that collected DNA from all Icelandic people? What is the name of the company? SEND URL
Who was that Icelandic fellow?

Chp 3 - Sequence Alignment
SECTION II: SEQUENCE ALIGNMENT
Xiong: Chp 3 Pairwise Sequence Alignment
Evolutionary Basis
Sequence Homology versus Sequence Similarity
Sequence Similarity versus Sequence Identity
Methods
Scoring Matrices
Statistical Significance of Sequence Alignment
Adapted from Brown and Caragea, 2007, with some slides from: Altman, Fernandez-Baca, Batzoglou, Craven, Hunter, Page.

Motivation for Sequence Alignment
"Sequence comparison lies at the heart of bioinformatics analysis." Jin Xiong
Sequence comparison is important for drawing functional & evolutionary inferences re: new genes/proteins
Pairwise sequence alignment is fundamental; it is used to:
Search for common patterns of characters
Establish pairwise correspondence between related sequences
Pairwise sequence alignment is the basis for:
Database searching (e.g., BLAST)
Multiple sequence alignment (MSA)

Homology
Homology has a very specific meaning in evolutionary & computational biology, & the term is often used incorrectly
For us: Homology = similarity due to descent from a common evolutionary ancestor
But, HOMOLOGY ≠ SIMILARITY
When 2 sequences share a sufficiently high degree of sequence similarity (or identity), we may infer that they are homologous
We can infer homology from similarity (can't prove it!)

Orthologs vs Paralogs
2 types of homologous sequences:
Orthologs - "same genes" in different species; result of common ancestry; corresponding proteins have "same" functions (e.g., human α-globin & mouse α-globin)
Paralogs - "similar genes" within a species; result of gene duplication events; proteins may (or may not) have similar functions (e.g., human α-globin & human β-globin)
[Figure: gene tree. A is the parent gene. Speciation leads to B & C. Duplication leads to C'. B and C are orthologous; C and C' are paralogous.]

Sequence Homology vs Similarity
Homologous sequences - sequences that share a common evolutionary ancestry
Similar sequences - sequences that have a high percentage of aligned residues with similar physicochemical properties (e.g., size, hydrophobicity, charge)
IMPORTANT:
Sequence homology: An inference about a common ancestral relationship, drawn when two sequences share a high enough degree of sequence similarity. Homology is qualitative.
Sequence similarity: The direct result of observation from a sequence alignment. Similarity is quantitative; it can be described using percentages.

Sequence Similarity vs Identity
For nucleotide sequences (DNA & RNA), sequence similarity and identity have the "same" meaning: saying that two DNA sequences share a high degree of sequence identity or a high degree of similarity means the same thing.
Drena's opinion: Always use "identity" when making quantitative comparisons re: DNA or RNA sequences (to avoid confusion!)
For protein sequences, sequence similarity and identity have different meanings:
Identity = % of exact matches between two aligned sequences
Similarity = % of aligned residues that share similar characteristics (e.g., physicochemical characteristics, structural propensities, evolutionary profiles)
Drena's opinion: Always use "identity" when making quantitative comparisons re: protein sequences (to avoid confusion!)

What is Sequence Alignment?
Given 2 sequences of letters, and a scoring scheme for evaluating matching letters, find an optimal pairing of letters in one sequence to letters of the other sequence.
Align:
1: THIS IS A RATHER LONGER SENTENCE THAN THE NEXT.
2: THIS IS A SHORT SENTENCE.

1: THIS IS A RATHER LONGER SENTENCE THAN THE NEXT.
2: THIS IS A ######SHORT###SENTENCE##############.
OR
2: THIS IS A ##SHORT###SENT#EN###CE##############.
Is one of these alignments "optimal"? Which is better?

Goal of Sequence Alignment
Find the best pairing of 2 sequences, such that there is maximum correspondence between residues
DNA: 4-letter alphabet (+ gap)
TTGACAC
TTTACAC
Proteins: 20-letter alphabet (+ gap)
RKVA-GMA
RKIAVAMA

Statement of Problem
Given:
2 sequences
Scoring system for evaluating match (or mismatch) of two characters
Penalty function for gaps in sequences
Find: Optimal pairing of sequences that:
Retains the order of characters
Introduces gaps where needed
Maximizes total score

Types of Sequence Variation
Sequences can diverge from a common ancestor through various types of mutations:
Substitutions: ACGA → AGGA
Insertions: ACGA → ACCGA
Deletions: ACGA → AGA
Insertions or deletions ("indels") result in gaps in alignments
Substitutions result in mismatches
No change? Match

Gaps
Indels of various sizes can occur in one sequence relative to the other, e.g., corresponding to a shortening of the polypeptide chain in a protein

Avoiding Random Alignments with a Scoring Function
Introducing too many gaps generates nonsense alignments:
s--e-----qu---en--ce
sometimesquipsentice
Need to distinguish between alignments that occur due to homology and those that occur by chance
Define a scoring function that rewards matches (+) and penalizes mismatches (-) and gaps (-)
Scoring Function (S), e.g.: Match: α = 1, Mismatch: β = 2, Gap: γ = 1
S = α(#matches) - β(#mismatches) - γ(#gaps)

Not All Mismatches are the Same
Some amino acids are more "exchangeable" than others (physicochemical properties are similar), e.g., Ser & Thr are more similar than Trp & Ala
A substitution matrix can be used to introduce "mismatch costs" for handling different types of substitutions
Mismatch costs are not usually used in aligning DNA or RNA sequences, because no substitution is "better" than any other (in general)
Draw Ser vs Thr & Trp vs Ala

Substitution Matrix
s(a,b) corresponds to the score of aligning character a with character b
Match scores are often calculated based on frequency of mutations in very similar sequences (more details later)

Methods
Global and Local Alignment
Alignment Algorithms
  Dot Matrix Method
  Dynamic Programming Method
  Gap penalties
  DP for Global Alignment
  DP for Local Alignment
Scoring Matrices
  Amino acid scoring matrices
  PAM
  BLOSUM
  Comparisons between PAM & BLOSUM
Statistical Significance of Sequence Alignment

Global vs Local Alignment
Global alignment
Finds best possible alignment across entire length of 2 sequences
Aligned sequences assumed to be generally similar over entire length
Local alignment
Finds local regions with highest similarity between 2 sequences
Aligns these without regard for rest of sequence
Sequences are not assumed to be similar over entire length

Global vs Local Alignment - example
1 = CTGTCGCTGCACG
2 = TGCCGTG

CTGTCGCTGCACG
-TGCCG-T----G    Global alignment
-TG-C-C-G--TG

CTGTCGCTGCACG
-TGCCG-TG----    Local alignment
Which is better?

Global vs Local Alignment: Which should be used when?
Both are important, but it is critical to use the right method for a given task!
Global alignment:
Good for: aligning closely related sequences of similar length
Not good for: divergent sequences or sequences with different lengths
Local alignment:
Good for: searching for conserved patterns (domains or motifs) in DNA or protein sequences
Not good for: generating an alignment of closely related sequences
Global and local alignments are fundamentally similar; they differ only in the optimization strategy used to align similar residues

Alignment Algorithms
3 major methods for pairwise sequence alignment:
Dot matrix analysis
Dynamic programming
Word or k-tuple methods (later, in Chp 4)

Dot Matrix Method (Dot Plots)
Place 1 sequence along top row of matrix
Place 2nd sequence along left column of matrix
Plot a dot each time there is a match between an element of the row sequence and an element of the column sequence
For proteins, usually use more sophisticated scoring schemes than "identical match"
Diagonal lines indicate areas of match
Contiguous diagonal lines reveal alignment; "breaks" = gaps (indels)
[Figure: dot matrix with A C G along the top row and A C G along the left column]
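The dot matrix procedure on this slide can be sketched in a few lines of Python. This is a minimal illustration that assumes the simple "identical match" rule; the function names `dot_plot` and `render` and the example sequences are hypothetical, not taken from the lecture.

```python
def dot_plot(row_seq, col_seq):
    """Return grid[i][j] == True when col_seq[i] matches row_seq[j]."""
    return [[c == r for r in row_seq] for c in col_seq]


def render(row_seq, col_seq):
    """Draw the grid as text: '*' marks a match, '.' marks no match."""
    grid = dot_plot(row_seq, col_seq)
    lines = ["  " + " ".join(row_seq)]  # row sequence along the top
    for c, row in zip(col_seq, grid):   # column sequence down the left
        lines.append(c + " " + " ".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


# Illustrative sequences (assumed): a short repeat shows two diagonals.
print(render("ACGACG", "ACG"))
```

Runs of '*' along a diagonal correspond to the "areas of match" described above.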

Interpretation of Dot Plots
When comparing 2 sequences:
Diagonal lines of dots indicate regions of similarity between 2 sequences
Reverse diagonals (perpendicular to diagonal) indicate inversions
What do such patterns mean when comparing a sequence with itself (or its reverse complement)?
e.g.: Reverse diagonals crossing diagonals (X's) indicate palindromes
Exploring Dot Plots

Dot Matrix Variations
Compare 2 sequences: identify matching regions
  Identities for DNA seqs
  Similarities for protein seqs
Compare sequence with itself:
  Identify repeated regions
  Identify inverted repeats
  Identify palindromes
For long sequences? Too many dots! Noisy!
To reduce noise, plot one dot per "window of n matching residues" instead of one dot per "residue"
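The windowing idea can be sketched as follows. This is one possible reading of "one dot per window of n matching residues": a dot is recorded only when a window of length `window` contains at least `min_matches` identical positions. The function name and both parameters are assumptions for illustration.

```python
def windowed_dot_plot(row_seq, col_seq, window=3, min_matches=None):
    """Return (i, j) window start positions whose windows match well enough.

    i indexes col_seq and j indexes row_seq. By default every position
    in the window must match (min_matches == window).
    """
    if min_matches is None:
        min_matches = window
    hits = []
    for i in range(len(col_seq) - window + 1):
        for j in range(len(row_seq) - window + 1):
            matches = sum(
                col_seq[i + k] == row_seq[j + k] for k in range(window)
            )
            if matches >= min_matches:
                hits.append((i, j))
    return hits


# Illustrative call: single-residue matches no longer produce dots.
print(windowed_dot_plot("ACGTACG", "ACG", window=3))
```

Lowering `min_matches` below `window` tolerates some mismatches inside a window, at the cost of more background dots.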

Strengths & Weaknesses of Dot Plots
Strengths:
Fast and easy
Allows direct visual identification of regions of similarity
Repeats, inversions, etc. are readily apparent
Displays all possible matches
Weaknesses:
Doesn't generate full alignment - user must "connect the diagonals"
No statistical assessment of quality of alignment (score)
Impractical and noisy for long sequences
Difficult to scale up to multiple alignment

For Pairwise sequence alignment
Idea: Display one sequence above another, with spaces inserted in both to reveal similarity
C A T - T C A - C
|   |     | |   |
C - T C G C A G C

Global alignment: Scoring
CTGTCG-CTGCACG
-TGC-CG-TG----
Reward for matches: α
Mismatch penalty: β
Space/gap penalty: γ
Score = αw - βx - γy
w = #matches, x = #mismatches, y = #spaces

Global alignment: Scoring (example)
Reward for matches: 10
Mismatch penalty: -2
Space/gap penalty: -5
 C   T   G   T   C   G   -   C   T   G   C
 -   T   G   C   -   C   G   -   T   G   -
-5  10  10  -2  -5  -2  -5  -5  10  10  -5
Total = 11
We could have done better!!
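The column-by-column arithmetic on this slide can be checked with a short Python sketch. The function name `column_scores` is hypothetical; the scores (+10 match, -2 mismatch, -5 space) and the two gapped strings are taken from the slide, with "-" standing for a space.

```python
def column_scores(top, bottom, match=10, mismatch=-2, gap=-5):
    """Score each column of a gapped pairwise alignment."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    scores = []
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            scores.append(gap)       # residue paired with a space
        elif a == b:
            scores.append(match)     # identical residues
        else:
            scores.append(mismatch)  # substitution
    return scores


top = "CTGTCG-CTGC"
bottom = "-TGC-CG-TG-"
scores = column_scores(top, bottom)
print(scores)
print("Total =", sum(scores))
```

Summing the per-column scores reproduces the slide's total of 11.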

Optimum Alignment
Score of an alignment is a measure of its quality
Optimum alignment problem: Given a pair of sequences X and Y, find an alignment (global or local) with maximum score

Alignment algorithms
Global: Needleman-Wunsch
Local: Smith-Waterman
Both NW and SW use dynamic programming
Variations:
Gap penalty functions
Scoring matrices
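The slide only names Smith-Waterman, so the following is a hedged sketch of the standard local-alignment recurrence rather than anything shown in the lecture: it is the same dynamic program as the global case, except that every cell is floored at 0 and the best score is the maximum over all cells. The function name `sw_score` and the default scores (+10, -2, -5, borrowed from the global example) are assumptions.

```python
def sw_score(x, y, match=10, mismatch=-2, gap=-5):
    """Best local alignment score (score only, no traceback)."""
    n, m = len(x), len(y)
    # First row and column stay at 0: a local alignment may start anywhere.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a fresh local alignment here
                H[i - 1][j - 1] + s,  # x[i-1] aligned to y[j-1]
                H[i - 1][j] + gap,    # x[i-1] aligned to a space
                H[i][j - 1] + gap,    # y[j-1] aligned to a space
            )
            best = max(best, H[i][j])
    return best


print(sw_score("XXACGTXX", "YYACGTYY"))
```

Because unrelated flanking regions cannot drag the score below zero, only the shared ACGT core contributes in this toy call.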

Dynamic Programming (DP)
As a computer science concept: formalized in the early 1950's by Bellman at RAND Corporation
"Frequently, however, there are only a polynomial number of subproblems… If we keep track of the solution to each subproblem solved, and simply look up the answer when needed, we obtain a polynomial-time algorithm." (Aho, Hopcroft, Ullman)
Reported to biologists for sequence alignment problems by Needleman & Wunsch, 1970

Key Idea
The score of the best possible alignment that ends at a given pair of positions (i, j) is equal to:
the score of the best alignment ending just previous to those two positions (i.e., ending at i-1, j-1)
PLUS
the score for aligning xi and yj

Problem Formulation & Notations
Given two sequences (strings):
X = x1x2…xN of length N (e.g., x = AGC, N = 3)
Y = y1y2…yM of length M (e.g., y = AAAC, M = 4)
Construct a matrix with (N+1) x (M+1) elements, where
S(i,j) = score of best alignment of x[1..i] = x1x2…xi with y[1..j] = y1y2…yj
e.g., S(2,3) = score of best alignment of AG (x1x2) to AAA (y1y2y3)
[Figure: matrix with x1 x2 x3 labeling one axis and y1 y2 y3 y4 labeling the other]

Dynamic Programming: 4 Steps
1. Define score of optimum alignment, using recursion
2. Initialize and fill in a DP matrix for storing optimal scores of subproblems, by solving smallest subproblems first (bottom-up approach)
3. Calculate score of optimum alignment(s)
4. Trace back through matrix to recover optimum alignment(s) that generated optimal score

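1- Define Score of Optimum Alignment using Recursion
Define: $S(i,j)$ = score of the best alignment of $x[1..i]$ with $y[1..j]$
Initial conditions (with $\gamma$ the space/gap penalty):
$$S(0,0) = 0, \qquad S(i,0) = -i\,\gamma, \qquad S(0,j) = -j\,\gamma$$
Recursive definition: for $1 \le i \le N$, $1 \le j \le M$ (with $s(x_i,y_j)$ the score of aligning $x_i$ with $y_j$):
$$S(i,j) = \max \begin{cases} S(i-1,j-1) + s(x_i,y_j) \\ S(i-1,j) - \gamma \\ S(i,j-1) - \gamma \end{cases}$$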
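The recursive definition can be written almost literally as a memoized function. This is a sketch under assumptions: it uses the three-case recurrence from the later "1 of 3 cases" slide, the numeric scores from the worked example (+10 match, -2 mismatch, -5 per space, added as a negative score rather than subtracted as a penalty), and a hypothetical function name `nw_score`.

```python
from functools import lru_cache


def nw_score(x, y, match=10, mismatch=-2, gap=-5):
    """Global alignment score S(N, M), computed top-down with memoization."""

    @lru_cache(maxsize=None)
    def S(i, j):
        # Initial conditions: a prefix aligned entirely against spaces.
        if i == 0:
            return j * gap
        if j == 0:
            return i * gap
        s = match if x[i - 1] == y[j - 1] else mismatch
        return max(
            S(i - 1, j - 1) + s,  # x_i aligns to y_j
            S(i - 1, j) + gap,    # x_i aligns to a space
            S(i, j - 1) + gap,    # y_j aligns to a space
        )

    return S(len(x), len(y))


print(nw_score("CATTCAC", "CTCGCAGC"))
```

Caching each S(i, j) is what keeps the number of evaluated subproblems at (N+1) x (M+1) instead of exponential; the bottom-up matrix fill on the following slides computes the same values without recursion.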

2- Initialize & Fill in DP Matrix for Storing Optimal Scores of Subproblems
Construct sequence vs sequence matrix:
[Figure: DP matrix with indices running from 0 to N and from 0 to M. Initialization: S(0,0) = 0 at one corner. Recursion: each cell S(i,j) is computed from S(i-1,j-1), S(i-1,j), and S(i,j-1). S(N,M) sits at the opposite corner.]

2- cont. Fill in DP Matrix
Fill in from [0,0] to [N,M] (row by row), calculating the best possible score for each alignment including residues at [i,j]
Keep track of dependencies of scores (in a pointer matrix)
[Figure: the same DP matrix, with S(i,j) depending on S(i-1,j-1), S(i-1,j), and S(i,j-1)]

3- for Global Alignment: Calculate Score S(N,M) of Optimum Alignment
What happens in the last step in the alignment of x[1..i] to y[1..j]? 1 of 3 cases applies:
Case 1: xi aligns to yj. Score: S(i-1,j-1) + s(xi,yj)
x1 x2 . . . xi-1 xi
y1 y2 . . . yj-1 yj
Case 2: xi aligns to a gap. Score: S(i-1,j) - γ
x1 x2 . . . xi-1 xi
y1 y2 . . . yj   -
Case 3: yj aligns to a gap. Score: S(i,j-1) - γ
x1 x2 . . . xi   -
y1 y2 . . . yj-1 yj
Here s(xi,yj) is the score for aligning xi with yj, and γ is the space/gap penalty.

Example
Case 1: Line up xi with yj (last column pairs position i with position j; the column before it ends at i-1 and j-1)
x: C A T T C A C
y: C - T T C A G
Case 2: Line up xi with space (the column before the last ends at i-1 and j)
x: C A T T C A - C
y: C - T T C A G -
Case 3: Line up yj with space (the column before the last ends at i and j-1)
x: C A T T C A C -
y: C - T T C A - G

Fill in the matrix
+10 for match, -2 for mismatch, -5 for space
      λ    C    T    C    G    C    A    G    C
λ     0   -5  -10  -15  -20  -25  -30  -35  -40
C    -5   10    5
A   -10
T   -15
T   -20
C   -25
A   -30
C   -35
We first compute S(i, j) for the smallest possible values of i and j, then for increasing values of i and j
Usually performed with a table of size (N + 1) x (M + 1)

Calculate score of optimum alignment
+10 for match, -2 for mismatch, -5 for space
      λ    C    T    C    G    C    A    G    C
λ     0   -5  -10  -15  -20  -25  -30  -35  -40
C    -5   10    5    0   -5  -10  -15  -20  -25
A   -10    5    8    3   -2   -7    0   -5  -10
T   -15    0   15   10    5    0   -5   -2   -7
T   -20   -5   10   13    8    3   -2   -7   -4
C   -25  -10    5   20   15   18   13    8    3
A   -30  -15    0   15   18   13   28   23   18
C   -35  -20   -5   10   13   28   23   26   33
Score of optimum alignment = 33 (bottom-right cell)
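The bottom-up fill can be sketched in Python as below. The function name `fill_matrix` is hypothetical; the sequences and scores are those of the slide's example, with the -5 space score added directly rather than subtracted as a penalty.

```python
def fill_matrix(x, y, match=10, mismatch=-2, gap=-5):
    """Fill the (N+1) x (M+1) global alignment score matrix row by row."""
    n, m = len(x), len(y)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap  # x[1..i] aligned entirely to spaces
    for j in range(1, m + 1):
        S[0][j] = j * gap  # y[1..j] aligned entirely to spaces
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            S[i][j] = max(
                S[i - 1][j - 1] + s,  # diagonal: x_i aligned to y_j
                S[i - 1][j] + gap,    # up: x_i aligned to a space
                S[i][j - 1] + gap,    # left: y_j aligned to a space
            )
    return S


x, y = "CATTCAC", "CTCGCAGC"
S = fill_matrix(x, y)
print("    " + "".join(f"{c:>5}" for c in "λ" + y))
for label, row in zip("λ" + x, S):
    print(f"{label:>4}" + "".join(f"{v:5d}" for v in row))
print("Optimum score:", S[len(x)][len(y)])
```

Each cell is computed once from three already-filled neighbours, so the fill takes time proportional to N x M.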

4- Trace back through matrix to recover optimum alignment(s) that generated the optimal score
How? "Repeat" alignment calculations in reverse order, starting from the position with the highest score and following the path, position by position, back through the matrix
Result? Optimal alignment(s) of sequences

Traceback to Recover Alignment
[Figure: the filled DP matrix for CATTCAC (rows) vs CTCGCAGC (columns), with the traceback path from the final score of 33 marked by *]
Can have >1 optimal alignment; this example has 2
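A traceback can be sketched without a separate pointer matrix by re-checking, at each cell, which of the three cases produced its score. This is an illustrative sketch, not the lecture's implementation: the function name `global_align` is hypothetical, and ties are broken in the fixed order diagonal, up, left, so only one of the optimal alignments is returned.

```python
def global_align(x, y, match=10, mismatch=-2, gap=-5):
    """Return (score, aligned_x, aligned_y) for one optimal global alignment."""
    n, m = len(x), len(y)
    S = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        S[i][0] = i * gap
    for j in range(1, m + 1):
        S[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            S[i][j] = max(S[i - 1][j - 1] + s, S[i - 1][j] + gap, S[i][j - 1] + gap)

    # Walk back from S(N, M) to S(0, 0), emitting one column per step.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        s = (match if x[i - 1] == y[j - 1] else mismatch) if i > 0 and j > 0 else None
        if s is not None and S[i][j] == S[i - 1][j - 1] + s:
            top.append(x[i - 1]); bottom.append(y[j - 1])  # x_i aligned to y_j
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] + gap:
            top.append(x[i - 1]); bottom.append("-")       # x_i aligned to a space
            i -= 1
        else:
            top.append("-"); bottom.append(y[j - 1])       # y_j aligned to a space
            j -= 1
    return S[n][m], "".join(reversed(top)), "".join(reversed(bottom))


score, ax, ay = global_align("CATTCAC", "CTCGCAGC")
print(score)
print(ax)
print(ay)
```

With this tie-breaking order the sketch recovers the alignment CAT-TCA-C over C-TCGCAGC shown earlier in the lecture; preferring a different move at a tied cell would yield the other optimal alignment.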