Introduction to Dynamic Programming

Presentation transcript:

Introduction to Dynamic Programming: The sequence alignment problem. Wilson Leung, 08/2015

Outline: Overview of the sequence alignment problem. Calculate the optimal global alignment. Characteristics of dynamic programming algorithms. Calculate the optimal local alignment. Purpose: computational technique (iteration and recursion). Motivation: how do computer scientists think about computational problems?

Learning objectives Understand the theory behind sequence alignment Become a better informed user of NCBI BLAST This presentation will not cover: The BLAST algorithm Parameter optimizations Statistics for similarity searches (Karlin-Altschul theory) This talk will not necessarily make you a better BLAST user See Dr. Buhler’s lecture notes on BLAST Illustrates how to design a computational algorithm to solve a biological problem: Generate an optimal alignment between two sequences Korf, I., Yandell, M., and Bedell, J. (2003). BLAST. O’Reilly Media, Inc.

Design goals: Generate an alignment between two sequences. Identify the “best” (most parsimonious) alignment, i.e. the one that minimizes the edit distance. Generate the best alignment “quickly”.

Strategy #1: Visual inspection
Query:   ATTACCAG
         || |||||
Subject: ATCACCAG
Sequences must have high percent identity. Applications: PAM scoring matrix (align sequences with >= 85% identity); align mononucleotide runs during sequence improvement.

Strategy #2: Enumerate all alignments. Guaranteed to find the best alignment, but does not scale (combinatorial explosion): two 300 bp sequences have ~10^179 possible alignments (Eddy 2004). Uses of a brute-force algorithm: establish baseline performance and test cases; identify patterns in the problem space. A brute-force approach is also used to determine short passwords.

Apply the brute force algorithm to a single column of the alignment. Three possible alignments for two 1 bp sequences (query length M = 1; subject length N = 1):
Homologous:      Query: A    Subject: A
Not homologous:  Query: A-   Subject: -A
Not homologous:  Query: -A   Subject: A-
Only two biological interpretations: A in the query is homologous to A in the subject, or A in the query is not homologous to A in the subject. Most parsimonious alignment: minimize the number of gaps.

Six possible relationships between the query and subject for M=2, N=2. [Figure: the possible alignments of two 2 bp sequences, grouped into 2 aligned bases, 1 aligned base and 0 aligned bases; each color denotes a different evolutionary relationship.] We do not have a time machine, but the alignment implies an evolutionary relationship. Most of the possibilities are caused by gaps.

Observations from the brute force alignment strategy Many of the possible alignments are redundant Imply the same evolutionary relationship Large number of possible alignments 13 possible alignments for sequences of length 2 Can ignore many possible alignments Many are suboptimal compared to the best alignment
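
The two counts quoted on these slides (13 alignments, 6 relationships for M = N = 2) can be reproduced with a short sketch. It assumes that a "relationship" means a distinct set of aligned base pairs, so alignments that differ only in the order of adjacent gaps are merged; the function names are illustrative and do not come from the talk.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_alignments(m: int, n: int) -> int:
    """Count every gapped alignment of an m-base query with an n-base subject."""
    if m == 0 or n == 0:
        return 1  # only one way left: pair each remaining base with a gap
    return (
        count_alignments(m - 1, n - 1)  # last column aligns two bases
        + count_alignments(m - 1, n)    # last column: query base over a gap
        + count_alignments(m, n - 1)    # last column: gap over a subject base
    )


def count_relationships(m: int, n: int) -> int:
    """Count distinct sets of aligned base pairs (gap order ignored)."""
    return comb(m + n, m)


print(count_alignments(1, 1), count_relationships(1, 1))  # 3 2
print(count_alignments(2, 2), count_relationships(2, 2))  # 13 6
```

Under that assumption the 1 bp case gives 3 alignments but only 2 interpretations, and the 2 bp case gives 13 alignments but only 6 relationships, which is the redundancy the slide points out.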

Strategy #3: Dot plot. Compare each base in the query against each base in the subject. Cell position (i,j): i = query position (x-axis); j = subject position (y-axis). Draw a dot at (i,j) if the two bases are identical. Connect the dots to make a line (alignment). Level of noise depends on repeat density; use longer words and higher cutoff scores to reduce noise. Where does the alignment come from? [Figure: dot plot with Query (x) and Subject (y) axes; diagonal segments are labeled Align, and the offsets between them are labeled Deletion in subject and Insertion in subject.]
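
A text-only dot plot is enough to see the idea. The sketch below is an illustration rather than the tool used for the slide: the `word` parameter is an assumed way to model "longer words", marking a dot only when a whole word of that length is identical in both sequences.

```python
def dot_plot(query: str, subject: str, word: int = 1) -> list[str]:
    """Return one text row per subject position; '*' marks an identical word."""
    rows = []
    for j in range(len(subject) - word + 1):      # y-axis: subject position
        row = ""
        for i in range(len(query) - word + 1):    # x-axis: query position
            same = query[i:i + word] == subject[j:j + word]
            row += "*" if same else "."
        rows.append(row)
    return rows


# Sequences from the visual-inspection slide.
for line in dot_plot("ATTACCAG", "ATCACCAG"):
    print(line)
print()
for line in dot_plot("ATTACCAG", "ATCACCAG", word=3):
    print(line)
```

With `word=1` the main diagonal is surrounded by scattered single-base matches; with `word=3` only the shared ACCAG stretch survives, which illustrates the noise reduction mentioned on the slide.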

Assessment of the three sequence alignment strategies Infeasible to examine all possible alignments Need to reduce the search space Only a small subset of alignments are “interesting” Many alignments are redundant Connect the dots in the dot plot to create an alignment Consider the cumulative levels of similarity

The optimal alignment is composed of smaller optimal alignments (optimal substructure). Only the best alignment at each position could be part of the final optimal alignment. [Figure: the alignment of query AT with subject AT, extended from shorter alignments by one of three steps: Align, Deletion in subject, Insertion in subject.]

Partition the alignment problem into smaller subproblems. Assume the query and subject sequences are the same. [Figure: alignment matrix with query positions 1 to 100 on the x-axis and subject positions 1 to 100 on the y-axis.]

Three different ways to reach cell (i,j) in the alignment matrix (arrow = alignment step):
Align with subject: from cell (i-1, j-1)   Query: A   Subject: A
Gap in query:       from cell (i, j-1)     Query: -   Subject: A
Gap in subject:     from cell (i-1, j)     Query: A   Subject: -
If cell (i,j) is part of the optimal path, identify the best path to reach cell (i,j). The algorithm terminates when it compares sequences of length 0.

Construct a scoring system to measure similarity between two sequences. Scoring system for the aligned state: 𝛔. 𝛔(a, b) = score for aligning a in query with b in subject; 𝛔(A, A) = bonus for aligning A in query with A in subject; 𝛔(A, T) = penalty for aligning A in query with T in subject. Penalty for adding a gap: 𝛾. More sophisticated scoring systems take transitions, transversions, and affine gap penalties into account. Protein sequences can use a more sophisticated scoring system (e.g. BLOSUM62). Pearson WR. Selecting the Right Similarity-Scoring Matrix. Curr Protoc Bioinformatics. 2013;43:3.5.1-3.5.9.
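
As a small sketch of such a scoring system, the code below scores an alignment that has already been written out as two gapped rows. The values +5, -2 and -6 are the ones used in the worked example later in the talk; the function and constant names are illustrative assumptions.

```python
MATCH, MISMATCH, GAP = 5, -2, -6  # values from the worked example in the talk


def sigma(a: str, b: str) -> int:
    """Score for aligning base a in the query with base b in the subject."""
    return MATCH if a == b else MISMATCH


def score_alignment(query_row: str, subject_row: str) -> int:
    """Sum the column scores of two gapped rows of equal length."""
    if len(query_row) != len(subject_row):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for a, b in zip(query_row, subject_row):
        # A column that contains '-' costs one linear gap penalty.
        total += GAP if "-" in (a, b) else sigma(a, b)
    return total


print(score_alignment("ATTACCAG", "ATCACCAG"))  # 7 matches, 1 mismatch -> 33
print(score_alignment("TGCTCGTA", "T--TCATA"))  # 5 matches, 1 mismatch, 2 gaps -> 11
```

A linear gap penalty is assumed here; an affine penalty would need to distinguish gap opening from gap extension.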

Recursive definition for the optimal cumulative alignment score S(i,j), where a is the query base at position i and b is the subject base at position j: $S(i,j) = \max \begin{cases} S(i-1,j-1) + \sigma(a,b) & \text{(align)} \\ S(i-1,j) + \gamma & \text{(gap in subject)} \\ S(i,j-1) + \gamma & \text{(gap in query)} \end{cases}$ Prove by contradiction: only the best alignment at cell (i,j) could be part of the final optimal alignment.

Determine the best way to reach cell (i,j) if it were part of the optimal alignment: S(i,j) is the maximum over the three possible last steps (align a with b, gap in subject after a, gap in query before b). Prove by contradiction: only the best alignment at cell (i,j) could be part of the final optimal alignment. It turns out we do not actually need to know the optimal alignment in advance: one of the alignments in the DP matrix will be the optimal alignment (i.e. it has the highest score).

Use the maximum score at each cell to eliminate entire branches of suboptimal alignments. We can eliminate an entire branch of alignments that reaches cell (i,j) by an align, gap in query, or gap in subject step because, by definition, there is a better alternative alignment. The score encapsulates the history of all the decisions made up to cell (i,j).

Cumulative score S(i,j) encapsulates the alignment decisions up to position (i,j) All potential optimal alignments that go through cell (i,j) have the same ancestry Re-use the cumulative alignment score (memoization) Gaps are described by the cumulative score Do not affect the coordinates of the alignment matrix Do not know the optimal alignment until we complete the entire alignment matrix Optimal alignment has the highest cumulative score
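
The reuse of cumulative scores can be sketched as a memoized recursion. This is a top-down illustration of the recurrence, not the table-filling procedure shown on the following slides; the default scores (+5, -2, -6) are taken from the worked example, and `global_score` is an illustrative name.

```python
from functools import lru_cache


def global_score(query: str, subject: str,
                 match: int = 5, mismatch: int = -2, gap: int = -6) -> int:
    """Optimal global alignment score via memoized recursion on S(i, j)."""

    @lru_cache(maxsize=None)  # each S(i, j) is computed once, then looked up
    def S(i: int, j: int) -> int:
        if i == 0:
            return j * gap  # only gaps in the query remain
        if j == 0:
            return i * gap  # only gaps in the subject remain
        s = match if query[i - 1] == subject[j - 1] else mismatch
        return max(
            S(i - 1, j - 1) + s,  # align query base i with subject base j
            S(i - 1, j) + gap,    # gap in subject
            S(i, j - 1) + gap,    # gap in query
        )

    return S(len(query), len(subject))


print(global_score("TGCTCGTA", "TTCATA"))  # 11
```

Without the cache the same cells would be recomputed along every path that reaches them. Python's recursion limit makes this form suitable only for short sequences.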

Needleman-Wunsch algorithm (global alignment) (Query length: M; Subject length: N). Construct an (M+1) x (N+1) matrix; the extra column and row represent gaps at the beginning of the alignment. Fill in the cells in the first row and first column with the cumulative gap costs. Calculate the maximum score for subsequent cells (i,j): $S(i,j) = \max\{\, S(i-1,j-1) + \sigma(a,b),\; S(i-1,j) + \gamma,\; S(i,j-1) + \gamma \,\}$. Keep track of the decision that leads to the maximum score (S). Needleman SB, Wunsch CD. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol. 1970 Mar;48(3):443-53.

Initialize the alignment matrix (Match = +5; Mismatch = -2; Gap = -6). Query (columns, positions 1-8): TGCTCGTA; Subject (rows, positions 1-6): TTCATA. The alignment matrix corresponds to the decision tree (Eddy, 2004).
          T    G    C    T    C    G    T    A
     0   -6  -12  -18  -24  -30  -36  -42  -48
T   -6
T  -12
C  -18
A  -24
T  -30
A  -36

Calculate the possible scores for the cell at position (1,1), where S(0,0) = 0, S(1,0) = -6 and S(0,1) = -6: S(1,1) = max { S(0,0) + 𝛔(T,T) [align]; S(1,0) + 𝛾 [gap in query]; S(0,1) + 𝛾 [gap in subject] }

Calculate the optimal score for the cell at position (1,1) (Match = +5; Mismatch = -2; Gap = -6): S(1,1) = max { 0 + (+5) = 5 [align]; -6 + (-6) = -12 [gap in query]; -6 + (-6) = -12 [gap in subject] } = 5

Calculate the possible scores for the cell at position (2,1), where S(1,0) = -6, S(2,0) = -12 and S(1,1) = 5: S(2,1) = max { S(1,0) + 𝛔(T,G) [align]; S(2,0) + 𝛾 [gap in query]; S(1,1) + 𝛾 [gap in subject] }

Calculate the optimal score for the cell at position (2,1) (Match = +5; Mismatch = -2; Gap = -6): S(2,1) = max { -6 + (-2) = -8 [align]; -12 + (-6) = -18 [gap in query]; 5 + (-6) = -1 [gap in subject] } = -1

Alignment matrix after two iterations (Match = +5; Mismatch = -2; Gap = -6). Cell (2,1) = -1 corresponds to an aligned base followed by a gap in subject.
          T    G    C    T    C    G    T    A
     0   -6  -12  -18  -24  -30  -36  -42  -48
T   -6    5   -1
T  -12
C  -18
A  -24
T  -30
A  -36

Calculate the optimal score for the cell at position (3,1) (Match = +5; Mismatch = -2; Gap = -6): S(3,1) = max { -12 + (-2) = -14 [align]; -18 + (-6) = -24 [gap in query]; -1 + (-6) = -7 [gap in subject] } = -7

Matrix after three iterations (Match = +5; Mismatch = -2; Gap = -6). Cells (2,1) and (3,1) correspond to an aligned base followed by gaps in subject.
          T    G    C    T    C    G    T    A
     0   -6  -12  -18  -24  -30  -36  -42  -48
T   -6    5   -1   -7
T  -12
C  -18
A  -24
T  -30
A  -36

Calculate the optimal score for the cell at position (1,2) (Match = +5; Mismatch = -2; Gap = -6): S(1,2) = max { -6 + (+5) = -1 [align]; 5 + (-6) = -1 [gap in query]; -12 + (-6) = -18 [gap in subject] } = -1

Complete alignment matrix (Match = +5; Mismatch = -2; Gap = -6). Query (columns): TGCTCGTA; Subject (rows): TTCATA.
          T    G    C    T    C    G    T    A
     0   -6  -12  -18  -24  -30  -36  -42  -48
T   -6    5   -1   -7  -13  -19  -25  -31  -37
T  -12   -1    3   -3   -2   -8  -14  -20  -26
C  -18   -7   -3    8    2    3   -3   -9  -15
A  -24  -13   -9    2    6    0    1   -5   -4
T  -30  -19  -15   -4    7    4   -2    6    0
A  -36  -25  -21  -10    1    5    2    3   11
Different alignments can produce the same cumulative score. At cell (1,2): gap in query after the aligned T = 5 + (-6) = -1; align T with T after a gap = -6 + 5 = -1.

Use traceback to recover the optimal alignment. Start from the cell within the last row and last column that has the highest score (considering all cells in the last row and last column allows gaps at the end of the alignment). Recall the step (color) that leads to this optimal score and report this step in the alignment output; all the alignment decisions have already been made. Repeat until we reach the beginning of the sequence. Two options if multiple paths produce the same score: report only one of the paths (pick arbitrarily), or report all paths with the optimal score. Alternate paths could produce better local alignments.
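
A minimal traceback sketch follows. It makes two simplifying assumptions that differ from the slide: it starts at the bottom-right cell (the classic Needleman-Wunsch choice) instead of scanning the last row and column, and it re-derives each step from the scores instead of storing a pointer per cell. Ties are broken in favor of the align step, so only one optimal path is reported.

```python
def needleman_wunsch(query: str, subject: str,
                     match: int = 5, mismatch: int = -2, gap: int = -6):
    """Return (score, aligned query row, aligned subject row)."""
    m, n = len(query), len(subject)
    S = [[0] * (m + 1) for _ in range(n + 1)]  # S[j][i] is the slide's S(i, j)
    for i in range(1, m + 1):
        S[0][i] = i * gap
    for j in range(1, n + 1):
        S[j][0] = j * gap
    for j in range(1, n + 1):
        for i in range(1, m + 1):
            s = match if query[i - 1] == subject[j - 1] else mismatch
            S[j][i] = max(S[j - 1][i - 1] + s, S[j][i - 1] + gap, S[j - 1][i] + gap)

    # Traceback from the bottom-right cell to the origin.
    i, j = m, n
    q_row, s_row = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if query[i - 1] == subject[j - 1] else mismatch
        if i > 0 and j > 0 and S[j][i] == S[j - 1][i - 1] + s:
            q_row.append(query[i - 1]); s_row.append(subject[j - 1])  # align
            i, j = i - 1, j - 1
        elif i > 0 and S[j][i] == S[j][i - 1] + gap:
            q_row.append(query[i - 1]); s_row.append("-")             # gap in subject
            i -= 1
        else:
            q_row.append("-"); s_row.append(subject[j - 1])           # gap in query
            j -= 1
    return S[n][m], "".join(reversed(q_row)), "".join(reversed(s_row))


score, q_aln, s_aln = needleman_wunsch("TGCTCGTA", "TTCATA")
print(score)  # 11
print(q_aln)  # TGCTCGTA
print(s_aln)  # T--TCATA
```

Reporting all co-optimal paths would require following every branch whose score matches, rather than the first one found.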

Traceback (in progress) through the complete alignment matrix: starting from the bottom-right cell (8,6) with score 11, five align steps have been reported so far.
Query:   TCGTA
Subject: TCATA

Calculate the optimal score for the cell at position (5,3) (Match = +5; Mismatch = -2; Gap = -6): S(5,3) = max { -2 + (+5) = 3 [align]; -8 + (-6) = -14 [gap in query]; 2 + (-6) = -4 [gap in subject] } = 3

Traceback must follow the steps that produce the optimal cumulative global alignment score. Neighboring cells: S(4,2) = -2, S(5,2) = -8, S(4,3) = 2, S(5,3) = 3. From cell (5,3), traceback cannot go to the cell with score 2 because that step produces a worse score (i.e. 2 + (-6) = -4). A global alignment often contains suboptimal local alignments.

Traceback (complete) through the complete alignment matrix, from cell (8,6) with score 11 back to the origin:
Query:   TGCTCGTA
Subject: T--TCATA

The Needleman-Wunsch algorithm is an example of a dynamic programming algorithm. The problem must satisfy two criteria: (1) Optimal substructure: the optimal solution to the complete problem is composed of optimal solutions to the subproblems. (2) Overlapping subproblems: re-use the results for the subproblems (e.g., lookup table). Many bioinformatics problems satisfy these criteria: sequence alignment, gene prediction, RNA folding. Divide and conquer = optimal substructure + non-overlapping subproblems. Same ancestry means that we can reuse the results of the subproblems. CS note: we are able to memoize results because the function is pure (the same inputs always produce the same result). Bellman B. The theory of dynamic programming. Bulletin of the American Mathematical Society. 1954; 60(6):503–516

Smith-Waterman algorithm (local alignment) (Query length: M; Subject length: N). Three changes to the Needleman-Wunsch algorithm: (1) The minimum score for a cell is zero: initiate a new alignment when the cumulative score would be negative. (2) Begin traceback from the cell within the entire matrix that has the highest score. (3) Terminate traceback when the score is zero. $S(i,j) = \max\{\, 0,\; S(i-1,j-1) + \sigma(a,b),\; S(i-1,j) + \gamma,\; S(i,j-1) + \gamma \,\}$ Smith TF, Waterman MS. Identification of common molecular subsequences. J Mol Biol. 1981 Mar 25;147(1):195-7.
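
The three changes can be sketched as follows. This is an illustration with a linear gap penalty and the example scores from the talk; it reports a single best local alignment and breaks ties in favor of the align step. The function name is an assumption.

```python
def smith_waterman(query: str, subject: str,
                   match: int = 5, mismatch: int = -2, gap: int = -6):
    """Return (score, aligned query segment, aligned subject segment)."""
    m, n = len(query), len(subject)
    S = [[0] * (m + 1) for _ in range(n + 1)]  # first row and column stay 0
    best, best_pos = 0, (0, 0)
    for j in range(1, n + 1):
        for i in range(1, m + 1):
            s = match if query[i - 1] == subject[j - 1] else mismatch
            # Change 1: a cell never drops below zero.
            S[j][i] = max(0, S[j - 1][i - 1] + s, S[j][i - 1] + gap, S[j - 1][i] + gap)
            if S[j][i] > best:
                best, best_pos = S[j][i], (i, j)

    # Change 2: start traceback at the highest-scoring cell in the whole matrix.
    i, j = best_pos
    q_row, s_row = [], []
    # Change 3: stop as soon as a cell with score zero is reached.
    while i > 0 and j > 0 and S[j][i] > 0:
        s = match if query[i - 1] == subject[j - 1] else mismatch
        if S[j][i] == S[j - 1][i - 1] + s:
            q_row.append(query[i - 1]); s_row.append(subject[j - 1])
            i, j = i - 1, j - 1
        elif S[j][i] == S[j][i - 1] + gap:
            q_row.append(query[i - 1]); s_row.append("-")
            i -= 1
        else:
            q_row.append("-"); s_row.append(subject[j - 1])
            j -= 1
    return best, "".join(reversed(q_row)), "".join(reversed(s_row))


print(smith_waterman("TGCTCGTA", "TTCATA"))  # (18, 'TCGTA', 'TCATA')
```

For the example sequences the local alignment drops the gapped prefix of the global alignment and keeps only the five-column segment.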

Global versus local alignments Global alignment Optimal alignment along the entire length of two sequences Compare protein sequences to identify orthologs Local alignment Optimal alignment between parts of two sequences Identify conserved domains within protein sequences Glocal (semi-global) alignment Optimal global alignment for one sequence; optimal local alignment for the other sequence Map a coding exon against a genomic sequence local alignments do not have terminal gaps

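Initialize the local alignment matrix (Match = +5; Mismatch = -2; Gap = -6). Query (columns, positions 1-8): TGCTCGTA; Subject (rows, positions 1-6): TTCATA. The first row and first column are set to zero.
          T    G    C    T    C    G    T    A
     0    0    0    0    0    0    0    0    0
T    0
T    0
C    0
A    0
T    0
A    0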

Calculate the possible local alignment scores for the cell at position (1,1), where S(0,0) = S(1,0) = S(0,1) = 0: S(1,1) = max { 0; S(0,0) + 𝛔(T,T) [align]; S(1,0) + 𝛾 [gap in query]; S(0,1) + 𝛾 [gap in subject] }

Calculate the optimal local alignment score for the cell at position (1,1) (Match = +5; Mismatch = -2; Gap = -6): S(1,1) = max { 0; 0 + (+5) = 5 [align]; 0 + (-6) = -6 [gap in query]; 0 + (-6) = -6 [gap in subject] } = 5

Local alignment matrix (Match = +5; Mismatch = -2; Gap = -6). Query (columns): TGCTCGTA; Subject (rows): TTCATA.
          T    G    C    T    C    G    T    A
     0    0    0    0    0    0    0    0    0
T    0    5    0    0    5    0    0    5    0
T    0    5    3    0    5    3    0    5    3
C    0    0    3    8    2   10    4    0    3
A    0    0    0    2    6    4    8    2    5
T    0    5    0    0    7    4    2   13    7
A    0    0    3    0    1    5    2    7   18

Traceback through the local alignment matrix: start from the highest-scoring cell (8,6) with score 18 and stop at the first cell with score zero.
Query:   TCGTA
Subject: TCATA

Techniques to improve the performance of sequence alignment. Time and space complexity: O(MN). Doubling the length of both sequences leads to a four-fold increase in the amount of time and space required. Reduce memory requirement: Myers EW, Miller W. Optimal alignments in linear space. Comput Appl Biosci. 1988 Mar;4(1):11-7. The Myers and Miller implementation is based on ideas from Hirschberg (1975): Hirschberg DS. A linear space algorithm for computing longest common subsequences. Commun. Assoc. Comput. Mach. 1975;18:341-343. Fill the matrix in parallel (SIMD = single instruction, multiple data; CUDA): Farrar M. Striped Smith-Waterman speeds database searches six times over other SIMD implementations. Bioinformatics. 2007 Jan 15;23(2):156-61. Custom hardware from TimeLogic (Decypher). Find a high-scoring alignment instead of the best alignment: Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990 Oct 5;215(3):403-10.
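
The memory saving can be illustrated with a much simpler idea than the Myers-Miller or Hirschberg algorithms cited above: if only the score is needed, each row of the matrix depends only on the previous row, so two rows suffice. The sketch below is that simplification, not an implementation of the cited methods, and it cannot recover the alignment itself.

```python
def global_score_two_rows(query: str, subject: str,
                          match: int = 5, mismatch: int = -2, gap: int = -6) -> int:
    """Global alignment score using O(M) memory instead of O(MN)."""
    prev = [i * gap for i in range(len(query) + 1)]  # row for subject position 0
    for j, s_base in enumerate(subject, start=1):
        curr = [j * gap]                             # first column of the new row
        for i, q_base in enumerate(query, start=1):
            s = match if q_base == s_base else mismatch
            curr.append(max(prev[i - 1] + s,         # align
                            curr[i - 1] + gap,       # gap in subject
                            prev[i] + gap))          # gap in query
        prev = curr                                  # discard the older row
    return prev[-1]


print(global_score_two_rows("TGCTCGTA", "TTCATA"))  # 11
```

Time remains O(MN); only the space drops. Recovering the alignment in linear space is what the divide-and-conquer strategy in the cited papers adds.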

Questions? Eddy SR. What is dynamic programming? Nat Biotechnol. 2004 Jul;22(7):909-10.

Rationale for calculating the scores for the entire alignment matrix Cannot determine the best global alignment without aligning the entire query and subject sequences Cannot evaluate all possible alignments If the alignment before we reached cell (i,j) is part of the optimal alignment: Identify the next step (i.e. align, gap in query, gap in subject) that will be part of the optimal alignment Use traceback to determine the final alignment Different alignments could produce the same score

Overview of the BLAST algorithm Heuristic algorithm to find local regions of similarity between the query and subject sequences Consists of four main stages: Find common subsequences (words) Extend the word matches into longer alignments Evaluate the significance of the high-scoring segment pairs (HSPs) Combine multiple HSPs into a longer alignment Korf, I., Yandell, M. and Bedell, J. (2003). The BLAST Algorithm. In BLAST (76-87). Sebastopol, CA: O’Reilly Media, Inc.

Number of alignments for two sequences with length N, estimated with Stirling's approximation: $\binom{2N}{N} = \frac{(2N)!}{(N!)^2} \approx \frac{2^{2N}}{\sqrt{\pi N}}$

Brute force alignment approach is computationally intractable
Sequence length (N)    # possible alignments
10                     1.87E+05
50                     1.01E+29
100                    9.07E+58
200                    1.03E+119
300                    1.35E+179
400                    1.88E+239
500                    2.70E+299
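
The figures in this table appear to follow the estimate 2^(2N) / sqrt(pi N), i.e. Stirling's approximation of the binomial coefficient C(2N, N); that reading is an assumption inferred from the values. The sketch below recomputes the estimate in log space, so that large N does not overflow a float, and compares it with the exact binomial coefficient. Function names are illustrative.

```python
from math import comb, floor, log10, pi


def log10_stirling(n: int) -> float:
    """log10 of 2^(2n) / sqrt(pi * n), the Stirling estimate of C(2n, n)."""
    return 2 * n * log10(2) - 0.5 * log10(pi * n)


def log10_exact(n: int) -> float:
    """log10 of the exact binomial coefficient C(2n, n)."""
    return log10(comb(2 * n, n))


def scientific(log_value: float) -> str:
    """Format 10**log_value as mantissa and exponent, e.g. 1.35E+179."""
    exponent = floor(log_value)
    mantissa = 10 ** (log_value - exponent)
    return f"{mantissa:.2f}E+{exponent:02d}"


for n in (10, 50, 100, 200, 300, 400, 500):
    print(n, scientific(log10_stirling(n)), scientific(log10_exact(n)))
```

The approximation slightly overestimates the exact count for small N (187,079 versus 184,756 for N = 10), and the relative difference shrinks as N grows.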