Find the Best Alignment for These Two Sequences
Sequences: CTCGTA and GTCTGTCT
Score: Match = 1, Mismatch = 0, Gap = -1
How do we find the best alignment from this mess? We start in the lower right-hand corner and work backwards!
Dynamic Programming Finds the Best Score and the Corresponding Alignment
Sequences: ACTCG (across the top) and ACAGTAG (down the side)
Table row for the first G of ACAGTAG: -4 -2 0 1 2 2
Alignment: Start in the lower right corner and work backwards:
AC--TCG
ACAGTAG
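The table on this slide can be regenerated with a short script. This is a minimal sketch, assuming the scoring scheme used so far (Match = 1, Mismatch = 0, Gap = -1); the function name `nw_matrix` and its parameters are hypothetical, chosen for illustration only.

```python
def nw_matrix(top, side, match=1, mismatch=0, gap=-1):
    """Fill a Needleman-Wunsch score table.

    Rows follow `side`, columns follow `top`. Row 0 and column 0
    hold the cost of aligning a prefix against nothing but gaps.
    """
    rows, cols = len(side) + 1, len(top) + 1
    table = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        table[0][j] = j * gap
    for i in range(1, rows):
        table[i][0] = i * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if side[i - 1] == top[j - 1] else mismatch
            table[i][j] = max(
                table[i - 1][j - 1] + pair,  # diagonal: pair two characters
                table[i - 1][j] + gap,       # vertical: gap in the top sequence
                table[i][j - 1] + gap,       # horizontal: gap in the side sequence
            )
    return table


table = nw_matrix("ACTCG", "ACAGTAG")
for label, row in zip("-ACAGTAG", table):
    print(label, row)
```

The row printed for the first G is `[-4, -2, 0, 1, 2, 2]`, and the lower right box holds the score 2 for the complete alignment.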
Rules to Discover the Alignment
1. Start in the lower right box. This box contains the best alignment score for the two sequences relative to this particular scoring scheme. NOTE: This may NOT be the largest value in the table, but it is the best score for completely aligning the two sequences. All other scores in the table are for partial alignments of the sequences.
2. Work backwards, following the arrows from the present box in reverse order.
3. A diagonal arrow is a pairing of the characters.
4. A vertical arrow represents a gap in the sequence across the top.
5. A horizontal arrow represents a gap in the sequence along the side.
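The rules above can be sketched in code. This is an illustrative sketch, not a reference implementation: instead of storing arrows, it rechecks which neighbor could have produced each score, and when several moves tie it prefers diagonal, then vertical, then horizontal (an assumption; the slides do not say how ties are broken). The name `needleman_wunsch` is hypothetical.

```python
def needleman_wunsch(top, side, match=1, mismatch=0, gap=-1):
    rows, cols = len(side) + 1, len(top) + 1
    score = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        score[i][0] = i * gap
        for j in range(1, cols):
            pair = match if side[i - 1] == top[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Rule 1: start in the lower right box.
    i, j = len(side), len(top)
    top_aln, side_aln = [], []
    # Rule 2: work backwards until the upper left box is reached.
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            pair = match if side[i - 1] == top[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair:
            # Rule 3: diagonal move pairs the two characters.
            top_aln.append(top[j - 1])
            side_aln.append(side[i - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            # Rule 4: vertical move puts a gap in the top sequence.
            top_aln.append("-")
            side_aln.append(side[i - 1])
            i -= 1
        else:
            # Rule 5: horizontal move puts a gap in the side sequence.
            top_aln.append(top[j - 1])
            side_aln.append("-")
            j -= 1
    return score[-1][-1], "".join(reversed(top_aln)), "".join(reversed(side_aln))


best, top_aln, side_aln = needleman_wunsch("ACTCG", "ACAGTAG")
print(best)
print(top_aln)
print(side_aln)
```

For ACTCG against ACAGTAG this prints a score of 2 and the alignment AC--TCG over ACAGTAG. Other tie-breaking orders can return a different alignment with the same score.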
Discussion of Needleman-Wunsch
1. It greatly reduces the number of steps needed to find the best comparison of two sequences. If sequence 1 is m characters long and sequence 2 is n characters long, then the number of steps is reduced to 3mn, as opposed to as many as min(m,n)! steps.
2. 3mn is not overwhelming if you are only comparing 2 sequences, but if you are comparing a query sequence against every other sequence in a 3 million sequence database, it becomes intractable.
3. An adjustment needs to be made to ignore leading and trailing gaps. To do this, simply place 0's in the first row and column and do not allow any gap penalties after the last character in the shorter sequence has been aligned. See later example.
4. Furthermore, it is designed to optimize a global alignment and may misalign some subsequences that have high quality alignments. See later example.
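To get a feel for point 2, the 3mn count can be multiplied out for a database search. The sequence lengths below are hypothetical round numbers chosen only for illustration; only the 3mn formula and the 3 million database size come from the slide.

```python
def nw_steps(m, n):
    """Steps to fill the table: three candidate moves for each of the m*n boxes."""
    return 3 * m * n


query_len = 300            # hypothetical query length
subject_len = 300          # hypothetical length of each database sequence
database_size = 3_000_000  # database size quoted on the slide

per_pair = nw_steps(query_len, subject_len)
total = per_pair * database_size
print(f"{per_pair:,} steps per comparison")
print(f"{total:,} steps for the whole database")
```

Under these assumed lengths, one comparison costs 270,000 steps and the full search costs 810,000,000,000 steps.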
Needleman-Wunsch Does NOT Always Give the Best Local Alignment Result!
Consider the two sequences: AAACACGTGTCT and CACGT
Indicated Alignment:
AAACACGTGTCT
---CAC--GT--
But CACGT is a subsequence of AAACACGTGTCT!
Sequences: AAACACGTGTCT and CACGT
Problem: We penalize leading and ending gaps the same as interior gaps.
Algorithm: Same as Needleman-Wunsch except 0's in the first row and first column. Furthermore, horizontal and vertical moves after the first sequence is aligned are penalty free in the bottom row.
This algorithm is called Semi-Global Alignment.
The Alignment:
AAACACGTGTCT
   CACGT
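The semi-global adjustment can be sketched as follows. This sketch uses a common equivalent formulation rather than the slide's exact bookkeeping: instead of allowing penalty-free moves along the bottom row, it starts the traceback from the best box in the last row or last column, which has the same effect of not charging for trailing gaps. The name `semiglobal` and its return values are hypothetical.

```python
def semiglobal(long_seq, short_seq, match=1, mismatch=0, gap=-1):
    rows, cols = len(short_seq) + 1, len(long_seq) + 1
    # First row and first column stay 0, so leading gaps are free.
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if short_seq[i - 1] == long_seq[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Free trailing gaps: take the best box in the last row or last column.
    candidates = [(score[rows - 1][j], rows - 1, j) for j in range(cols)]
    candidates += [(score[i][cols - 1], i, cols - 1) for i in range(rows)]
    best, i, j = max(candidates)

    # Trace back until the first row or first column is reached.
    long_aln, short_aln = [], []
    while i > 0 and j > 0:
        pair = match if short_seq[i - 1] == long_seq[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + pair:
            long_aln.append(long_seq[j - 1])
            short_aln.append(short_seq[i - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            long_aln.append("-")
            short_aln.append(short_seq[i - 1])
            i -= 1
        else:
            long_aln.append(long_seq[j - 1])
            short_aln.append("-")
            j -= 1
    # j is now the number of long_seq characters left of the aligned region.
    return best, j, "".join(reversed(long_aln)), "".join(reversed(short_aln))


best, start, long_aln, short_aln = semiglobal("AAACACGTGTCT", "CACGT")
print(best)
print("AAACACGTGTCT")
print(" " * start + short_aln)
```

For the two sequences on this slide the sketch reports a score of 5 and places CACGT three characters in, directly under its exact occurrence in the longer sequence.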
Smith-Waterman Local Alignment Algorithm
Scoring
1. Same as Needleman-Wunsch with one exception.
2. If a cell becomes negative, evaluate it as 0.
Local Alignment
1. Locate the last match in the table with the highest score.
2. Work backwards from that match as in Needleman-Wunsch until you come to a zero.
3. Stop prior to the zero.
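The scoring and traceback steps above might look like this in code. This is a hedged sketch: the name `smith_waterman` is hypothetical, "last" is interpreted as last in row-by-row order, the first row and column are assumed to be 0, and ties between moves are broken diagonal first. None of those details are spelled out on the slide.

```python
def smith_waterman(top, side, match=1, mismatch=0, gap=-1):
    rows, cols = len(side) + 1, len(top) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            is_match = side[i - 1] == top[j - 1]
            pair = match if is_match else mismatch
            # Scoring rule 2: a negative cell is evaluated as 0.
            score[i][j] = max(0,
                              score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
            # Step 1: remember the last match that reaches the highest score.
            if is_match and score[i][j] >= best:
                best, best_i, best_j = score[i][j], i, j

    # Steps 2 and 3: work backwards and stop before the zero.
    i, j = best_i, best_j
    top_aln, side_aln = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        pair = match if side[i - 1] == top[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + pair:
            top_aln.append(top[j - 1])
            side_aln.append(side[i - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            top_aln.append("-")
            side_aln.append(side[i - 1])
            i -= 1
        else:
            top_aln.append(top[j - 1])
            side_aln.append("-")
            j -= 1
    return best, "".join(reversed(top_aln)), "".join(reversed(side_aln))


print(smith_waterman("AAACACGTGTCT", "CACGT"))
```

Applied to the earlier pair AAACACGTGTCT and CACGT, the sketch returns a score of 5 with CACGT aligned to CACGT, which is the local match that the global alignment missed.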
Smith-Waterman Local Alignment
Sequences: ATCTCGTATGATG (across the top) and GTCTATCAC (down the side)
Alignment:
TCGTATGA
TC-TATCA
Discussion of Smith-Waterman
1. Finds the highest scoring subsequence alignment within the two sequences. This is very useful when comparing two very long sequences.
2. Still requires 3mn steps to complete the scoring matrix, which makes it impractical for an extended database search.
3. It is generally recognized as the most accurate of the local alignment schemes.
4. Results are still dependent upon the scoring scheme. See your friendly local neighborhood biologist for help.