Published by Benjamin Cannon. Modified over 5 years ago.
1
Find the Best Alignment For These Two Sequences
Sequences to align: CTCGTA and GTC
(Figure: scoring table with gap-penalty border values -1 -2 -3 -4 -5 -6.)
Score: Match = [not shown], Mismatch = [not shown], Gap = -1
4
Find the Best Alignment For These Two Sequences
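Sequences to align: CTCGTA and GTC
(Figure: scoring table with gap-penalty border values -1 through -6 and the first filled cells, 1 and 2.)
How do we find the best alignment from this mess? We start in the lower right-hand corner and work backwards!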
5
Dynamic Programming Finds the Best Score and the Corresponding Alignment
(Figure: completed scoring table.)
Alignment (start in the lower right corner and work backwards):
AC--TCG
ACAGTAG
6
Rules to Discover The Alignment
Start in the lower right box: this box contains the best alignment score for the two sequences relative to this particular scoring scheme.
NOTE: This may NOT be the largest value in the table, but it is the best score for completely aligning the two sequences. All other scores in the table are for partial alignments of the sequences.
Work backwards, following the arrows from the present box in reverse order.
A diagonal arrow is a pairing of the characters.
A vertical arrow represents a gap in the sequence across the top.
A horizontal arrow represents a gap in the sequence along the side.
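The rules above can be sketched in code. This is a minimal illustration, not the presenter's implementation: the function name and the match = 1, mismatch = 0 defaults are assumptions (only gap = -1 appears on the slides), and instead of storing arrows it rechecks which neighbor produced each score, preferring the diagonal on ties.

```python
def needleman_wunsch(top, side, match=1, mismatch=0, gap=-1):
    """Global alignment of `top` (across the table) and `side` (down the table).

    match and mismatch defaults are assumptions; gap = -1 follows the slides.
    Returns (score, aligned_top, aligned_side).
    """
    rows, cols = len(side) + 1, len(top) + 1
    score = [[0] * cols for _ in range(rows)]
    # Border cells: aligning a prefix against nothing costs one gap per character.
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        score[i][0] = i * gap

    def pair_score(i, j):
        return match if side[i - 1] == top[j - 1] else mismatch

    # Each cell is the best of the diagonal, vertical, and horizontal moves.
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Traceback: start in the lower right box and work backwards.
    i, j = rows - 1, cols - 1
    out_top, out_side = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            out_top.append(top[j - 1])    # diagonal: pair the two characters
            out_side.append(side[i - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_top.append("-")           # vertical: gap in the top sequence
            out_side.append(side[i - 1])
            i -= 1
        else:
            out_top.append(top[j - 1])    # horizontal: gap in the side sequence
            out_side.append("-")
            j -= 1
    return score[-1][-1], "".join(reversed(out_top)), "".join(reversed(out_side))


best, aligned_top, aligned_side = needleman_wunsch("ACAGTAG", "ACTCG")
print(best)
print(aligned_top)
print(aligned_side)
```

When several alignments share the best score, the tie-breaking order in the traceback decides which one is reported, so the printed alignment may differ from a hand-drawn one while having the same score.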
7
Discussion of Needleman-Wunsch
It greatly reduces the number of steps needed to find the best comparison of two sequences. If sequence 1 is m characters long and sequence 2 is n characters long, then the number of steps is reduced to 3mn, as opposed to as many as min(m,n)! steps.
3mn is not overwhelming if you are only comparing 2 sequences, but if you are comparing a query sequence against every other sequence in a 3 million sequence database, it becomes intractable.
An adjustment needs to be made to ignore leading and trailing gaps. To do this, simply place zeros in the first row and column, and do not charge any gap penalties after the last character in the shorter sequence has been aligned. See later example.
Furthermore, it is designed to optimize a global alignment and may misalign some subsequences that have high-quality alignments. See later example.
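To put the 3mn figure in perspective, here is a rough back-of-the-envelope count. The sequence lengths are assumed values chosen only for illustration; the database size is the one quoted above.

```python
def dp_steps(m, n):
    """Step count quoted on the slide: three candidate moves per table cell."""
    return 3 * m * n


query_len = 1_000       # assumed query length
subject_len = 1_000     # assumed average database sequence length
db_size = 3_000_000     # database size quoted on the slide

per_pair = dp_steps(query_len, subject_len)
total = per_pair * db_size
print(f"steps per pairwise comparison: {per_pair:,}")
print(f"steps for the whole database:  {total:,}")
```

Under these assumptions a single comparison costs 3 million steps, and scanning the whole database costs 9 trillion.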
8
Needleman – Wunsch Does NOT Always Give the Best Local Alignment Result!
Consider the two sequences: AAACACGTGTCT and CACGT
(Figure: Needleman-Wunsch scoring table with gap-penalty border values -1 through -13.)
Indicated alignment:
AAACACGTGTCT
CAC--GT--
But CACGT is a subsequence of AAACACGTGTCT!
9
Problem: We penalize leading and ending gaps the same as interior gaps.
(Figure: semi-global scoring table.)
Algorithm: Same as N-W except zeros in the first row and first column. Furthermore, horizontal and vertical moves after the first sequence is aligned are penalty-free in the bottom row.
This algorithm is called Semi-Global Alignment.
The alignment:
AAACACGTGTCT
---CACGT
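A minimal sketch of the semi-global idea, under stated assumptions: the function name is hypothetical, match = 1 and mismatch = 0 are assumed values (gap = -1 as before), and the penalty-free trailing moves are implemented in one common way, by starting the traceback at the best cell in the last row or last column rather than at the lower right corner.

```python
def semiglobal_align(top, side, match=1, mismatch=0, gap=-1):
    """Semi-global alignment: leading and trailing gaps are free.

    Scoring defaults are assumptions. Returns (score, aligned_top, aligned_side).
    """
    rows, cols = len(side) + 1, len(top) + 1
    # Zeros in the first row and column: leading gaps cost nothing.
    score = [[0] * cols for _ in range(rows)]

    def pair_score(i, j):
        return match if side[i - 1] == top[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )

    # Free trailing gaps: take the best cell in the last row or last column.
    candidates = [(score[rows - 1][j], rows - 1, j) for j in range(cols)]
    candidates += [(score[i][cols - 1], i, cols - 1) for i in range(rows)]
    best, end_i, end_j = max(candidates, key=lambda c: c[0])

    # Traceback as in N-W, stopping at the first row or first column.
    i, j = end_i, end_j
    mid_top, mid_side = [], []
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            mid_top.append(top[j - 1])
            mid_side.append(side[i - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            mid_top.append("-")
            mid_side.append(side[i - 1])
            i -= 1
        else:
            mid_top.append(top[j - 1])
            mid_side.append("-")
            j -= 1

    # Pad with the unpenalized leading and trailing overhangs.
    aligned_top = (top[:j] + "-" * i + "".join(reversed(mid_top))
                   + top[end_j:] + "-" * (rows - 1 - end_i))
    aligned_side = ("-" * j + side[:i] + "".join(reversed(mid_side))
                    + "-" * (cols - 1 - end_j) + side[end_i:])
    return best, aligned_top, aligned_side


best, aligned_top, aligned_side = semiglobal_align("AAACACGTGTCT", "CACGT")
print(best)
print(aligned_top)
print(aligned_side)
```

With the assumed scoring, the slide's example places CACGT under its exact occurrence in the longer sequence, with only free leading and trailing gaps around it.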
10
Smith – Waterman Local Alignment Algorithm
Scoring
1. Same as Semi-Global Alignment (no penalties for leading and trailing gaps), with one exception.
2. If a cell becomes negative, evaluate it as 0.
Local Alignment
Locate the last match in the table with the highest score. Work backwards from that match as in Needleman-Wunsch until you come to a zero. Stop prior to the zero.
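The scoring and traceback rules above can be sketched as follows. This is an illustrative sketch rather than a reference implementation: the function name is hypothetical, the default scores (match = 1, mismatch = 0, gap = -1) are assumed values, and the two input strings are only the aligned fragments, since the full sequences are not recoverable from this transcript.

```python
def smith_waterman(top, side, match=1, mismatch=0, gap=-1):
    """Local alignment; returns (score, aligned_top, aligned_side).

    Scoring defaults are assumptions chosen for illustration.
    """
    rows, cols = len(side) + 1, len(top) + 1
    score = [[0] * cols for _ in range(rows)]

    def pair_score(i, j):
        return match if side[i - 1] == top[j - 1] else mismatch

    best, end = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(
                0,  # a cell that would go negative is evaluated as 0
                score[i - 1][j - 1] + pair_score(i, j),
                score[i - 1][j] + gap,
                score[i][j - 1] + gap,
            )
            # ">=" keeps the last cell that reaches the highest score.
            if score[i][j] > 0 and score[i][j] >= best:
                best, end = score[i][j], (i, j)

    # Work backwards from the best cell; stop prior to a zero.
    i, j = end
    out_top, out_side = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            out_top.append(top[j - 1])
            out_side.append(side[i - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            out_top.append("-")
            out_side.append(side[i - 1])
            i -= 1
        else:
            out_top.append(top[j - 1])
            out_side.append("-")
            j -= 1
    return best, "".join(reversed(out_top)), "".join(reversed(out_side))


best, aligned_top, aligned_side = smith_waterman("TCGTATGA", "TCTATCA")
print(best)
print(aligned_top)
print(aligned_side)
```

Because a mismatch scores 0 here, a traceback can pass through mismatched pairs without losing score; a stricter mismatch penalty would trim such pairs from the ends of the local alignment.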
11
Smith-Waterman Local Alignment
Start with no penalty for leading gaps. Our scoring system will be 1 for a match, 0 for a mismatch, and -1 for an interior gap. Note: no cell can contain a negative number.
12
(Figure: partially filled scoring table.)
Following the Carpenter’s Square design, we come up with this partially filled-in table.
13
(Figure: scoring table with cells scoring 1 and 2 filled in.)
14
(Figure: scoring table with cells scoring 1, 2, and 3 filled in.)
NOTE: In the cell with the red numeral, the gap in the sequence along the left column proved to be the best alignment to that point.
15
Here is the final table for the two sequences
(Figure: final scoring table, with scores up to 5.)
Now we need to find the best LOCAL alignment of subsequences of the two sequences.
16
Smith-Waterman Local Alignment
Here is the best LOCAL alignment:
TCGTATGA
TC-TATCA
17
Discussion of Smith – Waterman
Finds the highest-scoring subsequence alignment within the two sequences. This is very useful when comparing two very long sequences.
Still requires 3mn steps to complete the scoring matrix, which makes it impractical for an extended database search.
It is generally recognized as the most accurate of the local alignment schemes.
Results are still dependent upon the scoring scheme. See your friendly local neighborhood biologist for help.