1
Sequence Alignment Tutorial #2
© Ilan Gronau. Based on original slides of Ydo Wexler & Dan Geiger .
2
Sequence Comparison
Much of bioinformatics involves sequences:
DNA sequences
RNA sequences
Protein sequences
We can think of these sequences as strings of letters:
DNA & RNA: |alphabet|=4
Protein: |alphabet|=20
3
Global Alignment
Input: two sequences over the same alphabet
Output: an alignment of the two sequences
Example: GCGCATGGATTGAGCGA and TGCGCCATTGATGACCA
A possible alignment:
-GCGC-ATGGATTGAGCGA
TGCGCCATTGAT-GACC-A
4
Global Alignment
Example (cont.):
-GCGC-ATGGATTGAGCGA
TGCGCCATTGAT-GACC-A
Three elements:
Perfect matches
Mismatches
Insertions & deletions (indels)
Best biological explanation
Biological data
Hypothesis space
Symmetric view of evolution
5
Global Alignment scoring scheme
Score each position independently:
Match: +1
Mismatch: -1
Indel: -2
The score of an alignment is the sum of its position scores.
Example:
-GCGC-ATGGATTGAGCGA
TGCGCCATTGAT-GACC-A
Score: (+1x13) + (-1x2) + (-2x4) = 3

------GCGCATGGATTGAGCGA
TGCGCC----ATTGATGACCA--
Score: (+1x5) + (-1x6) + (-2x12) = -25
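The position-by-position scoring above can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the function name `alignment_score` and its keyword parameters are hypothetical, and the defaults are the match/mismatch/indel values of this slide.

```python
def alignment_score(row_s, row_t, match=1, mismatch=-1, indel=-2):
    """Sum the per-column scores of two equal-length gapped rows."""
    if len(row_s) != len(row_t):
        raise ValueError("alignment rows must have equal length")
    total = 0
    for a, b in zip(row_s, row_t):
        if a == "-" and b == "-":
            raise ValueError("a column cannot contain two gaps")
        if a == "-" or b == "-":
            total += indel       # insertion or deletion
        elif a == b:
            total += match       # perfect match
        else:
            total += mismatch    # substitution
    return total


# First example alignment of this slide: 13 matches, 2 mismatches, 4 indels.
print(alignment_score("-GCGC-ATGGATTGAGCGA", "TGCGCCATTGAT-GACC-A"))  # 3
```

Counting columns by hand is error-prone for longer alignments, so a small helper like this is a convenient check.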
6
Sequence Alignment Variants
Two basic variants of sequence alignment:
Global alignment (the Needleman-Wunsch algorithm)
Local alignment (the Smith-Waterman algorithm)
Today we’ll see:
Overlap alignment
Affine cost for gaps
We’ll use ideas of dynamic programming presented in the lecture.
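As a reminder of the dynamic-programming idea that the variants below build on, here is a minimal sketch of global alignment scoring in Python. It is an illustration under stated assumptions, not the lecture's own code: the name `global_alignment_score` is hypothetical, the defaults are the +1/-1/-2 scheme from the earlier scoring slide, and only the optimal score is returned (no traceback).

```python
def global_alignment_score(s, t, match=1, mismatch=-1, indel=-2):
    """Optimal global alignment score of s and t with a linear gap cost."""
    n, m = len(s), len(t)
    V = [[0] * (m + 1) for _ in range(n + 1)]
    # Aligning a prefix against the empty string costs one indel per letter.
    for i in range(1, n + 1):
        V[i][0] = i * indel
    for j in range(1, m + 1):
        V[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            V[i][j] = max(
                V[i - 1][j - 1] + sub,   # align s[i] with t[j]
                V[i - 1][j] + indel,     # align s[i] with a gap
                V[i][j - 1] + indel,     # align t[j] with a gap
            )
    return V[n][m]


print(global_alignment_score("ACGT", "AGT"))  # 1
```

The variants that follow change only the initialization, the place where the final score is read, or the gap cost; the cell-by-cell recurrence keeps this shape.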
7
Overlap Alignment
Consider the following problem:
Find the most significant overlap between two sequences S and T.
Possible overlap relations:
a. An end of one sequence overlaps the start of the other.
b. One sequence is contained within the other.
Difference from local alignment: here we require alignment between the endpoints of the two sequences.
8
Overlap Alignment
Formally: given S[1..n] and T[1..m], find i, j such that
d = max{ D(S[1..i],T[j..m]), D(S[i..n],T[1..j]), D(S[1..n],T[i..j]), D(S[i..j],T[1..m]) }
is maximal.
Solution: same as global alignment, except that we do not penalize overhanging ends.
9
Overlap Alignment
Initialization: V[i,0]=0 , V[0,j]=0
Recurrence: as in global alignment
Score: the maximum value in the bottom row and the rightmost column
[Figure: global, local, overlap]
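The three rules above (zero initialization, the global recurrence, and a maximum over the last row and last column) can be sketched as follows. This is a hedged illustration: the name `overlap_alignment`, its parameters, and the returned tuple layout are assumptions made here, and no traceback is performed. The usage line takes its sequences and scoring scheme from the example slides.

```python
def overlap_alignment(s, t, match=1, mismatch=-1, indel=-2):
    """Return (best score, (i, j)) of an overlap alignment of s and t."""
    n, m = len(s), len(t)
    # Row 0 and column 0 stay 0: leading overhangs are not penalized.
    V = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            V[i][j] = max(
                V[i - 1][j - 1] + sub,   # align s[i] with t[j]
                V[i - 1][j] + indel,     # align s[i] with a gap
                V[i][j - 1] + indel,     # align t[j] with a gap
            )
    # Trailing overhangs are free too, so the alignment may end anywhere
    # in the bottom row or the rightmost column.
    candidates = [(V[n][j], (n, j)) for j in range(m + 1)]
    candidates += [(V[i][m], (i, m)) for i in range(n + 1)]
    return max(candidates)


print(overlap_alignment("PAWHEAE", "HEAGAWGHEE", match=4, mismatch=-1, indel=-5))
# (11, (7, 4))
```

The score 11 comes from three matches and one mismatch (HEAE against HEAG), with the alignment ending in the bottom row at column 4.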
10
Overlap Alignment (Example)
S = PAWHEAE
T = HEAGAWGHEE
Scoring scheme:
Match: +4
Mismatch: -1
Indel: -5
13
Overlap Alignment (Example)
The best overlap is:
PAWHEAE------
---HEAGAWGHEE
Pay attention! A different scoring scheme could yield a different result, such as:
---PAW-HEAE
HEAGAWGHEE-
Scoring scheme:
Match: +4
Mismatch: -1
Indel:
14
Affine gap scores
Observation: Insertions and deletions often occur in blocks longer than a single nucleotide.
Consequence: The current scoring scheme charges a constant penalty per gap unit, so it does not model this phenomenon well.
Question: How do we modify the scheme to incorporate this?
15
Alignment with affine gap scores
The penalty score for a gap of length g is built from two parameters:
d - penalty for introducing a gap
e - penalty for elongating the gap by one unit
Typically d > e.
Problem: when aligning S[i] to a gap, we do not know which penalty to apply, d or e.
Solution: we compute 3 matrices simultaneously:
M(i,j) - the best score of aligning S[1..i] with T[1..j] when S[i] is aligned to T[j]
IS(i,j) - the best score of aligning S[1..i] with T[1..j] when S[i] is aligned to a gap
IT(i,j) - the best score of aligning S[1..i] with T[1..j] when T[j] is aligned to a gap
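To see why the two parameters matter, the sketch below compares a constant per-unit gap cost with an affine one. It assumes the common convention that a gap of length g costs d + (g-1)e, which is consistent with the definitions of d and e here but is not spelled out in this text. The function names and the numeric values (indel = 2, d = 5, e = 1) are hypothetical, chosen only for illustration.

```python
def linear_gap_cost(g, indel=2):
    """Constant penalty per gap unit: every gapped position costs the same."""
    return g * indel


def affine_gap_cost(g, d=5, e=1):
    """Assumed affine form: d opens the gap, each further unit costs e."""
    if g <= 0:
        return 0
    return d + (g - 1) * e


# One gap of length 4 versus four separate gaps of length 1.
print(linear_gap_cost(4), 4 * linear_gap_cost(1))   # 8 8
print(affine_gap_cost(4), 4 * affine_gap_cost(1))   # 8 20
```

Under the linear cost the two scenarios are indistinguishable, while the affine cost favors a single long block, which is the behavior the observation about indel blocks asks for.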
16
Affine gap scores
Initialization: depends on the problem (global, local, …)
Recurrence: uses already known values M(i’,j’), IS(i’,j’), IT(i’,j’) at the neighboring cells (i-1,j-1), (i-1,j) and (i,j-1).
We assume that a deletion will not be followed directly by an insertion. This can be obtained by using
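A three-matrix recurrence of this kind can be sketched in Python for the global case. This is an illustration under assumptions, not the slides' exact formulation: it uses the widely taught form in which a gap of length g costs d + (g-1)e, it never moves directly between IS and IT (matching the assumption that a deletion is not followed directly by an insertion), and the name `affine_global_score` and the default values are hypothetical.

```python
NEG_INF = float("-inf")


def affine_global_score(s, t, match=1, mismatch=-1, d=5, e=1):
    """Global alignment score with affine gaps, using matrices M, IS, IT."""
    n, m = len(s), len(t)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    IS = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    IT = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0
    # Global initialization: a leading gap is opened once, then extended.
    for i in range(1, n + 1):
        IS[i][0] = -d - (i - 1) * e
    for j in range(1, m + 1):
        IT[0][j] = -d - (j - 1) * e
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            # S[i] aligned to T[j]: any state may precede it.
            M[i][j] = max(M[i - 1][j - 1], IS[i - 1][j - 1], IT[i - 1][j - 1]) + sub
            # S[i] aligned to a gap: open a new gap (d) or extend one (e).
            IS[i][j] = max(M[i - 1][j] - d, IS[i - 1][j] - e)
            # T[j] aligned to a gap: open a new gap (d) or extend one (e).
            IT[i][j] = max(M[i][j - 1] - d, IT[i][j - 1] - e)
    return max(M[n][m], IS[n][m], IT[n][m])


print(affine_global_score("AAAGGG", "AAA"))  # -4
```

In the usage line the three A's match (+3) and GGG is charged as one gap of length 3 (5 + 2 = 7), giving -4. For overlap or local alignment only the initialization and the cell from which the final score is read would change.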
17
Affine gap scores
Simplification: Why are two matrices enough?