1
Sequence Alignment Tutorial #2
© Ydo Wexler & Dan Geiger
2
Sequence Comparison
Much of bioinformatics involves sequences:
DNA sequences
RNA sequences
Protein sequences
We can think of these sequences as strings of letters:
DNA & RNA: |alphabet| = 4
Protein: |alphabet| = 20
3
Sequence Alignment (Global)
Input: two sequences over the same alphabet
Output: an alignment of the two sequences
Example:
GCGCATGGATTGAGCGA
TGCGCCATTGATGACCA
A possible alignment:
-GCGC-ATGGATTGAGCGA
TGCGCCATTGAT-GACC-A
4
Alignments
-GCGC-ATGGATTGAGCGA
TGCGCCATTGAT-GACC-A
Three elements:
Perfect matches
Mismatches
Insertions & deletions (indels)
5
Simple Scoring Rule
Score each position independently:
Match: +1
Mismatch: -1
Indel: -2
The score of an alignment is the sum of its position scores.
6
Example
-GCGC-ATGGATTGAGCGA
TGCGCCATTGAT-GACC-A
Score: (+1x13) + (-1x2) + (-2x4) = 3

------GCGCATGGATTGAGCGA
TGCGCC----ATTGATGACCA--
Score: (+1x5) + (-1x6) + (-2x12) = -25
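Both scores can be checked mechanically. Below is a minimal Python sketch that applies the simple scoring rule column by column; the function name `score_alignment` and the use of `-` for a gap are our own conventions, not taken from the slides.

```python
def score_alignment(top: str, bottom: str, match: int = 1,
                    mismatch: int = -1, indel: int = -2) -> int:
    """Sum of independent per-column scores; '-' marks a gap."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    score = 0
    for x, y in zip(top, bottom):
        if x == "-" and y == "-":
            raise ValueError("a column cannot contain two gaps")
        if x == "-" or y == "-":
            score += indel      # insertion or deletion
        elif x == y:
            score += match      # perfect match
        else:
            score += mismatch   # substitution
    return score


print(score_alignment("-GCGC-ATGGATTGAGCGA", "TGCGCCATTGAT-GACC-A"))        # 3
print(score_alignment("------GCGCATGGATTGAGCGA", "TGCGCC----ATTGATGACCA--"))  # -25
```

For the second alignment the function counts 12 gap columns: there are 23 columns in total, and 11 of them pair two letters.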
7
Variants of Sequence Alignment
We have seen two basic variants of sequence alignment:
Global alignment (Needleman-Wunsch)
Local alignment (Smith-Waterman)
In this tutorial we pose and solve two problems:
Finding the best overlap alignment
Using an affine cost for gaps
The solutions are based on the dynamic programming ideas presented in the lecture.
8
Question I: Overlap Alignment
Consider the following question: can we find the most significant overlap between two sequences s and t?
Possible overlap relations: [figure, panels a and b]
The difference between this problem and the local alignment studied in class is that here the alignment must reach the endpoints of the two sequences: it starts at the beginning of one sequence and ends at the end of one sequence.
9
Question I: Overlap Alignment
Formally, given s[1..n] and t[1..m], find i, j that maximize
d = max{ d(s[1..i], t[j..m]), d(s[i..n], t[1..j]), d(s[1..n], t[i..j]), d(s[i..j], t[1..m]) },
where d(·,·) is the global alignment score.
Solution: the same as global alignment, except that the dynamic programming should not penalize overhanging ends.
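The formal definition can be evaluated literally, if slowly, by trying every pair of endpoints. This Python sketch assumes that d(·,·) is the ordinary global alignment score with a linear indel penalty, and it defaults to the scoring system of the example that follows (+4/-1/-5); `global_score` and `overlap_by_definition` are hypothetical helper names. It is meant as a cross-check, not as the intended algorithm.

```python
def global_score(a: str, b: str, match: int = 4, mismatch: int = -1,
                 indel: int = -5) -> int:
    """Global (Needleman-Wunsch style) score, keeping one DP row at a time."""
    prev = [j * indel for j in range(len(b) + 1)]   # row 0: b against gaps
    for i in range(1, len(a) + 1):
        cur = [i * indel] + [0] * len(b)             # column 0: a against gaps
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + sub, prev[j] + indel, cur[j - 1] + indel)
        prev = cur
    return prev[-1]


def overlap_by_definition(s: str, t: str, **scores: int) -> int:
    """Max over the four overlap cases; 1-based s[x..y] is Python s[x-1:y]."""
    n, m = len(s), len(t)
    cands = []
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cands.append(global_score(s[:i], t[j - 1:], **scores))  # s[1..i] vs t[j..m]
            cands.append(global_score(s[i - 1:], t[:j], **scores))  # s[i..n] vs t[1..j]
    for i in range(1, m + 1):
        for j in range(i, m + 1):
            cands.append(global_score(s, t[i - 1:j], **scores))     # s[1..n] vs t[i..j]
    for i in range(1, n + 1):
        for j in range(i, n + 1):
            cands.append(global_score(s[i - 1:j], t, **scores))     # s[i..j] vs t[1..m]
    return max(cands)


print(overlap_by_definition("PAWHEAE", "HEAGAWGHEE"))  # 11
```

The dynamic-programming solution reaches the same maximum with a single matrix, because each of the four cases corresponds to a path that starts on the first row or column and ends on the last row or column.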
10
Overlap Alignment
Initialization: V[i,0] = 0, V[0,j] = 0
11
Overlap Alignment Example
s = PAWHEAE
t = HEAGAWGHEE
Scoring system:
Match: +4
Mismatch: -1
Indel: -5
12
Overlap Alignment
Initialization: V[i,0] = 0, V[0,j] = 0
Recurrence: as in global alignment
Score: the maximum value in the bottom row and the rightmost column of the matrix
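A minimal Python sketch of the three rules above (zero first row and column, the global recurrence, the best score read from the bottom row or rightmost column), with a simple traceback. The function name, the default scores (taken from the PAWHEAE / HEAGAWGHEE example) and the traceback tie-breaking order (diagonal, then up, then left) are our own assumptions.

```python
def overlap_align(s: str, t: str, match: int = 4, mismatch: int = -1,
                  indel: int = -5) -> tuple[int, str, str]:
    """Overlap alignment of s and t; returns (score, top row, bottom row)."""
    n, m = len(s), len(t)
    # Initialization: V[i][0] = V[0][j] = 0, so a leading overhang is free.
    V = [[0] * (m + 1) for _ in range(n + 1)]
    # Recurrence: exactly as in global alignment.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            V[i][j] = max(V[i - 1][j - 1] + sub,  # s[i] aligned to t[j]
                          V[i - 1][j] + indel,    # s[i] against a gap
                          V[i][j - 1] + indel)    # t[j] against a gap
    # Score: best cell on the bottom row or the rightmost column,
    # so the trailing overhang is free as well.
    edge = [(n, j) for j in range(m + 1)] + [(i, m) for i in range(n + 1)]
    bi, bj = max(edge, key=lambda cell: V[cell[0]][cell[1]])
    # Trailing overhang (one of s[bi:] and t[bj:] is empty).
    top_tail = s[bi:] + "-" * (m - bj)
    bot_tail = "-" * (n - bi) + t[bj:]
    # Trace back until we reach row 0 or column 0.
    i, j = bi, bj
    top, bot = [], []
    while i > 0 and j > 0:
        sub = match if s[i - 1] == t[j - 1] else mismatch
        if V[i][j] == V[i - 1][j - 1] + sub:
            top.append(s[i - 1])
            bot.append(t[j - 1])
            i, j = i - 1, j - 1
        elif V[i][j] == V[i - 1][j] + indel:
            top.append(s[i - 1])
            bot.append("-")
            i -= 1
        else:
            top.append("-")
            bot.append(t[j - 1])
            j -= 1
    # Leading overhang (one of s[:i] and t[:j] is empty).
    top_row = s[:i] + "-" * j + "".join(reversed(top)) + top_tail
    bot_row = "-" * i + t[:j] + "".join(reversed(bot)) + bot_tail
    return V[bi][bj], top_row, bot_row


score, top, bottom = overlap_align("PAWHEAE", "HEAGAWGHEE")
print(score)   # 11
print(top)     # PAWHEAE------
print(bottom)  # ---HEAGAWGHEE
```

Only the initialization and the place where the score is read differ from global alignment; the inner loop is unchanged.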
15
Overlap Alignment Example
The best overlap is:
PAWHEAE------
---HEAGAWGHEE
Pay attention! A different scoring system could yield a different result, such as:
---PAW-HEAE
HEAGAWGHEE-
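To see how the choice of scores matters, the sketch below scores a displayed overlap alignment with free end gaps: gap columns before the first or after the last letter-letter column cost nothing. The alternative indel penalty of -2 is a hypothetical value, not from the slides, and the comparison only ranks these two alignments; it does not claim that either is optimal under that scheme.

```python
def overlap_score(top: str, bot: str, match: int, mismatch: int, indel: int) -> int:
    """Score an overlap alignment; end gaps (overhangs) are free."""
    # Columns where both rows hold a letter.
    paired = [k for k, (x, y) in enumerate(zip(top, bot)) if x != "-" and y != "-"]
    if not paired:
        return 0  # no overlap at all
    first, last = paired[0], paired[-1]
    score = 0
    for x, y in zip(top[first:last + 1], bot[first:last + 1]):
        if x == "-" or y == "-":
            score += indel      # internal gap: penalized
        elif x == y:
            score += match
        else:
            score += mismatch
    return score


best = ("PAWHEAE------", "---HEAGAWGHEE")
other = ("---PAW-HEAE", "HEAGAWGHEE-")
for indel in (-5, -2):
    print(indel, overlap_score(*best, 4, -1, indel), overlap_score(*other, 4, -1, indel))
# -5 11 9
# -2 11 12
```

With indel -5 the first alignment wins (11 vs 9); lowering the indel penalty to -2 makes the second one score higher (12 vs 11), because it trades one internal gap for an extra match and an extra mismatch.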
16
Question II: Alignment with affine gap scores
Observation: insertions and deletions often occur in blocks longer than a single nucleotide.
Consequence: the standard alignment scoring studied in the lecture, which gives a constant penalty d per gap unit, does not model this phenomenon well (a single gap of length g costs g·d, exactly as much as g separate gaps of length 1); hence, a better gap score model is needed.
Question: can you think of an appropriate change to the scoring system for gaps?
17
Alignment with affine gap scores
Define the penalty score for a gap of length g to be
γ(g) = -d - (g-1)·e,
where d is the penalty for the introduction of a gap, while e is the penalty for elongating the gap by one.
Denote:
M(i,j): the best score of aligning s[1..i] with t[1..j], given that s[i] is aligned to t[j]
Is(i,j): the best score of aligning s[1..i] with t[1..j], given that s[i] is aligned to a gap
It(i,j): the best score of aligning s[1..i] with t[1..j], given that t[j] is aligned to a gap
We assume that a deletion will not be followed directly by an insertion. This can be obtained by using
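The difference between the linear and the affine models shows up when the same number of gap positions is grouped differently. This sketch assumes the affine form γ(g) = -d - (g-1)·e; the function names and the values d = 10, e = 1 are hypothetical.

```python
def linear_gap(g: int, d: float) -> float:
    """Linear model: constant penalty d per gap position."""
    return -g * d


def affine_gap(g: int, d: float, e: float) -> float:
    """Affine model (assumed form): pay d to open a gap, e for each extension."""
    if g < 1:
        raise ValueError("gap length must be at least 1")
    return -d - (g - 1) * e


# One gap of length 3 versus three separate gaps of length 1 (d = 10, e = 1).
print(linear_gap(3, 10), 3 * linear_gap(1, 10))        # -30 -30
print(affine_gap(3, 10, 1), 3 * affine_gap(1, 10, 1))  # -12 -30
```

Under the linear model both cases cost the same; under the affine model one long gap is much cheaper than several short ones, which matches the observation that indels tend to come in blocks.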
18
Alignment with affine gap scores
The recurrence takes advantage of the already known values M(i',j'), Is(i',j'), It(i',j'):
[Figure: the values M, Is and It at cells (i-1,j-1), (i-1,j) and (i,j-1) are used to compute cell (i,j)]
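The recurrences themselves appear only as images on the original slides. The Python sketch below assumes the usual three-matrix (Gotoh-style) formulation that fits the definitions of M, Is and It and the assumption that a deletion is never directly followed by an insertion: Is and It are opened only from M, never from each other. It computes the global alignment score only (no traceback); the function name and default scores are hypothetical.

```python
NEG_INF = float("-inf")


def affine_global_score(s: str, t: str, match: int = 1, mismatch: int = -1,
                        d: int = 3, e: int = 1) -> float:
    """Global alignment score with gap score -d - (g-1)*e (assumed recurrences)."""
    n, m = len(s), len(t)
    M = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    Is = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    It = [[NEG_INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0  # the empty alignment
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0:
                # M(i,j) = score(s[i], t[j]) + max{M, Is, It at (i-1, j-1)}
                sub = match if s[i - 1] == t[j - 1] else mismatch
                M[i][j] = sub + max(M[i - 1][j - 1], Is[i - 1][j - 1], It[i - 1][j - 1])
            if i > 0:
                # Is(i,j): s[i] against a gap; open from M (-d) or extend Is (-e)
                Is[i][j] = max(M[i - 1][j] - d, Is[i - 1][j] - e)
            if j > 0:
                # It(i,j): t[j] against a gap; open from M (-d) or extend It (-e)
                It[i][j] = max(M[i][j - 1] - d, It[i][j - 1] - e)
    return max(M[n][m], Is[n][m], It[n][m])


print(affine_global_score("ACCCT", "AT", match=2, mismatch=-1, d=3, e=1))  # -1
print(affine_global_score("GATTACA", "GATTACA", match=2))                  # 14
```

In the first call the best alignment is A/A, one gap of length 3 over CCC (score -3 - 2 = -5), then T/T, giving 2 - 5 + 2 = -1; a linear penalty of 3 per gap position would charge -9 for the same gap.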
19
Alignment with affine gap scores
And to put it in a familiar form