Published by Arabella Melton. Modified over 9 years ago.
1
Sequence Alignment
2
Sequences
Much of bioinformatics involves sequences:
• DNA sequences
• RNA sequences
• Protein sequences
We can think of these sequences as strings of letters:
• DNA & RNA: alphabet of 4 letters
• Protein: alphabet of 20 letters
3
20 Amino Acids
• Glycine (G, GLY)
• Alanine (A, ALA)
• Valine (V, VAL)
• Leucine (L, LEU)
• Isoleucine (I, ILE)
• Phenylalanine (F, PHE)
• Proline (P, PRO)
• Serine (S, SER)
• Threonine (T, THR)
• Cysteine (C, CYS)
• Methionine (M, MET)
• Tryptophan (W, TRP)
• Tyrosine (Y, TYR)
• Asparagine (N, ASN)
• Glutamine (Q, GLN)
• Aspartic acid (D, ASP)
• Glutamic acid (E, GLU)
• Lysine (K, LYS)
• Arginine (R, ARG)
• Histidine (H, HIS)
Codon signals:
• START: AUG (also codes for methionine)
• STOP: UAA, UAG, UGA
4
Sequence Comparison
Finding similarity between sequences is important for many biological questions. For example:
• Find genes/proteins with a common origin
    • This allows us to predict function & structure
• Locate common subsequences in genes/proteins
    • Identify common “motifs”
• Locate sequences that might overlap
    • This helps in sequence assembly
5
Sequence Alignment
Input: two sequences over the same alphabet
Output: an alignment of the two sequences
Example:
• GCGCATGGATTGAGCGA
• TGCGCCATTGATGACCA
A possible alignment:
  -GCGC-ATGGATTGAGCGA
  TGCGCCATTGAT-GACC-A
6
Alignments
  -GCGC-ATGGATTGAGCGA
  TGCGCCATTGAT-GACC-A
Three elements:
• Perfect matches
• Mismatches
• Insertions & deletions (indels)
7
Choosing Alignments
There are many possible alignments. For example, compare:
  -GCGC-ATGGATTGAGCGA
  TGCGCCATTGAT-GACC-A
to
  ------GCGCATGGATTGAGCGA
  TGCGCC----ATTGATGACCA--
Which one is better?
8
Scoring Alignments
Rough intuition:
• Similar sequences evolved from a common ancestor
• Evolution changed the sequences from this ancestral sequence by mutations:
    • Replacement: one letter replaced by another
    • Deletion: deletion of a letter
    • Insertion: insertion of a letter
• Scoring of sequence similarity should examine how many operations took place
9
Simple Scoring Rule
Score each position independently:
• Match: +1
• Mismatch: -1
• Indel: -2
The score of an alignment is the sum of its positional scores.
10
Example
  -GCGC-ATGGATTGAGCGA
  TGCGCCATTGAT-GACC-A
Score: (+1 x 13) + (-1 x 2) + (-2 x 4) = 3

  ------GCGCATGGATTGAGCGA
  TGCGCC----ATTGATGACCA--
Score: (+1 x 5) + (-1 x 6) + (-2 x 12) = -25
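The positional sum can be checked mechanically. The following is a minimal sketch, assuming the simple rule (+1, -1, -2) and that both rows are given as equal-length strings with "-" marking an indel; the function name score_alignment is ours, not from the slides.

```python
def score_alignment(top, bottom, match=1, mismatch=-1, indel=-2):
    """Sum the per-column scores of two equal-length gapped rows."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    total = 0
    for x, y in zip(top, bottom):
        if x == "-" and y == "-":
            raise ValueError("a column cannot pair two gaps")
        if x == "-" or y == "-":
            total += indel        # insertion or deletion
        elif x == y:
            total += match        # perfect match
        else:
            total += mismatch     # replacement
    return total


# First alignment from the example: 13 matches, 2 mismatches, 4 indels.
print(score_alignment("-GCGC-ATGGATTGAGCGA", "TGCGCCATTGAT-GACC-A"))  # 3
```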
11
More General Scores
• The choice of the +1, -1, and -2 scores was quite arbitrary
• Depending on the context, some changes are more plausible than others:
    • Exchange of an amino acid by one with similar properties (size, charge, etc.)
      vs.
    • Exchange of an amino acid by one with opposite properties
12
For proteins
13
Additive Scoring Rules
• We define a scoring function by specifying a function σ over pairs of letters and gaps:
    • σ(x,y) is the score of replacing x by y
    • σ(x,-) is the score of deleting x
    • σ(-,x) is the score of inserting x
• The score of an alignment is the sum of its position scores
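As an illustration of an additive rule with a pluggable position score, here is a small sketch. The function sigma below and all its numeric values are hypothetical: it assumes a DNA alphabet and penalizes a purine/pyrimidine swap more than a swap within the same class, only to show that any position score can be plugged into the same sum.

```python
GAP = "-"
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def sigma(x, y):
    """Hypothetical position score; the numbers are illustrative only."""
    if x == GAP or y == GAP:
        return -2                 # deletion (x, -) or insertion (-, y)
    if x == y:
        return 1                  # letter kept unchanged
    same_class = {x, y} <= PURINES or {x, y} <= PYRIMIDINES
    return -1 if same_class else -2   # assumed: cross-class swaps cost more


def alignment_score(top, bottom, score=sigma):
    """Additive rule: the alignment score is the sum of position scores."""
    return sum(score(x, y) for x, y in zip(top, bottom))


print(alignment_score("ACGT", "ATGT"))  # C/T stay in one class: 1 - 1 + 1 + 1 = 2
print(alignment_score("ACGT", "AAGT"))  # C/A cross classes:     1 - 2 + 1 + 1 = 1
```

Because the score is a plain sum over columns, swapping in a different position score (for example a protein substitution table) changes nothing else in the computation.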
14
Edit Distance
• The edit distance between two sequences is the “cost” of the “cheapest” set of edit operations needed to transform one sequence into the other
• Computing the edit distance between two sequences is almost equivalent to finding the alignment that minimizes the distance
15
Computing Edit Distance
• How can we compute the edit distance? If |s| = n and |t| = m, there are more than C(m+n, m) alignments (C denotes the binomial coefficient), so enumerating them all is not feasible
• The additive form of the score allows us to use dynamic programming to compute the edit distance efficiently
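To get a feeling for how quickly the number of alignments grows, the sketch below counts them with a simple recurrence on the last column and prints the binomial coefficient C(n+m, m) next to it. It assumes that two alignments differing only in the order of an adjacent deletion and insertion are counted separately; the function name count_alignments is ours.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_alignments(n, m):
    """Number of distinct column sequences aligning strings of lengths n and m."""
    if n == 0 or m == 0:
        return 1  # only indel columns remain on one side
    return (count_alignments(n - 1, m - 1)   # last column pairs two letters
            + count_alignments(n - 1, m)     # last column deletes a letter of s
            + count_alignments(n, m - 1))    # last column inserts a letter of t


# Columns printed: n, m, C(n+m, m), number of alignments.
for n, m in [(2, 2), (4, 3), (17, 17)]:
    print(n, m, comb(n + m, m), count_alignments(n, m))
```

Note that the counting itself is fast only because intermediate results are cached, which is the same idea that dynamic programming applies to scores.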
16
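Recursive Argument
Define the notation: V[i,j] = d(s[1..i], t[1..j]), the score of the best alignment of the first i letters of s with the first j letters of t. Let σ(x,y) denote the position score (σ(x,-) for deleting x, σ(-,y) for inserting y).
Using the recursive argument, we get the following recurrence for V:
  V[i+1,j+1] = max( V[i,j]   + σ(s[i+1], t[j+1]),
                    V[i,j+1] + σ(s[i+1], -),
                    V[i+1,j] + σ(-, t[j+1]) )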
17
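Recursive Argument
• Of course, we also need to handle the base cases in the recursion, where one of the prefixes is empty and only indels are possible (σ is the position score):
  V[0,0]   = 0
  V[i+1,0] = V[i,0] + σ(s[i+1], -)
  V[0,j+1] = V[0,j] + σ(-, t[j+1])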
18
Dynamic Programming Algorithm We fill the matrix using the recurrence rule
19
Dynamic Programming Algorithm Conclusion: d( AAAC, AGC ) = -1
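The matrix fill can be sketched in a few lines. This is a minimal version that assumes the simple scoring rule (+1, -1, -2) in place of a general scoring function; the name global_alignment_matrix is ours. Row i and column j of V hold the best score for the first i letters of s and the first j letters of t.

```python
def global_alignment_matrix(s, t, match=1, mismatch=-1, indel=-2):
    """Fill V so that V[i][j] is the best score for s[:i] versus t[:j]."""
    n, m = len(s), len(t)
    V = [[0] * (m + 1) for _ in range(n + 1)]
    # Base cases: an empty prefix can only be aligned with indels.
    for i in range(1, n + 1):
        V[i][0] = V[i - 1][0] + indel
    for j in range(1, m + 1):
        V[0][j] = V[0][j - 1] + indel
    # Recurrence: pair the last letters, delete from s, or insert from t.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if s[i - 1] == t[j - 1] else mismatch
            V[i][j] = max(V[i - 1][j - 1] + pair,
                          V[i - 1][j] + indel,
                          V[i][j - 1] + indel)
    return V


V = global_alignment_matrix("AAAC", "AGC")
for row in V:
    print(row)
print(V[-1][-1])  # -1, matching d(AAAC, AGC) = -1
```

The fill takes time and space proportional to n x m, one constant-time step per matrix cell.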
20
Reconstructing the Best Alignment
• To reconstruct the best alignment, we record which case in the recursive rule maximized the score
21
Reconstructing the Best Alignment
• We now trace back the path that corresponds to the best alignment:
  AAAC
  AG-C
22
Reconstructing the Best Alignment
• Sometimes, more than one alignment has the best score:
  AAAC
  A-GC
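The trace-back can be sketched as follows. This version assumes the simple scoring rule (+1, -1, -2) and, instead of storing a pointer per cell, re-checks at each cell which cases of the recurrence attain the maximum; it then follows every such case, so it lists all co-optimal alignments. The name optimal_alignments is ours.

```python
def optimal_alignments(s, t, match=1, mismatch=-1, indel=-2):
    """Return the best global score and every alignment that attains it."""
    n, m = len(s), len(t)
    V = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        V[i][0] = V[i - 1][0] + indel
    for j in range(1, m + 1):
        V[0][j] = V[0][j - 1] + indel

    def pair(i, j):
        return match if s[i - 1] == t[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            V[i][j] = max(V[i - 1][j - 1] + pair(i, j),
                          V[i - 1][j] + indel,
                          V[i][j - 1] + indel)

    def walk(i, j):
        """Yield (top, bottom) rows for each optimal path from (0,0) to (i,j)."""
        if i == 0 and j == 0:
            yield "", ""
            return
        if i > 0 and j > 0 and V[i][j] == V[i - 1][j - 1] + pair(i, j):
            for a, b in walk(i - 1, j - 1):
                yield a + s[i - 1], b + t[j - 1]   # letters paired
        if i > 0 and V[i][j] == V[i - 1][j] + indel:
            for a, b in walk(i - 1, j):
                yield a + s[i - 1], b + "-"        # letter of s deleted
        if j > 0 and V[i][j] == V[i][j - 1] + indel:
            for a, b in walk(i, j - 1):
                yield a + "-", b + t[j - 1]        # letter of t inserted

    return V[n][m], list(walk(n, m))


score, alignments = optimal_alignments("AAAC", "AGC")
print(score)
for top, bottom in alignments:
    print(top, bottom)
```

For AAAC and AGC under this rule, the listing contains AAAC/AG-C and AAAC/A-GC, plus a third alignment with the same score, AAAC/-AGC. Enumerating every optimal alignment can be expensive for long sequences, so in practice one path is usually enough.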
23
Local Alignment
Consider now a different question: can we find similar substrings of s and t?
Formally, given s[1..n] and t[1..m], find i, j, k, and l such that d(s[i..j], t[k..l]) is maximal
24
Local Alignment
• As before, we use dynamic programming
• We now want to set V[i,j] to record the best alignment of a suffix of s[1..i] and a suffix of t[1..j]
• How should we change the recurrence rule?
25
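Local Alignment
New option:
• We can start a new match instead of extending a previous alignment. With σ denoting the position score:
  V[i+1,j+1] = max( 0,
                    V[i,j]   + σ(s[i+1], t[j+1]),
                    V[i,j+1] + σ(s[i+1], -),
                    V[i+1,j] + σ(-, t[j+1]) )
• The 0 is the score of the alignment of empty suffixes; for the same reason the base cases are V[i,0] = V[0,j] = 0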
26
Local Alignment Example s = TAATA t = TACTAA
27
Local Alignment Example s = TAATA t = TACTAA
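The example can be worked through with a short sketch of the local variant. It assumes the simple scoring rule (+1, -1, -2), keeps the first cell that reaches the maximum, and traces back from there until a cell with value 0 is reached; the name local_alignment is ours.

```python
def local_alignment(s, t, match=1, mismatch=-1, indel=-2):
    """Return (best score, aligned part of s, aligned part of t)."""
    n, m = len(s), len(t)
    V = [[0] * (m + 1) for _ in range(n + 1)]   # row 0 and column 0 stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if s[i - 1] == t[j - 1] else mismatch
            V[i][j] = max(0,                      # start a new match here
                          V[i - 1][j - 1] + pair,
                          V[i - 1][j] + indel,
                          V[i][j - 1] + indel)
            if V[i][j] > best:
                best, best_pos = V[i][j], (i, j)
    # Trace back from the best cell until the score drops to 0.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and V[i][j] > 0:
        pair = match if s[i - 1] == t[j - 1] else mismatch
        if V[i][j] == V[i - 1][j - 1] + pair:
            top.append(s[i - 1]); bottom.append(t[j - 1]); i -= 1; j -= 1
        elif V[i][j] == V[i - 1][j] + indel:
            top.append(s[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(t[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(local_alignment("TAATA", "TACTAA"))  # (3, 'TAA', 'TAA')
```

Other cells of the matrix also reach the score 3 for this pair (for instance the one ending TAATA against TACTA), so which local alignment is reported depends on the tie-breaking choice made here.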
30
Sequence Alignment
We have seen two variants of sequence alignment:
• Global alignment
• Local alignment
Other variants:
• Finding the best overlap (exercise)
All are based on the same basic idea of dynamic programming.