Published by Christal Taylor. Modified over 9 years ago.
1
Brandon Andrews
2
Longest Common Subsequences
Global Sequence Alignment
Scoring Alignments
Local Sequence Alignment
Alignment with Gap Penalties
Questions
3
Goal: find similarity between two sequences. The sequences may differ in length. They are denoted v and w and are treated as strings of characters, for example v = ATTGCTA.
4
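A subsequence of v (or w) is an ordered sequence of characters taken from it, not necessarily contiguous: it is obtained by deleting zero or more characters while keeping the remaining ones in their original order. For example, if v = ATTGCTA, then AGCA and ATTA are both subsequences (the characters used are shown in brackets):
AGCA: [A]TT[G][C]T[A]
ATTA: [A][T][T]GCT[A]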
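The definition can be checked mechanically. The sketch below (the function name `subsequence_positions` is illustrative, not from the slides) scans v left to right and reports the 1-based positions of the characters it uses, matching the bracketed characters above; leftmost greedy matching finds an embedding whenever one exists.

```python
from typing import List, Optional


def subsequence_positions(sub: str, s: str) -> Optional[List[int]]:
    """Return 1-based positions in s that spell sub in order, or None.

    Greedy leftmost matching: if any embedding of sub in s exists, this finds one.
    """
    positions = []
    start = 0
    for ch in sub:
        idx = s.find(ch, start)  # leftmost occurrence at or after `start`
        if idx == -1:
            return None  # ran out of characters: not a subsequence
        positions.append(idx + 1)
        start = idx + 1
    return positions


v = "ATTGCTA"
print(subsequence_positions("AGCA", v))  # [1, 4, 5, 7]
print(subsequence_positions("ATTA", v))  # [1, 2, 3, 7]
print(subsequence_positions("TAG", v))   # None
```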
5
The only operations allowed are insertion and deletion.
Insertion: ATCTGAT -> A-TCTGAT, where the hyphen marks an inserted gap (a space).
Deletion: deleting a character from one sequence is equivalent to inserting a gap into the other sequence, which offsets the characters so that the longest common subsequence lines up. For v = ATCTGAT and w = TGCATA:
v = AT-C-TGAT
w = -TGCAT-A-
The matching columns spell TCTA. How do we find TCTA using dynamic programming?
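One way to answer the question on the slide is the standard LCS dynamic program over the alignment grid. This is a minimal sketch, assuming the usual recurrence: a diagonal step (weight 1) when the characters match, otherwise the better of the two indel steps (weight 0), followed by a traceback from the sink.

```python
from typing import List, Tuple


def lcs(v: str, w: str) -> Tuple[int, str]:
    """Longest common subsequence by dynamic programming over the alignment grid."""
    n, m = len(v), len(w)
    # s[i][j] = length of an LCS of v[:i] and w[:j]
    s: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if v[i - 1] == w[j - 1]:
                s[i][j] = s[i - 1][j - 1] + 1  # diagonal (match) edge, weight 1
            else:
                s[i][j] = max(s[i - 1][j], s[i][j - 1])  # indel edges, weight 0
    # Trace back from the sink to recover one LCS (there may be several).
    chars = []
    i, j = n, m
    while i > 0 and j > 0:
        if v[i - 1] == w[j - 1]:
            chars.append(v[i - 1])
            i -= 1
            j -= 1
        elif s[i - 1][j] >= s[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return s[n][m], "".join(reversed(chars))


length, common = lcs("ATCTGAT", "TGCATA")
print(length, common)
```

The LCS length here is 4, but TCTA is not the only LCS (TGAT and TCAT are also common subsequences of length 4), so which one the traceback returns depends on how ties are broken.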
6
Edit distance: the minimum number of operations needed to turn one sequence into another, where the allowed operations are insertion, deletion, and substitution. The longest common subsequence problem is closely related: it allows only insertion and deletion, and in the alignment grid each diagonal edge for a match has weight 1 while all other edges have weight 0 (essentially the Manhattan Tourist problem with fixed weights).
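A minimal sketch of both ideas, assuming unit cost for every operation. `edit_distance` is the standard Levenshtein recurrence. `indel_distance` covers the insertion/deletion-only case and uses the identity d = n + m - 2 * LCS(v, w): every character outside a longest common subsequence must be deleted from one side or inserted on the other. Function names are illustrative.

```python
def edit_distance(v: str, w: str) -> int:
    """Levenshtein distance: insertions, deletions, and substitutions each cost 1."""
    n, m = len(v), len(w)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete all of v[:i]
    for j in range(m + 1):
        d[0][j] = j  # insert all of w[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if v[i - 1] == w[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
    return d[n][m]


def lcs_length(v: str, w: str) -> int:
    """Length of a longest common subsequence (match edges weight 1, others 0)."""
    n, m = len(v), len(w)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if v[i - 1] == w[j - 1]:
                s[i][j] = s[i - 1][j - 1] + 1
            else:
                s[i][j] = max(s[i - 1][j], s[i][j - 1])
    return s[n][m]


def indel_distance(v: str, w: str) -> int:
    """Edit distance when only insertions and deletions are allowed."""
    return len(v) + len(w) - 2 * lcs_length(v, w)


print(edit_distance("TGCATAT", "ATCCGAT"))  # 4
print(indel_distance("ATCTGAT", "TGCATA"))  # 5
```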
7
Example: see the other slides, Chapter 6: Edit Distance, slides 54-58.
8
Chapter 6: Alignment
9
Scoring matrices are based on biological evidence: some amino acid mutations are more common than others. For instance, Asn, Asp, Glu, and Ser are the most mutable amino acids, and the probability that Ser mutates into Phe is roughly three times the probability that Trp mutates into Phe.
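To show how a scoring matrix plugs into global alignment, here is a minimal Needleman-Wunsch-style sketch. To keep it small it uses a hypothetical 4x4 DNA matrix instead of a 20x20 amino acid matrix: the values (match +2, transition A/G or C/T -1, transversion -2, indel -3) are illustrative assumptions that mimic the idea that some substitutions are more likely than others.

```python
BASES = "ACGT"
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def toy_score(a: str, b: str) -> int:
    # Hypothetical values: match +2, transition -1, transversion -2.
    if a == b:
        return 2
    return -1 if frozenset((a, b)) in TRANSITIONS else -2


# Scoring matrix delta(a, b), stored as a dict keyed by pairs of bases.
SCORE = {(a, b): toy_score(a, b) for a in BASES for b in BASES}
GAP = -3  # hypothetical score for each indel


def global_alignment_score(v: str, w: str, score=SCORE, gap=GAP) -> int:
    """Best global alignment score of v and w under a substitution matrix."""
    n, m = len(v), len(w)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s[i][0] = i * gap  # v[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        s[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s[i][j] = max(
                s[i - 1][j] + gap,                            # deletion
                s[i][j - 1] + gap,                            # insertion
                s[i - 1][j - 1] + score[(v[i - 1], w[j - 1])],  # match/mismatch
            )
    return s[n][m]


print(global_alignment_score("ACGT", "GCGT"))  # 5: A/G is a transition
print(global_alignment_score("ACGT", "TCGT"))  # 4: A/T is a transversion
```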
10
PAM (Point Accepted Mutation): 1 PAM corresponds to 1 mutation for every 100 amino acids. The proteins used to build the matrix must be closely related, because the estimated mutation probabilities change if the proteins are not closely related. The probability that one amino acid mutates into another differs from pair to pair. Essentially, 1 PAM is the average time for the "average" protein to mutate 1% of its amino acids. This yields a family of scoring matrices: PAM1, PAM2, and so on.
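In Dayhoff's construction, the mutation probability matrix for n PAM units is the PAM1 matrix multiplied by itself n times. The sketch below shows only that step, using a hypothetical 2-letter alphabet in place of the 20 amino acids; the published PAM scoring matrices are log-odds scores derived from these probabilities, which this sketch does not compute.

```python
from typing import List

Matrix = List[List[float]]


def mat_mult(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][t] * b[t][j] for t in range(k)) for j in range(m)]
            for i in range(n)]


def pam_n(pam1: Matrix, n: int) -> Matrix:
    """Mutation probabilities after n PAM units: PAM1 multiplied by itself n times."""
    size = len(pam1)
    # PAM0 is the identity: no time has passed, so nothing has mutated.
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(n):
        result = mat_mult(result, pam1)
    return result


# Hypothetical PAM1 for a 2-letter alphabet: each row sums to 1, 1% change per unit.
PAM1 = [[0.99, 0.01],
        [0.01, 0.99]]

PAM2 = pam_n(PAM1, 2)
print(round(PAM2[0][0], 4))  # 0.9802
```

Each row of PAMn still sums to 1, and the diagonal shrinks as n grows, which reflects more evolutionary time between the sequences being compared.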
11
Global alignment aligns two entire strings. Local alignment instead looks for the best alignment between substrings: short, highly similar regions inside larger sequences that may be dissimilar overall.
12
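To compute a local alignment in the alignment grid, set an edge weight of 0 from the source to every other vertex. These "free ride" edges let an alignment start at any vertex, which appears as an extra 0 term in the dynamic programming recurrence.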
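A minimal Smith-Waterman-style sketch of the free-ride idea, assuming hypothetical scores (match +1, mismatch -1, indel -1). The `0` inside `max` is the free ride from the source, and taking the best cell anywhere in the grid plays the role of ending the alignment at any vertex.

```python
def local_alignment_score(v, w, match=1, mismatch=-1, indel=-1):
    """Best local alignment score and the (i, j) cell where it ends."""
    n, m = len(v), len(w)
    # Row 0 and column 0 stay 0: an alignment may start anywhere.
    s = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_end = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = s[i - 1][j - 1] + (match if v[i - 1] == w[j - 1] else mismatch)
            # 0 = free ride from the source: restart instead of going negative.
            s[i][j] = max(0, diag, s[i - 1][j] + indel, s[i][j - 1] + indel)
            if s[i][j] > best:
                best, best_end = s[i][j], (i, j)
    return best, best_end


print(local_alignment_score("TTTACGTAAA", "GGACGTGG"))  # (4, (7, 6)): shared ACGT
```

The shared region ACGT scores 4 locally, even though the two sequences as a whole are quite different.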
13
Gaps (runs of consecutive indels) are expected in the sequences. However, many small scattered gaps can indicate dissimilarity, so instead of charging each indel independently, a penalty is assigned to each gap as a whole.
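One common choice is an affine gap penalty: a gap of length x costs rho + sigma * x, where rho is the cost of opening a gap and sigma the cost of extending it by one indel. The sketch below, with hypothetical values rho = 4 and sigma = 1 and illustrative function names, scores one row of an alignment and shows that three scattered single gaps cost more than one gap of length 3.

```python
from itertools import groupby


def affine_gap_penalty(length: int, rho: float, sigma: float) -> float:
    """Penalty for one gap of `length` consecutive indels: rho + sigma * length."""
    return rho + sigma * length


def total_gap_penalty(aligned_row: str, rho: float, sigma: float) -> float:
    """Sum affine penalties over every maximal run of '-' in one alignment row."""
    total = 0.0
    for ch, run in groupby(aligned_row):
        if ch == "-":
            total += affine_gap_penalty(len(list(run)), rho, sigma)
    return total


# Hypothetical parameters: opening a gap costs much more than extending it.
RHO, SIGMA = 4, 1
print(total_gap_penalty("A---TG", RHO, SIGMA))  # 7.0: one gap of length 3
print(total_gap_penalty("A-T-G-", RHO, SIGMA))  # 15.0: three gaps of length 1
```

Both rows contain three indels, but the affine scheme penalizes the scattered gaps more heavily, which matches the intuition above.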
14
An Introduction to Bioinformatics Algorithms, related slides