Published by Harold Carpenter. Modified over 9 years ago.
1
Pairwise Sequence Alignment Exercise 2
2
Motivation
MVHLTPEEKTAVNALWGKVNVDAVGGEALGRLLVVYPWTQRFFE…
|| ||  ||||| ||| || |   ||||||||||||||||||||
MVNLTSDEKTAVLALWNKVDVEDCGGEALGRLLVVYPWTQRFFE…
ATGGTGAACCTGACCTCTGACGAGAAGACTGCCGTCCTTGCCCTGTGGAACAAGGTGGACG
TGGAAGACTGTGGTGGTGAGGCCCTGGGCAGGTTTGTATGGAGGTTACAAGGCTGCTTAAG
GAGGGAGGATGGAAGCTGGGCATGTGGAGACAGACCACCTCCTGGATTTATGACAGGAACT
GATTGCTGTCTCCTGTGCTGCTTTCACCCCTCAGGCTGCTGGTCGTGTATCCCTGGACCCA
GAGGTTCTTTGAAAGCTTTGGGGACTTGTCCACTCCTGCTGCTGTGTTCGCAAATGCTAAG
GTAAAAGCCCATGGCAAGAAGGTGCTAACTTCCTTTGGTGAAGGTATGAATCACCTGGACA
ACCTCAAGGGCACCTTTGCTAAACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGATCC
TGAGAATTTCAAGGTGAGTCAATATTCTTCTTCTTCCTTCTTTCTATGGTCAAGCTCATGT
CATGGGAAAAGGACATAAGAGTCAGTTTCCAGTTCTCAATAGAAAAAAAAATTCTGTTTGC
ATCACTGTGGACTCCTTGGGACCATTCATTTCTTTCACCTGCTTTGCTTATAGTTATTGTT
TCCTCTTTTTCCTTTTTCTCTTCTTCTTCATAAGTTTTTCTCTCTGTATTTTTTTAACACA
ATCTTTTAATTTTGTGCCTTTAAATTATTTTTAAGCTTTCTTCTTTTAATTACTACTCGTT
TCCTTTCATTTCTATACTTTCTATCTAATCTTCTCCTTTCAAGAGAAGGAGTGGTTCACTA
CTACTTTGCTTGGGTGTAAAGAATAACAGCAATAGCTTAAATTCTGGCATAATGTGAATAG
GGAGGACAATTTCTCATATAAGTTGAGGCTGATATTGGAGGATTTGCATTAGTAGTAGAGG
TTACATCCAGTTACCGTCTTGCTCATAATTTGTGGGCACAACACAGGGCATATCTTGGAAC
AAGGCTAGAATATTCTGAATGCAAACTGGGGACCTGTGTTAACTATGTTCATGCCTGTTGT
CTCTTCCTCTTCAGCTCCTGGGCAATATGCTGGTGGTTGTGCTGGCTCGCCACTTTGGCAA
GGAATTCGACTGGCACATGCACGCTTGTTTTCAGAAGGTGGTGGCTGGTGTGGCTAATGCC
CTGGCTCACAAGTACCATTGA
3
What is sequence alignment?
Alignment: comparing two (pairwise) or more (multiple) sequences by searching for a series of identical or similar characters in the sequences.
MVNLTSDEKTAVLALWNKVDVEDCGGE
|| ||  ||||| ||| || |   |||
MVHLTPEEKTAVNALWGKVNVDAVGGE
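The row of bars between two aligned sequences can be generated mechanically. The following is a minimal sketch, assuming the two sequences are already aligned to equal length and that a bar marks identical residues only; the function names `match_line` and `percent_identity` are illustrative and do not come from the slides.

```python
def match_line(top: str, bottom: str) -> str:
    """Return '|' under each identical, non-gap column and ' ' elsewhere."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    return "".join(
        "|" if a == b and a != "-" else " " for a, b in zip(top, bottom)
    )


def percent_identity(top: str, bottom: str) -> float:
    """Percentage of alignment columns that hold identical residues."""
    line = match_line(top, bottom)
    return 100.0 * line.count("|") / len(line)


# The two sequence fragments shown on the slide.
top = "MVNLTSDEKTAVLALWNKVDVEDCGGE"
bottom = "MVHLTPEEKTAVNALWGKVNVDAVGGE"

print(top)
print(match_line(top, bottom))
print(bottom)
print(f"{percent_identity(top, bottom):.1f}% identity")
```

Marking similar (rather than identical) residues would need a substitution matrix, which this sketch leaves out.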
4
Why sequence alignment?
Predict characteristics of a protein.
Premise: similar sequence (or structure) → similar function
5
Local vs. Global
Global alignment: finds the best alignment across the whole length of the two sequences.
Local alignment: finds regions of high similarity in parts of the sequences.
ADLGAVFALCDRYFQ
||||     |||| |
ADLGRTQN-CDRYYQ
Global alignment forces alignment in regions that differ.
ADLG   CDRYFQ
||||   |||| |
ADLG   CDRYYQ
Local alignment concentrates on regions of high similarity.
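The difference between the two modes can be sketched with dynamic programming. The slides do not name an algorithm; the sketch below assumes the standard Needleman-Wunsch recurrence for the global score and the Smith-Waterman variant (scores floored at zero, best cell anywhere in the table) for the local score. The function name `alignment_score` is illustrative, and the default scores are the ones used in the scoring exercise on slide 8.

```python
def alignment_score(a, b, match=1, mismatch=-2, indel=-1, local=False):
    """Best global (default) or local alignment score of sequences a and b."""
    cols = len(b) + 1
    # First row: global alignment pays for leading gaps, local does not.
    prev = [0 if local else j * indel for j in range(cols)]
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * indel]
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + indel        # gap in b
            left = curr[j - 1] + indel  # gap in a
            cell = max(diag, up, left)
            if local:
                cell = max(cell, 0)     # a local alignment may restart anywhere
                best = max(best, cell)
            curr.append(cell)
        prev = curr
    # Global: score in the bottom-right cell. Local: best cell anywhere.
    return best if local else prev[-1]


# The ungapped sequences from the slide.
seq1 = "ADLGAVFALCDRYFQ"
seq2 = "ADLGRTQNCDRYYQ"
print("global:", alignment_score(seq1, seq2))
print("local: ", alignment_score(seq1, seq2, local=True))
```

This returns scores only; recovering the alignment itself would need a traceback through the table, omitted here for brevity.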
6
Evolutionary changes in sequences
Three types of changes:
1. Substitution: a replacement of one (or more) sequence letter by another.
2. Insertion: an insertion of a letter or several letters into the sequence.
3. Deletion: a deletion of a letter (or more) from the sequence.
Insertion + Deletion = Indel
Figure labels: AAGA, AACA, AAG, GAAA, TA
7
Choosing an alignment
Many different alignments are possible:
AAGCTGAATTCGAA
AGGCTCATTTCTGA

A-AGCTGAATTC--GAA
AG-GCTCA-TTTCTGA-

AAGCTGAATT-C-GAA
AGGCT-CATTTCTGA-
Which alignment is better?
8
Exercise: compute both alignment scores
Match: +1, Mismatch: -2, Indel: -1

AAGCTGAATT-C-GAA
AGGCT-CATTTCTGA-

A-AGCTGAATTC--GAA
AG-GCTCA-TTTCTGA-
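One way to check the exercise is to score each alignment column by column. This is a small sketch using the slide's values (match +1, mismatch -2, indel -1); the function name `score_alignment` is illustrative, not part of the slides.

```python
def score_alignment(top, bottom, match=1, mismatch=-2, indel=-1):
    """Sum per-column scores of a gapped pairwise alignment ('-' marks a gap)."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    for a, b in zip(top, bottom):
        if a == "-" and b == "-":
            raise ValueError("a column cannot contain two gaps")
        if a == "-" or b == "-":
            score += indel      # insertion or deletion
        elif a == b:
            score += match
        else:
            score += mismatch
    return score


first = score_alignment("AAGCTGAATT-C-GAA", "AGGCT-CATTTCTGA-")
second = score_alignment("A-AGCTGAATTC--GAA", "AG-GCTCA-TTTCTGA-")
print("first alignment: ", first)
print("second alignment:", second)
```

Under this scheme every gap column costs the same, so a run of adjacent gaps is penalized exactly like the same number of isolated gaps.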
9
Scoring systems: accounting for biological context
Which is true about the scores in a pairwise alignment of nucleotide sequences?
Tr = Transition, Tv = Transversion
1. Tr > Tv > 0
2. Tr < Tv < 0
3. 0 > Tr > Tv
4. 0 > Tv > Tr
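For reference, a transition swaps two purines (A, G) or two pyrimidines (C, T), while a transversion swaps a purine with a pyrimidine. The sketch below only classifies a substitution; it deliberately assigns no scores, and the function name `classify_substitution` is illustrative.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(x: str, y: str) -> str:
    """Classify a DNA nucleotide pair as identity, transition or transversion."""
    x, y = x.upper(), y.upper()
    valid = PURINES | PYRIMIDINES
    if x not in valid or y not in valid:
        raise ValueError(f"not a DNA nucleotide pair: {x!r}, {y!r}")
    if x == y:
        return "identity"
    # Both bases in the same chemical class: purine<->purine or pyrimidine<->pyrimidine.
    if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
        return "transition"
    return "transversion"


for pair in ("AG", "CT", "AC", "GT"):
    print(pair, classify_substitution(*pair))
```

A transition-transversion scoring scheme would then look up a different mismatch score depending on the returned class.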
10
Scoring systems: accounting for biological context
Which is true about the scores in a pairwise alignment of amino-acid sequences?
1. Asp->Asn > Asp->Glu
2. Arg->His > Ala->Phe
3. Arg->His Met
11
Substitution matrices
Nucleic acids: Transition-transversion
Amino acids:
Evolutionary (empirical data) based: PAM, BLOSUM
Physico-chemical properties based: Grantham, McLachlan
12
PAM matrices
Family of matrices: PAM 80, PAM 120, PAM 250.
The number attached to a PAM matrix represents the evolutionary distance between the sequences on which the matrix is based.
Greater numbers denote greater distances.
13
PAM - limitations
Based on only one original dataset.
Examines proteins with few differences (85% identity).
Based mainly on small globular proteins, so the matrix is biased.
14
BLOSUM matrices
Different BLOSUMn matrices are calculated independently from BLOCKS (ungapped local alignments).
BLOSUMn is based on a cluster of BLOCKS of sequences that share at least n percent identity.
BLOSUM62 represents closer sequences than BLOSUM45.
15
Substitution matrices exercise
Pick the best substitution matrix (PAM and BLOSUM) for each pairwise alignment:
Human - chimp
Human - yeast
Human - fish
PAM options: PAM60, PAM120, PAM250
BLOSUM options: BLOSUM45, BLOSUM62, BLOSUM80
16
PAM vs. BLOSUM
PAM100 = BLOSUM90
PAM120 = BLOSUM80
PAM160 = BLOSUM60
PAM200 = BLOSUM52
PAM250 = BLOSUM45
The rows above are ordered from closer to more distant sequences.
BLOSUM62 for general use, BLOSUM80 for close relations, BLOSUM45 for distant relations.
PAM120 for general use, PAM60 for close relations, PAM250 for distant relations.
17
Gap penalty
AAGCGAAATTCGAAC
A-G-GAA-CTCGAAC

AAGCGAAATTCGAAC
AGG---AACTCGAAC
Which alignment is more likely? Which alignment has a higher score?
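How the two alignments rank depends on how gaps are charged. The sketch below assumes an affine gap scheme, in which opening a gap costs more than extending one; the slides do not state a scheme or any values, so the function name `affine_score` and all numbers here (match +1, mismatch -1, gap open -3, gap extend -1) are hypothetical choices for illustration only.

```python
def affine_score(top, bottom, match=1, mismatch=-1, gap_open=-3, gap_extend=-1):
    """Score a gapped alignment; the first column of a gap run costs gap_open,
    each further column of the same run costs gap_extend (hypothetical values)."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have equal length")
    score = 0
    prev_gap_row = None  # row that held a gap in the previous column, if any
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            row = "top" if a == "-" else "bottom"
            score += gap_extend if row == prev_gap_row else gap_open
            prev_gap_row = row
        else:
            score += match if a == b else mismatch
            prev_gap_row = None
    return score


scattered = ("AAGCGAAATTCGAAC", "A-G-GAA-CTCGAAC")   # three separate gaps
contiguous = ("AAGCGAAATTCGAAC", "AGG---AACTCGAAC")  # one gap of length three

# Affine penalty: one long gap is cheaper than three short ones.
print(affine_score(*scattered), affine_score(*contiguous))
# Linear penalty (open == extend): every gap column costs the same.
print(affine_score(*scattered, gap_open=-2, gap_extend=-2),
      affine_score(*contiguous, gap_open=-2, gap_extend=-2))
```

With these assumed values the ranking of the two alignments flips between the linear and the affine scheme, which is the point the slide's two questions are probing.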
18
Web servers for pairwise alignment
19
BLAST 2 sequences (bl2seq) at NCBI
Produces a local alignment of two given sequences using the BLAST (Basic Local Alignment Search Tool) engine.
BLAST does not use an exact algorithm but a heuristic.
20
Back to NCBI
21
BLAST – bl2seq
22
Bl2Seq - query
blastn: nucleotide
blastp: protein
23
Bl2seq results
24
Match, Dissimilarity, Gaps, Similarity, Low complexity
25
BLAST – programs
Query: DNA, Protein
Database: DNA, Protein
26
BLAST – Blastp
27
Blastp - results
28
Blastp – results (cont’)
29
BLAST scores:
Bit score: a score for the alignment according to the number of similarities, identities, etc.
Expected score (E-value): the number of alignments with an equal or better score one can "expect" to see by chance when searching a random database of a particular size. The closer the E-value is to zero, the greater the confidence that the hit is really a homolog.
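The two scores are linked: under the standard BLAST statistics, the expected number of chance hits is E = m · n · 2^(-S'), where S' is the bit score, m the query length and n the database length. The sketch below applies that formula directly; the function name `expected_hits` and the example lengths are hypothetical, and real BLAST output uses length-corrected (effective) values, so reported E-values will differ somewhat.

```python
def expected_hits(bit_score: float, query_length: int, database_length: int) -> float:
    """E-value from a bit score: E = m * n * 2 ** (-S')."""
    return query_length * database_length * 2.0 ** (-bit_score)


# Hypothetical search: a 147-residue query against 10 million database residues.
for bits in (20, 40, 60):
    print(bits, expected_hits(bits, 147, 10_000_000))
```

Each additional bit halves the E-value, and a larger database raises it for the same alignment.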
30
Blastp – acquiring sequences
31
blastp – acquiring sequences
32
FASTA format – multiple sequences
>gi|4504351|ref|NP_000510.1| delta globin [Homo sapiens]
MVHLTPEEKTAVNALWGKVNVDAVGGEALGRLLVVYPWTQRFFESFGDLSSPDAVMGNPKVKAHGKKVLG
AFSDGLAHLDNLKGTFSQLSELHCDKLHVDPENFRLLGNVLVCVLARNFGKEFTPQMQAAYQKVVAGVAN
ALAHKYH
>gi|4504349|ref|NP_000509.1| beta globin [Homo sapiens]
MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG
AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVAN
ALAHKYH
>gi|4885393|ref|NP_005321.1| epsilon globin [Homo sapiens]
MVHFTAEEKAAVTSLWSKMNVEEAGGEALGRLLVVYPWTQRFFDSFGNLSSPSAILGNPKVKAHGKKVLT
SFGDAIKNMDNLKPAFAKLSELHCDKLHVDPENFKLLGNVMVIILATHFGKEFTPEVQAAWQKLVSAVAI
ALAHKYH
>gi|6715607|ref|NP_000175.1| G-gamma globin [Homo sapiens]
MGHFTEEDKATITSLWGKVNVEDAGGETLGRLLVVYPWTQRFFDSFGNLSSASAIMGNPKVKAHGKKVLT
SLGDAIKHLDDLKGTFAQLSELHCDKLHVDPENFKLLGNVLVTVLAIHFGKEFTPEVQASWQKMVTGVAS
ALSSRYH
>gi|28302131|ref|NP_000550.2| A-gamma globin [Homo sapiens]
MGHFTEEDKATITSLWGKVNVEDAGGETLGRLLVVYPWTQRFFDSFGNLSSASAIMGNPKVKAHGKKVLT
SLGDATKHLDDLKGTFAQLSELHCDKLHVDPENFKLLGNVLVTVLAIHFGKEFTPEVQASWQKMVTAVAS
ALSSRYH
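A multi-sequence FASTA file is simple to read: each record starts with a `>` header line, and every following line up to the next `>` belongs to that record's sequence. The parser below is a minimal sketch with an illustrative function name (`parse_fasta`); it uses the first two records from the slide as input and skips the validation a production parser would do.

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Split FASTA text into (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:          # close the previous record
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:            # sequence line of the current record
            chunks.append(line)
    if header is not None:                  # close the last record
        records.append((header, "".join(chunks)))
    return records


EXAMPLE = """\
>gi|4504351|ref|NP_000510.1| delta globin [Homo sapiens]
MVHLTPEEKTAVNALWGKVNVDAVGGEALGRLLVVYPWTQRFFESFGDLSSPDAVMGNPKVKAHGKKVLG
AFSDGLAHLDNLKGTFSQLSELHCDKLHVDPENFRLLGNVLVCVLARNFGKEFTPQMQAAYQKVVAGVAN
ALAHKYH
>gi|4504349|ref|NP_000509.1| beta globin [Homo sapiens]
MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG
AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVAN
ALAHKYH
"""

records = parse_fasta(EXAMPLE)
for header, sequence in records:
    print(header, len(sequence))
```

Joining the wrapped lines gives one continuous sequence per record, which is the form alignment code usually expects.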
33
Searching for remote homologs
Sometimes BLAST isn't enough.
Large protein family, and BLAST only finds close members.
We want more distant members: PSI-BLAST
34
PSI-BLAST
Position Specific Iterated BLAST
1. Regular BLAST
2. Construct a profile from the BLAST results
3. BLAST profile search
4. Final results
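The "construct profile" step can be pictured as counting which residues occur at each position of the aligned hits. The sketch below is a deliberately simplified stand-in, assuming equal-length ungapped hits and plain column frequencies; actual PSI-BLAST builds a position-specific scoring matrix with sequence weighting and pseudocounts, which is not reproduced here. The function names are illustrative, and the input is the first ten residues of the five globin sequences from the FASTA slide.

```python
from collections import Counter


def column_profile(aligned: list[str]) -> list[dict[str, float]]:
    """Per-position residue frequencies of equal-length aligned sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all sequences must have the same length")
    profile = []
    for column in zip(*aligned):
        counts = Counter(column)
        profile.append({res: n / len(column) for res, n in counts.items()})
    return profile


def profile_score(profile: list[dict[str, float]], seq: str) -> float:
    """Sum the profile frequency of each residue of seq at its position."""
    return sum(col.get(res, 0.0) for col, res in zip(profile, seq))


hits = ["MVHLTPEEKT", "MVHLTPEEKS", "MVHFTAEEKA", "MGHFTEEDKA", "MGHFTEEDKA"]
profile = column_profile(hits)

print(profile[1])                       # position 2 is variable (V or G)
print(profile_score(profile, "MVHLTPEEKT"))
print(profile_score(profile, "AAAAAAAAAA"))
```

Conserved positions dominate the score while variable positions tolerate several residues, which is how a profile can reach more distant family members than a single query sequence.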
35
PSI-BLAST
Advantage: PSI-BLAST looks for sequences that are close to the query and learns from them to extend the circle of friends.
Disadvantage: if we obtain a WRONG hit, we will get to unrelated sequences (contamination). This gets worse with each iteration.
36
PSI-BLAST
Which one(s) of the following is/are correct?
1. PSI-BLAST is expected to give more hits than BLAST
2. PSI-BLAST is an iterative search method
3. PSI-BLAST is faster than BLAST
4. Each iteration of PSI-BLAST can only improve the results of the previous iteration
37
BLAST – PSI-Blast
38
PSI-Blast - results