Pairwise Sequence Alignment Exercise 2


1 Pairwise Sequence Alignment Exercise 2

2 Motivation
MVHLTPEEKTAVNALWGKVNVDAVGGEALGRLLVVYPWTQRFFE…
|| ||  ||||| ||| || |   ||||||||||||||||||||
MVNLTSDEKTAVLALWNKVDVEDCGGEALGRLLVVYPWTQRFFE…
ATGGTGAACCTGACCTCTGACGAGAAGACTGCCGTCCTTGCCCTGTGGAACAAGGTGGACG
TGGAAGACTGTGGTGGTGAGGCCCTGGGCAGGTTTGTATGGAGGTTACAAGGCTGCTTAAG
GAGGGAGGATGGAAGCTGGGCATGTGGAGACAGACCACCTCCTGGATTTATGACAGGAACT
GATTGCTGTCTCCTGTGCTGCTTTCACCCCTCAGGCTGCTGGTCGTGTATCCCTGGACCCA
GAGGTTCTTTGAAAGCTTTGGGGACTTGTCCACTCCTGCTGCTGTGTTCGCAAATGCTAAG
GTAAAAGCCCATGGCAAGAAGGTGCTAACTTCCTTTGGTGAAGGTATGAATCACCTGGACA
ACCTCAAGGGCACCTTTGCTAAACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGATCC
TGAGAATTTCAAGGTGAGTCAATATTCTTCTTCTTCCTTCTTTCTATGGTCAAGCTCATGT
CATGGGAAAAGGACATAAGAGTCAGTTTCCAGTTCTCAATAGAAAAAAAAATTCTGTTTGC
ATCACTGTGGACTCCTTGGGACCATTCATTTCTTTCACCTGCTTTGCTTATAGTTATTGTT
TCCTCTTTTTCCTTTTTCTCTTCTTCTTCATAAGTTTTTCTCTCTGTATTTTTTTAACACA
ATCTTTTAATTTTGTGCCTTTAAATTATTTTTAAGCTTTCTTCTTTTAATTACTACTCGTT
TCCTTTCATTTCTATACTTTCTATCTAATCTTCTCCTTTCAAGAGAAGGAGTGGTTCACTA
CTACTTTGCTTGGGTGTAAAGAATAACAGCAATAGCTTAAATTCTGGCATAATGTGAATAG
GGAGGACAATTTCTCATATAAGTTGAGGCTGATATTGGAGGATTTGCATTAGTAGTAGAGG
TTACATCCAGTTACCGTCTTGCTCATAATTTGTGGGCACAACACAGGGCATATCTTGGAAC
AAGGCTAGAATATTCTGAATGCAAACTGGGGACCTGTGTTAACTATGTTCATGCCTGTTGT
CTCTTCCTCTTCAGCTCCTGGGCAATATGCTGGTGGTTGTGCTGGCTCGCCACTTTGGCAA
GGAATTCGACTGGCACATGCACGCTTGTTTTCAGAAGGTGGTGGCTGGTGTGGCTAATGCC
CTGGCTCACAAGTACCATTGA

3 What is sequence alignment?
Alignment: comparing two (pairwise) or more (multiple) sequences, searching for a series of identical or similar characters in the sequences.
MVNLTSDEKTAVLALWNKVDVEDCGGE
|| ||  ||||| ||| || |   |||
MVHLTPEEKTAVNALWGKVNVDAVGGE

4 Why sequence alignment?
Predict characteristics of a protein. Premised on: similar sequence (or structure) → similar function

5 Local vs. Global
- Global alignment: finds the best alignment across the whole two sequences.
- Local alignment: finds regions of high similarity in parts of the sequences.
Global alignment forces alignment in regions which differ:
ADLGAVFALCDRYFQ
||||     |||| |
ADLGRTQN-CDRYYQ
Local alignment concentrates on regions of high similarity:
ADLG CDRYFQ
|||| |||| |
ADLG CDRYYQ
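
The difference between the two modes can be sketched with a small dynamic-programming example. This is only an illustration: the scoring values (match +1, mismatch -1, gap -1) and the function names are assumptions made here, not taken from the slides, and the code returns scores only, not the alignments themselves. The global version follows the Needleman-Wunsch recurrence; the local version follows Smith-Waterman, which resets negative cells to zero.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best score of an end-to-end (global) alignment of a and b."""
    # Row 0: aligning a prefix of b against nothing costs one gap per letter.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best score of any pair of substrings (local alignment) of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may start anywhere, so a cell never drops below 0.
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


if __name__ == "__main__":
    s1, s2 = "ADLGAVFALCDRYFQ", "ADLGRTQNCDRYYQ"
    print("global:", global_score(s1, s2))
    print("local: ", local_score(s1, s2))
```

With the same scoring scheme the local score is never lower than the global score, because the full-length alignment is itself one of the candidates a local search may pick.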

6 Evolutionary changes in sequences
Three types of changes:
1. Substitution: a replacement of one (or more) sequence letter by another.
2. Insertion: an insertion of a letter or several letters into the sequence.
3. Deletion: deleting a letter (or more) from the sequence.
Insertion + Deletion → Indel
[Slide figure: example sequences AAGA, AACA, AAG, GAAA, TA]

7 Choosing an alignment
- Many different alignments are possible for AAGCTGAATTCGAA and AGGCTCATTTCTGA:
(a)
A-AGCTGAATTC--GAA
AG-GCTCA-TTTCTGA-
(b)
AAGCTGAATT-C-GAA
AGGCT-CATTTCTGA-
Which alignment is better?

8 Exercise: compute both alignment scores
- Match: +1
- Mismatch: -2
- Indel: -1
(a)
A-AGCTGAATTC--GAA
AG-GCTCA-TTTCTGA-
(b)
AAGCTGAATT-C-GAA
AGGCT-CATTTCTGA-
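
One way to check an answer to this exercise is to score each alignment column by column. The sketch below uses the slide's scoring scheme (match +1, mismatch -2, indel -1) as defaults; the function name `score_alignment` is a hypothetical helper introduced here for illustration.

```python
def score_alignment(top, bottom, match=1, mismatch=-2, indel=-1):
    """Sum per-column scores of a gapped pairwise alignment ('-' marks a gap)."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for x, y in zip(top, bottom):
        if x == "-" and y == "-":
            raise ValueError("a column cannot contain two gaps")
        if x == "-" or y == "-":
            score += indel      # insertion or deletion
        elif x == y:
            score += match
        else:
            score += mismatch
    return score


if __name__ == "__main__":
    print(score_alignment("A-AGCTGAATTC--GAA", "AG-GCTCA-TTTCTGA-"))
    print(score_alignment("AAGCTGAATT-C-GAA", "AGGCT-CATTTCTGA-"))
```

Running the script prints one score per alignment; the alignment with the higher score is the better one under this scheme.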

9 Scoring systems: accounting for biological context
Which is true about the scores in a pairwise alignment of nucleotide sequences? (Tr = transition, Tv = transversion)
1. Tr > Tv > 0
2. Tr < Tv < 0
3. 0 > Tr > Tv
4. 0 > Tv > Tr
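
As a reminder of the terms in this question: a transition swaps a purine for a purine (A↔G) or a pyrimidine for a pyrimidine (C↔T), while a transversion swaps a purine for a pyrimidine or vice versa. A minimal sketch of that classification follows; the function name is a hypothetical helper and handles only the four unambiguous DNA letters.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(x, y):
    """Return 'match', 'transition' or 'transversion' for two DNA letters."""
    x, y = x.upper(), y.upper()
    for base in (x, y):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"not an unambiguous DNA base: {base!r}")
    if x == y:
        return "match"
    # Same chemical class on both sides means a transition.
    same_class = {x, y} <= PURINES or {x, y} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


if __name__ == "__main__":
    for pair in [("A", "G"), ("C", "T"), ("A", "C"), ("G", "T")]:
        print(pair, classify_substitution(*pair))
```

Transitions are observed more often than transversions in real sequence data, which is the biological context the question refers to.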

10 Scoring systems: accounting for biological context
Which is true about the scores in a pairwise alignment of amino-acid sequences?
1. Asp->Asn > Asp->Glu
2. Arg->His > Ala->Phe
3. Arg->His Met

11 Substitution matrices
- Nucleic acids: transition-transversion
- Amino acids:
  - Evolutionary (empirical data) based: PAM, BLOSUM
  - Physico-chemical properties based: Grantham, McLachlan

12 PAM matrices
- Family of matrices: PAM 80, PAM 120, PAM 250
- The number attached to a PAM matrix represents the evolutionary distance between the sequences on which the matrix is based
- Greater numbers denote greater distances

13 PAM - limitations
- Based on only one original dataset
- Examines proteins with few differences (85% identity)
- Based mainly on small globular proteins, so the matrix is biased

14 BLOSUM matrices
- Different BLOSUMn matrices are calculated independently from BLOCKS (ungapped local alignments)
- BLOSUMn is based on a cluster of BLOCKS of sequences that share at least n percent identity
- BLOSUM62 represents closer sequences than BLOSUM45

15 Substitution matrices exercise
- Pick the best substitution matrix (PAM and BLOSUM) for each pairwise alignment:
  - Human – chimp
  - Human – yeast
  - Human – fish
PAM options: PAM60, PAM120, PAM250
BLOSUM options: BLOSUM45, BLOSUM62, BLOSUM80

16 PAM vs. BLOSUM
Equivalences, ordered toward more distant sequences:
PAM100 = BLOSUM90
PAM120 = BLOSUM80
PAM160 = BLOSUM60
PAM200 = BLOSUM52
PAM250 = BLOSUM45
- BLOSUM62 for general use
- BLOSUM80 for close relations
- BLOSUM45 for distant relations
- PAM120 for general use
- PAM60 for close relations
- PAM250 for distant relations

17 Gap penalty
(a)
AAGCGAAATTCGAAC
A-G-GAA-CTCGAAC
(b)
AAGCGAAATTCGAAC
AGG---AACTCGAAC
Which alignment is more likely? Which alignment has a higher score?
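
Whether one long gap or several short gaps scores higher depends on how gaps are penalized. The sketch below contrasts a linear penalty (every gap column costs the same) with an affine penalty (opening a gap costs more than extending it). The penalty values and the function name are assumptions chosen for illustration only; they do not come from the slides.

```python
def score_with_gaps(top, bottom, match=1, mismatch=-2,
                    gap_open=-4, gap_extend=-1):
    """Score a gapped alignment; the first column of a gap run costs gap_open,
    each further column of the same run costs gap_extend."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    open_gap_in = None  # which row currently has a running gap, if any
    for x, y in zip(top, bottom):
        if x == "-" and y == "-":
            raise ValueError("a column cannot contain two gaps")
        if x == "-" or y == "-":
            row = "top" if x == "-" else "bottom"
            score += gap_extend if row == open_gap_in else gap_open
            open_gap_in = row
        else:
            score += match if x == y else mismatch
            open_gap_in = None
    return score


if __name__ == "__main__":
    top = "AAGCGAAATTCGAAC"
    scattered = "A-G-GAA-CTCGAAC"   # three separate one-letter gaps
    single = "AGG---AACTCGAAC"      # one three-letter gap
    for name, bottom in [("scattered", scattered), ("single", single)]:
        linear = score_with_gaps(top, bottom, gap_open=-1, gap_extend=-1)
        affine = score_with_gaps(top, bottom)
        print(name, "linear:", linear, "affine:", affine)
```

Setting `gap_open` equal to `gap_extend` reduces the affine scheme to the linear one, which makes it easy to compare the two on the same pair of alignments.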

18 Web servers for pairwise alignment

19 BLAST 2 sequences (bl2seq) at NCBI
- Produces the local alignment of two given sequences using the BLAST (Basic Local Alignment Search Tool) engine
- BLAST does not use an exact algorithm but a heuristic

20 Back to NCBI

21 BLAST – bl2seq

22 Bl2Seq - query
blastn – nucleotide
blastp – protein

23 Bl2seq results

24 [Annotated alignment output; labels: Match, Dissimilarity, Gaps, Similarity, Low complexity]

25 BLAST – programs
Query: DNA, Protein
Database: DNA, Protein

26 BLAST – Blastp

27 Blastp - results

28 Blastp – results (cont’)

29 BLAST scores
- Bit score: a score for the alignment according to the number of similarities, identities, etc.
- Expected score (E-value): the number of alignments with a score at least this high that one can “expect” to see by chance when searching a random database of a particular size. The closer the E-value is to zero, the greater the confidence that the hit is really a homolog.
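
The link between the two scores can be made concrete with the standard relation E = m · n · 2^(-S'), where S' is the bit score and m and n are the query and database lengths. The sketch below implements only that simplified relation; the function name is hypothetical, and real BLAST reports use effective lengths with edge corrections, so the numbers will not match BLAST output exactly.

```python
def expected_hits(bit_score, query_len, db_len):
    """Simplified E-value: E = m * n * 2**(-S') for bit score S'."""
    if query_len <= 0 or db_len <= 0:
        raise ValueError("lengths must be positive")
    return query_len * db_len * 2.0 ** (-bit_score)


if __name__ == "__main__":
    # Hypothetical numbers: a 150-residue query against 10 million residues.
    for bits in (20, 40, 60):
        print(bits, expected_hits(bits, 150, 10_000_000))
```

Each additional bit halves the E-value, and searching a database twice as large doubles it, which is why the same alignment can be significant in a small database and not in a large one.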

30 Blastp – acquiring sequences

31 blastp – acquiring sequences

32 FASTA format – multiple sequences
>gi|4504351|ref|NP_000510.1| delta globin [Homo sapiens]
MVHLTPEEKTAVNALWGKVNVDAVGGEALGRLLVVYPWTQRFFESFGDLSSPDAVMGNPKVKAHGKKVLG
AFSDGLAHLDNLKGTFSQLSELHCDKLHVDPENFRLLGNVLVCVLARNFGKEFTPQMQAAYQKVVAGVAN
ALAHKYH
>gi|4504349|ref|NP_000509.1| beta globin [Homo sapiens]
MVHLTPEEKSAVTALWGKVNVDEVGGEALGRLLVVYPWTQRFFESFGDLSTPDAVMGNPKVKAHGKKVLG
AFSDGLAHLDNLKGTFATLSELHCDKLHVDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVAN
ALAHKYH
>gi|4885393|ref|NP_005321.1| epsilon globin [Homo sapiens]
MVHFTAEEKAAVTSLWSKMNVEEAGGEALGRLLVVYPWTQRFFDSFGNLSSPSAILGNPKVKAHGKKVLT
SFGDAIKNMDNLKPAFAKLSELHCDKLHVDPENFKLLGNVMVIILATHFGKEFTPEVQAAWQKLVSAVAI
ALAHKYH
>gi|6715607|ref|NP_000175.1| G-gamma globin [Homo sapiens]
MGHFTEEDKATITSLWGKVNVEDAGGETLGRLLVVYPWTQRFFDSFGNLSSASAIMGNPKVKAHGKKVLT
SLGDAIKHLDDLKGTFAQLSELHCDKLHVDPENFKLLGNVLVTVLAIHFGKEFTPEVQASWQKMVTGVAS
ALSSRYH
>gi|28302131|ref|NP_000550.2| A-gamma globin [Homo sapiens]
MGHFTEEDKATITSLWGKVNVEDAGGETLGRLLVVYPWTQRFFDSFGNLSSASAIMGNPKVKAHGKKVLT
SLGDATKHLDDLKGTFAQLSELHCDKLHVDPENFKLLGNVLVTVLAIHFGKEFTPEVQASWQKMVTAVAS
ALSSRYH
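
The structure of a multi-sequence FASTA file (a `>` header line followed by one or more sequence lines) can be illustrated with a short reader. This is a minimal sketch with a hypothetical function name; it does not validate the residue alphabet, and the sample text is a shortened stand-in rather than the full records above.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    sample = """>seq1 example record
MVHLTPEEKTAV
NALWGKV
>seq2 another record
MGHFTEEDKATI
"""
    for name, seq in parse_fasta(sample):
        print(name, len(seq))
```

Sequence lines belonging to one record are joined, so the line width used in the file does not affect the parsed sequence.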

33 Searching for remote homologs
- Sometimes BLAST isn’t enough
- Large protein family, and BLAST only finds close members. We want more distant members
- PSI-BLAST

34 PSI-BLAST
- Position Specific Iterated BLAST
Regular BLAST → Construct profile from BLAST results → BLAST profile search → Final results
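
The "construct profile" step can be pictured as turning a set of aligned hits into per-position scores. The sketch below is a deliberately simplified stand-in, not PSI-BLAST's actual procedure: it assumes ungapped hits of equal length, a uniform background frequency, and a fixed pseudocount, and the function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned, pseudocount=1.0):
    """Per-column log-odds scores (log2 of frequency over uniform background)."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all aligned sequences must have the same length")
    background = 1.0 / len(AMINO_ACIDS)
    profile = []
    for col in range(length):
        counts = Counter(s[col] for s in aligned if s[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def score_against_profile(profile, seq):
    """Sum the position-specific scores of seq (same length as the profile)."""
    return sum(column[aa] for column, aa in zip(profile, seq))


if __name__ == "__main__":
    hits = ["MVHLTPEEK", "MVNLTSDEK", "MVHFTAEEK"]
    profile = build_profile(hits)
    print(score_against_profile(profile, "MGHFTEEDK"))
```

Unlike a single substitution matrix, a profile scores the same residue differently at different positions, which is what lets the next search round pick up more distant family members.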

35 PSI-BLAST
- Advantage: PSI-BLAST looks for sequences that are close to the query, and learns from them to extend the circle of friends
- Disadvantage: if we obtain a WRONG hit, we will get to unrelated sequences (contamination). This gets worse with each iteration

36 PSI-BLAST
Which one(s) of the following is/are correct?
1. PSI-BLAST is expected to give more hits than BLAST
2. PSI-BLAST is an iterative search method
3. PSI-BLAST is faster than BLAST
4. Each iteration of PSI-BLAST can only improve the results of the previous iteration

37 BLAST – PSI-Blast

38 PSI-Blast - results

