1
Protein Sequence Alignments
Week 6
2
Learning Objectives
1) Identify conservative and non-conservative amino acid substitutions
2) Know the difference between percent identity and percent similarity
3) Understand the concept of homology and the difference between orthologs and paralogs
4) Identify protein domains in a BLASTp output
5) Use a substitution matrix to determine protein alignment scores
3
Working with Proteins
Introduction to Proteins:
Amino Acid Sequence: primary structure
Motifs and Domains: 3D structure
4
Amino acids listed with abbreviations
Amino acid                    Three-letter   One-letter
Alanine                       Ala            A
Arginine                      Arg            R
Asparagine                    Asn            N
Aspartic Acid                 Asp            D
Cysteine                      Cys            C
Glutamine                     Gln            Q
Glutamic Acid                 Glu            E
Glycine                       Gly            G
Histidine                     His            H
Isoleucine                    Ile            I
Leucine                       Leu            L
Lysine                        Lys            K
Methionine                    Met            M
Phenylalanine                 Phe            F
Proline                       Pro            P
Serine                        Ser            S
Threonine                     Thr            T
Tryptophan                    Trp            W
Tyrosine                      Tyr            Y
Valine                        Val            V
Asparagine or aspartic acid   Asx            B
Glutamine or glutamic acid    Glx            Z
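The abbreviations listed above can be held in a small lookup table. The following is a minimal sketch; the names `ONE_TO_THREE` and `to_three_letter` are illustrative and not part of the presentation.

```python
# One-letter to three-letter amino acid codes, as listed on the slide.
# B and Z are the ambiguity codes (Asx, Glx).
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "B": "Asx", "Z": "Glx",
}


def to_three_letter(sequence):
    """Spell out a one-letter protein sequence with three-letter codes."""
    return "-".join(ONE_TO_THREE[residue] for residue in sequence.upper())


print(to_three_letter("MALW"))  # Met-Ala-Leu-Trp
```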
5
The side chain determines the properties of the amino acid
Lodish, H. et al. Molecular Cell Biology (New York; W.H. Freeman, 2000).
6
Hydrophilic amino acids
7
Hydrophilic amino acids
8
Hydrophilic amino acids
9
Hydrophobic amino acids
10
Hydrophobic amino acids
11
Unique amino acids
12
An organism’s evolutionary history is documented in its genome
Figure 26.2 An unexpected family tree
13
Homologous genes found in different species are called orthologs
(a) Orthologous genes: an ancestral gene in an ancestral species; speciation with divergence of the gene gives orthologous genes in Species A and Species B
14
Homologous genes within the same species are called paralogs
(b) Paralogous genes: gene duplication and divergence within Species A gives paralogous genes after many generations
15
Two sequences can diverge over time
Figure 26.8 Aligning segments of DNA (panels 1 and 2: deletion, insertion)
16
Two sequences can diverge over time
Figure 26.8 Aligning segments of DNA (panels 3 and 4)
17
How do we distinguish sequences that are related (homologous) from sequences that are similar only by chance (analogous)?
An alignment of random sequences
Figure 26.9 A molecular homoplasy
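To get a feel for how much identity chance alone produces, the sketch below compares two random protein sequences position by position. It assumes all 20 amino acids are equally likely and uses no gaps, neither of which holds for real proteins, so it only illustrates a baseline of roughly 1 in 20 identical positions. The function names and the fixed seed are illustrative choices.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def random_protein(length, rng):
    """Draw a random sequence with uniform amino acid frequencies (an assumption)."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def fraction_identical(seq_a, seq_b):
    """Fraction of positions with the same residue in an ungapped comparison."""
    if len(seq_a) != len(seq_b):
        raise ValueError("ungapped comparison needs sequences of equal length")
    return sum(a == b for a, b in zip(seq_a, seq_b)) / len(seq_a)


rng = random.Random(0)  # fixed seed so the run is repeatable
seq_a = random_protein(10_000, rng)
seq_b = random_protein(10_000, rng)
chance_identity = fraction_identical(seq_a, seq_b)
print(f"identity between random sequences: {chance_identity:.1%}")
```

Real alignments allow gaps and real proteins have uneven amino acid composition, so chance similarity in practice can be higher than this baseline.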
18
The sequence alignment score tells us the relatedness of two sequences
As a rule of thumb, more than 25% shared identity suggests that two proteins are highly related
Highly related proteins are potential homologs
Homologs are two proteins that share a common ancestor: they originated from the same sequence but have changed over time (both evolved from that ancestral sequence, not from one another)
Homologs usually share a similar 3D structure and often perform similar functions
Homologs within the same species are called paralogs, while homologs in different species are called orthologs
19
Without insulin humans develop the disease diabetes
Beta cells of the pancreas secrete the hormone insulin into the blood.
Insulin enhances the transport of glucose into body cells and stimulates the liver to store glucose as glycogen.
20
The structure of human Insulin
PDB.org
21
The primary structure of human insulin
22
We can use protein BLAST (blastp) to find homologs of human insulin
27
Conserved domain found within insulin
31
Odobenus rosmarus divergens is the Pacific walrus
Captain Budd Christman, NOAA Corps - NOAA's Ark - Animals Collection Image ID: anim0022 ([1])
32
Alignment score generated by blastp from human insulin and walrus insulin
33
Protein BLAST score: based on length, identical residues, conservative substitutions, mismatches and gaps.
% Identity: the extent to which two amino acid sequences are invariant (the percentage of aligned residues that are exact matches)
% Similar (also reported as "positives"): identical residues plus pairs of residues that are structurally or functionally related; the related pairs are connected by + signs in the alignment
Conservative substitutions occur when amino acids with similar biochemical properties are substituted for one another
34
Scoring the alignment of two sequences
MALWTHLLPLLALLALWAPAPSRAFVNQ  Odobenus rosmarus divergens (walrus)
MALWMRLLPLLALLALWGPDPAAAFVNQ  Homo sapiens
35
Percent identity uses match/mismatch scoring
MALWTHLLPLLALLALWAPAPSRAFVNQ  Odobenus rosmarus divergens (walrus)
MALWMRLLPLLALLALWGPDPAAAFVNQ  Homo sapiens
Matches score as +1
Mismatches score as -1
Add the score of each pair of residues
36
Percent identity uses match/mismatch scoring
MALWTHLLPLLALLALWAPAPSRAFVNQ  Odobenus rosmarus divergens (walrus)
MALWMRLLPLLALLALWGPDPAAAFVNQ  Homo sapiens
Matches score as +1
Mismatches score as -1
Add the score of each pair of residues
… 22 matches - 6 mismatches = 16
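The match/mismatch tally can be checked with a few lines of Python. This is a minimal sketch for an ungapped comparison of the two 28-residue segments shown on the slide; labelling the first sequence as walrus is an assumption based on the slide title, and the function and variable names are illustrative.

```python
WALRUS = "MALWTHLLPLLALLALWAPAPSRAFVNQ"  # assumed to be the walrus sequence
HUMAN = "MALWMRLLPLLALLALWGPDPAAAFVNQ"   # labelled Homo sapiens on the slide


def match_mismatch(seq_a, seq_b, match=1, mismatch=-1):
    """Count matches and mismatches position by position and add up the score."""
    if len(seq_a) != len(seq_b):
        raise ValueError("ungapped comparison needs sequences of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    mismatches = len(seq_a) - matches
    score = matches * match + mismatches * mismatch
    return matches, mismatches, score


matches, mismatches, score = match_mismatch(WALRUS, HUMAN)
percent_identity = 100 * matches / len(HUMAN)

print(matches, mismatches, score)   # 22 6 16
print(round(percent_identity, 1))   # 78.6
```

Percent identity here is simply matches divided by the aligned length (22 of 28 positions); the +1/-1 score is a separate quantity.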
37
% similarity scoring assigns specific values to every substitution
38
Percent similarity uses a substitution matrix
MALWTHLLPLLALLALWAPAPSRAFVNQ  Odobenus rosmarus divergens (walrus)
MALWMRLLPLLALLALWGPDPAAAFVNQ  Homo sapiens
39
Non-conservative substitution
Biochemical properties of M and T
Threonine (T): hydrophilic with a polar side group
Methionine (M): hydrophobic
Different biochemical properties mean that such a substitution could disrupt protein function, so it is counted as a negative substitution
40
The BLOSUM62 Scoring Matrix
41
Percent similarity uses a substitution matrix
MALWTHLLPLLALLALWAPAPSRAFVNQ  Odobenus rosmarus divergens (walrus)
MALWMRLLPLLALLALWGPDPAAAFVNQ  Homo sapiens
42
Conservative substitution
Biochemical properties of H and R
Histidine (H) and arginine (R): both hydrophilic amino acids
Similar biochemical properties mean that such a substitution is unlikely to disrupt protein function, so it is counted as a neutral substitution
43
The BLOSUM62 Scoring Matrix
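As a rough illustration of matrix-based scoring, the sketch below scores the walrus/human segment with an excerpt of BLOSUM62 that contains only the residue pairs occurring in this alignment. The raw sum it produces is not the bit score that blastp reports, and the names `BLOSUM62_EXCERPT` and `pair_score` are illustrative.

```python
WALRUS = "MALWTHLLPLLALLALWAPAPSRAFVNQ"  # assumed to be the walrus sequence
HUMAN = "MALWMRLLPLLALLALWGPDPAAAFVNQ"   # labelled Homo sapiens on the slide

# Excerpt of BLOSUM62: only the pairs needed here, keyed in alphabetical order.
BLOSUM62_EXCERPT = {
    ("A", "A"): 4, ("F", "F"): 6, ("L", "L"): 4, ("M", "M"): 5,
    ("N", "N"): 6, ("P", "P"): 7, ("Q", "Q"): 5, ("V", "V"): 4,
    ("W", "W"): 11,
    ("M", "T"): -1,  # non-conservative: scored negative
    ("H", "R"): 0,   # conservative: scored neutral
    ("A", "G"): 0, ("A", "D"): -2, ("A", "S"): 1, ("A", "R"): -1,
}


def pair_score(a, b):
    """Look up the symmetric matrix score for one aligned residue pair."""
    return BLOSUM62_EXCERPT[tuple(sorted((a, b)))]


scores = [pair_score(a, b) for a, b in zip(WALRUS, HUMAN)]
total = sum(scores)
positives = sum(s > 0 for s in scores)  # identical pairs plus "+" pairs

# BLAST-style middle line: letter for identity, "+" for a positive score.
midline = "".join(
    a if a == b else ("+" if s > 0 else " ")
    for a, b, s in zip(WALRUS, HUMAN, scores)
)

print(WALRUS)
print(midline)
print(HUMAN)
print(total, positives)  # 114 23
```

In this segment 23 of 28 positions score above zero, so percent similarity (about 82%) is slightly higher than percent identity (about 79%).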
44
Derivation of the substitution matrix
$S_{ij} = \log\left(\frac{q_{ij}}{p_i p_j}\right)$
$S_{ij}$ is the score in the substitution matrix for amino acid i being substituted for j
$q_{ij}$ is the observed frequency of the substitution of amino acid i with j
45
Derivation of the substitution matrix
$S_{ij} = \log\left(\frac{q_{ij}}{p_i p_j}\right)$
$S_{ij}$ is the score in the substitution matrix for amino acid i being substituted for j
$q_{ij}$ is the observed frequency of the substitution of amino acid i with j
The observed frequency is found by comparing known sequences: aligning these sequences and calculating the frequency of substitutions
46
Derivation of the substitution matrix
$S_{ij} = \log\left(\frac{q_{ij}}{p_i p_j}\right)$
$S_{ij}$ is the score in the substitution matrix for amino acid i being substituted for j
$q_{ij}$ is the observed frequency of the substitution of amino acid i with j
The observed frequency is found by comparing known sequences: aligning these sequences and calculating the frequency of substitutions
This is done using different sequences and with different assumptions, leading to different scoring matrices. We will use the BLOSUM62 matrix
47
Derivation of the substitution matrix
$S_{ij} = \log\left(\frac{q_{ij}}{p_i p_j}\right)$
$S_{ij}$ is the score in the substitution matrix for amino acid i being substituted for j
$q_{ij}$ is the observed frequency of the substitution of amino acid i with j
$p_i$ is the frequency of i in the database
$p_j$ is the frequency of j in the database
$p_i p_j$ is the probability of randomly pairing/aligning i and j
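The log-odds idea can be tried out numerically. The sketch below follows the simplified formula on the slide; the frequencies are made-up numbers chosen for easy arithmetic, and base 2 for the logarithm is an assumption since the slide does not state a base. Published matrices also scale and round the values, which is not shown here.

```python
import math


def log_odds_score(q_ij, p_i, p_j):
    """Log of the observed pair frequency over the frequency expected by chance."""
    expected = p_i * p_j
    return math.log2(q_ij / expected)  # base 2 is an assumption


# Hypothetical frequencies, not taken from any real database.
p_i = 0.05
p_j = 0.05

# Pair seen 4x more often than chance predicts: positive score.
common = log_odds_score(0.01, p_i, p_j)

# Pair seen 4x less often than chance predicts: negative score.
rare = log_odds_score(0.000625, p_i, p_j)

print(round(common, 2), round(rare, 2))  # 2.0 -2.0
```

A substitution observed exactly as often as chance predicts would score 0, which is how a neutral entry arises under this formula.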
48
Derivation of the substitution matrix
$S_{ij} = \log\left(\frac{q_{ij}}{p_i p_j}\right)$
The substitution matrix is not based on the biochemical properties of the amino acids but on how often substitutions between two amino acids are observed
If a substitution between two amino acids is seen more often than expected by chance, then it is likely to maintain the function of the protein
If a substitution between two amino acids is a rare event, then it is likely to disrupt the function of the protein
49
Conclusions
Amino acid substitutions can be conservative or non-conservative (biochemical definition vs. statistical definition)
Percent identity counts only exact matches, while percent similarity also includes conservative substitutions
Homologs are sequences that share common ancestry; orthologs are homologs found in different species and paralogs are homologs found in the same species
A substitution matrix is used to calculate protein alignment scores
50
Worksheet