Protein Sequence Alignments

1 Protein Sequence Alignments
Week 6

2 Learning Objectives
1) Identify conservative and non-conservative amino acid substitutions
2) Know the difference between percent identity and percent similarity
3) Understand the concept of homology and the difference between orthologs and paralogs
4) Identify protein domains in a BLASTp output
5) Use a substitution matrix to determine protein alignment scores

3 Working with Proteins
Introduction to proteins:
Amino acid sequence: primary structure
Motifs and domains: 3D structure

4 Amino acids listed with abbreviations
Amino acid                   Three-letter abbreviation  One-letter abbreviation
Alanine                      Ala                        A
Arginine                     Arg                        R
Asparagine                   Asn                        N
Aspartic acid                Asp                        D
Cysteine                     Cys                        C
Glutamine                    Gln                        Q
Glutamic acid                Glu                        E
Glycine                      Gly                        G
Histidine                    His                        H
Isoleucine                   Ile                        I
Leucine                      Leu                        L
Lysine                       Lys                        K
Methionine                   Met                        M
Phenylalanine                Phe                        F
Proline                      Pro                        P
Serine                       Ser                        S
Threonine                    Thr                        T
Tryptophan                   Trp                        W
Tyrosine                     Tyr                        Y
Valine                       Val                        V
Asparagine or aspartic acid  Asx                        B
Glutamine or glutamic acid   Glx                        Z
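The abbreviation table can be used directly as a lookup. The sketch below is only an illustration: the dictionary copies the codes listed on this slide, and the helper name `to_three_letter` is hypothetical.

```python
# One-letter to three-letter codes, copied from the abbreviation table on this slide.
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "B": "Asx", "Z": "Glx",  # ambiguity codes
}


def to_three_letter(sequence):
    """Hypothetical helper: spell a one-letter sequence in three-letter codes."""
    return "-".join(ONE_TO_THREE[residue] for residue in sequence.upper())


print(to_three_letter("FVNQ"))  # Phe-Val-Asn-Gln
```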

5 The side chain determines the properties of the amino acid
Lodish, H. et al. Molecular Cell Biology (New York; W.H. Freeman, 2000).

6 Hydrophilic amino acids
Lodish, H. et al. Molecular Cell Biology (New York; W.H. Freeman, 2000).

7 Hydrophilic amino acids
Lodish, H. et al. Molecular Cell Biology (New York; W.H. Freeman, 2000).

8 Hydrophilic amino acids
Lodish, H. et al. Molecular Cell Biology (New York; W.H. Freeman, 2000).

9 Hydrophobic amino acids
Lodish, H. et al. Molecular Cell Biology (New York; W.H. Freeman, 2000).

10 Hydrophobic amino acids
Lodish, H. et al. Molecular Cell Biology (New York; W.H. Freeman, 2000).

11 Unique amino acids
Lodish, H. et al. Molecular Cell Biology (New York; W.H. Freeman, 2000).

12 An organism’s evolutionary history is documented in its genome
Figure 26.2 An unexpected family tree

13 Homologous genes found in different species are called orthologs
(a) Orthologous genes. Figure labels: an ancestral gene in an ancestral species; speciation with divergence of the gene; orthologous genes in Species A and Species B.

14 Homologous genes within the same species are called paralogs
(b) Paralogous genes. Figure labels: gene duplication and divergence; paralogous genes in Species A after many generations.

15 Two sequences can diverge over time
Figure 26.8 Aligning segments of DNA (panels 1 and 2; figure labels: Deletion, Insertion)

16 Two sequences can diverge over time
Figure 26.8 Aligning segments of DNA (panels 3 and 4)

17 How do we distinguish sequences that are related (homologous) from sequences that are similar due to chance (analogous)?
An alignment of random sequences
Figure 26.9 A molecular homoplasy

18 The sequence alignment score tells us the relatedness of two sequences
More than 25% shared identity suggests that two proteins are related (a rule of thumb that is more reliable for longer alignments)
Highly related proteins are potential homologs
Homologs are two proteins that share a common ancestor: they originated from the same ancestral sequence but have changed over time
Homologs often share a similar 3D structure and perform similar functions, but neither is required
Homologs within the same species are called paralogs, while homologs in different species are called orthologs

19 Without insulin, humans develop the disease diabetes
Beta cells of the pancreas secrete the hormone insulin into the blood
Insulin enhances the transport of glucose into body cells and stimulates the liver to store glucose as glycogen

20 The structure of human Insulin
PDB.org

21 The primary structure of human insulin

22 We can use protein blast (blastp) to find homologs of human insulin

27 Conserved domain found within insulin

31 Odobenus rosmarus divergens is the Walrus
Captain Budd Christman, NOAA Corps - NOAA's Ark - Animals Collection Image ID: anim0022 ([1])

32 Alignment score generated by blastp from human insulin and walrus insulin

33 Protein BLAST score
The score is based on alignment length, identical residues, conservative substitutions, mismatches and gaps.
% Identity: the extent to which two amino acid sequences are invariant (the percentage of aligned positions that are exact matches)
% Similar (percent similar, or positives): the percentage of aligned positions that are either identical or similar. Similar pairs are amino acid residues that are structurally or functionally related; BLAST connects them with + signs.
Conservative substitutions occur when amino acids with similar biochemical properties are substituted for one another
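As a rough sketch of how the two percentages relate, the snippet below counts identical positions (letters) and similar positions (+ signs) in a BLAST-style match line. The function name `identity_and_positives` is hypothetical, and the match line is written by hand for the 28-residue insulin segment used later in this deck, assuming BLOSUM62 positives; it is not copied from real blastp output.

```python
def identity_and_positives(midline, alignment_length):
    """Return (percent identity, percent positives) from a BLAST-style match line.

    Letters mark identical residues, '+' marks similar residues,
    and spaces mark mismatches or gaps.
    """
    identical = sum(ch.isalpha() for ch in midline)
    similar = midline.count("+")
    pct_identity = 100 * identical / alignment_length
    pct_positives = 100 * (identical + similar) / alignment_length
    return pct_identity, pct_positives


# Hand-written match line (an assumption for illustration, not real blastp output).
midline = "MALW  LLPLLALLALW P P+ AFVNQ"
pct_identity, pct_positives = identity_and_positives(midline, alignment_length=28)
print(round(pct_identity, 1), round(pct_positives, 1))  # 78.6 82.1
```

Percent positives can never be lower than percent identity, because every identical position also counts as a positive.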

34 Scoring the alignment of two sequences
Odobenus rosmarus divergens: MALWTHLLPLLALLALWAPAPSRAFVNQ
Homo sapiens:                MALWMRLLPLLALLALWGPDPAAAFVNQ

35 Percent identity uses match/mismatch scoring
Odobenus rosmarus divergens: MALWTHLLPLLALLALWAPAPSRAFVNQ
Homo sapiens:                MALWMRLLPLLALLALWGPDPAAAFVNQ
Matches score as +1
Mismatches score as -1
Add the score of each pair of residues

36 Percent identity uses match/mismatch scoring
Odobenus rosmarus divergens: MALWTHLLPLLALLALWAPAPSRAFVNQ
Homo sapiens:                MALWMRLLPLLALLALWGPDPAAAFVNQ
Matches score as +1
Mismatches score as -1
Add the score of each pair of residues
22 matches - 6 mismatches = a score of 16
Percent identity = 22 matches / 28 aligned positions, or about 79%
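The match/mismatch arithmetic can be checked with a few lines of Python. This is a minimal sketch for two ungapped sequences of equal length; the function name is hypothetical, and the first sequence is assumed to be the walrus one because the slide labels only the second as Homo sapiens.

```python
WALRUS = "MALWTHLLPLLALLALWAPAPSRAFVNQ"  # assumed to be the walrus sequence
HUMAN = "MALWMRLLPLLALLALWGPDPAAAFVNQ"


def match_mismatch_score(seq_a, seq_b, match=1, mismatch=-1):
    """Score an ungapped alignment: +1 per match, -1 per mismatch by default."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    mismatches = len(seq_a) - matches
    score = matches * match + mismatches * mismatch
    return matches, mismatches, score


matches, mismatches, score = match_mismatch_score(WALRUS, HUMAN)
percent_identity = 100 * matches / len(WALRUS)
print(matches, mismatches, score)   # 22 6 16
print(round(percent_identity, 1))   # 78.6
```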

37 % similarity scoring assigns specific values to every substitution

38 Percent similarity uses a substitution matrix
Odobenus rosmarus divergens: MALWTHLLPLLALLALWAPAPSRAFVNQ
Homo sapiens:                MALWMRLLPLLALLALWGPDPAAAFVNQ

39 Non-conservative substitution
Biochemical properties of M and T
Threonine (T): hydrophilic with a polar side group
Methionine (M): hydrophobic
Different biochemical properties mean that such a substitution could disrupt protein function, so it is counted as a negative substitution

40 The BLOSUM62 Scoring Matrix

41 Percent similarity uses a substitution matrix
Odobenus rosmarus divergens: MALWTHLLPLLALLALWAPAPSRAFVNQ
Homo sapiens:                MALWMRLLPLLALLALWGPDPAAAFVNQ

42 Conservative substitution
Biochemical properties of H and R
Histidine (H) and arginine (R): both hydrophilic amino acids
Similar biochemical properties mean that such a substitution is unlikely to disrupt protein function, so it is counted as a neutral substitution

43 The BLOSUM62 Scoring Matrix
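To make the matrix lookup concrete, the sketch below scores the 28-residue walrus and human segments position by position. Assumptions: only the BLOSUM62 entries this alignment needs are typed in by hand (a real analysis would load the full matrix), the alignment has no gaps, and the result is a raw score rather than the bit score blastp reports. The first sequence is assumed to be the walrus one.

```python
WALRUS = "MALWTHLLPLLALLALWAPAPSRAFVNQ"  # assumed to be the walrus sequence
HUMAN = "MALWMRLLPLLALLALWGPDPAAAFVNQ"

# Subset of BLOSUM62, keyed by alphabetically sorted residue pairs.
BLOSUM62_SUBSET = {
    ("M", "M"): 5, ("A", "A"): 4, ("L", "L"): 4, ("W", "W"): 11,
    ("P", "P"): 7, ("F", "F"): 6, ("V", "V"): 4, ("N", "N"): 6,
    ("Q", "Q"): 5,
    ("M", "T"): -1,  # non-conservative: hydrophobic vs polar
    ("H", "R"): 0,   # conservative: both hydrophilic
    ("A", "G"): 0, ("A", "D"): -2, ("A", "S"): 1, ("A", "R"): -1,
}


def pair_score(a, b):
    """Look up a substitution score; the matrix is symmetric."""
    return BLOSUM62_SUBSET[tuple(sorted((a, b)))]


scores = [pair_score(a, b) for a, b in zip(WALRUS, HUMAN)]
raw_score = sum(scores)
positives = sum(s > 0 for s in scores)  # identical or similar positions
print(raw_score)                                  # 114
print(positives, round(100 * positives / 28, 1))  # 23 82.1
```

The M/T pair contributes -1 and the H/R pair contributes 0, which matches the negative and neutral labels on the two substitution slides.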

44 Derivation of the substitution matrix
$S_{ij} = \log\left(\frac{q_{ij}}{p_i p_j}\right)$
$S_{ij}$ is the score in the substitution matrix for amino acid $i$ being substituted for $j$
$q_{ij}$ is the observed frequency of the substitution of amino acid $i$ with $j$

45 Derivation of the substitution matrix
$S_{ij} = \log\left(\frac{q_{ij}}{p_i p_j}\right)$
$S_{ij}$ is the score in the substitution matrix for amino acid $i$ being substituted for $j$
$q_{ij}$ is the observed frequency of the substitution of amino acid $i$ with $j$
The observed frequency is found by comparing known sequences: aligning these sequences and calculating the frequency of substitutions

46 Derivation of the substitution matrix
$S_{ij} = \log\left(\frac{q_{ij}}{p_i p_j}\right)$
$S_{ij}$ is the score in the substitution matrix for amino acid $i$ being substituted for $j$
$q_{ij}$ is the observed frequency of the substitution of amino acid $i$ with $j$
The observed frequency is found by comparing known sequences: aligning these sequences and calculating the frequency of substitutions
This is done using different sequences and with different assumptions, leading to different scoring matrices. We will use the BLOSUM62 matrix.

47 Derivation of the substitution matrix
$S_{ij} = \log\left(\frac{q_{ij}}{p_i p_j}\right)$
$S_{ij}$ is the score in the substitution matrix for amino acid $i$ being substituted for $j$
$q_{ij}$ is the observed frequency of the substitution of amino acid $i$ with $j$
$p_i$ is the frequency of $i$ in the database
$p_j$ is the frequency of $j$ in the database
$p_i p_j$ is the probability of randomly pairing/aligning $i$ and $j$

48 Derivation of the substitution matrix
$S_{ij} = \log\left(\frac{q_{ij}}{p_i p_j}\right)$
The substitution matrix is not based on the biochemical properties of the amino acids but on how often substitutions between two amino acids are seen
If a substitution between two amino acids is seen a lot, then it is likely to maintain the function of the protein
If a substitution between two amino acids is a rare event, then it is likely to disrupt the function of the protein
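A small numeric sketch shows why frequent substitutions get positive scores and rare ones get negative scores. The frequencies below are made up for illustration, the function name is hypothetical, and base 2 is an assumption because the slide does not state the base of the logarithm. Published matrices also scale and round the values.

```python
import math


def log_odds_score(q_ij, p_i, p_j):
    """Log-odds score: observed pair frequency over the frequency expected by chance."""
    return math.log2(q_ij / (p_i * p_j))


# Made-up frequencies: both residues make up 5% of the database,
# so a chance pairing is expected with frequency 0.05 * 0.05 = 0.0025.
frequent = log_odds_score(q_ij=0.01, p_i=0.05, p_j=0.05)    # seen 4x more than chance
rare = log_odds_score(q_ij=0.000625, p_i=0.05, p_j=0.05)    # seen 4x less than chance

print(round(frequent, 2))  # 2.0
print(round(rare, 2))      # -2.0
```

A pair observed exactly as often as chance predicts scores 0, which is how a neutral substitution appears in the matrix.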

49 Conclusions
Amino acid substitutions can be conservative or non-conservative (biochemical definition vs. statistical definition)
Percent identity counts only exact matches, while percent similarity also includes conservative substitutions
Homologs are sequences that share common ancestry; orthologs are homologs found in different species and paralogs are homologs found in the same species
A substitution matrix is used to calculate protein alignment scores

50 Worksheet

