Protein Sequence Alignments

Protein Sequence Alignments Week 6

Learning Objectives
1) Identify conservative and non-conservative amino acid substitutions
2) Know the difference between percent identity and percent similarity
3) Understand the concept of homology and the difference between orthologs and paralogs
4) Identify protein domains in a BLASTp output
5) Use a substitution matrix to determine protein alignment scores

Working with Proteins
Introduction to Proteins:
- Amino acid sequence: primary structure
- Motifs and domains: 3D structure

Amino acids listed with abbreviations

Amino acid                     Three-letter   One-letter
Alanine                        Ala            A
Arginine                       Arg            R
Asparagine                     Asn            N
Aspartic acid                  Asp            D
Cysteine                       Cys            C
Glutamine                      Gln            Q
Glutamic acid                  Glu            E
Glycine                        Gly            G
Histidine                      His            H
Isoleucine                     Ile            I
Leucine                        Leu            L
Lysine                         Lys            K
Methionine                     Met            M
Phenylalanine                  Phe            F
Proline                        Pro            P
Serine                         Ser            S
Threonine                      Thr            T
Tryptophan                     Trp            W
Tyrosine                       Tyr            Y
Valine                         Val            V
Asparagine or aspartic acid    Asx            B
Glutamine or glutamic acid     Glx            Z
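The abbreviation listing above maps naturally onto a lookup table. The following is a minimal Python sketch; the names `ONE_TO_THREE` and `to_three_letter` are illustrative and do not come from the slides.

```python
# One-letter -> three-letter codes, copied from the abbreviation listing above.
ONE_TO_THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "B": "Asx",  # ambiguity code: asparagine or aspartic acid
    "Z": "Glx",  # ambiguity code: glutamine or glutamic acid
}


def to_three_letter(sequence: str) -> str:
    """Spell out a one-letter protein sequence using three-letter codes.

    Hypothetical helper for illustration; raises KeyError on unknown letters.
    """
    return "-".join(ONE_TO_THREE[residue] for residue in sequence.upper())


if __name__ == "__main__":
    # First four residues of the insulin sequences used later in the slides.
    print(to_three_letter("MALW"))
```

Sequences in BLAST output use the one-letter codes, so a table like this is mainly useful when reading an alignment next to a textbook figure that uses the three-letter names.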

The side chain determines the properties of the amino acid (Lodish, H. et al., Molecular Cell Biology. New York: W.H. Freeman, 2000)

Hydrophilic amino acids (Lodish, H. et al., Molecular Cell Biology. New York: W.H. Freeman, 2000)

Hydrophobic amino acids (Lodish, H. et al., Molecular Cell Biology. New York: W.H. Freeman, 2000)

Unique amino acids (Lodish, H. et al., Molecular Cell Biology. New York: W.H. Freeman, 2000)

An organism’s evolutionary history is documented in its genome
[Figure 26.2: An unexpected family tree]

Homologous genes found in different species are called orthologs
[Figure (a), Orthologous genes: an ancestral gene in an ancestral species; speciation with divergence of the gene; orthologous genes in species A and species B]

Homologous genes within the same species are called paralogs
[Figure (b), Paralogous genes: gene duplication and divergence; paralogous genes in species A after many generations]

Two sequences can diverge over time
[Figure 26.8, Aligning segments of DNA: steps 1 to 4, including a deletion and an insertion]

How do we distinguish sequences that are related (homologous) from sequences that are similar due to chance (analogous)?
[Figure 26.9, A molecular homoplasy: an alignment of random sequences]

The sequence alignment score tells us the relatedness of two sequences
- As a rule of thumb, more than 25% shared identity suggests that two proteins are highly related
- Highly related proteins are potential homologs
- Homologs are two proteins that share a common ancestor: they originated from the same sequence but have changed over time (both evolved from that ancestral sequence, not from one another)
- Homologs usually share a similar 3D structure and often perform similar functions
- Homologs within the same species are called paralogs, while homologs in different species are called orthologs
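The two decision rules on this slide can be written down as a small Python sketch. It follows the slide's simplified definitions (same species means paralogs, different species means orthologs) and its 25% rule of thumb; the function names and the `threshold` parameter are illustrative assumptions, not part of the slides.

```python
def potential_homologs(percent_identity: float, threshold: float = 25.0) -> bool:
    """Apply the slide's rule of thumb: more than 25% shared identity.

    A True result only flags a candidate; it does not prove common ancestry.
    """
    return percent_identity > threshold


def classify_homologs(species_a: str, species_b: str) -> str:
    """Label a pair of homologous genes using the slide's simplified rule."""
    same_species = species_a.strip().lower() == species_b.strip().lower()
    return "paralogs" if same_species else "orthologs"


if __name__ == "__main__":
    print(classify_homologs("Homo sapiens", "Odobenus rosmarus divergens"))
    print(potential_homologs(30.0))
```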

Without insulin, humans develop diabetes
- Beta cells of the pancreas secrete the hormone insulin into the blood
- Insulin enhances the transport of glucose into body cells and stimulates the liver to store glucose as glycogen

The structure of human insulin (source: PDB.org)

The primary structure of human insulin

We can use protein BLAST (blastp) to find homologs of human insulin

Conserved domain found within insulin

Odobenus rosmarus divergens is the Pacific walrus
(Image: Captain Budd Christman, NOAA Corps, NOAA's Ark Animals Collection, Image ID anim0022)

Alignment score generated by blastp for human insulin and walrus insulin

Protein BLAST score: based on length, identical residues, conservative substitutions, mismatches and gaps
- % Identity: the extent to which two amino acid sequences are invariant (how many residues are exact matches)
- % Similar (percent similar, or positives): all identical matches plus all similar matches, where similar matches are pairs of amino acid residues that are structurally or functionally related; these pairs are connected by + signs in the alignment
- Conservative substitutions occur when amino acids with similar biochemical properties are substituted for one another

Scoring the alignment of two sequences
MALWTHLLPLLALLALWAPAPSRAFVNQ  Odobenus rosmarus divergens
MALWMRLLPLLALLALWGPDPAAAFVNQ  Homo sapiens

Percent identity uses match/mismatch scoring
MALWTHLLPLLALLALWAPAPSRAFVNQ  Odobenus rosmarus divergens
MALWMRLLPLLALLALWGPDPAAAFVNQ  Homo sapiens
- Matches score as +1
- Mismatches score as -1
- Add the score of each pair of residues: +1 +1 +1 +1 -1 -1 ...
- 22 matches - 6 mismatches gives a score of 16; percent identity is 22 of 28 residues, about 79%
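The +1/-1 tally can be reproduced with a short Python sketch. The two sequences are copied from the slide; the function names are illustrative, and the code assumes an ungapped alignment of equal-length sequences, which is all this example needs.

```python
WALRUS = "MALWTHLLPLLALLALWAPAPSRAFVNQ"  # Odobenus rosmarus divergens
HUMAN = "MALWMRLLPLLALLALWGPDPAAAFVNQ"   # Homo sapiens


def match_mismatch_score(seq_a, seq_b, match=1, mismatch=-1):
    """Return (matches, mismatches, score) for two aligned, ungapped sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    mismatches = len(seq_a) - matches
    return matches, mismatches, matches * match + mismatches * mismatch


def percent_identity(seq_a, seq_b):
    """Exact matches as a percentage of the aligned length."""
    matches, _, _ = match_mismatch_score(seq_a, seq_b)
    return 100.0 * matches / len(seq_a)


if __name__ == "__main__":
    print(match_mismatch_score(WALRUS, HUMAN))
    print(round(percent_identity(WALRUS, HUMAN), 1))
```

Note that the score and the percent identity are different numbers: the score subtracts a penalty for each mismatch, while percent identity only counts the matching positions.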

% similarity scoring assigns specific values to every substitution

Percent similarity uses a substitution matrix
MALWTHLLPLLALLALWAPAPSRAFVNQ  Odobenus rosmarus divergens
MALWMRLLPLLALLALWGPDPAAAFVNQ  Homo sapiens

Non-conservative substitution
Biochemical properties of M and T (position 5 of the alignment: T in walrus, M in human)
- M (methionine): hydrophobic
- T (threonine): hydrophilic with a polar side group
Different biochemical properties mean that such a substitution could disrupt protein function, so it is counted as a negative substitution

The BLOSUM62 Scoring Matrix

Conservative substitution
Biochemical properties of H and R (position 6 of the alignment: H in walrus, R in human)
- H (histidine) and R (arginine) are both hydrophilic amino acids
Similar biochemical properties mean that such a substitution is unlikely to disrupt protein function, so it is counted as a neutral substitution

The BLOSUM62 Scoring Matrix
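As a rough illustration of matrix-based scoring, the Python sketch below scores the walrus/human alignment with only the BLOSUM62 entries that this alignment needs. The values are quoted from the standard BLOSUM62 table and should be checked against the matrix on the slide. The function names are illustrative, the alignment is treated as ungapped, and counting a pair as "positive" when its score is greater than zero is an assumption about how the + signs are assigned.

```python
WALRUS = "MALWTHLLPLLALLALWAPAPSRAFVNQ"  # Odobenus rosmarus divergens
HUMAN = "MALWMRLLPLLALLALWGPDPAAAFVNQ"   # Homo sapiens

# Partial BLOSUM62: diagonal entries for the residues that match here.
BLOSUM62_DIAGONAL = {
    "M": 5, "A": 4, "L": 4, "W": 11, "P": 7, "F": 6, "V": 4, "N": 6, "Q": 5,
}

# Partial BLOSUM62: off-diagonal entries for the six mismatched pairs.
# frozenset keys make the lookup symmetric (T/M is the same as M/T).
BLOSUM62_PAIRS = {
    frozenset("TM"): -1,
    frozenset("HR"): 0,
    frozenset("AG"): 0,
    frozenset("AD"): -2,
    frozenset("SA"): 1,
    frozenset("RA"): -1,
}


def blosum62(a, b):
    """Look up a score in the partial matrix above (KeyError if missing)."""
    if a == b:
        return BLOSUM62_DIAGONAL[a]
    return BLOSUM62_PAIRS[frozenset((a, b))]


def score_alignment(seq_a, seq_b):
    """Sum per-position scores and count identities and positives."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    scores = [blosum62(a, b) for a, b in zip(seq_a, seq_b)]
    return {
        "score": sum(scores),
        "identities": sum(a == b for a, b in zip(seq_a, seq_b)),
        "positives": sum(s > 0 for s in scores),  # assumed rule: score > 0
    }


if __name__ == "__main__":
    print(score_alignment(WALRUS, HUMAN))
```

The raw sum computed here is not the bit score that blastp reports, which is rescaled and also accounts for gaps. Under the score-greater-than-zero rule, the H/R pair (score 0) is not counted among the positives, while the S/A pair (score 1) is.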

Derivation of the substitution matrix
$S_{ij} = \log\left(\frac{q_{ij}}{p_i p_j}\right)$
- $S_{ij}$ is the score in the substitution matrix for amino acid i being substituted for j
- $q_{ij}$ is the observed frequency of the substitution of amino acid i with j
- The observed frequency is found by comparing known sequences: aligning these sequences and calculating the frequency of substitutions
- This is done using different sequences and with different assumptions, leading to different scoring matrices; we will use the BLOSUM62 matrix
- $p_i$ is the frequency of i in the database
- $p_j$ is the frequency of j in the database
- $p_i p_j$ is the probability of randomly pairing (aligning) i and j

The substitution matrix is not based on the biochemical properties of the amino acids but on how often substitutions between two amino acids are seen
- If a substitution between two amino acids is seen often, it is likely to maintain the function of the protein
- If a substitution between two amino acids is a rare event, it is likely to disrupt the function of the protein
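The log-odds formula can be tried out on toy numbers. The Python sketch below follows the slide's form of the formula; the frequencies are made up for illustration and are not taken from any real database. The base-2 logarithm, the scale factor of 2 (half-bit units), and the rounding to an integer are assumptions about how matrix entries are typically tabulated, and the function name is illustrative.

```python
import math


def log_odds_score(q_ij, p_i, p_j, scale=2.0):
    """Rounded, scaled log-odds score: scale * log2(q_ij / (p_i * p_j)).

    q_ij: observed frequency of i aligned with j
    p_i, p_j: background frequencies of i and j
    """
    if min(q_ij, p_i, p_j) <= 0:
        raise ValueError("frequencies must be positive")
    return round(scale * math.log2(q_ij / (p_i * p_j)))


if __name__ == "__main__":
    # Hypothetical frequencies: chance pairing would give 0.25 * 0.25 = 0.0625.
    print(log_odds_score(0.125, 0.25, 0.25))    # seen twice as often as chance
    print(log_odds_score(0.0625, 0.25, 0.25))   # seen exactly as often as chance
    print(log_odds_score(0.03125, 0.25, 0.25))  # seen half as often as chance
```

A pair observed more often than chance gets a positive score, a pair observed as often as chance gets zero, and a pair observed less often than chance gets a negative score, which is the sign pattern the slide describes.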

Conclusions
- Amino acid substitutions can be conservative or non-conservative (biochemical definition vs. statistical definition)
- Percent identity counts only exact matches, while percent similarity also includes conservative substitutions
- Homologs are sequences that share common ancestry; orthologs are homologs found in different species, and paralogs are homologs found in the same species
- A substitution matrix is used to calculate protein alignment scores

Worksheet