Introduction to Protein Translation, Databases and Structural Alignment BMI 730 Victor Jin Department of Biomedical Informatics Ohio State University.

Review of Protein Function and Translation Database and Software 3-D Alignment

Protein function
Proteins are the basic building blocks of every cellular structure, from the smallest membrane-bound receptor to the largest organelle. Proteins are involved in all processes inside a cell:
a) Gene regulation
b) Metabolism
c) Signalling
d) Development
e) Structure

Proteins serve crucial roles in a cell
- Catalysis: Almost all chemical reactions in a living cell are catalyzed by protein enzymes. Example: alcohol dehydrogenase oxidizes alcohols to aldehydes or ketones.
- Transport: Some proteins transport substances such as oxygen and ions. Example: haemoglobin carries oxygen.
- Information transfer: For example, hormones. Insulin controls the amount of sugar in the blood.

Eukaryotic Translation
Translation of mRNA is highly regulated in multicellular eukaryotic organisms, whereas in prokaryotes regulation occurs mainly at the level of transcription. There is global regulation of protein synthesis.
- E.g., protein synthesis may be regulated in relation to the cell cycle or in response to cellular stresses such as starvation or accumulation of unfolded proteins in the endoplasmic reticulum.
- Mechanisms include regulation by signal-activated phosphorylation or dephosphorylation of initiation and elongation factors.

microRNA
Translation of particular mRNAs may be inhibited by small single-stranded microRNA molecules about 22 nucleotides long. MicroRNAs bind via base-pairing to 3' untranslated regions of mRNA along with a protein complex, RISC (RNA-induced silencing complex), inhibiting translation and in some cases promoting mRNA degradation.
- Tissue-specific expression of particular genome-encoded microRNAs is an essential regulatory mechanism controlling embryonic development.
- Some forms of cancer are associated with altered expression of microRNAs that regulate synthesis of proteins relevant to cell cycle progression or apoptosis.

Protein factors
Protein factors that mediate and control translation are more numerous in eukaryotes than in prokaryotes. Eukaryotic factors are designated with the prefix "e".
- Some factors are highly conserved across kingdoms. E.g., the eukaryotic elongation factor eEF1A is structurally and functionally similar to the prokaryotic EF-Tu (EF1A).
- In contrast, eEF1B, the eukaryotic equivalent of the GEF EF-Ts, is relatively complex, having multiple subunits subject to regulatory phosphorylation.

Initiation
- Initiation of protein synthesis is much more complex in eukaryotes and requires a large number of protein factors.
- Some eukaryotic initiation factors (e.g., eIF3 and eIF4G) serve as scaffolds, with multiple domains that bind other proteins during assembly of large initiation complexes.

Pre-initiation complex
Usually a pre-initiation complex forms, including:
- several initiation factors
- the small ribosomal subunit
- the loaded initiator tRNA, Met-tRNAi^Met.
This then binds to a separate complex that includes:
- mRNA
- initiation factors, including ones that interact with the 5' methylguanosine cap and the 3' poly-A tail, structures unique to eukaryotic mRNA.
Within this complex, mRNA is thought to circularize via interactions between factors that associate with the 5' cap and with a poly-A binding protein.

Translocation
- After the initiation complex assembles, it translocates along the mRNA in a process called scanning, until the initiation codon is reached.
- Scanning is facilitated by eukaryotic initiation factor eIF4A, which functions as an ATP-dependent helicase to unwind mRNA secondary structure while releasing bound proteins.
- A short sequence of bases adjacent to the AUG initiation codon may aid in recognition of the start site.
- After the initiation codon is recognized, GTP is hydrolyzed and initiation factors are released as the large ribosomal subunit joins the complex and elongation commences.

Protein Translation Demo
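
As a small illustration of the scanning idea described above, the following Python sketch finds the first AUG in an mRNA string and translates codons until a stop codon. It is a simplification: it ignores the cap, the initiation factors, and the sequence context around the start codon. The function name `translate_from_first_aug` and the example sequence are made up for this illustration.

```python
# Standard genetic code, written compactly in U, C, A, G order
# (first base varies slowest, third base fastest).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_from_first_aug(mrna):
    """Translate from the first AUG to the first in-frame stop codon.

    Assumes the input contains only A, C, G and U (or T) characters.
    Returns an empty string if no AUG is present.
    """
    mrna = mrna.upper().replace("T", "U")
    start = mrna.find("AUG")  # crude stand-in for ribosomal scanning
    if start == -1:
        return ""
    protein = []
    for pos in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[pos:pos + 3]]
        if residue == "*":  # stop codon: UAA, UAG or UGA
            break
        protein.append(residue)
    return "".join(protein)


# Hypothetical example: 5' leader "GG", then AUG GCC UAA.
print(translate_from_first_aug("GGAUGGCCUAAUU"))  # MA
```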

Review of Protein Function and Translation Database and Software 3-D Alignment

Protein Databases
- UniProt is the universal protein database, a central repository of protein data created by combining Swiss-Prot, TrEMBL and PIR. This makes it the world's most comprehensive resource on protein information.
- The Protein Information Resource (PIR), located at Georgetown University Medical Center (GUMC), is an integrated public bioinformatics resource to support genomic and proteomic research, and scientific studies.
- Swiss-Prot is a curated biological database of protein sequences from different species, created in 1986 by Amos Bairoch during his PhD and developed by the Swiss Institute of Bioinformatics and the European Bioinformatics Institute.
- Pfam is a large collection of multiple sequence alignments and hidden Markov models covering many common protein domains and families.
- PDB
- NCBI

NCBI Entrez – Protein Databases
- The Protein database contains sequence data from the translated coding regions of DNA sequences in GenBank, EMBL, and DDBJ, as well as protein sequences submitted to the Protein Information Resource (PIR), SWISS-PROT, the Protein Research Foundation (PRF), and the Protein Data Bank (PDB) (sequences from solved structures).
- The Structure database, or Molecular Modeling Database (MMDB), contains experimental data from crystallographic and NMR structure determinations. The data for MMDB are obtained from the Protein Data Bank (PDB). The NCBI has cross-linked structural data to bibliographic information, to the sequence databases, and to the NCBI taxonomy.
- Use Cn3D, the NCBI 3D structure viewer, for easy interactive visualization of molecular structures from Entrez.
Cn3D Tutorial:

Example – UniProt - Expasy

Example – PDB Only proteins with known structures are included.

Example – PDB
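
Structure comparison usually starts from the alpha-carbon coordinates stored in a PDB entry. The sketch below reads the fixed-column ATOM records of the PDB text format and keeps only the CA atoms. The three sample lines use made-up coordinates for illustration and are not taken from a real entry; the function name `parse_ca_atoms` is also hypothetical.

```python
# Made-up ATOM records laid out in the fixed-column PDB text format.
SAMPLE_PDB = """\
ATOM      1  N   MET A   1      27.340  24.430   2.614
ATOM      2  CA  MET A   1      26.266  25.413   2.842
ATOM      9  CA  GLN A   2      24.404  26.308   5.373
"""


def parse_ca_atoms(pdb_text):
    """Return (chain, residue number, residue name, (x, y, z)) per CA atom."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # skip HETATM, REMARK, and other record types
        if line[12:16].strip() != "CA":
            continue  # atom name lives in columns 13-16
        res_name = line[17:20].strip()  # columns 18-20
        chain_id = line[21]             # column 22
        res_seq = int(line[22:26])      # columns 23-26
        xyz = (
            float(line[30:38]),         # x, columns 31-38
            float(line[38:46]),         # y, columns 39-46
            float(line[46:54]),         # z, columns 47-54
        )
        atoms.append((chain_id, res_seq, res_name, xyz))
    return atoms


for atom in parse_ca_atoms(SAMPLE_PDB):
    print(atom)
```

Slicing by column rather than splitting on whitespace matters here, because adjacent fields in real PDB files can touch when values are wide.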

Protein Visualization Software
- Cn3D
- RasMol
- TOPS
- Chime
- DSSP
- Molscript
- Ribbons
- MSMS
- Surfnet
- …

Cn3D

Review of Protein Function and Translation Database and Software 3-D Alignment

Why Align Structures
1. For homologous proteins (similar ancestry), this provides the "gold standard" for sequence alignment and elucidates the common ancestry of the proteins.
2. For nonhomologous proteins, it allows us to identify common substructures of interest.
3. It allows us to classify proteins into clusters, based on structural similarity.

Example of Structural Homologs

Sequence alignment
SLSAAEADLAGKSWAPVFANKNANGLDFLVALFEKFPDSANFFADFK-GKSVADIKA-S
VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHG

PKLRDVSSRIFTRLNEFVNNAANAGKMSAMLSQFAKEHVGFGVGSAQFENVRSMFPGFVA
KKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTP

Structural alignment
XSLSAAEADLAGKSW-APVFANKN-ANGLDFLVALFEKFPDSANFF-ADFKGKSVA---DIK
V-LSPADKTNVKAAWGK-VGAHA-GEYGAEALERMFLSFPTTKTYFPHF DLS-H

ASPKLRDVSSRIFTRLNEFVNNAANAGKMSA-MLSQ-FAKEHV-GFGVGSAQFENVRSM-F
GSAQVKGHGKKVADALTNAVAHV-D---DMPNAL---SALSDLHAHKLRVDPVNFKLLS-HCL

PGFVA
LVTLAAHLPAEFTP

Sequence/Structure Homology
- The existence of large numbers of remote homologs shows that true structural similarity is hard to see in the primary amino acid sequence.
- Structural conservation is stronger than sequence conservation.

Remote Homology
Remote homologs sometimes conserve function (all SH3-like domains bind peptides) and often conserve active-site locations (TIM barrel active sites are at the ends of the barrels). Remote homologs are probably evolutionarily related and fold using the same folding pathway.

Example of Structural Homologs
- 4DFR: Dihydrofolate reductase
- 1YAC: Octameric hydrolase of unknown specificity
- 5.9% sequence identity (best alignment)
The 1YAC structure was solved without knowing its function. Alignment to 4DFR and others implies that it is a hydrolase of some sort.

Example of Structural Homologs (figure): DHFR in yellow and orange, YAC in green and purple. Panels: sheets only; helices only.

Sander-Schneider Relationship
- "Naturally occurring sequences with more than 25% sequence identity over 80 or more residues always adopt the same basic structure".
- It applies only to the naturally occurring proteins of known structure seen so far, and a few exceptions exist.
- It is the basis of comparative modeling: the structural similarity implied by the relationship is a means to predict structure.
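
The rule quoted above can be checked mechanically for any pairwise alignment. The sketch below computes percent identity over the aligned (non-gap) columns and applies the 25% / 80-residue thresholds from the slide. The function names and the convention of counting only columns without gaps are assumptions made for this illustration; other definitions of percent identity exist.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Return (percent identity, number of aligned columns).

    Both inputs are rows of one pairwise alignment and must have equal
    length. Only columns where neither row has a gap are counted.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("alignment rows must have the same length")
    aligned = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue
        aligned += 1
        if res_a == res_b:
            identical += 1
    if aligned == 0:
        return 0.0, 0
    return 100.0 * identical / aligned, aligned


def passes_sander_schneider(aligned_a, aligned_b,
                            min_identity=25.0, min_length=80):
    """True if identity exceeds 25% over at least 80 aligned residues."""
    identity, length = percent_identity(aligned_a, aligned_b)
    return identity > min_identity and length >= min_length


print(percent_identity("ACD-EF", "ACDGQF"))  # (80.0, 5)
```

A pair such as 4DFR and 1YAC, at 5.9% identity, falls far below the threshold, which is why their similarity is only visible through structural alignment.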

How to Align Structures
1. Visual inspection (by eye)
2. Computational approach
- Point-based methods use point distances and other properties to establish correspondences.
- Secondary structure-based methods use vectors representing secondary structures to establish correspondences.

Global versus Local Global alignment

Local Alignment motif

Structural Alignment Algorithms
Alignment algorithms create a one-to-one mapping of subset(s) of one sequence to subset(s) of another sequence. Structure-based alignment algorithms do this by minimizing a structure difference score, such as the root-mean-square difference (rmsd) in alpha-carbon positions. The problem is that we don't know the alignment in advance, so structure-based alignment programs search for the alignment that minimizes the rmsd.

Evaluating Structural Alignments
- Number of aligned residues
- Percent identity in aligned residues
- Number of gaps
- Size of the two proteins
- Conservation of known active site environments
- RMSD (root mean square deviation) of corresponding residues
- Dihedral angle difference
- …
There is no universal criterion; the choice is application dependent.

Least Squares Superposition
Problem: find the rotation matrix R and a vector v that minimize the following quantity:

$$E(R, v) = \sum_{i=1}^{N} \lVert R\,x_i + v - y_i \rVert^2$$

where x_i are the coordinates from one molecule and y_i are the equivalent* coordinates from another molecule.
*equivalent based on alignment
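
One standard way to solve this least-squares problem, once the equivalent atom pairs are fixed, is the SVD-based Kabsch algorithm. The slide does not say which method was intended, so the NumPy sketch below is only one possible approach; the function name `superpose` and the test coordinates are made up for illustration.

```python
import numpy as np


def superpose(x, y):
    """Find rotation R and translation v minimizing sum |R x_i + v - y_i|^2.

    x and y are (N, 3) arrays of equivalent coordinates.
    Returns (R, v, rmsd) using the Kabsch algorithm.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean = x.mean(axis=0)
    y_mean = y.mean(axis=0)
    xc = x - x_mean  # centre both point sets on their centroids
    yc = y - y_mean
    h = xc.T @ yc    # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Guard against an improper rotation (a reflection).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    v = y_mean - rot @ x_mean
    fitted = x @ rot.T + v
    rmsd = float(np.sqrt(np.mean(np.sum((fitted - y) ** 2, axis=1))))
    return rot, v, rmsd


# Hypothetical check: rotate and shift a small point set, then recover it.
theta = np.radians(30.0)
true_rot = np.array([
    [np.cos(theta), -np.sin(theta), 0.0],
    [np.sin(theta), np.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
])
x = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.2, 0.0],
              [0.3, 1.0, 2.0], [2.0, 2.0, 1.0]])
y = x @ true_rot.T + np.array([3.0, -1.0, 2.0])
rot, v, rmsd = superpose(x, y)
print(round(rmsd, 6))
```

The optimal translation always maps one centroid onto the other, which is why the rotation can be found on centred coordinates alone.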

Comparing dihedral angles
Torsion angles (φ, ψ) are:
- local by nature (error propagation)
- invariant upon rotation and translation of the molecule
- compact (O(n) angles for a protein of n residues)
Figure: add 1 degree to all φ, ψ.
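
A torsion angle is defined by four consecutive atoms, and its invariance under translation is easy to confirm numerically. The following sketch uses only the standard library; the function name `dihedral` and the test points are made up for illustration, and the sign convention is the one produced by the atan2 formula shown.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees about the p1-p2 bond, in (-180, 180]."""
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal of the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal of the plane through p1, p2, p3
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


points = [(1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)]
shifted = [(px + 5, py - 3, pz + 2) for px, py, pz in points]
print(dihedral(*points), dihedral(*shifted))  # same angle before and after
```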

Structural Alignment Methods
- STRUCTAL [Levitt, Subbiah, Gerstein]: dynamic programming with a distance metric
- DALI [Holm, Sander]: analysis of distance maps
- LOCK [Singh, Brutlag]: analysis of secondary structure vectors, followed by refinement with distances
- SSAP [Orengo and Taylor, 1989]
- VAST [Gibrat et al., 1996]
- CE [Shindyalov and Bourne, 1998]
- SSM [Krissinel and Henrick, 2004]
- …

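Two Subproblems
1. Find the correspondence set.
2. Find the alignment transform (the protein superposition problem).
This is a chicken-and-egg problem: each subproblem is easy to solve once the other has been solved.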
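
The correspondence subproblem can be sketched with dynamic programming once the two chains are already in a common frame. The example below uses a STRUCTAL-style similarity, 20 / (1 + (d / 2.24)^2), with a gap penalty of 10; these constants, the linear gap model, and the function names are assumptions made for this illustration rather than a faithful reimplementation of any published program. In practice this step would alternate with a superposition step until the alignment stops changing.

```python
import math


def distance_score(d, max_score=20.0, d0=2.24):
    """Similarity that decays with inter-atom distance d (in angstroms)."""
    return max_score / (1.0 + (d / d0) ** 2)


def align_by_distance(a, b, gap=10.0):
    """Global DP alignment of two CA traces that share a coordinate frame.

    Returns (pairs, total score); pairs lists matched indices (i, j).
    """
    n, m = len(a), len(b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0], move[i][0] = -gap * i, "up"
    for j in range(1, m + 1):
        score[0][j], move[0][j] = -gap * j, "left"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = distance_score(math.dist(a[i - 1], b[j - 1]))
            options = [
                (score[i - 1][j - 1] + match, "diag"),  # pair a[i-1], b[j-1]
                (score[i - 1][j] - gap, "up"),          # a[i-1] unmatched
                (score[i][j - 1] - gap, "left"),        # b[j-1] unmatched
            ]
            score[i][j], move[i][j] = max(options, key=lambda o: o[0])
    pairs = []
    i, j = n, m
    while i > 0 or j > 0:  # trace back from the bottom-right corner
        step = move[i][j]
        if step == "diag":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif step == "up":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs, score[n][m]


# Hypothetical traces: chain_b is chain_a with its second residue deleted.
chain_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
chain_b = [(0.0, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
pairs, total = align_by_distance(chain_a, chain_b)
print(pairs)  # [(0, 0), (2, 1), (3, 2)]
```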

DALI (Distance ALIgnment)
DALI has been used to do an all-vs-all comparison of proteins in the PDB and to create a hierarchical clustering of families.
FSSP = fold classification based on structure-structure alignment of proteins
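
Distance-map methods compare intramolecular distance matrices, which do not change when a molecule is rotated or translated, so no superposition is needed first. The sketch below illustrates only that property on equal-length fragments; it is not the DALI scoring function, and the function names and coordinates are made up for this example.

```python
import math


def distance_map(coords):
    """Matrix of all pairwise distances within one fragment."""
    return [[math.dist(p, q) for q in coords] for p in coords]


def mean_distance_difference(coords_a, coords_b):
    """Average |d_A(i, j) - d_B(i, j)| over all residue pairs i < j."""
    if len(coords_a) != len(coords_b):
        raise ValueError("fragments must have the same number of residues")
    map_a = distance_map(coords_a)
    map_b = distance_map(coords_b)
    n = len(coords_a)
    n_pairs = n * (n - 1) // 2
    if n_pairs == 0:
        return 0.0
    total = sum(abs(map_a[i][j] - map_b[i][j])
                for i in range(n) for j in range(i + 1, n))
    return total / n_pairs


fragment = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (5.0, 3.6, 0.0), (8.1, 4.0, 2.0)]
# Rotate 90 degrees about z, then translate: the distance map is unchanged.
moved = [(-y + 10.0, x - 4.0, z + 1.0) for x, y, z in fragment]
# Stretch by a factor of two: every distance doubles.
stretched = [(2 * x, 2 * y, 2 * z) for x, y, z in fragment]
print(mean_distance_difference(fragment, moved))
print(mean_distance_difference(fragment, stretched))
```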

VAST (Vector Alignment Search Tool)
VAST places great emphasis on defining the threshold of significant structural similarity, in order to avoid the many similarities of small substructures that occur by chance in protein structure comparison. At the heart of VAST's significance calculation is the definition of the "unit" of tertiary structure similarity as pairs of secondary structure elements (SSEs) that have similar type, relative orientation, and connectivity. In comparing two protein domains, the most surprising substructure similarity is the one where the sum of superposition scores across these "units" is greatest.
abs.html#Ref_6

Exercises
1. Look up Human Catalase in [...]. Find out:
- How long is the protein chain?
- Where is its active site?
- Is its 3D structure available? If so, how was it obtained?
- How long is its longest helix chain and where is it located?
2. Look up PDB ID 1DGB in PDB. Find out:
- What protein is it?
- What is the resolution of its X-ray structure?
- Visualize its structure using the tools provided on the PDB website (try them all).
3. Look up PDB ID 1DGB in MMDB (the NCBI Structure database). Find out:
- What is its MMDB ID?
- Visualize its 3D structure using Cn3D. Export the images for different rendering effects (e.g., worm, spacefill).
- Search its structure neighbors using VAST. How many neighbors are found for the entire chain?
4. Perform a VAST search for 2CZU chain A.
- View its alignment (in sequence) with 1X8P chain A, 1GKA chain B, and 1BJ7.
- Compare the structure alignment results with sequence alignment results (using ClustalW).
- View its alignment with 1X8P chain A in Cn3D.