MED260 Modeling Protein Function - October 11, 2006
Philip E. Bourne, Department of Pharmacology, UCSD

Presentation transcript:


Agenda
- Why model protein function?
- Where does it fit as a technique in modern medical research?
- The data deluge as a motivator
- The extent of what can be modeled
- Ontologies – establishing order from chaos
- Examples of what can be learnt
- Accuracy – a word of caution

Why Model Protein Function?
- The rate of discovery of new proteins far outpaces our ability to characterize them functionally
- Functional discovery of new proteins has implications for:
  – Drug discovery
  – Biomarker identification
  – Understanding of biological processes
  – Identification of disease states and treatment regimes

Where does it fit as a technique in modern medical research?
[Figure: scientific research and discovery across levels of biological complexity. Example units: organisms, organs, cells, macromolecules, biopolymers, atoms & molecules. Representative disciplines: anatomy, physiology, cell biology, proteomics, genomics, medicinal chemistry, chemistry. Representative technologies: MRI, ventricular modeling, electron microscopy, X-ray crystallography, protein docking. Other labels: heart, neuron, structure, sequence, protease inhibitor, migratory sensors.]

[Figure: the same chart of disciplines, example units, and technologies, now labeled "Translational Medicine"]

The Ability to Model Protein Function Influences and Can Be Influenced by Any Level of Biological Complexity: Examples
- Genome – rapid increase in sequenced genomes provides new raw material
- Proteome – large increase in the number of 3D structures highlights new functions
- Interactome – identification of a binding partner points to a new function
- Metabolome – isolation of a protein within a metabolic pathway
- Cell – localization points to function
- Organ – gene expression in heart tissue points to function
- Organism – different physiology observed in species can be related to protein functions

[Figure: the same chart of disciplines, example units, and technologies, with one region marked "We will focus here"]

The Data Deluge
At All Levels We Are Being Driven By Data
[Figure: biological experiment → data → information → knowledge → discovery, with the steps collect, characterize, compare, model, infer. Levels: sequence, structure, assembly, sub-cellular, cellular, organ, higher life. Plotted against a year axis, with complexity and technology as trends: computing power, sequencing technology, data, number of people per web site, virtual communities. Milestones shown: E. coli genome, yeast genome, C. elegans genome, ESTs, gene chips, 1 small genome per month, Human Genome Project, human genome, virus structure, ribosome, model metabolic pathway of E. coli, genetic circuits, neuronal modeling, cardiac modeling, brain mapping.]

Metagenomics: A First Look
- New type of genomics
- New data (and lots of it) and new types of data
  – 17M new (predicted) proteins! 4-5x growth in just a few months, and much more coming
  – New challenges and exacerbation of old challenges

Metagenomics: First Results
- More than 99.5% of DNA in every environment studied represents unknown organisms
  – Culturable organisms are exceptions, not the rule
- Most genes represent distant homologs of known genes, but there are thousands of new families
- Everything we touch turns out to be a gold mine
- Environments studied:
  – Water (ocean, lakes)
  – Soil
  – Human body (gut, oral cavity, human microbiome)

Metagenomics: New Discoveries
[Figure: environmental (red) vs. currently known (blue) PTPases; label: higher eukaryotes]

The Good News and the Bad News
- Good news
  – Data pointing towards function are growing at near-exponential rates
  – IT can handle it on a per-dollar basis
- Bad news
  – Data are growing at near-exponential rates
  – Quality is highly variable
  – Accurate functional annotation is sparse

Genomes
- We all know about the human – what is not so well known is:
  – 191 completed microbial genomes
  – 44 archaea
  – 727 bacteria
  – 785 eukaryotes (complete or in progress)
  – Viroids ….

The Extent of What Can Be Modeled
Proteome
- We are reasonably good, but not perfect, at finding proteins in genomes with intergenic regions (e.g., alternative initiation codons are a problem)
- Regulatory elements provide a different set of challenges
- We are not so good at assigning functions to those proteins
- Moreover, the devil is in the details

[Figure: estimated functional roles (by % of proteins) of the proteome in a complex organism]

Ontologies – establishing order from chaos
Functional Nomenclature Needs to Be Consistent for Orderly Progress – Enter EC and GO
- EC classifies all enzymes - e/ e/
- The Gene Ontology Consortium characterizes gene products by molecular function, biochemical process, and cellular location
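To make the value of a consistent nomenclature concrete, here is a minimal sketch of how an EC number can be handled as a four-level hierarchy (class, subclass, sub-subclass, serial number). The names `ECNumber`, `parse_ec`, and `shared_levels` are illustrative helpers invented for this example and are not part of any EC or GO tooling. The example entries (2.7.11.11 for cAMP-dependent protein kinase, 2.7.11.1 for a non-specific serine/threonine kinase) are given from memory of the EC list and should be checked against the current release.

```python
from typing import NamedTuple, Optional

# Top-level EC classes. Class 7 (translocases) was added in 2018,
# after this lecture was given.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}


class ECNumber(NamedTuple):
    """Hypothetical container for the four EC levels (None = unassigned)."""
    enzyme_class: int
    subclass: Optional[int]
    sub_subclass: Optional[int]
    serial: Optional[int]


def parse_ec(text: str) -> ECNumber:
    """Parse a string such as 'EC 2.7.11.11' or '3.4.-.-' into four levels."""
    body = text.strip()
    if body.upper().startswith("EC"):
        body = body[2:].lstrip(": ")
    parts = body.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated levels, got {text!r}")
    # A dash marks a level that has not been assigned.
    levels = [None if part == "-" else int(part) for part in parts]
    if levels[0] not in EC_CLASSES:
        raise ValueError(f"unknown top-level EC class in {text!r}")
    return ECNumber(*levels)


def shared_levels(a: ECNumber, b: ECNumber) -> int:
    """Count how many leading levels two EC numbers agree on (0 to 4)."""
    count = 0
    for x, y in zip(a, b):
        if x is None or x != y:
            break
        count += 1
    return count


pka = parse_ec("EC 2.7.11.11")
print(EC_CLASSES[pka.enzyme_class])
print(shared_levels(pka, parse_ec("2.7.11.1")))
```

Because the levels are ordered from general to specific, the number of shared leading levels gives a crude but consistent measure of how similar two enzyme annotations are, which free-text function descriptions cannot offer.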

Functional Coverage of the Human Genome
- 40% covered

Examples of what can be learnt
Step 1. Learn What You Can from the Protein Sequence
- Find it
- Pay attention to the quality of the functional annotation – errors are transitive
- Understand its 1-D structure – domain organization, {signatures, fingerprints}
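As an illustration of what a sequence "signature" is, the sketch below converts a PROSITE-style pattern into a regular expression and scans a sequence with it. The lecture does not name a specific signature database or pattern; PROSITE syntax and the ATP/GTP-binding P-loop motif `[AG]-x(4)-G-K-[ST]` are chosen here as an assumption. The test sequence is a short fragment modeled on the N-terminus of a Ras-family GTPase and is for illustration only. The helper names are hypothetical, and the conversion handles only the common syntax elements (it does not handle anchors inside brackets).

```python
import re
from typing import List, Tuple


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    pattern = pattern.strip().rstrip(".")   # patterns end with a period
    pieces = []
    for element in pattern.split("-"):      # elements are dash-separated
        element = element.replace("{", "[^").replace("}", "]")  # {P} = not P
        element = element.replace("(", "{").replace(")", "}")   # x(4) = 4 times
        element = element.replace("x", ".")                     # x = any residue
        element = element.replace("<", "^").replace(">", "$")   # terminus anchors
        pieces.append(element)
    return "".join(pieces)


def find_signature(pattern: str, sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based start, matched text) for non-overlapping matches."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.group()) for m in regex.finditer(sequence.upper())]


# Assumed example pattern: the P-loop (Walker A) nucleotide-binding motif.
P_LOOP = "[AG]-x(4)-G-K-[ST]."
# Illustrative fragment only, not a complete protein sequence.
fragment = "MTEYKLVVVGAGGVGKSALTIQLIQ"

print(prosite_to_regex(P_LOOP))
print(find_signature(P_LOOP, fragment))
```

A hit of this kind suggests nucleotide binding but does not establish it: short patterns match by chance, so a signature is a pointer to follow up, in keeping with the caution about annotation quality above.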

Step 2. Is There a 3D Structure? If So, What Can You Learn from That?
- Find it
- Understand it
- Characterize it
- Understand its function(s) – these follow a power law at the fold level – some folds are promiscuous (many functions), others are solitary or of unknown function

(a) myoglobin (b) hemoglobin (c) lysozyme (d) transfer RNA (e) antibodies (f) viruses (g) actin (h) the nucleosome (i) myosin (j) ribosome Courtesy of David Goodsell, TSRI

First, Why Bother with Structure? An Example: Protein Kinase A
- This "molecular scene" for cAMP-dependent protein kinase depicts years of collective knowledge.
- Beyond basics, only the atomic coordinates are captured by the PDB.
- Functional annotation requires the literature.

What Did that Picture Tell Us?
- Two domains with associated functions: ATP binding and substrate binding
- Through conserved residues and their spatial location: details of ATP and substrate binding and the mechanism of the phosphotransfer reaction
- So is structure the answer to functional modeling?

Question: So is structure the answer to functional modeling?
Answer: Partly. The number of unique protein sequences still outnumbers the number of unique structures by 100:1
- Enter Structural Genomics
- Enter Structure Prediction

The Structural Genomics Pipeline (X-ray Crystallography)
Basic steps:
1. Target Selection
2. Crystallomics: Isolation, Expression, Purification, Crystallization
3. Data Collection
4. Structure Solution
5. Structure Refinement
6. Functional Annotation
7. Publish

Structural Genomics Will Give Us...
- Good news
  – More structures (definitely)
  – New folds (some, but not as many as anticipated)
  – New understanding of specific diseases and pathways (maybe)
  – Representatives from each major protein family (maybe)
- Bad news
  – Many new structures that are functionally unclassified (definitely)

What About Structure Prediction?
- Current rule: we will be able to predict a structure when we know all the structures

Why Is Structure Prediction So Hard?
[Figure: % sequence identity vs. alignment length for a random sample of 1000 structurally similar PDB polypeptide chains with z > 4.5; regions labeled "Twilight Zone" and "Midnight Zone"]
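The two quantities plotted on that slide can be computed from any pairwise alignment. The sketch below is a minimal illustration: it counts identical residues over aligned (non-gap) columns and then assigns a zone. The slide gives no numeric boundaries, so the cutoffs used here (35% and 20% identity) are assumptions based on commonly quoted ranges, and they ignore the dependence on alignment length that the plot itself shows. The function names are hypothetical.

```python
from typing import Tuple


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> Tuple[float, int]:
    """Return (% identity, number of aligned columns) for two aligned strings."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    aligned = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue            # skip columns where either sequence has a gap
        aligned += 1
        if a == b:
            identical += 1
    if aligned == 0:
        return 0.0, 0
    return 100.0 * identical / aligned, aligned


def zone(identity_pct: float,
         twilight_upper: float = 35.0,   # assumed cutoff, not from the lecture
         midnight_upper: float = 20.0) -> str:  # assumed cutoff, not from the lecture
    """Classify a pairwise identity into a rough similarity zone."""
    if identity_pct >= twilight_upper:
        return "safe"
    if identity_pct >= midnight_upper:
        return "twilight"
    return "midnight"


identity, columns = percent_identity("AC-DE", "ACWDF")
print(identity, columns, zone(identity))
```

The point of the plot is that structurally similar chains routinely fall in the lower zones, where sequence comparison alone cannot detect the relationship; a short alignment with high identity can also be less meaningful than a long one with modest identity.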

Approaches to Structure Prediction
- Homology modeling
- Threading (aka fold recognition)
- Ab initio
- How well do we do? – see CASP
- Consensus servers
  – Eva -
  – LiveBench -

Step 3. What Can Be Got from Structure When You Have It?
[Figure from Structural Bioinformatics, ed. Bourne and Weissig, p. 394, Wiley, 2002]

Specific Example: Mj0577 – putative ATP molecular switch
Mj0577 is an open reading frame (ORF) of previously unknown function from Methanococcus jannaschii. Its structure was determined at 1.7 Å (Figure 7a) (Zarembinski et al., 1998). The structure contains a bound ATP molecule, picked up from the E. coli host. The presence of bound ATP led to the proposition that Mj0577 is either an ATPase or an ATP-binding molecular switch. Further experimental work showed that Mj0577 cannot hydrolyse ATP by itself, and can only do so in the presence of M. jannaschii crude cell extract. Therefore it is more likely to act as a molecular switch, in a process analogous to ras-GTP hydrolysis in the presence of GTPase activating protein.
From Structural Bioinformatics, ed. Bourne and Weissig, p. 402, Wiley, 2002

Step 4. Proteins Do Not Function in Isolation But Are Part of Complex Interaction Networks

Accuracy - A Word of Caution
- Errors are transitive
  – Proteins A and B are observed to have similar functions through sequence homology
  – Proteins B and C are observed to have similar functions through sequence homology
  – Is protein A related to protein C?
  – Up to 30% of current annotation may be wrong
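The A, B, C argument can be made concrete with a toy model. The sketch below assumes one common way the chain breaks down, multi-domain proteins: A and C each share a different domain with B, so both pairwise comparisons look fine while A and C have nothing in common. The domain names, the `similar` rule (share at least one domain), and the `propagate` helper are all invented for illustration and do not describe any real annotation pipeline.

```python
from typing import Dict, Set

# Hypothetical proteins described by their domain content.
DOMAINS: Dict[str, Set[str]] = {
    "A": {"kinase"},
    "B": {"kinase", "SH2"},
    "C": {"SH2"},
}


def similar(domains: Dict[str, Set[str]], p: str, q: str) -> bool:
    """Toy similarity rule: two proteins are 'similar' if they share a domain."""
    return bool(domains[p] & domains[q])


def propagate(domains: Dict[str, Set[str]], seed: str, label: str) -> Dict[str, str]:
    """Naively copy `label` from `seed` along chains of pairwise similarity."""
    annotated = {seed: label}
    frontier = [seed]
    while frontier:
        current = frontier.pop()
        for other in domains:
            if other not in annotated and similar(domains, current, other):
                annotated[other] = label      # transfer without checking the seed
                frontier.append(other)
    return annotated


print(similar(DOMAINS, "A", "B"), similar(DOMAINS, "B", "C"), similar(DOMAINS, "A", "C"))
print(propagate(DOMAINS, "A", "protein kinase"))
```

In this toy case the label "protein kinase" reaches C even though C has no kinase domain. Once such a label is stored in a database, later transfers treat it as evidence, which is how an error in one entry spreads to others.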

Questions?

Demo of Steps 1-4
- Step 1. Learn What You Can from the Protein Sequence
- Step 2. Is There a 3D Structure? If So, What Can You Learn from That?
- Step 3. What Can Be Got from Structure When You Have It?
- Step 4. Proteins Do Not Function in Isolation But Are Part of Complex Interaction Networks