The RCSB Protein Data Bank Teaching an Old Dog New Tricks

Presentation transcript:

The RCSB Protein Data Bank Teaching an Old Dog New Tricks Philip E. Bourne pbourne@ucsd.edu

A Tribute From the guardian of a resource (institution) to all those men and women who make biology possible – may we never take you for granted Biocurator Perspectives

Agenda The old dog New tricks Thinking differently about proteins Virtual Communities Internal (wwPDB) External What will the resource look like in 2-5 years?

History of the Old Dog
1970s: Community discussions about how to establish an archive of protein structures. Cold Spring Harbor meeting in protein crystallography. PDB established at Brookhaven (October 1971; 7 structures).
1980s: Number of structures increases as technology improves. Community discussions about requiring depositions. IUCr guidelines established. Number of structures deposited increases.
1990s: Ontology defined. Structural genomics begins. PDB moves to RCSB.
2000s: wwPDB formed.

Unchanging Core Mission Create and maintain a well-curated database of macromolecular structure data derived using experimental methods that is… Always accessible to a diverse user community worldwide Developed in collaboration with that community that will… Facilitate and support scientific research and education

Challenges - Scientific More complex structures – molecular machines, complexes New methods (e.g. EM) Lack of a vocabulary to provide reductionism in complex structures Partially solved problems in analyzing structures – structure alignments, domain definitions, functional site determination and characterization, pathway relationships, interaction partners Integrating microscopic and macroscopic views Disease relationships

Growth and Complexity [Figure: number of released entries by year]

Data Integration Primary References Derived References Some Actions Human Proteome & Homology Models Function Coverage Target Selection CATH Domains/ Families Source Organism Browser CATH Browser SCOP Browser PFAM Display Structure SCOP PFAM Source Organism SWISS-PROT/ GenBank IDs Pubmed Enzyme Commission NCBI Taxonomy Abstract Search Enzyme Browser Reactome Gene Ontology OMIM/ Disease Genomes (NCBI Gene) Structural Genomics Targets Disease Browser Target Search Genome Browser SNPs Mapped to Structure Find Structures by SP ID GO Browsers Find Structures by GO ID NAR 2005, 33: D233-D237

Challenges - Technical Sheer numbers Efficient visualization Improved annotation Demands from a more diverse user base Centralization versus decentralization Web V2

Diverse User Community (180,000 individuals per month) and Diversifying Further Structural biologists Computational biologists Experimental biologists Educators Students Lay public

Agenda The old dog New tricks Thinking differently about proteins Virtual Communities Internal (wwPDB) External What will the resource look like in 2-5 years?

New Tricks – Protein Representation The conventional view of a protein (left) has had a remarkable impact on our understanding of living systems, but is it time for a new view? It is not how one protein sees another after all.

Limitations of a Cartesian Viewpoint A local viewpoint – does not capture the global properties of the protein Limited to a single scale descriptor Limits comparative analysis New Tricks – Protein Representation

Protein Kinase A – Open Book View

Superfamily Members – The Same But Different

Alignment Violates the Triangle Inequality Many of the features in the distance matrix may be due to “distortions” induced by the failure to satisfy the triangle inequality (TI). Discrimination based on rmsd is poor, as illustrated by the breakdown of the inequality. Protein kinase like superfamily. Left: rmsd distance matrix. Right: number of violations of the triangle inequality at each pair of proteins. New Tricks – Protein Representation
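To make the counting in the right-hand panel concrete, here is a minimal sketch of how triangle-inequality violations could be tallied for each pair in a pairwise distance matrix. The function name, the tolerance, and the toy distances are illustrative assumptions, not values from the talk.

```python
from itertools import combinations

def triangle_violations(dist, tol=1e-9):
    """For each pair (i, j), count the third points k with d(i,j) > d(i,k) + d(k,j)."""
    n = len(dist)
    counts = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        for k in range(n):
            if k == i or k == j:
                continue
            # A violation: going through k is "shorter" than the direct distance.
            if dist[i][j] > dist[i][k] + dist[k][j] + tol:
                counts[i][j] += 1
                counts[j][i] += 1
    return counts

# Hypothetical pairwise rmsd-like values for three structures (not real data).
toy = [
    [0.0, 1.0, 3.0],
    [1.0, 0.0, 1.0],
    [3.0, 1.0, 0.0],
]
violations = triangle_violations(toy)
print(violations)
```

In the toy matrix only the pair (0, 2) is flagged, because its direct distance exceeds the path through structure 1. With alignment-based rmsd each pairwise value may come from a different set of aligned residues, which is one way such violations can arise.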

An Alternative Approach: Multipolar Representation Roots in spherical harmonics Parameter space and boundary conditions can be a variety of properties Order of the multipoles defines the granularity of the descriptors Bottom line – interpreted as shape descriptors Gramada & Bourne 2006 BMC Bioinformatics 7:242

Results – Protein Kinase Like Superfamily Alignment Scheeff & Bourne 2005 PLoS Comp. Biol., 1(5) e49 Clear distinction between families. Some clustering seen inside TPKs that resemble various groups, even though there is little shape discrimination at this level. New Tricks – Protein Representation

Possibilities – Structure Based Phylogenetic Analysis Scheeff & Bourne Multipoles New Tricks – Protein Representation

New Tricks – Protein Motion Structures exist in a spectrum from order to disorder Ordered Structures Disordered Structures

Obtaining Protein Dynamic Information Protein Structures Treated as a 3-D Elastic Network Bahar, I., A.R. Atilgan, and B. Erman Direct evaluation of thermal fluctuations in proteins using a single-parameter harmonic potential. Folding & Design, 1997. 2(3): p. 173-181. New Tricks – Protein Motion

Gaussian Network Model Each Cα is a node in the network. Each node undergoes Gaussian-distributed fluctuations influenced by neighboring interactions within a given cutoff distance (7 Å). Protein fluctuation is decomposed into a sum of different modes. New Tricks – Protein Motion
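The description above can be sketched in a few lines, assuming NumPy is available. This is a generic textbook-style Gaussian network model, not the code used for the talk: the function name is hypothetical, the 7 Å cutoff is taken from the slide, and the five-node straight chain with 3.8 Å spacing is a toy input rather than a real structure.

```python
import numpy as np

def gnm_modes(coords, cutoff=7.0):
    """Build the Kirchhoff (connectivity) matrix and decompose it into modes."""
    coords = np.asarray(coords, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Two nodes are in contact if they lie within the cutoff (self-contacts excluded).
    contact = (dist <= cutoff) & ~np.eye(len(coords), dtype=bool)
    kirchhoff = -contact.astype(float)
    np.fill_diagonal(kirchhoff, contact.sum(axis=1))
    eigvals, eigvecs = np.linalg.eigh(kirchhoff)
    # Drop the zero mode(s); a connected network has exactly one.
    keep = eigvals > 1e-8
    eigvals, eigvecs = eigvals[keep], eigvecs[:, keep]
    # Covariance up to a constant factor: sum over modes of u_k u_k^T / lambda_k.
    cov = (eigvecs / eigvals) @ eigvecs.T
    return eigvals, eigvecs, cov

# Toy input: five nodes on a straight line, 3.8 Å apart (hypothetical chain).
coords = [[3.8 * i, 0.0, 0.0] for i in range(5)]
eigvals, eigvecs, cov = gnm_modes(coords)
fluct = np.diag(cov)  # relative mean-square fluctuation per node
print(np.round(fluct, 3))
```

In this toy chain the end nodes fluctuate more than the middle node, which is the qualitative behavior one would expect for chain termini. The off-diagonal entries of `cov`, once normalized, give the cross-correlations between residues.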

Functional Flexibility Score Utilize correlated movements to help define regional flexibility with functional importance. Functionally Flexible Score For each residue: Find Maximum and Minimum Correlation. Use to scale normalized fluctuation to determine functional importance. Gu, Gribskov & Bourne 2006 PLoS Comp. Biol. 2(7) e90
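The slide does not give the formula, so the following is only a sketch of the general idea, assuming NumPy and a residue covariance matrix such as one produced by a network model. Multiplying the normalized fluctuation by the range of cross-correlations (maximum minus minimum) is an assumption made here for illustration; it should not be read as the published score of Gu, Gribskov & Bourne. The function name and the toy matrix are likewise hypothetical.

```python
import numpy as np

def flexibility_score_sketch(cov):
    """Scale each residue's normalized fluctuation by its correlation range."""
    cov = np.asarray(cov, dtype=float)
    fluct = np.diag(cov)
    norm_fluct = fluct / fluct.max()
    # Normalized cross-correlations between residues.
    corr = cov / np.sqrt(np.outer(fluct, fluct))
    off_diag = corr.copy()
    np.fill_diagonal(off_diag, np.nan)  # ignore self-correlation (always 1)
    max_corr = np.nanmax(off_diag, axis=1)
    min_corr = np.nanmin(off_diag, axis=1)
    # ASSUMPTION: the combining rule below is illustrative, not the published one.
    return norm_fluct * (max_corr - min_corr)

# Toy 3-residue covariance matrix (hypothetical values).
toy_cov = [
    [2.0, 1.0, -1.0],
    [1.0, 1.0, 0.0],
    [-1.0, 0.0, 1.0],
]
scores = flexibility_score_sketch(toy_cov)
print(np.round(scores, 3))
```

Under this assumed rule, a residue scores highly when it both fluctuates strongly and moves in a correlated way with some residues and an anti-correlated way with others.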

Identifying FFRs in HIV Protease Gu, Gribskov & Bourne 2006 PLoS Comp. Biol. 2(7) e90

Other Examples BPTI and Calmodulin Gu, Gribskov & Bourne 2006 PLoS Comp. Biol. 2(7) e90

Side Note: Gaussian Network Model vs Molecular Dynamics GNM relatively coarse-grained GNM fast to compute vs MD Look over larger time scales Suitable for high throughput New Tricks – Protein Motion

An Active Research Program Around the Resource is Good for the Resource

Agenda The old dog New tricks Thinking differently about proteins Virtual Communities Internal (wwPDB) External What will the resource look like in 2-5 years?

Single worldwide archive of macromolecular structural data Ensures that the PDB remains a single & uniform archive publicly available to the worldwide community 3 founding members: RCSB PDB, PDBj, MSD-EBI Virtual Communities - Internal

wwPDB Activities Collaborative projects Remediation taxonomy, ligands, literature Single data processing system Virtual Communities - Internal

Agenda The old dog New tricks Thinking differently about proteins Virtual Communities Internal (wwPDB) External (modeling, other….) What will the resource look like in 2-5 years?

Virtual Communities - External Consider the PDB a gathering point through which a virtual and real community interacts with each other around a common interest

Virtual Communities - External Real Traveling art exhibit for lay audiences NJ Science Olympiad Science Expo Virtual Website Tutorials/Feedback Molecule of the Month PDB-in-a-CAVE

Virtual Communities - Modelers Recommendations of Workshop PDB depositions should be restricted to atomic coordinates that are substantially determined by experimental measurements on specimens containing biological macromolecules A central, publicly available archive (or technical equivalent thereof) or portal should be established for models It was unanimously agreed that methods for assessing model quality are essential Structure 2006 To be published

Agenda The old dog New tricks Thinking differently about proteins Virtual Communities Internal (wwPDB) External What will the resource look like in 2-5 years?

What Will the Resource Look Like in the Next 2-5 Years? Upwards of 75,000 structures Consensus (and different) views at the micro and macro scale – domains, SNPs, gene structure, cell localization, pathways, interactions, post-translational modification… Community annotation cf Wikipedia Distributed subsets - External Reference Files (XML) MyPDB PDB-in-a-box Specialized visualization tools (mbt.sdsc.edu)

Is a database really different than a biological journal? The Knowledge and Data Cycle (PLoS Comp Biol 2005 1(3) e34)
0. Full text of PLoS papers stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB which is analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB
Now assigning DOIs to structures

Acknowledgements The RCSB PDB NIH, NSF, DOE Apostol Gramada Multipole Analysis Jenny Gu Protein Motions

A Protein is More than the Union of its Parts Breaking the protein into parts changes the object of the comparison. This is interpreted in many cases to imply that the rmsd measure is inadequate. The reality is that it is the aligning of structures that breaks the triangle inequality, not the measure per se. The reason for the failure is that we effectively compare different objects than we say we do. Transitivity is not guaranteed. From Røgen & Fain (2003), PNAS 100:119-124 New Tricks – Protein Representation

An Alternative Approach: Multipolar Representation Roots in Spherical Harmonics Spatial distribution of a scalar quantity Parameterization + boundary conditions Charge distribution (i.e. structure) Scalar potential Justifies use of multipoles as a distribution of charge also geometry of spatial distribution of atoms Multipole expresses distortions in the spherical distribution Gramada & Bourne 2006 BMC Bioinformatics 7:242 New Tricks – Protein Representation

An Alternative Approach: Multipolar Representation “Out” Multipoles For a given rank l, they form a 2l+1 dimensional vector under 3D rotations Vector algebra applies => metric properties Gramada & Bourne 2006 BMC Bioinformatics 7:242 New Tricks – Protein Representation
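As a small illustration of the rotational behavior described above, the sketch below uses the rank l = 2 case in Cartesian form, assuming NumPy. A symmetric, traceless quadrupole tensor has 2l+1 = 5 independent components, and its norm is unchanged by rotation. This is a generic construction with unit weights per atom, not the descriptor pipeline of Gramada & Bourne; the function name and coordinates are hypothetical.

```python
import numpy as np

def quadrupole(coords, weights=None):
    """Traceless Cartesian quadrupole tensor of a weighted point set."""
    coords = np.asarray(coords, dtype=float)
    w = np.ones(len(coords)) if weights is None else np.asarray(weights, dtype=float)
    # Center on the weighted centroid so the result is translation-independent.
    centered = coords - np.average(coords, axis=0, weights=w)
    r2 = (centered ** 2).sum(axis=1)
    outer = np.einsum("n,ni,nj->ij", w, centered, centered)
    return 3.0 * outer - np.eye(3) * (w * r2).sum()

# Toy point set (hypothetical coordinates in Å).
coords = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [3.8, 3.8, 0.0], [7.6, 3.8, 1.0]])

# Rotate the same points about the z axis by an arbitrary angle.
theta = 0.7
c, s = np.cos(theta), np.sin(theta)
R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

q = quadrupole(coords)
q_rot = quadrupole(coords @ R.T)
print(np.linalg.norm(q), np.linalg.norm(q_rot))
```

The two printed norms agree, so the norm can serve as a rotation-invariant shape descriptor without any alignment step. Distances between such descriptors are ordinary vector distances, which is where the metric properties mentioned on the slide come from.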

An Alternative Approach: Multipolar Representation The multipoles can be interpreted as shape descriptors. In principle, from the entire series of multipoles one can reconstruct the scalar field and therefore the density, i.e. the entire set of Cartesian coordinates of the structure with a geometric level of detail. The partitioning of the multipole series according to various representations of the rotational group allows for a multi-scale description of the structure. Gramada & Bourne 2006 BMC Bioinformatics 7:242 New Tricks – Protein Representation