Worldwide Protein Data Bank www.wwpdb.org

Presentation transcript:

Worldwide Protein Data Bank

Agenda
• Welcome and Introductions
• Overview of recent wwPDB progress
• Introduction to the BMRB
• Theoretical model policy
• Issues for discussion and advice
Break
• wwPDB group interactions
• wwPDB plans for 2007
• Long term aims, funding, and stability
• Executive session
• Feedback to wwPDB
• Set next meeting date (July 2007; Salt Lake City, UT?)

wwPDB Achievements, August 2005-October 2006
• Continued growth of archive
• Website updates
• Publications and presentations
• Time stamped archive
• wwPDB team building
• Annotation document
• Remediation
• BMRB formally a member of wwPDB

Deposition issues

The never ending story

Deposition since establishment of 3 sites

PDB entry processing
• ,997 entries in PDB
• Today 1-Oct ,323 entries in PDB
Total size is 3.6 times what it was when the 3 sites started
• In entries deposited
• In entries deposited
We handle 2.8 times as many entries per year with fewer staff, and all 3 sites produce high-quality annotated PDB entries
NO CURRENT BACKLOG OF UN-PROCESSED ENTRIES

Time-stamped copies of the archive
• 24 Gbytes of data for 2005, released January 3, 2006
• Includes:
  – PDB format entries
  – mmCIF format entries
  – PDBML format entries
  – Experimental data
  – Dictionary, schema and format documentation

Outreach
• wwPDB website
• Publications and meetings

Joint publications and presentations
• Nucleic Acids Research 2007 Database Issue
  – Ensuring a single, uniform archive of PDB data
• Methods in Molecular Biology 2007
  – Data deposition and annotation at the wwPDB
• Nature Structural & Molecular Biology, 2006
  – Is one solution good enough? (response)
• CODATA (October 23-25, 2006; Beijing, China)
  – The Worldwide Protein Data Bank
• Encyclopedia of Genomics, Proteomics, and Bioinformatics, 2005
  – The Protein Data Bank and the wwPDB

The wwPDB Team

wwPDB interactions this year
• Exchange visits
  – MSD/RCSB (6) (thanks to WT)
  – PDBj/RCSB (1)
  – BMRB/RCSB-PDB (3)
• Phone conference with site directors: twice a year
• VTCs among staff
  – BMRB/RCSB: twice a month (ADIT-NMR)
  – MSD/RCSB: twice a week (annotation procedures, remediation)
• E-mail among staff
  – MSD/RCSB: ~2 per day
  – PDBj/RCSB: ~2 per day

What is the PDB?
• Content
• Processes to ensure quality (annotation project)

Annotation project

Annotation project: goals
• Standardize annotation rules and policies among wwPDB sites
• Document annotation rules and policies
• Create venue to update annotation rules and policies as necessary

Annotation project: how did we get there?
• Review and discuss each PDB field by e-mail and VTC
• Write document and review by all staff
• Final review by site directors
• Implement software compliant with new annotation procedures
• Test software and train annotators
• Publish document on Web

Annotation project: resultant document
• Specification of ALL fields in PDB file
• Clarification of policies
  – Assignment of PDB IDs
  – Release of files and information
  – Changes to entries
• Clarification of data representation
  – Chain ID for all atoms in the file
  – Multi-model representation for alternate conformation or disorder
  – Chimeras
  – Microheterogeneity

Remediation

Remediation: scope
34,528 entries checked
• Primary citations
• Sequences & taxonomy
• Ligand stereochemistry and nomenclature
• Symmetry and coordinate transformations for virus entries
• Diffraction source & beamline
• Miscellaneous uniformity issues

Remediation: statistics
• Citations:
  – All primary citations checked
  – 8508 citations manually examined
  – 7037 citations confirmed and updated
• Sequence and taxonomy:
  – 47917 sequences checked
  – 20068 updated sequence data references
  – 11087 taxonomic references updated
• Virus entries
  – 250 entries checked and revised
• Diffraction source
  – 10985 entries revised
• Miscellaneous uniformity corrections
  – 1041 entries revised

Remediation: statistics
• Ligand stereochemistry and nomenclature
  – 7568 ligand definitions checked
  – 1758 new ligand definitions added
  – 185 ligand definitions obsoleted
  – 152,000 ligand instances checked
  – 138,230 ligand instances OK
  – 6815 ligand instances renamed

Remediation process
• Corrections contributed and reviewed by wwPDB members
• Corrections on the archival mmCIF data files tracked in a version tracking system, CVS
• New PDB exchange, PDBML and PDB format data files being produced now
• Each wwPDB group will validate and load the resulting files into their database systems
• Invited public testing will begin January 2007
• General availability will start April 2007

Remediation: ligand dictionary rewrite
• Model and idealized coordinates provided
• Stereochemical configuration assignments
• Aromatic atoms and bonds flagged
• Definitions provided for “Chemistry Catalog” state with leaving atom candidates flagged
• Nonstandard atom names revised (e.g. dinucleotides)
• Duplicate ligand definitions marked as obsolete
• Metal hydrate definitions obsoleted
• Alternate atom name added to store legacy atom names
• SMILES and InChI descriptors provided

Remediation: major entry level corrections
• Citations:
  – PubMed identifiers provided where available
  – Unpublished citations checked and flagged
• Sequence and taxonomy:
  – UniProt sequence database references
  – Taxonomies from NCBI Taxonomy database
• Diffraction source
  – Synchrotron facility and beamline names consistently specified in coordination with BioSync

Remediation: major ATOM record changes
• Nomenclature changes
  – IUPAC H-atom names for standard amino acids and nucleotides
  – DNA and RNA differentiated (AD (DNA) & A (RNA))
  – Modified nucleotides expressed as 3-letter codes (removed +’s)
  – PDB asterisks replaced by single quotes in atom names
  – Noncompliant ligands flagged in data files

Remediation: major REMARK changes
• Virus entries
  – Transformations from deposited frame to point symmetry and crystallographic frame provided
  – NCS and point symmetry transformations properly differentiated

EM standards
• New dictionary for electron microscopy
• MAP orientation conventions

BMRB
John Markley

Introduction to the BMRB
• BMRB is the worldwide archival site for biomolecular NMR data
• NMR data related to structures are cross referenced to PDB entries
• PDBj mirrors BMRB and supports external BMRB depositions
• As RCSB members, BMRB and PDB have worked closely to capture and annotate NMR data associated with deposited coordinate sets
• Recognizing that the biomolecular NMR community would be best served by having a “one stop” deposition system for NMR structures, BMRB has been pursuing this goal in collaboration with the RCSB-PDB
• BMRB plans to institute the same policy with MSD-EBI

wwPDB NMR experimental data flow
[Diagram: deposition and export sites]
• RCSB-PDB (deposition): ADIT-NMR
• PDBj-BMRB (deposition/processing/export): ADIT-NMR, mirror site
• BMRB (deposition/processing/export): ADIT-NMR, central archive
• MSD/EBI (deposition/export): CCPN
• CERM-BMRB (export): mirror site
Flows between sites are labeled “Deposited data”, “Raw NMR-STAR”, and “Processed NMR-STAR”.

Major developments related to BMRB’s role in the wwPDB
• “One-stop” BMRB-PDB ADIT-NMR deposition site for structures and NMR data, developed in collaboration with PDB, is operational, with BMRB assigning PDB accession codes
• Restraints database for legacy structures is nearing completion as part of the wwPDB “clean-up”; new tools to automate this process were developed in collaboration with MSD-EBI
• NMR-STAR v3 dictionary has been extended and released
• Graphical interface with Jmol displays integrates PDB coordinate data with associated NMR parameters
• BMRB is working with SG groups to improve efficiency of capturing protein NMR data
• BMRB participates in the “PDB-BMRB Task Group on NMR”

New “one-stop” deposition of NMR structures/data

Deposition interface features
• BMRB and RCSB-PDB depositions are now generated from a joint interface
• BMRB interface has been streamlined
• RCSB-PDB interface for NMR has been extended with optional fields for conformer and constraint statistics
• Files in PDB format, mmCIF, and NMR-STAR can be uploaded to pre-populate a deposition
• Many fields (e.g., experiment name, software name, software author) have pull-down lists to choose from, for convenience and to improve uniformity
• Fields common to multiple forms are linked to eliminate the need to retype information (e.g., uploaded data file names, author names, molecule names)
• Help and examples have been improved

Restraints grid is keyed to NMR structural entries

Coordinated displays of NMR data and structures

Theoretical Models Policy
Haruki Nakamura

Models
• Define line between “pure” models and models based on data
• Large experimental spectrum, e.g. X-ray, NMR, EM, SAX, FRET models
• Homology models, especially as derived from structural genomics
• Need a way to archive models that is totally compatible with PDB

Defining a policy for models
Workshop at Rutgers (November 19-20, 2005)
• Attended by modelers, structural genomicists, electron microscopists
• Policies and suggested implementations developed
• Outcome published in Structure
  – “Outcome of a Workshop on Archiving Structural Models of Biological Macromolecules”, Helen M. Berman, Stephen K. Burley, Wah Chiu, Andrej Sali, Alexei Adzhubei, Philip E. Bourne, Stephen H. Bryant, Roland L. Dunbrack, Jr., Krzysztof Fidelis, Joachim Frank, Adam Godzik, Kim Henrick, Andrzej Joachimiak, Bernard Heymann, David Jones, John L. Markley, John Moult, Gaetano T. Montelione, Christine Orengo, Michael G. Rossmann, Burkhard Rost, Helen Saibil, Torsten Schwede, Daron M. Standley, John D. Westbrook, Structure, /8:

Models: recommendations
• PDB depositions will be restricted to atomic coordinates that are substantially determined by experimental measurements on specimens containing biological macromolecules.
• A central, publicly available archive or portal should be established for models that are the explicit subject of peer review.
• Methods for assessing model quality are essential for the integrity and long-term success of any publicly available model portal, either from a central repository or a set of linked resources. There was no consensus as to which single method or group of methods should be applied.

Proposed Portal for Multiple Databases for Protein Structures
[Figure: a portal linking several theoretical model databases (Theoretical Model DB 1, Theoretical Model DB 2). Source: Berman, H. et al. (2006) Structure, 14,]

Characteristics of portal
• Data Standards for Models
• Access Models for a Central Portal of Models
  – The minimum contents for this portal require a unique identifier for each model registered with the system, each model's polypeptide chain sequence, and quality assessment information.
  – Additional information should be available, including: keywords, structural motifs, standard test sets of data, bound ligands, domains, flexibility, surface electrostatic properties, coding & noncoding SNPs, alternative splicing, oligomeric state, macromolecular interactions, literature references, subcellular localization, pathways, transcript profiling, & drugability.
  – Access to these data should be free and constantly available to a diverse worldwide user community of both model producers and users. Several levels of access are required for the different levels of users of the portal.
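The minimum portal contents listed above could be represented roughly as follows. This is only a sketch: the class name `ModelRecord`, its field names, and the `validate` helper are hypothetical, and the slide does not prescribe any schema or any particular quality assessment method.

```python
from dataclasses import dataclass, field


@dataclass
class ModelRecord:
    # Minimum contents named on the slide (field names are hypothetical).
    model_id: str             # unique identifier for the registered model
    chain_sequences: list     # polypeptide chain sequence(s), one-letter codes
    quality_assessment: dict  # quality assessment information; method left open
    # Optional additional information (keywords, bound ligands, and so on).
    annotations: dict = field(default_factory=dict)

    def validate(self):
        """Raise ValueError if a minimum-content field is empty."""
        if not self.model_id:
            raise ValueError("model_id is required")
        if not self.chain_sequences or not all(self.chain_sequences):
            raise ValueError("at least one non-empty chain sequence is required")
        if not self.quality_assessment:
            raise ValueError("quality assessment information is required")
        return True


record = ModelRecord(
    model_id="MODEL-0001",            # hypothetical identifier
    chain_sequences=["MKTAYIAKQR"],   # made-up sequence
    quality_assessment={"method": "unspecified", "score": 0.0},
    annotations={"keywords": ["example"]},
)
print(record.validate())
```

Keeping the optional annotations in a separate dictionary mirrors the slide's split between required and additional information.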

Implementation of models policy
• August 15, 2006: Policy announced with 60 day period of review
• August 15-October 15, 2006: Transition Plan
  – All existing un-processed theoretical model entries as well as entries deposited during this time were not validated or processed. Entries will be released as-is without author review or corrections.
  – Authors had the choice of correcting their entries by withdrawing the original entry and then re-submitting the corrected version before October 15,
• October 15, 2006: Theoretical model depositions no longer accepted

Discussion Issues
Kim Henrick

SAX - New EXP TYPE
• Hamburg to provide templates for consideration

4-letter code?
• Use of the PDB 4-letter code can be extended by allowing alphanumeric characters in the 1st position, giving 35 x 36 x 36 x 36 = 1,632,960 combinations
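The count quoted above can be checked with a few lines of Python. This is a minimal sketch, and one detail is an assumption rather than something the slide states: the factor 35 is read here as "any alphanumeric first character except 0". The function name `id_space_size` is hypothetical.

```python
import string

# Assumption (not stated on the slide): the first character may be any
# alphanumeric symbol except "0", which gives the factor 35. The remaining
# three positions may each be any of the 36 alphanumeric symbols.
ALPHANUMERIC = string.digits + string.ascii_uppercase  # 36 symbols
FIRST_CHAR = ALPHANUMERIC.replace("0", "")             # 35 symbols


def id_space_size(first=FIRST_CHAR, rest=ALPHANUMERIC, length=4):
    """Number of distinct codes of the given length."""
    return len(first) * len(rest) ** (length - 1)


print(id_space_size())  # 35 * 36**3 = 1632960
```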

Patent Office
The structures in the patent office may not represent a major loss of structures: current investigations indicate that most patent structures are in the PDB. A much larger set of ligand-bound structures is held within Pharma.

wwPDB SAC input request

What is a PDB Entry?
Rules for the smallest structure that can be submitted
  – Carbohydrate chains?
  – How long is a peptide? (24)
  – Non-gene product macromolecular biological ligands (e.g. antibiotics)?
Particular request from NMR depositors

Issues, annotation: EXP details (experimental details)
• Twinning: twin factor in REMARK 3 requested, along with original un-twinned structure factors
• TLS and conventional atomic B factor
• Author-derived validation software and procedures/results: no longer accepted as in REMARK 42; now a REMARK to carry software used and function

Policy: pre-release details
• Entries on HOLD or HPUB: currently details usually made public (AUTHOR, TITLE, STATUS)
• Authors request all details suppressed; however, journals need to check validity of IDCODE... Yes/No?

Deposition policy
• HPUB/HOLD limit of one year
Current problem: after one year, no response. Do we release?... fixed rules?
  – No problems: release it
  – Problems: withdraw it

Major changes after remediation: dictionary changes
• Could have major effects on software
• New dictionary will be announced to many software developers in early November, 2006

Major changes after remediation: nucleic acids
  – DNA and RNA differentiated (AD (DNA) & A (RNA)): residue names are now A = RNA, AD = DNA
  – Modified nucleotides expressed as 3-letter codes (removed +’s), e.g. no longer treated as +C etc.; C31 is 2'-O-3-AMINOPROPYL CYTIDINE-5'-MONOPHOSPHATE
  – PDB asterisks replaced by single quotes in atom names: O2* is back to O2’ as in refinement dictionaries

Major changes after remediation: H-atom names
IUPAC H-atom names for standard amino acids and nucleotides, as in BMRB files and as recommended by the NMR Task Force

Example:
New    PDB      New             PDB      New     PDB
H      H        HG12 (pro-R)    1HG1     HD11    1HD1
HA     HA       HG13 (pro-S)    2HG1     HD12    2HD1

Major changes after remediation: other atom names
• “Strange” atom names in co-factors like FAD, e.g. AC1*, AN9, AC8, to be replaced by C1'A, N9A, C8A
• In HEM, atom names like ‘N A’ change to 'NA'
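For software developers affected by these renames, the atom-name examples on the slides can be sketched as a small lookup plus one general rule. This is illustrative only: the table `EXPLICIT_RENAMES` and the function `remediate_atom_name` are hypothetical names, the table holds just the examples quoted on the slides, and an actual conversion would presumably rely on the remediated ligand dictionary rather than a hand-written mapping.

```python
# Explicit per-ligand renames, copied from the slide examples only.
EXPLICIT_RENAMES = {
    ("FAD", "AC1*"): "C1'A",
    ("FAD", "AN9"): "N9A",
    ("FAD", "AC8"): "C8A",
    ("HEM", "N A"): "NA",
}


def remediate_atom_name(residue, atom):
    """Return the remediated atom name for one (residue, atom) pair."""
    atom = atom.strip()
    if (residue, atom) in EXPLICIT_RENAMES:
        return EXPLICIT_RENAMES[(residue, atom)]
    # General rule from the slides: PDB asterisks become single quotes.
    return atom.replace("*", "'")


examples = [("C", "O2*"), ("FAD", "AC1*"), ("FAD", "AN9"), ("HEM", "N A"), ("ALA", "CA")]
for residue, atom in examples:
    print(residue, repr(atom), "->", repr(remediate_atom_name(residue, atom)))
```

Names that match neither the table nor the asterisk rule pass through unchanged, so the sketch is safe to run over atoms that were never renamed.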

Other issues

Issues, annotation: disorder/MODEL
• Use of MODEL record with disorder, with alternate conformations of large portions of structures, e.g. statistical disorder.... in progress

Issues, annotation: ATOM/SEQRES mismatch
Fitting species-specific ATOM records to a related X-ray or EM data set of a different species, as for example a large complex, ATPase... needs new tokens; in progress

Very large structures in PDB
• A proposed solution

Representing large complexes in the PDB: PART and ENDPRT
These records will act much like the existing MODEL/ENDMDL records, providing a sectioning mechanism within the PDB file. PART sections will include records which describe the different constituent parts of a large molecular system.

Representing large complexes in the PDB
A PART/ENDPRT section will include all of the PDB record types which reference specific structural elements of the molecule. PDB records that do not define or reference specific elements of molecular structure will be at the beginning of the multipart PDB file.
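A reader for such a multipart file might section it much as MODEL/ENDMDL blocks are sectioned today. The sketch below is a guess at how that could look: the slides give only the record names PART and ENDPRT, so the record layout, the sample lines, and the function `split_parts` are all assumptions made for illustration.

```python
def split_parts(lines):
    """Split PDB-style lines into (global_records, parts).

    Assumes, hypothetically, that a section opens with a line whose record
    name (first six columns) is PART and closes with one named ENDPRT.
    """
    global_records, parts, current = [], [], None
    for line in lines:
        record = line[:6].strip()
        if record == "PART":
            if current is not None:
                raise ValueError("nested PART record")
            current = []
        elif record == "ENDPRT":
            if current is None:
                raise ValueError("ENDPRT without PART")
            parts.append(current)
            current = None
        elif current is not None:
            current.append(line)        # record belongs to the open PART
        else:
            global_records.append(line)  # record outside any PART section
    if current is not None:
        raise ValueError("PART without ENDPRT")
    return global_records, parts


# Made-up, abbreviated lines; not a valid PDB entry.
example = [
    "HEADER    HYPOTHETICAL MULTIPART ENTRY",
    "PART      1",
    "ATOM      1  CA  ALA A   1",
    "ENDPRT",
    "PART      2",
    "ATOM      1  CA  GLY B   1",
    "ENDPRT",
    "END",
]
header, parts = split_parts(example)
print(len(header), "global records,", len(parts), "parts")
```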

Coffee

wwPDB in 2007
• Same again... but more of it (deposition and processing)
IN ADDITION
• Rollout new files
• Implement new annotation procedures
• Discuss feasibility of a single deposition/processing system
• Further team exchanges
• Gather Pharma structures

Long term goals, funding and stability

We would really like to be the worldwide PDB with regular stable funding

Acknowledgements

E-MSD is supported by grants from the Wellcome Trust, the EU (TEMBLOR, NMRQUAL and IIMS), CCP4, the BBSRC, the MRC and EMBL.

The BMRB is supported by NIH grant LM05799 from the National Library of Medicine.

PDBj is supported by grant-in-aid from the Institute for Bioinformatics Research and Development, Japan Science and Technology Agency (BIRD-JST), and the Ministry of Education, Culture, Sports, Science and Technology (MEXT).

The RCSB PDB is supported by grants from the National Science Foundation, National Institute of General Medical Sciences, the Office of Science-Department of Energy, the National Library of Medicine, the National Cancer Institute, the National Center for Research Resources, the National Institute of Biomedical Imaging and Bioengineering, the National Institute of Neurological Disorders and Stroke, and the National Institute of Diabetes & Digestive & Kidney Diseases.