Making Deposition Easier

Presentation transcript:

Making Deposition Easier. Shuchismita Dutta, Ph.D. ACA 2004, Chicago, July 17th 2004

Motivation for this workshop: change your spin about structural data deposition. From “Data deposition is a chore” to “a chore no more”: “I can’t wait to use the cool deposition tools at the RCSB-PDB to deposit some more (structures)”.

Overview of Data Deposition Tools [diagram labels]: log files from crystallographic applications; pdb_extract; coordinates & experimental data; Validation suite; ADIT; Ligand Depot; deposition.

Structural data deposition today: the why, when, how, where and what of deposition

Why do you deposit your structural data to the PDB?
“Compulsory” reasons: the journal publishing the primary citation requires it; the funding agency requires it.
“Voluntary” reasons: for safe-keeping of structural data; for the benefit of the entire scientific community.

When do you deposit?
Immediately after structure determination.
Just prior to or after submission of the manuscript.
After the manuscript has been accepted (urgent request for a PDB ID).
Just before the researcher leaves the lab.
Several years after the initial data collection.

How and Where do you deposit? Using the ADIT tool http://deposit.pdb.org/adit/ (RCSB-PDB) or http://pdbdep.protein.osaka-u.ac.jp/adit/ (PDBj). Using AutoDep http://autodep.ebi.ac.uk/ (MSD/EBI).

What do you deposit?
The coordinates.
The structure factor file(s).
And more:
Information that only you can provide.
Information that you should complete and verify: about the molecule(s) or complex; about the crystallization and data collection.
Information that can be extracted from log files of crystallographic applications.

Information that only you can provide
Contact information: author names, e-mail, postal address, phone, fax, including PI.
Release instructions: for coordinates, structure factors & sequence(s).
Title for the deposited structure.
Related entries: name of database, ID, description.
Citation information: authors, title, journal details if available.

Information about the molecule(s): complete and verify
Molecule name, ligand name if appropriate.
Molecule details: fragment name, mutations, EC number.
Sequence information: sequence, chain identifiers, appropriate database references.
Source information: genetically manipulated, natural or synthetic.
Keywords: to describe and search for the structure.
Biological assembly description.

Information about crystallization and data collection: complete and verify
Crystallization details: method, pH, temperature, crystallization solution components, solvent content, Matthews coefficient.
Crystal data: cell dimensions and space group.
Data collection information: number of crystals, type of diffraction experiment, radiation source, wavelength(s) used, detector type, data collection date, collection temperature.
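Two of the crystallization items above, the Matthews coefficient and the solvent content, follow directly from the cell dimensions. The sketch below is illustrative only and is not part of any RCSB PDB tool: it uses the standard definition V_M = V / (n x M) and the common estimate solvent fraction = 1 - 1.23 / V_M, which assumes a typical protein partial specific volume. The function names and the example numbers are hypothetical.

```python
import math


def unit_cell_volume(a, b, c, alpha=90.0, beta=90.0, gamma=90.0):
    """Unit cell volume in cubic angstroms (lengths in angstroms, angles in degrees)."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)


def matthews_coefficient(volume, molecules_per_cell, molecular_weight):
    """V_M in cubic angstroms per dalton: cell volume / (molecules in cell * MW)."""
    return volume / (molecules_per_cell * molecular_weight)


def solvent_fraction(vm):
    """Approximate solvent fraction, assuming a typical protein density."""
    return 1.0 - 1.23 / vm


# Hypothetical example: an orthogonal cell holding 8 copies of a 15 kDa protein.
volume = unit_cell_volume(60.0, 50.0, 80.0)
vm = matthews_coefficient(volume, molecules_per_cell=8, molecular_weight=15000.0)
print(f"V_M = {vm:.2f} A^3/Da, solvent fraction = {solvent_fraction(vm):.2f}")
```

Here molecules_per_cell is the number of symmetry operators multiplied by the number of molecules in the asymmetric unit.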

Information to extract from log files
Data collection information: resolution limits, observed criterion for sigma(F) or sigma(I), number of unique reflections (all and observed), percentage of possible reflections observed, R-merge I or R-sym I, details about the highest resolution shell.
Refinement statistics: resolution limits for refinement, cut-off on sigma(F), number of unique reflections (all and observed) used in refinement, R-factor for all reflections, R-factor for observed reflections, R-factor for working set reflections, associated R-free for the cross-validation set, structure determination method, cross-validation reflection selection details, stereochemistry target values.
Software used: for data collection, data reduction, structure solution, and refinement.
In addition: more information regarding phasing statistics.

Structural data deposition in the future. pdb_extract: an automated data extraction tool to prepare your structural data for deposition.

What does pdb_extract do? [diagram labels]
Steps covered: data collection, data reduction, phasing, density modification, molecular replacement, structure refinement.
Inputs and outputs: data template file; output files (mmCIF structure data and mmCIF reflection data).
Afterwards: validation; deposition through ADIT, or by email or ftp.

Advantages of using pdb_extract
Automated data capture.
Creates more detailed deposition files (e.g. phasing statistics).
Output files can be directly validated and deposited.
Makes it easier for us to annotate.
Allows you to keep an electronic notebook for structures that are solved over a long period of time.
Reduces manual intervention: because it uses the mmCIF PDB exchange dictionary, everything goes faster.

Logic for running pdb_extract [diagram]
1. extract: coordinate file for deposition (any flavor, mmCIF or PDB) gives the data template file.
2. pdb_extract: the data template file, plus output and log files from the applications used for structure determination, gives the completed coordinate file for validation.
3. pdb_extract_sf: structure factor file(s) in various formats give the completed structure factor file for validation.

File flavors: mmCIF, PDB, mmCIF SF, ASCII SF, mtz SF, XML (SF = structure factors).

Getting the sequence right in the data template file
Missing residues: marked as question marks ‘????’ in the one-letter-code sequence. Complete the sequence at all these locations.
Missing side chains: correct the sequence of any residue modeled as Ala or Gly due to missing side chain density.
Missing N- and/or C-termini: complete the sequence of the termini (include the sequence of cloning artifacts, expression tags etc. if present).
Non-standard residues: extracted according to their 3-letter code, e.g. (MSE).
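As a small aid for the first check above, the following sketch locates runs of question marks in a one-letter-code sequence so that each gap can be completed by hand. It is only an illustration: the function name and the example sequence are hypothetical, and it works on a plain sequence string rather than on the data template file itself, whose exact layout is not reproduced here.

```python
from typing import List, Tuple


def find_sequence_gaps(sequence: str) -> List[Tuple[int, int]]:
    """Return 1-based, inclusive (start, end) positions of each run of '?'."""
    residues = "".join(sequence.split())  # ignore line breaks and spaces
    gaps = []
    start = None
    for position, code in enumerate(residues, start=1):
        if code == "?":
            if start is None:
                start = position  # a new gap begins here
        elif start is not None:
            gaps.append((start, position - 1))  # the gap ended at the previous residue
            start = None
    if start is not None:
        gaps.append((start, len(residues)))  # gap runs to the C-terminus
    return gaps


# Hypothetical sequence with an internal gap and a missing C-terminal residue.
for first, last in find_sequence_gaps("MKT????LLV?"):
    print(f"Sequence missing for residues {first}-{last}")
```

Residues wrongly extracted as Ala or Gly cannot be detected this way and still need to be checked against the expressed construct.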

Additional data in the data template file: contact authors; release status; citation and author list; molecule name and details; source information; keywords; biological assembly; crystallization and data collection details.

How to use pdb_extract?
The CCP4i interface (CCP4): intuitive and easy interface.
The command line interface (CCP4, pdb_extract): flexible interface; need to use specific arguments.
The script interface (CCP4, pdb_extract): user-friendly interface; uses a script input file.
The Web interface (http://pdb-extract.rutgers.edu/): can be run online from the RCSB-PDB.

The CCP4i interface [diagram]
Inputs: coordinate file for deposition; the data template file; output and log files from the applications used for structure determination; structure factor file(s) in various formats.
Outputs: completed coordinate file for validation; completed structure factor file for validation.
CCP4i tasks: "Generate a data template" (extract); "Generate a complete mmCIF file for PDB deposition" (pdb_extract).
Structure factors for deposition: mtz2various, or pdb_extract_sf on the command line.

[Screenshots of the CCP4i interface, with input sections for data scaling, phasing, density modification, refinement and the data template]

The command line interface [diagram]
extract: coordinate file for deposition gives the data template file.
pdb_extract: the data template file, plus output and log files from the applications used for structure determination, gives the completed coordinate file for validation.
pdb_extract_sf: structure factor file(s) in various formats give the completed structure factor file for validation.

Generate the data template from the coordinate file (PDB or mmCIF format):

    extract -pdb coordinate_PDB_file_name
    extract -cif coordinate_CIF_file_name

Generate the completed coordinate file:

    pdb_extract -e MAD \
        -p SOLVE -iLOG solve.prt \
        -d RESOLVE -iLOG resolve.log \
        -r refmac5 -icif peak.refmac -ipdb refmac.pdb \
        -s HKL -iLOG scale-refine.log \
        -sp HKL scale1.log scale2.log scale3.log \
        -iENT data_template.text \
        -o output.cif

Generate the completed structure factor file. The -rt/-rp line describes the data used for refinement; the -dt/-dp line and the -c/-w lines describe the data used for phasing:

    pdb_extract_sf -rt F -rp refmac5 -idat refmac_sf.mmcif \
        -dt I -dp HKL \
        -c 1 -w 1 -idat scale1.sca \
        -c 1 -w 2 -idat scale2.sca \
        -c 1 -w 3 -idat scale3.sca \
        -o output_sf.cif
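For structures with many log files, the long pdb_extract command can be assembled programmatically instead of typed by hand. The sketch below is an assumption-laden illustration: it uses only flags that appear in the example command on this slide (-e, -p, -d, -r, -iLOG, -icif, -ipdb, -iENT, -o), the helper name is hypothetical, the file names are placeholders, and the flags have not been checked against any particular pdb_extract release. It prints the command rather than running it.

```python
import shlex


def build_pdb_extract_command(steps, template, output, experiment="MAD"):
    """Assemble a pdb_extract argument list from (flag, program, inputs) steps."""
    args = ["pdb_extract", "-e", experiment]
    for step_flag, program, inputs in steps:
        args += [step_flag, program]
        for input_flag, path in inputs:
            args += [input_flag, path]
    # The data template and the output file come last, as in the slide's example.
    args += ["-iENT", template, "-o", output]
    return args


# Placeholder file names taken from the example command above.
steps = [
    ("-p", "SOLVE", [("-iLOG", "solve.prt")]),
    ("-d", "RESOLVE", [("-iLOG", "resolve.log")]),
    ("-r", "refmac5", [("-icif", "peak.refmac"), ("-ipdb", "refmac.pdb")]),
]
command = build_pdb_extract_command(steps, "data_template.text", "output.cif")
print(" ".join(shlex.quote(arg) for arg in command))
```

Keeping the steps in a list makes it easy to record which program produced which log file, which is the "electronic notebook" idea mentioned earlier in the talk.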

The script interface [diagram]
Step 1, generate the data template and script input files: extract takes the coordinate file for deposition and gives the data template file and the script input file.
Step 2, run the script: the script input file, the data template file, the output and log files from the applications used for structure determination, and the structure factor file(s) in various formats give the completed coordinate file for validation and the completed structure factor file for validation.

===============PART 1: Structure Factor for Final Refinement==============
Enter reflection data file used for final structure refinement
<reflection_data_type = "F" > (enter I (intensity) or F (amplitude))
<reflection_data_format = "CCP4" >
<reflection_data_file_name = " " >
==============PART 2: Structure Factors for Protein Phasing================
Enter reflection data files used for heavy atom or MAD phasing
<scale_data_type = "I" > (enter I (intensity) or F (amplitude))
<scale_program_name = "HKL" >
For data set 1:
<crystal_number = "1" >
<diffract_number = "1" >
<scale_data_file_name_1 = " " >
<scale_log_file_name_1 = " " >
==============PART 4: Statistics for Molecular Replacement================
Enter log files and software name for molecular replacement
<mr_software = "AMORE" >
<mr_log_file_LOG_1 = " " >
<mr_log_file_LOG_2 = " " >
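Before running the script it can help to see which fields in the script input file are still blank. The sketch below parses lines of the form <name = "value" > as shown in the excerpt above and lists the empty ones. It is a rough illustration based only on that excerpt: the function names are hypothetical, and the real script input file may contain constructs this pattern does not cover.

```python
import re

# Matches <name = "value" > and tolerates curly quotes left by slide software.
FIELD = re.compile(r'<\s*(\w+)\s*=\s*["\u201c]([^"\u201d]*)["\u201d]\s*>')


def parse_script_input(text):
    """Return a dict mapping each field name to its (whitespace-stripped) value."""
    return {name: value.strip() for name, value in FIELD.findall(text)}


def unfilled_fields(fields):
    """Return the names of fields whose value is still empty."""
    return [name for name, value in fields.items() if not value]


example = """
<reflection_data_type = "F" > (enter I (intensity) or F (amplitude))
<reflection_data_format = "CCP4" >
<reflection_data_file_name = " " >
<mr_software = "AMORE" >
<mr_log_file_LOG_1 = " " >
"""
fields = parse_script_input(example)
print("Still to fill in:", ", ".join(unfilled_fields(fields)))
```

Fields that do not apply to a given structure (for example the molecular replacement log files for a MAD structure) will also be reported as empty, so the list is a reminder rather than an error report.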

The web interface (from RCSB-PDB) [diagram]
Inputs: coordinate file for deposition; sequence of polymers in the structure; output and log files from the applications used for structure determination; structure factor file(s) in various formats.
Programs run on the server: extract, pdb_extract, pdb_extract_sf.
Outputs: coordinate file for ADIT (editing & validation); completed structure factor file for validation.
Steps: upload the coordinate file; press the submit button; add additional details in ADIT.

Multiple paths to data deposition [diagram]: the CCP4i interface, command line interface, script interface and web interface all lead to pdb_extract; then validate, add information and deposit (validation, ADIT).

In summary
1. Use pdb_extract to prepare your data.
2. Validate your files before deposition.
3. Use ADIT to deposit your files.

Please visit the RCSB PDB Booth #325 in “Data Alley”
Demonstrations: pdb_extract, validation, ADIT, re-engineered PDB site.
Demos during coffee breaks.
Questions answered.
Tattoos, posters and literature.
You can always write to us at deposit@rcsb.rutgers.edu
All information is available from deposit.pdb.org

Acknowledgements
The Protein Data Bank (PDB) is operated by: Rutgers, The State University of New Jersey; San Diego Supercomputer Center at the University of California, San Diego; Center for Advanced Research in Biotechnology/UMBI/NIST.
The RCSB PDB is supported by funds from: National Science Foundation (NSF); National Institute of General Medical Sciences (NIGMS); Office of Science, Department of Energy (DOE); National Library of Medicine (NLM); National Cancer Institute (NCI); National Center for Research Resources (NCRR); National Institute of Biomedical Imaging and Bioengineering (NIBIB); National Institute of Neurological Disorders and Stroke (NINDS).
The worldwide PDB (wwPDB) is a collaboration between RCSB, MSD/EBI and PDBj. RCSB-PDB is part of wwPDB.

RCSB-PDB Data Deposition Services
pdb_extract: Web: http://pdb-extract.rutgers.edu/ Standalone: http://deposit.pdb.org/mmcif/PDB_EXTRACT/index.html
Validation Server: Web: http://deposit.pdb.org/validate/ Standalone: http://deposit.pdb.org/mmcif/VAL/index.html
ADIT: Web: http://deposit.pdb.org/adit/ Standalone: http://deposit.pdb.org/mmcif/ADIT/index.html
Ligand Depot: http://ligand-depot.rutgers.edu/
Overview and tutorials for all RCSB-PDB data deposition services: http://deposit.pdb.org