Information Management and Publication in Crystallography I2S2 Workshop Future of Data Management Systems in the Structural Sciences, RAL, Oxon, 1 April.

Slides:



Advertisements
Similar presentations
EPrints - Introducing EPrints 3 Software William J Nixon Digital Library Development Manager, University of Glasgow With many thanks to Les Carr and the.
Advertisements

Visualisation of chemical data Brian McMahon Research & Development Officer International Union of Crystallography 5 Abbey Square Chester CH1 2HU
IRRA DSpace April 2006 Claire Knowles University of Edinburgh.
S.J. Coles a*, M.B. Hursthouse a, R.A. Stephenson a, P. Cliff b, E. Lyon b, M. Patel b J. Downing c & P. Murray-Rust.
© S.J. Coles 2006 Usability WS, NeSC Jan 06 Enabling the reusability of scientific data: Experiences with designing an open access infrastructure for sharing.
A distributed architecture for crystallography data, metadata, and applications John C. Bollinger Indiana University Molecular Structure Center, Bloomington,
© S.J. Coles 2006 Digital Repositories as a Mechanism for the Capture, Management and Dissemination of Chemical Data Simon Coles School of Chemistry, University.
Data and metadata in the Reciprocal Net John C. Bollinger Indiana University Molecular Structure Center, Bloomington, IN.
Data and Publication Discovery Brian Matthews, Information Management Group, STFC Rutherford Appleton Laboratory CLADDIER workshop, Chilworth, Southampton,
S.J. Coles a*, J.G. Frey a, M.B. Hursthouse a, L. Carr b & C.J. Gutteridge b. a School of Chemistry, University of Southampton, UK.; b School of Electronics.
© S.J. Coles 2006 Digital Repositories as a Mechanism for the Capture, Management and Dissemination of Chemical Data Simon Coles School of Chemistry, University.
RCUK, Octiber Archiving research data and research publications. Dr Leslie Carr, Intelligence, Agents Multimedia, University of Southampton Dr Simon.
Data Curation in Crystallography: Publisher Perspectives JISC Data Cluster Consultation Workshop CCLRC, Didcot, Oxon 10 October 2006.
Federation eCrystals Federation: Open Repositories for Data-driven Science Dr Liz Lyon, UKOLN, University of Bath, UK Dr Simon Coles, University of Southampton,
CCPN project modeling framework University of Cambridge European Bioinformatics Institute MSD group.
Publisher perspective eBank/R4L/SPECTRa Joint Consultation Workshop London Metropole Hotel 20 October 2006.
© S.J. Coles 2006 Institutional Data Repositories for Chemistry Simon Coles School of Chemistry, University of Southampton, U.K.
EBankII Workshop 1 Making Scientific Data Openly Available Simon Coles School of Chemistry, University of Southampton.
Accessing the data: going beyond what the author wanted to tell you Brian McMahon International Union of Crystallography 5 Abbey Square, Chester CH1 2HU,
Pensoft Writing Tool (PWT) Lyubomir Penev ViBRANT Tools for DNA taxonomists, 11 June 2013, Brussles ViBRANT.
Changing methods of data sharing in crystallography Professor John R Helliwell Imperial College, June 28th, 2006 The University of Manchester
Data activities of the International Union of Crystallography Brian McMahon IUCr 5 Abbey Square Chester CH1 2HU
PubMed Central Mahyar Ahmadpour-B. Kowsar Publicatin Corp. Kowsar Editorial Meeting 1 September 19th, 2013 Tehran, Iran.
1.
The Library behind the scene How does it work ? The Library behind the scenes 1 JINR / CERN Grid and advanced information systems 2012 Anne Gentil-Beccot.
University of Southampton, U.K.
Click to edit Master subtitle style JISC XYZ Project Principal Investigator: Peter Murray-Rust Project Team: Nick England, Brian Brooks Unilever Centre,
EPrints Workshop, January eBank UK: Dissemination of research data using EPrints Simon Coles, School of Chemistry, University of Southampton.
Crystallographic Data Publication at Source International Union of Crystallography Peter R. Strickland and Brian McMahon IUCr 5 Abbey Square Chester CH1.
1 CS 502: Computing Methods for Digital Libraries Lecture 22 Repositories.
Journals.iucr.org/f/ Acta Crystallographica Section F Structural Biology and Crystallization Communications An electronic journal for macromolecular structure.
The Crystallographic Information File (CIF) Description and Usage Ton Spek, Bijvoet Center for Biomolecular Research Utrecht University Sevilla, 14-Dec
ReQuest (Validating Semantic Searches) Norman Piedade de Noronha 16 th July, 2004.
XP New Perspectives on Microsoft Access 2002 Tutorial 71 Microsoft Access 2002 Tutorial 7 – Integrating Access With the Web and With Other Programs.
Moving forward our shared data agenda: a view from the publishing industry ICSTI, March 2012.
ORGANIZING AND STRUCTURING DATA FOR DIGITAL PROJECTS Suzanne Huffman Digital Resources Librarian Simpson Library.
Update on the VERSIONS Project for SHERPA-LEAP SHERPA Liaison Meeting UCL, 29 March 2006.
1 Chuck Koscher, CrossRef New Developments Relating to Linking Metadata Metadata Practices on the Cutting Edge May 20, 2004 Chuck Koscher Technology Director,
1 The Chemistry Preprint Server: An Experiment in Scientific Communication James Weeks, ChemWeb Inc. 84 Theobalds Road, Holborn, London WC1X 8RR
Improved Reporting of Crystal Structures: the Impact of Publishing Policy on Data Quality Brian McMahon 1, Peter R. Strickland 1 and John R. Helliwell.
Thomson Scientific October 2006 ISI Web of Knowledge Autumn updates.
EBank UK: linking scientific data, scholarly communication and learning Michael Day and Rachel Heery UKOLN, University of Bath
OARE Module 5A: Scopus (Elsevier). Table of Contents About Scopus (Elsevier) Using Scopus Search Page Results/Refine Search Pages Download, PDF, Export,
1 Schema Registries Steven Hughes, Lou Reich, Dan Crichton NASA 21 October 2015.
EBI is an Outstation of the European Molecular Biology Laboratory. Annotation Procedures for Structural Data Deposited in the PDBe at EBI.
Interactive visualization of data as a feature of online crystallography journal articles CODATA Conference 2008 Brian McMahon International Union of Crystallography.
Crystallographic Databases I590 Spring 2005 Based in part on slides from John C. Huffman.
CBSOR,Indian Statistical Institute 30th March 07, ISI,Kokata 1 Digital Repository support for Consortium Dr. Devika P. Madalli Documentation Research &
Digital Commons & Open Access Repositories Johanna Bristow, Strategic Marketing Manager APBSLG Libraries: September 2006.
1 ARRO: Anglia Ruskin Research Online Making submissions: Benefits and Process.
Now launched! Visit nature.com/scientificdata Honorary Academic Editor Susanna-Assunta Sansone Advisory.
ITGS Databases.
Data Integration and Management A PDB Perspective.
Uganda Scholarly Digital Library (USDL) Makerere University’s Institutional Repository By Margaret Nakiganda URL:
1 EndNote X2 Your Bibliographic Management Tool 29 September 2009 Humanities and Social Sciences Resource Teams.
Data Harvesting: automatic extraction of information necessary for the deposition of structures from protein crystallography Martyn Winn CCP4, Daresbury.
Project Database Handler The Project Database Handler is a brokering application that mediates interactions between the project database and the external.
Incentives for Biodiversity Data Publishing June 2011.
Hjw 1 IUCr working group: Diffraction Data Deposition  Members Steve Androulakis (TARDIS representative) John R. Helliwell (IUCr, Chairman of IUCr Journals.
Routine authoring and publication of enhanced figures IUCr submission for the ALPSP Award for Publishing Innovation 2008.
After the RAE: Continuing to manage research outputs Morag Watson Digital Library Development Manager University of Edinburgh.
Entering the Data Era; Digital Curation of Data-intensive Science…… and the role Publishers can play The STM view on publishing datasets Bloomsbury Conference.
CombeDay Making Data Openly Available Simon Coles.
AHM04: Sep 2004 Nottingham CCLRC e-Science Centre eMinerals: Environment from the Molecular Level Managing simulation data Lisa Blanshard e- Science Data.
Greater Visibility, Greater Access QSpace QSpace Queen’s University Research & Learning Repository.
Afternoon session: The archival problem and infrastructure for solutions Prof John R Helliwell Interactive Publications.
Moving on : Repository Services after the RAE
VI-SEEM Data Repository
‘The eCrystals Federation’ Management and Publication of Small Molecule Structure Data for the Whole Crystallographic Community S.J. Colesa*, J.G. Freya,
Developing Institutional Data Repositories
Presentation transcript:

Information Management and Publication in Crystallography I2S2 Workshop Future of Data Management Systems in the Structural Sciences, RAL, Oxon, 1 April 2011 Brian McMahon International Union of Crystallography 5 Abbey Square Chester CH1 2HU UK

International Union of Crystallography International Scientific Union Publishes 8 research journals: Acta Crystallographica Section A: Foundations of Crystallography Acta Crystallographica Section B: Structural Science Acta Crystallographica Section C: Crystal Structure Communications Acta Crystallographica Section D: Biological Crystallography Acta Crystallographica Section E: Structure Reports Online Acta Crystallographica Section F:Structural Biology and Crystallization Communications Journal of Applied Crystallography Journal of Synchrotron Radiation Publishes major reference work International Tables for Crystallography (8 volumes) Promotes standard crystallographic data file format (CIF)

Crystallographic (X-ray diffraction) experiment

Data can mean any or all of: 1. raw measurements from an experiment 2. processed numerical observations 3. derived structural information 4. variable parameters in the experimental set-up or numerical modelling and interpretation 5. bibliographic and linking information We make no fundamental distinction between data and metadata – metadata are data that are of secondary interest to the current focus of attention. Data relevant to publication (1) (2)(3) (4) (5)

Why publish data? Some reasons: To enhance the reproducibility of a scientific experiment To verify or support the validity of deductions from an experiment To safeguard against error To safeguard against fraud To allow other scholars to conduct further research based on experiments already conducted To allow reanalysis at a later date, especially to extract 'new' science as new techniques are developed To provide example materials for teaching and learning To provide long-term preservation of experimental results and future access to them To permit systematic collection for comparative studies

IUCr journal policy (0) Bibliographic and linking 'metadata' Standard propagation through publishing industry channels: Abstracting and indexing services Bibliographic databases Digital Object Identifiers (DOI) – registered through CrossRef OpenURL discovery services RSS feeds Indexing by Google Scholar, Microsoft etc. JISC #jiscopenbib Collaboration with Open Knowledge Foundation, Unilever Cambridge Centre for Molecular Informatics; support from Public Library of Science, Oxford University ('Open Citation' project) decouple bibliographic metadata from IP issues pragmatic metadata harvesting from diverse schemas

IUCr journal policy (1) Derived data For crystal/molecular structures with small unit cells (inorganic, metal-organic, organic) Atomic coordinates, anisotropic displacement parameters, molecular geometry and intermolecular contacts Experimental parameters, unit-cell dimensions, space group information Reference and modulated structure subsystems for aperiodic composite structures must be supplied in CIF format as an integral part of article submission and are freely available for download For biological macromolecular structures Atomic coordinates, anisotropic or isotropic displacement parameters, space group information, secondary structure and information about biological functionality must be deposited with the Protein Data Bank before or in concert with article publication; the article will link to the PDB deposition using the PDB reference code. Relevant experimental parameters, unit-cell dimensions are required as an integral part of article submission and are published within the article

IUCr journal policy (2) Processed experimental data For crystal/molecular structures with small unit cell (inorganic, metal-organic, organic) Structure factors Rietveld profiles must be supplied in CIF format as an integral part of article submission and are freely available for download. SHELXL instruction files are also required for validation For biological macromolecular structures Structure factors must be deposited with the Protein Data Bank before or in concert with article publication; the article will link to the PDB deposition using the PDB reference code.

IUCr journal policy (3) Primary experimental data For crystal/molecular structures with small unit cell and for biological macromolecular structures: IUCr journals have no current policy regarding publication of diffraction images or similar raw data entities. However, IUCr Commissions are interested in the possibility of establishing community practices for the orderly retention and referencing of such data sets, and the IUCr would like to see such data sets become part of the routine record of scientific research in the future. Typical size of raw data set (collection of diffraction images) ~ few Gb Not large enough to warrant dedicated data centres Too large for existing database operations (CCDC, PDB) Retention by individual scientists ~ 1 year Possibility of retention by laboratories/experimental facilities o Distributed archive o Requires identification/linking protocols to publications

Validation of published data For crystal/molecular structures with small unit cell (inorganic, metal-organic, organic) All structural models are processed upon article submission by the IUCr checkCIF suite and the resultant report scrutinised during the peer review process Structure factor files are now analysed with checkCIF and a report also generated for review checkCIF reports are published as supplementary documents for every Acta Cryst. E article SHELXL refinement instruction files are required For biological macromolecular structures Structural models are annotated and curated by PDB staff during deposition; authors are consulted and given opportunities for revision if errors or anomalies are found Structure factor files are checked at the PDB against the deposited structure Data validation reports must be supplied to IUCr journals

Data visualization Three-dimensional structural models can be visualized interactively in IUCr journals in several ways: From journal contents lists '3d view' provides Jmol applet (all CIFs) '3d view' allows browsers to launch helper applications such as Mercury, Rasmol (multi- structure CIFs) Within journal articles Enhanced Jmol figures created by authors Links to PDB visual representations

Data flow in crystallography Experiment (synchrotron or laboratory) Structure solution and refinement (laboratory) IUCr journals Other journals Chemistry databases (CCDC) Biological structure databases (PDB) Data reduction Raw experimental data Reduced/processed data Derived data retained by scientist archived at facility (~6 months) deposited published/disseminated validated

Publication flow in IUCr journals Experiment (synchrotron or laboratory) Structure solution and refinement (laboratory) IUCr journals Chemistry databases (CCDC) Data reduction Raw experimental data Reduced/processed data Derived data Published article raw data (imgCIF) CIF file.cif Author structure factors.fcf ValidationPeer review preprint.pdf.html Technical editing Publication article of record.sgml Bibliographic databases (ISI, etc.).fcf.cif.pdf.html.xml,.rdf etc..cif.fcf

Crystallographic Information Framework File format (CIF) –Tool chain: parsers, libraries, editors, database loaders –Fortran 77, C, C++, Python, Perl –Interchangeable with other formats (PDBML, CML) Information model schemas (CIF dictionaries) Schema language (Dictionary Definition Language, DDL) Domain ontologies (coreCIF, mmCIF, pdCIF, imgCIF) Integrated approach –Does not differentiate between 'data', 'metadata', 'publication content' etc. International Tables for Crystallography, Volume G: Definition and Exchange of Crystallographic Data –Sydney Hall and Brian McMahon (Editors) –ISBN: –Hardcover. 606 pages. First edition August 2005 Information exchange/archive standard

Two questions from COMCIFS Chair 1.What software is in use or under development at Diamond for working with imgCIF/CBF files? 2.NeXuS is very much a raw data format and requires definition of more metadata. This could make imgCIF/CBF an attractive target for software development, once the problem of detector/beamline integration is taken care of - i.e. the detector manufacturer has to provide some information, the beamline some other information (about geometry and optics mostly). What beamline difficulties have there been in supporting imgCIF output? mailto:

Authoring tools (1) publCIF Desktop CIF publishing editor, validator and formatter for small-molecule, powder, modulated and incommensurate structure CIFs Available for Linux, Microsoft Windows, MacOS X publCIF takes a CIF and prepares a formatted paper (Preprint) in the style of Acta Crystallographica Sections C and E. CIF and Preprint are presented side-by-side and are both editable. Changes made to one are applied to the other as the user types. Starting with a CIF resulting from a structure refinement, use publCIF to: add data items required for publication, using simple wizards that check input and access both dictionary information and data used in previous CIFs prepare standard and customized geometry tables using simple spread sheets write paper using a word-processing environment check CIF for both syntax and completeness, with facility to load directly into online checkCIF print or export Preprint of a paper check references store or retrieve author details from a local database

Authoring tools (2) printCIF Online CIF publishing validator and formatter for small-molecule, powder, modulated and incommensurate structure CIFs printCIF takes a CIF and prepares a formatted paper (Preprint) in the style of Acta Crystallographica Sections C and E. Starting with a CIF resulting from a structure refinement, use printCIF to: create HTML version of article and supplementary information with live Jmol visualization of molecular geometry create PDF version of article create PDF version of article and supplementary document create PDF version of supplementary material run online checkCIF validation visualize structure in Jmol

Authoring tools (3) experimental tables Online service for formatting complex geometry and experimental tables for small-molecule, powder, modulated and incommensurate structure CIFs Takes a CIF and prepares formatted experimental or geometry tables (most useful for Acta Crystallographica Section B but also suitable for other IUCr journals). Starting with a CIF resulting from a structure refinement, use this service to: create HTML or RTF versions of experimental and geometry tables reorder and relabel experimental tables show only information from selected data blocks add data items not required by the default request list reorder and suppress entries in molecular geometry tables reformat geometry tables visualize geometry in Jmol save CIF modified in accordance with reformatted presentation

Authoring tools (4) publBio Online service for authoring and editing structural biology or crystallization communications using macromolecular CIFs (mmCIF) Provides a personal web space for creating, editing, managing and submitting these categories of articles to Acta Crystallographica Sections D and F. Starting with a local mmCIF, a PDB reference, or an empty template, use this service to: add data items required for publication, using a structured forms-based graphical user interface that checks input and accesses both dictionary information and data used in previous CIFs write paper using WYSIWYG and predictive type- ahead forms check and format citations and literature references incorporate static and Jmol interactive figures create HTML, OpenOffice, RTF or PDF versions of an article lookup common crystallization solution components database maintain a local database of author information lookup references from Medline or Crystallography Journals Online

Authoring tools (5) Enhanced figure toolkit Create Jmol enhanced figures prior to submission of articles to IUCr journals during submission to IUCr journals from submitted CIF for structural articles from other CIF uploaded by authors from structures already published in IUCr journals from structures deposited in the PDB inorganic crystal lattices organic or metal-organic complexes biological macromolecules multiple views specified by author supports arbitrary Jmol scripts

The future Not all crystallography journals accept structural data for deposit, and IUCr journals do not (at present) accept primary data sets. Ensuring that data are retained in the permanent record of science will require: increased willingness by journals to deposit structural data and structure factors increased willingness of structural databases to store and curate structural data, structure factors and perhaps primary data increased collaboration between journals and structural databases that archive data sets assignment of persistent identifiers to unpublished data held in domain-specific repositories (e.g. eCrystals at Southampton) institutional repositories synchrotron and other experimental facilities image data stores (e.g. Atlas, Rutherford Appleton Laboratories) community codes of practice for storing, managing, archiving and accessing research data sets standard 'compound document' descriptions to link data and publications

Thank you