Linking Data and Publications: the Chemistry Way. Simon Coles, School of Chemistry, University of Southampton, U.K. CLADDIER workshop, Chilworth, Southampton, 15th May 2007. This work is licensed under a Creative Commons Licence Attribution-ShareAlike 3.0.
The Research Data Lifecycle: Research & e-Science workflows; Aggregator services: national, commercial; Repositories: institutional, e-prints, subject, data, learning objects; Data curation: databases & databanks; Validation; Harvesting metadata; Data creation / capture / gathering: laboratory experiments, Grids, fieldwork, surveys, media; Deposit / self-archiving; Peer-reviewed publications: journals, conference proceedings; Publication; Validation; Data analysis, transformation, mining, modelling; Searching, harvesting, embedding; Presentation services: subject, media-specific, data, commercial portals; Resource discovery, linking, embedding; Linking
Current Situation - Data Deluge 30,000,000 2,000, ,000
Current Situation – Data and Publishing
Separating Data from Interpretations: Underlying data (Institutional data repository); Intellect & Interpretation (Journal article, report, etc.)
The eCrystals Public Data Archive
Laboratory IRs and Data Management
The R4L Repository: Deposit; Search / Browse; Create new compound; Add experiment data and metadata
eCrystals Federation Model: Aggregator services; Institutional data repositories; Deposit, Validation; Publication; Validation; Data analysis; Search, harvest; Presentation services / portals; Data discovery, linking, citation; Laboratory repository; Deposit; Publishers: peer-review journals, conference proceedings, etc.; Curation; Preservation; Subject Repository; Institution Library & Information Services; Data creation & capture in Smart lab; Data discovery, linking, citation; Search, harvest; Deposit
Metadata standards: Dublin Core. About 15 core elements.
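For reference, the simple Dublin Core element set defines exactly fifteen elements. The sketch below lists them and checks a record's field names against that list. It is illustrative only: the helper name `unknown_elements` and the sample record are assumptions and are not part of the eCrystals software.

```python
# The 15 elements of the Dublin Core Metadata Element Set (simple Dublin Core).
DC_ELEMENTS = frozenset({
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
})


def unknown_elements(record):
    """Return the sorted field names in `record` that are not simple DC elements."""
    return sorted(name for name in record if name not in DC_ELEMENTS)


# Hypothetical record; the values are placeholders, not real eCrystals data.
sample = {
    "title": "Example crystal structure",
    "creator": ["A. Researcher"],
    "date": "2007-05-15",
    "inchi": "placeholder",  # chemistry-specific, so not a simple DC element
}

extra_fields = unknown_elements(sample)
```

Fields flagged this way are the ones that need a qualified or domain-specific extension rather than simple Dublin Core.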
Metadata Publication: ecrystals.chem.soton.ac.uk; ecrystals.chem.soton.ac.uk/perl/oai2
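The `/perl/oai2` path is an OAI-PMH interface, so a harvester talks to it with plain HTTP query strings. The following is a minimal sketch of how such a request URL might be built. The `http` scheme and the helper name `oai_request_url` are assumptions, and the code only constructs the URL; it does not contact the server, which may no longer be reachable at this address.

```python
from urllib.parse import urlencode

# Host and path are taken from the slide; the http scheme is an assumption.
BASE_URL = "http://ecrystals.chem.soton.ac.uk/perl/oai2"

# The six request verbs defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}


def oai_request_url(verb, base_url=BASE_URL, **params):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = {"verb": verb}
    query.update(params)
    return base_url + "?" + urlencode(query)


# Ask for all records in simple Dublin Core (oai_dc is the format every
# OAI-PMH repository is required to support).
url = oai_request_url("ListRecords", metadataPrefix="oai_dc")
```

An aggregator would fetch this URL, parse the XML response, and follow any `resumptionToken` it contains to page through the remaining records.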
Metadata Publication. Using simple Dublin Core: Crystal structure; Title (Systematic IUPAC Name); Authors; Affiliation; Creation Date. Additional chemical information through Qualified Dublin Core: Empirical formula; International Chemical Identifier (InChI); Compound Class & Keywords. Specifies which datasets are present in an entry. DOI. Rights & Citation. Application Profile.
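The split between simple and qualified fields can be sketched as an XML record with two namespaces. This is a rough illustration and not the actual application profile: the `CHEM_NS` namespace, the element names `empiricalFormula` and `inchi`, the wrapper element `record`, and the helper `build_record` are all hypothetical, and the benzene values are illustrative rather than taken from an eCrystals entry.

```python
import xml.etree.ElementTree as ET

# Namespace of the simple Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"
# Hypothetical namespace standing in for the profile's chemical terms.
CHEM_NS = "http://example.org/crystal-terms/"

ET.register_namespace("dc", DC_NS)
ET.register_namespace("chem", CHEM_NS)


def build_record(simple, chemical):
    """Serialise simple DC fields and extra chemical fields as one XML element."""
    root = ET.Element("record")
    for name, value in simple.items():
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    for name, value in chemical.items():
        ET.SubElement(root, f"{{{CHEM_NS}}}{name}").text = value
    return root


# Illustrative values for benzene; not a real repository entry.
record = build_record(
    simple={
        "title": "benzene",
        "creator": "A. Researcher",
        "date": "2007-05-15",
        "type": "Crystal structure",
    },
    chemical={
        "empiricalFormula": "C6H6",
        "inchi": "InChI=1S/C6H6/c1-2-4-6-5-3-1/h1-6H",
    },
)
xml_text = ET.tostring(record, encoding="unicode")
```

A harvester that only understands simple Dublin Core can read the `dc:` elements and ignore the rest, while a chemistry-aware service can also index the formula and InChI.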
Aggregating Datasets CCDC CDS
Aggregating Datasets
Search and Discovery
Controlled Vocabulary and Semantics: rnals/ProjectProspect/index.asp
Linking Data and Publications: Link data and associated publications; Dataset annotated with metadata; Semantic publishing on WWW and in journals; bank-uk/pilot/
The Future? Database Citation Services; Literature Citation Services; Controlled Vocabulary & Semantics
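One way to picture the data-to-publication link is as a `relation` field on each dataset record, which a service can invert to find every dataset behind a given article. The sketch below assumes this representation; the field names, the helper `build_link_index`, and all identifiers are hypothetical placeholders, not real DOIs.

```python
from collections import defaultdict


def build_link_index(datasets):
    """Map each publication identifier to the dataset identifiers that point at it."""
    index = defaultdict(list)
    for dataset in datasets:
        for publication in dataset.get("relation", []):
            index[publication].append(dataset["identifier"])
    return dict(index)


# Hypothetical dataset records; the identifiers are placeholders.
datasets = [
    {"identifier": "doi:10.0000/dataset-1", "relation": ["doi:10.0000/article-A"]},
    {"identifier": "doi:10.0000/dataset-2", "relation": ["doi:10.0000/article-A"]},
    {"identifier": "doi:10.0000/dataset-3"},  # not yet linked to a publication
]

links = build_link_index(datasets)
```

Here `links["doi:10.0000/article-A"]` lists both linked datasets, which is the lookup a publisher or citation service would need to show underlying data alongside an article.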