Experiences with Repositories and Blogs in Laboratories or ‘R4L: The Repository for the Laboratory’ Leslie Carr, Simon Coles & Jeremy Frey

Experiences with Repositories and Blogs in Laboratories or ‘R4L: The Repository for the Laboratory’ Leslie Carr, Simon Coles & Jeremy Frey University of Southampton, U.K. This work is licensed under a Creative Commons Licence Attribution-ShareAlike 3.0

The Problem: Data Generation Synthesis Characterisation

The Problem: Data Management
“Data from experiments conducted as recently as six months ago might be suddenly deemed important, but those researchers may never find those numbers – or if they did might not know what those numbers meant”
“Lost in some research assistant’s computer, the data are often irretrievable or an undecipherable string of digits”
“To vet experiments, correct errors, or find new breakthroughs, scientists desperately need better ways to store and retrieve research data”
“Data from Big Science is … easier to handle, understand and archive. Small Science is horribly heterogeneous and far more vast. In time Small Science will generate 2-3 times more data than Big Science.”
‘Lost in a Sea of Science Data’, S. Carlson, The Chronicle of Higher Education (23/06/2006)

The Problem: Data Deluge
There are approx. 30 million known chemical compounds
Approx. 2 million crystal structures have been determined
There are fewer than 0.5 million published crystal structures residing in (licensed) curated databases
There are just a few thousand ‘open’ crystal structures
The primary cause of this is the current data publication process, which is tied to journal articles and peer review
40 years ago a PhD student would determine about 3 crystal structures during the course of their study – this can now be easily achieved in a day

The Problem: Publishing Data
Spectroscopic analysis is often performed to ensure a reaction is proceeding according to plan – as a result <5% are published (via a process with heavy information loss)

The Problem: Reproducible Experiments
Poor availability and description of experiments and arising data in the current literature
How can we validate this data?
Open data will need to be self-explanatory and prove its own ‘correctness’
Requirement for an ‘experiment audit trail’
(Published) Science should be reproducible
Requirement to provide sufficient data and metadata to back up an experiment description

The Solution
Underlying data (Institutional data repository)
Intellect & Interpretation (Journal article, report, etc.)

Fitting into the Information Environment Institutional Data Sources

High Level Relationships

Repository Design
Scenarios – assist design team in understanding each other
Feedback from SPECTRa-based questionnaire
First design: build one to throw away (out-of-the-box EPrints)
Population of disposable repository informed design of actual repository
Population informed workflow capture and analysis
Manufacturer discussions
Requirements capture with publishing community

Questionnaire Results #1
Respondents comprised PhD students (55%), postdoctoral workers (18%) and faculty staff (19%) and totalled 110 people.
Computers and the internet are used primarily for researching information, writing papers and reports, working up data and instrument control.
Computers are used regularly for everyday work, but much less so for social networking and other ‘modern’ uses.
Mainly highly established community standard applications, software and file formats are used, with less use of modern data sources.
There is still extensive use of printed paper copies of PDF files, which are generally stored on personal computers without any structure or use of reference software.
A researcher will have PDF files on their computer and prefers to communicate them by sending PDFs to collaborators.

Questionnaire Results #2
About 66% have had to generate electronic supplementary information to accompany their journal articles.
Most respondents taught themselves to use software rather than being taught professionally.
Supplementary information is mainly generated and stored in proprietary formats, although there is considerable use of ‘popular’ formats (e.g. Microsoft Office).
There is a preference, or requirement, to keep a hardcopy of data as well as an electronic one.
Experimental and analysed data are generally kept on a group or instrument-controlling computer; however, there is often a need to keep a hardcopy (e.g. in a lab notebook).
About 66% have not heard of ‘InChI’, ‘metadata’ or the ‘JCAMP’ format, whilst around 50% have not heard of ‘DOI’, ‘Open Access’, ‘Semantic Web’ or ‘RDF’.

Questionnaire Results #3
There is a considerable lack of awareness surrounding repositories and their function.
There is a requirement for search and discovery to be based predominantly on structure, formula, author or keyword.
The most attractive purpose of a repository would be for the storage of a ‘permanent record’.
Most chemists would comply if deposit in a repository were a mandatory requirement of funding or publication; however, virtually all are unaware of the position of these organisations with respect to open access and deposit in a repository.

The Plan

Workflow Analysis
UV-Vis, Powder XRD, NMR, Mass Spec
Sufficient similarity to design a generic deposit / ingest process

The R4L Repository
Deposit / Ingest
Create new compound (parent record)
Add new experiment type
Add metadata and upload data files
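The three deposit steps above can be sketched as a small data model. This is only an illustration under assumptions: the class names, field names and sample values below are hypothetical and are not taken from the R4L software or from EPrints.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Experiment:
    """One analytical experiment attached to a compound (hypothetical schema)."""
    experiment_type: str  # e.g. "NMR", "UV-Vis", "Powder XRD", "Mass Spec"
    metadata: dict        # free-form descriptive fields
    data_files: list = field(default_factory=list)  # names of uploaded files


@dataclass
class Compound:
    """Parent record that groups all experiments run on one compound."""
    name: str
    experiments: list = field(default_factory=list)

    def add_experiment(self, experiment_type, metadata, data_files):
        """Steps 2 and 3: add an experiment type, then its metadata and files."""
        experiment = Experiment(experiment_type, dict(metadata), list(data_files))
        self.experiments.append(experiment)
        return experiment


# Step 1: create the compound (parent record). All values are made up.
compound = Compound(name="example-compound")

# Steps 2 and 3: attach an NMR experiment with metadata and one data file.
nmr = compound.add_experiment(
    "NMR",
    {
        "researcher": "A. Chemist",
        "instrument": "spectrometer-1",
        "date": date(2007, 1, 15).isoformat(),
    },
    ["spectrum.jdx"],
)
```

Because every technique goes through the same add_experiment call, the same deposit path would serve UV-Vis, powder XRD, NMR and mass spectrometry records, which is the point of a generic ingest process.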

The Probity Service
Process to assert originality of a piece of work / repository record
Incorporate into EPrints core software?
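The slide does not describe how the Probity Service works. One common way to support a claim of originality is to register a cryptographic digest of the file together with a timestamp, so that the claimant can later show that the same bytes existed at registration time. The sketch below assumes that approach; the function names and the claim layout are hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def make_claim(content: bytes, claimant: str) -> dict:
    """Record a SHA-256 digest of the content with a UTC timestamp (assumed design)."""
    return {
        "claimant": claimant,
        "sha256": hashlib.sha256(content).hexdigest(),
        "registered_at": datetime.now(timezone.utc).isoformat(),
    }


def verify_claim(content: bytes, claim: dict) -> bool:
    """Check that the content still matches the digest stored in the claim."""
    return hashlib.sha256(content).hexdigest() == claim["sha256"]


# Example with made-up data: register a file, then check it and a modified copy.
original = b"raw spectrum data"
claim = make_claim(original, "A. Chemist")
matches_original = verify_claim(original, claim)
matches_modified = verify_claim(b"raw spectrum data, edited", claim)
```

Only the digest needs to be registered, so the data itself can stay private until the researcher chooses to publish it.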

Repository Search / Browse
Crucial record metadata for Data Management, Search/Browse and Discovery: Date; Instrument; Location; Compound Name; Experiment Type; Researcher
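A minimal sketch of how the six metadata fields listed above could drive search and browse. The records, the field spellings and the function names are hypothetical, and exact-match filtering is an assumption; a real repository would also need structure and formula search.

```python
# Hypothetical records; the keys follow the metadata fields listed on the slide.
RECORDS = [
    {"date": "2007-01-15", "instrument": "NMR-1", "location": "Lab A",
     "compound_name": "compound-1", "experiment_type": "NMR",
     "researcher": "A. Chemist"},
    {"date": "2007-01-16", "instrument": "XRD-1", "location": "Lab B",
     "compound_name": "compound-1", "experiment_type": "Powder XRD",
     "researcher": "B. Chemist"},
    {"date": "2007-01-16", "instrument": "NMR-1", "location": "Lab A",
     "compound_name": "compound-2", "experiment_type": "NMR",
     "researcher": "A. Chemist"},
]


def search(records, **criteria):
    """Return the records whose metadata match every given field exactly."""
    return [r for r in records
            if all(r.get(key) == value for key, value in criteria.items())]


def browse(records, field_name):
    """Group records by one metadata field, e.g. for a browse-by-researcher view."""
    groups = {}
    for record in records:
        groups.setdefault(record.get(field_name), []).append(record)
    return groups


nmr_by_a = search(RECORDS, experiment_type="NMR", researcher="A. Chemist")
by_compound = browse(RECORDS, "compound_name")
```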

Report Generation
Too cumbersome and inflexible (revisit?)
Requirement for ‘familiar’ software
Suitable for informatics, but not routine reporting

Report Generation
Ability to import repository data into software and easily edit
Need to bring the repository to the researcher’s ‘desktop’
Demonstrator employed SharePoint to store templates (functionality to be incorporated into repository software?)
Does this really bring anything new to reporting research?

Analysis & Discussion: Blogging Experiments
A repository can…
Allow one to put, store and get digital objects
Provide minimal search and browse functions
NOT provide the presentation and discussion functions essential to working up a scientific study
Social networking tools and approaches can provide a way…

Getting data into Blogs
Developing relationship between Blog and R4L repository
Repository back-end, Blog for sharing data with collaborators and developing ideas / conclusions based on data
‘Live copy’ application only – development?

Enabling Research
Enables ‘geographically distributed collaborative research’
Useful approach for sharing ‘failed’ experiments?

Open Notebook Science

Automatic Blogging by Machines

Automatic Logging of Sensor Data
Timeline visualisation – instant detection of erroneous events
Assists in analysing inconsistencies in datasets
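The slide relies on a timeline plot so that a person can spot an erroneous event by eye. As a rough programmatic counterpart, the sketch below flags readings that lie far from the median of a logged series. The detection rule, the threshold and the sample readings are all assumptions made for illustration; they are not part of the R4L tools.

```python
from statistics import median


def flag_erroneous(readings, max_deviation):
    """Return (time, value) pairs whose value is further than max_deviation
    from the median of all logged values (a deliberately simple rule)."""
    centre = median(value for _, value in readings)
    return [(t, value) for t, value in readings
            if abs(value - centre) > max_deviation]


# Made-up temperature log as (time step, reading) pairs with one obvious spike.
log = [(0, 21.0), (1, 21.2), (2, 21.1), (3, 85.0), (4, 21.3)]
flagged = flag_erroneous(log, max_deviation=5.0)
```

The median is used rather than the mean so that a single large spike does not shift the reference value it is compared against.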

Comments and Annotation
Chemists need to scribble!
A picture says a thousand words!
Need for more advanced Blog tools / technology

R4L End-to-End Overview

Usability
Low barrier to use; familiarity; flexibility; quick gain
A specification/requirements for data repository based software?

Problems Encountered
Overambitious in affecting the attitudes of instrument manufacturers at such an early stage
Attitudes of chemists towards changes in laboratory working procedure
Attitudes of chemists towards change in the publication system
Input from journal publishers before demonstrator / prototype available
Blog software restrictive
Extreme diversity and number of file formats employed for analytical chemistry

Future Directions
A useful demonstrator to the practising scientist of the value of a laboratory repository…
Further advocacy required in preservation and data management areas
Feasibility of a departmental data repository?
An exemplar for:
  The institution – towards an institutional data preservation policy?
  Publishers – improved handling of supplementary information
  Instrument manufacturers – will respond to the demands of their customer base
Follow-on funding: eChemistry; myExperiment
Best practice in validation and reproducibility of experiments
Develop relationship between data repository & ‘Blog approach’