Published by Paige Poole. Modified over 10 years ago.
1
Chemistry research data in the modern age: A clear need for curation expertise
Simon Coles
School of Chemistry, University of Southampton, U.K.
s.j.coles@soton.ac.uk
2
Data Generation
Synthesis / Data Collection / Data Workup / Data Processing / Publication
3
Data Types G bytes M bytes Lab / Institution Subject Repository / Data Centre / Public Domain k bytes RAW data DERIVED data RESULTS data
4
Incentives and Drivers
Chemists don't think about their data! They need to understand that their data is valuable and has a use beyond that of an immediate gain before they will consider curation issues.
So what are the incentives and drivers?
–Data Management
–Data Deluge
–Publishing Data
–Validation, Assessment and Peer Review
–Re-analysing Data
–Data Reuse and Derivative Studies
–Publishing and Funding Mandates
5
Curation Incentives - Data Management, Deluge & Publishing
Data from experiments conducted as recently as six months ago might be suddenly deemed important, but those researchers may never find those numbers – or if they did might not know what those numbers meant
Lost in some research assistant's computer, the data are often irretrievable or an undecipherable string of digits
To vet experiments, correct errors, or find new breakthroughs, scientists desperately need better ways to store and retrieve research data
Data from Big Science is … easier to handle, understand and archive. Small Science is horribly heterogeneous and far more vast. In time Small Science will generate 2-3 times more data than Big Science.
Source: Lost in a Sea of Science Data, S. Carlson, The Chronicle of Higher Education (23/06/2006)
6
Curation Incentives - Data Management, Deluge & Publishing 30,000,000 2,000,000 450,000
7
Curation Incentives - Data Management, Deluge & Publishing
8
Separating Data from Interpretations Underlying data (Institutional data repository) Intellect & Interpretation (Journal article, report, etc)
9
The eCrystals Data Repository An Institutional Repository http://ecrystals.chem.soton.ac.uk
10
The Repository for the Laboratory Search / Browse Deposit Create new compound Add experiment data and metadata
11
Curation Incentives - Validation & Peer Review
12
Curation Incentives - Raw Data Re-analysis
Good data / Difficult data
You never know when data might have to be revisited or when new innovations will allow re-interpretation!
13
Curation Incentives - Funding and/or publishing mandates Mandates to store / make data available RCUK statement
14
Curation Incentives - Derivative Science Starting points for new science Derivation of knowledgebases
15
Curation Issues
Need to engage stakeholders throughout the whole research data lifecycle:
–Instrument manufacturers
–Scientists
–Archivists
–Librarians
–Subject repositories
–Data centres
–Publishers
–Funders
–Data miners & information providers
16
Curation Issues
File formats, complexity and specialisation
Data corruption and bit rot
Quantity of data
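One common guard against data corruption and bit rot is a fixity check: record a checksum for every file at deposit time and recompute it later. The sketch below is illustrative only and is not taken from the presentation or from eCrystals. The function names (sha256_of, build_manifest, find_changed) and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (by relative path) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_changed(root: Path, manifest: dict[str, str]) -> list[str]:
    """List files that are missing or whose checksum no longer matches."""
    changed = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            changed.append(rel_path)
    return changed
```

A repository could store the manifest alongside the deposit and rerun find_changed on a schedule. Any path it returns points to a file that should be restored from another copy.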
17
Curation Issues
File formats, complexity and specialisation
Data corruption and bit rot
Quantity of data
–Future proofing…
–Technology developments
–eScience
18
Curation Issues
File formats, complexity and specialisation
Data corruption and bit rot
Quantity of data
Catering for a whole community
19
Curation Issues
File formats, complexity and specialisation
Data corruption and bit rot
Quantity of data
Catering for a whole community
What data is worth storing?
–Estimated that the real cost of a crystal structure is £75 - £100 ($200)
–But what about the cost of producing the crystal?
–Priceless!
–The crystal was synthesised in a specialised laboratory, by highly trained researchers under a specific research program
–A laboratory, researcher or scheme of work is a transient or evolving entity
–As much data as possible must be acquired and future-proofed whilst the analyst has the substance to hand
20
Curation Issues
File formats, complexity and specialisation
Data corruption and bit rot
Quantity of data
Catering for a whole community
What data is worth storing?
Provenance, workflow and rights protection
21
Curation Issues
File formats, complexity and specialisation
Data corruption and bit rot
Quantity of data
Catering for a whole community
What data is worth storing?
Provenance, workflow and protection of rights
Available expertise, library/information services structure
Cost and policy
Business models
–Subject librarian model - working closely with practitioners
–New funding/structure models to support open data as OA takes off
–Working group to assess the volume and diversity of research data
–JISC funded survey - Cost of preserving research data
–Commercialisation of knowledge derived from collections of data
22
Dealing with Data Report, June 2007
Recommendations 1
JISC should develop a Data Audit Framework to enable all Universities & colleges to carry out an audit of departmental data collections, awareness, policies & practice…
Each Higher Education Institution should implement an Institutional Data Management, Preservation & Sharing Policy, which recommends data deposit in an appropriate open access data repository and/or data centre where these exist.
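As a rough idea of what the first step of a departmental data audit could look like in practice, the sketch below inventories a directory tree by file extension, counting files and bytes. It is a hypothetical illustration only and does not represent the Data Audit Framework named in the recommendation. The function name audit_collection and the report layout are assumptions.

```python
from collections import Counter
from pathlib import Path


def audit_collection(root: Path) -> dict[str, dict[str, int]]:
    """Summarise files under root as {extension: {"files": n, "bytes": total}}."""
    counts: Counter = Counter()
    sizes: Counter = Counter()
    for path in root.rglob("*"):
        if path.is_file():
            # Extensions are lower-cased so ".CIF" and ".cif" count together.
            ext = path.suffix.lower() or "(none)"
            counts[ext] += 1
            sizes[ext] += path.stat().st_size
    return {
        ext: {"files": counts[ext], "bytes": sizes[ext]}
        for ext in sorted(counts)
    }
```

A summary like this only addresses the quantity and format questions. Awareness, policies and practice still have to be gathered from the people who hold the data.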
23
Institutional Structure
Encourage restructuring through strategic funding
Rechannel existing funding routes
Financial structure – money for self archive or OA publishing
Physical structure – embed LIS/curation staff in departments for advocacy – need to go native.
Library / Information services need to be introspective / reinvent
24
Advocacy
Younger digital generation
Elders will not listen
Method to engage at departmental level
Funders undervaluing work – need enlightening
25
Funding
Small science
Low budget / funding
Hypo publishing
Unsupported
Initial target areas that are safe – i.e. no sensitive data
26
Practice
Small science vs big science
Instrumentation vs manual
Automate data capture
Heterogeneity/variety in practice
Problems same in industry
27
Tools
Seamless
Simple to use
Low barrier to use
Integrated into familiar environment
Self describing (generate provenance and preservation metadata in the background)
Tagging / controlled vocab tools / servers
Vocab checking
Browser tools (familiar to youth)
Thin client tools – repository lite. Minimal management.
Highly distributed repositories
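To make the "self describing" and "vocab checking" points concrete, the sketch below writes a small JSON sidecar file next to a data file and rejects tags outside a controlled vocabulary. It is an assumption-laden illustration, not a tool described in the presentation. The function write_sidecar, the field names, the ".meta.json" suffix and the three-term vocabulary (borrowed from the RAW / DERIVED / RESULTS data types on slide 3) are all hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical controlled vocabulary, borrowed from the data types on slide 3.
CONTROLLED_VOCAB = {"raw", "derived", "results"}


def write_sidecar(data_file: Path, operator: str, instrument: str,
                  tags: list[str]) -> Path:
    """Write <name>.meta.json beside data_file and return the sidecar path."""
    unknown = set(tags) - CONTROLLED_VOCAB
    if unknown:
        # Vocab checking: refuse free-text tags rather than store them silently.
        raise ValueError(f"tags not in controlled vocabulary: {sorted(unknown)}")
    record = {
        "file": data_file.name,
        "size_bytes": data_file.stat().st_size,
        "sha256": hashlib.sha256(data_file.read_bytes()).hexdigest(),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
        "operator": operator,
        "instrument": instrument,
        "tags": sorted(set(tags)),
    }
    sidecar = data_file.with_name(data_file.name + ".meta.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar
```

In a real laboratory tool, the operator and instrument values would ideally come from the instrument software itself, so that the researcher never has to type them.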
28
eInfrastructure Semantic / controlled vocabulary central services
29
Economic models and value
Data *NOT* valueless once published (EPSRC train of thought)
What is the *value* of departmental level data – this is not necessarily monetary
Department, institution, individual, data centre, pharma, government, research council, public, third party services/businesses
We undervalue data
Subject repository economic sustainability
Evidence to back up advocacy