21 Nov 2006 Jeremy G. Frey University of Southampton DCC Conference Glasgow The curation of laboratory experimental data as part of the overall data lifecycle.

Slides:



Advertisements
Similar presentations
IATUL Porto, May 21, 2006 DOI and e-Science Dr Anne E Trefethen Oxford e-Research Centre
Advertisements

Grey Literature, Institutional Repositories and the Organisational Context Simon Lambert, Brian Matthews & Catherine Jones Business & Information Technology.
Comb-e-Chem Jeremy Frey Sept 2003 From e-Science to Jeremy Frey School of Chemistry University of Southampton, UK X-ray single Mol STM.
The way to open resources Laurent Romary CNRS. Two aspects of scientific communication Research papers –All types (Conferences, journals, grey literature.
Panel 2 – Promoting Re-Use of Scientific Collections John Harrison SHAMAN Project University of Liverpool
CoAKTing IFD Dave in Hawaii. 2 CoAKTing IFD n Objective is to advance the state of the art in collaborative mediated spaces for distributed e- Science.
Creating Institutional Repositories Stephen Pinfield.
A centre of expertise in data curation and preservation DCC/NeSC eScience Workshop, June 2008 Working in partnership with the eScience community This work.
Crystal Structure EPrints: Source Through the Open Archive Initiative S.J. Coles a*, J.G. Frey a, M.B. Hursthouse a, L. Carr b & C.J. Gutteridge.
Less is More Lightweight Ontologies and User Interfaces for Smart Labs J. G. Frey, G. V. Hughes, H. R. Mills, m. c. schraefel, G. M. Smith, David De Roure.
Opening the Research Data Lifecycle Workshop Capturing and Sharing Research Data Simon Coles School of Chemistry, University of Southampton, U.K.
© S.J. Coles 2006 Digital Repositories as a Mechanism for the Capture, Management and Dissemination of Chemical Data Simon Coles School of Chemistry, University.
S.J. Coles a*, J.G. Frey a, M.B. Hursthouse a, L. Carr b & C.J. Gutteridge b. a School of Chemistry, University of Southampton, UK.; b School of Electronics.
© S.J. Coles 2006 Digital Repositories as a Mechanism for the Capture, Management and Dissemination of Chemical Data Simon Coles School of Chemistry, University.
RCUK, Octiber Archiving research data and research publications. Dr Leslie Carr, Intelligence, Agents Multimedia, University of Southampton Dr Simon.
Digital | Curation | Centre Adding value to open access research data: reflections on the process of data curation Dr Liz Lyon, DCC Associate Director.
Terminologies: An e-Science perspective Nicholas Gibbins Intelligence, Agents, Multimedia University of Southampton.
UKOLN is supported by: Emergent technologies & digitisation: the institutional impact. Liz Lyon & Kevin Edge VCs Retreat, October a.
Federation eCrystals Federation: Open Repositories for Data-driven Science Dr Liz Lyon, UKOLN, University of Bath, UK Dr Simon Coles, University of Southampton,
EBankII Workshop 1 Making Scientific Data Openly Available Simon Coles School of Chemistry, University of Southampton.
EBank UK CCLRC Workshop February eBank and CCLRC Workshop February 2005 University of Bath.
Digital Repositories: interoperability & common services Closing Remarks Dr Liz Lyon, UKOLN, University of Bath, UK
Building a Virtual Research Environment for the Humanities Ruth Kirkham – Project Manager John Pybus – Technical Support
The Data Lifecycle and the Curation of Laboratory Experimental Data Tony Hey Corporate VP for Technical Computing Microsoft Corporation.
The Central Role of Data ‘Capturing and Sharing Chemistry Research Data’ Simon Coles School of Chemistry, University of Southampton, U.K.
28 October 2005Jeremy Frey, University of Southampton1 “The CombeChem Experience” CICC Workshop 28 October 2005 Bloomington Indiana.
INFSO-RI Enabling Grids for E-sciencE Grid & Data Preservation Boon Low System Development, EGEE Training National.
University of Southampton, U.K.
EPrints Workshop, January eBank UK: Dissemination of research data using EPrints Simon Coles, School of Chemistry, University of Southampton.
1 Digital Libraries and Evidence in the Developing World Context Dr. Jon Ferguson Senior Health Database Scientist IMMPACT Project University of Aberdeen.
Developing PANDORA Mark Corbould Director, IT Business Systems.
Alma Swan Key Perspectives Ltd Truro, UK. Key Perspectives Ltd.
Managing the Record of Research At the Smithsonian Using SIdora SAA Research Forum August 12, 2014.
Publishing and Visualizing Large-Scale Semantically-enabled Earth Science Resources on the Web Benno Lee 1 Sumit Purohit 2
Meeting Capture and Structural Replay Compendium in Meeting Replay Web Interface BuddySpace I-X Process System Mars Exploration Mission
The Information Environment for Neuroscientists David R Newman
CSED Computational Science & Engineering Department CHEMICAL DATABASE SERVICE The Current Service is Well Regarded The CDS has a long and distinguished.
The Value of Geospatial Metadata Metadata has tremendous value to Individuals within your organization, as well as to individuals outside of your organization.
QCDGrid Progress James Perry, Andrew Jackson, Stephen Booth, Lorna Smith EPCC, The University Of Edinburgh.
EBank UK: linking scientific data, scholarly communication and learning Michael Day and Rachel Heery UKOLN, University of Bath
Smart Lab, Smart Tea H. R. Mills, G. V. Hughes, m. c. schraefel, J. G. Frey, G. M. Smith, David De Roure CombeChem Project Electronics and Computer Science.
Joint agINFRA & SCI-BUS workshop, 30/05/2013, Budapest, Hungary FP 7-INFRASTRUCTURES programme agINFRA Joint agINFRA & SCI-BUS workshop agINFRA.
Topic Rathachai Chawuthai Information Management CSIM / AIT Review Draft/Issued document 0.1.
“Science Ajar”. “Chemistry Clouds and Crowds” Jeremy G. Frey 8 Dec 2008.
11 Curation of Chemistry Data from the Laboratory to Publication Jeremy Frey & Simon Coles School of Chemistry University of Southampton Jeremy Frey &
Large Scale Nuclear Physics Calculations in a Workflow Environment and Data Provenance Capturing Fang Liu and Masha Sosonkina Scalable Computing Lab, USDOE.
Deepcarbon.net Xiaogang (Marshall) Ma, Yu Chen, Han Wang, John Erickson, Patrick West, Peter Fox Tetherless World Constellation Rensselaer Polytechnic.
ICCS WSES BOF Discussion. Possible Topics Scientific workflows and Grid infrastructure Utilization of computing resources in scientific workflows; Virtual.
Infrastructures for Social Simulation Rob Procter National e-Infrastructure for Social Simulation ISGC 2010 Social Simulation Tutorial.
Enterprise Solutions Chapter 10 – Enterprise Content Management.
Hussein Suleman University of Cape Town Department of Computer Science Digital Libraries Laboratory February 2008 Data Curation Repositories:
UKOLN is supported by: Introduction to UKOLN Dr Liz Lyon, Director UKOLN, University of Bath, UK Grand Challenge Meeting, June a centre.
The Collaborative Semantic Grid David De Roure University of Southampton, UK
ICT in Classroom Prepared by: Ymer LEKSI Kukes
DANIELA KOLAROVA INSTITUTE OF INFORMATION TECHNOLOGIES, BAS Multimedia Semantics and the Semantic Web.
CombeDay Making Data Openly Available Simon Coles.
AHM04: Sep 2004 Nottingham CCLRC e-Science Centre eMinerals: Environment from the Molecular Level Managing simulation data Lisa Blanshard e- Science Data.
1 DMS-DQS-SUPSC03-PRE-12-E © DEIMOS Space S.L., 2007 A Semantic Data Grid for Satellite Mission Quality Analysis Reuben Wright Deimos Space.
Oct 2004 Jeremy Frey Informatics1 Automation and Semantics: The CombeChem Experience Jeremy Frey CombeDay Feb 2005.
David De Roure Workflows in Support of Large-Scale Science Provenance, a.
Cameron Neylon School of Chemistry, University of Southampton & Science and Technology Facilities Council, Rutherford Appleton Laboratory, Oxfordshire.
Fedora Commons Overview and Background Sandy Payette, Executive Director UK Fedora Training London January 22-23, 2009.
Smart Labs for Smart People New ways to collect, curate and share information Jeremy Frey School of Chemistry, University of Southampton June 2010Jeremy.
The Information Environment for Neuroscientists David R Newman
Open Exeter Project Team
Workflows in archaeology & heritage sciences
Metadata for digital long-term preservation
LOD reference architecture
JISC Joint Programmes Meeting 2005
School of Information Studies, Syracuse University, Syracuse, NY, USA
Presentation transcript:

21 Nov 2006 Jeremy G. Frey University of Southampton DCC Conference Glasgow The curation of laboratory experimental data as part of the overall data lifecycle Jeremy G.Frey School of Chemistry, University of Southampton, UK 21 Nov 2006 DCC Conference, Glasgow

21 Nov 2006Jeremy G. Frey University of Southampton DCC Conference 2006 If you do things right at the start then all the following processes are much easier! Exponentially growing amount of data - the future overwhelms the past

21 Nov 2006Jeremy G. Frey University of Southampton DCC Conference 2006 The Comb e Chem Project  End to End linking of data and information   So collect data with regard to how it could eventually be used  Make sure the metadata is of high quality  Record properly at source in Digital Form  The Chemistry Lab  People & Machines working together

21 Nov 2006Jeremy G. Frey University of Southampton DCC Conference 2006 Combechem Smart Lab R4L e-Bank E-Malaria Instruments on the Grid BioSimGrid Statistics

21 Nov 2006Jeremy G. Frey University of Southampton DCC Conference 2006 Plan & COSHH Digital Model Information Integration Report Knowledge Goal Literature Synthesis not just one laboratory but many co-laboratories working together Analysis Smart Laboratory Smart Storage Smart Dissemination Smart HCI The concept of Source Smart Workflow

21 Nov 2006Jeremy G. Frey University of Southampton DCC Conference 2006 If only I knew exactly how she did this experiments I know all this supplementary information could be useful but will people really remember the format? Is it worth all the hassle? I wish I could get the numbers from this graph - the pdf is not much use. I wish I had recorded things at the start the way I do now….. Typical Laboratory

21 Nov 2006Jeremy G. Frey University of Southampton DCC Conference 2006 First, they do an online search Need to make the data available Need to be able to find it But how to expose it?

21 Nov 2006Jeremy G. Frey University of Southampton DCC Conference 2006 I am sure we collected that information a few years ago… The details should be in her thesis….. Can you read what he says here….? Can you find the file of data that were used to make the plot? Some of these problems are due to the lack of information recorded at the time. Others are due to loss of information over time.

21 Nov 2006Jeremy G. Frey University of Southampton DCC Conference 2006 What are the people up to?  Capture Data and Context  People  Process  Environment

21 Nov 2006Jeremy G. Frey University of Southampton DCC Conference 2006 Permanent, documented and primary record of laboratory observations

21 Nov 2006Jeremy G. Frey University of Southampton DCC Conference 2006 Observations are never collected on note pads, filter paper or other temporary paper for later transfer into a notebook If you are caught using the “scrap of paper” technique, your improperly recorded data may be confiscated by your TA

21 Nov 2006Jeremy G. Frey University of Southampton DCC Conference 2006 COSHH L everage off things we already have to do – “We have a cunning plan”

21 Nov 2006Jeremy G. Frey University of Southampton DCC Conference 2006

21 Nov 2006Jeremy G. Frey University of Southampton DCC Conference 2006

Pub-Sub systems provide the flexible & extensible approach to distribution of real time laboratory monitoring & archiving Smart Laboratory Spaces

21 Nov 2006Jeremy G. Frey University of Southampton DCC Conference 2006 But what about the laboratory environment? “I just realized, Howard, that everything in this apartment is more sophisticated than we are”

Semantic DataGrid  CombeChem used, tested & strained the Semantic Web for  Enhanced (annotated) DataGrid over multiple diverse stores  Storage of Provenance Information  Some Data Storage  Annotated multimedia streams  Units & Propoerties Ontology  Multiple Triple Stores

21 Nov 2006Jeremy G. Frey University of Southampton DCC Conference 2006 Laboratory “Blogs”  Laboratory notebook is a Blog  Encourage and facilitate collaboration  Need a data repository behind the Blog  R4L  E-Bank  Flexible  Service oriented approach being developed  A VRE

21 Nov 2006Jeremy G. Frey University of Southampton DCC Conference 2006 Instrument Blog ‘Blog-jects’

21 Nov 2006Jeremy G. Frey University of Southampton DCC Conference 2006 The ‘Scientific Blog’ is being tried in an attempt to combine laboratory notebooks and publication

21 Nov 2006Jeremy G. Frey University of Southampton DCC Conference 2006 Format Issues – everyday and for the long term

21 Nov 2006Jeremy G. Frey University of Southampton DCC Conference 2006 Note the use of “YouTube” An experiment that failed… Publishable? Useful?

21 Nov 2006Jeremy G. Frey University of Southampton DCC Conference 2006 Record the ‘Scientific Conversation’ – this part of the record often exists only in the ‘grey literature’ CoAKTing Memetic

21 Nov 2006Jeremy G. Frey University of Southampton DCC Conference 2006 Laboratory IRs and Information Management

21 Nov 2006Jeremy G. Frey University of Southampton DCC Conference 2006 Repositories

21 Nov 2006Jeremy G. Frey University of Southampton DCC Conference 2006 Validation  Increasing the value of data  How to bring all the necessary information together to enable appropriate validation  Increasingly difficult & expensive to achieve  Need provenance and context  Essential step otherwise just a collection of items

21 Nov 2006Jeremy G. Frey University of Southampton DCC Conference 2006 Why? Publishing Data and Information Loss

21 Nov 2006Jeremy G. Frey University of Southampton DCC Conference 2006 SVG “active” graphics Link to data, follow links back to the raw data archive Link to simulation, full simulation data archived in BioSimGrid R4L Paper organized using RDF

21 Nov 2006Jeremy G. Frey University of Southampton DCC Conference 2006 Access to information requires crossing administrative domains Researcher National Archive Research Group Institution International Database Research Group

21 Nov 2006Jeremy G. Frey University of Southampton DCC Conference 2006 Subversive and furtive sharing & exploitation of data in virtual space Data CAS RDF OAI Taxi E- user Labs Digital Repository

21 Nov 2006Jeremy G. Frey University of Southampton DCC Conference 2006 He is charged with expressing contempt for meta-data

21 Nov 2006Jeremy G. Frey University of Southampton DCC Conference 2006 Metadata Lifecycle  Creation and maintenance of metadata  Need a metadata infrastructure as well as a data infrastructure  Capture process as well as results  Automatic metadata generation when possible  Human annotation will always be needed

21 Nov 2006Jeremy G. Frey University of Southampton DCC Conference 2006 Plans  Plans are useful  This is the way things are supposed to be done  The Plan provides a digital context so increases the value of planning  Key to our ‘Smart Lab’ approach….  Is it the best way?

21 Nov 2006Jeremy G. Frey University of Southampton DCC Conference 2006 Who is responsible  Context is crucial for curation  every person, on each step of the process of converting data to knowledge  Need to consider the future access to this information by themselves and others.

21 Nov 2006 Jeremy G. Frey University of Southampton DCC Conference Glasgow Information Providers Information Consumers These are the same people – if we can ‘talk’ to ourselves efficiently over time then that is a good start to be able to ‘talk’ to others

21 Nov 2006Jeremy G. Frey University of Southampton DCC Conference 2006 All I am saying is that now is the time to develop the technology to deflect an asteroid We must speed up the knowledge discovery process

21 Nov 2006Jeremy G. Frey University of Southampton DCC Conference 2006 PEOPLE  Southampton ECS, MATHS & CHEMISTRY  IT-INNOVATION  BRISTOL  UKOLN  CCLRC  INDIANA  SYDNEY  MANCHESTER  EPRSC e-Science & Chemistry Programmes  JISC e-Infrastructre  DTI  See web site for full details and links 