Published by Gerard Fletcher. Modified over 9 years ago.
1
Publication of facility investigations
Brian Matthews
Scientific Information Group, Scientific Computing Department
STFC Rutherford Appleton Laboratory
brian.matthews@stfc.ac.uk
2
STFC
Funds and operates large-scale science for the UK research base
– physics, astronomy
– chemistry, materials
Scientific Computing develops and operates computing infrastructure
– HPC, PB datastore, software, data management…
(Image: ESO, ALMA array)
3
Major Science Facilities
Big Science
– Particle Physics: exploring the very small
– Space Science: exploring the very large
Small Science
– Understanding the world around us at a molecular level
– Lasers, Neutron & Light Source: ISIS & Diamond
4
Facilities Support: Big Facilities for Small Science
Diamond, ISIS, CLF
5
Science at STFC Facilities
Diagram labels: beam, sample, imaging detector, data, computing (analysis, modelling), knowledge
Neutrons and photons provide complementary views of matter:
– Photons “see” electric charge: high atomic number nuclei
– Neutrons “see” nucleons: especially hydrogen atoms
6
The science we do - Structure of materials
Figure captions: Visit facility on research campus; Place sample in beam; Diffraction pattern from sample; Fitting experimental data to model; Bioactive glass for bone growth; Structure of cholesterol in crude oil; Hydrogen storage for zero emission vehicles; Magnetic moments in electronic storage; Longitudinal strain in aircraft wing
~30,000 user visitors each year in Europe:
– physics, chemistry, biology, medicine
– energy, environmental, materials, culture
– pharmaceuticals, petrochemicals, microelectronics
Billions of € of investment
– c. £400M for DLS
– plus running costs
Over 5,000 high impact publications per year in Europe
– But so far no integrated data repositories
– Lacking sustainability & traceability
7
Similar architecture used for DLS
– Scaling is a constant concern
– Data rates keep increasing
– 70 TB per month and rising
– Tailored ICAT
– Re-engineered StorageD
8
Proposals: Once awarded beamtime at ISIS, an entry will be created in ICAT that describes your proposed experiment.
Experiment: Data collected from your experiment will be indexed by ICAT (with additional experimental conditions) and made available to your experimental team.
Analysed Data: You will have the capability to upload any desired analysed data and associate it with your experiments.
Publication: Using ICAT you will also be able to associate publications with your experiment and even reference data from your publications.
Figure captions: B-lactoglobulin protein interfacial structure; Example ISIS Proposal; GEM – High intensity, high resolution neutron diffractometer; H2-(zeolite) vibrational frequencies vs polarising potential of cations
Central Facility
– Secure access to user’s data
– Flexible data searching
– Scalable and extensible architecture
– Integration with analysis tools
– Access to high-performance resources
– Linking to other scientific outputs
– Data policy aware
http://code.google.com/p/icatproject/
9
Core Scientific Metadata Model (CSMD)
The Core Metadata model forms the information model for ICAT. Designed to describe facilities-based experiments in Structural Science.
Entities in the diagram: Investigation, Publication, Keyword, Topic, Sample, Sample Parameter, Dataset, Dataset Parameter, Datafile, Datafile Parameter, Investigator, Related Datafile, Parameter, Authorisation
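As a rough illustration of the containment hierarchy named on this slide, the sketch below models a few CSMD entities as Python dataclasses. The field names and the helper method are assumptions made for illustration; they are not taken from the ICAT schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Parameter:
    # Hypothetical name/value/unit record; real CSMD parameters carry more detail.
    name: str
    value: str
    unit: str = ""


@dataclass
class Datafile:
    name: str
    parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Dataset:
    name: str
    datafiles: List[Datafile] = field(default_factory=list)
    parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Sample:
    name: str
    parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    investigators: List[str] = field(default_factory=list)
    keywords: List[str] = field(default_factory=list)
    samples: List[Sample] = field(default_factory=list)
    datasets: List[Dataset] = field(default_factory=list)
    publications: List[str] = field(default_factory=list)

    def datafile_count(self) -> int:
        # Total number of data files across all datasets of this investigation.
        return sum(len(ds.datafiles) for ds in self.datasets)


# Illustrative values only.
inv = Investigation(title="Example investigation", investigators=["A. Scientist"])
raw = Dataset(name="raw", datafiles=[Datafile("run_0001.nxs"), Datafile("run_0002.nxs")])
inv.datasets.append(raw)
```

The point of the nesting is that the Investigation, not the individual Datafile, is the root from which everything else is reached.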
10
TopCat
11
DOIs for Data Publication
12
Is this enough?
What we have so far is good for:
– us to manage data
– users to access their own data
– citation of raw data
But
– Traceability and Validation?
– Reuse of the data?
Need to make context more explicit
– Focussing on the dataset is the wrong subject of discourse
13
Support the wider Facilities Lifecycle
– Proposal: Scientist submits application for beamtime
– Approval: Facility committee approves application
– Scheduling: Facility registers, trains, and schedules scientist’s visit
– Experiment: Scientist visits, facility runs experiment
– Data storage: Raw data filtered and stored
– Data analysis: Tools for processing made available
– Record Publication: Subsequent publication registered with facility
As in PanData-ODI – D6.1 (which has much more detail)
14
Publishing Investigations
So what we want is a record of EXPERIMENTS, not data.
Thus we want the record of the context
– The experimental intention and actors
– The instruments and configurations used
– The sample
– The environmental parameters and context
– The Raw Data
Thus we want to publish a record of the whole INVESTIGATION
– Can get most of this from what we have
The Investigation becomes a “first class” research object
– Published
– Identified and treated as a single entity
– Cited and credited
– Record of the output of the facility
Analogous to a Journal Article
– Investigation as the unit of discourse for scientific facilities
But also as an access point for validation and reuse
– Because we have a record of what actually happened
15
Our DataCite entries are in fact Investigations (red is for “data” notion, and green is for “investigation”)
16
“DataCite abuse”
As we have seen, we use DataCite for Investigations, with Datasets only referred to from them.
Other data curators sometimes use DataCite for Publications (“documents”) that contain data: http://data.datacite.org/10.7480/OA
So “data” DOIs tend to resolve into either Investigations or Publications
– Extend the Resource Type
– Also may not want to have a landing page for all DOIs
17
Research Objects
Represent the “investigation” as a Research Object
– Research Objects (ROs) are semantically rich aggregations of resources that bring together data, methods and people in scientific investigations. Their goal is to create a class of artifacts that can encapsulate our digital knowledge and provide a mechanism for sharing and discovering assets of reusable research and scientific knowledge
– www.researchobject.org and elsewhere (WorkFlow4Ever)
Represent Investigation as a Research Object
– Build a graph structure for the links in the research object
– Using an RDF representation, URIs
– Publish as a linked data object
Bechhofer et al. Why Linked Data is Not Enough for Scientists, Proceedings of the 10th IEEE e-Science Conference, Brisbane, Australia (2010). http://eprints.ecs.soton.ac.uk/21587/5/research-objects-final.pdf
Arif Shaon, Sarah Callaghan, Bryan Lawrence, Brian Matthews. Opening up Climate Research: a linked data approach to publishing data provenance, 7th Int Digital Curation Conference (2011).
18
RDF representation of CSMD model
– Investigation: An investigation or experiment
– Facility: An experimental facility
– Dataset: A collection of data files and part of an investigation
– Datafile: A data file
19
After proposal: Initialise the Research Object
Diagram labels: Investigation #n (DOI:STFC.xxx.n), :instrument, :investigator
    :n a csmd:Investigation ;
        csmd:investigation_doi doi:stfc.xxx.n ;
        csmd:investigation_investigationUser :iu1 ;
        csmd:investigation_instrument :inst1 .
    :iu1 a csmd:investigationUser ;
        csmd:investigationUser_user :u1 .
    :u1 a csmd:User .
    :inst1 a csmd:Instrument .
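The initialisation step can be sketched in a few lines of Python. This is only an illustration: triples are held as plain tuples rather than in an RDF library, the function names are hypothetical, and the `csmd:` and `doi:` prefixes are copied from the slide without their namespace URIs, which the slide does not give.

```python
def init_research_object(inv_id: str, doi: str) -> set:
    """Return the seed triples for one investigation as (subject, predicate, object) tuples."""
    inv, inv_user, user, inst = f":{inv_id}", ":iu1", ":u1", ":inst1"
    return {
        (inv, "a", "csmd:Investigation"),
        (inv, "csmd:investigation_doi", doi),
        (inv, "csmd:investigation_investigationUser", inv_user),
        (inv, "csmd:investigation_instrument", inst),
        (inv_user, "a", "csmd:investigationUser"),
        (inv_user, "csmd:investigationUser_user", user),
        (user, "a", "csmd:User"),
        (inst, "a", "csmd:Instrument"),
    }


def to_turtle_lines(triples) -> list:
    # One statement per line, sorted so the output is stable.
    return sorted(f"{s} {p} {o} ." for s, p, o in triples)


graph = init_research_object("n", "doi:stfc.xxx.n")
for line in to_turtle_lines(graph):
    print(line)
```

Starting from a small seed like this means later lifecycle stages only ever add triples to the same graph.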
20
After the experiment
Diagram labels: Investigation #n (DOI:STFC.xxx.n), :dataset, :instrument, :investigator, :sample, Experimental Data, Metadata, Data Storage
– Own metadata format (CSMD)
– More or less what ICAT currently supports
– Adds extra details on parameters, datasets, formats etc.
21
Linking Publication into Investigation
Diagram labels: Investigation #n (DOI:STFC.xxx.n), :dataset, :publication, :investigator, :instrument, :sample, cito:cites, Raw Data Repository, Publication Repository, Publication Store
22
Linking the derived data into the Investigation
Diagram labels: Investigation #n (DOI:STFC.xxx.n), :dataset, :relatedDataset, :publication, :investigator, :instrument, :sample, Raw Data Repository, Derived Data Repository, Publication Repository
– Note that derived data could be on a different site
23
Linking the software into the Investigation
Diagram labels: Investigation #n (DOI:STFC.xxx.n), :dataset, :relatedDataset, :publication, :investigator, :instrument, :sample, :application, :inputDataset, :outputDataset, cito:cites, Software Package 1, Software Repository
– W3C Prov ontology
– Assume that the software is in a repository
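The linking steps for publications, derived data and software can be sketched as simple additions to a set of triples. The predicate names below are the labels shown on the slides; the direction of each link and all identifiers (`:pub1`, `:raw1`, `:derived1`, `:app1`) are assumptions for illustration.

```python
def link_publication(graph: set, publication: str, investigation: str) -> None:
    # The publication cites the investigation (CiTO "cites").
    graph.add((publication, "cito:cites", investigation))


def link_derived_data(graph: set, investigation: str, derived_dataset: str) -> None:
    # Derived data may live on another site; only its identifier is recorded here.
    graph.add((investigation, ":relatedDataset", derived_dataset))


def link_software(graph: set, investigation: str, application: str,
                  input_dataset: str, output_dataset: str) -> None:
    # Record which application turned which input into which output.
    graph.add((investigation, ":application", application))
    graph.add((application, ":inputDataset", input_dataset))
    graph.add((application, ":outputDataset", output_dataset))


def neighbours(graph: set, node: str) -> set:
    """All nodes one link away from node, in either direction."""
    outgoing = {o for s, _, o in graph if s == node}
    incoming = {s for s, _, o in graph if o == node}
    return outgoing | incoming


graph = {(":n", ":dataset", ":raw1")}
link_publication(graph, ":pub1", ":n")
link_derived_data(graph, ":n", ":derived1")
link_software(graph, ":n", ":app1", ":raw1", ":derived1")
```

With these links in place, `neighbours(graph, ":n")` gives every resource directly attached to the investigation, regardless of which repository holds it.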
24
Generate Landing page from RO
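One possible way to derive a landing page from the research object is to list every statement whose subject is the investigation. The sketch below does this for triples held as plain tuples; the function name, the HTML layout and the sample triples are all assumptions, not the facility's actual implementation.

```python
from html import escape


def landing_page(triples, subject: str) -> str:
    """Render the statements about one subject as a minimal HTML fragment."""
    rows = sorted((p, o) for s, p, o in triples if s == subject)
    items = "\n".join(f"  <li>{escape(p)}: {escape(o)}</li>" for p, o in rows)
    return f"<h1>{escape(subject)}</h1>\n<ul>\n{items}\n</ul>"


# Illustrative triples using the prefixes shown on the earlier slide.
triples = [
    (":n", "a", "csmd:Investigation"),
    (":n", "csmd:investigation_doi", "doi:stfc.xxx.n"),
    (":n", "csmd:investigation_instrument", ":inst1"),
    (":inst1", "a", "csmd:Instrument"),
]
page = landing_page(triples, ":n")
print(page)
```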
25
Setting the Boundary: It depends on your Point of View
Investigations, Extended Publication, E-Portfolio
26
Setting a boundary : OAI-ORE
27
Preserving Investigations
Now becomes preserving the research object.
– Preserving a linked data graph
– Persistency of identifiers
– Managing integrity of external artefacts
– Link checking
– Copying and mirroring: checking consistency
Representation Information to give more context on the objects
– And on the aggregate as a whole
PDI (Provenance, Integrity etc.) on the whole aggregate object
– As well as components
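The consistency check for copies and mirrors can be sketched as a fixity comparison: record a digest for each component of the research object, then compare a mirror against that manifest. The choice of SHA-256, the function names and the sample identifiers are assumptions for illustration; byte strings stand in for content that a real system would fetch from the repositories.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


def build_manifest(components: dict) -> dict:
    """Map each component identifier to the digest of its content."""
    return {uri: fingerprint(data) for uri, data in components.items()}


def check_mirror(manifest: dict, mirror: dict) -> list:
    """Return (identifier, problem) pairs for components missing from or changed in the mirror."""
    problems = []
    for uri, digest in sorted(manifest.items()):
        if uri not in mirror:
            problems.append((uri, "missing"))
        elif fingerprint(mirror[uri]) != digest:
            problems.append((uri, "changed"))
    return problems


# Illustrative content only.
original = {
    "raw/run1.nxs": b"raw data",
    "derived/fit.dat": b"fit",
    "pub/paper.pdf": b"pdf",
}
manifest = build_manifest(original)
mirror = {"raw/run1.nxs": b"raw data", "derived/fit.dat": b"fit (edited)"}
report = check_mirror(manifest, mirror)
```

Because the research object aggregates artefacts from several repositories, a manifest of this kind would have to be kept with the aggregate rather than with any single component.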
28
Adding Preservation Information – Rep Info for various items
Diagram labels: Investigation #n (DOI:STFC.xxx.n), :dataset, :relatedDataset, :publication, :investigator, :application, :instrument, :sample
Rep Info items:
– Instrument description (website)
– Raw data format description (e.g. NeXus)
– Parameter description (e.g. NXDL, Con Vocab)
– Software classification
– Software description
– Sample description
– Analysed data format description
– Publication format description
Notes:
– Would probably be more
– Work into a RepInfo Repository
– Would also have a RepInfo Network
29
Adding Preservation Information – Rep Info for the whole aggregate
Diagram labels: Investigation #n (DOI:STFC.xxx.n), :dataset, :relatedDataset, :publication, :investigator, :application, :instrument, :sample
Rep Info items:
– Software classification
– CSMD Vocabulary description
30
Summary
Investigation is the appropriate unit of discourse for facilities science
– Publishable, Citable, Reportable
– Can be used as a vehicle for validation and reuse
Basic principles of building research objects for facilities science
– Follow research lifecycle
– Consider Investigation a RO “seed”
– Apply Linked Data principles
– Re-use existing vocabularies and ontologies
– Share ROs via recognizable data formats and APIs
Applicable beyond Facilities
– Other analogous objects: “experiments”, “observations”, “studies”
The subject of preservation
– How do we maintain the integrity of Investigation objects?
31
Thank You
Questions?
brian.matthews@stfc.ac.uk
www.e-science.stfc.ac.uk