Metadata for Large Science: The ICAT Data Model Brian Matthews, Leader, Scientific Applications Group, E-Science Centre, STFC Rutherford Appleton Laboratory.

Presentation transcript:


The ICAT software suite
- Catalogues all experiment-related information
- Metadata gathered via integration with existing IT systems
  - proposal systems
  - data acquisition
- Provides a well-defined API for easy embedding into any application
[Diagram labels: Distributed Data; Metadata Catalogue; Generic Catalogue Access Interface; Data Access and Analysis]

Integrated e-Infrastructure
[Diagram labels: Proposal; Metadata Catalogue; Information; Experiment; Data Acquisition System; Secure Storage; Data Analysis; Publication; E-Pubs; Proposal System]
All data and metadata capture is automated.

Model Motivation
A common general format/standard for scientific studies and data-holdings metadata did not exist.
By proposing a model and implementation:
- A specification for the types of metadata which should be captured during scientific studies
- Cataloguing and organising data holdings: provide access for the data owner
- Ease citation, collaboration, and integration
- Allow easy integration of distributed heterogeneous metadata systems into a homogeneous (virtual) platform
Therefore the Core Scientific Metadata Model (CSMD) was developed.

Core Scientific Metadata Model
Metadata Granule:
- Topic: keywords providing an index on what the study is about.
- Study Description: provenance about what the study is, who did it and when.
- Access Conditions: conditions of use providing information on who can access the data and how.
- Data Description: detailed description of the organisation of the data into datasets and files.
- Data Location: locations providing a navigational aid to where the data on the study can be found.
- Related Material: references into the literature and community providing context about the study.
- Legal Note: copyright, patents and conditions of use etc. relating to the study and the data in the study.

CSMD in more detail
[Entity diagram. Entities and attribute labels as they appear on the slide:]
- Investigation: Reference / Proposal Id, Previous Reference, Facility, Instrument, Title, Abstract, etc.
- Publication: Full Reference, URL, Repository
- Keyword: Name
- Topic: Name, Parent Id, Topic Level
- Investigator: User Id, Role
- Sample: Name, Chemical Formula, Safety Information
- Dataset: Name, Sample Id, Description
- Datafile: Name, Description, Version, Location, Format, Format Version, Create Time, Modify Time, Size, Checksum
- Sample Parameter, Dataset Parameter, Datafile Parameter: Name, Units, String Value, Numeric Value, Range Top, Range Bottom, Error
- Parameter: Name/Units/Value etc., Searchable, Is Sample Parameter, Is Dataset Parameter, Is Datafile Parameter, Verified
- Related Datafile: Source Datafile Id, Destination Datafile Id, Relation
- Authorisation: User Id, Role (e.g. Admin, Deleter, Updater, Reader, Creator, Downloader etc.), Element Type, Element Id
- Other labels: S/W Application, S/W Version
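The entity labels on this slide suggest a nested structure, which can be sketched in Python roughly as follows. This is only an illustrative sketch, not the ICAT schema or API: the class names and fields are taken from the slide labels, but the containment (an investigation holding datasets, a dataset holding datafiles), the field types, and the `total_size` helper are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Parameter:
    # Attribute names follow the slide labels; the types are assumed.
    name: str
    units: str
    string_value: Optional[str] = None
    numeric_value: Optional[float] = None
    range_top: Optional[float] = None
    range_bottom: Optional[float] = None
    error: Optional[float] = None


@dataclass
class Datafile:
    name: str
    location: str
    size: int = 0  # assumed to be in bytes
    checksum: Optional[str] = None
    parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Dataset:
    name: str
    description: str = ""
    datafiles: List[Datafile] = field(default_factory=list)
    parameters: List[Parameter] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    facility: str
    instrument: str
    keywords: List[str] = field(default_factory=list)
    datasets: List[Dataset] = field(default_factory=list)

    def total_size(self) -> int:
        """Hypothetical helper: sum the sizes of all datafiles."""
        return sum(f.size for ds in self.datasets for f in ds.datafiles)


def build_example() -> Investigation:
    """Build a small made-up investigation for illustration."""
    temperature = Parameter(name="temperature", units="K", numeric_value=4.2)
    raw = Dataset(
        name="raw",
        datafiles=[
            Datafile(name="run1.raw", location="/data/run1.raw", size=1024),
            Datafile(name="run2.raw", location="/data/run2.raw", size=2048),
        ],
        parameters=[temperature],
    )
    return Investigation(
        title="Example study",
        facility="ISIS",
        instrument="EXAMPLE",  # placeholder instrument name
        keywords=["example"],
        datasets=[raw],
    )
```

Calling `build_example().total_size()` walks investigation, dataset and datafile in turn, which is the kind of navigation the model is meant to support.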

CSMD Used on DataPortal
- Implementation used as data interface for DataPortal
- Single view of heterogeneous systems/schemas
- Acts as a stress test of the model
  - Limitations feed into model requirements
  - New requirements feed back into implementation

ICAT 3.3 Database Schema

Core Scientific Metadata Model Usage
Used on many projects since 2001:
- STFC ICAT: serving data from STFC facilities (ISIS, DLS); SNS, Canada, Australia
- e-Minerals, e-Materials: collaborations to provide access to Grid resources, with ISIS users
Outside STFC:
- MyGrid bioinformatics project
  - Information model based on version 1 of the CSMD model
  - This is being taken forward in the myIB project
    - Application to integrated biology
    - Extending to provenance tracking in computational steering
Also influenced projects:
- Comb-eChem, eBank, eCrystals
- NERC DataGrid
- CCP1 (Collaborative Computational Project in Quantum Chemistry)
  - Assessed CSMD for metadata needs on their Grid Data Management Middleware

Interoperability
Sharing data across boundaries
- Across different research lifecycles
- Across institutions
- Across information objects
- Across disciplines
- Across time
Characteristics
- Loosely coupled
- Across different authorities
- Different internal models
Infrastructure to support science across disciplines, scientific institutions and research groups
Interoperable metadata models as a way we can enable interoperable data

Publishing the CSMD
Currently published as:
- ICAT 3.3 database schema
- ICAT API
- CCLRC Scientific Metadata Model: Version 2 (Sufi & Matthews 2004)
- UML Model
- XML Schema
Need to update and formally publish
- Dublin Core Application Profile
- Sharable and collaborative
- A "flat" model
- An exchange model
[Example application profile entry, flattened table:]
  Name: Subject
  Term URI
  Refines
  Name: Topic
  Has Encoding Scheme: DC Subject Encoding Schemes
  Has Encoding Scheme: ISIS Subject Encoding Schemes
  Comment: ISIS Subject Encoding Schemes are available from
  Obligation: Recommended

Towards a Facilities Ontology
- Ontologies are used to capture knowledge about a domain of interest.
- Increased flexibility when representing frequently changing viewpoints.
- Alterations can be made in the model without changing applications.
- Allows a unified view of heterogeneous data sources.
- Removes conflicts and terminological uncertainties.
- Facilitates moderated searches and optimisation of results.
ISIS Ontology
- At present over 10,000 keywords describing experiments in ISIS ICAT
- Free-text terms: no structure of keywords
- Organise and formalise into an ontology
- Map familiar terms in one domain to related terms in different domains
- Search data by category across studies
- Cross-facility searching of related scientific data from the various scientific facilities, e.g. CLF and DLS, and elsewhere
- Some initial work in this area: Louisa Casely-Hayford
- Re NXDL
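The step from unstructured free-text keywords to "search data by category across studies" could look roughly like the sketch below. It is a deliberately simplified illustration, not the ISIS ontology or any ICAT component: the mapping table, the category names, the study identifiers and both function names are hypothetical, and a real ontology would carry richer relations than a flat lookup.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical mapping from free-text keywords to ontology categories.
KEYWORD_TO_CATEGORY: Dict[str, str] = {
    "h2o": "water",
    "heavy water": "water",
    "d2o": "water",
    "ybco": "superconductor",
    "high-tc": "superconductor",
}


def normalise(keyword: str) -> str:
    """Lower-case a keyword and collapse runs of whitespace."""
    return " ".join(keyword.lower().split())


def index_by_category(studies: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Group study identifiers under the category of each known keyword.

    Keywords without an entry in the mapping are skipped.
    """
    index: Dict[str, Set[str]] = defaultdict(set)
    for study_id, keywords in studies.items():
        for keyword in keywords:
            category = KEYWORD_TO_CATEGORY.get(normalise(keyword))
            if category is not None:
                index[category].add(study_id)
    return dict(index)


# Made-up studies tagged with inconsistent free-text keywords.
STUDIES = {
    "study-1": ["H2O", "neutron diffraction"],
    "study-2": ["Heavy  Water"],
    "study-3": ["YBCO"],
}
CATEGORY_INDEX = index_by_category(STUDIES)
```

With this index, a search for the category "water" finds both study-1 and study-2 even though their keywords differ, which is the benefit the slide attributes to organising keywords into an ontology.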

Sample, Investigator and Experiment Ontologies
[Figures: Sample, Investigator and Experiment ontology diagrams]

CSMD and Simulations
Detailed information about the data analysis and underpinning computational simulations:
- Simulation/analysis code and version
- Simulation set-up and parameters
- Information about the compute resource
- Key parameters from the simulation results
- Keywords and classifications
- Links to analysed secondary data
Currently limited support in the CSMD
Could provide link to software libraries.

Sharing Publications
CSMD offers the potential to integrate the outputs of scientific research: data and publications.
Institutional repository software now very well established
- ePrints, DSpace, Fedora, ePubs
- Large body of expertise available
- Standard metadata models and protocols: DC-APs, FRBR, OAI-PMH, OAI-ORE
- Not yet embedded in science practice, except HEP!
Linking science data and publications
- Not yet well established
- Needs data identifiers and citation: DOIs, data journals
- Needs peer review of data
- Can (and should?) be done on a P2P basis
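As an illustration of one of the protocols named on this slide, the sketch below builds an OAI-PMH `ListRecords` request URL asking for Dublin Core (`oai_dc`) records. The verb and the `metadataPrefix` and `set` arguments come from the OAI-PMH specification; the base URL, the set name and the function name are placeholders invented for the example, and no real repository endpoint is implied.

```python
from typing import Optional
from urllib.parse import urlencode


def list_records_url(
    base_url: str,
    metadata_prefix: str = "oai_dc",
    set_spec: Optional[str] = None,
) -> str:
    """Build an OAI-PMH ListRecords request URL.

    base_url is the repository's OAI-PMH endpoint (a placeholder here);
    set_spec optionally restricts harvesting to one set.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and set name, for illustration only.
EXAMPLE_URL = list_records_url("https://repository.example.org/oai", set_spec="physics")
```

A harvester would fetch this URL and parse the returned XML; a link from a publication record to its data would still need the identifiers that the slide lists as an open requirement.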

Issues on Metadata in EDNP
- Integration with other projects and facilities
  - Interchangeable metadata to do this
  - Providing the input tools
  - Providing the APIs
  - Providing the user interfaces
- Data policy and ownership
- Data and metadata curation for long-term reuse

Questions?