Towards an information model for I2S2

1 Towards an information model for I2S2
Brian Matthews, Leader, Scientific Applications Group, E-Science Centre, STFC Rutherford Appleton Laboratory

2 Facilities Process
- Proposal: scientist submits application for beamtime
- Approval: facility committee approves application
- Scheduling: facility registers, trains, and schedules scientist’s visit
- Experiment: scientist visits, facility runs experiment
- Data storage: raw data filtered, cleansed and stored
- Data analysis: tools for processing made available
- Record publication: subsequent publication registered with facility
Characteristics:
- formal application
- set processes
- central infrastructure
- standard tools
- hierarchical control
- dedicated staff: user office, instrument scientists, library and IT support
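The "set processes" characteristic of the facilities process can be read as a fixed sequence of stages. The sketch below models that sequence as a small state machine that rejects skipped or repeated stages; the strictly linear ordering and the `Stage`, `next_stage` and `advance` names are illustrative assumptions, not part of any facility system.

```python
from enum import Enum
from typing import List, Optional


class Stage(Enum):
    """Facility process stages, in the order listed on the slide."""
    PROPOSAL = 1
    APPROVAL = 2
    SCHEDULING = 3
    EXPERIMENT = 4
    DATA_STORAGE = 5
    DATA_ANALYSIS = 6
    RECORD_PUBLICATION = 7


ORDER: List[Stage] = list(Stage)  # Enum iteration follows definition order


def next_stage(current: Stage) -> Optional[Stage]:
    """Return the stage after `current`, or None once publication is recorded."""
    i = ORDER.index(current)
    return ORDER[i + 1] if i + 1 < len(ORDER) else None


def advance(history: List[Stage], target: Stage) -> List[Stage]:
    """Return a new history with `target` appended; reject skipped or repeated stages."""
    expected = Stage.PROPOSAL if not history else next_stage(history[-1])
    if target is not expected:
        raise ValueError(f"cannot move to {target.name}; expected {expected}")
    return history + [target]
```

For example, `advance(advance([], Stage.PROPOSAL), Stage.APPROVAL)` succeeds, while jumping straight from approval to the experiment raises `ValueError`. A real lab workflow is often less orderly than this, which is one of the issues raised later for laboratory-based science.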

3 Requirements
- Secure access to user’s data
- Flexible data searching
- Scalable architecture
- Extensible architecture
- Integration with analysis tools
- Access to high-performance resources
- Linking to other scientific outputs
- Data policy aware

4 Principles
The ICAT software suite:
- catalogues all experiment-related information
- gathers metadata via integration with existing IT systems (proposal systems, data acquisition)
- provides a well-defined API for easy embedding into any application
With it you can:
- access data anywhere via the web
- annotate and search for data
- share data with colleagues
- access data via your own programs
- utilise integrated e-Science resources
- link to data from your publications
Integrated systems: Online Proposal System; User Office System (User Database, Scheduling, Health and Safety, Proposal Management); Metadata Catalogue; Data Acquisition System; Storage Management System; Data Access Portal; Single Sign On; Account Creation and Management.
ICAT Software Suite, providing the crucial integration of key functions.

5 Component architecture
The ICAT software suite has a modular design with clear functional boundaries for each component. Core functionalities are grouped together, and customisable presentation layers are separated from the function layer; this makes maintenance and customisation easy and insulates each layer from changes to the underlying ones. All interaction with the ICAT catalogue now goes through the ICAT API.

6 ICAT Deployment
Components: User Database System; Single Sign On; Data Storage/Delivery System; Proposal System; Publication System; ICAT API; e-Science Services; RDBMS; Software Repository; Web Services API; Command Line Tools; Fortran, C++, Java; Glassfish / JBoss.

7 Data Portal

8 TopCat

9 Towards an Information Model

10 Methodology: The Singapore Framework for Dublin Core Application Profiles (Mikael Nilsson, Tom Baker, Pete Johnston)

11 Functional requirements

12 A Metadata Model for Facilities Science
A common, general format or standard for the metadata of scientific studies and data holdings did not exist. The aim was to propose a model that provides:
- a specification of the types of metadata to capture for scientific studies
- catalogued data holdings, giving the data owner access to their data
- easier citation, sharing, collaboration and integration
- easy federation of distributed, heterogeneous metadata systems into a homogeneous (virtual) platform
Therefore the Core Scientific Metadata Model (CSMD) was developed.
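One motivation for a common model is federating heterogeneous metadata systems into a single virtual platform. A minimal sketch of that idea, assuming each source can be translated into a shared record type by its own adapter: the source names, native field names and the `StudyRecord` fields are all hypothetical, chosen only to show the mapping step.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class StudyRecord:
    """Homogeneous view of a study shared by all federated sources (fields illustrative)."""
    source: str
    study_id: str
    title: str


# Hypothetical native records: each source uses its own field names.
SOURCE_A = [{"ref": "A-001", "exp_title": "Water structure at high pressure"}]
SOURCE_B = [{"visit_id": "B-42", "name": "Protein crystal screen"}]

# One adapter per source maps its native fields onto the common model.
ADAPTERS: Dict[str, Callable[[dict], StudyRecord]] = {
    "A": lambda r: StudyRecord("A", r["ref"], r["exp_title"]),
    "B": lambda r: StudyRecord("B", r["visit_id"], r["name"]),
}


def federated_search(sources: Dict[str, Iterable[dict]], keyword: str) -> List[StudyRecord]:
    """Translate every source record to StudyRecord and keep case-insensitive title matches."""
    hits: List[StudyRecord] = []
    for name, records in sources.items():
        to_common = ADAPTERS[name]
        for raw in records:
            record = to_common(raw)
            if keyword.lower() in record.title.lower():
                hits.append(record)
    return hits
```

Searching both sources for "crystal" returns only the record from source B, already expressed in the common form. The design choice is that search logic is written once against the common model, and only the adapters know about each source's schema.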

13 A Domain Model

14 Modelling Scientific Activity

15

16

17 Core Scientific Metadata Model
Diagram (Damian Flannery): entities and their attributes
- Investigation: Reference / Proposal Id, Previous Reference, Facility, Instrument, Title, Abstract, etc.
- Investigator: User Id, Role
- Topic: Name, Parent Id, Topic Level
- Keyword
- Publication: Full Reference, URL, Repository Name
- Authorisation: User Id, Role (e.g. Admin, Deleter, Updater, Reader, Creator, Downloader, etc.), Element Type, Element Id
- Sample: Name, Chemical Formula, Safety Information
- Sample Parameter: Name, Units, String Value, Numeric Value, Range Top, Range Bottom, Error
- Dataset: Name, Sample Id, Description
- Dataset Parameter: Name, Units, String Value, Numeric Value, Range Top, Range Bottom, Error
- Datafile: Name, Description, Version, Location, Format, Format Version, Create Time, Modify Time, Size, Checksum
- Datafile Parameter: Name, Units, String Value, Numeric Value, Range Top, Range Bottom, Error
- Related Datafile: Source Datafile Id, Destination Datafile Id, Relation, S/W Application, S/W Version
- Parameter: Name/Units/Value etc., Searchable, Is Sample Parameter, Is Dataset Parameter, Is Datafile Parameter, Verified
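The Investigation, Dataset and Datafile hierarchy on this slide, with name/units/value parameters hanging off each level, can be sketched as plain Python dataclasses. This is an illustrative subset only: it is not the ICAT schema or API, attribute names are simplified, and `files_with_parameter` is a hypothetical helper showing the kind of parameter search the model supports.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ParameterValue:
    """A name/units/value entry, as used for sample, dataset and datafile parameters."""
    name: str
    units: str
    numeric_value: Optional[float] = None
    string_value: Optional[str] = None
    range_bottom: Optional[float] = None
    range_top: Optional[float] = None
    error: Optional[float] = None


@dataclass
class Datafile:
    name: str
    location: str
    format: str
    size: int
    checksum: str
    parameters: List[ParameterValue] = field(default_factory=list)


@dataclass
class Dataset:
    name: str
    description: str
    sample_id: Optional[str] = None
    datafiles: List[Datafile] = field(default_factory=list)
    parameters: List[ParameterValue] = field(default_factory=list)


@dataclass
class Investigation:
    proposal_id: str
    facility: str
    instrument: str
    title: str
    abstract: str = ""
    datasets: List[Dataset] = field(default_factory=list)


def files_with_parameter(inv: Investigation, name: str, lo: float, hi: float) -> List[str]:
    """Names of datafiles whose numeric parameter `name` lies in [lo, hi]."""
    hits = []
    for ds in inv.datasets:
        for df in ds.datafiles:
            for p in df.parameters:
                if p.name == name and p.numeric_value is not None and lo <= p.numeric_value <= hi:
                    hits.append(df.name)
    return hits
```

Keeping parameters as generic name/units/value records, rather than fixed columns, is what lets one model describe many instruments and techniques without schema changes.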

18 Description set profile

19 Metadata granule
- Topic: keywords providing an index on what the study is about.
- Study Description: provenance about what the study is, who did it and when.
- Access Conditions: conditions of use providing information on who can access the data, and how.
- Data Description: detailed description of the organisation of the data into datasets and files.
- Data Location: locations providing a navigational aid to where the data on the study can be found.
- Related Material: references into the literature and community providing context about the study.
- Legal Note: copyright, patents, conditions of use, etc. relating to the study and the data in the study.
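The seven sections of a metadata granule can be held in one record, which makes it easy to check which sections a study description still lacks. A minimal sketch, assuming simple string or list fields for each section; the `MetadataGranule` class and its `missing` method are hypothetical.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class MetadataGranule:
    topic: List[str] = field(default_factory=list)             # keywords indexing the study
    study_description: str = ""                                 # what the study is, who, when
    access_conditions: str = ""                                 # who may access the data, and how
    data_description: List[str] = field(default_factory=list)  # organisation into datasets/files
    data_location: List[str] = field(default_factory=list)     # where the data can be found
    related_material: List[str] = field(default_factory=list)  # literature and community references
    legal_note: str = ""                                        # copyright, patents, conditions of use

    def missing(self) -> List[str]:
        """Names of granule sections that are still empty, in declaration order."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]
```

A granule created with only a topic and a study description reports the other five sections as missing, which could drive a prompt for the data owner to complete them.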

20 ICAT 3.3 Schema – Study (2)

21 Syntax and metadata formats

22 ICAT API and XML format

23 ICAT 3.3 Database Schema

24 CSMD History
- Model first pilot developed in 2001!
- Now in ICAT 3.3, serving data from STFC facilities (ISIS, DLS)
- Model proven robust: simple yet expressive

25 I2S2 - Infrastructure for Integration in Structural Sciences
Bridging the gap between raw and derived data.
- EPSRC National Crystallography Service: service provision function; operates across institutions; moderate infrastructure
- Diamond & ISIS: operate on behalf of multiple institutions; processes for experiments; large infrastructure engineered to manage raw data; derived data taken off site on laptops / removable drives
- “Lone” researcher scenario: data sharing with colleagues via …; little or no infrastructure; little management of raw or derived data

26 Interactions between research processes
Cover the scientist’s research lifecycle as well as the facilities.
Extend CSMD:
- to laboratory-based science
- to secondary analysis data
- to preservation information
- to publication data
- to domain-specific vocabularies
By being standardised, modular and extensible.
Diagram labels: CSMD; Literature Review; Grant Proposal; Sample Preparation; Local experiments; Facilities Proposal; Approval; Scheduling; Facilities Experiment; Data storage; Data cleansing; Data analysis; Analysis Tools; Simulation; Record Publication; Publication.

27 Methodology: The Singapore Framework for Dublin Core Application Profiles (Mikael Nilsson, Tom Baker, Pete Johnston)

28 Issues
- Metadata model
  - Framework for developing the metadata model
  - Modularisation mechanisms and extensions
  - Formats
- Model supporting laboratory tools
  - How does the model fit?
  - Flexibility to handle local processes: ad hoc, partial, unordered
  - What needs changing in the model? What needs changing in the tools?
- Data input and maintenance?
  - Simple ways of inputting the data
  - Lab books?

29 Extension areas
- Secondary analysis data
- Preservation data
- Publication data
- Topic data (chemistry)
- Controlled lists (ontologies) for instruments, facilities, methods
- Access control
- Safety data
- Blogs and notebooks
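A modular, extensible model could let a core study record carry optional extension modules, each validated on its own. The sketch below assumes extensions are stored under a single `extensions` key and that each module declares the keys it requires; the module names echo the extension areas on this slide, but the required keys and the `attach` function are hypothetical, not part of CSMD.

```python
from typing import Dict, Set

# Hypothetical extension modules named after the slide's extension areas;
# the required keys are illustrative assumptions, not part of CSMD.
EXTENSIONS: Dict[str, Set[str]] = {
    "topic_chemistry": {"chemical_formula"},
    "safety_data": {"hazard_class"},
    "publication_data": {"citation"},
}


def attach(record: dict, module: str, data: dict) -> dict:
    """Return a copy of a core `record` with `data` stored under extension `module`."""
    if module not in EXTENSIONS:
        raise KeyError(f"unknown extension module: {module}")
    missing = EXTENSIONS[module] - set(data)
    if missing:
        raise ValueError(f"{module} is missing {sorted(missing)}")
    extended = dict(record)  # shallow copy: the core record itself is left untouched
    extended["extensions"] = {**record.get("extensions", {}), module: dict(data)}
    return extended
```

Because the core record is never mutated, tools that only understand the core model keep working, while specialised tools read the extension modules they know about.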

30 Scattering function data
Inputs held as part of the ISIS study in ISIS ICAT: correction data, sample data, calibration data. User inputs: control file. Gudrun processes these to produce the scattering function data.

31 Derived Data
Generalised model: managing the links between data.
- Inputs of data sets
- Associated with a software item with a set of parameters
Managing this?
- lab-books?
- simple tools?
- VRE?
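The generalised derived-data model (input data sets, a software item, a set of parameters, an output) can be sketched as a small provenance store that traces any output back to the raw data it came from. This is an illustration under stated assumptions: `Derivation`, `ProvenanceStore` and the item names are hypothetical, the Gudrun step mirrors the scattering function example, and `analysis_tool` stands in for an unspecified later analysis step.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Derivation:
    """One processing step: input data sets -> software item (with parameters) -> output."""
    output: str
    inputs: List[str]
    software: str
    version: str
    parameters: Dict[str, str] = field(default_factory=dict)


class ProvenanceStore:
    def __init__(self) -> None:
        self._by_output: Dict[str, Derivation] = {}

    def record(self, step: Derivation) -> None:
        """Register the step that produced `step.output`."""
        self._by_output[step.output] = step

    def lineage(self, item: str) -> Set[str]:
        """All data items that `item` was (transitively) derived from."""
        seen: Set[str] = set()
        stack = [item]
        while stack:
            step = self._by_output.get(stack.pop())
            if step is None:
                continue  # raw data: no recorded derivation upstream
            for src in step.inputs:
                if src not in seen:
                    seen.add(src)
                    stack.append(src)
        return seen
```

For example, recording `Derivation("scattering_function", ["sample_data", "correction_data", "calibration_data", "control_file"], "Gudrun", "unspecified")` followed by a hypothetical `Derivation("fitted_model", ["scattering_function"], "analysis_tool", "unspecified")` makes `lineage("fitted_model")` return the scattering function plus all four of its inputs.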

