Overview and Motivation of the ICAT Software Suite Kerstin Kleese van Dam.


Science and Technology Facilities Council STFC employs more than 2200 staff at 7 locations: Swindon, where the headquarters is based; the Rutherford Appleton Laboratory; the Daresbury Laboratory; the Chilbolton Observatory; the UK Astronomy Technology Centre in Edinburgh; the Isaac Newton Group of Telescopes on La Palma; and the Joint Astronomy Centre in Hawaii.

Research and Science Support at STFC Deliver world class science Engender world class science Communicate world class science Annually over visiting Scientists from around the world from both Academia and Industry.

Why an Integrated e- Infrastructure is required HPC Analysis Storage Analysis Experiment Computing HPC Scientist

What STFC aim to achieve with their e-Infrastructure Enabling users to get rapid access to their current and past data, related experiments, publications etc., leading to improved analysis through more complete information. Creating a powerful, long lasting scientific knowledge resource.

Integrated e- Infrastructure Proposal Metadata Catalogue Information Experiment Data Acquisition System Secure Storage Data Analysis Publication E-Pubs Proposal System All Data and Metadata Capture is automated.

e-Infrastructure – Access to Multiple Facilities(2) Data Portal SNS - ORNL ISIS – TS1 + 2 DLS CLF CSL - Canada SRS + ERLP

How we achieve the integration HPC Analysis Storage Analysis Experiment Computing HPC Metadata Scientist

ICAT Software Suite

The ICAT software suite centrally catalogues all experiment-related information and extracted key results. Wherever possible, information is gathered automatically through integration with existing IT systems such as proposal systems or data acquisition. The catalogue and the data it references are accessible via a well-defined API for easy embedding into any application. Distributed Data Metadata Catalogue Generic Catalogue Access Interface Data Access and Analysis Applications

Underlying Data Infrastructure Online Proposal System User Office System incl.: User Database Scheduling Health and Safety Proposal Management Metadata Catalogue Data Acquisition System Storage Management System DataAccessPortal Single Sign On Account Creation and Management ICAT Software Suite, providing the crucial integration of key functions.

ICAT and the STFC Proposal Systems The online proposal system is the entry point to the Data Management System, and is a rich source of contextual information about the user's experiment.

ICAT and STFC Data Acquisition Plug-ins for the data acquisition system ensure automatic, quality controlled collection of data and metadata. ICAT can be easily linked to any existing system. ISIS : -SECI (C#,.net) with link to LabView and openGenie DLS : -Generic Data Acquisition (Java, on top of EPICS) CLF : -For Laser Diagnostics, (LabView)

ICAT and DLS Storage Management DLS uses the Storage Resource Broker (SRB) for its storage management; this has been integrated with ICAT for data access and delivery. Main advantages: decoupling the physical file location from the logical one, strict security, and expandability to many storage systems.

ICAT and ISIS Storage Management ISIS uses its own in-house data storage access system called Data.ISIS. Like SRB, it abstracts from the physical location of the files and delivers the same advantages in terms of security and the decoupling of the logical and physical locations of files.
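
The decoupling of logical and physical file locations described for SRB and Data.ISIS can be illustrated with a minimal sketch. This is not the SRB or Data.ISIS interface; the class `LogicalCatalogue`, its methods, and the example paths are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LogicalCatalogue:
    """Hypothetical in-memory map from logical names to physical replicas."""
    locations: Dict[str, List[str]] = field(default_factory=dict)

    def register(self, logical_name: str, physical_url: str) -> None:
        # A logical name may point at several physical copies.
        self.locations.setdefault(logical_name, []).append(physical_url)

    def resolve(self, logical_name: str) -> str:
        # Clients only ever ask for the logical name.
        replicas = self.locations.get(logical_name)
        if not replicas:
            raise KeyError(f"no physical copy registered for {logical_name}")
        return replicas[0]

    def migrate(self, logical_name: str, old_url: str, new_url: str) -> None:
        # Moving a file changes the physical URL but not the logical name.
        replicas = self.locations[logical_name]
        replicas[replicas.index(old_url)] = new_url


catalogue = LogicalCatalogue()
logical = "isis/hrpd/HRP00145.RAW"  # illustrative logical name
catalogue.register(logical, "file:///archive/disk1/HRP00145.RAW")
before = catalogue.resolve(logical)
catalogue.migrate(logical, before, "file:///tape/vol7/HRP00145.RAW")
after = catalogue.resolve(logical)
```

An application holding only the logical name keeps working after the migration, which is the advantage the slides attribute to both storage systems.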

ICAT Architecture Online Proposal System User Office System incl.: User Database Scheduling Health and Safety Proposal Management Metadata Catalogue Data Acquisition System Storage Management System DataAccessPortal Single Sign On Account Creation and Management ICAT Software Suite, providing the crucial integration of key functions.

ICAT 3.3 Aims and Objectives ICAT API Version 3.3 aims to be the Grid aware software infrastructure that enables applications to exploit the capabilities of the ICAT catalogue. Data Portal Version 3.3 aims to be the Grid aware software infrastructure that serves the Data Search and Retrieval (DSR) requirements of the STFC. It makes use of the ICAT API 3.3.

Overall Architecture Principles The ICAT software suite has a modular design with clear functional boundaries for each component. Core functionalities have been grouped together, and customisable presentation layers are separated from the function layer to achieve easy maintenance, easy customisation and insulation from changes to underlying areas. All interactions with the ICAT catalogue now go through the ICAT API.

Core Scientific Metadata Model (CSMD)

Rich Data at STFC Scientific Data of the highest Quality is produced at STFC Facilities and Departments. The continuity and longevity of STFC has led to a unique wealth of Information. How about a system that would give access to all of it independent of where it was produced?

Model Motivation (1) Most Scientists think in terms of Studies during which they perform a number of investigations e.g. experiments, observations, measurements and simulations. Results from these investigations usually run through different stages: raw data, analysed or derived data and end results. Data should be grouped accordingly. Metadata and Software (e.g. STFC DataPortal) should allow the user to search for interesting data. Not all information captured in specific metadata schemas, e.g. CML, would be used to search for this data or to distinguish one data set from another; the model should give the possibility to select special parameters.

Model Motivation (2) A common general format/standard for Scientific Studies and data holdings metadata did not exist. By proposing a Model and Implementation: –Form a specification for the types of metadata that should be captured during Scientific Studies –Ease citation, collaboration, exploitation and Integration –Allow easy Integration of distributed heterogeneous metadata systems into a homogeneous (albeit virtual) Platform. Therefore the Common Scientific Metadata Model (CSMD) was developed.

General Layout Why – i.e. what was the need What is it – description – supports keyword searches and taxonomic approaches – data organisation like a file system, but also supports linking to a database Where is it used – project & software What are the users likely to search on What distinguishes one study/investigation/data set from the next

Metadata Model Structure The Common Scientific Metadata Model (CSMD) is a study-data set orientated model holding study information about: –Topic Indexing –Provenance –Data Holding –Legal notes (copyright, patents and conditions of use etc. relating to the study and the data in the study) –Related Material (publications, community information and related links) –(Access Conditions) [Diagram labels: Metadata Granule, Topic, Study, Access Conditions, Related Material, Legal Note, Data Holding, Investigation, Data Collection, Atomic Data Object; relationships marked with cardinalities 1 and M]

Model Breakdown: Provenance The Study contains the following metadata: –The Study Name –The Study Institution –The Investigator –Extended Study Information Abstract Funding Start and End times –Investigations

Investigations A Study can have more than one investigation; possible enumerations are experiment, simulation, measurements etc. – investigations contain: –Name –Investigation Type –Abstract –Resource –Link to DataHolding
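
The Study and Investigation metadata listed above can be sketched as two small record types. This is only an illustration of the one-to-many relationship between a Study and its Investigations; the field names are paraphrased from the slides, and the Python types, defaults and example values are assumptions rather than the ICAT schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Investigation:
    name: str
    investigation_type: str  # e.g. "experiment", "simulation", "measurement"
    abstract: str = ""
    resource: str = ""
    data_holding: Optional[str] = None  # link to the DataHolding, if any


@dataclass
class Study:
    name: str
    institution: str
    investigator: str
    # Extended study information
    abstract: str = ""
    funding: str = ""
    start_time: Optional[str] = None
    end_time: Optional[str] = None
    # A Study can have more than one Investigation.
    investigations: List[Investigation] = field(default_factory=list)


# Illustrative values only.
study = Study(name="Example study",
              institution="Example institution",
              investigator="A. Scientist")
study.investigations.append(
    Investigation("Run 1", "experiment", data_holding="holding-001"))
study.investigations.append(Investigation("Model 1", "simulation"))
types = [inv.investigation_type for inv in study.investigations]
```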

Topic (for indexing) Keywords –Discipline (i.e. domain) –Keyword Source (e.g. domain dictionary) –Keyword Subjects –Discipline –Subject Source (e.g. domain taxonomy) –Subject

Access Condition & Related Material Access Conditions –Contains a list of users or groups who are allowed access to the metadata and data, or a pointer to an access control system which contains such data for this study Related Material –One or many links and or textual descriptions of material related to this study e.g. earlier studies or parallel studies

Data Data Description holds a logical description of the Study’s data: –Data Name –Type of Data –Status –Data Topic –Parameters –Related Data Ref –Relation type (e.g. derived) Data Location contains the link between logical name (e.g. URI's) and physical URLs –Data Name –Locator(s) (In the case of Atomic Data Objects these can refer to files as well as named Selects on a database – i.e. virtual data objects)

More on Parameters Parameters contain a lot of information about the atomic data objects (ADO) and collections A collection/ADO can have many parameter entries, each parameter entry contains: Parameter derivation (e.g. measured/fixed) –The value –The units –Range –Error margin Parameter aggregation is also supported
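
A parameter entry with the fields listed above might look like the following sketch. The slides state that aggregation is supported but not how; the `aggregate` function here (mean value plus min/max range over entries with identical units) is one assumed interpretation, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Parameter:
    name: str
    derivation: str  # e.g. "measured" or "fixed"
    value: float
    units: str
    value_range: Optional[Tuple[float, float]] = None
    error_margin: Optional[float] = None

    def in_range(self) -> bool:
        # A parameter without a declared range is treated as valid.
        if self.value_range is None:
            return True
        low, high = self.value_range
        return low <= self.value <= high


def aggregate(parameters: List[Parameter]) -> Parameter:
    """Assumed aggregation: summarise object-level entries for a collection."""
    if not parameters:
        raise ValueError("nothing to aggregate")
    units = {p.units for p in parameters}
    if len(units) != 1:
        raise ValueError("cannot aggregate parameters with mixed units")
    values = [p.value for p in parameters]
    return Parameter(name=parameters[0].name,
                     derivation="aggregated",
                     value=sum(values) / len(values),
                     units=units.pop(),
                     value_range=(min(values), max(values)))


temps = [Parameter("temperature", "measured", 290.0, "K"),
         Parameter("temperature", "measured", 300.0, "K")]
summary = aggregate(temps)
```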

Cardinality Issues The model recommends a certain cardinality of elements. Certain metadata components are necessary for one to have an instance of the implemented model; treating everything as optional is not acceptable. It is thought that implementations may modify this to suit their needs; the model attempts to remain ideal (i.e. the most common cardinality).

Enumeration Issues Enumerations (or controlled vocabularies), e.g. types of investigator or types of institution, are distinct from the model, as taxonomies are. However, they are necessary for the model to work, so implementations (e.g. the STFC DataPortal implementation of the model) propose some enumerations for common things. It is hoped that implementations will use recognised and relevant controlled vocabularies where they are available.

Conformance Level For a complete metadata study-dataset record a large amount of metadata has to be stored/processed, so it is useful to have conformance levels. The model uses 5 levels. Each level specifies that more metadata (and indexing information) should be held.

Level 1 Type of Information captured: –Study and Investigation metadata with indexing at the Study level Level 1 metadata is similar to library/publication style metadata (e.g. DublinCore)

Level 2 Type of Information captured: –Level 1 + DataHolding metadata (i.e. DataSets and DataObjects)

Level 3 Type of Information captured: –Level 2 + related material, Access condition, indexing to data collection levels

Level 4 Type of Information captured: –Level 3 + indexing to data object level and data object parameter information

Level 5 Type of Information captured: –All metadata components are filled as L4 + funding, resources used, facilities used etc
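
Because each level builds on the one below it, the five levels can be read as a cumulative checklist. The sketch below assumes a record is summarised as a set of component names; those names (`study_index`, `object_parameters`, and so on) are hypothetical labels for the components listed in Levels 1 to 5, not identifiers from the model.

```python
# Components each level adds on top of the previous one (labels are assumed).
LEVEL_REQUIREMENTS = [
    (1, {"study", "investigation", "study_index"}),
    (2, {"data_holding"}),
    (3, {"related_material", "access_conditions", "collection_index"}),
    (4, {"object_index", "object_parameters"}),
    (5, {"funding", "resources_used", "facilities_used"}),
]


def conformance_level(present: set) -> int:
    """Return the highest level whose requirements, and all lower ones, are met."""
    level = 0
    for candidate, required in LEVEL_REQUIREMENTS:
        if not required <= present:
            break  # levels are cumulative, so stop at the first gap
        level = candidate
    return level


level_one = conformance_level({"study", "investigation", "study_index"})
level_two = conformance_level(
    {"study", "investigation", "study_index", "data_holding"})
# Level 3 components without Level 2's data holding still count as Level 1.
gap = conformance_level(
    {"study", "investigation", "study_index",
     "related_material", "access_conditions", "collection_index"})
```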

Conformance Levels L1 is similar to library/publication style metadata (e.g. DublinCore). The current DataPortal uses somewhere between L4 and L5 –the new systems designed with CSMD conform to L4+. Benefit of conformance levels: the higher the level of conformance to the CSMD, the richer the clients that operate on the data can be –e.g. identifying datasets and atomic data objects which link directly to keywords/taxonomies and not just studies

CSMD Used on DataPortal Implementation used as Data Interface for DataPortal Single view of heterogeneous systems/schemas Acts as a stress test of the model –Limitations feed into Model Requirements –New requirements feed back into implementation

ICAT Schema 3.3

Specifics of the ICAT 3.3 Schema

ICAT 3.3 Schema - Facility

ICAT 3.3 Schema - Study

ICAT 3.3 Schema – Study (2)

ICAT 3.3 Schema – Study (3) Study Investigation Study Status

ICAT 3.3 Schema - Investigation

ICAT 3.3 Schema - Instrument

ICAT 3.3 Schema - Shift

ICAT 3.3 Schema – Shift (2)

ICAT 3.3 Schema - Keywords

ICAT 3.3 Schema - Topic

ICAT 3.3 Schema – Topic (2) Topic Topic List

What is an Ontology? Ontologies are used to capture knowledge about a domain of interest. An ontology describes the concepts in the domain and the relationships that hold between those concepts.

Advantages of Ontologies Provide increased flexibility when representing frequently changing viewpoints of information. Alterations can be simply followed up in the model without having to alter the applications on which they are based. Allows a unified view of heterogeneous data sources. Remove conflicts and terminological uncertainties. Facilitate Moderated searches, optimisation of the search results.

Why Ontologies are a useful Solution? At present over 1,700,000 keywords describing experiments are housed in ISIS ICAT many of which are synonyms. These keywords are used to index experimental studies, however this is seen as a limited method as these free text keywords have no context, and are hard to map by non-experts to terms used by facilities in the same domain and harder still to those outside. The creation of ontologies at ISIS will aid in the mapping of concrete manifestations of familiar terms in one domain as well as related concepts in different domains. This will facilitate searching of data by category and grouping of data into keywords across studies. This could aid in the cross facility searching of related scientific data from the various scientific facilities housed at STFC e.g. CLF and DLS.

A Protégé-OWL Ontology Classes Individuals Properties A class is a concept in the domain - a class of People - a class of Pets - a class of Countries A class is a collection of elements with similar properties. Instances of classes - America can be an instance of the class Country. Gemma Mathew Fluffy Italy America England Fido Class Person Class Pet Class Country livesIn hasSibling hasPet
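
The three building blocks named on this slide (classes, individuals, properties) can be mimicked with plain Python data structures. This sketch does not use Protégé or any OWL library. The class memberships follow the slide, but the specific property links between individuals are assumptions made up for the example, since the transcript does not say who is related to whom.

```python
# Classes and the individuals that are instances of them (from the slide).
classes = {
    "Person": {"Gemma", "Mathew"},
    "Pet": {"Fluffy", "Fido"},
    "Country": {"Italy", "America", "England"},
}

# Property assertions as (subject, property, object) triples.
# The particular links below are illustrative assumptions.
properties = [
    ("Gemma", "hasSibling", "Mathew"),
    ("Gemma", "livesIn", "England"),
    ("Mathew", "livesIn", "Italy"),
    ("Mathew", "hasPet", "Fluffy"),
    ("Gemma", "hasPet", "Fido"),
]


def class_of(individual: str) -> str:
    """Return the class an individual belongs to."""
    for class_name, members in classes.items():
        if individual in members:
            return class_name
    raise KeyError(individual)


def objects(subject: str, prop: str) -> list:
    """Return every object linked to the subject by the given property."""
    return [o for s, p, o in properties if s == subject and p == prop]


america_class = class_of("America")
gemma_pets = objects("Gemma", "hasPet")
```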

ISIS Facilities Ontology Hierarchy

Class ISISExperiment Class DataFile Class Year wasConductedIn hasInvestigator Class Instrument Class Investigator HRP00145.RAW 1986 Pete Jones HRPD Class CrystallographyGroupExperiment hasUsedInstrument Hydrazinium Class InvestigationTitle hasTitle hasDataFileName Protein Crystallography GroupExperiment ISIS Facilities Ontology

Sample, Investigator and Experiment Ontologies Sample Investigator Experiment

Ontology Maintainer A web application for graphically displaying current versions of an ontology Currently ontologies are built within Protégé, an editing environment Difficulty in showing constructed ontologies to other domain experts The OntoMaintainer allows users to visualize ontology and enter feedback on the classification and structure of the hierarchy Encourages collaboration between domain experts (scientists) and ontology builders by allowing members of the community to be involved in the development and maintenance of ontologies

Topic Mapping Tool Mapping Tool provides a way of linking proposal system data to the structure of the ontology. Data is mapped to the ontology structure according to a set of defined rules. Proposal System Database Ontology Mapping Rules
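
Rule-based mapping of proposal data onto an ontology can be sketched as a lookup from proposal columns to ontology classes and properties. The class and property names are taken from the ISIS Facilities Ontology slide, but the column names, the rule table and the function `map_record` are hypothetical; the actual Mapping Tool's rules are not given in the slides.

```python
# Assumed rules: proposal column -> (ontology class, linking property).
MAPPING_RULES = {
    "instrument": ("Instrument", "hasUsedInstrument"),
    "investigator": ("Investigator", "hasInvestigator"),
    "year": ("Year", "wasConductedIn"),
    "title": ("InvestigationTitle", "hasTitle"),
}


def map_record(experiment_id: str, record: dict) -> list:
    """Turn one proposal-system row into ontology statements (triples)."""
    statements = []
    for column, value in record.items():
        rule = MAPPING_RULES.get(column)
        if rule is None or value in (None, ""):
            continue  # columns without a rule, or empty values, are skipped
        ontology_class, prop = rule
        statements.append((value, "instanceOf", ontology_class))
        statements.append((experiment_id, prop, value))
    return statements


# Illustrative row; "notes" has no rule and is therefore ignored.
statements = map_record(
    "exp-1", {"instrument": "HRPD", "year": "1986", "notes": "ignored"})
```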

Mapping Tool

Object: Sample Detail. Columns: Name, Chemical Formula, SampleType. SampleType: Liquid. Name: poly{1,4-phenylene-[9,9-bis(4-phenoxybutylsulfonate)]fluorene-2,7-diyl}; C12E5; D2O. Name: poly{9,9-bis[6-(N,N-trimethylammonium)hexyl]fluorene-co-1,4-phenylene}; C12E5; D2O. Chemical Formula: C37H52N2I2; C22H46O6; D2O. Chemical Formula: C37H30S2O8; C22H46O6; D2O.

Ontologies would help maximise the value of data collected at ISIS and other STFC facilities by improving the access, navigation and reuse of data. Ontologies would facilitate the mapping of terms across STFC facilities which will allow cross-facility searching e.g. external users will be able to search for all experiments carried out across STFC using a powder diffractometer (instrument) even if they do not know the local names of the specific instruments. The OntoMaintainer will facilitate the process of creating and maintaining ontologies by providing a means of getting feedback directly from domain experts

ICAT 3.3 Schema - Investigator

ICAT 3.3 Schema – Investigator (2) Investigator Facility User

ICAT 3.3 Schema - Sample

ICAT 3.3 Schema – Sample Parameter

ICAT 3.3 Schema – Dataset

ICAT 3.3 Schema – Dataset (2)

ICAT 3.3 Schema – Dataset (3)

ICAT 3.3 Schema – Dataset Status

ICAT 3.3 Schema – Dataset Type

ICAT 3.3 Schema – Dataset Parameter

ICAT 3.3 Schema – Data File

ICAT 3.3 Schema – Data File (1)

ICAT 3.3 Schema – Data File (2)

ICAT 3.3 Schema – Related Data Files

ICAT 3.3 Schema – Data File Parameter

ICAT 3.3 Schema – Authorisation

Other ICAT Related Schema

ICAT API Session Schema The schema has 3 tables: USER, USER_SESSION and MYPROXY_SERVERS. USER: all users who have logged in. USER_SESSION: all users' sessions on ICAT. MYPROXY_SERVERS: configuration information about which server to log on to.
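
The three-table layout can be sketched with SQLite as a stand-in for the production database. Only the table names come from the slide; every column, type and constraint below is an assumption for illustration and should not be read as the real ICAT session schema.

```python
import sqlite3

# Table names follow the slide; all columns are hypothetical.
SCHEMA = """
CREATE TABLE "user" (
    id        INTEGER PRIMARY KEY,
    user_name TEXT NOT NULL UNIQUE
);
CREATE TABLE user_session (
    id         INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES "user"(id),
    session_id TEXT NOT NULL UNIQUE,
    expires_at TEXT NOT NULL
);
CREATE TABLE myproxy_servers (
    id   INTEGER PRIMARY KEY,
    host TEXT NOT NULL,
    port INTEGER NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# A user logs in and receives a session.
conn.execute('INSERT INTO "user" (user_name) VALUES (?)', ("alice",))
conn.execute(
    "INSERT INTO user_session (user_id, session_id, expires_at) "
    "VALUES (?, ?, ?)",
    (1, "abc123", "2030-01-01T00:00:00"),
)

# Look up which user owns a given session identifier.
row = conn.execute(
    'SELECT u.user_name FROM user_session s '
    'JOIN "user" u ON u.id = s.user_id WHERE s.session_id = ?',
    ("abc123",),
).fetchone()
```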

ICAT Core Database Schema

ICAT Core Session

ICAT Core Event

ICAT Core User

ICAT Core DataBase

ICAT Core Database At STFC the core ICAT catalogue runs on an Oracle 10g RAC clustered database server. The system has been customised to make efficient use of the features Oracle offers. If required, however, these customisations could be removed in the future.

ICAT API

ICAT Architecture Online Proposal System User Office System incl.: User Database Scheduling Health and Safety Proposal Management Metadata Catalogue Data Acquisition System Storage Management System DataAccessPortal Single Sign On Account Creation and Management ICAT Software Suite, providing the crucial integration of key functions.

ICAT API Version 3.3 (1) The ICAT API version 3.3 is the interface that any application should use to interact with the core ICAT system catalogue. At present it is used by applications such as the ISIS XML ingest, the DLS Generic Data Acquisition System, DLS DDH and the DataPortal. The API offers a wide range of web services for the easy interaction with the ICAT core catalogue.

ICAT API Version 3.3 (2) The ICAT API version 3.3 consists of three main components: Web Services offered to other applications ICAT Catalogue Interactions ICAT Catalogue Session Management

ICAT API Version 3.3 (3) The ICAT API version 3.3 uses JPQL and SQL to interact directly with the underlying Oracle databases. The ICAT API version 3.3 has been written in Java using EJB3, JPA and JAX-WS.

ICAT API Version 3.3 (4) Web Services offered to other applications for the Search, List, Ingest, Delete, Modification of: Authentication Investigation, Datafile and Dataset Information Investigator Keywords Publication Sample Download

DataPortal

ICAT Architecture Online Proposal System User Office System incl.: User Database Scheduling Health and Safety Proposal Management Metadata Catalogue Data Acquisition System Storage Management System DataAccessPortal Single Sign On Account Creation and Management ICAT Software Suite, providing the crucial integration of key functions.

DataPortal for ICAT Version 3.3 The DataPortal is a highly customisable web interface for interacting with ICAT version 3.3. There are at present two distinct versions, one for ISIS and one for DLS. While the underlying functionality is the same, the graphical representation and the choice of services used vary. The DataPortal offers a number of search interfaces and the ability to explore investigations and download associated data.

Top Left Hand Menu

Bottom Left Hand Menu

Top Right Hand Menu

Session Expire

DLS DataPortal

Questions?