Taming the facility data explosion: The ICAT system explained
Damian Flannery, NOBUGS 2008, Sydney



The Problem(s)
»Large Data Volumes
»High Throughput
»Proliferation of data formats
»Multiple Data Analysis Steps
»Increasing complexity of data
»Data Access requirements (Sharing and Restriction)
»Versioning of data formats and associated software
»Distributed Computation (accessed offline from research chain)
»Common names and units for temperature, pressure etc.
»Changing / differing metadata requirements
»International users / federation of data from facilities
»Relating to Proposals and Publications
»Ontologies
»Provenance (Creation, Ownership, History)
»Governments want return on investment

What is ICAT?
ICAT is a database (with a well defined API) that provides a uniform interface to experimental data and a mechanism to link all aspects of research from proposal through to publication.
»Access data anywhere via the web
»Annotate your data
»Search for data in a meaningful way, e.g. by taxonomy, sample, temperature, pressure etc.
»Share data with colleagues
»Access data from your own programs (C++, Fortran, Java etc.) via the ICAT API
»Identify potential collaborations
»Utilise integrated e-Science High-Performance Computing and Visualisation resources
»Link to data from your publications
»Etc.
Proposals: Once you are awarded beamtime at ISIS, an entry is created in ICAT that describes your proposed experiment.
Experiment: Data collected from your experiment is indexed by ICAT (with additional experimental conditions) and made available to your experimental team.
Analysed Data: You can upload any analysed data you wish and associate it with your experiments.
Publication: Using ICAT you can also associate publications with your experiment and even reference data from your publications.
Figure labels: B-lactoglobulin protein interfacial structure; Example ISIS Proposal; GEM – High intensity, high resolution neutron diffractometer; H2-(zeolite) vibrational frequencies vs polarising potential of cations

Overview
Diagram labels: RDBMS; Web Services API; ICAT API; Command Line Tools; Glassfish / JBoss; Java, C++, Fortran; Data Storage / Delivery System; Single Sign On; User Database System; Proposal System; Publication System; e-Science Services; Software Repository

Federation
Diagram labels: Data Portal; ISIS; SNS; ANSTO
The same component stack appears three times in the diagram: RDBMS; Web Services API; ICAT API; Data Storage / Delivery System; Single Sign On; User Database System; Proposal System; Publication System; e-Science Services; Software Repository
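One way to read the federation diagram is that a single portal forwards a search to each facility's catalogue and merges the results. The sketch below illustrates that idea only; it assumes a fan-out design that the slide does not spell out, and `FederatedSearchSketch`, `register` and `search` are hypothetical names, not part of the ICAT API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch of a portal that fans a search out to several facility catalogues. */
final class FederatedSearchSketch {
    // Each facility catalogue is modelled as a function from keyword to matching titles.
    private final Map<String, Function<String, List<String>>> facilities = new LinkedHashMap<>();

    /** Adds a facility catalogue under the given name (registration order is preserved). */
    void register(String name, Function<String, List<String>> catalogueSearch) {
        facilities.put(name, catalogueSearch);
    }

    /** Queries every registered catalogue and merges the hits, prefixed by facility name. */
    List<String> search(String keyword) {
        List<String> merged = new ArrayList<>();
        for (Map.Entry<String, Function<String, List<String>>> entry : facilities.entrySet()) {
            for (String hit : entry.getValue().apply(keyword)) {
                merged.add(entry.getKey() + ": " + hit);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        FederatedSearchSketch portal = new FederatedSearchSketch();
        // The search results here are made-up placeholders, not real catalogue entries.
        portal.register("ISIS", keyword -> List.of(keyword + " study on GEM"));
        portal.register("SNS", keyword -> List.of());
        portal.register("ANSTO", keyword -> List.of(keyword + " diffraction run"));
        portal.search("zeolite").forEach(System.out::println);
    }
}
```

In a real deployment each function would wrap a call to that facility's Web Services API, with its own session token.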

Data Model
Entities: Investigation; Publication; Keyword; Topic; Sample; Sample Parameter; Dataset; Dataset Parameter; Datafile; Datafile Parameter; Investigator; Related Datafile; Parameter; Authorisation
Attribute groups (in slide order):
»Reference / Proposal Id, Previous Reference, Facility, Instrument, Title, Abstract, etc.
»Name
»Name / Units / Value etc., Searchable, Is Sample Parameter, Is Dataset Parameter, Is Datafile Parameter, Verified
»Name, Units, String Value, Numeric Value, Range Top, Range Bottom, Error
»Full Reference, URL, Repository
»Name, Parent Id, Topic Level
»User Id, Role
»Name, Chemical Formula, Safety Information
»Name, Units, String Value, Numeric Value, Range Top, Range Bottom, Error
»Name, Sample Id, Description
»Name, Units, String Value, Numeric Value, Range Top, Range Bottom, Error
»Name, Description, Version, Location, Format, Format Version, Create Time, Modify Time, Size, Checksum
»Source Datafile Id, Destination Datafile Id, Relation
»S/W Application, S/W Version
»User Id, Role (e.g. Admin, Deleter, Updater, Reader, Creator, Downloader etc.), Element Type, Element Id
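The hierarchy of Investigation, Dataset and Datafile, each carrying parameters, can be sketched as plain Java classes. This is an illustration rather than the ICAT schema: the assignment of fields to entities is a reading of the flattened diagram, only a few attributes are included, and all class names, file names and paths are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the Investigation > Dataset > Datafile hierarchy. */
final class DataModelSketch {

    /** A named value with units; holds either a string or a numeric value. */
    static final class Parameter {
        final String name;
        final String units;
        final String stringValue;   // null for numeric parameters
        final Double numericValue;  // null for string parameters

        Parameter(String name, String units, String stringValue, Double numericValue) {
            this.name = name;
            this.units = units;
            this.stringValue = stringValue;
            this.numericValue = numericValue;
        }
    }

    static final class Datafile {
        final String name;
        final String location;
        final long sizeBytes;
        final List<Parameter> parameters = new ArrayList<>();

        Datafile(String name, String location, long sizeBytes) {
            this.name = name;
            this.location = location;
            this.sizeBytes = sizeBytes;
        }
    }

    static final class Dataset {
        final String name;
        final List<Datafile> datafiles = new ArrayList<>();
        final List<Parameter> parameters = new ArrayList<>();

        Dataset(String name) {
            this.name = name;
        }
    }

    static final class Investigation {
        final String title;
        final String instrument;
        final List<Dataset> datasets = new ArrayList<>();

        Investigation(String title, String instrument) {
            this.title = title;
            this.instrument = instrument;
        }
    }

    /** Walks the hierarchy and sums the sizes of all datafiles in an investigation. */
    static long totalSizeBytes(Investigation investigation) {
        long total = 0;
        for (Dataset dataset : investigation.datasets) {
            for (Datafile datafile : dataset.datafiles) {
                total += datafile.sizeBytes;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Investigation investigation = new Investigation("Example ISIS Proposal", "GEM");
        Dataset raw = new Dataset("raw");
        Datafile file = new Datafile("run0001.raw", "/data/run0001.raw", 1_500_000L);
        file.parameters.add(new Parameter("temperature", "K", null, 290.0));
        raw.datafiles.add(file);
        investigation.datasets.add(raw);
        System.out.println(totalSizeBytes(investigation) + " bytes");
    }
}
```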

ICAT API
Service Oriented Architecture
»Services exposed as Web Services
»User required to authenticate in order to obtain a Session Token
»Token is used in all subsequent API calls for authorisation
The API is modular in order to fit the needs of the facilities
»Plug in own user database
»Plug in data delivery system
Characteristics
»Platform independent [Java]
»Application Server independent [EJB3]
»Database independent (almost!) [JPL]
»Language independent [Web Services]
Internals
»Core functionality implemented as POJOs using JPA
»For deployment, EJB3 Session Beans bind the core API, user db and data delivery aspects together
»Services are unit tested using JUnit
»Services are logged at every interaction point using Log4j
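The authenticate-then-call pattern described on this slide can be sketched as follows. This is a minimal in-memory illustration, not the ICAT API: `SessionTokenSketch`, `login` and `searchByKeyword` are hypothetical names, and the hard-coded user stands in for whatever user database a facility plugs in.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Hypothetical sketch of the session-token pattern; not the real ICAT API. */
final class SessionTokenSketch {
    // Assumed user store; a facility would plug in its own user database here.
    private final Map<String, String> passwords = new HashMap<>();
    // Maps each issued session token to the user it was issued to.
    private final Map<String, String> sessions = new HashMap<>();

    SessionTokenSketch() {
        passwords.put("alice", "secret");
    }

    /** Authenticates the user and returns a fresh session token. */
    String login(String user, String password) {
        String expected = passwords.get(user);
        if (expected == null || !expected.equals(password)) {
            throw new SecurityException("authentication failed");
        }
        String token = UUID.randomUUID().toString();
        sessions.put(token, user);
        return token;
    }

    /** Every later call presents the token, which is checked before any work is done. */
    List<String> searchByKeyword(String token, String keyword) {
        String user = sessions.get(token);
        if (user == null) {
            throw new SecurityException("invalid session token");
        }
        // Placeholder result; a real service would query the catalogue here.
        return List.of("investigation matching '" + keyword + "' visible to " + user);
    }

    public static void main(String[] args) {
        SessionTokenSketch api = new SessionTokenSketch();
        String token = api.login("alice", "secret");
        System.out.println(api.searchByKeyword(token, "zeolite"));
    }
}
```

Keeping the token check at the top of every service method mirrors the slide's point that the token is used for authorisation on all subsequent calls.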

ICAT API Continued

ICAT Client

Data Portal

Security
Role based permissions
»[Super]
»Admin
»Create
»Delete
»Update
»Download
»Read
Data Policy
»3 year embargo on data (+1 if requested)
»Commercial data is never made public
»Instrument Scientists can access all data from their beamline
»Calibration data is public
»Any data that involves IPR (e.g. analysed) is private in perpetuity unless explicitly shared by user
SSL
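The data policy rules can be expressed as a small decision function. The sketch below makes several assumptions that the slide does not state: the embargo is measured from the collection date, the commercial rule takes precedence over the calibration rule, and an Admin permission implies download rights. The IPR and instrument-scientist rules are left out, and all names are hypothetical.

```java
import java.time.LocalDate;
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical sketch of the embargo and role checks; not ICAT's rule engine. */
final class DataPolicySketch {
    enum Permission { READ, DOWNLOAD, UPDATE, DELETE, CREATE, ADMIN }

    /** Embargo: 3 years from collection (assumed start point), plus 1 year if requested. */
    static LocalDate publicFrom(LocalDate collected, boolean extensionRequested) {
        return collected.plusYears(extensionRequested ? 4 : 3);
    }

    /** Commercial data is never public; calibration data always is; the rest follows the embargo. */
    static boolean isPublic(LocalDate collected, boolean extensionRequested,
                            boolean commercial, boolean calibration, LocalDate today) {
        if (commercial) {
            return false; // assumed to take precedence over every other rule
        }
        if (calibration) {
            return true;
        }
        return !today.isBefore(publicFrom(collected, extensionRequested));
    }

    /** Public data is open to all; otherwise an explicit permission is needed. */
    static boolean canDownload(Set<Permission> granted, boolean dataIsPublic) {
        return dataIsPublic
                || granted.contains(Permission.DOWNLOAD)
                || granted.contains(Permission.ADMIN); // assumption: Admin implies Download
    }

    public static void main(String[] args) {
        LocalDate collected = LocalDate.of(2008, 11, 3);
        LocalDate today = LocalDate.of(2010, 1, 1);
        boolean open = isPublic(collected, false, false, false, today);
        System.out.println("public: " + open);
        System.out.println("reader can download: " + canDownload(EnumSet.of(Permission.READ), open));
    }
}
```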

Installation / Development
Development: Technologies Used
»Java
»NetBeans 6.1
»Glassfish UR2
»Ant
»JUnit
»JMeter
»Log4J
»EJB3
»JPA
»JAX-WS
»JAXB
»Oracle (10G / 11G)
»Subversion
Installation
»Any O/S
»Oracle 10G/11G
»Java 6 Update 6
»Apache Ant v1.7+
»Glassfish v2 UR2
»Installed & Configured Cog Kit
»Unzip download bundle
»Update properties files e.g. database details
»Run Ant commands

User Database

Data Delivery
Diagram labels: Data Portal; ICAT API; Data.ISIS
1. User performs search via application e.g. Data Portal
2. Search is executed in ICAT
3. Permitted results are returned to application
4. Results are displayed to the user
5. User performs request to download datafile, multiple datafiles or dataset
6. ICAT creates http GET link and passes it back to the user (routed through application)
»sessionId (optional)
»fileId(s) or datasetId
»action (i.e. download, zip, compressed)
7. User clicks http link
8. Data.ISIS calls ICAT API to check permissions
»sessionId & datafileId(s) or datasetId
9. Return Exception on failure or DownloadObject on success
»userId
»array [filename, cycle, run number]
10. User gets their data!
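The download link that ICAT hands back to the user can be sketched as a simple query-string builder. The parameter names `sessionId`, `fileIds` and `action` follow the slide's wording, but the exact spelling, the comma-separated id list and the base URL are assumptions; `DownloadLinkSketch` and `buildLink` are hypothetical names.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical sketch of building the http GET download link. */
final class DownloadLinkSketch {

    /** Joins the file ids with commas and appends them, with session and action, to the base URL. */
    static String buildLink(String baseUrl, String sessionId, List<Long> fileIds, String action) {
        String ids = fileIds.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(","));
        return baseUrl
                + "?sessionId=" + URLEncoder.encode(sessionId, StandardCharsets.UTF_8)
                + "&fileIds=" + ids
                + "&action=" + URLEncoder.encode(action, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The host below is a placeholder, not the real Data.ISIS address.
        String link = buildLink("https://data.example.org/download", "abc-123", List.of(1L, 2L), "zip");
        System.out.println(link);
    }
}
```

When the user follows such a link, the delivery system would pass the same sessionId and ids back to the ICAT API for the permission check before serving any bytes.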

Data Delivery Continued

XML Ingest
Diagram labels: Client; XSD; Validation; XMLIngest(xml); InvestigationId; RDBMS; Web Services API; ICAT API; Data Storage / Delivery System; Single Sign On; User Database System; Proposal System; Publication System; e-Science Services; Software Repository
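The validation step in the ingest diagram can be illustrated with the standard `javax.xml.validation` API, which checks a document against an XSD before anything is written to the catalogue. The schema below is a heavily simplified, made-up stand-in, since the real ICAT XSD is not shown in the slides; `XmlIngestValidationSketch` and `isValid` are hypothetical names.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

/** Hypothetical sketch of validating ingest XML against an XSD. */
final class XmlIngestValidationSketch {
    // Made-up schema for illustration only; it is not the ICAT ingest schema.
    static final String XSD =
            "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
          + "<xs:element name='investigation'>"
          + "<xs:complexType><xs:sequence>"
          + "<xs:element name='title' type='xs:string'/>"
          + "<xs:element name='instrument' type='xs:string'/>"
          + "</xs:sequence></xs:complexType>"
          + "</xs:element></xs:schema>";

    /** Returns true when the XML conforms to the schema, false when validation fails. */
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(XSD)));
            Validator validator = schema.newValidator();
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Raised for malformed or non-conforming XML (and for a broken schema).
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String good = "<investigation><title>Example</title><instrument>GEM</instrument></investigation>";
        String bad = "<investigation><title>Example</title></investigation>";
        System.out.println("good document valid: " + isValid(good));
        System.out.println("bad document valid: " + isValid(bad));
    }
}
```

Rejecting a document at this stage means an ingest call either yields an investigation id or fails without touching the database.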

ISIS Integration
Diagram labels: Trigger; NXIngest; RawIngest

Developers

Future Developments
»Release Data Portal to ISIS users
»Move XML Ingest into asynchronous Message Driven Bean
»Rule-based policy implementation
»Expand and improve the supplied interface
»Proposal System integration
»Publication System integration
»Database independent
Consequence… Look at issues/tickets & forum!

Summary
At ISIS
»Volume of data ~4TB
»~3M datafiles (22 instruments, 330/hour)
»6.7GB metadata, 33M rows
»550+ unit & stress tests
Attempt to solve problems as outlined earlier in this talk
Software characteristics
»Scalability
»Maintainability
»Reliability
»Availability
»Extensibility
»Performance
»Manageability
»Security
We want to drive this forward
We would like to do it in collaboration with other facilities

Acknowledgements
ISIS
»Robert McGreevy, Kenneth Shankland, Tom Griffin, Stuart Ansell
»Freddie Akeroyd, Chris Moreton-Smith, Matt Clarke, Kevin Knowles, Steven King, Adrian Hillier, Alex Hannon, Rob Dalgleish
e-Science
»Glen Drinkwater, Shoaib Sufi, Kerstin Kleese Van Dam, Laurent Lerusse, Rik Tyer, Phil Couch
»Gordon Brown, Kier Hawker, Carmine Coiffe
»Roger Downing

Questions