EGEE-II INFSO-RI-031688 Enabling Grids for E-sciencE www.eu-egee.org EGEE and gLite are registered trademarks Evaluating Metadata access strategies with.

Slides:



Advertisements
Similar presentations
FP7-INFRA Enabling Grids for E-sciencE EGEE Induction Grid training for users, Institute of Physics Belgrade, Serbia Sep. 19, 2008.
Advertisements

E-Science Data Information and Knowledge Transformation The BinX Language.
Data Grids: Globus vs SRB. Maturity SRB  Older code base  Widely accepted across multiple communities  Core components are tightly integrated Globus.
1 An Introduction to OGSA-DAI Konstantinos Karasavvas 13 th September 2005.
RIZWAN REHMAN, CCS, DU. Advantages of ORDBMSs  The main advantages of extending the relational data model come from reuse and sharing.  Reuse comes.
Web-based Portal for Discovery, Retrieval and Visualization of Earth Science Datasets in Grid Environment Zhenping (Jane) Liu.
Rice KRAD Data Layer JPA Design Eric Westfall July 2013.
Getting connected.  Java application calls the JDBC library.  JDBC loads a driver which talks to the database.  We can change database engines without.
An easy way to manage Relational Databases in the Globus Community Sandro Fiore ISUFI/ Center for Advanced Computational Technologies Director: prof. Giovanni.
AL-MAAREFA COLLEGE FOR SCIENCE AND TECHNOLOGY INFO 232: DATABASE SYSTEMS CHAPTER 7 INTRODUCTION TO STRUCTURED QUERY LANGUAGE (SQL) Instructor Ms. Arwa.
1 Chapter Client-Server Interaction. 2 Functionality  Transport layer and layers below  Basic communication  Reliability  Application layer.
Metadata Creation with the Earth System Modeling Framework Ryan O’Kuinghttons – NESII/CIRES/NOAA Kathy Saint – NESII/CSG July 22, 2014.
1 Dr. Markus Hillenbrand, ICSY Lab, University of Kaiserslautern, Germany A Generic Database Web Service for the Venice Service Grid Michael Koch, Markus.
Data Management Kelly Clynes Caitlin Minteer. Agenda Globus Toolkit Basic Data Management Systems Overview of Data Management Data Movement Grid FTP Reliable.
A Metadata Catalog Service for Data Intensive Applications Presented by Chin-Yi Tsai.
GT Components. Globus Toolkit A “toolkit” of services and packages for creating the basic grid computing infrastructure Higher level tools added to this.
ES Metadata Management Enabling Grids for E-sciencE ES metadata OGSA-DAI NA4 GA Meeting, D. Weissenbach, IPSL, France.
EGEE-III INFSO-RI Enabling Grids for E-sciencE The Medical Data Manager : the components Johan Montagnat, Romain Texier, Tristan.
1 CS 430 Database Theory Winter 2005 Lecture 17: Objects, XML, and DBMSs.
1 Schema Registries Steven Hughes, Lou Reich, Dan Crichton NASA 21 October 2015.
INFSO-RI Enabling Grids for E-sciencE Supporting legacy code applications on EGEE VOs by GEMLCA and the P-GRADE portal P. Kacsuk*,
7 1 Chapter 7 Introduction to Structured Query Language (SQL) Database Systems: Design, Implementation, and Management, Seventh Edition, Rob and Coronel.
1 1 EPCC 2 Curtin Business School & Edinburgh University Management School Michael J. Jackson 1 Ashley D. Lloyd 2 Terence M. Sloan 1 Enabling Access to.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks R-GMA Now With Added Authorization Steve.
INFSO-RI Enabling Grids for E-sciencE AMGA Metadata Server - Metadata Services in gLite (+ ARDA DB Deployment Plans with Experiments)
Enabling Grids for E-sciencE EGEE-III INFSO-RI I. AMGA Overview What is AMGA Metadata Catalogue of EGEE’s gLite 3.1 Middleware Main Feature of.
2007. Software Engineering Laboratory, School of Computer Science S E Web-Harvest Web-Harvest: Open Source Web Data Extraction tool 이재정 Software Engineering.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks AMGA PHP API Claudio Cherubino INFN - Catania.
8 1 Chapter 8 Advanced SQL Database Systems: Design, Implementation, and Management, Seventh Edition, Rob and Coronel.
NA-MIC National Alliance for Medical Image Computing UCSD: Engineering Core 2 Portal and Grid Infrastructure.
Database Systems Design, Implementation, and Management Coronel | Morris 11e ©2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or.
INFSO-RI Enabling Grids for E-sciencE OGSA DAI Data Access and Integration Marek Ciglan Institute of Informatics, Slovac Academy.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Grid Site Monitoring with Nagios E. Imamagic,
Uwe SchindlerGES 2007 – May 2-4, 2007 Data Information Service based on Open Archives Initiative Protocols and Apache Lucene Uwe Schindler 1, Benny Bräuer.
Presented by Scientific Annotation Middleware Software infrastructure to support rich scientific records and the processes that produce them Jens Schwidder.
Web Portal Design Workshop, Boulder (CO), Jan 2003 Luca Cinquini (NCAR, ESG) The ESG and NCAR Web Portals Luca Cinquini NCAR, ESG Outline: 1.ESG Data Services.
Mike Jackson EPCC OGSA-DAI Architecture + Extensibility OGSA-DAI Tutorial GGF17, Tokyo.
Metadata Mòrag Burgon-Lyon University of Glasgow.
Presented by Jens Schwidder Tara D. Gibson James D. Myers Computing & Computational Sciences Directorate Oak Ridge National Laboratory Scientific Annotation.
EGEE-II INFSO-RI Enabling Grids for E-sciencE Hands on session: the AMGA Metadata Catalogue.
INFSO-RI Enabling Grids for E-sciencE A service oriented framework to create, manage and update metadata for earth system science.
EGEE User Forum Data Management session Development of gLite Web Service Based Security Components for the ATLAS Metadata Interface Thomas Doherty GridPP.
INFSO-RI Enabling Grids for E-sciencE Intelligent Distributed Data Management in Earth System Science S. Kindermann, DKRZ, Germany.
INFSO-RI User Forum 1-3 March 2006 Enabling Grids for E-sciencE Worldwide ozone distribution by using Grid infrastructure ESA: L.
1 Service Creation, Advertisement and Discovery Including caCORE SDK and ISO21090 William Stephens Operations Manager caGrid Knowledge Center February.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks MSG - A messaging system for efficient and.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Using GStat 2.0 for Information Validation.
NeuroLOG ANR-06-TLOG-024 Software technologies for integration of process and data in medical imaging A transitional.
Managing Data DIRAC Project. Outline  Data management components  Storage Elements  File Catalogs  DIRAC conventions for user data  Data operation.
Enabling Grids for E-sciencE EGEE-II INFSO-RI Medical Data Manager 1 Dicom retrieval : overview of the DPM One command line to retrieve a file:
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarksEGEE-III INFSO-RI Astro-Wise and EGEE.
Basics of JDBC Session 14.
E-Science Data Information and Knowledge Transformation BinX – A Tool for Binary File Access eDIKT project team Ted Wen
Data Manipulation with Globus Toolkit Ivan Ivanovski TU München,
INFSO-RI Enabling Grids for E-sciencE /10/20054th EGEE Conference - Pisa1 gLite Configuration and Deployment Models JRA1 Integration.
EGEE-II INFSO-RI Enabling Grids for E-sciencE P-GRADE overview and introduction: workflows & parameter sweeps (Advanced features)
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks A GRID based platform to host multiple repositories.
Copyright 2007, Information Builders. Slide 1 iWay Web Services and WebFOCUS Consumption Michael Florkowski Information Builders.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks gLite – UNICORE interoperability Daniel Mallmann.
Plug-In Architecture Pattern. Problem The functionality of a system needs to be extended after the software is shipped The set of possible post-shipment.
The AstroGrid-D Information Service Stellaris A central grid component to store, manage and transform metadata - and connect to the VO!
Preservation Data Services Persistent Archive Research Group Reagan W. Moore October 1, 2003.
Amy Krause EPCC OGSA-DAI An Overview OGSA-DAI on OMII 2.0 OMII The Open Middleware Infrastructure Institute NeSC,
INFSO-RI Enabling Grids for E-sciencE ESR Database Access K. Ronneberger,DKRZ, Germany H. Schwichtenberg, SCAI, Germany S. Kindermann,
ODBC, OCCI and JDBC overview
JDBC Database Management Database connectivity
SDMX Reference Infrastructure Introduction
MySQL Migration Toolkit
Plug-In Architecture Pattern
Presentation transcript:

EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Evaluating Metadata access strategies with the GOME test suite André Gemünd Fraunhofer SCAI

Enabling Grids for E-sciencE EGEE-II INFSO-RI Evaluating Metadata access strategies with the GOME test suite 2 Motivation Testing the test suite –Sufficiency of specification and utility Investigate AMGA and GRelC as alternatives –Until now we‘ve used OGSA-DAI in NA4 –Used a java wrapper to access from Python and Perl –gLite integration

Enabling Grids for E-sciencE EGEE-II INFSO-RI Evaluating Metadata access strategies with the GOME test suite 3 Introduction DEGREE Project –Dissemination and Exploitation of GRids in Earth sciencE –Bridge Earth Science and Grid Community –Identify barriers for broader acceptance –Identify and assess key requirements –Improve communication and collaboration

Enabling Grids for E-sciencE EGEE-II INFSO-RI Evaluating Metadata access strategies with the GOME test suite 4 Introduction Test suites –Specify typical workflows for earth science applications –As white papers for testing Grid middleware –Organised and grouped into categories (data management, etc.) –Consisting of test cases with annotated tested requirements

Enabling Grids for E-sciencE EGEE-II INFSO-RI Evaluating Metadata access strategies with the GOME test suite 5 Introduction GOME-Validation Test Suite –High amount of datasets from two sources  GOME satellite measurements  LIDAR ground station measurements –Correlate by metadata  geo-coordinates & date of measurement –Target components (as specified):  Data management  Database access  Workflow control

Enabling Grids for E-sciencE EGEE-II INFSO-RI Evaluating Metadata access strategies with the GOME test suite 6 Proceeding What we did –Implement GOME-Validation as a representative workflow  Transmission and Grid registration of data files  Extraction and archiving of Metadata  Bidirectional correlation of files through Metadata  Abstraction of Metadata backend

Enabling Grids for E-sciencE EGEE-II INFSO-RI Evaluating Metadata access strategies with the GOME test suite 7 Proceeding Software Design

Enabling Grids for E-sciencE EGEE-II INFSO-RI Evaluating Metadata access strategies with the GOME test suite 8 Results Problems / Characteristics –Backend Compatibility –Data schema and types –Query language –GIS features –Indexing (IDs) –Bulk Action support –Hierarchical metadata –Reuse of Data

Enabling Grids for E-sciencE EGEE-II INFSO-RI Evaluating Metadata access strategies with the GOME test suite 9 Results Database Compatibility –AMGA  uses ODBC MySQL, Oracle, pgSQL, etc.  Extensions and custom Functions need to be added to the Query Parser (Bison Grammar) –GRelC  C API libraries Config file states “choose between mysql and pgsql”  Needs pgSQL as configuration backend

Enabling Grids for E-sciencE EGEE-II INFSO-RI Evaluating Metadata access strategies with the GOME test suite 10 Results Database Compatibility –OGSA-DAI  Unique strength  Uses JDBC, eXist and custom drivers  Write data providers for arbitrary data sources Databases and files already included  Combine data from different sources  Execute Transformations on data  Deliver to Grid-FTP, Gridservice, Client, …

Enabling Grids for E-sciencE EGEE-II INFSO-RI Evaluating Metadata access strategies with the GOME test suite 11 Proceeding Data schema (OGSA-DAI & GRelC) –Raw SQL tables –Taken directly from Test suite specification –2 Tables  One for LIDAR and one for GOME files  Problem: 1 Lidar files hosts n datasets Different time / coordinates Save redundant or introduce relations?

Enabling Grids for E-sciencE EGEE-II INFSO-RI Evaluating Metadata access strategies with the GOME test suite 12 Proceeding Data schema (AMGA) –We had to devise a modified schema  AMGA uses path structures  Entity-specific attributes –Leverage advantages  Dynamic change  Inheritance of attributes (hierarchy)

Enabling Grids for E-sciencE EGEE-II INFSO-RI Evaluating Metadata access strategies with the GOME test suite 13 Results Using hierarchies in AMGA example –/gometest/lidar/ano/hgl/30108/  /ano/ Identifies station and thus also coordinates Here: Andoya, Norway  /hgl/ Author, here: Georg Hansen  /30108/ Identifies file entity  Files in this directory Real Datasets

Enabling Grids for E-sciencE EGEE-II INFSO-RI Evaluating Metadata access strategies with the GOME test suite 14 Results Datatypes: Location of measurement –PostGIS Polygon  AMGA can use int, float, varchar, timestamp, text, or numeric But: unknown fieldtypes of database get returned as text  OGSA-DAI & GRelC let you choose No datatype abstraction  Function to determine containment? See query language

Enabling Grids for E-sciencE EGEE-II INFSO-RI Evaluating Metadata access strategies with the GOME test suite 15 Results Datatypes –No additional types offered by the services –Desirable  Relations containment, adjacency, … Custom relations (ontology-like) isResultOf isUsedInExperiment  Array types –Not only abstraction but extension

Enabling Grids for E-sciencE EGEE-II INFSO-RI Evaluating Metadata access strategies with the GOME test suite 16 Results Query language –OGSA-DAI and GRelC use SQL  Highly coupled to table schema  Differences in SQL dialect (e.g. pgSQL Oracle)  Support for SQL functions, Views, Extensions Syntax errors if extension is not enabled (e.g. PostGIS) –GRelC add. supports XMLDB query language  XPath XQuery –AMGA defines own query language  Makes for reusable queries / abstractions  May possibly limit query power Add. Functions need source change

Enabling Grids for E-sciencE EGEE-II INFSO-RI Evaluating Metadata access strategies with the GOME test suite 17 Results Bulk Actions –AMGA additionally supports socket connection instead of document based (SOAP)  Low latency  Multiple queries without delay  High transfer rates possible –OGSA-DAI workflows  Pipeline, Parallel grouping of activities  Powerful but complicated

Enabling Grids for E-sciencE EGEE-II INFSO-RI Evaluating Metadata access strategies with the GOME test suite 18 Results What we would like to have  Integration of external data sources like OGSA-DAI For custom data sources like swiss-prot etc.  Integration to gLite Integration with file catalogue oBrowsable in both directions Support for aliases and replicas oAssess best replica for current location VOMS-based Authorization & Authentication  Extendible for GIS-features and the like  APIs for Java, C++, Python & Perl