Www.consorzio-cometa.it FESR Consorzio COMETA - Progetto PI2S2 The AMGA Metadata Catalog with use cases Salvatore Scifo, Tony Calanducci INFN Catania Grid.

Presentation transcript:

FESR Consorzio COMETA - Progetto PI2S2
The AMGA Metadata Catalog with use cases
Salvatore Scifo, Tony Calanducci, INFN Catania
Grid Tutorial per l'Universita' di Catania, Catania

Outline
Background and basic concepts
AMGA architecture and features
Metadata replication on AMGA
Examples
AMGA Web Interface
gLibrary

Searching files on the Grid
Storage Grids currently contain millions of files spread over several storage sites.
Users and applications need an efficient mechanism
  – to find the files of interest
  – to discover and query information about the contents of their files
This is provided
  – by associating descriptive attributes (metadata) with files
  – by exposing this information in catalogues that users and client applications can access and search

AMGA: ARDA Metadata Grid Application
(ARDA: A Realisation of Distributed Analysis for LHC)
AMGA is a metadata service for the Grid
  – It is a database access service for Grid applications that lets users and user jobs discover data describing their files, so that they can access them in the appropriate way.
The AMGA service is based on an RDBMS
  – It allows metadata schemas to be defined according to the needs of users and applications
  – It provides a replication layer that makes databases locally available to user jobs and replicates changes between the participating databases.
AMGA has been designed to integrate closely with the Grid environment
  – The metadata service is a Grid component
  – Compliant with Grid security
  – Hides DB heterogeneity

Metadata concepts and terminology
Metadata: data about data
Schema: a set of attributes
Attribute: a key/value pair
  – Type: the type (int, float, string, …)
  – Name/Key: the name of the attribute
  – Value: the value of an entry's attribute
Entries: the entities/objects to which we attach metadata
Collection: a set of entries associated with a schema (AMGA collections are hierarchically organized)
Analogy to the RDBMS world: think of collections as tables, attributes as columns, entries as rows
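
The vocabulary above can be sketched as a small data structure. This is only an illustrative model in Python, not AMGA's API: the class name `Collection`, its methods, and the `/demo` collection with its `Owner` and `Size` attributes are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """Hypothetical model: a set of entries that share one schema."""
    name: str
    schema: dict = field(default_factory=dict)          # attribute name -> Python type
    entries: dict = field(default_factory=dict)         # entry name -> {attribute: value}
    subcollections: dict = field(default_factory=dict)  # child name -> Collection

    def add_attribute(self, key, type_):
        """Extend the schema with one attribute (a 'column')."""
        self.schema[key] = type_

    def add_entry(self, entry_name, **values):
        """Attach attribute values to an entry (a 'row'), checked against the schema."""
        for key, value in values.items():
            if key not in self.schema:
                raise KeyError(f"unknown attribute: {key}")
            if not isinstance(value, self.schema[key]):
                raise TypeError(f"{key} expects {self.schema[key].__name__}")
        self.entries[entry_name] = dict(values)


# Hypothetical collection and attributes, for illustration only.
demo = Collection("/demo")
demo.add_attribute("Owner", str)
demo.add_attribute("Size", int)
demo.add_entry("file1", Owner="alice", Size=42)
```

In this model the schema plays the role of the table definition, and each entry is one row keyed by its name.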

Example: Movie Trailers
Movie trailer files (entries) are saved on Grid Storage Elements and registered in an LFC File Catalogue
We have one LFN (Logical File Name) per movie file
We want to add metadata to describe the movie content. Possible schema:
  – Title -- varchar
  – Runtime -- int
  – Cast -- varchar
  – LFN -- varchar
AMGA will be the repository of the movies' metadata
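
Following the table/column/row analogy, the trailer schema can be tried out with a plain relational table. The sketch below uses Python's built-in `sqlite3` module rather than AMGA itself; the table name `trailers`, the sample titles, and the LFN values are made up for illustration.

```python
import sqlite3

# An in-memory database stands in for the metadata back-end.
conn = sqlite3.connect(":memory:")

# One column per attribute of the schema. "Cast" is quoted because
# CAST is a keyword in SQL.
conn.execute(
    'CREATE TABLE trailers (Title VARCHAR, Runtime INT, "Cast" VARCHAR, LFN VARCHAR)'
)

# Each row is one entry; the values (including the LFNs) are hypothetical.
conn.executemany(
    "INSERT INTO trailers VALUES (?, ?, ?, ?)",
    [
        ("Short Trailer", 90, "Actor A, Actor B", "lfn:/example/trailers/short.avi"),
        ("Long Trailer", 150, "Actor C", "lfn:/example/trailers/long.avi"),
    ],
)

# Find the logical file names of all trailers no longer than two minutes.
short_trailers = conn.execute(
    "SELECT Title, LFN FROM trailers WHERE Runtime <= ? ORDER BY Title", (120,)
).fetchall()
```

The query returns the LFN, which is what a client would then hand to the file catalogue to locate the actual file.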

Example: exchanging data among running jobs
Suppose we have two sets of jobs:
  – Producers: they generate a file, store it on an SE, and register it in the LFC File Catalogue, assigning it an LFN
  – Consumers: they take an LFN, download the file and process it
AMGA can be used to share the information generated by the Producers: it can act as a "bag of LFNs" (bag-of-tasks model) from which Consumers fetch files for further processing
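
The bag-of-LFNs idea can be sketched in a few lines. This is a single-process Python model, assuming a status attribute per entry; the class `LfnBag`, the status values `new` and `taken`, and the sample LFNs are hypothetical, and a real deployment would keep this state in the metadata catalogue rather than in memory.

```python
import threading


class LfnBag:
    """Hypothetical shared bag: producers publish LFNs, consumers claim them."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}  # LFN -> status ("new" or "taken")

    def publish(self, lfn):
        """Called by a producer after registering a file in the file catalogue."""
        with self._lock:
            self._entries[lfn] = "new"

    def fetch(self):
        """Called by a consumer: claim one unprocessed LFN, or None if the bag is empty."""
        with self._lock:
            for lfn, status in self._entries.items():
                if status == "new":
                    self._entries[lfn] = "taken"  # mark so no other consumer gets it
                    return lfn
        return None


bag = LfnBag()
bag.publish("lfn:/example/output/run1.dat")
bag.publish("lfn:/example/output/run2.dat")
first = bag.fetch()
second = bag.fetch()
third = bag.fetch()
```

Marking an entry as taken under the lock is what keeps two consumers from processing the same file.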

AMGA Features
Dynamic schemas
  – Schemas can be modified at runtime by the client
      – Create, delete schemas
      – Add, remove attributes
      – Add, remove entries
Metadata organised as a hierarchy
  – Collections can contain sub-collections
  – Analogy to a file system:
      – Collection ↔ Directory; Entry ↔ File
Flexible queries
  – SQL-like query language
  – Joins between schemas
    selectattr /gLibrary:FileName /gLAudio:Author /gLAudio:Album '/gLibrary:FILE=/gLAudio:FILE and like(/gLibrary:FileName, "%.mp3")'
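
For readers more familiar with SQL, the join in the `selectattr` example can be approximated with two relational tables. The sketch below uses Python's `sqlite3`, not AMGA; it assumes that `FILE` identifies the entry in each collection, and the table contents are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two collections modelled as tables; FILE is assumed to be the entry name.
conn.execute("CREATE TABLE gLibrary (FILE VARCHAR, FileName VARCHAR)")
conn.execute("CREATE TABLE gLAudio (FILE VARCHAR, Author VARCHAR, Album VARCHAR)")

# Hypothetical sample entries.
conn.executemany(
    "INSERT INTO gLibrary VALUES (?, ?)",
    [("1", "song.mp3"), ("2", "slides.ppt")],
)
conn.executemany(
    "INSERT INTO gLAudio VALUES (?, ?, ?)",
    [("1", "Some Author", "Some Album")],
)

# Join the two collections on the entry name and keep only the mp3 files.
rows = conn.execute(
    """
    SELECT g.FileName, a.Author, a.Album
    FROM gLibrary AS g
    JOIN gLAudio AS a ON g.FILE = a.FILE
    WHERE g.FileName LIKE '%.mp3'
    """
).fetchall()
```

The join condition and the `LIKE` filter correspond to the two halves of the quoted condition in the `selectattr` command.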

Security
Unix-style permissions (rwx)
ACLs: per-collection or per-entry
Secure connections: SSL
Client authentication based on
  – Username/password
  – General X509 certificates
  – Grid-proxy certificates
Access control via the Virtual Organization Membership Service (VOMS)
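
How owner permissions and ACLs combine can be illustrated with a generic check. This Python sketch is not AMGA's actual evaluation logic, which the slide does not spell out; the function `is_allowed`, its parameters, and the group names are assumptions made for the example.

```python
def is_allowed(user, groups, owner, owner_perms, acl, wanted):
    """Return True if `wanted` ('r', 'w' or 'x') is granted to the user.

    owner_perms: permission string for the owner, e.g. "rwx"
    acl:         mapping of group name -> permission string, e.g. {"staff": "r"}
    """
    # The owner is checked against the owner permission bits first.
    if user == owner and wanted in owner_perms:
        return True
    # Otherwise any group the user belongs to may grant the right via the ACL.
    return any(wanted in acl.get(group, "") for group in groups)


# Hypothetical collection owned by "alice", readable by the "staff" group.
acl = {"staff": "r"}
owner_can_write = is_allowed("alice", [], "alice", "rwx", acl, "w")
staff_can_read = is_allowed("bob", ["staff"], "alice", "rwx", acl, "r")
staff_can_write = is_allowed("bob", ["staff"], "alice", "rwx", acl, "w")
```

With VOMS, the `groups` list would come from the attributes carried in the user's proxy rather than from a local table.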

AMGA Implementation
C++ multiprocess server
  – Runs on any Linux flavour
Backends
  – Oracle, MySQL, PostgreSQL, SQLite
Two frontends
  – TCP Streaming
      – High performance
      – Client API for C++, Java, Python, Perl, Ruby
  – SOAP
      – Interoperability
Also implemented as a standalone Python library
  – Data stored on the filesystem

AMGA metadata types
AMGA datatypes
  – Using the above datatypes you can be sure that your metadata can easily be moved to all supported back-ends
  – If you do not care about DB portability, you can in principle use as entry attribute type any datatype supported by the back-end, even the more esoteric ones (PostgreSQL network address or geometric types)

Metadata Replication
Motivation
  – Scalability: support hundreds/thousands of concurrent users
  – Geographical distribution: hide network latency
  – Reliability: no single point of failure
  – DB-independent replication: heterogeneous DB systems
  – Disconnected computing: off-line access (laptops)
Architecture
  – Asynchronous replication
  – Master-slave: writes only allowed on the master
  – Replication at the application level
      – Replicate metadata commands, not SQL → DB independence
  – Partial replication: supports replication of only sub-trees of the metadata hierarchy
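
The architecture above (an ordered log of metadata commands on the master, pulled asynchronously by slaves that may subscribe to only one sub-tree) can be modelled in a few lines. This Python sketch is a toy illustration, not AMGA's replication protocol; the classes `Master` and `Slave`, the collection paths, and the command strings are hypothetical.

```python
class Master:
    """Accepts writes and records them as an ordered log of metadata commands."""

    def __init__(self):
        self.log = []  # list of (collection path, command text)

    def execute(self, path, command):
        self.log.append((path, command))


class Slave:
    """Read-only replica that replays the master's log from its last position."""

    def __init__(self, master, subtree="/"):
        self.master = master
        self.subtree = subtree  # partial replication: only this sub-tree is replayed
        self.position = 0       # index of the next log record to read
        self.applied = []

    def _in_subtree(self, path):
        prefix = self.subtree.rstrip("/") + "/"
        return path == self.subtree or path.startswith(prefix)

    def sync(self):
        """Asynchronous catch-up: may run at any time after the master's writes."""
        pending = self.master.log[self.position:]
        for path, command in pending:
            if self._in_subtree(path):
                self.applied.append(command)
        self.position += len(pending)


master = Master()
master.execute("/movies", "add attribute Title")
master.execute("/music", "add attribute Artist")

full = Slave(master)                 # full replication
partial = Slave(master, "/movies")   # partial replication of one sub-tree
full.sync()
partial.sync()

master.execute("/movies/trailers", "add entry t1")
partial.sync()
```

Because the log carries metadata commands rather than SQL statements, each slave is free to replay them against a different database back-end.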

Metadata Replication: some use cases
[Figure panels: Full replication, Partial replication, Federation, Proxy]

Early adopters of AMGA
LHCb bookkeeping
  – Migrated bookkeeping metadata to the ARDA prototype
      – 20M entries, 15 GB
      – Large amount of static metadata
  – Feedback valuable in improving the interface and fixing bugs
  – AMGA showing good scalability
Ganga
  – Job management system
      – Developed jointly by ATLAS and LHCb
  – Uses AMGA for storing information about job status
      – Small amount of highly dynamic metadata

Accessing AMGA from UI/WNs
TCP Streaming front-end
  – mdcli & mdclient and C++ API (md_cli.h, MD_Client.h)
  – Java client API and command-line tools mdjavaclient.sh & mdjavacli.sh (also under Windows)
  – Python and Perl client APIs
  – PHP client API (NEW)
      – Developed entirely by the GILDA team, INFN CT
HTTP over TCP
  – AMGA Web Interface (NEW)
      – Developed entirely by the GILDA team, INFN CT
      – 100% pure Java
      – Based on the standard AMGA Java APIs
SOAP front-end (WSDL)
  – C++ gSOAP
  – AXIS (Java)
  – ZSI (Python)

AMGA WI: High-Level Requirements
Access control
  – login as an AMGA user by providing a valid VOMS proxy file
  – permissions for collection management (change mode, change owner)
  – ACLs for collection management (list group, add group, drop group)
Collection management
  – collection tree browsing
  – collection creation
  – collection deletion
  – collection ACL management
Metadata management
  – entry listing / searching
  – entry creation (insert entry name with attribute values)
  – entry modification (modify attribute values)
  – entry deletion
  – schema management
      – attribute listing
      – attribute creation
      – attribute deletion
Note: all web functions are performed according to the business rules of the AMGA server

AMGA WI: Collection Management

AMGA WI: Metadata Schema Management

AMGA WI: Access Control Management

AMGA WI: Beta DEMO
Application URL:
WIKI pages:

Use case: Biomed
Medical Data Manager (MDM)
  – Store and access medical images and associated metadata on the Grid
  – Built on top of the gLite 1.5 data management system
  – Demonstrated at the last EGEE conference (October 05, Pisa)
Strong security requirements
  – Patient data is sensitive
  – Data must be encrypted
  – Metadata access must be restricted to authorized users
AMGA used as metadata server
  – Demonstrates authentication and encrypted access
  – Used as a simplified DB
More details at
  –

Use case: gLibrary
Huge amounts of data can be saved on SEs (did we forget about the existence of Data Grids?)
But how can we easily find a file later when we need it?
  – (if you have a good memory, its GUID could be a solution)
  – File Catalogues only let us arrange files in folders and subfolders; there is no way to query on their contents
  – Metadata Catalogues are a possible solution, but not always "affordable", especially for non-expert users (powerful but complex to use)
Our solution: a higher-level application built on top of several gLite grid services: a Metadata Catalogue + File Catalogues + Storage Elements
  – Requirements: easy to use, fast, secure, extensible

gLibrary
Application built on top of gLite grid services:
  – Metadata Catalogue + File Catalogue + Storage Elements
  – easy to use, fast, secure, extensible
Attempt to create a Digital Asset Management System for the Grid
  – Examples of digital assets handled by gLibrary:
      – Images
      – Videos
      – Audio files
      – Office documents (PowerPoint, Word, Excel, OpenOffice)
      – […]s, PDFs, HTMLs
      – Customized versions of the previous well-known document types (e.g. EGEE PPTs)
      – …
Keep track of, and organize in a uniform way, all the additional details (metadata) of files saved in Storage Elements and registered in File Catalogues
Provide users with an easy way to locate and retrieve files based on their contents

Some usage scenarios
Example 1:
  – Locate all theoretical (PPTType) PowerPoint (Type) presentations about "gLite DMS" (Keywords) given in 2005 (Date) by Uncle Sam (Speaker);
  – Find all the movies (Type) in which Julia Roberts (Cast) performed together with Hugh Grant (Cast), produced in the USA (Country) in 2004 (ReleaseDate); or all the acoustic (Genre) mp3 (Format) audio files (Type) by Alanis Morissette (Singer) that last more than 3 minutes (Runtime).
Example 2:
  – A doctor is looking for brain (Keyword) DICOM (Type) images of male (Gender) patients older than 65 (Age).
Example 3:
  – A job can behave as a storage crawler: it scans pre-existing files in Storage Elements to extract relevant metadata, which is then published on gLibrary for further data mining.

Some gLibrary features
Hierarchical types. Ex:
  – Audio
      – Music
      – Ringtones
      – SoundEffects
  – Video
      – Movies
      – Trailers
      – Clips
Intuitive browsing (iTunes style) with 3 customizable filter fields. Ex:
  – for Music you can browse by Genre, Artist, Album, Year, Rating, Format
  – for Movie you can browse by Genre, ReleaseDate, Studio, Country, Director
Grouping of assets by Categories
  – to put together assets of different types that belong to the same category (for example, all the files needed in a given project: images, ppt, pdf, sounds)
  – to further narrow down the assets of a given type (e.g. music playlists, favourite movies)
Your digital libraries are accessible from everywhere through a Web 2.0 frontend (AJAX based + Java Applets + PHP 5)

gLibrary Security
User requirements:
  – a valid proxy with VOMS extensions
  – a VOMS Role and Group are needed to be recognized by gLibrary as a contents manager
3 kinds of users:
  – gLibraryManager: (s)he can create new content types and allow a generic VO user to become a gLibrarySubmitter
  – gLibrarySubmitters: they can add new entries and define access rights on the entries they create
      – Fine-grained permission settings (reading, writing, listing, decrypting) on each entry: all VO members, VO groups, list of DNs
  – generic VO users: they can browse and make queries (on the entries they have access to)

gLibrary Beta DEMO

gMOD: grid Movie On Demand
gMOD provides a Video-On-Demand service
The user chooses from a list of videos, and the chosen one is streamed in real time to the video client on the user's workstation
For each movie many details (Title, Runtime, Country, Release Date, Genre, Director, Cast, Plot Outline) are stored, and users can search for a particular movie by querying on one or more attributes
Two kinds of users can interact with gMOD:
  – TrailersManagers, who can administer the db of movies (uploading new ones and attaching metadata to them);
  – GILDA VO users (guests), who can browse, search and choose a movie to be streamed.

gMOD under the hood
Built on top of gLite services:
  – Storage Elements, sited in different places, physically contain the movie files
  – LFC, the File Catalogue, keeps track of the Storage Element in which a particular movie is located
  – AMGA is the repository of the detailed information for each movie, and makes it possible to query that information
  – The Virtual Organization Membership Service (VOMS) is used to assign the right role to the different users
  – The Workload Management System (WMS) is responsible for retrieving the chosen movie from the right Storage Element and streaming it over the network down to the user's desktop or laptop

gMOD interactions
[Diagram: User, GENIUS Portal, VOMS (get Role), Workload Management System, CE, WNs, Storage Elements, LFC File Catalogue, AMGA Metadata Catalogue]

gMOD screenshot
gMOD is accessible through the GENIUS Portal

Conclusion
AMGA: Metadata Service of gLite
  – Part of gLite (but not yet certified in gLite 3.0; certification will come with the 3.1 release)
  – Useful for simplified DB access
  – Integrated into the Grid environment (security)
Replication/Federation features
Tests show good performance/scalability
Already deployed by several Grid applications
  – LHCb, ATLAS, Biomed, …
  – AMGA WI, ADAT, gMOD, gLibrary
  – Grid Storage Access Framework

References
AMGA Project Homepage
  –
AMGA User Manual
  –
Exercise documentation from ISSGC'06:
  –
AMGA GILDA Wiki pages:
  –