The AMGA metadata catalog

The AMGA metadata catalog
Riccardo Bruno - INFN
Madrid, 07-11/05/2007

Contents
- Background and Motivation for AMGA
- Interface, Architecture and Implementation
- Metadata Replication on AMGA
- Use cases

Metadata on the GRID
- Metadata is data about data
- On the Grid: information about files
  - Describe files
  - Locate files based on their contents
- But it also makes DB access a simple task on the Grid
  - Many Grid applications need structured data
  - Many applications require only simple schemas
    - Can be modelled as metadata
  - Main advantage: better integration with the Grid environment
    - Metadata Service is a Grid component
    - Grid security
    - Hide DB heterogeneity

ARDA/gLite Metadata Interface
- 2004: ARDA evaluated existing Metadata Services from HEP experiments
  - AMI (ATLAS), RefDB (CMS), Alien Metadata Catalogue (ALICE)
  - Similar goals, similar concepts
  - Each designed for a particular application domain
    - Reuse outside intended domain difficult
  - Several technical limitations: large answers, scalability, speed, lack of flexibility
- ARDA proposed an interface for Metadata access on the GRID
  - Based on requirements of LHC experiments
  - But generic: not bound to a particular application domain
  - Designed jointly with the gLite/EGEE team
  - Incorporates feedback from GridPP
- Adopted as the official EGEE Metadata Interface
  - Endorsed by PTF (Project Technical Forum of EGEE)

AMGA Implementation
- ARDA set up a Project Task Force to develop AMGA (ARDA Metadata Grid Application)
  - Began as a prototype to evaluate the Metadata Interface
  - Evaluated by the community since the beginning: LHCb and Ganga were early testers (more on this later)
  - Matured quickly thanks to user feedback
- Now part of the gLite middleware
  - Official Metadata Service for EGEE
  - First release with gLite 1.5
  - Also available as a standalone component
- Expanding to other user communities: HEP, Biomed, UNOSAT…

Metadata Concepts
- Metadata: list of attributes associated with entries
- Attribute: key/value pair with type information
  - Type: the type (int, float, string,…)
  - Name/Key: the name of the attribute
  - Value: value of an entry's attribute
- Schema: a set of attributes
- Collection: a set of entries associated with a schema
- Think of schemas as tables, attributes as columns, entries as rows
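The table analogy above can be made concrete with a small in-memory model. This is only a sketch of the concepts, not the AMGA client API: the `Collection` class, its method names and the `PY_TYPES` mapping are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from attribute type names to Python types.
PY_TYPES = {"int": int, "float": float, "string": str}


@dataclass
class Collection:
    """A set of entries that share one schema (think: a table)."""
    schema: dict = field(default_factory=dict)   # attribute name -> type name (columns)
    entries: dict = field(default_factory=dict)  # entry name -> attribute values (rows)

    def add_attr(self, name, type_name):
        """Extend the schema with a typed attribute."""
        if type_name not in PY_TYPES:
            raise ValueError(f"unknown type: {type_name}")
        self.schema[name] = type_name

    def add_entry(self, entry, **values):
        """Store an entry, converting each value to its attribute's type."""
        row = {}
        for key, value in values.items():
            if key not in self.schema:
                raise KeyError(f"attribute not in schema: {key}")
            row[key] = PY_TYPES[self.schema[key]](value)
        self.entries[entry] = row

    def get_attr(self, entry, *names):
        """Return the requested attribute values of one entry."""
        row = self.entries[entry]
        return [row.get(name) for name in names]


movies = Collection()
movies.add_attr("Title", "string")
movies.add_attr("Runtime", "int")
movies.add_entry("movie1", Title="Metropolis", Runtime="153")
```

Here `movies.get_attr("movie1", "Runtime", "Title")` returns the typed values of the row, which mirrors the "entries as rows, attributes as columns" picture.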

AMGA Features
- Dynamic Schemas
  - Schemas can be modified at runtime by the client
    - Create, delete schemas
    - Add, remove attributes
- Metadata organised as a hierarchy
  - Collections can contain sub-collections
  - Analogy to a file system: Collection ↔ Directory; Entry ↔ File
- Flexible Queries
  - SQL-like query language
  - Joins between schemas
- Query example:
    selectattr /gLibrary:FileName \
               /gLibrary:Author \
               '/gLibrary:FILE=/gLAudio:FILE and like(/gLibrary:FileName, "%.mp3")'
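To show what the query example expresses, the following sketch evaluates the same condition over two small in-memory collections. It does not use AMGA: the data, the `like` helper and `select_mp3` are hypothetical, and the join is simplified to a membership test, which is equivalent only when `FILE` values are unique in the second collection.

```python
import re


def like(value, pattern):
    """SQL-style LIKE: '%' matches any run of characters, '_' exactly one."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


# Hypothetical contents of /gLibrary and /gLAudio, keyed by entry name.
glibrary = {
    "e1": {"FILE": "f1", "FileName": "song.mp3", "Author": "Ann"},
    "e2": {"FILE": "f2", "FileName": "talk.pdf", "Author": "Bob"},
    "e3": {"FILE": "f3", "FileName": "tune.mp3", "Author": "Cy"},
}
glaudio = {
    "a1": {"FILE": "f1"},
    "a2": {"FILE": "f2"},
}


def select_mp3(library, audio):
    """Return (FileName, Author) where FILE matches an audio entry and the name ends in .mp3."""
    audio_files = {row["FILE"] for row in audio.values()}
    return [
        (row["FileName"], row["Author"])
        for row in library.values()
        if row["FILE"] in audio_files and like(row["FileName"], "%.mp3")
    ]


result = select_mp3(glibrary, glaudio)
```

Only `e1` satisfies both conditions: `e2` has a matching audio entry but is not an MP3, and `e3` is an MP3 without a matching audio entry.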

AMGA Security
- Unix style permissions
- ACLs: per-collection or per-entry
- Secure connections: SSL
- Client authentication based on
  - Username/password
  - General X509 certificates
  - Grid-proxy certificates
- Access control via the Virtual Organization Membership Service (VOMS)
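A rough sketch of how owner permissions and an ACL might combine in an access check is given below. The slide does not specify AMGA's evaluation rules, so the `Node` class, the permission letters and the rule "owner permissions first, then group ACL entries" are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A collection or entry carrying access-control data (hypothetical model)."""
    owner: str
    owner_perms: str = "rwx"                 # Unix-style permission letters for the owner
    acl: dict = field(default_factory=dict)  # group name -> permission letters, e.g. {"gilda": "rx"}


def allowed(node, user, groups, perm):
    """Assumed rule: the owner uses owner_perms, everyone else needs a matching ACL entry."""
    if user == node.owner:
        return perm in node.owner_perms
    return any(perm in node.acl.get(group, "") for group in groups)


# Hypothetical collection: the manager may write, members of the "gilda" group may only read.
movies = Node(owner="manager", acl={"gilda": "rx"})
can_guest_read = allowed(movies, "guest", ["gilda"], "r")
can_guest_write = allowed(movies, "guest", ["gilda"], "w")
```

In a VOMS-based setup the `groups` list would presumably come from the attributes in the user's proxy certificate; that step is not modelled here.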

AMGA Implementation
- C++ multiprocess server
  - Runs on any Linux flavour
- Backends
  - Oracle, MySQL, PostgreSQL, SQLite
- Two frontends
  - TCP Streaming
    - High performance
    - Client API for: C++, Java, Python, Perl, Ruby
  - SOAP
    - Interoperability
- Also implemented as a standalone Python library
  - Data stored on filesystem

Architecture
- TCP-Streaming frontend
  - Designed for scalability
  - Asynchronous operation
    - Reading from DB and sending data to client
  - Response sent to client in chunks
    - No limit on the maximum response size
- Example: TCP Streaming
  - Text based protocol (like SMTP, POP3,…)
  - Response streamed to client
    Client: listattr entry
    Server: 0 entry value1 value2 … <EOT>
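The chunked, streamed response can be consumed incrementally on the client side, which is why no response-size limit is needed. The sketch below illustrates the idea under several assumptions that the slide does not confirm: fields are newline-separated, the leading `0` is a success status, and `<EOT>` is represented by the ASCII byte 0x04. The chunks are supplied as a plain list instead of a real socket.

```python
EOT = b"\x04"  # assumption: the ASCII end-of-transmission byte stands in for <EOT>


def stream_lines(chunks):
    """Yield each decoded line as soon as it is complete; stop at EOT.

    Only the current partial line is buffered, so memory use does not
    grow with the size of the whole response.
    """
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        done = EOT in buffer
        if done:
            buffer = buffer.split(EOT, 1)[0]  # discard the terminator and anything after it
        *lines, buffer = buffer.split(b"\n")  # the last piece may be an incomplete line
        for line in lines:
            yield line.decode()
        if done:
            if buffer:
                yield buffer.decode()
            return


def read_response(chunks):
    """Check the assumed status line, then collect the remaining lines."""
    lines = stream_lines(chunks)
    status = int(next(lines))
    if status != 0:
        raise RuntimeError(f"server returned error code {status}")
    return list(lines)


# Simulated network chunks that split the response at arbitrary byte positions.
reply = read_response([b"0\nent", b"ry\nvalue1\nval", b"ue2\n\x04"])
```

With a real connection, `chunks` could be a generator that repeatedly calls `recv` on the socket; the parsing logic would stay the same.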

Metadata Replication 1/2
- Motivation
  - Scalability: support hundreds/thousands of concurrent users
  - Geographical distribution: hide network latency
  - Reliability: no single point of failure
  - DB independent replication: heterogeneous DB systems
  - Disconnected computing: off-line access (laptops)
- Architecture
  - Asynchronous replication
  - Master-slave: writes only allowed on the master
  - Replication at the application level
    - Replicate Metadata commands, not SQL → DB independence
  - Partial replication: supports replication of only sub-trees of the metadata hierarchy
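The architecture above (a master that logs metadata commands, slaves that replay them later, optionally for one sub-tree only) can be sketched as follows. This is a simplified model, not AMGA's replication code: the `Master` and `Slave` classes are hypothetical, and the command strings are opaque examples that are stored but never executed.

```python
class Master:
    """Accepts writes and records them as an ordered log of metadata commands."""

    def __init__(self):
        self.log = []  # list of (path, command) tuples in execution order

    def execute(self, path, command):
        # A real server would also apply the command to its own backend here.
        self.log.append((path, command))


class Slave:
    """Replays the master's log asynchronously, optionally for one sub-tree only."""

    def __init__(self, subtree="/"):
        self.subtree = subtree.rstrip("/") + "/"
        self.position = 0   # index of the next log record to fetch
        self.applied = []   # commands replayed so far, in order

    def pull(self, master):
        """Fetch the log records written since the last pull and replay the relevant ones."""
        for path, command in master.log[self.position:]:
            if (path.rstrip("/") + "/").startswith(self.subtree):
                self.applied.append(command)
        self.position = len(master.log)


master = Master()
full = Slave()                # full replication
partial = Slave("/gLibrary")  # partial replication of one sub-tree

master.execute("/gLibrary", "createdir /gLibrary")
master.execute("/jobs", "createdir /jobs")
full.pull(master)
partial.pull(master)

master.execute("/gLibrary/audio", "createdir /gLibrary/audio")
full.pull(master)
partial.pull(master)
```

Because the log holds metadata commands instead of SQL statements, each slave is free to apply them to whichever database backend it runs on, which is the DB independence the slide refers to.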

Metadata Replication 2/2
- Full replication
- Partial replication
- Federation
- Proxy

Early adopters of AMGA
- LHCb-bookkeeping (keeps additional information from executed jobs)
  - Migrated bookkeeping metadata to the ARDA prototype
    - 20M entries, 15 GB
    - Large amount of static metadata
  - Feedback valuable in improving the interface and fixing bugs
  - AMGA showing good scalability
- Ganga
  - Job management system
    - Developed jointly by ATLAS and LHCb
  - Uses AMGA for storing information about job status
    - Small amount of highly dynamic metadata

Accessing AMGA
- TCP Streaming Front-end
  - mdcli & mdclient and C++ API (md_cli.h, MD_Client.h)
  - Java Client API and command line mdjavaclient.sh & mdjavacli.sh (also under Windows)
  - Python Client API
- SOAP Frontend (WSDL)
  - C++ gSOAP
  - AXIS (Java)
  - ZSI (Python)

Conclusion
- AMGA: Metadata Service of gLite
  - Part of gLite (not yet certified in gLite 3.0; certification is planned for the 3.1 release)
  - Useful for simplified DB access
  - Integrated with the Grid environment (Security)
- Replication/Federation features
- Tests show good performance/scalability
- Already deployed by several Grid Applications
  - LHCb, ATLAS, Biomed, …
- AMGA Web Site: http://project-arda-dev.web.cern.ch/project-arda-dev/metadata/

AMGA usage examples
- Biomed: Medical Data Manager
  - Deployed on EGEE production grid
- gMOD
  - Deployed on GILDA

I’ll start by giving a brief overview of what metadata means to the GRID. Then, I’ll present the Metadata Interface developed by ARDA and gLite, which addresses the most common use cases for GRID metadata. I’ll continue by describing the prototype implemented by ARDA to validate this interface. I’ll finish by presenting the results of a performance study made using this prototype, where SOAP is compared with a traditional RPC protocol based on streaming.

Biomed: Medical Data Manager
- Store and access medical images exploiting metadata on the Grid
- Built on top of the gLite 1.5 data management system
- Demonstrated at the last EGEE conference (October 05, Pisa)
- Strong security requirements
  - Patient data is sensitive
  - Data must be encrypted
  - Metadata access must be restricted to authorized users
- AMGA used as metadata server
  - Demonstrates authentication and encrypted access
  - Used as a simplified DB
- More details at: http://www.i3s.unice.fr/~johan/mdm/mdm-051013.pdf

gMOD: grid Movie On Demand
- gMOD provides a Video-On-Demand service
- The user chooses from a list of videos, and the chosen one is streamed in real time to the video client on the user’s workstation
- For each movie, many details (Title, Runtime, Country, Release Date, Genre, Director, Cast, Plot Outline) are stored, and users can search for a particular movie by querying on one or more attributes
- Two kinds of users can interact with gMOD:
  - TrailersManagers, who can administer the DB of movies (uploading new ones and attaching metadata to them)
  - GILDA VO users (guests), who can browse, search and choose a movie to be streamed

gMOD under the hood
- Built on top of gLite services + GENIUS web portal:
  - Storage Elements, sited in different places, physically contain the movie files
  - LFC, the File Catalogue, keeps track of which Storage Element holds a particular movie
  - AMGA is the repository of the detailed information for each movie, and makes it possible to query that information
  - The Virtual Organization Membership Service (VOMS) is used to assign the right role to the different users
  - The Workload Management System (WMS) is responsible for retrieving the chosen movie from the right Storage Element and streaming it over the network down to the user’s desktop or laptop

gMOD interactions
[Diagram. Labels: User, Genius Portal, VOMS, get Role, AMGA, Metadata, LFC Catalogue, Workload Management System, CE, WN, Storage Elements]

gMOD screenshot
- gMOD is accessible through the Genius Portal (https://glite-tutor.ct.infn.it)
- Select from the left side menu: VO Services/gMOD