Asterios Katsifodimos Saturday, May 23, 2015 High Performance Computing systems Lab University of Cyprus The AMGA metadata catalog – An Overview Slides.

Presentation transcript:

Asterios Katsifodimos Saturday, May 23, 2015 High Performance Computing systems Lab University of Cyprus The AMGA metadata catalog – An Overview Slides based on: “AMGA metadata catalog with use cases” by Tony Calanducci

Outline
- Background and motivation for AMGA
- Interface, architecture and implementation
- Metadata replication/federation on AMGA
- Use cases

ARDA Metadata Grid Application (history)
- ARDA proposed an interface for metadata access on the Grid
- Based on the requirements of the LHC experiments
- Designed jointly with the gLite/EGEE team
- Adopted as the official EGEE metadata interface
- Endorsed by the PTF (Project Technical Forum of EGEE)
- Released on December 07 in gLite 3.1 (update 10)
- The entire release process was carried out by HPCL, University of Cyprus: testing, test scripts, automatic configuration scripts, preparation for the gLite environment
- Initial release: glite-AMGA_postgres
- Upcoming release (April 08): glite-AMGA_oracle, now in preproduction services
- Releases have been officially supported by EGEE since the first release

Metadata on the GRID
- Metadata is data about data
- e.g. on a Data Grid: information about files
  - Describe files
  - Locate files based on their contents
- AMGA makes DB access a simple task on the Grid
- Many Grid applications need structured data
- Many applications require only simple schemas
  - Can be modelled as metadata
- Main advantage: better integration with the Grid environment
  - Metadata Service is a Grid component
  - Grid security
  - Hide DB heterogeneity

Metadata user requirements
I want to
- store some information about files, in a structured way
- query a system about that information
- keep information about jobs; I want my jobs to have read/write access to that information
- have easy access to structured data using my proxy certificate
- NOT use a database

AMGA Features
- Dynamic schemas
  - Schemas can be modified at runtime by the client
  - Create, delete schemas
  - Add, remove attributes
- Metadata organised as a hierarchy
  - Collections can contain sub-collections
  - Analogy to a file system: collection ↔ directory; entry ↔ file; attribute ↔ inode information
- Flexible queries
  - SQL-like query language
  - Joins between schemas
- Query example:
    selectattr /gLibrary:FileName /gLibrary:Author '/gLibrary:FILE=/gLAudio:FILE and like(/gLibrary:FileName, "%.mp3")'

Metadata Concepts
Some concepts in AMGA:
- Metadata: list of attributes associated with entries
- Attribute: key/value pair with type information
  - Type: the type (int, float, string, ...)
  - Name/Key: the name of the attribute
  - Value: value of an entry's attribute
- Schema: a set of attributes
- Entry: lives in a schema; assigns values to attributes
- Collection: a set of entries associated with a schema
- Think of schemas as tables, attributes as columns, entries as rows
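The concepts above can be sketched as a small in-memory model. This is only a toy illustration, not the AMGA client API: the class name `Collection` and its methods are hypothetical, and the sample values are borrowed from the PATIENTS example used elsewhere in these slides.

```python
# Toy in-memory model of schema / entry / collection; NOT the AMGA client API.
from dataclasses import dataclass, field


@dataclass
class Collection:
    """A directory-like collection: one schema, its entries, and sub-collections."""
    schema: dict = field(default_factory=dict)    # attribute name -> Python type
    entries: dict = field(default_factory=dict)   # entry name -> {attribute: value}
    children: dict = field(default_factory=dict)  # sub-collection name -> Collection

    def add_attr(self, name, type_):
        # Dynamic schema: attributes can be added at runtime.
        self.schema[name] = type_

    def remove_attr(self, name):
        # Removing an attribute also drops its values from every entry.
        del self.schema[name]
        for values in self.entries.values():
            values.pop(name, None)

    def add_entry(self, name, **values):
        # An entry assigns typed values to attributes of the schema.
        for attr, value in values.items():
            if attr not in self.schema:
                raise KeyError(f"unknown attribute: {attr}")
            if not isinstance(value, self.schema[attr]):
                raise TypeError(f"{attr} expects {self.schema[attr].__name__}")
        self.entries[name] = dict(values)

    def select(self, attrs, predicate):
        # Return the requested attributes of every entry matching the predicate.
        return [tuple(e.get(a) for a in attrs)
                for e in self.entries.values() if predicate(e)]


hospital = Collection()
patients = hospital.children["PATIENTS"] = Collection()
patients.add_attr("sickness", str)
patients.add_attr("age", int)
patients.add_entry("john", sickness="malaria", age=68)
patients.add_entry("george", sickness="otitis", age=84)

older_than_70 = patients.select(["sickness"], lambda e: e["age"] > 70)
```

Read as a table, `patients.schema` holds the columns and `patients.entries` holds the rows, while `hospital.children` plays the role of the directory tree.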

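AMGA data organization
[Figure: a relational schema mapped onto the AMGA hierarchy]
- Relational schema, TABLE: PATIENTS (columns are attributes, rows are entries)
    #name    sickness  age
    john     malaria   68
    george   otitis    84
- AMGA hierarchy: the collection /HOSPITAL/ contains the schemas/directories PATIENTS/ and DOCTORS/; PATIENTS/ holds the entries john and george; the entry george carries the attributes sickness = otitis, age = 84
- TABLE: HOSPITAL lists PATIENTS and DOCTORS under #name, with #type people_group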

AMGA Implementation
- C++ multiprocess server
  - Runs on any Linux flavour
- Backends
  - Oracle, MySQL, PostgreSQL, SQLite
- Two frontends
  - TCP streaming: high performance; client API for C++, Java, Python, Perl, Ruby
  - SOAP: interoperability
- Also implemented as a standalone Python library
  - Data stored on the filesystem

AMGA Security
- Unix-style permissions
  - user-group-others (e.g. rwxr--r--)
- ACLs: per-collection or per-entry
- Secure connections: SSL
- Client authentication based on
  - Username/password
  - General X509 certificates
  - Grid-proxy certificates
- Access control via the Virtual Organization Membership Service (VOMS)
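As an illustration of the user-group-others permission string, the sketch below evaluates a mode such as rwxr--r-- with ordinary Unix semantics. How AMGA itself combines these bits with ACLs and VOMS groups is not described in the slides, so the function `allowed` and its parameters are purely hypothetical.

```python
# Plain Unix-style permission check; an illustration, not AMGA's implementation.
def allowed(mode, owner, group, user, user_groups, op):
    """Return True if `user` may perform `op` ('r', 'w' or 'x').

    `mode` is a 9-character string such as 'rwxr--r--' (user, group, others).
    """
    if len(mode) != 9:
        raise ValueError("mode must have 9 characters, e.g. 'rwxr--r--'")
    if op not in ("r", "w", "x"):
        raise ValueError("op must be 'r', 'w' or 'x'")
    if user == owner:
        triplet = mode[0:3]   # owner bits
    elif group in user_groups:
        triplet = mode[3:6]   # group bits
    else:
        triplet = mode[6:9]   # everyone else
    return op in triplet


owner_can_write = allowed("rwxr--r--", "alice", "biomed", "alice", {"biomed"}, "w")
member_can_write = allowed("rwxr--r--", "alice", "biomed", "bob", {"biomed"}, "w")
member_can_read = allowed("rwxr--r--", "alice", "biomed", "bob", {"biomed"}, "r")
```

With the mode rwxr--r--, only the owner can modify the collection or entry, while group members and everyone else can read it.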

Accessing AMGA
- TCP streaming front-end
  - mdcli & mdclient and C++ API (md_cli.h, MD_Client.h)
  - Java client API* and command line* (mdjavaclient.sh & mdjavacli.sh)
  - Python* & PHP* client API
- SOAP front-end (WSDL)
  - C++ gSOAP
  - AXIS (Java)*
  - ZSI (Python)*
* also under Windows

Python API example

AMGA Internals – Backend translation
To better understand how AMGA works, an example:
- $ mkdir /hpcl
    INSERT INTO schema(id, name) VALUES('/hpcl', 'dir2');
    CREATE TABLE dir2;
- $ addattr /hpcl id int
    ALTER TABLE dir2 ADD COLUMN "user:id" integer;
Mapping between AMGA and the DB backend:
    AMGA         DB backend
    Collections  Tables
    Entries      Rows
    Attributes   Columns
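The translation shown above can be imitated with Python's standard sqlite3 module. This is a minimal sketch under stated assumptions, not AMGA's actual backend code: the class `ToyBackend`, the `dirN` naming counter and the placeholder `file` column are hypothetical, and only the "user:" column prefix and the catalog table name `schema` are taken from the slide.

```python
# Toy translation of metadata commands into SQL; NOT AMGA's real backend code.
import sqlite3


class ToyBackend:
    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        # Catalog mapping each collection path to the table that stores it.
        self.db.execute('CREATE TABLE "schema" (id TEXT PRIMARY KEY, name TEXT)')
        self._next = 1

    def _table_for(self, path):
        row = self.db.execute(
            'SELECT name FROM "schema" WHERE id = ?', (path,)).fetchone()
        if row is None:
            raise KeyError(f"no such collection: {path}")
        return row[0]

    def mkdir(self, path):
        # mkdir: register the collection, then create its backing table.
        table = f"dir{self._next}"
        self._next += 1
        self.db.execute(
            'INSERT INTO "schema" (id, name) VALUES (?, ?)', (path, table))
        # SQLite needs at least one column, so entries are keyed by a
        # placeholder "file" column (an assumption of this sketch).
        self.db.execute(f"CREATE TABLE {table} (file TEXT PRIMARY KEY)")
        return table

    def addattr(self, path, attr, sqltype):
        # addattr: an attribute becomes a column prefixed with "user:".
        if not attr.isidentifier() or sqltype not in ("integer", "real", "text"):
            raise ValueError("unsupported attribute name or type")
        table = self._table_for(path)
        self.db.execute(f'ALTER TABLE {table} ADD COLUMN "user:{attr}" {sqltype}')

    def columns(self, path):
        table = self._table_for(path)
        return [row[1] for row in self.db.execute(f"PRAGMA table_info({table})")]


backend = ToyBackend()
table_name = backend.mkdir("/hpcl")
backend.addattr("/hpcl", "id", "integer")
hpcl_columns = backend.columns("/hpcl")
```

Because clients only ever issue `mkdir` and `addattr`, the SQL dialect stays hidden behind the server, which is what lets the same commands work on different database backends.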

AMGA Internals – TCP-Streaming
- Designed for scalability
  - Asynchronous operation: reading from the DB and sending data to the client
  - Response sent to the client in chunks: no limit on the maximum response size
- Text-based protocol (like SMTP, POP3, ...)
  - Response streamed to the client
- Example: TCP streaming
    Client: listattr entry
    Server: 0 entry value1 value2 …
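To illustrate why a chunked text response needs no size limit, the sketch below consumes a response incrementally instead of buffering all of it. The wire details are assumptions made for this example only: one item per newline-terminated line, and a first line holding a numeric status where 0 means success. The helper names `iter_lines` and `read_response` are hypothetical.

```python
# Incremental reader for a line-oriented text response (assumed framing).
def iter_lines(chunks):
    """Yield complete text lines from an iterable of byte chunks."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Emit every complete line; keep the unfinished tail for the next chunk.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            yield line.decode("utf-8")
    if buffer:
        yield buffer.decode("utf-8")


def read_response(chunks):
    """Check the status line, then stream the remaining lines one by one."""
    lines = iter_lines(chunks)
    status = int(next(lines))
    if status != 0:
        raise RuntimeError(f"server returned error code {status}")
    yield from lines


# Chunk boundaries deliberately fall in the middle of lines.
chunks = [b"0\nent", b"ry\nvalue1\nval", b"ue2\n"]
received = list(read_response(chunks))
```

In a real client the chunks would come from repeated socket reads, and each line could be handed to the application as soon as it is complete.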

Metadata Replication 1/2
- Motivation
  - Scalability: support hundreds/thousands of concurrent users
  - Geographical distribution: hide network latency
  - Reliability: no single point of failure
  - DB-independent replication: heterogeneous DB systems
  - Disconnected computing: off-line access (laptops)
- Architecture
  - Asynchronous replication
  - Master-slave: writes only allowed on the master
  - Replication at the application level: replicate metadata commands, not SQL → DB independence
  - Partial replication: supports replication of only sub-trees of the metadata hierarchy
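The architecture above (asynchronous, master-slave, command-level, optionally partial) can be sketched with a command log that slaves replay on demand. Everything here is hypothetical: the classes `Node`, `Master` and `Slave`, the command tuples, and the pull-based transport are simplifications chosen for illustration, not AMGA's replication protocol.

```python
# Toy command-level replication with optional sub-tree filtering.
class Node:
    def __init__(self):
        self.entries = {}  # entry path -> {attribute: value}

    def apply(self, command):
        op, path, attrs = command
        if op == "addentry":
            self.entries[path] = dict(attrs)
        elif op == "setattr":
            self.entries[path].update(attrs)
        elif op == "rm":
            self.entries.pop(path, None)
        else:
            raise ValueError(f"unknown command: {op}")


class Master(Node):
    """Only the master accepts writes; it logs commands rather than SQL."""
    def __init__(self):
        super().__init__()
        self.log = []

    def execute(self, command):
        self.apply(command)
        self.log.append(command)


class Slave(Node):
    """Replays the master's log later, restricted to one sub-tree."""
    def __init__(self, subtree="/"):
        super().__init__()
        self.subtree = subtree.rstrip("/") + "/"
        self.position = 0  # index of the next log record to replay

    def pull(self, master):
        for command in master.log[self.position:]:
            if command[1].startswith(self.subtree):
                self.apply(command)
        self.position = len(master.log)


master = Master()
full, partial = Slave(), Slave("/gLibrary")
master.execute(("addentry", "/gLibrary/song.mp3", {"Author": "anon"}))
master.execute(("addentry", "/hpcl/run1", {"id": 1}))
master.execute(("setattr", "/gLibrary/song.mp3", {"Author": "someone"}))
lag = dict(partial.entries)  # still empty: replication is asynchronous
full.pull(master)
partial.pull(master)
```

Since the log carries metadata commands, each slave is free to store the result in whatever database it runs, which is the DB-independence point made on the slide.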

Metadata Replication 2/2
[Figure: four replication configurations: full replication, partial replication, federation, proxy]

Importing existing data
- Suppose that you already have the data
- A reasonable question would be: can I use my existing database data? The answer is YES
- Importing data to AMGA is pretty simple
  - Connect a database to AMGA
  - Execute the import command: import table directory
  - Ready to go!

Using AMGA along with an LFC
- LFC uses a database backend (commonly MySQL)
- AMGA integration on an LFC
  - Work on LFC's database
  - Logical File Names in LFC ↔ collections, entries in AMGA
- Very nice for managing files & directories
  - Every new file entry is also put into AMGA
- BUT
  - Currently a broken feature
  - The AMGA developers are working on it

Conclusion (use cases follow)
- AMGA: Metadata Service of gLite
  - Part of gLite 3.1
  - Officially supported by EGEE
- Useful for simplified DB access
- Integrated in the Grid environment
  - Security (VOMS proxies, Globus proxies)
- Replication/federation features
- Tests show good performance/scalability
- AMGA Web Site

A generic use case
1. Use Storage Elements for storing files
2. Use LFNs (Logical File Names) for having a file name (storing them on an LFC)
3. Use AMGA to store metadata about files
4. Query AMGA using complex queries about files
   - I want all files that have: type=image AND size > 6kb AND description LIKE "%breast%cancer%"
5. Use results to retrieve only specific files
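Steps 3 to 5 can be sketched with an in-memory SQLite table standing in for the metadata catalog. This is not an AMGA query: the table `files`, the sample rows, the LFN paths and the reading of "6kb" as 6 * 1024 bytes are all assumptions made for this illustration.

```python
# Stand-in for steps 3-5: store metadata, query it, keep only matching LFNs.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE files (lfn TEXT PRIMARY KEY, type TEXT, "
    "size INTEGER, description TEXT)")
# Step 3: metadata about files, keyed by (hypothetical) logical file name.
db.executemany("INSERT INTO files VALUES (?, ?, ?, ?)", [
    ("/grid/demo/scan1.dcm", "image", 8192, "left breast, suspected cancer"),
    ("/grid/demo/scan2.dcm", "image", 2048, "breast cancer follow-up"),
    ("/grid/demo/notes.txt", "text", 9000, "breast cancer report"),
    ("/grid/demo/scan3.dcm", "image", 10000, "knee injury"),
])


def matching_lfns(connection, min_size_bytes=6 * 1024):
    """Step 4: the condition from the slide, written as a SQL WHERE clause."""
    rows = connection.execute(
        "SELECT lfn FROM files "
        "WHERE type = 'image' AND size > ? "
        "AND description LIKE '%breast%cancer%' "
        "ORDER BY lfn",
        (min_size_bytes,))
    return [lfn for (lfn,) in rows]


# Step 5: only these logical file names would then be fetched from storage.
to_retrieve = matching_lfns(db)
```

Only one of the four files satisfies all three conditions, so only that file would have to be transferred from its Storage Element.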

AMGA usage examples
- Biomed: Medical Data Manager
  - Deployed on the EGEE production grid
- gMOD
  - Deployed on GILDA

Biomed: Medical Data Manager
Store and access medical images exploiting metadata on the Grid
- Strong security requirements
  - Patient data is sensitive
  - Data must be encrypted
  - Metadata access must be restricted to authorized users
- AMGA used as metadata server
  - Demonstrates authentication and encrypted access
  - Used as a simplified DB
  - NO ENCRYPTION on the DB backend. Anyone interested?
- More details at:

gMOD: grid Movie On Demand
- gMOD provides a Video-On-Demand service
- The user chooses from a list of videos, and the chosen one is streamed in real time to the video client on the user's workstation
- For each movie, many details (Title, Runtime, Country, Release Date, Genre, Director, Cast, Plot Outline) are stored, and users can search for a particular movie by querying on one or more attributes
- Two kinds of users can interact with gMOD: TrailersManagers, who can administer the db of movies (uploading new ones and attaching metadata to them), and GILDA VO users (guests), who can browse, search and choose a movie to be streamed

gMOD screenshot
gMOD is accessible through the Genius Portal by selecting VO Services/gMOD from the left-side menu

gMOD under the hood
Built on top of gLite services + the GENIUS web portal:
- Storage Elements, sited in different places, physically contain the movie files
- LFC, the File Catalogue, keeps track of which Storage Element a particular movie is located in
- AMGA is the repository of the detailed information for each movie, and makes queries on it possible
- The Virtual Organization Membership Service (VOMS) is used to assign the right role to the different users
- The Workload Management System (WMS) is responsible for retrieving the chosen movie from the right Storage Element and streaming it over the network down to the user's desktop or laptop

gMOD interactions
[Figure: interactions between the User, the Genius Portal, VOMS (get Role), the Workload Management System, the CE and its WNs, the Storage Elements, the LFC Catalogue, and AMGA as the Metadata Catalogue]

The End Questions - Discussion

Backup Slides

AMGA Web Interface

Metadata Schema Management

Entry Management

ACL Management

QBE like Query Engine

Query Result