Security and Replication of Metadata with AMGA


Security and Replication of Metadata with AMGA
B. Koblitz, IT-PSS, with N. Santos, EPFL Lausanne
EGEE User Forum, Manchester, May 10th, 2007

Overview
  What is AMGA? What is Metadata on the Grid?
  Security in AMGA
  EGEE applications using AMGA
  Replication of Metadata with AMGA
  Security infrastructure for replicated Metadata
  Use Case: Health-e-Child
  Conclusions

What is AMGA, what Metadata?
  AMGA is the metadata catalogue of gLite 1.5, back in 3.1.
  Metadata is relationally structured data for grid jobs (it normally lives in databases).
  AMGA works in 2 modes:
    Side by side with a file catalogue (LFC): file metadata
    Standalone: general relational data on the Grid
  So why can I not access DBs directly on the Grid? You can, but what about:
    Authentication (Grid proxy certificates, VOMS)?
    Logging, tracing?
    Connection pooling?
  AMGA brings the Grid idea to relational DBs.
  AMGA hides DB differences.
  AMGA allows replication and (some) federation of data.
  AMGA has fine-grained access control to entries based on ACLs.

Basic Concepts
  Schema (directory): has a hierarchical name and a list of attributes, e.g. /prod/events
  Attributes: have a name and a storage type; the interface handles all types as strings
  Entry: lives in a schema and assigns values to attributes
  Query: SELECT ... WHERE ... clause in an SQL-like query language
  Examples:
    createdir /jobs
    addattr /jobs jobStatus int
    addentry /jobs/job1 jobStatus 0
    updateattr /jobs jobStatus 1 jobID>100
    selectattr /DLibrary:FileName /DLAudio:Author /DLAudio:Album '/DLibrary:FILE=/DLAudio:FILE and like(/DLibrary:FileName, "%.mp3")'
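
As a rough illustration of how these concepts line up with relational ones, the sketch below mimics the first four example commands against an in-memory SQLite table using Python's standard sqlite3 module. It is not the AMGA client API: the table layout, the column name entry and the second row job2 are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# createdir /jobs + addattr /jobs jobStatus int:
# a schema (directory) is modelled as a table, an attribute as a column.
conn.execute("CREATE TABLE jobs (entry TEXT PRIMARY KEY, jobStatus INTEGER)")

# addentry /jobs/job1 jobStatus 0: an entry is modelled as a row.
conn.execute("INSERT INTO jobs (entry, jobStatus) VALUES (?, ?)", ("job1", 0))
conn.execute("INSERT INTO jobs (entry, jobStatus) VALUES (?, ?)", ("job2", 0))

# updateattr with a condition: an UPDATE ... WHERE on the matching entries.
conn.execute("UPDATE jobs SET jobStatus = 1 WHERE entry = ?", ("job1",))

# selectattr: a SELECT; values are converted to strings here to mirror
# the remark that the interface handles all types as strings.
rows = conn.execute("SELECT entry, jobStatus FROM jobs ORDER BY entry").fetchall()
result = [(entry, str(status)) for entry, status in rows]
print(result)
```

The point of the sketch is only the mapping directory/attribute/entry to table/column/row; how AMGA actually lays out its backend tables is not stated on the slide.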

What can AMGA do?
  AMGA features:
    SOAP and text frontends
    Streamed bulk operations
    Supports single calls, sessions & connections
    SSL security with grid certs
    Own user & group management + VOMS
    PostgreSQL, Oracle, MySQL, SQLite backends
    Can access existing DBs
  Query parser supports a good fraction of SQL:
    Supports complex queries: joins, math. functions (extensible)
    Abstracts DB data types
    Checks access permissions per directory/entry via ACLs

Security in AMGA
  AMGA has built-in user and group management
  SSL for secure connections (optional): session management for performance
  Grid proxy and VOMS supported (optional)
  Credentials are mapped to an internal user
  Allows per-entry or per-table (faster) ACLs
  Views allow per-attribute restrictions
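
The credential mapping and ACL check described above might be sketched as follows. This is a toy model, not AMGA's implementation: the certificate subject, the user and group names, the right names, and the rule that an entry-level ACL takes precedence over the table-level ACL are all assumptions.

```python
# Hypothetical mapping of certificate subjects (DNs) to internal users.
DN_TO_USER = {"/C=CH/O=Example/CN=Alice": "alice"}
# Hypothetical group membership of internal users.
USER_GROUPS = {"alice": {"physics"}}

# An ACL maps a user or group name to a set of rights.
TABLE_ACL = {"physics": {"read"}}                    # per-table ACL
ENTRY_ACLS = {"job1": {"alice": {"read", "write"}}}  # per-entry ACLs

def is_allowed(dn, entry, right):
    """Return True if the credential `dn` holds `right` on `entry`."""
    user = DN_TO_USER.get(dn)
    if user is None:
        return False  # unknown credential: no internal user, no access
    principals = {user} | USER_GROUPS.get(user, set())
    # Assumed rule: use the entry's own ACL if it has one, else the table ACL.
    acl = ENTRY_ACLS.get(entry, TABLE_ACL)
    return any(right in acl.get(p, set()) for p in principals)

alice = "/C=CH/O=Example/CN=Alice"
print(is_allowed(alice, "job1", "write"))
print(is_allowed(alice, "job2", "write"))
```

A per-table ACL needs one lookup for all entries, which is one plausible reading of why the slide calls it the faster option.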

EGEE experience with Metadata
  [Diagram: AMGA Metadata Catalogue and the application domains Medical Data Management, Climate Research, High Energy Physics and Digital Library]

Geographic Metadata
  The UnoSat prototype uses AMGA to store GIS (Geographic Information System) metadata for images

HEP: LHCb L & B
  LHCb uses AMGA to centrally store the entire file provenance information from jobs processing the data
  100 million entries required (successfully tested!)
  150 GB data
  100 000 entries/day insert rate expected
  10 entries/second read rate
  Main challenges are reliability, performance and size
  Uses an Oracle RAC server as backend
  Production software access via Java (JSP)
  User (read) access: Python (inc. browser)
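
To put these figures in proportion, the short calculation below derives a few rates from them. It assumes that the 150 GB refers to the 100 million entries, that 1 GB is 10^9 bytes, and that inserts are spread evenly over the day; none of this is stated on the slide.

```python
entries_total = 100_000_000   # 100 million entries required
data_bytes = 150e9            # 150 GB, assuming 1 GB = 10**9 bytes
inserts_per_day = 100_000     # expected insert rate

# Average insert rate if inserts were spread evenly over 24 hours.
avg_insert_rate = inserts_per_day / 86_400

# Average size per entry, assuming the 150 GB covers all 100 million entries.
bytes_per_entry = data_bytes / entries_total

# Days needed to accumulate 100 million entries at the expected insert rate.
days_to_fill = entries_total / inserts_per_day

print(f"{avg_insert_rate:.2f} inserts/s, "
      f"{bytes_per_entry:.0f} bytes/entry, {days_to_fill:.0f} days")
```

Under these assumptions the average insert load is close to one entry per second, well below the quoted read rate, and each entry averages about 1.5 kB.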

Replication in AMGA
  AMGA integrates replication of metadata
  Asynchronous replication: ideal for WAN
    DBs are consistent (transactions supported)
    However: not all DBs are necessarily in the same state
  Replication makes use of the hierarchical table structure
    Global table tree
    Different masters for sub-trees
    Only one master per table! Writes are only allowed on the master.
  The top-level master controls users/groups and holds information about participating DBs
  [Diagram: Top-Level Master, Master A, Master B]
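
The rule "one master per table, different masters for sub-trees" can be pictured as a longest-prefix lookup over the global table tree. The sketch below is only a model of that rule; the directory names and node names are hypothetical, and AMGA's actual bookkeeping of masters is not described on the slide.

```python
# Hypothetical assignment of sub-trees of the global table tree to masters.
MASTERS = {
    "/": "top-level-master",
    "/lhcb": "master-a",
    "/biomed": "master-b",
}

def master_for(path):
    """Return the master of the deepest sub-tree that contains `path`."""
    best = "/"
    for prefix in MASTERS:
        if prefix == "/":
            continue
        # Match whole path components only, so /lhcb does not match /lhcb2.
        if path == prefix or path.startswith(prefix + "/"):
            if len(prefix) > len(best):
                best = prefix
    return MASTERS[best]

def can_write(node, path):
    """Writes are only allowed on the single master of a table."""
    return master_for(path) == node

print(master_for("/lhcb/bookkeeping"))
print(can_write("master-b", "/lhcb/bookkeeping"))
```

Because every table resolves to exactly one master, two nodes can never both accept writes for the same table, which is what keeps asynchronous replication free of write conflicts in this model.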

Replication and Security
  AMGA integrates replication of metadata with security:
  Metadata and user/group configuration are replicated separately: allows sites to have local users and different credentials
  Metadata and ACLs are replicated separately: allows replicated metadata to be owned by a single user

Replication Benchmarks
  AMGA keeps logs for disconnected slaves
  Reconnected slaves are brought up to date automatically
  Fast recovery
  Scalability test
    Setup: 1 master, 10 slaves
    Inserts at 90/s
    10% CPU overhead for 10 slaves
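
One simple way to picture log-based catch-up for disconnected slaves is a sequence-numbered update log on the master, with each slave remembering the last sequence number it applied. The classes below are a minimal sketch under that assumption; the slide does not describe AMGA's log format or protocol, and the names Master, Slave and sync are hypothetical.

```python
class Master:
    def __init__(self):
        self.log = []  # ordered list of (sequence number, command)

    def write(self, command):
        # Sequence numbers start at 1 and grow by one per logged update.
        self.log.append((len(self.log) + 1, command))

    def updates_since(self, seq):
        # Entry with sequence number n sits at index n - 1,
        # so slicing at `seq` yields all updates newer than `seq`.
        return self.log[seq:]


class Slave:
    def __init__(self):
        self.last_seq = 0   # last sequence number applied
        self.applied = []   # commands applied so far, in order

    def sync(self, master):
        """Replay every update missed while disconnected."""
        for seq, command in master.updates_since(self.last_seq):
            self.applied.append(command)
            self.last_seq = seq


master, slave = Master(), Slave()
master.write("addentry /jobs/job1 jobStatus 0")
slave.sync(master)                  # slave is connected and up to date
master.write("addentry /jobs/job2 jobStatus 0")
master.write("addentry /jobs/job3 jobStatus 0")
slave.sync(master)                  # reconnect: the two missed updates are replayed
print(slave.last_seq, len(slave.applied))
```

In this model recovery cost is proportional to the number of missed updates rather than to the size of the table.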

Replication in Health-e-Child
  Several dozen hospitals providing case data
  Central server with credentials for participating sites and users (replication mandatory)
  Data replicated from site to site on demand
  New sites need to be registered in the base AMGA server
  'Automount' mechanism for joining sites
  [Diagram labels: Site AMGA; Base Config; /configuration; /siteA/patients; /siteB/patients; ...; /common/patients; Union View]
  First prototype works well!
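
The "Union View" label suggests that a common patients directory presents the per-site patient tables as one. A minimal sketch of that idea with Python's sqlite3 module follows; it assumes that /common/patients is the union of the site tables, and the table names, columns and sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical per-site tables standing in for /siteA/patients etc.
    CREATE TABLE siteA_patients (case_id TEXT, diagnosis TEXT);
    CREATE TABLE siteB_patients (case_id TEXT, diagnosis TEXT);

    -- Stand-in for /common/patients: a view over all site tables.
    CREATE VIEW common_patients AS
        SELECT 'siteA' AS site, case_id, diagnosis FROM siteA_patients
        UNION ALL
        SELECT 'siteB' AS site, case_id, diagnosis FROM siteB_patients;
""")
conn.execute("INSERT INTO siteA_patients VALUES ('a-1', 'example')")
conn.execute("INSERT INTO siteB_patients VALUES ('b-1', 'example')")
conn.execute("INSERT INTO siteB_patients VALUES ('b-2', 'example')")

rows = conn.execute(
    "SELECT site, case_id FROM common_patients ORDER BY site, case_id"
).fetchall()
print(rows)
```

A query against the view sees the cases of every registered site without knowing how many sites there are, which matches the idea of sites joining through an 'automount' step.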

Summary
  AMGA: metadata service in gLite 1.5, back in 3.1
  AMGA provides a Grid layer to relational databases:
    Abstraction of different DB vendors
    Efficient LAN/WAN access
    Fast X509 Grid security, VOMS integration
  Rich set of features: transactions, views, sequences, complex joins, ...
  Building block for distributed DBs on the Grid:
    Asynchronous replication
    Tight integration of Grid security and replication
  Used by several EGEE applications
  Health-e-Child prototype makes extensive use of replication
  AMGA Web Site: http://cern.ch/amga

Replication & Federation Modes
  AMGA replication makes use of the hierarchical concept:
    Partial replication
    Full replication
    Federation
    Proxy

Performance
  Performance required to be comparable to direct DB access by HEP applications
  Lean C++ implementation
  Fast TCP text streaming protocol, very fast SSL sessions
  [Figure: Throughput comparison between AMGA and direct access via JDBC reading the same table on a LAN. Axes: throughput [entries/s] vs. # clients. Series: AMGA 1000 rows, JDBC 1000 rows, AMGA 1 row, JDBC 1 row.]

WAN Performance
  Comparison with FC protocols, connection from Taiwan:
  [Figure: two panels, Inserts and Queries. Axes: throughput [entries/sec] vs. # clients. Series labels: AMGA Single, AMGA Single Entry, AMGA Bulk 100, LFC, FM Bulk 100, FM Bulk 1000.]
  300 ms latency dominates performance
  Reduce round trips with sessions or by holding connections
  (Streamed) bulk operations are vital for WAN performance
  LFC & FM measurements: C. Munro
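
Why bulk operations matter at 300 ms latency can be seen from a back-of-the-envelope bound. The function below assumes each request costs exactly one round trip, that a client waits for the reply before sending the next request, and that server processing and transfer time are negligible; it gives an upper bound under those assumptions, not a measured value.

```python
def max_throughput(entries_per_round_trip, rtt_s=0.3, clients=1):
    """Upper bound on entries/s when every request costs one round trip."""
    return clients * entries_per_round_trip / rtt_s

single = max_throughput(1)     # one entry per request
bulk = max_throughput(100)     # 100 entries per request (bulk of 100)

print(f"single: {single:.2f} entries/s, bulk of 100: {bulk:.0f} entries/s")
```

With these assumptions a single client is limited to a few entries per second unless it sends many entries per round trip, which is the effect the slide attributes to (streamed) bulk operations.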

LAN Performance
  Protocol comparison with LFC and FiReMan catalogues:
  [Figure: two panels, Inserts and Queries. Axes: throughput [entries/sec] vs. # clients (1 to 100). Series labels: AMGA Single, AMGA Single Entry, AMGA Bulk 100, LFC, FM Bulk 100, FM Bulk 1000.]
  Authentication with X509 certs, SSL connections
  LFN/GUID pairs inserted, query for GUID of LFN, Oracle DB
  AMGA scales very well up to 100 concurrent clients
  Streamed bulk inserts/queries are very fast!
  LFC & FM measurements: C. Munro
  Measurements 2005