Www.consorzio-cometa.it FESR Consorzio COMETA - Progetto PI2S2 AMGA Official Metadata Service for EGEE Salvatore Scifo – Consorzio Cometa - Catania, ITALY.

Presentation transcript:

FESR Consorzio COMETA - Progetto PI2S2
AMGA: Official Metadata Service for EGEE
Salvatore Scifo - Consorzio COMETA - Catania, ITALY
Tutorial for Users of the University of Palermo
Palermo, Italy, 10th-12th December 2007

User tutorial - Palermo - 10th-12th December
Contents
- Background and motivation for AMGA
- Interface, architecture and implementation
- Metadata replication with AMGA
- Web interface to access AMGA remotely

Why does the Grid need metadata?
- Grids often contain millions of files spread over several storage sites.
- Users and applications need an efficient mechanism:
  - to find the files they are interested in
  - to discover and query information about their contents
- This is provided:
  - by associating descriptive attributes (metadata) with files
  - by exposing this information in catalogues that users and client applications can access and search

Metadata service requirements
- The metadata service must expose a complete but simple interface, so that all users can use it easily.
- It should be flexible and support dynamic schemas in order to serve many (ideally all) application domains.
- The service must also allow structured and hierarchical metadata in order to implement any logical collection. A collection is a group of metadata that share some logical meaning (for example, a collection can describe all video files in any encoding format).

Metadata service requirements
- It must be designed with scalability in mind in order to deal with a large number of entries (several millions).
- Security is required to provide different access levels to different users.
- Quality of service has to ensure:
  - Hiding network latency: improved performance for WAN clients
  - Disconnected computing: local replicas for off-line access (laptops)
  - DB-independent replication: the Grid environment is heterogeneous
  - Improved reliability and scalability: no single point of failure

What is AMGA?
- AMGA is a metadata service for the Grid.
  - It is a database access service for Grid applications that allows users and user jobs to discover data describing their files, so that they can access those files in the appropriate way.
- AMGA is a service based on an RDBMS.
  - It allows metadata schemas to be defined according to the needs of users and applications.
  - It provides a replication layer that makes databases locally available to user jobs and replicates changes between the participating databases.
- AMGA has been designed to integrate closely with the Grid environment:
  - the metadata service is a Grid component
  - Grid-security compliant
  - hides DB heterogeneity

AMGA Features
- Dynamic schemas
  - Schemas can be modified at runtime by the client:
    - create and delete schemas
    - add and remove attributes
- Metadata organised as a hierarchy
  - Schemas can contain sub-schemas
  - Analogy to a file system: schema → directory; entry → file
- Flexible queries
  - SQL-like query language
  - Joins between schemas are supported
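To make dynamic, hierarchical schemas concrete, here is a minimal in-memory sketch in Python. It is not the AMGA API: the class and method names (`ToyCatalog`, `create_dir`, `add_attr`, `remove_attr`, `add_entry`) are hypothetical, loosely modeled on the command names used in this tutorial, and a real AMGA server keeps this information in an RDBMS back-end.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """A directory-like node: a schema plus the entries and sub-collections it holds."""
    attributes: dict = field(default_factory=dict)  # attribute name -> Python type
    entries: dict = field(default_factory=dict)     # entry name -> {attribute: value}
    children: dict = field(default_factory=dict)    # sub-collection name -> Collection


class ToyCatalog:
    """Hypothetical in-memory model of hierarchical metadata with runtime schema changes."""

    def __init__(self):
        self.root = Collection()

    def _walk(self, path):
        # Follow a path such as "/movies/trailers" from the root, like a file system.
        node = self.root
        for part in filter(None, path.split("/")):
            node = node.children[part]
        return node

    def create_dir(self, path):
        parent, _, name = path.rstrip("/").rpartition("/")
        self._walk(parent).children[name] = Collection()

    def add_attr(self, path, name, typ):
        # The schema changes at runtime; existing entries simply lack the new value.
        self._walk(path).attributes[name] = typ

    def remove_attr(self, path, name):
        coll = self._walk(path)
        del coll.attributes[name]
        for values in coll.entries.values():
            values.pop(name, None)  # drop the removed attribute from every entry

    def add_entry(self, path, **values):
        parent, _, name = path.rpartition("/")
        coll = self._walk(parent)
        unknown = set(values) - set(coll.attributes)
        if unknown:
            raise KeyError(f"attributes not in schema: {sorted(unknown)}")
        # Convert each value to the type declared for its attribute.
        coll.entries[name] = {k: coll.attributes[k](v) for k, v in values.items()}


cat = ToyCatalog()
cat.create_dir("/movies")
cat.create_dir("/movies/trailers")  # a sub-collection, like a sub-directory
cat.add_attr("/movies/trailers", "title", str)
cat.add_attr("/movies/trailers", "runtime", int)
cat.add_entry("/movies/trailers/t1", title="Example", runtime="120")
```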

Metadata Concepts
- To better understand how AMGA works, think of:
  - schema → database schema
  - collection → table
  - attribute → column
  - entry → row
- AMGA metadata is a list of attributes associated with entries according to a user-defined schema.
- A schema is a set of attributes.
- An entry is the abstraction of a directory/file mapped by the metadata server.
- A collection is a set of entries associated with a schema.

Metadata Concepts
- Attribute: a typed key/value pair associated with entries
  - Type: the type (int, float, string, …)
  - Name/Key: the name of the attribute
  - Value: the value of an entry's attribute
- Analogy examples (AMGA command, followed by the equivalent SQL):
  >createdir /jobs                         (create table jobs)
  >addattr /jobs jobStatus int             (alter table jobs add column jobStatus int)
  >addentry /jobs/job1 jobStatus 0         (insert into jobs (jobStatus) values(0))
  >updateattr /jobs jobStatus 1 jobID>100  (update jobs set jobStatus=1 where jobID>100)
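The SQL column of the analogy can be tried directly. The sketch below uses Python's built-in `sqlite3` module as a stand-in back-end (SQLite is one of the back-ends listed for AMGA, but this does not reproduce AMGA's actual internal table layout). Two assumptions are labeled in the code: a `name` column stands in for the entry name, and a `jobID` column is added because the `updateattr` condition refers to it even though the commands never declare it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# createdir /jobs  ->  a collection corresponds to a table.
# Assumptions: "name" stands in for the entry name, and "jobID" is added
# because the updateattr condition below refers to it.
cur.execute("CREATE TABLE jobs (name TEXT PRIMARY KEY, jobID INTEGER)")

# addattr /jobs jobStatus int  ->  an attribute corresponds to a column.
cur.execute("ALTER TABLE jobs ADD COLUMN jobStatus INTEGER")

# addentry /jobs/job1 jobStatus 0  ->  an entry corresponds to a row.
# job2 and both jobID values are invented sample data.
cur.executemany(
    "INSERT INTO jobs (name, jobID, jobStatus) VALUES (?, ?, ?)",
    [("job1", 50, 0), ("job2", 150, 0)],
)

# updateattr /jobs jobStatus 1 jobID>100  ->  a conditional UPDATE.
cur.execute("UPDATE jobs SET jobStatus = 1 WHERE jobID > 100")
conn.commit()

rows = cur.execute("SELECT name, jobID, jobStatus FROM jobs ORDER BY name").fetchall()
print(rows)  # [('job1', 50, 0), ('job2', 150, 1)]
```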

AMGA Datatypes
- Using the datatypes above ensures that your metadata can easily be moved to any of the supported back-ends.
- If you do not care about DB portability, you can in principle use ALL the datatypes supported by the back-end as entry attribute types, even the more esoteric ones (e.g. the PostgreSQL network address or geometric types).

AMGA Implementation
- C++ multiprocess server
  - Back-ends: Oracle, MySQL, PostgreSQL, SQLite
  - Front-ends:
    - TCP streaming: high performance; client APIs for C++, Java, Python, Perl, Ruby
    - SOAP (web services): interoperability, scalability
- Standalone Python library implementation
  - Data stored on the file system

Security
- Access control
  - All entries in a directory share the same ACL
  - Groups of users are also supported (Unix-style permissions)
- Secure connections: SSL
  - Provided by web services
- Client authentication is based on:
  - username/password
  - general X.509 certificates
  - Grid proxy certificates (VOMS, the Virtual Organization Membership Service, is supported)
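A rough model of the access-control rule above: one ACL per directory is shared by all its entries, and permissions follow the Unix owner/group/other style. This is an illustrative Python sketch under those assumptions; `DirectoryACL`, `can_access` and `entry_access` are hypothetical names, and AMGA's real permission semantics may differ in detail.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryACL:
    """Hypothetical model: one ACL per directory, shared by all of its entries."""
    owner: str
    owner_perms: str = "rw"                          # Unix-style owner permissions
    group_perms: dict = field(default_factory=dict)  # group name -> "r", "rw", ...
    other_perms: str = ""                            # everyone else


def can_access(acl, user, user_groups, mode):
    """True if `user`, a member of `user_groups`, has permission `mode` ("r" or "w")."""
    if user == acl.owner:
        return mode in acl.owner_perms
    if any(mode in acl.group_perms.get(g, "") for g in user_groups):
        return True
    return mode in acl.other_perms


# Entries are checked through their parent directory, so they inherit its ACL.
acls = {"/jobs": DirectoryACL(owner="alice", group_perms={"biomed": "r"})}


def entry_access(entry_path, user, user_groups, mode):
    directory = entry_path.rsplit("/", 1)[0]
    return can_access(acls[directory], user, user_groups, mode)
```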

Metadata Replication I
- AMGA provides replication/federation mechanisms.
- Motivation
  - Scalability: support hundreds/thousands of concurrent users
  - Geographical distribution: hide network latency
  - Reliability: no single point of failure
  - DB-independent replication: heterogeneous DB systems
  - Disconnected computing: off-line access (laptops)
- Models
  - Asynchronous replication
  - Master-slave: writes are only allowed on the master
  - Application-level replication: metadata commands are replicated
  - Proxy: the proxy only forwards metadata requests (it works as a front end)
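The asynchronous master-slave model with application-level replication can be sketched as a command log: the master executes each metadata command and appends it to an ordered log, while slaves reject writes and periodically replay the records they have not yet seen. The classes and the two command names below are hypothetical, and the optional `prefix` filter is an assumption added to illustrate how a partial replica might keep only part of the hierarchy.

```python
def apply_command(state, command, args):
    """Apply one replicated metadata command to a replica's state (hypothetical commands)."""
    if command == "addentry":
        path, attrs = args
        state[path] = dict(attrs)
    elif command == "setattr":
        path, key, value = args
        state[path][key] = value
    else:
        raise ValueError(f"unsupported command: {command}")


class Master:
    """Accepts writes and records every metadata command in an ordered log."""

    def __init__(self):
        self.state = {}  # entry path -> attribute dict
        self.log = []    # (command, args) tuples in commit order

    def execute(self, command, *args):
        apply_command(self.state, command, args)
        self.log.append((command, args))


class Slave:
    """Read-only replica that asynchronously replays the master's log."""

    def __init__(self, prefix="/"):
        self.state = {}
        self.applied = 0      # how many log records have been processed so far
        self.prefix = prefix  # assumption: a partial replica only keeps this subtree

    def execute(self, command, *args):
        raise PermissionError("writes are only allowed on the master")

    def pull(self, master):
        # Called periodically, so a replica may briefly lag behind the master.
        for command, args in master.log[self.applied:]:
            if args[0].startswith(self.prefix):
                apply_command(self.state, command, args)
        self.applied = len(master.log)
```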

Metadata Replication II
[Diagram panels: Full replication; Partial replication; Federation; Proxy]

Conclusion
- AMGA: the metadata service of gLite
  - Part of gLite 3.1
  - Useful for realizing simple relational schemas
  - Integrated with the Grid environment (security)
- Tests show good performance/scalability
- Already deployed by several Grid applications:
  - LHCb, ATLAS, Biomed, …
- AMGA Web Site

Use case: Biomed
- Medical Data Manager (MDM)
  - Stores and accesses medical images and associated metadata on the Grid
  - Built on top of the gLite 1.5 data management system
  - Demonstrated at the last EGEE conference (October 2005, Pisa)
- Strong security requirements
  - Patient data is sensitive
  - Data must be encrypted
  - Metadata access must be restricted to authorized users
- AMGA used as metadata server
  - Demonstrates authentication and encrypted access
  - Used as a simplified DB
- More details at:

gMOD: grid Movie On Demand
- gMOD provides a Video-On-Demand service.
- The user chooses from a list of videos, and the chosen one is streamed in real time to the video client on the user's workstation.
- For each movie, many details (Title, Runtime, Country, Release Date, Genre, Director, Cast, Plot Outline) are stored, and users can search for a particular movie by querying one or more attributes.
- Two kinds of users can interact with gMOD:
  - TrailersManagers, who administer the movie database (uploading new movies and attaching metadata to them);
  - GILDA VO users (guests), who can browse, search and choose a movie to be streamed.

gMOD under the hood
Built on top of gLite services:
- Storage Elements, located at different sites, physically contain the movie files.
- LFC, the File Catalogue, keeps track of which Storage Element holds a particular movie.
- AMGA is the repository of the detailed information for each movie and makes it possible to query it.
- The Virtual Organization Membership Service (VOMS) is used to assign the right role to each user.
- The Workload Management System (WMS) is responsible for retrieving the chosen movie from the right Storage Element and streaming it over the network to the user's desktop or laptop.
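The division of labour between AMGA and the LFC can be illustrated with plain dictionaries standing in for the two catalogues. Everything here is hypothetical (the logical file names, Storage Element host names and helper functions); the sketch only shows the order of lookups: query metadata in AMGA, resolve the logical name to replicas in the LFC, then hand a replica to the WMS job that streams it.

```python
# Hypothetical stand-ins for the services described above.
AMGA_MOVIES = {  # metadata catalogue: logical file name -> movie attributes
    "/grid/gilda/movies/m1.avi": {"Title": "Movie One", "Genre": "Drama", "Country": "Italy"},
    "/grid/gilda/movies/m2.avi": {"Title": "Movie Two", "Genre": "Comedy", "Country": "USA"},
}
LFC = {  # file catalogue: logical file name -> Storage Elements holding a replica
    "/grid/gilda/movies/m1.avi": ["se01.example.org"],
    "/grid/gilda/movies/m2.avi": ["se02.example.org", "se03.example.org"],
}


def search_movies(**criteria):
    """Step 1 (AMGA): logical names of movies whose attributes match all criteria."""
    return [lfn for lfn, meta in AMGA_MOVIES.items()
            if all(meta.get(key) == value for key, value in criteria.items())]


def locate(lfn):
    """Step 2 (LFC): Storage Elements that hold a replica of the file."""
    return LFC.get(lfn, [])


def plan_streams(**criteria):
    """Pair each matching movie with one replica; a WMS job would then stream it (step 3)."""
    plans = []
    for lfn in search_movies(**criteria):
        replicas = locate(lfn)
        if replicas:
            plans.append((lfn, replicas[0]))  # naive choice: the first replica
    return plans


print(plan_streams(Country="USA"))  # [('/grid/gilda/movies/m2.avi', 'se02.example.org')]
```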

gMOD interactions
[Diagram: User; GENIUS Portal; VOMS (get Role); Workload Management System; CE with Worker Nodes (WN); LFC File Catalogue; AMGA Metadata Catalogue; Storage Elements]

gMOD screenshot
gMOD is accessible through the GENIUS Portal (

gLibrary - Multimedia CMS
- Motivations
  - Huge amounts of data can be saved on SEs, but how can we easily find a file we need later?
    - If you have a good memory, its GUID could be a solution, but that is not easy.
    - File catalogues only let us arrange files in folders and subfolders; there is no way to query their contents.
    - Metadata catalogues are a possible solution, but they are not always "affordable", especially for non-expert users (powerful but complex to use).
- Requirements
  - Easy to use, fast, secure, extensible
  - Multimedia files:
    - Images
    - Movies
    - Audio files
    - Office documents (PowerPoint, Word, Excel, OpenOffice)
    - s, PDFs, HTMLs
    - Customized versions of well-known document types (e.g. EGEE PPTs)

Usage scenarios
- Example 1:
  - Locate all theoretical PowerPoint presentations (Type) about FireMan (Keywords) written in 2005 (Date);
  - Find all the movies (Type) in which Julia Roberts (Cast) performed together with Hugh Grant (Cast), produced in the USA (Country) in 2004 (ReleaseDate);
  - Find all the audio files (Type) in mp3 (Format) by Alanis Morissette (Singer) that last more than 3 minutes (Runtime).
- Example 2:
  - A doctor is looking for brain (Keyword) DICOM (Type) images of male (Gender) patients older than 65 (Age).
- Example 3:
  - A job can work as a storage crawler: it scans pre-existing files in Storage Elements to extract relevant metadata, which is then published on gLibrary for further data mining.
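Example 2 maps naturally onto a conjunctive query over typed attributes. As a hedged sketch, the code below expresses it in SQL against an in-memory SQLite table; the table name `images`, its columns and the sample rows are invented for illustration, and AMGA's own SQL-like query syntax is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE images (name TEXT, Type TEXT, Keywords TEXT, Gender TEXT, Age INTEGER)"
)
# Invented sample rows; only img1 satisfies every condition of Example 2.
conn.executemany(
    "INSERT INTO images VALUES (?, ?, ?, ?, ?)",
    [
        ("img1", "DICOM", "brain,mri", "M", 70),
        ("img2", "DICOM", "brain", "F", 80),
        ("img3", "DICOM", "knee", "M", 72),
        ("img4", "DICOM", "brain", "M", 60),
        ("img5", "JPEG", "brain", "M", 90),
    ],
)

# Example 2: brain (Keyword) DICOM (Type) images of male (Gender) patients older than 65 (Age).
query = """
    SELECT name FROM images
    WHERE Type = ? AND Keywords LIKE ? AND Gender = ? AND Age > ?
    ORDER BY name
"""
matches = [row[0] for row in conn.execute(query, ("DICOM", "%brain%", "M", 65))]
print(matches)  # ['img1']
```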

gLibrary prototype implementation
- It is built on top of several gLite Grid services: a Metadata Catalogue + File Catalogue + Storage Elements
  - The SEs store the files.
  - The File Catalogues (LFC and/or FiReMan) map file locations.
  - The Metadata Catalogue (AMGA) stores and organizes metadata in order to provide information about the files' type and contents.
- gLibrary defines the following collections:
  - /gLibrary contains generic metadata for each entry (main collection)
  - /gLAudio, /gLImage, /gLVideo, /gLPPT, /EGEEPPT, /gLDoc, … (derived collections for "additional features")
  - /gLTypes
    - keeps the associations between document types and the names of the collections that contain their "additional features"
    - is used by gLibrary to find out where to look when new document types are added to the system (extensibility)
  - /gLKeys is used to store decryption keys
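The role of /gLTypes as an indirection table can be sketched in a few lines: the generic record in /gLibrary names a type, /gLTypes maps that type to a derived collection, and the type-specific attributes are merged in from there. The attribute names and sample entries below are hypothetical, and dictionaries stand in for AMGA collections.

```python
# Dictionaries stand in for AMGA collections; all names and values are hypothetical.
GLIBRARY = {  # /gLibrary: generic metadata for every entry
    "entry1": {"Type": "Video", "FileName": "clip.avi", "Owner": "alice"},
    "entry2": {"Type": "Audio", "FileName": "song.mp3", "Owner": "bob"},
}
GL_TYPES = {"Video": "/gLVideo", "Audio": "/gLAudio"}  # /gLTypes: type -> derived collection
DERIVED = {  # derived collections with the type-specific "additional features"
    "/gLVideo": {"entry1": {"Runtime": 95, "Director": "Someone"}},
    "/gLAudio": {"entry2": {"Singer": "Someone Else", "Runtime": 4}},
}


def full_metadata(entry_id):
    """Merge the generic record with the features stored in the type's derived collection."""
    generic = GLIBRARY[entry_id]
    collection = GL_TYPES[generic["Type"]]  # /gLTypes tells us where to look
    return {**generic, **DERIVED[collection].get(entry_id, {})}


def register_type(type_name, collection):
    """Supporting a new document type only needs a /gLTypes row and a new collection."""
    GL_TYPES[type_name] = collection
    DERIVED.setdefault(collection, {})


register_type("PDF", "/gLDoc")
GLIBRARY["entry3"] = {"Type": "PDF", "FileName": "paper.pdf", "Owner": "carol"}
DERIVED["/gLDoc"]["entry3"] = {"Pages": 12}
```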

gLibrary Security
- User requirements:
  - a valid proxy with VOMS extensions
  - the VOMS role and group needed to be recognized by gLibrary as a content manager
- 3 kinds of users:
  - gLibraryManager: can create new content types and allow a generic VO user to become a gLibrarySubmitter
  - gLibrarySubmitters: can add new entries and define access rights on the entries they create
    - Fine-grained permission settings (reading, writing, listing, decrypting) on each entry, for: all VO members, VO groups, a list of DNs
  - Generic VO users: can browse and make queries (on the entries they have access to)
- Basic level of cryptography:
  - New files saved on SEs can be encrypted beforehand with a symmetric passphrase that is saved in /gLKeys. Only selected users (whose VOMS proxy subject matches a specific DN) can access the passphrase and decrypt the file.
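The per-entry permission model (read, write, list, decrypt, each granted to the whole VO, to VO groups, or to a list of DNs) might be checked roughly as follows. This is a simplified Python sketch with hypothetical names; as a further assumption, it requires the user's VO to match the entry's VO before any grant applies.

```python
from dataclasses import dataclass, field


@dataclass
class Grant:
    """Who holds one permission on an entry: the whole VO, some VO groups, or listed DNs."""
    whole_vo: bool = False
    groups: set = field(default_factory=set)
    dns: set = field(default_factory=set)


@dataclass
class User:
    dn: str      # certificate subject taken from the user's VOMS proxy
    vo: str
    groups: set  # VOMS groups the user belongs to


def allowed(entry_perms, entry_vo, user, action):
    """entry_perms maps "read", "write", "list" or "decrypt" to a Grant."""
    grant = entry_perms.get(action)
    if grant is None or user.vo != entry_vo:  # assumption: grants apply within one VO
        return False
    return grant.whole_vo or bool(grant.groups & user.groups) or user.dn in grant.dns
```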

Use case: ADAT
- "Archivio Digitale Antichi Testi" (Digital Archive of Ancient Texts)
  - It represents a process model built on:
    - methodologies
    - technologies
    - procedures
    - hardware and software
  - This model aims to preserve and deliver the true value of the antique manuscript, including in its virtual representation.

Digital Archive Aspects
- STORAGE
  - Handling 5 TB of digital representations of antique manuscripts (Grid storage).
- METADATA
  - "Translation" and integration of a standard metadata schema for antique manuscripts into the Grid metadata service (AMGA).
- SERVICE
  - Implementation of a web-oriented application that interfaces with Data Grid services through an ad hoc framework (GSAF).
  - Delegating management of both the network infrastructure and the storage system (maintenance and security) to Grid site management.
- SECURITY
  - Centralized access control mechanism based on the Virtual Organization roles that users belong to.

ADAT project

Use Case: ADAT

AMGA Web Interface

Web Interface vs Command Line
- Command line: expert-user approach
  - a user account on a Grid UI is needed
  - network access to the Grid UI (possibly over the Internet) is needed
  - trouble with firewalls and security settings; a VPN solution is required
  - commands have to be typed, so good knowledge of the syntax is required
  - no wizards are available; the user has to type every command
  - the user loses the high-level view of the problem
- Web interface: user-friendly (beginner/intermediate) approach
  - no dependence on the Grid UI
  - only Internet access is needed
  - simple and comfortable use of the service
  - immediate interaction
  - fast and simple training and learning, using wizards and encapsulated functionality (no syntax knowledge is required)

High Level Requirements
- Group management
  - Group list/add/drop
  - Group membership list
  - Group ownership list
  - Add/remove user and group association
- User management
  - User list/create/delete
  - User subject change
- Collection management
  - Collection tree browse/create/delete
  - Collection ACL management
    - list group, add group, drop group
    - change mode for owner/change owner
- Metadata management
  - Entry listing/searching
  - Entry create/modify/delete
  - Schema management
    - attribute listing/create/clear/delete

Standard Multi-layer Architecture
- Data Presentation Layer: consists of all the web pages that give users access to the provided features. These pages publish dynamic content managed by DHTML and Ajax (Asynchronous JavaScript And XML) libraries. They work with the logic components to perform data manipulation and with the access components to retrieve and publish data.
- Application Logic Layer: made up of all the software modules that encapsulate the implementation of the provided features (metadata handling and manipulation).
- Data Access Layer: implements all the software components that extract data from the AMGA server. These components work as services invoked by the web pages, and they provide a mechanism to retrieve data and publish dynamic content.
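A compact sketch of how the three layers could be separated. The real web interface is a J2EE application built on the AMGA Java API; this Python version only illustrates the dependency direction (presentation → logic → data access), and `DataAccess`, `InMemoryAccess`, `MetadataLogic` and `ajax_search_handler` are hypothetical names. The in-memory access class stands in for a real AMGA connection.

```python
import json
from abc import ABC, abstractmethod


class DataAccess(ABC):
    """Data Access Layer: the only layer that talks to the metadata server."""

    @abstractmethod
    def list_entries(self, collection): ...


class InMemoryAccess(DataAccess):
    """Hypothetical stand-in for a real AMGA connection."""

    def __init__(self, data):
        self.data = data

    def list_entries(self, collection):
        return self.data.get(collection, [])


class MetadataLogic:
    """Application Logic Layer: feature-level operations built on the access layer."""

    def __init__(self, access: DataAccess):
        self.access = access

    def search(self, collection, key, value):
        return [e for e in self.access.list_entries(collection) if e.get(key) == value]


def ajax_search_handler(logic, params):
    """Presentation Layer endpoint: returns JSON that a DHTML/Ajax page would render."""
    results = logic.search(params["collection"], params["key"], params["value"])
    return json.dumps({"count": len(results), "entries": results})
```

Because the logic layer depends only on the `DataAccess` interface, swapping the in-memory stand-in for a real server connection would not touch the other two layers.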

Software Architecture
- The core of the application is designed to be a plug-in for general-purpose applications that use metadata on the Grid.
- Its design uses several object-oriented design patterns (Singleton, Strategy, Factory Method, Template Method, Iterator and Composite). This ensures a very clean and simple software architecture with a high degree of cohesion and decoupling.
- The engine is therefore generic for any application that needs to integrate metadata usage.
- Every component is built on top of the official AMGA Java API.
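Of the patterns listed, Composite and Iterator fit the collection tree most directly: collections and entries share one interface, and a generator walks the hierarchy. The engine itself is written in Java on the AMGA Java API; this is only a Python sketch of the two patterns, with hypothetical class names.

```python
from abc import ABC, abstractmethod


class Node(ABC):
    """Composite component: collections and entries share this interface."""

    def __init__(self, name):
        self.name = name

    @abstractmethod
    def count_entries(self): ...

    @abstractmethod
    def walk(self, prefix=""): ...


class Entry(Node):
    """Leaf of the tree."""

    def count_entries(self):
        return 1

    def walk(self, prefix=""):
        yield f"{prefix}/{self.name}"


class CollectionNode(Node):
    """Composite node holding entries and sub-collections."""

    def __init__(self, name, children=()):
        super().__init__(name)
        self.children = list(children)

    def count_entries(self):
        # Composite: delegate to the children, whatever their concrete type.
        return sum(child.count_entries() for child in self.children)

    def walk(self, prefix=""):
        # Iterator: a generator yields every path in depth-first order.
        path = f"{prefix}/{self.name}" if self.name else prefix
        yield path or "/"
        for child in self.children:
            yield from child.walk(path)


tree = CollectionNode("", [
    CollectionNode("movies", [Entry("m1"), Entry("m2")]),
    CollectionNode("jobs", [Entry("job1")]),
])
print(list(tree.walk()))  # ['/', '/movies', '/movies/m1', '/movies/m2', '/jobs', '/jobs/job1']
```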

Deployment Plan
- The application can be deployed on a dedicated server machine located either inside or outside the Grid boundaries.
- Currently the GILDA AMGA server machine also hosts the web interface.
- Users use a common web browser.
- Deployed on the GILDA t-Infrastructure
  - Web front-end available at
- J2EE application
  - The application server runs Apache Tomcat 5.0 on a Fedora Core 5 Linux machine.
- Users interact with the catalog through the functionalities provided by the web interface.

Collection Management
[Screenshot callouts: Modify Schema; Instance; Delete entry]

Tool Bars Overview
[Screenshot callouts: new collection; new entry; bulk upload; search entry; Go!; back to parent; "Address" bar; type collection name; add collection; Modify Schema; ACL management]

Add Entry

Modify Entry

Metadata Schema Management

ACL Management

Group Management

Group Ownership

Group Membership

User Management

User Group Relationship

Use Cases
- ADAT Project
  - embeds the engine of the AMGA WI within the Digital Archive Software
- Aiuri (Project COPPE/UFRJ - BRAZIL)
  - aims to implement a Grid-oriented platform to support data and text mining applications
- BM Portal project (Bio-Lab, DIST, University of Genoa)
  - embeds the engine of the AMGA WI as a plug-in
- GILDA Team
  - adopts the AMGA Web Interface for dissemination and training purposes
- EGEE Respect program
  - candidate as recommended external software

Conclusions
- The AMGA WI challenge is to offer a flexible, multiplatform, secure, reusable and easy-to-use system to handle AMGA metadata for files stored on a distributed Grid infrastructure.
- Flexible: it can handle any kind of metadata schema defined on the server
- Multiplatform: implemented as a Java web application, it can be used on any platform
- Secure: GSI compliant (X.509 proxy if required)
- Reusable: the engine can be embedded into larger applications
- Easy to use: its intuitive web interface allows metadata to be managed with just a few mouse clicks

Questions…
Thank you very much for your kind attention!