INFSO-RI Enabling Grids for E-sciencE University of Coimbra GSAF Grid Storage Access Framework Salvatore Scifo INFN of Catania EGEE User Forum Manchester, UK - 10th-11th May 2007
Partnership
Grid Storage Access Framework
– The project is carried out as a cooperation between INFN and IR&T engineering s.r.l. (an SME located in Catania, Italy).
– The context of this work is the e-Infrastructure Trinacria Grid Virtual Laboratory Project and the ADAT Project (“Archivi Digitali Antico Testo”), aimed at implementing a Digital Archive for Cultural Heritage Data (antique manuscripts) with a Digital Repository based on EGEE grid services.
Resources
– INFN: S. Scifo, A. Calanducci, on behalf of the GILDA Team
– IR&T engineering: V. Milazzo, A. Magrì
Web Integration Requirements
Main objectives of the web application
– Infrastructure side
    Organize and handle large amounts of information
    Share documents among several organizations
– Security side
    Define and apply Access Control Policies
– Development side
    Build applications without specific technical knowledge of the adopted infrastructure (high-level API)
    Build and maintain dynamic web content (simple tools to manage repositories for provisioning purposes)
– User side
    Manage Groups and Users (administrative user profiles)
    Manage Digital Resources (authoring user profiles)
    Access, search and retrieve files and data easily (end user profiles)
Grid as Digital Repository
Repository Virtualization provided by
– interfaces to manage DATA
– interfaces to manage METADATA
Data Management capabilities
– Handling of large and numerous files, also in distributed environments
– Ubiquity: data can be accessed independently of their location
Security capabilities
– Centralized access control mechanism based on X.509 certificates
System capabilities
– Availability, Scalability, Fault Tolerance
Classic Web Application
The Presentation Layer consists of all the graphical interfaces that let the user interact with the application
The Business Layer collects all the software components that implement the behavior of the given application
The Data Access Layer is made up of the software components that allow the application to manage data (ASCII files, XML files, digital objects, metadata, SQL data)
Data Access Layer components interact with several types of data sources
– File System (for data stored in files)
– Relational Database Management System (for data organized in SQL tables)
Grid Web Application
Grid environment porting aspects
– files are stored inside a Storage Element (SE)
– files can be replicated on several SEs for ubiquity, security and sharing needs; the relationships among file locations, replicas and their identifiers are kept within a specific File Catalogue Service
– each file can be associated with descriptive metadata, organized through a specific Metadata Catalogue Service
Technical Approach
– replace the Data Access Layer with an appropriate interface that permits:
    business components to manage data stored within the DMS
    presentation objects to search and retrieve data from the DMS
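The relationship between files, replicas and metadata described on this slide can be pictured with a small in-memory model. This is only an illustrative Python sketch, not the GSAF API or any Grid client library: the classes `FileCatalogue` and `MetadataCatalogue`, their methods, the SE host names and the logical file name are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FileCatalogue:
    """Hypothetical stand-in for a File Catalogue Service: maps a logical
    file name (LFN) to the Storage Elements that hold a replica of it."""
    replicas: dict = field(default_factory=dict)

    def register(self, lfn, storage_element):
        # Record that one more SE holds a copy of this logical file.
        self.replicas.setdefault(lfn, []).append(storage_element)

    def locate(self, lfn):
        # Return every SE known to hold a replica (empty list if unknown).
        return list(self.replicas.get(lfn, []))


@dataclass
class MetadataCatalogue:
    """Hypothetical stand-in for a Metadata Catalogue Service: maps an LFN
    to a dictionary of descriptive attributes."""
    entries: dict = field(default_factory=dict)

    def describe(self, lfn, **attributes):
        self.entries.setdefault(lfn, {}).update(attributes)

    def attributes_of(self, lfn):
        return dict(self.entries.get(lfn, {}))


catalogue = FileCatalogue()
metadata = MetadataCatalogue()

# One logical file, replicated on two (made-up) Storage Elements.
lfn = "/grid/example/manuscript-001.tif"
catalogue.register(lfn, "se1.example.org")
catalogue.register(lfn, "se2.example.org")
metadata.describe(lfn, century="XII", language="Latin")

locations = catalogue.locate(lfn)
description = metadata.attributes_of(lfn)
```

In this picture, the interface that replaces the Data Access Layer would sit in front of both catalogues, so that business and presentation components only ever deal with logical file names.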
Designers' point of view
Development of applications (web or desktop) is not easy
– Grid Data Services are independent of each other
– They work in “stand-alone” mode
– No coherence of any kind is ensured among them
This fragmentation forces software engineers and web designers to consider a vertical architecture
The application must take care of the atomicity, coherence and synchronization of data manipulation
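To make the coherence problem concrete, the following Python sketch checks three independently maintained views of the same data for disagreements. It is an illustration under stated assumptions, not part of GSAF: the function `find_incoherences`, its input shapes and the sample paths are hypothetical, and the three inputs merely stand in for the contents of the Storage Elements, the File Catalogue and the Metadata Catalogue.

```python
def find_incoherences(stored, catalogued, described):
    """Report disagreements among three independent services.

    stored     -- set of (storage_element, lfn) pairs physically present
    catalogued -- dict mapping lfn -> list of SEs registered as replicas
    described  -- set of lfns that have a metadata entry
    """
    problems = []
    for lfn, storage_elements in sorted(catalogued.items()):
        for se in storage_elements:
            # The catalogue points at a replica that is not on the SE.
            if (se, lfn) not in stored:
                problems.append(("missing replica", lfn, se))
        # The file is registered but nobody wrote its metadata.
        if lfn not in described:
            problems.append(("no metadata", lfn, None))
    # Metadata exists for a file the catalogue does not know about.
    for lfn in sorted(described - set(catalogued)):
        problems.append(("orphan metadata", lfn, None))
    return problems


stored = {("se1", "/grid/a")}
catalogued = {"/grid/a": ["se1", "se2"], "/grid/b": ["se1"]}
described = {"/grid/a", "/grid/c"}

report = find_incoherences(stored, catalogued, described)
```

Each tuple in `report` is a state that an application built directly on the separate services would have to detect and repair by itself.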
Users' point of view
Users can use only Command Line Tools
These tools are installed on specific machines called User Interfaces (UI), located inside the Grid network boundaries
Users encounter several problems with network access
If a user has a personal UI
– who ensures its security?
All logical relationships among data and metadata must be kept in the user's mind
GSAF solution
GSAF is an Object Oriented Framework
– built on top of the Grid Metadata Service and the Grid Data Service, it exposes classes and related methods to the applications located above it
Main objectives
– hide the complexity and fragmentation of the several underlying APIs
– group the functional requirements shared among applications
– ensure atomicity across different data manipulations
GSAF System Architecture (figure)
GSAF Functional Requirements
Managing Metadata Schemas
Managing ACLs to access Metadata
Managing ACLs to access Data
Uploading files to the SE
Browsing the Metadata Catalogue / File Catalogue
Searching files by Metadata
Deleting files
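Several of these requirements (ACL management, search by metadata, deletion) interact with one another. The Python sketch below shows one possible way they could fit together. It is not the GSAF API: the `Repository` class, its method names, the rule that only the owner may delete, and the certificate subject strings are all assumptions made for illustration.

```python
class Repository:
    """Hypothetical in-memory repository combining metadata and read ACLs."""

    def __init__(self):
        self._metadata = {}  # lfn -> attribute dictionary
        self._owner = {}     # lfn -> subject that created the entry
        self._readers = {}   # lfn -> set of subjects allowed to read

    def add(self, lfn, owner, **attributes):
        self._metadata[lfn] = dict(attributes)
        self._owner[lfn] = owner
        self._readers[lfn] = {owner}

    def grant_read(self, lfn, subject):
        self._readers[lfn].add(subject)

    def search(self, subject, **criteria):
        # Return only entries the subject may read AND that match all criteria.
        return sorted(
            lfn
            for lfn, attrs in self._metadata.items()
            if subject in self._readers[lfn]
            and all(attrs.get(key) == value for key, value in criteria.items())
        )

    def delete(self, lfn, subject):
        # Assumption for this sketch: only the owner may delete an entry.
        if self._owner.get(lfn) != subject:
            raise PermissionError(f"{subject} may not delete {lfn}")
        del self._metadata[lfn], self._owner[lfn], self._readers[lfn]


# Made-up certificate subjects standing in for X.509 identities.
alice = "/C=IT/O=Example/CN=Alice"
bob = "/C=IT/O=Example/CN=Bob"

repo = Repository()
repo.add("/grid/example/scan-1.dcm", alice, modality="CT")
repo.add("/grid/example/scan-2.dcm", alice, modality="CT")
repo.grant_read("/grid/example/scan-1.dcm", bob)

visible_to_bob = repo.search(bob, modality="CT")
visible_to_alice = repo.search(alice, modality="CT")
```

Filtering inside `search` means a caller never learns about entries it is not allowed to read, which is one simple way to keep metadata ACLs and search results consistent.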
GSAF Web Interface
The GSAF Web Interface manages data and their metadata remotely
– initially, the main purpose of this application was to serve as a natural tester of the framework's basic functionality
– it is a useful tool for administering the Grid Storage over the internet
The Web Interface is the easiest approach
– for new users who do not have specific knowledge of the Grid environment
– no syntax rules are required, and users do not lose the high-level view of the data or of the metadata schemas
– interaction is immediate thanks to guided, user-friendly procedures that make training and learning faster
– the web application needs only a simple internet connection, so it avoids any dependency on the Grid UI machines
GSAF Web Interface (figure)
ADAT project (figure)
Use Cases
ADAT Project
– embeds GSAF within the Digital Archive Software
Physics Department of the University of Catania (PI2S2 project)
– aims to implement a Grid-oriented Digital Archive for DICOM images
BM Portal project (Bio-Lab, DIST, University of Genoa)
– embeds the GSAF framework as a plug-in
GILDA Team
– adopts the GSAF web interface for dissemination and training purposes
Outlook
File Replica support
VOMS Integration
ACLs at Disk Pool Manager (DPM) level
– for coherence between File Catalogue permissions and DPM permissions
Transaction Manager
– Serialization levels
– Transaction pattern
    Execute()
    Commit()
    Rollback()
All we need to integrate applications….
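The slide names the Execute() / Commit() / Rollback() pattern without giving its design. One common way to realize such a pattern over services that have no shared transaction support is an undo log of compensating actions. The Python sketch below shows that approach and is an assumption, not GSAF's planned implementation: the `Transaction` class, the `upload` helper, the three dictionaries standing in for the SE, the File Catalogue and the Metadata Catalogue, and the SE host name are all hypothetical.

```python
class Transaction:
    """Minimal undo-log transaction: each executed step registers a
    compensating action that rollback() runs in reverse order."""

    def __init__(self):
        self._undo_log = []

    def execute(self, action, undo):
        result = action()            # if this raises, nothing is logged
        self._undo_log.append(undo)
        return result

    def commit(self):
        self._undo_log.clear()       # steps become permanent

    def rollback(self):
        while self._undo_log:
            self._undo_log.pop()()   # undo the most recent step first


# In-memory stand-ins for three independent services.
storage, catalogue, metadata = {}, {}, {}


def store_metadata(lfn, attributes):
    if not attributes:
        raise ValueError("metadata is required")
    metadata[lfn] = attributes


def upload(lfn, payload, attributes):
    tx = Transaction()
    try:
        tx.execute(lambda: storage.update({lfn: payload}),
                   lambda: storage.pop(lfn, None))
        tx.execute(lambda: catalogue.update({lfn: "se1.example.org"}),
                   lambda: catalogue.pop(lfn, None))
        tx.execute(lambda: store_metadata(lfn, attributes),
                   lambda: metadata.pop(lfn, None))
    except Exception:
        tx.rollback()                # leave no partial upload behind
        return False
    tx.commit()
    return True


ok = upload("/grid/example/a", b"data", {"title": "A"})
failed = upload("/grid/example/b", b"data", {})
```

After the second call fails at the metadata step, the storage and catalogue entries written earlier in that call are removed again, so the three stores stay consistent with one another.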
Conclusions
GSAF means
– A useful API for developing Grid Storage based applications
– A useful and simple web interface for accessing Data Management Services remotely
Extremely flexible, multi-platform and multi-user
– to be a cross-application-domain plug-in
Comfortable usage of the Web Interface
– to be a simple Content Management Tool for managing data remotely
Candidate for the EGEE RESPECT Program
– to become recommended external software for the EGEE middleware