FESR
Consorzio COMETA - PI2S2 Project
GSAF: Grid Storage Access Framework
Salvatore Scifo, Consorzio COMETA, PI2S2 Project
User Tutorial, Palermo (Italy), Dec 10th - 12th 2007

Outline
The Context
  – TRIGRID Virtual Laboratory
  – ADAT
  – PI2S2
Data Grid Oriented Application
  – GRID Web Integration
  – GRID Oriented Digital Repository
  – Classic Application vs Grid Application
  – Designers' point of view
  – Users' point of view
  – GSAF Solution
  – GSAF Architecture
  – GSAF Validation Tool
Summary & Outlook

Context
TriGrid VL (Trinacria Grid Virtual Laboratory)
  – Regional project
      Small and medium enterprises
      INFN (Istituto Nazionale di Fisica Nucleare)
ADAT (Archivi Digitali Antico Testo)
  – Regional project
      Small and medium enterprises
      Diocesan Museum of Catania
      Department of Physics of the University of Catania
Consorzio Cometa
  – PI2S2 project
      Small and medium enterprises
      UniCT, UniPA, UniMe

What is GRID?
GRID Computing
  – Several heterogeneous computational resources are combined to solve problems that are hard to solve with traditional systems, computer clusters included.
GRID Storage
  – Several heterogeneous storage devices and technologies are combined to store and deliver millions of data items (on the order of petabytes).

Think of the GRID as a Digital Repository
Storage virtualization
  – A single, uniform interface to manage DATA, provided by the grid middleware
  – A single, uniform interface to manage METADATA, provided by the grid middleware
  – Ability to handle large and numerous files, even in a geographically distributed environment
  – Ubiquity: data can be accessed regardless of their location
Security capabilities
  – Centralized access control based on X.509 certificates and user roles, according to the policies of the Virtual Organizations that users belong to
Availability, scalability, fault tolerance

Grid Web Integration Process
Designing and developing Web applications on the Grid is not easy.
Main problems
  – Simple systems that let users manage content for generic applications (e.g. web portals, digital libraries, authoring systems, …) are missing
  – Application developers need specific technical knowledge of Grid Services
  – Design patterns to help software engineers are missing
Developers must implement the same code every time
What about "write once, use anywhere"?
  – Independence of Grid Data Services
  – Atomicity, coherence and synchronization of sequential operations on data are missing

Classic Web Application
The Data Presentation Layer consists of all the graphical interfaces that let users interact with the application
The Data Business Layer collects all the software components that implement the behavior of the application
The Data Access Layer is made up of the software components that let the application manage data (ASCII files, XML files, digital objects, metadata, SQL data)
Classic Data Access Layer components interact with several types of data sources
  – File system (for data stored in files)
  – Relational Database Management System (for data organized in SQL tables)

Grid Web Application
Aspects of porting to the Grid environment
  – Files are stored inside a Storage Element (SE)
  – Files can be replicated on several SEs for ubiquity, security and sharing needs; the relationships among file locations, replicas and their identifiers are kept in a dedicated File Catalogue Service
  – Each file can be associated with descriptive metadata, organized through a dedicated Metadata Catalogue Service
Technical approach
  – Replace the Data Access Layer with an appropriate interface that lets:
      business components manage data stored within the Data Management Services (DMS)
      presentation objects search and retrieve data from the DMS

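
The technical approach above can be pictured as a small Java interface that takes the place of the Data Access Layer. This is only a minimal sketch: the names `GridDataAccess`, `InMemoryGridDataAccess`, `store`, `retrieve` and `searchByMetadata` are hypothetical and are not the GSAF API, and the in-memory maps merely stand in for the Storage Element and the Metadata Catalogue.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interface standing in for the Data Access Layer.
interface GridDataAccess {
    // For business components: store a file together with its descriptive metadata.
    void store(String logicalName, byte[] content, Map<String, String> metadata);

    // For presentation objects: fetch a file by logical name (null if absent).
    byte[] retrieve(String logicalName);

    // For presentation objects: logical names whose metadata contain key=value.
    List<String> searchByMetadata(String key, String value);
}

// In-memory stand-in (an assumption for illustration); a real
// implementation would call the Grid data and metadata services instead.
class InMemoryGridDataAccess implements GridDataAccess {
    private final Map<String, byte[]> storageElement = new HashMap<>();
    private final Map<String, Map<String, String>> metadataCatalogue = new HashMap<>();

    @Override
    public void store(String logicalName, byte[] content, Map<String, String> metadata) {
        storageElement.put(logicalName, content.clone());
        metadataCatalogue.put(logicalName, new HashMap<>(metadata));
    }

    @Override
    public byte[] retrieve(String logicalName) {
        byte[] content = storageElement.get(logicalName);
        return content == null ? null : content.clone();
    }

    @Override
    public List<String> searchByMetadata(String key, String value) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> entry : metadataCatalogue.entrySet()) {
            if (value.equals(entry.getValue().get(key))) {
                matches.add(entry.getKey());
            }
        }
        return matches;
    }
}
```

With such an interface, business and presentation code depends only on `GridDataAccess`, so the storage back end can change without touching the upper layers.
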
GSAF Functional Requirements
Managing metadata schemas
Managing ACLs for access to metadata
Managing ACLs for access to data
Uploading files to the SE (coherence)
Deleting files (coherence)
Browsing the Metadata Catalogue / File Catalogue
Searching files by metadata

Design Troubles
Developing applications (web or desktop) is not easy
The fragmentation of the Grid services forces software engineers to adopt a vertical architecture
Every application must itself take care of the atomicity, coherence and synchronization of data manipulation
  – Grid Data Services are independent of each other
  – They work in "stand-alone" mode
  – No coherence among them is ensured

GSAF Solution
Object-oriented framework
  – Built on top of the Grid Metadata Service and the Grid Data Service
  – Exposes classes and related methods to the applications above it
Main objectives
  – Hiding the complexity and fragmentation of the underlying APIs
  – Grouping functional requirements shared among applications
  – Ensuring atomicity across different data manipulations
  – Synchronizing the Data and Metadata Catalogue subtrees

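
One way to picture "ensuring atomicity" is an upload that either completes in both the data service and the metadata service or leaves no trace in either. The sketch below assumes a simple compensation strategy (delete the file again if the metadata write fails). `DataService`, `MetadataService`, `StorageFacade` and their methods are hypothetical names, not the GSAF or middleware API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the Grid Data Service.
class DataService {
    final Map<String, byte[]> files = new HashMap<>();

    void put(String path, byte[] content) { files.put(path, content.clone()); }

    void delete(String path) { files.remove(path); }
}

// Hypothetical stand-in for the Grid Metadata Service.
class MetadataService {
    final Map<String, Map<String, String>> entries = new HashMap<>();
    boolean failNextWrite = false; // lets the example simulate a service failure

    void addEntry(String path, Map<String, String> attributes) {
        if (failNextWrite) {
            failNextWrite = false;
            throw new IllegalStateException("metadata service unavailable");
        }
        entries.put(path, new HashMap<>(attributes));
    }
}

// Facade that hides the two services and keeps them coherent.
class StorageFacade {
    private final DataService data;
    private final MetadataService metadata;

    StorageFacade(DataService data, MetadataService metadata) {
        this.data = data;
        this.metadata = metadata;
    }

    // Returns true if file and metadata were both stored. If the metadata
    // write fails, the already-uploaded file is deleted again (compensation).
    boolean upload(String path, byte[] content, Map<String, String> attributes) {
        data.put(path, content);
        try {
            metadata.addEntry(path, attributes);
            return true;
        } catch (RuntimeException e) {
            data.delete(path); // undo the first step so no orphan file remains
            return false;
        }
    }
}
```

The application calls a single `upload` method and never sees a file without metadata, which is the kind of coherence the independent Grid services do not provide on their own.
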
GSAF System Architecture
The core of the application is designed as a plug-in.
Its design uses several object-oriented design patterns (Singleton, Strategy, Factory Method, Template Method, Iterator and Composite).
This gives a clean and simple software architecture with high cohesion and loose coupling.
Built on top of the Data Management Services of the Grid middleware

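
As an illustration of how some of the patterns named above can work together, the following sketch combines Composite (a catalogue tree of directories and files), Factory Method (`loadRoot`) and Template Method (`listFiles`). It shows the patterns only, not the actual GSAF class design; every class and method name here is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Composite: a catalogue node is either a file entry or a directory of nodes.
abstract class CatalogueNode {
    final String name;

    CatalogueNode(String name) { this.name = name; }

    // Appends the full paths of all file entries under this node.
    abstract void collectFiles(String parentPath, List<String> out);
}

class FileEntry extends CatalogueNode {
    FileEntry(String name) { super(name); }

    @Override
    void collectFiles(String parentPath, List<String> out) {
        out.add(parentPath + "/" + name);
    }
}

class DirectoryEntry extends CatalogueNode {
    private final List<CatalogueNode> children = new ArrayList<>();

    DirectoryEntry(String name) { super(name); }

    DirectoryEntry add(CatalogueNode child) {
        children.add(child);
        return this; // allows chained construction
    }

    @Override
    void collectFiles(String parentPath, List<String> out) {
        String path = parentPath + "/" + name;
        for (CatalogueNode child : children) {
            child.collectFiles(path, out); // same call for files and directories
        }
    }
}

abstract class CatalogueBrowser {
    // Factory Method: subclasses decide which concrete tree to build.
    protected abstract CatalogueNode loadRoot();

    // Template Method: fixed browsing algorithm, variable data source.
    final List<String> listFiles() {
        List<String> out = new ArrayList<>();
        loadRoot().collectFiles("", out);
        return out;
    }
}

// Demo subclass with a hard-coded tree; a real one would query a catalogue.
class DemoCatalogueBrowser extends CatalogueBrowser {
    @Override
    protected CatalogueNode loadRoot() {
        return new DirectoryEntry("grid")
                .add(new FileEntry("readme.txt"))
                .add(new DirectoryEntry("adat").add(new FileEntry("page1.tif")));
    }
}
```

Calling `new DemoCatalogueBrowser().listFiles()` walks the tree depth-first and returns the file paths in insertion order.
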
Class Diagram
Object model
Logical entities / Java object mapping
Iterator pattern
Factory Method pattern
Command pattern
Template Method pattern

User Troubles
Users can only use command-line tools
These tools are installed on specific machines called User Interfaces (UIs), located inside the Grid network boundaries
Users run into several network access problems
A user may have a personal UI
  – Who controls and ensures the security of the UIs?
  – The user himself?!
Users must keep all the logical relationships among data and metadata in mind

GSAF Web Interface vs Command Line
GSAF Web Interface to manage data and their metadata remotely
  – Initially, the main purpose of this application was to serve as a natural tester of the framework's functionality
  – Now it is a useful tool to administer Grid storage over the internet
Why a web interface?
  – A web interface is the easiest approach for new users who have no specific knowledge of the Grid environment
  – No syntax rules are required, and users do not lose the high-level view of data or of metadata schemas
  – Immediate interaction, thanks to comfortable, user-friendly guided procedures that make training and learning faster
  – A web application needs only a simple internet connection, so it avoids any dependency on the Grid UI machines

ADAT project

Use Cases
ADAT Project
  – Embeds GSAF within the Digital Archive software
Aiuri (Project COPPE/UFRJ - BRAZIL)
  – Aims to implement a Grid-oriented platform to support data and text mining applications
BM Portal project (Bio-Lab, DIST, University of Genoa)
  – Embeds the GSAF framework as a plug-in
GILDA Team
  – Adopts the GSAF web interface for dissemination and training purposes

T-GSAF (coming soon)
Serialization of multiple operations
Transaction pattern
  – execute()
  – commit()
  – rollback()

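
The slide names only the three methods, so the following is a guess at how such a transaction could behave, not the T-GSAF design. It assumes each operation knows how to undo itself. `GridOperation`, `GridTransaction`, `apply`, `undo` and `add` are hypothetical names; only `execute()`, `commit()` and `rollback()` come from the slide.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical reversible operation on Grid data or metadata.
interface GridOperation {
    void apply();
    void undo();
}

// Hypothetical transaction that serializes several operations.
class GridTransaction {
    private final List<GridOperation> pending = new ArrayList<>();
    private final Deque<GridOperation> applied = new ArrayDeque<>();

    void add(GridOperation op) { pending.add(op); }

    // Runs the pending operations in order and stops at the first failure.
    // Returns true only if every operation succeeded.
    boolean execute() {
        boolean ok = true;
        for (GridOperation op : pending) {
            try {
                op.apply();
                applied.push(op); // remember it for a possible rollback
            } catch (RuntimeException e) {
                ok = false;
                break;
            }
        }
        pending.clear();
        return ok;
    }

    // Makes the applied operations permanent by discarding the undo log.
    void commit() { applied.clear(); }

    // Undoes the applied operations, most recent first.
    void rollback() {
        while (!applied.isEmpty()) {
            applied.pop().undo();
        }
    }
}
```

A caller would typically run `execute()`, then call `commit()` if it returned true and `rollback()` otherwise, so a partially completed sequence never stays visible.
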
Outlook
File replica support
VOMS integration
  – To let users obtain a proxy starting from their certificate
ACLs at the Disk Pool Manager (DPM) level
  – For coherence between File Catalogue permissions and DPM permissions

Conclusions
GSAF means
  – A useful API to develop Grid-storage-based applications
  – A useful and simple web interface to access Data Management Services remotely
Extremely flexible, multi-platform and multi-user
  – To be a cross-domain application plug-in
Comfortable use of the Web Interface
  – To be a simple content management tool to manage data remotely
Candidate for the EGEE RESPECT Program
  – To become recommended external software for the EGEE middleware

Questions…
Thank you very much for your kind attention!