How To Integrate an Application on Grid
Valeria Ardizzone, INFN Catania
Tutorial for the New Hires of the PI2S2 Project (Tutorial per i Neo Assunti del progetto PI2S2)
Messina, 09-11.01.2007
Contents
  - Basic concepts: VOs, Grids and Grid Applications
  - Types of Grid Applications
  - How to integrate an application on the Grid
      - Gridification levels
      - Installation of applications on resources
      - Invocation of applications
BASIC CONCEPTS
  [Diagram: Virtual Organisations, Computational Grids, Grid Applications, Job]
Virtual Organisation definition
A Virtual Organisation is:
  - People from different institutions working towards a common goal
  - Sharing distributed processing and data resources
The four pillars of Grid Computing
Virtual Organisations and Grids
  - Each grid is an infrastructure enabling one or more “virtual organisations” to share and access resources
  - Each resource is exposed to the grid through an abstraction that masks heterogeneity, e.g.
      - Multiple diverse computational platforms
      - Multiple data resources
  - Resources are usually owned by VO members; negotiations lead to VOs sharing resources
Grid and Application Development
Layers, from the lowest to the highest:
  - Basic Grid services: AA (authentication and authorisation), job submission, info, …
  - Higher-level grid services (brokering, …)
  - Application toolkits, …
  - Application
Application development in the Grid implies the exploitation of APIs, tools and environments that provide the four basic Grid capabilities in order to perform complex tasks and achieve diverse goals. The extent to which, and the way in which, the four basic Grid concepts are realised depends on the specific capabilities of the Grid-enabling technologies.
VO and Applications
  - What is being shared?
      - Resources: storage and/or compute cycles
      - Software and/or data
  - Distinct groups of developers and of users?
      - Some VOs have distinct groups of developers and users, e.g. biomedical applications used by clinicians
      - Some don’t, e.g. physics application developers who share data but write their own analyses
Job concept
  - gLite/LCG middleware follows the job submission concept for application execution and resource sharing.
  - A job is a self-contained entity that packages and conveys all the information and artifacts required for the successful remote execution of an application:
      - Executable files
      - Input/output data
      - Parameters
      - Environment
      - Infrastructure requirements
      - Workflows
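To make the packaging idea concrete, here is a minimal sketch that collects the parts of a job in one object and renders them as a JDL (Job Description Language) text. The class `JobDescription`, the file names and the `SIM_MODE` variable are hypothetical and used only for illustration; the attribute names (`Executable`, `Arguments`, `StdOutput`, `StdError`, `InputSandbox`, `OutputSandbox`, `Environment`, `Requirements`) are standard JDL attributes.

```python
from dataclasses import dataclass, field


@dataclass
class JobDescription:
    """Hypothetical container for the pieces a job packages together."""
    executable: str
    arguments: str = ""
    input_sandbox: list = field(default_factory=list)   # files shipped with the job
    output_sandbox: list = field(default_factory=list)  # files returned to the user
    environment: dict = field(default_factory=dict)     # variables set on the worker node
    requirements: str = ""                               # constraints on the target resource

    def to_jdl(self) -> str:
        """Render the job as a sequence of 'Attribute = value;' lines."""
        def quoted_list(items):
            return "{" + ", ".join(f'"{item}"' for item in items) + "}"

        lines = [f'Executable = "{self.executable}";']
        if self.arguments:
            lines.append(f'Arguments = "{self.arguments}";')
        lines.append('StdOutput = "std.out";')
        lines.append('StdError = "std.err";')
        if self.input_sandbox:
            lines.append(f"InputSandbox = {quoted_list(self.input_sandbox)};")
        # The stdout and stderr files are always added to the returned files.
        outputs = ["std.out", "std.err"] + list(self.output_sandbox)
        lines.append(f"OutputSandbox = {quoted_list(outputs)};")
        if self.environment:
            pairs = [f"{key}={value}" for key, value in self.environment.items()]
            lines.append(f"Environment = {quoted_list(pairs)};")
        if self.requirements:
            lines.append(f"Requirements = {self.requirements};")
        return "\n".join(lines)


job = JobDescription(
    executable="run_sim.sh",
    arguments="--events 1000",
    input_sandbox=["run_sim.sh", "config.txt"],
    output_sandbox=["result.dat"],
    environment={"SIM_MODE": "fast"},
)
print(job.to_jdl())
```

The resulting text could be saved to a `.jdl` file and handed to the submission tools available on the User Interface.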
Typical current grid
  - Virtual organisations negotiate with sites to agree access to resources
  - Grid middleware runs on each shared resource to provide
      - Data services
      - Computation services
      - Single sign-on
  - Distributed services (both people and middleware) enable the grid
TYPES OF APPLICATIONS
Simulation
Characteristics:
  - Jobs are CPU-intensive
  - Large number of independent jobs
  - Run by few (expert) users
  - Small input; large output
Needs:
  - Batch-system services
  - Minimal data management for storage of results
Examples: LHC Monte Carlo simulation, Fusion
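Because simulation jobs are independent of one another, a campaign can be prepared as a simple parameter sweep. The sketch below generates one JDL text per random seed. The script name `mc_sim.sh`, its `--seed` option and the file naming are hypothetical; the sketch also assumes that the script itself stores its large result file, so only the small log files come back in the OutputSandbox.

```python
def sweep_jdls(executable, seeds):
    """Return a mapping {jdl_file_name: jdl_text}, one independent job per seed."""
    jdls = {}
    for seed in seeds:
        jdls[f"sim_{seed}.jdl"] = "\n".join([
            f'Executable = "{executable}";',
            f'Arguments = "--seed {seed}";',
            'StdOutput = "std.out";',
            'StdError = "std.err";',
            f'InputSandbox = {{"{executable}"}};',
            # Only the logs are returned; the large output is assumed to be
            # stored elsewhere by the script itself.
            'OutputSandbox = {"std.out", "std.err"};',
        ])
    return jdls


jdls = sweep_jdls("mc_sim.sh", range(1, 4))
for name in sorted(jdls):
    print(name)
```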
Bulk Processing
Characteristics:
  - Widely-distributed input data
  - Significant amount of input and output data
Needs:
  - Job management tools (workload management)
  - Meta-data services
  - More sophisticated data management
Examples: HEP processing of raw data, analysis, Earth observation data processing
Responsive Apps (I)
Characteristics:
  - Small amounts of input and output data
  - Not CPU-intensive
  - Short response time (few minutes)
Needs:
  - Configuration which allows “immediate” execution (QoS)
  - Services must treat jobs with minimum latency
Examples: Prototyping new applications, monitoring grid operations
Responsive Apps (II)
Characteristics:
  - Rapid response: a human waiting for the result!
  - Many small but CPU-intensive tasks
  - User is not aware of “grid”!
Needs:
  - Interfacing (data & computing) with non-grid application or portal
  - User and rights management between front-end and grid
Examples: Applications that use the Grid as a backend infrastructure (gMOD, gLibrary, Hadrontherapy, GATE, Interactive Analysis of Medical images, Volcano Sonification)
WORKFLOW
Characteristics:
  - Use of grid and non-grid services
  - Complex set of algorithms for the analysis
  - Complex dependencies between individual tasks
Needs:
  - Tools for managing the workflow itself
  - Standard interfaces for services (i.e. web services)
Examples: Flood prediction
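The dependencies between tasks can be pictured as a directed acyclic graph. The sketch below is only an illustration of that idea, not a grid workflow tool: it uses Python's standard `graphlib` module (Python 3.9 or later) to group tasks into stages whose members could run as parallel jobs. The task names are hypothetical, loosely inspired by the flood prediction example.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it depends on.
dependencies = {
    "fetch_rainfall": set(),
    "fetch_terrain": set(),
    "hydrological_model": {"fetch_rainfall", "fetch_terrain"},
    "hydraulic_model": {"hydrological_model"},
    "visualise": {"hydraulic_model"},
}

sorter = TopologicalSorter(dependencies)
sorter.prepare()

stages = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())  # tasks whose dependencies have all completed
    stages.append(ready)                # these could be submitted as parallel jobs
    sorter.done(*ready)                 # mark them complete to unlock the next stage

print(stages)
```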
PARALLEL JOBS
Characteristics:
  - Many interdependent, communicating tasks
  - Many CPUs needed simultaneously
  - Use of MPI libraries
Needs:
  - Configuration of resources for flexible use of MPI
  - Pre-installation of optimized MPI libraries
HOW TO INTEGRATE AN APPLICATION ON GRID
Complexities of grid applications
  - Simple jobs: submitted to WMS to run in batch mode
  - Job invokes grid services
      - To read & write files on SE
      - Monitoring
      - For outbound connectivity (interactive jobs)
      - To manage metadata
      - …
  - Complex jobs
      - An environment controls multiple jobs on users’ behalf
          - High-level services
          - Portals with workflow
          - Software written for the VO (or by the user)
Gridification Consequences
  - No development. Wrap existing applications as jobs; no source code modification is required.
  - Minor modifications. The application has minimal interaction with the grid services (e.g. data management).
  - Major modifications. A large portion of the code is rewritten to adapt to the new environment (e.g. parallelization, metadata, information).
  - Pure grid applications. Developed from scratch; they extensively exploit existing grid services to provide new capabilities customized for a specific domain (e.g. metadata, job management, credential management).
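The "no development" level can be sketched as a thin wrapper that runs the unmodified application and saves its output to a file the job can return. This is a minimal sketch under stated assumptions: the function `run_wrapped` and the file name `app.out` are hypothetical, and the Python interpreter stands in for a legacy executable.

```python
import subprocess
import sys


def run_wrapped(command, stdout_path="app.out"):
    """Run an unmodified application and save its output to a file for retrieval."""
    with open(stdout_path, "w") as out:
        completed = subprocess.run(command, stdout=out, stderr=subprocess.STDOUT)
    # The exit code is passed on so the job can report success or failure.
    return completed.returncode


# Stand-in for a legacy executable: the Python interpreter printing one line.
exit_code = run_wrapped([sys.executable, "-c", "print('legacy application ran')"])
print("exit code:", exit_code)
```

In a real job the wrapper would be the job's executable, with the application binary and the wrapper shipped together or pre-installed on the site.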
APIs
Service-oriented vs classic APIs
  - Classic APIs
      - Static compilation
      - Shared libraries
      - Libraries are transferred to …
  - Service-oriented APIs
      - Precompiled service clients
      - May consume Web Service stubs and develop new clients from scratch
Software Installation: Who and how?
  - The Experiment Software Manager (ESM) is the member of the experiment VO entitled to install application software at the different sites.
  - The ESM can manage (install, validate, remove...) experiment software on a site at any time through a normal Grid job, without prior notice to the site administrators.
  - Such a job generally has no scheduling priority and is treated like any other job of the same VO in the same queue.
Software Installation: Site configuration (I)
  - The site provides a dedicated space where each supported VO can install or remove software.
  - The amount of available space must be negotiated between the VO and the site.
  - An environment variable holds the path to this space. Its format is: VO_<name_of_VO>_SW_DIR
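A job can locate the software area by reading this variable at run time. The sketch below builds the variable name and looks it up. It assumes that the VO name is upper-cased and that characters not valid in a variable name (such as `.` or `-`) are replaced by `_`; the path `/opt/exp_soft/atlas` is a hypothetical value chosen for the demonstration.

```python
import os
import re


def sw_dir_variable(vo_name):
    """Build the variable name VO_<name_of_VO>_SW_DIR for a VO.

    Assumption: the VO name is upper-cased and characters that are not valid
    in a variable name (such as '.' or '-') are replaced by '_'.
    """
    return "VO_" + re.sub(r"[^A-Za-z0-9]", "_", vo_name).upper() + "_SW_DIR"


def sw_dir(vo_name, environ=os.environ):
    """Return the software area path for the VO, or None if it is not defined."""
    return environ.get(sw_dir_variable(vo_name))


print(sw_dir_variable("atlas"))
# A fake environment stands in for the one found on a worker node.
print(sw_dir("atlas", {"VO_ATLAS_SW_DIR": "/opt/exp_soft/atlas"}))
```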
Software Installation: validation process
  - Once the software is installed, the ESM can perform the validation.
  - Validation is a process, or a series of processes and procedures, that verifies the installation of the software.
  - It can be performed in the same job, right after installation (validation on the fly), or later on via a dedicated job.
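What a validation step checks is up to each VO. As one possible illustration, the sketch below verifies that the expected files exist in the installation directory and that they are executable. The function `validate_installation` and the file names are hypothetical, and a temporary directory stands in for the VO software area.

```python
import os
import tempfile


def validate_installation(install_dir, expected_files):
    """Return a list of problems: expected files that are missing or not executable."""
    problems = []
    for relative_path in expected_files:
        path = os.path.join(install_dir, relative_path)
        if not os.path.isfile(path):
            problems.append(f"missing: {relative_path}")
        elif not os.access(path, os.X_OK):
            problems.append(f"not executable: {relative_path}")
    return problems


# Demonstration on a throwaway directory standing in for the VO software area.
with tempfile.TemporaryDirectory() as area:
    binary = os.path.join(area, "myapp")
    with open(binary, "w") as handle:
        handle.write("#!/bin/sh\necho ok\n")
    os.chmod(binary, 0o755)
    report = validate_installation(area, ["myapp", "lib/libmyapp.so"])

print(report)
```

An empty report would mean that the installation passed this particular check; a fuller validation might also run the application on a small test input.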
Software Installation: publishing tags
  - If the ESM judges that the software is correctly installed, it publishes in the Information System (IS) the tag which unequivocally identifies the software.
  - The tag is added to the GlueHostApplicationSoftwareRunTimeEnvironment attribute of the IS, using the GRIS running on the CE.
  - Jobs that require a particular piece of software can be directed to the appropriate CE simply by setting special requirements in the JDL.
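Such a requirement is typically written in the JDL with the `Member` function applied to the GlueHostApplicationSoftwareRunTimeEnvironment attribute. The sketch below builds that expression for one or more tags; the helper `software_requirement` and the tag `VO-myvo-myapp-1.0` are hypothetical.

```python
def software_requirement(*tags):
    """Build a JDL Requirements line that demands every given software tag."""
    if not tags:
        raise ValueError("at least one tag is required")
    clauses = [
        f'Member("{tag}", other.GlueHostApplicationSoftwareRunTimeEnvironment)'
        for tag in tags
    ]
    # All tags must be published by the CE, so the clauses are joined with &&.
    return "Requirements = " + " && ".join(clauses) + ";"


print(software_requirement("VO-myvo-myapp-1.0"))
```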
Invocation of applications
  - From the UI
      - Command Line Interfaces / Scripts
      - APIs
      - Higher level tools
  - From portals (like GENIUS, developed by NICE srl)
      - For recurring tasks: “core grid services” as well as application layer
      - Accessible from any browser
      - Tailored to applications
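Invocation from the UI through scripts can be sketched as a thin layer over the command line tools. The sketch below assumes that the UI provides the gLite WMS commands `glite-wms-job-submit` and `glite-wms-job-status`; on other middleware releases the command names differ (for instance `glite-job-submit` or `edg-job-submit`). The helper functions and file names are hypothetical, and the demonstration only prints the command line rather than submitting anything.

```python
import shutil
import subprocess


def submit_command(jdl_file, jobid_file="jobids.txt"):
    """Command line that submits a JDL file and stores the job identifier in a file."""
    return ["glite-wms-job-submit", "-a", "-o", jobid_file, jdl_file]


def status_command(jobid_file="jobids.txt"):
    """Command line that queries the status of the jobs listed in the identifier file."""
    return ["glite-wms-job-status", "-i", jobid_file]


def run_if_available(command):
    """Run the command only when its executable is on PATH; return None otherwise."""
    if shutil.which(command[0]) is None:
        return None
    return subprocess.run(command, capture_output=True, text=True)


command = submit_command("job.jdl")
print(" ".join(command))
print("tool available:", shutil.which(command[0]) is not None)
```

On a machine with the tools installed and a valid proxy, `run_if_available(command)` would perform the actual submission.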
The End