EGEE-II INFSO-RI
Enabling Grids for E-sciencE

A glance to the future
Mike Mineter, TOE-NeSC
Application Developers Course, May 12-13, 2007, Manchester, UK

Overview
  WMProxy
  GridWay
    – RESPECT: EGEE initiative to collect useful tools that work with gLite
        See EGEE application portal: under construction
    – GridWay is one of the soon-to-be RESPECTed tools
  Projects to watch include
    – ETICS
    – OMII-Europe

New Functionality: WMProxy
  WMProxy server
    – Will replace the old C++ based socket connection service
    – Implements an interoperable interface
        Web Service based
        WS-I compliant
  WMProxy client
    – Provides a C++ based WMS command-line User Interface (UI), which executes all the needed operations automatically
    – Provides APIs in several languages (C++, Java and Python)

WMS Architecture overview
  [Figure: gLite WMS architecture. Components shown: User Interface, WMProxy, Workload Manager, Job Controller / CondorG, Job Controller / CondorC, Log Monitor, LB Proxy, LB Server, gLite CE, LCG CE]

JDL: Single Types
  Single Jobs
    – Normal: a simple batch job with no special requirements
    – MPICH: a parallel application to be run on the nodes of a cluster using the MPICH implementation of the Message Passing Interface
        Support for new MPI flavours is planned
    – Interactive: a job whose standard streams are forwarded to the submitting client, which can interact with and steer the job execution by providing real-time input
  Previously Supported Job Types
    – No longer supported:
        Checkpointable Jobs
        Partitionable Jobs
    – Deprecated due to
        Lack of feedback from users
        They appear not to be used at all
    – Focus is on improving support for the job types that are actually used

JDL: Compound Jobs
  Definition
    – Aggregation of Single/Normal Jobs
  Benefits
    – One-shot submission of (up to thousands of) jobs
        Single call to the WMProxy server
        Single AuthN and AuthZ process
        Reduced submission time
    – Single identifier to manage all jobs (father job)
        Not an actual job; used to monitor the whole group
    – Sharing of files between jobs
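
The one-shot submission of a compound job can be illustrated by assembling a single JDL description that aggregates many simple jobs. The following is a minimal sketch, not part of the course material: the helper names `jdl_string`, `node_jdl` and `collection_jdl` are hypothetical, and the attribute names (`Type = "Collection"`, `Nodes`, `Executable`, `Arguments`, `StdOutput`, `StdError`, `OutputSandbox`) are assumed to follow the gLite JDL specification and should be checked against the release in use.

```python
def jdl_string(value):
    """Quote a Python string as a JDL string literal."""
    return '"' + value.replace('\\', '\\\\').replace('"', '\\"') + '"'


def node_jdl(executable, arguments, index):
    """Return the JDL text for one sub-job (one node of the collection)."""
    out, err = f"std_{index}.out", f"std_{index}.err"
    return (
        "[\n"
        f"    Executable = {jdl_string(executable)};\n"
        f"    Arguments = {jdl_string(arguments)};\n"
        f"    StdOutput = {jdl_string(out)};\n"
        f"    StdError = {jdl_string(err)};\n"
        f"    OutputSandbox = {{{jdl_string(out)}, {jdl_string(err)}}};\n"
        "  ]"
    )


def collection_jdl(executable, argument_list):
    """Aggregate one sub-job per argument string into a single collection."""
    nodes = ",\n  ".join(
        node_jdl(executable, args, i) for i, args in enumerate(argument_list)
    )
    return (
        "[\n"
        '  Type = "Collection";\n'
        "  Nodes = {\n"
        f"  {nodes}\n"
        "  };\n"
        "]\n"
    )


if __name__ == "__main__":
    # Three sub-jobs described in one file, to be submitted in one call.
    print(collection_jdl("/bin/echo", [f"task {i}" for i in range(3)]))
```

The generated text would be saved to a file and handed to the WMProxy client once, so that authentication, authorisation and the server call happen a single time for the whole group.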

Middleware structure
  Applications have access both to Higher-Level Grid Services and to Foundation Grid Middleware
  Higher-Level Grid Services are meant to help users build their computing infrastructure but should not be mandatory
  Foundation Grid Middleware will be deployed on the EGEE infrastructure
    – Must be complete and robust
    – Should allow interoperation with other major grid infrastructures
    – Should not assume the use of Higher-Level Grid Services
  [Figure: layered stack, top to bottom]
    Applications
    Higher-Level Grid Services: Workload Management, Replica Management, Visualization, Workflow, Grid Economies, ...
    Foundation Grid Middleware: Security model and infrastructure, Computing (CE) and Storage Elements (SE), Accounting, Information and Monitoring

GridWay
    – One of the tools recognised by EGEE’s RESPECT
    – Alternative to WMS
  Examples of use:
    – Many similar jobs
    – Short jobs
    – Resources outside EGEE are also to be used
    – User-site-specific policies are required (priorities of users’ jobs)
    – …
The GridWay Metascheduler

Contents
  What is GridWay?
  What are the benefits of using GridWay?
  How do I use GridWay in EGEE?
  Who is using GridWay in EGEE?
  Where can I get GridWay?

What is GridWay?
  For the User
    – An LRMS-like environment for submitting, controlling and monitoring applications
    – A way to execute your applications on the Grid without having to worry about resource brokering, file staging or failures
  GridWay is a meta-scheduler that works on top of Globus-based services (e.g. GRAM, MDS & GridFTP) on a variety of infrastructures (EGEE, OSG, TeraGrid, NorduGrid…)
  For the Application Developer
    – A standards-based development framework for Grid applications
    – Java and C bindings of the DRMAA API
  For the System Administrator
    – A policy-driven job scheduler, implementing a wide range of access and Grid-aware policies

What is GridWay?
  Application-Infrastructure decoupling
  [Figure: layered view, top to bottom]
    Applications: DRMAA (.C, .java), command line interface ($> CLI), Results; standard API (OGF DRMAA)
    Grid Meta-Scheduler: GridWay; open source; job execution management; resource brokering
    Grid Middleware: Globus, gLite, …; basic Grid services; standard interfaces; end-to-end (e.g. TCP/IP)
    Infrastructure: PBS, SGE; highly dynamic & heterogeneous; high fault rate

What are the benefits of using GridWay?
  GridWay complements gLite, providing the following benefits

  Feature: Support for the DRMAA standard (C and Java bindings) & the JSDL standard
    Benefit: Compatibility of applications with DRM systems that implement the standard, such as SGE, Torque, ...
  Feature: DRM Command Line Interface (allows users to submit, kill, migrate, monitor and synchronize jobs)
    Benefit: CLI similar to that provided by local resource managers
  Feature: Lightweight middleware
    Benefit: Higher efficiencies for given application profiles
  Feature: Site-level accounting and scheduling policies
    Benefit: Analysis of resource utilization, determining trends in usage and monitoring user behavior
  Feature: Installation with minimal requirements; also portable (Mac, ...)
    Benefit: Easy and fast deployment and maintenance
  Feature: Interoperability with resources outside EGEE
    Benefit: Simultaneous access to different infrastructures (LCG, gLite…)

How do I use GridWay in EGEE?
  [Figure: GridWay architecture on top of the EGEE infrastructure]
    User Interface (UI) or site workstations: DRMAA library, CLI
    GridWay Core: Request Manager, Dispatch Manager, Scheduler, Job Pool, Host Pool
    Execution Manager (pre-WS GRAM, CE gatekeepers): Job Submission, Job Monitoring, Job Control, Job Migration
    Transfer Manager (GridFTP, SE/WN GridFTP servers): Job Preparation, Job Termination, Job Migration
    Information Manager (MDS2, GLUE, BDII information servers): Resource Discovery, Resource Monitoring

How do I use GridWay in EGEE?
  Job Template

    # Execution variables
    EXECUTABLE  = job
    ARGUMENTS   = ${TASK_ID} ${TOTAL_TASKS}
    ENVIRONMENT = LD_LIBRARY_PATH=/usr/local/lib

    # Resource selection parameters
    REQUIREMENTS = HOSTNAME = "*.dacya.ucm.es"
    RANK         = CPU_MHZ

    # I/O files
    INPUT_FILES  = my_inputfile
    OUTPUT_FILES = my_outputfile

    # Standard streams
    STDOUT_FILE = stdout_file.${TASK_ID}
    STDERR_FILE = stderr_file.${TASK_ID}
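
The `${TASK_ID}` and `${TOTAL_TASKS}` variables let one template describe every task of an array job. The sketch below only emulates that substitution with Python's `string.Template` to show what each task would see; GridWay performs the expansion itself. The function names `expand_for_task` and `expand_array` are hypothetical, and numbering tasks from 0 is an assumption made for the illustration.

```python
from string import Template

# Subset of the job template above; only the lines that use variables matter here.
JOB_TEMPLATE = """\
EXECUTABLE  = job
ARGUMENTS   = ${TASK_ID} ${TOTAL_TASKS}
STDOUT_FILE = stdout_file.${TASK_ID}
STDERR_FILE = stderr_file.${TASK_ID}
"""


def expand_for_task(template_text, task_id, total_tasks):
    """Return the template text as seen by one task of the array."""
    return Template(template_text).substitute(
        TASK_ID=task_id, TOTAL_TASKS=total_tasks
    )


def expand_array(template_text, total_tasks):
    """Expand the template once per task (task ids assumed to start at 0)."""
    return [
        expand_for_task(template_text, task_id, total_tasks)
        for task_id in range(total_tasks)
    ]


if __name__ == "__main__":
    for text in expand_array(JOB_TEMPLATE, 3):
        print(text)
```

Because each task receives its own identifier and the total count, the executable can pick its own slice of the work, and each task writes to its own standard output and error files.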

How do I use GridWay in EGEE?
  gwps: displays job information and status

    USER JID AID TID DM EM START END EXEC XFER EXIT NAME HOST
    ruben done :31:57 15:44:08 0:10:01 0:01:26 0 job1.jt cluster.pnpi.nw.ru
    rgh done :31:58 15:44:11 0:09:59 0:01:26 0 MPI.jt e1.egee.fr.cgg.com
    rgh done :07:44 17:21:09 0:11:27 0:01:28 0 maratra.jt aquila.dacya.ucm.es
    nacho prol :07:47 --:--:-- 0:11:19 0:01:43 -- maratra.jt e1.egee.fr.cgg.com
    rgh done :41:29 17:55:07 0:11:29 0:01:27 0 maratra.jt heplnx201.pp.ac.uk
    rgh done :41:32 17:54:05 0:10:24 0:01:28 0 test.jt e1.egee.fr.cgg.com
    jlvazq pend :58:38 --:--:-- 0:54:06 0:58:37 -- test.jt gridgate.cs.tcd.ie

  gwhost: displays resource information and status

    HID OS ARCH MHZ %CPU MEM(F/T) DISK(F/T) N(U/F/T) LRMS HOSTNAME
    0 Scientific i /513 0/0 0/169/224 jobmanager-lcgpbs cg02.ciemat.es
    1 Scientific i /1536 0/0 0/2/30 jobmanager-lcgpbs lcgce01.jin.ru
    2 Scientific i /2048 0/0 0/1/98 jobmanager-lcgpbs lcg6.smsu.ru
    3 Scientific i /2048 0/0 0/0/6 jobmanager-pbs ce1.cgg.com
    4 Scientific i /2048 0/0 0/0/56 jobmanager-pbs cluster.nw.ru
    5 Linux x / / /1/1 Fork cygnus.ucm.es
    6 Linux x / / /2/2 SGE aquila.ucm.es
    7 Linux x / / /1/1 Fork draco.ucm.es
    8 Linux x /513 0/0 0/1/2 SGE ursa.ucm.es
    9 Linux x / / /2/2 PBS hydrus.ucm.es
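
The `REQUIREMENTS` and `RANK` lines of a job template decide which of the hosts reported by gwhost a job may use and in which order they are preferred. The following is a simplified sketch of that idea, not GridWay's scheduler: the function `select_hosts` is hypothetical, the host records and their `CPU_MHZ` values are invented for illustration, only a hostname wildcard requirement is handled, and treating a higher rank as more preferred is an assumption.

```python
from fnmatch import fnmatchcase

# Hypothetical host pool; the CPU_MHZ values are made up for this example.
HOSTS = [
    {"HOSTNAME": "aquila.dacya.ucm.es", "CPU_MHZ": 3200},
    {"HOSTNAME": "cygnus.dacya.ucm.es", "CPU_MHZ": 2400},
    {"HOSTNAME": "e1.egee.fr.cgg.com", "CPU_MHZ": 3600},
]


def select_hosts(hosts, hostname_pattern, rank_attribute):
    """Filter hosts by a hostname wildcard, then order them by rank.

    Mirrors REQUIREMENTS = HOSTNAME = "<pattern>" and RANK = <attribute>:
    hosts failing the requirement are dropped, the rest are sorted so that
    the highest value of the rank attribute comes first.
    """
    eligible = [h for h in hosts if fnmatchcase(h["HOSTNAME"], hostname_pattern)]
    return sorted(eligible, key=lambda h: h[rank_attribute], reverse=True)


if __name__ == "__main__":
    for host in select_hosts(HOSTS, "*.dacya.ucm.es", "CPU_MHZ"):
        print(host["HOSTNAME"], host["CPU_MHZ"])
```

Separating the filter from the ordering reflects the two template parameters: the requirement is a hard constraint, while the rank only expresses a preference among the hosts that satisfy it.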

How do I use GridWay in EGEE?
  Other Commands
    gwkill: signals a job (kill, stop, resume, reschedule)
    gwsubmit: submits a job or an array job
    gwwait: waits for the zombie state of a job (any, all, set)
    gwuser: displays information about users
    gwacct: prints accounting information
    gwhistory: displays job execution history

      HID START END PROLOG WRAPPER EPILOG MIGR REASON QUEUE HOST
      2 15:40:22 15:44:11 0:00:15 0:03:15 0:00:19 0:00: fusion e1.egee.fr.cgg.com
      1 15:36:22 15:40:09 0:00:09 0:03:21 0:00:17 0:00:00 err fusion e2.egee.cesga.es
      0 15:32:22 15:36:11 0:00:07 0:03:23 0:00:19 0:00:00 err fusion ce-egee.bifi.unizar

Who is using GridWay in EGEE?
  [Figure: GridWay deployments in EGEE. Labels shown: Users, GridWay users, DRMAA interface, VO Schedulers, GridWay, EGEE Resource Broker (EGEE RB), gLite services (BDII, GRAM, GridFTP), gLite, SGE Cluster, PBS Cluster; communities: Biomed, Fusion; applications: Massive Ray Tracing, CD-HIT workflow]

Where can I get GridWay?
  Download the software
    – From the GridWay webpage
    – From the ETICS repository
    – From the Globus CVS repository (cvs.globus.org)
  More Information
    – GridWay webpage
    – Application porting with GridWay
    – Infrastructures using GridWay

Two projects adding value to EGEE
  Significant for future application developers…
  ETICS
    – Build/test for grid services
    – Spin-off from gLite development & certification
    – Used in the OMII-Europe software repository (among others)
  OMII-Europe
    – Creating / re-engineering services that use standards
    – Effect will include bridge-building across grids
  The vision: a VO will be able to use services across gLite / UNICORE / Globus / CROWN / OMII-UK grids

EU project: RIO31844-OMII-EUROPE
What will OMII-Europe do?
  Initial focus on providing common interfaces and integration of major Grid software infrastructures
  Common interoperable services:
    – Data Access, Virtual Organisation Management, Portal, Accounting, Job Submission and Job Monitoring
    – Capability to add further services
  Infrastructure integration
    – Initial EGEE/UNICORE/Globus/CROWN interoperability
    – Interoperable security framework

OMII-Europe Repository and ETICS
  [Figure: repository and build workflow. Labels shown: Project Repository; Software Repository (CVS, Subversion, tar.gz, zip); Build Artefact Repository (rpm, deb, tar.gz, zip); Public View; ETICS (Build & Test); NMI Scripts; Condor Pools; NMI Build Config; NMI Test Config; Created Artefact]

Summary
  Application developers will benefit from upcoming functionality from:
    – gLite with WMProxy
    – RESPECT, in which GridWay is prominent
    – Related projects
        ETICS: build and test of grid services
        OMII-Europe: components that will permit a VO’s resources to span grids