The Catania Grid Engine and some implementations of the framework Diego Scardaci INFN The Catania Science Gateway Framework - EGI/CHAIN-REDS/eI4Africa Webinar, 15 May 2013
Outline: The Catania Grid & Cloud Engine; Job Engine; Code Sample; MyWorkspace & MyJobs; Data Engine; Implementations of the framework; Summary and conclusions
The Catania Science Gateway model: a Science Gateway embeds applications (App. 1, App. 2, ..., App. N) and serves users from different organisations with different roles and privileges (Administrator, Power User, Basic User). It sits on a standard-based (SAGA), middleware-independent Grid Engine, which gives access to the Grid/Cloud/Local middleware supported so far.
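The Catania Grid & Cloud Engine [architecture diagram]: Science Gateways (Science GW 1, 2, 3), built as Liferay portlets, call the Grid/Cloud Engine through the Science GW Interface. The engine comprises the Job Engine, the Data Engine and the Users Tracking & Monitoring module, backed by the Users Tracking DB, and reaches the Grid/Cloud/Local middleware through the SAGA/JSAGA API, alongside an eToken Server. The diagram labels components as New or Modified.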
Job Engine: The Job Engine is a set of libraries for developing applications that submit and manage jobs on a grid infrastructure; it is compliant with the OGF SAGA standard; it is optimized for use in a Web portal running on a J2EE application server (e.g. Glassfish, Tomcat, …); it can also be used in stand-alone mode; JSAGA is the SAGA implementation adopted.
A Simple API for Grid Applications (SAGA): SAGA is an API that provides the basic functionality required to build distributed applications, tools and frameworks; it is independent of the details of the underlying infrastructure (e.g., the middleware); SAGA is an OGF specification. Several implementations are available: a C++ and a Java implementation developed at Louisiana State University / CCT and Vrije Universiteit Amsterdam; a Java implementation developed at CCIN2P3; a Python implementation based on those above.
A Simple API for Grid Applications (SAGA): SAGA is composed of: SAGA Core Libraries, containing the SAGA base system, the runtime and the API packages (file management, job management, etc.); SAGA Adaptors, libraries providing access to the underlying grid infrastructure (adaptors are available for Globus, gLite, etc.). SAGA defines a standard: we then need an implementation!
JSAGA: JSAGA is a Java implementation of SAGA developed at CCIN2P3. JSAGA: enables uniform data and job management across different grid infrastructures/middleware; makes extensions easy, since adaptor interfaces are designed to minimize the coding effort needed to support new technologies/middleware; is OS independent, since most of the provided adaptors are written in pure Java and are tested on both Windows and Linux.
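The split between a core layer and middleware-specific adaptors can be illustrated with a minimal Java sketch. This is not the SAGA or JSAGA API: the names JobAdaptor and AdaptorRegistry, the example URLs and the returned strings are hypothetical, chosen only to show how a core layer might dispatch on the URL scheme of a resource manager.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical adaptor interface: one implementation per middleware.
interface JobAdaptor {
    String submit(String executable);
}

// Hypothetical core layer: picks an adaptor from the URL scheme of the resource.
class AdaptorRegistry {
    private final Map<String, JobAdaptor> adaptors = new HashMap<>();

    void register(String scheme, JobAdaptor adaptor) {
        adaptors.put(scheme, adaptor);
    }

    String submit(String resourceUrl, String executable) {
        int sep = resourceUrl.indexOf("://");
        if (sep < 0) {
            throw new IllegalArgumentException("No scheme in " + resourceUrl);
        }
        String scheme = resourceUrl.substring(0, sep);
        JobAdaptor adaptor = adaptors.get(scheme);
        if (adaptor == null) {
            throw new IllegalArgumentException("No adaptor for scheme " + scheme);
        }
        return adaptor.submit(executable);
    }
}

class AdaptorSketch {
    // Builds a registry with two fake adaptors that only format a string.
    static AdaptorRegistry demoRegistry() {
        AdaptorRegistry registry = new AdaptorRegistry();
        registry.register("wms", exe -> "wms-job:" + exe);
        registry.register("wsgram", exe -> "wsgram-job:" + exe);
        return registry;
    }

    public static void main(String[] args) {
        AdaptorRegistry registry = demoRegistry();
        System.out.println(registry.submit("wms://example.org:7443/service", "/bin/hostname"));
        System.out.println(registry.submit("wsgram://example.org:8443/GW", "/bin/hostname"));
    }
}
```

The calling code stays the same whichever middleware is behind the URL; supporting a new one means registering one more adaptor.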
Job Engine - Requirements: The Job Engine has been designed with the following requirements in mind:
Feature | Description | Status
Middleware Independent | Capacity to submit jobs to resources running different middleware | DONE
Easiness | Create code to run applications on the grid in a very short time | DONE
Scalability | Manage a huge number of parallel job submissions, fully exploiting the HW of the machine where the Job Engine is installed | DONE
Performance | Have a good response time | DONE
Accounting | Register every grid operation performed by the users | DONE
Fault Tolerance | Hide middleware failures from the final users | DONE
Workflow | Provide a way to easily create workflows | PARTIALLY DONE
Job Engine - Architecture [diagram]: user requests enter a Jobs Queue; Worker Threads (WT) for Job Submission send the jobs to the DCIs; Worker Threads (WT) for Job Check Status check the job status and get the output; the Monitoring Module and the Users Tracking DB complete the picture.
Job Engine Middleware Independent: JSAGA supports gLite, Globus, ARC, UNICORE, etc. Adding new adaptors to JSAGA is an easy job.
Job Engine Easiness: It allows developers to write applications that submit jobs on the grid in a very short time; a very intuitive API is exposed to the developers; MPI applications are supported; the developer only has to submit the job: the Job Engine periodically checks the job status and, when the job is done, automatically downloads the job output to the local machine.
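The "submit, then let the engine poll and download" behaviour described above can be sketched with the standard Java scheduler. This is only an illustration under stated assumptions, not the Job Engine implementation: pollUntilDone, the "RUNNING"/"DONE" status strings and the fake status source are all hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

class StatusPollingSketch {
    // Polls `status` every `periodMillis` ms until it reports "DONE", then runs `onDone`.
    // Returns the number of polls performed, or -1 if the job did not finish in 10 s.
    static int pollUntilDone(Supplier<String> status, Runnable onDone, long periodMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger polls = new AtomicInteger();
        CountDownLatch finished = new CountDownLatch(1);
        scheduler.scheduleAtFixedRate(() -> {
            if (finished.getCount() == 0) {
                return; // already done, ignore any late tick
            }
            polls.incrementAndGet();
            if ("DONE".equals(status.get())) {
                onDone.run(); // stands in for the automatic output download
                finished.countDown();
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
        try {
            return finished.await(10, TimeUnit.SECONDS) ? polls.get() : -1;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        } finally {
            scheduler.shutdownNow();
        }
    }

    // Fake job that reports RUNNING twice and DONE on the third check.
    static int demoPolls() {
        AtomicInteger calls = new AtomicInteger();
        Supplier<String> fakeStatus = () -> calls.incrementAndGet() < 3 ? "RUNNING" : "DONE";
        return pollUntilDone(fakeStatus, () -> System.out.println("downloading output"), 10);
    }

    public static void main(String[] args) {
        System.out.println("polls: " + demoPolls());
    }
}
```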
Job Engine Scalability (1/2): The Job Engine is able to manage a huge number of parallel job submissions, fully exploiting the HW of the machine where it is installed; it enqueues all the parallel requests received and serves them according to the HW capabilities; the Job Engine thread pools can be configured to optimally exploit the HW capabilities; a burst of parallel job submissions cannot degrade the Job Engine responsiveness, thanks to the protection provided by the thread pool mechanism.
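The protection offered by a thread pool can be seen in a small Java sketch based on java.util.concurrent. It is an assumption-laden illustration, not the Job Engine code: maxConcurrent is a hypothetical helper and the 5 ms sleep stands in for a slow middleware call. However many requests arrive, at most poolSize run at once while the rest wait in the executor's queue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class ThreadPoolSketch {
    // Runs `requests` simulated submissions on a fixed pool of `poolSize` threads and
    // returns the highest number of submissions that were in progress at the same time.
    static int maxConcurrent(int poolSize, int requests) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();
        for (int i = 0; i < requests; i++) {
            futures.add(pool.submit(() -> {
                int now = running.incrementAndGet();
                peak.accumulateAndGet(now, Math::max);
                try {
                    Thread.sleep(5); // stands in for a slow middleware call
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    running.decrementAndGet();
                }
            }));
        }
        try {
            for (Future<?> f : futures) {
                f.get(); // wait for every queued request to be served
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrency: " + maxConcurrent(4, 40));
    }
}
```

Sizing the pool to the machine is what keeps a burst from overwhelming it: the queue absorbs the excess requests.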
Job Engine Scalability (2/2): The response time is linear; it depends on the HW capabilities and on the thread pool configuration.
Job Engine Performance: All the delays due to grid interactions are hidden from the final users: the Job Engine provides asynchronous functions for each job management action (submit, check status, download output, cancel), so final users perceive a near-zero response time. The Job Engine is able to submit thousands of jobs in a short time: it submits jobs using a configurable thread pool; the Job Engine is able to submit jobs in less than 1 hour with 50 threads in the thread pool; this figure could be improved by increasing the number of threads in the thread pool.
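How an asynchronous call hides grid latency from the caller can be sketched with CompletableFuture from the Java standard library. The names slowSubmit and submitAsync, and the 50 ms delay, are hypothetical stand-ins; this is not the Job Engine API.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class AsyncSubmitSketch {
    // Hypothetical blocking call standing in for a real grid submission.
    static String slowSubmit(String name) {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "id-" + name;
    }

    // Returns immediately; the submission completes later on a pool thread.
    static CompletableFuture<String> submitAsync(String name, ExecutorService pool) {
        return CompletableFuture.supplyAsync(() -> slowSubmit(name), pool);
    }

    static String demo() {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            CompletableFuture<String> pending = submitAsync("test", pool);
            // The caller (e.g. a portlet) could return to the user here.
            return pending.join(); // joined only so the demo can show the result
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```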
Job Engine Accounting A very powerful accounting system is included in the job engine; It is fully compliant with EGI VO Portal Policy and EGI Grid Security Traceability and Logging Policy; The following values are stored in the DB for each job submitted: Users; Job Submission timestamp; Job Done timestamp; Application submitted; Job ID; Proxy used; VO; Site where the job is running (e.g., CE for EMI-gLite).
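The per-job values listed above map naturally onto a small record type. The sketch below is a hypothetical in-memory illustration: the class and field names, and the sample values in main, are assumptions and do not reflect the actual schema of the Users Tracking DB.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Hypothetical record of one submitted job; one field per value listed on the slide.
final class JobRecord {
    final String user;
    final Instant submitted;
    final Instant done; // null while the job is still running
    final String application;
    final String jobId;
    final String proxy;
    final String vo;
    final String site;

    JobRecord(String user, Instant submitted, Instant done, String application,
              String jobId, String proxy, String vo, String site) {
        this.user = user;
        this.submitted = submitted;
        this.done = done;
        this.application = application;
        this.jobId = jobId;
        this.proxy = proxy;
        this.vo = vo;
        this.site = site;
    }

    // Returns a copy of this record with the "job done" timestamp filled in.
    JobRecord markDone(Instant when) {
        return new JobRecord(user, submitted, when, application, jobId, proxy, vo, site);
    }
}

// In-memory stand-in for the tracking database.
class TrackingLog {
    private final List<JobRecord> records = new ArrayList<>();

    void add(JobRecord record) {
        records.add(record);
    }

    long countByUser(String user) {
        return records.stream().filter(r -> r.user.equals(user)).count();
    }

    public static void main(String[] args) {
        TrackingLog log = new TrackingLog();
        log.add(new JobRecord("user1", Instant.now(), null, "demo-app",
                "job-1", "robot-proxy", "example-vo", "ce.example.org"));
        System.out.println("jobs by user1: " + log.countByUser("user1"));
    }
}
```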
Job Engine Fault Tolerance: The Job Engine implements an advanced mechanism to guarantee job submission: developers can set an appropriate value of “shallow retry”; an automatic re-submission mechanism is triggered when a job is aborted; every failure is hidden from the final users.
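A generic re-submission loop gives the flavour of this mechanism. The sketch below is a plain retry wrapper written for illustration; submitWithRetry and its maxRetries parameter are hypothetical and do not reproduce the middleware's "shallow retry" semantics or the Job Engine's actual logic.

```java
import java.util.concurrent.Callable;

class RetrySketch {
    // Tries the submission once, then re-submits up to `maxRetries` more times on failure.
    static <T> T submitWithRetry(Callable<T> submission, int maxRetries) {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        RuntimeException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return submission.call();
            } catch (Exception e) {
                // Remember the failure and fall through to the next attempt.
                last = new IllegalStateException("attempt " + (attempt + 1) + " failed", e);
            }
        }
        throw last; // every attempt failed: only now does the caller see an error
    }

    // Fake submission that is "aborted" twice and succeeds on the third attempt.
    static int demoAttempts() {
        int[] attempts = {0};
        submitWithRetry(() -> {
            if (++attempts[0] < 3) {
                throw new IllegalStateException("job aborted");
            }
            return "job-ok";
        }, 3);
        return attempts[0];
    }

    public static void main(String[] args) {
        System.out.println("attempts needed: " + demoAttempts());
    }
}
```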
Job Engine Collection, Parametric & Workflow: Currently the Job Engine natively supports: collection jobs; parametric jobs; N-1 workflows. Support for DAGs (Directed Acyclic Graphs) will be added in the next releases.
Example Job Submission

// Resource Manager List
String wmsList[] = {
    "wms://wms-4.dir.garr.it:7443/glite_wms_wmproxy_server",
    "wms://wms005.cnaf.infn.it:7443/glite_wms_wmproxy_server"
};

// Job Description
GEJobDescription description = new GEJobDescription();
description.setExecutable("/bin/sh");
description.setArguments("hostname.sh");
description.setInputFiles("/home/mario/Documenti/hostname.sh");
description.setOutput("Output.txt");
description.setError("Error.txt");

// Submit the Job
JSagaJobSubmission tmpJSaga = new JSagaJobSubmission(description);
tmpJSaga.useRobotProxy("etokenserver.ct.infn.it", "8082", "332576f78a4fe70a e90cd11f", "gridit", "gridit", true);
tmpJSaga.setResourceManagerList(wmsList);
tmpJSaga.submitJobAsync("mtorrisi", " :8162", 1, "gLite Test job");
Example Workflow N-1 (1/3): Job 1, Job 2, Job 3, …, Job N feed a Final Job, which produces the OUTPUT.

// Define Infrastructures
InfrastructureInfo infrastructures[] = new InfrastructureInfo[2];

String wmsList[] = { "wms://wms005.cnaf.infn.it:7443/glite_wms_wmproxy_server" };
infrastructures[0] = new InfrastructureInfo("gridit", "ldap://gridit-bdii-01.cnaf.infn.it:2170", wmsList,
    "etokenserver.ct.infn.it", "8082", "332576f78a4fe70a e90cd11f", "gridit", "gridit");

String globusList[] = { "wsgram://xn03.ctsf.cdacb.in:8443/GW" };
infrastructures[1] = new InfrastructureInfo("GARUDA", "wsgram", "", globusList,
    "etokenserver.ct.infn.it", "8082", "332576f78a4fe70a e90cd11f", "gridit", "gridit");
Example Workflow N-1 (2/3)

// Define First Level Jobs
// Assumed declaration, not shown on the original slide:
ArrayList<GEJobDescription> descriptions = new ArrayList<GEJobDescription>();

for (int i = 0; i < 3; i++) {
    GEJobDescription description = new GEJobDescription();
    description.setExecutable("/bin/sh");
    switch (i) {
        case 0:
            description.setArguments("hostname.sh");
            description.setInputFiles("/home/diego/test_wf/hostname.sh");
            break;
        case 1:
            description.setArguments("ls.sh");
            description.setInputFiles("/home/diego/test_wf/ls.sh");
            break;
        case 2:
            description.setArguments("pwd.sh");
            description.setInputFiles("/home/diego/test_wf/pwd.sh");
            break;
    }
    // description.set…
    descriptions.add(description);
}
Example Workflow N-1 (3/3)

// Define Final Job
GEJobDescription finalJobDescription = new GEJobDescription();
finalJobDescription.setExecutable("/bin/sh");
finalJobDescription.setArguments("ls.sh");

// The outputs of the first-level jobs become input files of the final job
String tmp = "";
for (int i = 0; i < descriptions.size(); i++) {
    if (tmp.equals(""))
        tmp = descriptions.get(i).getOutput();
    else
        tmp += "," + descriptions.get(i).getOutput();
}
finalJobDescription.setInputFiles(tmp + ",/home/diego/test_wf/ls.sh,/home/diego/test_wf/ifconfig.sh");
finalJobDescription.setOutput("myOutput-FinalJob.txt");
finalJobDescription.setError("myError-FinalJob.txt");

// Submit the Workflow
JobCollection wf = new WorkflowN1("scardaci", "Workflow N-1", "/tmp", descriptions, finalJobDescription);
JobCollectionSubmission tmpJobCollectionSubmission = new JobCollectionSubmission(wf);
tmpJobCollectionSubmission.submitJobCollection(infrastructures, " :8162", 1);
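The N-1 dependency pattern itself (N independent jobs whose outputs feed one final job) can be sketched with the standard library alone. The code below uses CompletableFuture.allOf as an analogy and is not how the Job Engine implements WorkflowN1; the ".out" suffix and the "final(...)" string are hypothetical placeholders for real job outputs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

class WorkflowN1Sketch {
    // Runs every first-level job in parallel, then runs the final job on their outputs.
    static String run(List<String> firstLevelJobs) {
        List<CompletableFuture<String>> firstLevel = new ArrayList<>();
        for (String job : firstLevelJobs) {
            // Each fake job just produces the name of its output file.
            firstLevel.add(CompletableFuture.supplyAsync(() -> job + ".out"));
        }
        // allOf completes only when all N first-level jobs have completed.
        CompletableFuture<Void> allDone =
                CompletableFuture.allOf(firstLevel.toArray(new CompletableFuture[0]));
        return allDone
                .thenApply(ignored -> firstLevel.stream()
                        .map(CompletableFuture::join) // safe: every future is complete here
                        .collect(Collectors.joining(",")))
                .thenApply(inputs -> "final(" + inputs + ")") // the single final job
                .join();
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("hostname.sh", "ls.sh", "pwd.sh")));
    }
}
```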
My Workspace: User's Data; User Jobs; Help; User Job Map
MyJobs
Data Engine: Make interfaces simple for non-expert users; CLI-based Grid storage interfaces are not straightforward; Grid transactions require user certificates; the current protocols for managing grid storage elements are complex; there is very little or no support for access through modern browsers or other web-based applications.
File upload workflow [diagram; components: eTokenServer, User Tracking DB, DOGS DB]: 1. Sign in; 2. Upload request; 3. Proxy request; 4. Proxy transfer; 5. File upload; 6. Upload on Grid; 7. Update DB and tracking.
Data On Grid Services (DOGS): A file browser shows Grid files in a tree; the file system exposed by the SG is virtual; transfer from/to the Grid (by the SG) is done in a few clicks; users do not need to care about how and where their files are really located.
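The idea of a virtual file system, where users see logical paths while the gateway knows the physical locations, can be sketched as a simple catalogue. This is a hypothetical illustration and not the DOGS implementation; the class name, the virtual paths and the storage URLs are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class VirtualFileSystemSketch {
    // Maps the virtual path shown to the user to the physical location of the file.
    private final TreeMap<String, String> catalogue = new TreeMap<>();

    void register(String virtualPath, String physicalUrl) {
        catalogue.put(virtualPath, physicalUrl);
    }

    // Resolves a virtual path; the user never needs to see the returned URL.
    String resolve(String virtualPath) {
        String url = catalogue.get(virtualPath);
        if (url == null) {
            throw new IllegalArgumentException("Unknown file: " + virtualPath);
        }
        return url;
    }

    // Lists the virtual paths under a directory, in sorted order, as a file browser would.
    List<String> list(String virtualDir) {
        String prefix = virtualDir.endsWith("/") ? virtualDir : virtualDir + "/";
        List<String> result = new ArrayList<>();
        for (String path : catalogue.tailMap(prefix, true).keySet()) {
            if (!path.startsWith(prefix)) {
                break; // sorted keys: nothing after this point shares the prefix
            }
            result.add(path);
        }
        return result;
    }

    public static void main(String[] args) {
        VirtualFileSystemSketch fs = new VirtualFileSystemSketch();
        fs.register("/home/a.txt", "storage://se1.example.org/data/0001");
        fs.register("/home/b.txt", "storage://se2.example.org/data/0002");
        System.out.println(fs.list("/home"));
        System.out.println(fs.resolve("/home/a.txt"));
    }
}
```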
Grid & Cloud Engine Javadoc
The VRC-driven GISELA Science Gateway: The GISELA SG is a web site that allows users to fully exploit both the computing (jobs) and storage (data) services of the e-Infrastructure through a normal web browser. Multi-language support.
The VRC-driven GISELA SG: a new concept of Application Database. Links to the applications' run pages; a new concept of application registry, allowing users to directly submit selected high-impact applications on the GISELA e-infrastructure in an easy way.
The GISELA Science Gateway: Phylogenetics (MrBayes) Input Parameters
The GISELA Science Gateway: Feedback from the final review. The main technological achievement of the project was the adoption of a Science Gateway approach for the LA user communities, significantly decreasing the entry barrier for new users. The SG approach has proven to successfully answer the needs and expectations of user communities and subsequently be an attractor for Latin American stakeholders. Impact on the Latin American scientific and research community has been very good. The technical objectives have been met and even exceeded expectations. In particular, the Science Gateway approach has taken advantage of current developments and considered sensibly the demands of user communities and future e-infrastructure providers.
DECIDE
The agINFRA Science Gateway: R, a well-known statistical data analysis tool
Summary and conclusions: The Catania Grid & Cloud Engine provides developers with a powerful API to port applications to an e-Infrastructure in an easy way; the same code can be used to execute an application on different types of e-Infrastructure: Grid, Cloud, HPC, local; several projects have already adopted and successfully used the CSGF and the Grid & Cloud Engine.
References: The Catania Science Gateway Framework; Grid & Cloud Engine Javadoc: gateways.it/training-material; A Simple API for Grid Applications (SAGA); JSAGA
Thank you! Would you like to evaluate the CSGF and the Grid & Cloud Engine? Please write to