Published by Thomas Lawson. Modified over 9 years ago.

Rochester Institute of Technology
Job Submission
Andrew Pangborn & Myles Maxfield
10/19/2015, Service Oriented Cyberinfrastructure Lab, http://grid.rit.edu

The Grid?

The Problem
At one end are computing resources managed by batch queuing systems and other middleware
At the other end are end users and their jobs and applications
Software and protocols are needed to submit those jobs to the computing resources

Job Submission
More motivation stuff?

Batch Queuing Systems
Jobs are submitted directly to the batch queuing system
One or more queues
  – Priorities
Two common architectures
  – Client/server
  – Dynamic offloading
User credentials (delegation)
Jobs have states (e.g. Pending, Running)

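The queue, priority, and job-state ideas on this slide can be sketched as a toy scheduler. This is only an illustration, not how any particular batch system is implemented: the class names, the `slots` parameter, the extra `Done` state, and the rule that a larger priority number runs first are all assumptions made for the example.

```python
import heapq
from dataclasses import dataclass
from enum import Enum
from itertools import count


class JobState(Enum):
    PENDING = "Pending"
    RUNNING = "Running"
    DONE = "Done"  # assumed extra state, not named on the slide


@dataclass
class Job:
    name: str
    priority: int = 0  # assumption: larger value is served earlier
    state: JobState = JobState.PENDING


class BatchQueue:
    """Toy single-queue scheduler: highest priority first, FIFO among equals."""

    def __init__(self, slots: int):
        self.slots = slots        # how many jobs may run at the same time
        self._heap = []           # entries are (-priority, sequence, job)
        self._seq = count()       # tie-breaker that preserves submission order
        self.running = []

    def submit(self, job: Job) -> None:
        heapq.heappush(self._heap, (-job.priority, next(self._seq), job))

    def schedule(self) -> list:
        """Start pending jobs while free slots remain; return the jobs started."""
        started = []
        while self._heap and len(self.running) < self.slots:
            _, _, job = heapq.heappop(self._heap)
            job.state = JobState.RUNNING
            self.running.append(job)
            started.append(job)
        return started

    def finish(self, job: Job) -> None:
        """Mark a running job as done and free its slot."""
        self.running.remove(job)
        job.state = JobState.DONE
```

With one slot, a job submitted later but with a higher priority starts first; the lower-priority job stays Pending until `finish()` frees the slot and `schedule()` is called again.
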
Batch Queuing Systems
Important examples:
  – Portable Batch System
  – TORQUE
  – Xgrid
  – Sun Grid Engine
  – Load Sharing Facility
  – Condor

Portable Batch System (PBS)
Originally developed for NASA
Client/server architecture
Server: pbs_server
Client (execution daemon on each compute node): pbs_mom
Works with MPI through environment variables that PBS sets for the job script (e.g. PBS_NODEFILE)

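One way a job can use the variables PBS provides is to read the node file and size an MPI launch from it. The sketch below assumes the variable meant here is `PBS_NODEFILE`, which names a file listing one hostname per allocated processor slot; the helper names and the plain `mpirun -np` command line are illustrative choices, and real MPI launchers differ in how they take a host list.

```python
import os
from collections import Counter


def allocated_slots(nodefile_path=None) -> Counter:
    """Count processor slots per host from a PBS node file.

    Falls back to the PBS_NODEFILE environment variable, which is only
    set inside a running PBS job.
    """
    path = nodefile_path or os.environ["PBS_NODEFILE"]
    with open(path) as handle:
        return Counter(line.strip() for line in handle if line.strip())


def mpirun_command(program: str, slots: Counter) -> list:
    """Build an argument list that starts one MPI rank per allocated slot."""
    total = sum(slots.values())
    return ["mpirun", "-np", str(total), program]
```

Inside a job, `mpirun_command("./a.out", allocated_slots())` would yield a command sized to whatever the scheduler granted, so the script does not hard-code a process count.
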
PBS Example
    litherum@gras:~$ cat test.sh
    #!/bin/sh
    #testpbs
    echo This is a test
    echo today is `date`
    echo This is `hostname`
    echo The current working directory is `pwd`
    ls -alF /home
    uptime

PBS Example
    litherum@gras:~$ qsub test.sh
    6.gras.carrion.rit.edu
    litherum@gras:~$ qstat
    Job id                    Name             User            Time Use S Queue
    ------------------------- ---------------- --------------- -------- - -----
    6.gras                    test.sh          litherum        00:00:00 C batch
    litherum@gras:~$ cat test.sh.o6
    This is a test
    today is Sat Jan 17 18:20:20 EST 2009
    This is carrion02
    The current working directory is /home/litherum
    total 20
    drwxr-xr-x 31 litherum litherum 4096 Jan 17 18:19 litherum/
    18:20:20 up 131 days, 21:20, 0 users, load average: 0.00, 0.00, 0.00

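Scripts that wrap the queue often need the `qstat` listing as data rather than text. The following is a minimal sketch that parses the default six-column listing shown on this slide; it assumes job names contain no spaces, and the state-letter table covers only common TORQUE letters. The function and type names are made up for the example.

```python
from typing import NamedTuple


class QstatRow(NamedTuple):
    job_id: str
    name: str
    user: str
    time_use: str
    state: str
    queue: str


# Common TORQUE state letters (not an exhaustive list).
STATE_NAMES = {
    "Q": "Queued",
    "R": "Running",
    "E": "Exiting",
    "C": "Completed",
    "H": "Held",
}

SAMPLE = """\
Job id                    Name             User            Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
6.gras                    test.sh          litherum        00:00:00 C batch
"""


def parse_qstat(text: str) -> list:
    """Return one QstatRow per job line, skipping the header and ruler."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # The header splits into 8 words; the ruler is made only of dashes.
        if len(fields) != 6 or set(fields[0]) == {"-"}:
            continue
        rows.append(QstatRow(*fields))
    return rows
```

Applied to `SAMPLE`, this yields a single row for job `6.gras`, whose state letter `C` maps to "Completed", matching the fact that the output file `test.sh.o6` already exists.
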
Torque
Built on top of PBS (forked from OpenPBS)
Supports reservations: specific resources can be reserved for specific times
Supports partitions: a cluster can be split into smaller sub-clusters

Torque
    litherum@gras:~$ showq
    ACTIVE JOBS--------------------
    JOBNAME            USERNAME      STATE  PROC   REMAINING            STARTTIME

         0 Active Jobs       0 of    4 Processors Active (0.00%)
                             0 of    2 Nodes Active      (0.00%)

    IDLE JOBS----------------------
    JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME

    0 Idle Jobs

    BLOCKED JOBS----------------
    JOBNAME            USERNAME      STATE  PROC     WCLIMIT            QUEUETIME

    Total Jobs: 0   Active Jobs: 0   Idle Jobs: 0   Blocked Jobs: 0

Xgrid
From Apple
Essentially the same as Condor
GUI! =)
Client/server model
Image: http://upload.wikimedia.org/wikipedia/en/6/62/XgridAdminTool.jpg

Sun Grid Engine
Open source, like everything new Sun puts out
Supports
  – Reservations
  – Job dependencies
  – Checkpointing
  – Multiple scheduling algorithms
  – Web interface
Professional!

Load Sharing Facility (LSF)
One of the local schedulers GRAM can submit to; GRAM is covered later

Condor
More about this later, but it implements its own scheduler

Challenging!
These queuing systems are hard to use
A given grid may employ many different systems
Wouldn’t it be nice if all this were unified in a single implementation?

Condor
A tool for pooling and “scavenging” computing resources and distributing jobs
Similar to a batch queuing system [2]
  – Job management
  – Scheduling policy
  – Priority scheme
  – Resource monitoring
  – Resource management
Also focuses on high-throughput and “opportunistic” computing [2]
Condor image from: http://www.cs.wisc.edu/condor/

Condor Universes [1]
Standard
Vanilla
  – Simpler; can run ordinary, unmodified binaries (they do not need to be “condor compiled”)
  – No support for partial execution (checkpointing) or job relocation (migration)
Others
  – PVM
  – MPI
  – Java

Condor Submission File Example [1]
    # hello.sub
    # Condor job file example
    Universe   = Vanilla
    Executable = hello
    Output     = hello.out
    Input      = hello.in
    Error      = hello.err
    Log        = hello.log
    Queue

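When many similar jobs are needed, the submit description is often generated rather than written by hand. The sketch below renders the same `key = value` layout as the slide from a dictionary; `render_submit_file` and its `queue_count` parameter are hypothetical names for this example, and only the keys shown on the slide are used.

```python
def render_submit_file(settings: dict, queue_count: int = 1) -> str:
    """Render a Condor submit description from key/value settings.

    The keys are written in the order given, followed by a Queue
    statement; "Queue N" asks for N instances of the job.
    """
    lines = [f"{key} = {value}" for key, value in settings.items()]
    lines.append("Queue" if queue_count == 1 else f"Queue {queue_count}")
    return "\n".join(lines) + "\n"


HELLO = {
    "Universe": "Vanilla",
    "Executable": "hello",
    "Output": "hello.out",
    "Input": "hello.in",
    "Error": "hello.err",
    "Log": "hello.log",
}
```

Writing `render_submit_file(HELLO)` to a file such as `hello.sub` gives a description that can then be passed to `condor_submit`.
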
Condor Commands
condor_submit

Condor Daemons
On all Condor-deployed machines
  – Master
  – Startd
  – Schedd
On the Condor pool master
  – Collector
  – Negotiator

GRAM [4]
Globus Resource Allocation Manager (GRAM)
  – Resource allocation
  – Process creation
  – Monitoring
  – Management
  – Maps requests expressed in a Resource Specification Language (RSL) into commands to local schedulers and computers

GRAM
Pluggable!
Can’t make up their mind how to describe jobs
Will submit jobs to:
  – Condor
  – LSF
  – PBS/Torque
  – ???
Unified interface, with an identifier for which cluster/service to use

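The "unified interface plus an identifier" idea can be illustrated with a small adapter table. This is a toy model and not GRAM's actual implementation: the `Command` type, the adapter functions, and the scheduler keys are invented for the example. The only real commands it refers to are `qsub`, `condor_submit`, and `bsub` (which reads the job script on standard input).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Command:
    argv: List[str]
    stdin_path: Optional[str] = None  # file fed to the command on stdin


def pbs_command(job_file: str) -> Command:
    return Command(["qsub", job_file])


def condor_command(job_file: str) -> Command:
    return Command(["condor_submit", job_file])


def lsf_command(job_file: str) -> Command:
    # bsub takes the job script on standard input: bsub < job_file
    return Command(["bsub"], stdin_path=job_file)


# Hypothetical identifiers selecting the local scheduler.
ADAPTERS: Dict[str, Callable[[str], Command]] = {
    "PBS": pbs_command,
    "Condor": condor_command,
    "LSF": lsf_command,
}


def build_submit_command(scheduler: str, job_file: str) -> Command:
    """Pick the adapter for the named scheduler and build its command."""
    try:
        adapter = ADAPTERS[scheduler]
    except KeyError:
        raise ValueError(f"no adapter registered for {scheduler!r}") from None
    return adapter(job_file)
```

Supporting another scheduler means registering one more function in `ADAPTERS`; callers keep using `build_submit_command` unchanged, which is the sense in which the interface is pluggable.
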
GRAM Example
    maxfield@tg-login1:~> globusrun-ws -submit -factory https://tg-login.ornl.teragrid.org:8444/wsrf/services/ManagedJobFactoryService -factory-type PBS -streaming -job-command /bin/hostname
    Delegating user credentials...Done.
    Submitting job...Done.
    Job ID: uuid:89538014-e4f2-11dd-81df-0010180bb4e6
    Termination time: 01/18/2009 23:57 GMT
    Current job state: Pending
    Current job state: Active
    tg-c15
    Current job state: CleanUp-Hold
    Current job state: CleanUp
    Current job state: Done
    Destroying job...Done.
    Cleaning up any delegated credentials...Done.

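The transcript interleaves status lines with the job's own output (`tg-c15`). A wrapper script might pull the job ID and the sequence of states out of that text. This sketch assumes the line formats are exactly as they appear in the transcript; the function names are illustrative.

```python
import re

JOB_ID_LINE = re.compile(r"^Job ID: (\S+)$")
STATE_LINE = re.compile(r"^Current job state: (\S+)$")

SAMPLE = """\
Delegating user credentials...Done.
Submitting job...Done.
Job ID: uuid:89538014-e4f2-11dd-81df-0010180bb4e6
Termination time: 01/18/2009 23:57 GMT
Current job state: Pending
Current job state: Active
tg-c15
Current job state: CleanUp-Hold
Current job state: CleanUp
Current job state: Done
Destroying job...Done.
Cleaning up any delegated credentials...Done.
"""


def job_id(output: str):
    """Return the job ID from the output, or None if no ID line is present."""
    for line in output.splitlines():
        match = JOB_ID_LINE.match(line.strip())
        if match:
            return match.group(1)
    return None


def job_states(output: str) -> list:
    """Return the reported job states in the order they were printed."""
    states = []
    for line in output.splitlines():
        match = STATE_LINE.match(line.strip())
        if match:
            states.append(match.group(1))
    return states
```

For `SAMPLE`, `job_states` returns the five states from Pending through Done, which makes it easy to check that a submission reached `Done` rather than stopping earlier.
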
Condor-G [4]
Condor-G is a Globus-enabled version of the Condor scheduler.
It uses Globus to handle inter-organizational problems like:
  – Security
  – Resource management for supercomputers
  – Executable staging
The same Condor tools that access local resources can now use the Globus protocols to access resources at multiple sites.
It communicates with these resources and transfers files to and from them using Globus mechanisms such as:
  – GSI
  – GRAM protocol for job submission
Condor-G can be used to submit jobs to systems managed by Globus.
Globus tools can be used to submit jobs to systems managed by Condor.

Condor-G

UNICORE

Upperware
Talk about motivation for upperware applications

GridShell

References
1. Getting started with Condor. http://www.linuxjournal.com/node/9058/print
2. Thain, D., Tannenbaum, T., & Livny, M. (2005). Distributed computing in practice: the Condor experience.
3. Jeremy Espenshade’s Condor job submission presentation. http://grid.rit.edu/seminar/lib/exe/fetch.php/users:jeremy_espenshade:condorjobsubmission.ppt
4. http://iag.iucc.ac.il/presentations/front2.ppt