
Batch Systems

In a number of scientific computing environments, multiple users must share a compute resource:
  – research clusters
  – supercomputing centers

On multi-user HPC clusters, the batch system is a key component for aggregating compute nodes into a single, sharable computing resource.

The batch system becomes the "nerve center" for coordinating the use of resources and controlling the state of the system in a way that must be "fair" to its users.

As current and future expert users of large-scale compute resources, you need to be familiar with the basics of a batch system.

Batch Systems

The core functionality of all batch systems is essentially the same, regardless of the size or specific configuration of the compute hardware:
  – Multiple job queues:
      queues provide an orderly environment for managing a large number of jobs
      queues are defined with a variety of limits on maximum run time, memory usage, and processor count; they are often assigned different priority levels as well
      queues may be interactive or non-interactive
  – Job control:
      submission of individual jobs to do some work (e.g., serial or parallel HPC applications)
      simple monitoring and manipulation of individual jobs, and collection of resource usage statistics (e.g., memory usage, CPU usage, and elapsed wall-clock time per job)
  – Job scheduling:
      a policy that decides priority between individual user jobs
      allocation of resources to scheduled jobs

Batch Systems

Job scheduling policies:
  – The scheduler must decide how to prioritize all the jobs on the system and allocate the necessary resources for each job (processors, memory, file systems, etc.).
  – The scheduling process can be simple or non-trivial depending on the size of the system and the desired functionality:
      first-in, first-out (FIFO) scheduling: jobs are simply scheduled in the order in which they are submitted
      political scheduling: gives some users higher priority than others
      fairshare scheduling: the scheduler ensures users have equal access over time
  – Additional features may also affect the scheduling order:
      advanced reservations: resources can be reserved in advance for a particular user or job
      backfill: can be combined with any of the scheduling paradigms to let smaller jobs run while larger jobs wait for enough resources to become available
  – Backfill of smaller jobs helps maximize overall resource utilization.
  – Backfill can be your friend for short-duration jobs (see the sketch below).
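
As a rough illustration of what a backfill-friendly request looks like, the directives below simply ask for few processors and a short, realistic wall-clock limit; the processor count, time limit, job name, and executable are made up for this sketch:

  #!/bin/csh
  #BSUB -n 8           # small processor count (hypothetical)
  #BSUB -W 0:30        # short, realistic wall-clock limit (hypothetical 30 minutes)
  #BSUB -q normal      # submission queue
  #BSUB -J smalljob    # job name (hypothetical)
  ibrun ./a.out        # a.out is a placeholder executable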

Batch Systems

Common batch systems you may encounter in scientific computing:
  – Platform LSF
  – PBS
  – Loadleveler (IBM)
  – SGE

All have similar functionality but different syntax, and it is reasonably straightforward to convert your job scripts from one system to another. All of the above accept batch-system-specific directives that can be placed in a shell script to request certain resources (processors, queues, etc.); a rough side-by-side sketch follows. We will focus on LSF primarily, since it is the system running on Lonestar.
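
For example, here is an LSF resource request and a roughly equivalent PBS request. The PBS lines are a hedged sketch based on generic PBS usage, not Lonestar settings; the nodes/ppn split in particular is an assumption about the target machine:

  LSF:
    #BSUB -J hello                  # job name
    #BSUB -q normal                 # queue
    #BSUB -n 32                     # total number of processes
    #BSUB -W 0:15                   # wall-clock limit (15 minutes)

  PBS:
    #PBS -N hello                   # job name
    #PBS -q normal                  # queue (queue names are site-specific)
    #PBS -l nodes=16:ppn=2          # 32 processes as 16 nodes x 2 per node (assumed layout)
    #PBS -l walltime=00:15:00       # wall-clock limit (15 minutes)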

Batch Submission Process

(Diagram: internet → server/head node → compute nodes C1-C4)

  – Submission: bsub < job
  – Queue: the job script waits for resources on the server
  – Master: the compute node that executes the job script and launches ALL MPI processes
  – Launch: ssh to each compute node to start the executable (e.g., a.out), via mpirun (mpirun -np # ./a.out) or ibrun ./a.out

LSF Batch System

Lonestar uses Platform LSF for both the batch queuing system and the scheduling mechanism (it provides similar functionality to PBS, but requires different commands for job submission and monitoring).

LSF includes global fairshare, a mechanism for ensuring that no one user monopolizes the computing resources.

Batch jobs are submitted on the front end and are subsequently executed on compute nodes as resources become available.

The order of job execution depends on a variety of parameters:
  – Submission time
  – Queue priority: some queues have higher priorities than others
  – Backfill opportunities: small jobs may be back-filled while waiting for bigger jobs to complete
  – Fairshare priority: users who have recently used a lot of compute resources will have a lower priority than those who are submitting new jobs
  – Advanced reservations: jobs may be blocked in order to accommodate advanced reservations (for example, during maintenance windows)
  – Number of actively scheduled jobs: there are limits on the maximum number of concurrent processors used by each user

Lonestar Queue Definitions

  Queue         Max Runtime   Min/Max Procs   SU Charge Rate   Use
  normal        24 hours      2/256           1.0              Normal usage
  high          24 hours      2/256           1.8              Higher priority usage
  development   15 min        1/32            1.0              Debugging and development; allows interactive jobs
  hero          24 hours      >               -                Large job submission; requires special permission
  serial        12 hours      1/1             1.0              For serial jobs; no more than 4 jobs/user
  request       -             -               -                Special requests
  spruce        -             -               -                Debugging & development; special priority; urgent computing environment
  systest       -             -               -                System use (TACC staff only)

Lonestar Queue Definitions

Additional queue limits:
  – In the normal and high queues, a maximum of 512 processors can be used at one time. Jobs requiring more processors are deferred for possible scheduling until running jobs complete. For example, a single user can have the following job combinations eligible for scheduling:
      2 jobs requiring 256 procs each
      4 jobs requiring 128 procs each
      8 jobs requiring 64 procs each
      16 jobs requiring 32 procs each
  – A maximum of 25 queued jobs per user is allowed at one time.

LSF Fairshare

A global fairshare mechanism is implemented on Lonestar to provide fair access to its substantial compute resources.

Fairshare computes a dynamic priority for each user and uses this priority in making scheduling decisions.

The dynamic priority is based on the following criteria (a sketch of the documented formula follows):
  – the number of shares assigned
  – the resources used by jobs belonging to the user:
      number of job slots reserved
      run time of running jobs
      cumulative actual CPU time (not normalized), adjusted so that recently used CPU time is weighted more heavily than CPU time used in the distant past
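
For reference, the Platform LSF documentation expresses this dynamic priority with a formula along the following lines; the weighting factors (and the decay applied to historical CPU time) are configurable per site and LSF version, so this is a hedged sketch rather than Lonestar's exact settings:

  dynamic_priority = number_of_shares /
      (cpu_time * CPU_TIME_FACTOR + run_time * RUN_TIME_FACTOR + (1 + job_slots) * RUN_JOB_FACTOR)

Here cpu_time is the (decayed) cumulative CPU time, run_time is the total run time of the user's running jobs, and job_slots is the number of job slots reserved or in use; a larger result means a higher scheduling priority.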

LSF Fairshare

bhpart: command to see the current fairshare priority. For example:

  lslogin1> bhpart -r
  HOST_PARTITION_NAME: GlobalPartition
  HOSTS: all
  SHARE_INFO_FOR: GlobalPartition/
  USER/GROUP   SHARES   PRIORITY   STARTED   RESERVED   CPU_TIME   RUN_TIME
  avijit       ...
  chona        ...
  ewalker      ...
  minyard      ...
  phaa         ...
  bbarth       ...
  milfeld      ...
  karl         ...
  vmcalo       ...

Users are listed in order of decreasing priority.

Commonly Used LSF Commands

  bhosts    displays configured compute nodes and their static and dynamic resources (including job slot limits)
  lsload    displays dynamic load information for compute nodes (average CPU usage, memory usage, available /tmp space)
  bsub      submits a batch job to LSF
  bqueues   displays information about available queues
  bjobs     displays information about running and queued jobs
  bhist     displays historical information about jobs
  bstop     suspends unfinished jobs
  bresume   resumes one or more suspended jobs
  bkill     sends a signal to kill, suspend, or resume unfinished jobs
  bhpart    displays global fairshare priority
  lshosts   displays hosts and their static resource configuration
  lsuser    shows user job information

Note: most of these commands support a "-l" argument for long listings. For example, "bhist -l <jobID>" gives a detailed history of a specific job. Consult the man pages for each of these commands for more information.
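
A short usage sketch tying a few of these together (the job ID 123456 is made up for illustration):

  lslogin1% bqueues                # list the available queues and their current load
  lslogin1% bjobs -u all           # show running and pending jobs for all users
  lslogin1% bjobs -l 123456        # long listing for one job (hypothetical job ID)
  lslogin1% bhist -l 123456        # detailed history of the same job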

LSF Batch System

LSF-defined environment variables:
  LSB_ERRORFILE     name of the error file
  LSB_JOBID         batch job id
  LS_JOBPID         process id of the job
  LSB_HOSTS         list of hosts assigned to the job; multi-CPU hosts will appear more than once (may get truncated)
  LSB_QUEUE         batch queue to which the job was submitted
  LSB_JOBNAME       name the user assigned to the job
  LS_SUBCWD         directory of submission, i.e., this variable is set equal to $cwd when the job is submitted
  LSB_INTERACTIVE   set to 'y' when the -I option is used with bsub
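
A small csh fragment showing how these variables might be used inside a job script (the executable and output file names are placeholders, not part of the original slides):

  #!/bin/csh
  cd $LS_SUBCWD                                  # run from the directory the job was submitted from
  echo "Job $LSB_JOBID ($LSB_JOBNAME) is running in queue $LSB_QUEUE"
  echo "Hosts assigned: $LSB_HOSTS"              # may get truncated for very large jobs
  ibrun ./a.out >& output.$LSB_JOBID             # redirect stdout/stderr to a per-job file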

LSF Batch System

Comparison of LSF, PBS, and Loadleveler commands that provide similar functionality:

  LSF        PBS           Loadleveler
  bresume    qrls | qsit   llhold -r
  bsub       qsub          llsubmit
  bqueues    qstat         llclass
  bjobs      qstat         llq
  bstop      qhold         llhold
  bkill      qdel          llcancel
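
For example, submitting the same job script on each of the three systems (job.script is the file name used elsewhere in these slides; PBS and Loadleveler scripts would of course contain their own directive syntax):

  bsub < job.script        # LSF (note the required redirection)
  qsub job.script          # PBS
  llsubmit job.script      # Loadleveler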

Batch System Concerns

Submission (need to know):
  – required resources
  – run-time environment
  – directory of submission
  – directory of execution
  – files for stdout/stderr return
  – notification

Job monitoring

Job deletion:
  – queued jobs
  – running jobs

LSF: Basic MPI Job Script

  #!/bin/csh
  #BSUB -n 32          # total number of processes
  #BSUB -J hello       # job name
  #BSUB -o %J.out      # stdout output file name (%J = jobID)
  #BSUB -e %J.err      # stderr output file name
  #BSUB -q normal      # submission queue
  #BSUB -P A-ccsc      # your project name
  #BSUB -W 0:15        # max run time (15 minutes)

  echo "Master Host = "`hostname`     # echo pertinent environment info
  echo "LSF_SUBMIT_DIR: $LS_SUBCWD"
  echo "PWD_DIR: "`pwd`

  ibrun ./hello        # execution command: ibrun is the parallel application manager and mpirun wrapper script, ./hello is the executable

LSF: Extended MPI Job Script

  #!/bin/csh
  #BSUB -n 32              # total number of processes
  #BSUB -J hello           # job name
  #BSUB -o %J.out          # stdout output file name (%J = jobID)
  #BSUB -e %J.err          # stderr output file name
  #BSUB -q normal          # submission queue
  #BSUB -P A-ccsc          # your project name
  #BSUB -W 0:15            # max run time (15 minutes)
  #BSUB -w 'ended(1123)'   # dependency on job 1123
  #BSUB -u                 # e-mail address
  #BSUB -B                 # e-mail when job begins execution
  #BSUB -N                 # e-mail job report information upon completion

  echo "Master Host = "`hostname`
  echo "LSF_SUBMIT_DIR: $LS_SUBCWD"

  ibrun ./hello

LSF: Job Script Submission

When submitting a job to LSF using a job script, a redirection is required for bsub to read the commands. Consider the following script:

  lslogin1> cat job.script
  #!/bin/csh
  #BSUB -n 32
  #BSUB -J hello
  #BSUB -o %J.out
  #BSUB -e %J.err
  #BSUB -q normal
  #BSUB -W 0:15
  echo "Master Host = "`hostname`
  echo "LSF_SUBMIT_DIR: $LS_SUBCWD"
  echo "PWD_DIR: "`pwd`
  ibrun ./hello

To submit the job (the redirection is required!):

  lslogin1% bsub < job.script

LSF: Interactive Execution

Several ways to run interactively:
  – Submit the entire command to bsub directly:

      > bsub -q development -I -n 2 -W 0:15 ibrun ./hello
      Your job is being routed to the development queue
      Job is submitted to queue.
      Hello, world!
      --> Process # 0 of 2 is alive. -> compute
      --> Process # 1 of 2 is alive. -> compute-1-0

  – Submit using a normal job script and include the additional -I directive:

      > bsub -I < job.script

Batch Script Suggestions

Echo issued commands:
  – "set -x" for ksh and "set echo" for csh.

Avoid absolute pathnames:
  – Use relative path names or environment variables ($HOME, $WORK).

Abort the job when a critical command fails.

Print the environment:
  – Include the "env" command if your batch job doesn't execute the same way as an interactive run.

Use the "./" prefix for executing commands in the current directory:
  – The dot means to look for commands in the present working directory; not all systems include "." in your $PATH variable (usage: ./a.out).

Track your CPU time.

A sketch of a script following these suggestions is shown below.
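
A minimal csh sketch combining these suggestions (preprocess and a.out are placeholder executables, not part of the original slides):

  #!/bin/csh
  set echo                      # echo each command as it executes ("set -x" in ksh)
  env                           # print the environment for later debugging
  cd $LS_SUBCWD                 # use the submission directory rather than absolute paths
  ./preprocess || exit 1        # abort the job if a critical command fails
  time ibrun ./a.out            # track the CPU/wall-clock time of the main executable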

LSF Job Monitoring (showq utility)

  lslogin1% showq
  ACTIVE JOBS
  JOBID   JOBNAME      USERNAME   STATE     PROC   REMAINING   STARTTIME
  ...     _90_96x6     vmcalo     Running     64    18:09:19   Fri Jan 9 10:43:...
  ...     naf          phaa406    Running     16    17:51:15   Fri Jan 9 10:25:...
  ...     N            phaa406    Running     16    18:19:12   Fri Jan 9 10:53:46

  23 Active jobs    504 of 556 Processors Active (90.65%)

  IDLE JOBS
  JOBID   JOBNAME      USERNAME   STATE     PROC   WCLIMIT     QUEUETIME
  ...     poroe8       xgai       Idle       ...    ...:00:00  Thu Jan 8 10:17:...
  ...     meshconv019  bbarth     Idle        16    24:00:00   Fri Jan 9 16:24:18

  3 Idle jobs

  BLOCKED JOBS
  JOBID   JOBNAME      USERNAME   STATE     PROC   WCLIMIT     QUEUETIME
  ...     _90_96x6     vmcalo     Deferred    64    24:00:00   Thu Jan 8 18:09:...
  ...     _90_96x6     vmcalo     Deferred    64    24:00:00   Thu Jan 8 18:09:11

  17 Blocked jobs

  Total Jobs: 43   Active Jobs: 23   Idle Jobs: 3   Blocked Jobs: 17

LSF Job Monitoring (bjobs command)

  lslogin1% bjobs
  JOBID   USER     STAT   QUEUE    FROM_HOST   EXEC_HOST       JOB_NAME     SUBMIT_TIME
  ...     bbarth   RUN    normal   lonestar    2*compute-8...  *shconv009   Jan 9 16:24
                                               2*compute-...   2*compute-4-2
                                               2*compute-3-9   2*compute-...
  ...     bbarth   RUN    normal   lonestar    2*compute-3...  *shconv014   Jan 9 16:24
                                               2*compute-6-2   2*compute-6-5
                                               2*compute-3-5   2*compute-...
  ...     bbarth   PEND   normal   lonestar                    *shconv028   Jan 9 16:...
  ...     bbarth   PEND   normal   lonestar                    *shconv029   Jan 9 16:...
  ...     bbarth   PEND   normal   lonestar                    *shconv033   Jan 9 16:...
  ...     bbarth   PEND   normal   lonestar                    *shconv034   Jan 9 16:...
  ...     bbarth   PEND   normal   lonestar                    *shconv038   Jan 9 16:...
  ...     bbarth   PEND   normal   lonestar                    *shconv039   Jan 9 16:38

  Note: use "bjobs -u all" to see jobs from all users.

LSF Job Monitoring (lsuser utility)

  lslogin1$ lsuser -u vap
  JOBID   QUEUE    USER   NAME            PROCS   SUBMITTED
  ...     normal   vap    vap_hd_sh_p96      14   Tue Jun 7 10:37:...

  HOST          R15s   R1m   R15m   PAGES     MEM     SWAP    TEMP
  compute-...   ...    ...   ...    ... P/s   1840M   2038M   24320M
  compute-...   ...    ...   ...    ... P/s   1839M   2041M   23712M
  compute-...   ...    ...   ...    ... P/s   1838M   2038M   24752M
  compute-...   ...    ...   ...    ... P/s   1847M   2041M   23216M
  compute-...   ...    ...   ...    ... P/s   1851M   2040M   24752M
  compute-...   ...    ...   ...    ... P/s   1845M   2038M   24432M
  compute-...   ...    ...   ...    ... P/s   1841M   2040M   24752M

LSF Job Manipulation/Monitoring

To kill a running or queued job (takes ~30 seconds to complete):
  bkill <jobID>
  bkill -r <jobID>    (use when bkill alone won't delete the job)

To suspend a queued job:
  bstop <jobID>

To resume a suspended job:
  bresume <jobID>

To see more information on why a job is pending:
  bjobs -p <jobID>

To see a historical summary of a job:
  bhist <jobID>

  lslogin1> bhist
  Summary of time in seconds spent in various states:
  JOBID   USER   JOB_NAME   PEND   PSUSP   RUN   USUSP   SSUSP   UNKWN   TOTAL
  ...     karl   hello      ...