J. Skovira 5/05 v11 Introduction to IBM LoadLeveler Batch Scheduling System.


Agenda
  - Batch Scheduling Basics
  - LoadLeveler Basics
  - LoadLeveler Configuration
  - Basic Commands
  - Job Submission
  - Job Cancellation
  - Job Monitoring
  - Job Command Files
  - Advanced Functions
  - Questions and Answers

Who Needs a Job Scheduler?
  - Single machine: Job 1, Job 2, …, Job N. The OS multitasks a single CPU (time-shared scheduling).
  - HPC machine: User 1, User 2, and User 3 each submit Job 1, Job 2, …, Job N.
  - Many machines and users: more jobs
  - Parallel dimension: a user may impact a distant job
  - The scheduler runs jobs according to:
      - Scheduling theory
      - Site-defined policy

Scheduling Terms
  - HPC cluster
  - Resource manager
  - Scheduler
  - Job queue: Job 1, Job 2, Job 3, …
  - Batch scheduler
  - Start jobs on specific resources at specific times

More Tasks for User?
  - A job command file is a small set of job directives
  - Job command files can be "borrowed" from samples
  - Simple command files take predefined defaults
  - Experienced users may enhance command files
  - Diagram labels: Application Code, Job Meta Data
  - Once control is handed to the job, the scheduler is out of the way

LoadLeveler Components
  - LoadLeveler Central Manager: Negotiator daemon
  - IBM cluster worker nodes: Startd daemon
  - Schedd machine
  - High Performance Switch


Priority and Scheduling
Jobs arrive:
  - from different users
  - at different times
  - in different job classes
  - with different priorities

  Job    Nodes  Time
  Job A  8      2
  Job B  12     1
  Job C  10     1
  Job D  4      1
  Job E  4      5

LoadLeveler sorts the job queue: Job E, Job A, Job C, Job D, Job B
LoadLeveler schedules the jobs in queue order
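
As a rough illustration of the sorting step, the Python sketch below orders a queue by a numeric priority, highest first. The priority values are invented for this example so that the result matches the order shown on the slide (E, A, C, D, B); the slide does not give the real values, and this is not a description of how LoadLeveler computes priority internally.

```python
# Hypothetical priorities: the slide shows only the resulting order, not the values.
jobs = [
    {"name": "Job A", "priority": 80},
    {"name": "Job B", "priority": 10},
    {"name": "Job C", "priority": 60},
    {"name": "Job D", "priority": 40},
    {"name": "Job E", "priority": 90},
]


def sort_queue(queue):
    """Return the queue ordered by priority, highest first.

    sorted() is stable, so jobs with equal priority keep their arrival order.
    """
    return sorted(queue, key=lambda job: job["priority"], reverse=True)


ordered = [job["name"] for job in sort_queue(jobs)]
print(ordered)
```

The scheduler would then walk this sorted list from the front when deciding which job to start next.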

Reservation vs Backfill
Reservation (standard) scheduling:
  - The top job waits a short time for resources to free
  - Defer the job if resources are not available
Backfill:
  - The top job starts if it can
  - If there are not enough resources, compute when they will be available and which resources the job will use
  - Backfill jobs onto available nodes
Backfill is superior for parallel machines

Backfill
Job queue:

  Job    Nodes  Time
  Job A  8      2
  Job B  12     1
  Job C  10     1
  Job D  4      1
  Job E  4      5
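
The job queue above can be run through a small backfill simulation. The Python sketch below is a simplified, EASY-style backfill written for illustration only; it is not LoadLeveler's actual algorithm. The 16-node cluster size is an assumption (the slide does not state it), and the names `Job` and `backfill_schedule` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int
    time: int  # run time, in the same units as the table


def backfill_schedule(jobs, total_nodes):
    """Return {job name: start time} for a simple EASY-style backfill."""
    if any(job.nodes > total_nodes for job in jobs):
        raise ValueError("a job requests more nodes than the cluster has")
    queue = list(jobs)
    running = []  # (end_time, nodes) for every job currently running
    starts = {}
    now = 0
    while queue:
        # Drop finished jobs and count the free nodes.
        running = [(end, n) for end, n in running if end > now]
        free = total_nodes - sum(n for _, n in running)
        # Start jobs from the head of the queue while they fit.
        while queue and queue[0].nodes <= free:
            job = queue.pop(0)
            starts[job.name] = now
            running.append((now + job.time, job.nodes))
            free -= job.nodes
        if not queue:
            break
        # The top job is blocked: find when enough nodes will be free for it.
        top = queue[0]
        avail, shadow = free, now
        for end, n in sorted(running):
            avail += n
            shadow = end
            if avail >= top.nodes:
                break
        spare = avail - top.nodes  # nodes the top job will not need at that time
        # Backfill later jobs that cannot delay the top job.
        for job in list(queue[1:]):
            if job.nodes > free:
                continue
            done_in_time = now + job.time <= shadow
            if done_in_time or job.nodes <= spare:
                queue.remove(job)
                starts[job.name] = now
                running.append((now + job.time, job.nodes))
                free -= job.nodes
                if not done_in_time:
                    spare -= job.nodes
        # Jump to the next job completion.
        now = min(end for end, _ in running)
    return starts


queue = [Job("A", 8, 2), Job("B", 12, 1), Job("C", 10, 1),
         Job("D", 4, 1), Job("E", 4, 5)]
starts = backfill_schedule(queue, total_nodes=16)
print(starts)
```

Under these assumptions, jobs A, D, and E start at time 0, job B starts at time 2 (when job A releases its nodes), and job C starts at time 3. Jobs D and E are backfilled onto nodes that job B cannot use yet, without delaying job B.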


Job Command File Basics
  - The command file contains job "directives"
  - Basic items include:
      - Shell
      - Class
      - Input/output directories
      - Notification control
      - Queue keyword
  - 2 ways to specify the job executable:
      - Executable keyword
      - Script invocation after the queue keyword
  - Diagram labels: Application Code, Job Command File

Basic Job Command File

    #!/bin/ksh
    # @ class = demo
    # @ queue
    perlspin2 > /tmp

More Job Command File Keywords
  - Requirements allow you to select:
      - I/O directives
      - Node requirements
      - Wallclock limit
      - Locally defined requirements
      - Etc…
  - notification controls what LL sends about the job
      - From never to always
  - notify_user tells LL where to send job info
      - An address

Serial Job Command File

    #!/bin/ksh
    # @ error = ./out/job2.$(jobid).err
    # @ output = ./out/job2.$(jobid).out
    # @ wall_clock_limit = 180
    # @ class = demo
    # @ notification = complete
    # @ notify_user =
    # @ queue
    perlspin2

Communication on the System
  - Each node has a connection to the high-performance switch
  - There are 2 ways to use the switch:
      - IP mode
          - "unlimited" channels
          - slower communication performance
      - User space mode
          - limited number of channels
          - faster than IP mode
  - The mode can be selected in the job command file

Parallel Job Command File Keywords
  - node: how many nodes your job requires
  - tasks_per_node: how many tasks will run on each node
  - network: how your job will communicate
  - wall_clock_limit: an estimate of how long your job runs

The Network Keyword

    network.protocol = network_type, usage, mode

  - protocol: MPI, LAPI, PVM
  - network_type: sn_single or sn_all for switch adapter
  - usage: shared or not_shared
  - mode: IP, US

An example:

    network.MPI = sn_single, shared, us
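
To make the structure of the network keyword concrete, here is a small Python sketch that builds the statement from its four parts and rejects values not listed on the slide. The function name `network_statement` is hypothetical, and the allowed values are only those named above; a real installation may accept others.

```python
# Allowed values as listed on the slide (stored lowercase for comparison).
PROTOCOLS = {"mpi", "lapi", "pvm"}
NETWORK_TYPES = {"sn_single", "sn_all"}
USAGES = {"shared", "not_shared"}
MODES = {"ip", "us"}


def network_statement(protocol, network_type, usage, mode):
    """Build a 'network.<protocol> = <type>, <usage>, <mode>' statement."""
    checks = [
        (protocol, PROTOCOLS, "protocol"),
        (network_type, NETWORK_TYPES, "network_type"),
        (usage, USAGES, "usage"),
        (mode, MODES, "mode"),
    ]
    for value, allowed, label in checks:
        if value.lower() not in allowed:
            raise ValueError(f"unsupported {label}: {value!r}")
    return f"network.{protocol} = {network_type}, {usage}, {mode}"


statement = network_statement("MPI", "sn_single", "shared", "us")
print(statement)
```

The comparison is case-insensitive because the slides write the mode both as US and as us.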

Parallel Job Command File

    #!/bin/ksh
    # @ job_type = parallel
    # @ node = 1
    # @ tasks_per_node = 4
    # @ error = ./out/job3.$(jobid).err
    # @ output = ./out/job3.$(jobid).out
    # @ wall_clock_limit = 05:00
    # @ class = demo
    # @ notification = complete
    # @ notify_user =
    # @ network.MPI = sn_all,shared,us
    # @ queue
    poe perlspin2

Basic LoadLeveler Commands
  - llsubmit: submits a job to LoadLeveler
  - llcancel: cancels a submitted job
  - llq: queries the status of jobs in the job queue
  - llstatus: queries the status of machines in the cluster

llq Example

    v01n08:/u/skoviraj $ llsubmit mybasic.cmd
    llsubmit: The job "v01n08.vendor.pok.ibm.com.205" has been submitted
    v01n08:/u/skoviraj $ llq
    Id    Owner     Submitted    ST  PRI  Class     Running On
    v01n  skoviraj  11/11 22:29  R   50   No_Class  v01n02
    v01n  skoviraj  11/11 22:30  R   50   No_Class  v01n02
    v01n  skoviraj  11/11 22:28  I   50   No_class
    3 job steps in queue, 1 waiting, 0 pending, 2 running, 0 held
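
When llq is called from a script, the summary line is often the only part that matters. The Python sketch below extracts the counts from that line. It is based solely on the sample output in this slide; the function name `parse_llq_summary` and the dictionary keys are hypothetical, and other LoadLeveler versions may word the line differently.

```python
import re

# Pattern modelled on the summary line shown in the llq example.
SUMMARY = re.compile(
    r"(\d+) job steps? in queue, (\d+) waiting, (\d+) pending, "
    r"(\d+) running, (\d+) held"
)


def parse_llq_summary(text):
    """Return the job counts from llq output, or None if no summary is found."""
    match = SUMMARY.search(text)
    if match is None:
        return None
    keys = ("total", "waiting", "pending", "running", "held")
    return dict(zip(keys, (int(group) for group in match.groups())))


sample = "3 job steps in queue, 1 waiting, 0 pending, 2 running, 0 held"
counts = parse_llq_summary(sample)
print(counts)
```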

llstatus Example

    v01n08:/u/skoviraj/suspender1.0/suspender_stuff $ llstatus v01n02
    Name Schedd InQ Act Startd Run LdAvg Idle Arch OpSys
    v01n02.vendor.pok.ibm.com Avail 0 0 Run R6000 AIX43

    v01n08:/u/skoviraj/suspender1.0/suspender_stuff $ llstatus | more
    Name Schedd InQ Act Startd Run LdAvg Idle Arch OpSys
    v01n01.vendor.pok.ibm.com Avail 0 0 Idle R6000 AIX43
    v01n02.vendor.pok.ibm.com Avail 0 0 Run R6000 AIX43
    v01n03.vendor.pok.ibm.com Avail 0 0 Idle R6000 AIX43
    v01n04.vendor.pok.ibm.com Avail 0 0 Idle R6000 AIX43
    v01n05.vendor.pok.ibm.com Avail 0 0 Idle R6000 AIX43
    v01n06.vendor.pok.ibm.com Avail 0 0 Idle R6000 AIX43
    v01n07.vendor.pok.ibm.com Avail 1 0 Idle R6000 AIX43
    v01n08.vendor.pok.ibm.com Avail 1 0 Idle R6000 AIX43
    v01n09.vendor.pok.ibm.com Avail 0 0 Idle R6000 AIX43

llctl Examples

    llctl -h hostname command

Useful commands:
  - reconfig: forces all daemons to reread the configuration files.
  - start: starts the LoadLeveler daemons on the specified machine.
  - stop: stops the LoadLeveler daemons on the specified machine.
Commands sometimes used:
  - flush: terminates running jobs on this machine and places the jobs in the idle state.
  - recycle: stops all LoadLeveler daemons and restarts them.

llctl Example

    drain [schedd|startd [classlist|allclasses]]

With no options:
  (1) no more LoadLeveler jobs can begin running on this machine.
  (2) no more LoadLeveler jobs can be submitted through this machine.
When you issue drain schedd, the following happens:
  (1) the schedd machine accepts no more LoadLeveler jobs for submission.
  (2) jobs in the Starting or Running state in the queue are allowed to continue running.
  (3) jobs in the Idle state in the schedd queue are drained.
When you issue drain startd, the following happens:
  (1) the startd machine accepts no more LoadLeveler jobs to be run.
  (2) jobs already running on the startd machine are allowed to complete.

More LoadLeveler Commands
  - llclass: returns information about available classes
  - llprio: changes the user priority of a job step

llclass Example

    v60n129:/u/skoviraj $ llclass -l X_Class
    =============== Class X_Class ===============
    Name: X_Class
    Priority: 0
    Exclude_Users:
    Include_Users:
    Exclude_Groups:
    Include_Groups:
    Admin:
    NQS_class: F
    NQS_submit:
    NQS_query:
    Max_processors: -1
    Maxjobs: -1
    Resource_requirement:
    Class_comment:
    Class_ckpt_dir:
    Ckpt_limit: undefined, undefined
    Wall_clock_limit: 11+13:46:39, 11+13:46:39 ( seconds, seconds)
    Job_cpu_limit: undefined, undefined
    …

    v60n129:/u/skoviraj $ llclass
    Name MaxJobCPU MaxProcCPU Free Max Description
    d+hh:mm:ss d+hh:mm:ss Slots Slots
    inter_class undefined undefined
    X_Class undefined undefined

llprio Example

    v01n07:/u/skoviraj/suspender1.0/suspender_stuff $ llq
    Id    Owner     Submitted    ST  PRI  Class     Running On
    v01n  skoviraj  11/11 22:51  I   50   No_class
    1 job steps in queue, 1 waiting, 0 pending, 0 running, 0 held

    v01n07:/u/skoviraj/suspender1.0/suspender_stuff $ llprio -p 100 v01n
    llprio: Priority command has been sent to the central manager.

    v01n07:/u/skoviraj/suspender1.0/suspender_stuff $ llq
    Id    Owner     Submitted    ST  PRI  Class     Running On
    v01n  skoviraj  11/11 22:51  I   100  No_class
    1 job steps in queue, 1 waiting, 0 pending, 0 running, 0 held

Advanced Topics
  - Job Preemption
  - Job Checkpointing
  - Submit filter
  - LoadLeveler APIs (data access, scheduling)
  - Workload Manager (WLM) integration
  - Advance Reservation
  - Consumable resource control

Job Suspension
Diagram labels:
  - 4 way restarts
  - 16 way job runs
  - 4 Node job runs
  - 4 Node suspended
  - 16 way job completes

Job Checkpoint
Diagram labels:
  - 4 way restarts from saved state
  - 16 way job runs
  - 4 Node job runs
  - 4 Node Checkpoints and ends
  - 16 way job completes
  - 4 Node job state saved
  - GPFS

Submit Filter

    # Reads the job command file on STDIN and writes the filtered file to STDOUT.
    $NetKey = "FALSE";
    while (<STDIN>) {
        chomp($value = $_);
        if ($value =~ /network/) {        # If we find the network keyword....
            $NetKey = "TRUE";             # remember it!
        }
        if ($value =~ /queue/) {          # If at the end of LL keywords for this job step...
            if ($NetKey eq "FALSE") {     # if no network keyword...
                # Add one which uses the switch
                print "# @ network.MPI = sn_all,not_shared,US\n";
            }
            $NetKey = "FALSE";            # Reset network keyword memory
        }
        print "$value\n";                 # Copy a single LL cmd file line to the new cmd file
    }
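
The same filter logic can be expressed as a pure function, which makes it easy to try on a job command file before installing a filter. The Python sketch below is a port of the Perl filter for illustration, not part of the original slides. The function name `add_default_network` is hypothetical, and the `# @` prefix on the inserted line assumes the usual LoadLeveler directive syntax.

```python
# Assumed default: the same network statement the Perl filter inserts.
DEFAULT_NETWORK = "# @ network.MPI = sn_all,not_shared,US"


def add_default_network(lines, default=DEFAULT_NETWORK):
    """Insert a network keyword before each queue statement that lacks one."""
    result = []
    seen_network = False
    for line in lines:
        line = line.rstrip("\n")
        if "network" in line:
            seen_network = True      # this job step already chose a network
        if "queue" in line:          # end of the keywords for this job step
            if not seen_network:
                result.append(default)
            seen_network = False     # reset for the next job step
        result.append(line)
    return result


job_file = [
    "#!/bin/ksh\n",
    "# @ class = demo\n",
    "# @ queue\n",
    "perlspin2\n",
]
filtered = add_default_network(job_file)
print("\n".join(filtered))
```

Like the Perl version, this matches the words network and queue anywhere in a line, so a comment containing either word would also trigger it.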

Tips for Efficient Job Processing
Assumptions:
  - One task per CPU
  - Classes configured
Get your job to the TOP of the queue:
  - Short run
  - Small number of nodes
  - Use IP communication over the switch
  - Priority?
  - Submit during low-use periods (evening)
These are FREE! All of the above tips (except priority) will impact no other job.

More Tips for Efficient Job Processing
Allow your job to run as QUICKLY as possible:
  - Balance node operations
  - Keep data entirely in physical memory
  - Use processors of similar types (system admin?)
  - Use distributed data load and store
  - Profile your applications for efficient compiler use
This could be an entirely new presentation!

Questions and Answers