Batch Systems
P. Nilsson, PROOF Meeting, October 18, 2005

Overview
- Sun's Grid Engine [open source]: What is the Grid Engine project? Daemon components, the execution daemon, priorities, job life cycle
- Platform LSF [commercial system]: What is Platform LSF? CERN Batch Service, product features, architecture, load sharing
- Other batch systems: OpenPBS, Condor, BQS, …
- Maui Scheduler [open source]

Sun's Grid Engine
What is the Grid Engine project?
- "An open source community effort to facilitate the adoption of distributed computing solutions", sponsored by Sun
- The project provides distributed resource management software for requirements ranging from compute farms to grid computing
- The Grid Engine has been ported to many operating systems, including Sun Solaris, Linux, SGI IRIX, Compaq/HP Tru64, IBM AIX, HP-UX, Apple Mac OS X and others. The project welcomes those interested in implementing new ports or in taking over the maintenance of an existing port
- Good documentation
- More information at http://gridengine.sunsource.net/

Sun's Grid Engine – Grid Engine Components
- Qmaster ["queue master"] controls the overall behavior of a cluster; it is responsible for answering requests from clients and for delivering dispatched jobs to the assigned Execds
- Schedd ["scheduling daemon"] is notified about all scheduling-relevant information. The resulting scheduling decisions are sent as orders to Qmaster
- Execd ["execution daemon"] provides Qmaster with information about the utilization and availability of resources. A job sent to Execd is started by writing all relevant information into files describing the job and forking a "Shepherd". After the Shepherd's termination, Execd reports back to Qmaster
- Shepherd starts all kinds of jobs according to what it finds in the per-job configuration files written by Execd
- Commd ["communication daemon"] handles network communication in a cluster
- Shadowd ["shadow daemon"] detects failures of the Qmaster and starts a new Qmaster if necessary

Sun's Grid Engine – Execd, the Execution Daemon
The execution daemon is the instance that:
- Starts jobs
- Controls jobs (e.g. it can suspend/unsuspend a job, reprioritize the processes associated with a job, etc.)
- Gathers information about jobs (e.g. resource usage, exit code, etc.)
- Gathers information about the execution host it controls (e.g. load, free memory, etc.)
There is one Execd on each host of a cluster
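The kind of host information an execution daemon reports can be obtained through standard OS interfaces. The Python sketch below only illustrates that idea (it is not Grid Engine code) and assumes a Linux host where /proc/meminfo is readable:

import os

def host_load() -> float:
    """1-minute load average of this host (Unix only)."""
    return os.getloadavg()[0]

def free_memory_kb() -> int:
    """Free physical memory in kB, read from /proc/meminfo (Linux only)."""
    with open("/proc/meminfo") as f:
        for line in f:
            if line.startswith("MemFree:"):
                return int(line.split()[1])
    raise RuntimeError("MemFree not found in /proc/meminfo")

# An execution daemon would periodically report values like these to the master:
# report = {"load_avg": host_load(), "mem_free_kb": free_memory_kb()}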

Sun's Grid Engine – On Priorities
- The Grid Engine has a share-based scheduler, where each job gets a certain share of the system resources
- The sum of all shares for a job is expressed in tickets. A job has a certain number of tickets, enabling it to run with certain process priorities
- If multiple jobs are running concurrently on a host, their different shares of system resources (their different numbers of tickets) can be mapped to priorities in the OS
- Setting priorities in the OS is done either by setting the nice value for all processes of a job or by using special priority mapping facilities provided by the OS
- The Grid Engine reassigns the number of tickets per job at regular intervals. It then maps the number of tickets of a job to nice values (or another operating system priority representation) and renices all processes of the job
Much more information about the scheduler is in the documentation…
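As a rough illustration of the ticket-to-nice mapping described above (not the actual Grid Engine implementation), the following Python sketch maps a job's share of tickets onto the Unix nice range and renices its processes; the nice range, ticket counts and PIDs are assumptions:

import os

NICE_MIN, NICE_MAX = 0, 19  # illustrative nice range for batch jobs (assumption)

def tickets_to_nice(job_tickets: int, max_tickets: int) -> int:
    """Map a job's ticket count to a nice value: more tickets -> lower nice (higher priority)."""
    share = job_tickets / max_tickets if max_tickets else 0.0
    return round(NICE_MAX - share * (NICE_MAX - NICE_MIN))

def renice_job(pids: list[int], nice_value: int) -> None:
    """Apply the computed nice value to every process of the job (Unix only)."""
    for pid in pids:
        os.setpriority(os.PRIO_PROCESS, pid, nice_value)

# Example: a job holding 300 of the 1000 tickets on this host
# nice = tickets_to_nice(300, 1000)   # -> 13
# renice_job([12345, 12346], nice)    # hypothetical PIDs of the job's processes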

Sun's Grid Engine – Job Life Cycle
1. Execds report load information to Qmaster
2. The user submits a job using the qsub command
3. Qmaster notifies Schedd about the new job
4. Schedd dispatches the job to an Execd
5. Qmaster delivers the job to the Execd; the Execd starts the job using a Shepherd
6. At job end, the Execd notifies Qmaster that the job has finished
7. Qmaster feeds the job's resource consumption into the accounting database
[Slide diagram: the qsub client, Qmaster, Schedd and Execd1…ExecdN exchanging the numbered steps above]
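From the user's side, step 2 of this life cycle is just a qsub invocation. A minimal sketch of scripting such a submission in Python follows, assuming qsub is on the PATH; the script name, job name and output parsing are illustrative:

import subprocess

def submit_job(script_path: str, job_name: str) -> str:
    """Submit a shell script to the Grid Engine via qsub and return qsub's output."""
    # -N sets the job name; qsub prints a line containing the assigned job id.
    result = subprocess.run(
        ["qsub", "-N", job_name, script_path],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

# Example (hypothetical path and name):
# print(submit_job("run_analysis.sh", "proof_test"))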

Platform LSF
What is Platform LSF?
- "Platform LSF (Load Sharing Facility) is a [commercial] workload management solution that optimizes the use of enterprise-wide resources by providing transparent, on-demand access to valuable computing resources"
CERN Batch Service
- The CERN Batch Service provides an LSF farm with 1500 dual-processor machines for data analysis and simulation (used CPU time is accounted to the experiments!)
- LXPLUS is used for the public logon (i.e. for job submission)
- Depending on the load and the resource requirements of the jobs, up to 3 jobs run in parallel on the same node
More information at:
http://batch.web.cern.ch/batch/
http://www.platform.com/Products/Platform.LSF.Family/Platform.LSF
http://www.hp.com/techservers/software/lsf.html

Platform LSF – Product Features
- Dynamic load balancing by continuous monitoring of system resources: CPU and memory usage, swap space, software license availability (!)
- Resource-based queuing and scheduling. Resources are dynamically managed based on policies, schedules and thresholds; jobs submitted to any network-based queue are automatically processed as resources become available
- Optimal resource sharing. Continuous resource management, even in the event of host failures: failed jobs are automatically re-run and failed servers are restarted
- Administrative control and policies. Admins can suspend, stop, and submit jobs from any node in a network; users can modify their own jobs once submitted to queues. Varied options are available for configuring workload policies, supporting resource sharing by users, user groups, and projects

Platform LSF – Load Sharing
To achieve load sharing, LSF must have up-to-date information about the load on each machine in a cluster. The Load Information Manager (LIM) component is responsible for this. A LIM daemon runs on each host of the cluster; it gathers information about its host and makes the information available to all hosts. The information is organized as a load vector, which comprises a number of load indices as described in the following table:

Load index   Description
r15s         Load average (exponentially averaged CPU run queue length) over the last 15 seconds
r1m          Load average over the last minute
r15m         Load average over the last 15 minutes
ut           Percent CPU utilization averaged over the last minute
pg           Paging (in/out) activity over the last 20 seconds
ls           Number of login sessions
it           Idle time - number of minutes since the last keyboard or mouse activity
tmp          Available space (MB) in the /tmp file system
swp          Available swap space (MB)
mem          Available real memory (MB)
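To make the load-vector idea concrete, here is a rough Python sketch of a per-host load vector and a simplified host-selection rule; the field names mirror the indices in the table, while the selection rule and all numbers are assumptions (LSF's real placement policy is configurable):

from dataclasses import dataclass

@dataclass
class LoadVector:
    """Per-host load indices, mirroring part of the table above."""
    host: str
    r1m: float   # 1-minute run queue length
    ut: float    # CPU utilization (0.0 - 1.0)
    pg: float    # paging rate
    mem: int     # available real memory (MB)
    swp: int     # available swap space (MB)

def pick_host(loads: list[LoadVector], min_mem_mb: int) -> str:
    """Choose the eligible host with the shortest run queue (simplified placement rule)."""
    eligible = [lv for lv in loads if lv.mem >= min_mem_mb]
    if not eligible:
        raise RuntimeError("no host satisfies the memory requirement")
    return min(eligible, key=lambda lv: (lv.r1m, lv.ut)).host

# Example with made-up numbers:
# loads = [LoadVector("lxb001", 0.3, 0.20, 1.0, 1800, 2000),
#          LoadVector("lxb002", 1.7, 0.90, 5.0, 900, 1500)]
# pick_host(loads, min_mem_mb=1024)   # -> "lxb001"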

Platform LSF – Job Life Cycle (1)
1. The user submits a job to LSF for execution
2. The submitted job proceeds through the batch library to the Load Information Manager (LIM)
3. The LIM communicates the job's information to the cluster's master LIM. Periodically, the LIM on each machine gathers its 12 built-in load indices and forwards them to the master LIM [see previous slide]
4. The master LIM determines the best host to run the job and sends this information back to the submission host's LIM (information about the chosen execution host is passed through the batch library)
5. Information about the host to execute the job is passed back to the bsub process or lsb_submit() function
6. To enter the batch system, bsub or lsb_submit() sends the job to the batch library
7. Using batch library services, the job is sent to the mbatchd running on the cluster's master host

Platform LSF – Job Life Cycle (2)
8. mbatchd puts the job in an appropriate queue and waits for the appropriate time to dispatch it. User jobs are held in batch queues by mbatchd, which periodically checks the load information on all candidate hosts
9. mbatchd dispatches the job when an execution host with the necessary resources becomes available; there it is received by the host's sbatchd
10. sbatchd controls the execution of the job and reports the job's status to mbatchd. sbatchd creates a child sbatchd to handle the job execution
11. The child sbatchd sends the job to the Remote Execution Server (RES)
12. The RES creates the execution environment to run the job
13. The job is run in the execution environment
14. The results of the job are sent to the email system
15. The email system sends the job's results to the user
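On the user side, this whole cycle starts with a bsub call. A minimal Python sketch of such a submission follows; the queue name, script and the parsing of bsub's "Job <id> is submitted" message are assumptions about a typical setup, not a definitive interface:

import subprocess

def bsub_submit(queue: str, command: str) -> str:
    """Submit a command to LSF via bsub and return the job id printed by bsub."""
    # bsub typically prints something like: Job <1234> is submitted to queue <8nm>.
    result = subprocess.run(
        ["bsub", "-q", queue, command],
        capture_output=True, text=True, check=True,
    )
    out = result.stdout
    return out[out.index("<") + 1 : out.index(">")]

# Example (queue name and script are illustrative):
# job_id = bsub_submit("8nm", "./run_analysis.sh")
# subprocess.run(["bjobs", job_id])   # check the job's status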

Other Batch Systems
What other batch systems are on the market?
- OpenPBS - open source Portable Batch System [unsupported version; development stopped in 1999]. "Flexible batch queuing system developed for NASA in the early to mid-1990s. It operates on networked, multi-platform UNIX environments". Developed into the commercial PBS Pro version (http://www.openpbs.org; compilation requires some hacking…). Public home: http://www-unix.mcs.anl.gov/openpbs
- Condor – "Specialized workload management system for compute-intensive jobs. Like other full-featured batch systems, Condor provides a job queuing mechanism, scheduling policy, priority scheme, resource monitoring, and resource management. Users submit their serial or parallel jobs to Condor, Condor places them into a queue, chooses when and where to run the jobs based upon a policy, carefully monitors their progress, and ultimately informs the user upon completion" (http://www.cs.wisc.edu/condor)
- BQS - Batch Queuing System. Some information is available at the CC-IN2P3 web site (http://webcc.in2p3.fr/man/bqs/intro)
- …
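For comparison with the qsub and bsub examples above, Condor jobs are described in a submit description file that is handed to condor_submit. The Python sketch below writes a minimal description and submits it; the file contents follow Condor's classic submit syntax, while the paths and file names are made up:

import subprocess

# Minimal Condor submit description (classic submit-file syntax); paths are illustrative.
SUBMIT_DESCRIPTION = """\
executable = run_analysis.sh
output     = job.out
error      = job.err
log        = job.log
queue
"""

def condor_submit(description: str, filename: str = "job.sub") -> None:
    """Write a submit description file and hand it to condor_submit."""
    with open(filename, "w") as f:
        f.write(description)
    subprocess.run(["condor_submit", filename], check=True)

# condor_submit(SUBMIT_DESCRIPTION)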

Maui Scheduler
- Open source batch queuing and scheduling software designed to schedule parallel jobs
- Maui can schedule the order of job execution for queued jobs (from other batch systems)
- Supports many scheduling concepts: FIFO (first-in, first-out)-like reservations, backfilling of jobs, job priorities, time-of-day scheduling, etc.
- Written in Java
- The Maui Scheduler has been designed to communicate directly with a database through an abstraction layer, BUT currently MySQL is the only one implemented [MySQL is required]
- Considered for CASTOR2 (until LSF was chosen)
- What is "backfilling"? Maui allows a lower-priority job to be executed before a higher-priority job if it does not delay the start of the prioritized job [apparently not found in other schedulers]; see the sketch below
- More information at http://mauischeduler.sourceforge.net
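A rough illustration of the backfilling idea described above (not Maui's actual algorithm): a lower-priority job may start ahead of a queued higher-priority job as long as it fits into the idle nodes and is guaranteed to finish before the higher-priority job could start anyway. All names and numbers in this Python sketch are made up:

from dataclasses import dataclass

@dataclass
class Job:
    name: str
    priority: int
    runtime: int   # requested wall time (minutes)
    nodes: int

def can_backfill(candidate: Job, top_job_start_in: int, free_nodes: int) -> bool:
    """A lower-priority job may run now only if it fits in the free nodes and
    will finish before the top-priority job's reserved start time."""
    return candidate.nodes <= free_nodes and candidate.runtime <= top_job_start_in

# Example: the top-priority job cannot start for another 60 minutes
# (it is waiting for nodes to drain) and 4 nodes are currently idle.
# small = Job("small_sim", priority=1, runtime=45, nodes=2)
# can_backfill(small, top_job_start_in=60, free_nodes=4)   # -> True: start it now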