Integrated Workload Management for Beowulf Clusters
Bill DeSalvo – August 18, 2004

© Platform Computing Inc
What We’ll Cover
- Platform LSF Family of Products
- What is Platform LSF HPC
- Key Features & Benefits
- How it Works
- Q&A

Platform’s Grid Solution Architecture

Technical Computing Product Family

Platform LSF Family of Products
- Platform LSF: Intelligent, policy-driven batch application workload processing. Manage & accelerate batch workloads for compute- and data-intensive applications.
- Platform LSF HPC: Intelligent, policy-driven high performance computing (HPC) workload processing. Manage & accelerate High Performance Computing (HPC) mission-critical workload.
- Platform LSF MultiCluster: Intelligent, policy-driven batch application workload processing across multiple Platform LSF clusters. Share between autonomously managed departments or organizations spanning geographical locations.
Complementary Products
- Platform LSF License Scheduler: Intelligent, policy-driven application license optimization for Platform LSF clusters. Optimize the usage of all application licenses based on an organization’s established distribution policy.
- Platform LSF Analytics: Intelligent delivery of precise information for better project decisions. Better co-ordinate projects, estimate project completion times and provision resources more accurately.
- Platform LSF Reports: Intelligent cluster operation reporting for Platform LSF clusters. Visibility into cluster utilization.

What Problems Are We Solving?
Solve large, grand challenge, complex problems by optimizing the placement of workload in High Performance Computing environments

Platform LSF HPC
Intelligent, policy-driven high performance computing (HPC) workload processing
- Parallel & sequential batch workload management for High Performance Computing (HPC)
- Includes patent-pending topology-based scheduling
- Intelligently schedules parallel batch jobs
- Virtualizes resources
- Prioritizes service levels based on policies
- Based on Platform LSF:
  - Standards-based, OGSI-compliant, grid-enabled solution
  - Commercial production quality product

Platform Customers

Platform LSF HPC
- Platform LSF HPC AlphaServer SC
- Platform LSF HPC for IBM
- Platform LSF HPC for Linux
- Platform LSF HPC for SGI
- Platform LSF HPC for Cray

Extensive Hardware Support
- HP: HP AlphaServer SC; HP XC; HP Superdome HP-UX 11i
- SGI: SGI IRIX; SGI TRIX; SGI Altix, SGI Propack
- IBM: IBM RS/6000 AIX; IBM SP2/SP3
- Linux: IA-64 systems with RedHat; Intel, AMD 32-bit systems with LINUX kernel
- Sun: SUN Solaris
- High Performance Interconnects: Myrinet with GM; Quadrics QsNet; SGI Numa Flex; SGI NumaLink; IBM SP Switch

Platform LSF HPC – Linux Support
- HP
  - HP XC Systems running Unlimited Linux
  - HP Itanium 2 systems running LINUX 2.4.x kernel, glibc 2.2 with RMS on Quadrics QsNet/Elan3
  - HP Alpha/AXP systems running LINUX 2.4.x kernel, glibc 2.2.x with RMS on Quadrics QsNet/Elan3
- Linux
  - IA-64 systems, Kernel 2.4.x, compiled with glibc 2.2.x, tested on RedHat 7.3
  - x86 systems:
    - Kernel 2.2.x, compiled with glibc 2.1.x, tested on Debian 2.2, OpenLinux 2.4, RedHat 6.2 and 7.0, SuSE 6.4 and 7.0, TurboLinux 6.1
    - Kernel 2.4.x, compiled with glibc 2.1.x, tested on RedHat 7.x and 8.0, SuSE 7.0, and RedHat Linux Advanced Server 2.1
    - Clustermatic Linux 3.0, Kernel 2.4.x, compiled with glibc 2.2.x, tested on RedHat 8.0
    - Scyld Linux, Kernel 2.4.x, compiled with glibc 2.2.x
- SGI
  - SGI Altix systems running Linux Kernel 2.4.x compiled with glibc 2.2.x and SGI Propack 2.2 and higher

Key Features and Benefits Platform LSF HPC

Key Features
- Optimized Application, System and Hardware Performance
- Enhanced Accounting, Auditing & Control
- Commercial Grade System Scalability & Reliability
- Extensive Hardware Support
- Comprehensive, Extensible and Standards-based Security

Key Features – Platform LSF HPC
- Optimized Application, System and Hardware Performance
- Enhanced Accounting, Auditing & Control
- Commercial Grade System Scalability & Reliability
- Comprehensive, Extensible and Standards-based Security

Adaptive Interconnect Performance Optimization
- Scheduling that takes advantage of unique interconnect properties
  - IBM SP Switch at the POE software level
  - RMS on AlphaServer SC (Quadrics)
  - SGI topology hardware graph
- Out-of-the-box functionality without any customization required

Generic Parallel Job Launcher
- Generic support for all different types of Parallel Job Launchers
  - LAMMPI, MPICH-GM, MPICH-P4, POE, SCALI, CHAMPION PRO, etc.
- Customizable for any vendor or publicly available parallel solution
- Control: ensuring no jobs can escape the workload management system

HPC Workload Scheduling
- Dynamic load balancing supporting heterogeneous workloads
- IBM SP switch aware scheduling
- Scheduling of parallel jobs
  - Number of CPUs, min/max, node span
- Backfill on processor & memory
- Processor & memory reservation
- Topology aware scheduling
- Exclusive scheduling
- Advance Reservation
- Fairshare, Preemption
- Accounting

Intelligent Scheduling Policies
- Fairshare (User & Project-based)
  - Ensures job resources are used for the right work
  - Guarantees that resource allocations among users and projects are met
  - Co-ordinates access to the right number of resources for different users and projects according to pre-defined shares
  - Differentiation: Hierarchical & guaranteed
- Policy-based Preemption
  - Maximizes throughput of high priority critical work based on priority and load conditions
  - Prevents starvation of lower priority work
  - Differentiation: Platform LSF supports multiple preemption policies
- Goal-oriented SLA driven policies
  - Based on customer SLA driven goals: Deadline, Velocity, Throughput
  - Guarantees projects are completed on time
  - Reduces project and administration costs
  - Provides visibility into the progress of projects
  - Allows the admin to focus on “What work and When” needs to be done, not “how” the resources are to be allocated
[Diagram labels: Intelligent Scheduler; Fairshare; Preemption; Resource Reservation; Advance Reservation; SLA Scheduling; Service Level Agreement; MultiCluster; Other Scheduling Modules; Plugin Schedulers; License Scheduling]
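The share-based idea behind fairshare can be sketched in a few lines. This is a generic illustration, not Platform LSF's actual dynamic-priority formula, which the slides do not give. The function name `next_user`, the `shares` and `usage` mappings, and the ratio `shares / (1 + usage)` are all assumptions chosen for the example.

```python
def next_user(shares, usage):
    """Pick the user whose configured share is largest relative to
    the resources they have already consumed.

    shares: dict of user -> configured share (hypothetical units)
    usage:  dict of user -> resources consumed so far (hypothetical units)
    """
    def ratio(user):
        # Adding 1 avoids division by zero for users with no usage yet.
        return shares[user] / (1.0 + usage.get(user, 0.0))

    # Sorting first makes ties resolve alphabetically and deterministically.
    return max(sorted(shares), key=ratio)
```

A user with a small share but no recent usage can therefore be scheduled ahead of a user with a large share who has already consumed a lot.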

Advanced Self-Management
- Flexible, Comprehensive Resource Definitions
  - Resources defined on a node basis across an entire cluster or subset of the nodes in a cluster
  - Auto-detectable or user defined resources
  - Adaptive membership – nodes join and leave Platform LSF clusters dynamically and automatically without administration effort
  - Dynamic or static resources
- Job Level Exception Management
  - Exception-based error detection to take automatic, configurable, corrective actions
  - Increased job reliability & predictability
  - Improved visibility on job and system errors & reduced administration overhead and costs
- Automatic Job Migration and Requeue
  - Automatically migrate and requeue jobs based on policies in the event of host or network failures
  - Reduce user and administrator overhead in managing failures & reduce risk of running critical workloads
- Master Scheduler Failover
  - Automatically fail over to another host if the master host is unavailable
  - Continuous scheduling service and execution of jobs & eliminate manual intervention

Backfill
- Policy is configured at the queue level and applies to all jobs in a queue
- Smaller sequential jobs are ‘backfilled’ behind larger parallel jobs
- Improves hardware utilization
- Users are provided with an accurate time when their job will start
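A minimal sketch of the backfill decision follows. It assumes every job declares a run limit and that the start time reserved for the large parallel job is already known. The function name `backfill`, its parameters, and the greedy queue-order policy are illustrative assumptions, not Platform LSF's implementation.

```python
def backfill(free_cpus, reserved_start, now, candidates):
    """Return the names of jobs that can run in the idle gap.

    free_cpus:      CPUs idle until the reserved parallel job starts
    reserved_start: time at which the large parallel job must start
    now:            current time, in the same units as reserved_start
    candidates:     list of (name, cpus, run_limit) in queue order
    """
    started = []
    for name, cpus, run_limit in candidates:
        fits_cpus = cpus <= free_cpus
        # A job may be backfilled only if it finishes before the reservation.
        fits_time = now + run_limit <= reserved_start
        if fits_cpus and fits_time:
            started.append(name)
            free_cpus -= cpus
    return started
```

Because a backfilled job must finish before the reservation begins, the large job's start time is not delayed, which is what makes a start-time estimate possible.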

Key New Feature & Benefits Platform LSF V6.0

Feature Overview
- OGSI Compliance
- Goal-Oriented SLA-Driven Scheduling
- License-Aware Scheduling
- Job-Level Exception Management (Self Management Enhancement)
- Job Group Support
- Other Scheduling Enhancements
  - Queue-Based Fairshare
  - User Fairshare by Queue Priority
  - Job Starvation Prevention plug-in

Feature Overview (Cont.)
- HPC Enhancements
  - Dynamic ptile Enforcement
  - Resource Requirement Specification for Advance Reservation
  - Thread Limit Enforcement
  - General Parallel Support
  - Parallel Job Size Scheduling
- Job Limit Enhancements
  - Non-normalized Job Run Limit
  - Resource Allocation Limit Display
- Administration and Diagnostics
  - Scheduler Dynamic Debug
  - Administrator Action Messages

Goal-Oriented SLA-Driven Scheduling
What is it?
- A new scheduling paradigm. Unlike current scheduling policies based on configured shares or limits, SLA-driven scheduling is based on customer provided goals:
  - Deadline based goal: Specify the deadline for a group of jobs.
  - Velocity based goal: Specify the number of jobs running at any one time.
  - Throughput based goal: Specify the number of finished jobs per hour.
- This scheduling policy works on top of queues and host partitions.
Benefits
- Guarantees projects are completed on time according to explicit SLA definitions.
- Provides visibility into the progress of projects to see how well projects are tracking to SLAs.
- Allows the admin to focus on “What work and When” needs to be done, not “how” the resources are to be allocated.
- Guarantees service level deliveries to the user community, reduces the risks of projects and administration cost.
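To make the deadline goal concrete, the sketch below estimates how many jobs must run at once for a group of jobs to finish inside a time window. It is back-of-the-envelope arithmetic under the assumption that every job takes the same estimated run time. The function name and parameters are hypothetical, and the slides do not say how Platform LSF itself computes this number.

```python
import math

def slots_needed_for_deadline(pending_jobs, est_run_minutes, minutes_to_deadline):
    """Estimate the number of concurrently running jobs needed to
    finish `pending_jobs` before the deadline.

    Returns None when not even one job can finish in the time left.
    """
    # How many jobs one slot can run back to back before the deadline.
    jobs_per_slot = minutes_to_deadline // est_run_minutes
    if jobs_per_slot < 1:
        return None
    return math.ceil(pending_jobs / jobs_per_slot)
```

For instance, 50 jobs of 10 minutes each in a 120-minute window need 5 concurrent slots under these assumptions, since each slot can run 12 jobs in sequence.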

Use case
Problem: we need to finish all simulation jobs before 15:00.
Solution: configure a deadline service class in the lsb.serviceclasses file.
    Begin ServiceClass
    NAME = simulation
    PRIORITY = 100
    GOALS = [deadline timeWindow (13:00 - 15:00)]
    DESCRIPTION = A simple deadline demo
    End ServiceClass
Submitting and monitoring jobs:
    $ bsub -sla simulation -W 10 -J "A[1-50]" mySimulation
    $ date; bsla
    Wed Aug 20 14:00:16 EDT 2003
    SERVICE_CLASS_NAME: simulation
    GOAL: DEADLINE
    ACTIVE_WINDOW: (13:00 - 15:00)
    STATUS: Active:Ontime
    DEAD_LINE: (Wed Aug 20 15:00)
    ESTIMATED_FINISH_TIME: (Wed Aug 20 14:30)
    Optimum Number of Running Jobs: 5
    NJOBS PEND RUN SSUSP USUSP FINISH

Job-Level Exception Management (Self Management Enhancement)
What is it?
- Platform LSF can monitor exceptional job behavior and take action accordingly.
Benefits
- Increased reliability of job execution
- Improved visibility on job and system errors
- Reduced administration overhead and costs
How it works
Platform LSF V6 handles the following exceptions:
- “Job eating” machine (or “black-hole” machine): for some reason, jobs keep exiting abnormally on a machine (e.g. no processes, mount daemon dies, etc.)
- Job underrun (job run time less than the configured minimum time)
- Job overrun (job run time more than the configured maximum time)
- Job run idle (job runs without CPU usage increasing)
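The four exception types can be sketched as simple checks. The function names, parameters, and thresholds below are hypothetical; the slides describe the conditions but not how Platform LSF evaluates them or which actions it takes.

```python
from collections import Counter

def job_exceptions(run_time, cpu_time_delta, finished,
                   min_run_time=None, max_run_time=None):
    """Classify one job against the underrun, overrun and idle rules."""
    found = []
    # Underrun: the job finished sooner than the configured minimum.
    if finished and min_run_time is not None and run_time < min_run_time:
        found.append("underrun")
    # Overrun: the job has run longer than the configured maximum.
    if max_run_time is not None and run_time > max_run_time:
        found.append("overrun")
    # Idle: the job is still running but its CPU usage is not increasing.
    if not finished and cpu_time_delta <= 0:
        found.append("idle")
    return found

def job_eating_hosts(exits, threshold):
    """Return hosts with at least `threshold` abnormal job exits.

    exits: iterable of (host, exited_normally) pairs
    """
    abnormal = Counter(host for host, ok in exits if not ok)
    return sorted(host for host, count in abnormal.items() if count >= threshold)
```

A host flagged by `job_eating_hosts` corresponds to the “black-hole” machine case, where the problem lies with the host rather than with any single job.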

Job Starvation Prevention Plug-in
What is it?
- External scheduler plug-in allows users to define their own equation for job priority
Benefits
- Low priority work is guaranteed to run after ‘waiting’ for a specified time, ensuring that the job does not wait forever (i.e. starvation).
How it works
- By default, the scheduler provides the following calculation:
    Job priority = A * q_priority * MIN(1, int(wait_time/T0)) * (B * requested_processors + MAX(C * wait_time * (1 + 1/run_time), D) + E * requested_memory)
  where A, B, C, D, E are coefficients and T0 is the grace period. Default run_time = INFINIT
- Admin can define different coefficients for each queue with the following format:
    MANDATORY_EXTSCHED=JOBWEIGHT[A=val1; B=val2; …]
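The default priority calculation can be transcribed directly into code to see how the terms interact. This is a sketch of the formula as printed on the slide. The default coefficient values and the function name are placeholders chosen for illustration, and the infinite default run time is modelled with `math.inf`.

```python
import math

def job_priority(q_priority, wait_time, requested_processors, requested_memory,
                 run_time=math.inf, A=1.0, B=1.0, C=1.0, D=0.0, E=1.0, T0=1.0):
    """Evaluate the slide's job priority formula.

    T0 is the grace period and must be positive; wait_time and T0 must
    use the same time unit. Coefficient defaults here are placeholders.
    """
    # MIN(1, int(wait_time/T0)) is 0 during the grace period, 1 afterwards.
    gate = min(1, int(wait_time / T0))
    # With run_time = infinity, 1/run_time is 0 and the factor is just 1.
    wait_term = max(C * wait_time * (1 + 1 / run_time), D)
    return A * q_priority * gate * (
        B * requested_processors + wait_term + E * requested_memory
    )
```

Written this way, the formula gives a job zero priority until it has waited at least T0, after which the priority grows with waiting time, and grows faster for jobs with short run times.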

Resource Requirement Specification For Advance Reservation
What is it?
- Enables users to select the hosts for an advance reservation based on a resource requirement.
Benefit
- More flexibility in reserving host slots for mission-critical jobs.
How it works
- The brsvadd command supports a select string:
    brsvadd -R "select[type==LINUX]" -n 4 -u xwei -b 10:00 -e 12:00

Key Features – Platform LSF HPC
- Enhanced Accounting, Auditing & Control
- Optimized Application, System and Hardware Performance
- Commercial Grade System Scalability & Reliability
- Comprehensive, Extensible and Standards-based Security

Job Termination Reasons
- Accounting log with detailed audit & error information for every job in the system
- Indicates why a job was terminated
- Distinguishes between an abnormal termination and a termination caused by Platform LSF HPC

Key Features – Platform LSF HPC
- Optimized Application, System and Hardware Performance
- Enhanced Accounting, Auditing & Control
- Comprehensive, Extensible and Standards-based Security
- Commercial Grade System Scalability & Reliability

Enterprise Proven
- Running on several of the top 10 supercomputers in the world on the “TOP500” (#3,5,9,11)
- More than 250,000 licenses in use spanning 1,500 customer sites
- Scales to over 100 clusters, 200,000 CPUs and 500,000 active jobs per cluster
- 11+ years experience in distributed & grid computing
- Risk free investment – proven solution
- Commercial production quality

Key Features – Platform LSF HPC
- Optimized Application, System and Hardware Performance
- Enhanced Accounting, Auditing & Control
- Commercial Grade System Scalability & Reliability
- Comprehensive, Extensible and Standards-based Security

Comprehensive, Extensible, Standards-based Security
- Scalable scheduler architecture
- Multiple scheduler plug-in API support
- External executable support
- Web GUI
- Open source components
- Risk free investment – proven solution
- Commercial grade
- Scalability and flexibility as a business grows

How It Works Platform LSF HPC

Master Host Election Process
[Diagram labels: Host 1; Host 2; Host N; /dev/kmem; SBD; LIM; “Exchange load information”; Master; MBD; MBSCHD; “Master announcement”; “Am I the master?”]

Platform LSF Daemons
[Diagram labels: Host 1; Host 2; Host N; /dev/kmem; SBD; LIM; RES; “Exchange load information”; Master; MBD; MBSCHD; MELIM; PIM; SELIM]

Grid-enabled, Scalable Architecture
Open, modular plug-in schedulers scale with the growth of your business

Multiple Scheduling Modules
[Diagram: Internal Module, Add-on Module 1 ... Add-on Module N, each with the stages Pre-Processing, Matching / Limits, Order / Allocation, Post-Processing]
- Vendor specific matching policies (without changing the existing scheduler)
- Support for external scheduler

Maui Integration
[Diagram labels: MBD; SCH_FM; MAUI Plugin; MAUI Scheduler; RMGetInfo; Pre-processing; Order jobs; Post-Processing; UIProcessClients; QueueScheduleSJobs; QueueScheduleRJobs; QueueScheduleIJobs; QueueBackFill; “Job, Host, Res Info”; “Decisions and ack”; Sync; Event Handle (wait until GO event)]

Linux-specific Solutions

Controlling an MPI job
On a distributed system (Linux cluster) there are many problems to address:
1. Job launch across multiple nodes
2. Gather resource usage while job executes
3. Propagate signals
4. Job “clean-up” to eliminate “dangling” MPI processes
5. Comprehensive job accounting

“Traditional” MPI sequence
[Diagram labels: submit; Resource manager; Jobscript; mpirun; Job launcher; a.out]

Platform LSF HPC for Linux - MPICH-GM
[Diagram labels: bsub; mbatchd; sbatchd; res; Job script; pam; mpirun; gmmpirun_wrapper; TS; a.out; PIM]

Platform LSF HPC for Linux/Myrinet - MPICH_GM
[Diagram labels: Submission host; Master Host; Execution Host; H1; H2; bsub; esub; lsblib; LIM; master LIM; PIM; elim; MBD; MBSCHD; SBD; SBD child; Queues (high, med, hpc_queue); pam; Root res; mpirun.lsf; mpirun.ch_gm; gmmpirun_wrapper; TaskStarter; a.out: process 1; a.out: process 2; rsh; “Set LSF_PJL_TYPE to mpich_gm”; “Report resource availability”; “Signals and rusage collection”; “Hostname & pid”]

Scyld Beowulf Integration
- Scyld Beowulf handles the systems management challenge effectively
  - No OS to distribute / synchronize
  - Central point of control from master
  - Single process space makes it appear as large SMP
- Platform integrates with Scyld, treating the cluster as an SMP and allocating resources
  - Integrate with mpirun, mpprun or bpsh to start tasks
  - Collect resource usage from BPROC
  - Collect load information via BPROC APIs
  - Single user interface across Scyld & non-Scyld environments

Platform LSF HPC for Linux/BProc
[Diagram labels: Submission host; Master Host; Bproc Front-end Node; Computing Nodes; H3; bsub; esub (“Modify submission options”); lsblib; LIM; master LIM; PIM; MBD; MBSCHD; SBD; SBD child –exec() res; Res; Queues (high, med, low); Job file; allocated nodes; Bpsh/mpirun; User Job Processes]

More info at:

Q & A