Jefferson Lab and the Portable Batch System
Walt Akers, High Performance Computing Group

Jefferson Lab and PBS: Motivating Factors
New Computing Cluster
– Alpha-based compute nodes
– 16 XP1000 single-processor nodes (LINPACK 5.61 GFlop/sec)
– 8 UP2000 dual-processor nodes (LINPACK 7.48 GFlop/sec)
Heterogeneous Job Mix
– Combination of parallel and non-parallel jobs
– Job execution times range from a few hours to weeks
– Data requirements range from minimal to several gigabytes
Modest Budget
– Much of our funding was from internal sources
– Initial hardware expense was relatively high
Expandability
– Can the product be expanded from a few nodes to hundreds?

Jefferson Lab and PBS: Alternative Systems
PBS - Portable Batch System
– Open-source product developed at NASA Ames Research Center
DQS - Distributed Queuing System
– Open-source product developed by SCRI at Florida State University
LSF - Load Sharing Facility
– Commercial product from Platform Computing
– Already deployed by the Computer Center at Jefferson Lab
Codine
– Commercial version of DQS from Gridware, Inc.
Condor
– A restricted-source 'cycle stealing' product from the University of Wisconsin
Others too numerous to mention

Jefferson Lab and PBS: Why We Chose PBS
Portability
– The PBS distribution compiled and ran immediately on both the 64-bit Alpha and 32-bit Intel platforms.
Documentation
– PBS comes with comprehensive documentation, including an Administrator's Guide, External Reference, and Internal Reference.
Active Development Community
– There is a large community worldwide that continues to improve and refine PBS.
Modularity
– PBS is a component-oriented system.
– A well-defined API is provided to allow components to be replaced with locally defined modules.
Open Source
– The source code for the PBS system is available without restriction.
Price
– Hey, it's free…

Jefferson Lab and PBS: The PBS View Of The World
PBS Server
– Mastermind of the PBS system
– Central point of contact
PBS Scheduler
– Prioritizes jobs
– Signals the server to start jobs
Machine Oriented Mini-Server (MOM)
– Executes scripts on compute nodes
– Performs user file staging

Jefferson Lab and PBS: The PBS Server
Routing Queues
– Can move jobs between multiple PBS Servers
Execution Queues
– Define default characteristics for submitted jobs
– Define a priority level for queued jobs
– Hold jobs before, during and after execution
Node Capabilities
– The server maintains a table of nodes, their capabilities and their availability.
Job Requirements
– The server maintains a table of submitted jobs that is independent of the queues.
Global Policy
– The server maintains global policies and default job characteristics.
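
To make that bookkeeping concrete, here is a minimal Python sketch of a server that keeps execution queues with defaults and a priority, a node table, a job table independent of the queues, and global policy merged into each submitted job. The names and fields are hypothetical illustrations; PBS itself is written in C and its real structures are far richer.

```python
from dataclasses import dataclass, field

@dataclass
class Queue:
    name: str
    priority: int = 0                              # priority level for queued jobs
    defaults: dict = field(default_factory=dict)   # default job characteristics

@dataclass
class Node:
    name: str
    ncpus: int
    available: bool = True                         # availability tracked by the server

@dataclass
class Job:
    jobid: str
    queue: str
    requirements: dict                             # user-specified resource requests

class Server:
    """Holds queues, the node table, and the job table (independent of the queues)."""
    def __init__(self, global_defaults):
        self.global_defaults = global_defaults     # global policy / server-wide defaults
        self.queues, self.nodes, self.jobs = {}, {}, {}

    def submit(self, job):
        # Fill in anything the user left unspecified: server-wide defaults first,
        # then the execution queue's defaults, then the user's own requests.
        merged = dict(self.global_defaults)
        merged.update(self.queues[job.queue].defaults)
        merged.update(job.requirements)
        job.requirements = merged
        self.jobs[job.jobid] = job

server = Server(global_defaults={"walltime": "24:00:00"})
server.queues["production"] = Queue("production", priority=10, defaults={"nodes": 1})
server.nodes["qcd01"] = Node("qcd01", ncpus=2)
server.submit(Job("42.jlab", "production", {"nodes": 4}))
```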

Jefferson Lab and PBS: The PBS Scheduler
Prioritizes Jobs
– Called periodically by the PBS Server
– Downloads job lists from the server and sorts them based on locally defined requirements.
Tracks Node Availability
– Examines executing jobs to determine the projected availability time for nodes.
– Using this data, the scheduler can calculate future deployments and determine when back-filling should be performed.
Recommends Job Deployment
– At the end of the scheduling cycle, the scheduler submits a list of jobs that can be started immediately to the server.
– The PBS Server is responsible for verifying that the jobs can be started, and then deploying them.
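
The cycle described above can be pictured roughly as the following simplified back-filling loop. This is an illustrative Python sketch, not the PBS scheduler code; the field names (end_estimate, walltime) and the longest-waiting-first ordering are assumptions standing in for locally defined policy.

```python
def schedule_cycle(queued, running, total_nodes, now):
    """Return the jobs the scheduler would recommend starting immediately."""
    # Nodes free right now, plus projected releases from the running jobs.
    free = total_nodes - sum(j["nodes"] for j in running)
    releases = sorted((j["end_estimate"], j["nodes"]) for j in running)

    # Locally defined priority: here, simply longest-waiting first.
    order = sorted(queued, key=lambda j: j["submit_time"])

    to_run, reservation = [], None
    for job in order:
        if reservation is None and job["nodes"] <= free:
            to_run.append(job)                     # fits on idle nodes right now
            free -= job["nodes"]
        elif reservation is None:
            # First job that does not fit: project when enough nodes free up.
            avail, when = free, now
            for end, n in releases:
                avail, when = avail + n, end
                if avail >= job["nodes"]:
                    break
            reservation = when
        elif job["nodes"] <= free and now + job["walltime"] <= reservation:
            # Back-fill: small enough to run now and finish before the blocked
            # job's projected start, so it delays nothing.
            to_run.append(job)
            free -= job["nodes"]
    return to_run
```

The same projected-release list is what makes start-time projections for queued jobs possible later on.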

Jefferson Lab and PBS: Machine Oriented Mini-Server
Executes Scripts
– At the direction of the PBS Server, MOM executes the user-provided scripts.
– For parallel jobs, the primary MOM (Mother Superior) starts the job on itself and all other assigned nodes.
Stages Data Files
– Prior to script execution, MOM is responsible for remotely copying user-specified data files to Mother Superior.
– Following execution, the resultant data files are remotely copied back to the user-specified host.
Tracks Resource Usage
– MOM tracks the CPU time, wall time, memory and disk used by the job.
Kills Rogue Jobs
– Kills jobs at the PBS Server's request.
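
The per-job lifecycle on a node looks roughly like the sketch below: stage input in, run the script, record usage, stage results out. This is a loose Python illustration that assumes scp for the remote copies and only checks limits after the fact, whereas a real MOM enforces them while the job runs and kills offenders.

```python
import resource, subprocess, time

def run_job(script, stage_in, stage_out, limits):
    # Stage user-specified input files onto the execution (Mother Superior) host.
    for src, dst in stage_in:
        subprocess.run(["scp", src, dst], check=True)

    start = time.time()
    subprocess.run(["/bin/sh", script])
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)

    report = {"walltime": time.time() - start,
              "cputime": usage.ru_utime + usage.ru_stime,
              "maxrss_kb": usage.ru_maxrss}
    # A real MOM would terminate the job as soon as it exceeded a limit.
    report["over_limit"] = report["walltime"] > limits.get("walltime", float("inf"))

    # Copy the resultant data files back to the user-specified host.
    for src, dst in stage_out:
        subprocess.run(["scp", src, dst], check=True)
    return report
```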

Jefferson Lab and PBS: Our Current Implementation

Jefferson Lab and PBS: What We've Learned So Far
PBS Is Reasonably Reliable, But Has Room For Improvement
– The PBS Server and PBS Scheduler components work well and behave predictably.
– PBS MOM works okay, but behaves bizarrely in certain situations:
  Disk full = chaos
  Out of process slots = chaos
  Improper file transfer or staging = chaos
– Note: the first two can be avoided by careful system management; the last is the responsibility of the job submitter.
Red Hat Linux 6.2
– We've seen many problems associated with NFS. After a kernel upgrade, many of these problems went away.
– klogd occasionally spins out of control and uses all available CPU cycles.
– sshd on SMP machines dies for no apparent reason.
– crontab works intermittently on SMP nodes.
– We're considering experimenting with Tru64 UNIX to see whether these problems exist there.
Writing a Scheduler Is Hard Work
– We have developed two interim schedulers and are now working on the 'final' implementation.

Jefferson Lab and PBS: Ongoing Development
Underlord Scheduling System
– Built on the existing PBS Scheduler framework
  A plug-in replacement for the default scheduler
  Uses an object-oriented interface to the PBS Server
– Comprehensive matchmaking scheme
  Starts from an ordered list of jobs
  Works with a collection of homogeneous or heterogeneous nodes
  Locates the optimal node or combination of nodes where a job should be deployed
  Uses user-specified job parameters to project future job deployment
  Uses future job scheduling in combination with backfilling to maximize system utilization
– Multi-layered job sorting algorithm (sketched below)
  Time in queue
  Projected execution time
  Number of processors requested
  Queue priority
  Progressive user share (similar to the LSF scheme)
– Generates a projection table
  Allows users to determine when their job is projected to start
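
As a rough illustration of the multi-layered sort listed above, the ordering could be expressed as a composite key like the following. The field names, weights, and the progressive-share penalty are placeholders for this sketch, not the actual Underlord values.

```python
import time

def job_sort_key(job, now, user_usage):
    """Composite key: queue priority first, then waiting time adjusted by the
    user's recent share, then shorter and narrower jobs ahead of larger ones."""
    time_in_queue = now - job["submit_time"]
    share_penalty = user_usage.get(job["user"], 0.0)   # grows with recent usage
    return (job["queue_priority"],
            time_in_queue - share_penalty,
            -job["projected_runtime"],
            -job["processors"])

now = time.time()
usage = {"akers": 3600.0}                              # hypothetical recent-usage figure
queued = [
    {"user": "akers", "submit_time": now - 7200, "queue_priority": 10,
     "projected_runtime": 3600, "processors": 4},
    {"user": "guest", "submit_time": now - 600, "queue_priority": 10,
     "projected_runtime": 600, "processors": 1},
]
ordered = sorted(queued, key=lambda j: job_sort_key(j, now, usage), reverse=True)
```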

Jefferson Lab and PBS: Future Directions
Data Grid Server
– In order to provide greater flexibility to the batch system and allow it to accommodate data provided through the proposed Data Grid system, a Data Grid Server will be added to the existing system components.
– This module will have the following capabilities:
  Will provide time projections for when data will be available
  Will perform data migration to a script-accessible host
  Will provide mechanisms to transfer resultant data to a specified location
  Will replace the existing staging capabilities of the PBS Server and PBS MOM
PBS Meta-Facility - The Overlord Scheduler
– The Overlord Scheduler will be a centralized location where jobs are submitted and then forwarded to other PBS clusters for execution. The Overlord Scheduler will have the following capabilities (a forwarding sketch follows below):
  Will prioritize and sort all jobs based on global Meta-Facility rules
  Will consider job requirements, data location and network throughput, and will forward each job to the PBS Server where it will be scheduled earliest
  Will not forward a job to one of the 'Underlord' systems until it is eligible for immediate execution there
– We don't have all of this figured out yet… but we are confident.
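
A minimal sketch of that forwarding rule, assuming each site can report a projected start time for a job; the Cluster class and its methods are hypothetical stand-ins for the 'Underlord' installations, not an existing API.

```python
class Cluster:
    """Stand-in for a site-level (Underlord) PBS installation."""
    def __init__(self, name, start_estimate):
        self.name, self.start_estimate = name, start_estimate

    def projected_start(self, job):
        return self.start_estimate        # would come from the site's projection table

    def submit(self, job):
        print(f"forwarding {job['id']} to {self.name}")

def forward(job, clusters, now):
    # Pick the cluster that would schedule the job earliest, but hand the job
    # over only once it is eligible for immediate execution there.
    best = min(clusters, key=lambda c: c.projected_start(job))
    if best.projected_start(job) <= now:
        best.submit(job)
        return best
    return None                           # otherwise the Overlord keeps holding the job

forward({"id": "42.jlab"}, [Cluster("jlab", 0.0), Cluster("remote", 3600.0)], now=0.0)
```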

Jefferson Lab and PBS: Places On The Web
Jefferson Lab HPC Home Page
– Currently we have most of the PBS documentation and some statistics about our cluster and its development.
PBS Home Page
– Register and download PBS and all documentation from this site.