Rochester Institute of Technology: Job Submission. Andrew Pangborn & Myles Maxfield, 10/19/2015, Service Oriented Cyberinfrastructure Lab

Presentation transcript:

The Grid?

The Problem
At one end are computing resources managed by batch queuing systems and other middleware
At the other end are end users and their jobs/applications
Software and protocols are needed for submitting jobs to the computing resources

Job Submission
More motivation stuff?

Batch Queuing Systems
Submitting a job directly to the batch queuing system
One or more queues
  –Priorities
Two common architectures
  –Client/server
  –Dynamic offloading
User credential (delegation)
Jobs have states (e.g. Pending, Running)
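
The queue, priority, and job-state ideas above can be sketched in a few lines of Python. This is only an illustrative toy, not how any particular batch system is implemented: the class names, the extra "Done" state, and the rule that a larger priority number runs first are all assumptions made for the example.

```python
import heapq
from dataclasses import dataclass
from enum import Enum
from itertools import count


class JobState(Enum):
    PENDING = "Pending"
    RUNNING = "Running"
    DONE = "Done"  # assumed extra state, not named on the slide


@dataclass
class Job:
    name: str
    priority: int = 0  # assumption: a larger value is scheduled first
    state: JobState = JobState.PENDING


class BatchQueue:
    """Toy single queue: highest priority first, submission order among ties."""

    def __init__(self):
        self._heap = []
        self._order = count()  # tie-breaker that preserves submission order

    def submit(self, job):
        # heapq is a min-heap, so the priority is negated.
        heapq.heappush(self._heap, (-job.priority, next(self._order), job))
        return job

    def run_next(self):
        """Pop the best pending job and mark it Running; None if empty."""
        if not self._heap:
            return None
        _, _, job = heapq.heappop(self._heap)
        job.state = JobState.RUNNING
        return job


queue = BatchQueue()
low = queue.submit(Job("low", priority=1))
high = queue.submit(Job("high", priority=5))
first = queue.run_next()
print(first.name, first.state.value)
print(low.name, low.state.value)
```

The job submitted second runs first because of its higher priority, while the other job stays in the Pending state until the next scheduling pass.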

Batch Queuing Systems
Important examples:
  –Portable Batch System
  –TORQUE
  –Xgrid
  –Sun Grid Engine
  –Load Sharing Facility
  –Condor

Portable Batch System (PBS)
Originally developed for NASA
Client/server architecture
  –Server: pbs_server
  –Client: pbs_mom (the daemon that runs jobs on each compute node)
Works with MPI through built-in shell script variables (e.g. $PBS_NODEFILE, the list of allocated nodes)

PBS Example
    $ cat test.sh
    #!/bin/sh
    #testpbs
    echo This is a test
    echo today is `date`
    echo This is `hostname`
    echo The current working directory is `pwd`
    ls -alF /home
    uptime

PBS Example
    $ qsub test.sh
    6.gras.carrion.rit.edu
    $ qstat
    Job id    Name      User       Time Use  S  Queue
    gras      test.sh   litherum   00:00:00  C  batch
    $ cat test.sh.o6
    This is a test
    today is Sat Jan 17 18:20:20 EST 2009
    This is carrion02
    The current working directory is /home/litherum
    total 20
    drwxr-xr-x 31 litherum litherum 4096 Jan 17 18:19 litherum/
    18:20:20 up 131 days, 21:20, 0 users, load average: 0.00, 0.00,
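
In the session above, qsub prints the job identifier 6.gras.carrion.rit.edu and the job's standard output lands in test.sh.o6, that is, the job name followed by ".o" and the sequence number. A minimal Python sketch of that naming rule follows; the helper names are made up for illustration, and it assumes the default output naming with no override in the job script.

```python
def parse_job_id(job_id):
    """Split a PBS job id such as '6.gras.carrion.rit.edu' into (6, server)."""
    sequence, _, server = job_id.strip().partition(".")
    return int(sequence), server


def default_stdout_name(job_name, job_id):
    """Default stdout file name: <job name>.o<sequence number>."""
    sequence, _ = parse_job_id(job_id)
    return f"{job_name}.o{sequence}"


job_id = "6.gras.carrion.rit.edu"  # value printed by qsub in the example
print(parse_job_id(job_id))
print(default_stdout_name("test.sh", job_id))
```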

Torque
Built on top of PBS
Supports reservations: specific resources can be reserved for specific times
Supports partitions: a cluster can be split into smaller sub-clusters

Torque
    $ showq
    ACTIVE JOBS
    JOBNAME USERNAME STATE PROC REMAINING STARTTIME

    0 Active Jobs    0 of 4 Processors Active (0.00%)
                     0 of 2 Nodes Active (0.00%)

    IDLE JOBS
    JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME

    0 Idle Jobs

    BLOCKED JOBS
    JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME

    Total Jobs: 0 Active Jobs: 0 Idle Jobs: 0 Blocked Jobs: 0

Xgrid
From Apple
Essentially the same as Condor
GUI! =)
Client/server model

Sun Grid Engine
Open source, like everything new Sun puts out
Supports
  –Reservations
  –Job dependencies
  –Checkpointing
  –Multiple scheduling algorithms
  –Web interface
Professional!

Load Sharing Facility
One of the local schedulers that GRAM can submit to (GRAM is covered later)

Condor
More about this later, but it implements its own scheduler

Challenging!
These queuing systems are hard to use
There may be many systems employed in a given grid
Wouldn’t it be nice if all this were unified in a single implementation?

Condor
A tool for pooling and “scavenging” computing resources and distributing jobs
Similar to a batch queuing system [2]
  –Job management
  –Scheduling policy
  –Priority scheme
  –Resource monitoring
  –Resource management
Also focuses on high-throughput and “opportunistic” computing [2]

Condor Universes [1]
Standard
Vanilla
  –Simpler, can run unmodified binaries (they do not need to be “condor compiled”)
  –No support for partial execution (checkpointing) or job relocation (migration)
Others
  –PVM
  –MPI
  –Java

Condor Submission File Example [1]
    #hello.sub
    #condor job file example
    Universe = Vanilla
    Executable = hello
    Output = hello.out
    Input = hello.in
    Error = hello.err
    Log = hello.log
    Queue
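
When many similar jobs are needed, a submit description like hello.sub can be generated rather than written by hand. The following Python sketch shows one way to do that; the function name and the dictionary layout are assumptions for illustration, and it only produces the text, leaving the actual condor_submit call to the user.

```python
def render_submit_description(settings, queue_count=1):
    """Render 'Key = value' lines followed by a Queue statement."""
    lines = [f"{key} = {value}" for key, value in settings.items()]
    # A bare Queue submits one job; 'Queue N' submits N copies.
    lines.append("Queue" if queue_count == 1 else f"Queue {queue_count}")
    return "\n".join(lines) + "\n"


# Same settings as the hello.sub example.
hello = {
    "Universe": "Vanilla",
    "Executable": "hello",
    "Output": "hello.out",
    "Input": "hello.in",
    "Error": "hello.err",
    "Log": "hello.log",
}

text = render_submit_description(hello)
print(text)
```

Writing the returned text to a file such as hello.sub and passing that file to condor_submit would then submit the job.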

Condor Commands
condor_submit

Condor Daemons
On all Condor-deployed machines
  –Master
  –Startd
  –Schedd
On the Condor pool master
  –Collector
  –Negotiator

GRAM [4]
Globus Resource Allocation Manager (GRAM)
  –Resource allocation
  –Process creation
  –Monitoring
  –Management
  –Maps requests expressed in a Resource Specification Language (RSL) into commands to local schedulers and computers

GRAM
Pluggable!
Can’t make up their mind how to describe jobs
Will submit jobs to:
  –Condor
  –LSF
  –PBS/Torque
  –???
Unified interface, plus an identifier for which cluster/service to use
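
The "unified interface plus an identifier" idea can be illustrated with a small adapter registry in Python. This is a hypothetical sketch of the pattern only, not GRAM's actual code or job description format: JobRequest, the adapter functions, and the registry keys are all invented for the example, and argument quoting is ignored.

```python
from dataclasses import dataclass


@dataclass
class JobRequest:
    """Scheduler-neutral job description (hypothetical; not GRAM's RSL)."""
    executable: str
    arguments: tuple = ()


def to_pbs_script(job):
    """Render a shell script that could be handed to qsub."""
    command = " ".join((job.executable, *job.arguments))
    return f"#!/bin/sh\n{command}\n"


def to_condor_description(job):
    """Render a vanilla-universe submit description for condor_submit."""
    lines = ["Universe = Vanilla", f"Executable = {job.executable}"]
    if job.arguments:
        lines.append("Arguments = " + " ".join(job.arguments))
    lines.append("Queue")
    return "\n".join(lines) + "\n"


# Registry keyed by a scheduler identifier (the keys are assumptions).
ADAPTERS = {"PBS": to_pbs_script, "Condor": to_condor_description}


def translate(job, scheduler):
    """Pick the adapter named by the identifier and render the job with it."""
    try:
        adapter = ADAPTERS[scheduler]
    except KeyError:
        raise ValueError(f"no adapter for scheduler {scheduler!r}") from None
    return adapter(job)


job = JobRequest("/bin/hostname")
print(translate(job, "PBS"))
print(translate(job, "Condor"))
```

Supporting another local scheduler then means adding one function and one registry entry, while callers keep using the same translate call.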

GRAM Example
    $ globusrun-ws -submit -factory 44/wsrf/services/ManagedJobFactoryService -factory-type PBS -streaming -job-command /bin/hostname
    Delegating user credentials...Done.
    Submitting job...Done.
    Job ID: uuid: e4f2-11dd-81df bb4e6
    Termination time: 01/18/ :57 GMT
    Current job state: Pending
    Current job state: Active
    tg-c15
    Current job state: CleanUp-Hold
    Current job state: CleanUp
    Current job state: Done
    Destroying job...Done.
    Cleaning up any delegated credentials...Done.
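
The client output above traces the job through a sequence of states. A short Python sketch can pull that sequence out of captured output; it assumes only the "Current job state:" line format visible in this example, and the function name is made up for illustration.

```python
import re

# Matches lines such as "Current job state: CleanUp-Hold".
STATE_PATTERN = re.compile(r"Current job state:\s*(\S+)")


def job_states(client_output):
    """Return the job states in the order the client reported them."""
    return STATE_PATTERN.findall(client_output)


# Excerpt of the output shown in the example.
sample = """\
Submitting job...Done.
Current job state: Pending
Current job state: Active
tg-c15
Current job state: CleanUp-Hold
Current job state: CleanUp
Current job state: Done
Destroying job...Done.
"""

states = job_states(sample)
print(states)
```

Lines that are not state reports, such as the job's own output (tg-c15), are skipped.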

Condor-G [4]
Condor-G is a Globus-enabled version of the Condor scheduler.
It uses Globus to handle inter-organizational problems like:
  –Security
  –Resource management for supercomputers
  –Executable staging
The same Condor tools that access local resources are now able to use the Globus protocols to access resources at multiple sites.
It communicates with these resources and transfers files to and from them using Globus mechanisms, such as:
  –GSI
  –GRAM protocol for job submission
Condor-G can be used to submit jobs to systems managed by Globus.
Globus tools can be used to submit jobs to systems managed by Condor.

Condor-G

UNICORE

Upperware
Talk about motivation for upperware applications

GridShell

References
1. Getting started with Condor. http://
2. Thain, D., Tannenbaum, T., & Livny, M. (2005). Distributed computing in practice: the Condor experience.
3. Jeremy Espenshade’s Condor job submission presentation. http://grid.rit.edu/seminar/lib/exe/fetch.php/users:jeremy_espenshade:condorjobsubmission.ppt