GRAM: Grid Resource Allocation & Management Globus Toolkit™ Developer Tutorial The Globus Project™ Argonne National Laboratory USC Information Sciences Institute


GRAM: Grid Resource Allocation & Management
Globus Toolkit™ Developer Tutorial
The Globus Project™
Argonne National Laboratory
USC Information Sciences Institute
Copyright (c) 2002 University of Chicago and The University of Southern California. All Rights Reserved. This presentation is licensed for use under the terms of the Globus Toolkit Public License. See for the full text of this license.

June 1, Globus Toolkit™ Developer Tutorial: GRAM
Resource Management Review
• Resource Specification Language (RSL) is used to communicate requirements
• The Grid Resource Allocation and Management (GRAM) API allows programs to be started on remote resources, despite local heterogeneity
• A layered architecture allows application-specific resource brokers and co-allocators (e.g. DUROC) to be defined in terms of GRAM services

GRAM Components
[Diagram of the GRAM components and their interactions across the site boundary]
Components: Client; MDS: Grid Index Info Server; MDS: Grid Resource Info Server; Gatekeeper; Job Manager; RSL Library; Local Resource Manager; Process; Globus Security Infrastructure
Interactions: MDS client API calls to locate resources; MDS client API calls to get resource info; GRAM client API calls to request resource allocation and process creation; GRAM client API state change callbacks; Query current status of resource; Create; Parse; Request; Allocate & create processes; Monitor & control

Resource Management APIs
• Globus Toolkit has APIs for RSL, GRAM, and DUROC:
  – globus_rsl
  – globus_gram_client
  – globus_gram_myjob
  – globus_duroc_control
  – globus_duroc_runtime

Resource Specification Language
• Much of the power of GRAM is in the RSL
• Common language for specifying job requests
  – GRAM service translates this common language into scheduler specific language
• GRAM service constrains RSL to a conjunction of (attribute=value) pairs
  – E.g. &(executable="/bin/ls")(arguments="-l")
• GRAM service understands a well defined set of attributes
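The conjunction form described above can be sketched in a few lines of Python. This is an illustrative helper, not part of the Globus Toolkit: the names `quote` and `build_rsl` are hypothetical, and the quoting rule (wrap each value in double quotes and double any embedded quote) is an assumption about RSL string literals that should be checked against the RSL specification.

```python
from typing import Mapping


def quote(value: str) -> str:
    """Wrap a value in double quotes (assumed rule: embedded quotes are doubled)."""
    return '"' + value.replace('"', '""') + '"'


def build_rsl(attributes: Mapping[str, str]) -> str:
    """Join (attribute=value) pairs into one conjunction, introduced by '&'."""
    pairs = "".join(f"({name}={quote(value)})" for name, value in attributes.items())
    return "&" + pairs


if __name__ == "__main__":
    print(build_rsl({"executable": "/bin/ls", "arguments": "-l"}))
```

Keeping the request as a mapping until the last moment makes it easy for a broker to add or override attributes before the string is produced.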

globus_rsl
• Module for manipulating RSL expressions
  – Parse an RSL string into a data structure
  – Functions to manipulate the data structure
  – Unparse the data structure into a string
• Can be used to assist in writing brokers or filters which refine an RSL specification
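The parse, manipulate, unparse cycle can be illustrated with a small Python sketch. It does not use or mirror the globus_rsl C API: `parse_rsl` and `unparse_rsl` are hypothetical names, and the parser handles only a flat conjunction of single-valued (attribute=value) pairs, which is an assumed simplification of the full RSL grammar.

```python
import re

# One (name=value) pair; the value is either a quoted string or a bare token.
PAIR = re.compile(r'\(\s*(\w+)\s*=\s*(?:"((?:[^"]|"")*)"|([^\s()"]+))\s*\)')


def parse_rsl(text: str) -> dict:
    """Parse '&(a=b)(c="d e")' into an ordered dict of attribute -> value."""
    text = text.strip()
    if not text.startswith("&"):
        raise ValueError("only conjunctions starting with '&' are handled")
    result = {}
    pos = 1
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = PAIR.match(text, pos)
        if match is None:
            raise ValueError(f"cannot parse RSL at offset {pos}")
        name, quoted, bare = match.groups()
        result[name] = quoted.replace('""', '"') if quoted is not None else bare
        pos = match.end()
    return result


def unparse_rsl(attributes: dict) -> str:
    """Turn the dict back into a conjunction, quoting every value."""
    body = "".join(
        '({}="{}")'.format(name, value.replace('"', '""'))
        for name, value in attributes.items()
    )
    return "&" + body


if __name__ == "__main__":
    # A "filter" in the sense of the slide: refine a request, then unparse it.
    job = parse_rsl('&(executable="/bin/ls")(arguments="-l")')
    job["directory"] = "/tmp"
    print(unparse_rsl(job))
```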

RSL Attributes For GRAM
• (executable=string)
  – Program to run
  – A file path (absolute or relative) or URL
• (directory=string)
  – Directory in which to run (default is $HOME)
• (arguments=arg1 arg2 arg3...)
  – List of string arguments to program
• (environment=(E1 v1)(E2 v2))
  – List of environment variable name/value pairs

RSL Attributes For GRAM
• (stdin=string)
  – Stdin for program
  – A file path (absolute or relative) or URL
• (stdout=string)
  – Stdout for program
  – A file path (absolute or relative) or URL
• (stderr=string)
  – Stderr for program
  – A file path (absolute or relative) or URL

RSL Attributes For GRAM
• (count=integer)
  – Number of processes to run (default is 1)
• (hostCount=integer)
  – On SMP multi-computers, number of nodes to distribute the “count” processes across
• (project=string)
  – Project (account) against which to charge
• (queue=string)
  – Queue into which to submit job

RSL Attributes For GRAM
• (maxTime=integer)
  – Maximum wall clock or CPU runtime (scheduler’s choice) in minutes
• (maxWallTime=integer)
  – Maximum wall clock runtime in minutes
• (maxCpuTime=integer)
  – Maximum CPU runtime in minutes
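To make the idea of translating the common language into a scheduler-specific one concrete, here is a hedged Python sketch that maps three of the attributes above onto PBS-style `#PBS` directives. The mapping is an assumption chosen for illustration; it is not how any GRAM job manager script is actually written, and `to_pbs_directives` and `minutes_to_hms` are hypothetical names.

```python
def minutes_to_hms(minutes: int) -> str:
    """RSL time limits are given in minutes; PBS walltime uses HH:MM:SS."""
    hours, mins = divmod(minutes, 60)
    return f"{hours:02d}:{mins:02d}:00"


def to_pbs_directives(job: dict) -> list:
    """Illustrative mapping of a few RSL attributes to PBS directive lines."""
    lines = []
    if "queue" in job:
        lines.append(f"#PBS -q {job['queue']}")
    if "project" in job:
        lines.append(f"#PBS -A {job['project']}")
    if "maxWallTime" in job:
        walltime = minutes_to_hms(int(job["maxWallTime"]))
        lines.append(f"#PBS -l walltime={walltime}")
    return lines


if __name__ == "__main__":
    # The queue and project names are made up for the example.
    for line in to_pbs_directives({"queue": "batch", "project": "physics", "maxWallTime": "90"}):
        print(line)
```

Attributes the sketch does not recognize are simply ignored, which is a design shortcut; a real translator would need to reject or pass through unknown attributes explicitly.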

RSL Attributes For GRAM
• (maxMemory=integer)
  – Maximum amount of memory for each process in megabytes
• (minMemory=integer)
  – Minimum amount of memory for each process in megabytes

RSL Attributes For GRAM
• (jobType=value)
  – Value is one of “mpi”, “single”, “multiple”, or “condor”
    > mpi: Run the program using “mpirun -np ”
    > single: Only run a single instance of the program, and let the program start the other count-1 processes.
    > multiple: Start instances of the program using the appropriate scheduler mechanism
    > condor: Start a Condor process running in “standard universe”

RSL Attributes for GRAM
• (gramMyjob=value)
  – Value is one of “collective”, “independent”
  – Defines how the globus_gram_myjob library will operate on the processes
    > collective: Treat all processes as part of a single job
    > independent: Treat each of the processes as an independent uniprocessor job
• (dryRun=true)
  – Do not actually run job

RSL Attributes for GRAM
• (save_state=yes)
  – Causes the jobmanager to save job state/information to a persistent file on disk
  – Recover from a jobmanager crash
  – New in Globus Toolkit v2.0
• (two_phase= )
  – Implement a two-phase commit for job submission and completion
  – =seconds to wait before job times out
  – New in Globus Toolkit v2.0

RSL Attributes for GRAM
• (restart= )
  – Start a new jobmanager but instead of submitting a new job, start watching over an existing job.
  – New in Globus Toolkit v2.0
• (stdout_position= )
• (stderr_position= )
  – Specified as part of a job restart
  – Restart file streaming from this byte
  – New in Globus Toolkit v2.0

RSL Substitutions
• RSL supports simple variable substitutions
• Substitutions are declared using a list of pairs
  – (rslSubstitution=(SUB1 val1)(SUB2 val2))
• A substitution is invoked with $(SUB)
• Processing order:
  – Within scope, processed left-to-right
  – Outer scope processed before inner scope
  – Variable definition can reference previously defined variables

RSL Substitution Example
• This
    &(rslSubstitution=(URLBASE "ftp://host:1234"))
     (rslSubstitution=(URLDIR $(URLBASE)/dir))
     (executable=$(URLDIR)/myfile)
• is equivalent to this
    &(executable=ftp://host:1234/dir/myfile)
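The left-to-right rule, where a definition may reference previously defined variables, can be sketched in Python as follows. This models only a single scope with already-extracted definitions; `expand` and `resolve` are hypothetical helper names, not Globus functions.

```python
import re

# Matches a substitution reference such as $(URLBASE).
REFERENCE = re.compile(r"\$\((\w+)\)")


def expand(text: str, table: dict) -> str:
    """Replace every $(NAME) in text with its value from table."""
    def lookup(match):
        name = match.group(1)
        if name not in table:
            raise KeyError(f"undefined RSL substitution: {name}")
        return table[name]

    return REFERENCE.sub(lookup, text)


def resolve(definitions: list, attributes: dict) -> dict:
    """Process definitions left to right, then expand the attribute values."""
    table = {}
    for name, value in definitions:
        # A definition may only use variables that were defined before it.
        table[name] = expand(value, table)
    return {attr: expand(value, table) for attr, value in attributes.items()}


if __name__ == "__main__":
    definitions = [("URLBASE", "ftp://host:1234"), ("URLDIR", "$(URLBASE)/dir")]
    job = resolve(definitions, {"executable": "$(URLDIR)/myfile"})
    print(job["executable"])  # ftp://host:1234/dir/myfile
```

Because each definition is expanded against the table as it stands at that point, a forward reference fails with an error rather than silently producing an unexpanded string.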

GRAM Defined RSL Substitutions
• GRAM defines a set of RSL substitutions before processing the job request
• Machine Information
  – GLOBUS_HOST_MANUFACTURER
  – GLOBUS_HOST_CPUTYPE
  – GLOBUS_HOST_OSNAME
  – GLOBUS_HOST_OSVERSION

GRAM Defined RSL Substitutions
• Paths to Globus
  – GLOBUS_LOCATION
• Miscellaneous
  – HOME
  – LOGNAME
  – GLOBUS_ID

GRAM Examples
The globus-job-run client is a sample GRAM client that integrates GASS services for executable staging and standard I/O redirection, using command-line arguments rather than RSL.
% globus-job-run pitcairn.mcs.anl.gov /bin/ls
% globus-job-run pitcairn.mcs.anl.gov -s myprog
% globus-job-run pitcairn.mcs.anl.gov \
    -s myprog -stdin -s in.txt -stdout -s out.txt

GRAM Examples
The globusrun client is a more involved tool that allows complicated RSL expressions.
% globusrun -r pitcairn.mcs.anl.gov -f myjob.rsl
% globusrun -r pitcairn.mcs.anl.gov \
    '&(executable=myprog)'

globus_gram_client
• globus_gram_client_job_request()
  – Submit a job to a remote resource
  – Input:
    > Resource manager contact string
    > RSL specifying the job to be run
    > Callback contact string, for notification
  – Output:
    > Job contact string

Finding The Gatekeeper
• globus_gram_client_job_request() requires a resource manager contact string to find the gatekeeper
    hostname[:port][/service][:subject]
  – hostname – host of gatekeeper
    > required
  – port – port on which gatekeeper is listening
    > defaults to well known port = gsigatekeeper = 2119
  – service – gatekeeper service to invoke
    > defaults to “jobmanager”
  – subject – security subject name of gatekeeper
    > Defaults to standard host cert form: “…/cn=host/hostname”
    > Applies fuzzy match to deal with interface names, etc.
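The contact string layout hostname[:port][/service][:subject] can be taken apart with a short Python sketch. This is an illustration of the syntax only, not the parsing code used by the GRAM client library; `Contact` and `parse_contact` are hypothetical names, and the subject in the usage example is a made-up placeholder. The defaults (port 2119, service "jobmanager") are the ones stated above; the default subject is left as `None` because deriving it needs the host certificate convention.

```python
import re
from typing import NamedTuple, Optional

CONTACT = re.compile(
    r"(?P<host>[^:/]+)"          # hostname is required
    r"(?::(?P<port>\d+))?"       # optional :port
    r"(?:/(?P<service>[^:]+))?"  # optional /service
    r"(?::(?P<subject>.+))?"     # optional :subject (may itself contain '/')
)


class Contact(NamedTuple):
    host: str
    port: int
    service: str
    subject: Optional[str]


def parse_contact(text: str) -> Contact:
    """Split a resource manager contact string and apply the stated defaults."""
    match = CONTACT.fullmatch(text)
    if match is None:
        raise ValueError(f"not a valid contact string: {text!r}")
    return Contact(
        host=match["host"],
        port=int(match["port"]) if match["port"] else 2119,
        service=match["service"] or "jobmanager",
        subject=match["subject"],
    )


if __name__ == "__main__":
    print(parse_contact("pitcairn.mcs.anl.gov"))
    # Hypothetical subject name, used only to exercise the parser.
    print(parse_contact("flash.isi.edu:2119/jobmanager-lsf:/O=Grid/CN=host/flash.isi.edu"))
```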

Job Contact
• globus_gram_client_job_request() returns a job contact
  – Opaque string
  – Other globus_gram_client_*() functions use the job contact to find the right job manager to which requests are made
  – Job contact string can be passed between processes, even on different machines

globus_gram_client
• globus_gram_client_job_status()
  – Check the status of the job
    > UNSUBMITTED, PENDING, ACTIVE, FAILED, DONE, SUSPENDED
  – Can also get job status through callbacks
    > globus_gram_client_callback_{allow,disallow,check}()
• globus_gram_client_job_cancel()
  – Cancel/kill a pending or active job

globus_gram_client
• globus_gram_client_job_signal()
  – Controls the jobmanager
  – COMMIT_REQUEST*
    > Submit job
  – COMMIT_END*
    > Clean up job
  – COMMIT_EXTEND*
    > Wait additional N seconds
  – * when jobs are submitted with the two_phase attribute
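The interplay of the three commit signals with the two-phase timeout can be modeled with a toy state machine in Python. Everything here is an assumption made for illustration: the class `TwoPhaseJob`, its state names, and the rule that only the submission phase can time out are invented for the sketch and are not taken from the job manager implementation. Only the signal names come from the slide.

```python
class TwoPhaseJob:
    """Toy model of a job manager waiting for commit signals (hypothetical)."""

    def __init__(self, commit_timeout: float, now: float):
        self.state = "AWAITING_COMMIT"
        self.deadline = now + commit_timeout

    def signal(self, name: str, now: float, seconds: float = 0) -> str:
        """Deliver a signal at time `now` and return the resulting state."""
        if self.state == "AWAITING_COMMIT" and now > self.deadline:
            self.state = "TIMED_OUT"
        if self.state == "TIMED_OUT":
            return self.state
        if name == "COMMIT_EXTEND":
            self.deadline += seconds          # wait additional N seconds
        elif name == "COMMIT_REQUEST" and self.state == "AWAITING_COMMIT":
            self.state = "SUBMITTED"          # second phase of submission
        elif name == "COMMIT_END" and self.state == "FINISHED":
            self.state = "CLEANED_UP"         # client has acknowledged the result
        return self.state

    def finish(self) -> None:
        """Called when the underlying job ends; cleanup waits for COMMIT_END."""
        if self.state == "SUBMITTED":
            self.state = "FINISHED"


if __name__ == "__main__":
    job = TwoPhaseJob(commit_timeout=30, now=0)
    print(job.signal("COMMIT_REQUEST", now=10))
    job.finish()
    print(job.signal("COMMIT_END", now=100))
```

The point of the model is that the job is neither started nor cleaned up until the client confirms, which is what allows a client to retry a lost request without risking a duplicate submission.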

globus_gram_client
• globus_gram_client_job_signal(), continued
  – STDIO_UPDATE
    > Allows client to submit an RSL that changes some I/O attributes of the job
      • stdout, stderr, stdout_position, stderr_position, remote_io_url
  – STDIO_SIZE
    > Verify that streamed I/O has been completely received
  – STOP_MANAGER
    > Tells JM to exit, but leave the job running

State Change Callbacks
• A GRAM managed job can be in the states:
  – Unsubmitted, Pending, Active, Failed, Done, Suspended
• GRAM client can register for asynchronous state change callbacks
  – Registration can be done during submission
    > globus_gram_client_job_request()
  – Registration can be done later by any process, using the job contact
    > globus_gram_client_job_callback_register()
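The difference between registering at submission and registering later can be shown with a small in-process Python model. It is a sketch of the callback pattern only: `ToyJobManager` and its methods are hypothetical, the enum values are not GRAM's numeric status codes, and the job contact string is a made-up placeholder.

```python
from enum import Enum, auto
from typing import Callable, List


class JobState(Enum):
    # State names from the slide; the numeric values are arbitrary.
    UNSUBMITTED = auto()
    PENDING = auto()
    ACTIVE = auto()
    FAILED = auto()
    DONE = auto()
    SUSPENDED = auto()


Callback = Callable[[str, JobState], None]


class ToyJobManager:
    """Notifies registered callbacks whenever the job changes state."""

    def __init__(self, job_contact: str):
        self.job_contact = job_contact
        self.state = JobState.UNSUBMITTED
        self._callbacks: List[Callback] = []

    def register(self, callback: Callback) -> None:
        self._callbacks.append(callback)

    def unregister(self, callback: Callback) -> None:
        self._callbacks.remove(callback)

    def set_state(self, new_state: JobState) -> None:
        if new_state is self.state:
            return  # not a state change, so nothing to report
        self.state = new_state
        for callback in list(self._callbacks):
            callback(self.job_contact, new_state)


if __name__ == "__main__":
    manager = ToyJobManager("hypothetical-job-contact")
    manager.register(lambda contact, state: print(contact, state.name))
    for state in (JobState.PENDING, JobState.ACTIVE, JobState.DONE):
        manager.set_state(state)
```

A listener that registers late sees only the transitions that happen after registration, so a client attaching to an existing job would typically also query the current status once.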

globus_gram_client
• globus_gram_client_callback_allow()
  globus_gram_client_callback_disallow()
  globus_gram_client_callback_check()
  – Create/destroy a client port to listen for asynchronous state change callbacks
  – Callback to local function on state change
• globus_gram_client_job_callback_register()
  globus_gram_client_job_callback_unregister()
  – Register with job manager to receive callbacks

globus_gram_myjob
• When a set of processes in a single job start up, they may need to self-organize
  – How many processes in the job?
  – What is my rank within the job?
  – Simple send/receive between job processes.
• This API is a minimal set of functions to allow this self-organization
• This is a bootstrapping library. It is NOT meant to be a general purpose message passing library for use by applications

DUROC Review
• Simultaneous allocation of a resource set
  – Handled via optimistic co-allocation based on free nodes or queue prediction
  – In the future, advance reservations will also be supported
• globusrun will co-allocate specific multi-requests
  – Uses a Globus component called the Dynamically Updated Request Online Co-allocator (DUROC)

A Co-allocation Multirequest
+( & (resourceManagerContact=
       "flash.isi.edu:2119/jobmanager-lsf:/O=Grid/…/CN=host/flash.isi.edu")
     (count=1)
     (label="subjob A")
     (executable=my_app1)
 )
 ( & (resourceManagerContact=
       "sp139.sdsc.edu:2119:/O=Grid/…/CN=host/sp097.sdsc.edu")
     (count=2)
     (label="subjob B")
     (executable=my_app2)
 )
Callouts: Different executables; Different counts; Different resource managers

RSL Attributes For DUROC
• (subjobStartType=value)
  – Alters the startup barrier mechanism
  – Values are “strict-barrier”, “loose-barrier”, “no-barrier”
• (subjobCommsType=value)
  – Values are “blocking-join” and “independent”
  – If value is set to “independent”, the subjob won’t be seen from the other subjobs when doing inter-subjob communication.

RSL Attributes For DUROC
• (label=string)
  – Identifier for this subjob
• (resourceManagerContact=string)
  (resourceManagerName=string)
  – Resource manager to which to submit a subjob

globus_duroc_control
• Submit a multi-request
• Edit a pending request
  – Add new nodes, edit out failed nodes
• Commit to configuration
  – Delay to last possible minute
  – Barrier synchronization
• Initialize computation
  – Bootstrap library
• Monitor and control collection

globus_duroc_runtime
• globus_duroc_runtime_barrier()
  – All processes in DUROC job must call this
  – It will wait until the DUROC control module releases all processes from the barrier
• globus_duroc_runtime_inter_subjob_*()
  – Bootstrap library between subjobs
• globus_duroc_runtime_intra_subjob_*()
  – Bootstrap library within a subjob
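The startup barrier behavior, where every process blocks until the control module releases them all, can be imitated with Python's standard `threading.Barrier`. This is an analogy under stated assumptions, not DUROC itself: threads stand in for the distributed processes, the main thread plays the control module by being the last party to arrive, and `run_job` is a hypothetical name.

```python
import threading


def run_job(process_count: int) -> list:
    """Start `process_count` workers that all wait at a shared barrier."""
    # One extra party for the controller, which decides when to release.
    barrier = threading.Barrier(process_count + 1)
    events = []
    lock = threading.Lock()

    def process(rank: int) -> None:
        with lock:
            events.append(("arrived", rank))
        barrier.wait()  # blocks until every process and the controller arrive
        with lock:
            events.append(("released", rank))

    workers = [threading.Thread(target=process, args=(rank,)) for rank in range(process_count)]
    for worker in workers:
        worker.start()
    barrier.wait()  # the controller commits to the configuration here
    for worker in workers:
        worker.join()
    return events


if __name__ == "__main__":
    for kind, rank in run_job(4):
        print(kind, rank)
```

No worker can record "released" before all workers have recorded "arrived", because the barrier opens only once every party has called `wait()`.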

Job Manager Files
[Diagram of the files and components involved in a job manager's handling of a job]
Labels: Client; Gatekeeper; Jobmanager; JOB; GRIS; X509_USER_PROXY (UP); GASS_CACHE; stdout; stderr; Staged EXE; Staged stdin; Submission; Scheduler Desc. (Exe=x, Args=y, Env=z); Job status monitoring

GRAM exercises
• Note: GRAM has three APIs:
  – client, myjob, job_manager
  – Most users will never use job_manager API
• Go to the “gram” subdirectory
• Documentation –
• Follow instructions in the file README

DUROC exercises
• Note: DUROC has two APIs:
  – control, runtime
• Go to the “duroc” subdirectory
• Documentation –
• Follow instructions in the file README

RSL exercises
• Go to the “rsl” subdirectory
• Documentation –
• Follow instructions in the file README

GRAM Changes: 1.1.x to 2.0
• One-and-only-once submission
  – Through 2 phase commit signal
• Recoverability
  – Job manager can be restarted
  – Restart/redirect stdout/err
• Generalized signaling of job manager

Future “GRAM 1.6”
• Asynchronous client API
• New RSL attribute to pass through scheduler specific commands
  – No more piggy-backing on the environment attributes
• File staging
  – Scratch dir, input, output
• Advanced output management
  – Stream/store stdout and stderr to multiple destinations

Interesting Issues
• The Globus Toolkit does not include a resource broker or a metascheduler!
  – We have helped many people to build these using GRAM and MDS services; many now exist.
    > Condor-G, DRM, PUNCH, Nimrod/G, Cactus, AppLeS,