Published by Bartholomew Smith. Modified over 9 years ago.
1
Resource Management
Reading: “A Resource Management Architecture for Metacomputing Systems”
2
What is Resource Management?
- Mechanisms for locating and allocating computational resources
  - Authentication
  - Process creation
- Remote job submission
- Scheduling
- Other resources that can be managed:
  - Memory
  - Disk
  - Networks
3
Resource Management Issues for Grid Computing
- Site autonomy
  - Resources owned by different organizations, in different administrative domains
  - Local policies for use, scheduling, security
- Heterogeneous substrate
  - Different local resource management systems
- Policy extensibility
  - Local sites need the ability to customize their resource management policies
4
More Issues for Grid Computing
- Co-allocation
  - May need resources at several sites
  - Mechanism for allocating multiple resources, initiating computation, monitoring and managing
- On-line control
  - Adapt application requirements to resource availability
5
Specifying Resource and Job Requirements
- Resource requirements:
  - Machine type
  - Number of nodes
  - Memory
  - Network
- Job or scheduler parameters:
  - Directory
  - Executable
  - Arguments
  - Environment
  - Maximum time required
6
Resource and Job Specification
- Globus: Resource Specification Language (RSL)
    &(executable=myprog)
     (|(&(count=5)(memory>=64))
       (&(count=10)(memory>=32)))
- Condor: classified advertisements (ClassAds)
  - Resource owners advertise abilities and constraints
  - Applications advertise resource requests
  - Matchmaking: match offers and requests
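The matchmaking idea can be sketched in a few lines of Python. This is a toy model and not Condor's ClassAd language: the dictionary layout, the `requirements` callables, and the names `matches` and `matchmake` are assumptions made for illustration.

```python
# Toy matchmaker (hypothetical layout, not real ClassAd syntax): each ad is
# a dict of attributes plus a "requirements" predicate that is evaluated
# against the other party's ad.

def matches(offer, request):
    """A match requires each side's requirements to accept the other side."""
    return offer["requirements"](request) and request["requirements"](offer)

def matchmake(offers, requests):
    """Greedily pair each request with the first unused offer that matches."""
    pairs = []
    free = list(offers)
    for req in requests:
        for off in free:
            if matches(off, req):
                pairs.append((req["name"], off["name"]))
                free.remove(off)
                break
    return pairs

offers = [
    # nodeA's owner refuses jobs from the "guest" account.
    {"name": "nodeA", "memory": 32,
     "requirements": lambda job: job["owner"] != "guest"},
    {"name": "nodeB", "memory": 128,
     "requirements": lambda job: True},
]
requests = [
    {"name": "job1", "owner": "alice",
     "requirements": lambda m: m["memory"] >= 64},
    {"name": "job2", "owner": "guest",
     "requirements": lambda m: m["memory"] >= 16},
]

result = matchmake(offers, requests)
```

Here `job1` lands on `nodeB`, while `job2` stays unmatched because the only remaining offer rejects guest jobs. Both sides can veto a pairing, which is how owner constraints and application requests are treated symmetrically.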
7
Components of Globus Resource Management Architecture
- Resource specification using RSL
- Resource brokers: translate resource requirements into specifications
- Co-allocators: break down requests for multiple sites
- Local resource managers: apply local, site-specific resource management policies
- Information about available compute resources and their characteristics
8
Resource Specification Language
- Common notation for exchange of information between components
- API provided for manipulating RSL
9
RSL Syntax
- Elementary form: parenthesized clauses
    (attribute op value [ value … ] )
- Operators supported: <, <=, =, >=, >, !=
- Some supported attributes: executable, arguments, environment, stdin, stdout, stderr, resourceManagerContact, resourceManagerName
- Unknown attributes are passed through
  - May be handled by subsequent tools
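To make the elementary form concrete, here is a minimal Python sketch that pulls `(attribute op value ...)` relations out of an RSL string. It is a simplification under stated assumptions: values are unquoted and contain no nested parentheses, and `parse_relations` is a hypothetical helper, not part of the Globus RSL API.

```python
import re

# Simplified pattern for one RSL relation: (attribute op value [value ...]).
# Assumption: unquoted values without nested parentheses; real RSL also
# allows quoting, variable substitution, and nested value sequences.
RELATION = re.compile(r"\(\s*(\w+)\s*(<=|>=|!=|=|<|>)\s*([^()]*?)\s*\)")

def parse_relations(rsl):
    """Return (attribute, operator, [values]) for each relation in rsl."""
    return [(attr, op, vals.split())
            for attr, op, vals in RELATION.findall(rsl)]

parsed = parse_relations("&(executable=myprog)(count>=5)(arguments=a b)")
```

Two-character operators are listed before their one-character prefixes in the pattern so that `>=` is not read as `>` followed by a value starting with `=`.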
10
Constraints: “&”
- For example:
    & (count>=5) (count<=10)
      (max_time=240) (memory>=64)
      (executable=myprog)
- “Create 5-10 instances of myprog, each on a machine with at least 64 MB memory that is available to me for 4 hours”
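One way to read the conjunction above is as a selection rule over candidate machines. The sketch below assumes machines are given as `(name, memory_mb)` pairs and ignores the `max_time` constraint; `plan_instances` is a hypothetical helper, not a Globus function.

```python
def plan_instances(machines, min_count=5, max_count=10, min_memory=64):
    """Pick between min_count and max_count machines with enough memory.

    machines is a list of (name, memory_mb) pairs (an assumed layout).
    Returns the chosen machine names, or None if fewer than min_count
    machines satisfy the memory constraint.
    """
    eligible = [name for name, memory_mb in machines
                if memory_mb >= min_memory]
    if len(eligible) < min_count:
        return None
    return eligible[:max_count]

# Fourteen nodes: even-numbered ones have 128 MB, odd-numbered ones 32 MB.
machines = [(f"node{i}", 128 if i % 2 == 0 else 32) for i in range(14)]
chosen = plan_instances(machines)
```

With this input seven machines qualify, which falls inside the 5-10 range, so all seven are used. If only the first six nodes were offered, three would qualify and the request could not be satisfied.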
11
Multirequest: “+”
- A multirequest allows us to specify multiple resource needs, for example
    + (& (count=5)(memory>=64)
         (executable=p1))
      (& (network=atm)
         (executable=p2))
  - Execute 5 instances of p1 on a machine with at least 64 MB of memory
  - Execute p2 on a machine with an ATM connection
- Multirequests are central to co-allocation
12
Resource Broker
- Takes high-level RSL specification
- Transforms it into concrete specifications through a “specialization” process
- Locates resources that meet requirements
- Multiple brokers may service a single request
- Application-specific brokers translate application requirements
- Output: complete specification of locations of resources; given to co-allocator
13
Examples of Resource Brokers
- Nimrod-G
  - Automates creation and management of large parametric experiments
  - Runs application under a wide range of input conditions and aggregates results
  - Queries MDS to find resources
  - Generates a number of independent jobs
  - GRAM allocates jobs to computational nodes
  - Higher-level broker: allows user to specify time and cost constraints
14
Examples of Resource Brokers
- AppLeS
  - Application Level Scheduler
  - Maps a large number of independent tasks to a dynamically varying pool of available computers
  - Uses GRAM to locate resources and to initiate and manage computation
15
Resource co-allocators
- May request resources at multiple sites
  - Two or more computers and networks
- Break multi-request into components
- Pass each component to resource manager
- Provide means for monitoring job status or terminating job
- Complex:
  - Two or more resource managers
  - Global state, like availability of resources, is difficult to determine
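The "break multi-request into components" step can be sketched as a parenthesis-balanced split of a `+` expression. This is an illustrative helper under the assumption of well-formed input with no quoted parentheses; `split_multirequest` is a hypothetical name and not the co-allocator's actual interface.

```python
def split_multirequest(rsl):
    """Split '+(...)(...)' into its top-level parenthesized components."""
    text = rsl.strip()
    if not text.startswith("+"):
        return [text]  # not a multirequest: treat as a single component
    parts, depth, start = [], 0, None
    # Enumerate from index 1 so that i indexes directly into text.
    for i, ch in enumerate(text[1:], start=1):
        if ch == "(":
            if depth == 0:
                start = i          # opening paren of a top-level component
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:         # matching close of that component
                parts.append(text[start + 1:i].strip())
    return parts

components = split_multirequest(
    "+(&(count=5)(memory>=64)(executable=p1))(&(network=atm)(executable=p2))"
)
```

Each resulting component is an ordinary `&` request that could be handed to one resource manager.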
16
Different co-allocation services
1. Require all resources to be available before job proceeds; fail globally if failure occurs at any resource
2. Allocate at least N out of M resources and return
3. Return immediately, but gradually return more resources as they become available
- Each is useful for some class of applications
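The first two services differ only in how many successful allocations are required, which the following Python sketch makes explicit. The `allocate` and `release` callables are assumptions standing in for calls to a resource manager; the third (incremental) service is not modeled here.

```python
def coallocate(requests, allocate, release, minimum=None):
    """Attempt every request and apply an all-or-N success rule.

    allocate(request) returns a handle or None; release(handle) gives a
    resource back. Both are caller-supplied stand-ins (hypothetical).
    minimum=None means all requests must succeed (service 1);
    minimum=N accepts any N out of M successes (service 2).
    """
    needed = len(requests) if minimum is None else minimum
    handles = [h for h in (allocate(r) for r in requests) if h is not None]
    if len(handles) < needed:
        for h in handles:
            release(h)  # fail globally: give back what was obtained
        return None
    return handles

available = {"siteA", "siteB"}
allocate = lambda site: site if site in available else None
released = []

all_or_nothing = coallocate(["siteA", "siteB", "siteC"],
                            allocate, released.append)
two_of_three = coallocate(["siteA", "siteB", "siteC"],
                          allocate, released.append, minimum=2)
```

With `siteC` unavailable, the all-or-nothing call returns `None` and releases the two sites it had obtained, while the 2-out-of-3 call succeeds with `siteA` and `siteB`.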
17
Concurrent Allocation
- If advance reservations are available:
  - Obtain list of available time slots from each participating resource manager and choose a timeslot
- Without reservations:
  - Optimistically allocate resources
  - Hope desired set will be available at a future time
  - Use information service (MDS) to determine current availability of resources
  - Construct RSL request that is likely to succeed
  - If allocation fails, all started jobs must be terminated
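For the advance-reservation case, choosing a timeslot amounts to intersecting the slot lists returned by the participating resource managers. The sketch below assumes slots are comparable values such as integer start hours; the manager names and `earliest_common_slot` are hypothetical.

```python
def earliest_common_slot(slots_by_manager):
    """Return the earliest slot offered by every resource manager, or None."""
    if not slots_by_manager:
        return None
    common = set.intersection(*(set(s) for s in slots_by_manager.values()))
    return min(common) if common else None

# Assumed data: start hours each manager reports as free.
slots = {
    "gramA": [9, 10, 14],
    "gramB": [10, 14, 16],
    "gramC": [8, 14],
}
chosen_slot = earliest_common_slot(slots)
```

Only hour 14 is free at all three managers in this example, so it is the slot that would be reserved. Picking the earliest common slot is one possible policy; a real co-allocator might weigh cost or queue length instead.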
18
Disadvantages of Concurrent Allocation Scheme
- Computational resources are wasted while waiting for all requested resources to become available
- Application must be altered to perform a barrier to synchronize startup across components
- Detecting failure of a resource is difficult, e.g. in queue-based local resource managers
19
Local Resource Managers
- Implemented with Globus Resource Allocation Manager (GRAM)
  1. Process RSL specifications representing resource requests
     - Deny request, or
     - Create one or more processes (jobs) that satisfy request
  2. Enable remote monitoring and management of jobs
  3. Periodically update MDS information service with current availability and capabilities of resources
20
GRAM (cont.)
- Interface between grid environment and entity that can create processes
  - E.g., parallel scheduler or Condor pool
- GRAM may schedule the resource itself
- More commonly, maps resource specification into a request to a local resource allocation mechanism
  - E.g., Condor, LoadLeveler, LSF
- Co-exists with local mechanisms
21
GRAM (cont.)
- GRAM API has functions for:
  - Submitting a job request: produces globally unique job handle
  - Canceling a job request
  - Asking when job request is expected to run
- Upon submission, can request that progress be signaled asynchronously to a callback URL
22
GRAM Scheduling Model
- Jobs are either:
  - Pending: resources have not yet been allocated to the job
  - Active: resources allocated, job running
  - Done: all processes have terminated and resources have been deallocated
  - Failed: job terminates due to:
    - explicit termination
    - error in request format
    - failure in resource management system
    - denial of access to resource
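The four job states can be modeled as a small state machine that notifies a callback on each transition. This is a sketch, not the GRAM implementation: the allowed transitions are an assumption inferred from the state descriptions, and the `Job` class and in-process callback stand in for GRAM's job handle and callback URL.

```python
from enum import Enum

class JobState(Enum):
    PENDING = "pending"
    ACTIVE = "active"
    DONE = "done"
    FAILED = "failed"

# Assumed transitions: a job may fail from either live state, and
# DONE and FAILED are terminal.
ALLOWED = {
    JobState.PENDING: {JobState.ACTIVE, JobState.FAILED},
    JobState.ACTIVE: {JobState.DONE, JobState.FAILED},
    JobState.DONE: set(),
    JobState.FAILED: set(),
}

class Job:
    def __init__(self, callback=None):
        self.state = JobState.PENDING
        self.callback = callback  # called with each new state

    def transition(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(
                f"illegal transition {self.state.name} -> {new_state.name}")
        self.state = new_state
        if self.callback:
            self.callback(new_state)

seen = []
job = Job(callback=seen.append)
job.transition(JobState.ACTIVE)
job.transition(JobState.DONE)
```

After these two calls the callback has observed `ACTIVE` and then `DONE`, and any further transition raises `ValueError` because `DONE` is terminal.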
23
GRAM Components
- Gatekeeper
  - Responds to a request:
    1. Performs mutual authentication of user and resource
    2. Determines local user name for remote user
    3. Starts a job manager that executes as local user and handles request
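Step 2, mapping a remote user to a local account, can be illustrated with a simple lookup table. The sketch assumes a text file of `"<remote identity>" <local user>` lines, loosely modeled on the style of a Globus grid-mapfile; the identities, account names, and helper functions are hypothetical, and authentication itself is not shown.

```python
import shlex

def load_user_map(text):
    """Parse lines of the form: "<remote identity>" <local user>."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        identity, local_user = shlex.split(line)[:2]
        mapping[identity] = local_user
    return mapping

def local_user_for(mapping, identity):
    """Return the local account, or None to signal that access is denied."""
    return mapping.get(identity)

# Hypothetical mapping entries for illustration only.
MAP_TEXT = '''
# remote identity                      local account
"/O=Grid/OU=Example/CN=Alice Example"  alice
"/O=Grid/OU=Example/CN=Bob Example"    bob
'''

user_map = load_user_map(MAP_TEXT)
alice = local_user_for(user_map, "/O=Grid/OU=Example/CN=Alice Example")
unknown = local_user_for(user_map, "/O=Grid/OU=Example/CN=Mallory")
```

An identity without an entry maps to `None`, which corresponds to the "denial of access to resource" outcome.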
24
GRAM Components (cont.)
- Job manager
  - Creates processes requested by user
    - Submits resource allocation requests to underlying resource management system (or does fork)
  - Monitors state of created processes
    - Notifies callback contact of state transitions
  - Implements control operations like termination
25
GRAM Components (cont.)
- GRAM reporter
  - Responsible for storing into MDS (information service) info about:
    - Scheduler structure
      - Support reservations?
      - Number of queues
    - Scheduler state
      - Currently active jobs
      - Expected wait time in queue
      - Total number of nodes and available nodes
26
Resource Management Architecture
[Figure: architecture diagram. Labels: Application; RSL; Broker (RSL specialization); Information Service (queries & info); Ground RSL; Co-allocator; Simple ground RSL; GRAM instances in front of local resource managers (LSF, EASY-LL, NQE).]
27
Job Submission Interfaces
- Globus Toolkit includes several command line programs for job submission
  - globus-job-run: interactive jobs
  - globus-job-submit: batch/offline jobs
  - globusrun: flexible scripting infrastructure