HPC Profile BOF Marvin Theimer Marty Humphrey Microsoft Corporation

1 HPC Profile BOF Marvin Theimer Marty Humphrey Microsoft Corporation
University of Virginia

2 Agenda
11:00 – 11:15 Review of Charter (Humphrey)
11:15 – 11:30 HPC Use Cases – Base Case and Common Cases (Theimer)
11:30 – 11:45 Extensible Job Submission Design (Theimer)
11:45 – 12:00 Comparative Analysis
  Extensible Job Submission Design and JSDL/BES (Wasson)
  ESI – Snelling/Foster (Theimer)
12:00 – 12:30 Discussion

3 Review of Charter (11:00 – 11:15)

4 History
GGF14: Chicago Jul 14 2005
  “Minimal Web Services BOF” (aka WS-Management)
  Newhouse, Theimer, Humphrey, Tollefsrud
GGF15: Boston Oct 6 2005
  UVa update on WS-Management use for OGSA (Wasson)
  Specific technical thoughts on the support of dual stacks
  Suspended given rumored “reconciliation”
GGF16: Athens Feb
  “An evolutionary approach to realizing the Grid vision”
  Theimer, Parastatidis, Hey, Humphrey, Fox
OGSA F2F Feb
  Theimer gives detailed presentation of the “evolutionary” paper

5 More History
Mar 15 2006
  “Toward Converging Web Service Standards for Resources, Events, and Management”
  HP, IBM, Intel, Microsoft
OGSA F2F: Sunnyvale, CA April 5 2006
  Theimer presented the use-case document
Since March 2006, active engagement of OGSA-WG mailing list to build consensus

6 OGSA HPC Profile WG (Computing Area)
Objective: the profile and protocol specifications needed to realize the vertical use case of batch job scheduling of scientific/technical applications
  “use case” = HPC use case
Output: HPCP (normative)
Scope:
  Identify any changes/extensions to existing protocol specifications that are deemed necessary, and work with the relevant working groups to try to effect them
  Identify additional protocol specifications that need to be defined, and either work on their definition or spin them out to additionally defined working groups

7 OGSA HPC Profile WG (Computing Area)
“sub-profiles”
  interface for specifying, submitting, and scheduling jobs
  interface for bulk data staging
Evolutionary approach
  A simple base case will be defined that we expect to have universally implemented by all batch job scheduling clients and schedulers.
  All additional functionality will be defined in terms of optional extensions (which are anticipated to be widely applicable)

8 Pre-existing Documents
JSDL
BES
“An evolutionary approach to realizing the Grid vision”

9 Status
Use-case in final revisions
Resource reservation
Provisioning
Execution
Next: what the framework should be for defining extension profiles
Aggressive milestones to meet vendor deadlines

10 Deliverables
OGSA HPC Use Cases – Base Case and Common Cases (GFD-I)
OGSA HPC profile specification (GFD-R.P)
OGSA HPC initial common cases extension profile specification (GFD-R.P)

11 Milestones
Document name                                      First draft available   Ready for Public Comment review   GFD publication
OGSA HPC Use Cases                                 April 2006              July 2006                         Sept. 2006
OGSA HPC base case profile                         Aug. 2006               Nov. 2006                         Mar. 2007
OGSA HPC initial common cases extension profile    Jan. 2007               Apr. 2007                         Aug. 2007

12 HPC Use Cases – Base Case and Common Cases [ GFD-I ] (11:15 – 11:30)

13 Goals
BASE case:
  ALL scheduling clients and services are expected to understand
  HPC, not Grid (i.e., do NOT span administrative domains)
Common Cases:
  Represent some significant fraction of implementors, not all implementors
  NOT all cases – only common cases
Capture client-visible functionality requirements rather than being technology/system-design-driven

14 Base Case
High-throughput compute cluster used only within the enterprise
User requests:
  Submit a job with specification of resource requirements → unique jobID or fault
  Query a specific job for its current state
  Cancel a specific job
  List jobs
State diagram: queued, running, finished
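The four base-case requests and the three-state diagram can be sketched as a small in-memory scheduler. This is only an illustration under stated assumptions: the class and method names (`BaseCaseScheduler`, `submit`, `query`, `cancel`, `list_jobs`, `advance`) are hypothetical, and treating a cancelled job as `finished` is an assumption made because the base diagram names no separate cancelled state.

```python
import itertools
from enum import Enum


class JobState(Enum):
    """The three states named in the base-case state diagram."""
    QUEUED = "queued"
    RUNNING = "running"
    FINISHED = "finished"


class SchedulerFault(Exception):
    """Stands in for the 'fault' a request may return (hypothetical name)."""


class BaseCaseScheduler:
    """Hypothetical in-memory model of the base-case request set."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._jobs = {}  # job_id -> {"requirements": ..., "state": ...}

    def submit(self, requirements):
        """Submit a job; returns a unique job ID or raises a fault."""
        if not requirements:
            raise SchedulerFault("no resource requirements given")
        job_id = f"job-{next(self._ids)}"
        self._jobs[job_id] = {
            "requirements": dict(requirements),
            "state": JobState.QUEUED,
        }
        return job_id

    def query(self, job_id):
        """Return the current state of one specific job."""
        return self._lookup(job_id)["state"]

    def cancel(self, job_id):
        """Cancel one job. Assumption: a cancelled job is reported as finished."""
        self._lookup(job_id)["state"] = JobState.FINISHED

    def list_jobs(self):
        """List the IDs of all known jobs in submission order."""
        return list(self._jobs)

    def advance(self, job_id):
        """Test helper standing in for real execution: queued -> running -> finished."""
        job = self._lookup(job_id)
        if job["state"] is JobState.QUEUED:
            job["state"] = JobState.RUNNING
        elif job["state"] is JobState.RUNNING:
            job["state"] = JobState.FINISHED

    def _lookup(self, job_id):
        try:
            return self._jobs[job_id]
        except KeyError:
            raise SchedulerFault(f"unknown job {job_id}") from None
```

A client would call `submit({"cpus": 4})`, keep the returned ID, and poll `query` until it reports `JobState.FINISHED`.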

15 Base Case (cont)
Only small set of “standard” resources
  number of CPUs/compute nodes needed, memory requirements, disk requirements, etc.
Only equality of string values and numeric relationships among pairs of numeric values are provided in the base use case.
Once a job has been submitted it can be cancelled, but its resource requests can't be modified
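The matching rule described above (string equality plus numeric comparisons between pairs of numbers) might look like the following sketch. The function name `satisfies`, the resource names, and the `(operator, value)` tuple encoding of a numeric requirement are all assumptions made for illustration; the slides do not define a concrete syntax.

```python
import operator

# Numeric relationships allowed between a node's value and a required value.
_NUMERIC_OPS = {
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    ">=": operator.ge,
    ">": operator.gt,
}


def satisfies(offer, requirements):
    """Return True if a resource offer meets every requirement.

    String requirements must match exactly; numeric requirements are
    (operator, value) pairs, e.g. ("cpus", (">=", 4)).
    """
    for name, required in requirements.items():
        if name not in offer:
            return False
        available = offer[name]
        if isinstance(required, str):
            # Base case: strings support equality only.
            if available != required:
                return False
        else:
            op, value = required
            if not _NUMERIC_OPS[op](available, value):
                return False
    return True
```

For example, a node described as `{"os": "linux", "cpus": 8}` satisfies `{"os": "linux", "cpus": (">=", 4)}` but not `{"os": "windows"}`.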

16 Base case: Out of Scope
Data access issues
Programs are assumed to be pre-installed
Creation and management of user security credentials
No need for directory services beyond something like DNS
Management of the system resources

17 Base case: Fault tolerance model
Job fails because of “system problems”
  Job must be resubmitted by client
  the job scheduler will not automatically rerun the job
Failure of the scheduler may or may not cause currently running jobs to fail.
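Because the base-case scheduler never reruns a job on its own, any retry logic lives in the client. A minimal sketch of such a client loop follows; `run_with_resubmission`, its callback parameters, and the outcome string `"system-failure"` are hypothetical names chosen for this example.

```python
def run_with_resubmission(submit, wait_for_outcome, max_attempts=3):
    """Resubmit a job from the client side when it fails for system reasons.

    submit: callable returning a new job ID for each submission.
    wait_for_outcome: callable taking a job ID and returning an outcome string.
    Returns (final_outcome, list_of_job_ids_tried).
    """
    job_ids = []
    for _ in range(max_attempts):
        job_id = submit()
        job_ids.append(job_id)
        outcome = wait_for_outcome(job_id)
        if outcome != "system-failure":
            # Success or an application-level error: do not resubmit.
            return outcome, job_ids
    return "system-failure", job_ids
```

Each resubmission yields a new job ID, which matches the base case: the scheduler treats every submission as an independent job.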

18 Base case: Job Exits
Whether it exited successfully, with an error code, or terminated due to some other fault situation.
How long it ran in terms of wall-clock time.

19 Base case: scheduling policy
FIFO
Out-of-scope:
  quotas and other forms of SLAs
  Non-independent jobs
  Infrastructure support for parallel, distributed programs (such as MPI)
  Reservation of resources separate from allocation to a running job (e.g., reserve 3 cpus for future use)
  Interactive access to running jobs

20 Common Cases
Purpose of enumerating common cases: use as the basis for creating appropriate extension mechanisms
13 cases

21 13 Common Cases
Exposing existing schedulers’ functionality
  Condor, Globus, LSF, Maui, Microsoft-CCS, PBS, SGE, etc.
Polling vs. notification
  notification “call-back” messages for significant changes in the state of a job
  What are the semantics of message delivery?
At-Most-Once and Exactly-Once Submission Guarantees
  The base use case allows the possibility that a client can’t directly tell whether its job submission request has been successfully received by a job scheduler or not
Types of Data Access
  non-transparent staging of data between independent storage systems.
  explicitly supports transparent data access within a virtual organization or across a federated set of organizations
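One common way to get at-most-once job creation when the client cannot tell whether a submission arrived is a client-chosen token that the scheduler de-duplicates on. The slides do not prescribe this mechanism; the sketch below is an assumption, and `DedupingScheduler` and `submit_with_retries` are hypothetical names.

```python
import uuid


class DedupingScheduler:
    """Hypothetical scheduler that creates at most one job per client token."""

    def __init__(self):
        self._by_token = {}  # submission token -> job ID
        self._next = 1

    def submit(self, token, requirements):
        # A repeated token means the client is retrying: return the same job.
        if token in self._by_token:
            return self._by_token[token]
        job_id = f"job-{self._next}"
        self._next += 1
        self._by_token[token] = job_id
        return job_id


def submit_with_retries(scheduler, requirements, attempts=3):
    """Retry a submission under one token so retries cannot create duplicates."""
    token = str(uuid.uuid4())  # chosen once, reused for every retry
    last_error = None
    for _ in range(attempts):
        try:
            return scheduler.submit(token, requirements)
        except ConnectionError as error:
            # The request or its reply may have been lost; retry is safe.
            last_error = error
    raise last_error
```

With retries on the client and de-duplication on the scheduler, a lost reply leads to a repeated request but still only one job.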

22 13 Common Cases (cont.)
Types of Programs to Install/Provision/Run
  users may have programs that require explicit installation of some form.
Multiple Organization Use Cases
  Submission of jobs requires additional security support (e.g., “foreign” credential)
  Data currently resides outside of the enterprise in question
  Additional sandboxing of non-local users
Extended Resource Descriptions
  allow arbitrary resource types whose semantics are not understood by the HPC infrastructure
  accounting information returned for a job

23 13 Common Cases (cont.)
Extended Client/System Administrator Operations
  users may wish to modify the requirements for a job after it has already been submitted
  Arrays of jobs
  system administrators: suspension/resumption of jobs and migration of jobs among compute nodes
Extended Scheduling Policies
  shortest/smallest-job-first, weighted-fair-share scheduling, etc.
  multiple submission queues, job submission quotas, and various forms of SLAs, such as guarantees on how quickly a job will be scheduled to run.

24 13 Common Cases (cont.)
Parallel Programs and Workflows of Programs
  instantiate such programs (e.g., MPI) across multiple compute nodes in a suitable manner, including provision of information that will allow the various program instances to find each other within the cluster
  Programs may have execution dependencies on each other.
Advanced Reservations and Interactive Use Cases
  reserve resources for use at a specific future time
  communicate in real time with external client users
Cycle Scavenging
  batch job scheduler dispatches jobs to machines that have dynamically indicated to it that they are currently available for running guest jobs.

25 13 Common Cases (cont.) Multiple Schedulers
submit work to the whole of the computing infrastructure without having to manually select which facility to submit to

26 Status
Need feedback
  Is the base case sufficient?
  Missing any “common” cases?
  Any of the 13 “too uncommon”?

27 Extensible Job Submission Design (11:30 – 11:45)

28 Extensible Job Submission Design (EJS)
Main focus: extensibility
Philosophy:
  Cover all the bases (resource reservation, provisioning, execution, data staging, etc.)
  Keep it simple
Approach:
  Minimalist base cases (overall and for each sub-component)
  Optional extensions to enable both richer semantics and evolution

29 What is a Job?
OGSA glossary:
  Job: User-defined task that is scheduled to be carried out by an execution subsystem
    Task: ??? Single program instance? Distributed MPI program? What about data staging? BES defines simple workflows
    Execution subsystem: ??? Job queue? Process? Compute node? Multiple compute nodes?
  Workflow: Focus is on business processes & services
    No mention of executing multiple user-defined tasks or data staging steps
Batch job scheduling literature:
  Job ~ accounting entity under which multiple user-defined steps are run

30 Core Concepts
Task: execution of one or more program instances in one or more execution subsystems
Compute node: execution subsystem that actually executes a program
Resources:
  Compute node CPUs, memory, disk space, etc.
  Aggregates: # of compute nodes, all resources of a compute node, etc.
Scheduler: allocates resources to jobs and tasks
Resource allocation: 3 distinct phases
  Clients query schedulers about available resources
  Clients reserve resources
  Schedulers allocate resources to tasks or to reservation requests
Job: reified resource reservation against which tasks can be run

31 Examples of Jobs and Tasks

32 Base Task States New Pending Running Finished Canceled Failed

33 Base Job States New Unsatisfied Satisfied Finished Canceled Failed

34 Multiple Schedulers
[Figure: diagram of a client, a meta-scheduler, and schedulers Sched13 and Sched42 with their clusters and compute nodes (CN); labels include Cluster13-headnode, Cluster13-1, Cluster13-2, Cluster42, Cluster42-8, 42-1, 42-7, Desktop-foo, Task1, Task2, Task3]
Meta-schedulers
Autonomous schedulers => don’t have the right to allocate resources until you’ve reserved them
Hierarchical information representation

35 Other Topics Covered
Advertising resource information
Failure and recovery model
Security and credential delegation

36 Types of Extensions
Purely additive extensions allowed (i.e. no changes to base semantics)
  Additional WSDL operations (incl. for parameter overloading)
  Array operations
  Extended state diagrams
  Extended resource descriptions
  Extended information representations
Multiple, composable, extensible “micro”-protocols

37 Specialization of States
Profile A: Task state transition diagram for a scheduling profile that extends the base protocol to support task migration
  States: New, Pending, Running, Finished, Canceled, Failed; sub-state Running: Migrating (transition: Migrate)
Profile B: Task state transition diagram for a scheduling profile that extends the base protocol to support the notion of staging in data to a compute node before a task runs and staging data out back to the client user after the task has finished execution
  States: New, Pending, Running, Finished, Canceled, Failed; sub-states Running: Stage-in, Executing, Stage-out
Profile C: Task state transition diagram for a scheduling profile that extends the base protocol to support task suspension
  States: New, Pending, Running, Finished, Canceled, Failed; sub-state Running: Suspended (transition: Suspend)
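The point of specializing states rather than adding new top-level ones is that a client that only knows the base protocol can still interpret an extended state. A minimal sketch, assuming extended states are written as `Base: Sub-state` strings as in the profile diagrams; the names `BaseTaskState` and `base_state` are hypothetical.

```python
from enum import Enum


class BaseTaskState(Enum):
    """The six base task states."""
    NEW = "New"
    PENDING = "Pending"
    RUNNING = "Running"
    FINISHED = "Finished"
    CANCELED = "Canceled"
    FAILED = "Failed"


def base_state(extended):
    """Map an extended state such as 'Running: Migrating' to its base state.

    Everything after the first colon is a profile-specific sub-state that a
    base-only client may ignore.
    """
    name = extended.split(":", 1)[0].strip()
    return BaseTaskState(name)
```

A base-only client that receives `Running: Suspended` from a profile it does not understand can still report the task as `Running`.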

38 Base Interoperability Interface
Task interface:
  CreateTask(schedulerEPR, resourceDescr, credentialsDescr, lifetime) → taskDescr
  QueryTask(taskEPR, taskID, queryDescr) → taskDescr
  CancelTask(taskEPR, taskID)
Scheduler interface:
  QueryScheduler(schedulerEPR, queryDescr) → schedulerDescr
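To make the shape of these four operations concrete, here is an in-memory stand-in written in Python. It mirrors only the operation names and parameter lists from the slide; the `TaskDescr` fields, the string form of EPRs, and the dictionary returned by `query_scheduler` are assumptions, and credentials and lifetime are accepted but not enforced.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TaskDescr:
    """Hypothetical task description; the real schema is not given here."""
    task_epr: str
    task_id: str
    state: str = "New"
    resources: Dict[str, object] = field(default_factory=dict)


class InMemoryEndpoint:
    """Stand-in for a scheduler endpoint exposing the base operations."""

    def __init__(self, scheduler_epr):
        self.scheduler_epr = scheduler_epr
        self._tasks = {}

    def create_task(self, scheduler_epr, resource_descr, credentials_descr, lifetime):
        # credentials_descr and lifetime are ignored in this sketch.
        task_id = str(len(self._tasks) + 1)
        descr = TaskDescr(
            task_epr=f"{scheduler_epr}/tasks",
            task_id=task_id,
            resources=dict(resource_descr),
        )
        self._tasks[task_id] = descr
        return descr

    def query_task(self, task_epr, task_id, query_descr=None):
        # query_descr would select which parts of the description to return.
        return self._tasks[task_id]

    def cancel_task(self, task_epr, task_id):
        self._tasks[task_id].state = "Canceled"

    def query_scheduler(self, scheduler_epr, query_descr=None):
        return {"epr": self.scheduler_epr, "task_count": len(self._tasks)}
```

A client creates a task, keeps the returned `task_epr` and `task_id`, and passes both back on every later query or cancel.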

39 Generic Extensions
Array operations
Notifications
Query operation modifiers
Idempotent message delivery semantics
EPR resolution

40 Task Interface Extensions
Re-execution of failed tasks
Additional & extended resource definitions
Additional operations
  ModifyTask
Additional scheduling policies
Support for parallel/distributed programs
Data staging
Provisioning
Static workflow

41 Resource Reservations
Job interface:
  CreateJob(schedulerEPR, resourceDescr, credentialsDescr, lifetime) → rsrvDescr
  QueryJob(rsrvEPR, rsrvID) → rsrvDescr
  ModifyJob(rsrvEPR, rsrvID, resourceDescr) → rsrvDescr
  CancelJob(rsrvEPR, rsrvID)
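Since a job is defined as a reified resource reservation against which tasks run, a job object is essentially a resource budget. The sketch below models only a CPU count; the class `Reservation`, its method names, and the choice to start in the `Satisfied` state are assumptions for illustration, not the design's actual data model.

```python
class Reservation:
    """Hypothetical job: a CPU reservation that tasks are run against."""

    def __init__(self, rsrv_id, cpus):
        self.rsrv_id = rsrv_id
        self.cpus = cpus
        # Assumption: the scheduler has already granted the reservation.
        self.state = "Satisfied"
        self._in_use = {}  # task_id -> CPUs held by that task

    @property
    def free_cpus(self):
        return self.cpus - sum(self._in_use.values())

    def start_task(self, task_id, cpus):
        """Run a task against the reservation if enough CPUs remain."""
        if self.state != "Satisfied":
            raise RuntimeError(f"reservation is {self.state}")
        if cpus > self.free_cpus:
            raise ValueError("task exceeds reserved resources")
        self._in_use[task_id] = cpus

    def finish_task(self, task_id):
        """Return a finished task's CPUs to the reservation."""
        self._in_use.pop(task_id)

    def modify(self, cpus):
        """ModifyJob analogue: resize, but never below what tasks hold."""
        if cpus < sum(self._in_use.values()):
            raise ValueError("cannot shrink below resources in use")
        self.cpus = cpus

    def cancel(self):
        """CancelJob analogue."""
        self.state = "Canceled"
```

Tasks that do not fit are rejected rather than queued here; a real scheduler could equally well hold them pending.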

42 Multiple Schedulers
Hierarchical information option
Client scheduler list
AnnounceScheduler (schedulerEPR, announcerDesc)

43 Comparison of ESI to Extensible Job Submission Design
Focus of ESI: reconciliation/synthesis of Globus and Unicore
Focus of EJS: extensibility

