GRAM5 - A sustainable, scalable, reliable GRAM service Stuart Martin - UC/ANL.


1 GRAM5 - A sustainable, scalable, reliable GRAM service Stuart Martin - UC/ANL

2 What is GRAM? (SC 2009, GRAM5)
- GRAM is a Globus Toolkit component
  - For Grid job management
- GRAM is a unifying remote interface to Resource Managers
  - Yet preserves local site security/control
- GRAM is for stateful job control
  - Reliable create operation
  - Asynchronous monitoring and control
  - Remote credential management
  - Remote file staging and file cleanup

3 Grid Job Management Goals
Provide a service to securely:
- Create an environment for a job
- Stage files to/from environment
- Cause execution of job process(es)
  - Via various local resource managers
- Monitor execution
- Signal important state changes to client
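The last goal, signaling state changes to the client, can be pictured as a small observer pattern. The sketch below is illustrative only: the `Job` class, the `subscribe` and `transition` methods, and the state names are hypothetical and are not taken from the slides or from the GRAM API.

```python
from enum import Enum
from typing import Callable, List


class JobState(Enum):
    # Illustrative state names (assumption); the slides do not list states.
    PENDING = "pending"
    STAGE_IN = "stage_in"
    ACTIVE = "active"
    STAGE_OUT = "stage_out"
    DONE = "done"
    FAILED = "failed"


class Job:
    """Hypothetical job record that notifies listeners on each state change."""

    def __init__(self, job_id: str) -> None:
        self.job_id = job_id
        self.state = JobState.PENDING
        self._listeners: List[Callable[[str, JobState], None]] = []

    def subscribe(self, listener: Callable[[str, JobState], None]) -> None:
        self._listeners.append(listener)

    def transition(self, new_state: JobState) -> None:
        # Only signal real changes; repeated reports of the same state are dropped.
        if new_state is self.state:
            return
        self.state = new_state
        for listener in self._listeners:
            listener(self.job_id, new_state)


events = []
job = Job("job-1")
job.subscribe(lambda jid, st: events.append((jid, st.name)))
for st in (JobState.STAGE_IN, JobState.ACTIVE, JobState.ACTIVE,
           JobState.STAGE_OUT, JobState.DONE):
    job.transition(st)
```

The client registers a callback once and is told about each change asynchronously, rather than asking the service repeatedly for the job's status.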

4 Traditional Interaction
[Diagram: Local Jobs; Resource A with Scheduler (e.g., PBS) and Compute Nodes]
- Satisfies many use cases
- TACC’s Ranger (62976 cores!) is the Costco of HTC ;-), one-stop shopping; why do we need more?

5 GRAM Benefit
[Diagram: Local Jobs and remote GRAM Jobs (via the GRAM API); Resource A with GRAM Service, Scheduler (e.g., PBS), and Compute Nodes]
- Add remote execution capability
  - Enable clients/devices to manage jobs without logging into the cluster

6 GRAM Benefit
[Diagram: Local Jobs and GRAM Jobs (via the GRAM API); Resource A with GRAM Service, Scheduler (e.g., PBS), and Compute Nodes; Resource B with GRAM Service, Scheduler (e.g., LSF), and Compute Nodes]
- Provides scheduler abstraction
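One way to picture scheduler abstraction is a common interface with one adapter per resource manager. This is a minimal sketch, not GRAM's implementation: the class and function names are hypothetical, and the `qsub` and `bsub` command lines are simplified approximations that are only built here, never executed.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class SchedulerAdapter(ABC):
    """Hypothetical adapter interface: one subclass per local resource manager."""

    @abstractmethod
    def submit_command(self, executable: str, count: int) -> List[str]:
        """Return the local submit command for the given job."""


class PbsAdapter(SchedulerAdapter):
    def submit_command(self, executable: str, count: int) -> List[str]:
        # Simplified PBS-style request (assumption: count maps to nodes).
        return ["qsub", "-l", f"nodes={count}", executable]


class LsfAdapter(SchedulerAdapter):
    def submit_command(self, executable: str, count: int) -> List[str]:
        # Simplified LSF-style request (assumption: count maps to -n).
        return ["bsub", "-n", str(count), executable]


ADAPTERS: Dict[str, SchedulerAdapter] = {"pbs": PbsAdapter(), "lsf": LsfAdapter()}


def build_submit(resource_manager: str, executable: str, count: int) -> List[str]:
    """The client-facing call is the same whichever scheduler sits behind it."""
    return ADAPTERS[resource_manager].submit_command(executable, count)


pbs_cmd = build_submit("pbs", "job.sh", 4)
lsf_cmd = build_submit("lsf", "job.sh", 4)
```

The caller describes the job once; only the adapter knows the scheduler-specific syntax, which is what lets Resource A and Resource B run different schedulers behind the same remote interface.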

7 GRAM Benefit
[Diagram: GRAM jobs submitted through the GRAM API to many resources, each shown as GRAM, Sched, Compute Nodes]
- Scalable job management
- Interoperability

8 [Diagram: GRAM sits between users/applications and resource managers]
- Users/Applications: Science Gateways, Portals, CLI scripts, App Specific Web Service, etc.
- GRAM
- Resource Managers: PBS, Condor, LSF, SGE, LoadLeveler, Fork

9 Higher-level Clients and User Examples

10 Condor-G Architecture
[Diagram. Personal Condor: Master, Schedd, Collector & Negotiator, Grid Manager, Shadow. Remote Resource: GRAM, LSF, Startd, Starter, User Job. Flows shown: Condor jobs, GlideIn jobs.]

11 GridWay Components
[Diagram]
- Interfaces: DRMAA library, CLI
- GridWay Core: Request Manager, Dispatch Manager, Scheduler, Job Pool, Host Pool
- Execution Manager (Job Submission, Job Monitoring, Job Control, Job Migration): Execution Services (pre-WS GRAM, WS GRAM)
- Transfer Manager (Job Preparation, Job Termination, Job Migration): File Transfer Services (GridFTP, RFT)
- Information Manager (Resource Discovery, Resource Monitoring): Information Services (MDS2, MDS2 GLUE, MDS4)

12 GridWay / Condor-G Benefit
[Diagram: GridWay jobs submitted through the GRAM API to many resources, each shown as GRAM, Sched, Compute Nodes]
- Scalable job management
- Throttling
- Metascheduling

13 Architecture of Ninf-G
[Diagram]
- Client side: Client
- Server side: IDL file, Numerical Library, IDL Compiler (Generate), Ninf-G Executable, Interface Information LDIF File
- Invoke Server: GRAM / NAREGI / Condor / SSH (Invoke Executable)
- Information service: MDS4 / NAREGI IS (Interface Request, Interface Reply, retrieve)
- Connect back: Globus-IO / ssh / TCP

14 caBIG and Globus
- caGrid is built on top of Globus 4 WSRF Java Core and Security

15 caBIG - TeraGrid Integration
- Leave caGrid service infrastructure as is, with the exception of the analytical services.

16 Hierarchical Clustering Results

17 GRAM2 Architecture Diagram
[Diagram]
- Job Submission: Client, Gatekeeper, Job Manager, RM adapter (submit), Resource Manager, User Job(s)
- Job Monitoring: Job Manager, RM adapter (poll), Resource Manager, User Job(s)

18 GRAM2 Architecture
[Diagram]
- Job Submission: Client, Gatekeeper, multiple Job Manager / RM adapter (submit) pairs, Resource Manager, User Job(s)
- Job Monitoring: multiple Job Manager / RM adapter (poll) pairs, Resource Manager, User Job(s)
- Label: Unlimited

19 GRAM5 Architecture
[Diagram]
- Job Submission: Client, Gatekeeper, Job Manager, multiple RM adapter (submit) instances, Resource Manager, User Job(s)
- Job Monitoring: Job Manager, SEG, RM log, SEG log, Resource Manager, User Job(s)
- Labels: throttled (default 6); 1 process
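The "throttled (default 6)" label can be illustrated with a bounded semaphore. This sketch assumes the label refers to a cap on how many submit operations run at once inside a single job manager process; the `ThrottledSubmitter` class is hypothetical, and `time.sleep` stands in for the real RM adapter call.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_SUBMITS = 6  # mirrors the "default 6" label on the slide


class ThrottledSubmitter:
    """Hypothetical submitter that caps the number of in-flight submit calls."""

    def __init__(self, limit: int = MAX_CONCURRENT_SUBMITS) -> None:
        self._slots = threading.BoundedSemaphore(limit)
        self._lock = threading.Lock()
        self._in_flight = 0
        self.peak = 0  # highest concurrency observed, for inspection

    def submit(self, job_id: int) -> str:
        with self._slots:  # blocks while `limit` submits are already running
            with self._lock:
                self._in_flight += 1
                self.peak = max(self.peak, self._in_flight)
            time.sleep(0.01)  # stand-in for invoking the RM adapter
            with self._lock:
                self._in_flight -= 1
        return f"submitted:{job_id}"


submitter = ThrottledSubmitter()
with ThreadPoolExecutor(max_workers=20) as pool:
    results = list(pool.map(submitter.submit, range(50)))
```

Twenty workers compete for six slots, so all 50 requests are accepted but the load placed on the local scheduler stays bounded, in contrast to the "Unlimited" label on the GRAM2 diagram.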

20 Changes Made to Improve Scalability
- Removed extra listening port per job for MPIg jobs
  - Functionality can be re-implemented around GRAM
- Removed active monitoring of stdout/err files for streaming during job execution
  - Instead transfer stdout/err at the end of job execution

21 Improvements
- New Job Manager Logging implementation
- Added job exit code support
- Added GRAM service version detection
- Added usage statistics support
- Added support for auditing of TG gateway user attribute
- Updated admin, user, developer guides
- Many bugs fixed

22 Releases and Testing
- 3 Alpha releases and 1 Beta
  - 2 deployments on TeraGrid
- Significant scalability testing of Condor-G
  - Jaime Frey
  - Igor Sfiligoi
  - Gaurang Mehta
- Included in GT 5.0.0 RCs
- Internal functional and performance testing
  - http://cvs.globus.org/toolkit/docs/5.0/5.0.0/execution/gram5/qp/#id2557011

23 [No text recovered from this slide]

24 Next Improvement
- Add support for Sun Grid Engine (SGE) adapter
- Improve support for native packaging

25 Thanks to the GRAM developers!
- Joe Bester - ANL
- Mike Link - ANL

