GRAM5 - A sustainable, scalable, reliable GRAM service Stuart Martin - UC/ANL.

Presentation transcript:

Slide 2 (SC 2009, GRAM5): What is GRAM?
- GRAM is a Globus Toolkit component
  - For Grid job management
- GRAM is a unifying remote interface to Resource Managers
  - Yet preserves local site security/control
- GRAM is for stateful job control
  - Reliable create operation
  - Asynchronous monitoring and control
  - Remote credential management
  - Remote file staging and file cleanup

Slide 3: Grid Job Management Goals
Provide a service to securely:
- Create an environment for a job
- Stage files to/from environment
- Cause execution of job process(es)
  - Via various local resource managers
- Monitor execution
- Signal important state changes to client
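The goals above describe a job that moves through stages and reports each change to the client. A minimal sketch of that idea follows. The state names, the `Job` class, and the callback signature are assumptions made for illustration; they are not taken from the slides and are not the GRAM API.

```python
from enum import Enum
from typing import Callable, List


class JobState(Enum):
    # Illustrative state names (assumption, not from the slides).
    UNSUBMITTED = "unsubmitted"
    STAGE_IN = "stage_in"
    PENDING = "pending"
    ACTIVE = "active"
    STAGE_OUT = "stage_out"
    DONE = "done"
    FAILED = "failed"


# Forward transitions of the hypothetical lifecycle.
_NEXT = {
    JobState.UNSUBMITTED: JobState.STAGE_IN,
    JobState.STAGE_IN: JobState.PENDING,
    JobState.PENDING: JobState.ACTIVE,
    JobState.ACTIVE: JobState.STAGE_OUT,
    JobState.STAGE_OUT: JobState.DONE,
}


class Job:
    """Hypothetical job record that signals every state change to a client callback."""

    def __init__(self, job_id: str, on_change: Callable[[str, JobState], None]):
        self.job_id = job_id
        self.state = JobState.UNSUBMITTED
        self._on_change = on_change

    def advance(self) -> JobState:
        """Move to the next lifecycle state and notify the client."""
        if self.state not in _NEXT:
            raise ValueError(f"job {self.job_id} is terminal: {self.state.name}")
        self._set(_NEXT[self.state])
        return self.state

    def fail(self) -> None:
        """Mark a non-terminal job as failed and notify the client."""
        if self.state in (JobState.DONE, JobState.FAILED):
            raise ValueError(f"job {self.job_id} is terminal: {self.state.name}")
        self._set(JobState.FAILED)

    def _set(self, new_state: JobState) -> None:
        self.state = new_state
        self._on_change(self.job_id, new_state)


def run_to_completion(job_id: str) -> List[str]:
    """Drive a job through every stage and return the state names the client saw."""
    seen: List[str] = []
    job = Job(job_id, lambda jid, st: seen.append(st.name))
    while job.state is not JobState.DONE:
        job.advance()
    return seen
```

Here the client is only a callback; in a real deployment the notification would travel over the network, which this sketch does not attempt to model.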

Slide 4: Traditional Interaction
[Diagram labels: Local Jobs; Resource A; Scheduler (e.g., PBS); Compute Nodes]
- Satisfies many use cases
- TACC’s Ranger (62976 cores!) is the Costco of HTC ;-), one-stop shopping. Why do we need more?

Slide 5: GRAM Benefit
[Diagram labels: Local Jobs; remote GRAM Jobs; GRAM API; Resource A; GRAM Service; Scheduler (e.g., PBS); Compute Nodes]
- Add remote execution capability
  - Enable clients/devices to manage jobs without logging into the cluster

Slide 6: GRAM Benefit
[Diagram labels: GRAM Jobs; GRAM API; Local Jobs; Resource A: GRAM Service, Scheduler (e.g., PBS), Compute Nodes; Resource B: GRAM Service, Scheduler (e.g., LSF), Compute Nodes]
- Provides scheduler abstraction
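Scheduler abstraction means the client names a job once and the service translates it for whichever local scheduler the resource runs. The sketch below illustrates the pattern only: the class names and the `build_submit_command` helper are hypothetical, and the slides do not say that GRAM's adapters are written this way. The code builds command lines for `qsub` (PBS) and `bsub` (LSF) with the `-q` queue option but does not run them.

```python
from abc import ABC, abstractmethod
from typing import Dict, List


class SchedulerAdapter(ABC):
    """Hypothetical common interface over local resource managers."""

    @abstractmethod
    def submit_command(self, script: str, queue: str) -> List[str]:
        """Return the argv that would submit `script` to `queue`."""


class PBSAdapter(SchedulerAdapter):
    def submit_command(self, script: str, queue: str) -> List[str]:
        # PBS submits a job script with qsub; -q selects the queue.
        return ["qsub", "-q", queue, script]


class LSFAdapter(SchedulerAdapter):
    def submit_command(self, script: str, queue: str) -> List[str]:
        # LSF submits a command with bsub; -q selects the queue.
        return ["bsub", "-q", queue, script]


# Registry keyed by resource manager name (names chosen for this sketch).
ADAPTERS: Dict[str, SchedulerAdapter] = {
    "pbs": PBSAdapter(),
    "lsf": LSFAdapter(),
}


def build_submit_command(resource_manager: str, script: str, queue: str) -> List[str]:
    """Translate one scheduler-neutral request into a scheduler-specific command."""
    try:
        adapter = ADAPTERS[resource_manager.lower()]
    except KeyError:
        raise ValueError(f"no adapter for resource manager {resource_manager!r}") from None
    return adapter.submit_command(script, queue)
```

The caller's request is identical for Resource A and Resource B; only the registry lookup differs, which is the point of the abstraction.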

Slide 7: GRAM Benefit
[Diagram labels: GRAM jobs; GRAM API; many resources, each labeled GRAM, Sched, Compute Nodes]
- Scalable job management
- Interoperability

Slide 8: GRAM
- Users/Applications: Science Gateways, Portals, CLI scripts, App Specific Web Service, etc.
- Resource Managers: PBS, Condor, LSF, SGE, LoadLeveler, Fork

Slide 9: Higher-level Clients and User Examples

Slide 10: Condor-G Architecture
[Diagram labels: Personal Condor; Remote Resource; Master; Schedd; Collector & Negotiator; Grid Manager; Shadow; GRAM; LSF; Startd; Starter; User Job; Condor jobs; GlideIn jobs]

Slide 11: GridWay Components
[Diagram labels, duplicates removed]
- Execution Manager, Transfer Manager, Information Manager, Dispatch Manager, Request Manager, Scheduler, Job Pool, Host Pool
- DRMAA library, CLI, GridWay Core
- File Transfer Services: GridFTP, RFT
- Execution Services: pre-WS GRAM, WS GRAM
- Information Services: MDS2, GLUE, MDS4
- Resource Discovery, Resource Monitoring
- Job Preparation, Job Termination, Job Migration
- Job Submission, Job Monitoring, Job Control, Job Migration

Slide 12: GridWay / Condor-G Benefit
[Diagram labels: GridWay jobs; GRAM API; many resources, each labeled GRAM, Sched, Compute Nodes]
- Scalable job management
- Throttling
- Metascheduling
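Throttling and metascheduling can be illustrated together: a client-side tool chooses a resource for each job and holds jobs back once every resource has reached a limit. The following is a toy sketch under stated assumptions. The least-loaded policy, the per-resource `cap`, and the function names are invented for this example and do not describe how GridWay or Condor-G actually schedule.

```python
from typing import Dict, List, Optional, Tuple


def pick_resource(in_flight: Dict[str, int], cap: int) -> Optional[str]:
    """Return the least-loaded resource still below `cap`, or None if all are full.

    Ties are broken by resource name so the choice is deterministic.
    """
    candidates = [(count, name) for name, count in in_flight.items() if count < cap]
    if not candidates:
        return None
    return min(candidates)[1]


def dispatch(
    jobs: List[str], resources: List[str], cap: int
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Place jobs on resources, at most `cap` per resource; return (placed, held)."""
    in_flight = {name: 0 for name in resources}
    placed: Dict[str, List[str]] = {name: [] for name in resources}
    held: List[str] = []
    for job in jobs:
        target = pick_resource(in_flight, cap)
        if target is None:
            # Throttle: every resource is at its cap, so the job waits client-side.
            held.append(job)
        else:
            in_flight[target] += 1
            placed[target].append(job)
    return placed, held
```

For example, `dispatch(["j1", "j2", "j3", "j4", "j5"], ["a", "b"], cap=2)` alternates jobs between the two resources and holds the fifth job until a slot frees up.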

Slide 13: Architecture of Ninf-G
[Diagram labels]
- Client side: Client; Interface Request; Interface Reply; Invoke Executable; Connect back
- Server side: IDL file; Numerical Library; IDL Compiler; Ninf-G Executable; Generate; Interface Information LDIF File; retrieve
- Invoke Server: GRAM / NAREGI / Condor / SSH
- MDS4 / NAREGI IS
- Globus-IO / ssh / TCP

Slide 14: caBIG and Globus
- caGrid is built on top of Globus 4 WSRF Java Core and Security

Slide 15: caBIG - TeraGrid Integration
- Leave caGrid service infrastructure as is with the exception of the analytical services.

Slide 16: Hierarchical Clustering Results

Slide 17: GRAM2 Architecture Diagram
[Diagram labels]
- Job Submission: Client; Gatekeeper; Job Manager; RM adapter submit; Resource Manager; User Job(s)
- Job Monitoring: Job Manager; RM adapter poll; Resource Manager; User Job(s)

Slide 18: GRAM2 Architecture
[Diagram labels]
- Job Submission: Client; Gatekeeper; Job Manager; RM adapter submit; Resource Manager; User Job(s)
- Job Monitoring: Job Manager; RM adapter poll; Resource Manager; User Job(s)
- Repeated instances of Job Manager with RM adapter submit and of Job Manager with RM adapter poll, annotated "Unlimited"

Slide 19: GRAM5 Architecture
[Diagram labels]
- Job Submission: Client; Gatekeeper; Job Manager; RM adapter submit (repeated); Resource Manager; User Job(s)
- Job Monitoring: Job Manager; Resource Manager; RM log; SEG; SEG log
- Annotations: "throttled (default 6)"; "1 process"
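The GRAM5 diagram replaces the per-job "RM adapter poll" boxes of GRAM2 with a path through an RM log, SEG, and a SEG log. One way to picture log-driven monitoring is a single reader that consumes event lines and updates every job it is tracking. The sketch below assumes a made-up line format, `timestamp;job_id;state`, and hypothetical function names; it is not the actual SEG log format or GRAM5 code.

```python
from typing import Dict, Iterable, List, Tuple


def parse_event(line: str) -> Tuple[str, str]:
    """Parse one 'timestamp;job_id;state' line (format assumed for this sketch)."""
    parts = line.strip().split(";")
    if len(parts) != 3:
        raise ValueError(f"malformed event line: {line!r}")
    _timestamp, job_id, state = parts
    return job_id, state


def apply_events(lines: Iterable[str], watched: Dict[str, str]) -> List[Tuple[str, str]]:
    """Update the state of watched jobs from log lines and return the changes made.

    One pass over the log serves all jobs, instead of one poll per job.
    Events for jobs that are not being watched are ignored.
    """
    changes: List[Tuple[str, str]] = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        job_id, state = parse_event(line)
        if job_id in watched and watched[job_id] != state:
            watched[job_id] = state
            changes.append((job_id, state))
    return changes
```

The cost of monitoring here grows with the number of log events rather than with the number of polling processes, which is consistent with the "1 process" annotation on the slide, though the slide itself gives no implementation detail.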

Slide 20: Changes Made to Improve Scalability
- Removed extra listening port per job for MPIg jobs
  - Functionality can be re-implemented around GRAM
- Removed active monitoring of stdout/err files for streaming during job execution
  - Instead transfer stdout/err at the end of job execution

Slide 21: Improvements
- New Job Manager Logging implementation
- Added job exit code support
- Added GRAM service version detection
- Added usage statistics support
- Added support for auditing of TG gateway user attribute
- Updated admin, user, developer guides
- Many bugs fixed

Slide 22: Releases and Testing
- 3 Alpha releases and 1 Beta
  - 2 deployments on TeraGrid
- Significant scalability testing of Condor-G
  - Jaime Frey
  - Igor Sfiligoi
  - Gaurang Mehta
- Included in GT RCs
- Internal functional and performance testing
  - /gram5/qp/#id

Slide 24: Next Improvement
- Add support for Sun Grid Engine (SGE) adapter
- Improve support for native packaging

Slide 25: Thanks to the GRAM developers!
- Joe Bester - ANL
- Mike Link - ANL