Grid Resource Allocation and Management (GRAM)


Execution management
– Deployment, scheduling and monitoring.
Community Scheduler Framework (CSF): provides a single interface to different resource schedulers.
– PBS, Condor(G).
Workspace management
– Dynamically create and manage workspaces on remote hosts.
Grid Telecontrol Protocol
– WSRF-enabled service interface for control of remote instruments.
Remote goldfish surgical procedures.

Jobs are computational tasks that may perform input/output operations while running.
They affect the state of the computational resource and its associated file systems.
They may require coordinated staging of data into the resource before job execution and out of the resource after execution.
Some users, particularly interactive ones, benefit from accessing output data files as the job is running.

Monitoring consists of querying and subscribing for status information such as job state changes.
Jobs are operated under the control of a scheduler, which implements allocation and prioritization policies.
GRAM is not a resource scheduler but a protocol engine for communicating with different local resource schedulers.
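The "protocol engine, not scheduler" idea can be sketched as a thin layer that forwards requests to per-scheduler adapters and translates their state codes into one vocabulary. This is a minimal illustration, not GRAM code: the class names, the two toy schedulers, their state codes, and the common states PENDING/ACTIVE/DONE are all assumptions made for the example.

```python
from abc import ABC, abstractmethod


class LocalScheduler(ABC):
    """Hypothetical adapter interface for one local resource scheduler."""

    @abstractmethod
    def submit(self, executable):
        """Queue the executable and return a scheduler-local job ID."""

    @abstractmethod
    def state(self, local_id):
        """Return the job state in the common vocabulary."""


class LetterCodeScheduler(LocalScheduler):
    """Toy scheduler that reports states as single letters (made up)."""

    _TO_COMMON = {"Q": "PENDING", "R": "ACTIVE", "C": "DONE"}

    def __init__(self):
        self._jobs = {}

    def submit(self, executable):
        local_id = f"letter-{len(self._jobs) + 1}"
        self._jobs[local_id] = "Q"
        return local_id

    def state(self, local_id):
        return self._TO_COMMON[self._jobs[local_id]]


class NumericCodeScheduler(LocalScheduler):
    """Toy scheduler that reports states as integers (made up)."""

    _TO_COMMON = {1: "PENDING", 2: "ACTIVE", 4: "DONE"}

    def __init__(self):
        self._jobs = {}

    def submit(self, executable):
        local_id = f"numeric-{len(self._jobs) + 1}"
        self._jobs[local_id] = 1
        return local_id

    def state(self, local_id):
        return self._TO_COMMON[self._jobs[local_id]]


class ProtocolEngine:
    """Routes requests to a named adapter; makes no scheduling decisions."""

    def __init__(self, adapters):
        self._adapters = adapters

    def submit(self, scheduler_name, executable):
        local_id = self._adapters[scheduler_name].submit(executable)
        return (scheduler_name, local_id)

    def state(self, handle):
        scheduler_name, local_id = handle
        return self._adapters[scheduler_name].state(local_id)


engine = ProtocolEngine(
    {"letter": LetterCodeScheduler(), "numeric": NumericCodeScheduler()}
)
handle_a = engine.submit("letter", "/bin/date")
handle_b = engine.submit("numeric", "/bin/date")
```

The client sees one `submit`/`state` interface regardless of which adapter handles the job; allocation and prioritization stay inside each local scheduler.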

Conceptual Details
Targeted job types
– Not “RPC”: GRAM targets jobs for which reliable operation, stateful monitoring, credential management, and file staging are important (i.e., the performance is horrible, so only use it if necessary).

Component Architecture
Based on a component architecture
– Job management services represent, monitor, and control the overall job life cycle. These services are the job-management-specific software provided by the GRAM solution.
– File transfer services support staging of files into and out of compute resources.

Component Architecture
– Credential management services control the delegation of rights among distributed elements of the GRAM architecture, based on users' application requirements.

Security
Secure operation
– WS GRAM utilizes WSRF functionality to authenticate job management requests and to protect job requests from malicious interference.
Local system protection domains
– Jobs are executed in appropriate local security contexts, e.g. under specific Unix user IDs, based on details of the job request and authorization policies.

Credential delegation and management
– A client may delegate some of its rights to GRAM services, e.g. rights for GRAM to access data on a remote storage element as part of the job execution.
Audit
– Auditing assists with normal accounting functions and further mitigates risks from abuse or malfunction.
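As a rough illustration of running jobs "under specific Unix user IDs based on details of the job request and authorization policies", the sketch below resolves a local account from the requester's identity and an optional account named in the request. The policy table, the subject strings, and the function name are hypothetical; the slides do not describe how WS GRAM stores or evaluates its policy.

```python
# Hypothetical authorization policy: authenticated subject -> permitted local accounts.
POLICY = {
    "/O=Example/CN=Alice": ["alice", "climate"],
    "/O=Example/CN=Bob": ["bob"],
}


def resolve_local_account(subject, requested=None):
    """Pick the local account a job should run under, or refuse the request."""
    allowed = POLICY.get(subject, [])
    if not allowed:
        raise PermissionError(f"no local account is authorized for {subject}")
    if requested is None:
        # No account named in the job request: fall back to the first permitted one.
        return allowed[0]
    if requested not in allowed:
        raise PermissionError(f"{subject} may not run jobs as {requested}")
    return requested


default_account = resolve_local_account("/O=Example/CN=Alice")
chosen_account = resolve_local_account("/O=Example/CN=Alice", requested="climate")
```

A request from an unknown subject, or one naming an account outside the subject's list, is rejected before any job process is started.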

Job Management
Reliable job submission
– “At most once” semantics.
Job cancellation
– A mechanism for clients to cancel (abort) their jobs at any point in the job life cycle.
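One common way to get "at most once" behaviour is for the client to attach a unique submission ID, so that a resent request is recognized and never starts a second job. The sketch below assumes that approach; the `JobService` class, its methods, and the state names are illustrative and are not taken from the WS GRAM interface.

```python
import uuid


class JobService:
    """Toy job service that de-duplicates submissions by a client-chosen ID."""

    def __init__(self):
        self._jobs_by_submission_id = {}
        self.started = 0  # how many jobs were actually created

    def submit(self, submission_id, executable):
        existing = self._jobs_by_submission_id.get(submission_id)
        if existing is not None:
            # Repeated request (e.g. the first reply was lost): return the same job.
            return existing
        self.started += 1
        job = {"id": f"job-{self.started}", "executable": executable, "state": "PENDING"}
        self._jobs_by_submission_id[submission_id] = job
        return job

    def cancel(self, submission_id):
        job = self._jobs_by_submission_id[submission_id]
        # Cancellation is allowed at any point before the job has finished.
        if job["state"] not in ("DONE", "CANCELLED"):
            job["state"] = "CANCELLED"
        return job["state"]


service = JobService()
submission_id = str(uuid.uuid4())
first = service.submit(submission_id, "/bin/date")
retry = service.submit(submission_id, "/bin/date")  # client resends after a timeout
final_state = service.cancel(submission_id)
```

Because the retry carries the same ID, the service hands back the existing job instead of creating a duplicate.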

Data Management
Reliable data staging
– Reliable, high-performance transfers of files between the compute resource and external (GridFTP) data storage elements before and after the job execution.
Output monitoring
– A mechanism for incrementally transferring output file contents from the computation resource while the job is running.
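Incremental output transfer can be pictured as repeatedly reading only the bytes appended since the last poll. The following sketch uses a local temporary file to stand in for the job's output on the compute resource; the function name and the offset-based approach are assumptions for illustration, not the mechanism WS GRAM uses.

```python
import os
import tempfile


def read_new_output(path, offset):
    """Return (new_bytes, new_offset) for data appended to path since offset."""
    with open(path, "rb") as handle:
        handle.seek(offset)
        chunk = handle.read()
    return chunk, offset + len(chunk)


with tempfile.TemporaryDirectory() as workdir:
    out_path = os.path.join(workdir, "job.out")

    # The "job" writes its first line of output.
    with open(out_path, "wb") as out:
        out.write(b"step 1\n")
    first_chunk, offset = read_new_output(out_path, 0)

    # The job keeps running and appends more output.
    with open(out_path, "ab") as out:
        out.write(b"step 2\n")
    second_chunk, offset = read_new_output(out_path, offset)
```

Each poll returns only what is new, so the client can display output while the job is still running without re-fetching the whole file.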

Task Coordination
Parallel jobs
Task rendezvous
– A mechanism for task rendezvous, which job applications may use if they do not have another, more appropriate solution.
– Usually done in MPI.

WS-GRAM (Web Services version)
Designed to support job execution with coordinated file staging.
Uses a set of Web services in the GT4 WSRF core.
– ManagedJob: provides an interface to monitor the status of a job and to terminate it. Each submitted job is a distinct resource.
– ManagedJobFactory: interface to create ManagedJob resources of the appropriate type to perform a job in that local scheduler.
ManagedJob resources are created by a ManagedJobFactory::createManagedJob invocation.

Creation of job
– ManagedJobFactory::createManagedJob invocation.
– A meaningful WS GRAM client MUST create a job, which then goes through a life cycle in which it eventually completes execution and the resource is eventually destroyed.
Optional staging credentials
– Must be delegated before the call to createManagedJob.
Optional job credential
– Stored into the user account for use by the job process.

Optional credential refresh
– Delegated credentials may be refreshed.
Optional hold of cleanup
– For when the user wants to access output files directly without waiting for stage-out.
ManagedJob destruction
– The client can explicitly destroy the job.
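The life cycle described in the slides above (factory call, optional hold of cleanup, explicit destruction) can be mimicked with a small in-memory model. This is a sketch only: the Python method names and the states PENDING, ACTIVE, HELD, CLEANUP, and DONE are simplifications chosen for the example, not the actual WS GRAM state set or API.

```python
class ManagedJob:
    """Toy stand-in for a ManagedJob resource."""

    def __init__(self, job_id, executable, hold_cleanup):
        self.job_id = job_id
        self.executable = executable
        self.hold_cleanup = hold_cleanup
        self.state = "PENDING"
        self.history = ["PENDING"]

    def _move(self, state):
        self.state = state
        self.history.append(state)

    def run(self):
        """Pretend the job executes; stop before cleanup if a hold was requested."""
        self._move("ACTIVE")
        if self.hold_cleanup:
            self._move("HELD")  # output files stay in place for direct access
        else:
            self._finish()

    def release(self):
        """Lift the hold so that cleanup can proceed."""
        if self.state == "HELD":
            self._finish()

    def _finish(self):
        self._move("CLEANUP")
        self._move("DONE")


class ManagedJobFactory:
    """Toy stand-in for ManagedJobFactory: each job is a distinct resource."""

    def __init__(self):
        self._next_id = 1
        self.resources = {}

    def create_managed_job(self, executable, hold_cleanup=False):
        job = ManagedJob(f"job-{self._next_id}", executable, hold_cleanup)
        self._next_id += 1
        self.resources[job.job_id] = job
        return job

    def destroy(self, job_id):
        """Explicitly remove the job resource."""
        self.resources.pop(job_id)


factory = ManagedJobFactory()
job = factory.create_managed_job("/bin/date", hold_cleanup=True)
job.run()
held_state = job.state  # the client can read output files at this point
job.release()
factory.destroy(job.job_id)
```

The hold gives the client a window between execution and cleanup; once released, the job finishes and the client destroys the resource.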

Globus Toolkit Components Used by WS GRAM
Reliable File Transfer (RFT)
– For file staging before the job starts and after it completes.
GridFTP
– Supports retry.
– Partial file transfer.
– Third-party file transfer.
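Retry and partial transfer work together: after a dropped connection, a client can ask only for the bytes it has not yet received. The sketch below simulates this with an in-memory source that fails once. It does not use any GridFTP or RFT API; `FlakySource`, `transfer_with_resume`, and their parameters are hypothetical names for the illustration.

```python
class FlakySource:
    """In-memory data source whose reads fail on selected call numbers."""

    def __init__(self, data, fail_on_calls):
        self.data = data
        self.fail_on_calls = set(fail_on_calls)
        self.calls = 0

    def read_range(self, start, end):
        self.calls += 1
        if self.calls in self.fail_on_calls:
            raise ConnectionError("simulated dropped connection")
        return self.data[start:end]


def transfer_with_resume(read_range, total_size, max_attempts=5, chunk_size=4):
    """Fetch total_size bytes, resuming from the last received byte after a failure."""
    received = bytearray()
    attempts = 0
    while len(received) < total_size:
        if attempts >= max_attempts:
            raise IOError("transfer failed after repeated attempts")
        attempts += 1
        try:
            while len(received) < total_size:
                start = len(received)
                end = min(start + chunk_size, total_size)
                # Partial transfer: request only the range that is still missing.
                received += read_range(start, end)
        except ConnectionError:
            continue  # retry, keeping the bytes that already arrived
    return bytes(received), attempts


source = FlakySource(b"0123456789", fail_on_calls={2})
payload, attempts = transfer_with_resume(source.read_range, 10)
```

The second attempt starts at byte 4 rather than byte 0, which is the saving that partial transfer provides for large files.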

[Figure: GridFTP diagram; labels FOO1 and FOO2]


Delegation Services
A client can delegate credentials to any service that is deployed in the same container as the delegation service.
– The client tells the delegation service that it wants to delegate its credentials.
– The service that wants to use them must contact the delegation service to acquire them.
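The two steps above (the client hands over a credential, then a co-located service fetches it) can be sketched with a simple handle-based store. Everything here is illustrative: the class names, the dictionary used as a credential, the subject string, and the URL are placeholders, and no real proxy certificates or WSRF calls are involved.

```python
import secrets


class DelegationService:
    """Toy delegation service holding credentials for services in one container."""

    def __init__(self):
        self._store = {}

    def delegate(self, credential):
        """Step 1: the client deposits a credential and receives a handle."""
        handle = secrets.token_hex(8)
        self._store[handle] = credential
        return handle

    def acquire(self, handle):
        """Step 2: a co-located service exchanges the handle for the credential."""
        return self._store[handle]


class StagingService:
    """Hypothetical service that needs the client's rights to move a file."""

    def __init__(self, delegation_service):
        self._delegation = delegation_service

    def stage(self, handle, url):
        credential = self._delegation.acquire(handle)
        return f"staging {url} as {credential['subject']}"


delegation = DelegationService()
handle = delegation.delegate({"subject": "/O=Example/CN=Alice"})
staging = StagingService(delegation)
result = staging.stage(handle, "gsiftp://example.org/data")
```

The client passes only the handle in its job request; the service that needs the rights retrieves the credential itself, and several services in the container can share one delegated credential.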

External Components Used by WS GRAM
Local job scheduler
– PBS, LSF, Condor.
Sudo
– Access to user accounts without requiring root privilege.