Grid Computing 7700 Fall 2005 Lecture 17: Resource Management Gabrielle Allen


Miron Livny Seminar  Professor Miron Livny, Condor Project, Computer Sciences Department, University of Wisconsin-Madison  Thursday, November 3, 2005, 11:00 a.m., Johnston Hall, Room 338

Reading  Chapter 18 “Resource and Service Management”, The Grid 2  Grid Resource Management: State of the Art and Future Trends

Recap  Globus GRAM: –Grid Resource Allocation and Management –data staging, delegation of proxy credentials, and computation monitoring and management  Grid Application Toolkit: –Abstract high level API to resource management

GRAM

GAT APIs to …  Form software and hardware descriptions  Discover resources  Submit and handle jobs  GAT interfaces to any grid resource management system.

Basic Problem  Resource Management: –Discover resources –Allocate resources –Utilize resources –Monitor use of resources  Resources are: –Computational services provided by hardware –Application services provided by software –Bandwidth on a network –Storage space on a storage system

Local Resource Management  Much work has already been done: –Batch schedulers –Workflow engines –Operating systems  Characteristics: –Have complete control of a resource and can implement mechanisms and policies for effective use.

Grid Resource Management  Characterised by: –Different administrative domains –Heterogeneity of interfaces –Differing policies  Need –Standard protocols –Standard semantics for resources and task requirements

GRM: Requirements  Task submission  Workload management  On demand access (advance reservation)  Co-scheduling  Resource brokering  Transactions and QoS

General Resource Management Framework

Service Level Agreements  Resource providers “contract” in some way with a client through negotiations to provide some capability with a certain QoS  SLAs state the terms of the agreement between the resource provider and resource user –Abstracts external grid usage from local use and policies  Three different types of agreement: –Task TSLA: performance of an activity or task, e.g. a TSLA is created by submitting a job description to a queuing system –Resource RSLA: right to consume a resource without specifying what it will be used for (e.g. an advance reservation) –Binding BSLA: application of a resource to a task (e.g. binding network bandwidth to a socket, or a number of nodes to a parallel job). A BSLA associates a task defined by a TSLA with an RSLA.
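The relationship between the three agreement types can be illustrated with a small toy model. This is a sketch only; the class and field names are hypothetical and do not correspond to any real SLA API.

```python
from dataclasses import dataclass

@dataclass
class TSLA:
    """Task SLA: created by submitting a task/job description."""
    task_id: str
    job_description: str          # e.g. a JSDL or RSL document

@dataclass
class RSLA:
    """Resource SLA: a right to consume a resource (e.g. an advance reservation)."""
    reservation_id: str
    resource: str                 # e.g. "16 nodes on cluster X"
    start: str
    end: str

@dataclass
class BSLA:
    """Binding SLA: applies a reserved resource (RSLA) to a task (TSLA)."""
    task: TSLA
    reservation: RSLA

# A job submitted to a queue creates a TSLA; an advance reservation is an RSLA;
# binding the reserved nodes to that job is a BSLA.
job = TSLA("job-42", "<jsdl:JobDefinition>...</jsdl:JobDefinition>")
slot = RSLA("resv-7", "16 nodes", "2005-11-03T11:00", "2005-11-03T13:00")
binding = BSLA(job, slot)
```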

Job/Task Description  For example, JSDL (GGF working group), (also ClassAds, RSL)  The Job Submission Description Language (JSDL) is a language for describing the requirements of computational jobs for submission to resources. The JSDL language contains a vocabulary and XML Schema that facilitate the expression of those requirements as a set of XML elements.  Motivations: –Grids accommodate a variety of job management systems, where each system has its own language for describing job submission requirements, makes interoperability difficult. –Descriptions may be passed between systems or instantiated on resources matching the resource requirements for that job. All these interactions can be undertaken automatically, facilitated by a standard language that can be easily translated to each system’s own language.  Additional JSDL consumers include: accounting systems, security systems, archiving systems, and provenance (auditing) systems.  JSDL 1.0 provides the core vocabulary for describing a job for submission to Grid environments.

Job/Task Description  [Diagram, from the JSDL specification: a WS client sends a JSDL document to a job manager (super-scheduler, broker, …), which consults a Grid information system and local information systems and dispatches jobs via WS intermediaries and DRMs to local resources (e.g. supercomputers) or to another JSDL-consuming system.]

JSDL  Description of jobs only at submission time: –No information about job identifiers or job status (job management) –Does not describe relationships between jobs (workflow) –Does not describe policies –Does not describe scheduling –etc

JSDL  Contains types for –Processor architecture (sparc, powerpc, …) –File system (swaom temporary, spool, normal, …) –Operating system (WINNT, LINUX, …) –File creation (overwrite, append, …) –Ranges  Core elements (just a sample) –Application name, version, description, resource requirements (filesystem, architecture, CPU speed, CPU time, virtual memory, diskspace), candidate hosts, required filesystems, delete file on terminate

Resource Description  Need a language for –Clients to request resources –Servers to describe their resources  Resource Description Languages –Some resources are configurable –Need to include lifetimes  Examples: –RSL –ClassAds –CIM

RSL  Globus Resource Specification Language

ClassAds  Condor Classified Advertisement Language  Represents characteristics of hosts (and jobs)  A ClassAd is a set of uniquely named expressions --- called “attributes”

CIM  Common Information Model (CIM) –Standard designed by the Distributed Management Task Force  Information about systems, networks, applications and services  Becoming widely adopted in industry

Resource Discovery and Selection  Discovery: query to identify resources whose characteristics and state match those required (no commitment)  Selection: choose based on different criteria (commit)

Task Management  Monitor task status while executing, and status of managed resource (SLA status)  Task status: queued, pending, running, terminated, etc.  Change state of current SLAs or negotiate additional agreements –Terminate an SLA (application error, better resource found, unsatisfactory performance) –Extend SLA lifetime (task taking longer than expected) –Change SLA details (less/more disk space, QoS requirements change) –Create a new SLA

Grid Resource Management Systems  Thus far these are fairly limited; one reason is that the underlying local resource management systems are limited (e.g. no advance reservations)  Examples: –Globus GRAM –General-purpose Architecture for Reservation and Allocation (GARA) –Condor –Sun Grid Engine, PBS, LoadLeveler, LSF.

For any system …  What platforms does it support?  What is the architecture?  Open source or cost?  Job life cycle  Security  Job management (queues, job types (MPI, batch, interactive), checkpointing, file transfer)  Resource management (tracking, reservations)  Scheduling policies  Resource matching

GRAM  Does not implement local resource management functionality itself  Interfaces to e.g. SGE, PBS, LSF, Load Leveller, Condor.  Does not have advance reservation, but coallocation is supported via DUROC broker (eg need dedicated queues or exclusive access)  Uses RSL both as a resource description language and for task descriptions.  (GARA generalized GRAM to provide advanced reservation)

Condor  High throughput scheduler  Research project from University of Wisconsin- Madison  Uses ClassAds for discovery and matching of resources with tasks  Resources are organized into “condor pools” with one central manager (master) and arbitrary number of execution hosts (workers)  Handles file transfer to and from submission host  Has job priorities and various scheduling algorithms

Condor  [Diagram: a Condor pool consisting of a central manager and several execute hosts; submission hosts outside the pool submit jobs to it.]

Condor  Execution hosts: –Advertise information about resource to central manager –Enforce policies of the resource owners –Start and monitor jobs  Submission hosts: –Advertise job requests –Manage jobs running in the Condor pool –Handle checkpoints etc  Manager: –Monitors information about the Condor pool –Matches resources with job requests –Handles cycle scavenging (watches for idle machines)

Condor and Globus  Can use Globus GRAM to submit to a Condor pool –Uses the Condor job manager  Can use Condor to submit to a Globus machine –Condor-G –Job descriptions are converted to RSL
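For the Condor-G direction, the submit file names a remote GRAM service rather than the local vanilla universe. The sketch below is hedged: the grid universe and grid_resource keywords follow Condor-G's grid-universe syntax, the gatekeeper host name is hypothetical, and the exact keyword form depends on the Condor version in use.

```python
from pathlib import Path

# Sketch of a Condor-G submit description: the local Condor scheduler forwards
# the job (translated to RSL) to the remote GRAM service named below.
condor_g_job = """\
Universe      = grid
grid_resource = gt2 gatekeeper.example.org/jobmanager-pbs
Executable    = /bin/hostname
Output        = remote.out
Error         = remote.err
Log           = remote.log
Queue
"""

Path("condor_g.sub").write_text(condor_g_job)
# Submitted from a machine running Condor-G with:  condor_submit condor_g.sub
```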

Resource Brokers/Metaschedulers  Virtualize the interface to sets of resources  Broker acts as an intermediary between a VO and a set of resources –To date mainly for computational jobs and workflows  Provides: –Simplified view of resources –Policy enforcement (community based) –Protocol conversion  Metascheduling algorithms (in a Grid the resource broker usually does not actually control the resources)  Examples: –Sun Grid Engine, Platform LSF, PBSPro, GRMS

Grid Scheduling  Three main paradigms: –Centralized, hierarchical, and distributed  Centralized: –A central machine acts as the resource manager. The scheduler has all necessary and up-to-date information. Does not scale well; single point of failure  Distributed: –Better scaling properties, fault tolerance and reliability. Lack of current information usually leads to suboptimal decisions. –Communication can be direct or indirect.  Hierarchical: –Can have scalability and communication bottlenecks. Global and local schedulers can have different policies.
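The centralized paradigm is easy to caricature in a few lines. The toy scheduler below (a sketch, not any real scheduler) always places a job on the least-loaded resource, which it can only do because it holds a complete, up-to-date view; a distributed scheduler making the same decision with partial or stale load information would typically do worse.

```python
# Toy centralized scheduler: global, up-to-date view of every resource's load.
resources = {"clusterA": 0, "clusterB": 0, "clusterC": 0}   # running-job counts
jobs = [f"job-{i}" for i in range(7)]

schedule = {}
for job in jobs:
    target = min(resources, key=resources.get)   # least-loaded resource
    resources[target] += 1
    schedule[job] = target

print(schedule)
```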

CSF  The Community Scheduler Framework 4.0 (CSF4.0) is a WSRF-compliant Grid level meta-scheduling framework built upon the Globus Toolkit. CSF provides interface and tools for Grid users to submit jobs, create advanced reservations and define different scheduling policies at the Grid level. Using CSF, Grid users are able to access different resource managers, such as LSF, PBS, Condor and SGE, via a single interface.

GRMS  Grid Resource Management System  From GridLab project  Built on top of Globus  Multicriteria matchmaking  Workflow management

SNAP  Future directions for grid resource management: –Service oriented architecture –General services (not just hardware) –Provisioned rather than best-effort  Suggest a SLA-based resource management model independent of particular services –Protocols used to negotiate these SLAs: Service Negotiation and Acquisition Protocol (SNAP)  Grid Resource Allocation Agreement Protocol working group of GGF.