Grid Scheduling through Service-Level Agreement Karl Czajkowski The Globus Project

Grid Scheduling through Service-Level Agreement Karl Czajkowski The Globus Project http://www.globus.org/

2 Overview l Introduction to Grid Environments l The Resource Management Problem –Cross-domain applications –Resource owner goals vs. application goals l An Open Architecture to Manage Resources –Service-Level Agreement (SLA) –GRAM and Managed Services l Related and Ongoing Work

3 Grid Resource Environment l Distributed users and resources l Variable resource status l Variable grouping and connectivity l Decentralized scheduling/policy R R R R R R ? ? R R R R RR R RR ? ? R R R R R dispersed users VO-AVO-B network

4 Social/Policy Conflicts l Application Goals –Users: deadlines and availability goals –Applications: need coordinated resources l Localized Resource Owner Goals –Policies towards users –Optimization goals l Community Goals Emerge As: –An aggregate user/application? –A virtual resource? Both!

5 Data-Intensive Example l Concurrent resource requirements –Large scale storage, computing, network, graphics l Datapath involves autonomous domains

6 Early Co-Allocation in Grids l SF-Express (1997-8) –Real-time simulation –12+ supercomputers, 1400 processors l Required advance reservation –Brokered by telephone! –Globus DUROC software to sync startup –Over 45 minutes to recover from failure l In use today in MPICH-G2 (MPI library)

7 Traditional Scheduling l Closed-System Model –Presumption of global owner/authority –Sandboxed applications with no interactions –“Toss job over the fence and wait” l Utilization as Primary Metric –Deep batch queues allow tighter packing –No incentives for matching user schedule l Sub-cultures Counter Site Policies –Users learn tricks for “gaming” their site

8 An Open Negotiation Model l Resources in a Global Context –Advertisement and negotiation –Normalized remote client interface –Resource maintains autonomy l Users or Agents Bridge Resources –Drive task submission and provisioning –Coordinate acts across domains l Community-based Mediation –Coordination for collective interest

9 Community Scheduling Example l Individual users –Require service –Have application goals l Community schedulers –Broker service –Aggregate scheduling l Individual resources –Provide service –Have policy autonomy –Serve above clients

10 Negotiation Phases l Discovery –“What resources are relevant to interest?” –Finds service providers l Monitoring –“What’s happening to them now?” –Compare service providers l Service-Level Agreement –“Will they provide what I need?” –The core Resource Management problem –Process can iterate due to adaptation

11 Service-Level Agreement l Three kinds of SLA –Task submission (do something) –Resource reservation (pre-agreement) –Lazy task/resource Binding (apply resv.) l Simple protocol for negotiating SLAs –Basic 2-party negotiation >Support for basic offer/accept pattern >Optional counter-offer patterns >Variable commitment phase for stricter promises –Client may maintain multiple 2-party SLAs

12 Many Types of Service l Must support service heterogeneity –Resources >Hardware: disks, CPU, memory, networks, display… >Logical: accounts, services… >Capabilities: space, throughput… –Tasks >Data: stored file, data read/write >Compute: execution, suspended/swapped job l SLAs bear embedded term languages –Isolate domain-specific details

13 Domain Extension: File Transfer l Single goal –Reliable deadline transfer l Specialized scheduler –Brokers basic services –Synthesizes new service >Fault-handling logic l Distributed resources –Storage space –Storage bandwidth –Network bandwidth

14 Technical Challenges l Complex Security Requirements l Global Scalability –Similar ideals to Internet –Interoperable infrastructure –Policy-configurable for social needs l Permanence or “Evolve in Place” –Cannot take World off-line for service –Over time: upgrade, extend, adapt –Accept heterogeneity

15 GRAM2 JobCPUDisk Application Domain-specific SLA Incremental SLAs Information Service Local resource managers SLA implementation Planner Concrete SLA Coordinator Monitor & Discover GRAM Architecture

16 WS-Agreement l New standardization effort l Generalizes GRAM ideas –Service-oriented architecture –Resource becomes Service Provider –Tasks become Negotiated Services –SLAs presented as Agreement services l Still supports extensible domain terms

17 WS-Agreement Entities

18 WS-Agreement Adds Management

19 Virtualized Providers

20 Agreement-based Jobs l Agreement represents “queue entry” –Commitment with job parameters etc. l Agreement Provider –i.e. Job scheduler/Queuing system –Management interface to service provider l Service Provider –i.e. scheduled resource (compute nodes) l Service is the Job computation

21 Advance Reservation for Jobs l Schedule-based commitment of service –Requires schedule based SLA terms l Optional Pre-Agreement (RSLA) –Agreement to facilitate future Job Agreement –Characterizes virtual resource needed for Job –May not need full job terms l Job Agreement almost as usual –May exploit Pre-Agreement >Reference existing promise of resource schedule –May get schedule commitment in one shot >Directly include schedule terms >(Can think of as atomic advance reserve/claim)

22 Need for Complex Description l 128 physical nodes –Physical topology >Interconnect >RAM, disk size –Subject of RSLA l Single MPI job –Subject of TSLA –May reference RSLAs l Quality requirements –Real-time parameters >CPU, disk performance –Subject of BSLA

23 MDS Resource Models (History)

24 Future Models l Service behavioral descriptions –Unified service term model –Capture user/application requirements –Capture provider capabilities l Core meta-language –Facilitates planner/decision designs –Extends with domain concepts –Extensible negotiability mark-up >Capture range of negotiability for variable terms >Capture importance of terms (required/optional) >Capture cost of options (fees/penalties)

25 SLA Types in Depth l Resource SLA (RSLA), i.e. reservation –A promise of resource availability >Client must utilize promise in subsequent SLAs l Task SLA (TSLA), i.e. execution –A promise to perform a task >Complex task requirements >May reference an RSLA (implicit binding) l Binding SLA (BSLA), i.e. claim –Binds a resource capability to a TSLA >May reference an RSLA (otherwise obtain implicitly) >May be created lazily to provision the task

26 Resource Lifecycle l S0: Start with no SLAs l S1: Create SLAs –TSLA or RSLA l S2: Bind task/resource –Explicit BSLA –Implicit provider schedule l S3: Active task –Resource consumption l Backtrack to S0 –On task completion –On expiration –On failure

27 Incremental Negotiation l RSLA: reserve resources for future use l Resources change state due to SLAs and scheduler decisions l TSLA: submit task to scheduler l BSLA: bind reservation to task

28 Linking SLAs for Complex Case l Dependent SLAs nest intrinsically –BSLA2 defined in terms of RSLA2 and TSLA4 l Chained SLAs simplify negotiation –Optionally link destruction/reclamation TSLA1 RSLA1 BSLA1 TSLA2 TSLA3 Stage out Stage in time BSLA2 RSLA2 TSLA4 Net 30 GB for /scratch/tmpuser1/foo/* files Complex job 50 GB in /scratch filesystem account tmpuser1

29 Related Work l Academic Contemporaries –Condor Matchmaking –Economy-based Scheduling –Work-flow Planning l Commercial Scheduler Examples –Many examples for traditional sites >Several generalized for “the enterprise” –Platform Computing >LSF scaled to lots of jobs >MultiCluster for site-to-site resource sharing –IBM eWLM >Goal-based provisioning of transactional flows

30 Condor Matchmaking l At heart: a scheduling algorithm l Heuristics for pairing job with resource –Match symmetric “Classified Ads” –Great for bulk/commodity matching l Closed system view –Subsumes resource through “lease” –Sandboxed job environment –Favor vertical integration over generality –Tuned high-throughput system

31 Condor on GRAM l Condor already uses GRAM two ways 1.GRAM treats Condor as local scheduler 2.Condor uses GRAM to access resource l Condor maps to SLA architecture –Advertise resource ClassAd –Submit job ClassAd (as TSLA) –Matchmaker is a Community Scheduler –Need SLA scalability to be practical

32 Future Work l SLA interaction with policy –SLA negotiation subject to policy >One SLA affects another, e.g. RSLA subdivision >One client “more important” than another –SLA implemented by low-level policies >Domain-specific SLA maps to resource SLAs >Resource SLAs map to resource control mechanisms l Resource characterization –Advertisement of resources: options, cost –Interoperable capability languages

33 Conclusion l Generic SLA management –Compositional for complex scenarios –Extensible for unique requirements –Requires work on Grid service modeling >To describe jobs, resource requirements, etc. l Enhancement to proven architectures –Encompasses GRAM+GARA l Evolution of the Globus Toolkit RM –GRAM evolving since 1997 –WS-Agreement standard in progress

Grid Scheduling through Service-Level Agreement Karl Czajkowski The Globus Project

Similar presentations

Presentation on theme: "Grid Scheduling through Service-Level Agreement Karl Czajkowski The Globus Project"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Grid Scheduling through Service-Level Agreement Karl Czajkowski The Globus Project

Similar presentations

Presentation on theme: "Grid Scheduling through Service-Level Agreement Karl Czajkowski The Globus Project"— Presentation transcript:

Similar presentations

About project

Feedback