Economic-based Resource Management and Scheduling for Global/Grid Computing Project Team: David Abramson Rajkumar Buyya Jonathan Giddy Rajkumar Buyya

Agenda  Computing Platforms and How Grid is Different ?  Resource Management Issues  Resource Management Architectures  Nimrod/G Broker  Computational Economy  Scheduling Algorithms  Limitations of current Nimrod/G  Proposal “GRACE - Grid Architecture of Computational Economy”  GRACE Negotiation Protocols and APIs  Conclusion

Computing Power (HPC) Drivers Solving grand challenge problems/applications using computer modeling, simulation and analysis: Life Sciences, Design & Analysis (CAD/CAM), Aerospace, Geographic Information Systems, Military Applications

Computing Platforms: Breaking Administrative Barriers [Figure: performance axis running through Single Processor, Shared Memory, Local Cluster, Global Cluster/Grid, and Inter Planet Cluster/Grid ??; administrative-barriers axis running through Individual, Group, Department, Campus, State, National, Globe, Inter Planet, Universe]

Towards Grid Computing…. For illustration, placed resources arbitrarily on the GUSTO test-bed!!

What is Grid ?  An infrastructure that couples – Computers (PCs, workstations, clusters, traditional supercomputers, and even laptops, notebooks, mobile computers, PDAs, and so on) … – Software ? (e.g., renting expensive special-purpose applications on demand) – Databases (e.g., transparent access to the human genome database) – Special Instruments (e.g., radio telescopes searching for life in the galaxy or for pulsars) – People (maybe even animals, who knows?)  across the Internet and presents them as a unified, integrated (single) resource.

Grid Application-Drivers  Old and New applications getting enabled due to coupling of computers, databases, instruments, people, etc: – (distributed) Supercomputing – Collaborative engineering – high-throughput computing – large scale simulation & parameter studies – Remote software access / Renting Software – Data-intensive computing – On-demand computing

The Grid Vision: To offer “Dependable, consistent, pervasive access to [high-end] resources”  Dependable: Can provide performance and functionality guarantees  Consistent: Uniform interfaces to a wide variety of resources  Pervasive: Ability to “plug in” from anywhere Source:

The Grid Impact! “The global computational grid is expected to drive the economy of the 21st century similar to the electric power grid that drove the economy of the 20th century”

Sources of Complexity in Grid Resource Management  No single administrative control  No single policy – each resource owner has their own policies or scheduling mechanisms. – Users must honor them (particularly external users of the grid).  Heterogeneity of resources (static and dynamic)  Unreliable - resources may appear or disappear (die)  No single cost model (there cannot be one) – varies from one user to another and from time to time.  No single access mechanism

Grid Resource Management: Challenging Issues Ack.: globus.. [Diagram: the steps below span two administrative domains, Domain 1 and Domain 2]  Authentication (once)  Specify simulation (code, resources, etc.)  Discover resources  Negotiate authorization, acceptable use, cost, etc.  Acquire resources  Schedule jobs  Initiate computation  Steer computation  Access remote data-sets  Collaborate on results  Account for usage

Grid Components and our Focus: Resource Brokers and Computational Economy (Source: Globus)  Grid Apps. – Applications  Grid Tools – High-level Services and Tools: DUROC, globusrun, MPI, Nimrod/G, MPI-IO, CC++, GlobusView, Testbed Status  Grid Middleware – Core Services: MDS, GRAM, Globus Security Interface, Heartbeat Monitor, Nexus, Gloperf, GASS, GARA, GRACE  Grid Fabric – Local Services: LSF, Condor, MPI, NQE, Easy, TCP, UDP, Solaris, Irix, AIX

Resource Management Strawmen proposed at Grid Forum (GF-SCHED WG)*  Strawman 1: Too explicit; it tries to capture the features of all current schedulers. - Steve Chapin et al.  Strawman 2: Too abstract (you need to think for yourself to understand it). - Jenny Schopf et al.  AO (Abstract Owner): Places heavy emphasis on an order-delivery approach; it looks ideal and captures a real-world model, but I doubt any current system supports this grand image (hard to realise at this moment). - Dave DiNucci  What we discuss here sounds like “the mixture of all the above through Nimrod/G and proposed GRACE middleware services.” - I plan to propose one soon !! *

A Grid Resource Management Architecture (Strawman 1) [Diagram components: User; Connection Cloud; Control Domain; several Global Schedulers; Local Scheduler; Grid Information Service; Access/Admission Control Agent; Domain Resource Manager or Control Agent; Deployment Agent; Task-Persistent Job Control Agent; Monitor; Resource. Annotation: (state info. + reservation + bid + … services)]

A Grid Resource Management Architecture (Strawman 2) [Diagram: Higher Level Scheduler with Mapper, Commit Agent, and Deploy Agent. Labelled flows: Job & Resource Info.; Can I Map Task to Resource? (Y/N); Map T to R; Mon, kill/restart; Monitoring info.]  If required, talk to lower-level scheduler to answer query.  If required, talk to lower-level scheduler or resource manager to do the job.  Lower Level Scheduler is expected to have similar architecture.

A Grid Resource Management Architecture (AO, Abstract Owner) [Diagram labels: GRID User; Job; Result; AO for Grid (AO1); Abstract Owner, Resource Manager, and Physical Resource Manager, each with an Order Window and a Pickup Window; Sales Rep; Delivery Rep; Job Shop. Two cases shown: Broker is Resource Owner; Broker is Not Resource Owner]

Nimrod/G Resource Broker Nimrod/G Approach to Resource Management and Scheduling

What is Nimrod/G ?  A global scheduler for managing and steering task-farming (parametric simulation) applications on computational grids, based on deadline and computational economy.  Key Features – A single window to manage and control the whole experiment – Resource Discovery – Trade for Resources – Scheduling – Steering & data management  Leverages Globus Services  Other parallel application models can be supported easily (MPI & DUROC co-allocation)

A Nimrod/G User Console [Screenshot labels: Cost; Deadline; Available Machines]

Nimrod/G Architecture [Diagram components: Nimrod/G Client; Parametric Engine; Schedule Advisor; Resource Discovery; Grid Explorer; Dispatcher; Persistent Info.; Grid Information Services; Grid Middleware Services; GUSTO Test Bed]

Nimrod/G Interactions [Diagram labels: Root node (Parametric Engine, Scheduler, Dispatcher); Gatekeeper node; Computational node (Job Wrapper, User process); MDS server: resource location; GRAM server and Queuing System: resource allocation (local); GASS server: file access]  Additional services used implicitly: GSI (authentication & authorization), Nexus (communication)

Computational Economy  Resource selection based on real money and market mechanisms  A large number of sellers and buyers (resources may be dedicated/shared)  Negotiation: call for tenders/bids and select the offers that meet the requirements  Trading and Advance Resource Reservation  Schedule computations on those resources that meet all requirements

Global resource allocation: Meet Deadline, but minimize the cost  Global information is hard to get and out of date – Load balancing – Fairness to multiple users  Global limits are easy to set and fairly stable – Non equal number of jobs assigned to resources – Load profiling – Cost-based resource allocation

Cost Model [Figure: cost matrix of users (User 1 to User 5) against machines (Machine 1 to Machine 5)]  non-uniform costing; cost varies – from time to time – from one user to another – with usage duration  encourages use of local resources first  user can access remote resources, but pays a penalty in higher cost.
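The cost model above can be illustrated with a small pricing function. This is only a sketch: the slide gives no formula, so the base rate, the remote-access penalty, the peak-hour window, and the function name `access_cost` are all assumptions chosen for illustration.

```python
def access_cost(user_domain, machine_domain, hour, cpu_seconds,
                base_rate=1.0, remote_penalty=2.0, peak_multiplier=1.5):
    """Hypothetical non-uniform cost: depends on who asks, when, and for how long."""
    rate = base_rate
    # One user to another: remote users pay a penalty, so local use is cheaper.
    if user_domain != machine_domain:
        rate *= remote_penalty
    # Time to time: assumed peak window of 09:00 to 17:00 local time.
    if 9 <= hour < 17:
        rate *= peak_multiplier
    # Usage duration: cost scales with CPU seconds consumed.
    return rate * cpu_seconds


local_night = access_cost("monash", "monash", hour=3, cpu_seconds=100)
remote_night = access_cost("monash", "anl", hour=3, cpu_seconds=100)
remote_peak = access_cost("monash", "anl", hour=10, cpu_seconds=100)
```

With these assumed parameters a local off-peak job is the cheapest option, which is the behaviour the slide describes as encouraging use of local resources first.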

Current Scheduling Algorithm in Nimrod/G: deadline-based; tries to meet the deadline using low-cost resources.
 $M$ - number of resources, $N$ - number of jobs, $D$ - deadline
 Note: the cost of $R_i$ is less than the cost of any of $R_{i+1}, \dots, R_M$
– RL: the Resource List needs to be maintained in increasing order of cost
 $C_t$ - time when accessed (time now)
 $T_i$ - average job runtime on resource $R_i$ [updated periodically]
– $T_i$ acts as a load-profiling parameter.
 $A_i$ - number of jobs assigned to $R_i$, where:
– $A_i = \min(\text{UnassignedJobs}_i, \text{JobsByDeadline}_i)$, with $\text{JobsByDeadline}_i$ the number of jobs $R_i$ can complete in the remaining time
– $\text{UnassignedJobs}_i = N - (A_1 + \dots + A_{i-1})$
– $\text{JobsByDeadline}_i = (D - C_t) \;\mathrm{DIV}\; T_i$
 ALG: invoke Job Assignment() periodically until all jobs are done.
– Job Assignment():
– Establish (RL, $C_t$, $T_i$, $A_i$) dynamically.
– For all resources ($i = 1$ to $M$) { assign $A_i$ jobs to $R_i$, if required }
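One pass of the Job Assignment() step can be sketched in Python as follows. This is a minimal illustration of the formulas on the slide, not the Nimrod/G implementation: the `Resource` class, the field names, and the per-job cost unit are assumptions, and each resource is treated as running its jobs one after another, as the DIV formula implies.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    cost: float         # assumed unit: price per job
    avg_runtime: float  # T_i: average job runtime on this resource


def assign_jobs(resources, n_unassigned, deadline, now):
    """Return ({resource name: A_i}, jobs still unassigned) for one pass."""
    remaining = max(deadline - now, 0)  # D - C_t
    assignment = {}
    # RL: walk the resource list in increasing order of cost.
    for r in sorted(resources, key=lambda res: res.cost):
        if n_unassigned == 0:
            break
        jobs_by_deadline = int(remaining // r.avg_runtime)  # (D - C_t) DIV T_i
        a_i = min(n_unassigned, jobs_by_deadline)
        if a_i > 0:
            assignment[r.name] = a_i
            n_unassigned -= a_i
    return assignment, n_unassigned


pool = [Resource("dear", 5.0, 2.0), Resource("cheap", 1.0, 10.0), Resource("mid", 2.0, 5.0)]
tight, left_tight = assign_jobs(pool, 20, deadline=50, now=0)
relaxed, left_relaxed = assign_jobs(pool, 20, deadline=100, now=0)
```

With the tighter deadline the expensive resource has to be used; with the relaxed deadline the same 20 jobs fit on the two cheaper resources. Calling the function periodically with updated runtimes and the current time corresponds to the "invoke periodically" step.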

Resource Usage (for various deadlines)

Scheduling Methods for Global Resource Allocation
1. Equal Assignment, but no Load Balancing
2. Equal Assignment and Load Balanced
3. (2) + deadline (no worry about cost)
4. (2) + deadline (minimize the cost of computation)
5. (4) + budget --> deadline + cost (up-front agreement) – I am willing to pay $$$, can you complete by the deadline?
6. (5) + Advance Resource Reservation
7. (6) + Grid Economy - dynamic pricing - use Trading/Tender/Bid process
8. (6) + Grid Economy - dynamic pricing - use auction technique
9. Genetic/Fuzzy Logic, etc. algorithms

Scheduling Policies can be combined! [Figure labels: Advance Reservation; Deadline; Economic-based]

Nimrod/G Limitations (current version)  Manual/static resource cost model – a static file of machine costs  Partial support for computational economy – The user cannot know the cost of a computation until the experiment completes; they just have to trust the broker.  No support for dividing the budget among jobs.  Flat price model (a limitation like Internet pricing)  No Advance Resource Reservation  No resource trading

GRACE: Grid Architecture for Computational Economy  GRACE aims to help Nimrod/G overcome the current limitations.  GRACE middleware offers generic interfaces (APIs) that other developers of grid tools can use along with Globus services.  GRACE may become part of Globus ?? – Then, APIs look like “ globus_grace_ …. (….) ”

Why Computational Economy in Resource Management ? “Observe Grid characteristics and current resource management policies”  Grid resources are not owned by the user or by a single organisation.  Owners have their own administrative policies.  Mismatch in resource demand and supply – overall resource demand may exceed supply.  Traditional system-centric (performance-metric) approaches do not suit the grid environment. – System-Centric --> User-Centric  As in real life, an economic-based approach is one of the best ways to regulate selection and scheduling on the grid, as it captures user intent.  Markets are an effective institution for coordinating the activities of several entities.

Advantages of Economic-based RM  System Centric --> User Centric Policy in RM  Helps in regulating demand and supply – resource access cost can fluctuate (based on demand and supply) and the system can adapt  Scalable Solution – No need for a central coordinator (during negotiation) – Resources (sellers) and Users (buyers) can make their own decisions and try to maximize utility and profit.  Uniform Treatment of all Resources – Everything can be traded, including CPU, Mem, Net, Storage/Disk, other devices/instruments – Efficient allocation of resources

Resource Profile (that can be considered during resource selection)  Static profile – CPU (power, clock speed, arch.) – memory capacity – Disk (speed of access) – Network (bandwidth/latency) – OS  Dynamic Profile – Load – Available Capacity – Priority – Cost (trade off with all the above)
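The profile above maps naturally onto a record type with a selection filter. The sketch below is illustrative only: the field names, units, and the `select` function are assumptions, and only a few of the listed static and dynamic attributes are modelled.

```python
from dataclasses import dataclass


@dataclass
class ResourceProfile:
    name: str
    # Static profile (a subset of the attributes on the slide)
    arch: str
    memory_mb: int
    # Dynamic profile
    load: float  # assumed scale: 0.0 (idle) to 1.0 (saturated)
    cost: float  # assumed unit: price per CPU-second


def select(profiles, arch, min_memory_mb, max_load):
    """Keep resources that satisfy the requirements, cheapest first."""
    suitable = [p for p in profiles
                if p.arch == arch
                and p.memory_mb >= min_memory_mb
                and p.load <= max_load]
    return sorted(suitable, key=lambda p: p.cost)


profiles = [
    ResourceProfile("a", "x86", 1024, 0.2, 3.0),
    ResourceProfile("b", "x86", 4096, 0.9, 1.0),
    ResourceProfile("c", "x86", 2048, 0.5, 2.0),
    ResourceProfile("d", "sparc", 8192, 0.1, 0.5),
]
chosen = [p.name for p in select(profiles, "x86", 1024, 0.8)]
```

Filtering on the static and load attributes first and ranking by cost afterwards is one simple way to express "cost (trade off with all the above)"; a weighted score would be an equally plausible design.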

Grid Resource Management Architecture (Economic/Market-based) [Diagram components: Client Application; Super-scheduler or Broker (Grid Explorer, Scheduler, Trade Manager, Job Control Agent, Deployment Agent); Grid Information Server; Resource Domain (Trade Server, Process Server, Resource Reservation, R, other components of local resource manager); link labelled Trading]

Key components involved in computation economy issues…  Client – accepts user job (application) – Budget (B) – Budget Handling Rules – Deadline (T) – Application requirements (storage etc.) – Normalised/Estimated runtime of tasks (optional) – Preferences  Super-scheduler or Broker – Persistent job control agent – Resource Discovery/Locator – Bid Manager (Negotiating with resource owners/bid servers) – Dispatcher

Key components involved in computation economy issues  Grid Information Directory/Server – Static profile – Dynamic profile – Cost/Price (yellow pages/posted) and conditions of use etc.  Bid Server (can be part of local scheduler/RM) – Negotiation Template – Trading/Auction  Local Resource Manager – Can support Bid Server – Advance Resource Reservation (e.g., GRAM+GARA) – Also handle Scheduling

Grid Open Trading Protocols [Diagram: Bid Manager and Bid Server exchange messages through an API, subject to Pricing Rules]  Messages: Get Connected; Call for Bid(DT); Reply to Bid(DT); Negotiate Deal(DT); Confirm Deal(DT, Y/N); ….; Cancel Deal(DT); Change Deal(DT); Get Disconnected  DT - Deal Template – resource requirements (BM) – resource profile (BS) – price (anyone can set) – status – change the above values – negotiation can continue – accept/decline – validity period

Open Trading Finite State Machine [Diagram: state machine over the Deal Template (DT) with transitions labelled Offer BS, Offer BM, ND, and NAD] ND - Negotiate Deal

GRACE APIs (for Trading between Bid Manager and Bid Server)  tid = grace_trade_connect(resource_id)  grace_request_quote/bid(tid, DT)  grace_deal_negotiate (tid, DT)  grace_deal_confirm(tid, DT)  grace_deal_cancel(tid, DT)  grace_deal_change( tid, DT)  grace_trade_reconnect(tid, resource_id)  grace_trade_disconnect(tid, resource_id) – tid = Trade Identification code – DT = Deal Template
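To show how a Bid Manager might drive these calls, here is a self-contained Python mock. Only the function names come from the slide; everything else is assumed for illustration: the `MockBidServer` class, its posted and floor prices, the dictionary used as a Deal Template, the status strings, and the `trade` helper. The slide's `grace_request_quote/bid` is rendered here as `grace_request_quote`.

```python
import itertools


class MockBidServer:
    """In-memory stand-in for a Bid Server; its pricing behaviour is assumed."""

    def __init__(self, posted_price, floor_price):
        self.posted_price = posted_price
        self.floor_price = floor_price
        self._ids = itertools.count(1)
        self.sessions = {}

    def grace_trade_connect(self, resource_id):
        tid = next(self._ids)  # tid: trade identification code
        self.sessions[tid] = {"resource_id": resource_id, "status": "connected"}
        return tid

    def grace_request_quote(self, tid, dt):
        # Reply to the call for bids with the posted price.
        return dict(dt, price=self.posted_price, status="offer")

    def grace_deal_negotiate(self, tid, dt):
        # Accept a counter-offer at or above the floor, else re-offer the floor.
        if dt["price"] >= self.floor_price:
            return dict(dt, status="accepted")
        return dict(dt, price=self.floor_price, status="offer")

    def grace_deal_confirm(self, tid, dt):
        confirmed = dt["status"] == "accepted"
        self.sessions[tid]["status"] = "confirmed" if confirmed else "declined"
        return confirmed

    def grace_trade_disconnect(self, tid, resource_id):
        self.sessions.pop(tid, None)


def trade(server, resource_id, requirements, max_price):
    """Hypothetical Bid Manager: connect, quote, negotiate once, confirm, disconnect."""
    tid = server.grace_trade_connect(resource_id)
    dt = server.grace_request_quote(tid, {"requirements": requirements})
    if dt["price"] > max_price:
        dt = server.grace_deal_negotiate(tid, dict(dt, price=max_price))
    else:
        dt = dict(dt, status="accepted")
    ok = server.grace_deal_confirm(tid, dt)
    server.grace_trade_disconnect(tid, resource_id)
    return ok, dt["price"]


server = MockBidServer(posted_price=10, floor_price=6)
deal_at_counter = trade(server, "r1", {"cpu_time": 100}, max_price=8)
no_deal = trade(server, "r1", {"cpu_time": 100}, max_price=5)
deal_at_posted = trade(server, "r1", {"cpu_time": 100}, max_price=12)
```

The mock negotiates for a single round only; the protocol on the slide allows negotiation to continue, and it also lists cancel, change, and reconnect calls that this sketch omits.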

GRACE Deal Template  Resource Requirement – CPU Time – Memory – Storage – Timeline (when and up to)  Resource Profile – Static – Dynamic – Cost Chart – Slots Chart  DT - Deal Template – resource requirements (BM) – resource profile (BS) – price (anyone can set) – status – change the above values – negotiation can continue – accept/decline – validity period
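A Deal Template could be represented as a small record. The sketch below is an assumption-laden illustration: the field names, units, status values, and the `counter` and `is_valid` helpers are not defined by the slides, which list only the categories of information a DT carries.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class DealTemplate:
    # Resource requirement (set by the Bid Manager)
    cpu_time_s: float
    memory_mb: int
    storage_mb: int
    start: float  # timeline: when
    end: float    # timeline: up to
    # Negotiable part
    price: Optional[float] = None  # either party can set this
    status: str = "open"           # assumed values: open, offer, accepted, declined
    valid_until: float = float("inf")

    def is_valid(self, now):
        """A deal can only be acted on within its validity period."""
        return now <= self.valid_until

    def counter(self, new_price):
        """Return a changed copy so that negotiation can continue."""
        return replace(self, price=new_price, status="offer")


dt = DealTemplate(3600, 512, 1024, start=0, end=7200, price=10.0, valid_until=100)
dt2 = dt.counter(8.0)
```

Making the record immutable and returning a changed copy keeps each offer in a negotiation distinct, which is one reasonable way to model "change the above values"; a mutable record would also work.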

Scheduler Steps
1. Identify Accessible Resources
2. Call for Bids
3. Collect Bids
4. Evaluate Bids and Negotiate such that resources meet user requirements
5. Confirm all bidders (accept/reject)
6. Have contract/Agreement of Usage
7. Schedule Computations
8. Periodically monitor progress and check that user requirements will be met
A. If “No”, reschedule unfinished computations such that requirements are met
B. If “Yes”, repeat Step (8) until application processing completes.
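Steps 3 to 5 (collect, evaluate, and confirm bids) can be sketched as a greedy selection over the collected bids. This is an illustration under assumptions, not the scheduler the slides propose: the bid tuple layout, the cheapest-first rule, and the name `evaluate_bids` are all hypothetical.

```python
def evaluate_bids(bids, n_jobs, budget):
    """Accept the cheapest bids first, within the budget.

    bids: list of (resource, price_per_job, jobs_completable_by_deadline).
    Returns (accepted {resource: jobs}, rejected [resource], jobs left, spent).
    """
    accepted, rejected = {}, []
    spent = 0.0
    for resource, price, capacity in sorted(bids, key=lambda b: b[1]):
        # How many jobs the remaining budget can pay for at this price.
        affordable = int((budget - spent) // price) if price > 0 else capacity
        take = min(n_jobs, capacity, affordable)
        if take > 0:
            accepted[resource] = take
            n_jobs -= take
            spent += take * price
        else:
            rejected.append(resource)
    return accepted, rejected, n_jobs, spent


bids = [("a", 2.0, 5), ("b", 1.0, 3), ("c", 4.0, 10)]
accepted, rejected, left, spent = evaluate_bids(bids, n_jobs=10, budget=20.0)
```

A non-zero `left` value means the deadline and budget cannot both be met with the current bids, which corresponds to the "No" branch of step 8: renegotiate or reschedule the unfinished work.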

Related Works  AppLeS (UC San Diego) – application-level scheduling templates & case-by-case for different classes of apps.  NetSolve (UTK/ORNL) – API for creating farms  Millennium (UC Berkeley) – remote execution environment on clusters; supports computational economy  CODINE/GRD (Genias/Gridware) – an RMS for clusters, but moving towards grid; meets deadlines by dominating over others' share.  Mariposa, a distributed database system (UC Berkeley) – query with budget, creates sub-queries & divides budget, trades with (remote) servers

Scheduling Policies in GRD Policy - who gets how much  Global Dynamic Scheduler – tracks workload – enforces policies – manages global resource utilization  Urgency-based Priority – Initiate Time, Deadline  Share-based Usage – User/Project share tree, time based  Functional Priority – Current load, snapshot based  Override System – Boosts temporarily a project/job/team  Policies can be combined

Deadline policy in GRD [Figure labels: Urgency-based Priority; Initiate Time; Deadline; Deadline Job; Other Jobs; Share Utilization; Time; Required Completion]

Conclusions  The Nimrod/G architecture offers a scalable model for resource management and scheduling on computational grids.  An economic-based approach to resource management is the way to go in the grid environment.  Resources can be traded for sequential and parallel applications.  Advance Resource Reservation and Computational Economy together help in moving towards a user-centric approach to resource management.  The user can say “I am willing to pay $…, can you complete my job by this time…”  Grid: A Next Generation Internet ?

Further Information  Nimrod/G Project: –  eGRID, Economy driven GRID resource management: –  Grid Computing Info Centre: –  Millennium Compute Power Grid Project – – You are invited to join CPG project!  GRID’2000 Meeting –

Thank You… Any ??