Seamless, Scalable Computing from Desktops to Global Computational Power Grids: Hype or Reality ? Rajkumar Buyya School of Computer Science and Software Engineering Monash University, Melbourne, Australia Email: rajkumar@csse.monash.edu.au | rajkumar@buyya.com Internet Home: www.csse.monash.edu.au/~rajkumar | www.buyya.com Thanks to: David Abramson Jonathan Giddy Jack Dongarra Wolfgang Gentzsch http://www.buyya.com/ecogrid/

Agenda Computing Platforms and How the Grid is Different Cluster Computing Technologies and New Challenges Clusters of Clusters (Federated/Hyper Clusters) Towards Global (Grid) Computing Grid Resource Management and Scheduling Challenges Grid Technologies (in development): Grid Substrates and Interoperability, Grid Middleware, Resource Brokers and High-Level Tools, Applications and Portals Conclusion

Computing Power (HPC) Drivers Solving grand challenge applications using computer modeling, simulation and analysis: E-commerce/anything, Life Sciences, Aerospace, Digital Biology, CAD/CAM, Military Applications

How to Run Applications Faster? There are 3 ways to improve performance: 1. Work Harder 2. Work Smarter 3. Get Help Computer analogy: 1. Use faster hardware: e.g., reduce the time per instruction (clock cycle). 2. Use optimized algorithms and techniques. 3. Use multiple computers to solve the problem: that is, increase the number of instructions executed per unit time.

Computing Platforms Evolution: Breaking Administrative Barriers (Figure: performance grows as administrative barriers are crossed, from the individual, group, department, campus, state, national, and global levels to inter-planet and universe scales, moving from the desktop (single processor) to SMPs/supercomputers, local clusters, enterprise clusters/grids, global clusters/grids, and perhaps inter-planet clusters/grids.)

Towards Inexpensive Supercomputing It is: Cluster Computing, the Commodity Supercomputing!

Clustering of Computers for Collective Computing: Trends (Figure: timeline from 1960 through 1990 to 1995 and beyond.)

Clustering Today Clustering gained momentum when 4 technologies converged: 1. Killer Micros/CPUs: very high-performance microprocessors; today's workstation performance equals yesterday's supercomputers. 2. Killer Networks: high-speed communication; communication between cluster nodes approaches that between processors in an SMP. 3. Killer Tools: standard tools for parallel/distributed computing and their growing popularity. 4. Killer Applications: scientific, engineering, database, and Internet/Web serving.

Cluster Computer Architecture

Major issues in cluster design Scalability (physical & application) Performance High Availability (fault-tolerance and failure management) Single System Image (look-and-feel of one system) Fast Communication (networks & protocols) Load Balancing (CPU, Memory, Disk, Net) Security and Encryption (clusters of clusters) Distributed Environment (Social issues) Manageability (administration and control) Programmability (simple API if required) Applicability (cluster-aware and non-aware apps.)

Killer Micro: Technology Trend (led to the death of traditional supercomputers)

Killer Networks and Protocols Networks: Ethernet (10 Mbps), Fast Ethernet (100 Mbps), Gigabit Ethernet (1 Gbps), SCI (Dolphin; MPI with 12-microsecond latency), ATM, Myrinet (1.2 Gbps), Digital Memory Channel, the Internet! Lightweight (user-level) protocols: TCP/IP? Active Messages (Berkeley), Basic Interface for Parallelism (Lyon, France), Fast Messages (Illinois), U-Net (Cornell), VIA (industry standard), ...

Killer Tools/Middleware (e.g., HKU JESSICA) http://www.srg.csis.hku.hk/jessica.htm

Killer Applications Numerous scientific & engineering apps. Parametric simulations Business applications: e-commerce applications (Amazon.com, eBay.com, ...), database applications (Oracle on clusters), decision support systems Internet applications: Web serving, infowares (yahoo.com, AOL.com), ASPs (application service providers), eChat, ePhone, eBook, eCommerce, eBank, eSociety, eAnything! Computing portals Mission-critical applications: command-control systems, banks, nuclear reactor control, star-wars defence, and handling life-threatening situations.

Science Portals, e.g., the PAPIA system PAPIA PC Cluster: Pentiums, Myrinet, NetBSD/Linux, PM, SCore-D, MPC++ RWCP Japan: http://www.rwcp.or.jp/papia/

Cluster Computing Forum IEEE Task Force on Cluster Computing (TFCC) http://www.ieeetfcc.org

Adoption of the Approach

Clusters of Clusters (HyperClusters) (Diagram: a scheduler and master daemon with execution, submit, and graphical-control clients, coordinating Cluster 1, Cluster 2, and Cluster 3 over a LAN/WAN.)

Towards Grid Computing…. For illustration, resources are placed arbitrarily on the GUSTO testbed!!

What is the Grid? An infrastructure that couples Computers (PCs, workstations, clusters, traditional supercomputers, and even laptops, notebooks, mobile computers, PDAs, and so on) Software (e.g., ASPs renting expensive special-purpose applications on demand) Databases (e.g., transparent access to the human genome database) Special instruments (e.g., radio telescopes: SETI@Home searching for life in the galaxy, Astrophysics@Swinburne for pulsars) People (maybe even animals, who knows?) across local/wide-area networks (enterprise, organisations, or the Internet) and presents them as a unified integrated (single) resource.

Conceptual view of the Grid Leading to Portal (Super)Computing http://www.sun.com/hpc/

Grid Application-Drivers Old and new applications are being enabled by the coupling of computers, databases, instruments, people, etc.: (distributed) supercomputing, collaborative engineering, high-throughput computing, large-scale simulation & parameter studies, remote software access / renting software, data-intensive computing, on-demand computing.

The Grid Vision: To offer “Dependable, consistent, pervasive access to [high-end] resources” Dependable: Can provide performance and functionality guarantees Consistent: Uniform interfaces to a wide variety of resources Pervasive: Ability to “plug in” from anywhere Source: www.globus.org

The Grid Impact! “The global computational grid is expected to drive the economy of the 21st century similar to the electric power grid that drove the economy of the 20th century”

Sources of Complexity in Grid Resource Management No single administrative control No single policy: each resource owner has their own policies or scheduling mechanisms, and users must honor them (particularly external users of the grid) Heterogeneity of resources (static and dynamic) Unreliable: resources may come or disappear (die) No uniform cost model (there cannot be one): it varies from one user to another and from time to time No single access mechanism

Resource Management Challenges Introduced by the GRID Site autonomy Heterogeneous resources and substrate: each site has systems with different architectures (SMPs, clusters, Linux/NT/Unix, Intel), and each resource owner has their own policies or scheduling mechanisms (Codine/Condor) Policy extensibility (through resource brokers) Resource allocation/co-allocation Online control (can apps (graphics) tolerate non-availability of a required resource and adapt?) Computational economy

Grid Resource Management: Challenging Issues Authenticate (once) Specify simulation (code, resources, etc.) Discover resources Negotiate authorization, acceptable use, cost, etc. Acquire resources Schedule jobs Initiate computation Steer computation Access remote data-sets Collaborate on results Account for usage (Diagram spans Domain 1 and Domain 2.) Ack.: Globus

Grid Components Grid Apps.: scientific and engineering applications and portals (scientific/engineering collaboration, problem-solving environments, Web-enabled apps) Grid Tools: development environments and tools (languages, libraries, debuggers, monitoring, resource brokers, Web tools) Grid Middleware: distributed-resource coupling services (communication, sign-on & security, information, process, data access, QoS) Grid Fabric: local resource managers (operating systems, queuing systems, libraries & app kernels, TCP/IP & UDP) over networked resources across organisations (computers, clusters, storage systems, data sources, scientific instruments)

Many GRID Projects and Initiatives PUBLIC FORUMS: Computing Portals, Grid Forum, European Grid Forum, IEEE TFCC!, GRID’2000, and more Australia: Nimrod/G, EcoGrid and GRACE, DISCWorld Europe: UNICORE, MOL, METODIS, Globe, Poznan Metacomputing, CERN Data Grid, MetaMPI, DAS, JaWS, and many more Public Grid Initiatives: Distributed.net, SETI@Home, Compute Power Grid USA: Globus, Legion, JAVELIN, AppLeS, NASA IPG, Condor, Harness, NetSolve, NCSA Workbench, WebFlow, EveryWhere, and many more Japan: Ninf, Bricks http://www.gridcomputing.com/

Many GRID Testbeds... GUSTO, the Distributed ASCI Supercomputer (DAS), NASA IPG

Scheduler Architecture Alternatives (scheduler model, architecture, and example systems): Centralized, single resource: a single scheduler over one resource, e.g., Unix, Linux, Windows OS. Centralized, multiple resources (single/multiple domains): jobs queue at one scheduler for resources R1..Rn, e.g., cluster systems such as LSF, Codine, Nimrod, Condor. Hierarchical (multiple domains): a resource broker or super-scheduler over local schedulers, e.g., multi-cluster or Grid systems such as AppLeS, Nimrod/G, HTB, Legion, Condor. Decentralized (self-coordinated or job pool, single/multiple domains): cluster systems like MOSIX follow a simple form of this model; Grid systems can follow it too, but it is complex to realize.

Grid Components and our Focus: Resource Brokers and Computational Economy Applications High-level services and tools (Grid Tools): GlobusView, Testbed Status, DUROC, MPI, MPI-IO, CC++, Nimrod/G, globusrun Core services (Grid Middleware): Heartbeat Monitor, Nexus, GRAM, GRACE, Globus Security Interface, Gloperf, MDS, GASS, GARA Local services (Grid Fabric): Condor, MPI, TCP, UDP, LSF, Easy, NQE, AIX, Irix, Solaris Source: Globus

Nimrod/G Resource Broker Nimrod/G Approach to Resource Management and Scheduling

What is Nimrod/G? A global scheduler for managing and steering task-farming (parametric simulation) applications on a computational grid, based on deadlines and a computational economy. Key features: A single window to manage and control the whole experiment Resource discovery Trading for resources Scheduling Steering & data management Leverages Globus services Other parallel application models can be supported easily (MPI & DUROC co-allocation)
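The heart of a task-farming experiment is expanding parameter ranges into independent jobs. A minimal Python sketch of that expansion (the parameter names and values below are invented for illustration; this is not Nimrod's actual plan-file machinery):

```python
from itertools import product

def generate_jobs(parameters):
    """Expand a parametric experiment into one job per parameter combination."""
    names = sorted(parameters)
    return [dict(zip(names, values))
            for values in product(*(parameters[n] for n in names))]

# A toy sweep: 3 angles x 2 pressures -> 6 independent jobs,
# each of which the broker can schedule on any grid resource.
jobs = generate_jobs({"angle": [0, 15, 30], "pressure": [1.0, 2.5]})
print(len(jobs))   # 6
print(jobs[0])
```

Each dictionary becomes one command-line invocation of the simulation code; because the jobs are independent, they can be farmed out to whatever resources the broker trades for.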

A Nimrod/G User Console (Screenshot: shows the deadline, the cost, and the available machines.)

Nimrod/G Architecture (Diagram: Nimrod/G clients drive the Nimrod engine, which uses a schedule advisor, a persistent store, a trading manager (TM), and a dispatcher; a grid explorer (GE) queries the Grid Information Services (GIS) through the middleware services, and jobs are dispatched to GUSTO testbed nodes running a local resource manager and trade server. RM: Local Resource Manager, TS: Trade Server.)

Nimrod/G Interactions (Diagram: on the root node, the parametric engine, scheduler, and dispatcher cooperate; the scheduler locates resources via the MDS server and allocates them through a trade server; the dispatcher submits jobs via the GRAM server on the gatekeeper node to the local queuing system; on the computational node, a job wrapper runs the user process, which accesses files through the GASS server. Additional services used implicitly: GSI (authentication & authorization) and Nexus (communication).)

Resource Location/Discovery Need to locate suitable machines for an experiment: speed, number of processors, cost, availability, user account. Available resources will vary across the experiment. Supported through a directory server (Globus MDS).

Computational Economy Resource selection based on real money and markets A large number of sellers and buyers (resources may be dedicated/shared) Negotiation: call for tenders/bids and select the offers that meet the requirements Trading and advance resource reservation Schedule computations on those resources that meet all requirements

Cost Model (Figure: users 1 through 5 access machines 1 through 5 at different prices.) Non-uniform costing: varies from time to time and from one user to another, and depends on usage duration. It encourages use of local resources first: a user can access remote resources, but pays a penalty in higher cost.
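A toy rendering of such a cost model in Python. The rates and multipliers are made-up illustrations of the stated properties (duration-based charging, remote penalty, time-varying price), not figures from the actual Nimrod/G model:

```python
def job_cost(base_rate, duration, remote=False, peak=False,
             remote_penalty=2.0, peak_factor=1.5):
    """Charge grows with usage duration; remote and peak-time access cost more.

    base_rate, remote_penalty, and peak_factor are hypothetical values
    chosen only to illustrate non-uniform, user- and time-dependent pricing.
    """
    rate = base_rate
    if remote:
        rate *= remote_penalty   # penalty encourages use of local resources first
    if peak:
        rate *= peak_factor      # pricing varies from time to time
    return rate * duration

print(job_cost(base_rate=4, duration=10))               # 40: local, off-peak
print(job_cost(base_rate=4, duration=10, remote=True))  # 80: remote penalty
```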

Nimrod/G Scheduling Algorithm Find a set of machines (MDS search) Distribute jobs from root to machines Establish job consumption rate for each machine For each machine Can we meet deadline? If not, then return some jobs to root If yes, distribute more jobs to resource If cannot meet deadline with current resource Find additional resources
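The steps above can be sketched as a single pass of the deadline-feasibility loop. Machine names and consumption rates below are hypothetical; the real Nimrod/G scheduler measures job consumption rates online and repeats this until the deadline is met or more resources are found:

```python
def schedule(jobs, machines, deadline):
    """Deadline-driven pass: give each machine as many jobs as its measured
    consumption rate (jobs per hour) lets it finish by the deadline.
    Returns (assignment, leftover jobs needing additional resources)."""
    pending = list(jobs)
    assignment = {}
    for name, rate in machines.items():
        capacity = int(rate * deadline)   # jobs this machine can finish in time
        assignment[name] = pending[:capacity]
        pending = pending[capacity:]      # jobs over capacity return to the root
    return assignment, pending            # non-empty pending => locate more machines

machines = {"mA": 2.0, "mB": 1.0}         # hypothetical rates in jobs/hour
plan, leftover = schedule(list(range(10)), machines, deadline=3)
print(len(plan["mA"]), len(plan["mB"]), len(leftover))  # 6 3 1
```

The one leftover job signals the "find additional resources" branch of the algorithm.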

Nimrod/G Scheduling Algorithm... (Flowchart: locate machines, establish rates, distribute jobs, then check: meet deadlines? If not, re-distribute jobs and locate more machines.)

Resource Usage (for various deadlines)

Scheduling Methods for Global Resource Allocation 1. Equal assignment, but no load balancing 2. Equal assignment and load balanced 3. (2) + deadline (no worry about cost) 4. (2) + deadline (minimize the cost of computation) 5. (4) + budget: deadline + cost (up-front agreement): I am willing to pay $$$, can you complete by the deadline? 6. (5) + advance resource reservation 7. (6) + Grid economy with dynamic pricing, using a trading/tender/bid process 8. (6) + Grid economy with dynamic pricing, using an auction technique 9. Genetic/fuzzy-logic, etc. algorithms
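Method 4 (deadline plus cost minimization) can be sketched as a greedy fill of the cheapest machines that still finish by the deadline. The rates and prices are invented for illustration, and a production scheduler would also re-check feasibility as rates change:

```python
def cheapest_feasible(jobs, machines, deadline):
    """Fill machines cheapest-first, each up to the number of jobs it can
    finish by the deadline, to minimize the total cost of computation."""
    pending = list(jobs)
    plan, cost = {}, 0.0
    # machines: name -> (rate in jobs/hour, price in $ per job); both hypothetical
    for name, (rate, price) in sorted(machines.items(), key=lambda m: m[1][1]):
        take = min(len(pending), int(rate * deadline))
        plan[name] = pending[:take]
        cost += take * price
        pending = pending[take:]
    return plan, cost, pending

machines = {"cheap": (1.0, 1.0), "fast": (4.0, 5.0)}
plan, cost, left = cheapest_feasible(list(range(8)), machines, deadline=2)
print(cost, len(left))   # cheap takes 2 jobs ($2), fast takes 6 ($30): 32.0 0
```

Dropping the deadline constraint recovers method 2, and capping `cost` by a budget gives the flavour of method 5.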

GRACE: Grid Architecture for Computational Economy GRACE aims to help Nimrod/G overcome its current limitations. The GRACE middleware offers generic interfaces (APIs) that other developers of grid tools can use along with Globus services.

Grid Resource Management Architecture (Economic/Market-based) (Diagram: the user's application hands jobs to a resource broker, whose job control agent, schedule advisor, trade manager, grid explorer, and deployment agent interact with the Grid Information Server and, within a resource domain, with a trade server, resource reservation, resource allocation, and the other components of the local resource manager over resources R1, R2, …, Rn.)

Why Computational Economy in Resource Management? “Observe Grid characteristics and current resource management policies” Grid resources are not owned by the user or a single organisation; each has its own administrative policy Mismatch in resource demand and supply: overall resource demand may exceed supply Traditional system-centric (performance-matrix) approaches do not suit the grid environment: system-centric must become user-centric As in real life, an economic-based approach is one of the best ways to regulate selection and scheduling on the grid, as it captures user intent Markets are an effective institution for coordinating the activities of several entities

Advantages of Economic-based RM System-centric to user-centric policy in RM Helps in regulating demand and supply: resource access cost can fluctuate (based on demand and supply, and the system can adapt) Scalable solution: no need for a central coordinator (during negotiation); resources (sellers) and users (buyers) can make their own decisions and try to maximize utility and profit Uniform treatment of all resources: everything can be traded, including CPU, memory, network, storage/disk, and other devices/instruments Efficient allocation of resources

Grid Open Trading Protocols The Trade Manager and Trade Server exchange, over the trading API: Get Connected, Call for Bid (DT), Reply to Bid (DT), Negotiate Deal (DT), Confirm Deal (DT, Y/N), Cancel Deal (DT), Change Deal (DT), Get Disconnected; the server applies its pricing rules. DT is the Deal Template: resource requirements (from the buyer), resource profile (from the seller), price (either party can set it), status, and a validity period; either party can change these values, negotiation can continue, and either side can accept or decline.
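A toy sketch of one negotiation round over a deal template. The class and function names, the price-stepping strategy, and all numbers are hypothetical illustrations of the protocol's shape, not the actual GRACE API:

```python
class DealTemplate:
    """Carries resource requirements, the current offered price, and a status."""
    def __init__(self, requirements, price):
        self.requirements = requirements
        self.price = price
        self.status = "open"      # becomes "accepted" or "declined"

def negotiate(trade_server_floor, dt, max_price, step=1.0, max_rounds=10):
    """Trade Manager raises its offer on the deal template until the
    (hypothetical) Trade Server's floor price is met, the budget cap is
    reached, or the validity rounds run out."""
    for _ in range(max_rounds):
        if dt.price >= trade_server_floor:   # server accepts this bid
            dt.status = "accepted"
            return dt
        if dt.price + step > max_price:      # budget exhausted: decline
            break
        dt.price += step                     # counter-offer; negotiation continues
    dt.status = "declined"
    return dt

dt = negotiate(trade_server_floor=5.0,
               dt=DealTemplate({"cpus": 4}, price=2.0), max_price=8.0)
print(dt.status, dt.price)   # accepted 5.0
```

In the real protocol both sides may adjust any field of the deal template, not just the price, before confirming or cancelling the deal.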

Open Trading Finite State Machine (State diagram: the Trade Manager (TM) requests a resource and asks a price; the Trade Server (TS) replies with a bid; both sides update the deal template (DT) and exchange final offers until the TM accepts, ending in Deal Accepted (DA), or either side rejects, ending in Deal Not accepted (DN). DT: Deal Template, TM: Trade Manager, TS: Trade Server.)

Conclusions The emergence of the Internet as a powerful connectivity medium is bridging the gap between a number of technologies, leading to what is known as “Everything on IP”. Cluster-based systems have become a platform of choice for mainstream computing. An economic-based approach to resource management is the way to go in the grid environment: the user can say “I am willing to pay $…, can you complete my job by this time…?” Both sequential and parallel applications run seamlessly on desktops, SMPs, clusters, and the Grid without any change. Grid: a next-generation Internet?

Further Information Cluster Computing Infoware: http://www.buyya.com/cluster/ Grid Computing Infoware: http://www.gridcomputing.com Millennium Compute Power Grid/Market Project: http://www.ComputePower.com (you are invited to join the CPG project!) Books: High Performance Cluster Computing, V1 & V2, R. Buyya (Ed.), Prentice Hall, 1999; The GRID, I. Foster and C. Kesselman (Eds.), Morgan Kaufmann, 1999. IEEE Task Force on Cluster Computing: http://www.ieeetfcc.org GRID Forums: http://www.gridforum.org | http://www.egrid.org GRID’2000 Meeting: http://www.gridcomputing.org

Thank You… Any questions?

Back-up Slides Thanks to Jack Dongarra, who allowed me to use his survey slides, some of which are included here verbatim.