High Performance Parametric Modeling with Nimrod/G: A Killer Application for the Global Grid? David Abramson, Jon Giddy and Lew Kotler. Presentation by: Abhijeet Karnik

Presentation transcript:

Outline Introduction; Parametric Modeling with Nimrod; Nimrod/G: Description, Architecture, Working, Comparison with Nimrod; Globus Toolkit and Grid Issues; Scheduling on the Grid: Cost, Scheduling Algorithms; Case Study: An Evaluation of Nimrod/G; Conclusion; References

Introduction We examine the role of parametric modeling as an application for the global computing grid, and explore heuristics that let users specify soft real-time deadlines for large computational experiments. Nimrod is a specialized parametric modeling system: It uses a simple declarative parametric modeling language to express an experiment. It provides machinery that automates the tasks of formulating, running, monitoring and collating the results from multiple individual experiments. It incorporates a scheduling component that manages the scheduling of individual experiments onto idle computers.

Parametric Modeling With Nimrod Nimrod: In a Nutshell  Is a tool that manages the execution of parametric studies across distributed computers.  Takes responsibility for the overall experiment as well as low-level issues of distributing files to remote systems.  Performs remote computation and gathers the results. A user describes an experiment to Nimrod by developing a declarative plan file, which describes the parameters, their default values, and the commands necessary to run the experiment. A plan file consists of two main sections: the parameter section and the task section. The machine which invokes Nimrod is known as the Root Machine: it controls the experiment. The dispatcher executes code on remote platforms, each of which is known as a computational node.
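To make the two-section structure concrete, a minimal plan file might look like the sketch below. This is written in the style of Nimrod's declarative language, but the parameter name, range and commands are hypothetical illustrations, not taken from the paper:

```
parameter thickness float range from 0.01 to 0.40 step 0.01;

task main
    copy model.input node:.
    node:execute ./ionmodel $thickness
    copy node:output.dat results/output.$jobname
endtask
```

The parameter section declares the variable being swept; the task section lists the file transfers and the command to run for each parameter value.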

Parametric Modeling (Contd.) The plan file is processed by a tool called the generator. The Generator:  Takes the parameter definitions and gives the user a choice of actual values.  Builds a run-file which contains a description of each job.  This run-file is then processed by another tool called the dispatcher. The Dispatcher:  Implements file-transfer commands.  Is responsible for the execution of the model on the remote nodes and for managing the computation across the nodes.  Allocates work to machines without any attempt to schedule their execution.
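In spirit, the generator expands the chosen parameter values into a run file with one job description per parameter combination. A hypothetical sketch (the function and parameter names are illustrative, not Nimrod's actual code):

```python
from itertools import product

# Hypothetical sketch of the generator: expand lists of actual parameter
# values into one job description per combination (the "run file").
def generate_run_file(parameters):
    names = list(parameters)
    jobs = []
    for combo in product(*(parameters[n] for n in names)):
        jobs.append(dict(zip(names, combo)))
    return jobs

# Two thicknesses x three energies -> six jobs in the run file.
print(len(generate_run_file({"thickness": [0.1, 0.2], "energy": [1, 2, 3]})))  # 6
```

The dispatcher then simply hands these job descriptions to machines, without attempting to schedule their execution.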

Plan-file Processing (diagram): the plan file (default parameter values and commands) is processed by the generator into a run file (actual values; a description of each job), which the dispatcher then uses to manage the computation, file-transfer commands and execution.

Phases of a Nimrod Computation 1. Experiment Pre-Processing: Data is set up for the experiment. 2. Execution Pre-Processing: Data is prepared for a particular execution. 3. Execution: The program is executed for a given set of parameter values. 4. Execution Post-Processing: Data from a particular execution is reduced. 5. Experiment Post-Processing: Results are processed using tools. Phases 1 and 5 are performed once per experiment, while phases 2, 3 and 4 are run for each distinct set of parameters.
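The once-per-experiment versus once-per-parameter-set split can be sketched as follows. This is an illustrative toy, not Nimrod's actual code; the helper steps are stand-ins:

```python
# Hypothetical sketch of the five phases of a Nimrod computation.
# Phases 1 and 5 run once per experiment; phases 2-4 run once per
# distinct parameter set.
def run_experiment(parameter_sets, execute):
    workspace = []                      # phase 1: experiment pre-processing
    for params in parameter_sets:
        prepared = dict(params)         # phase 2: execution pre-processing
        raw = execute(prepared)         # phase 3: execution
        workspace.append(raw * 2)       # phase 4: execution post-processing (reduce)
    return sum(workspace)               # phase 5: experiment post-processing

print(run_experiment([{"x": 1}, {"x": 2}], lambda p: p["x"]))  # prints 6
```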

Nimrod: Limitations Nimrod, though successful, suffers from a few limitations when considered in the context of a global computational grid. 1. It uses a static set of resources and does not discover new ones dynamically. 2. It has no notion of user deadlines; in a dynamic global grid environment this is not acceptable. 3. Nimrod relies on UNIX-level security, whereas in the global grid, owners of expensive supercomputing resources require a more elaborate security mechanism. 4. Nimrod does not support a range of access mechanisms.

Nimrod/G: Description Nimrod/G extends the basic Nimrod model to provide soft performance guarantees in a dynamic and heterogeneous environment. An effective scheduling component in Nimrod/G seeks to meet such constraints through a dynamic and iterative process of resource discovery, resource acquisition and resource monitoring. Nimrod/G is a "Grid-aware" application: it exploits an understanding of both its problem domain and the nature of the computational grid. Features: High-level interface to the user. Transparent access to the computational resources. User-level scheduling.

Nimrod/G Architecture Nimrod/G is designed to operate in an environment that comprises a set of sites. Sites provide access to a set of computers under their own administrative control. Access to resources is mediated by GRAMs. Information about the physical characteristics and availability of resources is available from the MDS.

Nimrod/G Architecture (diagram): the Nimrod/G client talks to the parametric engine (which keeps persistent information); the engine works with a schedule advisor (using resource discovery via the Grid directory services) and a dispatcher that submits jobs through Grid middleware services to the GUSTO testbed.

Nimrod/G Working A user initiates a parametric study at a local site. Nimrod/G then organizes the mapping of individual computations to appropriate remote sites using scheduling heuristics. On the local site, the Origin process operates as the master for the whole system; it exists for the entire duration of the experiment and is responsible for execution within the specified time and cost constraints. The client and the origin are distinct because a client may be tied to a particular environment; it is also possible for multiple clients to monitor the same experiment by connecting to the one Origin process.

Nimrod/G Working Each remote site consists of a cluster of computational nodes. A cluster may be a single multiprocessor machine, a cluster of workstations, or even a single processor. A defining characteristic of a cluster is that access to all nodes is through a set of resource managers provided by the Globus infrastructure. The Origin process uses the Globus process-creation services to start a Nimrod Resource Broker (NRB) on the cluster. The NRB provides capabilities for file staging, creation of jobs and process control beyond those provided by the GRAM.

Nimrod/G Versus Nimrod Nimrod/G attempts to schedule otherwise unrelated tasks so that a user-specified deadline is met: computational resources are allocated dynamically so as to satisfy the specified deadline and cost constraints. Its scheduling complexity is increased by the introduction of factors such as computational economics, deadlines, and the use of scattered and remote resources. In both systems there is no communication between tasks once they have started, so scheduling reduces to finding suitable resources and executing the application. In Nimrod, however, scheduling is restricted to allocating resources statically so that the application can complete; remoteness of resources, deadline and cost constraints and other such complexities are not considered.

Globus Toolkit and Grid Issues The Globus Toolkit is a collection of software components designed to support the development of applications for high-performance distributed computing environments. It implements a bag-of-services architecture: Globus components provide basic services such as resource allocation, authentication, information, communication, remote data access and fault detection, among others. Applications and tools combine these services in different ways to construct 'grid-enabled' systems. Nimrod/G uses: 1. The Globus Resource Allocation Manager (GRAM) for starting and managing computations on a resource. 2. The Metacomputing Directory Service (MDS), which provides an API for discovering the structure and state of resources. 3. The Globus Security Infrastructure (GSI), which provides single sign-on, run-anywhere capabilities for computations. 4. Global Access to Secondary Storage (GASS), which provides uniform access mechanisms for files on various storage systems.

Cost in a Global Grid Unless restrictions are placed on access to the various resources of a global grid, it is likely to become congested with too much work. A fiscal model has been implemented for controlling the amount of work requested, wherein users pay for access. This scheme allows resource providers to set pricing rates for the various machines; rates can vary between classes of machines, times of day, resource demand and classes of users. Nimrod/G: The Cost Matrix
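A cost matrix of this kind could be represented as in the sketch below. The machine names, periods and rates are hypothetical, chosen only to illustrate the idea of per-resource, per-period pricing:

```python
# Hypothetical cost matrix: price per CPU-hour by resource and time of day.
cost_matrix = {
    "supercomputer.siteA": {"peak": 50, "off_peak": 20},
    "cluster.siteB":       {"peak": 10, "off_peak": 5},
}

def cheapest(resources, period):
    """Return the resource name with the lowest rate for the given period."""
    return min(resources, key=lambda r: cost_matrix[r][period])

print(cheapest(cost_matrix, "off_peak"))  # cluster.siteB
```

The scheduler consults such a matrix during discovery to find the lowest-cost set of resources able to meet the deadline.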

Scheduling Algorithm The Nimrod/G scheduler is responsible for discovering and allocating the resources required to complete an experiment, subject to execution time and budget constraints. Scheduling heuristics:  Discovery: determine the number, and then the identity, of the lowest-cost set of resources able to meet the deadline. A cost matrix is used for this, and the output of this phase is a set of resources to which jobs should be submitted.  Allocation: Unscheduled jobs are allocated to the candidate resources identified in the discovery phase.  Monitoring: The completion time of submitted jobs is monitored, establishing an execution rate for each resource.  Refinement: Rate information is used to update estimates of typical execution times on different resources, and hence the expected completion time of each job. This may lead to jumps back to steps 1 and 2 so as to discover new resources or drop existing ones from the candidate set.
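The discover/allocate/monitor/refine loop can be sketched as a simplified simulation. The rates, costs and hourly time step here are assumptions for illustration; this is not the actual Nimrod/G scheduler:

```python
# Simplified simulation of the Nimrod/G-style scheduling loop
# (hypothetical rates and costs; "rate" is jobs completed per hour).
def schedule(jobs, resources, deadline_hours, budget):
    # Discovery: start with the lowest-cost resource.
    chosen = [min(resources, key=lambda r: r["cost"])]
    done, spent, hours = 0, 0, 0
    while done < jobs and hours < deadline_hours:
        hours += 1
        # Refinement: if the current set cannot finish by the deadline,
        # fall back to discovery and add the more expensive resources.
        rate = sum(r["rate"] for r in chosen)
        if rate * (deadline_hours - hours + 1) < jobs - done:
            chosen = list(resources)
        # Monitoring: account for jobs completed this hour and their cost.
        for r in chosen:
            n = min(r["rate"], jobs - done)
            if spent + n * r["cost"] > budget:
                return done, spent, "budget exceeded"
            done += n
            spent += n * r["cost"]
    return done, spent, "ok" if done >= jobs else "deadline missed"

cheap = {"rate": 2, "cost": 1}
fast = {"rate": 5, "cost": 3}
print(schedule(10, [cheap, fast], deadline_hours=3, budget=100))  # (10, 22, 'ok')
```

With a looser deadline the cheap resource alone would suffice and total cost would fall, which is the behavior the cost-based scheme is designed to produce.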

Scheduling Algorithm This scheme continues until the deadline is met or the cost budget is exceeded; in the latter case the user is advised and the deadline can be modified accordingly. A consequence of this cost-based implementation is that the cost of an experiment will vary depending on the load and the profile of the users at the time. This reflects a demand-and-supply mechanism: lower demand allows the experiment to be performed on cheaper resources. The aim is to allow the user to specify an absolute (soft) deadline so as to express the timeliness of the computation.

Case Study: An Experiment An experiment was conducted to test the effectiveness of the Nimrod/G architecture and scheduling heuristics on a real application. Resources were provided by GUSTO (the Globus Ubiquitous Supercomputing Testbed Organization). They are diverse in terms of their size, availability, architecture, processing capability, power, performance, scheduling mechanism and location.

Ionization Chamber An ionization chamber essentially isolates a certain volume of air and measures the ionization within that volume. This process, however, modifies the original photon and electron spectrum entering the volume. If the ionization chamber is to act as a primary standard for calibration purposes, it is necessary to correct the measured ionization. The calculations reported here concern the simulation of the chamber response as a function of the front-wall thickness; Nimrod/G performs this parametric variation.

Computational Results The ionization chamber study involved 400 tasks; the execution time of the model varied from 45 minutes to 140 minutes per parameter set, depending on the platform used. Three separate experiments were performed, with deadlines of 20 hours, 15 hours and 10 hours respectively, allowing an evaluation of Nimrod/G's ability to meet soft real-time deadlines. The graphs obtained for the different deadlines show how Nimrod/G allocates additional resources for more stringent deadlines.
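A back-of-the-envelope check of why tighter deadlines force more resources (our arithmetic from the numbers above, assuming perfect load balance; these processor counts are not figures from the paper):

```python
import math

# 400 tasks at 45-140 minutes each: minimum processor counts needed to
# finish within each deadline, assuming perfect load balance.
tasks = 400
t_min, t_max = 45 / 60, 140 / 60   # hours per task
for deadline in (20, 15, 10):
    lo = math.ceil(tasks * t_min / deadline)
    hi = math.ceil(tasks * t_max / deadline)
    print(f"{deadline}h deadline: {lo}-{hi} processors")
```

Halving the deadline roughly doubles the processors required, which is the trend the graphs illustrate.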

Computational Results The number of processors allocated depends on the deadline.

Results: 20 Hour Deadline 10 CU machines are introduced when the scheduler calculates that it cannot meet the deadline with the 5 CU machines alone.

Results: 15 Hour Deadline Higher-CU machines are introduced when the scheduler calculates that it cannot meet the deadline with lower-CU machines.

Results: 10 Hour Deadline 50 CU machines are introduced two hours in; these were not needed in the 15- and 20-hour deadline experiments.

Computational Cost This quantifies the impact on cost of the different node selections made for different deadlines: a 10-hour deadline costs three times as much as the 20-hour deadline. In a dynamic environment it is not possible to show that Nimrod/G is making optimal selections; it is, however, effective in selecting more expensive nodes only when the system requires them to meet deadlines.

Conclusion We have discussed the evolution of a scheduling tool, Nimrod, from a local computing environment to a global computing grid. The Nimrod/G architecture offers a scalable model for resource management and scheduling on computational grids. The algorithm used is simple and adaptive to change, and incorporates user as well as system requirements. However, future work needs to address issues such as: using the concept of Advance Resource Reservation to offer a feature wherein the user can say "I am willing to pay $…, can you complete my job by this time…"; taking into account the ability of Globus to reserve resources and incorporating this into the scheduling mechanism; and adding a notion of priority to the cost-based implementation.

References "High Performance Parametric Modeling with Nimrod/G: Killer Application for Global Grids?", D. Abramson, J. Giddy and L. Kotler, International Parallel and Distributed Processing Symposium (IPDPS), May 2000. Web Sites: