Holding slide prior to starting show. Scheduling Parametric Jobs on the Grid Jonathan Giddy

Presentation transcript:

Holding slide prior to starting show

Scheduling Parametric Jobs on the Grid Jonathan Giddy

Parametric computation
Scientifically:
– Study the behaviour of output variables against a range of different input scenarios
Computationally:
– Execute an application multiple times, each time with a different combination of input parameters
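
As an illustration of "a different combination of input parameters", the sketch below enumerates every combination of a small parameter sweep. The parameter names and values are hypothetical and are not taken from the slides; this is not how Nimrod/G itself describes experiments.

```python
from itertools import product


def parameter_sweep(parameters):
    """Yield one dict per combination of input parameter values."""
    names = list(parameters)
    for values in product(*(parameters[name] for name in names)):
        yield dict(zip(names, values))


# Hypothetical input scenarios (illustrative names and values only).
sweep = {"pressure": [1.0, 2.0], "temperature": [300, 350, 400]}

# Each entry is one independent job: the same application, different inputs.
jobs = list(parameter_sweep(sweep))
print(len(jobs), "jobs; first job:", jobs[0])
```

Because each job depends only on its own parameter values, the jobs are uncoupled and can be run in any order on any resource.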

Why use the Grid?
Parametric computations
– Require high performance computational resources
– Require large numbers of computational resources
– Generate large amounts of concurrency
– Generate uncoupled computations
– Tolerate high latencies

Nimrod/G
[Figure labels: Cost, Deadline]

Minimise Cost
[Figure: Node 1 to Node 4 in order of increasing price; labels: Time, Jobs, Budget, Cost]
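
The figure itself is not recoverable, so the following is only one plausible reading of "minimise cost": fill the cheapest nodes first, giving each only as many jobs as it can finish before the deadline. The node prices and completion rates are invented for illustration, and `minimise_cost` is a hypothetical helper, not a Nimrod/G function.

```python
import math


def minimise_cost(nodes, n_jobs, deadline_hours):
    """Allocate jobs to the cheapest nodes that can still meet the deadline.

    nodes: list of (name, price_per_job, jobs_per_hour) tuples (assumed model).
    Returns (allocation, total_cost), or None if the deadline cannot be met.
    """
    allocation, cost, remaining = {}, 0.0, n_jobs
    for name, price, rate in sorted(nodes, key=lambda node: node[1]):
        if remaining == 0:
            break
        # Jobs this node can complete before the deadline at its assumed rate.
        capacity = math.floor(rate * deadline_hours)
        take = min(capacity, remaining)
        if take > 0:
            allocation[name] = take
            cost += take * price
            remaining -= take
    if remaining > 0:
        return None
    return allocation, cost


# Hypothetical prices (per job) and rates (jobs per hour).
nodes = [("Node 1", 1.0, 2), ("Node 2", 2.0, 3), ("Node 3", 3.0, 6), ("Node 4", 4.0, 6)]
result = minimise_cost(nodes, n_jobs=40, deadline_hours=5)
print(result)
```

With these numbers the most expensive node is never used, because the three cheaper nodes can finish all 40 jobs inside the 5 hour deadline.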

Minimise Time
[Figure: Node 1 to Node 4 in order of increasing price; labels: Time, Jobs, Budget, Cost, Budget / Job]
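
One way to read the "Budget / Job" label is as a price ceiling: any node whose price per job is at or below the budget divided by the number of jobs can be used without overspending, and using all such nodes in parallel minimises completion time. The sketch below assumes that reading; the prices, rates, and the `minimise_time` helper are hypothetical.

```python
def minimise_time(nodes, n_jobs, budget):
    """Use every node whose per-job price fits within budget / n_jobs.

    nodes: list of (name, price_per_job, jobs_per_hour) tuples (assumed model).
    Returns (node_names, estimated_hours), or None if no node is affordable.
    """
    budget_per_job = budget / n_jobs
    affordable = [node for node in nodes if node[1] <= budget_per_job]
    total_rate = sum(rate for _, _, rate in affordable)
    if total_rate == 0:
        return None
    names = [name for name, _, _ in affordable]
    # Every job costs at most budget_per_job, so total spend stays within budget.
    return names, n_jobs / total_rate


# Hypothetical prices (per job) and rates (jobs per hour).
nodes = [("Node 1", 1.0, 2), ("Node 2", 2.0, 3), ("Node 3", 3.0, 6), ("Node 4", 4.0, 6)]
names, hours = minimise_time(nodes, n_jobs=40, budget=120.0)
print(names, round(hours, 2))
```

This ceiling is conservative: a scheduler could also mix some jobs on dearer nodes with jobs on cheaper ones while staying within the total budget.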

Globus 1.1 GRAM API
    int globus_gram_client_job_check(
        char *resource_manager_contact,
        const char *description,
        const float conf_percentage,
        globus_gram_client_time_t *estimate,
        globus_gram_client_time_t *interval)
Note: This is not yet implemented.
This function returns an estimate of the time it would take for a job of the description provided to reach an ACTIVE state.

Historical profiling
Examine characteristics of all jobs in queue against historical profiles in order to determine expected start time of a job
Returns start time and error estimate
Warren Smith, Ian T. Foster, Valerie E. Taylor: Predicting Application Run Times Using Historical Information. Job Scheduling Strategies for Parallel Processing Workshop (JSSPP) 1998:

Information Overload
Too many variables:
– Number of CPUs
– CPU speed
– Processor architecture
– Operating system
– Real memory
– Disk speed
– Bandwidth
– Latency
– Other users

Extrapolation of completion rate
[Figure labels: A, B, C; 2 jobs/hr, 3 jobs/hr, 6 jobs/hr; 1 hr, 2 hr]
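
A minimal sketch of the extrapolation idea: measure how fast each resource has been completing jobs, then project when the remaining jobs would finish if those rates hold. Pairing A, B and C with 2, 3 and 6 jobs/hr is an assumption about the lost figure, and the helper name is hypothetical.

```python
def projected_finish_hours(jobs_remaining, rates_jobs_per_hour):
    """Extrapolate the time to finish from observed completion rates."""
    total_rate = sum(rates_jobs_per_hour)
    if total_rate <= 0:
        return float("inf")  # nothing is completing, so no finish time
    return jobs_remaining / total_rate


# Assumed mapping of the figure's resources to its rates.
observed_rates = {"A": 2, "B": 3, "C": 6}

hours_needed = projected_finish_hours(22, observed_rates.values())
print(hours_needed)

# A deadline-driven scheduler can compare the projection with the time left
# and add or release resources accordingly.
deadline_hours = 1.5
needs_more_resources = hours_needed > deadline_hours
print(needs_more_resources)
```

The projection relies on observed throughput rather than on the many per-resource variables listed under "Information Overload", which is why it remains a heuristic.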

[Figure: Average No. Processors against Time, for 20 hour, 15 hour and 10 hour deadlines]

Assumptions
Compute time >> Network time
All jobs are the same length on any particular resource
Price of a resource is constant over time
Not much wriggle room during the end-game
– Both scheduling schemes push up against the limit that they’re not minimising
– Heuristic nature of completion time

What we really want…
Guaranteed completion time
– globus_gram_client_job_check() with teeth
– Requires scheduler to internally reserve space for job in advance
Advance reservation
– As above, but with external interface

And this too…
A real grid economy
– Incentive for providers to provide resources
– Incentive for consumers to describe requirements accurately
– Incentive for consumers to use resources judiciously
– Price mechanism
   • budget as a timely global information parameter
   • universally understood
   • enables trade-offs in making QoS decisions

A final point
Optimising is really hard in a wide area network
– Requires centralised decision maker
– Information is missing
– Information is not contemporaneous
– Information is out-of-date

Scalable information
…is slow to change
Budget and deadline are (relatively) constant and can be propagated far and wide in a timely manner
Slow information comes from specifying requirements in the real world
Satisfying (instead of optimising) a requirement is relatively simple
– A resource can, so it does
– A resource can’t, so it doesn’t
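
To make the contrast with optimising concrete, here is a small sketch of satisfying: each resource checks the propagated budget and deadline against its own price and speed and simply answers yes or no, with no central decision maker. The `Resource` class, its fields, and the example numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    price_per_job: float  # assumed constant over time
    hours_per_job: float  # assumed identical for every job on this resource

    def accepts(self, budget_per_job, hours_left):
        """Local yes/no decision using only slow-changing information."""
        return self.price_per_job <= budget_per_job and self.hours_per_job <= hours_left


resources = [
    Resource("cheap-slow", price_per_job=1.0, hours_per_job=4.0),
    Resource("fast-dear", price_per_job=5.0, hours_per_job=0.5),
    Resource("balanced", price_per_job=2.0, hours_per_job=1.0),
]

# The requirement is stated once and propagated to every resource.
willing = [r.name for r in resources if r.accepts(budget_per_job=3.0, hours_left=2.0)]
print(willing)
```

Each decision needs only the requirement and the resource's own state, so it does not depend on missing or out-of-date information about the rest of the network.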