Faucets: A Framework for Developing Cluster and Grid Scheduling Solutions. Presented by Esteban Pauli. Charm++ Workshop 2005, October 18, 2005.

Presentation transcript:

Faucets: A Framework for Developing Cluster and Grid Scheduling Solutions
Presented by Esteban Pauli, Parallel Programming Lab, UIUC
Charm++ Workshop, October 18, 2005

Outline
- Motivation and Goals
- System Overview
- Meta Scheduler
- Cluster Scheduler
- Conclusions
- Future Work

Motivation
- Clusters are becoming ubiquitous
- Workloads come in bursts, resulting in alternation between low and high utilization
- Need framework for sharing computing power
- Traditional schedulers care about throughput, not deadlines, priorities, etc.

Goals
- Provide technical and economic framework for allowing organizations to share their resources (clusters)
- Provide new cluster scheduler which facilitates the above
- Provide platform for implementing new scheduling strategies

System Overview
- Faucets consists of two main components: meta scheduler and cluster scheduler
  - Meta scheduler provides mechanism for discovering and sharing resources
  - Cluster scheduler makes scheduling decisions based on local and global workload
- Components interact to meet users’ needs

System Architecture
[Diagram: a Central Server with a Database, users, and two clusters, each with a Cluster Daemon and a Scheduler.]

The Faucets Meta Scheduler
[Diagram: job submission flow. Labels: Job Submission, Job Specs, Bids, Job Id, Job Monitor, Cluster.]

The Faucets Meta Scheduler
- Users provide job requirements
  - System requirements: architecture, number of processors, minimum memory, etc.
  - Software requirements: utilities, dynamic libraries, packages, etc.
  - Contract requirements: deadline, reliability, maximum price, etc.
  - Use XML, easily expandable
- Clusters bid on job
- Winning bidder executes job
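
The three requirement groups above could be expressed in XML along the following lines. This is only a sketch: the slides do not show the actual Faucets schema, so every element name here (`job`, `system`, `processors`, `max_price`, and so on) and every value is a hypothetical placeholder.

```python
import xml.etree.ElementTree as ET


def build_job_spec(system, software, contract):
    """Build an XML job description with three requirement groups.

    All element names are illustrative assumptions, not the Faucets schema.
    """
    job = ET.Element("job")
    groups = (("system", system), ("software", software), ("contract", contract))
    for group_name, fields in groups:
        group = ET.SubElement(job, group_name)
        for key, value in fields.items():
            # A list (e.g. several libraries) becomes repeated child elements.
            values = value if isinstance(value, list) else [value]
            for item in values:
                ET.SubElement(group, key).text = str(item)
    return job


# Hypothetical job: 64 processors, two libraries, a deadline and a price cap.
spec = build_job_spec(
    system={"architecture": "x86", "processors": 64, "min_memory_mb": 512},
    software={"library": ["libfftw", "libm"]},
    contract={"deadline": "2005-10-19T12:00:00", "max_price": 1000},
)
xml_text = ET.tostring(spec, encoding="unicode")
```

Because each requirement is just a child element, a new kind of requirement can be added without changing the code that walks the groups, which is one reading of "easily expandable".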

The Faucets Meta Scheduler
- Bidding requires no user intervention
- Clusters bid on jobs based on current conditions
  - Local utilization
  - Account balances
- Depending on scheduling strategy, might not be able to accept all jobs
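
A rough sketch of automated bidding follows. The slides do not give a pricing formula or say how the winner is chosen, so the utilization-based price, the credit check, and the lowest-price-wins rule are all assumptions made for illustration, as are the names `make_bid` and `pick_winner`.

```python
from dataclasses import dataclass


@dataclass
class Bid:
    cluster: str
    price: float


def make_bid(cluster, free_pes, total_pes, needed_pes,
             base_price, max_price, user_credit):
    """Return a Bid, or None if the cluster declines the job."""
    if needed_pes > free_pes:
        return None  # not enough free processors: cannot accept this job
    utilization = 1.0 - free_pes / total_pes
    # Assumed pricing rule: a busier cluster asks for more.
    price = base_price * (1.0 + utilization)
    # Decline if the price exceeds the contract cap or the user's credit.
    if price > min(max_price, user_credit):
        return None
    return Bid(cluster, price)


def pick_winner(bids):
    """Assumed selection rule: the cheapest valid bid wins."""
    valid = [b for b in bids if b is not None]
    return min(valid, key=lambda b: b.price) if valid else None


# Three hypothetical 64-processor clusters bidding on a 16-processor job.
bids = [
    make_bid("A", 32, 64, 16, 100.0, 200.0, 500.0),  # half busy
    make_bid("B", 60, 64, 16, 100.0, 200.0, 500.0),  # nearly idle
    make_bid("C", 8, 64, 16, 100.0, 200.0, 500.0),   # too busy, declines
]
winner = pick_winner(bids)
```

No user input is needed once the job specification is submitted: each cluster computes its own bid from local state, and a cluster that cannot fit the job simply returns no bid.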

The Faucets Meta Scheduler
- Both users and clusters have account balances
- Cluster administrators decide how to share balance among users
- Example (balances held at the Central Server):
  1. Initial balances. Cluster 1: Shared (10000), Bob (100). Cluster 2: Shared (-200), John (0), Joan (0).
  2. Bob runs job worth 1000 units on Cluster 2.
  3. Bob’s account is drained; the remaining 900 units come from the shared pool. Cluster 1: Shared (9100), Bob (0).
  4. Cluster 2’s policy: 50% to shared, rest divided equally.
  5. Cluster 2’s shared balance goes up 500; John and Joan get 250 each. Cluster 2: Shared (300), John (250), Joan (250).
  6. Global balance remains unchanged.
- Have limits to negative balances to prevent freeloading
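
The account example can be replayed in a few lines of code. This is a minimal sketch, not the Faucets implementation: the dictionary layout, the function names, and the `debt_limit` value of -1000 are assumptions; only the balances and the 50% policy come from the slides.

```python
def pay_for_job(payer, user, earner, cost, shared_fraction=0.5, debt_limit=-1000):
    """Move `cost` units from the payer cluster to the earner cluster, in place."""
    # Payer side: drain the user's own balance first, then the shared pool.
    from_user = min(max(payer["users"][user], 0), cost)
    from_shared = cost - from_user
    if payer["shared"] - from_shared < debt_limit:
        # Assumed form of the negative-balance limit that prevents freeloading.
        raise ValueError("shared balance would fall below the debt limit")
    payer["users"][user] -= from_user
    payer["shared"] -= from_shared

    # Earner side: a fraction goes to the shared pool, the rest is split equally.
    to_shared = cost * shared_fraction
    earner["shared"] += to_shared
    per_user = (cost - to_shared) / len(earner["users"])
    for name in earner["users"]:
        earner["users"][name] += per_user


def total(*clusters):
    """Sum of all shared and per-user balances."""
    return sum(c["shared"] + sum(c["users"].values()) for c in clusters)


cluster1 = {"shared": 10000, "users": {"Bob": 100}}
cluster2 = {"shared": -200, "users": {"John": 0, "Joan": 0}}
before = total(cluster1, cluster2)
pay_for_job(cluster1, "Bob", cluster2, 1000)  # Bob runs a 1000-unit job on Cluster 2
after = total(cluster1, cluster2)
```

Every unit debited on one side is credited on the other, so the global total is the same before and after the job.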

Cluster Scheduler
- Traditional schedulers are concerned only with throughput; they try to have the highest possible utilization
- Faucets cluster scheduler provides different strategies to allow more efficient bidding
- Leverage run-time systems
- Flexible design allows for easy implementation of new strategies

Deadline-Driven Scheduling (Gantt Chart)
- Schedule based on number of processors, deadline, and wall-time
- As new jobs arrive, reschedule while meeting all demands
- Allows bidding based on deadline: can charge different amounts based on user’s flexibility
- Can leverage Charm++ runtime system to shrink and expand jobs
[Figure: Gantt charts of the original schedule and the new schedule, one row per processor. Jobs: Job 1, 4 PEs, 4 time slices; Job 2, 2 PEs, 6 time slices; Job 3, 3 PEs, 3 time slices; New Job, 2 PEs, 7 time slices.]
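
One simple way to decide whether a new job can be admitted is to rebuild the whole schedule and check that every deadline still holds. The sketch below uses a greedy earliest-deadline-first heuristic on a time-slice grid; it is not the Faucets strategy itself, it treats jobs as rigid (no shrinking or expanding), and the deadlines are hypothetical because the slide gives only processor counts and durations.

```python
def try_schedule(jobs, total_pes, horizon):
    """Greedy earliest-deadline-first placement on a time-slice grid.

    jobs: list of (name, pes, slices, deadline), where the job must finish
    no later than the start of time slice `deadline`.
    Returns {name: start_slice}, or None if the heuristic finds no schedule
    in which every job meets its deadline.
    """
    free = [total_pes] * horizon  # free PEs in each time slice
    starts = {}
    for name, pes, slices, deadline in sorted(jobs, key=lambda j: j[3]):
        latest_start = min(deadline, horizon) - slices
        for start in range(latest_start + 1):
            window = range(start, start + slices)
            if all(free[t] >= pes for t in window):
                for t in window:
                    free[t] -= pes
                starts[name] = start
                break
        else:
            return None  # no slot found for this job before its deadline
    return starts


# Jobs from the slide on 4 PEs, with assumed deadlines (last field).
base = [("Job 1", 4, 4, 4), ("Job 2", 2, 6, 10), ("Job 3", 3, 3, 14)]
original = try_schedule(base, total_pes=4, horizon=20)
rescheduled = try_schedule(base + [("New Job", 2, 7, 11)], total_pes=4, horizon=20)
rejected = try_schedule(base + [("New Job", 2, 7, 9)], total_pes=4, horizon=20)
```

With these assumed deadlines, the new job is admitted by pushing Job 3 one slice later, which still meets Job 3's deadline; with a tighter deadline the function returns `None`, which a cluster could treat as a reason not to bid. Because the heuristic is greedy, a `None` result does not prove that no feasible schedule exists.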

Priority-Driven Scheduling
- Leverage Charm++ and other checkpoint/restart mechanisms
- Priority can be based on rank (military, institutional, etc.), price paid, or other factors
[Figure: original and new schedules on processors P1 to P4. Job 1 and Job 2 have normal priority; Job 3 has high priority. Marked events: Job 3 arrives, Job 3 terminates.]
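
The preemption decision can be sketched as choosing which running jobs to checkpoint when a higher-priority job arrives. This is an illustrative assumption about how such a strategy might pick victims, not the Faucets code; the processor counts and numeric priorities in the example are hypothetical, and the actual checkpoint and restart steps are left out.

```python
def jobs_to_checkpoint(running, total_pes, new_job):
    """Pick running jobs to checkpoint so that `new_job` can start.

    running: list of dicts with "name", "pes", and "priority"
    (a higher number means more important).
    Returns a list of job names (possibly empty), or None if the new job
    cannot start even after preempting every lower-priority job.
    """
    free = total_pes - sum(job["pes"] for job in running)
    victims = []
    # Only strictly lower priorities may be preempted, least important first.
    candidates = sorted(
        (job for job in running if job["priority"] < new_job["priority"]),
        key=lambda job: job["priority"],
    )
    for job in candidates:
        if free >= new_job["pes"]:
            break
        victims.append(job["name"])
        free += job["pes"]
    return victims if free >= new_job["pes"] else None


# Hypothetical sizes: two normal-priority jobs fill a 4-PE cluster.
running = [
    {"name": "Job 1", "pes": 2, "priority": 1},
    {"name": "Job 2", "pes": 2, "priority": 1},
]
high = {"name": "Job 3", "pes": 2, "priority": 5}
victims = jobs_to_checkpoint(running, total_pes=4, new_job=high)
```

When the high-priority job terminates, the checkpointed jobs would be restarted from their saved state, which is where the checkpoint/restart support of the run-time system comes in.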

Conclusions
- Clusters are becoming more common; Faucets provides economic and technical framework for sharing
- Flexible cluster scheduler allows scheduling based on deadlines, priorities, etc.
- Cluster scheduler leverages run-time systems to increase functionality

Future Work
- NCSA Faculty Fellowship: on-demand access
- How do we control anonymous access?
  - Only allow pre-selected applications
  - Virtual machines (Xen, VMware, etc.)
- Leverage virtualization to allow processor sharing
- Re-architect Faucets to make it more robust and make strategies easier to write

Questions?

Thanks!