Scheduling MPI Workflow Applications on Computing Grids
Juemin Zhang, Waleed Meleis, and David Kaeli
Electrical and Computer Engineering Department, Northeastern University


Scheduling MPI Workflow Applications on Computing Grids
Juemin Zhang, Waleed Meleis, and David Kaeli
Electrical and Computer Engineering Department, Northeastern University, Boston, MA
jzhang, meleis,
Acknowledgement: This work is supported in part by CenSSIS, the Center for Subsurface Sensing and Imaging Systems, under the Engineering Research Centers Program of the National Science Foundation (Award # EEC ).

[Figure: CenSSIS research diagram — fundamental science (S1–S5), validating testbeds (L1–L3, Bio-Med, Enviro-Civil), and research thrusts R1–R3, including image and data information management.]
Value added to CenSSIS: This work falls under research thrust R3, image and data information management. It can be applied to image analysis applications at all three levels, including modeling and simulation, as well as other areas requiring intensive computation or access to distributed data sets.

Grid Computing
The grid problem: flexible, secure, coordinated resource sharing among a dynamic collection of individuals, institutions, and resources, referred to as virtual organizations. (From "The Anatomy of the Grid" by I. Foster, C. Kesselman, and S. Tuecke)
Computing grid: multiple independently managed computing sites connected to a public network through gateway nodes.
Computing site:
A collection of computing resources (nodes)
A single administrative domain (batch job system)
A local/private network connecting all computing resources

Why Grid Computing
Characteristics of computing resources:
An increasing number of distributed computing and storage resources are available.
Low-latency, high-bandwidth interconnections.
Unbalanced loads among resources.
Characteristics of imaging applications:
Large problems, requiring substantial computation and storage resources.
Distributed by nature: from data acquisition to data access, resources tend to be spread across multiple sites.
A centralized solution costs much more than a distributed one.

MPI Workflow
Workflow: a workflow consists of multiple dependent or concurrent tasks.
Dependency: dependent tasks must be executed in order.
Concurrency: concurrent tasks can be executed in parallel across multiple computing sites.
MPI workflow: each task is a parallel MPI execution on multiple computing nodes within a single computing site.
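The dependency and concurrency structure described above is a directed acyclic graph of tasks. A minimal sketch, using an illustrative three-stage workflow (the task names are hypothetical, not from the presentation):

```python
from graphlib import TopologicalSorter

# Each key is a task; its value is the set of tasks it depends on.
# "reconstruct" and "filter" have no mutual dependency, so they are
# concurrent and could run on different computing sites.
workflow = {
    "reconstruct": {"acquire"},
    "filter": {"acquire"},
    "render": {"reconstruct", "filter"},
}

# A valid execution order respects every dependency edge.
order = list(TopologicalSorter(workflow).static_order())
print(order)
```

Any order the sorter emits places "acquire" before both middle tasks and "render" last; a real scheduler would dispatch the concurrent middle tasks to separate sites.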

Tomosynthesis Application
The tomosynthesis image reconstruction process consists of multiple functional tasks, executed in an order that respects their data dependencies. The tasks are parallelized using the MPI library, but each exhibits a different degree of parallelism.

Problem Definition
Executing an MPI workflow on grids: mapping tasks to computing sites.
Objectives:
Performance: minimize the application turnaround time, i.e., minimize request queuing time and execution time.
Throughput: maximize the number of applications processed during a period of time.
Resource utilization.

MPI Workflow Scheduler
Maps tasks to computing sites.
Input: a Petri net describing the workflow execution, plus a task specification (number of nodes per task).
Network- and physical-location transparent: tasks are scheduled, submitted, and executed on the computing sites of a grid without user interference.
Goals: minimize the task request queuing time and the resource co-allocation coordination time.

Scheduler Design
Part of a complete framework supporting execution of MPI workflows on grids: message relay, task grouping, and the task scheduler.
Parallel approach:
One scheduler process runs on a gateway/head node of each computing site.
Message passing is used for inter-process communication.
Each scheduler queries local workload information and submits tasks locally.
Scheduling decisions are made collectively.

Task Scheduler Structure

Task Scheduling Algorithm
Objective: for a given task, find the computing site expected to yield the shortest queuing time.
Task scheduling scheme: predict the site with the shortest queuing time by ranking computing sites by one of:
The queue length
The estimated queuing time: the queue length divided by the average system throughput
The number of available resources
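The three ranking heuristics above can be sketched as follows. The `Site` fields and the numbers are illustrative assumptions, not values from the presentation; note that the heuristics can disagree about which site is best:

```python
from dataclasses import dataclass

@dataclass
class Site:
    name: str
    queue_length: int   # jobs waiting in the batch queue
    throughput: float   # average jobs completed per unit time
    free_nodes: int     # currently idle nodes

sites = [
    Site("A", queue_length=12, throughput=6.0, free_nodes=2),
    Site("B", queue_length=3,  throughput=1.0, free_nodes=8),
]

# Heuristic 1: shortest queue wins.
by_queue_length = min(sites, key=lambda s: s.queue_length)
# Heuristic 2: shortest estimated queuing time (length / throughput) wins.
by_est_queue_time = min(sites, key=lambda s: s.queue_length / s.throughput)
# Heuristic 3: most available resources wins.
by_free_nodes = max(sites, key=lambda s: s.free_nodes)

print(by_queue_length.name, by_est_queue_time.name, by_free_nodes.name)
# → B A B
```

Here site B has the shorter queue, but site A's much higher throughput gives it the shorter estimated queuing time (12/6 = 2.0 vs. 3/1 = 3.0), which is exactly why the ranking schemes can rank sites differently.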

Task Scheduling on Grids
Limitations of a single-site scheduling decision:
Rank is only correlated with the task queuing time; the assumption that a higher rank leads to a shorter queuing time does not always hold.
Workloads change dynamically: after tasks are submitted, the ranking order may change.
Our solution:
Duplicate each task request and submit the copies to different computing sites.
Use task grouping to resolve redundant task executions at runtime (during MPI initialization): the first running copy continues, and redundant copies that start later are terminated automatically.

Duplicate Task Submission
Advantages of task duplication:
The site that runs the task is selected dynamically, so there is no need to guarantee in advance which computing site has the shortest queuing time; in the limit, flooding all computing sites yields the shortest queuing time.
Side effects:
Extra copies of task requests on different computing sites raise the apparent workload, lengthen job queues, and change job-queue scheduling behavior.
Flooding all computing sites is unfavorable for resource management.
There is overhead in resolving duplications.

Modeling Environment
CSIM-based simulation.
Computing site: a job queue using first-come-first-served, EASY backfill, or conservative backfill.
Random workload generation:
Inter-arrival time: exponential distribution.
Job execution time: Zipf distribution.
Job size: Poisson distribution, with higher probability assigned to some special job sizes.
Task scheduling schemes compared: random selection, queue length, estimated queue time (queue length / system throughput), and available resources.
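A minimal sketch of the random workload generator described above, using Python's standard library in place of CSIM. The distribution parameters are assumptions for illustration, the heavy-tailed execution time uses inverse-power sampling as a stand-in for a Zipf draw, and the extra weighting of "special" job sizes is omitted:

```python
import math
import random

random.seed(42)

def sample_poisson(lam):
    """Knuth's method; adequate for the small mean job sizes used here."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= threshold:
            return k
        k += 1

def make_job(mean_interarrival=10.0, zipf_a=2.0, mean_size=4.0):
    # Exponentially distributed inter-arrival time.
    interarrival = random.expovariate(1.0 / mean_interarrival)
    # Heavy-tailed (Zipf-like) execution time: always >= 1.0.
    exec_time = 1.0 / (1.0 - random.random()) ** (1.0 / zipf_a)
    # Poisson-distributed job size, clamped to at least one node.
    size = max(1, sample_poisson(mean_size))
    return interarrival, exec_time, size

jobs = [make_job() for _ in range(1000)]
```

A simulated site would then feed these jobs, in arrival order, into its FCFS or backfill queue.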

Environment Structure
Settings: a grid of multiple computing sites.
Local workload: 100,000 local jobs for each computing site.
Global workload: 10,000 global tasks across all sites.
Workload level of 0.5–0.75 for all computing sites.

Algorithm Comparison
8-site computing grid; no duplication is used for any global task.

Duplication and Its Impact
8-site grid simulation. Each site uses a conservative backfill queue at a 0.7 workload level; the global task scheduler uses the queue-length scheduling scheme.

Resource co-allocation

Conclusion
When workload is low: the available-resources scheduling scheme has the best performance, and no task duplication is required.
When workload is high (all computing sites are busy):
Random selection is worse than the other schemes; the cost of a bad scheduling decision is very high.
The queue-length and estimated-queue-time schemes achieve similar performance.
Two or three duplications can reduce the average task queuing time by a factor of 3 to 5, with no negative impact on local job queuing systems or local jobs.