Dynamic Load Balancing and Job Replication in a Global-Scale Grid Environment: A Comparison — IEEE Transactions on Parallel and Distributed Systems, Vol. 20, No. 2, February 2009


Dynamic Load Balancing and Job Replication in a Global-Scale Grid Environment: A Comparison
IEEE Transactions on Parallel and Distributed Systems, Vol. 20, No. 2, February 2009
Menno Dobber, Student Member, IEEE, Rob van der Mei, and Ger Koole
Presented by Chen, Ting-Wei

Index
- Introduction
- Preliminaries
- Experimental Setup
- Experimental Results
- Conclusions

Introduction (cont.)
- Dynamics of grid environments
- Dynamic Load Balancing (DLB)
- Job Replication (JR)
- Easy-to-measure statistic Y and a corresponding threshold value Y*
  - If Y > Y*, DLB outperforms JR
  - If Y < Y*, JR outperforms DLB
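The selection rule above amounts to a one-line decision. The helper below is illustrative only: Y and Y* stand for the paper's statistic and threshold, and how Y is actually measured is not shown on this slide.

```python
def choose_strategy(Y, Y_star):
    """Pick a scheduling strategy from the easy-to-measure statistic Y.

    Y above the threshold Y* favors Dynamic Load Balancing (DLB);
    Y below it favors Job Replication (JR).
    """
    return "DLB" if Y > Y_star else "JR"
```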

Introduction (cont.)
- Easy-to-implement approach
- Makes dynamic decisions about whether to use DLB or JR
- Two types of investigations verify it accurately:
  - Trace-driven simulation
  - Real implementation

Introduction (cont.)
- Real implementation
  - Acquires more knowledge about DLB
- Trace-driven simulations
  - Require detailed knowledge about the processes
  - Take less time
  - Allow more extensive analyses to be performed

Introduction (cont.)
- Analyze and compare the effectiveness of ELB, DLB, and JR
- Using trace-driven simulations
- Data gathered from a global-scale grid testbed

Preliminaries (cont.)
- Bulk Synchronous Processing (BSP)
- The problem can be divided into subproblems, or jobs
- I iterations, P jobs, P processors
- Each processor receives one job per iteration
- After computing its job, each processor sends its data and waits for the other processors' data before the next iteration starts
- The standard BSP program is implemented according to the ELB principle
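The iteration structure described above can be sketched with a barrier as the synchronization point; this is a minimal illustration, not the paper's implementation, and the job values are hypothetical stand-ins for real computation.

```python
import threading

def bsp_worker(pid, jobs, barrier, results, iterations):
    # Each processor computes its own job, then waits at the barrier
    # (the data-exchange point) before the next iteration starts.
    for _ in range(iterations):
        results[pid] += jobs[pid]  # stand-in for the real computation
        barrier.wait()             # all processors synchronize here

P, I = 4, 3
jobs = [1, 2, 3, 4]      # one job per processor per iteration
results = [0] * P
barrier = threading.Barrier(P)
threads = [threading.Thread(target=bsp_worker, args=(p, jobs, barrier, results, I))
           for p in range(P)]
for t in threads: t.start()
for t in threads: t.join()
# results == [3, 6, 9, 12]: each processor did its job I times
```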

Preliminaries (cont.)
- Implementation of ELB

Preliminaries (cont.)
- Dynamic Load Balancing (DLB)
- DLB starts the execution of an iteration the same way as BSP
- At the end of each iteration, the processors predict their processing speed for the next iteration
- One processor is selected to be the DLB scheduler
- Every N iterations, the processors send their predictions to this scheduler

Preliminaries (cont.)
- The scheduler calculates the "optimal" distribution
- Sends the relevant information to each processor
- All processors redistribute the load
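The slides do not show the scheduler's "optimal" distribution formula; one plausible reading is to split the load in proportion to the predicted speeds, so faster processors receive larger jobs. This sketch illustrates that idea only.

```python
def dlb_distribution(predicted_speeds, total_load):
    """Split total_load across processors in proportion to their
    predicted speeds (illustrative; not the paper's exact formula)."""
    total_speed = sum(predicted_speeds)
    return [total_load * s / total_speed for s in predicted_speeds]

# A processor predicted to be twice as fast gets twice the load:
# dlb_distribution([2.0, 1.0, 1.0], 100.0) -> [50.0, 25.0, 25.0]
```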

Preliminaries (cont.)
- Implementation of DLB

Preliminaries (cont.)
- Job Replication (JR)
- Two (or, in general, R) copies of each job
- R copies of all P jobs are distributed to the P processors
- When a processor has finished one of the copies, it sends a message to the other processors
- The other processors can then kill that job and start the next one
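The effect of replication on job and iteration times can be modeled simply: a job finishes as soon as its fastest replica does, and an iteration finishes when every job is done. This is a simplified sketch that ignores the cost of the kill messages.

```python
def effective_job_time(replica_times):
    """With R replicas of a job on different processors, the job
    finishes as soon as the fastest copy does; the rest are killed."""
    return min(replica_times)

def jr_iteration_time(job_replica_times):
    """An iteration completes when every job's fastest replica is done."""
    return max(effective_job_time(times) for times in job_replica_times)

# Two jobs, each replicated on two processors (R = 2):
# job 1 finishes at min(5.0, 9.0) = 5.0, job 2 at min(7.0, 6.0) = 6.0,
# so the iteration takes max(5.0, 6.0) = 6.0
```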

Preliminaries (cont.)
- Implementation of JR

Experimental Setup (cont.)
- Data-Collection Procedure

Experimental Setup (cont.)
- On a completely available Pentium 4 3.0-GHz processor, the computations in the jobs would take ms
- Set one's job times are ms on average
  - Distributed within the USA
  - More coherence between the generated datasets
- Set two's job times are ms on average
  - Show more burstiness and higher differences between the average job times on the processors
  - Globally distributed

Experimental Setup (cont.)
- Trace-driven simulation analyses (the parameter combinations appear as formulas on the slide and did not survive in this transcript)

Experimental Setup (cont.)
- Simulation Details
- Trace-driven DLB simulations
- Assume a linear relation between the job sizes and their job times in BSP

Experimental Setup (cont.)
- DLB simulation
  1. Randomly select a resource set
  2. The DES-based prediction
  3. Derive the IT
  4. Derive the runtime of the R-JR
  5. Derive the expected runtime of a DLB run
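Step 2 names a DES-based predictor; a common form of DES is double exponential smoothing (Holt's method), which tracks both a level and a trend. The smoothing constants below are illustrative, not the paper's values.

```python
def des_forecast(series, alpha=0.5, beta=0.5):
    """One-step-ahead forecast of the next job time using double
    exponential smoothing. alpha smooths the level, beta the trend;
    both values here are illustrative defaults."""
    level = series[0]
    trend = series[1] - series[0]
    for x in series[1:]:
        prev_level = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + trend

# On a perfectly linear series the forecast extrapolates the trend:
# des_forecast([1, 2, 3, 4]) -> 5.0
```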

Experimental Setup (cont.)
- JR simulation
  1. The same as step one of the DLB simulation
  2. Divide the set of processors into execution groups
  3. Derive the effective job times for all P processors
  4. Derive the IT by repeating step two R times
  5. Derive the runtime of the R-JR run by repeating step three
  6. Derive the expected runtime of an R-JR run on P processors

Experimental Setup (cont.)
- Dynamic Selection Method
- Analysis

Experimental Results (cont.)
- Simulate the runtimes of DLB for different numbers of processors with sets one and two
- Simulate runs of BSP parallel applications that use JR and analyze the expected speedups for different numbers of processors, replication levels, data sets, and CCR values

Experimental Results (cont.)
- Compare the runtimes and speedups of ELB, DLB, and JR
- Simulate the speedups of the proposed selection method

Experimental Results (cont.)
- DLB

Experimental Results (cont.)
- Job Replication

Experimental Results (cont.)
- Comparison of ELB, DLB, and JR
- Runtimes of DLB and JR with CCR 0.01

Experimental Results (cont.)
- Speedups of DLB and JR with 40 and 90 data sets, CCR 0.01

Experimental Results (cont.)
- Statistic Y against the ITs of DLB and JR

Experimental Results (cont.)
- Speedup of the selection method, DLB, and JR

Conclusions
- Made an extensive assessment and comparison of DLB and JR
  - Y > Y*: DLB outperforms JR
  - Y < Y*: JR outperforms DLB
- Proposed the so-called DLB/JR method

Outlook
- Bring the results to a higher level of reality
- Use mathematical techniques to provide a more solid foundation
- Determine the optimal number of job replicas needed to obtain the best speedup performance

Thanks for your attention