Scheduling for MW applications


Scheduling for MW applications E. Heymann, M.A. Senar and E. Luque Computer Architecture and Operating Systems Group Universitat Autònoma de Barcelona SPAIN

Contents Statement of the problem. Simulation framework. Simulation results. Implementation on MW. Future work.

Objective To develop and evaluate efficient scheduling policies, in an opportunistic environment, for parallel applications that follow the MW (master-worker) program model below:

    for i = 1 to M
        for j = 1 to NTasks do
            F(j)
        end
        [computation]
    end

The expectation of the execution time of F(j) is the same (with some variance) across all iterations of the external loop.
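The loop above is only pseudocode; a minimal Python sketch of the same program model, with placeholder functions F and computation standing in for the application-specific work (assumed names, not part of the original slides), could look like this:

```python
# Hypothetical sketch of the iterative master-worker program model.
# F(j) and computation(results) stand in for application-specific work;
# in a real MW application the inner loop is farmed out to the workers.
def run_application(M, n_tasks, F, computation):
    for i in range(M):                            # external loop
        results = [F(j) for j in range(n_tasks)]  # the NTasks worker tasks
        computation(results)                      # per-iteration master step
```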

Objective How many workers? How should tasks be assigned to the workers? Efficiency, without forgetting performance: how sensitive is efficiency to changes in the variance? The goal is to optimize the use of the processor resources to obtain the best efficiency, while taking the efficiency/execution-time trade-off into account.

Example Two workers and four tasks with execution times 8, 5, 4 and 1. A careless assignment ({5, 1} to one worker and {8, 4} to the other) finishes at time 12 while one worker is busy only 6 time units, so Efficiency = (6 + 12) / (12 + 12) = 3/4. The LPTF policy assigns {8, 1} and {5, 4}, both workers finish at time 9, so Efficiency = (9 + 9) / (9 + 9) = 1.
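The numbers in the example can be reproduced with a small sketch (an illustration, not code from the talk): LPTF hands each task, longest first, to the currently least-loaded worker, and efficiency is the total busy time divided by the number of workers times the makespan.

```python
def lptf_schedule(tasks, n_workers):
    """Assign each task, longest first, to the least-loaded worker."""
    loads = [0.0] * n_workers
    for t in sorted(tasks, reverse=True):
        w = loads.index(min(loads))
        loads[w] += t
    return loads

def efficiency(loads):
    """Total busy time over (number of workers x makespan)."""
    return sum(loads) / (len(loads) * max(loads))

tasks = [8, 5, 4, 1]
print(efficiency([6, 12]))                   # careless assignment -> 0.75
print(efficiency(lptf_schedule(tasks, 2)))   # LPTF -> 1.0
```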

Platforms 1) Dedicated homogeneous machines (fixed number of machines). 2) Non-dedicated homogeneous machines, in an opportunistic environment such as Condor. 3) Non-dedicated heterogeneous machines, in an opportunistic environment such as Condor.

Policies simulated: LPTF (Largest Processing Time First), LPTF on Average, Random, and Random & Average (Rand&Avg).

LPTF implies sorting the tasks by their (known) execution times. Rand&Avg implies computing the average of the previous execution times of each task i, and then sorting by that average.
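As a sketch of what Rand&Avg implies (assumed names, not the MW API): tasks with recorded history are sorted by the average of their previous execution times, and tasks without history yet are placed in random order.

```python
import random

def rand_avg_order(task_ids, history):
    """Order tasks by the average of their previous execution times
    (largest first); tasks with no history yet are appended randomly."""
    known = [t for t in task_ids if history.get(t)]
    unknown = [t for t in task_ids if not history.get(t)]
    known.sort(key=lambda t: sum(history[t]) / len(history[t]), reverse=True)
    random.shuffle(unknown)
    return known + unknown
```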

Simulation Response variables: Efficiency and Execution Time.

Simulation. Factors. Variation [chart: task execution times vary around their expected values with a 20% deviation]

Simulation. Factors. Workload

Simulation. Factors
- Processor number: 31, 100, 300
- Standard deviation: 0, 10, 30, 60, 100
- External loop: 10, 35, 50, 100
- Workload: 20% load: 30, 40, 50, 60, 70, 80, 90; 20% dist: equal, decreasing; 80% dist: equal, decreasing
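The factor levels above can be combined into a full experiment grid, for example as follows (a sketch; how the workload levels combine with the two distributions is an assumption):

```python
from itertools import product

processors     = [31, 100, 300]
deviations     = [0, 10, 30, 60, 100]          # standard deviation
external_loops = [10, 35, 50, 100]
loads          = [30, 40, 50, 60, 70, 80, 90]  # 20% load levels
distributions  = ["equal", "decreasing"]        # 20% dist and 80% dist

experiments = list(product(processors, deviations, external_loops,
                           loads, distributions, distributions))
print(len(experiments))   # number of simulated configurations
```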

Simulation Results Dedicated homogeneous machines Efficiency: few big tasks and lots of machines mean a waste of resources. The external loop does not matter. The deviation separates the curves when there are many big tasks; with only a few big tasks there is nothing to be done.

Simulation Results Dedicated homogeneous machines [slides containing only result plots]

Simulation Conclusions Dedicated homogeneous machines Rough analysis: the variance does not seem to make efficiency significantly worse, and the external loop does not affect efficiency. To achieve an efficiency above 80% and an execution time within 1.1x of the LPTF execution time, the number of workers should range between 15% of the number of tasks (at 90% load) and 40% (at 30% load). Few big tasks and lots of machines mean a waste of resources: with 30% load about 33% of the number of tasks is needed, while with 90% load only 16% is needed.

Simulation Non-dedicated homogeneous machines Factors: processor number, standard deviation, external loop (only 35), workload, probability of losing and gaining machines, and checkpointing (always, never, or only for "big" tasks that have been running for "a long time"). The response variables are the same as in (A): efficiency and execution time. Checkpoint times: SAVE_OVERHEAD = Minute / 2, RESTORE_OVERHEAD = Minute / 2, CHECK_MACHINE_STATE = 2 * Minute (lose a machine with p = 15%, gain a machine with p = 15%).
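A hedged sketch of the availability model these factors describe: every CHECK_MACHINE_STATE interval the simulator may lose or gain a machine with probability 15%, and a checkpoint save or restore each costs half a minute (the constants come from the slide; the loop structure and function name are assumptions).

```python
import random

MINUTE = 60.0                      # seconds
SAVE_OVERHEAD = MINUTE / 2         # cost of saving a checkpoint
RESTORE_OVERHEAD = MINUTE / 2      # cost of restoring a checkpoint
CHECK_MACHINE_STATE = 2 * MINUTE   # how often availability is re-evaluated
P_LOSE = P_GAIN = 0.15

def check_machine_state(n_machines):
    """One availability check: lose and/or gain a machine with p = 15%."""
    if random.random() < P_LOSE and n_machines > 1:
        n_machines -= 1    # the task running there restarts or restores
    if random.random() < P_GAIN:
        n_machines += 1
    return n_machines
```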

Simulation Results Non-dedicated homogeneous machines Biggest task = 857.143, smallest task = 21.538.

Simulation Results Non-dedicated homogeneous machines This is only an example; we have not yet run many executions or averaged the results. Probability of gaining a machine = probability of losing a machine = 15%. Checkpoint time to save = checkpoint time to restore = 30 seconds. Dist30-100-1-1: biggest task = 285, smallest tasks = 3.

Implementation on MW Support for the desired program model. Computation of the efficiency. Scheduling policies: Random, Random & Average. The efficiency is always measured with the master's clock.

Implementation on MW [timeline diagram: in iteration 1 the policy is Random while the statistics S needed to compute the efficiency are gathered; in iterations 2 to M the policy is Random or Random & Average, and the statistics S keep being collected each iteration]
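A sketch of the two-phase scheme the diagram suggests (illustrative names, not the MW interface): the first iteration uses the Random policy while execution times are recorded with the master's clock, and later iterations can switch to Random & Average over those statistics.

```python
import random

def schedule_iteration(iteration, task_ids, history):
    """Iteration 1: Random (no statistics yet).
    Iterations 2..M: Rand&Avg, ordering tasks by the average of the
    execution times recorded so far with the master's clock.
    After iteration 1 every task has at least one recorded time."""
    order = list(task_ids)
    if iteration == 1 or not history:
        random.shuffle(order)
        return order
    order.sort(key=lambda t: sum(history[t]) / len(history[t]), reverse=True)
    return order
```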

Future Non-dedicated homogeneous machines: complete the simulations; duplication of large tasks (?). Non-dedicated heterogeneous machines: use dynamic load information (provided by Condor) to rank machines. Implementation on MW: test the MW scheduling policies with large applications; ask Condor for information about machine safety (the safety labels on the machines would have to be provided by Condor). Completing the simulations includes running tests with different CKPT_OVERHEAD times, with scenarios where more machines are lost than gained, and with other machine gain/loss probabilities.