Slide 1: Scheduling for a Climate Forecast Application. Andreea Chis, under the guidance of Frédéric Desprez and Eddy Caron. ANR-05-CIGC-11.

Slide 2: Contents
1. Introduction
2. Related Works
3. Scheduling Heuristics
4. Simulation Results
5. Conclusions and Future Works

Slide 3: Contents (1. Introduction, 2. Related Works, 3. Scheduling Heuristics, 4. Simulation Results, 5. Conclusions and Future Works)

Slide 4: General Purpose
- Context: global warming and climate fluctuations
- Numerical simulations using general circulation models of the climate system: atmosphere, ocean, continental surfaces
- Climatologists' purpose: estimate the sensitivity of global-warming simulations with respect to the model's parameterization
- Climate forecast application provided by CERFACS within the LEGO project

Slide 5: Our Goal
- Analyze the application
- Model its needs: execution model, data access pattern, computing needs
- Elaborate, test and compare appropriate scheduling heuristics
- Provide generic scheduling schemes for applications with similar dependence graphs

Slide 6: Application Description
- "Scenario" simulations: the current climate followed by the 21st century, for 150 years (1800 months)
- Different parameterizations of the atmospheric model

Slide 7: Application Description
- One monthly simulation consists of the tasks (with their multiplicities): concatenate_atmospheric_input_files (1), modify_parameters (1), process_coupled_run, convert_output_format (60), compress_diagonals (30), extract_minimum_information (30)
- The coupled run involves the atmospheric model (ARPEGE), the ocean and sea-ice model (OPA), the runoff pathway (TRIP) and the coupler (OASIS)

Slide 8: Application Description (figure)

Slide 9: Contents (1. Introduction, 2. Related Works, 3. Scheduling Heuristics, 4. Simulation Results, 5. Conclusions and Future Works)

Slide 10: Related Works
- Multiple DAGs scheduling
- Mixed parallelism
- Pipelined data-parallel tasks

Slide 11: Multiple DAGs Scheduling
- Directed Acyclic Graph (DAG): nodes represent tasks, edges represent precedence constraints
- Multiple DAGs scheduling (figure)

Slide 12: Multiple DAGs Scheduling
- Composite DAG (figure)

Slide 13: Multiple DAGs Scheduling
- Group the DAGs' tasks into levels of independent tasks

Slide 14: Related Works: Multiple DAGs Scheduling
- Composite DAG with a round-robin scheduling policy among the DAGs
- Composite DAG with a ranking-based composition

Slide 15: Mixed Parallelism
- Parallel scientific applications: data parallelism, task parallelism, mixed parallelism
- Scheduling a DAG on a finite number of resources is NP-complete, even in the simple case of mono-processor tasks
- Heuristic approaches are therefore used

Slide 16: Mixed Parallelism
- A. Radulescu and A. van Gemund (2001): CPA (Critical Path and Area-based scheduling), a 2-step heuristic
  - Processor allocation to tasks, based on a compromise between the critical path length and the processor utilization
  - Placement of tasks on processors, using a list scheduling heuristic

Slide 17: Pipelined Data-Parallel Tasks
- Computations consisting of a chain of data-parallel tasks that process successive data sets in a pipelined fashion: a particular case of mixed parallelism
- Two key metrics to optimize:
  - Latency: the time needed to process one data set
  - Throughput: the rate at which data sets can be processed

Slide 18: Related Works: Pipelined Data-Parallel Tasks
- Aspects to consider (illustrated below):
  - Clustering successive stages into modules: reduces communication and improves latency
  - Replicating modules: improves throughput but increases latency
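To make these trade-offs concrete, here is a minimal sketch (our own illustration, with hypothetical stage and transfer times, not taken from the slides): clustering two stages removes the transfer between them and lowers latency, while replicating the bottleneck stage raises throughput.

```python
# Simplified pipeline model: latency is the time for one data set to traverse all
# modules and transfers; steady-state throughput is limited by the slowest module
# (replicating a module divides its effective service time).
def pipeline_metrics(stage_times, comm_times, replicas):
    latency = sum(stage_times) + sum(comm_times)
    throughput = 1.0 / max(t / r for t, r in zip(stage_times, replicas))
    return latency, throughput

stages = [4.0, 10.0, 3.0]   # hypothetical per-data-set times of three modules
comms = [1.0, 1.0]          # hypothetical transfer times between consecutive modules

print(pipeline_metrics(stages, comms, replicas=[1, 1, 1]))    # baseline
print(pipeline_metrics(stages, comms, replicas=[1, 2, 1]))    # replicate the bottleneck module
print(pipeline_metrics([14.0, 3.0], [1.0], replicas=[1, 1]))  # cluster the first two modules
```

This toy model does not capture the latency penalty of replication mentioned on the slide; that would require modeling how successive data sets are distributed among the replicas.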

Slide 19: Contents (1. Introduction, 2. Related Works, 3. Scheduling Heuristics, 4. Simulation Results, 5. Conclusions and Future Works)

Slide 20: Scheduling Heuristics
- Climate application scheduling
- Generic scheduling heuristics

Slide 21: Climate Application Scheduling
- Homogeneous platform composed of R resources
- Communication is assumed contention-free, through NFS
- Each task's execution time is assumed to include the time needed to access the data, redistribute it to the processors, perform the actual computation, and store the data back

Slide 22: Climate Application Scheduling
- Tasks of one monthly simulation: concatenate_atmospheric_input_files (1), modify_parameters (1), process_coupled_run, convert_output_format (60), compress_diagonals (30), extract_minimum_information (30)
- The tasks are split into main processing and post-processing (figure)

Slide 23: Climate Application Scheduling
- The processors are divided into disjoint sets on which the multi-processor tasks execute
- All multi-processor tasks execute on the same number of resources G, which defines a grouping of the resources
- For the given application, there are 8 possible values for the parameter G (4 to 11)

Slide 24: Climate Application Scheduling
- Case 1 (figure)
- Case 2 (figure)

Slide 25: Climate Application Scheduling
- The makespan is computed analytically as a function of:
  - the number of resources R
  - the grouping G
  - the number of months in an independent simulation, NM
  - the number of independent simulations, NS
- The grouping G yielding the smallest makespan is chosen (a sketch of this search follows below)
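The analytic makespan formula itself is shown only graphically in the slides; the sketch below only illustrates the selection principle, under simplifying assumptions of our own (one resource reserved for post-processing, a fixed post-processing tail, and a hypothetical per-group execution time t_main).

```python
import math

# Illustrative only: not the authors' exact analytic expression.
def analytic_makespan(R, G, NM, NS, t_main, t_post_tail):
    groups = (R - 1) // G                         # assume 1 resource kept for post-processing
    if groups == 0:
        return math.inf
    runs_per_group = math.ceil(NS * NM / groups)  # monthly runs handled by the busiest group
    return runs_per_group * t_main(G) + t_post_tail

def best_grouping(R, NM, NS, t_main, t_post_tail, candidates=range(4, 12)):
    return min(candidates,
               key=lambda G: analytic_makespan(R, G, NM, NS, t_main, t_post_tail))

# Hypothetical Amdahl-like time of one monthly multi-processor run on G resources
t_main = lambda G: 500.0 * (0.1 + 0.9 / G)
print(best_grouping(53, NM=1800, NS=10, t_main=t_main, t_post_tail=0.0))
```

With these hypothetical inputs the toy search need not reproduce the grouping G = 7 reported on the next slide; the real formula also accounts for the application's post-processing and data-handling costs.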

Slide 26: Climate Application Scheduling
- The constraint of scheduling all multi-processor tasks on the same number of resources is tight
- Example: for R = 53, NS = 10, NM = 1800, the optimal grouping found is G = 7:
  - 49 resources (7 groups of 7) for main processing
  - 1 resource for the corresponding post-processing
  - 3 resources unused
- However, using 3 groups with 8 resources and 4 groups with 7 resources (52 resources in total) yields a gain of about 4.5%

Slide 27: Climate Application Scheduling: Possibilities for Improvement
- Heuristic 1: distribute the unused resources evenly among the existing groups (sketched below)
- Heuristic 2: use all resources for the multi-processor tasks, distributing the extra resources evenly among the processor groups; all post-processing is done at the end
- Heuristic 3: use all resources for the multi-processor tasks and model the grouping choice as an instance of the knapsack problem; all post-processing is done at the end
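A minimal sketch of the resource distribution behind Heuristic 1; the exact rule is our assumption: keep the groups of size G and spread the leftover resources over them one by one.

```python
def distribute_evenly(r_main, G):
    # r_main resources available for the multi-processor tasks, target group size G
    n_groups = r_main // G
    if n_groups == 0:
        return []
    sizes = [G] * n_groups
    for k in range(r_main - n_groups * G):   # spread the leftover resources
        sizes[k % n_groups] += 1
    return sizes

# Example of slide 26: 52 resources for main processing (53 minus 1 kept for
# post-processing) and G = 7 give 3 groups of 8 and 4 groups of 7.
print(distribute_evenly(52, 7))   # [8, 8, 8, 7, 7, 7, 7]
```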

Slide 28: Climate Application Scheduling: Knapsack Modelization
- Items: the 8 possible groupings of resources for the multi-processor tasks (4 to 11)
- Cost of an item: the number of resources of that grouping
- Value of a grouping G: 1/T[G], the fraction of a multi-processor task that gets executed in one time unit on G resources
- Unknowns: n_i (i = 4 to 11), the number of groups with i resources in the final solution
- Constraint: the chosen groups must fit within the R available resources (the sum of i * n_i over i is at most R)
- Goal: maximize the aggregate value, i.e. the sum of n_i / T[i] over i (a sketch follows below)
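The formulation above is an unbounded knapsack over the resource budget; the dynamic program below is a standard way to solve it and is our own sketch, with hypothetical per-group times T[i], not necessarily the authors' solution method.

```python
def best_group_mix(R, T, sizes=range(4, 12)):
    """Maximize sum(n_i / T[i]) subject to sum(i * n_i) <= R, n_i integer."""
    # best[r] = (throughput, group mix) achievable with at most r resources
    best = [(0.0, {})]
    for r in range(1, R + 1):
        candidate = best[r - 1]
        for i in sizes:
            if i <= r:
                value = best[r - i][0] + 1.0 / T[i]
                if value > candidate[0]:
                    mix = dict(best[r - i][1])
                    mix[i] = mix.get(i, 0) + 1
                    candidate = (value, mix)
        best.append(candidate)
    return best[R]

# Hypothetical Amdahl-like execution times of one multi-processor task on i resources
T = {i: 500.0 * (0.1 + 0.9 / i) for i in range(4, 12)}
print(best_group_mix(53, T))
```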

Slide 29: Climate Application Scheduling (figure)

Slide 30: Generic Scheduling Heuristics
- We propose generic scheduling heuristics for a class of applications consisting of independent, identical chains of identical DAGs

Slide 31: Generic Scheduling Heuristics
- First approach:
  - Create a composite DAG: link all entry nodes to a common entry node and all exit tasks to a common exit node (see the sketch below)
  - Apply a mixed-parallelism scheduling heuristic, such as CPA, on the composite DAG: reduced complexity (O(V(V+E)R)), but with the drawback of being a 2-step algorithm
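A minimal sketch of the composite-DAG construction described above (plain successor lists; the ENTRY/EXIT names and the representation are our own):

```python
# Merge several task DAGs (given as successor lists) into one composite DAG with
# a common entry node and a common exit node, so that a single-DAG heuristic such
# as CPA can then be applied to the whole set.
def composite_dag(dags):
    merged = {}
    for k, dag in enumerate(dags):                     # namespace tasks per DAG
        for task, succs in dag.items():
            merged[(k, task)] = [(k, s) for s in succs]
    entries = [n for n in merged if not any(n in s for s in merged.values())]
    exits = [n for n, succs in merged.items() if not succs]
    merged["ENTRY"] = entries                          # common entry node
    for n in exits:                                    # link exits to a common exit node
        merged[n] = ["EXIT"]
    merged["EXIT"] = []
    return merged

# Two toy chains a -> b; ENTRY precedes both a's, both b's precede EXIT.
print(composite_dag([{"a": ["b"], "b": []}, {"a": ["b"], "b": []}]))
```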

Slide 32: Generic Scheduling Heuristics
- Second approach: exploit the knowledge of the specific structure of the application
  - Exploit the pipelined structure of the application
  - Separate the independent pre- and post-processing tasks and schedule them with algorithms for independent malleable tasks (5/4 approximation in constant time)

Slide 33: Generic Scheduling Heuristics (figure)

Slide 34: Generic Scheduling Heuristics (figure)

Slide 35: Generic Scheduling Heuristics
- Heuristic 1:
  - Schedule all pre-processing tasks at the beginning
  - Schedule the inter- and main-processing tasks as an interval, on the same number of resources
  - Schedule all post-processing tasks at the end
- Heuristic 2:
  - Schedule all pre-processing tasks at the beginning
  - Schedule the inter- and main-processing tasks separately, as a pipeline
  - Schedule all post-processing tasks at the end

Slide 36: Generic Scheduling Heuristics
- Heuristic 3:
  - Schedule the inter- and main-processing tasks as an interval pipeline, on the same number of resources
  - Schedule pre- and post-processing tasks simultaneously with the pipeline, on resources specially reserved for them as well as on resources left unused by the pipeline
  - Schedule the remaining pre- and post-processing tasks at the beginning and at the end of the pipeline, respectively

Slide 37: Generic Scheduling Heuristics
- Heuristic 4:
  - Schedule the inter- and main-processing tasks separately, as a pipeline
  - Schedule pre- and post-processing tasks simultaneously with the pipeline, on resources specially reserved for them as well as on resources left unused by the pipeline
  - Schedule the remaining pre- and post-processing tasks at the beginning and at the end of the pipeline, respectively

Slide 38: Contents (1. Introduction, 2. Related Works, 3. Scheduling Heuristics, 4. Simulation Results, 5. Conclusions and Future Works)

Slide 39: Simulation Results
- The behavior of the 4 heuristics is tested against CPA applied on the composite DAG
- Tasks' execution times are modeled by Amdahl's law (see the note below)
- Several configurations are tested
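The Amdahl's law formula on the original slide is only shown graphically. A standard Amdahl-style model for the execution time of a task on p processors, with T(1) its time on one processor and α its non-parallelizable fraction, would be (this reconstruction is our assumption):

```latex
T(p) = T(1)\left(\alpha + \frac{1 - \alpha}{p}\right)
```

Under this reading, α = 0.1 means a task is almost fully parallelizable, while α_inter-processing = 1.0 in Configuration 4 means the inter-processing task gains nothing from additional processors.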

Slide 40: Simulation Results, Configuration 1
- All tasks have the same execution time on 1 processor (500)
- All tasks have the same coefficient α (0.1)

Slide 41: Simulation Results, Configuration 2
- Same as Configuration 1, but with α_inter-processing = 0.8

Slide 42: Simulation Results, Configuration 3
- T_1(pre-processing) = T_1(post-processing) = 50, T_1(main processing) = T_1(inter-processing) = 500
- α = 0.1, with α_inter-processing = 0.6

Slide 43: Simulation Results, Configuration 4
- T_1(pre-processing) = T_1(post-processing) = 50, T_1(main processing) = T_1(inter-processing) = 500
- α = 0.1, with α_inter-processing = 1.0

Slide 44: Contents (1. Introduction, 2. Related Works, 3. Scheduling Heuristics, 4. Simulation Results, 5. Conclusions and Future Works)

Slide 45: Conclusions
- We derived a model for the given real application
- We proposed a basic heuristic for this model and 3 improved versions
- We proposed 4 pipeline-based heuristics for the generalized problem and compared them with the approach of applying a mixed-parallelism algorithm on the composite DAG of the application

Slide 46: Future Works
- Enhance the heuristics by taking into account a more precise communication model
- Perform real experiments on Grid'5000 in order to validate the theoretical results
- Analyze other applications using a similar approach, with the long-term goal of deriving application-dependent scheduling schemes that could eventually be implemented as DIET plug-in schedulers