Low-Cost Task Scheduling for Distributed-Memory Machines
Andrei Radulescu and Arjan J.C. Van Gemund
Presented by Bahadır Kaan Özütam



Outline
- Introduction
- List Scheduling
- Preliminaries
- General Framework for LSSP
- Complexity Analysis
- Case Study
- Extensions for LSDP
- Conclusion

Introduction
Task scheduling heuristics can be classified along several dimensions:
- Shared-memory vs. distributed-memory
- Bounded vs. unbounded number of processors
- Multistep vs. single-step methods
- Duplicating vs. non-duplicating methods
- Static vs. dynamic priorities

List Scheduling
LSSP (List Scheduling with Static Priorities):
- Tasks are scheduled in the order of their previously computed priorities, each on its "best" processor
- The best processor is:
  - the processor enabling the earliest start time, if performance is the main concern
  - the processor becoming idle the earliest, if scheduling speed is the main concern
LSDP (List Scheduling with Dynamic Priorities):
- Priorities are computed for task-processor pairs
- More complex than LSSP

List Scheduling
Reducing LSSP time complexity from O(V log V + (E + V) P) to O(V log P + E), where
- V = number of tasks
- E = number of dependencies
- P = number of processors
by:
1. Considering only two candidate processors per task
2. Maintaining a partially sorted task priority queue with a limited number of tasks

Preliminaries
Parallel programs are modeled by:
- A directed acyclic graph (DAG) G = (V, E)
- Computation cost T_w(t)
- Communication cost T_c(t, t')
- Communication-to-computation ratio (CCR)
- The task graph width W
(figure: example task graph, omitted)

Preliminaries
- Entry and exit tasks
- The bottom level T_b(t) of a task
- A task is ready when all its parents have been scheduled
- Start time T_s(t) and finish time T_f(t)
- Partial schedule
- Processor ready time: T_r(p) = max { T_f(t) : t ∈ V, P_r(t) = p }
- Processor becoming idle the earliest: p_r, where T_r(p_r) = min { T_r(p) : p ∈ P }

Preliminaries
- Last message arrival time: T_m(t) = max { T_f(t') + T_c(t', t) : (t', t) ∈ E }
- Enabling processor p_e(t): the processor from which the last message arrives
- Effective message arrival time: T_e(t, p) = max { T_f(t') + T_c(t', t) : (t', t) ∈ E, P_r(t') ≠ p }
- Start time of a ready task, once scheduled on p: T_s(t, p) = max { T_e(t, p), T_r(p) }
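The timing quantities above can be sketched in a few lines of Python. The dict-based representation, the task names, and the tiny example graph are illustrative assumptions, not the authors' code.

```python
# Illustrative sketch of T_m, T_e, and T_s. All names and the example
# DAG are assumptions for illustration, not the paper's implementation.

def last_message_arrival(t, finish, comm, preds):
    """T_m(t) = max over parents t' of T_f(t') + T_c(t', t)."""
    return max(finish[tp] + comm[(tp, t)] for tp in preds[t])

def effective_arrival(t, p, finish, comm, preds, proc_of):
    """T_e(t, p): like T_m(t), but messages from tasks on p cost nothing."""
    times = [finish[tp] + comm[(tp, t)] for tp in preds[t] if proc_of[tp] != p]
    return max(times, default=0)

def start_time(t, p, finish, comm, preds, proc_of, ready_time):
    """T_s(t, p) = max { T_e(t, p), T_r(p) }."""
    return max(effective_arrival(t, p, finish, comm, preds, proc_of),
               ready_time[p])

# Tiny example: t0 (on processor 0) and t1 (on processor 1) are the
# already-scheduled parents of t2.
finish = {'t0': 2, 't1': 3}                 # T_f
comm = {('t0', 't2'): 4, ('t1', 't2'): 1}   # T_c
preds = {'t2': ['t0', 't1']}
proc_of = {'t0': 0, 't1': 1}                # P_r
ready_time = {0: 2, 1: 3}                   # T_r

print(last_message_arrival('t2', finish, comm, preds))                # 6
print(start_time('t2', 0, finish, comm, preds, proc_of, ready_time))  # 4
```

Placing t2 on processor 0 hides the expensive message from t0, so its start time drops from the global arrival time 6 to 4.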

General Framework for LSSP
The general LSSP algorithm consists of three parts:
- Task priority computation: O(E + V)
- Task selection: O(V log W)
- Processor selection: O((E + V) P)

General Framework for LSSP
Processor selection: only two candidate processors need to be compared:
1. The enabling processor p_e(t)
2. The processor becoming idle first, p_r
with T_s(t, p) = max { T_e(t, p), T_r(p) }

General Framework for LSSP
- Lemma 1. For every p ≠ p_e(t): T_e(t, p) = T_m(t)
- Theorem 1. For a ready task t, one of the processors p ∈ { p_e(t), p_r } satisfies T_s(t, p) = min { T_s(t, p_x) : p_x ∈ P }
- Processor selection thus drops from O((E + V) P) to O(V log P + E):
  - O(E + V) to traverse the task graph
  - O(V log P) to keep the processors sorted
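Theorem 1 means the minimum over all P processors can be replaced by a comparison of just two candidates. A minimal sketch, assuming a dict-based task representation and a heap of processors keyed on T_r(p); all names and the example are illustrative, not the authors' code.

```python
# Two-candidate processor selection per Theorem 1: compare only the
# enabling processor p_e(t) and the earliest-idle processor p_r.
import heapq

def select_processor(t, finish, comm, preds, proc_of, ready_time, proc_heap):
    # Enabling processor: the processor of the last-arriving message's sender.
    sender = max(preds[t], key=lambda tp: finish[tp] + comm[(tp, t)])
    p_e = proc_of[sender]
    # Processor becoming idle first, read off a min-heap of (T_r(p), p).
    _, p_r = proc_heap[0]

    def T_s(p):  # T_s(t, p) = max { T_e(t, p), T_r(p) }
        msgs = [finish[tp] + comm[(tp, t)]
                for tp in preds[t] if proc_of[tp] != p]
        return max(max(msgs, default=0), ready_time[p])

    return min((p_e, p_r), key=T_s)

finish = {'t0': 2, 't1': 3}
comm = {('t0', 't2'): 4, ('t1', 't2'): 1}
preds = {'t2': ['t0', 't1']}
proc_of = {'t0': 0, 't1': 1}
ready_time = {0: 2, 1: 3, 2: 0}
proc_heap = [(2, 0), (3, 1), (0, 2)]  # (T_r(p), p)
heapq.heapify(proc_heap)

print(select_processor('t2', finish, comm, preds, proc_of, ready_time,
                       proc_heap))  # 0 (the enabling processor wins here)
```

Here the idle processor 2 would start t2 at time 6 (it must wait for both messages), while the enabling processor 0 starts it at 4, so p_e is chosen.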

General Framework for LSSP
Task selection: the O(V log W) cost can be reduced by sorting only some of the tasks. The task priority queue is split into:
1. A sorted list of size H
2. A FIFO list with O(1) operations
The cost decreases to O(V log H):
- H needs to be adjusted to keep performance comparable
- H = P is optimal, giving O(V log P)
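The two-part queue can be sketched as a bounded max-heap plus an overflow FIFO. The class name, refill policy, and example are illustrative assumptions based on the slide, not the paper's exact data structure.

```python
# Partially sorted task priority queue: at most H tasks are kept sorted
# (in a heap); the rest wait in an unsorted FIFO and are promoted one at
# a time as the sorted part drains.
import heapq
from collections import deque

class PartialPQ:
    def __init__(self, H):
        self.H = H
        self.heap = []       # at most H (-priority, task) pairs
        self.fifo = deque()  # unsorted overflow

    def push(self, prio, task):
        if len(self.heap) < self.H:
            heapq.heappush(self.heap, (-prio, task))  # O(log H)
        else:
            self.fifo.append((prio, task))            # O(1), not O(log W)

    def pop(self):
        _, task = heapq.heappop(self.heap)
        if self.fifo:  # refill the sorted part from the FIFO
            p, t = self.fifo.popleft()
            heapq.heappush(self.heap, (-p, t))
        return task

q = PartialPQ(H=2)
for prio, task in [(5, 'a'), (1, 'b'), (3, 'c'), (4, 'd')]:
    q.push(prio, task)
print([q.pop() for _ in range(4)])  # ['a', 'c', 'd', 'b']
```

Note the output is only approximately priority-ordered (c pops before the higher-priority d): this is exactly the approximation the method accepts, with H = P keeping scheduling quality comparable.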

Complexity Analysis
- Computing task priorities: O(E + V)
- Task selection: O(V log W); O(V log H) with a partially sorted priority queue, i.e. O(V log P) for a queue of size P
- Processor selection: O(E + V) graph traversal plus O(V log P) processor-queue maintenance
- Total complexity:
  - O(V (log W + log P) + E) with a fully sorted queue
  - O(V log P + E) with a partially sorted queue

Case Study
- MCP (Modified Critical Path): the task with the highest bottom level has the highest priority
- FCP (Fast Critical Path)
- Example setting: 3 processors, partially sorted priority queue of size 2
- Example task graph: tasks t0–t7 with computation costs t0/2, t1/2, t2/2, t3/2, t4/3, t5/3, t6/2, t7/2 (figure omitted)
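MCP's priority, the bottom level T_b(t), can be computed in one pass over the DAG. A minimal sketch assuming T_b(t) = T_w(t) + max over successors of T_b (one common variant; the slides do not spell out the formula, and the chain graph below is an illustrative assumption, not the slide's example graph).

```python
# Bottom levels for MCP priorities: the longest remaining path from each
# task to an exit task, by memoized recursion over the DAG.

def bottom_levels(succs, Tw):
    """Compute T_b(t) for every task t given successors and costs T_w."""
    memo = {}
    def tb(t):
        if t not in memo:
            # Exit tasks have no successors, so their bottom level is T_w(t).
            memo[t] = Tw[t] + max((tb(c) for c in succs.get(t, [])), default=0)
        return memo[t]
    return {t: tb(t) for t in Tw}

# Chain t0 -> t1 -> t2 with computation costs 2, 3, 2.
succs = {'t0': ['t1'], 't1': ['t2']}
Tw = {'t0': 2, 't1': 3, 't2': 2}
print(bottom_levels(succs, Tw))  # {'t0': 7, 't1': 5, 't2': 2}
```

MCP would therefore schedule t0 first, then t1, then t2, matching the critical-path intuition.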

Case Study
(figure: scheduling steps for the example task graph, omitted)

Extensions for LSDP
The approach extends to dynamic priorities:
- ETF: the ready task that can start the earliest
- ERT: the ready task that can finish the earliest
- DLS: the task-processor pair with the highest dynamic level
General formula: σ(t, p) = σ(t) + max { T_e(t, p), T_r(p) }, where
- σ_ETF(t) = 0
- σ_ERT(t) = T_w(t)
- σ_DLS(t) = -T_b(t)
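The three policies share one formula and differ only in the static term σ(t). A sketch of the general formula; the function name and dict-based inputs are illustrative assumptions.

```python
# Dynamic priority sigma(t, p) = sigma(t) + max{ T_e(t, p), T_r(p) },
# where the static term sigma(t) selects the policy:
# ETF -> 0, ERT -> T_w(t), DLS -> -T_b(t).

def lsdp_priority(variant, t, p, Te, Tr, Tw, Tb):
    sigma = {'ETF': 0, 'ERT': Tw[t], 'DLS': -Tb[t]}[variant]
    return sigma + max(Te[(t, p)], Tr[p])

# Example values: T_e(t, p) = 4, T_r(p) = 2, T_w(t) = 3, T_b(t) = 5.
Te, Tr, Tw, Tb = {('t', 0): 4}, {0: 2}, {'t': 3}, {'t': 5}
print(lsdp_priority('ETF', 't', 0, Te, Tr, Tw, Tb))  # 4  (earliest start)
print(lsdp_priority('ERT', 't', 0, Te, Tr, Tw, Tb))  # 7  (earliest finish)
print(lsdp_priority('DLS', 't', 0, Te, Tr, Tw, Tb))  # -1 (dynamic level)
```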

Extensions for LSDP
EP (enabling-processor) case:
- On each processor, the ready tasks are kept sorted
- The processors themselves are kept sorted
Non-EP case:
- Consider the processor becoming idle first
- If that processor turns out to be the enabling processor, the EP case applies

Extensions for LSDP
- 3 tries: 1 for the EP case, 1 for the non-EP case
- Task priority queues maintained: P for the EP case, 2 for the non-EP case
- Each task is added to 3 queues: 1 for the EP case, 2 for the non-EP case
- Processor queues: 1 for the EP case, 1 for the non-EP case

Complexity
The original LSDP complexity O(W (E + V) P) is reduced to O(V (log W + log P) + E), and can be reduced further using a partially sorted priority queue. A queue size of P is required to maintain comparable performance, giving O(V log P + E).

Conclusion
- LSSP can be performed at a significantly lower cost:
  - Processor selection compares only two processors: the enabling processor and the processor becoming idle first
  - Task selection sorts only a limited number of tasks
- Using the extension of this method, LSDP complexity can also be reduced
- For large program and processor dimensions, the result is a superior cost-performance trade-off

Thank You
Questions?