
An Efficient Algorithm for Scheduling Instructions with Deadline Constraints on ILP Machines
Wu Hui, Joxan Jaffar
School of Computing, National University of Singapore

2. What is an ILP machine?
An ILP machine has multiple functional units of different types. Each functional unit can issue an instruction every machine cycle, so multiple instructions execute in parallel. Latencies exist between instructions. There are two categories: superscalar and VLIW (Very Long Instruction Word). A typical example is the Intel Itanium processor ( ex.htm).

3. What is the problem?
Given a problem instance P consisting of
- a set of n UET (unit execution time) instructions in a basic block,
- precedence-latency constraints given by a DAG $G = (V, E, W)$, where each latency $l_{ij} \ge -1$,
- deadline constraints, i.e. an individual pre-assigned deadline for each instruction, and
- m functional units of p different types,
compute a feasible schedule that satisfies all constraints whenever one exists, or, if no feasible schedule exists, a valid schedule with minimum lateness.
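To make the problem statement concrete, here is a minimal Python sketch of an instance and a checker for candidate schedules. It assumes timing conventions the slides do not spell out: cycles are numbered from 0, an instruction issued in cycle $s$ completes at time $s+1$, a latency $l_{ij}$ means $v_j$ may issue no earlier than $l_{ij}$ cycles after $v_i$ completes (so $l_{ij} = -1$ lets both issue in the same cycle), and lateness is completion time minus deadline. The names `Instance` and `max_lateness` are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Instance:
    """Hypothetical container for a problem instance P (names are illustrative)."""
    deadline: Dict[str, int]   # pre-assigned deadline d_i: latest completion time
    fu_type: Dict[str, str]    # functional-unit type required by each instruction
    fu_count: Dict[str, int]   # number of functional units of each type
    edges: Dict[Tuple[str, str], int] = field(default_factory=dict)  # (v_i, v_j) -> l_ij >= -1


def max_lateness(inst: Instance, start: Dict[str, int]) -> Optional[int]:
    """Return the maximum lateness of a schedule, or None if it violates a constraint.

    start maps each instruction to its issue cycle; UET instructions complete one
    cycle later. A valid schedule is feasible exactly when the result is <= 0.
    """
    for (i, j), lat in inst.edges.items():
        # assumed semantics: v_j issues no earlier than l_ij cycles after v_i completes
        if start[j] < start[i] + 1 + lat:
            return None
    per_cycle: Dict[Tuple[int, str], int] = {}
    for v, s in start.items():
        key = (s, inst.fu_type[v])
        per_cycle[key] = per_cycle.get(key, 0) + 1
        if per_cycle[key] > inst.fu_count[inst.fu_type[v]]:
            return None  # more instructions of one type than units in this cycle
    return max(start[v] + 1 - inst.deadline[v] for v in start)
```

For a valid but infeasible schedule, the returned value is the lateness that the algorithm tries to minimize.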

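4. Example 1. A problem instance P with two functional units of different types. Instructions, with pre-assigned deadlines in brackets: $v_1$ [4], $v_2$ [4], $v_3$ [4], $v_4$ [5], $v_5$ [5], $v_6$ [5], $v_7$ [5], $v_8$ [6], $v_9$ [6], $v_{10}$ [6], $v_{11}$ [6], $v_{12}$ [6]. [Figure: the precedence-latency DAG of P. Table 1: a feasible schedule for P on FU1 and FU2.]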

5. What does our algorithm achieve?
Our scheduling algorithm computes a feasible schedule, whenever one exists, for every problem instance of the following special cases:
1) an arbitrary DAG, latencies of 0, and two functional units of different types;
2) a monotone interval graph, latencies $\ge -1$, and multiple functional units of different types;
3) an in-forest, equal latencies, and multiple functional units of different types.

6. If no feasible schedule exists, our algorithm computes a schedule with minimum lateness for all of the above special cases. Furthermore, by setting all deadlines to the same constant, our algorithm computes a schedule with minimum completion time for any instance of the above special cases, and for any instance of the special case of an out-forest, equal latencies, and multiple functional units of different types.

7. [Figure: an in-tree, an out-tree, and a monotone interval graph.]

8. What is the time complexity?
Given the transitive closure of the precedence graph, the algorithm runs in
- $O(ne + nd)$ time for the general model, where $d$ is the maximum latency;
- $O(\min\{ne, de\} + nd)$ time if no latency of $-1$ exists;
- $O(n^2)$ time if, for each instruction, the latencies between it and all its immediate successors are equal.
The transitive closure can be computed in $O(\min(ne, n^{\dots}))$ time.
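The bounds above assume the transitive closure is already available. One simple way to obtain it, sketched here as an assumption rather than the authors' method, is a depth-first search from every instruction: each search costs $O(n + e)$, so the total is $O(n(n + e))$, which is $O(ne)$ when $e \ge n$. The function name `transitive_closure` and the adjacency-list input (vertices numbered $0, \dots, n-1$) are hypothetical.

```python
def transitive_closure(n, succ):
    """Return reach, where reach[i] is the set of all successors of vertex i.

    succ[i] lists the immediate successors of vertex i in the precedence DAG.
    One iterative depth-first search is run from every vertex.
    """
    reach = []
    for s in range(n):
        seen = set()
        stack = list(succ[s])
        while stack:
            v = stack.pop()
            if v not in seen:
                seen.add(v)
                stack.extend(succ[v])  # successors of a successor are successors too
        reach.append(seen)
    return reach
```

For example, `transitive_closure(3, [[1], [2], []])` gives `[{1, 2}, {2}, set()]` for the chain $v_0 \to v_1 \to v_2$.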

9. What has been done in the past?
- Palem and Simon's algorithm on identical processors [ACM TOPLAS, 1993].
- Wu, Jaffar and Yap's algorithm on identical processors [PACT 2000].
- Bernstein, Rodeh and Gertner's work on two processors of different types [IEEE TOC, 1989].

10. What are the contributions of our work?
- An efficient polynomial-time algorithm that solves several special cases, for none of which a polynomial-time algorithm was previously known.
- The first approximation ratio: for any greedy algorithm, the length of any schedule it computes never exceeds $p+1$ times the length of an optimal schedule, where $p$ is the number of functional-unit types.

11. What are the main ideas of our algorithm?
- Compute the $l_{\max}(v_i)$-successor-tree-consistent deadline of each instruction $v_i$, where $l_{\max}(v_i)$ is the maximum latency between $v_i$ and all its immediate successors.
- Compute a schedule by list scheduling, where the priority of each instruction is its successor-tree-consistent deadline and a smaller number means a higher priority.
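The second step is ordinary list scheduling with deadline-based priorities. The sketch below is a generic greedy list scheduler, not the authors' code; `list_schedule` and its parameters are hypothetical. It assumes that an instruction issued in cycle $t$ completes at $t+1$ and that $v_j$ may issue no earlier than $l_{ij}$ cycles after $v_i$ completes. In each cycle it repeatedly issues the highest-priority ready instruction whose functional-unit type still has a free unit, rescanning after each issue because a latency of $-1$ can make a successor ready in the same cycle.

```python
def list_schedule(prio, fu_type, fu_count, edges):
    """Greedy list scheduling of UET instructions on typed functional units.

    prio:     priority of each instruction, e.g. d'_i (smaller value = higher priority)
    fu_type:  functional-unit type of each instruction
    fu_count: number of functional units of each type (each at least 1)
    edges:    {(v_i, v_j): l_ij} precedence-latency constraints of a DAG, l_ij >= -1
    Returns {instruction: issue cycle}.
    """
    preds = {v: [] for v in prio}
    for (i, j), lat in edges.items():
        preds[j].append((i, lat))
    start = {}

    def ready(v, t):
        # every predecessor is issued, and v may issue l_ij cycles after it completes
        return all(p in start and start[p] + 1 + lat <= t for p, lat in preds[v])

    t = 0
    while len(start) < len(prio):
        used = {}  # units of each type already busy in cycle t
        while True:
            cands = [v for v in prio
                     if v not in start and ready(v, t)
                     and used.get(fu_type[v], 0) < fu_count[fu_type[v]]]
            if not cands:
                break
            v = min(cands, key=lambda u: (prio[u], u))  # ties broken by name
            start[v] = t
            used[fu_type[v]] = used.get(fu_type[v], 0) + 1
        t += 1
    return start
```

With one unit of a single type and no edges, `list_schedule({"a": 2, "b": 1}, {"a": "x", "b": "x"}, {"x": 1}, {})` issues `b` (the smaller deadline) in cycle 0 and `a` in cycle 1.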

12. What is the $l_{\max}(v_i)$-successor-tree-consistent deadline?
For each sink instruction $v_i$, its $l_{\max}(v_i)$-successor-tree-consistent deadline $d'_i$ is equal to its pre-assigned deadline $d_i$. For a non-sink instruction $v_i$, $d'_i$ is the upper bound on its latest completion time in any feasible schedule for the relaxed problem instance $P(i)$.

13. What is $P(i)$?
$P(i)$ consists of the set $V(i) = \{v_i\} \cup \mathrm{Succ}(v_i)$ of instructions, with the following new constraints:
- Precedence-latency constraints: the $l_{\max}(v_i)$-successor-tree of $v_i$.
- Deadline constraints: the deadline of each instruction $v_j \in \mathrm{Succ}(v_i)$ is its $l_{\max}(v_j)$-successor-tree-consistent deadline, and the deadline of $v_i$ is its pre-assigned deadline.

14. What is the $k$-successor-tree of $v_i$?
Given a weighted graph $G = (V, E, W)$, an integer $k$ and $v_i \in V$, the $k$-successor-tree of $v_i$ is a graph $G' = (V', E', W')$, where $V' = \{v_i\} \cup \{v_j : v_j \in \mathrm{Succ}(v_i)\}$, $E' = \{(v_i, v_j) : v_j \in \mathrm{Succ}(v_i)\}$, and each edge weight $l'_{ij}$ in $W'$ is defined as follows:
1) if $k = -1$: $l'_{ij} = -1$ if $l^+_{ij} = -1$, and $l'_{ij} = 0$ otherwise;
2) if $k \ne -1$: $l'_{ij} = l^+_{ij}$ if $l^+_{ij} < k$, and $l'_{ij} = k$ otherwise, i.e. $l'_{ij} = \min\{l^+_{ij}, k\}$.
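The edge weights of the $k$-successor-tree follow directly from the two cases above. This sketch takes $\mathrm{Succ}(v_i)$ and the values $l^+_{ij}$ as given inputs, since the slides do not say how $l^+_{ij}$ is computed; the name `k_successor_tree` is hypothetical.

```python
def k_successor_tree(i, succ_i, l_plus, k):
    """Edge weights l'_ij of the k-successor-tree of v_i (illustrative sketch).

    succ_i: Succ(v_i), all successors of v_i
    l_plus: {(v_i, v_j): l^+_ij}, taken as a given input
    k:      an integer >= -1, e.g. l_max(v_i)
    Returns {(v_i, v_j): l'_ij}; every edge starts at v_i.
    """
    tree = {}
    for j in succ_i:
        lp = l_plus[(i, j)]
        if k == -1:
            tree[(i, j)] = -1 if lp == -1 else 0   # case 1
        else:
            tree[(i, j)] = min(lp, k)              # case 2: l^+_ij if below k, else k
    return tree
```

For instance, with $k = 2$, a successor at $l^+_{ij} = 5$ is capped at 2, while one at $l^+_{ij} = 0$ keeps weight 0.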

15. [Figure 1: the precedence-latency constraints on $v_1, \dots, v_8$. Figure 2: the 4-successor-tree of $v_2$, with root $v_2$ and successors $v_3, \dots, v_8$.]

16. How do we compute the $l_{\max}(v_i)$-successor-tree-consistent deadline of $v_i$?
Key idea: backward scheduling. At any time $t$, among all ready instructions, an instruction $v_k$ with the largest latency in $P(i)$ is chosen and scheduled as late as possible on a functional unit of its type. Ties among instructions with the same latency are broken in favor of the instruction with the latest deadline. A schedule computed by backward scheduling is called a backward schedule.
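Backward scheduling can be sketched as follows. In the successor tree every edge leaves $v_i$, so within $\mathrm{Succ}(v_i)$ the only remaining constraints are deadlines and functional-unit capacity. The sketch walks issue cycles downward from the latest possible one and, in each cycle, fills the free units of each type with ready instructions in order of largest tree latency, then latest deadline. It assumes that an instruction issued in cycle $t$ completes at $t+1$ and reads $\sigma_b(v_j)$ as the issue cycle of $v_j$, so $v_i$ must complete by $\sigma_b(v_j) - l'_{ij}$ for every successor. The function names are hypothetical.

```python
def backward_schedule(tree_lat, deadline, fu_type, fu_count):
    """Backward schedule of Succ(v_i) in a relaxed instance P(i) (illustrative sketch).

    tree_lat: {v_j: l'_ij}, the edge weights of v_i's successor tree (non-empty)
    deadline: {v_j: d'_j}, the deadline of each successor in P(i)
    Returns {v_j: issue cycle}; an instruction issued in cycle t completes at t + 1.
    """
    remaining, start = set(tree_lat), {}
    t = max(deadline[j] for j in remaining) - 1   # latest issue cycle of any successor
    while remaining:
        # ready at t: issuing in cycle t still meets the instruction's deadline
        ready = sorted((j for j in remaining if deadline[j] - 1 >= t),
                       key=lambda j: (-tree_lat[j], -deadline[j], j))  # largest latency, then latest deadline
        used = {}
        for j in ready:
            if used.get(fu_type[j], 0) < fu_count[fu_type[j]]:
                used[fu_type[j]] = used.get(fu_type[j], 0) + 1
                start[j] = t          # as late as possible on a unit of its type
                remaining.discard(j)
        t -= 1
    return start


def consistent_deadline(d_i, tree_lat, deadline, fu_type, fu_count):
    """d'_i = min{d_i, min over successors of sigma_b(v_j) - l'_ij}."""
    sb = backward_schedule(tree_lat, deadline, fu_type, fu_count)
    return min(d_i, min(sb[j] - tree_lat[j] for j in tree_lat))
```

With two successors `a` (latency 2) and `b` (latency 0), both due at 5 on a single unit, `a` takes cycle 4 and `b` cycle 3, giving $\min\{10, 4 - 2, 3 - 0\} = 2$ for $d_i = 10$. Scheduling `b` last instead would lower the bound to $\min\{4 - 0, 3 - 2\} = 1$, which is why the largest latency is chosen first.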

17. Example 2: a relaxed problem instance $P(1)$, with deadlines in brackets: $v_1$ [2], $v_2$ [5], $v_3$ [6], $v_4$ [5], $v_5$ [3], $v_6$ [4], $v_7$ [3]. [Table 2: a backward schedule for $P(1)$ on FU1 and FU2.]

18. Scheduling Algorithm

    repeat
        choose an instruction v_i such that
            1) its l_max(v_i)-successor-tree-consistent deadline d'_i has not been computed, and
            2) either v_i is a sink or the successor-tree-consistent deadlines of all its successors have been computed;
        if v_i is a sink then
            d'_i = d_i
        else if v_i has only one immediate successor v_j and l_ij ≠ -1 then
            d'_i = min{d_i, d_j - l_ij - 1}
        else
            compute a backward schedule σ_b for P(i)
            d'_i = min{d_i, min{σ_b(v_j) - l_ij : v_j ∈ Succ(v_i)}}
    until the successor-tree-consistent deadlines of all instructions have been computed
    use list scheduling to compute a schedule for P, where the priority of each
    instruction v_i is d'_i and a smaller number means a higher priority
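The deadline-computation loop can be sketched in Python as follows. It processes instructions in reverse topological order (successors first) using `graphlib.TopologicalSorter` from the standard library (Python 3.9+). Several details are assumptions, not statements from the slides: $l^+_{ij}$ is taken to be the longest implied latency over all paths from $v_i$ to $v_j$ (edge latencies plus one cycle per intermediate instruction); $d_j$ in the single-successor shortcut is read as the successor's consistent deadline $d'_j$, matching the deadlines used in $P(i)$; and $\sigma_b(v_j)$ is the issue cycle of $v_j$, with the tree latency $l'_{ij}$ used in the final minimum. All function names are hypothetical. The resulting $d'_i$ values would then serve as list-scheduling priorities.

```python
from graphlib import TopologicalSorter


def backward_schedule(tree_lat, deadline, fu_type, fu_count):
    """Issue cycle of each successor in a backward schedule of P(i) (illustrative)."""
    remaining, start = set(tree_lat), {}
    t = max(deadline[j] for j in remaining) - 1
    while remaining:
        ready = sorted((j for j in remaining if deadline[j] - 1 >= t),
                       key=lambda j: (-tree_lat[j], -deadline[j], j))
        used = {}
        for j in ready:
            if used.get(fu_type[j], 0) < fu_count[fu_type[j]]:
                used[fu_type[j]] = used.get(fu_type[j], 0) + 1
                start[j] = t
                remaining.discard(j)
        t -= 1
    return start


def consistent_deadlines(deadline, fu_type, fu_count, edges):
    """Return d'_i for every instruction; edges maps (v_i, v_j) -> l_ij."""
    imm = {v: {} for v in deadline}      # immediate successors with latencies
    for (i, j), lat in edges.items():
        imm[i][j] = lat
    l_plus = {v: {} for v in deadline}   # assumed: longest implied latency to each successor
    d_prime = {}
    # Passing successors as "predecessors" makes static_order() yield successors first.
    for i in TopologicalSorter({v: set(imm[v]) for v in deadline}).static_order():
        for k, lat in imm[i].items():
            l_plus[i][k] = max(l_plus[i].get(k, lat), lat)
            for j, lkj in l_plus[k].items():
                cand = lat + 1 + lkj     # one extra cycle for the intermediate instruction v_k
                l_plus[i][j] = max(l_plus[i].get(j, cand), cand)
        if not imm[i]:                   # sink: d'_i = d_i
            d_prime[i] = deadline[i]
        elif len(imm[i]) == 1 and next(iter(imm[i].values())) != -1:
            (j, lij), = imm[i].items()   # single immediate successor shortcut
            d_prime[i] = min(deadline[i], d_prime[j] - lij - 1)
        else:
            k = max(imm[i].values())     # l_max(v_i)
            tree = {j: (-1 if lp == -1 else 0) if k == -1 else min(lp, k)
                    for j, lp in l_plus[i].items()}
            sb = backward_schedule(tree, d_prime, fu_type, fu_count)
            d_prime[i] = min(deadline[i], min(sb[j] - tree[j] for j in tree))
    return d_prime
```

For a root `r` (deadline 10) with successors `x` (latency 1) and `y` (latency 0), both due at 4 and sharing one unit, the sketch yields $d'_r = 2$: `x` is placed in cycle 3 and `y` in cycle 2, and both give $3 - 1 = 2 - 0 = 2$.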

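19. Example 1 (continued). A problem instance P with two functional units of different types, FU1 and FU2. Figure 4: the relaxed problem $P(1)$, consisting of $v_1$ [4] and its successors $v_4$ [4], $v_5$ [5], $v_6$ [5], $v_8$ [6], $v_9$ [6], $v_{10}$ [6], $v_{11}$ [6]. In P, instructions are labeled [pre-assigned deadline, successor-tree-consistent deadline]: $v_1$ [4, ?], $v_2$ [4], $v_3$ [4], $v_4$ [5, 4], $v_5$ [5, 5], $v_6$ [5, 5], $v_7$ [5, 5], $v_8$ [6, 6], $v_9$ [6, 6], $v_{10}$ [6, 6], $v_{11}$ [6, 6], $v_{12}$ [6, 6].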

20. Since $\min\{\sigma_b(v_j) - l_{1j} : v_j \in \mathrm{Succ}(v_1)\} = 2$, the $l_{\max}(v_1)$-successor-tree-consistent deadline of $v_1$ is $\min\{d_1, 2\} = \min\{4, 2\} = 2$. [Table 3: a backward schedule $\sigma_b$ for $\mathrm{Succ}(v_1)$.]

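21. Example 1 (continued). A problem instance P with two functional units of different types. With all successor-tree-consistent deadlines computed, the instructions are labeled [pre-assigned deadline, successor-tree-consistent deadline]: $v_1$ [4, 2], $v_2$ [4, 3], $v_3$ [4, 3], $v_4$ [5, 4], $v_5$ [5, 5], $v_6$ [5, 5], $v_7$ [5, 5], $v_8$ [6, 6], $v_9$ [6, 6], $v_{10}$ [6, 6], $v_{11}$ [6, 6], $v_{12}$ [6, 6]. [Table 3: a feasible schedule computed by list scheduling on FU1 and FU2.]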

22. Conclusion
- $k$-successor-tree consistency is a general technique for instruction scheduling problems: it approximates precedence-latency constraints by priorities that are $k$-successor-tree consistent.
- It has been used successfully to solve several open instruction scheduling problems, such as two-processor scheduling with equal execution times and release-time/deadline constraints.
- Open problem: what is the tight worst-case approximation ratio of our algorithm? (Conjecture: $L_{\text{ours}} / L_{\text{opt}} = 4/3$.)