“Rate-Optimal” Resource-Constrained Software Pipelining 5/26/2019 \course\cpeg421-08s\Topic-7a.ppt
Objectives Establish good bounds Help compiler writers Help architects Study pragmatic issues when implemented as an option of compilers Study its payoffs 5/26/2019 \course\cpeg421-08s\Topic-7a.ppt
A Motivating Example for i = 0 to n do 0: a [i] = X + d [i - 2]; 1: b [i] = a [i] * F + f [i - 2] + e [i - 2]; 2: c [i] = Y - b [i]; 3: d [i] = 2 * c [i]; 4: e [i] = X - b [i]; 5: f [i] = Y + b [i] end 5/26/2019 \course\cpeg421-08s\Topic-7a.ppt
Assume all statements have a delay = 1 2 L: for ( i = 0; i < n; i ++) { 3 S : a [i] = X + d [i - 2]; S : b [i] = a [i] * F + f [i - 2] + e [i - 2]; 1 S : c [i] = Y - b [i]; 1 2 2 S : d [i] = 2 * c [i]; 3 2 2 S : e [i] = X - b [i]; 4 4 5 S : f [i] = Y + b [i] 5 } Data Dependence Graph Program Representation Assume all statements have a delay = 1 5/26/2019 \course\cpeg421-08s\Topic-7a.ppt
Question How fast can L run (I.e.optimal computation rate) without resource constraints? How many FUs it needs minimally to achieve the optimal rate? 5/26/2019 \course\cpeg421-08s\Topic-7a.ppt
. Schedule A : # of FUs = 4 Schedule A: t (i, Sj) = 2i + tsj where: ts2 = ts4 = ts5 = 2 ts3 = 3 iter #1 #2 #3 . . . S 1 S 1 2 S , S , S S 2 4 5 3 S S 3 1 4 S , S , S S 2 4 5 5 S S 3 1 6 S , S , S 2 4 5 7 S . 3 Schedule A : # of FUs = 4 5/26/2019 \course\cpeg421-08s\Topic-7a.ppt
Can we do better? 5/26/2019 \course\cpeg421-08s\Topic-7a.ppt
. Schedule B : # of FUs = 3 Schedule B : t (i, Sj) = 2i + tsj where: ts2 = ts4 = 2 ts3 = ts5 = 3 iter #1 #2 #3 . . . S 1 S 1 2 S , S S 2 4 3 S S S 3, 5 1 4 S , S S 2 4 5 S S S 3, 5 1 6 S , S 2 4 7 S S . 3, 5 Schedule B : # of FUs = 3 5/26/2019 \course\cpeg421-08s\Topic-7a.ppt
Problem Statements Assumptions: homogeneous function units Problem 0: Given a loop L, determine a “rate-optimal” schedule for L which uses minimum # of FUs. Problem I (FIxed Rate SofTware Pipelining with minimum Resource - FIRST): Given a loop L and a fixed initiation rate determine a schedule for L which uses minimum # of FUs. Problem II (REsource Constrained SofTware Pipelining - REST) Given a loop L and the # of Fus, determine an optimal schedule which can run under the given resource constraints. 5/26/2019 \course\cpeg421-08s\Topic-7a.ppt
Problem Formulation of FIRST How to formulate resource constraints into a linear form? 5/26/2019 \course\cpeg421-08s\Topic-7a.ppt
all dependence constraints R is the maximum width R = Max (width) Min R Subject to: all dependence constraints R is the maximum width for a fixed rate (or period) The SWP kernel: --- The “frustum” 5/26/2019 \course\cpeg421-08s\Topic-7a.ppt
Min (max ari) . R Frustum A Now R = max ari r [ 0, II - 1] time r . 1 2 . . . N-1 3 II-1 Min (max ari) R Frustum A Contribution of node i to step r’s resource requirement Now R = max ari r [ 0, II - 1] 5/26/2019 \course\cpeg421-08s\Topic-7a.ppt
FIRST: A Linear Programming Formulation Example 0 1 2 3 4 5 a00 a01 a02 a03 a04 a05 a10 a11 a12 a13 a14 a15 So for node I: ti = ki II + (aoi , a1i ) * 1 Where ari = 1 means node i start at step r = 0 otherwise 5/26/2019 \course\cpeg421-08s\Topic-7a.ppt
FIRST: A Linear Programming Formulation Minimize R Subject to R: Max T T = T Periodic Dependence Constraints and are integers 5/26/2019 \course\cpeg421-08s\Topic-7a.ppt
Example Revisited . . . . . Min R R - ( a + a + a + a + a + a ) ³ R - Subjected to: R - ( a + a + a + a + a + a ) ³ 00 01 02 03 04 05 R - ( a + a + a + a + a + a ) ³ 10 11 12 13 14 15 2 × k + × a + 1 × a = t 00 10 2 × k 1 + × a + 1 × a = t 1 01 11 . 2 × k + × a + 1 × a = t 5 05 15 5 a + a = 1 00 10 a + a = 1 01 11 . a + a = 1 05 15 t - t . 1 ³ 1 . t - t 1 4 ³ - 3 t - t 1 5 ³ - 3 . t ³ , k ³ , a ³ 1 i ri 5/26/2019 \course\cpeg421-08s\Topic-7a.ppt
Solution so, # of FUs = 3! Schedule C t(i, s) = 2i + ts where t0 = 0 5/26/2019 \course\cpeg421-08s\Topic-7a.ppt
Linear Programming A Linear Program is a problem that can be expressed in the following form: minimize cx subject to Ax = b x >= 0 where x: vector of variables to be solved A: matrix of known coefficients c, b: vectors of known coefficients cx: it is called the objective function Ax=b is called the constraints 5/26/2019 \course\cpeg421-08s\Topic-7a.ppt
Linear Programming “programming” actually means “planning” here. Importance of LP many applications the existence of good general-purpose techniques for finding optimal solutions 5/26/2019 \course\cpeg421-08s\Topic-7a.ppt
Integer Linear Programming (ILP) Integer programming have proved valuable for modeling many and diverse types of problems in : planning, routing, scheduling, assignment, and design. Industry applications: transportation, energy, telecommunications, and manufacturing 5/26/2019 \course\cpeg421-08s\Topic-7a.ppt
Integer Linear Programming Classification: mixed integer (part of variables) pure integer (all variables) zero-one (only takes 0 or 1) Much harder to solve Many existing tools and product solve the LP/ILP problem and get optimal solution help to model the problem, and then solve the problem 5/26/2019 \course\cpeg421-08s\Topic-7a.ppt
How to handle cases where di 1 for node i Quiz How to handle cases where di 1 for node i ? >= 5/26/2019 \course\cpeg421-08s\Topic-7a.ppt
a((r - 1)%II)i The “Trick” At time step r, the # of FUs required Contributed by node i {Started at steps in [0, (t + di-1)%II]} a((r - 1)%II)i Note: if actor i requires a FU at time r or 0 otherwise. So objective function becomes: Max Min 5/26/2019 \course\cpeg421-08s\Topic-7a.ppt
How to extend the FIRST formulation to heterogeneous function units? 5/26/2019 \course\cpeg421-08s\Topic-7a.ppt
Heterogeneous Function Units Solution Hints: “weighted Sum”! For FU of type k: Mk = max " r Î [ , II - 1 ] and so, the objective should be min Ck Mk Where h is the total # of FU types Mk - 0 FU(i) = k " k Î [ , h - 1 ], " r Î [ , II - 1 ] 5/26/2019 \course\cpeg421-08s\Topic-7a.ppt
Resource - constraint rate-optimal software pipelining problem Solution of the REST Resource - constraint rate-optimal software pipelining problem 5/26/2019 \course\cpeg421-08s\Topic-7a.ppt
Hints Based on scheme for “FIRST” and play “trial-and improve” 5/26/2019 \course\cpeg421-08s\Topic-7a.ppt
Solution Space for REST 1 2 MII = max { RecMII, ResMII} 5/26/2019 \course\cpeg421-08s\Topic-7a.ppt
MII = max {RecMII, ResMII } Note: The bounds of initiation interval 1. Loop-carried dependence constraints: 2. Resource constraints As a result: RecMII = Max cycles C d(C) m(C) ResMII = # of nodes # of FUs MII = max {RecMII, ResMII } 5/26/2019 \course\cpeg421-08s\Topic-7a.ppt
Other Work Extend the technique to “more benchmarks” and establish “bounds” Extend the framework to include register constraints Extend the model for pipelined architectures (with structural “hazards”) Extend the model to consider superscalar and multithreaded with dynamic scheduling support. 5/26/2019 \course\cpeg421-08s\Topic-7a.ppt