Predicate-Aware Scheduling: A Technique for Reducing Resource Constraints Mikhail Smelyanskiy, Scott Mahlke, Edward Davidson Department of EECS University.


1 Predicate-Aware Scheduling: A Technique for Reducing Resource Constraints
Mikhail Smelyanskiy, Scott Mahlke, Edward Davidson, Department of EECS, University of Michigan
Hsien-Hsin (Sean) Lee, School of ECE, Georgia Institute of Technology

2 Motivation
- Predication eliminates branch instructions but increases resource requirements
- Predicate-aware scheduling oversubscribes resources, which
  - reduces resource requirements
  - reduces schedule length

Control flow graph: A (br cond) branches on T/F to B or C; both paths rejoin at D.

Predicated schedule:
    0: A
    1: p1,p2=pred_def(cond)
    2: B if p1
    3: C if p2
    4: D

Predicate-aware schedule:
    0: A
    1: p1,p2=pred_def(cond)
    2: B if p1    C if p2
    3: D

3 Potential for Disjoint Operations
- Combining disjoint operations reduces the dynamic operation count by 13%

4 Outline
- Motivation
- Resource Pressure Problem in Predicated Code
- PRAVO: PRedicate-Aware VLIW Processor
- Predicate-aware Scheduling
- Performance Results
- Conclusion and Future Work

5 Modulo Scheduling Example

Source Code
    for (i = 0; i < im_size; i++) {
        if (q_im[i] >= 1)
            res[i] = q_im[i] * bin_size - correction;
        else if (q_im[i] <= -1)
            res[i] = q_im[i] * bin_size + correction;
        else
            res[i] = bin_size + correction;
    }

Predicated Code
    op1:  t1 = load(i1, q_im)               if T
    op2:  p1,p2 = pred_def(t1 >= 1)         if T
    op3:  t2 = multsub(t1, tbs, tcor)       if p1
    op4:  store(i1, res, t2)                if p1
    op5:  p3,p4 = pred_def(t1 <= -1)        if p2
    op6:  t2 = multadd(t1, tbs, tcor)       if p3
    op7:  store(i1, res, t2)                if p3
    op8:  t2 = add(tbs, tcor)               if p4
    op9:  store(i1++, res, t2)              if p4
    op10: if (i++ < im_size) goto op1       if T

- Three control paths: P_T, P_FT, P_FF
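The correspondence between the source loop and its predicated form can be checked with a small simulation. This is a minimal sketch, not the compiler's output: the helper `pred_def` and both function names are hypothetical, and it assumes that a guarded `pred_def` clears both destination predicates when its guard is false, which the slide does not state.

```python
def source_loop(q_im, bin_size, correction):
    """Branching version of the loop on this slide."""
    res = []
    for q in q_im:
        if q >= 1:
            res.append(q * bin_size - correction)
        elif q <= -1:
            res.append(q * bin_size + correction)
        else:
            res.append(bin_size + correction)
    return res


def pred_def(cond, guard=True):
    """Return (p_true, p_false). ASSUMPTION: both are False if the guard is False."""
    return (guard and cond, guard and not cond)


def predicated_loop(q_im, bin_size, correction):
    """If-converted version: every operation is guarded by a predicate."""
    res = [None] * len(q_im)
    paths = []
    for i1 in range(len(q_im)):                      # op10: loop-back branch
        t1 = q_im[i1]                                # op1
        p1, p2 = pred_def(t1 >= 1)                   # op2
        if p1: t2 = t1 * bin_size - correction       # op3
        if p1: res[i1] = t2                          # op4
        p3, p4 = pred_def(t1 <= -1, guard=p2)        # op5
        if p3: t2 = t1 * bin_size + correction       # op6
        if p3: res[i1] = t2                          # op7
        if p4: t2 = bin_size + correction            # op8
        if p4: res[i1] = t2                          # op9
        # Exactly one of p1, p3, p4 is true, so each iteration takes one path.
        paths.append("P_T" if p1 else "P_FT" if p3 else "P_FF")
    return res, paths


values, taken = predicated_loop([3, -2, 0], bin_size=4, correction=1)
print(values, taken)
```

Because p1, p3 and p4 are mutually exclusive, at most one of op3, op6 and op8 (and at most one of op4, op7 and op9) does useful work in any iteration, which is the property predicate-aware scheduling exploits.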

6 Traditional Modulo Schedule (Rau 94)

Modulo Schedule
    Time  Iteration i   Iteration i+1
    0     op1
    1
    2     op2
    3     op5
    4     op3 op10
    5     op6           op1
    6     op8
    7     op4           op2
    8     op7           op5
    9     op9           op3 op10
    10                  op6
    11                  op8
    12                  op4
    13                  op7
    14                  op9

Modulo Scheduled Loop Kernel (II=5)
          ALU   MEM   BR
    I0    op6   op1
    I1    op8
    I2    op2   op4
    I3    op5   op7
    I4    op3   op9   op10
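The kernel on this slide follows from the per-iteration issue times by folding each time modulo II. The sketch below reproduces that folding and the conventional resource bound. The issue times are taken from the slide; the unit assignment is read off the kernel columns; one ALU, one MEM and one BR unit is an assumption inferred from the single-column kernel, and the function names are hypothetical.

```python
import math
from collections import Counter

# Issue time of each operation within one iteration (from the slide).
SCHEDULE = {"op1": 0, "op2": 2, "op5": 3, "op3": 4, "op10": 4,
            "op6": 5, "op8": 6, "op4": 7, "op7": 8, "op9": 9}

# Functional unit class per operation (read off the kernel columns).
UNIT = {"op1": "MEM", "op4": "MEM", "op7": "MEM", "op9": "MEM",
        "op2": "ALU", "op3": "ALU", "op5": "ALU", "op6": "ALU", "op8": "ALU",
        "op10": "BR"}


def res_mii(unit_of, units_available):
    """Conventional resource-constrained lower bound on II."""
    demand = Counter(unit_of.values())
    return max(math.ceil(n / units_available[u]) for u, n in demand.items())


def fold_into_kernel(schedule, unit_of, ii):
    """Map each operation to kernel row (time mod II); one op per unit per row."""
    kernel = {row: {} for row in range(ii)}
    for op, time in sorted(schedule.items(), key=lambda kv: kv[1]):
        row, unit = time % ii, unit_of[op]
        if unit in kernel[row]:
            raise ValueError(f"{op} conflicts with {kernel[row][unit]} in row {row}")
        kernel[row][unit] = op
    return kernel


ii = res_mii(UNIT, {"ALU": 1, "MEM": 1, "BR": 1})   # five ALU ops on one ALU
for row, slots in fold_into_kernel(SCHEDULE, UNIT, ii).items():
    print(f"I{row}", slots)
```

Folding the same issue times with II=4 raises a conflict, since a conventional reservation table allows only one operation per unit per row.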

7 Two Predicate-Aware Modulo Schedules

Modulo Scheduled Loop Kernel 1 (FW = 3, II = 4)
    ALU        MEM   BR
    op3 op6    op1
    op8        op7
    op2        op9
    op5        op4   op10

Modulo Scheduled Loop Kernel 2 (FW = 4, II = 3)
    ALU            MEM        BR
    op3 op6 op8    op1
    op5            op4 op7
    op2            op9        op10

- Resource oversubscription can produce more efficient schedules (if the colored operations can share an entry)
- A larger Fetch Width (FW) allows more oversubscription and a faster schedule

8 Baseline Architecture Model
- The Predicate Register File is accessed only in the EXECUTE stage
- Resources from FETCH to EXECUTE are unconditionally reserved
[Figure: pipeline diagram labeled FETCH, DISPATCH, DECODE, REGISTER READ, PRED READ & EXECUTE, WRITE BACK, with the Predicate Register File; stages are marked as must-use or may-use resources]

9 Predicate-aware Architecture (PRAVO)
- The Predicate Register File (PRF) is accessed early, in the DISPATCH stage
- This increases the latency of predicate-defining operations
[Figure: pipeline diagram labeled FETCH, PRED READ & DISPATCH, DECODE, REGISTER READ, EXECUTE, WRITE BACK, with the Predicate Register File (PRF); stages are marked as must-use or may-use resources]

10 Predicate-aware Architecture (PRAVO)
- DECODE and DISPATCH are reversed
[Figure: pipeline diagram labeled FETCH, PRED READ & DISPATCH, DECODE, REGISTER READ, EXECUTE, WRITE BACK, with the Predicate Register File (PRF); stages are marked as must-use or may-use resources]

11 Three Main Changes to Conventional Scheduler
- Predicate-defining operation edge latency adjustment
- ResMII computation
- Predicate-Aware Reservation Table
[Figure: scheduler flow with the blocks Build DDG, Compute ResMII / RecMII, Cyclic Scheduler, Acyclic Scheduler, and Reservation Tables]

12 Data Dependence Graph Latency Adjustment

Original
    Time  A                       M
    0     p1,p2=pred_def
    1     +1 if p1                ld if p2
    2     +3 if p2
    3     +4 if p2
    4     +2 if p1

Brute force
    Time  A                       M
    0     p1,p2=pred_def
    1
    2     +1 if p1                ld if p2
    3     +3 if p2, +2 if p1
    4     +4 if p2

Selective
    Time  A                       M
    0     p1,p2=pred_def
    1                             ld if p2
    2     +1 if p1, +3 if p2
    3     +2 if p1, +4 if p2

DDG edge latencies out of p1,p2=pred_def (to +1 if p1, to ld if p2):
    Original:    1, 1
    Brute force: 2, 2
    Selective:   2, 1
The three remaining DDG edges have latency 1 in all three cases.

13 Computation of Resource-Constrained Lower Bound
- Predicate-aware ResMII computation
  - "first-fit" combining
  - Fetch Width (FW) resource constraint
- Original: ResMII=5; Predicate-Aware: ResMII=3
[Figure: resource usage tables for the example DDG (p1,p2=pred_def; +1 if p1; +2 if p1; ld if p2; +3 if p2; +4 if p2), with may-use columns A and M and must-use FW columns, shown for the original and the predicate-aware computation]
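The two bounds on this slide can be reproduced with a first-fit packing of operations into shared entries. This is a minimal sketch under stated assumptions, not the authors' algorithm: predicates are modeled as sets of control paths (two sets are disjoint when they share no path), one ALU and one memory unit are assumed, and FW=2 is an assumed value because the slide's fetch width is not legible. All names are hypothetical.

```python
import math

TRUE = frozenset({"p1", "p2"})      # executes on every path
P1 = frozenset({"p1"})
P2 = frozenset({"p2"})

# (name, unit, predicate) for the example DDG on this slide.
OPS = [("pred_def", "A", TRUE),
       ("+1", "A", P1), ("+2", "A", P1),
       ("ld", "M", P2),
       ("+3", "A", P2), ("+4", "A", P2)]


def first_fit_entries(preds):
    """Pack predicates into entries whose members are pairwise disjoint."""
    entries = []
    for pred in preds:
        for entry in entries:
            if all(pred.isdisjoint(other) for other in entry):
                entry.append(pred)          # share the first entry that fits
                break
        else:
            entries.append([pred])          # no entry fits: open a new one
    return entries


def res_mii(ops, units, fetch_width, predicate_aware=True):
    """Lower bound on II from may-use units and the must-use fetch width."""
    bound = math.ceil(len(ops) / fetch_width)       # every op needs a fetch slot
    for unit, count in units.items():
        preds = [p if predicate_aware else TRUE for _, u, p in ops if u == unit]
        bound = max(bound, math.ceil(len(first_fit_entries(preds)) / count))
    return bound


units = {"A": 1, "M": 1}
print(res_mii(OPS, units, fetch_width=2, predicate_aware=False))
print(res_mii(OPS, units, fetch_width=2))
```

With combining disabled every operation is treated as always executing, so the five ALU operations need five entries. With combining enabled, each p1 add shares an entry with a p2 add, leaving three ALU entries.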

14 Reservation Table (similar to [Warter 92])

Conventional reservation table
- One operation per RT entry
    Time  Res 1  ...  Res n  Res n+1
    0     op1
    1     op2
    ...
    r     op3

Predicate-aware reservation table
- Multiple disjoint operations per RT entry
- Check disjointness (using PQS [Johnson96])
    Time  Res 1 (may)          ...  Res n (must)  Res n+1 (must)
    0     op1 op2 (p1 | p2)         op1           op2
    1
    ...
    r     op3 (TRUE)                op3
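A reservation table with shared may-use entries might look like the following. This is a minimal sketch, not the Trimaran/ELCOR data structure: a set-of-control-paths test stands in for the PQS disjointness query, the only must-use resource modeled is the fetch width, and the class and method names are hypothetical. The path sets for p1 to p4 come from the modulo scheduling example (P_T, P_FT, P_FF).

```python
TRUE = frozenset({"T", "FT", "FF"})          # all three control paths
P1, P2 = frozenset({"T"}), frozenset({"FT", "FF"})
P3, P4 = frozenset({"FT"}), frozenset({"FF"})


class PredicateAwareRT:
    """Modulo reservation table whose may-use entries can hold disjoint ops."""

    def __init__(self, ii, fetch_width):
        self.ii = ii
        self.fetch_width = fetch_width
        self.may = {}     # (row, unit) -> list of (op, predicate)
        self.must = {}    # row -> list of ops occupying fetch slots

    def can_place(self, time, unit, pred):
        row = time % self.ii
        # Must-use resource: each operation takes its own fetch slot.
        if len(self.must.get(row, [])) >= self.fetch_width:
            return False
        # May-use resource: share only with ops on disjoint predicates
        # (a stand-in for the PQS query named on the slide).
        return all(pred.isdisjoint(p) for _, p in self.may.get((row, unit), []))

    def place(self, op, time, unit, pred):
        if not self.can_place(time, unit, pred):
            return False
        row = time % self.ii
        self.may.setdefault((row, unit), []).append((op, pred))
        self.must.setdefault(row, []).append(op)
        return True


rt = PredicateAwareRT(ii=3, fetch_width=4)
print(rt.place("op3", 0, "ALU", P1))    # first op in the entry
print(rt.place("op6", 0, "ALU", P3))    # disjoint with op3, shares the ALU
print(rt.place("op8", 0, "ALU", P4))    # disjoint with both
print(rt.place("op1", 0, "MEM", TRUE))  # uses the fourth fetch slot
print(rt.place("op2", 1, "ALU", TRUE))
print(rt.place("op5", 1, "ALU", P2))    # overlaps TRUE, cannot share
```

The first four placements fill one row the way Kernel 2 on the "Two Predicate-Aware Modulo Schedules" slide does: three mutually disjoint ALU operations plus one memory operation under FW=4.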

15 Performance Results
- Compare the performance of baseline and predicate-aware scheduling
- Compiler support: Trimaran and ELCOR [Trimaran99]
- The Mediabench [Lee97] benchmark suite was evaluated
- Processor models (BA = base, PA = predicate-aware)

    Model  Fetch Width  Int ALU  cmpp latency  Memory
    BA42   4            2        1             1
    PA42   4            2        1 / 2 / 3     1
    BA64   6            4        1             2
    PA64   6            4        1 / 2 / 3     2

16 Predicate-aware Speedup over Baseline (PA42 vs. BA42)
- Speedup is due only to improvable PA regions
- Speedup decreases for higher latency and for a wider machine
[Figure: per-benchmark speedup chart, including the average]

17 Average Speedup Breakdown
- Only 68% of regions are PA scheduled
- PA is more effective in modulo scheduled loops

18 Summary and Future Work
- Summary: predicate-aware scheduling
  - reduces resource constraints in predicated code
  - is supported by the PRAVO architecture
  - is effective in cyclic regions (16% speedup on 4-wide PRAVO)
- Future work
  - More resource sharing can be achieved by combining probabilistically disjoint operations

19 Q&A and Suggestions

20 Backup Foils

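21 Modulo Scheduling Using PART

Columns on the slide: A (may), M (may), B (may), IW1 (must), IW2 (must), IW3 (must). The table below lists, per time step, the operations in the may-use entry, the control paths on which the entry is occupied, and the operations in the must-use issue slots.

    Time  May-use entry  Paths                 Must-use (IW)
    0     op1            P_T | P_FT | P_FF     op1
    1
    2     op2            P_T | P_FT | P_FF     op2
    3     op10           P_T | P_FT | P_FF     op10
    4     op5            P_T | P_FT | P_FF     op5
    5     op3            P_T
    6
    7     op6 op8        P_FT | P_FF           op6 op8
    8
    9     op4 op9        P_T | P_FF            op4 op9
    10    op7            P_FT                  op7
    11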

22 Speedup Analysis
[Figure: two bar charts, "Predicate-Aware Acyclic Region" and "Predicate-Aware Cyclic Region", with a cycles axis from 0 to 30. Legend: PA Potential, Base Sched. Length, PA Sched. Length, PA Critical Path Length, PA Resource Bound. Cases 1 to 6 cover 4-wide and 6-wide machines with cmpp latency 2 or 3.]

