Published by Omar Galford. Modified over 9 years ago.

Slide 1
CS718: VLIW - Software Driven ILP
Introduction
Anshul Kumar, CSE IITD, 23rd Mar, 2006

Slide 2: Outline
- Pipeline scheduling and loop unrolling
- Branch prediction with static scheduling
- Basic VLIW approach
- Detecting and enhancing loop level parallelism
- Software pipelining
- Global scheduling
- Hardware support
- Real examples

Slide 3: Approaches for multi-issue processors
Name                       Issue structure  Hazard detection  Scheduling             Features                 Examples
Superscalar (static)       Dynamic          Hardware          Static                 In-order execution       UltraSPARC
Superscalar (dynamic)      Dynamic          Hardware          Dynamic                O-O-O execution          Power 2
Superscalar (speculative)  Dynamic          Hardware          Dynamic + speculation  O-O-O with speculation   P4, HPPA, Alpha, R10K
VLIW                       Static           Software          Static                 No hazards               TriMedia, i860
EPIC                       Mostly static    Mostly software   Mostly static          Dependences marked       Itanium
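
Slide 4: Pipeline scheduling example
    for (i=1000; i>0; i--)
        x[i] = x[i] + s;

Loop: L.D    F0, 0(R1)         ; F0 = array element x[i]
      ADD.D  F4, F0, F2        ; add the scalar s, held in F2
      S.D    F4, 0(R1)         ; store the result back to x[i]
      DADDUI R1, R1, #-8       ; move the pointer down by 8 bytes (one double)
      BNE    R1, R2, Loop      ; repeat until R1 equals R2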

Slide 5: Latency due to data hazards
Producer instruction  Consumer instruction  Latency (cycles)
FP ALU op             FP ALU op             3
FP ALU op             Store double          2
Load double           FP ALU op             1
Load double           Store double          0
Assume no structural hazards.

Slide 6: Straightforward scheduling
      Instruction              Cycle
Loop: L.D    F0, 0(R1)         1
      stall                    2
      ADD.D  F4, F0, F2        3
      stall                    4
      stall                    5
      S.D    F4, 0(R1)         6
      DADDUI R1, R1, #-8       7
      stall                    8
      BNE    R1, R2, Loop      9
      stall                    10
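
The cycle count on this slide can be reproduced with a small model of a single-issue, in-order pipeline. The following is a minimal sketch in Python, not the course's own tool. The first four latencies come from the latency table on slide 5; the one-cycle integer-to-branch latency and the one-cycle branch delay are assumptions inferred from the stalls in cycles 8 and 10 above. The names count_cycles, LATENCY, NAIVE and SCHEDULED are made up for this illustration.

```python
# Stall cycles required between a producer and a dependent consumer.
LATENCY = {
    ("fpalu", "fpalu"): 3,   # slide 5
    ("fpalu", "store"): 2,   # slide 5
    ("load", "fpalu"): 1,    # slide 5
    ("load", "store"): 0,    # slide 5
    ("int", "branch"): 1,    # assumption: inferred from the stall before BNE
}

def count_cycles(instrs):
    """Cycles for one pass over instrs on a single-issue in-order pipeline.

    instrs is a list of (kind, dest_or_None, [source registers]).
    """
    if not instrs:
        return 0
    issue = []         # issue cycle of each instruction, starting at 1
    last_writer = {}   # register -> index of the most recent writer
    for i, (kind, dest, srcs) in enumerate(instrs):
        cycle = issue[-1] + 1 if issue else 1
        for reg in srcs:
            if reg in last_writer:
                p = last_writer[reg]
                lat = LATENCY.get((instrs[p][0], kind), 0)
                cycle = max(cycle, issue[p] + 1 + lat)
        issue.append(cycle)
        if dest is not None:
            last_writer[dest] = i
    # Assumption: one branch delay cycle, wasted when nothing follows the branch.
    return issue[-1] + (1 if instrs[-1][0] == "branch" else 0)

# Loop body in program order (slide 4).
NAIVE = [
    ("load", "F0", ["R1"]),          # L.D    F0, 0(R1)
    ("fpalu", "F4", ["F0", "F2"]),   # ADD.D  F4, F0, F2
    ("store", None, ["F4", "R1"]),   # S.D    F4, 0(R1)
    ("int", "R1", ["R1"]),           # DADDUI R1, R1, #-8
    ("branch", None, ["R1", "R2"]),  # BNE    R1, R2, Loop
]

# Same instructions with DADDUI hoisted and S.D placed after the branch.
SCHEDULED = [NAIVE[0], NAIVE[3], NAIVE[1], NAIVE[4], NAIVE[2]]

print(count_cycles(NAIVE), count_cycles(SCHEDULED))
```

The model only tracks register dependences inside one loop body, which is enough for the totals here. It does not decide where a stall is placed, only how many cycles the body takes.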

Slide 7: A better schedule
      Instruction              Cycle
Loop: L.D    F0, 0(R1)         1
      DADDUI R1, R1, #-8       2
      ADD.D  F4, F0, F2        3
      stall                    4
      BNE    R1, R2, Loop      5
      S.D    F4, 0(R1)         6
S.D now executes after DADDUI but still uses offset 0, so its offset has to be adjusted.
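
Slide 8: A better schedule (S.D offset adjusted)
      Instruction              Cycle
Loop: L.D    F0, 0(R1)         1
      DADDUI R1, R1, #-8       2
      ADD.D  F4, F0, F2        3
      stall                    4
      BNE    R1, R2, Loop      5
      S.D    F4, 8(R1)         6
The S.D offset is 8 rather than 0 because R1 has already been decremented by 8 when the store executes.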
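
The offset change follows from keeping the effective address of the store unchanged. As a minimal sketch (the function name adjusted_offset and the sample base address are made up for illustration): if a memory instruction with offset k is moved below DADDUI R1, R1, #d, it must use offset k - d.

```python
def adjusted_offset(offset, increment):
    """Offset to use after moving a load/store below `DADDUI R1, R1, #increment`.

    Before the move the address is R1 + offset. After the move the register
    holds R1 + increment, so the offset must shrink by increment.
    """
    return offset - increment

# Check on the schedule above: S.D F4, 0(R1) moved below DADDUI R1, R1, #-8.
r1 = 8000                      # hypothetical pointer value before DADDUI
old_address = r1 + 0
new_offset = adjusted_offset(0, -8)
new_address = (r1 - 8) + new_offset
assert new_offset == 8 and new_address == old_address
```

The same rule applies to any store moved below a pointer update in an unrolled loop, for example with an increment of -32.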

Slide 9: Loop unrolling
      Instruction              Cycles so far (including stalls)
Loop: L.D    F0, 0(R1)
      ADD.D  F4, F0, F2
      S.D    F4, 0(R1)         6
      L.D    F0, -8(R1)
      ADD.D  F4, F0, F2
      S.D    F4, -8(R1)        12
      L.D    F0, -16(R1)
      ADD.D  F4, F0, F2
      S.D    F4, -16(R1)       18
      L.D    F0, -24(R1)
      ADD.D  F4, F0, F2
      S.D    F4, -24(R1)       24
      DADDUI R1, R1, #-32
      BNE    R1, R2, Loop      28
28 cycles / 4 elements = 7 cycles per element
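
At source level, unrolling by four corresponds to the transformation sketched below in Python. This is an illustration only: the function names are made up, and it assumes the trip count is a multiple of 4 (true for the 1000 iterations of the example), so no clean-up loop is needed.

```python
def add_scalar(x, s):
    """Original loop: for (i=1000; i>0; i--) x[i] = x[i] + s;"""
    for i in range(len(x) - 1, 0, -1):
        x[i] = x[i] + s

def add_scalar_unrolled4(x, s):
    """Same loop unrolled four times: one index update and one test per 4 elements."""
    n = len(x) - 1
    assert n % 4 == 0, "sketch assumes the trip count is a multiple of 4"
    for i in range(n, 0, -4):
        x[i] = x[i] + s            # offset   0 in the assembly
        x[i - 1] = x[i - 1] + s    # offset  -8
        x[i - 2] = x[i - 2] + s    # offset -16
        x[i - 3] = x[i - 3] + s    # offset -24

a = [float(i) for i in range(1001)]   # x[1..1000]; x[0] is unused, as in the C loop
b = list(a)
add_scalar(a, 2.5)
add_scalar_unrolled4(b, 2.5)
assert a == b
```

Unrolling removes three of every four DADDUI/BNE pairs, but by itself it does not remove the stalls between each L.D, ADD.D and S.D.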

Slide 10: Removing false dependences
      Instruction              Cycles so far (including stalls)
Loop: L.D    F0, 0(R1)
      ADD.D  F4, F0, F2
      S.D    F4, 0(R1)         6
      L.D    F6, -8(R1)
      ADD.D  F8, F6, F2
      S.D    F8, -8(R1)        12
      L.D    F10, -16(R1)
      ADD.D  F12, F10, F2
      S.D    F12, -16(R1)      18
      L.D    F14, -24(R1)
      ADD.D  F16, F14, F2
      S.D    F16, -24(R1)      24
      DADDUI R1, R1, #-32
      BNE    R1, R2, Loop      28
28 cycles / 4 elements = 7 cycles per element

Slide 11: Re-scheduling
      Instruction              Cycles so far
Loop: L.D    F0, 0(R1)
      L.D    F6, -8(R1)
      L.D    F10, -16(R1)
      L.D    F14, -24(R1)      4
      ADD.D  F4, F0, F2
      ADD.D  F8, F6, F2
      ADD.D  F12, F10, F2
      ADD.D  F16, F14, F2      8
      S.D    F4, 0(R1)
      S.D    F8, -8(R1)        10
      DADDUI R1, R1, #-32
      S.D    F12, 16(R1)       12
      BNE    R1, R2, Loop
      S.D    F16, 8(R1)        14
14 cycles / 4 elements = 3.5 cycles per element
The last two stores execute after DADDUI, so their offsets are 16 = -16 + 32 and 8 = -24 + 32.
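
The cycle counts quoted on the slides so far can be put side by side. The short Python sketch below only redoes the slides' arithmetic; the dictionary keys and function name are made up for illustration.

```python
def cycles_per_element(total_cycles, elements_per_iteration):
    """Cycles spent per array element for one pass through the loop body."""
    return total_cycles / elements_per_iteration

# (total cycles per loop body, elements processed per loop body), from the slides.
SCHEDULES = {
    "straightforward": (10, 1),
    "scheduled": (6, 1),
    "unrolled x4": (28, 4),
    "unrolled x4 + scheduled": (14, 4),
}

# Where the 28 cycles of the unscheduled unrolled loop go:
# 14 instructions, 1 stall per L.D, 2 stalls per ADD.D,
# 1 stall after DADDUI and 1 after BNE.
assert 14 + 4 * 1 + 4 * 2 + 1 + 1 == 28

baseline = cycles_per_element(*SCHEDULES["straightforward"])
for name, (cycles, elements) in SCHEDULES.items():
    cpe = cycles_per_element(cycles, elements)
    print(f"{name:26s}{cpe:5.1f} cycles/element, speedup {baseline / cpe:.2f}x")
```

Unrolling alone is slower per element than the scheduled single-iteration loop (7 versus 6 cycles). The gain comes from having enough independent work to fill the stall slots once the unrolled code is rescheduled.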

Slide 12: Decisions and transformations
- Can S.D move after DADDUI and BNE? If so, adjust the S.D offset.
- Are loop iterations independent?
- Do register renaming.
- Remove extra loop termination tests and adjust the code.
- Analyze addresses: can loads/stores be reordered?
- Schedule the code, preserving dependences.

Slide 13: Dependences in unrolled loop
Loop: L.D    F0, 0(R1)
      ADD.D  F4, F0, F2
      S.D    F4, 0(R1)
      DADDUI R1, R1, #-8       ; drop BNE
      L.D    F0, 0(R1)
      ADD.D  F4, F0, F2
      S.D    F4, 0(R1)
      DADDUI R1, R1, #-8       ; drop BNE
      L.D    F0, 0(R1)
      ADD.D  F4, F0, F2
      S.D    F4, 0(R1)
      DADDUI R1, R1, #-8       ; drop BNE
      L.D    F0, 0(R1)
      ADD.D  F4, F0, F2
      S.D    F4, 0(R1)
      DADDUI R1, R1, #-8
      BNE    R1, R2, Loop

Slide 14: Remove extra DADDUI
Loop: L.D    F0, 0(R1)
      ADD.D  F4, F0, F2
      S.D    F4, 0(R1)         ; drop DADDUI and BNE
      L.D    F0, -8(R1)
      ADD.D  F4, F0, F2
      S.D    F4, -8(R1)        ; drop DADDUI and BNE
      L.D    F0, -16(R1)
      ADD.D  F4, F0, F2
      S.D    F4, -16(R1)       ; drop DADDUI and BNE
      L.D    F0, -24(R1)
      ADD.D  F4, F0, F2
      S.D    F4, -24(R1)
      DADDUI R1, R1, #-32
      BNE    R1, R2, Loop
Offsets in loads/stores adjusted.

Slide 15: False dependences
Loop: L.D    F0, 0(R1)
      ADD.D  F4, F0, F2
      S.D    F4, 0(R1)         ; drop DADDUI and BNE
      L.D    F0, -8(R1)
      ADD.D  F4, F0, F2
      S.D    F4, -8(R1)        ; drop DADDUI and BNE
      L.D    F0, -16(R1)
      ADD.D  F4, F0, F2
      S.D    F4, -16(R1)       ; drop DADDUI and BNE
      L.D    F0, -24(R1)
      ADD.D  F4, F0, F2
      S.D    F4, -24(R1)
      DADDUI R1, R1, #-32
      BNE    R1, R2, Loop
Every copy of the loop body reuses F0 and F4, so the copies are tied together by name (WAR and WAW) dependences even though no data flows between them.

Slide 16: Removing false dependences
Loop: L.D    F0, 0(R1)
      ADD.D  F4, F0, F2
      S.D    F4, 0(R1)         ; drop DADDUI and BNE
      L.D    F6, -8(R1)
      ADD.D  F8, F6, F2
      S.D    F8, -8(R1)        ; drop DADDUI and BNE
      L.D    F10, -16(R1)
      ADD.D  F12, F10, F2
      S.D    F12, -16(R1)      ; drop DADDUI and BNE
      L.D    F14, -24(R1)
      ADD.D  F16, F14, F2
      S.D    F16, -24(R1)
      DADDUI R1, R1, #-32
      BNE    R1, R2, Loop
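
The renaming step can be sketched as giving every copy of the loop body its own destination registers. The Python below is an illustration under simplifying assumptions, not a compiler pass: instructions are (opcode, destination, sources) tuples without offsets, the pool of free registers is supplied by hand (F2 is left out because it holds s), and the names rename_copies, BODY and FREE_REGS are made up.

```python
def rename_copies(copies, free_regs):
    """Give each unrolled copy fresh destination registers.

    copies:    list of loop bodies, each a list of (op, dest_or_None, [sources])
    free_regs: registers that may be handed out, in order
    """
    pool = iter(free_regs)
    renamed = []
    for body in copies:
        mapping = {}   # old register name -> new name, valid within this copy
        for op, dest, srcs in body:
            # Reads pick up the new name of a value produced earlier in this copy.
            new_srcs = [mapping.get(reg, reg) for reg in srcs]
            new_dest = dest
            if dest is not None:
                mapping[dest] = next(pool)
                new_dest = mapping[dest]
            renamed.append((op, new_dest, new_srcs))
    return renamed

BODY = [
    ("L.D", "F0", ["R1"]),
    ("ADD.D", "F4", ["F0", "F2"]),
    ("S.D", None, ["F4", "R1"]),
]
FREE_REGS = ["F0", "F4", "F6", "F8", "F10", "F12", "F14", "F16"]

for op, dest, srcs in rename_copies([BODY] * 4, FREE_REGS):
    print(op, dest, srcs)
```

With this register pool the four copies write F0/F4, F6/F8, F10/F12 and F14/F16, matching the register choice on the slide.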

Slide 17: True dependences
Loop: L.D    F0, 0(R1)
      ADD.D  F4, F0, F2
      S.D    F4, 0(R1)         ; drop DADDUI and BNE
      L.D    F6, -8(R1)
      ADD.D  F8, F6, F2
      S.D    F8, -8(R1)        ; drop DADDUI and BNE
      L.D    F10, -16(R1)
      ADD.D  F12, F10, F2
      S.D    F12, -16(R1)      ; drop DADDUI and BNE
      L.D    F14, -24(R1)
      ADD.D  F16, F14, F2
      S.D    F16, -24(R1)
      DADDUI R1, R1, #-32
      BNE    R1, R2, Loop
Within each copy the true dependences remain: L.D feeds ADD.D, and ADD.D feeds S.D.
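
The difference between true and false dependences can be checked mechanically from register names. The following Python sketch classifies direct RAW (true), WAR and WAW (name) dependences for two copies of the loop body, before and after renaming. It is a simplified illustration: memory dependences are ignored, and the function and variable names are made up.

```python
def dependences(instrs):
    """Return (kind, producer_index, consumer_index, register) tuples.

    instrs is a list of (dest_or_None, [source registers]).
    Only direct dependences are reported, i.e. those with no
    intervening write to the same register.
    """
    found = []
    for j, (dest_j, srcs_j) in enumerate(instrs):
        for i in range(j):
            dest_i, srcs_i = instrs[i]
            between = [d for d, _ in instrs[i + 1:j]]
            if dest_i is not None and dest_i not in between:
                if dest_i in srcs_j:
                    found.append(("RAW", i, j, dest_i))   # true dependence
                if dest_i == dest_j:
                    found.append(("WAW", i, j, dest_i))   # output dependence
            if dest_j is not None and dest_j in srcs_i and dest_j not in between:
                found.append(("WAR", i, j, dest_j))       # antidependence
    return found

# Two copies of the body: L.D, ADD.D, S.D (S.D writes no register).
SHARED = [("F0", ["R1"]), ("F4", ["F0", "F2"]), (None, ["F4", "R1"]),
          ("F0", ["R1"]), ("F4", ["F0", "F2"]), (None, ["F4", "R1"])]
RENAMED = [("F0", ["R1"]), ("F4", ["F0", "F2"]), (None, ["F4", "R1"]),
           ("F6", ["R1"]), ("F8", ["F6", "F2"]), (None, ["F8", "R1"])]

def count(instrs, kinds):
    return sum(1 for kind, *_ in dependences(instrs) if kind in kinds)

print("shared registers :", count(SHARED, {"RAW"}), "true,",
      count(SHARED, {"WAR", "WAW"}), "false")
print("renamed registers:", count(RENAMED, {"RAW"}), "true,",
      count(RENAMED, {"WAR", "WAW"}), "false")
```

Renaming removes the WAR and WAW entries and leaves the RAW chain inside each copy untouched. Those RAW dependences are the ones a schedule has to respect.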