Pipelining – Loop unrolling and Multiple Issue


1 Pipelining – Loop unrolling and Multiple Issue
CS/COE 1541 (term 2174) Jarrett Billingsley

2 Class Announcements I'll make your study guide tonight!
Honest! I swear! ...I say, every single time... It'll explain the format, topics, and have some practice questions. I will have your homework and quizzes graded by Monday. I will probably send the grades for them out earlier. I can send out the quiz solutions too so you can study. Project 1 comes after the exam. Probably have a month to do it. 1/25/2017 CS/COE 1541 term 2174

3 But first... Finishing branch prediction

4 Loop unrolling

5 Reducing branch frequency
The fastest code is the code that never runs. One way to make branches faster is to... not branch as much. Loop unrolling is a compiler technique to reduce the number of branches. It does this by duplicating the loop body, reducing the number of iterations needed.
Original loop:
    for(i = 0; i < 100; i++)
        a[i] = b[i] + c[i];
Unrolled loop (2X):
    for(i = 0; i < 100; i += 2){
        a[i] = b[i] + c[i];
        a[i+1] = b[i+1] + c[i+1];
    }
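The 2X example works cleanly because 100 is a multiple of 2. As a minimal sketch of what happens when the trip count is not a multiple of the unroll factor, the version below unrolls 4X and finishes any leftover elements in a cleanup loop. The function names `add_simple` and `add_unrolled4` and the use of `int` arrays are assumptions made for illustration; they are not from the slides.

```c
#include <stddef.h>

/* Baseline: one loop branch per element. */
void add_simple(int *a, const int *b, const int *c, size_t n) {
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}

/* Unrolled 4X: one loop branch per four elements,
   then a cleanup loop for the last n % 4 elements. */
void add_unrolled4(int *a, const int *b, const int *c, size_t n) {
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        a[i]     = b[i]     + c[i];
        a[i + 1] = b[i + 1] + c[i + 1];
        a[i + 2] = b[i + 2] + c[i + 2];
        a[i + 3] = b[i + 3] + c[i + 3];
    }
    for (; i < n; i++)          /* runs at most 3 times */
        a[i] = b[i] + c[i];
}
```

Both functions should produce identical results for any `n`; the unrolled one simply reaches them with fewer loop branches, at the cost of a larger loop body.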

6 How far to unroll? The previous example doubled the code in the loop. Of course we can unroll 3X, 4X, 8X... what are the tradeoffs? Space vs. time is the big one. But memory today is big, network connections are fast... is this so much of a problem? Well... Caching is the big bottleneck these days. The bigger the code is, the less of it will fit in the cache. This is bad, as we'll see.
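To put rough numbers on the space vs. time tradeoff, here is a small sketch that counts loop branches executed and copies of the loop body for a given unroll factor. It assumes a simplified model that is not from the slides: one branch per loop iteration, and a one-element cleanup loop for leftovers. The helper names `branches_executed` and `body_copies` are hypothetical.

```c
/* Loop branches executed for n elements unrolled k times:
   n / k iterations of the unrolled loop plus n % k cleanup iterations.
   Assumes one branch per iteration and k >= 1. */
unsigned branches_executed(unsigned n, unsigned k) {
    return n / k + n % k;
}

/* Copies of the loop body in the generated code:
   k in the unrolled loop, plus 1 in the cleanup loop when k > 1. */
unsigned body_copies(unsigned k) {
    return k == 1 ? 1 : k + 1;
}
```

Under this model, for the 100-element loop, going from 1X to 2X halves the branches from 100 to 50, while going from 4X to 8X only drops them from 25 to 16, even though the body grows from 5 copies to 9. The savings shrink as the code grows, which is where the cache concern comes in.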

7 Multiple Issue (Superscalar) CPUs

8 From CPI to IPC So far, the optimum CPI has been 1: one cycle to complete each instruction. But what if we could go below 1? (wat? half a cycle per instruction? well, no.) Instead of fetching just ONE instruction each cycle... Fetch two! Now we measure performance in IPC: Instructions per Cycle.
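The two metrics are reciprocals of each other, which can be written out as a short worked relation. The ideal two-wide case below is an illustration, assuming both issue slots are filled every cycle:

```latex
\[
  \mathrm{IPC} = \frac{\text{instructions completed}}{\text{cycles}} = \frac{1}{\mathrm{CPI}}
\]
\[
  \text{ideal single issue: } \mathrm{CPI} = 1,\ \mathrm{IPC} = 1
  \qquad
  \text{ideal dual issue: } \mathrm{CPI} = 0.5,\ \mathrm{IPC} = 2
\]
```

A CPI of 0.5 here is an average over the instruction stream. No single instruction finishes in half a cycle; two of them finish in the same cycle.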

9 Down the wrong pipe A common arrangement is to have multiple asymmetric pipelines in the CPU: for example, one to do ALU/Branches and one to do loads and stores.
[Figure: block diagram with labels ALU Pipe, I-Mem, Ins. Decoder, Register File, D-Mem, ALU, Memory Pipe]

10 Keeping the pipelines full
It's now up to the compiler (once again!) to schedule instructions in such a way that the pipelines are well-utilized.
    lw   $t0, 0($s1)
    lw   $t1, -4($s1)
    addi $s1, $s1, -8
    add  $t0, $t0, $s2
    add  $t1, $t1, $s2
    sw   $t0, 8($s1)
    sw   $t1, 4($s1)

    CC  ALU Pipe  Mem Pipe
    1             lw t0
    2   addi s1   lw t1
    3   add t0
    4   add t1    sw t0
    5             sw t1
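One way to judge how well-utilized the pipelines are is to count filled issue slots. The sketch below transcribes the five-cycle schedule by hand, with loads and stores in the memory pipe and ALU operations in the ALU pipe, and computes the resulting IPC. The type `issue_cycle` and the functions `instructions_issued` and `ipc` are hypothetical names chosen for this illustration.

```c
#include <stddef.h>

/* One issue slot per pipe per cycle; NULL marks an empty slot (a bubble). */
struct issue_cycle {
    const char *alu_pipe;
    const char *mem_pipe;
};

/* The five-cycle schedule for the seven instructions, transcribed by hand. */
static const struct issue_cycle schedule[] = {
    { NULL,       "lw $t0" },
    { "addi $s1", "lw $t1" },
    { "add $t0",  NULL     },
    { "add $t1",  "sw $t0" },
    { NULL,       "sw $t1" },
};

/* Number of instructions issued across the given cycles. */
size_t instructions_issued(const struct issue_cycle *s, size_t cycles) {
    size_t count = 0;
    for (size_t i = 0; i < cycles; i++) {
        if (s[i].alu_pipe != NULL) count++;
        if (s[i].mem_pipe != NULL) count++;
    }
    return count;
}

/* Instructions per cycle achieved by the schedule. */
double ipc(const struct issue_cycle *s, size_t cycles) {
    return (double)instructions_issued(s, cycles) / (double)cycles;
}
```

For this schedule, 7 instructions issue in 5 cycles, an IPC of 1.4 against a peak of 2. Three of the ten slots stay empty, which is the utilization the compiler's scheduling is trying to improve.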

11 I told you about the compiler, bro
What's wrong with the compiler doing this instruction scheduling? Well, the code will run the same way every time, unlike branches. But the architecture could change, and updating compilers and recompiling code sucks.

12 Oh no What about data dependencies? Oh dear lord.
What about pipeline flushes?
[Figure: block diagram with labels I-Mem, Ins. Decoder, Register File, D-Mem, ALU]

