Pipelining – Loop unrolling and Multiple Issue

CS/COE 1541 (term 2174) Jarrett Billingsley

Class Announcements
I'll make your study guide tonight! Honest! I swear! ...I say, every single time...
It'll explain the format, topics, and have some practice questions.
I will have your homework and quizzes graded by Monday. I will probably send the grades for them out earlier.
I can send out the quiz solutions too so you can study.
Project 1 comes after the exam. Probably have a month to do it.
1/25/2017 CS/COE 1541 term 2174

But first... Finishing branch prediction

Loop unrolling

Reducing branch frequency
The fastest code is the code that never runs. One way to make branches faster is to... not branch as much. Loop unrolling is a compiler technique to reduce the number of branches. It does this by duplicating the loop body, reducing the number of iterations needed.

Original loop:

    for(i = 0; i < 100; i++)
        a[i] = b[i] + c[i];

Unrolled loop (2X):

    for(i = 0; i < 100; i += 2){
        a[i] = b[i] + c[i];
        a[i+1] = b[i+1] + c[i+1];
    }
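The 2X example works out neatly because 100 is a multiple of 2. When the trip count is not a multiple of the unroll factor, something has to pick up the leftovers. Here is a minimal sketch of one common way to do that, assuming a hand-unrolled 4X loop followed by a cleanup loop. The function name `add_arrays_unrolled4` and the `size_t n` parameter are made up for illustration; they are not from the slides.

```c
#include <stddef.h>

/* Hypothetical helper: element-wise a = b + c, unrolled 4X by hand. */
void add_arrays_unrolled4(int *a, const int *b, const int *c, size_t n)
{
    size_t i = 0;

    /* Main unrolled loop: one loop branch per four elements. */
    for (; i + 4 <= n; i += 4) {
        a[i]     = b[i]     + c[i];
        a[i + 1] = b[i + 1] + c[i + 1];
        a[i + 2] = b[i + 2] + c[i + 2];
        a[i + 3] = b[i + 3] + c[i + 3];
    }

    /* Cleanup loop: handles the 0 to 3 elements left over. */
    for (; i < n; i++)
        a[i] = b[i] + c[i];
}
```

In practice the compiler does this for you; writing it by hand is mostly useful for seeing what the generated code has to look like.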

How far to unroll?
The previous example doubled the code in the loop. Of course we can unroll 3X, 4X, 8X... what are the tradeoffs? Space vs. time is the big one. But memory today is big, network connections are fast... is this so much of a problem? Well...... Caching is the big bottleneck these days. The bigger the code is, the less of it will fit in the cache. This is bad, as we'll see.
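To put rough numbers on the space vs. time tradeoff, here is a small sketch that counts loop branches against copies of the loop body. It rests on a simplifying model that is an assumption, not something from the slides: leftover elements are handled one at a time by a separate cleanup loop, and only the loop-closing branches are counted. The names `loop_branches` and `body_copies` are hypothetical.

```c
#include <stddef.h>

/* Loop-closing branches executed for n elements at unroll factor u:
   one per trip of the unrolled loop, plus one per leftover element
   handled by the cleanup loop. (Assumed model; exit checks ignored.) */
size_t loop_branches(size_t n, size_t u)
{
    return n / u + n % u;
}

/* Copies of the loop body in the program text: u in the unrolled loop,
   plus one more in the cleanup loop whenever the loop is unrolled at all. */
size_t body_copies(size_t u)
{
    return u > 1 ? u + 1 : u;
}
```

Under this model, with n = 100, going from 1X to 2X drops the branch count from 100 to 50, while going from 4X to 8X only drops it from 25 to 16. The body keeps growing the whole time (1, 3, 5, 9 copies), so the returns shrink while the code size does not.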

Multiple Issue (Superscalar) CPUs

From CPI to IPC
So far, the optimum CPI has been 1. One cycle to complete each instruction. But what if we could go below 1? (wat? half a cycle per instruction? well, no.) Instead of fetching just ONE instruction each cycle... Fetch two! Now we measure performance in IPC: Instructions per Cycle.
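CPI and IPC are the same measurement turned upside down, which a couple of lines of arithmetic can make concrete. This is only a sketch; the function names `cpi` and `ipc` and the instruction and cycle counts in the note below are made up for illustration.

```c
/* Cycles per instruction for a run of `instructions` that took `cycles`. */
double cpi(unsigned long instructions, unsigned long cycles)
{
    return (double)cycles / (double)instructions;
}

/* Instructions per cycle: the reciprocal of CPI. */
double ipc(unsigned long instructions, unsigned long cycles)
{
    return (double)instructions / (double)cycles;
}
```

For a hypothetical run of 1000 instructions, a perfect single-issue pipeline takes about 1000 cycles (CPI 1, IPC 1). A perfect two-issue machine would take about 500, which is CPI 0.5 or, said the nicer way, IPC 2.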

Down the wrong pipe
A common arrangement is to have multiple asymmetric pipelines in the CPU: for example, one to do ALU/Branches and one to do loads and stores.
[Diagram: datapath with two pipelines. Labels: I-Mem, Ins. Decoder, Register File, ALU Pipe, Memory Pipe, ALU, D-Mem.]

Keeping the pipelines full
It's now up to the compiler (once again!) to schedule instructions in such a way that the pipelines are well-utilized.

    lw   $t0, 0($s1)
    lw   $t1, -4($s1)
    addi $s1, $s1, -8
    add  $t0, $t0, $s2
    add  $t1, $t1, $s2
    sw   $t0, 8($s1)
    sw   $t1, 4($s1)

    CC   ALU Pipe   Mem Pipe
    1    -          lw t0
    2    addi s1    lw t1
    3    add t0     -
    4    add t1     sw t0
    5    -          sw t1
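One way to judge a schedule like this is to count how many issue slots actually got filled. The sketch below encodes the five-cycle schedule as two arrays and counts the non-empty entries. The array and function names are hypothetical, and using `NULL` for an empty slot is just a convenience of this sketch.

```c
#include <stddef.h>

#define CYCLES 5

/* The five-cycle schedule, one entry per clock cycle; NULL is an empty slot. */
static const char *const alu_pipe[CYCLES] = { NULL, "addi s1", "add t0", "add t1", NULL };
static const char *const mem_pipe[CYCLES] = { "lw t0", "lw t1", NULL, "sw t0", "sw t1" };

/* Number of filled slots in one pipe. */
int filled_slots(const char *const *pipe, int cycles)
{
    int count = 0;
    for (int cc = 0; cc < cycles; cc++)
        if (pipe[cc] != NULL)
            count++;
    return count;
}

/* Instructions issued across both pipes. */
int instructions_issued(void)
{
    return filled_slots(alu_pipe, CYCLES) + filled_slots(mem_pipe, CYCLES);
}

/* Issue slots available: two pipes, one slot each per cycle. */
int slots_available(void)
{
    return 2 * CYCLES;
}
```

For this schedule that is 7 instructions in 10 slots over 5 cycles, so 1.4 instructions per cycle: better than 1, but short of the ideal 2 because three slots sit empty.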

I told you about the compiler, bro
What's wrong with the compiler doing this instruction scheduling? Well, the code will run the same way every time, unlike branches, so a fixed schedule is at least predictable. But the architecture could change, and a schedule tuned for one set of pipelines may not suit another. Updating compilers and recompiling code sucks.

Oh no
What about data dependencies? Oh dear lord.
What about pipeline flushes?
[Diagram: datapath. Labels: I-Mem, Ins. Decoder, Register File, D-Mem, ALU.]
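To make the data dependency worry a bit more concrete, here is a rough sketch of the kind of check that decides whether two instructions can share an issue cycle. Everything in it is an assumption for illustration: the `struct instr` record, the function names, and the simplification that only register read-after-write and write-after-write conflicts matter. Memory dependencies and which pipe each instruction needs are ignored.

```c
#include <stdbool.h>

/* Hypothetical, simplified instruction record: one destination register
   and up to two source registers. -1 means "no register". */
struct instr {
    int dest;
    int src1;
    int src2;
};

/* True if `later` reads the register that `earlier` writes (read-after-write). */
bool reads_result_of(const struct instr *earlier, const struct instr *later)
{
    if (earlier->dest < 0)
        return false;
    return later->src1 == earlier->dest || later->src2 == earlier->dest;
}

/* True if the two instructions, given in program order, have no register
   read-after-write or write-after-write conflict between them. */
bool can_issue_together(const struct instr *earlier, const struct instr *later)
{
    bool same_dest = earlier->dest >= 0 && earlier->dest == later->dest;
    return !reads_result_of(earlier, later) && !same_dest;
}
```

With MIPS register numbers ($t0 = 8, $t1 = 9, $s1 = 17, $s2 = 18), `lw $t1, -4($s1)` followed by `addi $s1, $s1, -8` passes the check, while `add $t0, $t0, $s2` followed by `sw $t0, 8($s1)` does not, because the store needs the sum the add has not produced yet.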
