10/11: Lecture Topics Execution cycle Introduction to pipelining

Name: 10/11: Lecture Topics Execution cycle Introduction to pipelining
Uploaded: 2017-12-12T16:48:00+00:00
Duration: PTM11S3
Channel: Poppy Copeland
Description: 10/11: Lecture Topics Execution cycle Introduction to pipelining

10/11: Lecture Topics Execution cycle Introduction to pipelining
Data hazards

Office Hours Changing Th 8:30-9:30 to Mo 2-3 New office hours are
Tu 2:30-3:30

Execution Cycle Five steps to executing an instruction: 1. Fetch
IF ID EX MEM WB Five steps to executing an instruction: 1. Fetch Get the next instruction to execute from memory onto the chip 2. Decode Figure out what the instruction says to do Get values from registers 3. Execute Do what the instruction says; for example, On a memory reference, add up base and offset On an arithmetic instruction, do the math

More Execution Cycle 4. Memory Access 5. Write back IF ID EX MEM WB
If it’s a load or store, access memory If it’s a branch, replace the PC with the destination address Otherwise do nothing 5. Write back Place the result of the operation in the appropriate register

add $s0, $s1, $s2 IF get instruction at PC from memory
it’s ID determine what … is … is add get contents of $s1 and $s2 ($s1=7, $s2=12) EX add 7 and 12 = 19 MEM do nothing WB store 19 in $s0

lw $t2, 16($s0) IF get instruction at PC from memory
it’s ID determine what is is lw get contents of $s0 and $t2 (we don’t know that we don’t care about $t2) $s0=0x200D1C00, $t2=77763 EX add 16 to 0x200D1C00 = 0x200D1C10 MEM load the word stored at 0x200D1C10 WB store loaded value in $t2

Latency & Throughput IF ID EX MEM WB 1 2 3 4 5 6 7 8 9 10 inst 1 inst 2 Latency—the time it takes for an individual instruction to execute What’s the latency for this implementation? Throughput—the number of instructions that execute per minute What’s the throughput of this implementation?

A case for pipelining The functional units are being underutilized
the instruction fetcher is used once every five clock cycles why not have it fetch a new instruction every clock cycle? Pipelining overlaps the stages of execution so every stage has something to due each cycle A pipeline with N stages could speedup by N times, but each stage must take the same amount of time each stage must always have work to do Also, latency for each instruction may go up, but why don’t we care?

Unpipelined Assembly Line
What is the latency of this assembly line, i.e. for how many cycles is the plane on the assembly line? What is the throughput of this assembly line, i.e. how many planes are manufactured each cycle?

Pipelined Assembly Line
The assembly line has 5 stages If a plane isn’t ready to go to the next stage then the pipeline stalls that stage and all stages before it freeze The gap in the assembly line is known as a bubble

Pipelined Analysis What is the latency? What is the throughput?
What is the speed up? (Speed up = Old Time / New Time)

Pipeline Example 1 2 3 4 5 6 7 8 9 10 IF ID EX MEM WB
add $s0, $s1, $s2 sub $s3, $s2, $s3 lw $s2, 20($t0) sw $s0, 16($s1) and $t1, $t2, $t3 IF ID EX MEM WB

Pipelined Xput and Latency
1 2 3 4 5 6 7 8 9 IF ID EX MEM WB inst 1 inst 2 inst 3 inst 4 inst 5 What’s the throughput of this implementation? What’s the latency of this implementation?

Data Hazards What happens in the following code?
IF ID EX MEM WB add $s0, $s1, $s2 IF ID EX MEM WB add $s4, $s3, $s0 $s0 is read here $s0 is written here This is called as a data dependency When it causes a pipeline stall it is called a data hazard

Solution: Stall Stall the pipeline until the result is available
add s0,s1,s2 IF ID EX MEM WB add s4,s3,s0 IF stall ID EX MEM WB Stall the pipeline until the result is available

Solution: Read & Write in same Cycle
Write the register in the first part of the clock cycle Read it in the second part of the clock cycle add s0,s1,s2 add s4,s3,s0 IF stall ID EX MEM WB write $s0 read $s0 A stall of two cycles is still required

Solution: Forwarding The value of $s0 is known after cycle 3 (after the first instruction’s EX stage) The value of $s0 isn’t needed until cycle 4 (before the second instruction’s EX stage) If we forward the result there isn’t a stall add s0,s1,s2 add s4,s3,s0 IF ID EX MEM WB

Another data hazard What if the first instruction is lw?
lw s0,0(s2) add s4,s3,s0 IF ID EX MEM WB s0 isn’t known until after the MEM stage We can’t forward back into the past Either stall or reorder instructions

Solutions to the lw hazard
We can stall for one cycle, but we hate to stall lw s0,0(s2) add s4,s3,s0 IF ID EX MEM WB Try to execute an unrelated instruction between the two instructions lw s0,0(s2) IF ID EX MEM WB IF ID EX MEM WB sub t4,t2,t3 IF ID EX MEM WB add s4,s3,s0 sub t4,t2,t3

Reordering Instructions
Reordering instructions is a common technique for avoiding pipeline stalls Sometimes the compiler does the reordering statically Almost all modern processors do this reordering dynamically they can see several instructions and they execute anyone that has no dependency this is known as out-of-order execution and is very complicated to implement

Control Hazards Branch instructions cause control hazards because we don’t know which instruction to execute next IF ID EX MEM WB bne $s0, $s1, next add $s4, $s3, $s0 ... IF ID EX MEM WB next: sub $s4, $s3, $s0 do we fetch add or sub? we don’t know until here

10/11: Lecture Topics Execution cycle Introduction to pipelining

Similar presentations

Presentation on theme: "10/11: Lecture Topics Execution cycle Introduction to pipelining"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

10/11: Lecture Topics Execution cycle Introduction to pipelining

Similar presentations

Presentation on theme: "10/11: Lecture Topics Execution cycle Introduction to pipelining"— Presentation transcript:

Similar presentations

About project

Feedback