Lecture 15: Pipelined Datapath
Soon Tee Teoh
CS 147
Control and Datapath (Figure 9-15, page 471)
[Figure: single-cycle control and datapath. Components: Branch Control, PC, Extend, Instruction Memory, Instruction Decoder, Zero fill, MUX B, Register File, Function Unit, Data Memory, MUX D. Control signals: DA, AA, BA, MB, FS, MD, RW, MW, PL, JB, BC. Status bits: V, C, N, Z.]
Instructions in this Architecture

Instruction       Format        Description
Add               RD, RA, RB    R[DR] ← R[SA] + R[SB]
OR                RD, RA, RB    R[DR] ← R[SA] ∨ R[SB]
Shift Right       RD, RB        R[DR] ← sr R[SB]
Load Immediate    RD, OP        R[DR] ← zf OP
Load              RD, RA        R[DR] ← M[SA]
Store             RD, RA        M[SA] ← R[SB]
Branch on Zero    RA, AD        if (R[SA] == 0) PC ← PC + se AD
Jump              RA            PC ← R[SA]

zf means zero-fill
se means sign-extend
sr means shift right
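The register transfers in this instruction list can be tried out with a small interpreter. This is only a sketch under assumptions that are not in the lecture: 16-bit words, 8 registers, a 6-bit branch offset, the PC advancing by 1 after every instruction that does not redirect it, and M[SA] read as "memory at the address held in R[SA]". The mnemonics (ADD, OR, SHR, LDI, LD, ST, BRZ, JMP) and the function names are made up for illustration.

```python
MASK = 0xFFFF  # assumed 16-bit word size (not stated in the lecture)


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    if value & (1 << (bits - 1)):
        return value - (1 << bits)
    return value


def step(state, op, dr=0, sa=0, sb=0, imm=0, imm_bits=6):
    """Apply one instruction to state = {"R": registers, "M": memory, "PC": int}."""
    R, M = state["R"], state["M"]
    next_pc = state["PC"] + 1  # assumed default: fall through to the next instruction
    if op == "ADD":
        R[dr] = (R[sa] + R[sb]) & MASK
    elif op == "OR":
        R[dr] = R[sa] | R[sb]
    elif op == "SHR":
        R[dr] = R[sb] >> 1
    elif op == "LDI":
        R[dr] = imm & MASK  # zero-fill: upper bits are simply 0
    elif op == "LD":
        R[dr] = M.get(R[sa], 0)  # unwritten memory reads as 0 (assumption)
    elif op == "ST":
        M[R[sa]] = R[sb]
    elif op == "BRZ":
        if R[sa] == 0:
            next_pc = state["PC"] + sign_extend(imm, imm_bits)
    elif op == "JMP":
        next_pc = R[sa]
    else:
        raise ValueError(f"unknown instruction: {op}")
    state["PC"] = next_pc & MASK
    return state


state = {"R": [0] * 8, "M": {}, "PC": 0}
step(state, "LDI", dr=1, imm=5)
step(state, "LDI", dr=2, imm=3)
step(state, "ADD", dr=3, sa=1, sb=2)
print(state["R"][3], state["PC"])  # 8 3
```

Each branch of `step` is one row of the instruction list; only the word size, the offset width, and the default PC increment are added on top of it.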
Timing Assumptions
Branch Control: 1ns
PC Read: 1ns
PC Write: 2ns
Instruction Memory Read: 3ns
Instruction Decoder: 1ns
Extend: 1ns
Zero Fill: 1ns
Register File Read: 1ns
Register File Write: 2ns
2-to-1 MUX: 1ns
Function Unit: 6ns
Memory Read: 3ns
Memory Write: 3ns
Note: “Register File Read” refers to the propagation delay time of a register. “Register File Write” refers to the set-up time of a register.
Delay for each component in Control and Datapath
[Figure: the control and datapath of Figure 9-15, with each component labeled with its delay from the timing assumptions.]
Time taken in Longest Path
[Figure: the control and datapath with the longest path marked (*).]
Total time for ADD instruction = PC read (1ns) + Instruction Memory read (3ns) + Instruction Decoder (1ns) + Register File read (1ns) + MUX B (1ns) + Function Unit (6ns) + MUX D (1ns) + Register File write (2ns) = 16ns
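The 16ns total can be checked by adding the component delays along the ADD path. The sketch below takes the delays from the timing assumptions; the exact list of components on the path is an assumption, chosen because it follows the datapath from PC to register write-back and sums to the stated 16ns. The names `DELAY_NS`, `ADD_PATH`, and `path_delay_ns` are illustrative.

```python
# Component delays in ns, copied from the timing assumptions.
DELAY_NS = {
    "pc_read": 1,
    "instruction_memory_read": 3,
    "instruction_decoder": 1,
    "register_file_read": 1,
    "mux": 1,
    "function_unit": 6,
    "register_file_write": 2,
}

# Assumed ADD path: PC -> instruction memory -> decoder -> register file
# -> MUX B -> function unit -> MUX D -> register file write-back.
ADD_PATH = [
    "pc_read",
    "instruction_memory_read",
    "instruction_decoder",
    "register_file_read",
    "mux",            # MUX B
    "function_unit",
    "mux",            # MUX D
    "register_file_write",
]


def path_delay_ns(path):
    """Sum the delays of the components a signal passes through in series."""
    return sum(DELAY_NS[component] for component in path)


add_delay_ns = path_delay_ns(ADD_PATH)
print(add_delay_ns)  # 16
```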
Pipelining Concept
Separate the laundry process into 3 steps: wash, rinse, dry.
Each step uses a different machine.
Suppose you have many loads. While the first load is drying, the second load is rinsing and the third load is washing.
All resources are in use simultaneously.
Like Henry Ford’s assembly line.
Pipelined Computer Architecture
Separate the process into separate parts. For our example, we separate it into 4 parts:
1. Instruction Fetch
2. Decode, Operand Fetch
3. Execute
4. Write Back
Insert new registers between stages to hold intermediate data and control signals.
Pipelined Computer (Figure 11-4, pg 550; ignore branch control for now)
[Figure: pipelined datapath. Components: PC, Instruction Memory, IR, Instruction Decoder, Register File, Zero fill, MUX B, Function Unit, Data Memory, MUX D. Pipeline registers between the stages carry the control signals FS, MW, DA, MD, and RW forward.]
Stages: Instruction Fetch (IF), Decode/Operand Fetch (DOF), Execute (EX), Write Back (WB)
Timing in Pipelined Computer (assume register read is 1ns and register write is 1ns)
[Figure: the pipelined datapath labeled with stage delays. The labels that survive are 5ns, 8ns, and 4ns.]
Clock cycle time
In the non-pipelined computer, the clock cycle needs to be at least 16ns. Clock frequency = 1/16ns = 62.5 MHz.
In the pipelined computer, the clock cycle needs to be at least 8ns, the time for the slowest stage. Clock frequency = 1/8ns = 125 MHz.
If we need to execute 7 instructions using the non-pipelined computer, we need 7 x 16 = 112 ns.
How much time do we need using the pipelined computer?
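The frequency and non-pipelined figures on this slide follow from two small formulas. As a quick check (function names are illustrative, not from the lecture):

```python
def clock_frequency_mhz(cycle_time_ns):
    """Frequency is the reciprocal of the period: 1 / (t ns) = 1000 / t MHz."""
    return 1000.0 / cycle_time_ns


def non_pipelined_time_ns(num_instructions, cycle_time_ns):
    """Without pipelining, each instruction takes one full clock cycle."""
    return num_instructions * cycle_time_ns


non_pipelined_mhz = clock_frequency_mhz(16)
pipelined_mhz = clock_frequency_mhz(8)
seven_instructions_ns = non_pipelined_time_ns(7, 16)

print(non_pipelined_mhz)      # 62.5
print(pipelined_mhz)          # 125.0
print(seven_instructions_ns)  # 112
```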
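Pipelined Computer timing
[Figure: pipeline timing diagram. Each instruction passes through IF, DOF, EX, and WB, spending one 8ns cycle in each stage, and each instruction starts one cycle after the previous one.]
We need 80ns: 7 instructions + 4 stages - 1 = 10 cycles, and 10 x 8ns = 80ns.
In general, # cycles needed = # instructions + # stages - 1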
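The cycle-count formula can be applied directly to the 7-instruction example. The sketch below also compares the result with the 112ns non-pipelined time; it assumes no stalls, as the slides do at this point, and the function names are illustrative.

```python
def pipelined_cycles(num_instructions, num_stages):
    """# cycles needed = # instructions + # stages - 1 (no stalls assumed)."""
    return num_instructions + num_stages - 1


def pipelined_time_ns(num_instructions, num_stages, cycle_time_ns):
    """Total time is the cycle count times the clock period."""
    return pipelined_cycles(num_instructions, num_stages) * cycle_time_ns


cycles = pipelined_cycles(7, 4)
total_ns = pipelined_time_ns(7, 4, 8)
speedup = (7 * 16) / total_ns  # non-pipelined time / pipelined time

print(cycles)    # 10
print(total_ns)  # 80
print(speedup)   # 1.4
```

For a long instruction stream the "+ # stages - 1" term becomes negligible, so the speedup approaches the ratio of the clock periods, 16ns / 8ns = 2.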
Pipelined Computer timing
[Figure: the pipeline timing diagram with 8ns cycles, divided into three phases.]
Filling: Not all stages active
All stages active
Emptying: Not all stages active
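The three phases can be reproduced by listing which instruction occupies each stage in every cycle. This is a minimal sketch assuming an ideal pipeline with no stalls; `occupancy` and `phase` are illustrative names.

```python
STAGES = ["IF", "DOF", "EX", "WB"]


def occupancy(num_instructions, stages=STAGES):
    """For each cycle, map stage name -> number (1-based) of the instruction in it."""
    total_cycles = num_instructions + len(stages) - 1
    table = []
    for cycle in range(total_cycles):
        row = {}
        for depth, name in enumerate(stages):
            instr = cycle - depth  # instruction i enters stage `depth` in cycle i + depth
            if 0 <= instr < num_instructions:
                row[name] = instr + 1
        table.append(row)
    return table


def phase(row, stages=STAGES):
    """Classify a cycle as filling, all stages active, or emptying."""
    if len(row) == len(stages):
        return "all stages active"
    # While new instructions are still being fetched, the pipeline is filling.
    return "filling" if stages[0] in row else "emptying"


table = occupancy(7)
phases = [phase(row) for row in table]
for cycle, (row, label) in enumerate(zip(table, phases), start=1):
    cells = " ".join(f"{name}={row.get(name, '-')}" for name in STAGES)
    print(f"cycle {cycle:2d}: {cells}  ({label})")
```

With 7 instructions and 4 stages this gives 3 filling cycles, 4 cycles with all stages active, and 3 emptying cycles, 10 cycles in total.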