Pipelined Datapath and Control (Lecture #13)
ECE 445 – Computer Organization
The slides included herein were taken from the materials accompanying Computer Organization and Design, 4th Edition, by Patterson and Hennessy, and were used with permission from Morgan Kaufmann Publishers.
Fall 2010, ECE Computer Organization
Material to be covered: Chapter 4, Sections 5–9 and 13–14
Performance of the Single-Cycle MIPS
Example: MIPS Clock Rate
Determine the clock rate for the MIPS architecture, assuming the following:
– The MIPS is a single-cycle machine
  – 1 clock cycle per instruction
  – CPI = 1
– Access time for memory units = 200 ps
– Operation time for ALU and adders = 100 ps
– Access time for register file = 50 ps
Example: MIPS Clock Rate
Instruction class   Functional units used by the instruction class
ALU instruction     Inst. fetch   Register   ALU   Register
Load word           Inst. fetch   Register   ALU   Memory   Register
Store word          Inst. fetch   Register   ALU   Memory
Branch              Inst. fetch   Register   ALU
Jump                Inst. fetch
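Example: MIPS Clock Rate
Delays in ps, using 200 ps per memory access, 100 ps per ALU operation, and 50 ps per register file access:
Instruction class   Instr. memory   Register read   ALU operation   Data memory   Register write   Total
ALU instruction     200             50              100             0             50               400 ps
Load word           200             50              100             200           50               600 ps
Store word          200             50              100             200           0                550 ps
Branch              200             50              100             0             0                350 ps
Jump                200             0               0               0             0                200 ps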
Example: MIPS Clock Rate
The clock cycle time for a machine with a single clock cycle per instruction is determined by the longest instruction. In this example, the load word instruction requires 600 ps. The clock rate is then:
Clock rate = 1 / Clock cycle time
Clock rate = 1 / 600 ps = 1.67 GHz
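
The clock-rate arithmetic in this example can be checked with a short script. This is a minimal sketch: the three delay values come from the example's stated assumptions, while the dictionary layout and function names are illustrative choices made here, not part of the slides.

```python
# Delays in picoseconds, taken from the example's assumptions.
MEMORY_PS = 200    # instruction or data memory access
ALU_PS = 100       # ALU or adder operation
REGFILE_PS = 50    # register file read or write

# Functional units on the path of each instruction class.
PATHS = {
    "ALU instruction": [MEMORY_PS, REGFILE_PS, ALU_PS, REGFILE_PS],
    "Load word":       [MEMORY_PS, REGFILE_PS, ALU_PS, MEMORY_PS, REGFILE_PS],
    "Store word":      [MEMORY_PS, REGFILE_PS, ALU_PS, MEMORY_PS],
    "Branch":          [MEMORY_PS, REGFILE_PS, ALU_PS],
    "Jump":            [MEMORY_PS],
}


def path_delays(paths):
    """Sum the unit delays along each instruction class's path."""
    return {name: sum(units) for name, units in paths.items()}


def clock_rate_ghz(cycle_time_ps):
    """Convert a cycle time in ps to a clock rate in GHz (1 / ps = 1000 GHz)."""
    return 1000.0 / cycle_time_ps


delays = path_delays(PATHS)
# In a single-cycle machine the slowest instruction sets the clock period.
cycle_time_ps = max(delays.values())

for name, delay in delays.items():
    print(f"{name:16s} {delay:4d} ps")
print(f"Clock cycle time = {cycle_time_ps} ps")
print(f"Clock rate = {clock_rate_ghz(cycle_time_ps):.2f} GHz")
```

The longest path belongs to load word, so its 600 ps total fixes the cycle time for every instruction, including the 200 ps jump.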
Performance Issues
– Longest delay determines clock period
  – Critical path: load word (lw) instruction
  – Instruction memory → register file → ALU → data memory → register file
– Not feasible to vary clock period for different instructions
– Violates design principle
  – Making the common case fast
– Improve performance by pipelining
How does pipelining work?
Pipelining Analogy (§4.5 An Overview of Pipelining)
– Pipelined laundry: overlapping execution
  – Parallelism improves performance
– Four loads: Speedup = 8/3.5 = 2.3
– Non-stop: Speedup = 2n/(0.5n + 1.5) ≈ 4 = number of stages
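
The laundry speedup figures can be reproduced numerically. The sketch below assumes four stages of 0.5 hours each, which is inferred from the slide's numbers (8 hours sequential for four loads, 3.5 hours pipelined); the function names are illustrative.

```python
STAGE_HOURS = 0.5   # assumed duration of each laundry stage
NUM_STAGES = 4      # assumed number of stages per load


def sequential_hours(loads):
    """Each load runs through all stages before the next one starts."""
    return loads * NUM_STAGES * STAGE_HOURS


def pipelined_hours(loads):
    """The first load takes all stages; each later load finishes one stage-time after it."""
    return (NUM_STAGES + loads - 1) * STAGE_HOURS


def speedup(loads):
    return sequential_hours(loads) / pipelined_hours(loads)


for n in (4, 40, 400, 4000):
    print(f"{n:5d} loads: speedup = {speedup(n):.2f}")
```

With four loads the ratio is 8/3.5; as the number of loads grows, the 1.5-hour fill time becomes negligible and the speedup approaches the number of stages.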
Objective: Keep all stages of the pipeline busy at all times.
Pipelining: Improving Performance
                Latency    Max. throughput
Non-pipelined   2 hours    0.5 loads/hour
Pipelined       2 hours    2 loads/hour
– Latency = time from the start of one load to the end of the same load.
– Maximum throughput = number of loads completed per hour.
– Assuming all stages of the pipeline are busy at all times.
– Length of time for each load does not change.
Pipelining: Improving Performance
Pipelining improves performance by increasing instruction throughput, rather than decreasing the execution time of an individual instruction.
The MIPS Pipeline
MIPS Pipeline
Five stages, one step per stage
– IF: Instruction fetch from memory
– ID: Instruction decode & register read
– EX: Execute operation or calculate address
– MEM: Access memory operand
– WB: Write result back to register
MIPS Pipeline
Pipeline Performance
– Assume time for stages is
  – 100 ps for register read or write
  – 200 ps for other stages
– Compare pipelined datapath with single-cycle datapath

Instr      Instr fetch   Register read   ALU op   Memory access   Register write   Total time
lw         200 ps        100 ps          200 ps   200 ps          100 ps           800 ps
sw         200 ps        100 ps          200 ps   200 ps                           700 ps
R-format   200 ps        100 ps          200 ps                   100 ps           600 ps
beq        200 ps        100 ps          200 ps                                    500 ps
Pipeline Performance
– Single-cycle (Tc = 800 ps): Why is the clock period 800 ps?
– Pipelined (Tc = 200 ps): Why is the clock period 200 ps?
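
One way to answer the two questions is to compute both clock periods from the stage times assumed on the previous slide. In this sketch the single-cycle period is the sum of the stages used by the slowest instruction (lw), and the pipelined period is the slowest single stage. The choice of three back-to-back instructions and the function names are assumptions made for illustration.

```python
# Stage times in ps: 100 ps for register read/write, 200 ps for other stages.
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

# Single-cycle: one clock must cover every stage of the slowest instruction (lw).
single_cycle_tc = sum(STAGE_PS.values())
# Pipelined: all stages share one clock, so the slowest stage sets the period.
pipelined_tc = max(STAGE_PS.values())


def single_cycle_time(n):
    """Total time for n instructions, one full clock period each."""
    return n * single_cycle_tc


def pipelined_time(n):
    """Total time for n instructions with no stalls: fill the pipe, then one per cycle."""
    return (len(STAGE_PS) + n - 1) * pipelined_tc


n = 3  # hypothetical instruction count
print(f"Tc single-cycle = {single_cycle_tc} ps, Tc pipelined = {pipelined_tc} ps")
print(f"{n} instructions: {single_cycle_time(n)} ps vs {pipelined_time(n)} ps")
```

Note that the 100 ps register stages are stretched to 200 ps in the pipeline, which is why an individual instruction takes 1000 ps pipelined instead of 800 ps.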
Pipeline Speedup
– If all stages are balanced
  – i.e., all take the same time
  – Time between instructions (pipelined) = Time between instructions (nonpipelined) / Number of stages
– If not balanced, speedup is less
– Speedup due to increased throughput
  – Latency (time for each instruction) does not decrease
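
The effect of stage balance on speedup can be illustrated with a small calculation. This is a sketch under two assumptions: the unbalanced stage times are the lecture's 200/100/200/200/100 ps values, and the balanced case uses a hypothetical 160 ps per stage so that both pipelines have the same 800 ps nonpipelined time.

```python
def steady_state_speedup(stage_times_ps):
    """Ratio of time between instructions, nonpipelined over pipelined.

    Nonpipelined: one instruction completes every sum(stage_times_ps).
    Pipelined: one instruction completes every max(stage_times_ps).
    """
    return sum(stage_times_ps) / max(stage_times_ps)


balanced = [160] * 5                     # hypothetical balanced stages
unbalanced = [200, 100, 200, 200, 100]   # stage times used in this lecture

print(f"Balanced:   speedup = {steady_state_speedup(balanced):.1f}")
print(f"Unbalanced: speedup = {steady_state_speedup(unbalanced):.1f}")
```

Only the balanced pipeline reaches a speedup equal to the number of stages; the unbalanced one is limited by its slowest stage.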
Pipelining and ISA Design
– MIPS ISA designed for pipelining
  – All instructions are 32 bits
    – Easier to fetch and decode in one cycle
    – cf. x86: 1- to 17-byte instructions
  – Few and regular instruction formats
    – Can decode and read registers in one step
  – Load/store addressing
    – Can calculate address in 3rd stage, access memory in 4th stage
  – Alignment of memory operands
    – i.e., on word boundaries
    – Memory access takes only one cycle
Pipeline Summary (The BIG Picture)
– Pipelining improves performance by increasing instruction throughput
  – Executes multiple instructions in parallel
  – Each instruction has the same latency
– Subject to hazards
  – Structure, data, control
  – Hazards will be discussed in upcoming lectures
– Instruction set design affects complexity of pipeline implementation
MIPS Pipelined Datapath (§4.6 Pipelined Datapath and Control)
Pipeline registers
– Need registers between stages
  – To hold information produced in previous cycle
– Why?
Pipeline Operation
– Cycle-by-cycle flow of instructions through the pipelined datapath
  – “Single-clock-cycle” pipeline diagram
    – Shows pipeline usage in a single cycle
    – Highlight resources used
  – “Multi-clock-cycle” diagram
    – Graph of operation over time
– We’ll look at “single-clock-cycle” diagrams for load word and store word.
IF for Load, Store, …
ID for Load, Store, …
EX for Load
MEM for Load
WB for Load
– Wrong register number. Why?
Corrected Datapath for Load
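
The register-number problem in the load's WB stage can be illustrated with a toy model of the four pipeline registers. This sketch assumes the textbook's explanation: in the uncorrected datapath the write-register number is read from the IF/ID register, which by the time the load reaches WB holds a later instruction, and the correction carries the number along through ID/EX, EX/MEM, and MEM/WB. The function name, the flag, and the register numbers 8 to 12 are hypothetical.

```python
def simulate_writeback(dest_regs, carry_dest_through_pipeline):
    """Return the register number each load's WB stage actually writes.

    dest_regs[i] is the intended destination register of the i-th load.
    """
    pipe = {"IF/ID": None, "ID/EX": None, "EX/MEM": None, "MEM/WB": None}
    pending = list(enumerate(dest_regs))   # (instruction index, destination)
    written = {}

    # Run until every instruction has drained from the pipeline.
    while pending or any(v is not None for v in pipe.values()):
        # Clock edge: each pipeline register captures its predecessor's value.
        pipe["MEM/WB"] = pipe["EX/MEM"]
        pipe["EX/MEM"] = pipe["ID/EX"]
        pipe["ID/EX"] = pipe["IF/ID"]
        pipe["IF/ID"] = pending.pop(0) if pending else None

        # The instruction now held in MEM/WB is in its WB stage.
        if pipe["MEM/WB"] is not None:
            index, carried_dest = pipe["MEM/WB"]
            if carry_dest_through_pipeline:
                # Corrected: use the number that travelled with the instruction.
                written[index] = carried_dest
            else:
                # Uncorrected: use whatever IF/ID holds, i.e. a later instruction.
                later = pipe["IF/ID"]
                written[index] = later[1] if later is not None else None

    return [written[i] for i in range(len(dest_regs))]


dests = [8, 9, 10, 11, 12]
print("uncorrected:", simulate_writeback(dests, False))
print("corrected:  ", simulate_writeback(dests, True))
```

In the uncorrected run, the first load writes the destination of the instruction three positions behind it, which is the mismatch the slide asks about.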
EX for Store
MEM for Store
WB for Store
Multi-Cycle Pipeline Diagram
– Form showing resource usage
Multi-Cycle Pipeline Diagram
– Traditional form
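
A traditional-form multi-cycle diagram can be generated as text: one row per instruction, one column per clock cycle, with each instruction starting one cycle after the previous one. This is a minimal sketch that assumes no stalls; the five-instruction program and the helper names are illustrative, not taken from the slide's figure.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def multi_cycle_diagram(instructions):
    """Return (instruction, per-cycle stage names) rows, assuming no stalls."""
    total_cycles = len(instructions) + len(STAGES) - 1
    rows = []
    for i, instr in enumerate(instructions):
        cells = [""] * total_cycles
        for s, stage in enumerate(STAGES):
            cells[i + s] = stage   # instruction i is in IF during cycle i + 1
        rows.append((instr, cells))
    return rows


def render(rows, label_width=20, cell_width=5):
    """Format the rows as an aligned plain-text table with a CC1..CCn header."""
    total_cycles = len(rows[0][1]) if rows else 0
    header = " " * label_width + "".join(
        f"CC{c + 1}".ljust(cell_width) for c in range(total_cycles))
    lines = [header.rstrip()]
    for instr, cells in rows:
        line = instr.ljust(label_width) + "".join(
            cell.ljust(cell_width) for cell in cells)
        lines.append(line.rstrip())
    return "\n".join(lines)


# Hypothetical program used only to illustrate the layout.
program = [
    "lw $10, 20($1)",
    "sub $11, $2, $3",
    "add $12, $3, $4",
    "lw $13, 24($1)",
    "add $14, $5, $6",
]
print(render(multi_cycle_diagram(program)))
```

Reading across a row gives one instruction's progress over time; reading down a column gives the contents of the pipeline in that cycle.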
Single-Cycle Pipeline Diagram
– State of pipeline in a given cycle
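
The state of the pipeline in a given cycle is a vertical slice of the multi-cycle diagram. The sketch below computes which instruction occupies each stage in a chosen cycle, assuming one instruction enters per cycle with no stalls; the instruction labels and the function name are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_state(instructions, cycle):
    """Map each stage to the instruction occupying it in a 1-based clock cycle.

    Stages with no instruction in that cycle map to None.
    """
    state = {}
    for s, stage in enumerate(STAGES):
        i = cycle - 1 - s   # instruction i enters IF in cycle i + 1
        state[stage] = instructions[i] if 0 <= i < len(instructions) else None
    return state


program = ["instr1", "instr2", "instr3", "instr4", "instr5"]  # hypothetical labels
for stage, instr in pipeline_state(program, 5).items():
    print(f"{stage:4s} {instr}")
```

In cycle 5 of a five-instruction program every stage is occupied: the newest instruction is in IF and the oldest is in WB.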
Questions?