Pipelining: Basic Concepts
1.1 Basic Concepts
Consider a task that can be divided into k subtasks.
The k subtasks are executed on k different stages.
Each subtask requires one time unit.
The total execution time of the task is k time units.
Pipelining overlaps the execution of successive tasks.
The k stages work in parallel on k different tasks.
Tasks enter and leave the pipeline at a rate of one task per time unit.
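The timing described above can be checked with a short calculation. The sketch below assumes an ideal pipeline (no stalls, every stage takes exactly one time unit); the function names are illustrative and do not come from the slides.

```python
def sequential_time(k, n):
    """Time units for n tasks when each task runs start to finish alone."""
    return k * n


def pipelined_time(k, n):
    """Time units for n tasks on an ideal k-stage pipeline (no stalls)."""
    # The first task needs k time units to pass through all k stages.
    # After that, one task leaves the pipeline every time unit.
    return k + (n - 1)


def speedup(k, n):
    """Ratio of sequential time to pipelined time."""
    return sequential_time(k, n) / pipelined_time(k, n)


if __name__ == "__main__":
    k = 5  # assumed number of stages, for illustration only
    for n in (1, 5, 100, 10_000):
        print(f"n={n:>6}: pipelined={pipelined_time(k, n):>6} units, "
              f"speedup={speedup(k, n):.3f}")
```

For a single task the pipeline gives no benefit (speedup 1), and as n grows the speedup approaches k, which matches the statement that tasks leave at one per time unit once the pipeline is full.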
1.2 Synchronous Pipeline
Uses clocked registers between stages.
Upon arrival of a clock edge … all registers hold the results of the previous stages simultaneously.
The pipeline stages are combinational logic circuits.
It is desirable to have balanced stages, that is, approximately equal delay in all stages.
The clock period is determined by the maximum stage delay.
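To see why balanced stages matter, the sketch below computes the clock period as the maximum stage delay and measures how much of each cycle is spent on useful work. The stage delays and the `register_overhead_ps` parameter are hypothetical values chosen for illustration, not figures from the slides.

```python
def clock_period(stage_delays_ps, register_overhead_ps=0):
    """Clock period: the slowest stage, plus optional register overhead."""
    return max(stage_delays_ps) + register_overhead_ps


def stage_utilization(stage_delays_ps):
    """Fraction of the total cycle time that stages spend doing work.

    1.0 means perfectly balanced; lower values mean faster stages sit idle
    while waiting for the slowest one.
    """
    slowest = max(stage_delays_ps)
    return sum(stage_delays_ps) / (len(stage_delays_ps) * slowest)


if __name__ == "__main__":
    balanced = [200, 200, 200, 200, 200]    # hypothetical delays in ps
    unbalanced = [100, 100, 400, 200, 200]  # same total delay, one slow stage
    for name, delays in (("balanced", balanced), ("unbalanced", unbalanced)):
        print(name, clock_period(delays), "ps per cycle,",
              f"utilization {stage_utilization(delays):.2f}")
```

Both hypothetical pipelines have the same total logic delay (1000 ps), but the unbalanced one must run at a 400 ps clock, so it completes instructions at half the rate of the balanced one.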
1.3 Pipeline Performance
Pipeline Performance (continued)
The University of Adelaide, School of Computer Science, 6 April 2019

Example: Single-Cycle, Multi-Cycle, Versus Pipelined Performance
Consider a 5-stage instruction execution pipeline. The operation times are:
200 ps for Memory
200 ps for ALU
150 ps for Register (read and write)
Compare single-cycle, multi-cycle, and pipelined performance assuming an instruction mix of 20% load, 10% store, 40% ALU, and 30% branch.

Instruction class   Instruction fetch   Register read   ALU operation   Data access   Register write   Total time
Load word (lw)      200 ps              150 ps          200 ps          200 ps        150 ps           900 ps
Store word (sw)     200 ps              150 ps          200 ps          200 ps        -                750 ps
R-format            200 ps              150 ps          200 ps          -             150 ps           700 ps
Branch (beq)        200 ps              150 ps          200 ps          -             -                550 ps
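One way to work the comparison is sketched below. It rests on assumptions that the slides do not state explicitly: the single-cycle clock is set by the slowest instruction, the multi-cycle and pipelined clocks are set by the slowest operation (200 ps), each multi-cycle instruction takes one cycle per step it uses, and the pipeline is ideal (CPI of 1, no stalls). All names in the code are illustrative.

```python
MEM_PS, ALU_PS, REG_PS = 200, 200, 150  # operation times from the example

# Delay of each step per instruction class, in the order:
# instruction fetch, register read, ALU operation, data access, register write.
# A 0 means the class does not use that step (assumed standard MIPS behavior).
STEPS = {
    "load":   (MEM_PS, REG_PS, ALU_PS, MEM_PS, REG_PS),
    "store":  (MEM_PS, REG_PS, ALU_PS, MEM_PS, 0),
    "alu":    (MEM_PS, REG_PS, ALU_PS, 0,      REG_PS),
    "branch": (MEM_PS, REG_PS, ALU_PS, 0,      0),
}

# Instruction mix from the example, in percent.
MIX = {"load": 20, "store": 10, "alu": 40, "branch": 30}


def single_cycle_time_ps():
    """Every instruction takes one cycle as long as the slowest instruction."""
    return max(sum(steps) for steps in STEPS.values())


def multi_cycle_cpi():
    """Average cycles per instruction: one cycle per step actually used."""
    weighted = sum(MIX[c] * sum(1 for d in STEPS[c] if d) for c in MIX)
    return weighted / 100


def multi_cycle_time_ps():
    """Average time per instruction with a clock set by the slowest step."""
    return multi_cycle_cpi() * max(MEM_PS, ALU_PS, REG_PS)


def pipelined_time_ps():
    """Ideal pipeline: one instruction completes per cycle (no stalls)."""
    return max(MEM_PS, ALU_PS, REG_PS)


if __name__ == "__main__":
    print("single-cycle:", single_cycle_time_ps(), "ps per instruction")
    print("multi-cycle: ", round(multi_cycle_time_ps()), "ps per instruction")
    print("pipelined:   ", pipelined_time_ps(), "ps per instruction")
```

Under these assumptions the single-cycle design needs 900 ps per instruction, the multi-cycle design averages 3.9 cycles of 200 ps (about 780 ps), and the ideal pipeline completes one instruction every 200 ps, a speedup of 4.5 over single-cycle and 3.9 over multi-cycle. Real pipelines fall short of this because of hazards.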
More Examples
Problem ??: (?? marks)
Consider executing the following code on the MIPS pipelined datapath:
    add $t5, $t6, $t8
    add $t9, $t5, $t4
    lw  $t3, 100($t9)
    sub $t2, $t3, $t4
Using the following diagram for the MIPS pipeline, draw the pipeline execution diagram and show the forwarding paths needed to execute the above code, incorporating any stalls or forwarding needed to resolve the dependencies.
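As a cross-check for a hand-drawn diagram, the sketch below lists the read-after-write dependencies in this sequence and the action each one needs. It assumes the standard 5-stage MIPS pipeline with full forwarding, where a load followed immediately by a consumer costs one stall cycle. The `Instr` type and the hand-written register lists are illustrative, and the check ignores the way an earlier stall would shift later distances (which does not matter for this short sequence).

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    text: str
    op: str
    dest: Optional[str]
    srcs: Tuple[str, ...]


# Destination and source registers transcribed by hand from the problem.
PROGRAM = [
    Instr("add $t5, $t6, $t8", "add", "$t5", ("$t6", "$t8")),
    Instr("add $t9, $t5, $t4", "add", "$t9", ("$t5", "$t4")),
    Instr("lw  $t3, 100($t9)", "lw",  "$t3", ("$t9",)),
    Instr("sub $t2, $t3, $t4", "sub", "$t2", ("$t3", "$t4")),
]


def find_hazards(program):
    """Return (producer index, consumer index, register, action) tuples."""
    hazards = []
    for j, consumer in enumerate(program):
        for src in consumer.srcs:
            # Only the two preceding instructions can still be in flight
            # when the consumer reaches EX; check the nearest one first.
            for dist in (1, 2):
                i = j - dist
                if i < 0 or program[i].dest != src:
                    continue
                if program[i].op == "lw" and dist == 1:
                    action = "stall 1 cycle, then forward MEM/WB -> EX"
                elif dist == 1:
                    action = "forward EX/MEM -> EX"
                else:
                    action = "forward MEM/WB -> EX"
                hazards.append((i, j, src, action))
                break
    return hazards


def total_cycles(program, stages=5):
    """Ideal pipeline time plus one cycle per load-use stall."""
    stalls = sum(1 for h in find_hazards(program) if h[3].startswith("stall"))
    return stages + len(program) - 1 + stalls


if __name__ == "__main__":
    for i, j, reg, action in find_hazards(PROGRAM):
        print(f"{PROGRAM[i].text}  ->  {PROGRAM[j].text}: {reg}, {action}")
    print("total cycles:", total_cycles(PROGRAM))
```

Under these assumptions the sketch reports two EX/MEM forwards (for $t5 and $t9) and one load-use stall on $t3, for a total of 9 cycles.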
More Examples
Problem ??: (?? marks)
Given the following code sequence:
            LW  $t2, 0($t1)
    Label1: BEQ $t2, $t0, Label2   # Not Taken once, then Taken
            LW  $t3, 0($t2)
            BEQ $t3, $t0, Label1   # Taken
            ADD $t1, $t3, $t1
    Label2: SW  $t1, 0($t2)
Assume that this sequence is executed on a pipelined processor with a 5-stage MIPS pipeline using forwarding and a predict-taken branch prediction method. Draw the pipeline execution diagram for this sequence, assuming that branch instructions are resolved in the EX stage. How many clock cycles are needed to execute this sequence?

Clock cycle: 1 2 3 4 5 6 7 8 9 10 11 12 13 14
LW $t2, 0($t1): IF ID EX MEM WB
BEQ $t2, $t0, Label2 (NT): ****
LW $t3, 0($t2):
BEQ $t3, $t0, Label1 (T):
BEQ $t2, $t0, Label2 (T):
SW $t1, 0($t2):
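Before drawing the diagram it helps to list the instructions in the order they actually execute. The sketch below follows the stated branch outcomes through the code to produce that dynamic trace; it does not model stalls, forwarding, or misprediction flushes, which still have to be added by hand. The program encoding and the `OUTCOMES` table are illustrative structures, not part of the problem.

```python
# Each entry: (label or None, instruction text, branch target or None).
PROGRAM = [
    (None,     "LW $t2, 0($t1)",        None),
    ("Label1", "BEQ $t2, $t0, Label2",  "Label2"),
    (None,     "LW $t3, 0($t2)",        None),
    (None,     "BEQ $t3, $t0, Label1",  "Label1"),
    (None,     "ADD $t1, $t3, $t1",     None),
    ("Label2", "SW $t1, 0($t2)",        None),
]

# Branch outcomes given in the problem, keyed by instruction index:
# the first BEQ is not taken once and then taken; the second BEQ is taken.
OUTCOMES = {1: [False, True], 3: [True]}


def dynamic_trace(program, outcomes):
    """Return the executed instructions in order, tagging branch outcomes."""
    labels = {label: i for i, (label, _, _) in enumerate(program) if label}
    pending = {i: list(seq) for i, seq in outcomes.items()}
    trace = []
    pc = 0
    while pc < len(program):
        _, text, target = program[pc]
        if target is None:
            trace.append(text)
            pc += 1
        else:
            taken = pending[pc].pop(0)  # next recorded outcome for this branch
            trace.append(f"{text} ({'T' if taken else 'NT'})")
            pc = labels[target] if taken else pc + 1
    return trace


if __name__ == "__main__":
    for n, text in enumerate(dynamic_trace(PROGRAM, OUTCOMES), start=1):
        print(n, text)
```

The trace contains six instructions, matching the rows of the diagram, and the ADD is never executed. With no stalls or flushes, six instructions on a 5-stage pipeline would need 5 + 6 - 1 = 10 cycles, so that is a lower bound; the actual answer is larger once hazards and mispredicted branches are accounted for.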