Ch6a- 2 EE/CS/CPE Computer Organization Seattle Pacific University

Automobile Manufacturing (6.1)

1. Build frame. 60 min.
2. Add engine. 50 min.
3. Build body. 80 min.
4. Paint. 40 min.
5. Finish. 80 min.
Total: 310 min.

Latency: Time from start to finish for one car (smaller is better). 310 minutes per car.
Throughput: Number of finished cars per time unit (larger is better). 1 car/310 min = 0.19 cars/hour.

Issues: How can we make the process better by adding more workers?
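
The latency and throughput figures for the one-car-at-a-time shop can be checked with a few lines of Python. This is only a sketch of the arithmetic; the stage times come from the slide, while the dictionary and function names are made up for illustration.

```python
# Stage times in minutes, taken from the slide. Names are illustrative only.
STAGE_MINUTES = {"frame": 60, "engine": 50, "body": 80, "paint": 40, "finish": 80}

def latency_minutes(stages):
    """Time for one car to pass through every stage, one after another."""
    return sum(stages.values())

def throughput_cars_per_hour(stages):
    """Cars finished per hour when only one car is in the shop at a time."""
    return 60 / latency_minutes(stages)

if __name__ == "__main__":
    print(latency_minutes(STAGE_MINUTES))                      # 310
    print(round(throughput_cars_per_hour(STAGE_MINUTES), 2))   # 0.19
```

Without overlap, throughput is simply the reciprocal of latency, which is why adding workers to a single car does little unless the work is reorganized.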

An Assembly Line

[Figure: assembly-line timing diagram; horizontal axis: time; the line advances in 80-minute steps]

Short stages can’t produce faster than one car per 80 min, or a backlog will occur at the longer stages.

Latency: 400 min/car
Throughput: 4 cars/640 min (1 car/160 min). This will approach 1 car/80 min as time goes on.
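
The assembly-line numbers follow from one assumption: every stage advances in lockstep at the pace of the slowest stage (80 min). The sketch below, with hypothetical function names, shows why 4 cars take 640 min and why the average approaches 80 min per car.

```python
# Stage times in minutes, from the automobile example.
STAGE_MINUTES = [60, 50, 80, 40, 80]

def finish_time(n_cars, stages=STAGE_MINUTES):
    """Minutes until the n-th car leaves a line clocked at the slowest stage.

    The first car needs len(stages) steps; each later car finishes one
    step after the car ahead of it.
    """
    step = max(stages)
    return (len(stages) + n_cars - 1) * step

def minutes_per_car(n_cars, stages=STAGE_MINUTES):
    """Average minutes per finished car after n_cars have left the line."""
    return finish_time(n_cars, stages) / n_cars

if __name__ == "__main__":
    print(finish_time(1))         # 400 (latency of one car)
    print(finish_time(4))         # 640
    print(minutes_per_car(4))     # 160.0
    print(minutes_per_car(1000))  # 80.32
```

The fill time of the line (the first 400 min) is paid once, so its share of the average shrinks as more cars are built.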

Applying Assembly Lines to CPUs (6.1)

The single-cycle design did everything “at once”.
Can we break the single-cycle design up into stages?
Use the multi-cycle design to help us decide what can go together.

Issues:
Why not base the design on multi-cycle?
Car assembly works well. Will it be as easy to apply the same technique to a CPU?

Breaking up the Single-Cycle Datapath (6.2)

[Figure: single-cycle datapath (PC, Instruction Memory, Registers, sign extend, shift left, adders, Data Memory) divided into five stages. Instruction fields shown: Rs:[25-21], Rt:[20-16], Rd:[15-11], Imm:[15-0]]

Stages from multi-cycle design:
1. Instr. Fetch, PC=PC+4
2. Instr. Decode, Register Fetch
3. Execute, Address Calc.
4. Memory
5. Reg. Write-back

The Key - Pipeline Registers (6.2)

[Figure: the five-stage datapath (Instr. Fetch; Instr. Decode, Register Fetch; Execute, Address Calc.; Memory; Reg. Write-back) with pipeline registers between stages, driven by a common clock; PC+4 is labeled]

If only one instruction is processed at a time, this is similar to multi-cycle.
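
A rough way to picture the pipeline registers is as slots that all hand their contents to the next stage on each clock edge. The sketch below is a toy model, not the datapath itself; the stage names match the slide, while `run` and the three-instruction program are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def run(program, cycles):
    """Return, for each cycle, a dict mapping stage name to the instruction in it."""
    slots = [None] * len(STAGES)   # what each stage is working on
    pending = list(program)        # instructions not yet fetched
    history = []
    for _ in range(cycles):
        # Clock edge: every stage passes its instruction to the next stage,
        # and IF picks up the next instruction (or a bubble if none is left).
        slots = [pending.pop(0) if pending else None] + slots[:-1]
        history.append(dict(zip(STAGES, slots)))
    return history

if __name__ == "__main__":
    for cycle, row in enumerate(run(["add", "sub", "lw"], 7)):
        print(cycle, row)
```

Feeding the model a single instruction shows it visiting one stage per cycle, which is the multi-cycle behavior the slide mentions; feeding it several shows the overlap.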

Example: ADD Instruction (6.2)

ADD $Rd, $Rs, $Rt

[Figure: ADD moving through the pipelined datapath; PC+4 is labeled]

A new instruction enters the IF stage each cycle.
Problem: this design writes the correct data to the wrong register. By the time ADD reaches write-back, the write register number comes from a later instruction that is currently being decoded, not from ADD.
In general, arrows that go backwards across pipeline stages may be bad news...

Correcting the Write Register Problem (6.2)

[Figure: pipelined datapath in which the Rt:[20-16] and Rd:[15-11] fields are passed along through the pipeline registers together with PC+4, so the write register number reaches write-back with its own instruction]
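
The write register problem and its fix can be sketched in a small model. The assumptions here are mine: one instruction enters per cycle, the broken design takes the write register number from whichever instruction is in decode, and the corrected design carries the number along with the instruction. The `Instr` record and the five-instruction program are hypothetical.

```python
from collections import namedtuple

# Hypothetical instruction record: a mnemonic and its destination register.
Instr = namedtuple("Instr", ["name", "dest"])

ID, WB = 1, 4  # stage positions: IF=0, ID=1, EX=2, MEM=3, WB=4

def instr_in_stage(program, cycle, stage):
    """Instruction occupying `stage` at `cycle`, or None if the stage is empty."""
    idx = cycle - stage
    return program[idx] if 0 <= idx < len(program) else None

def writeback_targets(program, carry_dest):
    """List (instruction name, register actually written) for each write-back."""
    writes = []
    for cycle in range(len(program) + WB):
        wb = instr_in_stage(program, cycle, WB)
        if wb is None:
            continue
        if carry_dest:
            # Corrected design: the register number travelled with the instruction.
            target = wb.dest
        else:
            # Broken design: the number comes from the instruction now in decode.
            decoding = instr_in_stage(program, cycle, ID)
            target = decoding.dest if decoding else None
        writes.append((wb.name, target))
    return writes

PROGRAM = [Instr("add", "$1"), Instr("sub", "$2"), Instr("and", "$3"),
           Instr("or", "$4"), Instr("xor", "$5")]

if __name__ == "__main__":
    print(writeback_targets(PROGRAM, carry_dest=False)[0])  # ('add', '$4')
    print(writeback_targets(PROGRAM, carry_dest=True)[0])   # ('add', '$1')
```

In the broken case the result of `add` lands in the destination of the instruction three slots behind it, which is the "correct data, wrong register" symptom.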

Assembly-line Control Signals (6.1)

In an assembly line, the manufacturing instructions can be attached to the car. The instructions then move along with the car.

[Figure: cars on the line carrying order sheets that get shorter from stage to stage:
  F: Standard, E: 135 HP, B: 2-door, P: Green
  F: Leather, E: 190 HP, B: 4-door, P: Blue
  F: Cotton, B: 2-door, P: Lavender
  F: Leather, P: Green
  F: Vinyl
  F: Leather]

By separating the control signals by stages, only the signals needed for the current stage must be decoded. All signals for later stages must be passed along.

The Pipelined Control Logic (6.3)

[Figure: pipelined datapath with a Control unit decoding Op:[31-26]. Control signals are grouped as E, M, and W and carried in the pipeline registers: W, M, E after decode; W, M after execute; W after memory. Signals shown: ALUOp, ALUSrc, RegDest, ALU control, Branch, MemRead, MemWrite, PCSrc, RegWrite, MemToReg. Rt:[20-16] and Rd:[15-11] are passed along]
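
The idea of decoding once and passing the remaining signals along can be sketched as follows. The split of signals into E, M, and W groups and the values for R-type and lw follow the usual textbook MIPS convention and are assumptions here, since the slide lists the signals but not a table. PCSrc is left out because it is normally derived later from Branch and the ALU Zero output. All function names are hypothetical.

```python
# Assumed control values, grouped by the stage that consumes them.
CONTROL = {
    "r-type": {
        "E": {"RegDest": 1, "ALUOp": 0b10, "ALUSrc": 0},
        "M": {"Branch": 0, "MemRead": 0, "MemWrite": 0},
        "W": {"RegWrite": 1, "MemToReg": 0},
    },
    "lw": {
        "E": {"RegDest": 0, "ALUOp": 0b00, "ALUSrc": 1},
        "M": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
        "W": {"RegWrite": 1, "MemToReg": 1},
    },
}

def id_stage(kind):
    """Decode once; everything goes into the ID/EX pipeline register."""
    c = CONTROL[kind]
    return {"E": c["E"], "M": c["M"], "W": c["W"]}

def ex_stage(id_ex):
    """Use the E group now; pass M and W on to EX/MEM."""
    return id_ex["E"], {"M": id_ex["M"], "W": id_ex["W"]}

def mem_stage(ex_mem):
    """Use the M group now; pass W on to MEM/WB."""
    return ex_mem["M"], {"W": ex_mem["W"]}

def wb_stage(mem_wb):
    """Use the W group; nothing is left to pass along."""
    return mem_wb["W"]

if __name__ == "__main__":
    id_ex = id_stage("lw")
    used_e, ex_mem = ex_stage(id_ex)
    used_m, mem_wb = mem_stage(ex_mem)
    print(sorted(id_ex), sorted(ex_mem), sorted(mem_wb))
    print(wb_stage(mem_wb))
```

Each pipeline register holds one group fewer than the one before it, much like the order sheet on the car getting shorter at every station.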

How’d we do?

Compared to Single-cycle
  5 stages: potentially a 5x speedup
    Not likely:
      Stages won’t all be equally long
      Pipeline registers will cause some delays
  Latency: greater than in the single-cycle design
  More complexity, but nicely divided up

Compared to Multi-cycle
  Smaller speedup, since some multi-cycle instructions are shorter
  The design may be simpler (but wait…)
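
To see why the full 5x is unlikely, here is a small calculation with made-up numbers. The stage delays (in picoseconds) and the pipeline register overhead are purely hypothetical, chosen only so that the stages are unequal; nothing here is measured from a real design.

```python
# Hypothetical stage delays in picoseconds: IF, ID, EX, MEM, WB.
STAGE_PS = [200, 100, 200, 200, 100]
REG_OVERHEAD_PS = 20  # assumed extra delay added by each pipeline register

def single_cycle_time(stages):
    """The single-cycle clock must cover all stages back to back."""
    return sum(stages)

def pipelined_cycle_time(stages, reg_overhead):
    """The pipelined clock must cover the slowest stage plus register overhead."""
    return max(stages) + reg_overhead

def pipelined_latency(stages, reg_overhead):
    """One instruction still visits every stage, one clock per stage."""
    return len(stages) * pipelined_cycle_time(stages, reg_overhead)

def steady_state_speedup(stages, reg_overhead):
    """Throughput gain over single-cycle once the pipeline is full."""
    return single_cycle_time(stages) / pipelined_cycle_time(stages, reg_overhead)

if __name__ == "__main__":
    print(single_cycle_time(STAGE_PS))                                # 800
    print(pipelined_cycle_time(STAGE_PS, REG_OVERHEAD_PS))            # 220
    print(pipelined_latency(STAGE_PS, REG_OVERHEAD_PS))               # 1100
    print(round(steady_state_speedup(STAGE_PS, REG_OVERHEAD_PS), 2))  # 3.64
```

With these numbers the speedup is about 3.6x rather than 5x, and the latency of a single instruction rises from 800 ps to 1100 ps, matching both points on the slide.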